On December 20, OpenAI announced its next-generation model o3 and a lightweight version, o3-mini, on the last day of its 12-working-day online product launch event. o3 significantly outperforms its predecessor, o1, in several areas, with outstanding results in software engineering, competition mathematics, and PhD-level knowledge of the natural sciences. In particular, on the ARC-AGI assessment, o3 scored between 75.7% and 87.5%, with the higher figure exceeding the 85% threshold regarded as human-level.
How powerful can o3 be?
According to OpenAI, the o3 model achieved record-breaking results on the ARC-AGI benchmark. ARC-AGI was developed by François Chollet, the creator of Keras, and mainly evaluates a model's reasoning ability through graphical logic puzzles. The benchmark has a maximum score of 100%. In the low-compute setting, o3 scored 75.7%; in the high-compute setting, it scored 87.5%, exceeding the 85% threshold that marks human-level performance. In contrast, the currently available o1 model scores between 25% and 32%, so o3 performs almost three times as well.
Programming skills and code generation
On Codeforces, which measures programming ability with an Elo rating, o3 achieved a rating of 2727, far exceeding o1's 1891. This shows that o3 is not only a breakthrough in reasoning but also excels at programming. In fact, even o3-mini surpasses o1 in its medium reasoning-time mode.
Code generation evaluation
On SWE-bench Verified, the code generation benchmark launched by OpenAI in August, o3 achieved an accuracy of 71.7%, which is 22.8 percentage points higher than o1. This is further evidence of o3's significant progress in code generation.
Mathematics competitions and academic tests
o3 also achieved 96.7% accuracy on the 2024 American AIME mathematics competition, missing only one question, and scored 87.7% on GPQA Diamond, a set of graduate-level biology, physics, and chemistry questions. These results demonstrate o3's ability to handle complex mathematical problems and advanced academic questions.
A new level of mathematical reasoning
In particular, o3 set a new record on Epoch AI's "FrontierMath" benchmark, solving 25.2% of the problems, on a test where no other model had previously exceeded a 2% solution rate.
The o3 model can come close to achieving AGI
Artificial General Intelligence (AGI) refers to an artificial intelligence system capable of accomplishing any task that a human can. OpenAI has its own definition: "highly autonomous systems that outperform humans at most economically valuable work." Achieving AGI is not only a bold technical claim but also has far-reaching practical significance for OpenAI. Under the terms of the agreement between OpenAI and its close partner and investor Microsoft, once OpenAI meets the AGI standard, it will no longer be obligated to give Microsoft access to its most advanced technology, namely the systems that meet the definition of AGI.
OpenAI CEO Sam Altman announced that the company plans to officially launch o3-mini by the end of January, followed by the full version of o3. This new series of models marks an important step forward for OpenAI in building more powerful large language models, as it aims to surpass existing models and attract new investment and users.
Technological advancements and safety testing
OpenAI mentioned in its blog post that the o1 model has demonstrated the ability to handle complex tasks, solving more challenging problems in science, coding, and mathematics than previous models could.
The latest o3 and o3-mini models are currently undergoing internal safety testing and are expected to significantly outperform the previous o1 model.
The prelude to the AI arms race
Two years ago, OpenAI released ChatGPT, a GPT-3.5-powered chatbot that kicked off the AI arms race. In 2023, OpenAI launched GPT-4, which is more accurate and creative. Recently, the company pushed its technological boundaries further with the launch of its first reasoning model, o1.
Major competitors have also launched reasoning models
Following the release of OpenAI's first reasoning model, o1, major competitors such as Google and Meta have also launched their own reasoning models. Earlier this month, Google released a new version of its flagship model, Gemini, which it claims is twice as fast as its predecessor and capable of "thinking, remembering, planning, and even acting on your behalf." Meta CEO Mark Zuckerberg also recently revealed plans to launch Llama 4 next year. These developments show that competition in the field of AI is intensifying, as companies work to develop smarter and more efficient models for solving complex real-world problems.
Friday's announcement capped off the company's 12-day series of livestreamed product launches. During the series, OpenAI launched not only a new, more expensive ChatGPT Pro subscription option ($200 per month) but also the AI video generation model Sora Turbo and other new products. In addition, ChatGPT's search function was fully upgraded with new features such as map integration and real-time search, and it is now open to all users.
The o3 model not only demonstrates OpenAI's technical prowess but also drives the discussion on AI safety and ethics. As more advanced models are introduced, the field of artificial intelligence will continue to develop rapidly, bringing more opportunities and challenges to society.