Recent evaluations show that OpenAI’s ChatGPT-5.2 outperforms Google’s Gemini on several key artificial intelligence benchmarks. Both systems are powerful, but specific assessments reveal distinct advantages for ChatGPT in areas such as complex reasoning and problem-solving. These findings highlight an evolving AI landscape in which competition is fierce and capabilities are advancing rapidly.
The most notable benchmark is GPQA Diamond, which emphasizes PhD-level reasoning in physics, chemistry, and biology. The test was designed to be “Google-proof”: answering its questions requires not only factual knowledge but also the ability to apply complex scientific reasoning. In the latest results, ChatGPT-5.2 scored 92.4%, narrowly edging out Gemini 3 Pro at 91.9%. For context, a PhD graduate is expected to score around 65%, while non-expert humans typically score around 34%.
Another key area of evaluation is software engineering, assessed through SWE-Bench Pro (Private Dataset). This benchmark measures an AI system’s ability to resolve real-world coding issues drawn from GitHub repositories. ChatGPT-5.2 resolved approximately 24% of the tasks, compared with Gemini’s 18%. Those percentages may seem modest, but they reflect the difficulty of the tasks, which are far harder than the simpler coding tests on which AI systems achieve much higher success rates.
The ARC-AGI-2 test, launched in March 2025, evaluates abstract reasoning. Here, ChatGPT-5.2 Pro scored 54.2%, ahead of every Gemini model: the closest competitor, Gemini 3 Deep Think, scored 45.1%, while Gemini 3 Pro lagged well behind at 31.1%, underscoring ChatGPT’s advantage on abstract reasoning tasks.
These benchmarks illustrate how quickly AI development moves; results can shift with each new release. The versions reviewed here, GPT-5.2 and Gemini 3, represent the current forefront of AI capability, particularly in their paid Pro tiers, which have been shown to excel in these evaluations.
Gemini has strengths of its own on other benchmarks, such as SWE-Bench Bash Only and Humanity’s Last Exam, but the three benchmarks examined here illustrate ChatGPT’s solid performance in knowledge, reasoning, and abstract thinking. Each was chosen to reflect a different aspect of AI capability, giving a more nuanced picture of how the two systems compare.
As AI technology continues to evolve, regular assessments of these systems are crucial. The landscape is competitive, and while ChatGPT currently leads in specific areas, Gemini’s ongoing advancements suggest that the competition remains robust. Future developments from both OpenAI and Google will likely shift these dynamics once again, making it essential to stay informed about the latest iterations and their capabilities.
