Google’s latest Gemini 2.5 Pro outperforms rival AI models in areas such as reasoning, science, and coding. According to benchmark results Google published on Thursday, Gemini 2.5 Pro beats its closest competitors in nearly every category, though those rivals would likely dispute the claim.
Google’s figures show Gemini 2.5 Pro ahead of OpenAI o3, Claude Opus 4, Grok 3 Beta, and DeepSeek R1 on Humanity’s Last Exam, a benchmark covering mathematics, science, knowledge, and reasoning. The model also leads in code editing (measured by the Aider Polyglot benchmark) and tops all rivals on several factuality benchmarks, including FACTS Grounding, suggesting it is less prone to producing factually incorrect output.
The only benchmark where Gemini 2.5 Pro does not clearly come out on top is the math-focused AIME 2025, though the differences there are small.
With these gains, Gemini 2.5 Pro now tops the LMArena leaderboard with a score of 1470.
That said, the final version of Gemini 2.5 Pro is not yet widely available. Google calls this release an “enhanced preview,” with a stable version expected “in a few weeks.” The preview should already be accessible in the Gemini app.