“Google has experimented with AI reasoning models before, previously releasing a ‘thinking’ version of Gemini in December. But Gemini 2.5 represents the company’s most serious attempt yet at besting OpenAI’s ‘o’ series of models.

Google claims that Gemini 2.5 Pro outperforms its previous frontier models, as well as some of the leading competing AI models, on several benchmarks. Specifically, Google says it designed Gemini 2.5 to excel at creating visually compelling web apps and at agentic coding applications.

On Aider Polyglot, an evaluation measuring code-editing ability, Google says Gemini 2.5 Pro scores 68.6%, outperforming top AI models from OpenAI, Anthropic, and Chinese AI lab DeepSeek.

However, on another test measuring software development abilities, SWE-bench Verified, Gemini 2.5 Pro scores 63.8%, outperforming OpenAI’s o3-mini and DeepSeek’s R1 but underperforming Anthropic’s Claude 3.7 Sonnet, which scores 70.3%.

On Humanity’s Last Exam, a multimodal test consisting of thousands of crowdsourced questions relating to mathematics, humanities, and the natural sciences, Google says Gemini 2.5 Pro scores 18.8%, performing better than most rival flagship models.”

From TechCrunch.