GPQA Diamond Scores by AI Model, 2025

Table View

Model                          Knowledge (GPQA Diamond accuracy, %)
Gemini 3 Flash Thinking        90.4
Gemini 3 Pro Thinking          91.9
Gemini 2.5 Flash Thinking      82.8
Gemini 2.5 Pro Thinking        86.4
Claude Sonnet 4.5 Thinking     83.4
GPT-5.2 Extra high             92.4
Grok 4.1 Fast Reasoning        84.3

Insights

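GPT-5.2 Extra high achieves 92.4% on GPQA Diamond, the highest score among the models listed here for 2025. Gemini 3 Pro Thinking follows closely at 91.9%, just 0.5 percentage points behind, and Gemini 3 Flash Thinking is third at 90.4%. The lowest score in the table, Gemini 2.5 Flash Thinking at 82.8%, trails the leader by 9.6 points. Scores at the top of this range suggest near-expert scientific knowledge and growing reliability for research tasks.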
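
The ranking and point gaps quoted above can be reproduced from the table with a short script. This is a minimal sketch: the dictionary simply copies the table values, and the helper name rank_with_gaps is hypothetical, introduced only for illustration.

```python
# Scores copied from the table above (GPQA Diamond accuracy, %, no tools).
scores = {
    "Gemini 3 Flash Thinking": 90.4,
    "Gemini 3 Pro Thinking": 91.9,
    "Gemini 2.5 Flash Thinking": 82.8,
    "Gemini 2.5 Pro Thinking": 86.4,
    "Claude Sonnet 4.5 Thinking": 83.4,
    "GPT-5.2 Extra high": 92.4,
    "Grok 4.1 Fast Reasoning": 84.3,
}


def rank_with_gaps(table):
    """Return (model, score, gap_to_leader) tuples sorted from highest to lowest score.

    Gaps are in percentage points and rounded to one decimal place
    to hide floating-point noise (e.g. 92.4 - 91.9 is not exactly 0.5 in binary).
    """
    ranked = sorted(table.items(), key=lambda item: item[1], reverse=True)
    leader_score = ranked[0][1]
    return [(model, score, round(leader_score - score, 1)) for model, score in ranked]


for position, (model, score, gap) in enumerate(rank_with_gaps(scores), start=1):
    print(f"{position}. {model:<28} {score:5.1f}%  ({gap:.1f} pp behind leader)")
```

Gaps are reported in percentage points (a simple difference of two percentages), not as a relative percentage change.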

Metrics

Accuracy (%) on the GPQA Diamond science benchmark, evaluated without tool use. A higher percentage means more questions answered correctly, indicating stronger scientific knowledge. Evaluated by Google DeepMind in 2025.
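
Accuracy here is the share of questions answered correctly, expressed as a percentage. The sketch below shows that calculation; the function name accuracy_percent is hypothetical, and the example counts are invented for illustration (assuming a 198-question set, the size commonly reported for the Diamond subset), not taken from any model in the table.

```python
def accuracy_percent(correct: int, total: int) -> float:
    """Accuracy as a percentage: correct answers / questions attempted * 100."""
    if total <= 0:
        raise ValueError("total must be a positive number of questions")
    if not 0 <= correct <= total:
        raise ValueError("correct must be between 0 and total")
    return 100.0 * correct / total


# Hypothetical example: 150 correct answers out of 198 questions.
print(f"{accuracy_percent(150, 198):.1f}%")  # prints "75.8%"
```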

Tags

#Science #GPQA #Knowledge #2025 #OpenAI

Source

deepmind.google