GPQA Diamond Scores by AI Model, 2025

Table View

Model	GPQA Diamond accuracy (%)
Gemini 3 Flash Thinking	90.4
Gemini 3 Pro Thinking	91.9
Gemini 2.5 Flash Thinking	82.8
Gemini 2.5 Pro Thinking	86.4
Claude Sonnet 4.5 Thinking	83.4
GPT-5.2 Extra high	92.4
Grok 4.1 Fast Reasoning	84.3

Insights

GPT-5.2 Extra high achieves 92.4% on GPQA Diamond, the highest score among the models listed for 2025. Gemini 3 Pro Thinking follows closely at 91.9%, a gap of 0.5 percentage points. These high scores suggest AI's growing reliability for research tasks.
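
The ranking and the gap between the top two models can be checked directly from the table. The following is a minimal sketch, not part of the original source: the scores are copied from the table, and the names `scores` and `rank_models` are illustrative.

```python
# GPQA Diamond accuracy (percent), copied from the table above.
scores = {
    "Gemini 3 Flash Thinking": 90.4,
    "Gemini 3 Pro Thinking": 91.9,
    "Gemini 2.5 Flash Thinking": 82.8,
    "Gemini 2.5 Pro Thinking": 86.4,
    "Claude Sonnet 4.5 Thinking": 83.4,
    "GPT-5.2 Extra high": 92.4,
    "Grok 4.1 Fast Reasoning": 84.3,
}


def rank_models(table):
    """Return (model, score) pairs sorted from highest to lowest score."""
    return sorted(table.items(), key=lambda item: item[1], reverse=True)


ranked = rank_models(scores)
(top_name, top_score), (second_name, second_score) = ranked[0], ranked[1]

# Round to one decimal place to avoid floating-point noise in the difference.
gap = round(top_score - second_score, 1)

for name, score in ranked:
    print(f"{name}\t{score:.1f}")
print(f"Gap between {top_name} and {second_name}: {gap} percentage points")
```

Sorting a copy of the table keeps the original row order intact while making the leader and the spread between models easy to read off.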

Metrics

Accuracy, in percent, on the GPQA Diamond science benchmark, evaluated without tools. A higher percentage means more questions answered correctly. Evaluated by DeepMind in 2025.

Source

deepmind.google

Source Authority: 100

Correctness: 100

inforia.ai