GPT-5.2 Extra high achieves 92.4% on GPQA Diamond, the top score among the models compared on this 2025 scientific knowledge benchmark. Gemini 3 Pro follows closely at 91.9%, just 0.5 percentage points behind, showing expert-level performance on graduate-level science questions. Scores this high suggest that AI is becoming more reliable for research tasks.
Metrics
Accuracy (%) on the GPQA Diamond scientific benchmark, without tool use. A higher percentage means more questions answered correctly. Evaluated by DeepMind in 2025.