MaxProof: A Test-Time Scaling Framework for Mathematical Proof That Exceeds Human Gold-Medal Thresholds on IMO and USAMO
[Submitted on 11 Jun 2026]
Summary
MaxProof is a population-level test-time scaling framework for competition-level mathematical proof, developed as part of the MiniMax-M3 series. M3 is first trained on three proof-oriented capabilities (proof generation, proof verification, and critique-conditioned proof repair) using a defense-in-depth generative verifier engineered for a low false-positive rate. These capabilities are merged into a single model. At test time, MaxProof treats that model as a generator, verifier, refiner, and ranker: it searches over a population of candidate proofs and returns one final proof through tournament selection. With MaxProof test-time scaling, M3 scores 35/42 on IMO 2025 and 36/42 on USAMO 2026, exceeding the human gold-medal threshold on both.
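The test-time loop described above can be illustrated with a minimal sketch. This is not the paper's implementation: the summary does not say how the stages are ordered, how large the population is, or how matches are decided, so the round structure, the single-elimination bracket, and every name here (`population_search`, `tournament_select`, `Candidate`, the `toy_*` stubs) are assumptions made for illustration. In a real system the four callables would be calls to the same model in its generator, verifier, refiner, and ranker roles; here they are trivial string functions so the sketch runs on its own.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Candidate:
    proof: str
    critique: str = ""
    accepted: bool = False


def tournament_select(pool: List[Candidate],
                      prefer: Callable[[Candidate, Candidate], bool],
                      rng: random.Random) -> Candidate:
    """Single-elimination bracket (an assumed format): keep the preferred of each pair."""
    pool = list(pool)
    rng.shuffle(pool)
    while len(pool) > 1:
        next_round = []
        for i in range(0, len(pool) - 1, 2):
            a, b = pool[i], pool[i + 1]
            next_round.append(a if prefer(a, b) else b)
        if len(pool) % 2 == 1:
            next_round.append(pool[-1])  # odd one out gets a bye
        pool = next_round
    return pool[0]


def population_search(problem: str,
                      generate: Callable[[str, random.Random], str],
                      verify: Callable[[str, str], Tuple[bool, str]],
                      refine: Callable[[str, str, str], str],
                      prefer: Callable[[str, str, str], bool],
                      population_size: int = 8,
                      rounds: int = 3,
                      seed: int = 0) -> Candidate:
    rng = random.Random(seed)
    # Generator role: sample an initial population of candidate proofs.
    population = [Candidate(generate(problem, rng)) for _ in range(population_size)]
    for _ in range(rounds):
        for cand in population:
            # Verifier role: accept or return a critique.
            cand.accepted, cand.critique = verify(problem, cand.proof)
            if not cand.accepted:
                # Refiner role: repair the proof conditioned on the critique.
                cand.proof = refine(problem, cand.proof, cand.critique)
    for cand in population:  # re-verify after the last repair pass
        cand.accepted, cand.critique = verify(problem, cand.proof)
    # Ranker role: prefer verified proofs; fall back to everything if none pass.
    finalists = [c for c in population if c.accepted] or population
    return tournament_select(
        finalists, lambda a, b: prefer(problem, a.proof, b.proof), rng)


# Toy stand-ins for the model (hypothetical): a "gap" token marks a flawed step.
def toy_generate(problem: str, rng: random.Random) -> str:
    return "; ".join(["step"] * 2 + ["gap"] * rng.randint(0, 3))


def toy_verify(problem: str, proof: str) -> Tuple[bool, str]:
    ok = "gap" not in proof
    return ok, "" if ok else "unjustified gap"


def toy_refine(problem: str, proof: str, critique: str) -> str:
    return proof.replace("gap", "step", 1)  # repair one gap per pass


def toy_prefer(problem: str, a: str, b: str) -> bool:
    return len(a) <= len(b)  # toy ranking rule: shorter proof wins


if __name__ == "__main__":
    best = population_search("toy problem", toy_generate, toy_verify,
                             toy_refine, toy_prefer)
    print(best.accepted, best.proof)
```

Keeping the four roles as separate callables mirrors the summary's framing of one model used in four ways, and makes it easy to swap the toy stubs for real model calls without touching the search loop.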
Key quotes
M3 first trains three proof-oriented capabilities -- proof generation, proof verification, and critique-conditioned proof repair -- using a defense-in-depth generative verifier engineered for low false-positive rate.
At test time, MaxProof treats the model as a generator, verifier, refiner, and ranker, searches over a population of candidate proofs, and returns one final proof through tournament selection.
With MaxProof test-time scaling, the M3 model reaches 35/42 on IMO 2025 and 36/42 on USAMO 2026, exceeding the human gold-medal threshold on both.
