MaxProof: A Test-Time Scaling Framework for Mathematical Proof That Exceeds Human Gold-Medal Thresholds on IMO and USAMO

[Submitted on 11 Jun 2026]


Summary

MaxProof is a population-level test-time scaling framework for competition-level mathematical proof, developed as part of the MiniMax-M3 series. M3 first trains three proof-oriented capabilities (proof generation, proof verification, and critique-conditioned proof repair) using a defense-in-depth generative verifier engineered for a low false-positive rate. These capabilities are merged into a single model. At test time, MaxProof uses that model as a generator, verifier, refiner, and ranker: it searches over a population of candidate proofs and returns one final proof through tournament selection. With MaxProof test-time scaling, M3 scores 35/42 on IMO 2025 and 36/42 on USAMO 2026, exceeding the human gold-medal threshold on both.
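To make the test-time loop concrete, here is a minimal sketch of a population search that ends in tournament selection. It is not the paper's implementation. The summary only states that one model plays four roles over a population of candidates, so everything else is an assumption made for illustration: the callables `generate`, `verify`, `refine`, and `better`, the parameters `population_size` and `rounds`, the verify-then-repair round structure, and the single-elimination bracket. The toy demo at the end replaces proofs with integers so the sketch runs without a model.

```python
import random
from typing import Callable, List, Tuple, TypeVar

P = TypeVar("P")  # a candidate proof; opaque to the search itself


def tournament_select(candidates: List[P],
                      better: Callable[[P, P], bool],
                      rng: random.Random) -> P:
    """Single-elimination bracket (assumed format): keep pairwise winners."""
    if not candidates:
        raise ValueError("need at least one candidate")
    pool = list(candidates)
    rng.shuffle(pool)
    while len(pool) > 1:
        winners = []
        for i in range(0, len(pool) - 1, 2):
            a, b = pool[i], pool[i + 1]
            winners.append(a if better(a, b) else b)  # ranker role
        if len(pool) % 2 == 1:
            winners.append(pool[-1])  # odd candidate out gets a bye
        pool = winners
    return pool[0]


def population_search(problem,
                      generate: Callable,  # hypothetical generator role
                      verify: Callable,    # hypothetical verifier role
                      refine: Callable,    # hypothetical refiner role
                      better: Callable[[P, P], bool],
                      population_size: int = 8,
                      rounds: int = 3,
                      seed: int = 0) -> P:
    rng = random.Random(seed)
    population = [generate(problem, rng) for _ in range(population_size)]
    for _ in range(rounds):
        revised = []
        for proof in population:
            accepted, critique = verify(problem, proof)
            # Keep accepted candidates; repair rejected ones from the critique.
            revised.append(proof if accepted
                           else refine(problem, proof, critique))
        population = revised
    return tournament_select(population, better, rng)


# Toy stand-ins: the "problem" is a target integer, a "proof" is a guess.
TARGET = 10


def toy_generate(problem: int, rng: random.Random) -> int:
    return rng.randint(0, 20)


def toy_verify(problem: int, proof: int) -> Tuple[bool, int]:
    return proof == problem, problem - proof  # critique = signed error


def toy_refine(problem: int, proof: int, critique: int) -> int:
    return proof + (1 if critique > 0 else -1)  # one step toward the target


def toy_better(a: int, b: int) -> bool:
    return abs(a - TARGET) <= abs(b - TARGET)


best = population_search(TARGET, toy_generate, toy_verify, toy_refine,
                         toy_better, population_size=8, rounds=20)
```

In the toy setup every guess starts at most 10 steps from the target and each repair moves one step closer, so 20 rounds are enough for the whole population to converge. With a real model the verifier can be wrong, which is presumably why the source stresses a low false-positive rate: a flawed proof that is wrongly accepted skips repair and goes into the tournament unchanged.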

Key quotes


M3 first trains three proof-oriented capabilities -- proof generation, proof verification, and critique-conditioned proof repair -- using a defense-in-depth generative verifier engineered for low false-positive rate.

At test time, MaxProof treats the model as a generator, verifier, refiner, and ranker, searches over a population of candidate proofs, and returns one final proof through tournament selection.

With MaxProof test-time scaling, the M3 model reaches 35/42 on IMO 2025 and 36/42 on USAMO 2026, exceeding the human gold-medal threshold on both.

