MerLean-Prover: A Recursive Agent Harness for Lean 4 Theorem Proving Outperforms Baselines
[Submitted on 26 May 2026 (v1), last revised 27 May 2026 (this version, v2)]
Summary
MerLean-Prover is an end-to-end Lean 4 theorem prover that replaces 'sorry' declarations with kernel-checkable proofs. It uses three agent types (Planning, Check, and Lean) composed by a recursive outer loop, and it requires no fine-tuning, custom RL objective, or theorem-specific scaffolding. On FormalQualBench, a benchmark of 23 PhD-qualifying-exam theorems, it solves 10/23, surpassing the strongest published open-source baseline (OpenGauss, 8/23). On Putnam2025, it closes 12/12 with substantially lower total wall-clock time than the next-best system that closes the full set. The harness also transfers to smaller models: Sonnet closes all four tested FormalQualBench problems, and Haiku closes the two short ones. The results suggest that harness design is a central factor in Lean 4 theorem proving, alongside raw model capability.
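To make "replacing a 'sorry' declaration with a kernel-checkable proof" concrete, here is a minimal Lean 4 sketch. The theorem names and the toy statement are assumptions chosen for illustration; they are not taken from the paper or from either benchmark, and the example says nothing about how the agents actually find a proof.

```lean
-- Hypothetical input: the statement is fixed and the proof is a `sorry` placeholder.
-- Lean accepts the file but warns that the declaration uses `sorry`.
theorem toy_add_zero (n : Nat) : n + 0 = n := by
  sorry

-- Hypothetical output: the same statement with the placeholder replaced by a
-- proof the kernel can check. `n + 0` reduces to `n` by definition, so `rfl` suffices.
theorem toy_add_zero' (n : Nat) : n + 0 = n := by
  rfl
```

The benchmark theorems are far harder than this toy case, but the success criterion is the same: the final file contains no 'sorry' and passes the Lean kernel.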
Key quotes
MerLean-Prover is an end-to-end Lean4 theorem prover that replaces sorry declarations with kernel-checkable proofs.
On FormalQualBench, a benchmark of 23 PhD-qualifying-exam theorems, MerLean-Prover solves 10/23, surpassing the strongest published open-source baseline (OpenGauss, 8/23).
On Putnam2025, the same harness closes 12/12 with substantially lower total wall-clock than the next-best system that closes the full set.
These results suggest that harness design is a central factor in end-to-end Lean4 theorem proving, alongside raw model capability.
The harness also transfers to smaller models: Sonnet closes all four tested FormalQualBench problems, and Haiku closes the two short ones.
