MerLean-Prover: A Recursive Agent Harness for Lean 4 Theorem Proving Outperforms Baselines

[Submitted on 26 May 2026 (v1), last revised 27 May 2026 (this version, v2)]

Summary

MerLean-Prover is an end-to-end Lean4 theorem prover that replaces 'sorry' declarations with kernel-checkable proofs. It is built from three agent types (Planning, Check, and Lean) composed by a recursive outer loop whose unit of revision is the proof plan, and it requires no fine-tuning, custom RL objective, or theorem-specific scaffolding. On FormalQualBench, a benchmark of 23 PhD-qualifying-exam theorems, it solves 10/23, surpassing the strongest published open-source baseline (OpenGauss, 8/23). On Putnam2025, it closes 12/12 with substantially lower total wall-clock time than the next-best system that closes the full set. The harness also transfers to smaller models: Sonnet closes all four tested FormalQualBench problems, and Haiku closes the two short ones. The results suggest that harness design is a central factor in end-to-end Lean4 theorem proving, alongside raw model capability.
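For readers unfamiliar with the term, here is a small illustration of what replacing a 'sorry' declaration means in Lean 4. The theorem and namespace names are made up for this example and do not come from the paper or from either benchmark. The first version compiles with a warning because the proof is a placeholder; the second is accepted by the kernel.

```lean
namespace Before
-- Statement with a placeholder proof: Lean accepts the file but warns
-- that the declaration uses `sorry`.
theorem zero_add_example (n : Nat) : 0 + n = n := by
  sorry
end Before

namespace After
-- Same statement, now closed by a real proof that the kernel can check.
theorem zero_add_example (n : Nat) : 0 + n = n := by
  exact Nat.zero_add n
end After
```

The benchmark theorems are far harder than this, but the task has the same shape: the statement is fixed and only the proof body changes.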

Key quotes

MerLean-Prover is an end-to-end Lean4 theorem prover that replaces sorry declarations with kernel-checkable proofs.

On FormalQualBench, a benchmark of 23 PhD-qualifying-exam theorems, MerLean-Prover solves 10/23, surpassing the strongest published open-source baseline (OpenGauss, 8/23).

On Putnam2025, the same harness closes 12/12 with substantially lower total wall-clock than the next-best system that closes the full set.

These results suggest that harness design is a central factor in end-to-end Lean4 theorem proving, alongside raw model capability.

The harness also transfers to smaller models: Sonnet closes all four tested FormalQualBench problems, and Haiku closes the two short ones.

Snippet from the RSS feed

MerLean-Prover is an end-to-end Lean4 theorem prover that replaces sorry declarations with kernel-checkable proofs. It is built from three agent types (Planning, Check, and Lean) composed by a recursive outer loop whose unit of revision is the proof plan
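The summary does not describe how the recursive outer loop works internally, so the following is only a rough sketch of one way three such agents could be composed with the plan as the unit of revision. Every name here (`Plan`, `prove`, the `toy_*` stubs, the depth and revision limits) is hypothetical, and the agents are plain functions rather than model calls. It is not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Plan:
    goal: str
    steps: List[str]


# Hypothetical agent signatures (assumptions, not taken from the paper).
Planner = Callable[[str, Optional[str]], Plan]  # (goal, feedback) -> plan
Checker = Callable[[Plan], Optional[str]]       # plan -> objection, or None if acceptable
LeanAgent = Callable[[str], Optional[str]]      # step -> proof text, or None on failure


def prove(goal: str, planner: Planner, checker: Checker, lean: LeanAgent,
          depth: int = 0, max_depth: int = 2,
          max_revisions: int = 3) -> Optional[List[str]]:
    """Try to prove `goal`; when a step fails, revise the whole plan."""
    if depth > max_depth:
        return None
    feedback: Optional[str] = None
    for _ in range(max_revisions):
        plan = planner(goal, feedback)
        objection = checker(plan)
        if objection is not None:
            feedback = objection  # plan rejected before any Lean work
            continue
        proofs: List[str] = []
        for step in plan.steps:
            proof = lean(step)
            if proof is not None:
                proofs.append(proof)
                continue
            # Recursive case: treat the failed step as a goal of its own.
            sub = prove(step, planner, checker, lean,
                        depth + 1, max_depth, max_revisions)
            if sub is None:
                feedback = f"step failed: {step}"
                break
            proofs.extend(sub)
        else:
            return proofs  # every step was closed
    return None


# Toy stand-ins so the sketch runs without a model or a Lean toolchain.
def toy_planner(goal: str, feedback: Optional[str]) -> Plan:
    if goal == "main" and feedback is None:
        return Plan(goal, ["lemma_hard"])
    if goal == "main":
        return Plan(goal, ["lemma_a", "lemma_b"])
    return Plan(goal, [goal])  # cannot decompose further


def toy_checker(plan: Plan) -> Optional[str]:
    return None if plan.steps else "empty plan"


def toy_lean(step: str) -> Optional[str]:
    return f"proof of {step}" if step in {"lemma_a", "lemma_b"} else None


result = prove("main", toy_planner, toy_checker, toy_lean)
print(result)
```

In this toy run the first plan fails on `lemma_hard`, the failure is fed back to the planner, and the revised plan succeeds. The depth and revision limits are there only to keep the recursion bounded.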