Coding Agent Formalization of Numerical Analysis Reveals Flaws in Compilation-Based Quality Metrics

[Submitted on 12 Jun 2026]

4h ago· 2 min readenInsight

75/100

Toasty

Bagelometer↗

A good honest bake. Not flashy, but you'll finish the whole bagel.

Score75TypeanalysisSentimentneutral

Summary

This paper presents research on using coding agents to formalize advanced mathematics textbooks in Lean 4, specifically focusing on Numerical Methods for Ordinary Differential Equations—a domain largely absent from mathlib. The authors introduce a three-dimensional evaluation framework (semantic correctness, Mathlib reuse, and cross-file reuse via LLM-as-judge) to assess formalization quality beyond mere kernel acceptance. Their analysis reveals recurring unfaithful formalization patterns (incomplete multi-part statements, weakened hypotheses, parameter restrictions) that compilation-based metrics fail to detect, concluding that kernel acceptance substantially overstates formalization quality.

Key quotes

· 4 pulled

We address both limitations by applying a coding agent to formalize Numerical Methods for Ordinary Differential Equations, a textbook in numerical analysis that is largely absent from mathlib, stressing the agent's capacity to develop new theory from scratch.

We further introduce a systematic, reproducible three-dimensional framework for evaluating the quality of agent-produced formalizations beyond compilation: semantic correctness, Mathlib reuse, and cross-file reuse via LLM-as-judge methods.

Applying this framework to our own formalization and to the released outputs of RepoProver and M2F, we uncover recurring unfaithful formalization patterns, including incomplete multi-part statements, added weakening hypotheses, and parameter restrictions, that kernel acceptance entirely obscures.

Our results suggest that compilation-based metrics substantially overstate formalization quality, and we provide a reproducible audit methodology to support more rigorous evaluation of future autoformalization systems.

Snippet from the RSS feed

Recent work has demonstrated that coding agents can formalize entire advanced mathematics textbooks in Lean 4, yet existing efforts concentrate on branches of mathematics already well-represented in mathlib and measure success solely through kernel accept

You might also wanna read

Building a Custom Minimal Coding Agent: Lessons from Three Years of LLM-Assisted Development

The article details the author's three-year journey using various LLM-powered coding assistants, from early ChatGPT usage to modern coding a

mariozechner.at·4mo ago

Study Reveals "Constraint Decay" in LLM Agents for Backend Code Generation Under Structural Requirements

This paper presents a systematic study on how LLM agents handle structural constraints in multi-file backend code generation. The authors in

arxiv.org·21d ago

The Challenge of Verifying Code Quality from AI Coding Agents

The article discusses the author's experience building AI coding agents that work autonomously while they sleep, and the resulting challenge

claudecodecamp.com·3mo ago

A critique of Lean and the culture of formal mathematics communities

The article discusses the author's perspective on the modern formal mathematics community, particularly the pressure to justify not using th

lawrencecpaulson.github.io·1mo ago

Critique of AI-Generated Code and the Problem of 'Vibe-Coding' in Software Development

The article critiques the problematic use of AI tools like LLMs in software development, particularly focusing on 'vibe-coding' where develo

acko.net·3mo ago

The Growing Importance of Formal Specification in AI-Driven Software Development

The article discusses the evolving role of software engineers in an AI-driven development landscape, arguing that while initial predictions

benjamincongdon.me·6mo ago