Evaluating LLMs for TLA+ System Modeling: The Specula Team's Experience with Claude and Raft
By
Qian Cheng, Ruize Tang, Emilie Ma, Finn Hackett, Peiyang He, Yiming Su, Ivan Beschastnikh, Yu Huang, Xiaoxing Ma, and Tianyin Xu
23d ago· 11 min readenInsight
98/100
Golden Brown
Bagelometer↗
A five-star bake. Worth schmearing, sharing, saving.
Score98TypeanalysisSentimentpositive
Summary
The Specula team evaluates LLMs (specifically Claude) on their ability to model real-world systems using TLA+, a formal specification language for concurrent and distributed systems. They tested whether LLMs could write a TLA+ specification for Etcd's Raft implementation, which passed syntax checks and ran through the TLC model checker. The article explores the potential of AI in applied formal methods and agentic model checking for computing systems.
Key quotes
· 2 pulledSeveral months ago, we asked Claude to write a TLA+ specification (spec) for Etcd's Raft implementation.
It passed syntax checks, ran through the TLC model checker, and at fi
Editors’ note: AI has been actively pushing the frontier of applied formal methods for computing systems. In this article, the Specula team wrote about their experience of evaluating LLMs on modeling system code, the basic capability for agentic model che
