Comprehensive Survey of Reasoning Failures in Large Language Models
By
T-A
Toasted just enough. A reliable bake, gently seasoned.
Summary
This article presents a comprehensive survey of reasoning failures in Large Language Models (LLMs), introducing a novel categorization framework that distinguishes between embodied and non-embodied reasoning types. The survey systematically classifies reasoning failures into three categories: fundamental failures intrinsic to LLM architectures, application-specific limitations in particular domains, and robustness issues characterized by inconsistent performance across minor variations. For each failure type, the authors provide definitions, analyze existing studies, explore root causes, and present mitigation strategies, aiming to unify fragmented research efforts and guide future work toward building more reliable LLM reasoning capabilities.
Key quotes
· 4 pulledLarge Language Models (LLMs) have exhibited remarkable reasoning capabilities, achieving impressive results across a wide range of tasks. Despite these advances, significant reasoning failures persist, occurring even in seemingly simple scenarios.
We introduce a novel categorization framework that distinguishes reasoning into embodied and non-embodied types, with the latter further subdivided into informal (intuitive) and formal (logical) reasoning.
We classify reasoning failures along a complementary axis into three types: fundamental failures intrinsic to LLM architectures that broadly affect downstream tasks; application-specific limitations that manifest in particular domains; and robustness issues characterized by inconsistent performance across minor variations.
By unifying fragmented research efforts, our survey provides a structured perspective on systemic weaknesses in LLM reasoning, offering valuable insights and guiding future research towards building stronger, more reliable, and robust reasoning capabilities.
You might also wanna read
RICP: A Teacher-Student Framework for Retrieved In-Context Principles from Mistakes in LLMs
This paper introduces Retrieved In-Context Principles (RICP), a novel teacher-student framework for improving Large Language Models (LLMs) t
Researchers Develop Method to Predict Real-Time Progress in Reasoning Language Models
This research paper investigates whether real-time progress prediction is feasible for reasoning language models that use long latent chains
HSIR: New Method Improves Self-Improvement Training for Large Reasoning Models
This research paper identifies two key problems in self-improvement training for Large Reasoning Models (LRMs): data imbalance (too many sim

Study finds large language models vulnerable to classic persuasion tactics for harmful requests
This study tested whether three widely used large language models (LLMs) are susceptible to classic persuasion principles (authority, social
RTP-LLM: Alibaba's High-Performance Inference Engine for Large Language Model Deployment
This paper presents RTP-LLM, a high-performance inference engine developed by Alibaba for industrial-scale deployment of Large Language Mode
Why Treating LLMs as Black-Box Problem Solvers Fails: Lessons from Processing 100 Compliance PDFs
The article discusses the author's experience transforming 100 messy compliance PDFs into structured JSON rules. It critiques the common app
