Comprehensive Survey of Reasoning Failures in Large Language Models

T-A

3mo ago · 2 min read · en · Insight

Score: 75 · Type: analysis · Sentiment: neutral

Summary

This article presents a comprehensive survey of reasoning failures in Large Language Models (LLMs). It introduces a categorization framework that divides reasoning into embodied and non-embodied types, with the latter subdivided into informal (intuitive) and formal (logical) reasoning. Along a complementary axis, the survey classifies reasoning failures into three categories: fundamental failures intrinsic to LLM architectures, application-specific limitations that appear in particular domains, and robustness issues, where performance is inconsistent across minor variations. For each failure type, the authors provide definitions, analyze existing studies, explore root causes, and present mitigation strategies, with the aim of unifying fragmented research and guiding future work toward more reliable LLM reasoning.

Key quotes

Large Language Models (LLMs) have exhibited remarkable reasoning capabilities, achieving impressive results across a wide range of tasks. Despite these advances, significant reasoning failures persist, occurring even in seemingly simple scenarios.

We introduce a novel categorization framework that distinguishes reasoning into embodied and non-embodied types, with the latter further subdivided into informal (intuitive) and formal (logical) reasoning.

We classify reasoning failures along a complementary axis into three types: fundamental failures intrinsic to LLM architectures that broadly affect downstream tasks; application-specific limitations that manifest in particular domains; and robustness issues characterized by inconsistent performance across minor variations.

By unifying fragmented research efforts, our survey provides a structured perspective on systemic weaknesses in LLM reasoning, offering valuable insights and guiding future research towards building stronger, more reliable, and robust reasoning capabilities.
