Study reveals why in-context learning fails on complex specification-heavy tasks and how fine-tuning can help
[Submitted on 15 Nov 2023]
Summary
This research paper investigates the limitations of in-context learning (ICL) for large language models (LLMs) on specification-heavy tasks: tasks whose complicated, extensive specifications take ordinary humans several hours to master, such as traditional information extraction. Through experiments on 18 such tasks, the authors identify three key reasons for ICL failure: inability to specifically understand context, misalignment with humans in task schema comprehension, and inadequate long-text understanding. They show that fine-tuning lets LLMs achieve decent performance on these tasks, which suggests the failure is not an inherent flaw of LLMs but a drawback of existing alignment methods. Dedicated instruction tuning on these tasks yields a notable improvement, pointing toward better alignment methods for sophisticated human demands.
Key quotes
In this paper, we find that ICL falls short of handling specification-heavy tasks, which are tasks with complicated and extensive task specifications, requiring several hours for ordinary humans to master.
The performance of ICL on these tasks mostly cannot reach half of the state-of-the-art results.
We identify three primary reasons: inability to specifically understand context, misalignment in task schema comprehension with humans, and inadequate long-text understanding ability.
The failure of ICL is not an inherent flaw of LLMs, but rather a drawback of existing alignment methods that renders LLMs incapable of handling complicated specification-heavy tasks via ICL.
We perform dedicated instruction tuning on LLMs for these tasks and observe a notable improvement.