EntropyLong: Using Predictive Uncertainty to Improve Long-Context Language Model Training

PaulHoule

7mo ago · 2 min read · en · Insight

Score: 75 · Type: analysis · Sentiment: positive

Summary

Researchers propose EntropyLong, a novel data construction method for training long-context language models that uses predictive uncertainty to verify genuine long-range dependencies. The approach identifies high-entropy positions in documents, retrieves semantically relevant contexts from large corpora, and verifies their utility by checking whether they reduce prediction entropy at those positions. This model-in-the-loop verification ensures each dependency represents measurable information gain rather than spurious correlation. Models trained on data generated with this method show significant improvements on RULER benchmarks and LongBench v2, demonstrating enhanced long-context understanding.
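
The selection and verification steps described in the summary can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the thresholds, and the toy next-token distributions are all assumptions made for illustration, and a real pipeline would obtain the distributions from a language model and the candidate contexts from a retriever.

```python
import math
from typing import List, Sequence

Distribution = Sequence[float]  # next-token probabilities at one position


def entropy(probs: Distribution) -> float:
    """Shannon entropy (in nats) of a single next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def high_entropy_positions(dists: Sequence[Distribution], threshold: float) -> List[int]:
    """Indices of positions where the model is uncertain (entropy above threshold).

    The threshold is a hypothetical hyperparameter, not a value from the paper.
    """
    return [i for i, d in enumerate(dists) if entropy(d) > threshold]


def context_is_useful(before: Distribution, after: Distribution, min_reduction: float) -> bool:
    """Keep a retrieved context only if it lowers entropy by at least min_reduction.

    `before` is the distribution without the retrieved context, `after` the
    distribution with the context prepended. min_reduction is hypothetical.
    """
    return entropy(before) - entropy(after) >= min_reduction


# Toy distributions over a 4-token vocabulary (illustrative values only).
document = [
    [0.97, 0.01, 0.01, 0.01],  # confident prediction
    [0.25, 0.25, 0.25, 0.25],  # maximally uncertain
    [0.70, 0.10, 0.10, 0.10],  # moderately uncertain
]
uncertain = high_entropy_positions(document, threshold=1.0)

# Simulated predictions at the uncertain position after prepending two candidates.
with_relevant_context = [0.85, 0.05, 0.05, 0.05]
with_irrelevant_context = [0.30, 0.25, 0.25, 0.20]

keep_relevant = context_is_useful(document[1], with_relevant_context, min_reduction=0.5)
keep_irrelevant = context_is_useful(document[1], with_irrelevant_context, min_reduction=0.5)
```

In this toy run only the uniform position exceeds the threshold, and only the candidate that sharpens the prediction passes verification. The candidate that leaves the distribution nearly uniform is rejected, which is the filter against spurious dependencies that the summary describes.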

Key quotes

Training long-context language models to capture long-range dependencies requires specialized data construction.

We propose EntropyLong, a novel data construction method that leverages predictive uncertainty to verify dependency quality.

This model-in-the-loop verification ensures each dependency represents measurable information gain rather than spurious correlation.

Models trained on this data demonstrate significant improvements on RULER benchmarks, particularly in tasks requiring distant information.

Extensive ablation studies further validate the necessity and effectiveness of entropy-based verification for long-context training.

Snippet from the RSS feed

Training long-context language models to capture long-range dependencies requires specialized data construction. Current approaches, such as generic text concatenation or heuristic-based variants, frequently fail to guarantee genuine long-range dependenci
