Sleep-Like Consolidation Mechanism Improves Long-Context Performance in Transformer Language Models

Transformer-based large language models are increasingly used for long-horizon tasks; however, their attention mechanism scales poorly with context length. To handle this, we study a sleep-like…
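The abstract's point about poor scaling can be illustrated with a minimal sketch of standard (full) single-head scaled dot-product self-attention in plain Python. This is not the paper's sleep-like mechanism, which the excerpt does not describe, and the function name `naive_attention` is hypothetical. Every query is scored against every key, so for a context of n tokens the score count is n × n.

```python
import math


def naive_attention(q, k, v):
    """Full single-head scaled dot-product attention on lists of vectors.

    Hypothetical illustration only (not the paper's method).
    Returns (outputs, num_scores), where num_scores counts the query-key
    dot products computed: len(q) * len(k).
    """
    d = len(q[0])
    outputs = []
    num_scores = 0
    for qi in q:
        # Score this query against every key: one row of the n x n score matrix.
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        num_scores += len(scores)
        # Numerically stable softmax over the row.
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Output is the attention-weighted sum of the value vectors.
        out = [sum(w * vj[c] for w, vj in zip(weights, v)) for c in range(len(v[0]))]
        outputs.append(out)
    return outputs, num_scores


if __name__ == "__main__":
    for n in (8, 16, 32):
        x = [[float(i % 3), 1.0] for i in range(n)]  # toy token vectors
        _, count = naive_attention(x, x, x)
        print(n, count)  # prints 8 64, then 16 256, then 32 1024
```

Doubling the context from 16 to 32 tokens quadruples the number of scores (256 to 1024), which is the quadratic growth the abstract alludes to.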


juxtapose · 1mo ago · 2 min read · en · Insight

technology, science, machine learning, artificial intelligence, research
