Study Finds Larger Language Models Delay But Don't Prevent Plasticity Loss During Training

[Submitted on 23 Jun 2026]

Summary

This research paper investigates whether loss of plasticity (the inability of a neural network to learn new information after training on older data) remains a problem in modern transformer-based large language models. The authors study GPT-style Transformers (5M to 314M parameters) trained on a multilingual continual learning problem and find evidence of plasticity loss across all model sizes, measured by deterioration on a held-out Vietnamese probing task. They discover that plasticity loss onset follows a predictable scaling law, growing sublinearly with model size, suggesting larger models delay but don't prevent the phenomenon. Additionally, plasticity loss was observed even under stationary (non-continual) training, challenging the view that it's exclusive to continual learning scenarios.
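
The scaling-law claim can be illustrated with a small sketch. It assumes that onset follows a pure power law, onset = a · N^b in parameter count N. That functional form is our assumption, since the summary only says onset grows sublinearly. A least-squares fit in log-log space recovers the exponent b, and "sublinear" corresponds to 0 < b < 1. The data below is synthetic, generated from a made-up exponent of 0.5, and is not taken from the paper. The intermediate model sizes, the onset units, and the helper name fit_power_law are all hypothetical.

```python
import math


def fit_power_law(sizes, onsets):
    """Fit onset = a * size**b by ordinary least squares on log-log axes.

    Hypothetical helper. Assumes a pure power law; the paper's actual
    fitted form may differ. Returns (a, b).
    """
    if len(sizes) != len(onsets) or len(sizes) < 2:
        raise ValueError("need at least two (size, onset) pairs")
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in onsets]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                       # slope in log-log space = exponent
    a = math.exp(mean_y - b * mean_x)   # intercept mapped back to linear scale
    return a, b


# Synthetic, hypothetical data: NOT results from the paper.
# Endpoints match the paper's 5M-314M range; intermediate sizes are invented.
# Onsets are in arbitrary training-progress units.
sizes = [5e6, 20e6, 80e6, 314e6]
onsets = [1_000 * (n / 5e6) ** 0.5 for n in sizes]  # made-up exponent 0.5

a, b = fit_power_law(sizes, onsets)
print(f"fitted exponent b = {b:.3f}")  # 0.500 for this noiseless synthetic data
print("sublinear" if 0 < b < 1 else "not sublinear")

# Hypothetical extrapolation to a 1B-parameter model.
print(f"predicted onset at 1B params: {a * 1e9 ** b:.0f} units")
```

Under this assumed form, doubling N multiplies the predicted onset by 2**b (about 1.41 when b = 0.5), and the onset stays finite for every finite N. In that sense scale postpones plasticity loss without removing it.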

Source

Study Finds Larger Language Models Delay But Don't Prevent Plasticity Loss During Training (arxiv.org)

Key quotes

The loss of plasticity - the ability of a network to learn new information after having already learned older information - is a fundamental challenge in creating artificial neural networks capable of continual learning.
These results suggest that larger models may delay the measurable effects of plasticity loss, but that increasing parameter count alone is likely to be insufficient to completely prevent it.
We also find evidence of plasticity loss under stationary multilingual training, challenging the view that the phenomenon is exclusive to continual learning with abrupt task changes.
Overall, our results suggest that even large Transformer language models trained on natural-language will eventually lose the ability to efficiently adapt to new data after sufficiently long training, in both continual and stationary settings.