Analyzing Memorization in Transformers Through Loss Landscape Curvature Decomposition
By
andy12_
A bagel you'd recommend to a friend without hedging.
Summary
This research paper analyzes how memorization manifests in transformer models (both language models and vision transformers) through loss landscape curvature analysis. The study shows that memorized training points exhibit sharper curvature than non-memorized ones, allowing for weight decomposition based on curvature. This enables a weight editing procedure that effectively suppresses memorized data recitation while maintaining model performance. The research finds that fact retrieval and arithmetic tasks are particularly affected by this editing, suggesting these tasks rely on specialized weight directions rather than general-purpose mechanisms.
Key quotes
· 4 pulledWe characterize how memorization is represented in transformer models and show that it can be disentangled in the weights of both language models (LMs) and vision transformers (ViTs) using a decomposition based on the loss landscape curvature.
This insight is based on prior theoretical and empirical work showing that the curvature for memorized training points is much sharper than non memorized, meaning ordering weight components from high to low curvature can reveal a distinction without explicit labels.
We posit these tasks rely heavily on specialized directions in weight space rather than general purpose mechanisms, regardless of whether those individual datapoints are memorized.
Our work enhances the understanding of memorization in neural networks with practical applications towards removing it, and provides evidence for idiosyncratic, narrowly-used structures involved in solving tasks like math and fact retrieval.
You might also wanna read
Multi-Stream LLMs: A Parallel Architecture to Overcome Single-Stream Bottlenecks in Language Models
This paper introduces "Multi-Stream LLMs," a novel approach to overcoming the limitations of current language model architectures that rely
Research Reveals AI Models Show 'Flinch' Effect in Word Probability Allocation
The article presents research on how AI language models exhibit subtle behavioral differences even when they appear 'uncensored.' Researcher
Stabilizing LLM Behavior: The Assistant Axis Approach to Preventing Harmful Persona Drift
The article discusses how large language models (LLMs) develop character personas during training and introduces the concept of an "Assistan
OpenAI Research Shows AI Hallucinations Are Mathematically Inevitable in Current Models
OpenAI's research paper provides a rigorous mathematical explanation for why AI language models like ChatGPT inevitably hallucinate (confide
theconversation.com·8mo agoOpenAI Research Explains Why Language Models Hallucinate and How to Improve Reliability
OpenAI's research paper explains that language models hallucinate because standard training and evaluation procedures reward guessing over a
DeepSeek-V4: Hybrid Sparse-Attention Architecture Enables Efficient Million-Token Context Inference
DeepSeek-V4 introduces a hybrid sparse-attention architecture combined with on-policy distillation across domain specialists, enabling 1M-to
