Analyzing Memorization in Transformers Through Loss Landscape Curvature Decomposition

andy12_

6mo ago· 2 min readenInsight

75/100

Toasty

Bagelometer↗

A bagel you'd recommend to a friend without hedging.

Score75TypeanalysisSentimentneutral

Summary

This research paper analyzes how memorization manifests in transformer models (both language models and vision transformers) through loss landscape curvature analysis. The study shows that memorized training points exhibit sharper curvature than non-memorized ones, allowing for weight decomposition based on curvature. This enables a weight editing procedure that effectively suppresses memorized data recitation while maintaining model performance. The research finds that fact retrieval and arithmetic tasks are particularly affected by this editing, suggesting these tasks rely on specialized weight directions rather than general-purpose mechanisms.

Key quotes

· 4 pulled

We characterize how memorization is represented in transformer models and show that it can be disentangled in the weights of both language models (LMs) and vision transformers (ViTs) using a decomposition based on the loss landscape curvature.

This insight is based on prior theoretical and empirical work showing that the curvature for memorized training points is much sharper than non memorized, meaning ordering weight components from high to low curvature can reveal a distinction without explicit labels.

We posit these tasks rely heavily on specialized directions in weight space rather than general purpose mechanisms, regardless of whether those individual datapoints are memorized.

Our work enhances the understanding of memorization in neural networks with practical applications towards removing it, and provides evidence for idiosyncratic, narrowly-used structures involved in solving tasks like math and fact retrieval.

Snippet from the RSS feed

We characterize how memorization is represented in transformer models and show that it can be disentangled in the weights of both language models (LMs) and vision transformers (ViTs) using a decomposition based on the loss landscape curvature. This insigh

You might also wanna read

Multi-Stream LLMs: A Parallel Architecture to Overcome Single-Stream Bottlenecks in Language Models

This paper introduces "Multi-Stream LLMs," a novel approach to overcoming the limitations of current language model architectures that rely

arxiv.org·10d ago

Research Reveals AI Models Show 'Flinch' Effect in Word Probability Allocation

The article presents research on how AI language models exhibit subtle behavioral differences even when they appear 'uncensored.' Researcher

morgin.ai·1mo ago

Stabilizing LLM Behavior: The Assistant Axis Approach to Preventing Harmful Persona Drift

The article discusses how large language models (LLMs) develop character personas during training and introduces the concept of an "Assistan

anthropic.com·4mo ago

OpenAI Research Shows AI Hallucinations Are Mathematically Inevitable in Current Models

OpenAI's research paper provides a rigorous mathematical explanation for why AI language models like ChatGPT inevitably hallucinate (confide

theconversation.com·8mo ago

OpenAI Research Explains Why Language Models Hallucinate and How to Improve Reliability

OpenAI's research paper explains that language models hallucinate because standard training and evaluation procedures reward guessing over a

openai.com·8mo ago

DeepSeek-V4: Hybrid Sparse-Attention Architecture Enables Efficient Million-Token Context Inference

DeepSeek-V4 introduces a hybrid sparse-attention architecture combined with on-policy distillation across domain specialists, enabling 1M-to

artgor.medium.com·12h ago