
Verbalized Sampling: A Training-Free Method to Mitigate Mode Collapse and Improve LLM Output Diversity


[Submitted on 1 Oct 2025 (v1), last revised 10 Oct 2025 (this version, v3)]


Summary

This paper identifies a fundamental data-level cause of mode collapse in LLM post-training alignment: typicality bias in preference data, where annotators systematically favor familiar text, a tendency the authors attribute to well-established findings in cognitive psychology. The authors introduce Verbalized Sampling (VS), a training-free prompting strategy that asks the model to verbalize a probability distribution over multiple candidate responses. Experiments show that VS increases diversity by 1.6-2.1x over direct prompting in creative writing, without sacrificing factual accuracy or safety, and that more capable models benefit more from the approach.
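To make the idea of "verbalizing a probability distribution over multiple responses" concrete, here is a minimal Python sketch. It is not the authors' implementation: the prompt wording, the JSON reply format, and the helper names `build_vs_prompt`, `parse_candidates`, and `sample_response` are all assumptions made for illustration, and no real model is called. A hard-coded, hypothetical reply stands in for the model output.

```python
import json
import random


def build_vs_prompt(task: str, k: int = 5) -> str:
    """Wrap a task in a verbalized-sampling style instruction.

    The wording is illustrative only, not the prompt used in the paper.
    """
    return (
        f"{task}\n\n"
        f"Generate {k} different responses to the request above. "
        "Return only a JSON list of objects, each with a 'text' field and a "
        "'probability' field estimating how likely that response is."
    )


def parse_candidates(reply: str) -> list[tuple[str, float]]:
    """Parse the JSON reply and renormalize the stated probabilities to sum to 1."""
    items = json.loads(reply)
    pairs = [(str(item["text"]), float(item["probability"])) for item in items]
    total = sum(p for _, p in pairs)
    if total <= 0:
        raise ValueError("verbalized probabilities must sum to a positive value")
    return [(text, p / total) for text, p in pairs]


def sample_response(candidates: list[tuple[str, float]], rng=None) -> str:
    """Draw one response, weighted by its (renormalized) verbalized probability."""
    if rng is None:
        rng = random.Random()
    texts = [text for text, _ in candidates]
    weights = [p for _, p in candidates]
    return rng.choices(texts, weights=weights, k=1)[0]


# Hypothetical model reply, hard-coded so the sketch runs without any API.
example_reply = json.dumps([
    {"text": "Joke A", "probability": 0.2},
    {"text": "Joke B", "probability": 0.1},
    {"text": "Joke C", "probability": 0.1},
])

prompt = build_vs_prompt("Tell me a joke about coffee.", k=3)
candidates = parse_candidates(example_reply)
choice = sample_response(candidates, rng=random.Random(0))
```

In practice `prompt` would be sent to whichever chat model is in use and its reply passed to `parse_candidates`. Renormalizing is a defensive choice in this sketch, since a model's stated probabilities need not sum to one.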

Source

Twitter / X · Verbalized Sampling: A Training-Free Method to Mitigate Mode Collapse and Improve LLM Output Diversity · arxiv.org

Key quotes

Unlike prior work that attributes this effect to algorithmic limitations, we identify a fundamental, pervasive data-level driver: typicality bias in preference data, whereby annotators systematically favor familiar text as a result of well-established findings in cognitive psychology.
We introduce Verbalized Sampling, a simple, training-free prompting strategy to circumvent mode collapse.
Comprehensive experiments show that VS significantly improves performance across creative writing (poems, stories, jokes), dialogue simulation, open-ended QA, and synthetic data generation, without sacrificing factual accuracy and safety.
In creative writing, VS increases diversity by 1.6-2.1x over direct prompting.
We further observe an emergent trend that more capable models benefit more from VS.
Snippet from the RSS feed
Post-training alignment often reduces LLM diversity, leading to a phenomenon known as mode collapse.
