
Research on Hallucination-Associated Neurons in Large Language Models: Identification, Impact, and Origins

By bilsbie · 5mo ago · 2 min read · en · Insight

Summary

This research paper investigates hallucination-associated neurons (H-Neurons) in large language models, examining their identification, behavioral impact, and origins. The study finds that a remarkably sparse subset of neurons (less than 0.1% of total neurons) reliably predicts hallucination occurrences and is causally linked to over-compliance behaviors. The authors trace these neurons back to the pre-trained base models, where they remain predictive for hallucination detection, indicating that they emerge during pre-training. The findings bridge macroscopic behavioral patterns with microscopic neural mechanisms.
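To make the idea of a "sparse predictive subset" concrete, here is a minimal toy sketch. It is not the paper's method, which this summary does not describe. It assumes synthetic Gaussian "activations", two hypothetical planted neurons (indices 17 and 1203) that shift when a made-up hallucination label is 1, and a simple mean-gap ranking with a 0.1% selection budget. All function names, sizes, and the threshold rule are illustrative assumptions.

```python
import random

def make_synthetic(n_examples, n_neurons, planted, rng):
    """Toy data (assumption): noise activations, with planted neurons shifted when label is 1."""
    activations, labels = [], []
    for _ in range(n_examples):
        y = rng.randint(0, 1)  # 1 = "hallucinated" (synthetic label)
        row = [rng.gauss(0.0, 1.0) for _ in range(n_neurons)]
        for j in planted:
            row[j] += 3.0 * y
        activations.append(row)
        labels.append(y)
    return activations, labels

def mean_gaps(activations, labels):
    """Per-neuron difference in mean activation between label-1 and label-0 examples."""
    pos = [row for row, y in zip(activations, labels) if y == 1]
    neg = [row for row, y in zip(activations, labels) if y == 0]
    n_neurons = len(activations[0])
    return [
        sum(r[j] for r in pos) / len(pos) - sum(r[j] for r in neg) / len(neg)
        for j in range(n_neurons)
    ]

def select_top_fraction(gaps, fraction):
    """Keep the given fraction of neurons with the largest absolute mean gap."""
    k = max(1, round(len(gaps) * fraction))
    ranked = sorted(range(len(gaps)), key=lambda j: abs(gaps[j]), reverse=True)
    return sorted(ranked[:k])

def predict(row, selected, gaps):
    """Crude detector: signed sum over the selected neurons against a half-gap threshold."""
    score = sum(row[j] if gaps[j] > 0 else -row[j] for j in selected)
    threshold = 0.5 * sum(abs(gaps[j]) for j in selected)
    return 1 if score > threshold else 0

rng = random.Random(0)   # fixed seed so the toy run is repeatable
PLANTED = [17, 1203]     # hypothetical "H-Neuron" indices for the toy data
train_x, train_y = make_synthetic(400, 2000, PLANTED, rng)
test_x, test_y = make_synthetic(200, 2000, PLANTED, rng)

gaps = mean_gaps(train_x, train_y)
selected = select_top_fraction(gaps, 0.001)  # 0.1% of 2000 neurons = 2 neurons

correct = sum(predict(row, selected, gaps) == y for row, y in zip(test_x, test_y))
accuracy = correct / len(test_y)
print("selected neurons:", selected)
print("held-out accuracy:", accuracy)
```

In this toy setting the ranking recovers the planted neurons because their mean gap is far larger than the noise in the other 1,998 columns. Real activations are correlated and much noisier, so a regularized probe with held-out evaluation would likely be needed in practice.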

Key quotes

Large language models (LLMs) frequently generate hallucinations -- plausible but factually incorrect outputs -- undermining their reliability.
We demonstrate that a remarkably sparse subset of neurons (less than 0.1% of total neurons) can reliably predict hallucination occurrences, with strong generalization across diverse scenarios.
Controlled interventions reveal that these neurons are causally linked to over-compliance behaviors.
We trace these neurons back to the pre-trained base models and find that these neurons remain predictive for hallucination detection, indicating they emerge during pre-training.
Our findings bridge macroscopic behavioral patterns with microscopic neural mechanisms, offering insights for developing more reliable LLMs.
Snippet from the RSS feed
Large language models (LLMs) frequently generate hallucinations -- plausible but factually incorrect outputs -- undermining their reliability. While prior work has examined hallucinations from macroscopic perspectives such as training data and objectives,
