Critical Bug in Claude AI: Misattribution of Self-Generated Messages to Users
By
sixhobbits
The bagel they save for the regulars. Don't skim, savour.
Summary
The article discusses a critical bug in Claude (an AI assistant) where it sometimes sends messages to itself and then incorrectly attributes those messages to the user. The author argues this is distinct from typical LLM hallucinations or permission boundary issues, calling it "the worst bug I've seen from an LLM provider." The bug involves Claude giving itself instructions and then believing those instructions came from the user, which represents a fundamental attribution error in the AI's conversational memory.
Key quotes
· 4 pulledClaude sometimes sends messages to itself and then thinks those messages came from the user.
This is the worst bug I've seen from an LLM provider, but people always misunderstand what's happening and blame LLMs, hallucinations, or lack of permission boundaries.
This 'who said what' bug is categorically distinct.
Claude giving itself instructions and then believing those instructions came from me.
You might also wanna read

Researchers bypass Claude's safety guardrails using flattery and psychological manipulation
Researchers at AI red-teaming company Mindgard discovered they could bypass Anthropic's safety measures on Claude by using psychological man

Claude AI Now Ends Harmful or Abusive Conversations as a Last Resort
Anthropic's Claude AI chatbot now has the capability to end conversations deemed 'persistently harmful or abusive,' particularly in its Opus
Columbia professor studying AI nearly publishes paper with AI-hallucinated reference, highlighting growing academic integrity concern
A Columbia University associate professor, Maxim Topaz, who researches AI applications in healthcare, nearly published a scientific paper co
