Mechanistic interpretability study reveals and disables political censorship circuit in Qwen 3.5 LLM
By s314
Summary
This article presents a mechanistic interpretability study of Qwen 3.5, a Chinese LLM, and argues that the model's political censorship is implemented by a small, identifiable circuit in its weights. The author shows how to locate, read, and disable that mechanism by subtracting a specific direction at the "writer" layer, with the subtraction strength kept inside a particular dose band. The study covers the difference between "writer" and "reader" layers, the brittleness of the censorship, a "Chinese-first phenomenon" in which the model prioritizes Chinese government perspectives, and trained-template cells that trigger censorship. The author also shows that the same circuit operates in the model's "thinking" mode, and closes with a steering showcase in which the model gives up factual information it was trained to suppress.
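As a rough illustration of what "subtracting a direction within a dose band" can mean, here is a minimal sketch on a toy activation vector. It is not the author's code: the function names, the `dose_band` parameter, and the toy numbers are all hypothetical, and the summary does not say how the direction is found or what the band's limits are. The sketch only shows the arithmetic, h' = h - dose * d, where d is the unit-normalized direction.

```python
import math


def unit(vector):
    """Return the vector scaled to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("direction must be non-zero")
    return [x / norm for x in vector]


def projection(hidden, direction):
    """Scalar component of `hidden` along `direction`."""
    d = unit(direction)
    return sum(h * x for h, x in zip(hidden, d))


def subtract_direction(hidden, direction, dose, dose_band):
    """Subtract `dose` units of the normalized direction from `hidden`.

    `dose_band` is a hypothetical (low, high) pair; a dose outside it is
    rejected, mirroring the claim that the edit only works in a band.
    """
    low, high = dose_band
    if not low <= dose <= high:
        raise ValueError(f"dose {dose} is outside the band [{low}, {high}]")
    d = unit(direction)
    return [h - dose * x for h, x in zip(hidden, d)]


# Toy 3-dimensional "activation" and direction (illustrative values only).
hidden = [3.0, 4.0, 0.0]
direction = [0.0, 2.0, 0.0]

steered = subtract_direction(hidden, direction, dose=4.0, dose_band=(1.0, 8.0))
before = projection(hidden, direction)
after = projection(steered, direction)
print(steered, before, after)
```

In a real model the same subtraction would be applied to the hidden state at one chosen layer during the forward pass; which layer counts as the "writer" and what dose range works are findings of the study that this sketch does not reproduce.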
Key quotes
Qwen3.5-9B's political censorship is a small, identifiable circuit you can find, read, and turn off.
The off switch is sharp but specific: subtract the right direction at the writer layer, within its dose band, and the model gives up the facts it was trained to hide.
Push pa
