Mechanistic interpretability study reveals and disables political censorship circuit in Qwen 3.5 LLM

s314

13d ago · 72 min read · en · Insight

Score: 100 · Type: analysis · Sentiment: neutral

Summary

This article presents a mechanistic interpretability study of Qwen 3.5, a Chinese LLM, revealing that its political censorship is implemented through a small, identifiable circuit within the model's weights. The author demonstrates how to locate, read, and disable this censorship mechanism by subtracting a specific direction at the writer layer within a particular dose band. The study explores the difference between "writer" and "reader" layers in the model, the brittleness of the censorship, a "Chinese-first phenomenon" where the model prioritizes Chinese government perspectives, and trained-template cells that trigger censorship. The author also shows that the same censorship circuit operates in the model's "thinking" mode, and provides a steering showcase demonstrating how to bypass censorship to access factual information the model was trained to suppress.
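To make "subtracting a direction within a dose band" concrete, here is a minimal sketch on synthetic data. It is not the author's code: the difference-of-means estimate of the direction, the helper names `unit_direction` and `steer`, the toy activations, and the dose value are all assumptions for illustration. The summary does not say how the study derives its direction or which layer index is the writer layer.

```python
import numpy as np


def unit_direction(censored_acts, neutral_acts):
    """Estimate a unit-length direction as the difference of mean activations.

    Assumption: difference-of-means is a common choice in steering work;
    the summarized study may compute its direction differently.
    """
    diff = censored_acts.mean(axis=0) - neutral_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)


def steer(hidden, direction, dose):
    """Subtract `dose` units of `direction` from every row of `hidden`."""
    return hidden - dose * direction


# Toy stand-ins for residual-stream activations at one layer (hypothetical data).
rng = np.random.default_rng(0)
d_model = 16
planted = np.zeros(d_model)
planted[0] = 1.0

neutral = rng.normal(size=(200, d_model))
censored = rng.normal(size=(200, d_model)) + 4.0 * planted

direction = unit_direction(censored, neutral)

# Projection onto the direction before and after steering.
before = censored @ direction
after = steer(censored, direction, dose=4.0) @ direction
```

In a real model the subtraction would typically run inside a forward hook on the chosen layer, and the dose would be swept to find the band in which outputs change without degrading. Both of those steps are outside this sketch.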

Key quotes


Qwen3.5-9B's political censorship is a small, identifiable circuit you can find, read, and turn off.

The off switch is sharp but specific: subtract the right direction at the writer layer, within its dose band, and the model gives up the facts it was trained to hide.

Push pa
