Minimal Transformer Circuits Achieve Perfect Indirect Object Identification with Only Two Attention Heads
This paper studies the mechanistic interpretability of transformers by training small attention-only models from scratch on a symbolic Indirect Object Identification (IOI) task. The authors find that a single-layer model with just two attention heads achieves