TurboQuant: Compressing AI Vectors to 2-4 Bits Using Random Rotations
TurboQuant is a novel compression technique for AI vectors (KV caches, embeddings, attention keys) that quantizes each coordinate to 2-4 bits without losing accuracy. The key insight is that in high dimensions, a random rotation transforms any input vector into one whose coordinates follow a known distribution, enabling provably near-optimal distortion with
arkaung.github.io, 1mo ago
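
To make the rotation idea in the summary concrete, here is a minimal sketch in plain Python. It is not the TurboQuant algorithm itself: the summary does not describe the quantizer, so this sketch swaps in a simple uniform scalar quantizer as an assumption. All function names (random_rotation, quantize, dequantize), the dimension d = 64, and the clipping range of 3/sqrt(d) are hypothetical choices made for illustration. The point it shows is that even a "spiky" unit vector ends up with coordinates of roughly the same scale (about 1/sqrt(d)) after a random rotation, so one fixed low-bit grid can serve every coordinate.

```python
import math
import random

def random_rotation(d, rng):
    """Random orthogonal d x d matrix: Gram-Schmidt on Gaussian rows."""
    rows = []
    for _ in range(d):
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for u in rows:  # subtract the components along earlier rows
            dot = sum(a * b for a, b in zip(v, u))
            v = [a - dot * b for a, b in zip(v, u)]
        norm = math.sqrt(sum(a * a for a in v))
        rows.append([a / norm for a in v])
    return rows

def rotate(m, x):
    """Compute m @ x."""
    return [sum(a * b for a, b in zip(row, x)) for row in m]

def rotate_back(m, y):
    """Compute m.T @ y, which inverts an orthogonal m."""
    d = len(y)
    return [sum(m[i][j] * y[i] for i in range(d)) for j in range(d)]

def quantize(y, bits, scale):
    """Uniform scalar quantizer on [-scale, scale] (an assumption,
    not the quantizer used by TurboQuant). Returns integer codes."""
    levels = 2 ** bits
    step = 2 * scale / levels
    codes = []
    for v in y:
        c = math.floor((v + scale) / step)
        codes.append(min(max(c, 0), levels - 1))  # clip outliers
    return codes

def dequantize(codes, bits, scale):
    """Map each code back to the midpoint of its bin."""
    step = 2 * scale / (2 ** bits)
    return [-scale + (c + 0.5) * step for c in codes]

d, bits = 64, 4
rng = random.Random(0)
R = random_rotation(d, rng)

# A worst case for naive per-coordinate quantization: all mass in one slot.
x = [0.0] * d
x[0] = 1.0

y = rotate(R, x)              # coordinates now spread out, std ~ 1/sqrt(d)
scale = 3.0 / math.sqrt(d)    # clip at about three standard deviations
codes = quantize(y, bits, scale)
x_hat = rotate_back(R, dequantize(codes, bits, scale))

err = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, x_hat)))
print("norm after rotation:", math.sqrt(sum(v * v for v in y)))
print("relative reconstruction error:", err)
```

Because the rotation is orthogonal, it preserves the vector's norm and can be undone exactly, so the only error in x_hat comes from the 4-bit rounding and the clipping of rare outliers. A real implementation would likely use a fast structured rotation rather than a dense d x d matrix, and a quantizer matched to the known coordinate distribution; both are left out here.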