Why LLMs Are So Expensive: The Quadratic Cost of Dense Attention — and How Subquadratic Claims to Fix It
By
Shashi Bellamkonda
Summary
This article explains the fundamental computational bottleneck behind large language models: dense attention, which requires every token to compare against every other token, creating quadratic compute scaling (O(n²)). As context windows grow, this becomes exponentially more expensive. The piece introduces Subquadratic's SubQ model as a potential breakthrough that breaks this quadratic constraint through independent benchmarks. It explores the architectural implications for enterprise teams deploying AI at scale, including cost modeling, inference efficiency, and the trade-offs between attention mechanisms.
Source
Key quotes
· 3 pulledEach dot is a token, roughly a word or part of a word. Each line is a computation the model runs to figure out how that token relates to every other token in the document.
The technical term is dense attention. The practical result is that every LLM compares every word to every other word.
Subquadratic's SubQ model posts independent benchmarks suggesting that constraint is breakable.
You might also wanna read
New Method Enables Constant-Cost Self-Attention Computation for Transformers
Researchers present a novel mathematical approach to compute self-attention in Transformer AI models with constant cost per token, rather th
Subquadratic launches AI architecture with 12-million-token context window, outperforming GPT-5.5 on retrieval benchmarks
Subquadratic has launched a new AI architecture featuring a 12-million-token context window, shattering the current million-token standard s
The New Stack·1mo agoHow New Open-Weight LLMs Are Reducing Long-Context Costs: KV Sharing, Attention Budgeting, and Compressed Attention
The article analyzes recent developments in open-weight LLM architectures, focusing on how newer models like Gemma 4 and DeepSeek V4 are imp
Understanding Continuous Batching in Large Language Models: From Attention Mechanisms to Throughput Optimization
This technical blog post explains continuous batching in large language models (LLMs) by starting from first principles of attention mechani
Subquadratic Launches SubQ 1.1 Small LLM with Competitive Benchmark Performance
Subquadratic introduces SubQ 1.1 Small, a new LLM that balances long-context optimization with general reasoning. The model achieves strong

Comments
Sign in to join the conversation.
No comments yet. Be the first.