
Why LLMs Are So Expensive: The Quadratic Cost of Dense Attention — and How Subquadratic Claims to Fix It

By

Shashi Bellamkonda

3h ago · 8 min read · en · Insight

Summary

This article explains the fundamental computational bottleneck behind large language models: dense attention, in which every token is compared against every other token, so compute scales quadratically with context length (O(n²)). As context windows grow, the cost rises quadratically rather than linearly: doubling the context roughly quadruples the attention compute. The piece introduces Subquadratic's SubQ model, whose independent benchmarks suggest that the quadratic constraint is breakable. It also explores what the architecture means for enterprise teams deploying AI at scale, including cost modeling, inference efficiency, and the trade-offs between attention mechanisms.
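The scaling claim can be checked with simple arithmetic. The sketch below is illustrative only: it counts token-to-token comparisons for a single attention pass and ignores heads, layers, and hardware effects. The function names `dense_attention_pairs` and `cost_ratio` are hypothetical, not taken from the article.

```python
def dense_attention_pairs(n_tokens: int) -> int:
    """Comparisons in one dense attention pass: every token against every token."""
    return n_tokens * n_tokens


def cost_ratio(n_small: int, n_large: int) -> float:
    """How many times more comparisons the larger context needs."""
    return dense_attention_pairs(n_large) / dense_attention_pairs(n_small)


if __name__ == "__main__":
    # Hypothetical context lengths chosen only to show the trend.
    for n in (1_000, 2_000, 4_000, 8_000):
        print(f"{n:>6} tokens -> {dense_attention_pairs(n):>12,} comparisons")
    # Doubling the context multiplies the comparison count by four.
    print("2x context costs", cost_ratio(1_000, 2_000), "times as much")
```

Under this simplified model, an 8x longer context costs 64x as many comparisons, which is why long context windows dominate the compute bill.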

Source

Why LLMs Are So Expensive: The Quadratic Cost of Dense Attention — and How Subquadratic Claims to Fix It (shashi.co, via bsky)

Key quotes

· 3 pulled
Each dot is a token, roughly a word or part of a word. Each line is a computation the model runs to figure out how that token relates to every other token in the document.
The technical term is dense attention. The practical result is that every LLM compares every word to every other word.
Subquadratic's SubQ model posts independent benchmarks suggesting that constraint is breakable.
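To make the "every token against every other token" picture concrete, here is a minimal sketch of standard single-head scaled dot-product attention in plain Python. It assumes the textbook formulation and is not SubQ's method, which the quotes above do not describe. The function name `dense_attention` and the toy vectors are hypothetical.

```python
import math


def dense_attention(queries, keys, values):
    """Toy single-head scaled dot-product attention over lists of float vectors.

    Returns the output vectors and the number of query-key comparisons made.
    """
    dim = len(queries[0])
    outputs = []
    comparisons = 0
    for q in queries:
        # Score this token against every token: the step that makes cost n * n.
        scores = []
        for k in keys:
            scores.append(sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim))
            comparisons += 1
        # Numerically stable softmax over the scores.
        top = max(scores)
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        # Weighted average of the value vectors.
        out = [sum(w * v[j] for w, v in zip(weights, values))
               for j in range(len(values[0]))]
        outputs.append(out)
    return outputs, comparisons


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three toy token vectors
    _, count = dense_attention(tokens, tokens, tokens)
    print(count)  # 3 tokens -> 9 comparisons
```

The nested loop is the point: each of the n tokens is scored against all n tokens, so the comparison count is n² no matter what the tokens contain.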
Snippet from the RSS feed
Dense attention's quadratic compute scaling has been the hidden cost driver behind enterprise AI since 2017. Subquadratic's SubQ model posts independent benchmarks suggesting that constraint is breakable. Here is what the architecture actually means for e
