Taalas Develops ASIC Chip Running Llama 3.1 at 17,000 Tokens Per Second

beAroundHere

3mo ago · 4 min read · en · News


Summary

Taalas, a startup, has developed an ASIC that runs the Llama 3.1 8B model (with 3/6-bit quantization) at 17,000 tokens per second, roughly 30 A4 pages of text per second. The company claims that, compared with state-of-the-art GPU-based inference systems, its chip has 10x lower cost of ownership, uses 10x less power, and is about 10x faster. The key design choice is that the model's weights are "hardwired" (or "printed") directly onto the chip, making it specialized hardware for this one model.
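The "30 A4 pages per second" figure can be sanity-checked with simple arithmetic. The sketch below is a back-of-envelope estimate only: the words-per-token and words-per-page ratios are assumptions chosen for illustration and do not come from Taalas or the original post.

```python
# Back-of-envelope check of the "30 A4 pages per second" figure.
# ASSUMPTIONS (not from the source): the words-per-token and
# words-per-page ratios below are rough rules of thumb.

TOKENS_PER_SECOND = 17_000   # throughput reported for the Taalas chip
WORDS_PER_TOKEN = 0.75       # assumed average for English text
WORDS_PER_A4_PAGE = 425      # assumed; varies with font, spacing, margins


def pages_per_second(tokens_per_second: float,
                     words_per_token: float = WORDS_PER_TOKEN,
                     words_per_page: float = WORDS_PER_A4_PAGE) -> float:
    """Convert a token throughput into an approximate page throughput."""
    words_per_second = tokens_per_second * words_per_token
    return words_per_second / words_per_page


if __name__ == "__main__":
    print(f"{pages_per_second(TOKENS_PER_SECOND):.1f} pages/s")
```

Under these assumptions the result lands at about 30 pages per second. A denser page layout or a different tokenizer ratio would shift the number, so treat it as an order-of-magnitude check.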

Key quotes


A startup called Taalas, recently released an ASIC chip running Llama 3.1 8B (3/6 bit quant) at an inference rate of 17,000 tokens per seconds.

That's like writing around 30 A4 sized pages in one second.

They claim it's 10x cheaper in ownership cost than GPU based inference systems and is 10x less electricity hog.

And yeah, about 10x faster than state of art inference.

I tried to read through their blog and they've literally 'hardwired' the model's weights on chip.
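To get a feel for how much data would have to be hardwired, here is a rough estimate of the quantized weight footprint. It is a sketch under stated assumptions: a nominal 8 billion parameters, and either 3 or 6 bits for every weight. The source does not say how the 3-bit and 6-bit formats are mixed, so the two values are bounds only, and the estimate ignores quantization scales and other metadata.

```python
# Rough size of the quantized weights for an "8B" model.
# ASSUMPTIONS (not from the source): a nominal 8e9 parameters, and a
# uniform bit width per weight. The real 3/6-bit mix is not stated,
# so the two results below are lower and upper bounds only.

NOMINAL_PARAMS = 8_000_000_000


def weight_bytes(num_params: int, bits_per_weight: float) -> float:
    """Bytes needed to store num_params weights at the given bit width."""
    return num_params * bits_per_weight / 8


def to_gb(num_bytes: float) -> float:
    """Convert bytes to decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_bytes / 1e9


if __name__ == "__main__":
    for bits in (3, 6):
        size = to_gb(weight_bytes(NOMINAL_PARAMS, bits))
        print(f"{bits}-bit weights: about {size:.1f} GB")
```

With these inputs the weights fall between roughly 3 GB and 6 GB, which gives a sense of the scale of what the post describes as being fixed into the chip.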

Snippet from the RSS feed

or how to generate 17000 tokens per second?
