OpenAI's GPT OSS 120B Model Now Available on Cerebras Inference Cloud

samspenc

6mo ago · 3 min read · en · News


Score: 80 · Type: news · Sentiment: positive

Summary

OpenAI's GPT OSS 120B model is now available on the Cerebras Inference Cloud. The 120-billion-parameter mixture-of-experts model delivers accuracy comparable to OpenAI's o4-mini on core reasoning benchmarks while running at up to 3,000 tokens per second. It offers a 131K context length and is priced at $0.25 per million input tokens and $0.69 per million output tokens. Cerebras positions itself as a platform for fast AI training and inference.
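To make the quoted pricing and speed concrete, here is a minimal back-of-the-envelope sketch. It uses only the figures stated in the summary; the function names are hypothetical, and the 3,000 tokens per second figure is an "up to" number, so the time estimate is a best case rather than a guarantee.

```python
# Figures quoted in the summary; everything else here is illustrative.
INPUT_USD_PER_M = 0.25       # USD per million input tokens
OUTPUT_USD_PER_M = 0.69      # USD per million output tokens
PEAK_TOKENS_PER_SEC = 3000   # "up to" throughput, treated as a best case


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost of one request at the quoted per-token prices."""
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


def best_case_generation_seconds(output_tokens: int) -> float:
    """Lower bound on generation time if the peak throughput is sustained."""
    return output_tokens / PEAK_TOKENS_PER_SEC


if __name__ == "__main__":
    # Hypothetical request: 10,000 prompt tokens, 2,000 generated tokens.
    cost = estimate_cost_usd(10_000, 2_000)
    seconds = best_case_generation_seconds(2_000)
    print(f"cost: ${cost:.5f}, best-case generation time: {seconds:.2f}s")
```

For the hypothetical request above, the cost comes to a fraction of a cent and the best-case generation time to under a second. Real latency also depends on queueing, network overhead, and prompt processing, none of which this sketch models.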

Key quotes


The first open weight reasoning model by OpenAI, OSS 120B delivers model accuracy that rivals o4-mini while running at up to 3,000 tokens per second on the Cerebras Inference Cloud.

Reasoning tasks that take up to a minute to complete on GPUs finish in just one second on Cerebras.

OSS 120B is available today with 131K context at $0.25 per M input tokens and $0.69 per M output tokens.

GPT-OSS-120B is a 120 billion parameter mixture-of-expert model that delivers near parity performance with OpenAI's popular o4-mini on core reasoning benchmarks.

Snippet from the RSS feed

Cerebras is the go-to platform for fast and effortless AI training. Learn more at cerebras.ai.
