OpenAI's GPT OSS 120B Model Now Available on Cerebras Inference Cloud
By samspenc
Summary
OpenAI's GPT OSS 120B model is now available on Cerebras' Inference Cloud. The 120-billion-parameter mixture-of-experts model delivers accuracy comparable to OpenAI's o4-mini and runs at up to 3,000 tokens per second. It supports a 131K-token context length and is priced at $0.25 per million input tokens and $0.69 per million output tokens. Cerebras positions itself as a platform for fast AI training and inference.
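The quoted prices and peak speed make it straightforward to estimate what a single request costs and how quickly its output could be generated. The sketch below is a back-of-the-envelope calculator built only from the figures in the summary. Two assumptions apply. First, generation time is treated as output tokens divided by the peak 3,000 tokens/s rate, which ignores prompt processing and network latency. Second, the 10,000-token prompt and 3,000-token response are hypothetical example sizes.

```python
# Figures quoted in the article (USD per million tokens, tokens per second).
INPUT_PRICE_PER_M = 0.25
OUTPUT_PRICE_PER_M = 0.69
PEAK_TOKENS_PER_SEC = 3_000  # "up to" figure; real throughput may be lower


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request at the quoted per-million-token rates."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


def min_generation_seconds(output_tokens: int,
                           tokens_per_sec: float = PEAK_TOKENS_PER_SEC) -> float:
    """Lower bound on output generation time at a given decode rate.

    Assumption: ignores prompt processing and network latency.
    """
    return output_tokens / tokens_per_sec


if __name__ == "__main__":
    # Hypothetical workload: 10,000-token prompt, 3,000-token reasoning + answer.
    cost = request_cost(10_000, 3_000)
    secs = min_generation_seconds(3_000)
    print(f"cost: ${cost:.6f}, time at peak rate: {secs:.2f} s")
    # → cost: $0.004570, time at peak rate: 1.00 s
```

Output tokens are priced at roughly 2.8 times the input rate. As a result, long reasoning traces dominate the bill even when prompts are large.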
Key quotes
The first open weight reasoning model by OpenAI, OSS 120B delivers model accuracy that rivals o4-mini while running at up to 3,000 tokens per second on the Cerebras Inference Cloud.
Reasoning tasks that take up to a minute to complete on GPUs finish in just one second on Cerebras.
OSS 120B is available today with 131K context at $0.25 per M input tokens and $0.69 per M output tokens.
GPT-OSS-120B is a 120 billion parameter mixture-of-expert model that delivers near parity performance with OpenAI's popular o4-mini on core reasoning benchmarks.