Kog AI Launches Inference Engine Tech Preview: 3,000 Tokens/s on AMD MI300X GPUs

Kog Team

2d ago · 18 min read · en

Score: 100 · Type: press release · Sentiment: positive

Summary

Kog AI has launched a tech preview of the Kog Inference Engine (KIE). Running a 2B model in FP16 without speculative decoding, it reports 3,000 output tokens/s per request on 8× AMD MI300X GPUs and 2,100 on 8× NVIDIA H200 GPUs. Kog AI says support for large third-party MoE models is coming next at similar speeds. The article argues that AI inference on GPUs can reach the speed regime of dedicated hardware, and presents benchmarks and technical details of the engine's architecture and performance.

Key quotes

we show that AI inference on GPUs can be super-fast, reaching the speed regime of dedicated hardware

3,000 output tokens/s per request on 8× AMD MI300X GPUs and 2,100 on 8× NVIDIA H200 (FP16, no speculative decoding)

This preview runs a 2B model, with support for large third-party MoE models coming next at similar speeds
