Parakeet.cpp: Fast C++ Implementation of NVIDIA's Speech Recognition Models for On-Device Inference

By noahkay13 · 3mo ago · 6 min read · en · Code

Summary

The article introduces parakeet.cpp, a C++ implementation of NVIDIA's Parakeet speech recognition models built for on-device inference. It runs on the Axiom tensor library, which provides automatic Metal GPU acceleration on Apple Silicon, and it needs neither a Python runtime nor the ONNX runtime. Encoder inference for the 110M model takes about 27 ms for 10 seconds of audio on an Apple Silicon GPU, roughly 96x faster than on CPU, and FP16 support cuts memory use by about half. Multiple Parakeet models are supported, including English and multilingual variants, for offline speech recognition.

Key quotes (5 pulled)
Fast speech recognition with NVIDIA's Parakeet models in pure C++
Built on axiom — a lightweight tensor library with automatic Metal GPU acceleration
No ONNX runtime, no Python runtime, no heavyweight dependencies. Just C++ and one tensor library that outruns PyTorch MPS
~27ms encoder inference on Apple Silicon GPU for 10s audio (110M model) — 96x faster than CPU
FP16 support for ~2x memory reduction
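
To put the quoted figures in perspective, here is a small back-of-the-envelope sketch in C++. It is not part of parakeet.cpp; the helper names `realtime_factor` and `weight_megabytes` are hypothetical, and the memory estimate assumes that weights dominate the footprint and that "110M" means exactly 110,000,000 parameters.

```cpp
#include <cstdint>

// Real-time factor: seconds of audio processed per second of compute.
// Values above 1.0 mean faster than real time.
constexpr double realtime_factor(double audio_seconds, double inference_seconds) {
    return audio_seconds / inference_seconds;
}

// Approximate weight storage in megabytes (10^6 bytes) for a model with
// `param_count` parameters stored at `bytes_per_param` bytes each.
// Assumption: activations and runtime overhead are ignored.
constexpr double weight_megabytes(std::uint64_t param_count, unsigned bytes_per_param) {
    return static_cast<double>(param_count) * bytes_per_param / 1.0e6;
}

// Figures taken from the quotes above; the parameter count is an assumed
// exact reading of "110M".
constexpr double kAudioSeconds = 10.0;
constexpr double kEncoderSeconds = 0.027;
constexpr std::uint64_t kParams = 110'000'000ULL;
```

With these inputs, `realtime_factor(kAudioSeconds, kEncoderSeconds)` comes out at roughly 370x real time for the encoder alone, so the full pipeline with decoding would be somewhat slower. `weight_megabytes(kParams, 4)` gives 440 MB for FP32 weights and `weight_megabytes(kParams, 2)` gives 220 MB for FP16, which is consistent with the quoted "~2x memory reduction".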
Snippet from the RSS feed
Ultra fast and portable Parakeet implementation for on-device inference in C++ using Axiom with MPS+Unified Memory - Frikallo/parakeet.cpp
