KugelAudio launches real-time TTS with voice cloning, sub-60ms latency, and on-premise deployment
By Viktor Presber
Summary
KugelAudio has launched a real-time text-to-speech (TTS) model with voice cloning on Product Hunt. The model can clone a voice from 30 to 60 seconds of audio and reaches latency below 60 ms (excluding network time). It supports both input and output streaming, and it can be used through an API or deployed on premise. Grammar-aware text normalization across more than 25 languages lets it read phone numbers, IBANs, addresses, and medication names naturally, and it provides word-level timestamps and IPA support. The company is a team of four based in Berlin (currently in San Francisco for Y Combinator), and it ships adapters for LiveKit, Pipecat, and Vapi.
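To illustrate what "reading an IBAN naturally" involves, here is a minimal sketch of one normalization step. It is not KugelAudio's implementation. It is an English-only illustration that assumes a common convention: read the IBAN in groups of four, speak digits individually, spell out letters, and insert a pause between groups. The `spell_iban` function and `DIGITS` table are hypothetical names chosen for this example. A grammar-aware normalizer for 25+ languages would also have to handle case, gender, and locale-specific grouping.

```python
import re

# Hypothetical English-only digit names. A grammar-aware normalizer would
# choose forms per language (and per grammatical context) instead.
DIGITS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}


def spell_iban(iban: str) -> str:
    """Expand an IBAN into speakable groups of four characters.

    Digits are read one at a time and letters are left as single
    characters to be spelled, so "DE89 3704" becomes
    "D E eight nine, three seven zero four".
    """
    # Drop any existing spacing and normalize letter case.
    compact = re.sub(r"\s+", "", iban).upper()
    # Regroup into blocks of four, the usual printed IBAN layout.
    groups = [compact[i:i + 4] for i in range(0, len(compact), 4)]
    spoken_groups = [
        " ".join(DIGITS.get(ch, ch) for ch in group) for group in groups
    ]
    # Commas give a synthesizer a short pause between groups.
    return ", ".join(spoken_groups)


print(spell_iban("DE89 3704"))  # D E eight nine, three seven zero four
```

Grouping matters as much as digit expansion here: a listener has to hold the number in memory, so four-character chunks with pauses are easier to follow than 22 characters read in one run.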
Key quotes
Building for the future, which we believe will be conversational.
You can clone a voice from 30 to 60 seconds of audio. Drop in a short sample and you get a working voice immediately.
We optimized for voice agents with latencies below 60ms (excl. network), input streaming and output streaming.
We offer on premise support, run the model in your own cluster instead of calling our API.
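KugelAudio says a 30 to 60 second sample is enough to clone a voice. Before uploading a recording, a client might want to confirm that the recording falls within that range. The sketch below uses only Python's standard `wave` module and assumes the sample is an uncompressed PCM WAV file. The function names are hypothetical and not part of any KugelAudio API. The source also does not say whether samples outside this range are rejected, so treat the check as a client-side guard rather than a documented requirement.

```python
import wave

# Range quoted by KugelAudio for a cloning sample (assumed inclusive here).
MIN_SECONDS = 30.0
MAX_SECONDS = 60.0


def clip_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        # Frames divided by frames per second gives seconds.
        return wav.getnframes() / wav.getframerate()


def check_cloning_sample(path: str) -> float:
    """Return the clip duration, or raise ValueError if it is outside 30-60 s."""
    duration = clip_duration_seconds(path)
    if not MIN_SECONDS <= duration <= MAX_SECONDS:
        raise ValueError(
            f"sample is {duration:.1f}s; expected "
            f"{MIN_SECONDS:.0f}-{MAX_SECONDS:.0f}s"
        )
    return duration
```

Length is only one factor. In practice a short, clean recording with a single speaker and little background noise usually matters more than hitting an exact duration.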
