
KugelAudio launches real-time TTS with voice cloning, sub-60ms latency, and on-premise deployment

By Viktor Presber

4d ago · 1 min read · en · Product

Summary

KugelAudio has launched a real-time text-to-speech (TTS) model with voice cloning on Product Hunt. The model clones a voice from 30 to 60 seconds of audio, reports latency below 60 ms (excluding network time), supports both input and output streaming, and is available through an API or as an on-premise deployment. Grammar-aware normalization across 25+ languages lets it read phone numbers, IBANs, addresses, and medications naturally, and it also provides word-level timestamps and IPA support. It is built by a Berlin-based team of four (currently in SF for YC) and includes adapters for LiveKit, Pipecat, and Vapi.

Key quotes

Building for the future, which we believe will be conversational.
You can clone a voice from 30 to 60 seconds of audio. Drop in a short sample and you get a working voice immediately.
We optimized for voice agents with latencies below 60ms (excl. network), input streaming and output streaming.
We offer on premise support, run the model in your own cluster instead of calling our API.
Snippet from the RSS feed
Most natural real-time TTS with voice cloning and sub-60ms latency, on-prem or via API. Grammar-aware normalization reads phone numbers, IBANs, addresses, and medications naturally across 25+ languages, with word-level timestamps and IPA support. Adapters
