KugelAudio launches real-time TTS with voice cloning, sub-60ms latency, and on-premise deployment

Viktor Presber

4d ago · 1 min read · en · Product

Score: 38 · Type: press release · Sentiment: positive

Summary

KugelAudio has launched a real-time text-to-speech model with voice cloning on Product Hunt. The model can clone a voice from 30 to 60 seconds of audio, reports latency below 60ms (excluding network time), supports both input and output streaming, and can be used through an API or deployed on-premise. Its grammar-aware normalization covers 25+ languages so that phone numbers, IBANs, addresses, and medications are read naturally, and it provides word-level timestamps and IPA support. Built by a four-person team from Berlin (currently in SF for YC), it ships with adapters for LiveKit, Pipecat, and Vapi.
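To make "normalization" concrete, here is a minimal sketch of the general idea: rewriting a phone number into words before synthesis so it is read digit by digit. This is a generic illustration only. The names `normalize_phone_numbers`, `spell_digits`, and `PHONE_PATTERN` are hypothetical, it handles English only, and it does not represent KugelAudio's normalizer or API, which the announcement does not describe.

```python
import re

# Hypothetical illustration; NOT KugelAudio's implementation or API.
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]

# Assumption: a "phone number" is an optional "+" followed by at least
# seven characters made of digits, spaces, or hyphens, starting and
# ending with a digit.
PHONE_PATTERN = re.compile(r"\+?\d[\d\s\-]{5,}\d")


def spell_digits(match: re.Match) -> str:
    """Turn one matched phone number into space-separated digit words."""
    raw = match.group(0)
    words = ["plus"] if raw.startswith("+") else []
    words.extend(DIGIT_WORDS[int(ch)] for ch in raw if ch.isdigit())
    return " ".join(words)


def normalize_phone_numbers(text: str) -> str:
    """Replace every phone-number-like span with its spoken form."""
    return PHONE_PATTERN.sub(spell_digits, text)


if __name__ == "__main__":
    print(normalize_phone_numbers("Call +49 30 1234 now"))
```

A production normalizer would also need language-specific grouping rules and handling for the other entity types named above (IBANs, addresses, medications), none of which this sketch attempts.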

Key quotes

Building for the future, which we believe will be conversational.

You can clone a voice from 30 to 60 seconds of audio. Drop in a short sample and you get a working voice immediately.

We optimized for voice agents with latencies below 60ms (excl. network), input streaming and output streaming.

We offer on premise support, run the model in your own cluster instead of calling our API.

Snippet from the RSS feed

Most natural real-time TTS with voice cloning and sub-60ms latency, on-prem or via API. Grammar-aware normalization reads phone numbers, IBANs, addresses, and medications naturally across 25+ languages, with word-level timestamps and IPA support. Adapters
