Local Speech-to-Speech AI Assistant Technologies and Recommendations

dsrtslnd23

4mo ago · 4 min read · en

Score: 95 · Type: how-to · Sentiment: positive

Summary

The article discusses local, open speech-to-speech setups for AI assistants, focusing on technologies that run entirely in the browser with no cloud dependencies. The author describes a local assistant built on web-first technologies that fits small language models in memory and handles inference, speech-to-text, and text-to-speech without stuttering. Recommendations include vosk-browser for speech recognition and vits-web for text-to-speech, both of which run in the browser, plus KittenTTS for its size-to-performance ratio. Because KittenTTS is a Python project, using it in the browser requires a custom JavaScript harness for the model.

Key quotes

I have a great local assistant that works end-to-end with voice. It's built on local, web-first technologies, it fits small LLMs in memory and manages inference and TTS/STT without stuttering.

If you want something simple that runs in browser, look at vosk-browser[0] and vits-web[1].

I'd also recommend checking out KittenTTS[2], I use it and it's great for the size/performance.

However, you'd need to implement a custom JavaScript harness for the model since it's a python project.
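
The quotes above mention running inference and TTS "without stuttering" but do not say how. One common approach, shown here purely as an assumption and not as the author's method, is to cut the LLM's streamed tokens into whole sentences and hand each sentence to the TTS engine as soon as it is complete. The `SentenceChunker` class below is a hypothetical helper; it is not part of vosk-browser, vits-web, or KittenTTS.

```typescript
// Hypothetical helper (an illustration, not an API of any library named above).
// It buffers streamed LLM tokens and releases text one full sentence at a time,
// so a TTS engine can start speaking before the whole reply has been generated.
class SentenceChunker {
  private buffer = "";

  // Append one streamed token and return any sentences that are now complete.
  push(token: string): string[] {
    this.buffer += token;
    const done: string[] = [];
    let start = 0;
    // Stop one character early: a terminator only counts as a sentence end
    // once whitespace is seen after it, so "3." from "3.14" is not cut off.
    for (let i = 0; i + 1 < this.buffer.length; i++) {
      const ch = this.buffer[i];
      const isTerminator = ch === "." || ch === "!" || ch === "?";
      if (isTerminator && /\s/.test(this.buffer[i + 1])) {
        const sentence = this.buffer.slice(start, i + 1).trim();
        if (sentence) done.push(sentence);
        start = i + 1;
      }
    }
    this.buffer = this.buffer.slice(start);
    return done;
  }

  // Call once the token stream ends to release whatever text is left.
  flush(): string[] {
    const tail = this.buffer.trim();
    this.buffer = "";
    return tail ? [tail] : [];
  }
}

// Usage: simulate a token stream and collect what would be sent to TTS.
const chunker = new SentenceChunker();
const spoken: string[] = [];
for (const token of ["Hi", " there.", " How", " are", " you?", " Fine"]) {
  spoken.push(...chunker.push(token));
}
spoken.push(...chunker.flush());
```

In a real assistant, each string in `spoken` would be passed to the chosen TTS engine and queued for playback. Splitting on sentence boundaries is a simple heuristic; abbreviations such as "e.g. " would still be split early.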

Snippet from the RSS feed

I have a great local assistant that works end-to-end with voice. It's built on local, web-first technologies, it fits small LLMs in memory and manages inference and TTS/STT without stuttering. I've been shaping it up over a couple years and constantly swi