Local Speech-to-Speech AI Assistant Technologies and Recommendations

By dsrtslnd23

4mo ago · 4 min read

Summary

The article discusses local, open speech-to-speech setups for AI assistants, focusing on technologies that run entirely in the browser without cloud dependencies. The author describes a local assistant, built on web-first technologies, that fits small language models in memory and handles speech-to-text (STT) and text-to-speech (TTS) without stuttering. Recommendations include vosk-browser for speech recognition, vits-web for text-to-speech, and KittenTTS for its size-to-performance ratio, although KittenTTS is a Python project and would need a custom JavaScript harness to run in the browser.
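The summary describes an end-to-end voice loop but not how its stages are wired together. A minimal sketch, assuming a plain STT to LLM to TTS chain, follows. The interfaces SpeechToText, LanguageModel and TextToSpeech, the SentenceChunker class and runTurn are all hypothetical; they are not the APIs of vosk-browser, vits-web or KittenTTS. Real engines are asynchronous, while this sketch is synchronous so it can run with stubs. Speaking each sentence as soon as it is complete is one common way to cut playback gaps; the author does not say this is how their assistant avoids stuttering.

```typescript
// Hypothetical stage interfaces. Real STT/LLM/TTS engines would sit behind
// these, and in practice their methods would be asynchronous.
interface SpeechToText {
  transcribe(pcm: Float32Array): string;
}
interface LanguageModel {
  // Streams the reply token by token through onToken.
  generate(prompt: string, onToken: (token: string) => void): void;
}
interface TextToSpeech {
  synthesize(sentence: string): Float32Array;
}

// Buffers streamed tokens and emits each complete sentence as soon as it
// ends, so speech can start before the model has finished its reply.
class SentenceChunker {
  private buffer = "";
  private readonly onSentence: (sentence: string) => void;

  constructor(onSentence: (sentence: string) => void) {
    this.onSentence = onSentence;
  }

  push(token: string): void {
    this.buffer += token;
    let match: RegExpMatchArray | null;
    // A sentence ends at ".", "!" or "?" followed by whitespace.
    while ((match = this.buffer.match(/^([\s\S]*?[.!?])\s+/)) !== null) {
      this.onSentence(match[1].trim());
      this.buffer = this.buffer.slice(match[0].length);
    }
  }

  // Emits whatever text is left once the model stops generating.
  flush(): void {
    const rest = this.buffer.trim();
    this.buffer = "";
    if (rest.length > 0) this.onSentence(rest);
  }
}

// One conversational turn: transcribe, generate, and speak sentence by sentence.
function runTurn(
  micAudio: Float32Array,
  stt: SpeechToText,
  llm: LanguageModel,
  tts: TextToSpeech,
  play: (pcm: Float32Array) => void,
): string[] {
  const spoken: string[] = [];
  const chunker = new SentenceChunker((sentence) => {
    spoken.push(sentence);
    // A real app would enqueue this audio so clips never overlap.
    play(tts.synthesize(sentence));
  });
  llm.generate(stt.transcribe(micAudio), (token) => chunker.push(token));
  chunker.flush();
  return spoken;
}

// Usage with stub engines.
const spoken = runTurn(
  new Float32Array(16000), // placeholder microphone buffer
  { transcribe: () => "hello" },
  { generate: (_prompt, onToken) => ["Hi", " there.", " How can", " I help?"].forEach(onToken) },
  { synthesize: (s) => new Float32Array(s.length) },
  () => {},
);
// spoken: ["Hi there.", "How can I help?"]
```

The trade-off of this punctuation rule is that it also splits on abbreviations such as "e.g.", so a production chunker would need a smarter sentence boundary check.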

Key quotes

I have a great local assistant that works end-to-end with voice. It's built on local, web-first technologies, it fits small LLMs in memory and manages inference and TTS/STT without stuttering.
If you want something simple that runs in browser, look at vosk-browser[0] and vits-web[1].
I'd also recommend checking out KittenTTS[2], I use it and it's great for the size/performance.
However, you'd need to implement a custom JavaScript harness for the model since it's a python project.
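The quotes note that KittenTTS is a Python project, so a browser build would need its own JavaScript harness; the source gives no details of one. Running the model itself is out of scope here. One piece such a harness plausibly needs, assuming the model yields mono float samples in [-1, 1], is packaging those samples for playback. A minimal sketch of a 16-bit PCM WAV encoder follows; encodeWav is a hypothetical helper, and the 24000 Hz rate in the usage line is only an example value, not a documented KittenTTS setting.

```typescript
// Wraps mono float PCM samples in a 16-bit PCM WAV container.
// Assumes samples are nominally in [-1, 1]; out-of-range values are clamped.
function encodeWav(samples: Float32Array, sampleRate: number): Uint8Array {
  const bytesPerSample = 2;
  const dataSize = samples.length * bytesPerSample;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);

  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++) view.setUint8(offset + i, text.charCodeAt(i));
  };

  // RIFF header
  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + dataSize, true);
  writeAscii(8, "WAVE");
  // fmt chunk: uncompressed PCM, one channel
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size
  view.setUint16(20, 1, true); // audio format 1 = PCM
  view.setUint16(22, 1, true); // channel count
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * bytesPerSample, true); // byte rate
  view.setUint16(32, bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  // data chunk
  writeAscii(36, "data");
  view.setUint32(40, dataSize, true);

  for (let i = 0; i < samples.length; i++) {
    // Clamp, then scale to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(44 + i * bytesPerSample, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return new Uint8Array(buffer);
}

const wav = encodeWav(new Float32Array([0, 1, -1]), 24000); // example rate only
```

In a browser, the resulting bytes could be played with `new Audio(URL.createObjectURL(new Blob([wav], { type: "audio/wav" }))).play()`; an AudioContext-based player would avoid the container step entirely but needs more setup.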
Snippet from the RSS feed
I have a great local assistant that works end-to-end with voice. It's built on local, web-first technologies, it fits small LLMs in memory and manages inference and TTS/STT without stuttering. I've been shaping it up over a couple years and constantly swi
