Inworld launches TTS-2 with cross-lingual synthesis and natural language voice control
By
Aleksey Tikhonov
Summary
Inworld announces TTS-2, the successor to TTS 1.5, its text-to-speech model ranked #1 on Artificial Analysis. TTS-2 features six major upgrades, including natural language voice direction, text-based voice design, cross-lingual synthesis across 100+ languages, IPA phonetic control, and improved pronunciation. The company offers a unified API platform that combines speech-to-text, LLM routing, and its top-ranked TTS for developers building voice agents, AI companions, and conversational applications.
Key quotes
Realtime TTS 1.5 is #1 on Artificial Analysis, voted best in blind tests by thousands of real users.
TTS-2 builds on that with six major upgrades: natural language voice direction for tone, emotion, speed, and pitch.
Cross-lingual synthesis across 100+ languages preserving speaker identity.
One platform with speech-to-text, an LLM router, and the top-ranked text-to-speech, all connected on a single API so context flows between every layer.
Used by developers building voice agents, AI companions, and conversational apps.
