NVIDIA PersonaPlex 7B Enables Real-Time Speech-to-Speech on Apple Silicon via Swift/MLX Library
By ipotapov
Summary
The article announces that NVIDIA's PersonaPlex 7B model has been integrated into speech-swift, a Swift/MLX speech library for Apple Silicon, adding full-duplex speech-to-speech with streaming. A single model listens and speaks at the same time, so a user can hold a real-time conversation with a laptop without the traditional three-step pipeline of transcription, processing, and synthesis. The implementation runs natively on Apple Silicon and is faster than real time: roughly 68 ms per step, for a real-time factor (RTF) of 0.87.
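To make the performance figures concrete, here is a minimal Python sketch of the real-time factor arithmetic. It assumes the usual definition of RTF (compute time divided by the duration of audio produced), which the summary does not spell out. The function names are illustrative and are not part of the speech-swift API. The only inputs are the two reported numbers, and the 68 ms figure is approximate.

```python
def real_time_factor(processing_ms: float, audio_ms: float) -> float:
    """RTF = compute time / duration of audio produced.

    Values below 1.0 mean the model generates audio faster than it plays back.
    """
    if audio_ms <= 0:
        raise ValueError("audio_ms must be positive")
    return processing_ms / audio_ms


def implied_audio_ms(step_ms: float, rtf: float) -> float:
    """Audio duration per step implied by a step time and an RTF."""
    if rtf <= 0:
        raise ValueError("rtf must be positive")
    return step_ms / rtf


# Figures reported in the summary (the step time is approximate).
STEP_MS = 68.0
RTF = 0.87

audio_per_step = implied_audio_ms(STEP_MS, RTF)
headroom = 1.0 - RTF  # fraction of each step's time budget left unused

print(f"Implied audio per step: {audio_per_step:.1f} ms")
print(f"Headroom before falling behind real time: {headroom:.0%}")
```

Under that definition, the reported figures imply that each step yields roughly 78 ms of audio, leaving about 13% of the time budget as slack before playback would stall.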
Key quotes
What if you could talk to your laptop and it talked back — not through a three-step pipeline of transcribe-think-synthesize, but as a single model that listens and speaks at the same time, faster than real-time, streaming audio chunks back as it generates them?
Our speech-swift Swift/MLX speech library now handles full-duplex speech-to-speech with streaming via NVIDIA's PersonaPlex 7B — faster than real-time (~68ms/step, RTF 0.87), alongside ASR, TTS, and multilingual synthesis.
Audio in, audio out, native Swift
