
Erm: A Local CLI Tool for Removing Speech Disfluencies from Audio Recordings

By Doug Calobrisi

3h ago · 8 min read · en

Summary

A developer built "erm," a free, open-source, local CLI tool that removes disfluencies (ums, uhs, ers) from English speech recordings. It uses faster-whisper for transcription, audio-level detectors, and ffmpeg to produce cleaned audio files and JSON cut lists. The tool runs entirely locally, respects privacy, and is designed for podcasters, voiceover artists, and anyone editing spoken audio.
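To make the pipeline in the summary concrete, here is a minimal sketch of the cut-list step only. It is not erm's actual code: the filler pattern, the function names (`find_cuts`, `keep_segments`, `ffmpeg_filter`), and the JSON layout are all assumptions made for illustration. It takes word-level timestamps of the kind a transcriber such as faster-whisper can produce, marks filler words, merges adjacent cuts, and builds an ffmpeg `aselect` filter string that keeps everything else.

```python
import json
import re

# Hypothetical filler pattern (an assumption, not erm's real detector):
# matches um, uh, er, erm and elongated forms such as "ummmm" or "uhhhh".
FILLER = re.compile(r"^(u+m+|u+h+|e+r+m*)$")


def find_cuts(words):
    """Return merged (start, end) spans covering filler words.

    `words` is a list of dicts with "text", "start", "end" (seconds),
    assumed to be sorted by start time.
    """
    cuts = []
    for w in words:
        token = re.sub(r"[^a-z]", "", w["text"].lower())  # drop punctuation
        if not FILLER.match(token):
            continue
        start, end = w["start"], w["end"]
        if cuts and start <= cuts[-1][1]:
            # Touching or overlapping the previous cut: extend it.
            cuts[-1] = (cuts[-1][0], max(cuts[-1][1], end))
        else:
            cuts.append((start, end))
    return cuts


def keep_segments(cuts, duration):
    """Invert the cut spans into the spans of audio to keep."""
    keeps, cursor = [], 0.0
    for start, end in cuts:
        if start > cursor:
            keeps.append((cursor, start))
        cursor = max(cursor, end)
    if cursor < duration:
        keeps.append((cursor, duration))
    return keeps


def ffmpeg_filter(keeps):
    """Build an ffmpeg audio filter that keeps only the given spans."""
    terms = "+".join(f"between(t,{s:.3f},{e:.3f})" for s, e in keeps)
    return f"aselect='{terms}',asetpts=N/SR/TB"


words = [
    {"text": "So", "start": 0.0, "end": 0.3},
    {"text": "um,", "start": 0.3, "end": 0.7},
    {"text": "uhhh", "start": 0.7, "end": 1.2},
    {"text": "hello", "start": 1.2, "end": 1.6},
]
cuts = find_cuts(words)
keeps = keep_segments(cuts, duration=1.6)
cut_list_json = json.dumps({"cuts": [{"start": s, "end": e} for s, e in cuts]})
audio_filter = ffmpeg_filter(keeps)
```

The resulting string could be passed to ffmpeg's `-af` option. A text-only match like this would miss fillers the transcriber silently drops, which is presumably why the tool also uses detectors that look at the audio directly.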

Key quotes

Linguists have a word for the ums, uhs, ers, and elongated versions (ummmm, uhhhhh) that pad spoken English: disfluencies.
I don't record a lot of voice audio, but a few friends do, and they tell me editing those out by hand is miserable. So I built erm to do it.
That's the whole interface for the common case. It writes a cleaned .wav and a JSON cut list next to the
Snippet from the RSS feed
"A tour of erm, a local CLI that removes disfluencies from English speech recordings using faster-whisper, a few detectors that look at the audio directly, and ffmpeg."
