Stable Audio 3: Open-Source Latent Diffusion Models for Variable-Length Audio Generation

guardienaveugle

11d ago · 2 min read · en


Score: 75 · Type: press release · Sentiment: positive

Summary

Stable Audio 3 is a family of latent diffusion models (small, medium, large) for variable-length audio generation and editing. The models can generate several minutes of audio and support inpainting for targeted edits. They operate on top of a novel semantic-acoustic autoencoder that projects audio into a compact latent space, which makes diffusion-based generation efficient. Trained on licensed and Creative Commons data, they generate music and sounds in under 2 seconds on an H200 GPU and within a few seconds on a MacBook Pro M4. The weights of the small and medium models, which can run on consumer-grade hardware, are released openly together with their training and inference pipeline.
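To make "variable-length" and "inpainting" concrete, here is a minimal sketch of how a clip duration could map to a number of latent frames, and how a time range to be edited could map to a mask over those frames. The sample rate, the downsampling factor, and both function names are assumptions made up for illustration; the source does not state them, and this is not the models' actual pipeline.

```python
import math

# Hypothetical values chosen only for illustration; the source does not state them.
SAMPLE_RATE = 44_100        # audio samples per second (assumed)
DOWNSAMPLE_FACTOR = 2_048   # audio samples per latent frame (assumed)


def latent_frames(duration_s: float) -> int:
    """Number of latent frames needed to cover `duration_s` seconds of audio."""
    return math.ceil(duration_s * SAMPLE_RATE / DOWNSAMPLE_FACTOR)


def inpaint_mask(total_s: float, start_s: float, end_s: float) -> list[bool]:
    """True marks latent frames to regenerate; False marks frames to keep."""
    n = latent_frames(total_s)
    # First frame that overlaps the edit window (round down).
    first = int(start_s * SAMPLE_RATE // DOWNSAMPLE_FACTOR)
    # One past the last frame that overlaps the edit window (round up, clamp).
    last = min(n, latent_frames(end_s))
    return [first <= i < last for i in range(n)]


if __name__ == "__main__":
    # A short clip needs far fewer latent frames than a multi-minute one,
    # which is why generating only the requested length saves compute.
    print(latent_frames(10.0), latent_frames(180.0))
    mask = inpaint_mask(total_s=10.0, start_s=2.0, end_s=4.0)
    print(sum(mask), "of", len(mask), "frames would be regenerated")
```

Under these assumed numbers, the frame count grows linearly with duration, and an inpainting edit touches only the frames that overlap the selected time range while the rest of the latent sequence is kept.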

Key quotes

Our latent diffusion models operate on top of a novel semantic-acoustic autoencoder that projects audio into a compact latent space, enabling efficient diffusion-based generation while preserving audio fidelity and encouraging semantic structure in the latent.

We release the weights of small and medium, that can run on consumer-grade hardware, together with their training and inference pipeline.

Stable Audio 3 models are trained on licensed and Creative Commons data to generate music and sounds in less than 2s on an H200 GPU and less than a few seconds on a MacBook Pro M4.

Snippet from the RSS feed

Stable Audio 3 is a family of fast latent diffusion models (small, medium, large) for variable-length audio generation and editing. Since our models can generate several minutes of audio, variable-length generations are key to avoid the cost of producing
