Helios: A 14B Parameter Real-Time Video Generation Model for Minute-Scale Content

tzury

2mo ago · 1 min read


Score: 80 · Type: press release · Sentiment: positive

Summary

Helios is a 14B-parameter autoregressive diffusion model for video generation that runs at 19.5 FPS on a single NVIDIA H100 GPU and supports minute-scale generation. The authors claim advances in three areas: robustness to long-video drifting without the commonly used anti-drifting heuristics, real-time generation without standard acceleration techniques, and training without parallelism or sharding frameworks. A unified input representation natively supports text-to-video (T2V), image-to-video (I2V), and video-to-video (V2V) tasks. By heavily compressing the historical and noisy context and reducing the number of sampling steps, the model brings its computational cost to a level comparable to, or lower than, that of 1.3B video generation models.
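To put the 19.5 FPS figure in context, here is a minimal back-of-envelope sketch in Python. It assumes throughput stays constant at the reported rate and ignores startup and decoding overhead. The playback frame rates are illustrative assumptions, since the summary does not state the frame rate of the generated video, and the function names are hypothetical.

```python
# Back-of-envelope throughput arithmetic. Only GENERATION_FPS comes from the
# summary; playback frame rates passed in by the caller are assumptions.

GENERATION_FPS = 19.5  # reported throughput on a single NVIDIA H100


def generation_seconds(video_seconds: float, playback_fps: float,
                       generation_fps: float = GENERATION_FPS) -> float:
    """Wall-clock seconds needed to generate a clip of the given length."""
    total_frames = video_seconds * playback_fps
    return total_frames / generation_fps


def real_time_factor(playback_fps: float,
                     generation_fps: float = GENERATION_FPS) -> float:
    """Ratio of generation speed to playback speed (>= 1.0 keeps up with playback)."""
    return generation_fps / playback_fps


if __name__ == "__main__":
    for fps in (16.0, 19.5, 24.0):  # assumed playback rates, not from the source
        secs = generation_seconds(60.0, fps)
        print(f"{fps:>5} fps playback: {secs:6.1f} s to generate 60 s, "
              f"real-time factor {real_time_factor(fps):.2f}")
```

Under these assumptions, a 60-second clip played back at 24 fps is 1,440 frames and would take roughly 74 seconds to generate, while any playback rate at or below 19.5 fps would be generated at least as fast as it plays.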

Key quotes · 5 pulled

Helios, the first 14B video generation model that runs at 19.5 FPS on a single NVIDIA H100 GPU and supports minute-scale generation while matching the quality of a strong baseline.

We make breakthroughs along three key dimensions: (1) robustness to long-video drifting without commonly used anti-drifting heuristics; (2) real-time generation without standard acceleration techniques; and (3) training without parallelism or sharding frameworks.

Helios is a 14B autoregressive diffusion model with a unified input representation that natively supports T2V, I2V, and V2V tasks.

For efficiency, we heavily compress the historical and noisy context and reduce the number of sampling steps, yielding computational costs comparable to -- or lower than -- those of 1.3B video generative models.

Extensive experiments demonstrate that Helios consistently outperforms prior methods on both short- and long-video generation.

