Helios: A 14B Parameter Real-Time Video Generation Model for Minute-Scale Content
By tzury
Summary
Helios is a 14B-parameter video generation model that runs in real time at 19.5 FPS on a single NVIDIA H100 GPU, supports minute-scale video generation, and matches the quality of a strong baseline. The authors report breakthroughs along three dimensions: robustness to long-video drifting without the usual anti-drifting heuristics, real-time generation without standard acceleration techniques, and training without parallelism or sharding frameworks. Helios is an autoregressive diffusion model with a unified input representation that natively supports text-to-video (T2V), image-to-video (I2V), and video-to-video (V2V) tasks. By heavily compressing the historical and noisy context, reducing the number of sampling steps, and applying infrastructure-level optimizations, it brings its computational cost to a level comparable to, or lower than, that of 1.3B video generation models.
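To make the "real-time at 19.5 FPS" claim concrete, the short sketch below works through the arithmetic: the average per-frame time budget and how long a one-minute clip would take to generate. The playback frame rates (16 and 24 fps) are assumptions chosen for illustration, since the summary does not state the output frame rate of Helios videos; the helper names are hypothetical.

```python
# Back-of-envelope arithmetic for the reported 19.5 FPS throughput.
# The playback frame rates used below are assumptions for illustration;
# the article does not state the output frame rate of Helios videos.

GEN_FPS = 19.5  # reported generation throughput on one H100


def per_frame_budget_ms(gen_fps: float = GEN_FPS) -> float:
    """Average wall-clock time available per generated frame, in milliseconds."""
    return 1000.0 / gen_fps


def generation_seconds(clip_seconds: float, playback_fps: float,
                       gen_fps: float = GEN_FPS) -> float:
    """Time needed to generate a clip of the given duration and playback rate."""
    frames = clip_seconds * playback_fps
    return frames / gen_fps


def is_real_time(playback_fps: float, gen_fps: float = GEN_FPS) -> bool:
    """Generation keeps up with playback if it produces frames at least as fast."""
    return gen_fps >= playback_fps


if __name__ == "__main__":
    print(f"per-frame budget: {per_frame_budget_ms():.1f} ms")
    for fps in (16, 24):  # hypothetical playback rates
        t = generation_seconds(60, fps)
        print(f"60 s clip at {fps} fps: {t:.1f} s to generate, "
              f"real-time={is_real_time(fps)}")
```

Under these assumptions each frame has a budget of about 51.3 ms; a 60-second clip played back at 16 fps (960 frames) would take about 49.2 s to generate, while at 24 fps (1,440 frames) it would take about 73.8 s and no longer keep pace with playback.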
Key quotes
Helios, the first 14B video generation model that runs at 19.5 FPS on a single NVIDIA H100 GPU and supports minute-scale generation while matching the quality of a strong baseline.
We make breakthroughs along three key dimensions: (1) robustness to long-video drifting without commonly used anti-drifting heuristics; (2) real-time generation without standard acceleration techniques; and (3) training without parallelism or sharding frameworks.
Helios is a 14B autoregressive diffusion model with a unified input representation that natively supports T2V, I2V, and V2V tasks.
For efficiency, we heavily compress the historical and noisy context and reduce the number of sampling steps, yielding computational costs comparable to -- or lower than -- those of 1.3B video generative models.
Extensive experiments demonstrate that Helios consistently outperforms prior methods on both short- and long-video generation.
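The quotes attribute much of Helios's efficiency to heavily compressing the historical context. The article does not describe the exact scheme, so the sketch below assumes a hypothetical geometric schedule: in a chunk-by-chunk autoregressive generator, the most recent chunk of history is kept at full size and each older chunk keeps a quarter of the tokens of the next newer one. The token counts (`TOKENS_PER_CHUNK`, `DECAY`) and function names are illustrative assumptions, and compression of the noisy context and the reduced sampling steps are not modeled. The point it shows is general: with this kind of compression, the context seen per step stays bounded however long the video gets, whereas uncompressed history grows linearly.

```python
# Hypothetical geometric compression schedule for history context in a
# chunk-wise autoregressive video generator. Not Helios's actual scheme.

TOKENS_PER_CHUNK = 1024  # assumed token count of one uncompressed chunk
DECAY = 4                # assumed compression factor per step of age


def context_tokens(num_history_chunks: int,
                   tokens_per_chunk: int = TOKENS_PER_CHUNK,
                   decay: int = DECAY) -> int:
    """Total history tokens attended to when generating the next chunk.

    Age 0 is the most recent chunk and is kept at full size; each older
    chunk keeps 1/decay of the tokens of the chunk after it. Once a chunk
    compresses to zero tokens, it and all older chunks are dropped.
    """
    total = 0
    for age in range(num_history_chunks):
        size = tokens_per_chunk // (decay ** age)
        if size == 0:
            break  # all older chunks would also be zero-sized
        total += size
    return total


def uncompressed_tokens(num_history_chunks: int,
                        tokens_per_chunk: int = TOKENS_PER_CHUNK) -> int:
    """History tokens if every past chunk were kept at full size."""
    return num_history_chunks * tokens_per_chunk


if __name__ == "__main__":
    for n in (1, 4, 60, 600):
        print(n, context_tokens(n), uncompressed_tokens(n))
```

With these assumed numbers, the compressed context stops growing at 1,365 tokens (1024 + 256 + 64 + 16 + 4 + 1) once six or more chunks of history exist, while the uncompressed history reaches 614,400 tokens after 600 chunks. A real model would trade some fidelity of distant frames for this bounded cost.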