NVIDIA Releases Nemotron-TwoTower-30B-A3B: A Block-Wise Diffusion Language Model
Summary
NVIDIA has released Nemotron-TwoTower-30B-A3B-Base-BF16, a block-wise autoregressive diffusion language model built on the Nemotron-3-Nano-30B-A3B backbone. Unlike traditional autoregressive models, which generate tokens one at a time, this model generates text by iteratively denoising blocks of tokens in parallel. The model was developed between September 2025 and April 2026, with a pre-training data cutoff of June 25, 2025. The page provides model architecture details, a comparison with the autoregressive baseline, and links to the model on Hugging Face.
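To make the block-wise idea concrete, here is a minimal toy sketch of the decoding loop: blocks are produced left to right, and within each block all positions start masked and are filled in over a few parallel denoising steps. Everything in it is an assumption for illustration, not NVIDIA's implementation: `toy_denoiser` is a hypothetical random stand-in for the model, the `<mask>` token and the confidence-based commit rule are borrowed from common masked-diffusion decoders, and the source does not describe the actual schedule used by Nemotron-TwoTower.

```python
import random

MASK = "<mask>"  # hypothetical placeholder token, not taken from the model card


def toy_denoiser(context, block, rng):
    """Hypothetical stand-in for the model.

    For every masked position in the block, propose a (token, confidence) pair.
    A real model would condition on `context`; this toy ignores it.
    """
    vocab = ["a", "b", "c", "d"]
    return {
        i: (rng.choice(vocab), rng.random())
        for i, tok in enumerate(block)
        if tok == MASK
    }


def generate(num_blocks, block_size, steps_per_block, seed=0):
    """Generate blocks left to right; denoise each block over several steps."""
    rng = random.Random(seed)
    sequence = []
    for _ in range(num_blocks):
        block = [MASK] * block_size  # each block starts fully masked
        for step in range(steps_per_block):
            proposals = toy_denoiser(sequence, block, rng)
            if not proposals:
                break  # block is already fully denoised
            # Commit an even share of the remaining masked positions per step
            # (ceiling division), so the final step fills whatever is left.
            remaining_steps = steps_per_block - step
            k = -(-len(proposals) // remaining_steps)
            most_confident = sorted(
                proposals.items(), key=lambda kv: kv[1][1], reverse=True
            )[:k]
            for pos, (tok, _conf) in most_confident:
                block[pos] = tok
        sequence.extend(block)  # a finished block becomes context for the next
    return sequence


if __name__ == "__main__":
    print(generate(num_blocks=3, block_size=4, steps_per_block=2))
```

In this sketch, 12 tokens are produced with 6 denoiser calls (3 blocks of 2 steps each) rather than 12 sequential ones, which is the kind of parallelism the summary describes. How many steps the real model needs per block is not stated in the source.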
Source
Key quotes
Nemotron-TwoTower-30B-A3B-Base-BF16 is a block-wise autoregressive diffusion language model built on the NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16 backbone.
It generates text by iteratively denoising blocks of tokens in parallel rather than one token at a time.
