NVIDIA Releases Nemotron-TwoTower-30B-A3B: A Block-Wise Diffusion Language Model

8d ago · 7 min read · en

technology programming

Summary

NVIDIA has released Nemotron-TwoTower-30B-A3B-Base-BF16, a block-wise autoregressive diffusion language model built on the Nemotron-3-Nano-30B-A3B backbone. Unlike traditional autoregressive models, which generate one token at a time, this model generates text by iteratively denoising blocks of tokens in parallel. It was developed between September 2025 and April 2026, with a pre-training data cutoff of June 25, 2025. The page provides model architecture details, a comparison with the autoregressive baseline, and links to the model on Hugging Face.
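To make the contrast with token-by-token decoding concrete, here is a toy sketch of block-wise decoding: blocks are produced left to right, and within each block all positions start masked and are filled in over a few denoising steps. Everything in it is an assumption for illustration and not taken from the NVIDIA release: `toy_denoiser` is a hypothetical random stand-in for the model, and the vocabulary, the confidence-based commit rule, and the `block_size` and `steps` values are invented.

```python
import random

MASK = "<mask>"
# Hypothetical toy vocabulary; a real model has its own tokenizer.
VOCAB = ["the", "model", "denoises", "blocks", "in", "parallel"]


def toy_denoiser(context, block, rng):
    """Hypothetical stand-in for the model.

    Proposes a (token, confidence) pair for every masked slot in the block.
    A real denoiser would condition on `context`; this one ignores it.
    """
    return {
        i: (rng.choice(VOCAB), rng.random())
        for i, tok in enumerate(block)
        if tok == MASK
    }


def generate(prompt, num_blocks=2, block_size=4, steps=2, seed=0):
    rng = random.Random(seed)
    sequence = list(prompt)
    # Commit this many slots per step so the block is full after `steps` steps.
    per_step = -(-block_size // steps)  # ceiling division
    for _block_index in range(num_blocks):       # autoregressive across blocks
        block = [MASK] * block_size
        for _step in range(steps):               # iterative denoising inside a block
            proposals = toy_denoiser(sequence, block, rng)
            if not proposals:
                break
            # Assumed commit rule: keep the most confident proposals this step.
            ranked = sorted(proposals.items(), key=lambda kv: kv[1][1], reverse=True)
            for i, (tok, _conf) in ranked[:per_step]:
                block[i] = tok
        sequence.extend(block)                   # finished block becomes context
    return sequence


if __name__ == "__main__":
    print(generate(["NVIDIA"], num_blocks=2, block_size=4, steps=2))
```

With these toy settings, each block of 4 tokens is finished in 2 denoiser calls instead of the 4 sequential calls a one-token-at-a-time loop would need. Whether the actual model uses a similar schedule or commit rule is not stated in the summary.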

Source

Twitter / X · NVIDIA Releases Nemotron-TwoTower-30B-A3B: A Block-Wise Diffusion Language Model · huggingface.co

Key quotes

· 3 pulled

Nemotron-TwoTower-30B-A3B-Base-BF16 is a block-wise autoregressive diffusion language model built on the NVIDIA-Nemotron-3-Nano-30B-A3B-Base-BF16 backbone.

It generates text by iteratively denoising blocks of tokens in parallel rather than one token at a time.

We're on a journey to advance and democratize artificial intelligence through open source and open science.


