Efficient Training of Diffusion Models with Token Routing (TREAD)

Diffusion models have emerged as the mainstream approach for visual generation. However, these models typically suffer from sample inefficiency and high training costs. Consequently, methods for…

fzliu·11mo ago·2 min read·en·Insight

technology, science, artificial intelligence, machine learning
