Fast-dLLM: Training-Free Acceleration Method for Diffusion Language Models Using KV Cache and Parallel Decoding

Diffusion-based large language models (Diffusion LLMs) have shown promise for non-autoregressive text generation with parallel decoding capabilities. However, the practical inference speed of…

nathan-barry · 8mo ago · 2 min read · en · Insight

technology, artificial intelligence, programming, machine learning, research
