EAGLE 3.1: Collaborative Speculative Decoding Update Improves LLM Performance and Robustness

The EAGLE series — including EAGLE 1, EAGLE 2, and EAGLE 3 — has become one of the most widely adopted and practically deployed families of speculative decoding algorithms across both research and...

berlianta·1mo ago·4 min read·en

technology programming

