Understanding Speculative Sampling: Using Draft Distributions to Match Target Sampling Results

Speculative Sampling

The idea of speculative sampling is to draw samples from a draft distribution in such a way that the result is distributed exactly as if it had been drawn from the target distribution. We have a target sampling distribution $p(x)$ and a draft…
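As a rough illustration, here is a minimal sketch of the commonly used accept/reject rule for a single token. The excerpt is cut off before the draft distribution is named, so the symbol `q`, the function name `speculative_sample`, and the toy probabilities below are all assumptions made for this example, not the article's own notation or code. A token is proposed from `q`, accepted with probability `min(1, p[x] / q[x])`, and otherwise redrawn from the normalized residual `max(p - q, 0)`.

```python
import random

def speculative_sample(p, q, rng):
    """Return (token, accepted) where token is distributed according to p.

    p: target probabilities, q: draft probabilities (hypothetical names),
    both lists over the same vocabulary. rng is a random.Random instance.
    """
    tokens = range(len(p))
    # Propose a token from the draft distribution.
    x = rng.choices(tokens, weights=q)[0]
    # Accept with probability min(1, p[x] / q[x]); q[x] > 0 because x was drawn from q.
    if rng.random() < min(1.0, p[x] / q[x]):
        return x, True
    # On rejection, resample from the residual max(p - q, 0).
    # random.choices normalizes the weights itself. The residual has positive
    # mass here because a rejection implies p[x] < q[x] for the proposed x.
    residual = [max(pi - qi, 0.0) for pi, qi in zip(p, q)]
    return rng.choices(tokens, weights=residual)[0], False

# Toy check: empirical frequencies should approach p, not q.
rng = random.Random(0)
p = [0.6, 0.3, 0.1]   # assumed target distribution
q = [0.2, 0.5, 0.3]   # assumed draft distribution
n = 20000
counts = [0] * len(p)
accepted = 0
for _ in range(n):
    x, ok = speculative_sample(p, q, rng)
    counts[x] += 1
    accepted += ok
freqs = [c / n for c in counts]
accept_rate = accepted / n  # theoretical value is sum(min(p_i, q_i)) = 0.6 here
print(freqs, accept_rate)
```

The closer `q` is to `p`, the higher the acceptance rate, which is why a good draft distribution makes the scheme cheap while the output distribution stays that of the target.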


teleforce · 5mo ago · 2 min read · en

technology machine learning programming algorithms
