
LK Losses: A New Training Objective to Optimize Acceptance Rate in Speculative Decoding for LLMs

[Submitted on 27 Feb 2026 (v1), last revised 1 Jun 2026 (this version, v2)]

Summary

This paper introduces LK losses, a training objective for speculative decoding in large language models (LLMs). Speculative decoding accelerates LLM inference by using a lightweight draft model to propose candidate tokens that the target model then verifies in parallel. Standard draft training minimizes KL divergence as a proxy for acceptance rate. The two objectives share the same global optimum, but small draft models have limited capacity and typically converge to suboptimal solutions where minimizing KL does not guarantee maximizing acceptance rate. LK losses target acceptance rate directly. Experiments across four draft architectures and six target models (8B to 685B parameters) show consistent improvements in acceptance metrics over standard KL-based training, with gains of up to 8-10% in average acceptance length on general, coding, and math domains. The losses are easy to implement, introduce no computational overhead, and can be integrated directly into any existing speculator training framework.
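To make the notion of acceptance rate concrete, here is a minimal Python sketch of the verification step for a single token, assuming the standard speculative sampling rule: accept a drafted token x with probability min(1, p(x)/q(x)), otherwise resample from the normalized residual max(0, p - q). Under that rule the expected acceptance rate equals the sum over tokens of min(p, q). The function names and toy distributions are hypothetical, and this is not the paper's LK loss, which the summary does not define.

```python
import random

def acceptance_rate(p, q):
    """Expected per-token acceptance under standard speculative sampling:
    sum over the vocabulary of min(p(x), q(x))."""
    return sum(min(pi, qi) for pi, qi in zip(p, q))

def verify_draft_token(p, q, rng):
    """Draw one token from the draft q, then accept or reject it against
    the target p. Returns (token, accepted)."""
    x = rng.choices(range(len(q)), weights=q)[0]
    if rng.random() < min(1.0, p[x] / q[x]):
        return x, True
    # On rejection, resample from the residual max(0, p - q); random.choices
    # normalizes the weights, and the residual mass is positive whenever a
    # rejection is possible.
    residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    y = rng.choices(range(len(p)), weights=residual)[0]
    return y, False

def empirical_acceptance(p, q, trials, seed=0):
    """Fraction of drafted tokens accepted over repeated trials."""
    rng = random.Random(seed)
    accepted = sum(verify_draft_token(p, q, rng)[1] for _ in range(trials))
    return accepted / trials

# Toy 3-token vocabulary (illustrative values, not from the paper).
target = [0.6, 0.3, 0.1]
draft = [0.4, 0.4, 0.2]
print(acceptance_rate(target, draft))
print(empirical_acceptance(target, draft, trials=20000))
```

The simulated fraction should land close to the closed-form value, which is why acceptance rate can be treated as a quantity computed directly from the two distributions.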

Key quotes

While KL divergence and acceptance rate share the same global optimum, small draft models, having limited capacity, typically converge to suboptimal solutions where minimizing KL does not guarantee maximizing acceptance rate.
We propose LK losses, special training objectives that directly target acceptance rate.
Comprehensive experiments across four draft architectures and six target models, ranging from 8B to 685B parameters, demonstrate consistent improvements in acceptance metrics across all configurations compared to the standard KL-based training.
We evaluate our approach on general, coding and math domains and report gains of up to 8-10% in average acceptance length.
LK losses are easy to implement, introduce no computational overhead and can be directly integrated into any existing speculator training framework, making them a compelling alternative to the existing draft training objectives.
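The first quote, that a lower KL divergence does not guarantee a higher acceptance rate away from the optimum, can be illustrated with a toy two-token example. The sketch below assumes the forward direction KL(p || q), with p the target and q the draft (the summary does not state the direction), and the standard speculative sampling acceptance rate, the sum of min(p, q). All names and numbers are hypothetical and chosen only for illustration.

```python
import math

def forward_kl(p, q):
    """KL(p || q) in nats; terms with p(x) = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def acceptance_rate(p, q):
    """Expected acceptance under standard speculative sampling."""
    return sum(min(pi, qi) for pi, qi in zip(p, q))

target = [0.9, 0.1]     # target model distribution (toy values)
draft_a = [0.99, 0.01]  # overconfident draft
draft_b = [0.75, 0.25]  # underconfident draft

for name, q in (("draft_a", draft_a), ("draft_b", draft_b)):
    print(name,
          "KL =", round(forward_kl(target, q), 4),
          "acceptance =", round(acceptance_rate(target, q), 4))
```

Here draft_b has the lower KL (about 0.072 against about 0.144 nats) yet the lower acceptance rate (0.85 against 0.91), so ranking imperfect drafts by KL and ranking them by acceptance can disagree.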
Snippet from the RSS feed
Speculative decoding accelerates autoregressive large language model (LLM) inference by using a lightweight draft model to propose candidate tokens that are then verified in parallel by the target model. The speedup is significantly determined by the acce