RegMix-D: A Dynamic Data Mixing Method for LLM Pretraining Using Proxy Training Trajectories
[Submitted on 17 Jun 2026]
Summary
RegMix-D is a new method for dynamic data mixture selection in large language model (LLM) pretraining. It extends the static RegMix approach by using the full loss trajectories from proxy runs, not just their endpoint losses, to predict optimal data mixtures at multiple training stages. It supports two deployment modes: offline (a schedule generated before target training) and online (a mixture adapted during training). Experiments on 25B tokens of the Pile dataset with a 1B-parameter target model show that RegMix-D outperforms RegMix and DoReMi across 13 downstream tasks. It is also proxy-efficient: it surpasses RegMix with only 128 proxy models (25% of RegMix's proxy compute budget).
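To make the trajectory idea concrete, here is a minimal sketch of how per-stage regressors might be fit on proxy loss trajectories and turned into an offline schedule. It is an illustration under stated assumptions, not the paper's algorithm: the quadratic least-squares surrogate, the random candidate search, and the names `features`, `fit_stage_regressors`, and `offline_schedule` are all hypothetical, since the summary does not specify the regressor or the search procedure.

```python
import numpy as np


def features(mixtures):
    """Quadratic features [w, w^2]. Rows of w sum to 1, so w also covers the intercept."""
    mixtures = np.asarray(mixtures, dtype=float)
    return np.hstack([mixtures, mixtures ** 2])


def fit_stage_regressors(mixtures, trajectories):
    """Fit one least-squares model per checkpoint (assumed surrogate, not the paper's).

    mixtures:     (n_proxies, n_domains), each row a proxy run's mixture weights
    trajectories: (n_proxies, n_stages), proxy loss recorded at each checkpoint
    returns:      (2 * n_domains, n_stages) coefficients, one column per stage
    """
    targets = np.asarray(trajectories, dtype=float)
    coef, *_ = np.linalg.lstsq(features(mixtures), targets, rcond=None)
    return coef


def offline_schedule(coef, n_domains, n_candidates=20000, seed=0):
    """For each stage, pick the sampled mixture with the lowest predicted loss."""
    rng = np.random.default_rng(seed)
    # Candidate mixtures drawn uniformly from the probability simplex.
    candidates = rng.dirichlet(np.ones(n_domains), size=n_candidates)
    predicted = features(candidates) @ coef   # (n_candidates, n_stages)
    best = predicted.argmin(axis=0)           # one candidate index per stage
    return candidates[best]                   # (n_stages, n_domains)
```

Using only the last column of `trajectories` would give a single static mixture, in the spirit of endpoint-only fitting; using every column gives one mixture per stage, which is the kind of schedule the offline mode is described as producing.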
Key quotes
Our key observation is that proxy runs produce not only endpoint losses, but also full loss trajectories, which can be used to further improve data mixture.
RegMix-D supports two deployment modes: an offline variant that generates a complete mixture schedule before target training, and an online variant that adapts the mixture during training using observed loss.
Experiments on 25B tokens of the Pile dataset with a 1B parameter target model show that RegMix-D consistently improves over RegMix and DoReMi across 13 downstream tasks while remaining proxy-efficient.
It surpasses RegMix even with only 128 proxy models (25% of RegMix's proxy compute budget).