RegMix-D: A Dynamic Data Mixing Method for LLM Pretraining Using Proxy Training Trajectories

[Submitted on 17 Jun 2026]

Summary

RegMix-D is a method for dynamic data mixture selection in Large Language Model (LLM) pretraining. It extends the static RegMix approach by using the full loss trajectories of proxy runs, not just their endpoint losses, to predict optimal data mixtures at multiple training stages. It supports two deployment modes: offline, where the complete mixture schedule is generated before target training, and online, where the mixture is adapted during training using observed loss. In experiments on 25B tokens of the Pile dataset with a 1B-parameter target model, RegMix-D outperforms RegMix and DoReMi across 13 downstream tasks. It is also proxy-efficient: it surpasses RegMix with only 128 proxy models (25% of RegMix's proxy compute budget).
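
To make the idea of using trajectories rather than endpoints concrete, here is a minimal sketch, not the paper's algorithm. It assumes a plain least-squares linear model per checkpoint (standing in for whatever regressor the authors use), synthetic proxy data, and a random search over candidate mixtures. The function names `fit_stage_regressors` and `best_mixture_per_stage` are hypothetical.

```python
import numpy as np


def fit_stage_regressors(mixtures, trajectories):
    """Fit one linear model per checkpoint, mapping mixture weights to loss.

    mixtures:     (n_proxies, n_domains), each row sums to 1
    trajectories: (n_proxies, n_stages), proxy loss at each checkpoint
    returns:      (n_stages, n_domains) coefficients

    No bias column is used: because each row sums to 1, an intercept is
    already absorbed into the per-domain coefficients.
    """
    coef, *_ = np.linalg.lstsq(mixtures, trajectories, rcond=None)
    return coef.T


def best_mixture_per_stage(coef, candidates):
    """For each stage, return the candidate with the lowest predicted loss."""
    preds = candidates @ coef.T          # (n_candidates, n_stages)
    return candidates[np.argmin(preds, axis=0)]


# Synthetic stand-in for proxy runs (assumption: loss is exactly linear).
rng = np.random.default_rng(0)
n_proxies, n_domains, n_stages = 64, 3, 4
mixtures = rng.dirichlet(np.ones(n_domains), size=n_proxies)
true_coef = rng.uniform(1.0, 3.0, size=(n_stages, n_domains))
trajectories = mixtures @ true_coef.T

coef = fit_stage_regressors(mixtures, trajectories)
candidates = rng.dirichlet(np.ones(n_domains), size=1000)
schedule = best_mixture_per_stage(coef, candidates)  # one mixture per stage
```

An endpoint-only method would fit a single model on the last column of `trajectories`; the sketch fits one model per column instead, which is what yields a stage-dependent schedule. Picking each stage's mixture independently is a simplification made here for brevity.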

Source

RegMix-D: A Dynamic Data Mixing Method for LLM Pretraining Using Proxy Training Trajectories (arxiv.org)

Key quotes

Our key observation is that proxy runs produce not only endpoint losses, but also full loss trajectories, which can be used to further improve data mixture.

RegMix-D supports two deployment modes: an offline variant that generates a complete mixture schedule before target training, and an online variant that adapts the mixture during training using observed loss.
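
The difference between the two modes is mostly one of control flow, which the following sketch tries to show. It is an illustration under stated assumptions, not the paper's procedure: `run_offline`, `run_online`, `toy_train_stage`, and `toy_policy` are hypothetical names, the training step is a toy cost function, and the online rule is a placeholder threshold rather than the authors' adaptation rule.

```python
from typing import Callable, List, Optional, Sequence

Mixture = Sequence[float]


def run_offline(schedule: Sequence[Mixture],
                train_stage: Callable[[Mixture], float]) -> List[float]:
    """Offline mode: the full schedule exists before target training starts."""
    return [train_stage(mixture) for mixture in schedule]


def run_online(policy: Callable[[int, Optional[float]], Mixture],
               n_stages: int,
               train_stage: Callable[[Mixture], float]) -> List[float]:
    """Online mode: each mixture is chosen after observing the previous loss."""
    losses: List[float] = []
    observed: Optional[float] = None
    for stage in range(n_stages):
        mixture = policy(stage, observed)   # may react to the target run
        observed = train_stage(mixture)
        losses.append(observed)
    return losses


# Toy stand-ins (assumptions, for illustration only).
def toy_train_stage(mixture: Mixture) -> float:
    costs = [2.0, 1.0]                      # pretend per-domain loss cost
    return sum(w * c for w, c in zip(mixture, costs))


def toy_policy(stage: int, observed: Optional[float]) -> Mixture:
    # Placeholder rule: start uniform, shift weight once loss is seen to be high.
    if observed is None or observed < 1.0:
        return [0.5, 0.5]
    return [0.2, 0.8]


offline_losses = run_offline([[0.5, 0.5], [0.5, 0.5], [0.5, 0.5]], toy_train_stage)
online_losses = run_online(toy_policy, 3, toy_train_stage)
```

The offline runner never looks at the target run's loss, so its schedule can be computed once and reused. The online runner needs the loss from each stage before it can choose the next mixture.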

Experiments on 25B tokens of the Pile dataset with a 1B parameter target model show that RegMix-D consistently improves over RegMix and DoReMi across 13 downstream tasks while remaining proxy-efficient.

It surpasses RegMix even with only 128 proxy models (25% of RegMix's proxy compute budget).

Snippet from the RSS feed

Data mixture selection is critical for Large Language Model pretraining. Existing methods such as RegMix select a single static mixture by fitting a regression model on small-scale proxy runs. We propose RegMix-D, a simple extension of RegMix to dynamic m
