
RegMix-D: A Dynamic Data Mixing Method for LLM Pretraining Using Proxy Training Trajectories

[Submitted on 17 Jun 2026]

2d ago · 2 min read · News

Summary

RegMix-D is a new method for dynamic data mixture selection in Large Language Model (LLM) pretraining. It extends the static RegMix approach by using the full loss trajectories of proxy runs, not just their endpoint losses, to predict a data mixture for each of several training stages. It supports two deployment modes: offline (a complete mixture schedule generated before target training) and online (the mixture adapted during training from observed loss). In experiments on 25B tokens of the Pile dataset with a 1B-parameter target model, RegMix-D outperforms RegMix and DoReMi across 13 downstream tasks while remaining proxy-efficient: it surpasses RegMix with only 128 proxy models (25% of RegMix's proxy compute budget).
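The summary does not spell out the fitting procedure, so the following is only a minimal sketch of the offline idea under stated assumptions: each proxy run trains on one fixed mixture and logs its validation loss at several stage checkpoints; a separate ordinary-least-squares regressor (standing in for whatever regressor the paper actually uses) is fit per checkpoint; and each stage's mixture is the randomly sampled candidate with the lowest predicted loss. The function names `fit_stage_regressor` and `offline_schedule`, the array layout, and the synthetic data are all hypothetical.

```python
import numpy as np

def fit_stage_regressor(mixtures, losses):
    """Least-squares fit of loss ~ mixtures @ coef.

    No intercept column: mixture weights sum to 1, so an intercept would be
    collinear with the weights and is absorbed into the coefficients.
    """
    coef, *_ = np.linalg.lstsq(mixtures, losses, rcond=None)
    return coef

def offline_schedule(mixtures, trajectories, n_candidates=10_000, seed=0):
    """Hypothetical offline schedule builder (an illustration, not the paper's code).

    mixtures:     (n_runs, n_domains) mixture weights used by each proxy run
    trajectories: (n_runs, n_stages) proxy validation loss at each stage checkpoint
    returns:      (n_stages, n_domains) one mixture per training stage
    """
    rng = np.random.default_rng(seed)
    n_domains = mixtures.shape[1]
    # Candidate mixtures sampled uniformly from the probability simplex.
    candidates = rng.dirichlet(np.ones(n_domains), size=n_candidates)
    schedule = []
    for k in range(trajectories.shape[1]):
        # One regressor per checkpoint: predict the stage-k loss from the mixture.
        coef = fit_stage_regressor(mixtures, trajectories[:, k])
        predicted = candidates @ coef
        schedule.append(candidates[np.argmin(predicted)])
    return np.stack(schedule)

# Usage on made-up proxy data: domain 0 lowers loss most early, domain 1 later.
rng = np.random.default_rng(1)
proxy_mixtures = rng.dirichlet(np.ones(3), size=64)
stage_costs = np.array([[1.0, 2.0, 3.0], [3.0, 1.0, 2.0]])  # synthetic per-domain loss contributions
proxy_trajectories = proxy_mixtures @ stage_costs.T  # shape (64 runs, 2 stages)
schedule = offline_schedule(proxy_mixtures, proxy_trajectories)
print(schedule.round(2))
```

Two caveats on this design. With a linear regressor the chosen candidate tends to sit near a corner of the simplex; RegMix is usually described with a nonlinear (LightGBM) regressor as well, which is what lets interior mixtures win. And because every proxy here trains on a single static mixture, the per-stage fits ignore how earlier stages' data affects later losses; how RegMix-D handles that is not described in this summary.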

Source

RegMix-D: A Dynamic Data Mixing Method for LLM Pretraining Using Proxy Training Trajectories (arxiv.org)

Key quotes

Our key observation is that proxy runs produce not only endpoint losses, but also full loss trajectories, which can be used to further improve data mixture.
RegMix-D supports two deployment modes: an offline variant that generates a complete mixture schedule before target training, and an online variant that adapts the mixture during training using observed loss.
Experiments on 25B tokens of the Pile dataset with a 1B parameter target model show that RegMix-D consistently improves over RegMix and DoReMi across 13 downstream tasks while remaining proxy-efficient.
It surpasses RegMix even with only 128 proxy models (25% of RegMix's proxy compute budget).
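The quotes only say that the online variant "adapts the mixture during training using observed loss." One simple, purely illustrative way to do that is sketched below, assuming proxy trajectories are logged at the same stage boundaries as the target run. Before each new stage, proxy runs whose loss curve (normalized by its first checkpoint, so a small proxy and the 1B target can be compared) most resembles the target's observed curve get more weight in a weighted least-squares fit, and the next stage's mixture is the candidate with the lowest predicted loss. The similarity weighting, the temperature `tau`, and the name `online_next_mixture` are assumptions, not details from the paper.

```python
import numpy as np

def online_next_mixture(mixtures, trajectories, observed, candidates, tau=0.05):
    """Hypothetical online step (an illustration, not the paper's code).

    mixtures:     (n_runs, n_domains) proxy mixture weights
    trajectories: (n_runs, n_stages) proxy loss at each stage checkpoint
    observed:     target-model losses observed at stages 0..k-1 (k >= 1)
    candidates:   (n_candidates, n_domains) mixtures to choose from
    returns:      the mixture to use for stage k
    """
    observed = np.asarray(observed, dtype=float)
    k = len(observed)
    if not 1 <= k < trajectories.shape[1]:
        raise ValueError("need 1 <= len(observed) < n_stages")
    # Compare curve shapes, not absolute levels: normalize by the first checkpoint.
    proxy_shape = trajectories[:, :k] / trajectories[:, :1]
    target_shape = observed / observed[0]
    dist = np.mean((proxy_shape - target_shape) ** 2, axis=1)
    # Exponential similarity weights; subtracting the minimum keeps the best match at weight 1.
    weights = np.exp(-(dist - dist.min()) / tau)
    # Weighted least squares: scale each row by sqrt(weight), then solve ordinary least squares.
    sw = np.sqrt(weights)
    coef, *_ = np.linalg.lstsq(mixtures * sw[:, None], trajectories[:, k] * sw, rcond=None)
    return candidates[np.argmin(candidates @ coef)]

# Usage on made-up data: after observing the stage-0 target loss, pick the stage-1 mixture.
rng = np.random.default_rng(2)
proxy_mixtures = rng.dirichlet(np.ones(3), size=64)
proxy_trajectories = proxy_mixtures @ np.array([[4.0, 4.5, 5.0], [3.0, 2.0, 2.5]]).T
candidates = rng.dirichlet(np.ones(3), size=5_000)
next_mix = online_next_mixture(proxy_mixtures, proxy_trajectories, [4.2], candidates)
print(next_mix.round(2))
```

With the purely linear synthetic loss used here, every proxy fits the same model exactly, so the weighting has no effect; it only starts to matter when the true loss surface is nonlinear and different proxy curves imply different best mixtures.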
Snippet from the RSS feed
Data mixture selection is critical for Large Language Model pretraining. Existing methods such as RegMix select a single static mixture by fitting a regression model on small-scale proxy runs. We propose RegMix-D, a simple extension of RegMix to dynamic m