Self-Distillation Fine-Tuning (SDFT): A Method for Continual Learning from Demonstrations

Continual learning, enabling models to acquire new skills and knowledge without degrading existing capabilities, remains a fundamental challenge for foundation models. While on-policy reinforcement…

teleforce·2mo ago·2 min read·en·Insight

technology, science, machine learning, artificial intelligence, research
