Self-play reinforcement learning with minimal human data produces human-compatible autonomous driving policies
[Submitted on 11 Jun 2026]
Summary
This paper presents an approach to training autonomous driving policies that combines self-play reinforcement learning with a small amount of human demonstration data. Pure self-play can produce driving behaviors that are effective but alien and incompatible with human drivers, while imitation learning requires massive amounts of human data. This method instead uses only 30 minutes of human demonstrations (2500x fewer than comparable imitation learning approaches) as a regularization objective on top of a minimal safe goal-reaching reward. The resulting policies coordinate well with human drivers and train in 15 hours on a single consumer-grade GPU, and the work is fully open-source.
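To make the idea of "human data as a regularization objective" concrete, here is a minimal sketch, not the paper's implementation. It assumes a discrete action space, a plain policy-gradient term for the self-play objective, and a negative log-likelihood term on human actions as the regularizer. The function names and the `reg_weight` parameter are hypothetical; the summary does not specify the actual loss or its weighting.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of action logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def policy_gradient_loss(rollouts):
    """Self-play term (assumed form): -mean(log pi(a|s) * advantage).

    rollouts: list of (logits, action, advantage) tuples, where the
    advantage is assumed to come from the safe goal-reaching reward.
    """
    terms = [log_softmax(lg)[a] * adv for lg, a, adv in rollouts]
    return -sum(terms) / len(terms)

def human_regularizer(demos):
    """Regularizer (assumed form): mean negative log-likelihood of the
    actions humans took in the demonstration states.

    demos: list of (logits, human_action) tuples.
    """
    terms = [log_softmax(lg)[a] for lg, a in demos]
    return -sum(terms) / len(terms)

def regularized_loss(rollouts, demos, reg_weight=0.1):
    """Total loss: self-play objective plus a weighted human-data penalty.

    reg_weight is a hypothetical hyperparameter, not a value from the paper.
    """
    return policy_gradient_loss(rollouts) + reg_weight * human_regularizer(demos)
```

In this sketch the reward still drives learning through the advantages, while the demonstration term only pulls the policy toward human-like action choices. That is one way a small dataset could shape driving conventions without carrying the whole training signal.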
Key quotes
Like the spice in a good stew, we find that a little human data goes a long way: our method uses only 30 minutes of human demonstrations, 2500x fewer than comparable imitation learning approaches.
A key limitation of this approach is that policies trained through pure self-play can learn effective but alien driving conventions incompatible with people.
Instead of completely discarding human demonstrations, our method treats them as a regularization objective on top of a minimal safe goal-reaching reward.
