Terminal-Bench-RL Project Advances Terminal Agent Training with Reinforcement Learning
By Danau5tin
10mo ago · 10 min read
Summary
The Terminal-Bench-RL project extends the rLLM framework from UC Berkeley Sky Lab with custom environments and infrastructure for training long-horizon terminal agents with reinforcement learning. Its GRPO training code scales to 32 H100 GPUs spread across a four-node bare-metal cluster and has been run on Qwen3-32B. The project's base agent is reported as the top Qwen3 agent on Stanford's TerminalBench leaderboard, which is a ranking among Qwen3-based agents rather than across the whole leaderboard.
Key quotes
This project builds upon the rLLM framework developed by UC Berkeley Sky Lab, extending it with custom environments and infrastructure specifically designed for terminal-based agent training.
Training code running at full throttle on 32x H100's, distributed across a 4x bare metal node cluster, training Qwen3-32B.
Base agent is now the top Qwen3 agent on Stanford's TerminalBench leaderboard.
GRPO training code which scales to 32xH100s for long horizon terminal/coding tasks. Base agent is now the top Qwen3 agent on Stanford's TerminalBench leaderboard. - Danau5tin/terminal-bench-rl
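For readers unfamiliar with GRPO (Group Relative Policy Optimization), the core idea is that several rollouts of the same task are scored, and each rollout's advantage is its reward normalized against the mean and standard deviation of its own group. The snippet below is a minimal sketch of that normalization step only, not the project's actual training code; the function name, the `eps` constant, and the sample rewards are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each rollout's reward against its own group's statistics.

    `rewards` holds one scalar reward per rollout of the same task.
    `eps` (an assumed constant) avoids division by zero when every
    rollout in the group received the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four rollouts of one terminal task
# (1.0 = task solved, 0.0 = task failed).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Successful rollouts end up with positive advantages and failed ones with negative advantages, so the policy update favors the former without needing a separate value model. If every rollout in a group gets the same reward, all advantages are zero and that group contributes no learning signal.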

