Terminal-Bench-RL Project Advances Terminal Agent Training with Reinforcement Learning

Danau5tin

10mo ago · 10 min read · en · Code

Summary

The article covers the Terminal-Bench-RL project, which extends the rLLM framework from UC Berkeley Sky Lab with custom environments and infrastructure for training long-horizon terminal agents with reinforcement learning. Its GRPO training code scales to 32 H100 GPUs spread across a four-node bare-metal cluster and has been run on Qwen3-32B. Separately, the project's base agent is reported as the top Qwen3 agent on Stanford's TerminalBench leaderboard.

Key quotes

This project builds upon the rLLM framework developed by UC Berkeley Sky Lab, extending it with custom environments and infrastructure specifically designed for terminal-based agent training.

Training code running at full throttle on 32x H100's, distributed across a 4x bare metal node cluster, training Qwen3-32B.

Base agent is now the top Qwen3 agent on Stanford's TerminalBench leaderboard.

Snippet from the RSS feed

GRPO training code which scales to 32xH100s for long horizon terminal/coding tasks. Base agent is now the top Qwen3 agent on Stanford's TerminalBench leaderboard. - Danau5tin/terminal-bench-rl
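For readers unfamiliar with GRPO (Group Relative Policy Optimization), the core idea is that several rollouts are sampled for the same task and each rollout's reward is normalized against the others in its group, so no separate value network is needed. The snippet below is a minimal sketch of that normalization step only. It is not taken from the Terminal-Bench-RL or rLLM code, and the function name `group_relative_advantages` and the `eps` parameter are illustrative assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of rollouts for the same task.

    Hypothetical helper for illustration; not the project's implementation.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population standard deviation of the group
    # eps avoids division by zero when every rollout gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: four rollouts of one terminal task, scored pass (1.0) or fail (0.0)
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Rollouts that beat their group's average receive a positive advantage and the rest a negative one. When every rollout in a group scores the same, all advantages are zero and that group contributes no learning signal.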
