AgentGym-RL: A Reinforcement Learning Framework for Training LLM Agents in Multi-Turn Decision Making
[Submitted on 10 Sep 2025]
Summary
This paper introduces AgentGym-RL, a unified reinforcement learning framework for training LLM agents to perform multi-turn interactive decision-making across diverse real-world environments. Unlike existing approaches that rely on supervised fine-tuning (SFT), AgentGym-RL trains agents from scratch using RL. The framework features a modular, decoupled architecture supporting mainstream RL algorithms and a wide variety of scenarios. The authors also propose ScalingInter-RL, a training approach that balances exploration and exploitation by initially restricting interactions (emphasizing exploitation) and gradually expanding horizons (encouraging exploration). Experiments show the framework matches or surpasses commercial models on 27 tasks across diverse environments. The complete framework, including code and datasets, will be open-sourced.
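The horizon-expansion idea behind ScalingInter-RL can be illustrated with a small schedule. This is a minimal sketch, not the authors' implementation: the class `HorizonSchedule`, the helper `rollout`, and every numeric value (starting cap, increment, stage length, upper bound) are hypothetical and chosen only for illustration. The summary states only that interactions are restricted early and expanded gradually.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HorizonSchedule:
    """Hypothetical staged cap on the number of agent-environment turns."""

    initial_horizon: int = 5    # assumed cap on turns at the start of training
    increment: int = 5          # assumed growth of the cap per stage
    steps_per_stage: int = 100  # assumed training steps between expansions
    max_horizon: int = 20       # assumed upper bound on the cap

    def horizon_at(self, training_step: int) -> int:
        """Return the turn cap that applies at the given training step."""
        if training_step < 0:
            raise ValueError("training_step must be non-negative")
        stage = training_step // self.steps_per_stage
        return min(self.initial_horizon + stage * self.increment,
                   self.max_horizon)


def rollout(step_fn, max_turns: int):
    """Run one episode, cutting it off after max_turns interactions.

    step_fn is a stand-in for one agent-environment exchange; it receives
    the turn index and returns True when the task is finished.
    """
    turns = 0
    done = False
    while not done and turns < max_turns:
        done = step_fn(turns)
        turns += 1
    return turns, done


if __name__ == "__main__":
    schedule = HorizonSchedule()
    # A toy task that needs 8 turns to finish (turn index 7 completes it).
    needs_eight_turns = lambda turn: turn == 7
    for step in (0, 100, 300):
        cap = schedule.horizon_at(step)
        print(step, cap, rollout(needs_eight_turns, cap))
```

In this toy setting the 8-turn task cannot be completed under the early cap of 5 turns and becomes solvable once the cap reaches 10, which mirrors the shift from short, constrained episodes to longer ones described above.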
Key quotes
Like human cognitive development, agents are expected to acquire knowledge and skills through exploration and interaction with the environment.
We introduce AgentGym-RL, a new framework to train LLM agents for multi-turn interactive decision-making through RL.
In early stages, it emphasizes exploitation by restricting the number of interactions, and gradually shifts towards exploration with larger horizons to encourage diverse problem-solving strategies.
Our agents match or surpass commercial models on 27 tasks across diverse environments.
We will open-source the complete AgentGym-RL framework — including code and datasets — to empower the research community in developing the next generation of intelligent agents.
