LifeSkill: A Reinforcement Learning Framework for Online Lifelong Learning in LLM Agents
[Submitted on 3 Jun 2026 (v1), last revised 19 Jun 2026 (this version, v2)]
Summary
This paper introduces LifeSkill, a two-stage reinforcement learning framework for online lifelong learning in Large Language Model (LLM) agents. It targets a limitation of existing lifelong learning agents: they keep their parameters static and rely on discrete skill retrieval during inference, so they cannot keep learning from test-time feedback. The framework has two components. Verifier-Guided Skill Learning rewards each candidate skill according to the average verifier success of multiple skill-conditioned policy rollouts. Online Skill Internalization improves the policy model at test time by converting skill-conditioned trajectories into reward signals. Experiments on LifelongAgentBench show an average improvement of 7 absolute points over existing lifelong agent baselines.
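The skill reward described in the summary can be illustrated with a minimal sketch. This is not the paper's implementation: the names `skill_reward`, `rollout_fn`, `verifier_fn`, and `num_rollouts` are hypothetical, the verifier is assumed to return a binary success flag, and the toy policy and verifier below are stand-ins for an LLM agent and a task checker.

```python
from itertools import cycle
from statistics import mean
from typing import Callable, Dict, List


def skill_reward(
    task: str,
    skill: str,
    rollout_fn: Callable[[str, str], str],
    verifier_fn: Callable[[str, str], bool],
    num_rollouts: int = 4,
) -> float:
    """Score a candidate skill by the average verifier success of
    several rollouts conditioned on that skill (hypothetical interface)."""
    if num_rollouts <= 0:
        raise ValueError("num_rollouts must be positive")
    outcomes = []
    for _ in range(num_rollouts):
        trajectory = rollout_fn(task, skill)  # skill-conditioned rollout
        outcomes.append(1.0 if verifier_fn(task, trajectory) else 0.0)
    return mean(outcomes)


def make_toy_rollout_fn(patterns: Dict[str, List[bool]]) -> Callable[[str, str], str]:
    """Toy stand-in for a policy: each skill replays a fixed
    success/failure pattern instead of calling a real agent."""
    iterators = {skill: cycle(pattern) for skill, pattern in patterns.items()}

    def rollout_fn(task: str, skill: str) -> str:
        return "success" if next(iterators[skill]) else "failure"

    return rollout_fn


def toy_verifier(task: str, trajectory: str) -> bool:
    """Toy stand-in for a verifier: accepts trajectories marked as successful."""
    return trajectory == "success"


def demo() -> Dict[str, float]:
    """Score two candidate skills on one toy task."""
    rollout_fn = make_toy_rollout_fn(
        {
            "helpful skill": [True, True, True, False],
            "unhelpful skill": [False, False, True, False],
        }
    )
    return {
        skill: skill_reward("toy task", skill, rollout_fn, toy_verifier, num_rollouts=4)
        for skill in ("helpful skill", "unhelpful skill")
    }
```

Under these assumptions, `demo()` scores the first skill at 0.75 and the second at 0.25, so a skill that helps the policy succeed more often receives a higher reward even though no direct label for the skill exists. Averaging over several rollouts is what keeps a single lucky or unlucky trajectory from dominating the signal.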
Key quotes
Lifelong learning is essential for Large Language Model (LLM) agents operating in dynamic, interactive environments.
Existing lifelong learning agents for long-horizon tasks typically depend on discrete skill or past experiences retrieval with static parameters during inference, which prevents them from continuously internalizing test-time feedback like human learners.
We propose Skill-enhanced Test-Time Co-Evolution (LifeSkill), a two-stage reinforcement learning framework for Online Lifelong Learning Agents.
We design Verifier-Guided Skill Learning that addresses the lack of direct supervision for skill extraction by rewarding candidate skills according to the average verifier success of multiple skill-conditioned policy rollouts.
Experiments on LifelongAgentBench show that LifeSkill improves average performance by 7 absolute points by comparing with existing lifelong agent baselines.
