LifeSkill: A Reinforcement Learning Framework for Online Lifelong Learning in LLM Agents

[Submitted on 3 Jun 2026 (v1), last revised 19 Jun 2026 (this version, v2)]

Summary

This paper introduces LifeSkill (Skill-enhanced Test-Time Co-Evolution), a two-stage reinforcement learning framework for online lifelong learning in Large Language Model (LLM) agents. Existing lifelong learning agents depend on discrete retrieval of skills or past experiences while their parameters stay frozen during inference, so they cannot continuously learn from test-time feedback. LifeSkill has two components. Verifier-Guided Skill Learning rewards each candidate skill by the average verifier success of multiple skill-conditioned rollouts. Online Skill Internalization updates the policy model at test time by converting skill-conditioned trajectories into reward signals. On LifelongAgentBench, LifeSkill improves average performance by 7 absolute points over existing lifelong agent baselines.

Source

LifeSkill: A Reinforcement Learning Framework for Online Lifelong Learning in LLM Agents (arxiv.org)

Key quotes

Lifelong learning is essential for Large Language Model (LLM) agents operating in dynamic, interactive environments.

Existing lifelong learning agents for long-horizon tasks typically depend on discrete skill or past experiences retrieval with static parameters during inference, which prevents them from continuously internalizing test-time feedback like human learners.

We propose Skill-enhanced Test-Time Co-Evolution (LifeSkill), a two-stage reinforcement learning framework for Online Lifelong Learning Agents.

We design Verifier-Guided Skill Learning that addresses the lack of direct supervision for skill extraction by rewarding candidate skills according to the average verifier success of multiple skill-conditioned policy rollouts.

Experiments on LifelongAgentBench show that LifeSkill improves average performance by 7 absolute points by comparing with existing lifelong agent baselines.
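The reward described in the Verifier-Guided Skill Learning quote can be written as r(z) = (1/K) Σ_{k=1..K} V(τ_k). Here τ_k is the k-th rollout of the policy conditioned on candidate skill z, and V(τ_k) is the verifier's success signal, assumed here to be binary (1 for success, 0 for failure). The following Python code is a minimal sketch of scoring and ranking candidate skills this way. It is an illustration under stated assumptions, not the authors' implementation. The `rollout` and `verifier` callables, the default of four rollouts, and the toy SQL-flavored skills are all hypothetical. The reinforcement learning update that consumes these rewards is not shown.

```python
import itertools
from statistics import mean
from typing import Callable, Iterable, List, Tuple

# Hypothetical interfaces (assumptions, not from the paper):
#   rollout(task, skill) -> trajectory: runs the policy conditioned on a skill text
#   verifier(task, trajectory) -> bool: True if the trajectory solves the task
Rollout = Callable[[str, str], str]
Verifier = Callable[[str, str], bool]


def skill_reward(skill: str, task: str, rollout: Rollout,
                 verifier: Verifier, num_rollouts: int = 4) -> float:
    """Reward a candidate skill by the average verifier success
    of num_rollouts skill-conditioned policy rollouts."""
    if num_rollouts < 1:
        raise ValueError("num_rollouts must be at least 1")
    successes = [1.0 if verifier(task, rollout(task, skill)) else 0.0
                 for _ in range(num_rollouts)]
    return mean(successes)


def rank_skills(candidates: Iterable[str], task: str, rollout: Rollout,
                verifier: Verifier, num_rollouts: int = 4) -> List[Tuple[float, str]]:
    """Score every candidate skill and sort from highest to lowest reward."""
    scored = [(skill_reward(s, task, rollout, verifier, num_rollouts), s)
              for s in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Toy, deterministic stand-ins for a policy and a verifier (purely illustrative).
SCHEMA_SKILL = "Inspect the table schema before writing SQL."
GUESS_SKILL = "Answer immediately without checking."
_outcomes = {
    SCHEMA_SKILL: itertools.cycle([True, True, True, False]),
    GUESS_SKILL: itertools.cycle([True, False, False, False]),
}


def toy_rollout(task: str, skill: str) -> str:
    # Each call consumes the next scripted outcome for this skill.
    return "solved" if next(_outcomes[skill]) else "failed"


def toy_verifier(task: str, trajectory: str) -> bool:
    return trajectory == "solved"


ranking = rank_skills([GUESS_SKILL, SCHEMA_SKILL], "count rows in orders",
                      toy_rollout, toy_verifier)
print(ranking)
```

With these scripted outcomes, the schema-checking skill succeeds in 3 of 4 rollouts and the guessing skill in 1 of 4, so the ranking places the first at 0.75 and the second at 0.25. Averaging over several rollouts, rather than scoring a skill from a single trajectory, makes the reward less sensitive to one lucky or unlucky run.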
