SkillRL: Evolving Agents via Recursive Skill-Augmented Reinforcement Learning

Paper · arXiv 2602.08234
Reinforcement Learning · LLM Agents · LLM Memory · Training and Fine-Tuning

Large Language Model (LLM) agents have shown strong results on complex tasks, yet they often operate in isolation and fail to learn from past experience. Existing memory-based methods primarily store raw trajectories, which are often redundant and noisy. This prevents agents from extracting the high-level, reusable behavioral patterns that are essential for generalization. In this paper, we propose SkillRL, a framework that bridges the gap between raw experience and policy improvement through automatic skill discovery and recursive evolution. Our approach introduces an experience-based distillation mechanism to build a hierarchical skill library, SkillBank; an adaptive retrieval strategy for general and task-specific heuristics; and a recursive evolution mechanism that allows the skill library to co-evolve with the agent's policy during reinforcement learning.

We argue that these approaches miss a crucial insight: effective experience transfer requires abstraction. Human experts do not memorize every action in every situation; instead, they develop skills, that is, compact and reusable strategies that capture the essence of how to accomplish specific subtasks. Inspired by this observation, we propose SkillRL, a framework that bridges the gap between raw experience and efficient policy improvement through automatic skill discovery and recursive skill evolution. First, SkillRL introduces an experience-based skill distillation mechanism, which gathers diverse trajectories from environment rollouts and processes them differently depending on outcome: successful episodes are preserved as demonstrations, while failed ones are synthesized into concise failure lessons to mitigate context noise. Second, we transform these experiences into a hierarchical skill library, SkillBank, which distinguishes general skills for universal strategic guidance from task-specific skills for task-level heuristics.
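The outcome-dependent processing and the two-level library described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Trajectory`, `SkillBank`, `distill`, and `summarize_failure` are hypothetical, the failure summary is a string template standing in for whatever model-based synthesis the paper uses, and retrieval is reduced to a lookup by task type.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Trajectory:
    task_type: str      # hypothetical task label, e.g. "pick"
    steps: list[str]    # raw action/observation log
    success: bool       # episode outcome


@dataclass
class SkillBank:
    """Two-level library: general skills plus skills keyed by task type."""
    general: list[str] = field(default_factory=list)
    task_specific: dict[str, list[str]] = field(default_factory=dict)

    def add(self, skill: str, task_type: str | None = None) -> None:
        # No task type means the skill is treated as general guidance.
        if task_type is None:
            bucket = self.general
        else:
            bucket = self.task_specific.setdefault(task_type, [])
        if skill not in bucket:  # skip verbatim duplicates
            bucket.append(skill)

    def retrieve(self, task_type: str) -> list[str]:
        # Assumption: general skills always apply; task skills only on a match.
        return self.general + self.task_specific.get(task_type, [])


def summarize_failure(traj: Trajectory) -> str:
    # Placeholder for a model-generated lesson (assumption, not the paper's method).
    last = traj.steps[-1] if traj.steps else "no actions"
    return f"Failure lesson ({traj.task_type}): episode ended at '{last}' without success."


def distill(trajectories: list[Trajectory]) -> tuple[list[Trajectory], list[str]]:
    """Keep successes as demonstrations; compress failures into short lessons."""
    demonstrations, lessons = [], []
    for traj in trajectories:
        if traj.success:
            demonstrations.append(traj)
        else:
            lessons.append(summarize_failure(traj))
    return demonstrations, lessons
```

In this sketch, a failed episode contributes one short sentence to the context rather than its full step log, which is the noise-reduction idea the paragraph describes.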

Previous memory-augmented RL treats memory as a static or auxiliary source, whereas recent trends suggest that the key to efficient experience transfer lies in abstraction. Our work builds on this by treating the skill library as a dynamic component that co-evolves with the agent's policy, using RL to refine structured skills through recursive failure analysis. While RL is widely used to align LLMs or improve reasoning via rule-based verifiers, applying it to agentic skills remains challenging due to sparse rewards and long horizons. Memory mechanisms have become a cornerstone of agent design, with early systems relying on a static RAG paradigm or storing raw trajectories as few-shot examples. However, raw trajectories are token-heavy and contain significant redundancy and noise, which can degrade performance. Current research has moved toward self-improving memory, distilling interactions into higher-level insights or procedural tips.
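One way to picture the skill library as a dynamic component is a refresh step run between policy updates: failures from the latest rollouts are grouped by task type, and task types that keep failing receive a new skill. The Python sketch below is an assumption-laden illustration, not the paper's algorithm. The function `evolve_skill_library`, the `propose_skill` callback (a stand-in for model-based failure analysis), and the `min_failures` threshold are all hypothetical, and the RL policy update itself is out of scope.

```python
from collections import defaultdict
from typing import Callable


def evolve_skill_library(
    skills: dict[str, list[str]],
    failed_episodes: list[tuple[str, str]],
    propose_skill: Callable[[str, list[str]], str],
    min_failures: int = 2,  # hypothetical threshold, not from the paper
) -> dict[str, list[str]]:
    """Return an updated copy of `skills` given (task_type, failure_summary) pairs."""
    # Group failure summaries by task type.
    by_task: dict[str, list[str]] = defaultdict(list)
    for task_type, summary in failed_episodes:
        by_task[task_type].append(summary)

    # Copy so the caller's library is not mutated.
    updated = {task: list(items) for task, items in skills.items()}
    for task_type, summaries in by_task.items():
        if len(summaries) < min_failures:
            continue  # too little evidence to justify a new skill
        new_skill = propose_skill(task_type, summaries)
        bucket = updated.setdefault(task_type, [])
        if new_skill not in bucket:
            bucket.append(new_skill)
    return updated
```

Calling this once per training round, with failures collected under the current policy, gives the recursive flavor: each round's library shapes the next round's rollouts, whose failures in turn shape the library.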

We introduced SkillRL, a framework for skill-augmented reinforcement learning in LLM agents. By distilling raw trajectories into compact, reusable skills and enabling dynamic skill evolution during training, SkillRL achieves state-of-the-art performance on ALFWorld and WebShop while using substantially less context than memory-based approaches. Our work demonstrates that the abstraction from experience to skill is a powerful principle for building capable, sample-efficient agents.