A Survey of Continual Reinforcement Learning

Paper · arXiv 2506.21872 · Published June 27, 2025
Reinforcement Learning

To address these challenges, researchers have been exploring methods to enable RL agents to avoid catastrophic forgetting and effectively transfer knowledge, with the ultimate goal of steering the field toward more human-like intelligence. Humans excel at leveraging prior knowledge to solve new tasks without significantly forgetting previously learned skills [15]. Inspired by this capability, the field of Continual Learning (CL), also referred to as lifelong learning or incremental learning, aims to develop learning systems that can adapt to new tasks while retaining knowledge from previous ones [16]–[19]. The central challenge in CL lies in achieving a balance between stability and plasticity—maintaining the stability of previously learned knowledge while allowing sufficient flexibility to adapt to new tasks. The overarching goal is to build intelligent systems that are capable of learning and adapting throughout their lifetimes, rather than starting anew for each task.
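To make this stability-plasticity trade-off concrete, consider the minimal sketch below: the loss on the current task supplies plasticity, while a quadratic penalty anchoring the parameters to their values after the previous task supplies stability. This is an illustrative PyTorch fragment, not a method from the surveyed literature; the uniform L2 anchor and the `lam` coefficient are simplifications of regularization-based CL approaches such as EWC, which instead weight each parameter by an estimated importance.

```python
import torch

def continual_loss(model, new_task_loss, old_params, lam=1.0):
    """Trade off plasticity (fitting the new task) against stability
    (staying close to the parameters learned on earlier tasks).
    A uniform L2 anchor is an illustrative simplification."""
    stability = sum(((p - p_old.detach()) ** 2).sum()
                    for p, p_old in zip(model.parameters(), old_params))
    return new_task_loss + lam * stability
```

Setting `lam = 0` recovers plain fine-tuning (maximal plasticity), while a very large `lam` effectively freezes the agent (maximal stability).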

Current research in CL primarily focuses on two key aspects: addressing catastrophic forgetting and enabling knowledge transfer. Catastrophic forgetting refers to the phenomenon where learning new tasks causes a model to overwrite and lose knowledge of previously learned tasks. Knowledge transfer, in contrast, leverages accumulated knowledge from past tasks to improve learning efficiency and performance on new, or even previously seen, tasks. Successfully addressing both aspects is critical for developing robust continual learning systems.
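Catastrophic forgetting is easy to reproduce with naive sequential training. The toy script below, a supervised regression stand-in rather than an RL setup, fits one network on task A, then on task B, and re-measures the task A error afterwards; the architecture, tasks, and hyperparameters are all illustrative.

```python
import torch
import torch.nn as nn

def fit(model, x, y, steps=500, lr=1e-2):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        nn.functional.mse_loss(model(x), y).backward()
        opt.step()

torch.manual_seed(0)
net = nn.Sequential(nn.Linear(1, 32), nn.Tanh(), nn.Linear(32, 1))
x = torch.linspace(-1, 1, 64).unsqueeze(1)
y_a, y_b = torch.sin(3 * x), torch.cos(3 * x)    # two toy "tasks"

fit(net, x, y_a)                                  # learn task A
before = nn.functional.mse_loss(net(x), y_a).item()
fit(net, x, y_b)                                  # then learn task B naively
after = nn.functional.mse_loss(net(x), y_a).item()
print(f"task A error: {before:.4f} -> {after:.4f}")  # error rises sharply: forgetting
```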

Continual Reinforcement Learning (CRL, a.k.a. Lifelong Reinforcement Learning, LRL) emerges at the intersection of RL and CL, aiming to overcome the limitations of current RL algorithms and achieve agents that can continuously learn and adapt across a series of complex tasks [20], [21]. Fig. 1 illustrates the CRL setting. Unlike traditional DRL, which focuses primarily on optimizing performance for a single task, CRL emphasizes maintaining and enhancing generalization across a sequence of tasks. This shift in focus is crucial for deploying RL agents in dynamic, non-stationary environments.
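This setting can be summarized as the evaluation protocol sketched below: tasks arrive one at a time, and after finishing each task the agent is scored on every task seen so far. The `agent.train_on` method and `eval_fn` callback are hypothetical interfaces introduced here for illustration, not part of any particular CRL framework.

```python
def continual_rl_protocol(agent, tasks, steps_per_task, eval_fn):
    """Train on a task sequence; after each task, evaluate on all tasks
    seen so far. The resulting lower-triangular score matrix exposes
    forgetting (columns decaying over time) and transfer (later tasks
    starting from higher scores)."""
    scores = []
    for k, env in enumerate(tasks):
        agent.train_on(env, steps=steps_per_task)      # hypothetical API
        scores.append([eval_fn(agent, t) for t in tasks[:k + 1]])
    return scores
```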

It is worth noting that the terms “lifelong” and “continual” are often used interchangeably in the RL literature, but their usage can vary significantly across studies, potentially leading to confusion [22]. In general, most LRL research emphasizes rapid adaptation to new tasks, while CRL research prioritizes avoiding catastrophic forgetting. In this survey, we unify the two terms under the umbrella of CRL, reflecting the broader trend in CL research to address both aspects simultaneously. A CRL agent is expected to achieve two key objectives: 1) minimizing the forgetting of knowledge from previously learned tasks and 2) leveraging prior experiences to learn new tasks more efficiently. By fulfilling these objectives, CRL holds the promise of addressing the current limitations of DRL, paving the way for RL techniques to be applied in broader and more complex domains. Ultimately, CRL aspires to achieve human-like lifelong learning capabilities, making it a compelling direction for advancing the field of RL.
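These two objectives are typically quantified from the score matrix produced by a protocol like the one sketched above. The fragment below follows definitions common in the CL literature (e.g., the GEM-style metrics): forgetting measures the drop from each task's best past score to its final score, and forward transfer compares the score on a task just before training on it against a reference score `b[j]`, which the evaluator must supply (e.g., the score of an untrained agent).

```python
import numpy as np

def average_forgetting(R):
    """R[i, j]: score on task j after training on tasks 0..i (T x T).
    For each earlier task, take the drop from its best score over
    training to its score after the final task."""
    T = R.shape[0]
    return float(np.mean([R[:-1, j].max() - R[-1, j] for j in range(T - 1)]))

def forward_transfer(R, b):
    """Score on task j just before training on it, relative to a
    reference score b[j] such as that of an untrained agent."""
    T = R.shape[0]
    return float(np.mean([R[j - 1, j] - b[j] for j in range(1, T)]))
```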

In this section, we present our taxonomy of CRL methods. Khetarpal et al. [21] proposed a taxonomy for CRL, classifying approaches into three categories: explicit knowledge retention, leveraging shared structures, and learning to learn. While this taxonomy provides valuable insights, it does not adequately capture the unique characteristics of CRL, and it falls short of encompassing the breadth of recent advancements in the field. To address these limitations, we propose a new taxonomy that focuses on the aspects of CRL that distinguish it from traditional CL methods. Our taxonomy is grounded in the key components of RL and organizes CRL methods by the type of knowledge they store and transfer. In addition, we provide an up-to-date and comprehensive review of CRL methods, including the latest advancements in the field. Fig. 4 presents a timeline of representative CRL methods, allowing readers to trace the emergence and popularity of each class of methods.

A. Taxonomy Methodology

Fig. 5 illustrates the general structure of CRL methods. In this framework, an agent’s knowledge can be broadly categorized into four main types: policy, experience, dynamics, and reward. While other elements of RL, such as the action space and state space, can also be considered forms of knowledge, they are often overlooked in existing CRL methods. Our taxonomy therefore focuses on these four categories, which are central to the design and implementation of CRL systems. To organize CRL methods systematically, we address the following key question: “What knowledge is stored and/or transferred?” Based on this guiding question, we classify CRL methods into four main categories: policy-focused, experience-focused, dynamics-focused, and reward-focused. We further divide some categories into sub-categories based on how the knowledge is utilized. It is important to note that this taxonomy is not exhaustive, and many methods may span multiple categories. To facilitate a comprehensive overview of the development of CRL methods, we list representative approaches in Table IV, organized chronologically.
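As a structural summary, the four knowledge types correspond to the kinds of state a CRL agent might persist across tasks, as in the hypothetical container below. The class and its field types are placeholders for illustration; concrete methods typically store and transfer only one or two of these.

```python
from dataclasses import dataclass, field
from typing import Any, List

@dataclass
class CRLKnowledge:
    """Illustrative container mirroring the four knowledge types."""
    policy: Any = None          # policy-focused: weights, distilled policies, masks
    experience: List[Any] = field(default_factory=list)  # experience-focused: stored transitions
    dynamics: Any = None        # dynamics-focused: learned transition/world models
    reward: Any = None          # reward-focused: reward models or shaping signals
```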