Memorization and Knowledge Injection in Gated LLMs
Large Language Models (LLMs) currently struggle to sequentially add new memories and integrate new knowledge. These limitations contrast with the human ability to continuously learn from new experiences and acquire knowledge throughout life. Most existing approaches add memories either through large context windows or external memory buffers (e.g., Retrieval-Augmented Generation), and studies on knowledge injection rarely test scenarios resembling everyday life events. In this work, we introduce a continual learning framework, Memory Embedded in Gated LLMs (MEGa), which injects event memories directly into the weights of LLMs. Each memory is stored in a dedicated set of gated low-rank weights. During inference, a gating mechanism activates relevant memory weights by matching query embeddings to stored memory embeddings. This enables the model to both recall entire memories and answer related questions.
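In symbols (our notation; the softmax gate with temperature $\tau$ is an illustrative assumption rather than a detail fixed by the description above): given a query $q$, an embedding model $e(\cdot)$, and stored memories $m_1, \dots, m_K$ with low-rank factors $B_i A_i$, the gated weights can be written as

$$W = W_0 + \sum_{i=1}^{K} g_i\, B_i A_i, \qquad g_i = \frac{\exp\!\big(\langle e(q), e(m_i)\rangle / \tau\big)}{\sum_{j=1}^{K} \exp\!\big(\langle e(q), e(m_j)\rangle / \tau\big)},$$

where $W_0$ denotes the frozen pretrained weights.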
In this paper, we propose to study long-term declarative memory (e.g., episodic and semantic memory), one of the hallmarks of human cognition, using LLMs as a model cognitive system. To achieve this, we augment a pretrained LLM with gated memory modules, enabling rapid continual encoding and retrieval of memories while mitigating catastrophic forgetting.
Classical models of long-term memory in neural networks are based on the paradigm of associative memory in recurrent neural networks (RNNs), such as the Hopfield model (Hopfield, 1982), where each memory corresponds to a stable activation pattern of the network. This paradigm was later extended to the memorization of sequences of states (Kleinfeld & Sompolinsky, 1988; Kanter & Sompolinsky, 1987). A common feature of these models is the use of Hebbian-like learning rules to store memories in the network’s connection weights.
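For concreteness, in the Hopfield model, $P$ binary patterns $\xi^\mu \in \{-1, +1\}^N$ are stored through the Hebbian outer-product rule

$$J_{ij} = \frac{1}{N} \sum_{\mu=1}^{P} \xi_i^\mu \xi_j^\mu,$$

and a memory is recalled when the dynamics $s_i \leftarrow \mathrm{sgn}\big(\sum_j J_{ij} s_j\big)$ converge to the corresponding fixed point. Such rules superimpose all memories in a single weight matrix, which is why storing correlated patterns causes interference between memories.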
In-context learning (Brown et al., 2020) appears to avoid the limitations of classical associative memory models: it does not suffer from catastrophic forgetting and can learn new memories, even when they are correlated with existing ones, by smoothly integrating them into the LLM’s existing semantic knowledge. However, the demands of long-term memory may exceed the capacity of context windows (Bai et al., 2023). Notably, the context window more closely resembles human working memory than long-term memory. Attempting to unify working and long-term memory in a single representation is biologically implausible, as these two functions rely on distinct cognitive resources and brain systems (Fountas et al., 2024).
Another fundamental limitation of classical memory models in “shallow” RNNs is their tendency to encode memories as isolated knowledge items. In contrast, real-world memories are composed of events rich in semantic structure, with elements that are typically already familiar to the organism. As a result, new factual memories must be embedded within or interact closely with an existing, fully developed semantic system.
Retrieval-Augmented Generation (RAG) has proven to be an effective method for enhancing the memory capabilities of LLMs and is widely used in real-world applications (Lewis et al., 2020). In this approach, user queries are processed by an embedding model to retrieve relevant entries from a database, based on a similarity metric. These entries are then inserted into the LLM’s context window, enabling the model to generate informed responses by combining its inherent capabilities with the retrieved knowledge.
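A minimal sketch of this retrieval loop in Python, with a toy bag-of-words embedder and a two-entry database standing in for a learned embedding model and a real knowledge store (names and example texts are ours):

    import numpy as np

    def embed(text, vocab):
        # Toy bag-of-words embedding; a real RAG system uses a learned model.
        # Punctuation handling is deliberately naive.
        v = np.zeros(len(vocab))
        for w in text.lower().split():
            if w in vocab:
                v[vocab[w]] += 1.0
        n = np.linalg.norm(v)
        return v / n if n > 0 else v

    docs = ["Alice moved to Paris in 2019.", "Bob adopted a dog named Rex."]
    vocab = {w: i for i, w in enumerate(sorted({w for d in docs for w in d.lower().split()}))}
    D = np.stack([embed(d, vocab) for d in docs])   # database of entry embeddings

    query = "When did Alice move to Paris?"
    scores = D @ embed(query, vocab)                # cosine similarity (vectors are unit norm)
    retrieved = docs[int(np.argmax(scores))]        # most similar entry
    prompt = f"Context: {retrieved}\nQuestion: {query}\nAnswer:"

In a deployed system, the assembled prompt is then passed to the LLM, which answers by combining its parametric knowledge with the retrieved context.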
To address these challenges, we introduce MEGa (Memory Embedded in Gated LLMs), a long-term memory framework designed to enable LLMs to sequentially store new memories in a manner that reflects key aspects of human memory. To ensure biological plausibility, MEGa encodes new memories by fine-tuning the network’s weights.
To mitigate catastrophic forgetting, it employs a gating mechanism that, at inference time, routes input queries to a collection of gated memory modules and activates those most relevant to the query.
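A minimal PyTorch sketch of one such gated memory module, under our assumptions (one low-rank adapter per memory and a softmax gate over cosine similarities between the query embedding and stored memory embeddings; class and variable names are ours, and the paper's exact parameterization may differ):

    import torch
    import torch.nn.functional as F

    class GatedMemoryLinear(torch.nn.Module):
        # A frozen base linear layer plus one low-rank adapter per stored
        # memory, gated at inference by query/memory embedding similarity.
        def __init__(self, base: torch.nn.Linear, rank: int = 8, temp: float = 0.1):
            super().__init__()
            self.base, self.rank, self.temp = base, rank, temp
            self.As = torch.nn.ParameterList()  # one (rank, d_in) factor per memory
            self.Bs = torch.nn.ParameterList()  # one (d_out, rank) factor per memory
            self.keys = []                      # memory embeddings (plain tensors for brevity)

        def add_memory(self, key_embedding):
            d_out, d_in = self.base.weight.shape
            self.As.append(torch.nn.Parameter(0.01 * torch.randn(self.rank, d_in)))
            self.Bs.append(torch.nn.Parameter(torch.zeros(d_out, self.rank)))  # zero init: inert until trained
            self.keys.append(F.normalize(key_embedding, dim=0))

        def forward(self, x, query_embedding):
            y = self.base(x)
            if not self.keys:
                return y
            q = F.normalize(query_embedding, dim=0)
            gates = torch.softmax(torch.stack([k @ q for k in self.keys]) / self.temp, dim=0)
            for g, A, B in zip(gates, self.As, self.Bs):
                y = y + g * (x @ A.T @ B.T)     # gated low-rank contribution
            return y

Because each new memory receives its own adapter, and the gate concentrates on the best-matching key at inference time, training one memory's weights need not overwrite another's; this is the sense in which gating mitigates interference between memories.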
We show that MEGa is capable not only of retrieving the learned memories but also of performing question-answering (QA) tasks based on them, demonstrating the successful integration of the memories into the knowledge base of the LLM.
Injecting new knowledge into pretrained LLMs has recently garnered significant attention (Hsueh et al., 2024; Shi et al., 2024; Zhang et al., 2024; Thede et al., 2025). A straightforward approach involves fine-tuning the model on the knowledge text (Ovadia et al., 2023; Gangadhar & Stratos, 2024), or on the answers when the knowledge is provided in the form of QA pairs (Mecklenburg et al., 2024). More recent methods aim to localize weight updates by identifying a knowledge-relevant subspace of the model’s weights (Meng et al., 2022a; Mitchell et al., 2021), or by distilling knowledge from the context window into the model’s parameters (Qi et al., 2024; Padmanabhan et al., 2024; Wang et al.; Kujanpää et al., 2024). However, there is evidence that these approaches are not significantly more effective than standard fine-tuning (Gangadhar & Stratos, 2024; Thede et al., 2025).
2.3. Gating Networks
Our model, MEGa, uses gating units to route queries to the most relevant stored memories. In general, gating networks function by selectively activating or suppressing connection paths based on the context or input provided to the system. Both empirical studies (Hochreiter & Schmidhuber, 1997; Chung et al., 2014; Sezener et al., 2021; Veness et al., 2021) and theoretical analyses (Saxe et al., 2022; Li & Sompolinsky, 2022) have shown that gated architectures are effective at mitigating catastrophic forgetting and are well suited to training across multiple tasks. Gating mechanisms are widely used in modern deep neural networks. One prominent example is the Mixture-of-Experts (MoE) architecture, a type of gated network that has gained popularity and underlies some state-of-the-art LLMs (Shazeer et al., 2017; Fedus et al., 2021; Jiang et al., 2024).
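In its simplest form, an MoE layer computes a gated combination of expert outputs,

$$y = \sum_{k} g_k(x)\, f_k(x), \qquad g(x) = \mathrm{softmax}(W_g x),$$

where the router weights $W_g$ are learned jointly with the experts $f_k$ (Shazeer et al., 2017). MEGa's memory modules can be viewed in this light, with the difference that each gate is computed from a stored memory embedding rather than from learned routing weights.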