Making Sense of Memory in AI Agents


There is also another approach to categorizing memory types for AI agents, this one from a design-pattern perspective. Sarah Wooders from Letta argues that an LLM is a tokens-in, tokens-out function, not a brain, and that overly anthropomorphized analogies are therefore a poor fit. If you look at how Letta defines the types of agent memory, you will see a different breakdown:

Message Buffer (Recent messages) stores the most recent messages from the current conversation.

Core Memory (In-Context Memory Blocks) is specific information that the agent itself manages (e.g., the user’s birthday or the boyfriend’s name, if relevant to the current conversation).

Recall Memory (Conversational History) is the raw conversation history.

Archival Memory (Explicitly Stored Knowledge) is explicitly formulated information stored in an external database.

The difference between the two taxonomies lies in how they design in-context and out-of-context memory. For example, CoALA’s working memory is a single category, while Letta splits it into the message buffer and core memory. The long-term memory from the CoALA paper can be thought of as Letta’s out-of-context memory. However, the long-term memory types of procedural, episodic, and semantic aren’t directly mappable to Letta’s recall and archival memory. You can think of CoALA’s semantic memory as Letta’s archival memory, but the other types differ from each other. Notably, the CoALA taxonomy doesn’t include the raw conversation history in long-term memory.
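The four Letta-style tiers described above can be sketched as a simple data structure. This is an illustrative Python sketch, not Letta’s actual API; the class and field names are assumptions made for clarity:

```python
from dataclasses import dataclass, field


@dataclass
class AgentMemory:
    # In-context memory
    message_buffer: list[str] = field(default_factory=list)    # recent messages
    core_memory: dict[str, str] = field(default_factory=dict)  # agent-managed facts

    # Out-of-context memory
    recall_memory: list[str] = field(default_factory=list)     # raw conversation history
    archival_memory: list[str] = field(default_factory=list)   # explicitly stored knowledge

    def add_message(self, msg: str, buffer_size: int = 10) -> None:
        """Append a message; everything stays in recall memory, but only
        the most recent messages remain in the in-context buffer."""
        self.recall_memory.append(msg)
        self.message_buffer.append(msg)
        if len(self.message_buffer) > buffer_size:
            self.message_buffer.pop(0)
```

The key design point is that evicting a message from the buffer does not destroy it: the full history survives in recall memory, while the context window stays bounded.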

| Memory Type | What is stored | Human Example | Agent Example |
| --- | --- | --- | --- |
| Working memory | Contents of the context window | Current conversation (e.g., “Hi, my name is Sam.”) | Current conversation (e.g., “Hi, my name is Sam.”) |
| Semantic memory | Facts | Things I learned in school (e.g., “Water freezes at 0°C”) | Facts about a user (e.g., “Dog’s name is Henry”) |
| Episodic memory | Experiences | Things I did (e.g., “Went to Six Flags on 10th birthday”) | Past actions (e.g., “Failed to calculate 1+1 without using a calculator”) |
| Procedural memory | Instructions | Instincts or motor skills (e.g., “How to ride a bike”) | Instructions in the system prompt (e.g., “Always ask follow-up questions before answering a question.”) |

AI Agent Memory Management

Memory management in AI agents refers to how to manage information within the LLM’s context window and in external storage, as well as how to transfer information between them. Richmond Alake lists the following core components of agent memory management: generation, storage, retrieval, integration, updating, and deletion (forgetting).
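The six components listed above can be illustrated with a minimal in-memory store. This is a sketch under stated assumptions: the class and method names are hypothetical, and naive string operations stand in for the LLM-driven generation and vector-based retrieval a real system would use:

```python
class MemoryStore:
    """Toy store illustrating generation, storage, retrieval,
    updating, and deletion. Integration is the separate step of
    placing retrieved memories into the prompt at assembly time."""

    def __init__(self) -> None:
        self._memories: dict[str, str] = {}

    def generate(self, conversation: list[str]) -> str:
        # Generation: distill a memory from raw conversation.
        # (A real system would call an LLM here.)
        return " ".join(conversation)

    def store(self, key: str, memory: str) -> None:
        self._memories[key] = memory  # storage

    def retrieve(self, query: str) -> list[str]:
        # Retrieval: naive substring match stands in for vector search.
        return [m for m in self._memories.values() if query.lower() in m.lower()]

    def update(self, key: str, memory: str) -> None:
        self._memories[key] = memory  # updating: overwrite stale facts

    def delete(self, key: str) -> None:
        self._memories.pop(key, None)  # deletion (forgetting)
```

Note that integration has no method of its own here: it happens wherever retrieved memories are merged into the context window before the next LLM call.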

Managing memory in the context window

The goal of managing memory in the context window is to ensure that only relevant information is retained, so that the LLM is not confused by incorrect, irrelevant, or contradictory information. Additionally, as the conversation progresses, the conversation history grows (consuming more tokens), which leads to slower responses and higher costs and can eventually hit the context window’s limit.

To mitigate this problem, you can maintain the conversation history in different ways. For example, you can manually remove old and obsolete information from the context window. Alternatively, you can periodically summarize the previous conversation, retain only the summary, and delete the old messages.
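The summarize-and-evict strategy just described can be sketched as follows. The `summarize` callable is a placeholder for an LLM summarization request; its default implementation here is purely illustrative:

```python
def compact_history(
    messages: list[str],
    keep_last: int,
    summarize=lambda msgs: f"[summary of {len(msgs)} messages]",
) -> list[str]:
    """Replace all but the most recent messages with a single summary,
    bounding the context window while preserving the gist."""
    if len(messages) <= keep_last:
        return messages  # nothing to evict yet
    old, recent = messages[:-keep_last], messages[-keep_last:]
    return [summarize(old)] + recent
```

For example, `compact_history(["a", "b", "c", "d"], keep_last=2)` keeps the last two messages verbatim and collapses the rest into one summary entry.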

Explicit memory (hot path) describes the agent memory system’s ability to autonomously recognize important information and decide to explicitly remember it (via tool calling). Explicit memory in humans is the conscious storage of information (e.g., episodic and semantic memory). While remembering important information in the hot path mirrors how humans remember, it can be challenging to implement a robust solution that understands which information is important enough to remember.
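Hot-path explicit memory is typically exposed to the model as a tool it can call mid-conversation. A minimal sketch, assuming the common JSON function-calling convention; the tool name, schema, and dispatcher are illustrative, not any specific framework’s API:

```python
# Tool schema the model sees; it decides when a fact is worth saving.
REMEMBER_TOOL = {
    "name": "remember",
    "description": "Store an important fact about the user for future conversations.",
    "parameters": {
        "type": "object",
        "properties": {"fact": {"type": "string"}},
        "required": ["fact"],
    },
}

saved_facts: list[str] = []  # stands in for an external memory store


def handle_tool_call(name: str, arguments: dict) -> str:
    """Dispatch a model-issued tool call; only 'remember' is handled here."""
    if name == "remember":
        saved_facts.append(arguments["fact"])
        return "stored"
    return "unknown tool"
```

The hard part is not the plumbing but the policy: whether the model actually calls `remember` on the right facts, which is exactly the robustness challenge noted above.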

Implicit memory (background) describes when memory management is programmatically defined in the system at specific times during or after a conversation. Implicit memory in humans is the unconscious storage of information (e.g., procedural memory). The Google whitepaper on session and memory describes the following three scenarios:

After a session: You can batch process the entire conversation after a session.

In periodic intervals: If your use case has long-running conversations, you can define an interval at which session data is transferred to long-term memory.

After every turn: If your use case requires real-time updates. However, keep in mind that the raw conversation history is typically appended and stored in the context window for a short period (“short-term memory”).
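The three background triggers can be sketched in one function. This is an illustrative sketch: `consolidate` stands in for whatever extraction pipeline writes session data to long-term memory, and the trigger names are assumptions, not terms from the whitepaper:

```python
def run_session(turns: list[str], trigger: str, consolidate, interval: int = 5) -> None:
    """Replay a session and invoke consolidate() per the chosen trigger:
    'every_turn', 'interval', or 'after_session'."""
    history: list[str] = []
    for i, turn in enumerate(turns, start=1):
        history.append(turn)  # short-term: raw history stays in context
        if trigger == "every_turn":
            consolidate(history[-1:])           # real-time updates
        elif trigger == "interval" and i % interval == 0:
            consolidate(history[-interval:])    # periodic transfer
    if trigger == "after_session":
        consolidate(history)                    # batch-process the whole conversation
```

The trade-off is latency versus freshness: after-session batching is cheapest, per-turn consolidation keeps long-term memory current but adds work to every exchange.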