Think-in-Memory: Recalling and Post-thinking Enable LLMs with Long-Term Memory

Paper · arXiv 2311.08719 · Published November 15, 2023

Memory-augmented Large Language Models (LLMs) have demonstrated remarkable performance in long-term human-machine interactions, which fundamentally relies on iteratively recalling and reasoning over history to generate high-quality responses. However, such repeated recall-reason steps easily produce biased thoughts, i.e., inconsistent reasoning results when recalling the same history for different questions. In contrast, humans can keep thoughts in memory and recall them without repeated reasoning. Motivated by this human capability, we propose a novel memory mechanism called TiM (Think-in-Memory) that enables LLMs to maintain an evolving memory for storing historical thoughts along the conversation stream. The TiM framework consists of two crucial stages: (1) before generating a response, an LLM agent recalls relevant thoughts from memory, and (2) after generating a response, the LLM agent post-thinks and incorporates both historical and new thoughts to update the memory. Thus, TiM can eliminate the issue of repeated reasoning by saving the post-thinking thoughts as the history. Besides, we formulate basic principles for organizing the thoughts in memory based on well-established operations (i.e., insert, forget, and merge), allowing for dynamic updates and evolution of the thoughts.
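The two-stage loop and the three memory operations can be sketched in a few lines of Python. This is an illustrative toy, not the authors' implementation: the `Thought` schema, the keyword-overlap similarity used for recall, and the merge-by-shared-keywords rule are all simplifying assumptions standing in for the paper's actual retrieval and organization machinery.

```python
from dataclasses import dataclass

@dataclass
class Thought:
    """An inductive thought distilled from a dialogue turn (illustrative schema)."""
    text: str
    keywords: frozenset

class TiMMemory:
    """Minimal sketch of a thought store with insert/forget/merge operations.

    Keyword overlap stands in for a real relevance measure; all names here
    are assumptions for illustration, not the paper's implementation.
    """
    def __init__(self):
        self.thoughts = []

    def insert(self, thought):
        # Insert: store a newly post-thought result.
        self.thoughts.append(thought)

    def forget(self, predicate):
        # Forget: drop thoughts flagged (e.g., as outdated or contradictory).
        self.thoughts = [t for t in self.thoughts if not predicate(t)]

    def merge(self):
        # Merge: combine thoughts that share the same keyword set into one entry.
        merged = {}
        for t in self.thoughts:
            if t.keywords in merged:
                prev = merged[t.keywords]
                merged[t.keywords] = Thought(prev.text + " " + t.text, t.keywords)
            else:
                merged[t.keywords] = t
        self.thoughts = list(merged.values())

    def recall(self, query_keywords, top_k=2):
        # Stage 1: retrieve the stored thoughts most relevant to the query.
        scored = sorted(self.thoughts,
                        key=lambda t: len(t.keywords & query_keywords),
                        reverse=True)
        return scored[:top_k]

def respond(memory, query_keywords, llm=lambda ctx: "response"):
    """One conversation turn under the recall -> generate -> post-think loop."""
    context = memory.recall(query_keywords)      # stage 1: recall thoughts
    answer = llm(context)                        # generate with recalled thoughts
    # Stage 2: post-think — save the new thought so later turns reuse it
    # instead of re-reasoning over the raw history.
    memory.insert(Thought(answer, frozenset(query_keywords)))
    memory.merge()
    return answer
```

Because `respond` stores the distilled thought rather than the raw turn, a later question touching the same keywords recalls the cached reasoning result directly, which is the mechanism by which TiM avoids repeated reasoning.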

Unfortunately, this paradigm is prone to several issues that can create a performance bottleneck in real-world applications. The main issues are as follows:

• Inconsistent reasoning paths. Prior studies [13, 14] have shown that LLMs easily generate diverse reasoning paths for the same query. As shown in Figure 1 (Left), the LLM gives a wrong response due to inconsistent reasoning over the context.

• Unsatisfying retrieval cost. To retrieve relevant history, previous memory mechanisms must compute the pairwise similarity between the question and every historical conversation turn, which is time-consuming for long-term dialogue.
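The cost structure criticized above can be made concrete with a brute-force retrieval sketch: one embedding and one similarity computation per stored turn, i.e. O(N) work for an N-turn history on every query. The bag-of-characters `embed` below is a deliberately toy stand-in for a real sentence encoder; the function names are illustrative, not from the paper.

```python
import math

def embed(text):
    # Toy bag-of-characters embedding; a stand-in for a real sentence encoder.
    vec = [0.0] * 26
    for ch in text.lower():
        if ch.isalpha():
            vec[ord(ch) - ord('a')] += 1.0
    return vec

def cosine(a, b):
    # Standard cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def brute_force_recall(query, history, top_k=3):
    # The pairwise scheme: score the query against EVERY stored turn,
    # so each query costs O(N) similarity computations over an N-turn history.
    q = embed(query)
    scored = sorted(history, key=lambda h: cosine(q, embed(h)), reverse=True)
    return scored[:top_k]
```

Since every query rescans the full history, latency grows linearly with conversation length, which is why long-term dialogue motivates a cheaper retrieval scheme over stored thoughts.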