ComoRAG: A Cognitive-Inspired Memory-Organized RAG for Stateful Long Narrative Reasoning
Narrative comprehension of long stories and novels has been a challenging domain, attributed to their intricate plotlines and entangled, often evolving relations among characters and entities. Given LLMs' diminished reasoning over extended context and their high computational cost, retrieval-based approaches remain pivotal in practice. However, traditional RAG methods can fall short due to their stateless, single-step retrieval process, which often overlooks the dynamic nature of interconnected relations within long-range context. In this work, we propose ComoRAG, built on the principle that narrative reasoning is not a one-shot process but a dynamic, evolving interplay between new evidence acquisition and past knowledge consolidation, analogous to how humans reason with memory-related signals in the brain. Specifically, when encountering a reasoning impasse, ComoRAG undergoes iterative reasoning cycles while interacting with a dynamic memory workspace. In each cycle, it generates probing queries to devise new exploratory paths, then integrates the retrieved evidence of new aspects into a global memory pool, thereby supporting the emergence of a coherent context for query resolution. Across four challenging long-context narrative benchmarks (200K+ tokens), ComoRAG outperforms strong RAG baselines, with consistent relative gains of up to 11% over the strongest baseline. Further analysis reveals that ComoRAG is particularly advantageous for complex queries requiring global comprehension, offering a principled, cognitively motivated paradigm for retrieval-based long-context comprehension toward stateful reasoning.
As a remedy, multi-step retrieval methods offer a more promising direction: IRCoT (Trivedi et al. 2023) interleaves retrieval with Chain-of-Thought reasoning (Wei et al. 2022); Self-RAG (Asai et al. 2024) trains a model to adaptively retrieve and reflect on evidence; and MemoRAG (Qian et al. 2025) uses a dual-system architecture to generate clues from a compressed global context. These methods all aim to obtain richer context through iterative retrieval. However, their retrieval steps are typically independent, lacking coherent reasoning across the explicit narrative progression and yielding fragmented evidence under a stateless comprehension. As illustrated in Figure 1(b), for lack of a dynamic memory, multi-step retrieval fails to integrate contradictory evidence such as “Snape protects/bullies Harry” and cannot grasp the evolution of his actions, ultimately failing to yield the correct answer.
In this work, we seek inspiration from the function of the Prefrontal Cortex (PFC) in the human brain, which employs a sophisticated reasoning process called Metacognitive Regulation (Fernandez-Duque, Baird, and Posner 2000). This process is not a single action but a dynamic interplay between new evidence acquisition, driven by goal-directed memory probes (Dobbins and Han 2006; Miller and Constantinidis 2024), and subsequent knowledge consolidation. During consolidation, new findings are integrated with past information to construct an evolving, coherent narrative. This iterative cycle allows the PFC to continuously assess its understanding and revise its strategy, providing a direct cognitive blueprint for our framework's stateful reasoning approach.
We introduce ComoRAG, a cognitive-inspired, memory-organized RAG framework that imitates the human Prefrontal Cortex (PFC) to achieve truly stateful reasoning. At its core is a dynamic cognitive loop operating on a memory workspace, which actively probes for and integrates new evidence to build a coherent narrative comprehension.
This process, as illustrated in Figure 1(c), is a closed loop of evolving reasoning states. Faced with a complex query like “Why did Snape kill Dumbledore?”, the system's memory state evolves from an initial “causally incomplete event” (Snape kills Albus), to an “apparent contradiction” upon finding conflicting information (Snape protects Harry), and ultimately to a logically consistent, coherent context through deeper exploration and evidence fusion. Only in this final, complete cognitive state can ComoRAG perform correct stateful reasoning, deriving the insight that the killing was “an act of loyalty, not betrayal”.
2.2 The Hierarchical Knowledge Source
To overcome the limitations of a monolithic representation of the given context, our framework first builds a hierarchical knowledge index X for retrieval that models the raw text along three complementary cognitive dimensions, analogous to how the PFC integrates different memory types from various brain regions, in particular supporting cross-layer reasoning from raw evidence to abstract relationships.
Veridical Layer: Grounding in Factual Evidence. To ensure all reasoning is traceable to source evidence, a veridical layer Xver is first established, consisting directly of the raw text chunks, analogous to the precise recall of factual details in human memory. For more accurate retrieval over text chunks, we instruct an LLM to generate knowledge triples (subject-predicate-object) for each chunk. These triples participate in every retrieval and strengthen the match between an incoming query and the corresponding text chunk, an approach proven effective by HippoRAG (Jimenez Gutierrez et al. 2024). Further details are given in Appendix B.
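To illustrate how triples can strengthen chunk matching, the following sketch indexes each chunk together with its triples and scores a query against both. The data structures and the lexical overlap scoring are simplified assumptions for exposition, not the framework's actual retriever (which would use dense embeddings and LLM-extracted triples).

```python
from dataclasses import dataclass, field

@dataclass
class VeridicalNode:
    """One entry in the veridical layer X_ver: a raw chunk plus its triples."""
    chunk: str
    triples: list = field(default_factory=list)  # (subject, predicate, object) tuples

def score(query: str, node: VeridicalNode) -> float:
    """Toy lexical score: term overlap with the chunk, plus a bonus for
    query terms matched by the chunk's knowledge triples."""
    q = set(query.lower().split())
    chunk_hits = len(q & set(node.chunk.lower().split()))
    triple_terms = {t.lower() for tr in node.triples for t in " ".join(tr).split()}
    return chunk_hits + 2 * len(q & triple_terms)  # triples strengthen the match

def retrieve(query, nodes, k=2):
    """Return the k highest-scoring veridical nodes for a query."""
    return sorted(nodes, key=lambda n: score(query, n), reverse=True)[:k]
```

In this toy setup, a query sharing terms with a chunk's triples is pulled toward that chunk even when the surface text alone is an ambiguous match.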
Semantic Layer: Abstracting Thematic Structure. To capture thematic and conceptual connections that transcend long-range contextual dependencies, a semantic layer Xsem is built, following the prior work RAPTOR, which employs a GMM-driven clustering algorithm to recursively summarize semantically similar text chunks into a hierarchical summary tree. We consider such semantic abstraction necessary for deeper comprehension and adopt the same formulation. The resulting summary nodes enable the framework to retrieve conceptual information beyond the surface level.
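A minimal sketch of such a summary tree follows. RAPTOR clusters with a GMM over embeddings and summarizes with an LLM; here both are naive stand-ins (neighbor grouping and string joining), purely to show the recursive bottom-up structure.

```python
def semantic_layer(chunks, cluster=None, summarize=None, fanout=2):
    """Build a hierarchical summary tree in the spirit of RAPTOR.

    `cluster` and `summarize` are stand-ins (assumptions): the real layer
    would use GMM-driven clustering over embeddings and LLM summarization.
    Returns a list of levels: level 0 = raw chunks, deeper = broader summaries.
    """
    cluster = cluster or (lambda xs: [xs[i:i + fanout] for i in range(0, len(xs), fanout)])
    summarize = summarize or (lambda xs: "+".join(xs))
    levels = [list(chunks)]
    while len(levels[-1]) > 1:
        # Each pass groups similar nodes and replaces each group by a summary.
        levels.append([summarize(group) for group in cluster(levels[-1])])
    return levels
```

Retrieval can then match a query against nodes at every level, so that a conceptual question lands on a high-level summary while a detail question lands near the leaves.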
Episodic Layer: Reconstructing Narrative Flow. The previous two layers provide views of both factual details and high-level concepts. However, they lack the temporal development and plot progression that can be especially crucial for narratives. To enable such a view with long-range causal chains, we introduce the episodic layer, Xepi, which reconstructs the plotline and story arc by capturing sequential narrative development. The process applies a sliding-window summarization across text chunks; each resulting node is a summary that aggregates the narrative development of contiguous or causally related events along the timeline. Optionally, the sliding-window process can be applied recursively to form higher-level views of content progression, extracting different levels of narrative flow as part of the knowledge source.
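The sliding-window construction can be sketched as follows. The `summarize` stub stands in for an LLM summarization call, and the window/stride parameters are illustrative assumptions, not the paper's settings.

```python
def episodic_layer(chunks, window=3, stride=1, summarize=None):
    """Build episodic nodes X_epi by summarizing a sliding window of
    consecutive chunks; `summarize` is a stand-in for an LLM call."""
    # Default stub: keep each chunk's first sentence, joined in timeline order.
    summarize = summarize or (lambda texts: " -> ".join(t.split(".")[0] for t in texts))
    nodes = []
    for start in range(0, len(chunks) - window + 1, stride):
        nodes.append(summarize(chunks[start:start + window]))
    return nodes
```

Applying the same function again over the resulting nodes yields the optional recursive, higher-level views of narrative progression.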
The core of ComoRAG is a control loop that realizes the concept of metacognitive regulation in full. It comprises a Regulatory Process for reflection and planning at each step, and a Metacognitive Process for executing reasoning and memory management within the MemoryWorkspace.
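The overall cycle can be sketched as follows; every interface here (`retrieve`, `probe_gen`, `try_answer`) is a hypothetical placeholder for a framework component, not the actual API, and the loop body compresses both processes into a few lines.

```python
def comorag_loop(query, retrieve, probe_gen, try_answer, max_cycles=3):
    """Sketch of the metacognitive control loop (assumed interfaces):
    each cycle generates probing queries, fuses new evidence into a
    global memory pool, and re-attempts resolution on the updated state."""
    memory = []                                  # global memory pool (workspace state)
    for _ in range(max_cycles):
        answer = try_answer(query, memory)
        if answer is not None:                   # reasoning state is coherent: stop
            return answer
        for probe in probe_gen(query, memory):   # plan new exploratory paths
            memory.extend(retrieve(probe))       # acquire and consolidate evidence
    return try_answer(query, memory)             # best effort after the budget
```

The key property, as described above, is statefulness: each cycle's probes are conditioned on the memory accumulated so far, rather than issued independently.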