Can one model compress all conversation memory and eliminate retrieval?
Instead of storing and retrieving discrete memories, can a single LLM compress all past conversations into event recaps, user portraits, and relationship dynamics? This note explores whether compression-based memory can sidestep the retrieval bottleneck of traditional memory systems.
The standard pipeline for long-term conversational memory is: (1) generate memories from past sessions, (2) store in a memory bank, (3) retrieve relevant memories via embedding similarity, (4) generate response using retrieved memories. COMEDY (Compressive Memory-Enhanced Dialogue Systems) collapses this into a single model that handles all four steps.
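To make the contrast concrete, here is a minimal sketch of both designs, assuming a generic `llm(prompt)` completion function and an `embed(text)` vector function; both are placeholders of my own, not COMEDY's actual interfaces or any real library API.

```python
# Sketch only: `llm` and `embed` are assumed callables, not a real API.
from typing import Callable

def respond_with_retrieval(
    llm: Callable[[str], str],
    embed: Callable[[str], list[float]],
    memory_bank: list[tuple[list[float], str]],  # (embedding, memory text) pairs
    past_session: str,
    user_message: str,
    top_k: int = 3,
) -> str:
    """Traditional four-step pipeline: generate, store, retrieve, respond."""
    # (1) Generate memories from the past session.
    memory = llm(f"Extract facts worth remembering:\n{past_session}")
    # (2) Store them in the memory bank under an embedding key.
    memory_bank.append((embed(memory), memory))
    # (3) Retrieve the top-k memories by embedding similarity (dot product).
    q = embed(user_message)
    ranked = sorted(memory_bank, key=lambda kv: -sum(a * b for a, b in zip(q, kv[0])))
    retrieved = "\n".join(text for _, text in ranked[:top_k])
    # (4) Generate a response conditioned on the retrieved memories.
    return llm(f"Memories:\n{retrieved}\n\nUser: {user_message}")

def respond_compressive(
    llm: Callable[[str], str], compressive_memory: str, user_message: str
) -> str:
    """COMEDY-style: one model call; no index to query, no ranking to get wrong."""
    return llm(f"Compressed memory:\n{compressive_memory}\n\nUser: {user_message}")
```

The compressive variant keeps no index and performs no similarity ranking; everything the system knows rides along in `compressive_memory`.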
The departure is architectural: instead of storing discrete memory items and retrieving the most relevant ones, COMEDY reprocesses and condenses ALL past memories into a compressive representation with three dimensions (sketched as a data structure after the list):
- Event recaps — concise summaries of what happened across all conversations, creating a historical narrative
- User portraits — a detailed profile of the user derived from conversational events
- Relationship dynamics — how the user-chatbot relationship changes across sessions
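As a rough data sketch, the three dimensions can travel as one structure that serializes into the model's context; the field names below are my own choice, not the paper's schema.

```python
from dataclasses import dataclass

@dataclass
class CompressiveMemory:
    event_recaps: str           # historical narrative of what happened across sessions
    user_portrait: str          # profile of the user derived from conversational events
    relationship_dynamics: str  # how the user-chatbot relationship has evolved

    def as_prompt(self) -> str:
        # Serialize all three dimensions into a single context block
        # that the model conditions on when responding.
        return (
            f"## Events\n{self.event_recaps}\n\n"
            f"## User portrait\n{self.user_portrait}\n\n"
            f"## Relationship\n{self.relationship_dynamics}"
        )
```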
This compressive memory inherently prioritizes salient information — unlike retrieval systems that must correctly rank relevance against a potentially vast database. The memory is always "up to date" because it is regenerated through compression, not queried from a static store.
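A sketch of that regeneration step, reusing the `CompressiveMemory` structure above: after each session, the old memory and the new transcript go back through the model, which rewrites all three dimensions. COMEDY compresses with a single model in one pass; the three separate calls and the prompt wording below are simplifications of mine, shown this way only for readability.

```python
from typing import Callable

def update_memory(
    llm: Callable[[str], str],   # placeholder completion function, as above
    memory: CompressiveMemory,
    transcript: str,
) -> CompressiveMemory:
    ctx = f"{memory.as_prompt()}\n\n## New session\n{transcript}"
    # The whole memory is rewritten, not appended to: stale details drop
    # out and salient ones survive, so no query step is ever needed.
    return CompressiveMemory(
        event_recaps=llm(f"{ctx}\n\nRewrite the event recaps to fold in the new session."),
        user_portrait=llm(f"{ctx}\n\nRewrite the user portrait given the new session."),
        relationship_dynamics=llm(
            f"{ctx}\n\nRewrite how the relationship has evolved through this session."
        ),
    )
```

Note that `update_memory` returns a wholly new memory: nothing accumulates in a store, so the representation stays current by construction.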
Where "Can long-context models resolve retriever-reader imbalance?" asks whether long-context readers let us relax the retriever, COMEDY takes this further: it eliminates the retriever entirely. The imbalance is resolved not by rebalancing but by merging retrieval and generation into a single operation. The trade-off: compression necessarily loses some information, and there is no way back to the raw conversation for details that were compressed away.
The relationship dynamics dimension is particularly notable. Most memory systems track facts about the user (semantic memory) or events that occurred (episodic memory). Tracking how the relationship between user and agent evolves across sessions — increasing trust, shifting topic preferences, developing shared references — is a distinct memory type that neither retrieval nor summarization naturally captures.
Source: Memory
Related concepts in this collection
- Can long-context models resolve retriever-reader imbalance? Traditional RAG systems forced retrievers to find precise passages because readers had small context windows; do modern long-context LLMs change what architecture makes sense? COMEDY goes further, eliminating the retriever entirely rather than rebalancing.
- How should chatbot design vary by relationship duration? Do chatbots serving one-time users need different design than those supporting long-term relationships? This matters because applying the same design to all temporal profiles creates usability mismatches. COMEDY's relationship dynamics dimension directly serves the persistent companion archetype.
- Do chatbot relationships lose their appeal as novelty wears off? Explores whether the positive social dynamics observed in one-time chatbot studies persist or fade through repeated interactions; critical for designing systems intended for sustained engagement over weeks or months. Compressive memory tracking relationship dynamics could detect and respond to novelty decay.
- Does chatbot personalization build trust or expose privacy risks? Explores whether personalization features that increase user trust and social connection simultaneously heighten privacy concerns and create rising behavioral expectations over time. Storing user portraits and relationship dynamics raises the stakes of this dual dynamic.
Original note title
compressive memory replaces retrieval with a single model that generates, summarizes, and responds — eliminating the retrieval bottleneck for long-term conversation