Can compressive memory track what matters most across 35 conversation sessions?

This explores whether a single compressive memory (one model that keeps rewriting a summary of the conversation instead of looking up past turns) can actually hold onto what matters over many sessions, and where that approach breaks down.

The corpus has a direct answer, with a warning attached. COMEDY folds memory generation, compression, and response into a single operation, tracking event recaps, user portraits, and relationship dynamics with no vector database in the loop (Can a single model replace retrieval for long-term conversation memory?). The appeal is obvious: no retrieval bottleneck, no guessing which old turn is relevant. But the same note flags a fragile consolidation pattern: continuous reprocessing follows an inverted-U curve and can drop *below* a no-memory baseline once misgrouping, lost context, and overfitting accumulate. So the honest answer to 'across 35 sessions' is that it can, until it can't, and the failure is gradual rather than obvious.
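To make the structural risk concrete, here is a minimal sketch of a single-summary memory loop. It assumes a toy `compress` function (hypothetical) that keeps the most-mentioned facts under a fixed budget; COMEDY performs this rewrite with a language model, and this is not its method. The point is only that there is no index to query: whatever one rewrite drops is gone for every later session.

```python
from collections import Counter

def compress(facts: list[str], budget: int) -> list[str]:
    """Hypothetical stand-in for the model's rewrite step.

    Keeps the `budget` most frequently mentioned facts, breaking ties by
    first appearance. A real system would use a language model here.
    """
    counts = Counter(facts)
    first_seen: dict[str, int] = {}
    for i, fact in enumerate(facts):
        first_seen.setdefault(fact, i)
    ranked = sorted(counts, key=lambda f: (-counts[f], first_seen[f]))
    return ranked[:budget]

def run_sessions(sessions: list[list[str]], budget: int) -> list[str]:
    """Fold every session into one rolling summary; nothing else is stored."""
    memory: list[str] = []
    for session in sessions:
        # The old summary is rewritten together with the new session, so a
        # fact's earlier emphasis is lost once it has been compressed.
        memory = compress(memory + session, budget)
    return memory

sessions = [
    ["owns a dog", "likes jazz"],
    ["training for a marathon", "training for a marathon"],
    ["new job in Berlin", "new job in Berlin"],
]
print(run_sessions(sessions, budget=2))
```

With a budget of two facts, "owns a dog" survives the second rewrite but is evicted by the third, and nothing later in the loop can recover it.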

The more interesting finding is *why* compression decays, and the corpus points to selection as the missing ingredient. Including everything turns out to hurt: selective history retrieval beats full-context inclusion because topic switches inject irrelevant information, and jointly learning what to select beats both full context and human annotation (Does including all conversation history actually help retrieval?). Compressive memory is, in a sense, full-context inclusion smeared into a summary, which is exactly why 'what matters most' is the hard part. A summary that compresses indiscriminately carries forward the same noise that selective retrieval was designed to drop.
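A minimal sketch of selecting history turns instead of including all of them, assuming Jaccard word overlap with the current query and a hand-set threshold. Both are hypothetical stand-ins: the cited work learns the selection jointly with retrieval rather than using a fixed heuristic.

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercase alphanumeric word set."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap of two word sets, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def select_history(history: list[str], query: str, threshold: float = 0.05) -> list[str]:
    """Keep only prior turns whose overlap with the query clears the threshold,
    preserving their original order (hypothetical heuristic selector)."""
    q = tokens(query)
    return [turn for turn in history if jaccard(tokens(turn), q) >= threshold]

history = [
    "I'm looking for a laptop for video editing",
    "What's the weather in Lisbon tomorrow?",
    "Should the laptop have 32GB of RAM?",
]
print(select_history(history, "Which laptop GPU works best for video editing?"))
```

The weather question, a topic switch, is dropped while both laptop turns are kept; full-context inclusion would pass all three along.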

There's a sharper reframe hiding in the corpus: the bottleneck may not be memory at all, but *compute*. One note argues that the long-context problem is really the cost of consolidating evicted context into the model's fast weights during offline 'sleep' phases, and that performance keeps improving with more consolidation passes, a test-time scaling pattern that shows up on harder reasoning tasks (Is long-context bottleneck really about memory or compute?). Read alongside COMEDY's inverted-U, this suggests the decay across sessions may stem less from the summary being too small than from each cheap, in-line rewrite under-processing what it absorbs. Tracking what matters across 35 sessions might be less about a bigger memory and more about spending more compute on each consolidation.
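One way to picture "more consolidation passes, better recall" is a toy fast-weight store. The sketch assumes a linear fast-weight vector updated with the delta (Widrow-Hoff) rule over evicted key-value pairs; the note does not specify this rule, the data are made up, and this is not the cited work's method. Each extra pass is extra offline compute, and with overlapping keys a single pass leaves residual recall error that later passes reduce.

```python
def predict(w: list[float], key: tuple[float, ...]) -> float:
    """Read a scalar value back out of the fast weights (a dot product)."""
    return sum(wi * ki for wi, ki in zip(w, key))

def consolidate(pairs, dim: int, passes: int, lr: float = 1.0) -> list[float]:
    """Write evicted (key, value) pairs into fast weights with the delta rule.

    Each pass sweeps every evicted pair once, so `passes` is a knob for how
    much offline compute the 'sleep' phase spends. Hypothetical toy only.
    """
    w = [0.0] * dim
    for _ in range(passes):
        for key, value in pairs:
            err = value - predict(w, key)
            w = [wi + lr * err * ki for wi, ki in zip(w, key)]
    return w

def recall_error(w: list[float], pairs) -> float:
    """Sum of squared errors when reading every evicted value back."""
    return sum((value - predict(w, key)) ** 2 for key, value in pairs)

# Two overlapping unit-norm keys, so writing one partially disturbs the other.
evicted = [((1.0, 0.0), 1.0), ((0.6, 0.8), 0.0)]
for passes in (1, 2, 5):
    print(passes, recall_error(consolidate(evicted, dim=2, passes=passes), evicted))
```

With unit-norm keys and `lr=1.0` the update is a cyclic Kaczmarz projection, so here the error shrinks by a constant factor each pass: recall improves with compute while the store stays the same size.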

What to actually *store* is its own question, and here the corpus pushes against raw compression. The PRIME work finds that semantic memory (abstracted preference summaries) consistently beats episodic recall of specific past interactions, and that recency-based recall beats similarity-based retrieval (Does abstract preference knowledge outperform specific interaction recall?). That is an argument *for* compression done right: distill preferences, don't hoard transcripts. The recommender-systems angle adds the structural piece that compression tends to flatten: users have at least three distinct preference channels (current session, historical dialogue, look-alike users), and collapsing them loses signal that traditional systems proved valuable (Can conversational recommenders recover lost preference signals from history?).
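The two memory types can be sketched as three read paths over the same interaction log. This is a toy illustration, assuming a hypothetical `UserMemory` class, word overlap in place of embedding similarity, and per-interaction "liked" tags as the preference signal; it is not PRIME's implementation, and it shows only what each path reads, not which one performs better.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Interaction:
    text: str
    liked: list[str]  # hypothetical preference tags extracted from this interaction

@dataclass
class UserMemory:
    episodes: list[Interaction] = field(default_factory=list)

    def add(self, text: str, liked: list[str]) -> None:
        self.episodes.append(Interaction(text, liked))

    def recall_recent(self, k: int) -> list[Interaction]:
        """Episodic, recency-based: the k newest interactions, newest first."""
        return self.episodes[::-1][:k]

    def recall_similar(self, query: str, k: int) -> list[Interaction]:
        """Episodic, similarity-based: rank by words shared with the query
        (a toy stand-in for embeddings); ties keep insertion order."""
        q = set(query.lower().split())
        ranked = sorted(
            self.episodes,
            key=lambda e: len(q & set(e.text.lower().split())),
            reverse=True,
        )
        return ranked[:k]

    def semantic_profile(self, top_n: int = 3) -> list[str]:
        """Semantic: distill all episodes into a ranked preference summary."""
        counts = Counter(tag for e in self.episodes for tag in e.liked)
        return [tag for tag, _ in counts.most_common(top_n)]

memory = UserMemory()
memory.add("asked for sci-fi novels", ["sci-fi"])
memory.add("wanted a cozy mystery", ["mystery"])
memory.add("more sci-fi please, something like The Expanse", ["sci-fi", "space opera"])

print([e.text for e in memory.recall_recent(1)])
print([e.text for e in memory.recall_similar("cozy mystery", 1)])
print(memory.semantic_profile(2))
```

Recency returns the latest request, similarity reaches back to the single mystery episode, and the semantic profile condenses all three interactions into two ranked tags.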

The thing you might not have known you wanted to know: 'what matters most' in a conversation isn't only informational. One note argues that conversation maintenance (reference repair, topic hand-off, the relational glue) is social action that models never learn, because training rewards information prediction, not relational work (Why don't language models develop conversation maintenance skills?). A compressive memory optimized to summarize *content* across 35 sessions can faithfully track facts and still drop the thread of the relationship, which, over that many sessions, is often what a user most expects to be remembered.

Sources (6 notes)

Can a single model replace retrieval for long-term conversation memory?

COMEDY merges memory generation, compression, and response into one operation, tracking event recaps, user portraits, and relationship dynamics without vector-DB retrieval. However, empirical work shows that continuous reprocessing follows an inverted-U curve, degrading below the no-memory baseline due to misgrouping, context loss, and overfitting.

Does including all conversation history actually help retrieval?

Research shows that automatically selecting relevant previous turns improves retrieval effectiveness more than including all context. Topic switches inject irrelevant information; joint optimization of selection and retrieval beats both full-context baselines and human annotation.

Is long-context bottleneck really about memory or compute?

Research shows the bottleneck is not memory capacity but the compute required to consolidate evicted context into fast weights during offline sleep phases. Performance improves with more consolidation passes, following a test-time scaling pattern on harder reasoning tasks.

Does abstract preference knowledge outperform specific interaction recall?

The PRIME framework shows that semantic memory (preference summaries, parametric encodings) consistently beats episodic memory (retrieved past interactions) across models. Recency-based recall outperforms similarity-based retrieval, and task fine-tuning outperforms preference-tuning methods.

Can conversational recommenders recover lost preference signals from history?

Current conversational recommender systems (CRS) use only the active dialogue session to infer preferences, losing the item-CF and user-CF (collaborative filtering) signals proven valuable in traditional recommenders. Integrating the current session, historical dialogues, and look-alike users, conditioned on the current intent, recovers essential user-representation structure.

Why don't language models develop conversation maintenance skills?

Humans keep conversations smooth through implicit techniques like reference repair and topic hand-off, which serve to sustain relational interaction rather than convey information. Language models don't develop these skills because their training signals reward information prediction, not relational work.
