Conversational AI Systems

How do time gaps shape what people discuss across conversation sessions?

Do AI systems account for how elapsed time between conversations changes the way people reference and discuss past events? Current models mostly handle single sessions, but real interactions span days, weeks, and months.

Note · 2026-02-22 · sourced from Conversation Architecture Structure

Most chatbot research focuses on single-session dialogue: generating responses based only on the current conversation. But real-world interactions are multi-session: people return to AI systems across days, weeks, and months. Two elements shape cross-session dynamics, and current models ignore both:

Time intervals between sessions influence how past events are discussed. A conversation about yesterday's meeting differs from one about a meeting three months ago in specificity, emotional tone, and relevance, so appropriate responses about past events depend on how much time has elapsed. Previous multi-session datasets covered relatively short time ranges, limiting the types of transitions they could capture.

Speaker relationships evolve across sessions. The degree of formality, assumed shared knowledge, and topical expectations shift as interactions accumulate. Fine-grained relationship modeling (not just "stranger" vs "friend" but the specific history of this relationship) is required.
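Both elements can be made explicit as conditioning signals. The sketch below is a minimal illustration (all names are hypothetical, not from any of the cited papers) of encoding the time gap and relationship state of a new session as a prompt preamble:

```python
from dataclasses import dataclass
from datetime import timedelta

@dataclass
class SessionLink:
    """Hypothetical record linking a new session to the conversation history."""
    gap: timedelta        # time elapsed since the previous session
    session_count: int    # how many sessions precede this one
    relationship: str     # e.g. "first meeting", "regular", "long-term"

def temporal_preamble(link: SessionLink) -> str:
    """Render cross-session state as a preamble, so a generator can
    condition on elapsed time and relationship history."""
    if link.gap < timedelta(days=2):
        recency = "very recently"
    elif link.gap < timedelta(days=30):
        recency = f"{link.gap.days} days ago"
    else:
        recency = f"about {link.gap.days // 30} months ago"
    return (f"[Previous session: {recency}; "
            f"{link.session_count} prior sessions; "
            f"relationship: {link.relationship}]")

print(temporal_preamble(SessionLink(timedelta(days=95), 4, "regular")))
# prints "[Previous session: about 3 months ago; 4 prior sessions; relationship: regular]"
```

The coarse recency buckets stand in for the intuition above: the same past event should be referenced differently depending on whether it happened yesterday or months ago.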

The Conversation Chronicles dataset (1M dialogues) addresses both gaps, using LLM generation with human evaluation to ensure coherent and consistent interactions across sessions. The REBOT model introduces chronological summarization — processing past session context through a temporal lens before dialogue generation, using ~630M parameters.
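The idea of chronological summarization can be sketched roughly as follows. This is an illustrative toy, not REBOT's actual implementation: it orders past-session summaries by date and tags each with its distance from the current day, yielding a temporally grounded context for generation.

```python
from datetime import date

def chronological_summary(sessions, today):
    """Order past-session summaries chronologically and tag each with
    its elapsed time relative to 'today' (illustrative sketch only)."""
    lines = []
    for when, summary in sorted(sessions, key=lambda s: s[0]):
        days = (today - when).days
        marker = "yesterday" if days == 1 else f"{days} days ago"
        lines.append(f"({marker}) {summary}")
    return "\n".join(lines)

ctx = chronological_summary(
    [(date(2026, 2, 21), "Discussed the quarterly meeting."),
     (date(2025, 11, 20), "Talked about a job change.")],
    today=date(2026, 2, 22),
)
print(ctx)
# prints:
# (94 days ago) Talked about a job change.
# (yesterday) Discussed the quarterly meeting.
```

The point of the temporal lens is that the downstream generator sees not just what was said, but how long ago, which is exactly the signal single-session models discard.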

This connects to the broader context-management challenge explored in "Why do language models fail in gradually revealed conversations?": the multi-session case is even harder, because the model must track not just within-session context but also cross-session continuity, temporal distance, and relationship evolution.

The finding that current models have "limited ability that only understands short-term dialogue context" points to a structural gap, not a parameter gap. Adding more parameters or longer context windows does not by itself create sensitivity to temporal dynamics or relationship evolution.

COMEDY's compressive memory as implementation (2402.11975): The COMEDY framework directly addresses these temporal dynamics through compressive memory that tracks three dimensions across sessions: (1) concise event recaps forming a historical narrative, (2) detailed user portraits derived from conversational events, and (3) dynamic relationship changes between user and chatbot. This three-dimensional compression mirrors the temporal dynamics problem: event recaps capture what happened when, user portraits capture evolving preferences, and relationship dynamics capture the interpersonal evolution. By reprocessing and condensing ALL past memories rather than retrieving from a bank, COMEDY inherently prioritizes salient information — a structural advantage over retrieval-based approaches for multi-session continuity.
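One compression step of this kind can be sketched as below. This is a minimal illustration of the reprocess-everything pattern, not COMEDY's actual code; `summarize` stands in for an LLM summarization call, and the toy version merely merges and truncates text:

```python
def compress(old_memory: dict, new_session: str, summarize) -> dict:
    """One compressive-memory step: reprocess all prior memory plus the
    new session into a fresh three-field state, instead of appending to
    a retrieval bank. `summarize` stands in for an LLM call."""
    return {
        "events": summarize("event recap", old_memory["events"], new_session),
        "portrait": summarize("user portrait", old_memory["portrait"], new_session),
        "relationship": summarize("relationship dynamics", old_memory["relationship"], new_session),
    }

def toy_summarize(dimension: str, old: str, new: str) -> str:
    # Toy stand-in for the LLM summarizer: merge and truncate, crudely
    # modeling "condense everything, keep only what fits".
    merged = f"{old}; {new}".strip("; ")
    return f"[{dimension}] {merged}"[:160]

memory = {"events": "", "portrait": "", "relationship": ""}
memory = compress(memory, "User planned a move to Berlin", toy_summarize)
print(memory["events"])
# prints "[event recap] User planned a move to Berlin"
```

Because every update rewrites the whole state under a size budget, stale detail is squeezed out and salient information is retained by construction, which is the structural contrast with retrieve-from-a-bank designs.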

LOCOMO benchmark (2402.17753): The LOCOMO dataset provides evaluation infrastructure for very long-term conversations: 300 turns and 9K tokens on average, spanning up to 35 sessions, grounded in personas and temporal event graphs. Relative to the Conversation Chronicles dataset, it adds image sharing/reaction capabilities and human verification of long-range consistency.


Source: Conversation Architecture Structure; enriched from Memory


Time intervals between conversation sessions create dynamics that single-session models miss: responses about past events vary with elapsed time and speaker relationships.