How do time gaps shape what people discuss across conversation sessions?
Do AI systems account for how elapsed time between conversations changes the way people reference and discuss past events?
Most chatbot research focuses on single-session dialogue — generating responses based only on the current conversation. But real-world interactions are multi-session: people return to AI systems across days, weeks, and months. Two elements, both ignored by current models, shape these cross-session dynamics:
Time intervals between sessions influence how past events are discussed. A conversation about yesterday's meeting differs from one about a meeting three months ago — in specificity, emotional tone, and relevance. Previous multi-session datasets covered relatively short time ranges, limiting the types of transitions they could capture.
Speaker relationships evolve across sessions. The degree of formality, assumed shared knowledge, and topical expectations shift as interactions accumulate. Modeling this requires fine-grained relationship tracking — not just "stranger" vs. "friend" but the specific history of this particular relationship.
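The first element — conditioning on elapsed time — can be made concrete with coarse time-gap buckets. This is an illustrative sketch, not the dataset's actual taxonomy: the labels and thresholds below are hypothetical, but they show how raw elapsed time could be mapped to categories a model conditions on.

```python
from datetime import timedelta

# Hypothetical time-gap buckets. Coarse labels like these let a model
# condition its response style ("yesterday's meeting" vs. "that meeting
# a few months back") on elapsed time rather than raw seconds.
def time_gap_bucket(elapsed: timedelta) -> str:
    if elapsed < timedelta(days=1):
        return "same day"
    if elapsed < timedelta(weeks=1):
        return "a few days"
    if elapsed < timedelta(days=30):
        return "a few weeks"
    if elapsed < timedelta(days=365):
        return "a few months"
    return "years"

print(time_gap_bucket(timedelta(hours=20)))   # same day
print(time_gap_bucket(timedelta(days=100)))   # a few months
```

A real system would feed the bucket label into the generation context rather than print it; the thresholds here are arbitrary cut points chosen for illustration.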
The Conversation Chronicles dataset (1M dialogues) addresses both gaps, using LLM generation with human evaluation to ensure coherent, consistent interactions across sessions. The accompanying REBOT model (~630M parameters) introduces chronological summarization — processing past session context through a temporal lens before generating a response.
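Chronological summarization can be sketched roughly as follows. This is a hedged illustration, not REBOT's actual architecture: `summarize` stands in for a learned summarization module (the stub below just joins and truncates), and the bracketed time-gap tags are an assumed prompt format.

```python
# Sketch: tag each past session's summary with its elapsed-time label,
# in chronological order, before appending the current session's turns.
def summarize(session: list[str]) -> str:
    # Placeholder for a learned summarizer: join turns and truncate.
    return " ".join(session)[:80]

def build_context(past_sessions: list[list[str]],
                  gaps: list[str],
                  current_turns: list[str]) -> str:
    parts = [f"[{gap} ago] {summarize(session)}"
             for session, gap in zip(past_sessions, gaps)]
    parts.extend(current_turns)
    return "\n".join(parts)

ctx = build_context(
    [["A: How was the meeting?", "B: Tense, but we shipped."]],
    ["three months"],
    ["A: Did that project ever recover?"],
)
print(ctx)
```

The point of the temporal tags is that the downstream generator sees not just *what* happened in past sessions but *how long ago*, so it can adjust specificity and tone accordingly.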
This connects to the broader context-management challenge. If models already struggle in the single-session setting of "Why do language models fail in gradually revealed conversations?", the multi-session case is even harder: the model must track not just within-session context but cross-session continuity, temporal distance, and relationship evolution.
The finding that current models have "limited ability that only understands short-term dialogue context" points to a structural gap, not a parameter gap. Adding more parameters or longer context windows does not by itself create sensitivity to temporal dynamics or relationship evolution.
COMEDY's compressive memory as implementation (2402.11975): The COMEDY framework directly addresses these temporal dynamics through compressive memory that tracks three dimensions across sessions: (1) concise event recaps forming a historical narrative, (2) detailed user portraits derived from conversational events, and (3) dynamic relationship changes between user and chatbot. This three-dimensional compression mirrors the temporal dynamics problem: event recaps capture what happened when, user portraits capture evolving preferences, and relationship dynamics capture the interpersonal evolution. By reprocessing and condensing ALL past memories rather than retrieving from a bank, COMEDY inherently prioritizes salient information — a structural advantage over retrieval-based approaches for multi-session continuity.
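The compress-rather-than-retrieve idea can be sketched as a data structure. This is a minimal illustration under stated assumptions: the three fields mirror COMEDY's dimensions, but `compress` here is a naive stand-in for the LLM step that reprocesses all past memories after each session, and the recap cap of five is an arbitrary illustrative limit.

```python
from dataclasses import dataclass, field

# Hedged sketch of compressive memory: one small, always-condensed state
# per user instead of an unbounded retrieval bank.
@dataclass
class CompressedMemory:
    event_recaps: list[str] = field(default_factory=list)   # historical narrative
    user_portrait: str = ""                                 # evolving preferences
    relationship: str = "new acquaintance"                  # interpersonal state

def compress(memory: CompressedMemory, session_summary: str,
             new_portrait: str, new_relationship: str) -> CompressedMemory:
    # Condensation, not retrieval: keep only the most recent recaps and
    # overwrite the portrait/relationship fields in place, so salient
    # information is prioritized by construction.
    recaps = (memory.event_recaps + [session_summary])[-5:]
    return CompressedMemory(recaps, new_portrait, new_relationship)

m = CompressedMemory()
m = compress(m, "Discussed a tense project meeting.",
             "Works in software; stressed about deadlines.",
             "friendly, increasingly informal")
print(m.relationship)
```

In COMEDY itself the compression is performed by the model, not by truncation; the sketch only shows the shape of the state being maintained across sessions.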
LOCOMO benchmark (2402.17753): The LOCOMO dataset provides the evaluation infrastructure for very long-term conversations: 300 turns and 9K tokens on average, over up to 35 sessions, grounded on personas and temporal event graphs. This extends the Conversation Chronicles dataset by adding image sharing/reaction capabilities and human verification for long-range consistency.
Source: Conversation Architecture Structure; enriched from Memory
Related concepts in this collection
- Do chatbot relationships lose their appeal as novelty wears off? — Explores whether the positive social dynamics observed in one-time chatbot studies persist or fade through repeated interactions. Critical for designing systems intended for sustained engagement over weeks or months. (Link: temporal dynamics of relationship formation across sessions)
- Why do language models fail in gradually revealed conversations? — Explores why LLMs perform 39% worse when instructions arrive incrementally rather than upfront, and whether they can recover from early mistakes in multi-turn dialogue. (Link: multi-session amplifies the multi-turn problem)
- How should chatbot design vary by relationship duration? — Do chatbots serving one-time users need different design than those supporting long-term relationships? Applying the same design to all temporal profiles creates usability mismatches. (Link: the three temporal archetypes create different demands on cross-session dynamics — ad-hoc supporters have no temporal continuity, temporary assistants need medium-term consistency, and persistent companions require the full temporal modeling this note describes)
Original note title: time intervals between conversation sessions create dynamics that single-session models miss — responses about past events vary based on elapsed time and speaker relationships