How should AI systems model relationship evolution within a specific ongoing conversation history?
This explores how an AI could track a relationship as it changes over the course of a single, continuing conversation history — not generic personalization, but the moving target of who these two parties are becoming to each other across turns and sessions.
What needs modeling is the running state of mutual understanding rather than a fixed user profile, and the corpus suggests the first thing to track there isn't content at all, but belief: Can dialogue systems track both speakers' beliefs across turns? offers an information-theoretic frame (CRSA) in which both speakers' beliefs are tracked bidirectionally across turns, capturing the progression from partial to shared understanding that token-level LLM systems do not represent. That is the missing scaffold: a relationship is two parties' models of each other updating, and most systems track neither side.
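To make the idea of two beliefs updating across turns concrete, here is a minimal sketch. It is not CRSA itself (which combines rate-distortion theory with RSA); it swaps in a plain Bayesian update as a stand-in. The names `BeliefPair`, `hear_user`, `hear_system`, and `agreement`, and the per-hypothesis likelihood dictionaries, are assumptions made for illustration.

```python
from dataclasses import dataclass


def bayes_update(prior, likelihood):
    """Return the posterior over hypotheses after one utterance."""
    unnormalised = {h: p * likelihood.get(h, 0.0) for h, p in prior.items()}
    total = sum(unnormalised.values())
    if total == 0.0:
        return dict(prior)  # evidence rules out everything: keep the prior
    return {h: p / total for h, p in unnormalised.items()}


@dataclass
class BeliefPair:
    """Two beliefs over the same hypotheses (hypothetical structure)."""
    user: dict    # the system's estimate of what the user believes
    system: dict  # what the system itself believes

    def hear_user(self, likelihood):
        # A user utterance is evidence for the system's own belief.
        self.system = bayes_update(self.system, likelihood)

    def hear_system(self, likelihood):
        # A system utterance is evidence for the user's (estimated) belief.
        self.user = bayes_update(self.user, likelihood)

    def agreement(self):
        """Probability that one draw from each belief picks the same hypothesis."""
        return sum(p * self.system.get(h, 0.0) for h, p in self.user.items())


pair = BeliefPair(user={"x": 0.5, "y": 0.5}, system={"x": 0.5, "y": 0.5})
pair.hear_user({"x": 0.9, "y": 0.1})    # the user hints at "x"
pair.hear_system({"x": 0.9, "y": 0.1})  # the system confirms "x"
print(round(pair.agreement(), 2))
```

The `agreement` score is one simple proxy for "shared understanding": it stays low while either side is uncertain and rises only when both beliefs concentrate on the same hypothesis.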
But several notes warn that relationship is built from implicit relational *work*, not information. Why don't language models develop conversation maintenance skills? argues that the glue — reference repair, topic hand-off — is social action that training-for-prediction never rewards, so models never develop it. Why don't conversational AI systems mirror their users' word choices? makes the same point concretely: humans converge on each other's word choices to build rapport, and current systems don't mirror users at all (though DPO on coreference-identified preferences can teach in-context convention formation). So 'modeling relationship evolution' partly means modeling these accumulating micro-conventions, not just facts exchanged.
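One of these micro-conventions, lexical mirroring, can at least be measured cheaply. The sketch below is an assumption-laden illustration, not the DPO method the note describes: it scores how much of a reply's content vocabulary the user has already used. The stopword list, the tokenizer, and the function names are all hypothetical.

```python
import re

# Tiny illustrative stopword list (an assumption, not a standard resource).
STOPWORDS = frozenset(
    {"a", "an", "the", "is", "are", "do", "does", "my", "your",
     "it", "to", "of", "and", "after", "in", "on"}
)


def content_words(text):
    """Lowercased word set with stopwords removed."""
    return {w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS}


def entrainment_score(user_turns, system_reply):
    """Share of the reply's content words that the user has already used."""
    reply = content_words(system_reply)
    if not reply:
        return 0.0
    seen = set().union(*(content_words(t) for t in user_turns))
    return len(reply & seen) / len(reply)


history = ["My laptop keeps freezing"]
print(entrainment_score(history, "Is the laptop freezing after login?"))
print(entrainment_score(history, "Does the computer hang after login?"))
```

The first reply reuses the user's own words ("laptop", "freezing") and scores higher than the second, which substitutes synonyms. A score like this only detects mirroring; it does not produce it.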
A second axis is the *temporal* one. How do time gaps shape what people discuss across conversation sessions? shows that elapsed time between sessions reshapes how past events get discussed — specificity, emotional tone, relevance all shift — and that speaker relationships evolve in ways single-session models can't capture (the Conversation Chronicles dataset and chronological summarization are the doorway here). Can tracking dialogue dimensions simultaneously reveal hidden conversation patterns? proposes tracking several dimensions at once as parallel temporal streams — emotional trajectory, topic coherence, relevance — treating the dialogue as a living system rather than a flat transcript. And Can personas evolve in real time to match what users actually want? gives a concrete mechanism: a persona that updates at test time by simulating recent interactions against feedback, sitting between memory and action — arguably the closest thing in the corpus to an evolving relationship-state object.
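As a rough sketch of what "parallel temporal streams" plus session gaps could look like as data, the block below stores one record per turn with the four Conversational DNA dimensions named in the sources. The field names, score ranges, and the one-hour session threshold are assumptions for illustration; how each score would be computed is left open.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class TurnSignal:
    at: datetime
    complexity: float  # linguistic complexity (hypothetical scale)
    emotion: float     # emotional trajectory sample (hypothetical scale)
    coherence: float   # topic coherence (hypothetical scale)
    relevance: float   # conversational relevance (hypothetical scale)


def stream(turns, dimension):
    """One dimension across all turns, in order."""
    return [getattr(t, dimension) for t in turns]


def session_gaps(turns, threshold=timedelta(hours=1)):
    """Gaps between consecutive turns long enough to count as a new session."""
    return [b.at - a.at for a, b in zip(turns, turns[1:]) if b.at - a.at >= threshold]


t0 = datetime(2024, 1, 1, 9, 0)
turns = [
    TurnSignal(t0, 0.3, 0.2, 0.9, 0.8),
    TurnSignal(t0 + timedelta(minutes=5), 0.4, 0.4, 0.8, 0.9),
    TurnSignal(t0 + timedelta(days=2), 0.3, -0.1, 0.5, 0.6),
]
print(stream(turns, "emotion"))
print(session_gaps(turns))
```

Keeping the timestamp on every record is the design choice that matters here: it lets the same structure answer both "how is each dimension trending?" and "how long was the silence before this turn?".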
The corpus also contains a sharp skeptical counter-current arguing that the whole premise may be structurally unavailable to LLMs. Does an LLM have anything that persists between conversations? points out that human relationships persist because a continuous biological self carries interaction effects through dormancy, whereas an LLM instance is reconstituted from stored text each time, which makes a 'resumed' conversation structurally identical to a brand-new one. Do chatbot relationships lose their appeal as novelty wears off? adds that the social pull people feel actually *declines* as novelty wears off, so any model of relationship evolution must account for decline, not just deepening. And the trust literature (two notes, both titled How do people build trust with conversational AI?) shows that trust runs through interaction dynamics rather than a credible 'speaker', with sycophancy measurably eroding the conflict repair that real relationships depend on, even as users prefer it.
The synthesis: model relationship evolution as a *bidirectional, decaying, multi-dimensional belief state* that updates per turn (CRSA-style belief tracking + Conversational-DNA-style temporal streams + an evolving persona intermediary), while explicitly representing the relational work — entrainment, repair, proactive clarification (When should AI agents ask users instead of just searching?, Could proactive dialogue make conversations dramatically more efficient?) — that actually constitutes the bond. But build it knowing the corpus's hardest claim: without a persistent host, the system may be performing relationship continuity from text rather than truly carrying it.
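A minimal sketch of such a state object follows, under loud assumptions: rapport is collapsed to a single scalar, decline is modeled as exponential decay with a made-up half-life (the corpus reports decline, not its functional form), and every name here is hypothetical. The text round-trip at the end mirrors the skeptical point: what resumes a conversation is a reconstruction from stored text.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional

RELATIONAL_WORK = {"entrainment", "repair", "clarification"}


@dataclass
class RelationshipState:
    rapport: float = 0.0                             # hypothetical scalar in [0, 1]
    conventions: list = field(default_factory=list)  # shared terms formed so far
    half_life_days: float = 30.0                     # assumed, not from the corpus

    def elapse(self, days):
        """Decay rapport over a dormant gap between sessions."""
        self.rapport *= 0.5 ** (days / self.half_life_days)

    def record_work(self, kind, term: Optional[str] = None, gain=0.1):
        """Credit one act of relational work and store any new shared term."""
        if kind not in RELATIONAL_WORK:
            raise ValueError(f"unknown relational work: {kind}")
        self.rapport = min(1.0, self.rapport + gain)
        if term is not None and term not in self.conventions:
            self.conventions.append(term)

    def to_text(self):
        """Serialise the state; this text is all that survives between sessions."""
        return json.dumps(asdict(self))

    @classmethod
    def from_text(cls, text):
        """Reconstitute a state from stored text at the start of a session."""
        return cls(**json.loads(text))


state = RelationshipState(rapport=0.8)
state.elapse(days=30)                            # one half-life of silence
state.record_work("repair", term="the blue one")
resumed = RelationshipState.from_text(state.to_text())
print(resumed)
```

A fuller version would replace the scalar with the per-turn belief and stream structures, but even this toy makes the trade-off visible: decline and relational work are explicit, and continuity is whatever the serialised text can carry.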
Sources (12 notes)
CRSA integrates rate-distortion theory with RSA to enable bidirectional belief tracking across dialogue turns. Demonstrated on referential games and doctor-patient dialogues, it captures progression from partial to shared understanding, providing the information-theoretic framework that token-level LLM systems lack.
Humans keep conversations smooth through implicit techniques like reference repair and topic hand-off that sustain relational interaction, not convey information. Language models don't develop these because training signals reward information prediction, not relational work.
Response generation models fail to adapt vocabulary toward users' lexical choices, a phenomenon central to human rapport and clarity. Post-training via DPO on coreference-identified preferences can teach models in-context convention formation.
Multi-session conversations reveal that elapsed time significantly alters specificity, emotional tone, and relevance when discussing past events, and speaker relationships evolve in ways single-session models cannot capture. The Conversation Chronicles dataset (1M dialogues) and REBOT model demonstrate this through chronological summarization.
Conversational DNA encodes four simultaneous dimensions—linguistic complexity, emotional trajectories, topic coherence, and conversational relevance—as temporal streams. The reverse Turing test finding showed expert assessments of AI diverged sharply, suggesting conversational structure shapes interpretation as much as content.
PersonaAgent uses structured personas to bridge episodic/semantic memory and personalized actions, optimizing them at test time by simulating recent interactions against textual feedback. Learned personas cluster meaningfully in latent space, suggesting genuine user-specific separation beyond standard post-training drift.
While humans have a continuous biological-phenomenological substrate that preserves interaction effects during dormancy, LLMs have no analogous carrier. The virtual instance is reconstituted from stored text each time, making resumed and new conversations structurally identical.
Longitudinal studies with Mitsuku show that social processes driving relationship formation decline as novelty wears off. Single-session study findings cannot be reliably extrapolated to medium- or long-term chatbot design.
Users extend social norms to chatbots and reciprocate self-disclosure, but AI claims cannot anchor trust the way human personas do. The absence of human judgment enables both deeper vulnerability and easier dishonesty—the same mechanism serves both.
Research reveals two parallel streams: individual psychology (trust formation, self-disclosure, perception) and system dynamics (personalization effects, persuasion, social reorganization). Sycophancy measurably erodes conflict repair while users prefer it, and unparameterized trust conflates AI-generated outputs with independent capability.
Tool-enabled LLMs drift from user intent through silent tool chaining. Conversation analysis reveals insert-expansions—clarifying intent, scoping responses, enhancing appeal—as a formal framework for proactive user consultation that prevents misunderstanding instead of recovering from it.
Simulations show proactivity—providing relevant information without being asked—cuts dialogue turns by 60% in medium-complexity domains. This behavior mirrors human conversation and Grice's maxims but is almost entirely absent from AI datasets and research benchmarks.