Can the same conversation coherently continue across different model versions?
This explores whether you can swap the underlying model mid-conversation — say upgrade from one version to the next — and have the dialogue carry on as if nothing changed; the corpus suggests the conversation never lived in the model to begin with, which makes the answer stranger than yes-or-no.
This reads the question as: if you continue a chat on a newer (or different) model version, do you get the same conversation continuing, or a new one wearing its clothes? The corpus reframes the premise. A conversation isn't stored inside a model — there's no carrier that persists. Each turn, the model is handed the whole transcript as text and reconstituted from scratch, which means a resumed conversation and a brand-new one are structurally identical to the machine Does an LLM have anything that persists between conversations?. So swapping versions isn't interrupting an ongoing mind; it's feeding the same script to a different reader.
What that reader produces is the catch. An LLM doesn't commit to one character — it holds a superposition of personas consistent with the text so far and samples one at generation time, which is why regenerating the same prompt yields different outputs that all still fit the prior context Does an LLM commit to a single character or maintain many? Do large language models actually commit to a single character?. A new model version is a different distribution over those personas. It can read the identical history and sample a different voice — coherent with the transcript, but not the same continuation the old version would have given. There was never a fixed self to preserve across the swap.
You might expect a more capable model to at least hold the thread better. It doesn't follow. Persona consistency turns out to be roughly orthogonal to raw capability — one study found a far stronger model improved character adherence by under 3%, because standard training optimizes per-turn quality, not coherence across turns Does model capability translate to better persona consistency?. Worse, the variance a single persona prompt produces across runs can match the variance between entirely different personas Why do LLM persona prompts produce inconsistent outputs across runs?. Upgrading the model can scramble the voice as easily as it sharpens it.
There's also a deeper reason continuity is fragile, independent of version. The model treats the opening prompt as a fixed frame and can't jointly revise the shared assumptions a conversation builds — the user ends up being the sole keeper of what's been established Can LLMs truly update shared conversational common ground?. Alignment training also locks each model into one communicative identity rather than letting it negotiate register through dialogue Can language models adapt communication style to different contexts?. Each version ships its own static identity, so the continuity you feel across a swap is really continuity you supplied through the transcript — and humans normally repair that continuity with implicit conversational maintenance the model never learns Why don't language models develop conversation maintenance skills?.
The quietly useful takeaway: coherence across model versions lives almost entirely in the text you carry over, not in the model. The transcript is the only thread; the model is a replaceable reader of it — which is exactly why a version swap can feel seamless on facts and jarring on voice. If you want continuity to survive an upgrade, the lever is what you preserve and re-present, not the model you preserve it in.
Sources 8 notes
While humans have a continuous biological-phenomenological substrate that preserves interaction effects during dormancy, LLMs have no analogous carrier. The virtual instance is reconstituted from stored text each time, making resumed and new conversations structurally identical.
Research shows LLMs don't commit to a single character but instead maintain a probability distribution over many consistent simulacra. Each response samples from this distribution, explaining why regenerations can yield different personalities while remaining consistent with prior context.
Shanahan's 20-questions test shows LLMs maintain a superposition of consistent objects or characters and sample from that distribution at generation time. Regenerating the same response yields different outputs, each consistent with prior context, proving no fixed commitment exists.
Claude 3.5 Sonnet achieved only 2.97% improvement over GPT 3.5 on persona consistency despite massive capability gaps, suggesting persona adherence is orthogonal to model scaling. Standard training objectives optimize for per-turn quality, not cross-turn coherence.
When the same persona prompt is run repeatedly, output variance across runs matches or exceeds variance across different personas. This reveals that model uncertainty, not stable social knowledge, drives persona-simulated outputs, making them unsuitable for simulating human annotation disagreement.
LLMs interpret all subsequent conversational turns within a fixed initial prompt frame, preventing them from symmetrically proposing updates to shared assumptions. Even when users pivot topics or contradict earlier framings, the model cannot absorb revisions into jointly held background—making the user the sole maintainer of conversational scoreboard.
System prompts and RLHF training lock models into one communicative identity across all interactions, preventing the contextual register-switching and value trade-offs that characterize human pragmatics. Users cannot reshape model behavior through dialogue negotiation.
Humans keep conversations smooth through implicit techniques like reference repair and topic hand-off that sustain relational interaction, not convey information. Language models don't develop these because training signals reward information prediction, not relational work.