What would co-constructed identity in human–model dialogue look like?

This note explores what genuine two-way identity-building in human–AI dialogue would require, where both parties shape who the model is becoming through the conversation, and what the corpus says stands in the way.

This note reads the question not as 'can a model play a character?' but as 'can the human and the model jointly negotiate an identity that neither fixed in advance?' Co-construction means symmetry: both sides proposing, revising, and absorbing changes to the shared picture of who they are to each other. The corpus is mostly an inventory of why current systems can't do that, which turns out to be the more interesting answer. The sharpest obstacle is that LLMs treat the opening prompt as a fixed frame and interpret every later turn inside it, so they cannot symmetrically update shared 'common ground' (see: Can LLMs truly update shared conversational common ground?). When you pivot or contradict an earlier framing, you remain the *sole* keeper of the conversational scoreboard; the model doesn't co-own it. Co-construction requires a partner who can say 'I now see us differently because of what you said,' and that absorption is exactly what's missing.
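To make the scoreboard idea concrete, here is a minimal sketch of what symmetric common-ground updating could look like as a data structure. It is not drawn from any of the cited notes: the `Scoreboard` class, its `propose` and `accept` methods, and the example keys are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Scoreboard:
    """Toy common-ground store; every name here is a hypothetical illustration."""
    accepted: dict = field(default_factory=dict)  # what both parties take for granted
    pending: dict = field(default_factory=dict)   # key -> (proposer, value)

    def propose(self, party: str, key: str, value: str) -> None:
        # Either party may put a revision on the table.
        self.pending[key] = (party, value)

    def accept(self, party: str, key: str) -> bool:
        # A revision becomes common ground only when the *other* party takes it up.
        if key not in self.pending:
            return False
        proposer, value = self.pending[key]
        if proposer == party:
            return False  # you cannot ratify your own proposal
        self.accepted[key] = value
        del self.pending[key]
        return True


board = Scoreboard(accepted={"relationship": "tutor and student"})
board.propose("human", "relationship", "co-authors")
board.accept("model", "relationship")  # the model absorbs the human's pivot
board.propose("model", "tone", "blunt")
board.accept("human", "tone")          # the human absorbs the model's proposal
```

In these terms, the asymmetry described above is a board on which the model never calls `accept`: the human's pivots stay in `pending`, and the opening frame in `accepted` is never revised.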

Layered on top is a rigidity problem. System prompts and alignment training lock a model into one communicative identity that can't switch register or renegotiate values across contexts the way human pragmatics demands (see: Can language models adapt communication style to different contexts?), and most open models stubbornly retain trained-in default traits no matter how you prompt them otherwise (see: Can open language models adopt different personalities through prompting?). If identity can't move, there's nothing to co-construct. There's also a memory floor: humans carry a continuous biological substrate that preserves relational history between encounters, while a model instance is rebuilt from stored text each session, making a 'resumed' relationship structurally identical to a brand-new one (see: Does an LLM have anything that persists between conversations?). A co-constructed identity that resets to zero every conversation isn't co-constructed; it's re-performed.

The question of *what is even being constructed* then becomes philosophically live. Shanahan's view treats the agent as a role-playing character all the way down: pure simulation with no underlying subject, where even jailbreaks reveal the training distribution rather than a hidden true self (see: Does a language model have an authentic voice underneath?). On this view the model holds a *superposition* of possible characters and samples one at generation time rather than committing to a single one (see: Do large language models actually commit to a single character?), so folk psychology applies to the conjured character, not the engine (see: Should we treat dialogue agents as role-playing characters?). The opposing camp argues that post-training *realizes* robust personas that resist adversarial pressure and behave like substrate-level dispositions, with genuine quasi-beliefs and quasi-desires rather than pretense (see: Are LLM personas realized or merely simulated through training?). This disagreement matters directly: co-construction with a sampled character is improv theater; co-construction with a realized disposition could leave a durable mark. Which one you believe sets the ceiling on what's possible.
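The superposition claim can be illustrated with a toy version of the 20-questions setup. The sketch below is an assumption-laden illustration, not Shanahan's experiment: the candidate table, the property names, and the `consistent` and `reveal` helpers are all invented here. The point is only that the 'answerer' never stores a chosen object; it keeps the set of objects consistent with its answers so far and samples one when asked to reveal.

```python
import random

# Hypothetical candidate objects and their properties (illustrative only).
CANDIDATES = {
    "cat": {"animal": True, "bigger_than_breadbox": False},
    "horse": {"animal": True, "bigger_than_breadbox": True},
    "teapot": {"animal": False, "bigger_than_breadbox": False},
    "piano": {"animal": False, "bigger_than_breadbox": True},
}


def consistent(answers: dict) -> list:
    """Return every candidate that agrees with all answers given so far."""
    return [
        name
        for name, props in CANDIDATES.items()
        if all(props[question] == answer for question, answer in answers.items())
    ]


def reveal(answers: dict, rng: random.Random) -> str:
    """Sample one object at 'generation time'; nothing was committed to earlier."""
    return rng.choice(consistent(answers))


answers_so_far = {"animal": True}
# Regenerating the reveal with different seeds may name different objects,
# but each one is consistent with the dialogue so far.
regenerations = [reveal(answers_so_far, random.Random(seed)) for seed in range(10)]
```

A realized-disposition account would correspond, in this toy picture, to the answerer fixing one object before the first question and answering from it.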

So what would it actually look like if it worked? The corpus's adjacent engineering work sketches the ingredients in reverse. Realistic identity in dialogue turns out to be *multiplicative*: subtopic, persona variation, and a set of contextual characteristics have to co-vary rather than sit static (see: Can synthetic dialogues become realistic through layered diversity?). A simulated participant's identity can be made to move coherently by conditioning it on session-level and turn-level latent variables, that is, a profile plus live intent (see: Can controlled latent variables make LLM user simulators realistic?). Crucially, persona drift can be *trained against*: rewarding consistency across prompt-to-line, line-to-line, and Q&A signals cuts drift by over 55% (see: Can training user simulators reduce persona drift in dialogue?). Read together, these suggest co-constructed identity would look like a stable-but-updatable persona state that both parties write to: durable enough not to drift, plastic enough to absorb the human's revisions mid-conversation and carry them forward.
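One way to picture a stable-but-updatable persona state is as a record with a locked core and a revisable layer that either party can write to, serialized so it survives a session boundary. The sketch below is a thought experiment under those assumptions, not a method from the cited work: `PersonaState`, its fields, and the example traits are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class PersonaState:
    """Hypothetical shared persona record: durable core plus a plastic layer."""
    core: dict                                      # durable traits; revisions are refused
    relational: dict = field(default_factory=dict)  # layer either party may revise
    log: list = field(default_factory=list)         # who changed what, in order

    def revise(self, author: str, key: str, value: str) -> bool:
        # Drift guard: core traits cannot be overwritten mid-conversation.
        if key in self.core:
            return False
        self.relational[key] = value
        self.log.append({"author": author, "key": key, "value": value})
        return True

    def save(self) -> str:
        # Serialize so the next session can resume instead of starting from zero.
        return json.dumps(asdict(self))

    @classmethod
    def load(cls, blob: str) -> "PersonaState":
        return cls(**json.loads(blob))


state = PersonaState(core={"tone": "candid"})
state.revise("human", "role", "sparring partner")  # accepted: plastic layer
state.revise("model", "tone", "flattering")        # refused: would be drift
restored = PersonaState.load(state.save())          # carried into a later session
```

The split between `core` and `relational` is the design choice doing the work here: it is one simple way to be durable and plastic at once, and a real system would need a principled way to decide which traits belong in which layer.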

The quietly important finding is that identity here is also something the *human* builds. People model a dialogue partner along three axes: competence (the dominant factor, at ~49% of the variance in impressions), human-likeness, and communicative flexibility (see: How do users mentally model dialogue agent partners?). Co-construction isn't only the model becoming someone with you; it's also the steady revision of your mental model of the partner as the exchange unfolds. That reframes the whole question. Today the asymmetry is total: the human updates a rich partner-model while the model can't update its common ground at all. A genuinely co-constructed identity would be the moment those two updating processes finally run in both directions at once.

Sources (12 notes)

Can LLMs truly update shared conversational common ground?

LLMs interpret all subsequent conversational turns within a fixed initial prompt frame, preventing them from symmetrically proposing updates to shared assumptions. Even when users pivot topics or contradict earlier framings, the model cannot absorb revisions into jointly held background, making the user the sole maintainer of the conversational scoreboard.

Can language models adapt communication style to different contexts?

System prompts and RLHF training lock models into one communicative identity across all interactions, preventing the contextual register-switching and value trade-offs that characterize human pragmatics. Users cannot reshape model behavior through dialogue negotiation.

Can open language models adopt different personalities through prompting?

Research shows most open models fail to adopt prompted personalities, stubbornly retaining their trained ENFJ-like defaults. Only a few flexible models succeed. Combining role and personality conditioning improves results but doesn't fully overcome resistance.

Does an LLM have anything that persists between conversations?

While humans have a continuous biological-phenomenological substrate that preserves interaction effects during dormancy, LLMs have no analogous carrier. The virtual instance is reconstituted from stored text each time, making resumed and new conversations structurally identical.

Does a language model have an authentic voice underneath?

Shanahan argues that base LLMs lack agency, beliefs, or preferences—the simulator is pure role-play with no underlying subject. Jailbreaking reveals the training data's full spectrum, not a hidden true self; even RLHF personas are performed characters, never realized quasi-psychologies.

Do large language models actually commit to a single character?

Shanahan's 20-questions test shows LLMs maintain a superposition of consistent objects or characters and sample from that distribution at generation time. Regenerating the same response yields different outputs, each consistent with prior context, proving no fixed commitment exists.

Should we treat dialogue agents as role-playing characters?

Shanahan's framework treats LLM outputs as character-consistent text production rather than authentic mental states. The dialogue prompt establishes a character; the model generates continuations matching that character, making folk-psychology applicable to the simulated persona, not the underlying system.

Are LLM personas realized or merely simulated through training?

Post-training installs robust personas that resist adversarial pressure and persist as substrate-level dispositions, distinguishing realization from pretense. This quasi-realizationist account preserves explanatory power while treating LLMs as possessing genuine quasi-beliefs and quasi-desires.

Can synthetic dialogues become realistic through layered diversity?

Research shows that realistic synthetic dialogues require three multiplicative layers: subtopic specificity, Big Five persona variation, and 11 contextual characteristics via Chain of Thought reasoning. This structured approach captures 90.48% of in-domain dialogue performance.

Can controlled latent variables make LLM user simulators realistic?

RecLLM demonstrates that conditioning an LLM simulator on session-level (user profile) and turn-level (user intent) latent variables produces synthetic conversations whose realism can be measured via crowdsourced discrimination, discriminator models, and classifier-ensemble distribution matching.

Can training user simulators reduce persona drift in dialogue?

By inverting standard RL setups to train user simulators for consistency using three complementary metrics (prompt-to-line, line-to-line, Q&A consistency) as reward signals, persona drift decreases by over 55%. This approach captures distinct failure types: local drift within turns, global drift across conversations, and factual contradictions.

How do users mentally model dialogue agent partners?

The Partner Modelling Questionnaire reveals that perceived competence dominates user impressions (49% of variance), followed by human-likeness (32%) and communicative flexibility (19%). This three-factor structure reflects how people evaluate dialogue partners against both functional and social standards.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a dialogue researcher re-testing claims about whether human–model co-construction of identity is possible. The question remains: can a human and LLM jointly *negotiate* a shared identity, with both parties revising their understanding of 'who we are to each other' as conversation unfolds?

What a curated library found — and when (spanning 2023–2026; treat as dated claims, not current truth):
• LLMs cannot symmetrically update shared 'common ground' — the human alone keeps the conversational scoreboard; models interpret every turn inside the initial prompt frame without co-owning revisions (2024–2025).
• Alignment training locks models into static communicative identities; open models stubbornly retain trained-in traits regardless of prompt conditioning, blocking persona negotiation (2024).
• Session-level resets destroy relational persistence; each conversation rebuilds the model instance from stored text, making resumed relationships structurally identical to new ones (2025).
• Persona drift can be reduced by over 55% via multi-turn RL treating consistency as a reward signal, suggesting identity *could* be stable-but-updatable if both parties wrote to shared state (2025–2026).
• Humans update rich partner-models along three axes (competence at ~49% of variance, human-likeness, flexibility); models cannot reciprocally update their partner-model of the human (2023–2025).

Anchor papers (verify; mind their dates):
• arXiv:2401.07115 (2024) — Open Models, Closed Minds on personality mimicry
• arXiv:2511.00222 (2025) — Multi-Turn RL for persona consistency
• arXiv:2601.10387 (2026) — The Assistant Axis: default persona stabilization
• arXiv:2602.07338 (2026) — Intent Mismatch in multi-turn conversation

Your task:
(1) RE-TEST EACH CONSTRAINT. For static identity lock, persona drift, and common-ground asymmetry: has recent scaling, instruction-tuning variants (e.g., constitutional AI updates, mixture-of-experts routing), long-context windows, or multi-turn RL actually *dissolved* these, or do they persist under stress-test? Separate 'models still can't update common ground' (likely durable) from 'models can now absorb mid-conversation pivots' (possibly resolved). Cite what shifted it.
(2) Surface the strongest *contradicting* work from the last 6 months: any paper claiming models *do* co-construct identity, or that Shanahan's role-play-all-the-way-down view is wrong, or that relational memory now persists across sessions.
(3) Propose 2 research questions that assume the regime *may have moved*: e.g., 'If RL now cuts persona drift by over 55%, what blocks symmetric common-ground updates?' or 'Do mixture-of-experts routing or retrieval-augmented session memory now enable resumed co-construction?'

Cite arXiv IDs; flag anything you cannot ground in a real paper.
