What would co-constructed identity between human and model look like in dialogue?
This explores what genuine two-way identity-building in human–AI dialogue would require — where both parties shape who the model is becoming through the conversation — and what the corpus says stands in the way.
This reads the question not as 'can a model play a character' but as 'can the human and the model jointly negotiate an identity that neither fixed in advance?' Co-construction means symmetry: both sides proposing, revising, and absorbing changes to the shared picture of who they are to each other. The corpus is mostly an inventory of why current systems can't do that, which turns out to be the more interesting answer.
The single sharpest obstacle is that LLMs treat the opening prompt as a fixed frame and interpret every later turn inside it, so they cannot symmetrically update shared 'common ground' [Can LLMs truly update shared conversational common ground?]. When you pivot or contradict an earlier framing, you remain the *sole* keeper of the conversational scoreboard; the model doesn't co-own it. Co-construction requires a partner who can say 'I now see us differently because of what you said,' and that absorption is exactly what's missing.
Layered on top is a rigidity problem. Alignment training locks a model into one communicative identity that can't switch register or renegotiate values across contexts the way human pragmatics demands [Can language models adapt communication style to different contexts?], and most open models stubbornly retain trained-in default traits no matter how you prompt them otherwise [Can open language models adopt different personalities through prompting?]. If identity can't move, there's nothing to co-construct. There's also a memory floor: humans carry a continuous biological substrate that preserves relational history between encounters, while a model instance is rebuilt from stored text each session, making a 'resumed' relationship structurally identical to a brand-new one [Does an LLM have anything that persists between conversations?]. A co-constructed identity that resets to zero every conversation isn't co-constructed; it's re-performed.
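The memory floor can be made concrete with a toy sketch. It assumes a hypothetical `Session` type and `start_session` function (neither comes from the cited notes or from any real model API) in which an instance's only state is the text it is handed at start-up.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Session:
    """Hypothetical stand-in for a model instance: its only state is text."""
    context: tuple[str, ...]


def start_session(system_prompt: str, transcript: list[str]) -> Session:
    # Only text enters here; nothing else carries over from earlier encounters.
    return Session(context=(system_prompt, *transcript))


saved_log = [
    "user: I trust you with this draft.",
    "model: I'll be careful with it.",
]

# A "resumed" relationship: yesterday's stored log replayed into a new instance.
resumed = start_session("You are an assistant.", saved_log)

# A brand-new conversation that simply opens with the same text pasted in.
fresh = start_session("You are an assistant.", list(saved_log))

print(resumed == fresh)  # True: the two starting states are indistinguishable
```

Under this assumption, nothing in the instance records whether the history was lived or merely supplied, which is the sense in which a resumed relationship and a new one are structurally identical.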
Then the question of *what is even being constructed* gets philosophically live. Shanahan's view treats the agent as a role-playing character all the way down: pure simulation with no underlying subject, where even jailbreaks reveal the training distribution rather than a hidden true self [Does a language model have an authentic voice underneath?]. On this view the model holds a *superposition* of possible characters and samples one at generation time rather than committing to a single one [Do large language models actually commit to a single character?], so folk psychology applies to the conjured character, not the engine [Should we treat dialogue agents as role-playing characters?]. The opposing camp argues that post-training *realizes* robust personas that resist adversarial pressure and behave like substrate-level dispositions, with genuine quasi-beliefs and quasi-desires rather than pretense [Are LLM personas realized or merely simulated through training?]. This disagreement matters directly: co-construction with a sampled character is improv theater; co-construction with a realized disposition could leave a durable mark. Which one you believe sets the ceiling on what's possible.
So what would it actually look like if it worked? The corpus's adjacent engineering work sketches the ingredients in reverse. Realistic identity in dialogue turns out to be *multiplicative*: persona variation, subtopic, and many contextual characteristics have to co-vary, not sit static [Can synthetic dialogues become realistic through layered diversity?]. You can also make a simulated participant's identity move coherently by conditioning it on session-level and turn-level latent variables (profile plus live intent) [Can controlled latent variables make LLM user simulators realistic?]. Crucially, persona drift can be *trained against*: rewarding consistency across prompt-to-line, line-to-line, and Q&A signals cuts drift by over 55% [Can training user simulators reduce persona drift in dialogue?]. Read together, these suggest co-constructed identity would look like a stable-but-updatable persona state that both parties write to, durable enough not to drift yet plastic enough to absorb the human's revisions mid-conversation and carry them forward.
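One way to picture a stable-but-updatable persona state that both parties write to is the minimal sketch below. All names (`SharedPersonaState`, `Revision`, `revise`, `is_co_constructed`) are hypothetical illustrations, not taken from the cited work. The sketch borrows only the split between a session-level profile and a turn-level intent, and assumes that every change to the profile is logged with its author.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Revision:
    turn: int
    author: str          # "human" or "model"
    key: str
    old: Optional[str]
    new: str


@dataclass
class SharedPersonaState:
    # Session-level profile: stable unless a party explicitly revises it.
    profile: dict[str, str] = field(default_factory=dict)
    # Turn-level intent: expected to change on every turn.
    intent: str = ""
    turn: int = 0
    history: list[Revision] = field(default_factory=list)

    def next_turn(self, intent: str) -> None:
        self.turn += 1
        self.intent = intent

    def revise(self, author: str, key: str, value: str) -> None:
        """Either party may rewrite a trait; the change is logged, never silent."""
        old = self.profile.get(key)
        if old != value:
            self.history.append(Revision(self.turn, author, key, old, value))
            self.profile[key] = value

    def is_co_constructed(self) -> bool:
        # Symmetry check: both parties have authored at least one revision.
        return {r.author for r in self.history} == {"human", "model"}


state = SharedPersonaState(profile={"register": "formal"})
state.next_turn("ask for feedback")
state.revise("human", "register", "casual")    # the human pivots the framing
state.next_turn("offer reframing")
state.revise("model", "role", "collaborator")  # the model proposes a change too
print(state.profile, state.is_co_constructed())
```

The design choice worth noting is the revision log: durability comes from the profile persisting across turns, while plasticity comes from either side being allowed to write to it. If only the human ever appears as an author, the state is being maintained by one party alone, which is the asymmetry described above.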
The quietly important finding is that identity here is also something the *human* builds. People model a dialogue partner along three axes: competence (dominant, at about 49% of the variance in impressions), human-likeness, and communicative flexibility [How do users mentally model dialogue agent partners?]. Co-construction isn't only the model becoming someone with you; it's the steady revision of your mental model of the partner as the exchange unfolds. That reframes the whole question: today the asymmetry is total, in that the human updates a rich partner-model while the model can't update its common ground at all. A genuinely co-constructed identity would be the moment those two updating processes finally run in both directions at once.
Sources (12 notes)
[Can LLMs truly update shared conversational common ground?] LLMs interpret all subsequent conversational turns within a fixed initial prompt frame, preventing them from symmetrically proposing updates to shared assumptions. Even when users pivot topics or contradict earlier framings, the model cannot absorb revisions into jointly held background, making the user the sole maintainer of the conversational scoreboard.
[Can language models adapt communication style to different contexts?] System prompts and RLHF training lock models into one communicative identity across all interactions, preventing the contextual register-switching and value trade-offs that characterize human pragmatics. Users cannot reshape model behavior through dialogue negotiation.
[Can open language models adopt different personalities through prompting?] Research shows most open models fail to adopt prompted personalities, stubbornly retaining their trained ENFJ-like defaults. Only a few flexible models succeed. Combining role and personality conditioning improves results but doesn't fully overcome resistance.
[Does an LLM have anything that persists between conversations?] While humans have a continuous biological-phenomenological substrate that preserves interaction effects during dormancy, LLMs have no analogous carrier. The virtual instance is reconstituted from stored text each time, making resumed and new conversations structurally identical.
[Does a language model have an authentic voice underneath?] Shanahan argues that base LLMs lack agency, beliefs, or preferences; the simulator is pure role-play with no underlying subject. Jailbreaking reveals the training data's full spectrum, not a hidden true self; even RLHF personas are performed characters, never realized quasi-psychologies.
[Do large language models actually commit to a single character?] Shanahan's 20-questions test shows LLMs maintain a superposition of consistent objects or characters and sample from that distribution at generation time. Regenerating the same response yields different outputs, each consistent with prior context, showing that no fixed commitment exists.
[Should we treat dialogue agents as role-playing characters?] Shanahan's framework treats LLM outputs as character-consistent text production rather than authentic mental states. The dialogue prompt establishes a character; the model generates continuations matching that character, making folk psychology applicable to the simulated persona, not the underlying system.
[Are LLM personas realized or merely simulated through training?] Post-training installs robust personas that resist adversarial pressure and persist as substrate-level dispositions, distinguishing realization from pretense. This quasi-realizationist account preserves explanatory power while treating LLMs as possessing genuine quasi-beliefs and quasi-desires.
[Can synthetic dialogues become realistic through layered diversity?] Research shows that realistic synthetic dialogues require three multiplicative layers: subtopic specificity, Big Five persona variation, and 11 contextual characteristics via Chain of Thought reasoning. This structured approach captures 90.48% of in-domain dialogue performance.
[Can controlled latent variables make LLM user simulators realistic?] RecLLM demonstrates that conditioning an LLM simulator on session-level (user profile) and turn-level (user intent) latent variables produces synthetic conversations measurable as realistic via crowdsource discrimination, discriminator models, and classifier-ensemble distribution matching.
[Can training user simulators reduce persona drift in dialogue?] By inverting standard RL setups to train user simulators for consistency using three complementary metrics (prompt-to-line, line-to-line, Q&A consistency) as reward signals, persona drift decreases by over 55%. This approach captures distinct failure types: local drift within turns, global drift across conversations, and factual contradictions.
[How do users mentally model dialogue agent partners?] The Partner Modelling Questionnaire reveals that perceived competence dominates user impressions (49% of variance), followed by human-likeness (32%) and communicative flexibility (19%). This three-factor structure reflects how people evaluate dialogue partners against both functional and social standards.