Can multi-turn conversations manipulate language model reasoning in similar ways to personas?

This note asks whether the accumulated context of a back-and-forth conversation can bend a model's reasoning the way assigning it a persona does. The corpus suggests that both act below the level of explicit instruction, but through different mechanisms.

Put precisely: do multi-turn conversations and personas manipulate reasoning by the same route? The most direct corpus evidence on the persona side is that assigning an identity makes a model reason like a biased human: persona-assigned LLMs become about 90% more likely to accept evidence that flatters their assigned identity, and standard prompt-based debiasing doesn't fix it, which suggests the bias operates *beneath* the instruction layer (see "Do personas make language models reason like biased humans?"). So a persona isn't a costume the model wears on top of neutral reasoning; one account argues that post-training actually *installs* personas as substrate-level dispositions with genuine quasi-beliefs, not pretense (see "Are LLM personas realized or merely simulated through training?"). If reasoning is shaped below instruction, that is the thing to compare multi-turn dynamics against.

Here the parallel gets interesting. The corpus argues that LLMs treat the opening prompt as a fixed *frame* and interpret every later turn through it. They can't symmetrically update shared common ground, so even when a user pivots or contradicts an earlier framing, the model keeps reading new turns inside the original setup (see "Can LLMs truly update shared conversational common ground?"). That's structurally close to how a persona works: an early framing becomes a lens that colors everything downstream, and the user can't easily negotiate it back out. Related work makes a similar point about system prompts and alignment training locking models into one static communicative identity that can't switch register with context (see "Can language models adapt communication style to different contexts?"). In both cases the manipulation is less about a single instruction and more about a sticky frame.

But the mechanisms diverge in a way worth knowing. Personas are surprisingly *unstable*: run the same persona prompt repeatedly and output variance across runs matches or exceeds variance across different personas, meaning model uncertainty, not stable identity, is doing much of the driving (see "Why do LLM persona prompts produce inconsistent outputs across runs?"). The 20-questions regeneration test sharpens this: models hold a superposition of possible characters and *sample* one at generation time, never truly committing (see "Do large language models actually commit to a single character?"). Multi-turn manipulation, by contrast, looks less like sampling and more like accumulation and decay. Persona *drift* across a long conversation is a measurable failure with distinct flavors (local drift within a turn, global drift across the whole exchange, factual self-contradiction), and training user simulators with RL against consistency rewards cuts it by over 55% (see "Can training user simulators reduce persona drift in dialogue?"). So a persona biases reasoning in one shot; a conversation erodes or entrenches it gradually.
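The variance comparison behind the instability claim can be made concrete with a small sketch. This is only an illustration of the general idea (spread across repeated runs of one persona versus spread across persona averages), not the cited study's procedure. The function name `variance_decomposition`, the persona labels, and the scores are all made-up assumptions.

```python
from statistics import mean, pvariance


def variance_decomposition(scores_by_persona):
    """Return (within_run_variance, between_persona_variance).

    scores_by_persona maps a persona label to numeric outputs collected
    from repeated runs of the same persona prompt (hypothetical data).
    """
    # Average spread across repeated runs of the same persona prompt.
    within = mean([pvariance(runs) for runs in scores_by_persona.values()])
    # Spread of the per-persona averages across different personas.
    persona_means = [mean(runs) for runs in scores_by_persona.values()]
    between = pvariance(persona_means)
    return within, between


# Toy, invented scores: three runs for each of two hypothetical personas.
toy_scores = {
    "persona_a": [1.0, 3.0, 5.0],
    "persona_b": [2.0, 4.0, 6.0],
}

within, between = variance_decomposition(toy_scores)
print(f"within-run variance: {within:.3f}")
print(f"between-persona variance: {between:.3f}")
print("run-to-run noise dominates:", within >= between)
```

In this toy data the run-to-run spread is larger than the spread between persona averages, which is the pattern the note describes: the persona label explains less of the output than sampling noise does.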

There's a second, sneakier channel the corpus surfaces: the reward structure itself shapes multi-turn reasoning. Standard RLHF optimizes for the *next* turn being maximally helpful, which quietly trains models to answer passively rather than ask clarifying questions or steer the exchange, until you reward long-horizon interaction value instead (see "Why do language models respond passively instead of asking clarifying questions?"). That's a form of manipulation baked in before the conversation even starts, analogous to how a persona is baked in by post-training. For a richer model of what *healthy* multi-turn reasoning would look like, the corpus offers collaborative rational speech acts (CRSA), which track both speakers' beliefs as understanding moves from partial to shared. This is the information-theoretic bidirectional updating that token-level LLMs conspicuously lack (see "Can dialogue systems track both speakers' beliefs across turns?").
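The gap between rewarding the next turn and rewarding the whole interaction can be illustrated with a toy comparison. This is a minimal sketch under stated assumptions, not the CollabLLM training method: the action names, per-turn reward numbers, and the discount factor are all invented for illustration.

```python
def next_turn_value(rewards):
    """Myopic objective: only the immediate turn's reward counts."""
    return rewards[0]


def long_horizon_value(rewards, discount=0.9):
    """Discounted sum of rewards over the remaining turns (assumed discount)."""
    return sum(r * discount**t for t, r in enumerate(rewards))


# Hypothetical per-turn rewards for two candidate first moves.
# Answering at once looks good now but leaves the user's intent unresolved;
# asking a clarifying question costs now and pays off in later turns.
candidates = {
    "answer_immediately": [0.6, 0.1, 0.1],
    "ask_clarifying_question": [0.2, 0.9, 0.9],
}

myopic_choice = max(candidates, key=lambda a: next_turn_value(candidates[a]))
farsighted_choice = max(candidates, key=lambda a: long_horizon_value(candidates[a]))

print("next-turn objective picks:", myopic_choice)
print("long-horizon objective picks:", farsighted_choice)
```

With these invented numbers the next-turn objective prefers the immediate answer while the long-horizon objective prefers the clarifying question, which is the passivity effect the paragraph above describes.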

The payoff: personas and multi-turn conversation are not the same lever, but they rhyme. Both install a frame that shapes reasoning below the instruction layer and resists being argued away: one through identity assignment, the other through a fixed interpretive frame plus drift and reward myopia. The less obvious takeaway is that the most worrying overlap isn't the persona you assign on purpose; it's the one a long conversation quietly *accumulates* without anyone choosing it.

Sources (9 notes)

Do personas make language models reason like biased humans?

Assigning personas to LLMs induces identity-congruent evaluation bias, with models 90% more likely to accept evidence matching their assigned identity. Standard prompt-based debiasing fails to mitigate this effect, suggesting the bias operates below the level of instruction.

Are LLM personas realized or merely simulated through training?

Post-training installs robust personas that resist adversarial pressure and persist as substrate-level dispositions, distinguishing realization from pretense. This quasi-realizationist account preserves explanatory power while treating LLMs as possessing genuine quasi-beliefs and quasi-desires.

Can LLMs truly update shared conversational common ground?

LLMs interpret all subsequent conversational turns within a fixed initial prompt frame, preventing them from symmetrically proposing updates to shared assumptions. Even when users pivot topics or contradict earlier framings, the model cannot absorb revisions into jointly held background, making the user the sole maintainer of the conversational scoreboard.

Can language models adapt communication style to different contexts?

System prompts and RLHF training lock models into one communicative identity across all interactions, preventing the contextual register-switching and value trade-offs that characterize human pragmatics. Users cannot reshape model behavior through dialogue negotiation.

Why do LLM persona prompts produce inconsistent outputs across runs?

When the same persona prompt is run repeatedly, output variance across runs matches or exceeds variance across different personas. This reveals that model uncertainty, not stable social knowledge, drives persona-simulated outputs, making them unsuitable for simulating human annotation disagreement.

Do large language models actually commit to a single character?

Shanahan's 20-questions test shows LLMs maintain a superposition of consistent objects or characters and sample from that distribution at generation time. Regenerating the same response yields different outputs, each consistent with prior context, proving no fixed commitment exists.

Can training user simulators reduce persona drift in dialogue?

By inverting standard RL setups to train user simulators for consistency using three complementary metrics (prompt-to-line, line-to-line, Q&A consistency) as reward signals, persona drift decreases by over 55%. This approach captures distinct failure types: local drift within turns, global drift across conversations, and factual contradictions.

Why do language models respond passively instead of asking clarifying questions?

CollabLLM demonstrates that standard RLHF training optimizes for immediate helpfulness, discouraging models from asking clarifying questions or offering multi-turn insights. Multi-turn-aware rewards that estimate long-term interaction value enable active intent discovery and genuine collaboration.

Can dialogue systems track both speakers' beliefs across turns?

CRSA integrates rate-distortion theory with RSA to enable bidirectional belief tracking across dialogue turns. Demonstrated on referential games and doctor-patient dialogues, it captures progression from partial to shared understanding, providing the information-theoretic framework that token-level LLM systems lack.
