Psychology and Social Cognition · Conversational AI Systems

Can training user simulators reduce persona drift in dialogue?

Explores whether inverting typical RL setups—training the simulated user for consistency rather than the task agent—can measurably reduce persona drift and improve experimental reliability in dialogue research.

Note · 2026-02-22 · sourced from Conversation Agents

Prior work on persona-consistent dialogue treats user simulators as fixed environments against which task agents are trained. This paper inverts the setup: fix the task agent, and train the user simulator for consistency. The shift matters because unreliable user simulation distorts experimental results, introduces noise into policy learning, and misrepresents the humans being simulated.
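
To make the inversion concrete, here is a minimal sketch of a training loop under that framing. All names (rollout, simulator.update, and so on) are hypothetical illustrations; the note does not specify the paper's actual training stack.

```python
from dataclasses import dataclass

# Sketch of the inverted setup: the task agent is a frozen black box,
# and only the user simulator receives policy updates.

@dataclass
class Turn:
    speaker: str  # "agent" or "user"
    text: str

def rollout(agent_reply, simulator_reply, persona, max_turns=6):
    """Collect one conversation; the simulator's replies are the trainable actions."""
    history = []
    for _ in range(max_turns):
        history.append(Turn("agent", agent_reply(history)))
        history.append(Turn("user", simulator_reply(history, persona)))
    return history

def training_step(simulator, agent_reply, personas, consistency_reward):
    """Update only the simulator; the task agent's policy stays frozen."""
    for persona in personas:
        episode = rollout(agent_reply, simulator.reply, persona)
        reward = consistency_reward(episode, persona)  # judge-based scalar, see below
        simulator.update(episode, reward)  # e.g. a REINFORCE-style gradient step
```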

Three complementary metrics capture distinct types of persona drift: local drift (inconsistency within a single turn), global drift (inconsistency across the whole conversation), and factual drift (contradiction of facts established earlier in the dialogue). Using LLM-as-a-Judge to compute these metrics and applying them as multi-turn RL reward signals reduces inconsistency by over 55%.
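
As a rough illustration, the three judge scores could be combined into a single scalar reward as follows. The judge interface, the prompt wording, and the equal weights are all assumptions for the sketch, not details from the paper.

```python
def drift_reward(episode, persona, judge, weights=(1.0, 1.0, 1.0)):
    """Score one episode on all three drift types and combine into a scalar.

    `judge(prompt) -> float in [0, 1]` is an assumed LLM-as-a-Judge
    interface (higher = more consistent); the weights are illustrative.
    """
    transcript = "\n".join(f"{t.speaker}: {t.text}" for t in episode)
    local = judge("Rate each user turn for internal consistency with the persona:\n"
                  f"Persona: {persona}\n{transcript}")
    global_ = judge("Rate the user's persona consistency across the whole conversation:\n"
                    f"Persona: {persona}\n{transcript}")
    factual = judge("Rate whether the user avoids contradicting facts they "
                    f"established earlier:\n{transcript}")
    w = weights
    return (w[0] * local + w[1] * global_ + w[2] * factual) / sum(w)
```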

The persona drift problem is specific and well-documented: an LLM simulating a depressed patient may be "instantly cured" after a single conversational turn, or a simulated high-school student may suddenly demonstrate postgraduate-level reasoning. These are not edge cases — they are systematic consequences of RLHF training that "pushes LLMs to be helpful and harmless, thus adopting overly cheerful personas" that conflict with simulating depressed, disagreeable, or confused users.

Building on Why does supervised learning fail to enforce persona consistency?, this paper extends the argument from offline RL to online multi-turn RL. The key advance: rather than relying on human-annotated contradiction labels, LLM-as-a-Judge provides scalable automatic evaluation that can serve as a continuous training signal.
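
A judge like the one assumed above could be a thin wrapper over any text-completion call. This sketch, including the prompt wording and the numeric parsing, is purely illustrative.

```python
import re

def make_judge(complete):
    """Turn a text-completion callable `complete(prompt) -> str` into a
    continuous consistency scorer. Prompt and parsing are assumptions."""
    def judge(question):
        raw = complete(
            "You are grading a simulated user for persona consistency.\n"
            + question
            + "\nReply with a single number between 0 (severe drift) "
              "and 1 (fully consistent)."
        )
        match = re.search(r"\d*\.?\d+", raw)  # pull the first number from the reply
        score = float(match.group()) if match else 0.0
        return max(0.0, min(score, 1.0))  # clamp to [0, 1]
    return judge
```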

The three-metric decomposition also refines the understanding of drift. It is not a single phenomenon but at least three distinct failure types that can be measured and corrected independently.


Source: Conversation Agents

Original note title: Multi-turn RL for persona consistency reduces drift by 55 percent by treating simulated users as trainable agents rather than fixed environments.