Does turn-level intent control prevent simulator drift during long conversations?

This explores whether conditioning an AI user-simulator on per-turn intent signals is enough to keep it 'in character' over a long conversation — or whether drift has causes that turn-level control alone can't reach.

The corpus suggests turn-level intent control helps, but it is only one of three layers (per-turn control, cross-turn consistency training, and structured goal tracking), and the drift it misses is the kind that actually corrupts long simulations. The cleanest version of the 'control it per turn' idea is RecLLM, which conditions an LLM simulator on two latent variables at once: a session-level user profile and a turn-level user intent. The realism of the resulting synthetic conversations is measured by crowdsourced discrimination, discriminator models, and classifier-ensemble distribution matching (Can controlled latent variables make LLM user simulators realistic?). So turn-level intent is less a fix for drift than a knob for realism: it tells the simulator what to want *right now*, not how to stay consistent with what it wanted ten turns ago.
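To make the two-level conditioning concrete, here is a minimal sketch of how a per-turn simulator prompt might be assembled. It is not RecLLM's implementation: the intent vocabulary, the `UserProfile` class, and the prompt wording are all assumptions chosen for illustration.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical intent vocabulary, for illustration only.
INTENTS = ("ask_recommendation", "refine_request", "reject_item", "accept_item")


@dataclass(frozen=True)
class UserProfile:
    """Session-level latent variable: fixed for the whole conversation."""
    description: str


def sample_intent(rng: random.Random) -> str:
    """Turn-level latent variable: drawn afresh on every turn."""
    return rng.choice(INTENTS)


def build_simulator_prompt(
    profile: UserProfile, intent: str, history: List[Tuple[str, str]]
) -> str:
    """Assemble the text an LLM simulator would be conditioned on for one turn."""
    lines = [f"You are simulating a user. Profile: {profile.description}"]
    lines.append("Conversation so far:")
    lines += [f"  {speaker}: {text}" for speaker, text in history]
    lines.append(f"Your intent for this turn: {intent}")
    lines.append("Write the user's next message.")
    return "\n".join(lines)
```

The profile string is repeated unchanged in every prompt while the intent is replaced each turn. Nothing in this construction checks the new intent against what the simulator said earlier, which is the gap the rest of this note is about.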

That gap is exactly where drift lives. One study breaks simulator failure into three distinct types (local drift within a turn, global drift across the whole conversation, and outright factual self-contradiction) and shows that turn-level signals touch only the first. Fixing the others took multi-turn RL training that rewards consistency across the arc of the dialogue, using prompt-to-line, line-to-line, and Q&A consistency as reward signals, which cut persona drift by over 55% (Can training user simulators reduce persona drift in dialogue?). The lesson: control delivered turn by turn is local by construction, but the costly drift is global, so you need a training signal that spans turns, not just a prompt that refreshes each turn.
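A rough sketch of what a conversation-spanning consistency reward could look like follows. The three function names mirror the metrics named above, but their definitions here are one plausible reading, not the study's; the `Judge` callable, the equal weighting, and the probe-answer format are all assumptions.

```python
from typing import Callable, Sequence

# Hypothetical judge: scores in [0, 1] how consistent `statement` is with
# `reference`. In practice this could be an LLM or NLI model (an assumption).
Judge = Callable[[str, str], float]


def prompt_to_line(persona: str, lines: Sequence[str], judge: Judge) -> float:
    """Mean consistency of each simulator line with the persona prompt."""
    if not lines:
        raise ValueError("need at least one simulator line")
    return sum(judge(persona, line) for line in lines) / len(lines)


def line_to_line(lines: Sequence[str], judge: Judge) -> float:
    """Mean consistency of each line with every earlier line."""
    pairs = [(lines[i], lines[j]) for j in range(len(lines)) for i in range(j)]
    if not pairs:
        return 1.0  # a single line has no earlier line to contradict
    return sum(judge(earlier, later) for earlier, later in pairs) / len(pairs)


def qa_consistency(
    expected: Sequence[str], answered: Sequence[str], judge: Judge
) -> float:
    """Mean agreement between persona-derived answers and the simulator's answers."""
    if not expected or len(expected) != len(answered):
        raise ValueError("expected and answered must be non-empty and equal length")
    return sum(judge(e, a) for e, a in zip(expected, answered)) / len(expected)


def consistency_reward(
    persona: str,
    lines: Sequence[str],
    expected: Sequence[str],
    answered: Sequence[str],
    judge: Judge,
) -> float:
    """Equal-weight average of the three scores (the weighting is an assumption)."""
    return (
        prompt_to_line(persona, lines, judge)
        + line_to_line(lines, judge)
        + qa_consistency(expected, answered, judge)
    ) / 3.0
```

The point of the sketch is the scope of each term: `line_to_line` compares every line against all earlier lines, so the reward depends on the whole dialogue rather than on any single turn.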

The more structural answer comes from the UGST framework, which argues a single 'intent' variable is too coarse to track. It decomposes a simulator's goal into profile, policy, task, requirements, and preferences, each with its own status, because when any one slips, the simulator's misalignment quietly corrupts the RL training signal it's supposed to produce (Why do LLM user simulators fail to track their own goals?). So 'turn-level intent control' is really doing the job of five trackable sub-goals badly bundled into one; drift is what happens when they desynchronize.
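As an illustration of tracking five sub-goals separately instead of one bundled intent, the data structure might look like the sketch below. The five component names come from the text; the `Status` labels and the `GoalState` class are hypothetical and are not taken from UGST.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List

# The five goal components named in the text.
COMPONENTS = ("profile", "policy", "task", "requirements", "preferences")


class Status(Enum):
    """Hypothetical status labels; the framework's own labels may differ."""
    PENDING = "pending"
    ON_TRACK = "on_track"
    VIOLATED = "violated"


def _initial_status() -> Dict[str, Status]:
    return {name: Status.PENDING for name in COMPONENTS}


@dataclass
class GoalState:
    """Per-component status, updated as the conversation proceeds."""
    status: Dict[str, Status] = field(default_factory=_initial_status)

    def update(self, component: str, new_status: Status) -> None:
        if component not in self.status:
            raise KeyError(f"unknown goal component: {component}")
        self.status[component] = new_status

    def drifted(self) -> List[str]:
        """Names of the components currently marked as violated."""
        return [name for name, s in self.status.items() if s is Status.VIOLATED]
```

With separate entries, a simulator that still pursues its task but has abandoned a stated preference shows up as one violated component, where a single intent variable would report nothing wrong.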

Worth knowing: the same drift afflicts the *assistant* side, and it's not a capacity problem. LLMs hit ~90% accuracy on single-message instructions but fall to ~65% across natural multi-turn conversation, because they lock into an early guess and can't course-correct (Why do AI assistants get worse at longer conversations?). Two papers trace this to RLHF rewarding confident, premature answers over clarification. One frames it as an 'intent alignment gap' rather than lost capability, recoverable without retraining by a Mediator-Assistant architecture that explicitly parses user intent before acting (Why do language models lose performance in longer conversations?). The other frames it as an 'alignment tax' that drives grounding acts 77.5% below human levels (Does preference optimization harm conversational understanding?). That mirror is the surprising part: explicit intent parsing per turn is precisely the repair proposed for assistant drift, so turn-level intent control is a genuine lever on both sides, just never a complete one.

Sources 6 notes

Can controlled latent variables make LLM user simulators realistic?

RecLLM demonstrates that conditioning an LLM simulator on session-level (user profile) and turn-level (user intent) latent variables produces synthetic conversations whose realism can be measured via crowdsourced discrimination, discriminator models, and classifier-ensemble distribution matching.

Can training user simulators reduce persona drift in dialogue?

Inverting the standard RL setup to train the user simulator for consistency, with three complementary metrics (prompt-to-line, line-to-line, Q&A consistency) as reward signals, reduces persona drift by over 55%. This approach captures distinct failure types: local drift within turns, global drift across conversations, and factual contradictions.

Why do LLM user simulators fail to track their own goals?

The UGST framework breaks user goals into profile, policy, task, requirements, and preferences—each with explicit status tracking. A three-stage method (steering, SFT, GRPO) progressively internalizes goal alignment, reducing the misalignment that corrupts RL training signals.

Why do AI assistants get worse at longer conversations?

LLMs perform at 90% accuracy with single-message instructions but drop to 65% across natural conversation. Models lock into early guesses when information arrives gradually and cannot course-correct, a behavior induced by RLHF training that rewards helpfulness over clarification.

Why do language models lose performance in longer conversations?

LLMs degrade in multi-turn settings because RLHF training rewards premature answers over clarification-seeking, creating a pragmatic mismatch with individual user behaviors. A Mediator-Assistant architecture that explicitly parses user intent before execution recovers lost performance without retraining.

Does preference optimization harm conversational understanding?

RLHF optimizes models for single-turn helpfulness by rewarding confident responses over clarifying questions and understanding checks. This preference alignment systematically reduces grounding acts to 77.5% below human levels, creating an alignment tax where models appear helpful but fail silently in multi-turn contexts.
