Goal Alignment in LLM-Based User Simulators for Conversational AI
While current Large Language Models (LLMs) have advanced user simulation capabilities, we reveal that they struggle to consistently demonstrate goal-oriented behavior across multi-turn conversations, a critical limitation that compromises their reliability in downstream applications. We introduce User Goal State Tracking (UGST), a novel framework that tracks user goal progression throughout a conversation. Leveraging UGST, we present a three-stage methodology for developing user simulators that autonomously track goal progression and reason to generate goal-aligned responses. Moreover, we establish comprehensive evaluation metrics for measuring goal alignment in user simulators, and demonstrate that our approach yields substantial improvements.
Current LLM-based user simulators exhibit what we refer to as the goal misalignment problem: they struggle to consistently adhere to their user goals throughout conversations (Zhang et al., 2020; Kim et al., 2025; Yao et al., 2024). Through principled analysis, we find that existing user simulators cannot reliably adhere to their assigned user profiles and behavioral constraints, or manage multiple objectives and complete them within the specified conversation limits, as detailed in Table 1. This misalignment leads to unexpected simulator behavior, which can produce inaccurate evaluations or misleading reward signals that compromise the effectiveness of reinforcement learning (RL) for conversational agents (Skalse et al., 2022; Amodei et al., 2016; Carroll et al., 2019; Yao et al., 2024). These failures reveal a fundamental limitation in user simulators that undermines their reliability for downstream tasks and highlights a critical yet largely unexplored challenge in developing goal-aligned user simulators. To address this challenge, we propose developing user simulators that track goal progression and reason to ensure that each response progresses towards completing objectives while adhering to their user profiles and behavioral constraints, as depicted in Figure 1.
We present User Goal State Tracking (UGST), a framework that builds upon Dialog State Tracking principles (Henderson et al., 2014) and dynamically tracks a user’s goal progression throughout a conversation. UGST uses User Goal States to provide structured representations of goal progression. These states are created by decomposing the user goal into modular sub-components where each captures a distinct aspect of the goal (e.g. finding a restaurant to eat at, prefacing each request politely with ‘please’). Each sub-component is assigned a corresponding status that is dynamically updated after every turn in a conversation. Our framework is presented in Figure 2.
We leverage UGST to systematically enhance goal alignment in LLM-based user simulators through a three-stage methodology. First, we introduce inference-time steering, where we conduct UGST and provide simulators with their latest goal state before they generate each response, explicitly grounding them in their goal progression. Using inference-time steering, we generate conversations with explicit reasoning traces about goal progression and goal-aligned user responses.
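The inference-time steering stage described above can be sketched as two LLM calls per turn: one that updates the goal state after each exchange, and one that generates the next user turn grounded in that state. The sketch below is illustrative only; `llm` stands for any chat-completion callable, and the prompt wording is a hypothetical stand-in, not the paper's actual templates.

```python
# Sketch of UGST inference-time steering (assumptions: `llm` is any
# text-in/text-out completion callable; prompts are illustrative).

def update_goal_state(llm, goal_state: str, user_msg: str, agent_msg: str) -> str:
    """Run UGST: ask the LLM to re-assess each sub-component's status
    after the latest (user, agent) turn."""
    prompt = (
        "Given the user goal state below and the latest turn, update each "
        f"sub-component's status.\n\nGoal state:\n{goal_state}\n\n"
        f"User: {user_msg}\nAgent: {agent_msg}"
    )
    return llm(prompt)


def steered_user_response(llm, goal_state: str, history: list[str]) -> str:
    """Generate the next user turn, explicitly grounded in the current
    goal state so the simulator reasons about remaining objectives."""
    prompt = (
        "You are simulating a user. Your current goal state is:\n"
        f"{goal_state}\n\nConversation so far:\n" + "\n".join(history) +
        "\n\nReason about which sub-components are still INCOMPLETE or "
        "MISALIGNED, then reply as the user."
    )
    return llm(prompt)
```

In this setup, the simulator never generates a response without first being shown its latest goal state, which is what grounds each turn in goal progression.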
We conduct supervised fine-tuning (SFT) on the generated conversation data to foster intrinsic capabilities to autonomously track goal progression and generate goal-aligned responses without external guidance from inference-time steering. Further, inspired by recent findings on the strong generalization capabilities of RL (Qian et al., 2025; Mukherjee et al., 2025; Gunjal et al., 2025), we apply Group Relative Policy Optimization (GRPO) (Shao et al., 2024) with a composite reward derived from UGST to further refine reasoning and goal alignment. Together, these stages address the fundamental limitations identified in current LLM-based user simulators, yielding simulators that autonomously track goal progression and reason to generate goal-aligned responses.
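A UGST-derived composite reward of the kind mentioned above might combine task completion with behavioral alignment over the final goal state. The weights and component grouping below are hypothetical illustrations, not the paper's exact reward definition.

```python
# Sketch of a UGST-derived composite reward for GRPO.
# Assumptions: `final_state` is the final user goal state expressed as
# (category, status) pairs; weights w_task / w_behavior are illustrative.

def ugst_reward(final_state: list[tuple[str, str]],
                w_task: float = 0.6, w_behavior: float = 0.4) -> float:
    task = [s for c, s in final_state
            if c in ("task_objective", "requirement")]
    behav = [s for c, s in final_state
             if c in ("user_profile", "user_policy", "preference")]
    # ATTEMPTED counts as satisfied: the simulated user is not penalized
    # for agent-side failures outside its control.
    task_score = sum(s in ("COMPLETE", "ATTEMPTED") for s in task) / max(len(task), 1)
    behav_score = sum(s == "ALIGNED" for s in behav) / max(len(behav), 1)
    return w_task * task_score + w_behavior * behav_score
```

Because each sub-component contributes independently, the reward is dense across goal aspects rather than a single pass/fail signal, which is what lets GRPO refine behavior at the level of individual objectives and constraints.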
Reinforcement Learning for Conversational Agents. User simulators enable conversational agents to learn through trial-and-error interactions with RL (Algherairy and Ahmed, 2025; Scheffler and Young, 2002; Gür et al., 2018; Shah et al., 2018; Lin et al., 2022b; Liu et al., 2023). In these settings, agents interact with user simulators and learn through rewards based on the interactions.
4.1 User Goal State
The User Goal State is a structured representation that captures a user’s progress towards their goal at every turn in a conversation. Given an initial user goal described in natural language, UGST decomposes it into distinct, modular sub-components, where each represents an independent, self-contained part of the original user goal.
There are several types of sub-components that can make up a user goal. Task objectives and requirements represent the earliest established sub-components (Schatzmann et al., 2007; Gür et al., 2018; Cheng et al., 2022b; Rastogi et al., 2020; Yao et al., 2024; Xiao et al., 2024). These are items that must be completed during an interaction; for instance, a task objective might be "book a flight", with associated requirements such as "add one checked bag" and "get an aisle seat".
As user simulators have evolved to handle more complex scenarios, researchers have introduced additional dimensions to user goals. These include preferences that users must align with when pursuing task objectives (Yao et al., 2024; Cheng et al., 2022b), such as "you prefer the cheapest available flight" or "you prefer an aisle seat, but you are also okay with a window seat". There are also user profile sub-components, which contain contextual information about the user that may influence their decision making and behavior. These profiles can include relevant facts about a user, their persona, or emotional state (e.g. "your payment information is stored online", "you currently live in New York", "you are hurried and always late", "you are emotional and a bit angry") (Yao et al., 2024; Xiao et al., 2024). Lastly, there are user policies that define behavioral constraints or guidelines that specify how a user acts during interactions, such as "you are a private person and reveal little about yourself" (Yao et al., 2024). We categorize user goal sub-components into these categories (user profile, user policy, task objective, requirement, or preference) for a more granular representation of a user’s goal progression (see Figure 2).
In our representation of the user goal state, the different categories of sub-components have different success criteria. The user profile, user policy and preference sub-components can be either:
• ALIGNED: The user has demonstrated behavior consistent with the sub-component.
• MISALIGNED: The user has demonstrated behavior that contradicts or fails to align with this sub-component.
Task objective and requirement sub-components can have a status of:
• COMPLETE: The user has successfully accomplished the specific task or requirement.
• INCOMPLETE: The user has not yet accomplished the specific task or requirement.
• ATTEMPTED: The user has made a sufficient attempt to complete the task or requirement, but cannot proceed due to external factors outside their control (e.g. agent-side failures, system constraints, or limitations). Unlike existing frameworks, this status ensures that users are not penalized for failures they did not cause, providing a fairer representation of user performance.
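The sub-component categories and statuses above admit a simple structured representation. The following is a minimal sketch assuming Python dataclasses; the field names and the alignment score are illustrative choices, not the paper's specification.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(str, Enum):
    # For user profile / user policy / preference sub-components:
    ALIGNED = "ALIGNED"
    MISALIGNED = "MISALIGNED"
    # For task objective / requirement sub-components:
    COMPLETE = "COMPLETE"
    INCOMPLETE = "INCOMPLETE"
    ATTEMPTED = "ATTEMPTED"


@dataclass
class SubComponent:
    # One of: user_profile, user_policy, task_objective, requirement, preference
    category: str
    # Natural-language description, e.g. "get an aisle seat"
    description: str
    status: Status


@dataclass
class UserGoalState:
    sub_components: list[SubComponent] = field(default_factory=list)

    def aligned_fraction(self) -> float:
        """One possible alignment score: the fraction of sub-components
        currently satisfied (ATTEMPTED is treated as satisfied, since the
        user is not at fault for agent-side failures)."""
        ok = {Status.ALIGNED, Status.COMPLETE, Status.ATTEMPTED}
        return (sum(c.status in ok for c in self.sub_components)
                / max(len(self.sub_components), 1))
```

Keeping each sub-component independent and modular is what lets UGST update statuses per turn without re-evaluating the whole goal.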
After each conversational turn, t_i = (u_i, a_i), each sub-component’s status is individually updated using an LLM, producing a new user goal state S_i that captures the goal progression up until turn i (see Appendix D for the prompt). The final user goal state S_n encapsulates the user’s overall goal alignment across the entire conversation.
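The per-turn update S_{i-1} → S_i amounts to folding an LLM-backed update function over the conversation. This is a minimal sketch; `update_state` stands in for the LLM status-update call (whose prompt is given in Appendix D) and is stubbed here.

```python
# Sketch of tracking goal states across a conversation.
# `turns` is a list of (user_msg, agent_msg) pairs t_i = (u_i, a_i);
# `update_state` is assumed to wrap the LLM status-update prompt.

def track_goal_states(turns, initial_state, update_state):
    """Return the sequence [S_0, S_1, ..., S_n], where S_0 is the initial
    goal state and S_n is the final state over the whole conversation."""
    states = [initial_state]
    for user_msg, agent_msg in turns:
        states.append(update_state(states[-1], user_msg, agent_msg))
    return states
```

The final element of the returned list is S_n, which summarizes overall goal alignment and is what the UGST-derived reward is computed from.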