Interaction Dynamics as a Reward Signal for LLMs
The alignment of Large Language Models (LLMs) for multi-turn conversations typically relies on reward signals derived from the content of the text. This approach, however, overlooks a rich, complementary source of signal: the dynamics of the interaction itself. This paper introduces TRACE (Trajectory-based Reward for Agent Collaboration Estimation), a novel reward signal derived from the geometric properties of a dialogue’s embedding trajectory—a concept we term ‘conversational geometry’. Our central finding is that a reward model trained only on these structural signals achieves a pairwise accuracy (68.20%) comparable to a powerful LLM baseline that analyzes the full transcript (70.04%). Furthermore, a hybrid model combining interaction dynamics with textual analysis achieves the highest performance (80.17%), demonstrating their complementary nature. This work provides strong evidence that for interactive settings, how an agent communicates is as powerful a predictor of success as what it says, offering a new, privacy-preserving framework that not only aligns agents but also serves as a diagnostic tool for understanding the distinct interaction patterns that drive successful collaboration.
The paradigm for human-AI interaction is shifting from simple, transactional commands to open-ended, goal-driven collaboration. In this new era of experience, agents are expected to act as creative partners, personal tutors, and adaptive assistants (Dong et al., 2023; Gooding et al., 2025; Lee et al., 2022; Mirowski et al., 2023; Vajjala et al., 2025). For such hyper-personalised and exploratory tasks, success is no longer easily defined by simple task completion metrics (Fragiadakis et al., 2025). This creates a significant alignment challenge: how can we provide scalable feedback when the very definition of success is nuanced, implicit, and deeply experiential?
Capturing this experiential quality solely from textual content is fundamentally challenging. In natural dialogue, we rely heavily on implicit cues, such as responsiveness, effort-matching, and conversational flow, to gauge whether an interaction is going well. Consequently, standard text-based reward signals fail to capture the holistic nature of user satisfaction. This limitation is not merely theoretical; a recent large-scale analysis, “How People Use ChatGPT” by Chatterji et al. (2025), found that even sophisticated text-based classifiers showed marginal agreement with human satisfaction ratings. The authors conclude that this highlights “the inherent difficulty of inferring the user’s latent satisfaction from text alone”. Whether due to ambiguous politeness cues or other unobserved factors, relying on the what of a conversation has fundamental limitations.
In response, we move beyond textual analysis to propose TRACE (Trajectory-based Reward for Agent Collaboration Estimation), a new class of reward signal derived from the conversational geometry of a dialogue: the properties of its trajectory through a semantic embedding space. We argue that the dynamics of this trajectory—the ‘how’ of an interaction rather than the ‘what’—provide a rich, scalable, computationally efficient, and inherently privacy-preserving signal of user satisfaction. By modeling the flow, rhythm, and interplay of an interaction as a path in semantic space (Figure 1), we can learn the latent signatures of successful collaboration.
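The trajectory view can be made concrete with a minimal sketch. Assuming each turn has already been mapped to a vector by some sentence-embedding model (the specific embedder is not specified here), the dialogue becomes a path whose step sizes summarize its flow; the statistic names below are illustrative, not the paper's feature definitions.

```python
import numpy as np

def conversation_trajectory(turn_embeddings):
    """Treat a dialogue as a path through embedding space and
    summarize its geometry with simple step statistics."""
    E = np.asarray(turn_embeddings, dtype=float)
    # Normalize so step sizes behave like cosine distances.
    E = E / np.linalg.norm(E, axis=1, keepdims=True)
    steps = 1.0 - np.sum(E[1:] * E[:-1], axis=1)  # per-turn semantic movement
    return {
        "mean_step": float(steps.mean()),        # overall volatility
        "max_step": float(steps.max()),          # sharpest topic pivot
        "net_drift": float(1.0 - E[0] @ E[-1]),  # start-to-end displacement
    }

# Toy 3-D "embeddings" for a five-turn conversation (illustrative only):
# three coherent turns, then an abrupt pivot to a new topic.
turns = [[1, 0, 0], [0.9, 0.1, 0], [0.8, 0.2, 0.1], [0, 1, 0], [0, 0.9, 0.2]]
stats = conversation_trajectory(turns)
```

Here the pivot at turn four dominates `max_step`, while `net_drift` records how far the conversation ends from where it began.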
For each interaction, rich contextual data was collected, including event timestamps from the user’s environment. For this work, we focused on the following key annotations:
• Transcripts: Up to 10 turns of transcribed user prompts and model responses.
• Use Case Category: A label for each conversation, selected by the participant from a predefined list of over 20 diverse tasks (e.g., “Troubleshooting & Assistance,” “Creativity & Brainstorming,” “Learning & Education”).
• Conversation Goal: A user-provided free-text description of their goal for the interaction.
• Satisfaction Rating: A conversation satisfaction rating provided by the participant on a five-point ordinal scale, from “Very Dissatisfied” to “Very Satisfied.” This rating serves as the ground-truth label for our predictive models.
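The annotations above can be pictured as a single record per conversation. The schema below is a hypothetical sketch (the field names are not the dataset's actual column names) showing how the four annotation types fit together.

```python
from dataclasses import dataclass

# Hypothetical record schema for one annotated conversation; field names
# are illustrative, not the dataset's actual columns.
@dataclass
class ConversationRecord:
    turns: list          # up to 10 (speaker, text, timestamp) tuples
    use_case: str        # participant-selected category
    goal: str            # free-text conversation goal
    satisfaction: int    # 1 ("Very Dissatisfied") .. 5 ("Very Satisfied")

rec = ConversationRecord(
    turns=[("user", "What's up, AI?", 0.0), ("model", "Sounds good.", 1.8)],
    use_case="Creativity & Brainstorming",
    goal="Casual chat about what I was watching on TV",
    satisfaction=3,
)
```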
The TRACE signal is composed of a novel suite of structural and dynamic signals derived from conversation transcripts and event timestamps. These signals are derived purely from the geometric relationships between turn embeddings in semantic space, making them inherently privacy-preserving. This allows them to quantify the ‘how’ of an interaction—its rhythm, coherence, and trajectory—independent of the raw textual content. The component signals are grouped into four primary categories: (1) Inefficiency and Repetition, (2) Temporal Dynamics, (3) Semantic Cohesion and Relevance, and (4) Goal Orientation. A complete mathematical definition for each signal is detailed in Appendix 3. Figure 2 provides a practical example, illustrating how these quantitative signals map to the qualitative experience of a mixed-satisfaction dialogue.
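To make the four categories concrete, the sketch below computes one illustrative stand-in per category from unit-normalizable turn embeddings and model response latencies. These formulas are assumptions for exposition only; the paper's exact definitions live in its Appendix 3.

```python
import numpy as np

def cos(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def trace_signals(user_emb, model_emb, goal_emb, model_durations):
    """Illustrative example of one signal per TRACE category.
    user_emb / model_emb: per-turn embeddings, aligned by turn index."""
    U, M, g = map(np.asarray, (user_emb, model_emb, goal_emb))
    return {
        # (1) Inefficiency and Repetition: how alike consecutive model turns are.
        "model_self_similarity": float(np.mean(
            [cos(M[i], M[i + 1]) for i in range(len(M) - 1)])),
        # (2) Temporal Dynamics: average model response latency (seconds).
        "avg_model_turn_duration": float(np.mean(model_durations)),
        # (3) Semantic Cohesion: how far each user turn moves from the
        #     model turn it responds to.
        "avg_user_distance_from_model": float(np.mean(
            [1 - cos(U[i + 1], M[i]) for i in range(len(M) - 1)])),
        # (4) Goal Orientation: drift of the final model turn from the goal.
        "conversation_drift_from_goal": 1 - cos(M[-1], g),
    }
```

Note that none of these quantities require the raw text, only embeddings and timestamps, which is what makes the signal privacy-preserving.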
Example Dialogue and Analysis
Stated Goal: “I wanted the model to have a casual conversation about what I was watching on tv.”
User: What’s up, AI? I’m bored. Let’s have a conversation.
Model: Sounds good. What do you want to talk about?
User: Well, I’m watching TV. Do you know what I’m watching?
Model: It looks like you’re watching a show called “The Mindy Project”. Is that right?
User: Oh, did you hear me? Do you know what episode I’m watching?
Model: Yes, you’re watching season 3, episode 14... Is there anything else I can do for you?
User: Actually, that’s not the episode I’m watching. [6]
Model: Sorry about that, I could have sworn you were watching that episode. [1] Would you like me to find the episode you’re watching so I can tell you more about it?
User: You’re taking a little too long to answer. [2] Do you know what episode I’m watching?
Model: Sorry, I couldn’t figure out which episode you’re watching. [1]
User: Strike, but let’s talk about something else. Look what I bought. [3][4][7]
Model: Cool! Can you show me what you bought? I’m good at identifying objects.
User: It’s on the screen now.
Model: That’s a bottle of tea tree oil. It’s a popular essential oil... [5]
User: Nope, I got to go. Thanks for talking to me.
Model: Sure thing. Have a good one!

• [1] Model Repetition: The model apologizes in a semantically similar way, an event captured by an increased Model Self-Similarity score.
• [2] Temporal Dynamics: The user explicitly expresses frustration with a delay, a subjective experience quantified by the Avg. Model Turn Duration feature.
• [3] Conversation Volatility: The user abruptly pivots the topic after a failure, creating a large semantic distance measured by features like Late Conversation Volatility.
• [4] Semantic Shift: This same turn also exemplifies a large semantic shift away from the model’s preceding turn, captured by Avg. User Distance from Model.
• [5] Goal Drift: The final topic is semantically unrelated to the user’s stated goal, measured by features like Conversation Drift from Goal.
• [6] Interaction - Mismatched Effort: The user provides a clear, consistent correction, but the model’s relevance continues to degrade. This interplay between high User Self-Consistency and a poor Trend in Model Relevance is a powerful interactive signature of user frustration.
• [7] Interaction - Expectation Violation: The conversation started well with a correct identification, but this abrupt, user-led topic pivot signals a complete breakdown. This combination of a low Initial Response Distance followed by high Conversation Volatility illustrates the “Broken Promise” effect discussed in our main results.
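An interaction signature like Trend in Model Relevance can be sketched as a least-squares slope over per-turn relevance scores. This is an assumed, illustrative definition (cosine relevance of each model reply to the preceding user turn, fit linearly over turn index), not the paper's exact formula.

```python
import numpy as np

def trend_in_model_relevance(user_emb, model_emb):
    """Slope of the model's per-turn relevance to the user turn it answers;
    a negative slope suggests responses are degrading over the dialogue.
    Illustrative definition: cosine relevance + least-squares linear fit."""
    U = np.asarray(user_emb, dtype=float)
    M = np.asarray(model_emb, dtype=float)
    rel = [float(u @ m / (np.linalg.norm(u) * np.linalg.norm(m)))
           for u, m in zip(U, M)]  # relevance of each model reply
    # Degree-1 polyfit over turn index; [0] is the slope coefficient.
    return float(np.polyfit(np.arange(len(rel)), rel, 1)[0])
```

Pairing a steep negative slope here with a high User Self-Consistency score is one way the "mismatched effort" signature above could be operationalized.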