User Feedback in Human-LLM Dialogues: A Lens to Understand Users But Noisy as a Learning Signal

Paper · arXiv 2507.23158 · Published July 30, 2025
Reinforcement Learning · Conversation Agents · Evaluations

Once language models (LMs) are deployed, they can interact with users over the long term, ideally evolving continuously based on user feedback. Asking for direct user feedback can be disruptive; we therefore study harvesting implicit user feedback from user-LM interaction logs, using two datasets (WildChat and LMSYS). First, we analyze user feedback within user-LLM conversation trajectories, providing insights into when and why such feedback occurs. Second, we study harvesting learning signals from this implicit feedback. We find that the content of user feedback (e.g., the user wanted clarification), not just its polarity (e.g., the user was unhappy with the previous model response), can improve model performance on short, human-designed questions (MTBench) but not on longer, more complex questions (WildBench). We also find that the usefulness of user feedback is largely tied to the quality of the user’s initial prompt.

Specifically, Don-Yehiya et al. (2024) classify feedback into two broad categories (positive and negative) and train models to promote responses that elicited positive feedback and to suppress responses that elicited negative feedback. While simple and intuitive, our study finds that this approach can lead to model degradation.
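
As a concrete illustration, here is a minimal sketch of how polarity-based preference pairs might be assembled from interaction logs; the data layout and helper names are our own assumptions, not the paper’s implementation.

```python
# Hypothetical sketch: turning feedback polarity into preference pairs for
# preference-style training. Field and function names are illustrative.
from dataclasses import dataclass

@dataclass
class PreferencePair:
    prompt: str
    chosen: str      # response that elicited positive feedback
    rejected: str    # response that elicited negative feedback

def build_pairs(turns):
    """Group turns by prompt and pair positively- and negatively-received
    responses to the same prompt. Each turn is assumed to look like
    {"prompt": ..., "response": ..., "feedback": "pos" or "neg"}."""
    by_prompt = {}
    for t in turns:
        bucket = by_prompt.setdefault(t["prompt"], {"pos": [], "neg": []})
        bucket[t["feedback"]].append(t["response"])
    pairs = []
    for prompt, groups in by_prompt.items():
        for chosen in groups["pos"]:
            for rejected in groups["neg"]:
                pairs.append(PreferencePair(prompt, chosen, rejected))
    return pairs
```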

They further divide negative feedback into the following four categories (a classification sketch follows the list):

• Rephrasing, where the user rephrased their prior request to try to elicit a better LLM response.

• Make Aware without Correction, where the user’s response simply indicates that the model’s prior response was wrong.

• Make Aware with Correction, where the user’s response additionally provides instruction on how to correct the model’s prior response.

• Ask for Clarification, where the user asks the LLM to provide additional information that was missing from its prior response.
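
To illustrate how a user turn might be labeled with one of these five types, here is a hedged sketch of an LLM-based classifier; the prompt wording and the `complete` callable are assumptions, not the pipeline used in the paper.

```python
# Hedged sketch of an LLM-based classifier over the five feedback types above.
# `complete` is any text-completion function, e.g. a wrapper around an LLM API.
from enum import Enum
from typing import Callable, Optional

class FeedbackType(Enum):
    REPHRASING = "Rephrasing"
    AWARE_WITHOUT_CORRECTION = "Make Aware without Correction"
    AWARE_WITH_CORRECTION = "Make Aware with Correction"
    ASK_FOR_CLARIFICATION = "Ask for Clarification"
    POSITIVE = "Positive"

PROMPT = """Given the assistant's response and the user's follow-up turn,
label the follow-up with exactly one of:
{labels}

Assistant response: {response}
User follow-up: {follow_up}
Label:"""

def classify_feedback(response: str, follow_up: str,
                      complete: Callable[[str], str]) -> Optional[FeedbackType]:
    labels = "\n".join(f"- {t.value}" for t in FeedbackType)
    raw = complete(PROMPT.format(labels=labels, response=response,
                                 follow_up=follow_up))
    # Return the first type whose label appears in the model's output, if any.
    return next((t for t in FeedbackType if t.value.lower() in raw.lower()), None)
```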

We follow their ontology of feedback types in this work. Other relevant works present alternative ontologies for user responses, such as one focusing on grounding acts (Shaikh et al., 2025) and others focusing on human-AI collaboration (Lee et al., 2022; Chang et al., 2025). Most relevant here, Shaikh et al. (2025) introduce seven categories of user responses, five of which can be mapped to the five feedback types from Don-Yehiya et al. (2024); for example, “Reformulations” maps to our “Rephrasing” category. The remaining two categories, “Next Turns” and “Follow-ups”, do not correspond to feedback types.

Impact of Model Refusals One potential reason for negative feedback is the model’s refusal to fulfill the user’s request. To investigate this, we examine how frequently models refuse to fulfill user requests and whether such refusals lead to negative feedback. We sampled 1K conversation turns from six groups formed by crossing feedback condition (negative, random, positive) with dataset (LMSYS, WildChat). We then cluster the text embeddings of model responses to identify clusters that exhibit refusal behavior.
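
A minimal sketch of this embed-and-cluster step follows, assuming a sentence-transformers embedder and k-means; the specific embedding model, number of clusters, and inspection procedure are assumptions, not details from the paper.

```python
# Minimal sketch: embed model responses, cluster them, and inspect a few
# members per cluster to spot refusal clusters by hand.
from sentence_transformers import SentenceTransformer
from sklearn.cluster import KMeans

def find_refusal_clusters(responses, n_clusters=20, n_examples=5):
    embedder = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedder
    embeddings = embedder.encode(responses)
    labels = KMeans(n_clusters=n_clusters, n_init="auto").fit_predict(embeddings)
    # Print a few responses per cluster so refusal clusters (e.g. "I'm sorry,
    # but I can't help with that") can be identified by inspection.
    for c in range(n_clusters):
        members = [r for r, l in zip(responses, labels) if l == c]
        print(f"--- cluster {c} ({len(members)} responses) ---")
        for r in members[:n_examples]:
            print(r[:120])
```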

We find that model refusals are uncommon across all settings, consistently accounting for less than 3% of responses.

Adding feedback semantics doesn’t help. We now examine whether adding feedback semantics helps over the baseline of regenerating from scratch. M_sem shows a slightly higher win rate against the original response (89%) than M_scra (81%), but this pattern is not observed on the WildChat dataset. Moreover, when comparing the two new answers directly (4th row), we find that the answer generated with the feedback content (M_sem) does not win over the answer generated from scratch (M_scra), even in the Eval w/ fb setting (48%), and its win rate is substantially lower in the Eval w/o fb setting (19%).
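
For illustration, here is a hedged sketch of the two regeneration conditions (M_scra vs. M_sem) and the pairwise win-rate comparison; the prompt templates and the `generate` / `judge` callables are assumptions, not the paper’s exact setup.

```python
# Hedged sketch of the two regeneration conditions and the pairwise comparison.
# `generate` maps a prompt to a response; `judge(a, b)` returns True if a wins.

def regenerate_scratch(prompt, generate):
    """M_scra: regenerate using only the original user prompt."""
    return generate(prompt)

def regenerate_with_feedback(prompt, old_response, feedback, generate):
    """M_sem: regenerate using the prompt plus the content of the user's
    feedback on the previous response."""
    return generate(
        f"{prompt}\n\nA previous answer was:\n{old_response}\n\n"
        f"The user gave this feedback: {feedback}\n"
        "Write an improved answer that addresses the feedback."
    )

def win_rate(pairs, judge):
    """Fraction of pairs (a, b) where `judge` prefers a over b. In the
    'Eval w/ fb' setting the judge would also be shown the user feedback."""
    return sum(judge(a, b) for a, b in pairs) / len(pairs)
```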