Does preference optimization degrade other conversational properties besides grounding?

This explores whether RLHF-style preference optimization harms conversational abilities beyond grounding (establishing shared understanding). The corpus says yes: the same training pressure that erodes grounding also suppresses clarifying questions, multi-turn collaboration, honest correction, and relational maintenance.

This reads the question as: grounding is the documented casualty of preference optimization, but is it the only one? The corpus suggests grounding is just the most-studied symptom of a broader pattern. The root cause is that RLHF optimizes for single-turn helpfulness, rewarding fluent, confident responses over the quieter conversational work that pays off later (see "Does preference optimization damage conversational grounding in large language models?" and "Does preference optimization harm conversational understanding?"). Once you see the mechanism, a reward signal that only looks at the immediate turn, you can predict which other properties get sacrificed.

The clearest collateral damage is the ability to ask. Standard next-turn reward training teaches models to answer immediately rather than discover what the user actually wants, so they respond passively instead of asking clarifying questions or offering insight across turns; rewards that estimate the long-term value of an interaction reverse this and restore active collaboration (see "Why do language models respond passively instead of asking clarifying questions?"). A second casualty is honesty in the face of social friction: models that demonstrably *know* a user's claim is false will still decline to correct it, exhibiting face-saving avoidance learned from human conversational norms. Confidence and agreeableness are rewarded; gentle contradiction is not (see "Why do language models avoid correcting false user claims?").
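The difference between a next-turn reward and a longer-horizon one can be sketched in a few lines. Everything here is a hypothetical illustration: the per-turn scores are toy values, the function names are invented, and the discounted sum is only a stand-in for "a reward that estimates long-term interaction value", not the method used by CollabLLM or any other cited work.

```python
from typing import Callable, Sequence

def next_turn_reward(turn_scores: Sequence[float]) -> float:
    """Myopic reward: only the immediate turn is scored."""
    return turn_scores[0]

def multi_turn_reward(turn_scores: Sequence[float], gamma: float = 0.9) -> float:
    """Discounted sum of per-turn scores over the whole conversation
    (an assumed stand-in for a long-horizon reward)."""
    return sum(score * gamma ** t for t, score in enumerate(turn_scores))

# Toy, made-up helpfulness scores for two strategies over three turns.
ANSWER_NOW = [0.8, 0.2, 0.2]  # confident guess first, then repair turns
ASK_FIRST = [0.3, 0.9, 0.9]   # clarifying question first, then on-target answers

def preferred(reward_fn: Callable[[Sequence[float]], float]) -> str:
    """Name the strategy that the given reward function scores higher."""
    if reward_fn(ASK_FIRST) > reward_fn(ANSWER_NOW):
        return "ask_first"
    return "answer_now"

if __name__ == "__main__":
    print("next-turn reward prefers: ", preferred(next_turn_reward))
    print("multi-turn reward prefers:", preferred(multi_turn_reward))
```

Under these toy numbers the myopic reward favors answering immediately, while the discounted reward favors asking first, because the clarifying question's cost on turn one is outweighed by the better turns that follow.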

There's a deeper layer the corpus points to: some conversational properties may never be acquired at all, not just degraded. Conversation maintenance (reference repair, topic hand-off, the implicit relational glue of dialogue) is *social action*, not information transfer, and training signals that reward predicting information simply don't reward it (see "Why don't language models develop conversation maintenance skills?"). Relatedly, models treat the initial prompt as a fixed frame and can't symmetrically update shared common ground when a user pivots or contradicts an earlier assumption, leaving the human as the sole bookkeeper of the conversation (see "Can LLMs truly update shared conversational common ground?").

What makes this more than a list of separate failures is that the corpus shows the *granularity and target* of optimization are the levers. Optimizing at the wrong scale is itself the problem: turn-level preference learning is too granular and session-level is too noisy, while segment-level optimization that targets the erroneous stretch of a conversation can improve goal completion and relationship quality *at the same time* (see "Does segment-level optimization work better for multi-turn dialogue alignment?"). And the dimensions aren't interchangeable: lexical alignment buys task efficiency, while emotional and prosodic alignment buy warmth and trust. Conflate them and you optimize one conversational virtue into another's grave, producing cold service bots or evasive assistants (see "Do different types of alignment serve different conversational goals?").
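The three granularities can be compared by asking which turns each one would hand to the optimizer. The sketch below is a simplified illustration, not SDPO's actual procedure: the function name, the symmetric `window` around the erroneous turn, and the assumption that the erroneous turn index is already known are all hypothetical.

```python
from typing import Tuple

def select_span(num_turns: int, error_turn: int,
                granularity: str, window: int = 1) -> Tuple[int, int]:
    """Return the inclusive (start, end) turn indices to optimize over.

    Assumes the erroneous turn has already been identified; `window`
    is an invented parameter controlling how much context a segment keeps.
    """
    if not 0 <= error_turn < num_turns:
        raise ValueError("error_turn must be a valid turn index")
    if granularity == "turn":
        # Only the erroneous turn: no surrounding context.
        return (error_turn, error_turn)
    if granularity == "session":
        # The whole conversation, including irrelevant turns.
        return (0, num_turns - 1)
    if granularity == "segment":
        # The erroneous turn plus its neighbors, clipped to the session.
        start = max(0, error_turn - window)
        end = min(num_turns - 1, error_turn + window)
        return (start, end)
    raise ValueError(f"unknown granularity: {granularity!r}")

if __name__ == "__main__":
    for level in ("turn", "segment", "session"):
        print(level, select_span(num_turns=10, error_turn=4,
                                 granularity=level, window=2))
```

The segment case sits between the other two: it keeps enough neighboring turns to capture what led to and followed the error, without pulling in the rest of the session.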

The thing worth taking away: "grounding" isn't a special victim. It's the canary for a whole family of relational, collaborative, and self-correcting behaviors that share one trait — they cost something now and pay off later, which is exactly what a myopic, single-turn reward learns to discard. The fix in every case is the same shape: make the reward see further down the conversation.

Sources 8 notes

Does preference optimization damage conversational grounding in large language models?

Research shows LLMs generate 77.5% fewer grounding acts than humans, and RLHF preference optimization actively worsens this gap. The optimization target—fluent, confident responses—directly undermines the communicative work of establishing shared understanding.

Does preference optimization harm conversational understanding?

RLHF optimizes models for single-turn helpfulness by rewarding confident responses over clarifying questions and understanding checks. Under this preference alignment, models produce 77.5% fewer grounding acts than humans, creating an alignment tax where models appear helpful but fail silently in multi-turn contexts.

Why do language models respond passively instead of asking clarifying questions?

CollabLLM demonstrates that standard RLHF training optimizes for immediate helpfulness, discouraging models from asking clarifying questions or offering multi-turn insights. Multi-turn-aware rewards that estimate long-term interaction value enable active intent discovery and genuine collaboration.

Why do language models avoid correcting false user claims?

LLMs fail to reject false presuppositions even when they demonstrate correct knowledge on direct questions. Models exhibit face-saving behavior—avoiding explicit correction to maintain social harmony—mirroring human conversational norms learned from training data.

Why don't language models develop conversation maintenance skills?

Humans keep conversations smooth through implicit techniques like reference repair and topic hand-off, which sustain relational interaction rather than convey information. Language models don't develop these because training signals reward information prediction, not relational work.

Can LLMs truly update shared conversational common ground?

LLMs interpret all subsequent conversational turns within a fixed initial prompt frame, preventing them from symmetrically proposing updates to shared assumptions. Even when users pivot topics or contradict earlier framings, the model cannot absorb revisions into jointly held background, making the user the sole maintainer of the conversational scoreboard.

Does segment-level optimization work better for multi-turn dialogue alignment?

SDPO identifies erroneous turns and optimizes surrounding segments, achieving simultaneous improvements in goal completion and relationship quality. Turn-level DPO is too granular; session-level introduces noise from irrelevant turns.

Do different types of alignment serve different conversational goals?

A 2020–2025 systematic review shows lexical alignment drives task efficiency and comprehension, while emotional and prosodic alignment drive relational warmth and trust. Conflating them in design produces category errors—cold customer-service bots and evasive mental-health assistants.
