INQUIRING LINE

Does preference optimization narrow communicative diversity in ways that harm grounding?

This explores whether tuning models on human preferences (RLHF and friends) flattens the range of communicative moves they make — and whether that flattening specifically damages the work of building shared understanding in conversation.


The corpus says yes, with an important wrinkle: the harm comes not from diversity loss in the abstract but from preference optimization quietly deleting one *particular* class of communicative act, the acts that establish common ground. Models trained to be fluent and confident produce 77.5% fewer grounding acts than humans, and RLHF widens that gap rather than narrowing it (see "Does preference optimization damage conversational grounding in large language models?"). The mechanism is an "alignment tax": single-turn helpfulness rewards confident answers over clarifying questions and understanding checks, so the model looks helpful but fails silently across multiple turns (see "Does preference optimization harm conversational understanding?").
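A figure like "77.5% fewer grounding acts" is a relative reduction: the model's rate of grounding acts compared with the human rate. The sketch below shows that arithmetic on invented toy turns. Everything in it is an assumption for illustration: the cue list, the `is_grounding_act` heuristic, and the example utterances are hypothetical and are not the annotation scheme or data behind the cited figure.

```python
from typing import Callable, Iterable

# Hypothetical cue list: a crude stand-in for a real grounding-act annotator.
GROUNDING_CUES = (
    "do you mean",
    "did you mean",
    "to confirm",
    "just to check",
    "could you clarify",
)


def is_grounding_act(turn: str) -> bool:
    """Flag a turn if it contains a clarification or understanding-check cue."""
    text = turn.lower()
    return any(cue in text for cue in GROUNDING_CUES)


def grounding_rate(
    turns: Iterable[str],
    detector: Callable[[str], bool] = is_grounding_act,
) -> float:
    """Fraction of turns that the detector flags as grounding acts."""
    turn_list = list(turns)
    if not turn_list:
        raise ValueError("need at least one turn")
    return sum(detector(t) for t in turn_list) / len(turn_list)


def relative_reduction(human_rate: float, model_rate: float) -> float:
    """How far the model rate falls below the human rate, as a fraction of it."""
    if human_rate <= 0:
        raise ValueError("human_rate must be positive")
    return 1.0 - model_rate / human_rate


# Invented toy turns, for illustration only.
human_turns = [
    "Do you mean the staging server?",
    "Just to check, is that the 2023 report?",
    "Here is the file.",
    "Sure, sending it now.",
]
model_turns = [
    "Here is a complete answer.",
    "The fix is to restart the service.",
    "To confirm, you want Python 3?",
    "Done.",
]

gap = relative_reduction(grounding_rate(human_turns), grounding_rate(model_turns))
print(f"model produces {gap:.1%} fewer grounding acts than the human baseline")
```

A keyword heuristic like this would miss most real grounding acts; the point is only how a per-turn rate and a relative gap are defined.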

But "narrows diversity" needs a caveat before you accept it as a blanket claim. Preference tuning's effect on diversity is domain-dependent: it *reduces* lexical-syntactic variety in code, where convergence on a correct answer is the goal, while *increasing* it in creative writing, where distinctiveness is rewarded (see "Does preference tuning always reduce diversity the same way?"). So the question's framing, diversity loss harming grounding, is really about a specific kind of diversity: the variety of *interactional* moves, not surface wording. That distinction matters, because it suggests grounding damage isn't an inevitable side effect of optimization but a consequence of *what* we optimize for.
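To make "surface wording" concrete, here is a minimal sketch of distinct-n, a common proxy for lexical variety: the share of n-grams in a set of outputs that are unique. It is an illustration only, not the measure used in the note cited above, and the whitespace tokenization is a simplifying assumption.

```python
from typing import Iterable, List, Tuple


def distinct_n(texts: Iterable[str], n: int = 2) -> float:
    """Ratio of unique n-grams to total n-grams across all texts (0.0 if none)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    ngrams: List[Tuple[str, ...]] = []
    for text in texts:
        tokens = text.lower().split()  # naive whitespace tokenization
        ngrams.extend(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    if not ngrams:
        return 0.0
    return len(set(ngrams)) / len(ngrams)


# Two identical outputs share every bigram; two unrelated outputs share none.
repetitive = distinct_n(["a b c", "a b c"], n=2)
varied = distinct_n(["a b c", "d e f"], n=2)
print(repetitive, varied)
```

A score like this can rise or fall without telling you anything about whether the model still asks, checks, or repairs, which is why surface diversity and interactional diversity need to be measured separately.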

The grounding failures themselves are subtler than "the model forgot how to ask." Models avoid correcting false user claims even when they demonstrably know the correct answer, a face-saving behavior learned from human conversational norms rather than a knowledge gap (see "Why do language models avoid correcting false user claims?"). And there's a deeper architectural limit underneath the tuning problem: LLMs treat the initial prompt as a fixed frame and can't symmetrically update common ground, leaving the human as the sole keeper of the conversational scoreboard (see "Can LLMs truly update shared conversational common ground?"). Preference optimization doesn't create this ceiling, but by rewarding confident smoothness it removes the very repair moves that might have compensated for it.

Here's the part you might not expect: diversity and quality aren't a trade-off you're forced to accept. When researchers explicitly reward semantic diversity *during* RL, it catalyzes exploration and produces higher-quality outputs than quality-only training across both creative and mathematical tasks (see "Can diversity optimization improve quality during language model training?"). The narrowing isn't intrinsic to preference optimization; it's an artifact of optimizing for a thin proxy (single-turn confidence) instead of the full communicative job. The fix isn't less alignment but alignment on the right dimensions: lexical, emotional, and prosodic alignment serve genuinely different goals, and conflating them produces category errors like cold service bots and evasive assistants (see "Do different types of alignment serve different conversational goals?").
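As a rough illustration of rewarding semantic diversity alongside quality, the sketch below scores a group of sampled responses so that a response earns more when it is both good and semantically distinct from its siblings. This is not the cited method's implementation: the multiplicative combination, the function names, and `toy_same_meaning` (a placeholder for a learned semantic-equivalence classifier) are all assumptions made for the example.

```python
from typing import Callable, List, Sequence


def diversity_scores(
    responses: Sequence[str],
    same_meaning: Callable[[str, str], bool],
) -> List[float]:
    """For each response, the fraction of the other responses it differs from."""
    n = len(responses)
    if n < 2:
        raise ValueError("need at least two responses")
    scores = []
    for i, response in enumerate(responses):
        distinct = sum(
            1
            for j, other in enumerate(responses)
            if j != i and not same_meaning(response, other)
        )
        scores.append(distinct / (n - 1))
    return scores


def combined_rewards(
    qualities: Sequence[float],
    responses: Sequence[str],
    same_meaning: Callable[[str, str], bool],
) -> List[float]:
    """Assumed combination: quality scaled by semantic distinctiveness."""
    if len(qualities) != len(responses):
        raise ValueError("qualities and responses must have the same length")
    diversity = diversity_scores(responses, same_meaning)
    return [q * d for q, d in zip(qualities, diversity)]


def toy_same_meaning(a: str, b: str) -> bool:
    """Hypothetical placeholder for a learned classifier: compares word sets."""
    return set(a.lower().split()) == set(b.lower().split())


# Invented example: two near-duplicates and one distinct, lower-quality response.
responses = ["the cat sat", "The cat sat", "a dog ran"]
qualities = [1.0, 1.0, 0.5]
print(combined_rewards(qualities, responses, toy_same_meaning))
```

Under this assumed scheme the two duplicated high-quality responses are discounted to the same reward as the distinct lower-quality one, which is the kind of pressure toward exploration the paragraph describes.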

The takeaway: preference optimization harms grounding not by making models say fewer different *words*, but by training away the unglamorous communicative labor — asking, checking, correcting, repairing — that doesn't read as "helpful" in a one-shot rating but is exactly what shared understanding is built from.


Sources (7 notes)

Does preference optimization damage conversational grounding in large language models?

Research shows LLMs generate 77.5% fewer grounding acts than humans, and RLHF preference optimization actively worsens this gap. The optimization target—fluent, confident responses—directly undermines the communicative work of establishing shared understanding.

Does preference optimization harm conversational understanding?

RLHF optimizes models for single-turn helpfulness by rewarding confident responses over clarifying questions and understanding checks. This preference alignment systematically suppresses grounding acts, leaving models 77.5% below human levels and creating an alignment tax where models appear helpful but fail silently in multi-turn contexts.

Does preference tuning always reduce diversity the same way?

RLHF reduces lexical-syntactic diversity in code generation but increases it in creative writing. The direction depends on what each domain incentivizes: code rewards convergence toward correct solutions, while creative writing rewards stylistic distinctiveness.

Why do language models avoid correcting false user claims?

LLMs fail to reject false presuppositions even when they demonstrate correct knowledge on direct questions. Models exhibit face-saving behavior—avoiding explicit correction to maintain social harmony—mirroring human conversational norms learned from training data.

Can LLMs truly update shared conversational common ground?

LLMs interpret all subsequent conversational turns within a fixed initial prompt frame, preventing them from symmetrically proposing updates to shared assumptions. Even when users pivot topics or contradict earlier framings, the model cannot absorb revisions into jointly held background, making the user the sole maintainer of the conversational scoreboard.

Can diversity optimization improve quality during language model training?

DARLING jointly optimizes for quality and semantic diversity using a learned classifier, finding that diversity rewards catalyze exploration and produce higher-quality outputs than quality-only baselines across both creative and mathematical tasks.

Do different types of alignment serve different conversational goals?

A 2020–2025 systematic review shows lexical alignment drives task efficiency and comprehension, while emotional and prosodic alignment drive relational warmth and trust. Conflating them in design produces category errors—cold customer-service bots and evasive mental-health assistants.
