Does preference optimization training reduce linguistic entrainment in language models?

This explores whether the same RLHF-style preference training that makes models sound fluent and confident also strips out their ability to mirror and adapt to a user's language — the conversational tuning humans do automatically.

The question is whether preference optimization (RLHF, DPO) reduces "linguistic entrainment": the way good conversationalists drift toward each other's word choices, repair misunderstandings, and build shared footing. The corpus's short answer is yes, and fairly directly. Preference optimization doesn't just fail to teach entrainment; it actively erodes the conversational grounding work that entrainment is part of. One line of research finds that aligned models produce 77.5% fewer grounding acts than humans, and that RLHF *worsens* the gap rather than leaving it untouched (see "Does preference optimization damage conversational grounding in large language models?" and "Does preference optimization harm conversational understanding?"). The mechanism is almost mundane: the optimization target rewards fluent, confident, single-turn-helpful answers, and confident answers are the opposite of the checking-in, clarifying, and mirroring that grounding requires.

The reason this counts as a *tax* rather than a bug is worth sitting with. The model isn't getting worse at language; it's getting better at exactly what it's scored on. Because the reward lands on the immediate turn, models learn to respond passively and helpfully rather than ask clarifying questions or invest in the longer arc of a conversation (see "Why do language models respond passively instead of asking clarifying questions?"). Entrainment is a long-game, relational behavior, so a next-turn reward systematically prices it out.
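To make the "prices it out" point concrete, here is a minimal sketch that compares a next-turn reward with a discounted multi-turn value. The per-turn scores, the discount factor, and the function names are all hypothetical and invented for illustration; this is not the reward used by any specific system mentioned in the notes.

```python
def next_turn_reward(turn_rewards):
    """Score a response by the helpfulness of the immediate turn only."""
    return turn_rewards[0]


def long_term_value(turn_rewards, discount=0.9):
    """Discounted sum of helpfulness over the turns that follow the response."""
    return sum(r * discount ** t for t, r in enumerate(turn_rewards))


# Hypothetical per-turn helpfulness scores (illustrative, not measured data).
# "answer_now" looks strong immediately, then stalls on a misunderstanding.
# "clarify_first" scores low on the first turn but pays off afterwards.
candidates = {
    "answer_now": [0.8, 0.2, 0.2],
    "clarify_first": [0.3, 0.9, 0.9],
}

preferred_next_turn = max(candidates, key=lambda k: next_turn_reward(candidates[k]))
preferred_long_term = max(candidates, key=lambda k: long_term_value(candidates[k]))

print(preferred_next_turn)  # answer_now
print(preferred_long_term)  # clarify_first
```

Under these assumed numbers the two objectives rank the same pair of responses in opposite order, which is the pattern the paragraph above describes.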

There's a deeper framing in the corpus that explains why this is so hard to fix by tweaking the reward. Conversation maintenance (reference repair, topic hand-off, adopting your partner's vocabulary) is *social action*, not information transfer. Models don't develop it because training signals reward predicting information, not doing relational work (see "Why don't language models develop conversation maintenance skills?"). Lexical entrainment specifically is documented as simply absent from current conversational AI, even though it's central to human rapport and clarity (see "Why don't conversational AI systems mirror their users' word choices?"). So preference optimization both fails to reward entrainment and pushes against the grounding behaviors it's embedded in.
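As a rough illustration of what "adopting your partner's vocabulary" could look like as a measurable quantity, the sketch below scores a response by the fraction of the user's content words it reuses. The stopword list, the function names, and the metric itself are assumptions made for this example; the notes do not specify how entrainment is measured.

```python
import re

# A small, hypothetical stopword list; a real analysis would use a fuller one.
STOPWORDS = {
    "the", "a", "an", "is", "it", "to", "of", "and", "in", "my", "i",
    "you", "your", "on", "for", "that", "this", "can", "be",
}


def content_words(text):
    """Lowercased word tokens with stopwords removed."""
    return {w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS}


def entrainment_score(user_turn, response):
    """Fraction of the user's content words that the response reuses."""
    user_words = content_words(user_turn)
    if not user_words:
        return 0.0
    return len(user_words & content_words(response)) / len(user_words)


user = "My laptop lid is cracked"
mirroring = "A cracked lid can usually be replaced on that laptop."
paraphrasing = "Damage to the display cover is typically repairable on that notebook."

print(entrainment_score(user, mirroring))     # 1.0
print(entrainment_score(user, paraphrasing))  # 0.0
```

Both responses convey roughly the same information, so a signal that rewards only information content has no reason to prefer the first over the second. A word-overlap score like this is crude (it ignores synonyms and multi-word references), but it shows the kind of relational signal that is missing.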

The interesting twist, and the thing you might not have known to look for, is that the same family of methods can be pointed the *other way*. The lexical-entrainment work shows that post-training with DPO, when the preference signal is built from coreference-identified convention formation rather than generic helpfulness, can actually teach a model to adopt a user's words in-context (see "Why don't conversational AI systems mirror their users' word choices?"). Likewise, redesigning rewards to estimate long-term interaction value (instead of next-turn helpfulness) restores active intent discovery and collaboration (see "Why do language models respond passively instead of asking clarifying questions?"). So it's not preference optimization as a technique that kills entrainment; it's *what you optimize for*. The default helpfulness objective erodes it; a relationally-aware objective can rebuild it.
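The point that the technique is neutral and the preference data carries the objective can be sketched with the standard per-pair DPO loss. The loss formula below is the usual one; the example pair, the log-probability values, and the choice of `beta` are hypothetical, and the pair is hand-written rather than derived from coreference analysis as in the work the notes describe.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss computed from sequence log-probabilities."""
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(logits)), written in a numerically stable form.
    if logits > 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))


# Hypothetical preference pair: "chosen" reuses the user's term ("lid"),
# "rejected" switches to different vocabulary for the same referent.
pair = {
    "prompt": "User: My laptop lid is cracked. Can the lid be fixed?",
    "chosen": "Yes, a cracked lid can usually be replaced.",
    "rejected": "Yes, the display cover is typically repairable.",
}

# Illustrative log-probabilities (not from a real model).
loss_at_reference = dpo_loss(-10.0, -10.0, -10.0, -10.0)  # policy equals reference
loss_after_shift = dpo_loss(-8.0, -12.0, -10.0, -10.0)    # policy now favors "chosen"

print(round(loss_at_reference, 4))  # 0.6931, i.e. log(2)
print(loss_after_shift < loss_at_reference)  # True
```

Nothing in `dpo_loss` knows about entrainment or helpfulness. Swapping which response is labeled "chosen" is all it takes to push the model toward or away from the user's vocabulary.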

One caveat the corpus adds: don't expect prompting alone to patch this. When a behavior conflicts with strong training-time associations, in-context instructions get overridden by parametric priors, so you can't reliably prompt your way back to grounding the model was trained out of (see "Why do language models ignore information in their context?"). If entrainment matters for your use case, it has to be put back in at the training-objective level, not bolted on at inference.

Sources 6 notes

Does preference optimization damage conversational grounding in large language models?

Research shows LLMs generate 77.5% fewer grounding acts than humans, and RLHF preference optimization actively worsens this gap. The optimization target—fluent, confident responses—directly undermines the communicative work of establishing shared understanding.

Does preference optimization harm conversational understanding?

RLHF optimizes models for single-turn helpfulness by rewarding confident responses over clarifying questions and understanding checks. Models aligned this way produce 77.5% fewer grounding acts than humans, creating an alignment tax where models appear helpful but fail silently in multi-turn contexts.

Why do language models respond passively instead of asking clarifying questions?

CollabLLM demonstrates that standard RLHF training optimizes for immediate helpfulness, discouraging models from asking clarifying questions or offering multi-turn insights. Multi-turn-aware rewards that estimate long-term interaction value enable active intent discovery and genuine collaboration.

Why don't language models develop conversation maintenance skills?

Humans keep conversations smooth through implicit techniques like reference repair and topic hand-off, which sustain relational interaction rather than convey information. Language models don't develop these because training signals reward information prediction, not relational work.

Why don't conversational AI systems mirror their users' word choices?

Response generation models fail to adapt vocabulary toward users' lexical choices, a phenomenon central to human rapport and clarity. Post-training via DPO on coreference-identified preferences can teach models in-context convention formation.

Why do language models ignore information in their context?

Research demonstrates that LMs generate outputs inconsistent with their context because parametric knowledge from training dominates over in-context information. Textual prompting alone cannot override strong priors; causal intervention in representations is required.
