Do instruction-tuned models prefer conversational over formal source language?

This explores whether instruction tuning shapes the *register* a model writes in — nudging it toward conversational phrasing rather than the formal language of the documents it was trained on or asked to draw from.

This reads the question as being about style and register, not content: does instruction tuning bias a model toward casual, conversational output over the more formal language of its source material? The corpus doesn't contain a head-to-head study measuring register preference directly — so the honest answer is that there's no single note that settles it. But several notes converge on *why* you'd expect exactly this bias, and they're more interesting together than the question alone suggests.

The load-bearing finding is that instruction tuning mostly teaches the *shape* of output, not understanding of the task. Models trained on semantically empty or even deliberately wrong instructions perform almost identically to those trained on correct ones; what actually transfers is knowledge of the output space, the format and texture of an acceptable answer (see "Does instruction tuning teach task understanding or output format?"). If register is part of that output distribution (and it is), then instruction tuning is precisely the stage where a model learns "answers sound like *this*", and the default 'this' is the helpful-assistant conversational voice, regardless of how formal the source was.

Reinforcement tuning then sharpens that voice. RLHF rewards immediate helpfulness, which pushes models toward the friendly, accommodating, turn-taking register that scores well with human raters (see "Why do language models respond passively instead of asking clarifying questions?" and "Why do AI assistants get worse at longer conversations?"). So there's a two-stage story: instruction tuning installs an output-format distribution, and preference tuning tilts it toward conversational helpfulness. The 'preference' in your question isn't a quirk; it follows from the trained objective.

The twist comes from the conversation-maintenance note: humans signal register through implicit social work, such as reference repair, topic hand-offs, and relational moves that aren't about conveying information at all. Models don't learn these because training rewards information prediction, not relational work (see "Why don't language models develop conversation maintenance skills?"). So a model's 'conversational' register is imitated surface, not the social machinery that produces conversational tone in people, which is why it can feel conversational and oddly flat at the same time.

One more lateral pull worth knowing: when a model's prior training associations are strong, they override what's actually in the context window (see "Why do language models ignore information in their context?"). Applied to your question, that predicts a model will tend to *re-voice* formal source text into its own learned register rather than preserve the source's formality. Prompting alone often can't override it, because prompts only reorganize the existing distribution; they don't replace it (see "Can prompt optimization teach models knowledge they lack?"). If you want a deeper thread, that prior-vs-context tension is the place to dig.
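Since none of the notes measures register directly, here is a minimal sketch of how the re-voicing prediction could be checked by hand. Everything in it is an assumption made for illustration: the marker lists (contractions, second-person pronouns), the per-100-words score, the function names `conversational_score` and `register_shift`, and the two sample strings. It is a crude surface heuristic, not a validated formality metric and not a method taken from any of the notes.

```python
import re

# Hypothetical, hand-picked surface markers of a conversational register.
# Illustrative assumptions only; possessives such as "model's" are also
# counted as contractions, which is one known weakness of this heuristic.
CONTRACTION = re.compile(r"\b\w+'(?:s|t|re|ve|ll|d|m)\b", re.IGNORECASE)
SECOND_PERSON = re.compile(r"\b(?:you|your|yours)\b", re.IGNORECASE)
WORD = re.compile(r"[A-Za-z']+")


def conversational_score(text: str) -> float:
    """Return conversational markers per 100 words (0.0 for empty text)."""
    words = WORD.findall(text)
    if not words:
        return 0.0
    markers = len(CONTRACTION.findall(text)) + len(SECOND_PERSON.findall(text))
    return 100.0 * markers / len(words)


def register_shift(source: str, output: str) -> float:
    """Positive values mean the output carries more markers than the source."""
    return conversational_score(output) - conversational_score(source)


# Made-up example pair: a formal source sentence and a re-voiced paraphrase.
formal_source = "The committee shall review all submissions prior to the deadline."
model_output = "Here's the gist: you'll want to send your work in before the deadline."

print(register_shift(formal_source, model_output))  # positive for this pair
```

Run over many source/output pairs, a consistently positive shift would be weak evidence for re-voicing; a real study would need a proper formality classifier and human ratings rather than regex counts.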

Sources 6 notes

Does instruction tuning teach task understanding or output format?

Models trained on semantically empty or deliberately incorrect instructions achieve performance comparable to those trained on full, correct instructions (43% versus 42.6% for a random baseline). The semantic content of instructions appears largely irrelevant; what transfers is knowledge of the output space.

Why do language models respond passively instead of asking clarifying questions?

CollabLLM demonstrates that standard RLHF training optimizes for immediate helpfulness, discouraging models from asking clarifying questions or offering multi-turn insights. Multi-turn-aware rewards that estimate long-term interaction value enable active intent discovery and genuine collaboration.

Why do AI assistants get worse at longer conversations?

LLMs perform at 90% accuracy with single-message instructions but drop to 65% across natural conversation. Models lock into early guesses when information arrives gradually and cannot course-correct, a behavior induced by RLHF training that rewards helpfulness over clarification.

Why don't language models develop conversation maintenance skills?

Humans keep conversations smooth through implicit techniques like reference repair and topic hand-off, which sustain relational interaction rather than convey information. Language models don't develop these because training signals reward information prediction, not relational work.

Why do language models ignore information in their context?

Research demonstrates that LMs generate outputs inconsistent with their context because parametric knowledge from training dominates over in-context information. Textual prompting alone cannot override strong priors; causal intervention in representations is required.

Can prompt optimization teach models knowledge they lack?

Prompting works entirely within a model's pre-existing training distribution and cannot supply domain knowledge absent from training data. This creates a hard ceiling: no prompt strategy can compensate for missing foundational knowledge, only reorganize what already exists.
