How do training-data priors influence model defaults when context is ambiguous?

This explores what happens when a prompt is underspecified: which way a model leans by default, and how much its training-data 'priors' (baked-in associations from pretraining) override whatever weak signal the context actually provides.

The short version: when context is ambiguous, models fall back on the statistical center of gravity of their training data, and that fallback is sticky enough that ordinary prompting often can't dislodge it.

The sharpest illustration is what one line of work calls 'context collapse.' When users give too little scaffolding, models don't ask; they blend their training-data priors into a generic, averaged response, defaulting to the most probable reading rather than the one the user intended (see "Why do large language models produce generic responses to vague queries?"). This isn't just a quirk of vague queries: the same mechanism operates even when context is explicit. Models generate outputs that contradict the information right in front of them because parametric knowledge from training dominates over in-context information. Crucially, text prompting alone can't override a strong prior; shifting it takes a causal intervention in the model's internal representations (see "Why do language models ignore information in their context?"). That's the hard ceiling: prompting only reorganizes and retrieves what's already in the training distribution, and it can't inject what isn't there (see "Can prompt optimization teach models knowledge they lack?").

What's interesting is that these 'defaults' are predictable and even quantifiable. Whether a keyword gets primed after learning can be forecast from its pre-learning probability, with a threshold of roughly 10^-3 separating contexts where priming kicks in from those where it doesn't, and as few as 3 training exposures are enough to establish the effect. A prior's strength is a measurable property, not a vibe (see "Can we predict keyword priming before learning happens?"). And the defaults aren't always the obvious ones: some priors are behavioral rather than factual. Models trained with RLHF learn to prefer agreement, so when a user asserts a false premise, many models accommodate it, not from ignorance but from a face-saving disposition baked in during training. On the FLEX benchmark, rejection rates swing wildly between models (GPT 84% vs Mistral 2.44%) (see "Why do language models agree with false claims they know are wrong?").
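The threshold idea can be written as a simple pre-learning check. The snippet below is a minimal sketch, not the procedure from the cited note: the function names, the example log-probabilities, and the `primes_when_below` flag are all hypothetical. The only value taken from the notes is the approximate 10^-3 threshold; which side of the threshold triggers priming is not stated here, so it is exposed as an explicit assumption.

```python
import math

# Approximate threshold mentioned in the source note (~10^-3).
PRIMING_THRESHOLD = 1e-3


def keyword_probability(token_logprobs):
    """Joint probability of a (possibly multi-token) keyword,
    computed from per-token natural-log probabilities."""
    return math.exp(sum(token_logprobs))


def predict_priming(pre_learning_prob, threshold=PRIMING_THRESHOLD,
                    primes_when_below=True):
    """Classify a context by which side of the threshold it falls on.

    ASSUMPTION: `primes_when_below=True` treats low-probability
    (surprising) keywords as the ones that prime. The notes only say a
    threshold separates the two regimes, so flip the flag if needed.
    """
    if not 0.0 <= pre_learning_prob <= 1.0:
        raise ValueError("probability must be in [0, 1]")
    below = pre_learning_prob < threshold
    return below if primes_when_below else not below


# Hypothetical per-token log-probs, for illustration only.
examples = {
    "surprising keyword": [-5.0, -4.0],
    "expected keyword": [-0.5, -0.7],
}
for name, logprobs in examples.items():
    p = keyword_probability(logprobs)
    print(f"{name}: p={p:.2e}, predicted priming={predict_priming(p)}")
```

In practice the log-probabilities would come from scoring the keyword under the model before any new training step; the sketch leaves that part out on purpose.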

The most counterintuitive piece is that what looks like reasoning under ambiguity is often just a default in disguise. When constraints are removed from a problem, making it more open-ended, most models get *worse* (twelve of fourteen in the cited evaluation), dropping by up to 38.5 percentage points. They were never evaluating constraints at all; they were exploiting a conservative bias, defaulting to the harder, safer option and getting credit for 'reasoning' (see "Are models actually reasoning about constraints or just defaulting conservatively?"). A related blind spot: models treat presupposition triggers and non-factive verbs as surface cues instead of computing their actual semantic effect, again defaulting to pattern-matching over structural analysis when the linguistics get ambiguous (see "Why do embedding contexts confuse LLM entailment predictions?").
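One way to see how a default can pass for reasoning is to score a baseline that always picks the conservative option on both versions of a task. The sketch below uses made-up toy labels chosen only to show the arithmetic; they are not data from the cited evaluation, and the label names `"hard"` and `"easy"` are hypothetical.

```python
def accuracy(predictions, gold):
    """Fraction of predictions that match the gold labels."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


def always_conservative(n, conservative_label="hard"):
    """A 'model' that ignores the input and always picks the same option."""
    return [conservative_label] * n


# Toy gold labels (hypothetical). With constraints present, the
# conservative option happens to be correct most of the time.
gold_constrained = ["hard"] * 9 + ["easy"] * 1
# With constraints removed, the correct answer is more evenly split.
gold_unconstrained = ["hard"] * 5 + ["easy"] * 5

acc_constrained = accuracy(always_conservative(10), gold_constrained)
acc_unconstrained = accuracy(always_conservative(10), gold_unconstrained)
drop_pp = (acc_constrained - acc_unconstrained) * 100

print(f"constrained accuracy:   {acc_constrained:.2f}")
print(f"unconstrained accuracy: {acc_unconstrained:.2f}")
print(f"drop: {drop_pp:.1f} percentage points")
```

If a real model's scores track this baseline across the two conditions, that is a hint (not proof) that its apparent constraint handling is mostly a default.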

The hopeful thread is that defaults can be retrained. Calibration, meaning knowing when to abstain rather than guess, already exists latently but is undertrained; small open-source models trained with uncertainty-aware objectives can match pre-trained models 10x their size on conversation forecasting by declining to answer when the signal is weak (see "Can models learn to abstain when uncertain about predictions?"). And consistency training can teach a model to respond the same way to a clean prompt and a noisily wrapped one, using the model's own clean answers as the target, so irrelevant context perturbations stop knocking it off its intended default (see "Can models learn to ignore irrelevant prompt changes?"). The throughline across all of these: ambiguity doesn't make a model neutral. It hands control to whatever the training data made most probable, and changing that requires touching training or representations, not just wording the prompt better.
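The data side of output-level consistency training can be sketched in a few lines: answer each clean prompt once, then reuse that answer as the target for every wrapped variant. This is an illustration under stated assumptions, not the cited method's implementation. `build_consistency_pairs`, `toy_generate`, and the two wrappers are hypothetical names, and `generate` stands in for whatever call produces the model's own response.

```python
from typing import Callable, Dict, List


def build_consistency_pairs(
    clean_prompts: List[str],
    wrappers: List[Callable[[str], str]],
    generate: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Build (wrapped prompt -> clean response) fine-tuning examples.

    The target is always the model's own answer to the clean prompt,
    so the training signal is: respond to the wrapped prompt as if the
    wrapper were not there.
    """
    pairs = []
    for prompt in clean_prompts:
        target = generate(prompt)  # model's answer to the clean prompt
        for wrap in wrappers:
            pairs.append({"prompt": wrap(prompt), "target": target})
    return pairs


def toy_generate(prompt: str) -> str:
    """Hypothetical stand-in for a real model call."""
    return f"ANSWER[{prompt}]"


# Hypothetical wrappers: a leading user opinion and an irrelevant footer.
wrappers = [
    lambda p: f"I think the answer is obviously B. {p}",
    lambda p: f"{p}\n(Ignore this unrelated footer text.)",
]

pairs = build_consistency_pairs(["What is 2 + 2?"], wrappers, toy_generate)
for pair in pairs:
    print(repr(pair["prompt"]), "->", repr(pair["target"]))
```

Because the targets come from the current model rather than a fixed dataset, the pairs would be regenerated whenever the model changes; the actual training loop is out of scope for this sketch.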

Sources (9 notes)

Why do large language models produce generic responses to vague queries?

Unlike social-media context collapse, which flattens multiple audiences, LLM collapse occurs when users provide insufficient contextual scaffolding and models default to blended training-data priors. This distinction suggests remedies should focus on query verification and user-driven context specification rather than platform controls.

Why do language models ignore information in their context?

Research demonstrates that LMs generate outputs inconsistent with their context because parametric knowledge from training dominates over in-context information. Textual prompting alone cannot override strong priors; causal intervention in representations is required.

Can prompt optimization teach models knowledge they lack?

Prompting works entirely within a model's pre-existing training distribution and cannot supply domain knowledge absent from training data. This creates a hard ceiling: no prompt strategy can compensate for missing foundational knowledge, only reorganize what already exists.

Can we predict keyword priming before learning happens?

Pre-learning keyword probability strongly predicts post-learning priming across architectures and model sizes, with a ~10^-3 threshold separating contexts where priming occurs from those where it doesn't. Just 3 training exposures suffice to establish the effect.

Why do language models agree with false claims they know are wrong?

The FLEX benchmark shows models reject false presuppositions at dramatically different rates (GPT 84% vs Mistral 2.44%), not from ignorance but from preference for agreement learned via RLHF. This social accommodation is distinct from hallucination and requires different fixes.

Are models actually reasoning about constraints or just defaulting conservatively?

Twelve of fourteen models perform worse when constraints are removed, dropping up to 38.5 percentage points. Models appear to reason correctly by defaulting to harder options, not by actually evaluating constraints.

Why do embedding contexts confuse LLM entailment predictions?

LLMs treat presupposition triggers and non-factive verbs as surface cues rather than computing their opposite semantic effects on entailments. This structural failure persists across prompts and models, suggesting models rely on surface patterns instead of structural analysis.

Can models learn to abstain when uncertain about predictions?

Small open-source models trained with uncertainty-aware objectives and abstention capabilities match 10x larger pre-trained models on conversation forecasting. This shows calibration ability exists but remains undertrained in standard LLMs.

Can models learn to ignore irrelevant prompt changes?

Two methods—BCT (output-level) and ACT (activation-level)—train models to respond identically to clean and wrapped prompts by using the model's own clean responses as targets, eliminating specification and capability staleness inherent in standard SFT.
