How do pretraining biases interact differently with prompts across model tiers?

This explores why the same prompt lands differently on a cheap model than on a frontier one — and whether that gap traces back to what each model absorbed during pretraining rather than to the prompt itself.

This explores why the same prompt lands differently on a cheap model than on a frontier one, and the corpus points to a clean answer: prompting never adds knowledge, it only reorganizes what pretraining already laid down — so the size and shape of that pretrained foundation decides which prompts help and which backfire. The clearest single data point comes from a 23-prompt benchmark across 12 LLMs, where rephrasing and background-knowledge prompts boosted weak models but step-by-step reasoning actually *reduced* accuracy in strong ones Do prompt techniques work the same across all LLM tiers?. That asymmetry is the phenomenon in miniature: a prompt that scaffolds a model lacking internal structure becomes noise for a model that already has it.

The reason this tracks model tier rather than prompt cleverness is that the biases being nudged were planted long before any prompt arrived. A causal study using random-seed variation and cross-tuning found that models sharing a pretrained backbone show the same cognitive bias patterns no matter what finetuning data you feed them — instruction tuning only sways biases, it doesn't install them Where do cognitive biases in language models come from?. So a prompt is always negotiating with priors set at pretraining, and those priors differ in strength across tiers. When a prior is strong enough, the prompt simply loses: language models routinely ignore the information you put in their context because parametric knowledge from training overrides it, and textual prompting alone can't break that — you need to intervene in the representations themselves Why do language models ignore information in their context?.

There's a hard ceiling underneath all of this. Prompt optimization works entirely inside the model's existing training distribution; it can retrieve and recombine, but it cannot supply knowledge the model never saw Can prompt optimization teach models knowledge they lack?. That reframes the tier gap: a prompt isn't teaching a small model and confusing a large one — it's activating latent capability that's richly present in the large model and thin in the small one. A related line shows base models already carry latent reasoning that minimal training merely unlocks, which is why "reasoning" prompts feel like they create ability when they're really just selecting it Do base models already contain hidden reasoning ability?.

The cross-domain twist worth taking away: scale doesn't just change *how much* a model knows, it changes *which* pretrained format wins. RL post-training reliably amplifies one dominant pretraining format and suppresses the rest — and which format wins depends on model scale, not on which format performs best Does RL training collapse format diversity in pretrained models?. Pair that with evidence that pretraining scale specifically drives factual knowledge while finetuning scale drives behavioral helpfulness Do pretraining and fine-tuning scale independently in language models?, and the tier effect stops looking like a quirk of prompt engineering. It's structural: different tiers carry different dominant formats and different knowledge depth, so a single prompt is effectively addressing different machines. If you want a prompt that survives across tiers rather than one tuned to each, consistency training that teaches a model to respond identically to clean and wrapped prompts is the corpus's answer to making prompts behave the same regardless of what priors they hit Can models learn to ignore irrelevant prompt changes?.

Sources 8 notes

Do prompt techniques work the same across all LLM tiers?

A 23-prompt benchmark across 12 LLMs shows rephrasing and background-knowledge prompts boost cheap models, while step-by-step reasoning reduces accuracy in high-performance models. Task structure, not generic best practices, determines which prompts help.

Where do cognitive biases in language models come from?

A causal experiment using random-seed variation and cross-tuning showed that models sharing a pretrained backbone exhibit similar bias patterns regardless of finetuning data. Biases are planted during pretraining and merely swayed by instruction tuning.

Why do language models ignore information in their context?

Research demonstrates that LMs generate outputs inconsistent with their context because parametric knowledge from training dominates over in-context information. Textual prompting alone cannot override strong priors; causal intervention in representations is required.

Can prompt optimization teach models knowledge they lack?

Prompting works entirely within a model's pre-existing training distribution and cannot supply domain knowledge absent from training data. This creates a hard ceiling: no prompt strategy can compensate for missing foundational knowledge, only reorganize what already exists.

Do base models already contain hidden reasoning ability?

Five independent mechanisms—RL steering, critique fine-tuning, decoding changes, SAE feature steering, and RLVR—all elicit reasoning already present in base model activations. Post-training selects rather than creates reasoning; the bottleneck is elicitation, not capability acquisition.

Does RL training collapse format diversity in pretrained models?

Controlled experiments show RL consistently amplifies one format distribution from pretraining within the first epoch while collapsing alternatives. The winning format depends on model scale, not necessarily performance, and is largely hidden when starting from proprietary pretrained models.

Do pretraining and fine-tuning scale independently in language models?

Emulated Fine-Tuning reveals that scaling pretraining improves factual knowledge while scaling fine-tuning improves behavioral helpfulness. This decoupling has architectural roots: pretraining enriches lower-layer knowledge storage, while fine-tuning modifies upper-layer behavior expression.

Can models learn to ignore irrelevant prompt changes?

Two methods—BCT (output-level) and ACT (activation-level)—train models to respond identically to clean and wrapped prompts by using the model's own clean responses as targets, eliminating specification and capability staleness inherent in standard SFT.

Research prompt for your LLMexpand ↓

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst tracking prompt-model-tier interactions. The question: *Why does the same prompt produce different outputs across model tiers, and can that gap be closed?*

What a curated library found — and when (dated claims, not current truth):
Findings span 2023–10/2025; treat as perishable baselines:
• Rephrasing and background-knowledge prompts boost weak models but step-by-step reasoning *reduces* accuracy in strong ones (~2024, tier-dependent prompt efficacy).
• Cognitive biases in LLMs are planted at pretraining and only swayed by finetuning, not installed; prompts negotiate with pretraining priors that differ in strength across tiers (~2025).
• Parametric knowledge from pretraining overrides context; textual prompting alone cannot break this — representation-level intervention is needed (~2024).
• Prompt optimization cannot inject new knowledge, only activate latent capability; a prompt appears to teach small models and confuse large ones because it selects richly present vs. thin capability (~2023–2024).
• RL post-training converges on a single dominant pretraining format whose identity depends on model scale, not format performance (~2025).
• Consistency training teaches models prompt-perturbation invariance, making prompt behavior uniform across tiers (~2025).

Anchor papers (verify; mind their dates):
• arXiv:2507.07186 (2025-07): Planted in Pretraining, Swayed by Finetuning — causal isolation of pretraining bias sources.
• arXiv:2504.07912 (2025-04): Echo Chamber — RL amplification of pretraining formats as scale-dependent.
• arXiv:2510.27062 (2025-10): Consistency Training Helps Stop Sycophancy and Jailbreaks — tier-agnostic prompt robustness.
• arXiv:2506.12115 (2025-06): Eliciting Reasoning in Language Models with Cognitive Tools — whether scaffolding survives scale.

Your task:
(1) RE-TEST EACH CONSTRAINT. For every finding above, judge whether newer models (o1, o3, Claude 3.7 or later), improved instruction-tuning methods (DPO, IPO, hybrid reward), SDK tooling (structured outputs, token-level caching), or orchestration (long-context memory, multi-agent chains) have since relaxed or overturned it. Which gaps still hold? Which have collapsed? Cite what closed them.
(2) Surface the strongest *contradicting* or *superseding* work from the last ~4–6 months. Specifically: does any recent paper show that prompting *can* inject knowledge across tiers, or that the tier effect is weaker than the library suggests?
(3) Propose 2 research questions that assume the regime has moved: e.g., if consistency training now makes prompts tier-invariant, what new failure modes emerge? If RL post-training format dominance is now understood, can we engineer pretraining to front the right format for a use case?

Cite arXiv IDs; flag anything you cannot ground in a real paper.

How do pretraining biases interact differently with prompts across model tiers?

Sources 8 notes

Next inquiring lines