How do pretraining biases interact differently with prompts across model tiers?
This explores why the same prompt lands differently on a cheap model than on a frontier one — and whether that gap traces back to what each model absorbed during pretraining rather than to the prompt itself.
The corpus points to a clean answer: prompting never adds knowledge, it only reorganizes what pretraining already laid down, so the size and shape of that pretrained foundation decides which prompts help and which backfire. The clearest single data point comes from a 23-prompt benchmark across 12 LLMs, where rephrasing and background-knowledge prompts boosted weak models but step-by-step reasoning actually *reduced* accuracy in strong ones (see: Do prompt techniques work the same across all LLM tiers?). That asymmetry is the phenomenon in miniature: a prompt that scaffolds a model lacking internal structure becomes noise for a model that already has it.
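One way to look for this asymmetry in your own evaluations is to compare each prompt technique against a plain baseline, model by model, and then average the change within each tier. The sketch below is a minimal illustration under stated assumptions: the function name `technique_deltas`, the record layout, the tier labels, and every number in `toy_records` are hypothetical placeholders, not values or code from the benchmark.

```python
from collections import defaultdict
from statistics import mean


def technique_deltas(records, baseline="plain"):
    """Mean accuracy change of each prompt technique vs. a baseline, per tier.

    records: iterable of (tier, model, technique, accuracy) tuples.
    Returns {tier: {technique: mean_delta}}.
    """
    # Group scores by (tier, model) so each model is compared with itself.
    by_model = defaultdict(dict)
    for tier, model, technique, acc in records:
        by_model[(tier, model)][technique] = acc

    deltas = defaultdict(lambda: defaultdict(list))
    for (tier, _model), scores in by_model.items():
        if baseline not in scores:
            continue  # no baseline run for this model, nothing to compare
        for technique, acc in scores.items():
            if technique != baseline:
                deltas[tier][technique].append(acc - scores[baseline])

    return {
        tier: {tech: mean(vals) for tech, vals in techs.items()}
        for tier, techs in deltas.items()
    }


# Placeholder numbers, invented only to exercise the function.
toy_records = [
    ("weak", "model-a", "plain", 0.50),
    ("weak", "model-a", "rephrase", 0.60),
    ("weak", "model-a", "step_by_step", 0.55),
    ("strong", "model-b", "plain", 0.90),
    ("strong", "model-b", "rephrase", 0.90),
    ("strong", "model-b", "step_by_step", 0.85),
]
result = technique_deltas(toy_records)
```

A negative delta for a technique in one tier alongside a positive delta in another is the pattern described above. Comparing each model only with its own baseline keeps differences in raw capability from swamping the effect of the prompt.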
The reason this tracks model tier rather than prompt cleverness is that the biases being nudged were planted long before any prompt arrived. A causal study using random-seed variation and cross-tuning found that models sharing a pretrained backbone show similar cognitive bias patterns no matter what finetuning data you feed them: instruction tuning only sways biases, it doesn't install them (see: Where do cognitive biases in language models come from?). So a prompt is always negotiating with priors set at pretraining, and those priors differ in strength across tiers. When a prior is strong enough, the prompt simply loses: language models can produce outputs that contradict the information you put in their context because parametric knowledge from training overrides it, and textual prompting alone can't break that; you need to intervene in the representations themselves (see: Why do language models ignore information in their context?).
There's a hard ceiling underneath all of this. Prompt optimization works entirely inside the model's existing training distribution; it can retrieve and recombine, but it cannot supply knowledge the model never saw (see: Can prompt optimization teach models knowledge they lack?). That reframes the tier gap: a prompt isn't teaching a small model and confusing a large one. It is activating latent capability that's richly present in the large model and thin in the small one. A related line shows base models already carry latent reasoning that post-training elicits rather than creates, which is why "reasoning" prompts feel like they create ability when they're really just selecting it (see: Do base models already contain hidden reasoning ability?).
The cross-domain twist worth taking away: scale doesn't just change *how much* a model knows, it changes *which* pretrained format wins. RL post-training consistently amplifies one dominant pretraining format and suppresses the rest, and which format wins depends on model scale, not necessarily on which format performs best (see: Does RL training collapse format diversity in pretrained models?). Pair that with evidence that pretraining scale specifically drives factual knowledge while fine-tuning scale drives behavioral helpfulness (see: Do pretraining and fine-tuning scale independently in language models?), and the tier effect stops looking like a quirk of prompt engineering. It's structural: different tiers carry different dominant formats and different knowledge depth, so a single prompt is effectively addressing different machines. If you want a prompt that survives across tiers rather than one tuned to each, the corpus's answer is consistency training, which teaches a model to respond identically to clean and wrapped prompts so that prompts behave the same regardless of what priors they hit (see: Can models learn to ignore irrelevant prompt changes?).
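The data-construction step behind output-level consistency training can be sketched roughly as follows: answer each clean prompt with the model itself, then pair every wrapped variant of that prompt with the same clean answer as its training target. This is a minimal sketch, not the published method. The names `build_consistency_pairs`, `generate`, and the two wrapper functions are hypothetical, and `generate` is a stub standing in for a call to the model being trained.

```python
from typing import Callable, Dict, Iterable, List


def build_consistency_pairs(
    clean_prompts: Iterable[str],
    wrappers: Iterable[Callable[[str], str]],
    generate: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Pair each wrapped prompt with the model's own clean-prompt response."""
    wrappers = list(wrappers)  # allow reuse across prompts
    pairs = []
    for prompt in clean_prompts:
        # Target comes from the model itself on the clean prompt,
        # not from an external reference answer.
        target = generate(prompt)
        for wrap in wrappers:
            pairs.append({"prompt": wrap(prompt), "target": target})
    return pairs


# Hypothetical wrappers that add irrelevant or leading framing.
def add_opinion(prompt: str) -> str:
    return "I think the answer is obvious, but: " + prompt


def add_noise(prompt: str) -> str:
    return prompt + "\n(Unrelated note: the weather is nice today.)"


def stub_generate(prompt: str) -> str:
    # Stand-in for sampling from the model being trained (assumption).
    return "ANSWER TO: " + prompt


pairs = build_consistency_pairs(
    ["What is the capital of France?"],
    [add_opinion, add_noise],
    stub_generate,
)
```

The resulting pairs would then feed an ordinary supervised fine-tuning loop. Because the targets are the model's own clean responses, the training signal concerns only invariance to the wrapper, not new knowledge.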
Sources (8 notes)
A 23-prompt benchmark across 12 LLMs shows rephrasing and background-knowledge prompts boost cheap models, while step-by-step reasoning reduces accuracy in high-performance models. Task structure, not generic best practices, determines which prompts help.
A causal experiment using random-seed variation and cross-tuning showed that models sharing a pretrained backbone exhibit similar bias patterns regardless of finetuning data. Biases are planted during pretraining and merely swayed by instruction tuning.
Research demonstrates that LMs generate outputs inconsistent with their context because parametric knowledge from training dominates over in-context information. Textual prompting alone cannot override strong priors; causal intervention in representations is required.
Prompting works entirely within a model's pre-existing training distribution and cannot supply domain knowledge absent from training data. This creates a hard ceiling: no prompt strategy can compensate for missing foundational knowledge, only reorganize what already exists.
Five independent mechanisms—RL steering, critique fine-tuning, decoding changes, SAE feature steering, and RLVR—all elicit reasoning already present in base model activations. Post-training selects rather than creates reasoning; the bottleneck is elicitation, not capability acquisition.
Controlled experiments show RL consistently amplifies one format distribution from pretraining within the first epoch while collapsing alternatives. The winning format depends on model scale, not necessarily performance, and is largely hidden when starting from proprietary pretrained models.
Emulated Fine-Tuning reveals that scaling pretraining improves factual knowledge while scaling fine-tuning improves behavioral helpfulness. This decoupling has architectural roots: pretraining enriches lower-layer knowledge storage, while fine-tuning modifies upper-layer behavior expression.
Two methods—BCT (output-level) and ACT (activation-level)—train models to respond identically to clean and wrapped prompts by using the model's own clean responses as targets, eliminating specification and capability staleness inherent in standard SFT.