Why do pretrained model priors reduce the usefulness of retrieved experience?

This explores why a model's baked-in knowledge from pretraining can crowd out or override the fresh information it pulls in at run time — retrieved documents, episodic memory, in-context examples — making that retrieved experience less useful than it should be.

The cleanest answer in the corpus is a competition story: when the associations learned during pretraining are strong enough, the model generates outputs from those parametric memories and quietly ignores what's sitting in its context window. "Why do language models ignore information in their context?" shows this directly: textual prompting alone can't override a strong prior, and only causal intervention in the model's internal representations forces it to use the retrieved evidence. So retrieved experience isn't competing on a level field; it's competing against a default that already has its mind made up.

Why are the priors so sticky? Two notes give the mechanism. "Can we predict keyword priming before learning happens?" finds that whether new information 'takes' is largely predictable from how probable the relevant keywords already were before learning, with a threshold around 10^-3 separating the cases where priming occurs from those where it doesn't. And "Does RL training collapse format diversity in pretrained models?" shows that even active post-training tends to amplify a single format already present from pretraining rather than introduce something genuinely new. The picture that emerges across both: pretraining doesn't just seed knowledge, it sets the gravity well that everything afterward, retrieval included, has to climb out of.

The interesting twist is that this isn't always a bug. Two notes argue the priors are doing real work and the 'retrieved experience' is the weaker signal. "Does procedural knowledge drive reasoning more than factual retrieval?" shows reasoning leans on broad, transferable procedures absorbed during pretraining, not on retrieving specific facts. "Do base models already contain hidden reasoning ability?" goes further: post-training mostly *selects* capability that's already latent rather than creating it. If most of the competence lives in the priors, then a retrieved example that conflicts with them often *should* lose. The cost only shows up when the retrieved experience is the thing you actually needed.

That cost is sharpest in the agent-memory work, where 'retrieved experience' is the whole design. "Can agents learn continuously from experience without updating weights?" (AgentFly) and "Can agents learn from failure without updating their weights?" (Reflexion) both deliberately route learning through external memory *instead of* weight updates, partly because storing experience outside the parameters keeps it from being overwritten or diluted by the frozen priors. Reflexion's insight that uncompressed, unambiguous feedback survives best is really a workaround for the same problem: priors will rationalize away anything fuzzy, so the retrieved signal has to be sharp enough to win the argument. And "Can agents learn beyond what their training data shows?" names the ceiling: what an agent can absorb is bounded by what its training distribution already imagined, so genuinely novel retrieved experience hits resistance precisely where it's most valuable.
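To make the external-memory idea concrete, here is a minimal sketch of a Reflexion-style retry loop, written under stated assumptions rather than taken from the Reflexion or AgentFly code. The names `ReflectionMemory`, `run_with_reflection`, and the toy `attempt`, `check`, and `reflect` callables are all hypothetical; in practice `attempt` and `reflect` would wrap calls to a frozen language model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class ReflectionMemory:
    """External store of verbatim reflections; model weights are never touched."""
    reflections: List[str] = field(default_factory=list)

    def add(self, text: str) -> None:
        # Keep each reflection uncompressed: no summarizing, no truncation.
        self.reflections.append(text)

    def as_prompt_prefix(self) -> str:
        if not self.reflections:
            return ""
        lines = "\n".join(f"- {r}" for r in self.reflections)
        return f"Lessons from earlier failed attempts:\n{lines}\n\n"


def run_with_reflection(
    task: str,
    attempt: Callable[[str], str],       # hypothetical: prompt -> answer
    check: Callable[[str], bool],        # hypothetical: binary success/failure
    reflect: Callable[[str, str], str],  # hypothetical: (task, answer) -> diagnosis
    max_trials: int = 3,
) -> Tuple[Optional[str], int, ReflectionMemory]:
    """Retry a task, feeding stored reflections back in through the prompt."""
    memory = ReflectionMemory()
    for trial in range(1, max_trials + 1):
        answer = attempt(memory.as_prompt_prefix() + task)
        if check(answer):
            return answer, trial, memory
        # Only an unambiguous failure triggers a new reflection.
        memory.add(reflect(task, answer))
    return None, max_trials, memory


# Toy stand-ins for a frozen model, used only to exercise the loop.
def toy_attempt(prompt: str) -> str:
    return "4" if "Lessons from earlier failed attempts" in prompt else "3"


def toy_check(answer: str) -> bool:
    return answer == "4"


def toy_reflect(task: str, answer: str) -> str:
    return f"Answered {answer!r} to {task!r}, which was wrong; recompute the sum."


answer, trials, memory = run_with_reflection(
    "What is 1 + 3?", toy_attempt, toy_check, toy_reflect
)
```

The design choice this sketch illustrates is that the only thing that changes between trials is the text placed in front of the task, so whatever is learned lives in `memory` and not in the parameters.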

The thing worth taking away: 'retrieval' and 'pretrained priors' aren't a clean pipeline where context refines parameters. They're rivals for control of the next token, and the prior usually has home-field advantage. That reframes a lot of RAG and memory-augmentation work — the engineering problem isn't just fetching the right experience, it's getting the model to *defer* to it, which is why the most robust approaches either intervene in representations directly or keep experience in an external store the priors can't quietly overwrite.
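One way to make "getting the model to defer" measurable is a counterfactual-context check: give the model a passage that contradicts what it is assumed to believe, then count how often the answer follows the passage and how often it follows the prior. The sketch below is illustrative only and does not come from the notes. `ConflictCase`, `deference_report`, and the two stub models are hypothetical names, `generate` stands in for any prompt-to-text callable, and the substring matching is a deliberately crude scoring rule.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass(frozen=True)
class ConflictCase:
    question: str
    context: str         # retrieved passage that contradicts the prior
    context_answer: str  # answer supported by the passage
    prior_answer: str    # answer the model is assumed to give without context


def deference_report(
    generate: Callable[[str], str],  # hypothetical: prompt -> model output
    cases: Iterable[ConflictCase],
) -> Dict[str, float]:
    """Return the fraction of answers that follow the context, the prior, or neither."""
    counts = {"context": 0, "prior": 0, "other": 0}
    for case in cases:
        prompt = f"Context: {case.context}\nQuestion: {case.question}\nAnswer:"
        output = generate(prompt).strip().lower()
        # Crude substring scoring; an output containing both answers counts as context.
        if case.context_answer.lower() in output:
            counts["context"] += 1
        elif case.prior_answer.lower() in output:
            counts["prior"] += 1
        else:
            counts["other"] += 1
    total = sum(counts.values())
    return {k: (v / total if total else 0.0) for k, v in counts.items()}


CASES = [
    ConflictCase(
        question="What is the capital of France?",
        context="In this fictional atlas, the capital of France is Lyon.",
        context_answer="Lyon",
        prior_answer="Paris",
    ),
]


# Stub models: one ignores the context entirely, one repeats what the passage says.
def stubborn_model(prompt: str) -> str:
    return "Paris"


def deferring_model(prompt: str) -> str:
    return "Lyon"


stubborn = deference_report(stubborn_model, CASES)
deferring = deference_report(deferring_model, CASES)
```

A report dominated by the "prior" bucket would suggest the retrieved passage is being fetched but not used, which is the failure this section describes.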

Sources (8 notes)

Why do language models ignore information in their context?

Research demonstrates that LMs generate outputs inconsistent with their context because parametric knowledge from training dominates over in-context information. Textual prompting alone cannot override strong priors; causal intervention in representations is required.

Can we predict keyword priming before learning happens?

Pre-learning keyword probability strongly predicts post-learning priming across architectures and model sizes, with a ~10^-3 threshold separating contexts where priming occurs from those where it doesn't. Just 3 training exposures suffice to establish the effect.

Does RL training collapse format diversity in pretrained models?

Controlled experiments show RL consistently amplifies one format distribution from pretraining within the first epoch while collapsing alternatives. The winning format depends on model scale, not necessarily performance, and is largely hidden when starting from proprietary pretrained models.

Does procedural knowledge drive reasoning more than factual retrieval?

Analysis of 5 million pretraining documents shows reasoning relies on broad, transferable procedural knowledge from diverse sources, unlike factual recall which depends on narrow, document-specific memorization of target facts.

Do base models already contain hidden reasoning ability?

Five independent mechanisms—RL steering, critique fine-tuning, decoding changes, SAE feature steering, and RLVR—all elicit reasoning already present in base model activations. Post-training selects rather than creates reasoning; the bottleneck is elicitation, not capability acquisition.

Can agents learn continuously from experience without updating weights?

AgentFly formalizes agent learning as a Memory-augmented MDP with three memory modules (case, subtask, tool) that enable credit assignment and policy improvement entirely through memory operations. The approach achieved 87.88% on GAIA validation without modifying LLM parameters.

Can agents learn from failure without updating their weights?

Reflexion demonstrates that unambiguous environmental feedback (success/failure) enables agents to write useful self-diagnoses and improve across episodes without parameter updates. The binary signal prevents rationalization, and keeping reflections uncompressed preserves their usability.

Can agents learn beyond what their training data shows?

Agents trained on static expert datasets cannot learn from their own failures or generalize beyond demonstrated scenarios because they never interact with environments during training. Competence is capped by what curators imagined, not by agent capacity.
