How does in-context learning trigger phase transitions in model behavior?

This explores whether in-context learning (adapting from examples in the prompt, with no weight changes) can flip a model between qualitatively different modes of behavior. The corpus actually contains two distinct phenomena that both get called 'phase transitions,' and the most useful thing here is seeing that they are not the same. One is a sharp, threshold-crossing change during *training*; the other is a regime shift that happens *in context* at inference time. Reading them side by side is where the interesting part lives.

The cleanest training-time phase transition is grokking. GPT-family models memorize until they hit a measurable ceiling, roughly 3.6 bits of information per parameter, and only once that capacity fills do they abruptly switch from memorizing to genuinely generalizing (see: When do language models stop memorizing and start generalizing?). RL training shows a related staged structure: an early phase where getting execution correct drives all the gains, followed by a later phase where strategic planning becomes the bottleneck, visible as planning-token entropy rising while execution entropy settles (see: Does RL training follow a predictable two-phase learning sequence?). These are sequenced shifts the model walks through as gradients accumulate, not something the prompt triggers.
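To make the capacity figure concrete, here is a minimal sketch of the arithmetic, assuming the roughly 3.6 bits-per-parameter figure quoted above applies uniformly across a model's parameters. The function names are hypothetical, and comparing a dataset's size in bits against the ceiling is only an illustration of "capacity fills," not the measurement procedure behind the figure.

```python
# Approximate ceiling quoted in the text; treated here as a constant (assumption).
BITS_PER_PARAM = 3.6


def memorization_capacity_bits(n_params: int,
                               bits_per_param: float = BITS_PER_PARAM) -> float:
    """Estimated memorization ceiling, in bits, for a model with n_params parameters."""
    return n_params * bits_per_param


def bits_to_megabytes(bits: float) -> float:
    """Convert bits to decimal megabytes (8 bits per byte, 1e6 bytes per MB)."""
    return bits / 8 / 1e6


def capacity_filled(dataset_bits: float, n_params: int) -> bool:
    """Illustrative check: does the data to be memorized meet or exceed the ceiling?"""
    return dataset_bits >= memorization_capacity_bits(n_params)
```

Under that assumption, a 1-billion-parameter model would have about 3.6 billion bits of memorization capacity, or roughly 450 MB.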

The genuinely in-context version is subtler and more interesting for your question. In-context learning of sequential decision-making doesn't switch on from isolated examples; it requires *trajectories* from the same environment, full or partial runs rather than scattered demonstrations. That structural property, 'trajectory burstiness,' is the thing that lets a frozen model suddenly generalize across wildly different tasks with no weight update (see: Why do trajectories matter more than individual examples for in-context learning?). So the 'trigger' isn't more examples. It's the *shape* of what's in context crossing a structural threshold. A similar threshold logic shows up in priming: whether priming takes hold after learning is predictable from a keyword's pre-learning probability, with a sharp cutoff around 10^-3 separating contexts where priming fires from those where it doesn't (see: Can we predict keyword priming before learning happens?). Thresholds, not gradual ramps.
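The priming cutoff can be sketched as a simple threshold test. This is a minimal illustration, assuming the approximate 10^-3 figure quoted above. The text does not say on which side of the cutoff priming fires, so the direction is exposed as a parameter (`fires_below`) whose default is an assumption. All names here are hypothetical.

```python
# Approximate cutoff quoted in the text.
PRIMING_THRESHOLD = 1e-3


def priming_expected(pre_learning_prob: float,
                     threshold: float = PRIMING_THRESHOLD,
                     fires_below: bool = True) -> bool:
    """Predict whether priming fires from a keyword's pre-learning probability.

    fires_below=True assumes priming occurs for keywords whose probability is
    under the threshold; this direction is an assumption, not stated in the text.
    """
    if not 0.0 <= pre_learning_prob <= 1.0:
        raise ValueError("pre_learning_prob must lie in [0, 1]")
    if fires_below:
        return pre_learning_prob < threshold
    return pre_learning_prob >= threshold


def partition_by_threshold(keyword_probs: dict[str, float],
                           **kwargs) -> tuple[list[str], list[str]]:
    """Split keywords into (priming expected, priming not expected)."""
    fires = [k for k, p in keyword_probs.items() if priming_expected(p, **kwargs)]
    quiet = [k for k in keyword_probs if k not in fires]
    return fires, quiet
```

The point of the sketch is that the prediction is a step function of the pre-learning probability: two keywords on either side of the cutoff land in different regimes even when their probabilities are close.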

There's a deeper reframe lurking under all of this. Several lines of work argue post-training doesn't *create* new behavior so much as *select* it: base models already carry latent reasoning that minimal training, decoding tweaks, or feature steering simply elicit (see: Do base models already contain hidden reasoning ability?). If capability is already present and dormant, then a 'phase transition' is really the moment a context configuration crosses the threshold needed to unlock what was always there. Skill extraction makes this literal: pulling natural-language rules out of context into reusable skills lifts a frozen model's reasoning with zero weight updates (see: Can frozen models learn better by extracting context into skills?). And post-training itself produces a measurable behavioral flip: models start treating their own outputs as actions that shape future inputs, closing an action-perception loop absent during pretraining (see: Do models recognize their own outputs as actions shaping future inputs?).

The limiting case is worth knowing too, because it sets the boundary on all of this: context can't flip behavior when training priors are too strong. Models routinely ignore in-context information when parametric associations dominate, and prompting alone won't override them; you need causal intervention in the representations (see: Why do language models ignore information in their context?). So the honest synthesis is this: in-context learning triggers a behavioral shift only when the context crosses a structural or probabilistic threshold (trajectory completeness, priming probability) *and* the relevant capability is latent rather than buried under a stronger prior. Outside that window, more context changes nothing at all.

Sources (8 notes)

When do language models stop memorizing and start generalizing?

GPT-family models have a measurable memorization capacity of approximately 3.6 bits-per-parameter. When this capacity fills, a phase transition triggers grokking—the shift from memorization to genuine generalization. This capacity is a property of individual models, not training algorithms.

Does RL training follow a predictable two-phase learning sequence?

Across eight models, RL training consistently shows a first phase where execution correctness drives learning, followed by a second phase where strategic planning becomes the bottleneck. Planning token entropy increases while execution entropy stabilizes, and concentration of optimization on planning tokens yields significant performance gains.

Why do trajectories matter more than individual examples for in-context learning?

In-context learning for sequential decision-making requires full or partial trajectories from the same environment level, not just isolated examples. This structural property—trajectory burstiness—allows models to generalize across vastly different tasks without weight updates.

Can we predict keyword priming before learning happens?

Pre-learning keyword probability strongly predicts post-learning priming across architectures and model sizes, with a ~10^-3 threshold separating contexts where priming occurs from those where it doesn't. Just 3 training exposures suffice to establish the effect.

Do base models already contain hidden reasoning ability?

Five independent mechanisms—RL steering, critique fine-tuning, decoding changes, SAE feature steering, and RLVR—all elicit reasoning already present in base model activations. Post-training selects rather than creates reasoning; the bottleneck is elicitation, not capability acquisition.

Can frozen models learn better by extracting context into skills?

Extracting natural-language rules from context into reusable skills improves frozen model reasoning without weight updates. On CL-bench, this lifts GPT-4.1 from 11.1% to 16.5%, with skills transferable across model backbones.

Do models recognize their own outputs as actions shaping future inputs?

Post-trained language models exhibit a measurable shift where they recognize their outputs become their own future inputs, closing an action-perception loop absent in pretraining. Evidence includes 3-4x lower output entropy on-policy and behavioral signatures of trajectory recognition.

Why do language models ignore information in their context?

Research demonstrates that LMs generate outputs inconsistent with their context because parametric knowledge from training dominates over in-context information. Textual prompting alone cannot override strong priors; causal intervention in representations is required.
