What makes a model fail to activate relevant skills from its own harness?

This explores why a model that already holds a relevant capability — a reasoning step, a stored fact, a usable skill — fails to fire it at the moment it's needed, rather than why it lacks the capability at all.

The question is about the gap between what a model *can* do and what it actually *does* in the moment: the failure isn't missing capability but a capability sitting unused inside the model's own repertoire. The corpus is surprisingly consistent that this is a real and distinct failure mode. The clearest statement is that many reasoning failures are inference bottlenecks, not knowledge gaps. Models possess the relevant facts but don't activate them without explicit prompting. Adding subtle emphasis recovers 15.3 percentage points of accuracy, and forcing the model to enumerate preconditions recovers 6 to 9 points (Why do language models fail to use knowledge they possess?). The skill is in there; the trigger to reach for it isn't.
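A minimal sketch of the two prompt-level interventions, assuming each one amounts to wrapping the original question. The function names and the instruction wording are hypothetical illustrations, not the prompts used in the cited study.

```python
def add_emphasis(question: str, key_constraint: str) -> str:
    """Restate the constraint the model tends to overlook, ahead of the question.

    Hypothetical wording: the cited note reports that subtle emphasis helps,
    not which phrasing was used.
    """
    return f"Note: {key_constraint}\n\n{question}"


def force_precondition_enumeration(question: str) -> str:
    """Ask the model to list and check preconditions before it answers."""
    return (
        "Before answering, list every precondition that must hold for an answer "
        "to be valid, and state whether each one holds.\n"
        "Then give your final answer.\n\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    q = "The car wash is 50 m away. Should I walk or drive there to wash my car?"
    print(add_emphasis(q, "The car itself has to be at the car wash."))
    print(force_precondition_enumeration(q))
```

Neither wrapper adds knowledge; both only change what the model is prompted to consider, which is the point of the inference-bottleneck framing.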

The sharpest version of this is the knowing-doing gap. Models generate the correct rationale 87% of the time but follow their own reasoning only 64% of the time; they literally narrate the right move and then act greedily against it (Why do language models fail to act on their own reasoning?). So one answer to 'what makes activation fail' is that knowing and doing run on separate tracks: producing a plan doesn't guarantee executing it, and greediness and frequency bias pull the model toward familiar but wrong actions. This connects to a deeper point about training: only after post-training do models start treating their own outputs as actions that shape what comes next, closing the perception-action loop that lets a skill actually get *deployed* rather than just described (Do models recognize their own outputs as actions shaping future inputs?).
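One way to measure this gap from logged episodes is sketched below. It assumes each episode has been labeled with two booleans (the field names are hypothetical), and it computes adherence only over episodes whose rationale was correct; the cited study may define its 64% figure differently.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Episode:
    rationale_correct: bool   # the stated reasoning identified the right action
    followed_rationale: bool  # the executed action matched the stated reasoning


def knowing_doing_rates(episodes: Sequence[Episode]) -> tuple[float, float]:
    """Return (knowing, doing) rates for a batch of labeled episodes.

    knowing: fraction of episodes with a correct rationale.
    doing:   among those, the fraction where the action followed the rationale.
    """
    if not episodes:
        raise ValueError("need at least one episode")
    knowing = sum(e.rationale_correct for e in episodes) / len(episodes)
    correct = [e for e in episodes if e.rationale_correct]
    doing = sum(e.followed_rationale for e in correct) / len(correct) if correct else 0.0
    return knowing, doing


if __name__ == "__main__":
    # 8 of 10 rationales are correct; the model acts on 5 of those 8.
    log = (
        [Episode(True, True)] * 5
        + [Episode(True, False)] * 3
        + [Episode(False, False)] * 2
    )
    print(knowing_doing_rates(log))  # (0.8, 0.625)
```

Conditioning the second rate on correct rationales isolates the "doing" failure: a low value there cannot be blamed on the model not knowing the right move.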

A second cluster says the context itself can suppress activation. When a model's own prior errors fill its context window, performance degrades non-linearly: the model conditions on its mistakes and keeps reaching for the wrong thing. Model scaling doesn't fix this; only test-time compute in thinking models reduces the effect (Do models fail worse when their own errors fill the context?). The skill is intact, but a polluted context biases which skill gets invoked. There's a related structural ceiling in interactive settings: models are dramatically worse at *active* reasoning (asking the right question, probing) than at passive reasoning. GPT-4o reaches only 35% on interactive number guessing, and SFT, DPO, and Tree-of-Thought barely move it, suggesting some activation failures are baked into how the model engages rather than fixable by prompting (Why do models fail at asking good questions during interaction?).
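To make "asking the right question" concrete: in a number-guessing game with a uniform prior over the remaining candidates, the value of a yes/no question is its expected entropy reduction. The sketch below is a textbook information-gain calculation under that uniform-prior assumption; it is not the cited benchmark's scoring code, whose percentages may be normalized differently.

```python
import math
from typing import Callable, Sequence


def information_gain(candidates: Sequence[int], question: Callable[[int], bool]) -> float:
    """Expected entropy reduction (bits) from asking a yes/no question.

    Assumes every remaining candidate is equally likely, so the entropy of a
    set of size m is log2(m).
    """
    n = len(candidates)
    if n == 0:
        return 0.0
    yes = sum(1 for c in candidates if question(c))
    # Expected entropy after the answer, weighting each branch by its probability.
    expected_after = sum((m / n) * math.log2(m) for m in (yes, n - yes) if m > 0)
    return math.log2(n) - expected_after


if __name__ == "__main__":
    pool = range(1, 101)
    print(information_gain(pool, lambda x: x <= 50))  # about 1 bit
    print(information_gain(pool, lambda x: x == 7))   # about 0.08 bits
```

Counting candidates instead of tracking a full posterior is only valid under the uniform-prior assumption; a question that splits the pool evenly is worth a full bit, while a point guess early in the game is worth very little.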

The most useful reframe in the corpus is that 'skills' often need to be made explicit and situated before a frozen model will use them. Extracting natural-language rules from context into reusable skills lifts frozen-model reasoning with no weight updates, taking GPT-4.1 from 11.1% to 16.5% on CL-bench (Can frozen models learn better by extracting context into skills?). The in-loop work adds that skills authored *offline* fall short because they're divorced from the exact runtime situation; authoring a skill inside the agent's own loop, grounded in immediate feedback and runtime validation, is what closes the gap between having a skill and invoking it correctly (Does creating skills inside the agent loop eliminate mismatches?). In other words, activation fails when the skill isn't anchored to the moment it's supposed to fire.
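A minimal sketch of the idea, assuming skills are stored as natural-language rules with tags, are registered only after passing a check run inside the agent's loop, and reach the frozen model purely through its prompt. `Skill`, `SkillLibrary`, and the validator callback are hypothetical names; this is not the MUSE-Autoskill or CL-bench implementation.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class Skill:
    name: str
    rule: str               # natural-language rule extracted from context
    tags: frozenset[str]    # situations in which the rule should fire


@dataclass
class SkillLibrary:
    skills: list[Skill] = field(default_factory=list)

    def add_if_valid(self, skill: Skill, validate: Callable[[Skill], bool]) -> bool:
        """Register a skill only if it passes a runtime check on the current task."""
        if validate(skill):
            self.skills.append(skill)
            return True
        return False

    def render_prompt(self, task: str, task_tags: set[str]) -> str:
        """Prepend the rules whose tags match this task; no weights are updated."""
        relevant = [s for s in self.skills if s.tags & task_tags]
        if not relevant:
            return f"Task: {task}"
        rules = "\n".join(f"- {s.rule}" for s in relevant)
        return f"Relevant skills:\n{rules}\n\nTask: {task}"


if __name__ == "__main__":
    lib = SkillLibrary()
    units = Skill("units", "Convert all quantities to SI units before comparing.",
                  frozenset({"physics"}))
    # In a real loop the validator would replay the step that just failed.
    lib.add_if_valid(units, validate=lambda s: True)
    print(lib.render_prompt("Which is heavier: 2 lb or 1 kg?", {"physics"}))
```

The validator is where the "situated" part lives: in an agent it might re-run the step that just failed with the new rule in the prompt and keep the rule only if the step now succeeds.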

Worth knowing: the line between 'failed to activate an existing skill' and 'never had the skill' is itself blurry and task-dependent. For standard reasoning, reinforcement learning mostly just *activates* latent abilities already present in the base model, but for deep multi-step planning it generates genuinely new strategies the base model can't reach even with heavy sampling (Does reinforcement learning create new reasoning abilities or activate existing ones?). Training can also actively *break* activation: overly hard RLVR samples teach degenerate shortcuts that contaminate skills the model already had (Do overly hard RLVR samples actually harm model capabilities?). So the same harness that holds a skill can be the thing that learns to route around it.
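Claims like "unreachable even with heavy sampling" are usually quantified with pass@k. The sketch below uses the widely used unbiased estimator from the HumanEval/Codex evaluation work, pass@k = 1 - C(n-c, k) / C(n, k) for c correct answers out of n samples; whether the cited RL study computes it exactly this way is an assumption.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased estimate of P(at least one of k samples is correct).

    n: total samples drawn per problem, c: how many were correct, 1 <= k <= n.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Every size-k subset must contain at least one correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Base model: 0 correct out of 256 samples, so pass@k is 0 for every k.
    print(pass_at_k(256, 0, 256))  # 0.0
    # A model that solved 3 of 256 samples.
    print(pass_at_k(256, 3, 64))
```

With zero base-model successes in 256 samples, pass@k stays at 0 for every k up to 256, which is the empirical sense in which a strategy is "unreachable" by sampling; it says nothing about larger budgets.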

Sources: 9 notes

Why do language models fail to use knowledge they possess?

Models possess relevant knowledge but fail to activate it without explicit prompting. Adding subtle emphasis recovers 15.3 percentage points of accuracy, and forcing enumeration of preconditions recovers 6-9 points, showing the bottleneck is in constraint inference, not storage.

Why do language models fail to act on their own reasoning?

LLMs generate correct reasoning 87% of the time but follow it only 64% of the time. Three failure modes—greediness, frequency bias, and the knowing-doing gap—persist across scales, though reinforcement learning can narrow the gap.

Do models recognize their own outputs as actions shaping future inputs?

Post-trained language models exhibit a measurable shift where they recognize their outputs become their own future inputs, closing an action-perception loop absent in pretraining. Evidence includes 3-4x lower output entropy on-policy and behavioral signatures of trajectory recognition.

Do models fail worse when their own errors fill the context?

Error accumulation in context causes non-linear performance degradation in long-horizon tasks. Model scaling does not fix this; only test-time compute through thinking models reduces the effect by preventing error-contaminated context from biasing reasoning.

Why do models fail at asking good questions during interaction?

GPT-4o achieves only 35% on interactive number guessing, with information gains collapsing from 7.7% to 2.5% as rounds progress. SFT, DPO, and Tree-of-Thought interventions provide minimal improvement, suggesting the deficit is structural rather than a prompting or fine-tuning problem.

Can frozen models learn better by extracting context into skills?

Extracting natural-language rules from context into reusable skills improves frozen model reasoning without weight updates. On CL-bench, this lifts GPT-4.1 from 11.1% to 16.5%, with skills transferable across model backbones.

Does creating skills inside the agent loop eliminate mismatches?

MUSE-Autoskill demonstrates that invoking skill creation from within the agent's reasoning loop grounds new skills in exact task context, immediate feedback, and runtime validation. In-loop skills reach 87.94% task accuracy and transfer to other agents with minimal loss, eliminating the situated context problem of offline authoring.

Does reinforcement learning create new reasoning abilities or activate existing ones?

For standard reasoning tasks, RL activates latent abilities already present in base models. For complex planning requiring multi-step coordination, RL generates genuinely novel strategies inaccessible to base models even with extensive sampling.

Do overly hard RLVR samples actually harm model capabilities?

Training on nearly-impossible problems causes models to learn degenerate shortcuts rather than genuine reasoning, and these shortcuts contaminate pre-existing capabilities. Group-relative normalization treats rare accidental successes as high-advantage trajectories, reinforcing answer repetition and computation-skipping instead of sound reasoning patterns.
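The group-relative mechanism in the note above can be checked with a few lines of arithmetic. The sketch assumes a GRPO-style advantage: each reward minus the group mean, divided by the group standard deviation (population std here; implementations differ, and some add a small epsilon as below). With one accidental success among n binary rewards, that success's advantage works out to sqrt(n - 1), so the rarer the success, the harder it is reinforced.

```python
import math
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Normalize each reward against its own group (GRPO-style, population std)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    for n in (4, 8, 16):
        group = [1.0] + [0.0] * (n - 1)  # one lucky success on a near-impossible prompt
        adv = group_relative_advantages(group)[0]
        print(n, round(adv, 3), round(math.sqrt(n - 1), 3))
    # A group with no successes yields zero advantage everywhere: no signal at all.
    print(group_relative_advantages([0.0] * 8))
```

Because the success is scored only relative to its group, a shortcut that lands one lucky answer on a near-impossible prompt receives a large positive advantage regardless of whether the reasoning behind it was sound.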
