Can frozen models learn better by extracting context into skills?
When a model encounters unfamiliar material in its context, can we help it reason more effectively by explicitly extracting rules and procedures from that material rather than changing the model itself?
Many real tasks demand reasoning over contexts that exceed a model's parametric knowledge — unseen product documentation, technically dense domain material. The intuitive fix Ctx2Skill formalizes is inference-time skill augmentation: rather than tuning weights, extract the relevant rules and procedures from the context into explicit, natural-language skills, then plug those skills into any frozen language model to improve its context-learning ability. On CL-bench this lifts GPT-4.1 from 11.1% to 16.5% and GPT-5.1 from 21.2% to 25.8%, and the skills transfer across backbones.
This matters because it reframes "learning from context" as a representation problem, not a capacity problem. The knowledge is already present in the prompt; the model simply fails to operationalize it when given only the raw context. Distilling that latent knowledge into compact, explicit procedural skills lets the same model reason beyond its pretrained knowledge, much as a person turns a manual into a checklist. The model's parameters never change; what changes is the externalized, reusable procedure it consults.
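The extract-then-plug-in loop described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the `LLM` callable type, the prompt wording, and the function names `extract_skills`, `build_prompt`, and `answer_with_skills` are all hypothetical, and a frozen model is assumed to be reachable as a plain prompt-to-text function.

```python
from typing import Callable, List

# Assumption: a frozen model is any function from prompt text to completion text.
LLM = Callable[[str], str]

# Hypothetical extraction prompt; the wording is illustrative only.
EXTRACT_TEMPLATE = (
    "Read the context below and list the rules and procedures it defines.\n"
    "Write one self-contained skill per line, each starting with '- '.\n\n"
    "Context:\n{context}\n"
)


def extract_skills(context: str, llm: LLM) -> List[str]:
    """Distill a context into explicit natural-language skills."""
    raw = llm(EXTRACT_TEMPLATE.format(context=context))
    skills = []
    for line in raw.splitlines():
        line = line.strip()
        # Keep only bulleted lines; ignore any preamble the model adds.
        if line.startswith("- "):
            skills.append(line[2:].strip())
    return skills


def build_prompt(context: str, skills: List[str], question: str) -> str:
    """Place the extracted skills next to the raw context; no weights change."""
    numbered = "\n".join(f"{i}. {s}" for i, s in enumerate(skills, start=1))
    return (
        f"Context:\n{context}\n\n"
        f"Skills extracted from the context:\n{numbered}\n\n"
        f"Question: {question}\nAnswer:"
    )


def answer_with_skills(context: str, question: str, extractor: LLM, solver: LLM) -> str:
    """Extract skills with one model, then answer with a (possibly different) frozen model."""
    skills = extract_skills(context, extractor)
    return solver(build_prompt(context, skills, question))
```

Keeping `extractor` and `solver` as separate arguments mirrors the transfer claim: skills written out as plain text can be handed to a different backbone without any retraining.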
The open question is where the boundary of this approach lies. Extracted skills help most where the context contains a learnable rule; for tasks that need genuinely novel synthesis rather than rule application, externalized procedures may add little. The gains reported here are real but modest, about five absolute points on each backbone. The durable claim is therefore narrow but useful: frozen models have more usable context-knowledge than their raw outputs reveal, and explicit skill extraction is a training-free, transferable way to unlock it, complementing weight-based adaptation rather than replacing it.
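To make "modest" concrete, the reported CL-bench scores can be turned into absolute and relative gains. The scores below are the ones quoted in this note; the relative figures are derived arithmetic, not numbers stated by the paper, and the helper name `gains` is hypothetical.

```python
# CL-bench scores quoted in this note: (without skills, with skills), in percent.
reported = {
    "GPT-4.1": (11.1, 16.5),
    "GPT-5.1": (21.2, 25.8),
}


def gains(before: float, after: float) -> tuple:
    """Return (absolute gain in points, relative gain in percent), rounded to 0.1."""
    absolute = round(after - before, 1)
    relative = round(100 * (after - before) / before, 1)
    return absolute, relative


for model, (before, after) in reported.items():
    absolute, relative = gains(before, after)
    print(f"{model}: +{absolute} points absolute, +{relative}% relative")

# GPT-4.1: +5.4 points absolute, +48.6% relative
# GPT-5.1: +4.6 points absolute, +21.7% relative
```

The relative gain looks large for GPT-4.1 mainly because its baseline is low; in absolute terms both models remain far from solving the benchmark.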
— "From Context to Skills: Can Language Models Learn from Context Skillfully?", https://arxiv.org/abs/2604.27660
Related concepts in this collection
- Does staying close to the base model preserve learning ability?
Explores whether limiting how far training pushes a model from its base distribution (measured by KL divergence) helps it learn new tasks more effectively over time, and why that trade-off matters for continual learning.
the complementary axis this note names: skill extraction is the zero-drift, training-free channel; FST shows why keeping weights near base also helps, both routing adaptation away from parameter updates
- Can skill documents be optimized like neural network weights?
Can natural-language skill documents be treated as trainable parameters and improved through iterative optimization with validation gating, similar to how model weights are tuned in deep learning?
extends inference-time skills from one-shot context extraction to an optimized, validated skill artifact; same frozen-model substrate, learned rather than extracted
- Does creating skills inside the agent loop eliminate mismatches?
Can coupling skill creation directly to the runtime reasoning loop—rather than authoring skills offline—close the gap between when skills are made and when they're used? This matters for whether agents can ground new capabilities in their actual situated context.
addresses the boundary this note flags: extracted-out-of-context skills risk the creation-usage gap, so situating skill creation in the runtime loop grounds the rule where it is applied
- Can agents learn new skills without forgetting old ones?
Explores whether externalized skill libraries—storing learned behaviors as retrievable code rather than parameter updates—can solve the catastrophic forgetting problem that plagues continual learning systems.
shares the externalize-knowledge-as-retrievable-skills principle, scaling single-context extraction into an accumulating, weight-free skill store
Original note title
inference-time skill augmentation lets frozen models reason beyond parametric knowledge by extracting rules from context