From Context to Skills: Can Language Models Learn from Context Skillfully?
Many real-world tasks require language models (LMs) to reason over complex contexts that exceed their parametric knowledge. This calls for context learning, in which LMs learn relevant knowledge directly from the given context. An intuitive solution is inference-time skill augmentation: extracting the relevant rules and procedures from the context into explicit natural-language skills. However, constructing such skills for context learning scenarios faces two fundamental challenges: the prohibitive cost of manually annotating skills for long, technically dense contexts, and the lack of external feedback for automated skill construction. In this paper, we propose Ctx2Skill, a self-evolving framework that autonomously discovers, refines, and selects context-specific skills without human supervision or external feedback. At its core is a multi-agent self-play loop with a Challenger that generates probing tasks and rubrics, a Reasoner that attempts to solve them guided by an evolving skill set, and a neutral Judge that provides binary feedback. Crucially, both the Challenger and the Reasoner evolve through accumulated natural-language skills: dedicated Proposer and Generator agents analyze failure cases and synthesize them into targeted skill updates for both sides, enabling automated skill discovery and refinement. To prevent adversarial collapse caused by increasingly extreme task generation and over-specialized skill accumulation, we further introduce a Cross-Time Replay mechanism that selects, for the Reasoner, the skill set achieving the best balance across representative cases, ensuring robust and generalizable skill evolution. The resulting skills can be plugged into any language model to improve its context learning. Evaluated on four context learning tasks from CL-bench, Ctx2Skill consistently improves solving rates across backbone models, e.g., lifting GPT-4.1 from 11.1% to 16.5% and GPT-5.1 from 21.2% to 25.8%.
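The Cross-Time Replay selection step can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: skill sets are plain lists of strings, the Reasoner-plus-Judge evaluation is collapsed into a single `judge` callable, and "best balance across representative cases" is interpreted as the macro-averaged pass rate over the iterations in which the replay cases were generated. `ReplayCase`, `replay_score`, `cross_time_replay`, and `toy_judge` are hypothetical names.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List, Sequence


@dataclass(frozen=True)
class ReplayCase:
    """A representative (task, rubric) case kept from a past iteration (hypothetical structure)."""
    task: str
    rubric: str
    origin_iter: int  # self-play iteration in which this case was generated


# judge(skills, case) -> True if the Reasoner, guided by `skills`, passes the case's rubric.
# In the real system this would involve LLM calls; here it is an ordinary callable.
Judge = Callable[[Sequence[str], ReplayCase], bool]


def replay_score(skills: Sequence[str], cases: Sequence[ReplayCase], judge: Judge) -> float:
    """Macro-averaged pass rate over origin iterations (an assumed scoring rule)."""
    if not cases:
        raise ValueError("replay pool is empty")
    by_iter: Dict[int, List[bool]] = {}
    for case in cases:
        by_iter.setdefault(case.origin_iter, []).append(judge(skills, case))
    # Average each iteration's pass rate so that a burst of late, extreme tasks
    # cannot dominate the score of an over-specialized skill set.
    return mean(sum(results) / len(results) for results in by_iter.values())


def cross_time_replay(
    snapshots: Sequence[Sequence[str]], cases: Sequence[ReplayCase], judge: Judge
) -> int:
    """Index of the snapshot with the best replay score; max() keeps the earliest on ties."""
    scores = [replay_score(skills, cases, judge) for skills in snapshots]
    return max(range(len(snapshots)), key=scores.__getitem__)


# Toy example: a case passes iff its rubric is literally one of the skills (hypothetical judge).
def toy_judge(skills: Sequence[str], case: ReplayCase) -> bool:
    return case.rubric in skills


cases = [
    ReplayCase("t0", "define-terms", 0),
    ReplayCase("t1", "cite-section", 1),
    ReplayCase("t2a", "edge-case-x", 2),
    ReplayCase("t2b", "edge-case-y", 2),
    ReplayCase("t2c", "edge-case-z", 2),
]
snapshots = [
    ["define-terms"],                                # after iteration 0
    ["define-terms", "cite-section"],                # after iteration 1
    ["edge-case-x", "edge-case-y", "edge-case-z"],   # after iteration 2: over-specialized
]
best = cross_time_replay(snapshots, cases, toy_judge)
print(best)
```

In this toy pool, a plain pass rate over all five cases would favor the last snapshot (3/5 versus 2/5), because it solves every late, narrow task. The per-iteration average instead selects the second snapshot (2/3 versus 1/3), which is the kind of balanced, less over-specialized choice the replay step is meant to make.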
Current language models (LMs) have achieved impressive performance on problems whose relevant knowledge was present during large-scale pre-training [5, 27, 39], such as competition-level mathematical problems [14] and competitive programming challenges [17]. However, many real-world tasks require LMs to learn from complex contexts and to rely on newly provided knowledge, rather than parametric knowledge, to reason about and solve them effectively. Prior work refers to this capability as context learning [12]. Effective context learning enables models to reason beyond their pre-trained knowledge and to solve complex, domain-specific tasks by learning directly from rich contextual information, much as humans do. For example, it allows LMs to quickly use previously unseen product documentation to generate step-by-step operational procedures or troubleshoot issues.
In this paper, we propose Ctx2Skill (Figure 1), a framework designed to autonomously discover, refine, and select skills directly from complex contexts via skill-optimized self-play, requiring neither human annotation nor external feedback. At the core of Ctx2Skill is a multi-agent self-play loop [10, 22, 48, 21] comprising two competing but co-evolving roles: a Challenger agent, which generates a batch of tasks and associated rubrics [15] from the context and its own skill set to probe deep contextual understanding, and a Reasoner agent, which reads the context and attempts to solve these tasks guided by its current skill set. The key idea of Ctx2Skill is that the two agents co-evolve by iteratively updating their respective skill sets rather than their model parameters. Specifically, a neutral Judge agent evaluates the Reasoner's responses against each rubric and returns a binary pass/fail verdict. Crucially, both the Challenger and the Reasoner evolve through accumulated natural-language skills: failed cases are routed to dedicated Proposer and Generator agents on the Reasoner side, which diagnose the missing contextual knowledge and update the skills accordingly, while easily solved cases are routed to the Challenger side to strengthen its task and rubric generation strategies, maintaining sustained adversarial pressure.
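One iteration of the self-play loop described above can be sketched as follows. This is a hypothetical illustration, not the paper's implementation: every agent (Challenger, Reasoner, Judge, and the skill-update functions standing in for the Proposer/Generator agents) is a plain Python callable rather than an LLM call, and we assume a task counts as failed if any of its rubrics fails and as "easily solved" if all of them pass. `Task`, `Case`, `SelfPlayState`, `self_play_step`, `add_skills`, and the `toy_*` agents are invented names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Task:
    question: str
    rubrics: List[str]  # criteria the Judge checks one by one


@dataclass
class Case:
    task: Task
    answer: str
    passed: bool


@dataclass
class SelfPlayState:
    reasoner_skills: List[str] = field(default_factory=list)
    challenger_skills: List[str] = field(default_factory=list)
    history: List[List[str]] = field(default_factory=list)  # Reasoner skill snapshot per iteration


def self_play_step(
    context: Dict[str, str],
    state: SelfPlayState,
    challenger: Callable[[Dict[str, str], List[str]], List[Task]],
    reasoner: Callable[[Dict[str, str], List[str], str], str],
    judge: Callable[[str, str], bool],
    update_reasoner: Callable[[List[Case], List[str]], List[str]],
    update_challenger: Callable[[List[Case], List[str]], List[str]],
) -> List[Case]:
    """One self-play iteration; only the skill lists change, never model weights."""
    cases = []
    for task in challenger(context, state.challenger_skills):
        answer = reasoner(context, state.reasoner_skills, task.question)
        # Binary Judge verdict per rubric; assumption: a task passes only if all rubrics pass.
        passed = all(judge(answer, rubric) for rubric in task.rubrics)
        cases.append(Case(task, answer, passed))
    failed = [c for c in cases if not c.passed]
    solved = [c for c in cases if c.passed]
    if failed:  # Reasoner side: turn failures into new skills
        state.reasoner_skills = update_reasoner(failed, state.reasoner_skills)
    if solved:  # Challenger side: tasks were too easy, so adjust the generation strategy
        state.challenger_skills = update_challenger(solved, state.challenger_skills)
    state.history.append(list(state.reasoner_skills))
    return cases


# --- Toy stand-ins for the LLM agents (all hypothetical) ---
def add_skills(skills: List[str], new: List[str]) -> List[str]:
    """Append new skills, skipping duplicates and preserving order."""
    return skills + [s for s in dict.fromkeys(new) if s not in skills]


def toy_challenger(ctx: Dict[str, str], skills: List[str]) -> List[Task]:
    # Ask about every fact the Challenger has not yet marked as "easy".
    return [Task(k, [v]) for k, v in ctx.items() if f"easy:{k}" not in skills]


def toy_reasoner(ctx: Dict[str, str], skills: List[str], question: str) -> str:
    # Can only answer questions for which it holds a lookup skill.
    return ctx[question] if f"lookup:{question}" in skills else "unknown"


def toy_judge(answer: str, rubric: str) -> bool:
    return answer == rubric


def toy_update_reasoner(failed: List[Case], skills: List[str]) -> List[str]:
    # Stand-in for the Reasoner-side Proposer/Generator: one lookup skill per failed question.
    return add_skills(skills, [f"lookup:{c.task.question}" for c in failed])


def toy_update_challenger(solved: List[Case], skills: List[str]) -> List[str]:
    # Stand-in for the Challenger-side update: remember which questions are now too easy.
    return add_skills(skills, [f"easy:{c.task.question}" for c in solved])


context = {"max_retries": "3", "timeout": "30s"}
state = SelfPlayState()
agents = (toy_challenger, toy_reasoner, toy_judge, toy_update_reasoner, toy_update_challenger)
round1 = self_play_step(context, state, *agents)
round2 = self_play_step(context, state, *agents)
print([c.passed for c in round1], [c.passed for c in round2])
```

In the toy run, the first round fails both tasks, so only the Reasoner's skills grow. The second round solves both, so only the Challenger's skills grow, and a third call would generate no tasks at all. The per-iteration snapshots kept in `state.history` are the candidates that a Cross-Time Replay-style selection would later choose from.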
We presented Ctx2Skill, a self-evolving framework that autonomously discovers, refines, and selects context-specific skills from complex contexts without human annotation or external feedback. Through a skill-optimized self-play loop, a Challenger and a Reasoner co-evolve their skill sets via failure-driven textual edits, while a Cross-Time Replay mechanism prevents adversarial collapse by selecting the most generalizable skill set across iterations. Experiments on CL-bench demonstrate that Ctx2Skill consistently and substantially improves context learning performance across multiple backbone models and task categories, and that the resulting skills are transferable across models. We hope Ctx2Skill provides a practical and scalable paradigm for equipping language models with the ability to learn skillfully from complex, previously unseen contexts.