ProtoReasoning: Prototypes as the Foundation for Generalizable Reasoning in LLMs
We hypothesize that cross-domain generalization arises from shared abstract reasoning prototypes: fundamental reasoning patterns that capture the essence of problems across domains. These prototypes abstract away surface-level details of a problem's representation, revealing that seemingly diverse tasks are grounded in shared reasoning structures. Based on this hypothesis, we propose ProtoReasoning, a framework that enhances the reasoning ability of LLMs by leveraging scalable and verifiable prototypical representations (Prolog for logical reasoning, PDDL for planning). ProtoReasoning features: (1) an automated prototype construction pipeline that transforms problems into their corresponding prototype representations; (2) a comprehensive verification system that provides reliable feedback through Prolog/PDDL interpreters; (3) the ability to synthesize arbitrary problems within prototype space while ensuring correctness, making the approach scalable. Extensive experiments show that ProtoReasoning achieves a 4.7% improvement over baseline models on logical reasoning (Enigmata-Eval), a 6.3% improvement on planning tasks, a 4.0% improvement on general reasoning (MMLU), and a 1.0% improvement on mathematics (AIME24). Notably, our ablation studies confirm that learning in prototype space generalizes better to structurally similar problems than training solely on natural-language representations, validating our hypothesis that reasoning prototypes serve as the foundation for generalizable reasoning in large language models.
(1) Declarative nature — Both focus on problem specification rather than procedural implementation, preserving the reasoning structure found in natural language, as shown in Figure 2; (2) Expressiveness — Prolog captures relational reasoning and constraint satisfaction through first-order predicate logic, while PDDL formalizes state transition systems for sequential planning; (3) Verifiability — Both possess mature interpreters (SWI-Prolog [27] and VAL [14]) that enable rigorous verification of reasoning chains.
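To make the declarative/verifiable distinction concrete, the following is a minimal sketch (in Python, with illustrative facts not taken from the paper) of the style of relational reasoning Prolog expresses: facts plus a transitivity rule, with the derived closure checkable by a simple query rather than by a hand-written procedure.

```python
# Toy relational reasoning in the spirit of a Prolog program:
# facts taller(alice, bob), taller(bob, carol) and the rule
# taller(X, Z) :- taller(X, Y), taller(Y, Z).
facts = {("taller", "alice", "bob"), ("taller", "bob", "carol")}

def forward_chain(facts):
    """Naive forward chaining: apply the transitivity rule to a fixed point."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        new = {("taller", x, z)
               for (_, x, y1) in derived
               for (_, y2, z) in derived
               if y1 == y2 and ("taller", x, z) not in derived}
        if new:
            derived |= new
            changed = True
    return derived

closure = forward_chain(facts)
print(("taller", "alice", "carol") in closure)  # True, by transitivity
```

The program states only what holds (facts and a rule); *how* the conclusion is derived is left to the inference engine, which is exactly the declarative property noted above.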
2.3 Planning Prototype Representations
PDDL (Planning Domain Definition Language) [3] is the standard representation for automated planning problems, modeling state transition systems through three essential components: state representations, actions with preconditions and effects, and state transitions. This representation naturally aligns with human planning cognition, particularly in reasoning about action requirements and consequences.
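The three components above can be sketched in a few lines of STRIPS-style Python: a state is a set of ground facts, and an action with preconditions, delete effects, and add effects induces a state transition. The action and predicate names below are illustrative, not taken from the paper.

```python
# Minimal STRIPS-style semantics underlying PDDL: states as fact sets,
# actions as (preconditions, delete effects, add effects).
def applicable(state, action):
    """An action applies when all of its preconditions hold in the state."""
    return action["pre"] <= state

def apply_action(state, action):
    """State transition: remove delete effects, then add add effects."""
    assert applicable(state, action)
    return (state - action["del"]) | action["add"]

# Toy blocks-world action: pick up block a from the table.
pickup_a = {
    "pre": {"ontable(a)", "clear(a)", "handempty"},
    "del": {"ontable(a)", "handempty"},
    "add": {"holding(a)"},
}

state0 = {"ontable(a)", "clear(a)", "handempty"}
state1 = apply_action(state0, pickup_a)
print(sorted(state1))  # ['clear(a)', 'holding(a)']
```

A validator such as VAL performs essentially this check along an entire plan: each action must be applicable in the state produced by its predecessors.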
In this stage, RL algorithms such as PPO [19], GRPO [22], and DAPO [34] are adopted to guide the LLM in exploring reasoning paths and to elicit long-CoT reasoning, relying on verifiable rewards such as accuracy against ground-truth answers. These approaches collectively demonstrate the effectiveness of reinforcement learning with verifiable rewards (RLVR) [15] in developing sophisticated reasoning abilities, including strategies such as recognizing and correcting mistakes, breaking down difficult steps, and iterating over alternative approaches, showcasing the powerful generalization capacity of long Chain-of-Thought reasoning [6]. Compared to previous work, ours introduces the concept of reasoning prototypes, aims to understand the generalization mechanisms underlying long chain-of-thought reasoning, and provides a more fundamental framework for cross-domain reasoning transfer.
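The "verifiable reward" in RLVR can be sketched as a simple check of the model's final answer against a reference (or, in ProtoReasoning's setting, against the output of a Prolog/PDDL interpreter), rather than a score from a learned reward model. The extraction convention and function names below are illustrative assumptions, not the paper's implementation.

```python
# Hedged sketch of a binary verifiable reward in the RLVR sense.
def extract_answer(completion: str) -> str:
    """Toy extraction: take whatever follows the last 'Answer:' marker."""
    marker = "Answer:"
    idx = completion.rfind(marker)
    return completion[idx + len(marker):].strip() if idx != -1 else ""

def verifiable_reward(completion: str, ground_truth: str) -> float:
    """Reward 1.0 iff the extracted final answer matches the reference."""
    return 1.0 if extract_answer(completion) == ground_truth else 0.0

print(verifiable_reward("Let me reason step by step... Answer: 42", "42"))  # 1.0
print(verifiable_reward("Answer: 41", "42"))                                # 0.0
```

Because the reward is computed by an external checker rather than learned, it is hard to game, which is what makes it a reliable training signal for long-CoT exploration.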
Symbolic Reasoning in LLMs. Large language models conduct reasoning not only in natural-language space [25] but also in a neuro-symbolic manner [9, 17, 33]. The intermediate reasoning steps can manifest as code [9], domain-specific languages [33, 36], or mixtures of different symbolic representations [11, 35]. For the Prolog programming language in particular, some previous work [4, 7, 32] leveraged it as an intermediate representation to improve the reasoning ability of LLMs. Rather than emphasizing the specific manifestation of the Chain-of-Thought process, we investigate how prototypes serve as the foundation for cross-domain generalization.
5 Conclusion and Future Work
This paper introduces ProtoReasoning, a framework that validates the hypothesis that abstract reasoning prototypes serve as the foundation for cross-domain generalization in large language models. By training on prototype representations (Prolog for logical reasoning, PDDL for planning), we achieve significant improvements on both logical reasoning and planning tasks, with ablation studies confirming effective transfer to structurally similar problems. We further believe the framework generalizes to other LLM capabilities. However, our theoretical understanding remains incomplete: the precise definition of "reasoning prototypes" lacks formal rigor, and the mechanisms driving cross-domain transfer require deeper investigation.