Chain-of-Reasoning: Towards Unified Mathematical Reasoning in Large Language Models via a Multi-Paradigm Perspective
Large Language Models (LLMs) have made notable progress in mathematical reasoning, yet they often rely on single-paradigm reasoning that limits their effectiveness across diverse tasks. In this paper, we introduce Chain-of-Reasoning (CoR), a novel unified framework that integrates multiple reasoning paradigms—Natural Language Reasoning (NLR), Algorithmic Reasoning (AR), and Symbolic Reasoning (SR)—to enable synergistic collaboration. CoR generates multiple potential answers using different reasoning paradigms and synthesizes them into a coherent final solution. We propose a Progressive Paradigm Training (PPT) strategy that allows models to progressively master these paradigms, culminating in the development of CoR-Math-7B.
Existing works [Xin et al., 2024, Yang et al., 2024, Wu et al., 2024] are often trained on specific tasks, aiming to enhance the model's ability to independently derive answers based on a specific structured knowledge representation. This representation is known as the reasoning paradigm, which includes Natural Language Reasoning (NLR), Algorithmic Reasoning (AR), and Symbolic Reasoning (SR), as illustrated in Figure 2 (a). Specifically, NLR reasons in natural language text, drawing on human common sense and semantic context to produce explicit step-by-step explanations [Wei et al., 2022]. AR leverages code to capture the computer's operations and execution process, such as generating Python code that is then executed [Chen et al., 2023, Gao et al., 2023] to obtain the final answer. SR uses logical symbols and axiomatic systems as the fundamental tools for rigorously formalized reasoning; current methods [Xin et al., 2024, Huang et al., 2024, Wu et al., 2024] explore numerous symbolic trajectories through a tree-based search process for theorem proving. However, these methods focus on improving single-paradigm reasoning and overlook the potential of collaboration among multiple paradigms, which restricts their single-task performance and hinders cross-task generalization.
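The AR paradigm can be sketched as follows: the model emits a Python program for the problem, and executing that program, rather than the model itself, yields the final answer [Chen et al., 2023, Gao et al., 2023]. The problem, the generated program, and the helper name below are illustrative assumptions, not components of any specific system.

```python
def execute_generated_program(program: str) -> object:
    """Run model-generated Python and return the value bound to `answer`."""
    namespace: dict = {}
    exec(program, namespace)  # in practice, run in a sandbox with a timeout
    return namespace["answer"]

# Hypothetical model output for: "A store sells pens at $3 each.
# How much do 14 pens cost after a $5 discount?"
generated_program = """
price_per_pen = 3
num_pens = 14
discount = 5
answer = price_per_pen * num_pens - discount
"""

result = execute_generated_program(generated_program)
print(result)  # 37
```

Because the arithmetic is delegated to the interpreter, AR avoids the calculation errors that step-by-step natural language reasoning is prone to.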
Researchers have explored various strategies to tackle these challenges. To improve single-task performance, some works integrate tools to overcome the limitations of single-paradigm reasoning [Gou et al., 2024, LI et al., 2024], as shown in Figure 2 (b). These methods combine natural language with code-based algorithms (tools) to enable interleaved reasoning and generate a final answer. Although they acknowledge the potential benefits of integrating different reasoning paradigms, they still rely on a single primary paradigm to complete the reasoning process, overlooking the possibility that the second paradigm could independently complete the reasoning and thereby constraining their overall potential. To address the challenge of cross-task generalization, several studies [Shao et al., 2024, Huang et al., 2024] incorporate data samples from diverse task types into large-scale training datasets, such as theorem-proving tasks that focus exclusively on SR solutions or arithmetic problems that emphasize NLR solutions. Although models trained on such data are capable of cross-task reasoning, they still rely on demonstrations for effective transfer.
To address these limitations, we propose a novel unified reasoning framework, Chain-of-Reasoning (CoR), which chains NLR, AR, and SR together to generate synergistic benefits. As illustrated in Figure 2 (c), CoR performs multi-paradigm reasoning for a given problem: it applies different reasoning paradigms to derive multiple potential answers, which are then summarized into a final answer. The framework allows the model to continue reasoning with additional paradigms conditioned on previously generated ones, facilitating collaboration among paradigms and improving individual task performance. Moreover, CoR applies unified multi-paradigm reasoning across different tasks to obtain the required answers, thereby achieving zero-shot reasoning across tasks. In particular, adjusting the prompt varies the depth of reasoning, which improves the model's adaptability to diverse tasks.
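The inference loop described above can be sketched as a minimal Python program, assuming a generic `generate(prompt)` call to the underlying model. The paradigm ordering, prompt templates, and final summarization prompt are illustrative assumptions, not the exact implementation of CoR-Math-7B.

```python
from typing import Callable, List

PARADIGMS = ["NLR", "AR", "SR"]  # natural-language, algorithmic, symbolic

def chain_of_reasoning(problem: str,
                       generate: Callable[[str], str],
                       depth: int = 3) -> str:
    """Reason with the first `depth` paradigms, conditioning each new
    paradigm on the trajectories produced so far, then summarize."""
    trajectory: List[str] = []
    for paradigm in PARADIGMS[:depth]:  # the prompt controls reasoning depth
        context = "\n".join(trajectory)
        solution = generate(f"[{paradigm}] Solve: {problem}\n{context}")
        trajectory.append(f"{paradigm} answer: {solution}")
    # Synthesize the per-paradigm answers into one final solution.
    return generate("Summarize into a final answer:\n" + "\n".join(trajectory))

# Usage with a stub model (a real system would call an LLM here):
final = chain_of_reasoning("What is 1 + 1?", generate=lambda p: "2")
```

Setting `depth=1` recovers single-paradigm reasoning, while larger depths let later paradigms verify or refine earlier answers before the final summary.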