Decoupling Knowledge and Reasoning in LLMs: An Exploration Using Cognitive Dual-System Theory

Paper · arXiv 2507.18178 · Published July 24, 2025

While large language models (LLMs) leverage both knowledge and reasoning during inference, the capacity to distinguish between them plays a pivotal role in model analysis, interpretability, and development. Inspired by dual-system cognitive theory, we propose a cognition attribution framework to decouple the contributions of knowledge and reasoning. In particular, the cognition of LLMs is decomposed into two distinct yet complementary phases: knowledge retrieval (Phase 1) and reasoning adjustment (Phase 2). To separate these phases, LLMs are prompted to generate answers under two different cognitive modes, fast thinking and slow thinking, respectively. The performance under the two cognitive modes is then compared to quantify the contribution of knowledge and reasoning. This framework is applied to 15 LLMs across 3 datasets. Results reveal: (1) reasoning adjustment is domain-specific, benefiting reasoning-intensive domains (e.g., mathematics, physics, and chemistry) while potentially impairing knowledge-intensive domains. (2) Parameter scaling improves both knowledge and reasoning, with knowledge improvements being more pronounced; scaling also makes LLMs' reasoning significantly more prudent, while only moderately more intelligent. (3) Knowledge primarily resides in lower network layers, while reasoning operates in higher layers. Our framework not only helps understand LLMs from a "decoupling" perspective, but also provides new insights into existing research, including scaling laws, hierarchical knowledge editing, and the limitations of small-model reasoning.

To address this challenge, Chain-of-Thought (CoT) prompting was proposed, enabling LLMs to mimic human-like progressive reasoning by generating intermediate reasoning steps [14]. However, early approaches to CoT generation typically relied on domain-specific prompt engineering and lacked the capability to automatically produce universally applicable reasoning chains across diverse domains.

The emergence of reasoning LLMs, such as OpenAI o1, enables automatic generation of universal CoT through distillation and reinforcement learning [15]. Although the specifics of o1 remain undisclosed, extensive replication efforts have successfully produced LLMs with powerful reasoning capabilities [16–18]. This breakthrough demonstrates that LLMs possess not only extensive knowledge but also advanced reasoning abilities.

In this context, it is scientifically important to distinguish between the contributions of knowledge and reasoning, as this is crucial for understanding the inference behaviours of LLMs. However, the joint employment of knowledge and reasoning during inference makes it hard to discern their respective contributions.

For this purpose, we propose a cognition attribution framework based on dual-system cognitive theory, which decomposes LLM inference into two distinct but complementary phases: (1) knowledge retrieval (Phase 1), where LLMs rapidly generate initial responses by accessing learned information, and (2) reasoning adjustment (Phase 2), where they refine the initial responses through CoT generation.

To separate the two cognitive phases, LLMs are prompted to generate answers under two distinct cognitive modes: fast thinking and slow thinking. During fast thinking, LLMs undergo only Phase 1, while during slow thinking, they rely on both Phase 1 and Phase 2. The difference between the two cognitive modes is analyzed to decouple knowledge and reasoning. Our main findings include:
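As a concrete illustration, the decoupling described above can be sketched as comparing accuracy under the two prompting modes: fast-thinking accuracy serves as a proxy for the knowledge contribution, and the slow-minus-fast accuracy gap as a proxy for reasoning adjustment. This is a minimal sketch under our own assumptions; the prompt wording, function names, and numbers below are illustrative, not the paper's released code.

```python
# Hypothetical sketch: attribute performance to knowledge (Phase 1) vs.
# reasoning adjustment (Phase 2) by comparing fast- and slow-thinking accuracy.
# The prompt templates are assumed wording, not the paper's exact prompts.

FAST_PROMPT = "Answer directly with the final answer only.\n{question}"
SLOW_PROMPT = "Think step by step, then give the final answer.\n{question}"

def accuracy(predictions, answers):
    """Fraction of predictions matching the gold answers."""
    return sum(p == a for p, a in zip(predictions, answers)) / len(answers)

def attribute_cognition(fast_preds, slow_preds, answers):
    """Decouple knowledge from reasoning adjustment for one domain.

    Fast-thinking accuracy proxies the knowledge contribution; the
    slow-minus-fast gap proxies the reasoning-adjustment contribution
    (negative when CoT impairs a knowledge-intensive domain).
    """
    knowledge = accuracy(fast_preds, answers)
    reasoning = accuracy(slow_preds, answers) - knowledge
    return {"knowledge": knowledge, "reasoning_adjustment": reasoning}

# Mock results for one domain (illustrative numbers only):
math_result = attribute_cognition(
    fast_preds=["4", "9", "2", "7"],  # fast thinking: 2/4 correct
    slow_preds=["4", "8", "2", "7"],  # slow thinking: 3/4 correct
    answers=["4", "8", "2", "6"],
)
print(math_result)  # {'knowledge': 0.5, 'reasoning_adjustment': 0.25}
```

In practice the predictions would come from querying each LLM with the two prompt templates over a full benchmark, then aggregating per domain.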

• The contribution of reasoning adjustment varies across domains. It plays a more crucial role in reasoning-intensive domains (such as mathematics, physics, and chemistry) than in others.

• Parameter scaling enhances both knowledge and reasoning, with knowledge being the dominant factor. Additionally, parameter scaling makes the reasoning significantly more "prudent" in all domains and moderately more "intelligent" in some specific domains.
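One plausible operationalization of "prudent" versus "intelligent" reasoning (our assumption for illustration, not necessarily the paper's definition) treats prudence as the rate at which slow thinking preserves answers that were already correct under fast thinking, and intelligence as the rate at which it fixes answers that were wrong:

```python
# Hypothetical metrics: "prudence" = slow thinking avoids breaking answers
# that fast thinking got right; "intelligence" = slow thinking repairs
# answers that fast thinking got wrong. Our own operationalization.

def reasoning_profile(fast_correct, slow_correct):
    """fast_correct / slow_correct: parallel per-item correctness booleans."""
    kept = sum(f and s for f, s in zip(fast_correct, slow_correct))
    fixed = sum((not f) and s for f, s in zip(fast_correct, slow_correct))
    was_right = sum(fast_correct)
    was_wrong = len(fast_correct) - was_right
    return {
        "prudence": kept / was_right if was_right else None,
        "intelligence": fixed / was_wrong if was_wrong else None,
    }

# Illustrative data: 5 items; slow thinking keeps all 3 correct answers
# and fixes 1 of the 2 wrong ones.
profile = reasoning_profile(
    fast_correct=[True, True, False, False, True],
    slow_correct=[True, True, True, False, True],
)
print(profile)  # {'prudence': 1.0, 'intelligence': 0.5}
```

Under this reading, the finding above says that as parameters scale, prudence rises sharply in all domains while intelligence rises only moderately, and only in some domains.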

• Knowledge retrieval primarily occurs in lower network layers, while reasoning adjustments are localized in higher layers, suggesting a functional separation in cognition.

In conclusion, our study presents a cognition attribution framework that decouples knowledge and reasoning in LLMs.