LLM Reasoning and Architecture · Agentic and Multi-Agent Systems · Design & LLM Interaction

Can modular cognitive tools boost LLM reasoning without training?

Does structuring reasoning as discrete, sandboxed tool calls elicit stronger problem-solving in language models compared to monolithic prompting approaches, and can this approach match specialized reasoning models?

Note · 2026-02-22 · sourced from Reasoning Architectures

Cognitive architectures in psychology posit that reasoning arises from the orchestrated, sequential execution of modular, predetermined cognitive operations. The Cognitive Tools paper instantiates this in a modern tool-calling framework: four cognitive tools are implemented as discrete functions, each executed by the same LLM in a sandboxed context.

The four cognitive tools:

  1. Understand question: Breaks down the problem by identifying main concepts, extracting relevant information, highlighting properties/theorems/techniques that might help
  2. Recall related: Retrieves related knowledge of similar questions the model knows how to answer — guides reasoning through analogous examples
  3. Examine answer: Self-evaluation of a generated answer
  4. Backtracking: Returns to a prior reasoning state when a path appears unproductive
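The four operations above can be sketched as prompt-template tool definitions. This is an illustrative sketch: the template wording here is paraphrased from the descriptions above, not the paper's exact prompts, and the field names (`input`, `answer`) are assumptions.

```python
# Illustrative sketch: the four cognitive tools as prompt templates.
# Wording is paraphrased; field names are assumptions, not the paper's.
COGNITIVE_TOOLS = {
    "understand_question": (
        "Identify the main concepts in the problem below, extract the "
        "relevant information, and list properties, theorems, or "
        "techniques that might help.\n\nProblem: {input}"
    ),
    "recall_related": (
        "Recall similar questions you know how to answer and work them "
        "as analogous examples for the problem below.\n\nProblem: {input}"
    ),
    "examine_answer": (
        "Check the candidate answer below for errors in reasoning or "
        "calculation.\n\nProblem: {input}\nCandidate answer: {answer}"
    ),
    "backtracking": (
        "The current reasoning path appears unproductive. Return to the "
        "last sound step and propose an alternative approach.\n\n"
        "Trace so far: {input}"
    ),
}
```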

Unlike standard agentic tools (external APIs, calculators), cognitive tools encapsulate reasoning operations within the LLM itself. Each tool's schema includes a prompt template that isolates a specific cognitive operation; the LLM executes it in sandboxed context and feeds the structured result back into the main reasoning loop.
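The sandboxed-execution pattern can be sketched as follows, assuming a generic `llm` callable that maps a message list to a completion. The key move is that each tool call starts from a fresh message list containing only its own template, while the main loop alone accumulates the full history.

```python
from typing import Callable

def call_cognitive_tool(llm: Callable[[list[dict]], str],
                        template: str, **fields) -> str:
    """Run one cognitive operation in a fresh, sandboxed context:
    the tool sees only its own filled-in template, never the main
    loop's accumulated history."""
    sandbox = [{"role": "user", "content": template.format(**fields)}]
    return llm(sandbox)

def reasoning_loop(llm: Callable[[list[dict]], str], problem: str,
                   tools: dict[str, str], plan: list[str]) -> list[dict]:
    """Main loop: each tool's structured result is fed back into the
    shared history; only the orchestrating context sees everything."""
    history = [{"role": "user", "content": problem}]
    for name in plan:
        result = call_cognitive_tool(llm, tools[name],
                                     input=problem, answer="")
        history.append({"role": "tool", "name": name, "content": result})
    return history
```

In practice the model would choose which tool to invoke next via its tool-calling API rather than following a fixed `plan`; the fixed list here keeps the sketch deterministic.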

Results: GPT-4.1 on AIME2024 improves from 26.7% to 43.3% pass@1 — approaching o1-preview performance without any RL training. Similar gains across closed and open-weight models.

The key insight: modularity reduces interference between operations. Cognitive prompting (monolithic structured prompts) improves reasoning but lacks the isolation that makes modular cognitive architectures powerful. A tool-calling implementation enforces the sandboxed execution that pure prompting cannot guarantee.
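The contrast can be made concrete. In a monolithic cognitive prompt, every instruction lands in one shared context, so operations can interfere; in the tool-calling version, each operation gets its own context. A minimal sketch of the two context-construction strategies (structure assumed for illustration):

```python
def monolithic_context(problem: str, instructions: list[str]) -> list[dict]:
    # Cognitive prompting: all operations stacked into a single prompt,
    # so every instruction can leak into every other's execution.
    return [{"role": "user",
             "content": "\n".join(instructions) + "\n\n" + problem}]

def modular_contexts(problem: str,
                     instructions: list[str]) -> list[list[dict]]:
    # Cognitive tools: one fresh, sandboxed context per operation;
    # no instruction is visible outside its own call.
    return [[{"role": "user", "content": instr + "\n\n" + problem}]
            for instr in instructions]
```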

This provides direct evidence for Do base models already contain hidden reasoning ability? — cognitive tools elicit pre-existing latent capability through structured invocation, not through training. The tool-calling framework is the elicitation mechanism.

The connection to Can critical questions improve how language models reason?: both use structured decomposition of reasoning requirements to improve performance. Cognitive tools generalize this from argumentation-specific structure to domain-general cognitive operations.

Self-Discover as predecessor: Self-Discover (Zhou et al., 2024) is the clearest precursor to cognitive tools. It implements a three-stage process: (1) SELECT relevant atomic reasoning modules from a predefined set (critical thinking, step-by-step thinking, decomposition, etc.), (2) ADAPT the selected modules to the specific task, (3) IMPLEMENT them as a structured reasoning plan. The key difference from cognitive tools: Self-Discover composes a task-specific plan at inference time using only 3 extra inference steps — cheaper than the tool-calling loop but less modular. Self-Discover is more efficient (no sandboxed execution overhead), while cognitive tools provide stronger isolation between operations.
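The three-stage Self-Discover pipeline can be sketched with a generic `llm` callable mapping a prompt string to a completion. The stage names follow the paper; the prompt wording and the truncated seed-module list are paraphrased, not the original prompts.

```python
from typing import Callable

# Paraphrased subset of Self-Discover's seed reasoning modules.
SEED_MODULES = ["critical thinking", "step-by-step thinking", "decomposition"]

def self_discover(llm: Callable[[str], str], task: str,
                  modules: list[str] = SEED_MODULES) -> str:
    # Stage 1 (SELECT): pick the modules relevant to this task.
    selected = llm("Select the reasoning modules useful for this task "
                   f"from {modules}.\nTask: {task}")
    # Stage 2 (ADAPT): rephrase the selected modules for the task.
    adapted = llm("Adapt these modules to the specific task.\n"
                  f"Modules: {selected}\nTask: {task}")
    # Stage 3 (IMPLEMENT): turn them into a structured reasoning plan.
    plan = llm("Turn the adapted modules into a step-by-step reasoning "
               f"plan.\nModules: {adapted}\nTask: {task}")
    return plan  # exactly three extra inference calls, then the plan is executed
```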



