Reasoning Strategies in Large Language Models: Can They Follow, Prefer, and Optimize?

Paper · arXiv 2507.11423 · Published July 15, 2025
Tags: Reasoning · Critiques · Prompts · Prompting · Argumentation · Discourses

Human reasoning involves different strategies, each suited to specific problems. Prior work shows that large language models (LLMs) tend to favor a single reasoning strategy, potentially limiting their effectiveness across diverse reasoning challenges. In this work, we investigate whether prompting can control LLMs' reasoning strategies and assess its impact on logical problem-solving. While our experiments show that no single strategy consistently improves accuracy, performance could be enhanced if models adaptively chose the optimal strategy. We propose methods to guide LLMs in strategy selection, highlighting new ways to refine their reasoning abilities.

Cognitive science shows that people can switch, for example, between supposition following (hypothesising an assumption and tracing its consequences) and chain construction (building a sequential argument), selecting whichever suits the problem at hand.

This study aims to go further by investigating: (i) whether an LLM can be explicitly instructed to follow different reasoning strategies, (ii) whether an LLM can autonomously determine the best strategy for solving a given problem, and (iii) whether it is possible to guide the model in selecting the most appropriate strategy for a given problem.

• Controlled strategy prompting. We design prompt templates that steer a single LLM into four human-inspired reasoning modes: supposition following, chain construction, compound reasoning, and concatenation.

• Empirical analysis of strategy efficacy. On two logical-deduction benchmarks (TruthQuest and ZebraLogic) we demonstrate in Section 3 that no single strategy dominates. An oracle that always picks the best strategy per problem would raise accuracy by up to 40 percentage points, exposing substantial untapped potential.

• Ensemble-based strategy selection. Rather than asking the model to choose a strategy, we run all strategies in parallel and select one of the resulting answers using principled combination rules—majority vote, maximum answer probability, minimum entropy, and a model-based verifier. These post-hoc selectors require no meta-prompts or additional training yet consistently outperform any individual strategy prompt as we show in Section 4.
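The post-hoc selection rules above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the answer lists, probabilities, and token distributions are hypothetical inputs standing in for the outputs of the four strategy prompts.

```python
import math
from collections import Counter


def majority_vote(answers):
    """Pick the answer produced by the most strategies (ties broken arbitrarily)."""
    return Counter(answers).most_common(1)[0][0]


def max_probability(answers, answer_probs):
    """Pick the answer to which its strategy run assigned the highest probability."""
    best = max(range(len(answers)), key=lambda i: answer_probs[i])
    return answers[best]


def min_entropy(answers, token_dists):
    """Pick the answer from the run whose answer-token distribution is most peaked."""
    def entropy(dist):
        return -sum(p * math.log(p) for p in dist if p > 0)
    best = min(range(len(answers)), key=lambda i: entropy(token_dists[i]))
    return answers[best]


# Hypothetical outputs of the four strategy prompts on one problem:
answers = ["knight", "knave", "knight", "knight"]
print(majority_vote(answers))  # -> knight
```

A model-based verifier would replace these scoring rules with a second LLM call that judges each candidate answer; the combination logic stays the same.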

We consider four distinct strategies that LLMs employ for deductive reasoning problems such as those described in Section 2:

• Supposition Following: Enumerates all propositions, makes a supposition, traces consequences, and tests alternatives if contradictions arise.

• Chain Construction: Identifies logical relationships, deduces intermediate implications, and builds a reasoning chain to the conclusion.

• Compound Strategy: Integrates multiple logical relationships, iteratively deriving and combining intermediate conclusions.

• Concatenation Strategy: Combines two or more statements into a single conclusion that captures the logical implications of each combined proposition.
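The four strategies above are induced purely through prompting. The sketch below shows what such strategy-steering templates might look like; the wording is illustrative and does not reproduce the paper's actual prompts.

```python
# Hypothetical prompt templates for steering an LLM toward each strategy.
# The instruction text is an assumption; the paper's exact prompts differ.
STRATEGY_TEMPLATES = {
    "supposition_following": (
        "Solve the problem by making a supposition about one statement, "
        "tracing its consequences, and revising it if a contradiction arises.\n\n{problem}"
    ),
    "chain_construction": (
        "Solve the problem by identifying logical relationships and building "
        "a step-by-step chain of deductions leading to the conclusion.\n\n{problem}"
    ),
    "compound": (
        "Solve the problem by integrating multiple logical relationships, "
        "iteratively deriving and combining intermediate conclusions.\n\n{problem}"
    ),
    "concatenation": (
        "Solve the problem by concatenating two or more statements into a "
        "single conclusion that captures their combined implications.\n\n{problem}"
    ),
}


def build_prompt(strategy: str, problem: str) -> str:
    """Instantiate the template for one strategy with a concrete problem."""
    return STRATEGY_TEMPLATES[strategy].format(problem=problem)
```

Running all four templates on the same problem yields the parallel answers that the ensemble selectors of Section 4 choose among.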

When LLMs are tasked with solving deductive problems without explicit guidance, each model tends to spontaneously adopt a preferred reasoning strategy.

Different LLM architectures might exhibit inherent biases toward specific reasoning pathways.

The annotation results, presented in Table 1, allow us to draw two main conclusions. First, we confirm the findings of Mondorf and Plank (2024a): when no specific strategy is provided, the model tends to prefer certain strategies over others. Second, the model generally follows the strategy indicated in the prompt, although there is some variability in behavior that our preliminary experiments did not manage to explain. This confirms that prompting is an effective way to guide the model toward reasoning strategies it might not "naturally" adopt.

When no specific strategy is provided and the model is free to choose its own, its performance is no better than when a strategy is imposed. This suggests that the model is unable to select the best strategy without additional information.