Can indirect and direct reasoning methods be combined to improve results?

This explores whether reasoning that works backward (proof by contradiction, contrapositive) can be combined with ordinary forward chain-of-thought to produce better answers than either alone.

The corpus gives a fairly direct yes. The clearest result comes from work showing that adding contrapositive and proof-by-contradiction prompts on top of standard direct reasoning improves accuracy on both factual and mathematical tasks (see "Can indirect reasoning methods solve problems direct chain-of-thought cannot?"). The interesting part isn't just that indirect reasoning works; it's *why* combining helps: the logical *form* of a request turns out to be an independent lever. A model that can't reach an answer by deriving forward can sometimes reach the same answer by assuming the opposite and showing it breaks. Direct and indirect aren't redundant; they open different doors to the same room.
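One way to picture the combination is to ask the same question in three logical forms and take the most common answer. The sketch below is illustrative only: the template wording, the `ask` callable (a stand-in for whatever model call is available), and the majority-vote aggregation are all assumptions, not the procedure of the cited work.

```python
from collections import Counter
from typing import Callable

# Hypothetical prompt templates. The wording is an assumption for illustration,
# not text taken from the cited work.
TEMPLATES = {
    "direct": (
        "Question: {q}\n"
        "Reason step by step, then give the final answer after 'Answer:'."
    ),
    "contrapositive": (
        "Question: {q}\n"
        "Restate each 'if P then Q' rule as 'if not Q then not P' where useful, "
        "reason from that form, then give the final answer after 'Answer:'."
    ),
    "contradiction": (
        "Question: {q}\n"
        "Assume the claim is false and check whether that leads to a contradiction, "
        "then give the final answer after 'Answer:'."
    ),
}


def extract_answer(completion: str) -> str:
    """Return the normalized text after the last 'Answer:' marker.

    Falls back to the whole completion when the marker is missing.
    """
    _, sep, tail = completion.rpartition("Answer:")
    return tail.strip().lower() if sep else completion.strip().lower()


def combined_answer(question: str, ask: Callable[[str], str]) -> tuple[str, dict[str, str]]:
    """Query one model under each logical form and majority-vote the answers.

    `ask` is a hypothetical callable mapping a prompt string to a completion.
    Ties go to the form listed first in TEMPLATES (here, direct reasoning).
    """
    per_method = {
        name: extract_answer(ask(template.format(q=question)))
        for name, template in TEMPLATES.items()
    }
    winner = Counter(per_method.values()).most_common(1)[0][0]
    return winner, per_method
```

Returning the per-method answers alongside the vote makes it easy to see when the indirect forms disagree with the direct one, which is the case the combination is meant to catch.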

That raises an uncomfortable question the corpus also answers: is the model actually doing logic, or just mimicking its shape? Strikingly, chain-of-thought exemplars that are *logically invalid* perform nearly as well as valid ones, which suggests the gains come from the structural form of reasoning, not from genuine inference (see note: logically-invalid-cot-prompts-perform-nearly-as-well-as-valid-ones-valid-reasoning). Read alongside the contrapositive result, this suggests combining methods may help less because you're supplying rigorous logic and more because you're giving the model multiple structured paths to explore: different scaffolds that surface latent competence.

If the win is really about access to more paths, then *which* method beats *which* should depend on the problem, and it does. Step-by-step reasoning isn't universally better: for simple questions, a direct question-to-answer flow outperforms forced step-by-step, and the optimal prompt depends on the type of question, not just its task category (see "Why do some questions perform better without step-by-step reasoning?"). Meanwhile, on genuinely compositional problems like graph connectivity, sequential chain-of-thought achieves an *exponential* advantage over parallel voting because the answer requires accumulating intermediate results in order (see "When does sequential reasoning beat parallel voting?"). The lesson for combining methods: match the method to the problem shape rather than always stacking more reasoning.
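To see why graph connectivity rewards sequential work, it can help to look at an ordinary reachability check. The sketch below is a standard breadth-first search, offered only as an illustration of order-dependent accumulation; it is not the analysis from the cited note, and the function name and edge format are assumptions.

```python
from collections import deque


def connected(edges, source, target):
    """Return True if `target` is reachable from `source` in an undirected graph.

    `edges` is an iterable of (u, v) pairs.
    """
    # Build an adjacency map from the edge list.
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    visited = {source}
    frontier = deque([source])
    while frontier:
        node = frontier.popleft()
        if node == target:
            return True
        # Each expansion depends on the visited set built by all earlier steps,
        # so the steps cannot be reordered or answered independently.
        for neighbor in adjacency.get(node, ()):
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append(neighbor)
    return False
```

Every step consumes state produced by the steps before it, which is the property that many short, independent attempts cannot reproduce by voting.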

There's also a caution buried in the corpus. More reasoning is not free: accuracy peaks and then declines as you add thinking tokens, with models overthinking easy problems and underthinking hard ones; in one reported case, raising thinking tokens from ~1,100 to ~16K dropped benchmark accuracy from 87.3% to 70.3% (see "Does more thinking time always improve reasoning accuracy?"). And the failure mode when you pile on reasoning is often disorganization, not insufficient compute: models wander and abandon promising paths, and the fix is structural steering rather than longer chains (see "Why do reasoning models abandon promising solution paths?"). So naively bolting indirect reasoning onto direct reasoning can backfire if it just adds more text to wander through.
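The source note for the second point mentions decoding-level thought-switching penalties as one form of structural steering. A minimal sketch of what such a penalty might look like follows; the function name, the dictionary representation of logits, and the default `penalty` and `window` values are all hypothetical and not taken from the source.

```python
def penalize_switch_tokens(logits, switch_token_ids, steps_since_thought_start,
                           penalty=3.0, window=600):
    """Lower the scores of tokens that tend to start a new line of thought.

    logits: dict mapping token id -> raw score for the next token.
    switch_token_ids: set of token ids treated as thought-switch markers
        (for example, the ids for words like "alternatively"); an assumption.
    steps_since_thought_start: decoding steps spent on the current thought.
    penalty, window: hypothetical defaults; the penalty only applies while the
        current thought is younger than `window` steps.
    """
    if steps_since_thought_start >= window:
        # The thought has had enough room; leave the scores unchanged.
        return dict(logits)
    return {
        token: (score - penalty if token in switch_token_ids else score)
        for token, score in logits.items()
    }
```

The idea is to make early abandonment less likely without forbidding it, so a path gets explored for a while before the model is free to switch.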

The synthesis, then, is sharper than "yes, combine them." Combining indirect and direct reasoning helps because it multiplies the *forms* available to a model that already holds the competence but can't always reach it from one angle. At a deeper level, one analysis finds the choice of reasoning framework matters far less than total compute and the quality of the signal guiding the search (see "Does the choice of reasoning framework actually matter for test-time performance?"). The thing you didn't know you wanted to know: the value of combining methods may have little to do with logic and everything to do with giving a model more structured doorways into knowledge it can't reach by walking straight in.

Sources 7 notes

Can indirect reasoning methods solve problems direct chain-of-thought cannot?

Adding logical contrapositive augmentation and proof-by-contradiction prompts to direct reasoning improves performance on factual and mathematical reasoning tasks. The logical form of a reasoning request acts as an independent lever, allowing models to access reasoning competence that forward derivation alone cannot reach.

Why do some questions perform better without step-by-step reasoning?

Saliency analysis reveals that CoT prompting fails when question information doesn't aggregate into the prompt structure before reasoning begins. For simple questions, direct question-to-answer flow outperforms step-by-step reasoning, showing the optimal prompt depends on question type, not just task category.

When does sequential reasoning beat parallel voting?

On structured tasks requiring sequential multi-step reasoning like graph connectivity, chain-of-thought achieves exponentially higher accuracy than parallel voting. The difference emerges because solutions genuinely require accumulating intermediate results sequentially, which short parallel chains cannot achieve.

Does more thinking time always improve reasoning accuracy?

Increasing thinking tokens from ~1,100 to ~16K reduced benchmark accuracy from 87.3% to 70.3%, revealing a non-monotonic relationship where models overthink easy problems and underthink hard ones.

Why do reasoning models abandon promising solution paths?

Reasoning LLMs exhibit two reinforcing failures: wandering (invalid exploration) and underthinking (premature path-switching). Decoding-level interventions like thought-switching penalties improve accuracy without fine-tuning, suggesting viable solutions exist but are abandoned prematurely.

Does the choice of reasoning framework actually matter for test-time performance?

Information-theoretic analysis shows best-of-N sampling (BoN) and Monte Carlo tree search (MCTS) converge in reasoning accuracy when controlling for total compute. Snowball errors accumulate per step regardless of framework; mitigation depends on search scope and reward function reliability, not the specific algorithm.
