LLM Reasoning and Architecture · Language Understanding and Pragmatics · Design & LLM Interaction

How much does the order of premises actually matter for reasoning?

When you rearrange the order of logical premises in a deduction task, does it change how well language models can solve it? This tests whether LLMs reason abstractly or process input sequentially.

Note · 2026-02-22 · sourced from Reasoning Logic Internal Rules
What makes chain-of-thought reasoning actually work? How do LLMs fail to know what they seem to understand? How should researchers navigate LLM reasoning research?

LLMs are surprisingly brittle to the ordering of premises in deductive reasoning tasks, despite the fact that premise order does not alter the underlying logical task. The "Premise Order Matters" paper shows that permuting premise order can cause a performance drop of over 30%.

The key finding is directional: LLMs perform best when premises are presented in the same order in which they are needed by the intermediate reasoning steps — essentially, when the prompt mirrors the ground-truth proof sequence. When premises must be mentally reordered to construct the proof, accuracy drops sharply.

This brittleness reveals that LLM deductive reasoning is not operating on abstract logical relations but on sequential pattern matching through the input. The model processes premises in order and constructs intermediate representations that are order-dependent. When the order does not match the proof structure, the model must implicitly reorder — a capability it lacks or executes poorly.
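The setup can be made concrete with a toy generator. This is a hypothetical sketch of the kind of controlled task the paper studies (not the paper's code): the same implication chain is rendered either in proof order or in a permuted order, so any accuracy gap is attributable purely to surface ordering.

```python
import random

def make_chain_task(symbols, shuffle=False, seed=0):
    """Build a propositional chain task: A -> B -> C -> ...

    In the "forward" version each rule appears in the order it is used
    in the proof; shuffling permutes the rules without changing the
    underlying logical task. (Illustrative helper, not the paper's code.)
    """
    rules = [f"If {a} then {b}." for a, b in zip(symbols, symbols[1:])]
    if shuffle:
        random.Random(seed).shuffle(rules)
    premises = rules + [f"{symbols[0]} is true."]
    question = f"Is {symbols[-1]} true?"
    return " ".join(premises) + " " + question

# Logically identical tasks, two surface orders:
forward = make_chain_task(list("ABCDE"))
permuted = make_chain_task(list("ABCDE"), shuffle=True, seed=1)
```

Comparing model accuracy on `forward` versus `permuted` prompts isolates the order effect described above.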

The finding connects to multiple existing insights about surface-level processing:

As in "Why do chain-of-thought examples fail across different conditions?", order sensitivity is not unique to premises — it extends across the entire prompt structure. Both findings suggest that LLMs process prompts as sequential narratives, not as unordered logical structures.

As in "Does training data format shape reasoning strategy more than domain?", premise ordering is another format effect: the same logical content produces dramatically different performance depending on presentation. The 30% gap is comparable to the 7.5x format effect documented in training data.

The practical implication is that anyone constructing prompts for deductive reasoning tasks should order premises to match the expected proof sequence. This is trivial for the prompt designer who knows the answer but impossible in production settings where the answer is unknown — creating a fundamental deployment challenge for LLM deductive reasoning.
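When some dependency structure between premises can be estimated (for example, by matching shared entities across rules), a topological sort approximates the proof order without knowing the answer. This is a hypothetical mitigation sketch, not a technique from the paper; `proof_order` and the hand-written `deps` graph are assumptions for illustration.

```python
from graphlib import TopologicalSorter

def proof_order(premises, deps):
    """Order premises so each appears after the premises it builds on.

    `deps` maps a premise index to the indices it depends on. In
    production this graph must be estimated heuristically, since the
    true proof is unknown. (Illustrative sketch, not from the paper.)
    """
    ts = TopologicalSorter({i: set(deps.get(i, ())) for i in range(len(premises))})
    return [premises[i] for i in ts.static_order()]

premises = ["If B then C.", "A is true.", "If A then B."]
# Premise 0 uses B, derived by premise 2; premise 2 uses A from premise 1.
deps = {0: [2], 2: [1]}
reordered = proof_order(premises, deps)
# -> ['A is true.', 'If A then B.', 'If B then C.']
```

The quality of the reordering is only as good as the estimated dependency graph, which is exactly where the deployment challenge remains.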


Original note title: premise ordering affects deductive reasoning performance by over 30 percent