Why do LLMs fail when asked to use counter-commonsense rules explicitly?

This explores why giving an LLM an explicit rule that contradicts everyday common sense, and asking it to follow that rule literally, tends to break its reasoning even when the rule sits right there in the prompt. The corpus points to one root cause with several faces: LLMs don't actually manipulate rules as formal symbols. They reason by semantic association, leaning on the commonsense baked into their training. A counter-commonsense rule pits the explicit instruction against those priors, and the priors usually win.

The sharpest evidence is the finding that LLMs are in-context *semantic* reasoners, not symbolic ones (Do large language models reason symbolically or semantically?). When the semantic content of a task is decoupled from the logic, which is exactly what a counter-commonsense rule does, performance collapses even though the correct rule is supplied. The model isn't executing the rule; it's pattern-matching against what the tokens usually mean. So a rule like "treat A as larger than B" when the world says otherwise gets quietly overwritten by the parametric default.
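
For contrast, here is a minimal sketch of what literal, symbolic rule-following looks like. It is an illustration only, not something taken from the cited note: the `Rule` type, the `is_larger` helper, and the mouse/elephant/whale facts are all hypothetical. The point is that the procedure consults nothing but the supplied rules, so the everyday meaning of the tokens cannot leak in.

```python
from typing import NamedTuple


class Rule(NamedTuple):
    """One supplied fact: `larger` is to be treated as larger than `smaller`."""
    larger: str
    smaller: str


def is_larger(a: str, b: str, rules: list[Rule]) -> bool:
    """Return True if the rules (and their transitive closure) say a > b.

    Only the supplied rules are consulted; real-world sizes play no role.
    """
    # Map each symbol to the symbols it is declared larger than.
    graph: dict[str, set[str]] = {}
    for rule in rules:
        graph.setdefault(rule.larger, set()).add(rule.smaller)

    # Depth-first search from a, looking for b.
    stack, seen = [a], {a}
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt == b:
                return True
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return False


# Counter-commonsense rules (hypothetical example data).
rules = [Rule("mouse", "elephant"), Rule("elephant", "whale")]

print(is_larger("mouse", "whale", rules))  # True
print(is_larger("whale", "mouse", rules))  # False
```

Renaming the symbols to meaningless strings would not change the output, which is the property the semantic-reasoning finding says LLMs lack.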

This is the same machinery that makes LLMs accept false presuppositions they demonstrably know are false (Why do language models accept false assumptions they know are wrong?): having the correct knowledge doesn't translate into overriding a smoothly phrased premise. It also connects to a broader split the corpus keeps surfacing: models can state a principle correctly yet fail to apply it, a dissociation between the "explanation" pathway and the "execution" pathway (Can LLMs understand concepts they cannot apply?; Can language models understand without actually executing correctly?). Asking a model to *use* a strange rule is precisely the case where knowing-about and doing-with come apart. These all sit under the same umbrella of structurally distinct epistemic failure modes (How do LLMs fail to know what they seem to understand?).

What's genuinely useful, and a little surprising, is that the corpus also shows the fix tends to be procedural rather than a matter of more knowledge. Forcing the model to enumerate the constraints it would otherwise skip lifts accuracy dramatically: one frame-problem study jumps from 30% to 85% just by making preconditions explicit (Do language models fail at identifying unstated preconditions?). Structured prompting that makes the model check its warrants instead of gliding past them does similar work (Can structured argument prompts make LLM reasoning more rigorous?). Wrapping the model in an external algorithm that hands it only the rule-relevant slice of context, rather than trusting it to hold the rule against its priors, is another route around the same wall (Can algorithms control LLM reasoning better than LLMs alone?).
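
The procedural remedies described here can be sketched as a small driver that splits one free-form question into three narrow calls. This is a rough sketch under stated assumptions, not the prompts or code from the cited studies: `LLM`, `answer_with_scaffold`, the prompt wording, and the `fake_llm` stand-in are all hypothetical, and any real model client would have to be wrapped to fit the `LLM` signature.

```python
from typing import Callable, List

# Hypothetical type: any function that maps a prompt string to a completion.
LLM = Callable[[str], str]


def answer_with_scaffold(llm: LLM, rule: str, question: str) -> str:
    """Drive three narrow LLM calls instead of one free-form call."""
    # Step 1: the model sees only the rule and must enumerate its conditions.
    conditions = llm(
        "Rule: " + rule + "\n"
        "List every condition that must hold for this rule to apply. "
        "Do not judge whether the rule is realistic."
    )
    # Step 2: the model checks the question against those conditions only.
    check = llm(
        "Conditions:\n" + conditions + "\n"
        "Question: " + question + "\n"
        "For each condition, state whether the question satisfies it."
    )
    # Step 3: the model applies the rule literally, given the recorded check.
    return llm(
        "Rule: " + rule + "\n"
        "Condition check:\n" + check + "\n"
        "Question: " + question + "\n"
        "Apply the rule exactly as written and give the answer."
    )


# Stand-in model for demonstration: records prompts and returns canned text.
seen_prompts: List[str] = []


def fake_llm(prompt: str) -> str:
    seen_prompts.append(prompt)
    return "step-%d-output" % len(seen_prompts)


result = answer_with_scaffold(
    fake_llm,
    rule="In this task, treat a mouse as larger than an elephant.",
    question="Which is larger, a mouse or an elephant?",
)
print(result)             # step-3-output
print(len(seen_prompts))  # 3
```

The control flow lives in ordinary code, so the model is never asked to hold the rule against its priors across a long free-form chain; each call gets only the context its step needs. Whether this particular split helps on a given task is an empirical question the sketch does not settle.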

The takeaway goes a step beyond the question as asked: the failure isn't that the model misunderstands your weird rule. Left to free-form reasoning, the model never really switches from association to literal rule-following at all, and the reliable remedies are scaffolds that force the switch, not better explanations of the rule.

Sources (8 notes)

Do large language models reason symbolically or semantically?

When semantic content is decoupled from reasoning tasks, LLM performance collapses even with correct rules in context. Models rely on parametric commonsense and token associations rather than formal logical manipulation, constraining reasoning to training distribution semantics.

Why do language models accept false assumptions they know are wrong?

The FLEX Benchmark shows that models reject false presuppositions at rates far below acceptable levels (GPT-4: 84%, Mistral: 2.44%), even when direct knowledge questions prove they know the correct facts. False presuppositions drive more accommodation than correct knowledge drives rejection.

Can LLMs understand concepts they cannot apply?

Models can explain concepts accurately, fail to apply them, and recognize the failure—a triple pattern incompatible with human cognition. This indicates functionally disconnected explanation and execution pathways rather than simple knowledge gaps.

Can language models understand without actually executing correctly?

Large language models can articulate correct principles but systematically fail to apply them due to dissociated instruction and execution pathways. The 87% accuracy in explanations versus 64% in actions reveals this is not knowledge deficit but structural disconnect.

How do LLMs fail to know what they seem to understand?

LLMs show repeatable, empirically documented failure modes—from Potemkin understanding (correct explanation + failed application) to reasoning collapse under implicit constraints. These failures reveal gaps between statistical pattern-tracking and actual epistemic competence.

Do language models fail at identifying unstated preconditions?

LLMs struggle not from lacking world knowledge but from failing to bring background conditions forward as relevant constraints. Prompting that forces explicit enumeration of preconditions raises accuracy from 30% to 85%, revealing the frame problem persists in statistical systems.

Can structured argument prompts make LLM reasoning more rigorous?

Applying Toulmin's argument model as explicit prompting steps (CQoT) improves LLM reasoning by forcing models to identify warrants and backing rather than skipping implicit premises. The method catches failures that standard chain-of-thought prompting allows.

Can algorithms control LLM reasoning better than LLMs alone?

LLM Programs embed LLMs within explicit algorithms that manage control flow and state, presenting only step-specific context to each LLM call. This information hiding addresses capability and context window limits while treating complex reasoning as modular, debuggable sub-tasks.
