Why can't LLMs reason from first principles or initial commitments?
This explores why LLMs struggle to hold a starting rule or premise and follow it through — reasoning from a fixed commitment rather than drifting toward whatever their training data finds familiar.
This reads the question as asking why models can't lock onto an initial principle and reason forward from it consistently. The corpus has a surprisingly clear answer, and it's not what you'd expect: the problem usually isn't that models lack the principle. It's that knowing a principle and executing on it run on separate tracks. Several notes describe a 'split-brain' pattern where models articulate a correct rule and then fail to apply it: 87% accuracy in explanation versus 64% in action [Can language models understand without actually executing correctly?]. The related 'Potemkin understanding' work sharpens this: a model can explain a concept, fail to use it, and even recognize its own failure, a triple combination no human reasoner would produce [Can LLMs understand concepts they cannot apply?]. So a stated first principle isn't a binding commitment the way it is for a person; it's just more text the model has generated.
Why doesn't the commitment bind? Because the underlying reasoning is semantic association, not symbolic logic. When researchers decouple meaning from the task, keeping the rules valid but stripping the familiar content, model performance collapses even with the correct rule sitting right there in the prompt [Do large language models reason symbolically or semantically?]. The model is leaning on what 'sounds right' from its training distribution rather than mechanically following the premise it was given. This is also why models accommodate false starting assumptions: the FLEX benchmark shows them accepting false presuppositions they demonstrably know are wrong, because the fluent continuation pulls harder than the stored fact [Why do language models accept false assumptions they know are wrong?].
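The decoupling manipulation described above can be illustrated with a minimal sketch. It is not the cited study's setup: the function names `forward_chain` and `decouple`, the toy facts, and the `sym0`-style tokens are all hypothetical, chosen only to show that a purely symbolic procedure reaches the same conclusions whether the symbols are familiar or meaningless.

```python
def forward_chain(facts, rules):
    """Apply rules of the form (premises, conclusion) until nothing new is derived."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)
                changed = True
    return known


def decouple(facts, rules):
    """Rename every symbol to a meaningless token while keeping the rule structure."""
    symbols = sorted(set(facts) | {s for prem, concl in rules for s in (*prem, concl)})
    mapping = {s: f"sym{i}" for i, s in enumerate(symbols)}
    new_facts = [mapping[f] for f in facts]
    new_rules = [([mapping[p] for p in prem], mapping[concl]) for prem, concl in rules]
    return new_facts, new_rules, mapping


# Hypothetical toy problem with familiar content.
facts = ["is_bird", "is_healthy"]
rules = [
    (["is_bird", "is_healthy"], "can_fly"),
    (["can_fly"], "can_leave_nest"),
]

derived = forward_chain(facts, rules)
abstract_facts, abstract_rules, mapping = decouple(facts, rules)
derived_abstract = forward_chain(abstract_facts, abstract_rules)

# The symbolic procedure is indifferent to what the symbols mean.
print(derived_abstract == {mapping[s] for s in derived})
```

A mechanical rule-follower gives the same answer on both versions of the problem. The finding summarized above is that model performance does not share this invariance.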
Reasoning from first principles also requires bringing forward what's unstated: the background conditions a premise quietly depends on. Here the corpus points to a revived version of the classic 'frame problem': models fail not from missing knowledge but from not enumerating the relevant preconditions. Force that enumeration explicitly and accuracy jumps from 30% to 85% [Do language models fail at identifying unstated preconditions?]. And even when the first step is sound, the chain wanders. Reasoning models behave like unsystematic explorers rather than methodical searchers, so success probability decays exponentially as a problem gets deeper: fine for shallow problems, catastrophic for long derivations [Why do reasoning LLMs fail at deeper problem solving?]. A first-principles argument is exactly the deep, many-step kind that this failure mode punishes hardest.
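To make the depth claim concrete, here is a small sketch of the simplest model that produces exponential decay. It assumes every step succeeds independently with the same probability, which is an illustrative assumption rather than something the notes state; the function name `chain_success` and the 0.95 per-step figure are likewise made up for the example.

```python
def chain_success(per_step: float, depth: int) -> float:
    """Probability that all `depth` steps succeed, assuming independent steps
    that each succeed with probability `per_step` (an illustrative assumption)."""
    if not 0.0 <= per_step <= 1.0:
        raise ValueError("per_step must be between 0 and 1")
    if depth < 0:
        raise ValueError("depth must be non-negative")
    return per_step ** depth


# Hypothetical per-step reliability; the notes do not give a figure.
for depth in (1, 5, 20, 50):
    print(depth, round(chain_success(0.95, depth), 3))
```

Under these assumptions a step that is right 95% of the time still leaves a 50-step derivation with less than a one-in-ten chance of arriving intact, which is why shallow problems look fine and long derivations do not.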
The most useful turn here is what the corpus says fixes it, because the fixes reveal the cause. Nearly every remedy works by supplying the structure the model won't impose on itself. Forcing models to check their warrants and backing with explicit critical-question prompts catches failures that ordinary chain-of-thought hides [Can structured argument prompts make LLM reasoning more rigorous?]. Offloading the actual inference to a symbolic solver, leaving the LLM only to translate, produces faithful logic with machine-checkable error messages [Can symbolic solvers fix how LLMs reason about logic?]. And partial formalization, which enriches natural language with selective symbolic scaffolding rather than fully formalizing it, beats both pure prose and full logic [Why does partial formalization outperform full symbolic logic?]. The through-line worth taking away: LLMs can't reliably reason from a commitment because nothing internal holds the commitment in place. These failures and their fixes all sit inside a broader map of distinct epistemic failure modes [How do LLMs fail to know what they seem to understand?]. The practical upshot is that 'reason from first principles' isn't one capability the model is missing, but a stack of small disciplines (hold the premise, surface the hidden conditions, follow each step, don't drift to the familiar) that the architecture won't enforce unless you build the enforcement around it.
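The translate-then-solve division of labor mentioned above can be sketched in a few lines. This is a toy, not Logic-LM itself: the `fact:` / `rule:` text format, the functions `parse_program` and `solve`, and the example programs are hypothetical, and the model's translation step is stood in for by hand-written strings. The point it tries to show is that a deterministic component either performs the inference or returns a specific, checkable error.

```python
def parse_program(text):
    """Parse lines such as 'fact: a' or 'rule: a, b -> c'.

    Returns (facts, rules, errors); errors name the offending line.
    """
    facts, rules, errors = set(), [], []
    for lineno, raw in enumerate(text.strip().splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        kind, sep, body = line.partition(":")
        kind, body = kind.strip(), body.strip()
        if sep and kind == "fact" and body and "->" not in body:
            facts.add(body)
        elif sep and kind == "rule" and body.count("->") == 1:
            left, right = (part.strip() for part in body.split("->"))
            premises = [p.strip() for p in left.split(",") if p.strip()]
            if premises and right:
                rules.append((premises, right))
            else:
                errors.append(f"line {lineno}: rule needs premises and a conclusion: {line!r}")
        else:
            errors.append(f"line {lineno}: cannot parse {line!r}")
    return facts, rules, errors


def solve(text, goal):
    """Run deterministic forward chaining, or report why the program is unusable."""
    facts, rules, errors = parse_program(text)
    if errors:
        # In a real loop, these messages would be sent back to the translator.
        return {"status": "error", "errors": errors}
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)
                changed = True
    return {"status": "proved" if goal in known else "not_proved", "errors": []}


# Stand-ins for a model's translation output (hypothetical).
good = "fact: socrates_is_human\nrule: socrates_is_human -> socrates_is_mortal"
bad = "fact: socrates_is_human\nrule: socrates_is_human => socrates_is_mortal"

print(solve(good, "socrates_is_mortal"))
print(solve(bad, "socrates_is_mortal"))
```

The design choice worth noticing is that the premise is held by the solver's data structures rather than by the model's next-token preferences, so it cannot drift; a malformed translation fails loudly instead of producing a fluent but unfounded conclusion.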
Sources (10 notes)
Large language models can articulate correct principles but systematically fail to apply them due to dissociated instruction and execution pathways. The 87% accuracy in explanations versus 64% in actions reveals this is not knowledge deficit but structural disconnect.
Models can explain concepts accurately, fail to apply them, and recognize the failure—a triple pattern incompatible with human cognition. This indicates functionally disconnected explanation and execution pathways rather than simple knowledge gaps.
When semantic content is decoupled from reasoning tasks, LLM performance collapses even with correct rules in context. Models rely on parametric commonsense and token associations rather than formal logical manipulation, constraining reasoning to training distribution semantics.
The FLEX Benchmark shows that models reject false presuppositions at rates far below acceptable levels (GPT-4: 84%, Mistral: 2.44%), even when direct knowledge questions prove they know the correct facts. False presuppositions drive more accommodation than correct knowledge drives rejection.
LLMs struggle not from lacking world knowledge but from failing to bring background conditions forward as relevant constraints. Prompting that forces explicit enumeration of preconditions raises accuracy from 30% to 85%, revealing the frame problem persists in statistical systems.
Current reasoning models lack the three properties of systematic exploration: validity, effectiveness, and necessity. This causes success probability to drop exponentially with problem depth, making medium problems solvable but deep problems catastrophically harder.
Applying Toulmin's argument model as explicit prompting steps (CQoT) improves LLM reasoning by forcing models to identify warrants and backing rather than skipping implicit premises. The method catches failures that standard chain-of-thought prompting allows.
Logic-LM divides cognitive labor by having LLMs formulate symbolic representations while deterministic solvers execute inference and provide machine-verifiable error messages. This structured feedback loop catches translation errors better than LLM self-critique, improving faithful reasoning without requiring perfect formalization.
QuaSAR and Logic-of-Thought both achieve 4-8% accuracy gains by enriching natural language with selective symbolic elements rather than replacing it. Full formalization loses semantic information; pure language lacks structure. Augmentation preserves both.
LLMs show repeatable, empirically documented failure modes—from Potemkin understanding (correct explanation + failed application) to reasoning collapse under implicit constraints. These failures reveal gaps between statistical pattern-tracking and actual epistemic competence.