Do large language models reason symbolically or semantically?
Can LLMs follow explicit logical rules when those rules contradict their training knowledge? Testing whether reasoning operates independently of semantic associations reveals what computational mechanisms actually drive LLM multi-step inference.
The "In-Context Semantic Reasoners" paper tests a fundamental question about what drives LLM reasoning by systematically decoupling semantics from the reasoning process across deduction, induction, and abduction tasks. The findings are clear: when semantics are consistent with commonsense, LLMs perform well; when semantics are removed or made counter-commonsense, performance collapses even when correct rules are provided in context.
The experimental design is precise. By replacing relation labels with shuffled alternatives ("motherOf" → "sisterOf", "female" → "male"), the researchers create tasks where the in-context rules are logically valid but semantically counter-intuitive. LLMs cannot follow these counter-commonsense rules despite having them explicitly in the prompt. The model's parametric knowledge — its compressed commonsense from training — overrides the in-context logical structure.
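A minimal sketch of that perturbation, using a toy rule vocabulary of my own invention rather than the paper's actual data or prompt format: relation labels are remapped by a fixed shuffle applied consistently to rules, facts, and query, so the task remains logically solvable from the prompt alone while the labels become counter-commonsense.

```python
import random
import re

# Toy in-context knowledge base over family relations (illustrative only).
RELATIONS = ["motherOf", "fatherOf", "sisterOf", "auntOf"]   # binary predicates
ATTRIBUTES = ["female", "male"]                               # unary predicates

rules = [
    "motherOf(X, Y) implies female(X)",
    "fatherOf(X, Y) implies male(X)",
    "motherOf(X, Y) and sisterOf(Z, X) implies auntOf(Z, Y)",
]
facts = ["motherOf(alice, bob)", "sisterOf(carol, alice)"]
query = "auntOf(carol, bob)?"

def shuffled_mapping(labels, seed=0):
    """Derangement of a label set: every label is sent to a different label."""
    rng = random.Random(seed)
    shuffled = labels[:]
    while any(a == b for a, b in zip(labels, shuffled)):
        rng.shuffle(shuffled)
    return dict(zip(labels, shuffled))

def remap(text, mapping):
    """Single-pass substitution (longest labels first) so swapped labels are not re-swapped."""
    pattern = re.compile(
        "|".join(re.escape(label) for label in sorted(mapping, key=len, reverse=True))
    )
    return pattern.sub(lambda m: mapping[m.group(0)], text)

# Shuffle relations and attributes separately so arity is preserved,
# e.g. "motherOf" -> "sisterOf" and "female" -> "male".
mapping = {**shuffled_mapping(RELATIONS), **shuffled_mapping(ATTRIBUTES)}

counter_rules = [remap(r, mapping) for r in rules]
counter_facts = [remap(f, mapping) for f in facts]
counter_query = remap(query, mapping)

prompt = (
    "Answer using only the rules and facts below.\n"
    "Rules:\n" + "\n".join(counter_rules) + "\n"
    "Facts:\n" + "\n".join(counter_facts) + "\n"
    "Question: " + counter_query
)
print(prompt)
```

Because the shuffle is applied consistently across rules, facts, and query, the correct answer stays derivable from the in-context rules alone; the only thing that changes is whether the labels agree with the model's parametric commonsense. That isolation is what lets the experiment attribute the performance collapse to semantic priors rather than to task difficulty.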
This reveals a specific computational mechanism: LLMs create "superficial logical chains" through semantic token associations, not through symbolic manipulation. The connections between tokens that enable multi-step reasoning are semantic connections, not logical ones. When those semantic connections support the correct answer, reasoning appears to work. When they conflict, reasoning fails regardless of what the prompt says.
The implication is that LLM reasoning is fundamentally bounded by training-distribution semantics. Combined with the finding in "Can large language models translate natural language to logic faithfully?", the failure is bidirectional: LLMs can neither translate TO formal logic faithfully nor reason FROM formal logic when it conflicts with semantic priors. And in the framing of "Do foundation models learn world models or task-specific shortcuts?", the semantic dependency IS the heuristic: the model uses semantic similarity as a proxy for logical validity.
This connects to the Dual Process Theory framework: human System II symbolic reasoning operates independently of semantic content, but LLM "reasoning" remains entangled with System I semantic associations. The paper's suggestion — integrating LLMs with external non-parametric knowledge bases and improving in-context knowledge processing — implicitly acknowledges that the LLM alone cannot escape this limitation.
Retort implication — rules out a class of anthropomorphization: The finding constrains what we can say about LLM behavior in other domains. Any account that treats LLMs as agents who "reverse-engineer" justifications for conclusions they have committed to — the standard anthropomorphization of sycophancy, rationalization, or motivated reasoning — presupposes the semantic competence this note shows LLMs lack. If reasoning collapses when semantics are decoupled, there is no separable reasoning faculty available to perform a post-hoc rationalization. What looks like reverse-engineering is pattern-matching within semantic associations. This rules out a whole class of AI commentary that treats LLMs as dishonest agents who could have reasoned correctly but chose not to.
Metaphor as paradigmatic semantic decoupling: Metaphor is the literary instantiation of this finding. A metaphor works by using one domain's vocabulary to illuminate another — "time is money," "argument is war," "memory is a jar of flies." The decoupling between the source domain's semantics and the target domain's meaning is the defining feature of metaphorical language. Since LLM reasoning collapses when semantics are decoupled from their typical packaging, and metaphor is decoupled semantics, this predicts a specific failure mode: LLMs should handle conventional metaphors (lexicalized, semantically consistent with commonsense) better than novel literary metaphors (where the mapping between domains is unexpected and requires conceptual reasoning beyond semantic association). The Diplomat dataset (Diplomat: A Dialogue Dataset for Situated PragMATic Reasoning) suggests treating all figurative language as a unified pragmatic reasoning task — but the semantic-decoupling finding predicts that this unified approach will hit a wall at the novelty threshold where metaphors stop relying on conventional semantic associations.
Source: Reasoning Logic Internal Rules; enriched from inbox/research-brief-llm-literary-analysis-2026-03-02.md
Related concepts in this collection
- Can large language models translate natural language to logic faithfully?
  This explores whether LLMs can convert natural language statements into formal logical representations without losing meaning. It matters because faithful translation is essential for any AI system that reasons formally or verifies specifications.
  Connection: bidirectional semantic dependency; LLMs fail both translating TO logic and reasoning FROM logic.
- Do foundation models learn world models or task-specific shortcuts?
  When transformer models predict sequences accurately, are they building genuine world models that capture underlying physics and logic? Or are they exploiting narrow patterns that fail under distribution shift?
  Connection: semantic associations are the heuristic mechanism.
- Why do language models ignore information in their context?
  Explores why language models sometimes override contextual information with prior training associations, and whether providing more context can solve this problem.
  Connection: same mechanism; parametric knowledge overrides in-context information.
- Does semantic grounding in language models come in degrees?
  Rather than asking whether LLMs truly understand meaning, this explores whether grounding is actually a multi-dimensional spectrum. The question matters because it reframes the sterile understand/don't-understand debate into measurable, distinct capacities.
  Connection: functional grounding through semantic associations explains why reasoning works within commonsense boundaries.
- Why do neural networks fail at compositional generalization?
  Exploring whether the binding problem from neuroscience explains neural networks' inability to systematically generalize. The binding problem has three aspects (segregation, representation, and composition), each creating distinct failure modes in how networks handle structured information.
  Connection: the binding problem may explain WHY semantic decoupling collapses reasoning. Without compositional binding mechanisms, removing semantic content removes the only glue holding multi-step inference together; semantic associations serve as a substitute for genuine compositional binding.
- Do LLMs actually have world models or just facts?
  The term 'world model' conflates two different capabilities: factual representation versus mechanistic understanding. Understanding which one LLMs actually possess matters for assessing their reasoning reliability.
  Connection: semantic reasoning operates on factual world representation (Sense 1) but cannot perform mechanistic reasoning (Sense 2) when logic must override semantic priors.
Original note title
llms are in-context semantic reasoners not symbolic reasoners — when semantics are decoupled reasoning collapses