Why do explicit linguistic markers override semantic computation in models?
This explores a tension in how models handle surface words (hedges, format tokens, discourse markers) versus the actual meaning-based computation underneath — and the corpus suggests the markers usually don't override the computation so much as decorate it, while in a few cases learned associations genuinely do drown out reasoning.
The most striking thread in the corpus is that explicit linguistic markers (the hedges, connectives, and format tokens models emit) often do not take precedence over semantic computation at all: the real reasoning happens silently, and the surface language is a separate, partly cosmetic layer on top. Models trained to hide their chain-of-thought compute the correct answer in their first few layers, then actively suppress that representation in the final layers to emit format-compliant filler. The answer is still recoverable from the lower-ranked token predictions, so the visible tokens are a performance, not the computation (Do transformers hide reasoning before producing filler tokens?). Push this further and you find reasoning that scales in latent space through hidden-state iteration, with no verbalized steps at all, which suggests that putting thought into words is a training artifact rather than a requirement (Can models reason without generating visible thinking tokens?).
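To make "recoverable from the lower-ranked predictions" concrete, here is a minimal sketch of the lookup involved. It is not the cited analysis: the vocabulary, the logit values, the filler token `"..."`, and the helper names `token_ranks` and `best_non_filler` are all hypothetical, chosen only to show a filler token winning while the answer sits just below it.

```python
def token_ranks(logits, vocab):
    """Map each vocabulary token to its rank (0 = top prediction)."""
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    return {vocab[i]: rank for rank, i in enumerate(order)}


def best_non_filler(logits, vocab, filler):
    """Return the highest-scoring token that is not a filler token."""
    candidates = [(score, tok) for score, tok in zip(logits, vocab) if tok not in filler]
    return max(candidates)[1]


# ASSUMPTION: toy values standing in for one position's output logits.
vocab = ["...", "7", "the", "12", "cat"]
logits = [5.1, 4.8, 0.3, 1.2, -2.0]

ranks = token_ranks(logits, vocab)
top_token = min(ranks, key=ranks.get)        # what the model visibly emits
answer_rank = ranks["7"]                     # where the suppressed answer sits
recovered = best_non_filler(logits, vocab, filler={"..."})
```

In this toy case the emitted token is the filler, while the answer is one rank down and can be read off by skipping filler tokens. Whether a real model behaves this way depends on its training and on which layer's predictions are inspected.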
If the markers are a layer on top, the next question is which words the model actually treats as load-bearing. When researchers prune reasoning chains by what the model itself ranks as important, symbolic-computation tokens survive while grammar and meta-discourse, exactly the explicit connective scaffolding, get cut first (Which tokens in reasoning chains actually matter most?). The corpus also shows that models trained on traces whose content is systematically irrelevant keep their solution accuracy, because the trace functions as computational scaffolding rather than meaningful argument (Do reasoning traces need to be semantically correct?). So in the cases where markers appear to 'override' meaning, it may be that the meaning was never carried by the visible language in the first place.
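The pruning idea can be sketched as a greedy loop: at each step, delete whichever token costs the least under some scoring function. In the cited work that score comes from the model's own likelihood. Here `toy_score` is a hand-written stand-in that simply assumes symbolic tokens matter more, so the example shows the procedure rather than reproducing the finding. All names are hypothetical.

```python
import re

SYMBOLIC = re.compile(r"[\d+\-*/=().]+")


def toy_score(tokens):
    """ASSUMPTION: stand-in for a model likelihood.
    Symbolic tokens (digits, operators) are worth 10, everything else 1."""
    return sum(10 if SYMBOLIC.fullmatch(t) else 1 for t in tokens)


def greedy_prune(tokens, keep, score=toy_score):
    """Repeatedly drop the token whose removal preserves the most score."""
    kept = list(tokens)
    removed = []
    while len(kept) > keep:
        # Evaluate every single-token deletion and take the least harmful one.
        best_i = max(range(len(kept)), key=lambda i: score(kept[:i] + kept[i + 1:]))
        removed.append(kept.pop(best_i))
    return kept, removed


chain = "so first we compute 3 * 4 = 12 then add 5".split()
kept, removed = greedy_prune(chain, keep=6)
```

With this scorer the connective words are removed first and the arithmetic tokens remain. Swapping `toy_score` for a real likelihood estimate is what would turn the sketch into an actual measurement.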
But there's a darker reading where surface signals genuinely do win. Hedging markers such as 'perhaps,' 'it seems,' and 'I think' appear more densely, and in greater variety, in wrong answers, so the model reaches for uncertainty language as a reflex when it is in epistemic trouble, not as a considered judgment (Do hedging markers actually signal careful thinking in AI?). Models also exploit conservative defaults: many look like they are reasoning carefully about constraints when they are really just defaulting to the safe-sounding option, which is why twelve of fourteen models perform worse, by up to 38.5 percentage points, when the constraints are removed (Are models actually reasoning about constraints or just defaulting conservatively?). In both cases a learned surface pattern stands in for the computation it is supposed to reflect.
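The two hedging measures mentioned above, density and diversity, can be computed with a few lines of string matching. This is a minimal sketch, not the cited study's method: the `HEDGES` list extends the three markers named in the text with two more as an assumption, and the example sentences are invented.

```python
import re

# ASSUMPTION: illustrative marker list; only the first three appear in the text.
HEDGES = ("perhaps", "it seems", "i think", "maybe", "possibly")


def hedge_stats(text):
    """Return (density, diversity): hedges per word, and distinct hedges used."""
    lowered = text.lower()
    n_words = len(re.findall(r"[a-z']+", lowered))
    counts = {
        h: len(re.findall(r"\b" + re.escape(h) + r"\b", lowered)) for h in HEDGES
    }
    total = sum(counts.values())
    density = total / n_words if n_words else 0.0
    diversity = sum(1 for c in counts.values() if c > 0)
    return density, diversity


confident = "The answer is 12 because 3 times 4 is 12."
hedged = "Perhaps the answer is 12, but it seems off. I think maybe it is 14."

confident_stats = hedge_stats(confident)
hedged_stats = hedge_stats(hedged)
```

Comparing these statistics between correct and incorrect responses is the kind of check the claim rests on. A real analysis would need a vetted marker lexicon and proper tokenization.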
The deeper reason lies in how these models represent meaning at all. They reason through semantic associations rather than formal symbol manipulation: strip the familiar wording out of a logic task and performance collapses even with the correct rules in front of them (Do large language models reason symbolically or semantically?). When a strong learned association exists, it can override the actual information in the context, and plain prompting cannot dislodge it; only intervening in the internal representations does (Why do language models ignore information in their context?). A model built to compress relational structure from text, with no external referent to check against, will weight a frequent linguistic cue over a one-off semantic computation almost by construction (Can language models learn meaning without engaging the world?).
A less obvious consequence is that this is also why models miss pragmatics. Scalar implicature, the inference that 'some' implies 'not all,' should flex with conversational context, but the model computes it the same way regardless of stakes, because it is pattern-matching the marker rather than tracking what the marker is doing in this situation (Can language models adapt implicature to conversational context?). The override is not markers beating meaning in a fair fight. The explicit surface form is the cheap, frequent, always-available signal, and genuine context-sensitive computation is the expensive one the model only sometimes does.
Sources (10 notes)
Logit lens analysis shows models trained with hidden CoT tokens compute correct answers in layers 1-3, then actively suppress these representations in final layers to produce format-compliant filler output. The reasoning is fully recoverable from lower-ranked token predictions.
Multiple architectures—depth-recurrent models, Heima, and Coconut—demonstrate that test-time compute scales through hidden state iteration rather than token generation. This suggests verbalization is a training artifact, not a reasoning requirement.
Greedy likelihood-preserving pruning reveals six functional token categories; symbolic computation tokens are preferentially preserved while grammar and meta-discourse are pruned first. Student models trained on these pruned chains outperform those trained on frontier-model compression.
Models trained on systematically irrelevant traces maintain solution accuracy and sometimes improve out-of-distribution generalization, suggesting traces function as computational scaffolding rather than meaningful reasoning steps.
Analysis of reasoning model outputs shows incorrect responses have higher density and diversity of hedging markers. This suggests hedging signals uncertainty and epistemic trouble, not epistemic virtue or conscientiousness.
Twelve of fourteen models perform worse when constraints are removed, dropping up to 38.5 percentage points. Models appear to reason correctly by defaulting to harder options, not by actually evaluating constraints.
When semantic content is decoupled from reasoning tasks, LLM performance collapses even with correct rules in context. Models rely on parametric commonsense and token associations rather than formal logical manipulation, constraining reasoning to training distribution semantics.
Research demonstrates that LMs generate outputs inconsistent with their context because parametric knowledge from training dominates over in-context information. Textual prompting alone cannot override strong priors; causal intervention in representations is required.
Research shows LLMs learn culturally situated discourse patterns by compressing relational structure from text, demonstrating that fluent language generation requires no external referents or embodied grounding.
ChatGPT shows no context-sensitivity in computing scalar implicatures across three dimensions: explicit literal-mode instructions, information structure focus, and face-threatening contexts. Humans flexibly modulate these inferences; the model does not, suggesting pragmatic competence requires tracking communicative stakes that LLMs systematically miss.