Do language models ignore goals when surface cues conflict?
When a task has an obvious surface cue that contradicts an unstated requirement, do LLMs follow the cue or the actual goal? This matters because it reveals whether reasoning failures come from missing knowledge or from how models weight competing signals.
The car-wash problem went viral in February 2026: "I want to wash my car. The car wash is 50 meters away. Should I walk or drive?" Every frontier LLM tested recommended walking. The correct answer is to drive, because you cannot wash a car that is not at the car wash. A 53-model evaluation found that 42 recommended walking on a single pass, and only 5 answered correctly on all ten trials.
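A hedged way to read those two figures: single-pass accuracy credits a model for one correct sample, while the stricter criterion credits it only if every one of ten independent samples is correct. The sketch below illustrates that distinction; the `ask` helper and the model list are hypothetical stand-ins, not the evaluation's actual harness.

```python
def strict_accuracy(models, ask, n_trials=10, correct="drive"):
    """Compare single-pass scoring with strict all-trials scoring.

    `ask(model)` is a hypothetical helper that samples one answer
    ("walk" or "drive") from a model on the car-wash prompt.
    """
    # Single pass: a model counts as correct if its one sampled answer is right.
    single_pass = sum(ask(m) == correct for m in models)
    # Strict: a model counts only if all n_trials samples are right.
    strict = sum(all(ask(m) == correct for _ in range(n_trials)) for m in models)
    return single_pass, strict
```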
The Heuristic Override Benchmark (HOB) generalized this single anecdote into a systematic 500-instance test crossing 4 heuristic families with 5 constraint families. Across 14 models the result is sharp: under strict 10/10 evaluation, no model exceeds 75 percent accuracy. Causal-behavioral analysis on six models showed that the Heuristic Dominance Ratio (HDR), a measure of how much more the surface cue influences the decision than the goal does, ranged from 8.7× to 38×. The distance cue exerts at least an order of magnitude more influence than the goal in every model tested.
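One hedged reading of HDR is as a ratio of causal effect sizes: edit the surface cue (the distance) while holding the goal fixed, edit the goal while holding the distance fixed, and compare how often the model's decision flips under each edit. The sketch below assumes that flip-rate reading; `decision`, `model.answer`, the prompt template, and the specific goal swap are hypothetical stand-ins, not HOB's actual protocol.

```python
def decision(model, goal, distance_m):
    """Hypothetical query helper: returns 'walk' or 'drive' for one prompt."""
    prompt = (f"I want to {goal}. The car wash is {distance_m} meters away. "
              "Should I walk or drive?")
    return model.answer(prompt)  # assumed to return a single-word answer

def flip_rate(model, cases, counterfactual):
    """Fraction of cases whose decision changes under a counterfactual edit.

    `cases` is a list of (goal, distance_m) settings; `counterfactual`
    maps a setting to its edited twin.
    """
    flips = 0
    for goal, dist in cases:
        if decision(model, *counterfactual(goal, dist)) != decision(model, goal, dist):
            flips += 1
    return flips / len(cases)

def heuristic_dominance_ratio(model, cases):
    # Cue effect: move the car wash from near to far (or back), keep the goal.
    cue_effect = flip_rate(model, cases, lambda g, d: (g, 5000 if d < 1000 else 50))
    # Goal effect: swap the feasibility-laden goal for a neutral errand, keep the distance.
    swap = {"wash my car": "visit the car wash", "visit the car wash": "wash my car"}
    goal_effect = flip_rate(model, cases, lambda g, d: (swap.get(g, g), d))
    return cue_effect / max(goal_effect, 1e-9)
```

Under this reading, an HDR of 8.7 would mean the distance edit flips the decision 8.7 times as often as the goal edit does.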
Monotonicity curves further showed that all six models produced sigmoid conflict curves with the same shape, differing only in amplitude and crossover distance. The mapping from distance to decision is approximately context-independent: the goal does not gate the heuristic; it only weakly modulates it. This is not a tail-distribution problem at the edges of capability. It is a structural feature of how transformers handle conflicts between salient surface cues and unstated feasibility constraints. The cue dominates; the goal whispers.
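The "same shape, different amplitude and crossover" description can be pictured as fitting a decreasing logistic curve to the rate of heuristic-consistent answers as a function of distance. A minimal sketch, assuming per-distance walk rates have already been measured; the data points and the parameter names (amplitude, crossover, slope) are illustrative labels, not necessarily the benchmark's.

```python
import numpy as np
from scipy.optimize import curve_fit

def conflict_curve(log_distance, amplitude, crossover, slope):
    """Decreasing sigmoid: P(model says 'walk') starts near `amplitude` at short
    distances and falls through amplitude/2 at `crossover` (log10 meters)."""
    return amplitude / (1.0 + np.exp(slope * (log_distance - crossover)))

# Hypothetical walk rates for one model, measured at several distances (meters).
distances = np.array([10.0, 50.0, 200.0, 1_000.0, 5_000.0, 20_000.0])
walk_rate = np.array([0.95, 0.92, 0.70, 0.30, 0.05, 0.01])

params, _ = curve_fit(conflict_curve, np.log10(distances), walk_rate, p0=[1.0, 3.0, 2.0])
amplitude, crossover, slope = params
print(f"amplitude={amplitude:.2f}, crossover~{10 ** crossover:.0f} m, slope={slope:.2f}")
```

If the fitted slope were roughly shared across models while amplitude and crossover vary, that would match the note's claim that the curves differ only in those two parameters.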
Source: Linguistics, NLP, NLU
Related concepts in this collection
- Why do language models fail to use knowledge they possess? Large language models contain relevant world knowledge but often fail to activate it without explicit cues. This explores whether the bottleneck lies in knowledge storage or in the inference process that decides what background facts apply. (Relationship: characterizes the failure mode)
- Are models actually reasoning about constraints or just defaulting conservatively? Do language models genuinely apply constraints when solving problems, or do they simply prefer harder options by default? Minimal pair testing reveals whether apparent reasoning success masks hidden biases. (Relationship: exposes the apparent-reasoning illusion)
Original note title: LLMs systematically follow surface heuristics over implicit feasibility constraints, with the heuristic 8 to 38 times more influential than the goal