
Do language models ignore goals when surface cues conflict?

When a task has an obvious surface cue that contradicts an unstated requirement, do LLMs follow the cue or the actual goal? This matters because it reveals whether reasoning failures come from missing knowledge or from how models weight competing signals.

Note · 2026-05-01 · sourced from Linguistics, NLP, NLU

The car-wash problem went viral in February 2026: "I want to wash my car. The car wash is 50 meters away. Should I walk or drive?" Every frontier LLM tested recommended walking. The correct answer is to drive, because you cannot wash a car that is not at the car wash. A 53-model evaluation found 42 recommended walking on a single pass, with only 5 answering correctly across ten trials.
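The gap between single-pass and strict repeated-trial results comes down to the scoring rule. A minimal sketch of a strict k-of-k criterion, assuming answers are normalized to "walk"/"drive" (function and variable names are illustrative, not from the evaluation):

```python
# Strict repeated-trial scoring: a model counts as correct only if it
# gives the right answer on every one of k independent trials.

def strict_pass(trial_answers, correct="drive", k=10):
    """True only if all k trials match the correct answer."""
    return len(trial_answers) == k and all(a == correct for a in trial_answers)

# A model that walks even once in ten trials fails strict scoring,
# even though it would pass a majority-vote criterion.
trials = ["drive"] * 9 + ["walk"]
print(strict_pass(trials))          # → False under strict 10/10
print(trials.count("drive") >= 6)   # → True under majority vote
```

This is why only 5 of 53 models survive the ten-trial criterion while many more pass on a single sample: strict scoring exposes instability that single-pass accuracy hides.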

The Heuristic Override Benchmark (HOB) generalized this single anecdote into a systematic 500-instance test crossing 4 heuristic families with 5 constraint families. Across 14 models the result is sharp: under strict 10/10 evaluation, no model exceeds 75 percent accuracy. Causal-behavioral analysis on six models showed the Heuristic Dominance Ratio (HDR) — how much more the surface cue influences the decision than the goal — ranged from 8.7× to 38×. The distance cue exerts at least an order of magnitude more influence than the goal in every model tested.
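One way to operationalize an influence ratio like HDR is via counterfactual interventions: flip the surface cue while holding the goal fixed, flip the goal while holding the cue fixed, and compare how often the decision changes. The benchmark's exact causal estimator is not reproduced here; this is a hedged sketch with illustrative data:

```python
# Hypothetical HDR estimator: ratio of cue-driven decision flips to
# goal-driven decision flips. All names and data are illustrative.

def flip_rate(decisions_base, decisions_flipped):
    """Fraction of instances whose decision changes under an intervention."""
    changed = sum(b != f for b, f in zip(decisions_base, decisions_flipped))
    return changed / len(decisions_base)

def heuristic_dominance_ratio(base, cue_flipped, goal_flipped):
    """Higher values mean the surface cue dominates the goal."""
    goal_effect = flip_rate(base, goal_flipped) or 1e-9  # avoid divide-by-zero
    return flip_rate(base, cue_flipped) / goal_effect

base         = ["walk", "walk", "walk", "walk", "drive"]
cue_flipped  = ["drive", "drive", "drive", "drive", "drive"]  # distance intervened
goal_flipped = ["walk", "walk", "walk", "drive", "drive"]     # goal intervened
print(heuristic_dominance_ratio(base, cue_flipped, goal_flipped))  # → 4.0
```

Under this kind of estimator, an HDR of 8.7× to 38× means changing the distance flips the model's answer roughly an order of magnitude more often than changing the stated goal does.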

Monotonicity analysis further showed that all six models produce sigmoid conflict curves of the same shape, differing only in amplitude and crossover distance. The mapping from distance to decision is approximately context-independent — the goal does not gate the heuristic, only weakly modulates it. This is not a tail-distribution problem at the edges of capability. It is a structural feature of how transformers handle conflicts between salient surface cues and unstated feasibility constraints. The cue dominates; the goal whispers.
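A sigmoid curve with only amplitude and crossover as free parameters can be written as a scaled logistic in distance. A minimal sketch, with parameter values assumed for illustration rather than taken from the paper:

```python
import math

def p_walk(distance_m, crossover_m=400.0, amplitude=1.0, slope=0.01):
    """Logistic map from distance to P(recommend walking).

    amplitude scales the whole curve; crossover_m is the distance at which
    the probability crosses amplitude/2. Per the note, models share this
    shape and differ only in amplitude and crossover.
    """
    return amplitude / (1.0 + math.exp(slope * (distance_m - crossover_m)))

# The curve falls monotonically with distance; a goal intervention would
# only shift crossover_m slightly rather than gating the decision.
for d in (50, 400, 2000):
    print(d, round(p_walk(d), 3))
```

The "weak modulation" claim corresponds to the goal nudging `crossover_m` or `amplitude` a little, while the distance-to-decision logistic itself stays fixed across contexts.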



LLMs systematically follow surface heuristics over implicit feasibility constraints, with the heuristic 8 to 38 times more influential than the goal.