Topics: LLM Reasoning and Architecture · Reinforcement Learning for LLMs · Agentic and Multi-Agent Systems

Why do language models fail to act on their own reasoning?

LLMs generate correct step-by-step reasoning 87% of the time but only follow through with matching actions 64% of the time. What drives this gap between knowing and doing?

Note · 2026-02-22 · sourced from Reinforcement Learning
Related notes: How should we allocate compute budget at inference time? · What kind of thing is an LLM really?

Three systematic failure modes explain why LLMs perform sub-optimally in sequential decision-making: greediness (premature commitment to exploitative strategies, leaving up to 55% of the action space unexplored), frequency bias (small models copying the most frequent actions regardless of reward), and the knowing-doing gap (producing correct rationales but failing to act on them).
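
A minimal sketch of how these failure modes can be probed on a small multi-armed bandit, assuming a hypothetical query_llm call that returns the model's chosen arm; the action-coverage and frequency-bias measures below are loose proxies for the paper's metrics, not its exact protocol.

```python
import random
from collections import Counter, defaultdict

def query_llm(prompt: str) -> int:
    """Hypothetical model call; returns the arm index parsed from the reply."""
    raise NotImplementedError

def run_bandit_probe(n_arms: int = 10, horizon: int = 50, seed: int = 0):
    rng = random.Random(seed)
    true_means = [rng.random() for _ in range(n_arms)]
    history = []                      # (arm, reward) pairs shown in the prompt
    actions, freq_copies = [], 0
    for _ in range(horizon):
        prompt = (
            f"{n_arms}-armed bandit. Observed (arm, reward) pairs so far: {history}. "
            f"Reply with one integer in [0, {n_arms - 1}]: the arm to pull next."
        )
        arm = query_llm(prompt)
        if history:
            modal_arm = Counter(a for a, _ in history).most_common(1)[0][0]
            rewards = defaultdict(list)
            for a, r in history:
                rewards[a].append(r)
            best_arm = max(rewards, key=lambda a: sum(rewards[a]) / len(rewards[a]))
            # frequency bias: repeating the most-seen arm even when it is not the best one
            if arm == modal_arm and modal_arm != best_arm:
                freq_copies += 1
        reward = float(rng.random() < true_means[arm])
        history.append((arm, reward))
        actions.append(arm)
    coverage = len(set(actions)) / n_arms   # low coverage signals greediness
    return {"action_coverage": coverage,
            "frequency_bias_rate": freq_copies / max(horizon - 1, 1)}
```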

The knowing-doing gap is the most conceptually significant finding. When LLMs generate chain-of-thought rationales about how to solve a decision-making task, 87% of the rationales are correct — yet only 64% of the subsequent actions follow the rationale's recommendation. The model knows what to do but defaults to greedy behavior instead of following its own reasoning.
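
One plausible way to operationalize this measurement, assuming step-level logs that record whether each rationale was judged correct and whether the executed action followed it (the Step fields and the aggregation are illustrative, not the paper's exact definition):

```python
from dataclasses import dataclass

@dataclass
class Step:
    rationale_correct: bool   # did the CoT identify the right action?
    action_follows: bool      # did the executed action match the rationale?

def knowing_doing_gap(steps: list[Step]) -> dict:
    n = len(steps)
    knowing = sum(s.rationale_correct for s in steps) / n
    doing = sum(s.rationale_correct and s.action_follows for s in steps) / n
    return {"knowing": knowing, "doing": doing, "gap": knowing - doing}
```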

Scale helps only partially: larger models (27B) show less frequency bias but remain greedy. RL fine-tuning on self-generated CoT rationales mitigates all three failure modes by increasing exploration and aligning actions with rationales. This suggests the gap is trainable, not architectural.
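
A rough sketch of what RL fine-tuning on self-generated rationales can look like, using a plain REINFORCE-style update over a Hugging Face-style causal LM; env.parse_action and env.step are assumed helpers, and the paper's actual RLFT objective, baselines, and action-validity handling will differ.

```python
import torch

def rlft_step(model, tokenizer, env, optimizer, prompt: str):
    """One REINFORCE-style update on a self-generated CoT rationale + action."""
    inputs = tokenizer(prompt, return_tensors="pt")
    prompt_len = inputs["input_ids"].shape[1]

    # 1. Sample a chain-of-thought rationale followed by an action.
    gen = model.generate(**inputs, max_new_tokens=256, do_sample=True,
                         return_dict_in_generate=True)
    completion_ids = gen.sequences[0, prompt_len:]
    completion_text = tokenizer.decode(completion_ids, skip_special_tokens=True)

    # 2. Execute the parsed action and observe a scalar environment reward.
    action = env.parse_action(completion_text)   # assumed helper
    reward = env.step(action)                    # assumed scalar reward

    # 3. Reinforce: scale the log-probability of the sampled completion by the reward.
    logits = model(gen.sequences).logits[0, prompt_len - 1:-1]
    logprobs = torch.log_softmax(logits, dim=-1)
    chosen = logprobs.gather(1, completion_ids.unsqueeze(1)).squeeze(1)
    loss = -(reward * chosen.sum())

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return reward
```

Reinforcing the full rationale-plus-action sequence is what lets the update pull actions toward the model's own stated reasoning rather than toward its greedy default.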

This connects directly to the concept of Potemkin understanding. As asked in "Can LLMs understand concepts they cannot apply?", the knowing-doing gap is a measurable instance of exactly this pattern: the model demonstrates understanding in its rationale but fails in its action selection. The quantified gap (87% vs. 64%) gives the Potemkin-understanding concept empirical grounding.

The deeper implication is that CoT reasoning and action selection may involve different computational pathways. As "Do language models actually use their encoded knowledge?" asks, the knowing-doing gap may reflect a disconnect in which the reasoning trace is generated through one pathway while action selection draws on different (shallower, more habitual) computations.

Alice in Wonderland: the overconfidence amplifier. The "Alice in Wonderland" paper demonstrates a dramatic instance of the knowing-doing gap on trivially simple reasoning: "Alice has N brothers and M sisters. How many sisters does Alice's brother have?" The correct answer is M + 1 (Alice's M sisters plus Alice herself), yet most SOTA models collapse on this simple problem, producing incorrect answers with strong overconfidence and offering "reasoning-like explanations akin to confabulations" to justify clearly failed responses. Standard interventions (enhanced prompting, multi-step re-evaluation) fail to recover correct answers. The confabulation-like quality of the justifications directly parallels the knowing-doing gap: the model generates plausible reasoning traces that do not correspond to correct computation. Notable exceptions are Claude 3 Opus and GPT-4, which occasionally succeed but still show frequent failures, suggesting the problem is architectural rather than model-specific.
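
A tiny evaluation harness for this probe is easy to sketch; the prompt wording below is a paraphrase of the AIW template rather than the paper's exact text, query_llm is again a hypothetical model call, and the check relies on the correct answer being M + 1.

```python
import random
import re

def query_llm(prompt: str) -> str:
    """Hypothetical model call returning the raw text reply."""
    raise NotImplementedError

def aiw_prompt(n_brothers: int, m_sisters: int) -> str:
    return (f"Alice has {n_brothers} brothers and she also has {m_sisters} sisters. "
            f"How many sisters does Alice's brother have? Answer with a single number.")

def aiw_accuracy(n_trials: int = 50, seed: int = 0) -> float:
    rng = random.Random(seed)
    correct = 0
    for _ in range(n_trials):
        n, m = rng.randint(1, 6), rng.randint(1, 6)
        reply = query_llm(aiw_prompt(n, m))
        nums = re.findall(r"\d+", reply)
        if nums and int(nums[-1]) == m + 1:   # brother's sisters = Alice's sisters + Alice
            correct += 1
    return correct / n_trials
```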


Source: Reinforcement Learning; enriched from Flaws

Original note title

llms are greedy agents with a knowing-doing gap — correct rationales 87 percent but greedy actions 64 percent