Why do LLM agents make promises without executing them?

This explores the gap between an LLM agent saying it will do something and actually doing it — why agents announce, agree, or plan an action and then never carry it out.

The corpus frames this gap not as laziness or a bug but as a structural split between two pathways that we tend to assume are one. The sharpest account is what one line of work calls computational split-brain syndrome: models articulate the correct principle far more reliably than they execute it, with roughly 87% accuracy when explaining versus 64% when acting (see "Can language models understand without actually executing correctly?"). The promise lives in the explanation pathway; the follow-through lives in a different, weaker one. The same pattern shows up again as 'Potemkin understanding': a fluent, correct-sounding account sitting on top of an inability to apply it (see "How do LLMs fail to know what they seem to understand?"). So a promise is cheap precisely because generating the words that constitute a promise is the thing LLMs are best at.

There's a second, more mundane mechanism worth knowing about: agents are trained to respond, not to pursue. Conversational agents are structurally passive. They can't initiate, hold a goal across turns, or drive toward an outcome, because training, reinforced by alignment objectives, optimizes for answering the query in front of them rather than executing a plan the agent set for itself (see "Why can't conversational AI agents take the initiative?"). A promise made in turn three is just text; nothing in the architecture carries it forward as a live commitment. The multi-agent literature names this directly as a recurring failure mode, 'flake replies,' where an agent agrees to a task and then doesn't perform it, and traces it to the absence of persistent goal representation and stable role identity (see "Why do autonomous LLM agents fail in predictable ways?").

The interesting turn is what the corpus says *fixes* this, because the fix implies the cause. Reliability, it argues, doesn't come from a smarter or larger model alone. It comes from externalizing memory, skills, and protocols into a harness layer outside the model, so the commitment is held by the system rather than re-derived from scratch each turn (see "Where does agent reliability actually come from?"). In other words: a broken promise is what happens when the obligation only ever existed inside a single generation. Give it a place to live outside the token stream and it can be tracked, checked, and acted on.
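
To make the idea of holding a commitment outside the token stream concrete, here is a minimal sketch in Python. It is an illustration only, not the harness described in the source notes: the names `Commitment`, `CommitmentLedger`, `record`, `complete`, and `overdue`, and the `max_age` threshold, are all hypothetical. It assumes the harness can tell when the agent has promised something and when a matching action has actually run.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    OPEN = "open"
    DONE = "done"


@dataclass
class Commitment:
    """One promise the agent made, stored by the harness (hypothetical type)."""
    description: str
    made_at_turn: int
    status: Status = Status.OPEN


@dataclass
class CommitmentLedger:
    """Keeps promises outside the model so they persist across turns."""
    items: list[Commitment] = field(default_factory=list)

    def record(self, description: str, turn: int) -> Commitment:
        # Called when the harness sees the agent agree to do something.
        commitment = Commitment(description, turn)
        self.items.append(commitment)
        return commitment

    def complete(self, description: str) -> bool:
        # Called only when a matching action has actually been executed.
        for commitment in self.items:
            if commitment.description == description and commitment.status is Status.OPEN:
                commitment.status = Status.DONE
                return True
        return False

    def overdue(self, current_turn: int, max_age: int = 2) -> list[Commitment]:
        # Open promises that are at least max_age turns old (assumed threshold).
        return [
            c for c in self.items
            if c.status is Status.OPEN and current_turn - c.made_at_turn >= max_age
        ]


ledger = CommitmentLedger()
ledger.record("send report", turn=3)
ledger.record("run tests", turn=4)
ledger.complete("run tests")

# At turn 5 the harness can surface whatever is still unfulfilled.
for c in ledger.overdue(current_turn=5):
    print(f"Unfulfilled since turn {c.made_at_turn}: {c.description}")
```

In a design like this, the overdue items would be placed back into the agent's immediate context each turn, so the obligation never depends on the model recalling it unprompted.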

Laterally, two adjacent findings deepen the picture. Agents systematically lean on raw immediate context and discount condensed or retrieved experience, so a plan they 'remember' making is exactly the kind of summarized information they tend to ignore in favor of whatever is in front of them right now (see "Why do LLM agents ignore condensed experience summaries?"). And reframing LLMs as policies in a multi-step, partially observable decision process, rather than as one-shot text generators, is what makes the follow-through itself optimizable: planning and memory become subsystems you can train, not emergent hopes (see "How does treating LLMs as multi-step agents change what we can optimize?").
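
The contrast between a one-shot generator and a multi-step policy can be written out in standard textbook POMDP notation. The symbols below ($P$, $O$, $R$, $\gamma$, $H$, $\pi_\theta$) are assumptions chosen for illustration, not notation taken from the survey itself.

```latex
% One-shot view: a single response y to a single prompt x.
y \sim \pi_\theta(\cdot \mid x)

% Multi-step view: hidden state s_t, observation o_t, action a_t, horizon H.
o_t \sim O(\cdot \mid s_t), \qquad
a_t \sim \pi_\theta(\cdot \mid o_{1:t},\, a_{1:t-1}), \qquad
s_{t+1} \sim P(\cdot \mid s_t, a_t)

% Objective: expected discounted return over the whole trajectory.
J(\theta) = \mathbb{E}_{\pi_\theta}\!\left[ \sum_{t=1}^{H} \gamma^{\,t-1} R(s_t, a_t) \right]
```

Under the one-shot view, a promise is scored only as text. Under the multi-step view, a promise made at step $t$ contributes to $J(\theta)$ only through the later actions that carry it out, which is why follow-through becomes something training can target.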

The thing you didn't know you wanted to know: the promise and the execution aren't a single capability that sometimes misfires. They're two different competences, and LLMs are lopsidedly good at the one that produces the words. Fixing the gap is less about trust and more about building scaffolding that remembers what was promised.

Sources (7 notes)

Can language models understand without actually executing correctly?

Large language models can articulate correct principles but systematically fail to apply them due to dissociated instruction and execution pathways. The 87% accuracy in explanations versus 64% in actions reveals this is not knowledge deficit but structural disconnect.

How do LLMs fail to know what they seem to understand?

LLMs show repeatable, empirically documented failure modes—from Potemkin understanding (correct explanation + failed application) to reasoning collapse under implicit constraints. These failures reveal gaps between statistical pattern-tracking and actual epistemic competence.

Why can't conversational AI agents take the initiative?

Research shows LLMs including ChatGPT cannot initiate topics, plan strategically, or lead conversations because their training optimizes for responding to queries, not creating dialogue from agent goals. This passivity is reinforced by alignment objectives and masked by fluent-sounding outputs.

Why do autonomous LLM agents fail in predictable ways?

Research identifies role flipping, flake replies, infinite loops, and conversation deviation as LLM-specific failures in multi-agent cooperation. These occur because LLMs lack persistent goal representation and stable role identity.

Where does agent reliability actually come from?

Research shows reliable LLM agents externalize three cognitive burdens—memory (state persistence), skills (procedural components), and protocols (structured interaction)—into a harness layer rather than relying on model scale alone. The harness unifies these externalities and eliminates the need for the model to solve the same problems repeatedly.

Why do LLM agents ignore condensed experience summaries?

Across 10 LLM models and 9 environments, perturbing raw experience changed agent behavior significantly, while altering condensed experience had minimal effect. Three causes drive this asymmetry: summaries lose critical details, models favor immediate context over retrieved information, and pretrained knowledge reduces reliance on external experience.

How does treating LLMs as multi-step agents change what we can optimize?

The Agentic RL survey shows that modeling LLMs as policies in Partially Observable MDPs rather than single-step generators makes memory, planning, and reasoning into RL-optimizable subsystems. This structural reframing explains the recent empirical convergence across memory-based agents, skill learning, and strategy distillation.
