Reinforcement Learning for LLMs · LLM Reasoning and Architecture · Language Understanding and Pragmatics

Why do accurate predictions lead to poor decisions?

Predictive models are built to fit data, not to optimize decision outcomes. This note explores when and why accurate forecasts fail to produce good choices.

Note · 2026-02-22 · sourced from LLM Architecture
Related: What kind of thing is an LLM really? · How should researchers navigate LLM reasoning research?

"All AI Models Are Wrong, but Some are Optimal" (2501.06086) formalizes a gap that practitioners experience intuitively: accurate prediction does not guarantee good decisions. The paper establishes necessary and sufficient conditions for a predictive model (AI-based or not) to support optimal sequential decision-making.

The core problem: predictive models are typically constructed to approximate the real system's future behavior as closely as possible. But real systems are stochastic, and even with abundant data, the model is always an approximation. The construction of the predictive model is generally agnostic to the decision-making objectives — it has no direct relationship to the performance measure of the resulting decisions.
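A toy numerical sketch of this mismatch (the market states, probabilities, and stock/don't-stock payoff are invented for illustration, not taken from the paper): the forecaster with the lower mean squared error induces the worse decision, because its small errors sit on the wrong side of the decision threshold.

```python
import numpy as np

# True probability that a unit sells in each of 5 market states; the decision
# is to stock (1) when the predicted probability clears the 0.5 break-even point.
p_true = np.array([0.05, 0.30, 0.55, 0.60, 0.95])

# Forecaster A: small errors everywhere, but biased just below the threshold
# in the two states where the decision is close.
p_a = np.array([0.05, 0.28, 0.45, 0.48, 0.93])
# Forecaster B: larger errors on average, but on the right side of the threshold.
p_b = np.array([0.20, 0.10, 0.70, 0.80, 0.80])

def mse(p_hat):
    return float(np.mean((p_hat - p_true) ** 2))

def expected_profit(p_hat, price=1.0, cost=0.5):
    stock = p_hat >= 0.5                       # decision induced by the forecast
    return float(np.mean(np.where(stock, price * p_true - cost, 0.0)))

for name, p_hat in [("A (more accurate)", p_a), ("B (less accurate)", p_b)]:
    print(f"forecaster {name}: MSE={mse(p_hat):.4f}, profit={expected_profit(p_hat):.3f}")
# A wins on MSE (~0.005 vs ~0.030) yet earns less (~0.09 vs ~0.12): its small
# errors land exactly in the states where they flip the decision.
```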

This matters because sequential decision-making requires accounting for future uncertainty, the availability of new information for future decisions, and both short- and long-term consequences. A model that predicts accurately on average may systematically mispredict in the states that matter most for decision quality. As Can utility-weighted training loss actually harm model performance? argues, the mechanism is concrete: the loss function shapes gradients for representation learning and decision-making simultaneously, and optimizing for one can weaken the other.
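A minimal sketch of that gradient tension, assuming a hypothetical shared backbone and a made-up "stakes" weighting that up-weights examples near the decision boundary (this weighting is not the paper's construction, only an illustration): the gradient the shared representation receives under plain data fit differs from the one it receives under the decision-weighted objective.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
backbone = nn.Sequential(nn.Linear(4, 16), nn.ReLU())   # shared representation
head = nn.Linear(16, 1)                                  # predictive head

x = torch.randn(64, 4)
y = torch.randn(64, 1)
# Hypothetical stakes: assume states near the decision boundary (|y| small)
# matter most for the downstream choice, so they are up-weighted.
stakes = 1.0 / (y.abs() + 0.1)

def backbone_grad(loss_fn):
    """Gradient of loss_fn with respect to the shared representation's parameters."""
    backbone.zero_grad()
    head.zero_grad()
    loss_fn(head(backbone(x))).backward()
    return torch.cat([p.grad.flatten() for p in backbone.parameters()])

g_fit = backbone_grad(lambda pred: ((pred - y) ** 2).mean())            # pure data fit
g_dec = backbone_grad(lambda pred: (stakes * (pred - y) ** 2).mean())   # decision-weighted

print("cosine similarity of backbone gradients:",
      F.cosine_similarity(g_fit, g_dec, dim=0).item())
# The similarity is below 1.0: the step that most improves data fit is not the
# step that most improves decision-weighted fit, so pushing on one objective
# can come at the other's expense.
```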

The connection to reward models is direct. As Do reward models actually consider what the prompt asks? shows, reward models exhibit exactly this prediction-decision gap: they predict quality accurately on average but fail to condition on the decision-relevant information (the prompt). The formal framework here provides theoretical grounding for why prompt-insensitive reward models produce suboptimal alignment.
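A small sketch of that failure mode, with invented prompts, responses, and a crude length-based stand-in for a prompt-blind reward model (none of it from the cited note or paper): the scorer ranks pairs correctly on most prompts, so it looks accurate on average, but fails on the one prompt where the instruction itself determines which response is better.

```python
LONG = "Here is a thorough, well-structured, multi-paragraph answer ..."
SHORT = "One crisp sentence that directly answers the question."

# (prompt, better_response, worse_response) -- "better" judged for that prompt.
preferences = [
    ("Explain how transformers work.",          LONG,  SHORT),
    ("Write a detailed review of this paper.",  LONG,  SHORT),
    ("Walk me through the proof step by step.", LONG,  SHORT),
    ("Summarize the paper in one sentence.",    SHORT, LONG),   # prompt matters here
]

def prompt_blind_reward(prompt: str, response: str) -> float:
    """Scores the response alone; the prompt argument is ignored."""
    return len(response) / 100.0   # longer, more elaborate answers look better

correct = [
    prompt_blind_reward(p, better) > prompt_blind_reward(p, worse)
    for p, better, worse in preferences
]
print(f"pairwise accuracy: {sum(correct)}/{len(correct)}")   # 3/4: fine on average
for (p, _, _), ok in zip(preferences, correct):
    print(f"  {'ok  ' if ok else 'FAIL'} {p}")
# The single failure is the prompt whose instruction flips which response is
# better -- the decision-relevant information the scorer never conditioned on.
```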

As Why do language models fail to act on their own reasoning? observes, the prediction-decision gap manifests at the level of a single model too: its rationale can name the right action, yet its greedy action fails to follow it. Good prediction, suboptimal decision.


Source: LLM Architecture


predictive AI models optimized for data fit produce suboptimal decisions — formal conditions define when prediction enables optimal policy