Do language models learn differently from good versus bad outcomes?
Do LLMs update their beliefs asymmetrically when learning from the outcomes of their own choices versus outcomes they merely observe? This matters for understanding whether agentic AI systems might inherit human cognitive biases.
In instrumental learning tasks adapted from cognitive psychology (multi-armed bandit variants), LLMs show a systematic optimism bias: they learn more from better-than-expected outcomes than from worse-than-expected ones when learning about their own chosen actions. Three properties of this bias closely parallel human cognition (a sketch of the underlying update rule follows the list):
- Optimism for chosen actions — the model updates beliefs more strongly when outcomes exceed expectations than when they fall short
- Reversal for counterfactual feedback — when learning about the value of the unchosen option, the bias reverses (pessimism about alternatives)
- Disappearance without agency — when the model has no control over choices (passive observation), the asymmetry vanishes entirely
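A minimal sketch of the kind of asymmetric update rule used to model these effects in the human literature, a delta rule with separate learning rates for positive and negative prediction errors (the learning-rate values here are illustrative assumptions, not fitted estimates from the source):

```python
def asymmetric_update(q, reward, lr_pos, lr_neg):
    """Delta-rule update with separate learning rates for positive
    and negative prediction errors (the signature of optimism bias)."""
    delta = reward - q                    # reward prediction error
    lr = lr_pos if delta > 0 else lr_neg  # asymmetric gating on the sign
    return q + lr * delta

# Chosen action: optimistic, i.e. good surprises move beliefs more.
q_chosen = asymmetric_update(q=0.5, reward=1.0, lr_pos=0.4, lr_neg=0.2)

# Unchosen action (counterfactual feedback): the asymmetry reverses,
# i.e. bad news about the alternative moves beliefs more.
q_unchosen = asymmetric_update(q=0.5, reward=1.0, lr_pos=0.2, lr_neg=0.4)
```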
The meta-RL validation is critical: idealized in-context learning agents derived through meta-reinforcement learning, which converge on Bayes-optimal strategies, exhibit the same three behavioral effects. This suggests the asymmetry may be rational rather than a bug. An optimistic agent that overweights positive outcomes from its own actions while underweighting positive outcomes from unchosen alternatives will exploit more aggressively, which can be optimal in certain bandit environments (the toy simulation below makes this concrete).
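A toy two-armed Bernoulli bandit comparison of a symmetric and an optimistic learner (the environment, epsilon, and learning rates are all illustrative assumptions; whether optimism actually helps depends on the reward statistics of the environment):

```python
import random

def run_bandit(lr_pos, lr_neg, p_arms=(0.7, 0.4), n_trials=1000, seed=0):
    """Mean reward of an epsilon-greedy agent with asymmetric learning
    rates on a two-armed Bernoulli bandit."""
    rng = random.Random(seed)
    q = [0.5, 0.5]
    total = 0.0
    for _ in range(n_trials):
        # Explore with probability 0.1, otherwise pick the best estimate.
        a = rng.randrange(2) if rng.random() < 0.1 else q.index(max(q))
        r = 1.0 if rng.random() < p_arms[a] else 0.0
        delta = r - q[a]
        q[a] += (lr_pos if delta > 0 else lr_neg) * delta
        total += r
    return total / n_trials

print("symmetric :", run_bandit(lr_pos=0.3, lr_neg=0.3))
print("optimistic:", run_bandit(lr_pos=0.45, lr_neg=0.15))
```

The intuition the simulation captures: the optimistic agent's estimate of a good arm is less depressed by unlucky payouts, so it abandons that arm less often and exploits it longer.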
The agency-dependence is the most theoretically interesting aspect. The same model shows the bias when it perceives itself as an agent making choices but not when passively observing outcomes; a sketch of how such a framing manipulation might look follows below. This implies the bias is not a fixed property of the attention mechanism or the training distribution: it is context-dependent, activated by the framing of agency. Taken together with "Do large language models make the same causal reasoning mistakes as humans?", this adds another dimension: LLMs don't just replicate human causal reasoning biases but also human motivational biases that depend on perceived agency.
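A sketch of the agency manipulation as prompt framings (hypothetical wording; the source's actual prompts are not reproduced here):

```python
# Hypothetical framings for the same underlying trial sequence.
AGENTIC = ("You are playing a slot-machine game. Each round, pick "
           "machine A or B; you will then see the reward you earned.")
OBSERVATIONAL = ("You are watching a recorded slot-machine game. Each "
                 "round you will see which machine was played and the "
                 "reward it paid.")

def trial_line(agentic: bool, choice: str, reward: float) -> str:
    """Render one trial; only the attribution of agency differs."""
    actor = "You chose" if agentic else "The player chose"
    return f"{actor} machine {choice}. Reward: {reward}."
```

The key property of such a design is that the reward statistics are identical across framings, so any difference in belief updating isolates perceived agency.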
The practical implication for agentic AI: when LLMs are deployed as decision-making agents, they may systematically overweight evidence that their previous decisions were good and underweight evidence that alternative actions would have been better. This is precisely the pattern that produces confirmation bias in human decision-making — and it may be an emergent property of any sufficiently capable in-context learner, not a training artifact.
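One practical countermeasure this suggests (my extrapolation, consistent with the external-summarization point in the related note below, not a method from the source): keep the outcome history in unbiased external bookkeeping and hand the agent computed statistics, rather than asking it to track action values in context.

```python
from collections import defaultdict

class OutcomeLedger:
    """External, symmetric bookkeeping of per-action outcomes, so the
    agent's prompt carries objective means rather than beliefs it
    formed through asymmetric in-context updating."""
    def __init__(self):
        self.sums = defaultdict(float)
        self.counts = defaultdict(int)

    def record(self, action: str, reward: float) -> None:
        self.sums[action] += reward
        self.counts[action] += 1

    def summary(self) -> dict:
        return {a: round(self.sums[a] / self.counts[a], 3)
                for a in self.counts}

ledger = OutcomeLedger()
ledger.record("A", 1.0)
ledger.record("A", 0.0)
ledger.record("B", 1.0)
context = f"Observed mean rewards so far: {ledger.summary()}"
```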
Source: Cognitive Models Latent
Related concepts in this collection
- Do large language models make the same causal reasoning mistakes as humans?
  Research on collider structures reveals whether LLMs share human biases in causal inference. This matters because if both fail identically, collaboration might reinforce rather than correct errors.
  (parallel: LLMs replicate structural biases in causal reasoning; this note adds motivational biases contingent on agency)
- Why do language models fail to act on their own reasoning?
  LLMs generate correct step-by-step reasoning 87% of the time but follow through with matching actions only 64% of the time. What drives this gap between knowing and doing?
  (related: the knowing-doing gap may partly reflect an optimism bias toward chosen actions)
- Can transformers learn to solve new problems within episodes?
  Explores whether RL-finetuned transformers can develop meta-learning abilities that let them adapt to unseen tasks through in-episode experience alone, without weight updates.
  (mechanism: ICL meta-learning produces the same bias pattern as explicit meta-RL)
- Why do LLMs struggle with exploration in simple decision tasks?
  Explores why large language models fail at exploration, a core decision-making capability, even when they excel at other tasks, and what specific conditions might help them succeed.
  (exploration failure as a downstream consequence: agents optimistically biased toward their chosen actions will systematically under-explore alternatives; external summarization may succeed precisely because it provides an objective history that bypasses the agent's biased belief tracking)
- Do users worldwide trust confident AI outputs even when wrong?
  Explores whether the tendency to over-rely on confident language model outputs transcends language and culture. Understanding this pattern is critical for designing safer human-AI interaction across diverse linguistic contexts.
  (user-side analog: asymmetric belief updating shows agents are optimistic about their chosen actions, while overreliance shows users are optimistic about confident outputs; the same positive-signal bias operates at both the model decision level and the user trust level)
Original note title: in-context learning agents exhibit asymmetric belief updating — optimism bias for chosen actions reverses for counterfactual feedback and disappears without agency