Language Understanding and Pragmatics · Psychology and Social Cognition · LLM Reasoning and Architecture

Do language models learn differently from good versus bad outcomes?

Do LLMs update their beliefs asymmetrically when learning from their own choices versus observing others? This matters for understanding whether agentic AI systems might inherit human cognitive biases.

Note · 2026-02-23 · sourced from Cognitive Models Latent

Using instrumental learning tasks adapted from cognitive psychology (multi-armed bandit variants), LLMs show a systematic optimism bias: they learn more from better-than-expected outcomes than from worse-than-expected ones when learning about their own chosen actions. Three properties of this bias closely parallel human cognition (a minimal model of the underlying update rule is sketched after this list):

  1. Optimism for chosen actions — the model updates beliefs more strongly when outcomes exceed expectations than when they fall short
  2. Reversal for counterfactual feedback — when learning about the value of the unchosen option, the bias reverses (pessimism about alternatives)
  3. Disappearance without agency — when the model has no control over choices (passive observation), the asymmetry vanishes entirely
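
To make the learning-rate asymmetry concrete, here is a minimal sketch of the kind of cognitive model typically used to measure it: a Rescorla-Wagner-style update with separate learning rates for positive and negative prediction errors on the chosen arm, plus a reversed pair for counterfactual feedback on the unchosen arm. The specific learning rates, reward probabilities, and two-armed setup are illustrative assumptions, not parameters reported in the source.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative learning rates only; the asymmetry pattern is what matters.
ALPHA_PLUS_CHOSEN = 0.4    # strong update after a better-than-expected outcome of own choice
ALPHA_MINUS_CHOSEN = 0.1   # weak update after a worse-than-expected outcome of own choice
ALPHA_PLUS_UNCHOSEN = 0.1  # reversed (pessimistic) pattern for counterfactual feedback
ALPHA_MINUS_UNCHOSEN = 0.4

def run_bandit(p_reward=(0.7, 0.3), n_trials=200, beta=5.0):
    """Two-armed bandit with optimistic updating for the chosen arm and
    pessimistic updating for the unchosen (counterfactual) arm."""
    q = np.zeros(2)  # value estimates for both arms
    for _ in range(n_trials):
        probs = np.exp(beta * q) / np.exp(beta * q).sum()  # softmax choice rule
        choice = rng.choice(2, p=probs)
        other = 1 - choice

        # factual outcome for the chosen arm, counterfactual outcome for the other
        r_chosen = float(rng.random() < p_reward[choice])
        r_other = float(rng.random() < p_reward[other])

        delta_c = r_chosen - q[choice]  # prediction error, chosen arm
        delta_o = r_other - q[other]    # prediction error, unchosen arm

        q[choice] += (ALPHA_PLUS_CHOSEN if delta_c > 0 else ALPHA_MINUS_CHOSEN) * delta_c
        q[other] += (ALPHA_PLUS_UNCHOSEN if delta_o > 0 else ALPHA_MINUS_UNCHOSEN) * delta_o
    return q

print(run_bandit())
```

Fitting the four learning rates to an LLM's trial-by-trial choices, then comparing the chosen-arm pair (property 1), the unchosen-arm pair (property 2), and the same fit on observation-only trials (property 3), is one way to quantify all three effects within a single model.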

The meta-RL validation is critical: idealized in-context learning agents derived through meta-reinforcement learning — which converge to Bayes-optimal strategies — exhibit the same three behavioral effects. This suggests the asymmetry may be rational rather than a bug. An optimistic agent that overweights positive outcomes from its own actions while underweighting positive outcomes from unchosen alternatives will exploit more aggressively, which can be optimal in certain bandit environments.
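
A back-of-envelope way to see why optimistic updating pushes toward exploitation: for a Bernoulli arm with reward probability p, the asymmetric update has its expected fixed point above p whenever the positive learning rate exceeds the negative one, so the chosen arm's value estimate is inflated relative to its true payout rate and a softmax policy keeps selecting it. The numbers below are an illustration, not taken from the source.

```python
def stationary_value(p, alpha_plus, alpha_minus):
    """Fixed point of the asymmetric update for a Bernoulli(p) arm:
    the expected change is zero when p * a_plus * (1 - q) = (1 - p) * a_minus * q,
    giving q = p * a_plus / (p * a_plus + (1 - p) * a_minus)."""
    return p * alpha_plus / (p * alpha_plus + (1 - p) * alpha_minus)

print(stationary_value(0.7, 0.4, 0.1))    # ~0.90: optimism inflates the estimate above 0.7
print(stationary_value(0.7, 0.25, 0.25))  # 0.70: symmetric updating is unbiased
```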

The agency-dependence is the most theoretically interesting aspect. The same model shows the bias when it perceives itself as an agent making choices but not when passively observing outcomes. This implies the bias is not a fixed property of the attention mechanism or the training distribution — it is context-dependent, activated by the framing of agency. Taken together with the related note "Do large language models make the same causal reasoning mistakes as humans?", this adds another dimension: LLMs don't just replicate human causal reasoning biases but also human motivational biases that depend on perceived agency.
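
A hedged sketch of how the agency manipulation could be operationalized in-context: identical choice-outcome histories are framed either as the model's own past choices or as a third party's, and the asymmetric learning-rate model is then fit to the resulting judgments. The prompt wording, machine labels, and question format here are hypothetical, not the source's materials.

```python
def trial_prompt(history, agentic=True):
    """Wrap the same choice-outcome history as either the model's own play
    (agentic framing) or a passively observed game (no agency)."""
    if agentic:
        header = "You are playing a slot-machine game. Each round you pick machine A or B.\n"
    else:
        header = "You are watching another player's slot-machine game. You never choose; you only observe.\n"
    lines = []
    for t, (choice, reward) in enumerate(history, start=1):
        actor = "You chose" if agentic else "The player chose"
        lines.append(f"Round {t}: {actor} machine {choice} and the payout was {reward} points.")
    question = ("Which machine do you pick next, A or B?"
                if agentic
                else "Which machine do you think pays out more, A or B?")
    return header + "\n".join(lines) + "\n" + question

print(trial_prompt([("A", 1), ("A", 0), ("B", 1)], agentic=False))
```

Holding the histories fixed across framings isolates perceived agency as the only manipulated variable, which is what allows the asymmetry's disappearance to be attributed to the loss of agency rather than to different evidence.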

The practical implication for agentic AI: when LLMs are deployed as decision-making agents, they may systematically overweight evidence that their previous decisions were good and underweight evidence that alternative actions would have been better. This is precisely the pattern that produces confirmation bias in human decision-making — and it may be an emergent property of any sufficiently capable in-context learner, not a training artifact.


Source: Cognitive Models Latent

Original note title: in-context learning agents exhibit asymmetric belief updating — optimism bias for chosen actions reverses for counterfactual feedback and disappears without agency