Why do LLMs give unrealistic survey responses?

Direct numerical elicitation from language models produces skewed, over-positive survey distributions. Is this a fundamental model limitation, or an artifact of how we ask the question?

Synthesis note · 2026-06-03 · sourced from Personas Personality

Asking an LLM directly for a numerical rating produces unrealistic, skewed response distributions — the documented failure of synthetic-consumer panels. Semantic Similarity Rating (SSR) changes the elicitation, not the model: prompt for a free-text response, then map it to a Likert distribution via embedding similarity to a set of reference statements. On an extensive dataset — 57 personal-care product surveys, 9,300 human responses — SSR reaches 90% of human test-retest reliability with realistic distributions (KS similarity > 0.85) and yields rich qualitative rationales, all with no fine-tuning.
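The mapping step can be sketched in a few lines. This is a minimal illustration, not the published SSR implementation. The reference statements, the toy bag-of-words `embed` function (a stand-in for a real sentence-embedding model), the softmax normalisation, and the `temperature` value are all assumptions made for the example.

```python
import math
from collections import Counter

# Hypothetical anchor statements, one per point of a 5-point purchase-intent scale.
REFERENCE_STATEMENTS = {
    1: "I would definitely not buy this product",
    2: "I would probably not buy this product",
    3: "I am not sure whether I would buy this product",
    4: "I would probably buy this product",
    5: "I would definitely buy this product",
}

def embed(text):
    """Toy bag-of-words embedding; a stand-in for a real sentence-embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def ssr_distribution(response, references=REFERENCE_STATEMENTS, temperature=0.1):
    """Map a free-text response to a probability distribution over Likert points."""
    response_vec = embed(response)
    sims = {point: cosine(response_vec, embed(text)) for point, text in references.items()}
    # Softmax over similarities (an assumed normalisation); lower temperature is sharper.
    weights = {point: math.exp(s / temperature) for point, s in sims.items()}
    total = sum(weights.values())
    return {point: w / total for point, w in weights.items()}

# The free-text answer would come from the LLM persona; here it is hard-coded.
dist = ssr_distribution("I would probably buy this product")
most_likely = max(dist, key=dist.get)
expected_rating = sum(point * p for point, p in dist.items())
```

Because each response yields a full distribution rather than a single integer, panel-level results can be built by averaging distributions across simulated respondents instead of tallying point ratings.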

The key takeaway is diagnostic: the well-known pathologies of the LLM as survey respondent (skewed distributions, over-positivity, regression to the mean) are artifacts of how responses are elicited, not intrinsic limitations of the model. Shift from direct numerical elicitation to textual elicitation plus similarity mapping, and the artifacts largely dissolve. This relocates the problem from "LLMs can't simulate consumers" to "we were asking the question wrong."

This sharpens the persona-simulation cluster's central tension. "Can AI agents learn people better from interviews than surveys?" shows that fidelity rises with richer input; SSR shows that fidelity also rises with a better output elicitation channel. Both are measurement-design wins. But the caution from "Can AI personas reliably replicate human experiment results?" still applies: high aggregate fidelity can coexist with unreliable fine-grained effects, so SSR's realism is a measurement improvement, not a guarantee of validity.
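To make "aggregate fidelity" concrete, one way to compare a synthetic Likert distribution with a human one is through their cumulative distributions. The sketch below assumes that KS similarity means one minus the Kolmogorov-Smirnov statistic (the largest gap between the two cumulative distributions); the note does not spell out the definition. The three distributions are invented purely for illustration and are not results from the study.

```python
def ks_similarity(p, q):
    """Return 1 minus the largest gap between the cumulative sums of p and q.

    p and q are probability lists over the same ordered Likert points.
    Defining similarity as 1 - KS statistic is an assumption of this sketch.
    """
    if len(p) != len(q):
        raise ValueError("distributions must cover the same scale points")
    cum_p = cum_q = 0.0
    max_gap = 0.0
    for prob_p, prob_q in zip(p, q):
        cum_p += prob_p
        cum_q += prob_q
        max_gap = max(max_gap, abs(cum_p - cum_q))
    return 1.0 - max_gap

# Illustrative (made-up) 5-point distributions, not data from the study.
human = [0.05, 0.10, 0.25, 0.35, 0.25]
direct_numeric = [0.00, 0.00, 0.05, 0.55, 0.40]  # skewed toward the top of the scale
text_mapped = [0.05, 0.15, 0.25, 0.30, 0.25]     # closer in shape to the human panel

sim_direct = ks_similarity(human, direct_numeric)
sim_mapped = ks_similarity(human, text_mapped)
```

A score like this summarises the whole distribution in one number, which is why a high value can sit alongside unreliable fine-grained effects: it says nothing about whether individual simulated respondents or subgroup differences match their human counterparts.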

Related concepts in this collection 3

Can AI agents learn people better from interviews than surveys? Can rich interview transcripts seed more accurate generative agents than demographic data or survey responses? This matters because it challenges how we build digital simulations of real people.
richer input raises fidelity; SSR shows better output elicitation does too
Can AI personas reliably replicate human experiment results? Exploring whether LLM-based persona simulations accurately reproduce experimental findings from published psychology and marketing research, and what factors determine when they succeed or fail.
caution: aggregate realism can mask unreliable fine-grained effects
Can language models simulate belief change in people? Current LLM social simulators treat behavior as input-output mappings without modeling internal belief formation or revision. Can they be redesigned to actually track how people think and change their minds?
SSR improves behavior-level simulation realism without addressing the thought-simulation critique

Original note title

LLMs simulate human survey responses faithfully only when text is elicited and mapped to scales via embedding similarity — unrealistic numerical distributions are an elicitation artifact
