SYNTHESIS NOTE
Psychology, Society, and Alignment · Language, Text, and Discourse

Why do LLMs give unrealistic survey responses?

Direct numerical elicitation from language models produces skewed, over-positive survey distributions. Is this a fundamental model limitation, or an artifact of how we ask the question?

Synthesis note · 2026-06-03 · sourced from Personas Personality

Asking an LLM directly for a numerical rating produces unrealistic, skewed response distributions — the documented failure of synthetic-consumer panels. Semantic Similarity Rating (SSR) changes the elicitation, not the model: prompt for a free-text response, then map it to a Likert distribution via embedding similarity to a set of reference statements. On an extensive dataset — 57 personal-care product surveys, 9,300 human responses — SSR reaches 90% of human test-retest reliability with realistic distributions (KS similarity > 0.85) and yields rich qualitative rationales, all with no fine-tuning.
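The sketch below illustrates the SSR mapping step under stated assumptions. It does not reproduce the study's exact procedure. It assumes embeddings for the free-text response and for one reference statement per Likert point have already been computed by some sentence-embedding model (not shown). The mapping rule has two steps: shift cosine similarities so the least similar anchor gets roughly zero, then normalize. That rule is a hypothetical choice. The metric `ks_similarity` reads "KS similarity" as 1 minus the Kolmogorov-Smirnov distance between two distributions on the same ordered scale, which is also an interpretation. All function names are illustrative.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def ssr_distribution(
    response_emb: Sequence[float],
    reference_embs: Sequence[Sequence[float]],
    eps: float = 1e-9,
) -> List[float]:
    """Map one response embedding to a distribution over Likert points.

    reference_embs[i] embeds the reference statement for rating i + 1.
    Assumed rule (hypothetical): subtract the minimum similarity so the
    least similar anchor gets ~0 mass, then normalize to sum to 1.
    """
    sims = [cosine(response_emb, ref) for ref in reference_embs]
    lowest = min(sims)
    shifted = [s - lowest + eps for s in sims]  # eps avoids an all-zero vector
    total = sum(shifted)
    return [s / total for s in shifted]


def panel_distribution(dists: Sequence[Sequence[float]]) -> List[float]:
    """Average per-respondent distributions into one panel-level distribution."""
    n = len(dists)
    return [sum(d[i] for d in dists) / n for i in range(len(dists[0]))]


def ks_similarity(p: Sequence[float], q: Sequence[float]) -> float:
    """1 - KS distance between two distributions on the same ordered scale.

    Returns 1.0 for identical distributions and 0.0 for maximally separated ones.
    """
    cdf_p = cdf_q = 0.0
    max_gap = 0.0
    for pi, qi in zip(p, q):
        cdf_p += pi
        cdf_q += qi
        max_gap = max(max_gap, abs(cdf_p - cdf_q))
    return 1.0 - max_gap
```

Keeping the full distribution per respondent, rather than taking the single most similar anchor, is what lets the panel-level result retain spread instead of collapsing onto one rating.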

The key finding is diagnostic: the well-known pathologies of LLMs as survey respondents (skewed distributions, over-positivity, regression to the mean) are artifacts of how responses are elicited, not intrinsic limitations of the model. Shift from direct numerical elicitation to textual elicitation plus similarity mapping, and the artifacts largely dissolve. This relocates the problem from "LLMs can't simulate consumers" to "we were asking the question wrong."

This sharpens the central tension of the persona-simulation cluster. The note "Can AI agents learn people better from interviews than surveys?" shows that fidelity rises with richer input. SSR shows that fidelity also rises with a better output elicitation channel. Both are measurement-design wins. But the caution from "Can AI personas reliably replicate human experiment results?" still applies: high aggregate fidelity can coexist with unreliable fine-grained effects, so SSR's realism is a measurement improvement, not a guarantee of validity.


Original note title

LLMs simulate human survey responses faithfully only when text is elicited and mapped to scales via embedding similarity — unrealistic numerical distributions are an elicitation artifact