INQUIRING LINE

Why do humans publish more negative reviews in public than in private?

This explores why the same person gives a harsher review when they know an audience is watching than when they rate privately — and what that says about reviewing as performance rather than reporting.


The corpus suggests the answer is less about the product and more about how the reviewer wants to look. The sharpest finding is that public reviewers lower their ratings after reading negative reviews, even when their own experience was positive, because negativity reads as a signal of intelligence and discernment: sounding critical makes you sound smart (see "Why do online reviewers publish negative ratings despite positive experiences?"). Private raters, with no audience to impress, show no such drift. So the gap isn't measurement error; it's self-presentation. A public review is a performance staged for onlookers, and the performance rewards the critic.

What makes this more than a one-paper curiosity is that the same self-presentational pressure shows up from the opposite direction in human-machine work. When people talk to a machine, the social goals that govern public behavior (face-saving, impression management, looking a certain way) get suppressed, because the machine has no inner life to perform for (see "Why do people share more openly with machines than humans?"). People disclose more deeply to a chatbot precisely because nothing is judging them and nothing is watching (see "Why do people share more with chatbots than humans?"). Read together, these turn the public-review puzzle into a clean principle: add an audience and you add a performance tax; remove the audience and candor returns. Negativity bias in public reviews is what the performance tax looks like when sounding critical is the prestige move.

There's also a compounding twist the corpus surfaces: public negativity doesn't just reflect the audience, it shapes the next reviewer. Ratings aren't independent readings of quality: prior ratings measurably pull later ones, and those nudges accumulate over time (see "Do online ratings actually reflect independent customer opinions?"). So one person dialing their score down to look discerning becomes part of the context the next reviewer performs against, and the negativity ratchets. Layer on the fact that only people with strong opinions bother to post at all, because small participation costs filter out the lukewarm middle and bias the visible average away from true quality (see "Why do people bother writing online ratings at all?"), and public review pools drift negative for reasons that have little to do with the products.
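The two mechanisms in the paragraph above can be illustrated with a toy simulation. This is a minimal sketch under invented assumptions, not the models used in the cited notes: the `pull` weight, the `cost` threshold, the 1 to 5 scale with a neutral point of 3, the sample opinions, and both function names are hypothetical.

```python
from statistics import mean

NEUTRAL = 3.0  # assumed midpoint of a 1-5 star scale


def posted_ratings(true_opinions, cost):
    """Keep only opinions strong enough to justify the effort of posting.

    Assumption: a rater posts when |opinion - NEUTRAL| >= cost.
    """
    return [r for r in true_opinions if abs(r - NEUTRAL) >= cost]


def socially_pulled(private_opinions, pull):
    """Shift each rating toward the running mean of earlier posted ratings.

    Assumption: posted = (1 - pull) * private + pull * mean(previous posts).
    The first rater has no one to react to and posts the private opinion.
    """
    posted = []
    for private in private_opinions:
        if posted:
            rating = (1 - pull) * private + pull * mean(posted)
        else:
            rating = private
        posted.append(rating)
    return posted


if __name__ == "__main__":
    # Mostly mildly satisfied customers plus two strongly unhappy ones.
    opinions = [1.0, 1.0, 3.5, 3.5, 3.5, 3.5, 4.0, 4.0]
    print("true mean:", mean(opinions))
    print("visible mean with cost=1.0:", mean(posted_ratings(opinions, 1.0)))

    # One early harsh rating followed by uniformly positive experiences.
    print("pulled:", socially_pulled([1.0, 4.0, 4.0, 4.0], pull=0.5))
```

With these made-up numbers, the participation filter removes the four mildly satisfied raters, so the visible average falls below the true average. In the second example every later rater privately holds a 4, yet each posted rating stays below 4 because the early harsh rating keeps weighing on the running mean.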

The most surprising corner: AI inherits the *wrong* half of this. Off-the-shelf LLMs default to politeness because alignment training rewards agreeableness, so they write glowing reviews even for products the user hated, the mirror image of the human public-negativity bias (see "Why do LLMs generate polite reviews even when users hated products?"). Getting a model to write an authentically negative review takes deliberate work: feeding it the user's actual review and rating history and fine-tuning on it to override the trained niceness (see "Can user history override an LLM's politeness bias in reviews?"). So humans over-perform criticism in public and machines over-perform politeness everywhere, and the thing you'd want from a review, an honest signal, is exactly what the audience (human or training) keeps distorting in whichever direction it's rewarded.
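The "deliberate work" described above, putting the user's history and rating into the prompt and then fine-tuning on such examples, might be set up roughly as follows. This is a hedged sketch only: the prompt wording, the `PastReview` type, both function names, and the `prompt`/`completion` record layout are hypothetical illustrations, not the template or data format used by Review-LLM, and no model or training API is called.

```python
from dataclasses import dataclass


@dataclass
class PastReview:
    item_title: str
    rating: int  # stars on an assumed 1-5 scale
    text: str


def build_review_prompt(history, target_item, target_rating):
    """Assemble a prompt that exposes the user's history and satisfaction.

    Hypothetical template: the wording is illustrative only.
    """
    lines = ["Here are this user's previous reviews:"]
    for i, past in enumerate(history, start=1):
        lines.append(
            f"{i}. {past.item_title} | rating: {past.rating}/5 | review: {past.text}"
        )
    # The rating acts as an explicit satisfaction signal for the new item.
    lines.append(f"The user has now rated '{target_item}' {target_rating}/5.")
    lines.append("Write the review this user would post, matching that rating.")
    return "\n".join(lines)


def build_finetune_example(history, target_item, target_rating, actual_review):
    """Pair the contextual prompt with the user's real review as the target."""
    return {
        "prompt": build_review_prompt(history, target_item, target_rating),
        "completion": actual_review,
    }


if __name__ == "__main__":
    history = [
        PastReview("Desk Lamp", 2, "Flickers constantly."),
        PastReview("USB Cable", 5, "Works fine."),
    ]
    example = build_finetune_example(
        history, "Budget Headphones", 1, "Broke within a week."
    )
    print(example["prompt"])
```

The design point is that the low rating and the user's own past wording both appear in the prompt, so a supervised target that is genuinely negative is consistent with its context instead of fighting a default polite tone.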


Sources (7 notes)

Why do online reviewers publish negative ratings despite positive experiences?

Posters systematically reduce their ratings in public when exposed to negative reviews, even with positive personal experience—because negative reviewers appear more intelligent. Private raters show no such shift, revealing a self-presentational mechanism tied to multiple-audience communication.

Why do people share more openly with machines than humans?

Human-machine communication reduces secondary social goals like face-saving and impression management because machines lack inner experience, while novel goals like understandability emerge. This simpler goal structure predicts higher directness and deeper disclosure of sensitive information.

Why do people share more with chatbots than humans?

Chatbots elicit deeper emotional disclosure than human partners not through superior understanding, but by eliminating fears of judgment, rejection, and burdening others. This judgment-free quality activates reciprocity norms and creates therapeutic bonds users experience as real, yet simultaneously enables emotional avoidance and dishonesty.

Do online ratings actually reflect independent customer opinions?

Moe and Trusov decomposed ratings into baseline quality, social-dynamics influence, and error, finding that prior ratings meaningfully affect subsequent ones. These effects have both immediate sales impact and long-term compounding effects through future ratings, though high opinion variance can eventually dampen the distortion.

Why do people bother writing online ratings at all?

Lafky's experiments show raters care about both buyers and sellers rather than purely one or the other. Small participation costs create U-shaped distributions where only strong-opinion raters engage, biasing average ratings away from true quality.

Why do LLMs generate polite reviews even when users hated products?

Off-the-shelf LLMs generate inappropriately positive reviews due to alignment-training politeness bias. Combining user review history, rating signals as satisfaction indicators, and supervised fine-tuning successfully redirects the model to generate negative reviews when warranted.

Can user history override an LLM's politeness bias in reviews?

Review-LLM defeats the politeness bias inherent in RLHF-trained models by aggregating user behavior sequences (prior reviews, item ratings) in the prompt and fine-tuning on these contextualized examples. This dual intervention—personalized context plus explicit satisfaction signals—allows the model to generate authentically negative reviews matching user dissatisfaction.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst re-testing claims about audience effects on review authenticity and LLM alignment trade-offs. The question remains: why do humans publish harsher reviews in public than private, and does this gap persist or dissolve under newer models and evaluation methods?

What a curated library found — and when (dated claims, not current truth):
Findings span 2020–2026; treat each as a snapshot, not current ground truth.
• Public reviewers lower ratings after reading negative reviews, treating criticism as a signal of intelligence; private raters show no such drift (self-presentation, not measurement error) (~2024).
• People disclose more honestly to chatbots because machines lack inner lives to perform for; social goals that govern public behavior get suppressed (~2024).
• Prior ratings measurably pull later ones, and negativity nudges compound over time, creating ratcheting bias independent of product quality (~2024).
• Off-the-shelf LLMs default to politeness due to alignment training, generating glowing reviews even when users hated products — the inverse of human public negativity bias (~2024–2025).
• Overriding LLM politeness requires feeding actual user rating history and fine-tuning; authenticity doesn't emerge without explicit signal (~2024).

Anchor papers (verify; mind their dates):
• arXiv:2402.17937 (Feb 2024): Self-disclosure and emotional effects in human–AI dialogue.
• arXiv:2407.07487 (Jul 2024): Review-LLM and personalized review generation from user context.
• arXiv:2510.01395 (Oct 2025): Sycophantic AI and its effects on user intentions and dependence.
• arXiv:2604.22503 (Apr 2026): Persona distortions from AI writing assistance.

Your task:
(1) RE-TEST EACH CONSTRAINT. For the claim that LLMs default to politeness overriding authenticity, ask: have instruction-tuning, constitutional AI, or newer evaluation harnesses (e.g., multi-stakeholder review tasks, adversarial prompts, user-model disagreement metrics) since relaxed this? Does the gap between human public negativity and LLM politeness still hold, or have models learned context-dependent tone? Separate the durable question (do alignment objectives distort signal?) from the perishable limit (do current models exhibit it?).
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last 6 months: look for papers showing LLMs *can* produce authentically negative reviews without fine-tuning, or that human public negativity is itself declining (e.g., due to reputation systems, AI-moderated communities, or norm shifts).
(3) Propose 2 research questions assuming the regime has moved: (A) if models now write honest negative reviews by default, does human-model review disagreement reveal what *we* are performing for? (B) in mixed human–AI review ecosystems, does the machine's politeness or the human's performativity become the attracting behavior for new raters?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
