Does better summary writing actually increase user engagement?
When AI systems generate more informative push notifications, do users engage more? This note explores whether informativeness and engagement actually align in real product contexts.
LLM-generated summaries for social network push notifications were objectively more informative and customized than existing templates. They did not improve user engagement. The explanation is structural, not quality-related: a well-summarized notification body contains sufficient information that users do not need to open the notification to understand the content. The optimization target (informativeness) directly undermines the business metric (engagement/clicks).
This is an instance of Goodhart's Law operating through content quality: when you optimize for how informative a message is, you can succeed at informativeness while failing at the behavior the informativeness was supposed to drive. The information was meant to entice users to engage; instead, it satisfied their information need at the notification level.
Two compounding factors emerged from the experiments:
Voice alienation: LLM summarization transformed first-person user voice ("I'm looking for a plumber") into third-person reportage ("neighbor asks about plumbers"). This tonal shift alienated recipients by creating distance from the original social context. The content was more polished but less relational — it sounded like a news brief about a neighbor rather than a neighbor reaching out.
Optimization gap: Without a reward model trained specifically for engagement, or fine-tuning that incorporates user preferences into content generation, in-context learning alone cannot shortcut templates that have been iteratively refined over years. The control templates were the product of multiple iteration cycles; the LLM-generated alternatives were one-shot productions. Even when LLMs produce "better" content by linguistic quality metrics, they cannot automatically improve engagement metrics that depend on alignment with user behavioral patterns.
The broader pattern: LLM-generated content is well suited for rapidly prototyping new products, but using it directly to improve metrics on mature products that have undergone years of A/B testing often fails. The same dynamic appeared in invitation emails: more informative, more personalized, but no more effective at driving sign-ups. Generic LLM-generated content cannot capture individual preferences without further training.
This connects to the alignment tax discussion in Does preference optimization harm conversational understanding?: there too, optimizing for one communication quality (informativeness) erodes the behavioral outcome it was meant to serve (engagement). The mechanisms differ (RLHF erodes grounding acts, while informativeness optimization eliminates click-through motivation), but the pattern is the same: optimizing a proxy metric degrades the downstream target.
Source: Social Media
Related concepts in this collection
- Does preference optimization harm conversational understanding?
  Exploring whether RLHF training that rewards confident, complete responses undermines the grounding acts (clarifications, checks, acknowledgments) that actually build shared understanding in dialogue.
  Parallel pattern: optimizing for one communication quality undermines the broader communicative goal.
- Can we measure reading efficiency as a quality metric?
  How can we quantify whether generated text delivers novel information efficiently or wastes reader attention through redundancy? This matters because standard coherence and fluency scores miss texts that are well-written but informationally dense.
  High knowledge density in summaries may be the mechanism: too much information per token eliminates the curiosity gap.
- Do language models generate more novel research ideas than experts?
  Explores whether LLMs can break free from expert constraints to generate more novel research concepts. This matters because novelty is often thought to be AI's creative blind spot.
  Parallel dissociation: higher quality on one dimension doesn't translate to effectiveness on the actual goal dimension.
Original note title: more informative AI-generated content paradoxically reduces user engagement because informational sufficiency eliminates the need to click through