Conversational AI Systems · Language Understanding and Pragmatics · Psychology and Social Cognition

Why do LLM meeting summaries fail to help individuals?

Current LLM summarization treats all meeting participants the same, but organizational contexts require personalized recaps. What barriers prevent systems from learning what matters to each person?

Note · 2026-02-23 · sourced from Reading Summarizing
Related: Why do AI conversations reliably break down after multiple turns? · How do you build domain expertise into general AI models?

LLM-based dialogue summarization shows promise for meeting recap — but a user study with seven participants evaluating real work meetings reveals three specific failure modes that prevent organizational adoption.

The personal relevance gap. LLM recap summarizes what was globally important in the meeting, not what was personally relevant to each participant. A designer cares about the design decisions made; a project manager cares about timeline commitments. The same meeting requires different summaries for different participants, and current summarization has no model of what matters to whom. This is the personalization problem applied to collaborative settings — as explored in "Do user outputs outperform inputs for LLM personalization?", the system would need to learn from each participant's interaction history what they care about.
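One way to make this concrete is a minimal sketch of per-participant relevance scoring. Everything here is illustrative and not from the study: the bag-of-words profile, the length threshold, and the function names are assumptions standing in for whatever model of "what matters to whom" a real system would learn.

```python
from collections import Counter

def build_profile(past_utterances):
    # Hypothetical sketch: derive a participant's interest profile
    # from their own past contributions (bag-of-words for brevity;
    # a real system would use something far richer).
    words = Counter()
    for text in past_utterances:
        words.update(w.lower() for w in text.split() if len(w) > 3)
    return words

def relevance(item, profile):
    # Score a candidate summary item against the profile.
    return sum(profile[w.lower()] for w in item.split())

def personalized_recap(items, profile, k=2):
    # Return the k summary items most relevant to this participant.
    return sorted(items, key=lambda it: relevance(it, profile), reverse=True)[:k]
```

The point of the sketch is the shape of the pipeline, not the scoring: the same candidate items get ranked differently per participant, so the designer and the project manager each receive a different recap of the same meeting.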

The mis-attribution problem. When the system attributes a statement to the wrong participant, the consequences extend beyond simple factual error. Mis-attributions are detrimental to group dynamics: they can create false impressions about who committed to what, who raised which concern, or who proposed which idea. In organizational settings where credit, accountability, and trust are at stake, getting attribution wrong damages the social fabric the meeting was meant to build. This parallels the finding in "Does warmth training make language models less reliable?" — errors in social contexts have consequences that accuracy metrics don't capture.
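Because the cost of a wrong attribution is social rather than merely factual, one plausible mitigation is to cross-check attributed claims against the diarized transcript and flag weak ones for review instead of emitting them. This is a hypothetical sketch, not anything the study built; the word-overlap heuristic and its threshold are assumptions.

```python
def flag_risky_attributions(summary_claims, transcript):
    # Hypothetical sketch: for each (speaker, claim) pair in the recap,
    # check whether that speaker's diarized utterances actually contain
    # supporting content; flag claims with too little lexical support
    # so a human can verify before the recap is shared.
    flagged = []
    for speaker, claim in summary_claims:
        said = " ".join(t for s, t in transcript if s == speaker).lower()
        overlap = [w for w in claim.lower().split() if len(w) > 3 and w in said]
        if len(overlap) < 2:  # assumed threshold: insufficient support
            flagged.append((speaker, claim))
    return flagged
```

A production system would use entailment or embedding similarity rather than word overlap, but the design choice is the same: treat attribution as a claim to be verified, not a byproduct of generation.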

Context-dependent representation. Two distinct recap formats serve different needs: "highlights" (important moments, key decisions) for quick scanning and cognitive efficiency, and "hierarchical minutes" (structured, ordered, detailed) for reference and alignment. The rationale comes from cognitive science — perception and recall operate differently, and one format cannot serve both. As argued in "Do generated interfaces outperform text-based chat for most tasks?", the representation should adapt to context rather than defaulting to a single format.
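The two formats can share one underlying record of the meeting and differ only in rendering. A minimal sketch, assuming a simple per-topic structure (the field names and example data are illustrative, not from the study):

```python
def as_highlights(minutes, k=2):
    # Flatten the structured minutes to the top decisions
    # for quick scanning (order of occurrence stands in for importance).
    decisions = [d for section in minutes for d in section["decisions"]]
    return decisions[:k]

def as_hierarchical(minutes):
    # Structured, ordered text for reference and alignment:
    # topics in meeting order, decisions indented beneath each.
    lines = []
    for section in minutes:
        lines.append(section["topic"])
        lines.extend("  - " + d for d in section["decisions"])
    return "\n".join(lines)
```

Keeping one data model with two renderers means the choice of format becomes a presentation decision that can follow context, rather than two separate summarization pipelines.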

The design implication: AI summarization in collaborative organizational settings must learn from natural interactions what matters to each participant. Pure content summarization — extracting "what happened" — is insufficient when the question is "what happened that matters to me."


Source: Reading Summarizing

Original note title: llm meeting summaries fail on personal relevance and speaker attribution — mis-attributions harm group dynamics in organizational settings