Conversational AI Systems · Language Understanding and Pragmatics · Psychology and Social Cognition

Why do LLM meeting summaries fail to help individuals?

Current LLM summarization treats all meeting participants the same, but organizational contexts require personalized recaps. What barriers prevent systems from learning what matters to each person?

Note · 2026-02-23 · sourced from Reading Summarizing
Related: Why do AI conversations reliably break down after multiple turns? · How do you build domain expertise into general AI models?

LLM-based dialogue summarization shows promise for meeting recap — but a user study with seven participants evaluating real work meetings reveals three specific failure modes that prevent organizational adoption.

The personal relevance gap. LLM recap summarizes what was globally important in the meeting, not what was personally relevant to each participant. A designer cares about the design decisions made; a project manager cares about timeline commitments. The same meeting requires different summaries for different participants, and current summarization has no model of what matters to whom. This is the personalization problem applied to collaborative settings — as explored in "Do user outputs outperform inputs for LLM personalization?", the system would need to learn from each participant's interaction history what they care about.
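One way to make this concrete is a minimal sketch of per-participant relevance scoring. Everything here is illustrative and not from the study: the bag-of-words profile, the length threshold, and the function names are assumptions standing in for whatever model of "what matters to whom" a real system would learn.

```python
from collections import Counter

def build_profile(past_utterances):
    # Hypothetical sketch: derive a participant's interest profile
    # from their own past contributions (bag-of-words for brevity;
    # a real system would use something far richer).
    words = Counter()
    for text in past_utterances:
        words.update(w.lower() for w in text.split() if len(w) > 3)
    return words

def relevance(item, profile):
    # Score a candidate summary item against the profile.
    return sum(profile[w.lower()] for w in item.split())

def personalized_recap(items, profile, k=2):
    # Return the k summary items most relevant to this participant.
    return sorted(items, key=lambda it: relevance(it, profile), reverse=True)[:k]
```

The point of the sketch is the shape of the pipeline, not the scoring: the same candidate items get ranked differently per participant, so the designer and the project manager each receive a different recap of the same meeting.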

The mis-attribution problem. When the system attributes a statement to the wrong participant, the consequences extend beyond simple factual error. Mis-attributions are detrimental to group dynamics: they can create false impressions about who committed to what, who raised which concern, or who proposed which idea. In organizational settings where credit, accountability, and trust are at stake, getting attribution wrong damages the social fabric the meeting was meant to build. This parallels the finding in "Does warmth training make language models less reliable?" — errors in social contexts have consequences that accuracy metrics don't capture.
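Because the cost of a wrong attribution is social rather than merely factual, one plausible mitigation is to cross-check attributed claims against the diarized transcript and flag weak ones for review instead of emitting them. This is a hypothetical sketch, not anything the study built; the word-overlap heuristic and its threshold are assumptions.

```python
def flag_risky_attributions(summary_claims, transcript):
    # Hypothetical sketch: for each (speaker, claim) pair in the recap,
    # check whether that speaker's diarized utterances actually contain
    # supporting content; flag claims with too little lexical support
    # so a human can verify before the recap is shared.
    flagged = []
    for speaker, claim in summary_claims:
        said = " ".join(t for s, t in transcript if s == speaker).lower()
        overlap = [w for w in claim.lower().split() if len(w) > 3 and w in said]
        if len(overlap) < 2:  # assumed threshold: insufficient support
            flagged.append((speaker, claim))
    return flagged
```

A production system would use entailment or embedding similarity rather than word overlap, but the design choice is the same: treat attribution as a claim to be verified, not a byproduct of generation.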

Context-dependent representation. Two distinct recap formats serve different needs: "highlights" (important moments, key decisions) for quick scanning and cognitive efficiency, and "hierarchical minutes" (structured, ordered, detailed) for reference and alignment. The rationale comes from cognitive science — perception and recall operate differently, and one format cannot serve both. As argued in "Do generated interfaces outperform text-based chat for most tasks?", the representation should adapt to context rather than defaulting to a single format.
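The two formats can share one underlying record of the meeting and differ only in rendering. A minimal sketch, assuming a simple per-topic structure (the field names and example data are illustrative, not from the study):

```python
def as_highlights(minutes, k=2):
    # Flatten the structured minutes to the top decisions
    # for quick scanning (order of occurrence stands in for importance).
    decisions = [d for section in minutes for d in section["decisions"]]
    return decisions[:k]

def as_hierarchical(minutes):
    # Structured, ordered text for reference and alignment:
    # topics in meeting order, decisions indented beneath each.
    lines = []
    for section in minutes:
        lines.append(section["topic"])
        lines.extend("  - " + d for d in section["decisions"])
    return "\n".join(lines)
```

Keeping one data model with two renderers means the choice of format becomes a presentation decision that can follow context, rather than two separate summarization pipelines.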

The design implication: AI summarization in collaborative organizational settings must learn from natural interactions what matters to each participant. Pure content summarization — extracting "what happened" — is insufficient when the question is "what happened that matters to me."


Source: Reading Summarizing

Original note title: llm meeting summaries fail on personal relevance and speaker attribution — mis-attributions harm group dynamics in organizational settings