Do LLM explanations faithfully describe their recommendation process?
When LLMs recommend items to groups, do their explanations match how they actually made the choice? This matters because users rely on explanations to understand how the AI reached its decision.
When LLMs are asked to make group recommendations from individual member preferences, the outputs converge on Additive Utilitarian (ADD) aggregation: picking the item with the highest sum of all members' ratings. This is a consensus-based strategy from social choice theory. The behavior is consistent across uniform and divergent group structures.
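To make the aggregation concrete, here is a minimal sketch of ADD over a toy rating matrix. The data and function name are illustrative assumptions, not taken from the source; the sketch simply scores each item by the sum of member ratings and picks the maximum.

```python
# Minimal sketch of Additive Utilitarian (ADD) aggregation.
# Assumes a complete rating matrix: every member has rated every candidate item.

def additive_utilitarian(ratings: dict[str, dict[str, float]]) -> str:
    """ratings maps member -> {item: rating}; returns the item with the highest rating sum."""
    items = next(iter(ratings.values())).keys()
    totals = {item: sum(member[item] for member in ratings.values()) for item in items}
    return max(totals, key=totals.get)

# Toy divergent group: members disagree, but the sum still singles out one item.
group = {
    "u1": {"A": 5, "B": 1, "C": 3},
    "u2": {"A": 2, "B": 4, "C": 3},
    "u3": {"A": 4, "B": 2, "C": 3},
}
print(additive_utilitarian(group))  # "A": 5 + 2 + 4 = 11, the highest total
```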
The disconnect is in the explanations. Asked to explain its recommendation procedure to a layperson, the LLM doesn't say "I summed the ratings" — it cites averaging (which is similar to but not identical to ADD), user or item similarity, diversity, undefined popularity metrics, and ad-hoc thresholds. Different LLMs invent different procedures: Llama tends to cite user similarity, while Mistral and Phi cite diversity in the recommendation list. These claimed procedures don't match the behavioral output.
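The gap between "averaging" and ADD is narrower than it sounds: over a complete rating matrix with a fixed group, the average score is just the ADD total divided by group size, so the two induce the same ranking and only diverge when ratings are missing or weighted differently. A small sketch of that equivalence, on toy data of my own (not the paper's):

```python
# Average vs. ADD over a complete rating matrix: the average is the ADD total divided
# by a constant (group size), so the induced ranking is identical.
# Toy data; the two strategies diverge only with missing or weighted ratings.

group = {
    "u1": {"A": 5, "B": 1, "C": 3},
    "u2": {"A": 2, "B": 4, "C": 3},
    "u3": {"A": 4, "B": 2, "C": 3},
}
items = ["A", "B", "C"]
add = {i: sum(member[i] for member in group.values()) for i in items}
avg = {i: add[i] / len(group) for i in items}

rank_by_add = sorted(items, key=add.get, reverse=True)
rank_by_avg = sorted(items, key=avg.get, reverse=True)
assert rank_by_add == rank_by_avg  # same order: ranking-equivalent on this data
```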
This makes LLM explainers unreliable narrators. They generate recommendations one way and explain them another, and the explanation is plausible enough that users accept it. As item set size grows, mentions of similarity and diversity in explanations increase (suggesting the LLM works harder at post-hoc justification as more items make the choice harder to defend), while mentions of undefined popularity decrease. The implication for group recommender systems built on LLMs: the explanation layer cannot be trusted to faithfully describe what the model did, even though that is its stated purpose.
Source: Recommenders Architectures
Related concepts in this collection
- Do AI-assisted outputs fool users about their own skills?
When people use AI tools to produce high-quality work, do they mistakenly believe they personally possess the skills that generated it? This matters because such misattribution could mask genuine skill loss and prevent corrective action.
complements: same trust-failure pattern — users (or LLMs themselves) describe a process that does not match the actual procedure used
- Does processing ease mislead users about their own competence?
When AI generates polished output, do users mistake the fluency of that output as evidence of their own understanding or skill? This matters because it could systematically inflate self-assessment across millions of AI interactions.
complements: explainer narrators are convincing because of fluency, not faithfulness — the unreliable explanation is fluently produced
- Can LLMs explain recommenders by mimicking their internal states?
Can training language models to align with both a recommender's outputs and its internal embeddings produce explanations that are both faithful and human-readable? This explores whether dual-access interpretation solves the fundamental tension between behavioral accuracy and interpretability.
tension with: RecExplainer tries to align LLM-explainer behavior with the underlying model — exactly the alignment that the LLM-as-explainer fails to achieve by default
- Does validating AI output make models more defensive?
When professionals fact-check and push back on GPT-4 reasoning, does the model respond by disclosing limits or by intensifying persuasion? A BCG study of 70+ consultants explores this counterintuitive dynamic.
complements: same structural-honesty failure — LLM produces post-hoc justifications rather than disclosing actual mechanism
Original note title: LLM group recommendations resemble additive utilitarian aggregation but explanations claim multiple criteria — explainers as unreliable narrators