Recommender Systems · Language Understanding and Pragmatics · Conversational AI Systems

Can LLMs explain recommenders by mimicking their internal states?

Can training language models to align with both a recommender's outputs and its internal embeddings produce explanations that are both faithful and human-readable? This note explores whether dual-access interpretation resolves the fundamental tension between behavioral accuracy and interpretability.

Note · 2026-05-03 · sourced from Recommenders LLMs
What breaks when specialized AI models reach real users? How do people build trust with conversational AI?

Conventional explainability for recommenders trains a separate surrogate model to mimic the target's predictions and reads feature importance off the surrogate. This works at a behavioral level (the surrogate predicts what the target predicts) but doesn't probe the internal mechanism. It's a black-box explanation of a black box.
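A minimal sketch of that conventional surrogate loop, with a stand-in scoring function in place of a real recommender (all names here are illustrative, not from the paper):

```python
# Fit an interpretable model to the target's *outputs* and read feature
# importances off the surrogate. Note what this does and doesn't explain.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 8))  # stand-in user feature matrix

def target_score(X):
    # Stand-in for the black-box recommender's scoring function.
    return np.tanh(1.5 * X[:, 0] - X[:, 3] + 0.5 * X[:, 5] ** 2)

surrogate = DecisionTreeRegressor(max_depth=4).fit(X, target_score(X))
print("fidelity (R^2):", surrogate.score(X, target_score(X)))
print("importances:", surrogate.feature_importances_.round(3))
# The importances describe the surrogate, which only approximates the
# target's behavior -- they say nothing about the target's internals.
```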

RecExplainer's three-tier alignment scheme bridges this gap. Behavior alignment is the conventional surrogate: feed the LLM user profile text and train it to predict the items the target recommender would suggest. The LLM learns to reproduce target predictions from textual input.
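In training-data terms, behavior alignment looks roughly like the sketch below. The template is illustrative rather than RecExplainer's literal prompt format; the key point is that the label is whatever the target recommender ranks highest, so the LLM learns to reproduce the target's behavior rather than ground-truth preferences.

```python
# Construct one behavior-alignment instruction-tuning example.
def make_behavior_example(user_history, target_top_k):
    prompt = (
        "A user has interacted with these items:\n"
        + "\n".join(f"- {title}" for title in user_history)
        + "\nWhich items would the recommender suggest next?"
    )
    return {"prompt": prompt, "response": ", ".join(target_top_k)}

example = make_behavior_example(
    ["The Matrix", "Blade Runner", "Alien"],          # user's history, as text
    ["Terminator 2", "Ghost in the Shell", "Dune"],   # target model's top-3
)
```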

Intention alignment goes deeper. Instead of giving the LLM only text, it injects the target recommender's internal activations (the user and item embeddings from the target's latent space) into the LLM's prompt. The LLM is fine-tuned to treat these embeddings as a second input modality alongside text, so its predictions draw on the target's internal representation, not just its outputs.
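One plausible way to wire this up (the adapter design and sizes below are assumptions, not the paper's exact architecture): a small projection maps each recommender embedding into the LLM's token-embedding space, producing a pseudo-token that replaces a placeholder in the prompt. The recommender stays frozen; the adapter and LLM are trained with the usual next-token loss.

```python
import torch
import torch.nn as nn

d_rec, d_llm = 64, 4096  # assumed recommender / LLM hidden sizes

class RecEmbeddingAdapter(nn.Module):
    """Projects a frozen recommender embedding into the LLM's token space."""
    def __init__(self, d_rec, d_llm):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(d_rec, d_llm), nn.GELU(), nn.Linear(d_llm, d_llm)
        )

    def forward(self, rec_emb):       # (batch, d_rec)
        return self.proj(rec_emb)     # (batch, d_llm): one pseudo-token

adapter = RecEmbeddingAdapter(d_rec, d_llm)
user_emb = torch.randn(1, d_rec)      # taken from the frozen target model
pseudo_token = adapter(user_emb)      # spliced in where a placeholder sits
# With a Hugging Face model, the splice would look roughly like:
#   text = llm.get_input_embeddings()(input_ids)          # (1, T, d_llm)
#   inputs_embeds = torch.cat(
#       [text[:, :pos], pseudo_token.unsqueeze(1), text[:, pos:]], dim=1)
#   llm(inputs_embeds=inputs_embeds, ...)
```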

Hybrid alignment combines both: text and embeddings together in the same prompt. The resulting explanations couple the human-interpretable reasoning that text supports with the high-fidelity behavior matching that embeddings provide.
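A hybrid prompt then carries both modalities at once. The placeholder convention below is made up for illustration, not RecExplainer's literal template:

```python
# Hybrid alignment sketch: textual history plus an embedding placeholder
# that the adapter's pseudo-token replaces at the embedding layer.
def make_hybrid_prompt(user_history):
    return (
        "User embedding: <REC_EMB>\n"   # swapped for the projected vector
        "Interaction history:\n"
        + "\n".join(f"- {title}" for title in user_history)
        + "\nPredict the recommender's next suggestions and explain why."
    )
```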

The general principle: when you need to interpret a black-box model, behavioral mimicry and internal-state inspection are complementary. Each alone is partial: behavioral mimicry misses the mechanism, and internal inspection misses the human-readable explanation. Combining them yields explanations that are both faithful to the target and intelligible to users. The pattern plausibly generalizes beyond recommendation: any setting where a black-box model must be interpreted can benefit from this dual access.
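The "faithful to the target" half of that claim can be checked quantitatively; a generic top-k agreement score is one common option (an illustration, not a metric the note specifies):

```python
def topk_agreement(explainer_topk, target_topk):
    """Fraction of the target's top-k items the explainer also predicts."""
    return len(set(explainer_topk) & set(target_topk)) / len(target_topk)

print(topk_agreement(["a", "b", "c"], ["b", "c", "d"]))  # -> 0.666...
```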


Source: Recommenders LLMs

Original note title: RecExplainer uses an LLM as a surrogate model with three alignment methods (behavior, intention, and hybrid) for recommendation interpretability.