RecExplainer: Aligning Large Language Models for Recommendation Model Interpretability

Paper · arXiv 2311.10947 · Published November 18, 2023

We propose a new model interpretation approach for recommender systems, using LLMs as surrogate models that learn to mimic and comprehend target recommender models. Specifically, we introduce three alignment methods: behavior alignment, intention alignment, and hybrid alignment. Behavior alignment operates in the language space, representing user preferences and item information as text to learn the recommendation model's behavior; intention alignment works in the latent space of the recommendation model, using user and item representations to understand the model's behavior; hybrid alignment combines both language and latent spaces for alignment training. To demonstrate the effectiveness of our methods, we conduct evaluations from two perspectives, alignment effect and explanation generation ability, on three public datasets. Experimental results indicate that our approach effectively enables LLMs to comprehend the patterns of recommendation models and generate highly credible recommendation explanations.

The LLM is then trained to emulate the recommendation model’s predictive patterns—given a user’s profile as input, the LLM is fine-tuned to predict the items that the recommendation model would suggest to the user. We refer to this approach as behavior alignment.
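The following is a minimal sketch of how behavior-alignment training data might be constructed; it is not the paper's exact pipeline, and `recommender.top_k_items`, `item_titles`, and `history` are hypothetical stand-ins for the target model's prediction interface and the dataset's metadata. The target recommender's top-k predictions are serialized into prompt/response text pairs for supervised fine-tuning of the LLM.

```python
def build_behavior_alignment_examples(recommender, users, item_titles, history, k=5):
    """Serialize the target recommender's predictions into text pairs for LLM fine-tuning."""
    examples = []
    for user_id in users:
        # Items the target recommendation model would suggest for this user.
        predicted_items = recommender.top_k_items(user_id, k=k)

        interacted = ", ".join(item_titles[i] for i in history[user_id])
        recommended = ", ".join(item_titles[i] for i in predicted_items)

        prompt = (
            f"A user has interacted with the following items: {interacted}. "
            "Which items would the recommendation model suggest next?"
        )
        # The LLM is fine-tuned to reproduce the recommender's top-k list as text.
        examples.append({"prompt": prompt, "response": recommended})
    return examples
```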

However, similar to traditional surrogate model-based interpretability, behavior alignment merely mimics predictive observations from outside the model, attempting to deduce what is happening inside the black box. We argue that a more profound way to interpret models is to enable the LLM to directly comprehend the neural layers of the recommender model. Therefore, we propose an alternative approach called intention alignment, wherein the embeddings (i.e., activations of neural layers) of the recommender model are incorporated into the LLM's prompts to represent user and item information, and the LLM is fine-tuned to understand these embeddings. This approach can be viewed as a multimodal model, with textual words and recommendation model embeddings constituting two distinct data modalities.
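Below is a minimal sketch of one common way to realize this kind of embedding injection, assuming a frozen recommender and a decoder-only LLM: recommender user/item embeddings are mapped by a small projection module into the LLM's token-embedding space and spliced into the prompt at a placeholder position, so the LLM consumes them like soft tokens. The module names and dimensions are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class EmbeddingProjector(nn.Module):
    """Maps a recommender embedding into the LLM token-embedding space (illustrative dims)."""
    def __init__(self, rec_dim: int = 64, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(rec_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, rec_embedding: torch.Tensor) -> torch.Tensor:
        # (batch, rec_dim) -> (batch, 1, llm_dim): one soft "token" per embedding.
        return self.proj(rec_embedding).unsqueeze(1)

def build_inputs_embeds(llm_embed_layer, prefix_ids, suffix_ids, user_vec, projector):
    """Embed the text around a placeholder and insert the projected recommender embedding."""
    prefix = llm_embed_layer(prefix_ids)   # (1, len_prefix, llm_dim)
    suffix = llm_embed_layer(suffix_ids)   # (1, len_suffix, llm_dim)
    user_token = projector(user_vec)       # (1, 1, llm_dim)
    # The concatenated sequence is fed to the LLM via its `inputs_embeds` pathway,
    # so the recommender's latent representation sits inline with the text prompt.
    return torch.cat([prefix, user_token, suffix], dim=1)
```

During fine-tuning, the projector (and optionally the LLM) is trained so that the language model learns to interpret these injected latent vectors, treating them as a second modality alongside text.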