Prompting Large Language Models for Recommender Systems: A Comprehensive Framework and Empirical Analysis
To alleviate the problem of information overload [31, 76], recommender systems infer the needs of users and provide them with recommendations based on their historical interactions, and are widely studied in both industry and academia [23, 28, 29, 84]. Over the past decade, various recommendation algorithms have been proposed to capture personalized interaction patterns from user behaviors [39, 144]. Despite this progress, the performance of conventional recommenders depends heavily on limited training data from a few datasets and domains, and two major drawbacks remain. On the one hand, traditional models lack general world knowledge beyond interaction sequences: in complex scenarios that require reasoning or planning, existing methods have no commonsense knowledge to draw on [27, 71, 112, 119]. On the other hand, traditional models cannot truly understand the intentions and preferences of users: their recommendation results lack explainability, and requirements that users express explicitly, such as in natural language, are difficult to take into account [47, 52, 126].
Recently, Large Language Models (LLMs) such as ChatGPT have demonstrated impressive abilities in solving general tasks [24, 118], showing their potential for developing next-generation recommender systems. The advantages of incorporating LLMs into recommendation tasks are two-fold. First, the excellent performance of LLMs on complex reasoning tasks reflects rich world knowledge and superior inference ability, which can effectively compensate for the limited local knowledge of traditional recommenders [1, 75, 85]. Second, the language modeling abilities of LLMs allow them to seamlessly integrate massive textual data, extracting features beyond IDs and even understanding user preferences explicitly [30, 50]. Therefore, researchers have attempted to leverage LLMs for recommendation tasks.
Typically, there are three ways to employ LLMs for recommendation: (1) LLMs can serve as the recommender itself and make recommendation decisions, encompassing both discriminative and generative recommendation [3, 12, 20, 32, 133]. (2) LLMs can enhance traditional recommendation models by extracting semantic representations of users and items from text corpora; the extensive semantic knowledge and robust planning capabilities of LLMs are thereby integrated into traditional models [1, 16, 27, 31, 71, 107, 112, 119]. (3) LLMs can act as a recommendation simulator that drives generative agents in the recommendation process, where users and items are empowered by LLMs to simulate a virtual environment [18, 100, 101, 130, 132]. We mainly focus on the first scenario in this paper.
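As a concrete illustration of the first paradigm, the sketch below shows one plausible way to serialize a user's interaction history and a candidate slate into a ranking prompt for an LLM used directly as the recommender. The template wording and item titles are our own illustrative assumptions, not taken from any cited system.

```python
def build_ranking_prompt(history, candidates):
    """Serialize a user's interaction history and a candidate item slate
    into a natural-language ranking prompt for an LLM recommender."""
    hist = "\n".join(f"{i + 1}. {title}" for i, title in enumerate(history))
    cand = "\n".join(f"{chr(65 + i)}. {title}" for i, title in enumerate(candidates))
    return (
        "A user has watched the following movies in order:\n"
        f"{hist}\n\n"
        "Rank the candidate movies below by how likely the user is to "
        "watch them next. Answer with the candidate letters, most likely first.\n"
        f"{cand}"
    )

prompt = build_ranking_prompt(
    ["The Matrix", "Blade Runner", "Inception"],
    ["Interstellar", "Titanic", "Ex Machina"],
)
print(prompt)
```

The LLM's free-text answer is then parsed back into an item ranking; discriminative variants instead ask a yes/no question per candidate and read off a score.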
The research on how to improve recommendation models with LLMs can be divided into three categories: LLM as feature encoder, LLM for data augmentation, and LLM co-optimized with domain-specific models.
• LLM as feature encoder. The representation embeddings of users and items are key factors in classical recommender systems [28, 84]. Serving as feature encoders, LLMs can process textual data related to users and items and enrich their representations with semantic information. U-BERT [80] injected user review texts, item review texts, and domain IDs into user representations, augmenting the contextual semantic information in user vectors. Wu et al. [114], on the other hand, employed language models to generate item representations for news recommendation. With the development of LLMs and prompting strategies, BDLM [134] fed prompts consisting of interaction and contextual information into an LLM and used the top-layer feature embeddings as user and item representations.
• LLM for data augmentation. In this paradigm, LLMs generate auxiliary textual information for data augmentation [1, 66, 71, 112]. Through prompting or in-context learning strategies, relevant knowledge is extracted in different textual forms to facilitate recommendation tasks [16, 107, 119]. One form of auxiliary text is summarization or free-text generation, which allows LLMs to enrich the representations of users or items [110]. For example, Du et al. [16] proposed a job recommendation model that uses the summarization capability of LLMs to extract user information and job requirements. Considering item descriptions and user reviews, KAR [119] extracted reasoning knowledge on user preferences and factual knowledge on items through specifically designed prompts, while SAGCN [66] used a chain-based prompting strategy to generate semantic information. Another use of LLM-generated textual features is graph augmentation in the recommendation field. LLMRG [107] leveraged LLMs to extend nodes in recommendation graphs; the resulting reasoning graph was encoded with a GNN and served as an additional input to enhance sequential models. LLMRec [112] adopted three types of prompts to generate information for graph augmentation: implicit feedback, user profiles, and item attributes.
• LLM co-optimized with domain-specific models. The categories above mainly exploit the common knowledge of LLMs for domain-specific models [110]. However, an LLM by itself often struggles with domain-specific tasks due to the lack of task-related information [40, 125]. Therefore, some studies aim to bridge the gap between LLMs and domain-specific models. BDLM [134] proposed an information sharing module that serves as an information storage mechanism between the LLM and the domain-specific model: the user and item embeddings stored in the module are updated in turn by the LLM and the domain-specific model, enhancing the performance of both sides.
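The feature-encoder paradigm above reduces to pooling an LLM's per-token hidden states into one user/item vector. The minimal sketch below keeps the pooling step but, for self-containedness, substitutes a seeded toy lookup table for the top-layer hidden states of a real LLM; in practice one would read these vectors out of a pretrained model.

```python
import numpy as np

def encode_text(tokens, table, dim=8):
    """Mean-pool per-token vectors into a single representation, mimicking
    how top-layer LLM hidden states are pooled into a user/item embedding.
    A deterministic toy lookup table stands in for the LLM here."""
    for t in tokens:
        if t not in table:
            seed = sum(ord(c) for c in t)  # deterministic per-token seed
            table[t] = np.random.default_rng(seed).standard_normal(dim)
    return np.mean([table[t] for t in tokens], axis=0)

table = {}
item_vec = encode_text("wireless noise cancelling headphones".split(), table)
user_vec = encode_text("bought headphones and a phone case".split(), table)

# Cosine similarity between the pooled user and item representations,
# as a downstream recommender might use them.
sim = float(item_vec @ user_vec / (np.linalg.norm(item_vec) * np.linalg.norm(user_vec)))
print(item_vec.shape, round(sim, 3))
```

The pooled vectors can then be fed into any classical scoring model in place of, or alongside, learned ID embeddings.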
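For the data-augmentation paradigm, the three prompt types used for graph augmentation in LLMRec-style pipelines can be sketched as simple templates. The wording below is illustrative only and is not LLMRec's actual prompt text.

```python
def augmentation_prompts(user_history, item_title):
    """Build the three kinds of augmentation prompts discussed for graph
    augmentation: implicit feedback, user profile, and item attributes.
    Template wording is illustrative, not taken from any cited system."""
    return {
        "implicit_feedback": (
            f"Given the interaction history {user_history}, would this user "
            f"likely interact with '{item_title}'? Answer yes or no."
        ),
        "user_profile": (
            f"Based on the history {user_history}, summarize the user's age "
            "group, preferred genres, and likely interests."
        ),
        "item_attributes": (
            f"List the genre, target audience, and key attributes of '{item_title}'."
        ),
    }

prompts = augmentation_prompts(["Toy Story", "Up"], "Coco")
print(sorted(prompts))
```

The LLM's responses to these prompts become extra edges, profile features, and attribute features that augment the interaction graph before training.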
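The co-optimization paradigm's information sharing module can be sketched as a shared embedding store that both models take turns refreshing. The interpolation update rule below is our own assumption for illustration; the cited work does not prescribe this exact rule.

```python
import numpy as np

class SharedEmbeddingStore:
    """Minimal sketch of an information-sharing module: the LLM and the
    domain-specific model alternately refresh the same stored embeddings.
    The blending rule is an illustrative assumption."""

    def __init__(self, n_users, dim, alpha=0.5):
        self.user_emb = np.zeros((n_users, dim))
        self.alpha = alpha  # mixing weight between stored and incoming vectors

    def update(self, user_id, new_vec):
        # Blend the incoming embedding (from either model) into the store.
        self.user_emb[user_id] = (
            self.alpha * self.user_emb[user_id] + (1 - self.alpha) * np.asarray(new_vec)
        )
        return self.user_emb[user_id]

store = SharedEmbeddingStore(n_users=4, dim=3)
store.update(0, [1.0, 1.0, 1.0])          # LLM writes its view first
shared = store.update(0, [0.0, 2.0, 0.0])  # domain model refines it in turn
print(shared)
```

Each side reads the current shared embeddings as input and writes back its refined view, so semantic and collaborative signals accumulate in the same vectors.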
This paper provides a comprehensive exploration of Large Language Models (LLMs) serving as recommender systems. It presents a systematic review of the advances in LLM-based recommendation, organizing related work into multiple scenarios and tasks from the perspectives of LLMs and prompts. We also conduct extensive experiments on two public datasets to derive empirical findings for recommendation with LLMs. Our objective is to help researchers gain a deeper understanding of the characteristics, strengths, and limitations of LLMs used as recommender systems. Given the significant progress in LLMs, LLM-based recommendation has the potential to better align the powerful capabilities of LLMs with the evolving needs of users in the field of recommender systems. By addressing the current challenges, we hope that our work will contribute to the advancement of LLM-based recommendation and inspire future research. Finally, we outline promising directions for future research on utilizing LLMs for recommendation as follows.
• Efficiency optimization of LLMs for recommendation. The key limitation of leveraging LLMs in industrial recommender systems is efficiency [17, 51, 116], in terms of both time and space. On the one hand, the fine-tuning and inference efficiency of LLMs cannot match that of traditional recommendation models [32, 141]. Although techniques such as parameter-efficient fine-tuning can keep LLMs updated in a computationally efficient manner, recommender systems must iterate continuously over time, i.e., via incremental learning, and frequent updates of LLMs inevitably impose spatial and temporal burdens on recommender systems [88]. On the other hand, the billions of parameters in LLMs also pose challenges for the lightweight deployment of recommendation algorithms [95, 96]. Therefore, efficiency optimization of LLMs used as recommenders is a prerequisite for large-scale applications and holds broad application prospects and scientific research value [35, 88, 98].
• Knowledge distillation of LLMs for recommendation. Since LLMs as recommenders are limited by efficiency, another feasible approach is to distill [53, 91] the recommendation capabilities of LLMs into lightweight models, striking a balance between efficiency and effectiveness. Knowledge distillation is a classic model compression method in recommender systems [91, 94]; its core idea is to guide a lightweight student model to “imitate” a teacher model with better performance and a more complex structure, such as an LLM. In recommender systems, the collaborative optimization of LLMs and recommendation models can also be viewed as a distillation process that injects knowledge from LLMs into traditional recommenders, enhancing user and item representations with semantic features [53, 79, 119]. Since knowledge distillation can improve efficiency while retaining the recommendation capabilities of LLMs, its applications deserve fuller exploration.
• Multimodal recommendations with LLMs. Beyond IDs and text, multimodal recommendation with LLMs holds considerable promise and warrants comprehensive exploration given the evolving landscape of media consumption [123, 137]. The essence of multimodal recommendation lies in fusing textual and visual information for enhanced user engagement [26, 77], and LLMs can play a dual role in this context. On the one hand, LLMs can function as multimodal LLMs, incorporating and encoding visual information extracted from images. On the other hand, images can first be transformed into textual representations by multimodal encoders, with the LLM mainly performing the subsequent integration of the diverse modalities [26]. In addition, rich multimodal attributes also provide a basis for diversified recommendation results [63]. As the field progresses, an emphasis on reproducibility, benchmarking, and standardized evaluation datasets and metrics will be essential to foster cohesive and informed advances in multimodal recommender systems leveraging LLMs.
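The knowledge distillation direction above rests on a standard objective: train the lightweight student to match the LLM teacher's soft preference distribution over a shared candidate slate. The sketch below implements the generic temperature-scaled distillation loss, not any cited system's exact objective.

```python
import numpy as np

def softmax(z, T=1.0):
    """Numerically stable softmax with temperature T."""
    z = np.asarray(z, dtype=float) / T
    e = np.exp(z - z.max())
    return e / e.sum()

def distill_loss(teacher_logits, student_logits, T=2.0):
    """Temperature-scaled KL divergence KL(teacher || student) over a
    shared candidate-item slate: the student recommender is trained to
    imitate the LLM teacher's soft preference distribution."""
    p = softmax(teacher_logits, T)  # soft targets from the LLM teacher
    q = softmax(student_logits, T)  # lightweight student's predictions
    return float(np.sum(p * (np.log(p) - np.log(q))))

teacher = [3.0, 1.0, 0.2]  # LLM teacher's scores for three candidates
matched = distill_loss(teacher, teacher)       # identical distributions -> 0
mismatched = distill_loss(teacher, [0.2, 1.0, 3.0])
print(matched, mismatched)
```

A higher temperature softens the teacher distribution, exposing the relative preferences among non-top items, which is precisely the signal a student recommender cannot learn from hard labels alone.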