A Survey on Large Language Models for Recommendation

Paper · arXiv 2305.19860 · Published May 31, 2023
Recommenders · General · Prompts · Prompting

Likang Wu, Zhi Zheng, Zhaopeng Qiu, Hao Wang, Hongchao Gu, Tingjia Shen, Chuan Qin, Chen Zhu, Hengshu Zhu, Qi Liu, Hui Xiong, Enhong Chen

University of Science and Technology of China · Career Science Lab, BOSS Zhipin · Hong Kong University of Science and Technology (Guangzhou)

https://arxiv.org/abs/2305.19860

“(1) LLM Embeddings + RS. This modeling paradigm treats the language model as a feature extractor: the features of items and users are fed into the LLM, which outputs the corresponding embeddings. A traditional RS model can then utilize these knowledge-aware embeddings for various recommendation tasks.

(2) LLM Tokens + RS. Similar to the former method, this approach generates tokens based on the input features of items and users. The generated tokens capture potential preferences through semantic mining and can be integrated into the decision-making process of a recommendation system.

(3) LLM as RS. Unlike (1) and (2), this paradigm aims to directly turn a pre-trained LLM into a powerful recommendation system. The input sequence usually consists of a profile description, a behavior prompt, and a task instruction; the output sequence is expected to offer a reasonable recommendation result.”
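For paradigm (1), a minimal sketch of the "LLM Embeddings + RS" pipeline might look like the following, with a BERT-style encoder standing in for the language model; the model name, the mean pooling, and the dot-product scoring are illustrative assumptions, not choices prescribed by the survey.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# The language model acts as a frozen feature extractor.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")
encoder.eval()

def embed(texts):
    """Mean-pool the last hidden states into one vector per text."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state   # (B, T, H)
    mask = batch["attention_mask"].unsqueeze(-1)      # (B, T, 1)
    return (hidden * mask).sum(1) / mask.sum(1)       # (B, H)

# Textual side information for one user and a handful of items.
user_emb = embed(["Enjoys hard science fiction and popular-science books"])
item_texts = ["The Three-Body Problem", "A Brief History of Time", "Pride and Prejudice"]
item_embs = embed(item_texts)

# A traditional RS component consumes the knowledge-aware embeddings;
# plain dot-product ranking stands in here for the MLP, two-tower, or
# CTR model that would normally be trained on interaction data.
scores = (user_emb @ item_embs.T).squeeze(0)
ranking = scores.argsort(descending=True)
print([item_texts[int(i)] for i in ranking])
```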

“Prompting

This category of works aims to design more suitable instructions and prompts to help LLMs better understand and solve recommendation tasks. Liu et al. (2023a) systematically evaluated the performance of ChatGPT on five common recommendation tasks, i.e., rating prediction, sequential recommendation, direct recommendation, explanation generation, and review summarization. They proposed a general recommendation prompt construction framework, which consists of: (1) task description, adapting recommendation tasks to natural language processing tasks; (2) behavior injection, incorporating user-item interactions to aid LLMs in capturing user preferences and needs; (3) format indicator, constraining the output format and making the recommendation results more comprehensible and assessable. Similarly, Dai et al. (2023) conducted an empirical analysis of ChatGPT's recommendation abilities on three common information retrieval tasks: point-wise, pair-wise, and list-wise ranking. They proposed different prompts for each kind of task and introduced role instructions (such as "You are a news recommendation system now.") at the beginning of the prompts to enhance the domain adaptation ability of ChatGPT.”
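The three-part framework plus the role instruction composes naturally into a single prompt string. Below is a hedged sketch; the field wording and the news-domain example are illustrative assumptions, not templates taken from either paper.

```python
def build_prompt(history, candidates):
    role = "You are a news recommendation system now."            # role instruction
    task = ("Rank the candidate articles by how likely the user "
            "is to click on them.")                               # (1) task description
    behavior = "The user recently read: " + "; ".join(history)    # (2) behavior injection
    items = "Candidates: " + "; ".join(
        f"({i + 1}) {title}" for i, title in enumerate(candidates))
    fmt = ("Answer with the candidate numbers only, in descending "
           "order of preference, separated by commas.")           # (3) format indicator
    return "\n".join([role, task, behavior, items, fmt])

print(build_prompt(
    history=["SpaceX launches new rocket", "NASA plans lunar base"],
    candidates=["Mars rover finds ice", "Stock market rallies", "New phone released"],
))
```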

“Compared to discriminative models, generative models have better natural language generation capabilities. Therefore, unlike most discriminative model-based approaches, which align the representations learned by LLMs to the recommendation domain, most generative model-based work translates recommendation tasks into natural language tasks and then applies techniques such as in-context learning, prompt tuning, and instruction tuning to adapt LLMs to directly generate the recommendation results.”
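Of these adaptation techniques, in-context learning is the only one that needs no parameter updates: the task is verbalized, a few solved examples are prepended, and the model's completion is itself the recommendation. A minimal sketch, with an invented movie-domain template:

```python
# A few verbalized, already-solved examples (the "context").
FEW_SHOT = (
    "User watched: Alien, Blade Runner, The Matrix. Next movie: Inception.\n"
    "User watched: Toy Story, Finding Nemo, Up. Next movie: Coco.\n"
)

def icl_prompt(history):
    """Append the unsolved query; the LLM's completion is the recommendation."""
    return FEW_SHOT + "User watched: " + ", ".join(history) + ". Next movie:"

print(icl_prompt(["The Godfather", "Goodfellas", "Casino"]))
```

No separate ranking model is involved; the generated text is the output, which is what distinguishes this style from the embedding- and token-based paradigms.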

Challenges

“Position Bias. In the generative language modeling paradigm of recommendation systems, various information such as user behavior sequences and recommended candidates is input to the language model in the form of textual sequential descriptions, which can introduce position biases inherent in the language model itself [Lu et al., 2021]. For example, the order of candidates affects the ranking results of LLM-based recommendation models: LLMs often prioritize items that appear near the top of the list. Moreover, the model usually cannot capture the order of behaviors in the sequence well.

Popularity Bias. The ranking results of LLMs are influenced by the popularity levels of the candidates. Popular items, which are often extensively discussed and mentioned in the pre-training corpora of LLMs, tend to be ranked higher. Addressing this issue is challenging as it is closely tied to the composition of the pre-trained corpus.

Fairness Bias. Pre-trained language models have exhibited fairness issues related to sensitive attributes, which are influenced by the training data or the demographics of the individuals involved in certain task annotations [Ferrara, 2023].”
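A commonly used inference-time mitigation for the position bias above is to query the model several times with the candidate list randomly shuffled and aggregate the resulting rankings. The sketch below assumes a hypothetical rank_with_llm callable standing in for an actual LLM ranking call.

```python
import random
from collections import defaultdict

def debiased_rank(candidates, rank_with_llm, rounds=5, seed=0):
    """Average out position bias by ranking several shuffled copies."""
    rng = random.Random(seed)
    rank_sum = defaultdict(int)
    for _ in range(rounds):
        shuffled = list(candidates)
        rng.shuffle(shuffled)              # break any fixed input order
        ranking = rank_with_llm(shuffled)  # hypothetical: returns items, best first
        for pos, item in enumerate(ranking):
            rank_sum[item] += pos          # lower accumulated rank = better
    return sorted(candidates, key=lambda item: rank_sum[item])

# With a maximally position-biased stand-in "model" (one that simply echoes
# its input order), shuffling washes the bias out of the aggregate.
print(debiased_rank(["A", "B", "C"], lambda items: items))
```

Popularity bias has no comparably cheap fix, since, as the excerpt notes, it is rooted in the composition of the pre-training corpus.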

“However, most existing LLM-based work represents an item only by its name, and a user only by a list of item names, which is insufficient for modeling users and items accurately.”

“Additionally, it is critical to translate a user’s heterogeneous behavior sequence (such as clicks, adding to cart, and purchases in the e-commerce domain) into natural language for preference modeling. ID-like features have been proven effective in traditional recommendation models, but incorporating them into prompts to improve personalized recommendation performance is also challenging.”
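Verbalizing such a heterogeneous behavior sequence is mostly a templating exercise; the hard part the survey points to is exposing ID-like features to the LLM. A hedged sketch, in which the action vocabulary and the bracketed ID token convention are illustrative assumptions:

```python
# Map raw action types to natural-language verbs.
ACTION_TEXT = {"click": "clicked", "cart": "added to cart", "buy": "purchased"}

def verbalize(behaviors):
    """behaviors: list of (action, item_name, item_id) tuples, oldest first."""
    steps = [
        f"{ACTION_TEXT[action]} {name} [id_{item_id}]"  # ID kept as a pseudo-token
        for action, name, item_id in behaviors
    ]
    return "The user " + ", then ".join(steps) + "."

print(verbalize([
    ("click", "wireless mouse", 1042),
    ("cart", "mechanical keyboard", 2311),
    ("buy", "USB-C hub", 887),
]))
# -> The user clicked wireless mouse [id_1042], then added to cart
#    mechanical keyboard [id_2311], then purchased USB-C hub [id_887].
```

Whether such bracketed IDs actually help depends on how the LLM tokenizes them, which is precisely the open question raised above.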
