Learning Vector-Quantized Item Representation for Transferable Sequential Recommenders

Paper · arXiv 2210.12316 · Published October 22, 2022
Recommenders · Architectures

“Sequential recommender systems have been widely deployed on various application platforms for recommending items of interest to users. Typically, such a recommendation task is formulated as a sequence prediction problem [16, 24, 39, 44]: inferring the next item(s) that a user is likely to interact with based on her/his historical interaction sequences. Although different sequential recommenders adopt a similar task formulation, it is difficult to reuse an existing well-trained recommender in new recommendation scenarios [18, 29]. For example, when a new domain emerges with specific interaction characteristics, one may need to train a recommender from scratch, which is time-consuming and can suffer from cold-start issues. Thus, it is desirable to develop transferable sequential recommenders [10, 18, 50] that can quickly adapt to new domains or scenarios.
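To make the sequence-prediction formulation concrete, here is a minimal PyTorch sketch (not any specific cited model; all names and hyperparameters are illustrative) that encodes a user's interaction history with a Transformer and scores every candidate item as the possible next interaction:

```python
import torch
import torch.nn as nn

class NextItemRecommender(nn.Module):
    """Minimal Transformer-based next-item predictor (illustrative only)."""

    def __init__(self, num_items: int, dim: int = 64, max_len: int = 50):
        super().__init__()
        self.item_emb = nn.Embedding(num_items + 1, dim, padding_idx=0)  # 0 = padding
        self.pos_emb = nn.Embedding(max_len, dim)
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=2, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, seq: torch.LongTensor) -> torch.Tensor:
        # seq: (batch, seq_len) of item IDs; padding mask omitted for brevity
        pos = torch.arange(seq.size(1), device=seq.device)
        h = self.encoder(self.item_emb(seq) + self.pos_emb(pos))
        # Score all items against the last position's hidden state.
        return h[:, -1] @ self.item_emb.weight.T  # (batch, num_items + 1)
```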

For this purpose, early studies in the recommender systems literature mainly develop cross-domain recommendation methods [29, 73, 74], which transfer knowledge learned in existing domains to a new one. These studies generally assume that shared information (e.g., overlapping users/items [19, 43, 73] or common features [46]) is available for learning cross-domain mapping relations. However, in real applications, users and items are often only partially shared, or completely non-overlapping, across different domains (especially in a cross-platform setting), making effective cross-domain transfer difficult. Besides, previous content-based transfer methods [12, 46] usually design approaches tailored to the data format of the shared features, which do not generalize well across recommendation scenarios.

As a recent approach, several studies [10, 18, 50] propose to leverage the generality of natural language text (i.e., the title and description text of items, called item text) to bridge the domain gap in recommender systems. The basic idea is to employ text encodings learned via pre-trained language models (PLMs) [2, 44] as universal item representations. Based on such item representations, sequential recommenders pre-trained on interaction data from a mixture of multiple domains [10, 18, 50] have shown promising transferability. This paradigm can be denoted as “text ⇒ representation”. Despite their effectiveness, we argue that the binding between item text and item representations is “too tight” in previous approaches [10, 18], leading to two potential issues. First, since these methods derive item representations from text encodings (without using item IDs), text semantics have a direct influence on the recommendation model. The recommender might thus overemphasize text features (e.g., recommending items with very similar texts) instead of the sequential characteristics reflected in interaction data. Second, text encodings from different domains (with varied distributions and semantics [11, 18]) are not naturally aligned in a unified semantic space, and the domain gap in text encodings is likely to cause a performance drop during multi-domain pre-training. The tight binding between text encodings and item representations might exaggerate the negative impact of this domain gap.
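For concreteness, a minimal sketch of the “text ⇒ representation” paradigm using the HuggingFace transformers library; the PLM choice (bert-base-uncased) and [CLS] pooling are assumptions for illustration, not necessarily what the cited works use:

```python
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
plm = AutoModel.from_pretrained("bert-base-uncased").eval()

def encode_item_text(texts: list[str]) -> torch.Tensor:
    """Encode item texts into vectors used directly as item representations."""
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        out = plm(**batch)
    return out.last_hidden_state[:, 0]  # [CLS] encoding, shape (batch, 768)

item_reprs = encode_item_text(["Wireless mouse with USB receiver",
                               "Noise-cancelling over-ear headphones"])
```

Under this paradigm, the sequential model consumes these encodings as-is; this is exactly the tight text-representation binding critiqued above.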

Considering these issues, our solution is to incorporate intermediate discrete item indices (called item codes in this work) into the item representation scheme and relax the strong binding between item text and item representations, which can be denoted as “text ⇒ code ⇒ representation”. Instead of directly mapping text encodings into item representations, we consider a two-step item representation scheme: given an item, we first map its text to a vector of discrete indices (i.e., the item code), and then aggregate the corresponding code embeddings into the item representation. The merits of such a representation scheme are twofold. First, item text is mainly used to generate discrete codes, which reduces its direct influence on the recommendation model while still injecting useful text semantics. Second, the two mapping steps can be learned or tuned according to downstream domains or tasks, making it more flexible to fit new recommendation scenarios. To develop our approach, we highlight two key challenges: (i) how to learn discrete item codes that are sufficiently distinguishable for accurate recommendation; and (ii) how to effectively pre-train and adapt the item representations given the varied distributions and semantics across domains.
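The following sketch illustrates the second mapping step under stated assumptions (sum pooling over D per-dimension code embedding tables; all shapes are hypothetical): an item code is a vector of D discrete indices, and the item representation is the aggregation of the looked-up embeddings.

```python
import torch
import torch.nn as nn

D, M, dim = 32, 256, 64  # D code dimensions, M entries per codebook, embedding size

code_embeddings = nn.Parameter(torch.randn(D, M, dim))  # learnable, tunable per domain

def item_representation(code: torch.LongTensor) -> torch.Tensor:
    # code: (batch, D) discrete indices produced from item text in the first step
    gathered = code_embeddings[torch.arange(D), code]  # (batch, D, dim) lookups
    return gathered.sum(dim=1)                         # (batch, dim) representation
```

Because item text only enters through the discrete code, the embedding table (and the text-to-code mapping) can be tuned for a new domain without retraining a text encoder.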

To this end, we propose VQ-Rec, a novel approach that learns Vector-Quantized item representations for transferable sequential Recommenders. Different from existing transferable recommenders based on PLM encodings, VQ-Rec maps each item into a discrete D-dimensional code that serves as the indices for embedding lookup. To obtain semantically rich and distinguishable item codes, we apply optimized product quantization (OPQ) to discretize the text encodings of items. In this way, the discrete codes preserve textual semantics while being distributed more uniformly over the item set, making them highly distinguishable. Since our representation scheme does not modify the underlying backbone (i.e., the Transformer), it is generally applicable to various sequential architectures. To capture transferable patterns over item codes, we pre-train the recommender on a mixture of multiple domains with a contrastive learning objective, using both mixed-domain and semi-synthetic code representations as hard negatives. To transfer the pre-trained model to a downstream domain, we propose a differentiable permutation-based network that learns the code-embedding alignment, and further update the code embedding table to fit the new domain. Such fine-tuning is highly parameter-efficient, as only the parameters involved in item representations need to be tuned. Empirically, we conduct extensive experiments on six benchmarks, covering both cross-domain and cross-platform scenarios. The results demonstrate the strong transferability of our approach. In particular, our inductive recommender, which is purely based on item text, can recommend new items without re-training while achieving better performance on known items.”
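As a rough sketch of the code-construction step, one could derive item codes by OPQ over PLM text encodings using the faiss library (dimensions and training data below are placeholders; the paper's exact pipeline may differ):

```python
import faiss
import numpy as np

d, D, nbits = 768, 32, 8  # encoding dim, code dimensions, 2**8 = 256 centroids each
encodings = np.random.randn(10000, d).astype("float32")  # stand-in for PLM encodings

opq = faiss.OPQMatrix(d, D)  # learns a rotation that makes the space easier to quantize
opq.train(encodings)
rotated = opq.apply_py(encodings)

pq = faiss.ProductQuantizer(d, D, nbits)
pq.train(rotated)
codes = pq.compute_codes(rotated)  # (10000, D) uint8 indices: the discrete item codes
```

And a sketch of the contrastive pre-training objective, written as an InfoNCE-style loss with in-batch positives plus extra hard negatives (temperature, shapes, and the exact negative construction are assumptions):

```python
import torch
import torch.nn.functional as F

def contrastive_loss(seq_repr, item_repr, hard_negs, tau=0.07):
    # seq_repr, item_repr: (B, dim); the positive for row i is item_repr[i].
    # hard_negs: (B, K, dim) hard negatives, e.g., mixed-domain or
    # semi-synthetic code representations as described above.
    in_batch = seq_repr @ item_repr.T                       # (B, B) similarities
    hard = torch.einsum("bd,bkd->bk", seq_repr, hard_negs)  # (B, K) similarities
    logits = torch.cat([in_batch, hard], dim=1) / tau
    labels = torch.arange(seq_repr.size(0), device=seq_repr.device)
    return F.cross_entropy(logits, labels)
```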