Learning Distributed Representations from Reviews for Collaborative Filtering

Paper · arXiv 1806.06875 · Published June 18, 2018

“Collaborative filtering has been successfully used for recommendation systems (see, e.g., [17]). A typical approach to using collaborative filtering for recommendation systems is to consider all the observed ratings given by a set of users to a set of products as elements in a matrix, whose rows and columns correspond to users and products, respectively. As the observed ratings are typically only a small subset of the possible ratings (all users rating all products), this matrix is sparse. The goal of collaborative filtering is to fill in the missing values of this matrix: to predict, for each user, the ratings of products the user has not rated. In this setting, collaborative filtering is usually cast as a problem of matrix factorization with missing values [10, 16, 18]. The sparse matrix is factorized into a product of two lower-rank matrices: a user matrix and a product matrix. Once these matrices are estimated, a missing observation can be trivially reconstructed by taking the dot product of the corresponding user vector (or representation) and product vector (or representation).
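To make the setup concrete, here is a minimal numpy sketch of matrix factorization with missing values (toy data, rank, and learning rate are illustrative, not taken from the paper): the squared error is minimized over observed entries only, and a missing rating is then predicted as the dot product of the corresponding user and product vectors.

```python
import numpy as np

# Toy ratings matrix: 0 marks a missing observation (illustrative data).
R = np.array([[5.0, 3.0, 0.0],
              [4.0, 0.0, 1.0],
              [0.0, 2.0, 5.0]])
mask = R > 0

rng = np.random.default_rng(0)
k = 2                                          # latent rank
U = 0.1 * rng.standard_normal((R.shape[0], k))  # user representations
P = 0.1 * rng.standard_normal((R.shape[1], k))  # product representations

lr = 0.05
for _ in range(2000):
    E = mask * (R - U @ P.T)   # error on observed entries only
    U += lr * (E @ P)          # gradient step on the squared error
    P += lr * (E.T @ U)

# Missing ratings are reconstructed from the learned representations.
pred = U @ P.T
```

Because the loss is masked, the zero entries of `R` never contribute a gradient; the model is free to assign them any value consistent with the learned low-rank structure, which is exactly the "fill in the missing values" prediction.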

In this formulation of collaborative filtering, an important issue of data sparsity arises. For instance, the dataset provided as part of the Netflix Challenge had only 100,480,507 observed ratings out of more than 8 billion possible ratings (user/product pairs), meaning that about 99% of the values were missing. This data sparsity easily leads to naive matrix factorization overfitting the training set of observed ratings [10]. In this paper, we are interested in regularizing the collaborative filtering matrix factorization using an additional source of information: reviews written by users in natural language. Recent work has shown that better rating prediction can be obtained by incorporating this kind of text-based side information [13, 12, 1]. Motivated by these recent successes, here we explore alternative approaches to exploiting this side information. Specifically, we study how different models of reviews can impact the performance of the regularization.
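The sparsity figure can be checked with a few lines of arithmetic. The user and movie counts below are the commonly cited Netflix Prize dimensions (roughly 480k users and 17.8k movies), which are an assumption here rather than numbers stated in this excerpt:

```python
# Commonly cited Netflix Prize dimensions (assumption, not from the excerpt).
observed = 100_480_507
users, movies = 480_189, 17_770

possible = users * movies                 # all user/product pairs
missing_fraction = 1 - observed / possible

print(f"{possible:,} possible ratings, {missing_fraction:.1%} missing")
```

The product exceeds 8 billion pairs, and the missing fraction comes out just under 99%, matching the paper's figure after rounding.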

We introduce two approaches to modeling reviews and compare these to the current state-of-the-art LDA-based approaches [13, 12]. Both models have previously been studied as neural-network-based document models. One is based on the Bag-of-Words Paragraph Vector [11]. This model is similar to the existing LDA-based model, but, as we argue, it offers a more flexible natural language model. The other is a recurrent neural network (RNN) based approach. RNNs have recently become very popular models of natural language for a wide array of tasks [11]. Here we will find that despite the considerable additional modelling power brought by the RNN, it does not offer better performance when used as a regularizer in this context.”
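The key contrast between the two review models, order-insensitivity versus order-sensitivity, can be illustrated with a toy numpy sketch. The weights and word indices below are random and made up; neither snippet is the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, dim = 50, 8
W = 0.1 * rng.standard_normal((vocab, dim))  # word embeddings (illustrative)

review = [3, 17, 42, 7]  # a review as word indices (toy example)

# Bag-of-words view: the review vector is a mean of its word vectors,
# so word order cannot affect it.
bow_vec = W[review].mean(axis=0)
bow_rev = W[review[::-1]].mean(axis=0)

# RNN view: a hidden state is updated word by word, so the same words
# in a different order generally produce a different state.
Wh = 0.1 * rng.standard_normal((dim, dim))
Wx = 0.1 * rng.standard_normal((dim, dim))

h = np.zeros(dim)
for idx in review:
    h = np.tanh(h @ Wh + W[idx] @ Wx)

h_rev = np.zeros(dim)
for idx in reversed(review):
    h_rev = np.tanh(h_rev @ Wh + W[idx] @ Wx)
```

Reversing the review leaves the bag-of-words vector unchanged but moves the RNN state, which is the extra modelling power the paper finds does not translate into better regularization here.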