Neural Collaborative Filtering vs. Matrix Factorization Revisited

Paper · arXiv 2005.09683 · Published May 19, 2020
Tags: Recommenders · Architectures

“Embedding based models have been the state of the art in collaborative filtering for over a decade. Traditionally, the dot product or higher order equivalents have been used to combine two or more embeddings, e.g., most notably in matrix factorization. In recent years, it was suggested to replace the dot product with a learned similarity, e.g. using a multilayer perceptron (MLP). This approach is often referred to as neural collaborative filtering (NCF). In this work, we revisit the experiments of the NCF paper that popularized learned similarities using MLPs. First, we show that with a proper hyperparameter selection, a simple dot product substantially outperforms the proposed learned similarities. Second, while an MLP can in theory approximate any function, we show that it is non-trivial to learn a dot product with an MLP. Finally, we discuss practical issues that arise when applying MLP based similarities and show that MLPs are too costly to use for item recommendation in production environments, while dot products allow very efficient retrieval algorithms to be applied. We conclude that MLPs should be used with care as embedding combiners and that dot products might be a better default choice.”
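
To make the contrast concrete, here is a minimal NumPy sketch of the two similarity functions the abstract compares: a matrix-factorization-style dot product between a user embedding and an item embedding, and an NCF-style learned similarity that feeds the concatenated embeddings through a small MLP. The dimensions, weight names, and values below are illustrative assumptions, not the paper's configuration; in the real models all of these parameters are learned end to end.

```python
# Minimal sketch (assumed shapes and random weights, for illustration only).
import numpy as np

rng = np.random.default_rng(0)
d = 8                                 # embedding dimension (illustrative)
user_emb = rng.normal(size=d)         # user embedding p_u
item_emb = rng.normal(size=d)         # item embedding q_i

def dot_score(p, q):
    """Matrix factorization style: similarity is the plain dot product <p_u, q_i>."""
    return float(p @ q)

# NCF-style learned similarity: one ReLU hidden layer over the concatenated embeddings.
# W1, b1, w2, b2 stand in for learned parameters (random here, purely illustrative).
W1, b1 = rng.normal(size=(2 * d, 16)), np.zeros(16)
w2, b2 = rng.normal(size=16), 0.0

def mlp_score(p, q):
    """Learned similarity: MLP applied to the concatenation [p_u, q_i]."""
    h = np.maximum(np.concatenate([p, q]) @ W1 + b1, 0.0)
    return float(h @ w2 + b2)

print(dot_score(user_emb, item_emb), mlp_score(user_emb, item_emb))
```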

“In this work, we study MLP versus dot product similarities in more detail. We start by revisiting the experiments of the NCF paper [16] that popularized the use of MLPs in recommender systems. We show that a carefully configured dot product baseline largely outperforms the MLP. At first glance, it looks surprising that the MLP, which is a universal function approximator, does not perform at least as well as the dot product. We investigate this issue in a second experiment and show empirically that learning a dot product with high accuracy for a decently large embedding dimension requires large model capacity as well as a lot of training data. Besides prediction quality, we also discuss the inference cost of dot products versus MLPs, where dot products have a large advantage due to the existence of efficient maximum inner product search algorithms. Finally, we discuss that dot product vs. MLP is not a question of whether a deep neural network (DNN) is useful. In fact, many of the most competitive DNN models, such as transformers in natural language processing [9] or ResNets for image classification [14], use a dot product similarity in their output layer. To summarize, this paper argues that MLP-based similarities for combining embeddings should be used with care. While MLPs can approximate any continuous function, their inductive bias might not be well suited for a similarity measure. Unless the dataset is large or the embedding dimension is very small, a dot product is likely a better choice.”
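
The inference-cost argument can also be illustrated with a small sketch: with a dot product, scoring one user against the whole catalog is a single matrix-vector product over a precomputed item-embedding matrix, and the top-k can be handed off to (approximate) maximum inner product search; with an MLP similarity, every candidate item needs its own forward pass, because the MLP's input depends on the specific (user, item) pair. The shapes and weights below are assumed for illustration only.

```python
# Sketch of retrieval-time cost (assumed shapes and random weights, illustrative only).
import numpy as np

rng = np.random.default_rng(1)
d, n_items = 8, 100_000
user_emb = rng.normal(size=d)                 # query user embedding
item_embs = rng.normal(size=(n_items, d))     # precomputed item embedding matrix

# Dot product: all item scores in one matrix-vector product, then top-10 by partial sort.
# This is the operation that MIPS / approximate nearest neighbor indexes accelerate.
scores = item_embs @ user_emb                 # shape (n_items,)
top10 = np.argpartition(-scores, 10)[:10]

# MLP similarity: the input [p_u, q_i] differs for every item, so each candidate
# requires its own forward pass (batched here, but still O(n_items * hidden) work
# per query, with no equivalent of a prebuilt inner-product index).
W1, b1 = rng.normal(size=(2 * d, 16)), np.zeros(16)
w2 = rng.normal(size=16)
pairs = np.hstack([np.tile(user_emb, (n_items, 1)), item_embs])   # (n_items, 2d)
mlp_scores = np.maximum(pairs @ W1 + b1, 0.0) @ w2                # (n_items,)
```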