Can a linear model beat deep collaborative filtering?
Does a shallow linear autoencoder with a zero-diagonal constraint outperform deeper neural models on collaborative filtering tasks? This challenges the field's assumption that depth and nonlinearity drive performance.
A surprising empirical result: a linear model with no hidden layer outperforms most deep collaborative-filtering models. EASE (pronounced "easer") is a single item-item weight matrix B trained as an autoencoder: the input is the user's interaction history and the output reconstructs that same history. The one non-trivial constraint is that the diagonal of B must be zero — an item cannot use itself to predict itself.
This constraint does all the work. Without it, the model trivially copies inputs to outputs and learns nothing. With it, predicting whether a user likes item i forces the model to express i in terms of the other items the user interacted with, which is exactly what generalization in collaborative filtering requires. About 60% of the learned weights turn out to be negative, indicating the model also learns dissimilarities between items, not just similarities. Zeroing out the negative weights degrades performance to roughly the level of L1-regularized SLIM, suggesting that what made EASE special wasn't sparsity but the ability to encode anti-affinity.
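The triviality of the unconstrained problem is easy to see numerically. A minimal sketch (the toy matrix and tiny ridge penalty are illustrative, not from the source): fitting the linear autoencoder by plain ridge regression, with no zero-diagonal constraint, simply recovers the identity matrix.

```python
import numpy as np

# Toy interaction matrix: 3 users x 3 items (illustrative data).
X = np.array([[1., 1., 0.],
              [1., 0., 1.],
              [0., 1., 1.]])

lam = 1e-8  # near-zero ridge penalty
# Unconstrained ridge autoencoder: argmin_B ||X - X B||^2 + lam ||B||^2
B = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ X)

# With no zero-diagonal constraint, B collapses to the identity:
# each item "predicts" itself and nothing is learned about co-occurrence.
print(np.round(B, 3))
```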
The closed-form solution takes a few lines of code and trains orders of magnitude faster than SLIM. The result challenges the field's assumption that depth and non-linearity are essential for CF — the right structural constraint matters more than expressive capacity, mirroring the Rendle et al. dot-product result for similarity functions.
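A minimal NumPy sketch of that closed form, assuming the standard derivation (the zero-diagonal constraint enters via Lagrange multipliers, so B is read off from the precision matrix of the item-item Gram matrix; function name, toy data, and the default `lam` here are illustrative):

```python
import numpy as np

def ease_fit(X, lam=500.0):
    """Closed-form EASE fit: item-item weight matrix B with zero diagonal.

    X:   (num_users x num_items) binary interaction matrix.
    lam: L2 regularization strength, the model's only hyperparameter.
    """
    G = X.T @ X + lam * np.eye(X.shape[1])  # regularized Gram matrix
    P = np.linalg.inv(G)                    # precision matrix
    B = P / (-np.diag(P))                   # B[i, j] = -P[i, j] / P[j, j]
    np.fill_diagonal(B, 0.0)                # enforce zero self-similarity
    return B

# Toy usage: score items for each user, then rank (masking seen items).
X = np.array([[1., 1., 0.],
              [1., 0., 1.],
              [0., 1., 1.]])
B = ease_fit(X, lam=1.0)
scores = X @ B
```

The whole "training loop" is one Gram matrix and one inverse, which is why it runs orders of magnitude faster than SLIM's per-item optimization.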
Source: Recommenders Architectures
Related concepts in this collection
- Can simpler models beat deep networks for recommendation systems?
Does removing hidden layers and constraining self-similarity create a more effective collaborative filtering approach than deep autoencoders? This challenges the assumption that architectural depth drives performance.
extends: paired re-statement of the same EASE/easer result emphasizing the precision-matrix-vs-covariance distinction
- Why does dot product beat MLP-based similarity in practice?
Neural Collaborative Filtering theory suggests MLPs should outperform dot products as universal approximators. But what explains the empirical gap, and what role do data scale and deployment constraints play?
complements: paired anti-deep-CF lesson — the right inductive bias matters more than the universal approximation guarantee
- Can MLPs learn to match dot product similarity in practice?
Universal approximation theory suggests MLPs should learn any similarity function, including dot product. But does this theoretical promise hold up when training on real, finite datasets with practical constraints?
complements: capacity-vs-bias point at the similarity layer; easer makes it at the architecture-depth layer
- Why does multinomial likelihood work better for click prediction?
Explores whether the choice of likelihood function—multinomial versus Gaussian or logistic—affects recommendation performance, and what structural properties make one better suited to modeling user clicks.
complements: another simpler-with-the-right-prior result — likelihood choice matters more than depth
Original note title
EASE ("easer") beats deep models on collaborative filtering by constraining self-similarity to zero — proving model depth is not what mattered