Can simpler models beat deep networks for recommendation systems?
Does removing hidden layers and forbidding self-similarity yield a more effective collaborative filtering approach than deep autoencoders? This challenges the assumption that architectural depth drives performance.
The deep-learning trend in collaborative filtering treated more layers as more capacity. EASE (Embarrassingly Shallow AutoEncoders) pushes in the opposite direction: it is a linear model with no hidden layer, learning only an item-item weight matrix B. The single non-trivial constraint is that the diagonal of B is forced to zero: an item cannot use itself to predict itself. That constraint forces every item's prediction to be reconstructed from the other items the user has interacted with, which is exactly what generalization in collaborative filtering requires.
The model has a closed-form solution to a convex objective, so training is dominated by a single matrix inversion rather than gradient descent. On most public datasets EASE outperforms deep, non-linear, and probabilistic models, and it beats SLIM, the most similar prior approach, precisely by dropping SLIM's L1 regularization and non-negativity constraint. About 60% of the learned weights end up negative; these negative weights encode dissimilarity between items and are structurally important, since clamping them to zero collapses accuracy back to SLIM levels.
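The closed-form solution makes the entire training procedure a few lines of linear algebra: build the regularized item-item Gram matrix, invert it once, and rescale. A minimal NumPy sketch, assuming a 0/1 implicit-feedback user-item matrix X and an illustrative regularization strength lam (both the toy data and the lam value here are made up for demonstration):

```python
import numpy as np

def ease(X, lam=0.5):
    """Closed-form EASE training: one Gram matrix, one inversion."""
    G = X.T @ X + lam * np.eye(X.shape[1])  # regularized item-item Gram matrix
    P = np.linalg.inv(G)                    # precision matrix
    B = P / (-np.diag(P))                   # scale column j by -1/P_jj
    np.fill_diagonal(B, 0.0)                # an item may not predict itself
    return B

# Toy interaction matrix: 4 users x 3 items.
X = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 1],
              [1, 1, 1]], dtype=float)

B = ease(X, lam=0.5)
scores = X @ B  # rank each user's unseen items by these scores
```

Scoring is a single matrix product; in practice, items the user has already interacted with are masked out before ranking.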
The conceptual lesson is twofold. First, the relevant similarity matrix for CF is the precision matrix, not the covariance matrix that neighborhood-based methods typically use. Second, when a constraint (here, zero-diagonal) is the right inductive bias, simpler models with that constraint can beat deeper models without it. Capacity is not the bottleneck most of the time — the right structural prior is.
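The precision-versus-covariance point can be made concrete on a toy example: co-occurrence counts (the covariance-like quantity neighborhood methods score with) are non-negative by construction, so they can only express affinity, while the inverse Gram matrix that EASE is built from immediately produces negative off-diagonal entries. A small illustrative sketch with a made-up interaction matrix:

```python
import numpy as np

# Made-up implicit-feedback matrix: 4 users x 3 items.
X = np.array([[1, 1, 0],
              [1, 0, 1],
              [0, 1, 1],
              [1, 1, 1]], dtype=float)

G = X.T @ X + 0.5 * np.eye(3)  # covariance-like co-occurrence matrix
P = np.linalg.inv(G)           # precision matrix underlying EASE's weights

print((G >= 0).all())  # True: counts can only express similarity
print((P < 0).any())   # True: the precision side also expresses dissimilarity
```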
Source: Recommenders Architectures
Related concepts in this collection
- Can a linear model beat deep collaborative filtering?
Does a shallow linear autoencoder with a zero-diagonal constraint outperform deeper neural models on collaborative filtering tasks? This challenges the field's assumption that depth and nonlinearity drive performance.
extends: paired re-statement of the same EASE result emphasizing that anti-affinity (negative weights) is the under-appreciated mechanism
- Can MLPs learn to match dot product similarity in practice?
Universal approximation theory suggests MLPs should learn any similarity function, including dot product. But does this theoretical promise hold up when training on real, finite datasets with practical constraints?
complements: same anti-deep-CF lesson — capacity isn't the bottleneck, the right structural prior is
- Why does dot product beat MLP-based similarity in practice?
Neural Collaborative Filtering theory suggests MLPs should outperform dot products as universal approximators. But what explains the empirical gap, and what role do data scale and deployment constraints play?
complements: paired anti-MLP result reinforcing that inductive bias > capacity in CF
- Why does multinomial likelihood work better for ranking recommendations?
Explores whether the choice of likelihood function in VAE-based collaborative filtering matters for matching training objectives to ranking evaluation metrics. Why items should compete for probability mass.
complements: another structural-prior-matters-more-than-capacity result — likelihood choice over architectural depth
Original note title: EASE outperforms deep autoencoders for collaborative filtering by removing hidden layers and forbidding self-similarity