
Why does dot product beat MLP-based similarity in practice?

Neural Collaborative Filtering theory suggests MLPs should outperform dot products as universal approximators. But what explains the empirical gap, and what role do data scale and deployment constraints play?

Note · 2026-05-03 · sourced from Recommenders Architectures

Neural Collaborative Filtering popularized replacing the dot product between user and item embeddings with a learned MLP, on the theory that an MLP — a universal function approximator — should subsume the dot product as a special case. Rendle and colleagues revisit the experiments and show two non-obvious results.

First, with proper hyperparameter tuning, the simple dot product substantially outperforms the MLP-based similarity. The original NCF gain came from undertuning the dot-product baseline, not from MLP expressiveness. Second, even though an MLP can in theory approximate any function, learning a dot product with an MLP requires both a large model and a large training set — the inductive bias of MLPs makes the dot-product structure expensive to recover from data.
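The contrast between the two combiners can be sketched in a few lines. This is a minimal illustration, not the paper's code: the embedding dimension, MLP layer sizes, and random initialization are all hypothetical, and the point is only that the MLP must learn the multiplicative interaction the dot product gets for free.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 32                      # embedding dimension (illustrative)
user = rng.normal(size=d)   # learned user embedding
item = rng.normal(size=d)   # learned item embedding

# Dot-product similarity: the structure NCF proposed to replace.
dot_score = user @ item

# MLP similarity: concatenate the embeddings and pass them through a
# small two-layer network (hypothetical sizes, random weights).
W1 = rng.normal(size=(64, 2 * d)) * 0.1
b1 = np.zeros(64)
w2 = rng.normal(size=64) * 0.1

h = np.maximum(0.0, W1 @ np.concatenate([user, item]) + b1)  # ReLU layer
mlp_score = w2 @ h

# The MLP *can* represent the dot product in principle, but nothing in
# its architecture biases it toward that solution; recovering the
# multiplicative structure from data is what Rendle et al. show to be
# expensive in both model size and training data.
print(dot_score, mlp_score)
```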

The practical bite is in inference. Dot products admit Maximum Inner Product Search (MIPS) algorithms that retrieve the top-K items in sublinear time over catalogs of millions of items. An MLP similarity requires a forward pass per (user, item) pair, which is intractable at production scale. The paper concludes that MLPs as embedding combiners should be "used with care", and the point is reinforced by the fact that the modern DNN architectures most competitive in NLP (transformers) and vision (ResNets) all use dot products in their output layers. Universal approximation does not mean a universally good choice; the inductive bias of the operator interacts with data scale and serving constraints.
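The serving asymmetry can be made concrete with a brute-force sketch. Sizes here are illustrative, and the exact matrix-vector scoring shown is only the starting point: because the score is an inner product, approximate MIPS indexes can push retrieval below linear time, whereas an MLP similarity admits no such factorization.

```python
import numpy as np

rng = np.random.default_rng(1)
d, n_items, k = 32, 100_000, 10          # illustrative sizes
user = rng.normal(size=d)
items = rng.normal(size=(n_items, d))    # item embedding table

# With a dot-product model, scoring the whole catalog is one
# matrix-vector product; MIPS libraries can then index `items`
# for sublinear retrieval. Here: exact brute-force top-K.
scores = items @ user
top_k = np.argpartition(scores, -k)[-k:]          # unordered top-K
top_k = top_k[np.argsort(scores[top_k])[::-1]]    # sort by score, descending

# With an MLP similarity there is no shared factorization: every
# (user, item) pair needs its own forward pass, so serving cost
# scales with catalog size times per-pair network cost.
print(top_k)
```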




MLP-based similarity underperforms dot product despite being a universal function approximator — inductive bias matters more than capacity