Recommender Systems · LLM Reasoning and Architecture · Knowledge Retrieval and RAG

Can MLPs learn to match dot product similarity in practice?

Universal approximation theory says an MLP can represent any similarity function, including the dot product. But does that representational promise translate into learning when training on real, finite datasets under practical constraints?

Note · 2026-05-03 · sourced from Recommenders Architectures

The Neural Collaborative Filtering paper popularized replacing the dot product with a learned MLP for combining user and item embeddings. The justification was theoretical: an MLP is a universal function approximator, so it can in principle learn any similarity function — including dot product — and presumably better ones. Rendle et al.'s revisit shows this argument fails empirically and operationally.
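
To make the contrast concrete, here is a minimal sketch of the two scoring functions in PyTorch. The names dot_score and MLPScore, the 32-dimensional embeddings, and the layer widths are illustrative assumptions, not the paper's configuration:

```python
import torch
import torch.nn as nn

d = 32  # embedding dimension; illustrative, not from the paper

def dot_score(u: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Matrix-factorization similarity: a fixed geometric form, no parameters."""
    return (u * v).sum(dim=-1)

class MLPScore(nn.Module):
    """NCF-style learned similarity: concatenate embeddings, run an MLP."""
    def __init__(self, d: int, hidden: int = 64) -> None:
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(2 * d, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, u: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([u, v], dim=-1)).squeeze(-1)
```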

Empirically, a carefully tuned dot product baseline substantially outperforms the MLP. Even more pointedly, learning a dot product through an MLP requires large model capacity and a lot of training data: the universal approximation guarantee is asymptotic, and on finite data inductive bias matters more than expressiveness. The MLP is too flexible for the task; its inductive bias points away from the simple geometric similarity that actually fits the data.
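
A quick way to see this in miniature is to train the MLP above to regress the dot product of random embeddings. This is a sketch under arbitrary hyperparameters, reusing d and MLPScore from the previous block, not a reproduction of the paper's experiment:

```python
import torch

torch.manual_seed(0)
mlp = MLPScore(d)  # defined in the previous sketch
opt = torch.optim.Adam(mlp.parameters(), lr=1e-3)

for step in range(2000):
    # Fresh random "user" and "item" embeddings; the target is the exact dot product.
    u, v = torch.randn(256, d), torch.randn(256, d)
    loss = ((mlp(u, v) - (u * v).sum(dim=-1)) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

# Held-out error: with modest width and data it remains clearly above zero;
# shrinking it further takes much more capacity and many more samples.
with torch.no_grad():
    u, v = torch.randn(10_000, d), torch.randn(10_000, d)
    print("test MSE:", ((mlp(u, v) - (u * v).sum(dim=-1)) ** 2).mean().item())
```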

Operationally, dot products allow maximum-inner-product search over precomputed item embeddings, which is fast enough for real-time serving over millions of items. MLP similarities require a forward pass per item per query — they cannot be precomputed. So even if MLPs were marginally more accurate, they would be unaffordable in production.
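
The serving asymmetry is easy to show as well. A sketch with an illustrative catalog size; in production the full matrix-vector product would itself be replaced by an approximate MIPS index such as Faiss or ScaNN:

```python
import torch

d, n_items = 32, 1_000_000  # as above; catalog size is illustrative

with torch.no_grad():
    V = torch.randn(n_items, d)  # item embeddings, precomputed once offline
    u = torch.randn(d)           # incoming user/query embedding

    # Dot product: a single matrix-vector product scores the whole catalog,
    # and the top-k search can be handed off to an ANN/MIPS index.
    scores = V @ u
    top10 = scores.topk(10).indices

    # MLP similarity: nothing can be precomputed, so each query would need an
    # MLP forward pass over all n_items (user, item) pairs, e.g. (hypothetical):
    #   scores = mlp(u.expand(n_items, d), V)
```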

The takeaway: an inductive bias that matches the geometry of the problem (dot product) wins over an expressive parameterization that has to learn the geometry from scratch.



Original note title: MLP similarity does not approximate dot product in practice (universal approximation theorems do not survive contact with finite data)