Why does multinomial likelihood work better for click prediction?
Explores whether the choice of likelihood function—multinomial versus Gaussian or logistic—affects recommendation performance, and what structural properties make one better suited to modeling user clicks.
The choice of likelihood function in collaborative filtering looks like a technical detail but is actually a structural commitment about what the data represents. Gaussian likelihoods model each interaction as an independent observation of a continuous quantity. Logistic likelihoods model each interaction as an independent binary classification. Both treat items as separate prediction targets.
Liang et al. argue the multinomial likelihood is structurally correct for click data because of competition. The model has a probability budget that must sum to 1 across all items. Putting probability on one item necessarily takes it away from others. This forces the model to assign more mass to items that are more likely to be clicked, which is exactly what top-N ranking metrics reward. Gaussian and logistic models can assign high probability to many items simultaneously without penalty, so they don't optimize for the relative ordering that recommendation actually requires.
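The competition argument can be made concrete with a minimal NumPy sketch (illustrative, not from the paper): a softmax head has a fixed probability budget, so raising one item's score necessarily drains mass from every other item, while independent sigmoid heads leave the other items untouched.

```python
import numpy as np

def softmax(scores):
    # Multinomial head: probabilities compete for a budget that sums to 1.
    e = np.exp(scores - scores.max())
    return e / e.sum()

def sigmoid(scores):
    # Logistic head: each item gets an independent probability.
    return 1.0 / (1.0 + np.exp(-scores))

scores = np.array([2.0, 1.0, 0.5, 0.0])
p_multi = softmax(scores)
p_logit = sigmoid(scores)

# Boost the first item's score and recompute both heads.
boosted = scores.copy()
boosted[0] += 1.0
p_multi_b = softmax(boosted)
p_logit_b = sigmoid(boosted)

# Under the multinomial head every other item loses mass...
print(np.all(p_multi_b[1:] < p_multi[1:]))   # True
# ...while under the logistic head the other items are unaffected.
print(np.all(p_logit_b[1:] == p_logit[1:]))  # True
```

This is the structural difference in miniature: only the softmax head is forced to trade probability between items, which is what makes it sensitive to relative ordering.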
The deeper point is that the multinomial likelihood is a closer proxy to the evaluation metric than the logistic or Gaussian likelihoods are. Top-N ranking loss is hard to optimize directly, but the multinomial likelihood induces the same kind of competition implicitly. The match between training objective and evaluation objective is what does the work, not anything specific to neural networks.
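The proxy claim can be checked with a small sketch (my own illustration, assuming the standard form of the multinomial log-likelihood, sum over items of x_i log pi_i): a prediction that ranks the clicked items on top scores a strictly higher log-likelihood than one that ranks them last, even though no ranking loss was ever computed.

```python
import numpy as np

def multinomial_loglik(x, logits):
    # log p(x | pi) up to constants: sum_i x_i * log softmax(logits)_i
    log_pi = logits - np.log(np.sum(np.exp(logits)))
    return float(np.dot(x, log_pi))

# A user's click vector over four items: items 0 and 1 were clicked.
x = np.array([1.0, 1.0, 0.0, 0.0])

good = np.array([2.0, 2.0, -1.0, -1.0])  # clicked items ranked on top
bad  = np.array([-1.0, -1.0, 2.0, 2.0])  # clicked items ranked last

print(multinomial_loglik(x, good) > multinomial_loglik(x, bad))  # True
```

Maximizing this likelihood pushes mass toward clicked items at the expense of unclicked ones, which is the implicit competition the note describes.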
Source: Recommenders Architectures
Related concepts in this collection

- Why does multinomial likelihood work better for ranking recommendations?
  Explores whether the choice of likelihood function in VAE-based collaborative filtering matters for matching training objectives to ranking evaluation metrics. Why items should compete for probability mass.
  extends: paired statement of the same Liang result emphasizing the implicit-CF setting

- Can implicit feedback reveal both preference and confidence?
  When users take implicit actions like purchases or watches, do those signals carry two separable pieces of information: what they prefer and how certain we should be? Explicit ratings can't make that distinction.
  complements: implicit-feedback structure motivates the multinomial framing; clicks are observation events that compete for user attention

- Why does collaborative filtering struggle with sparse user data?
  Collaborative filtering datasets appear massive but hide a fundamental challenge: each user has rated only a tiny fraction of items. How does this per-user sparsity shape the modeling problem, and what techniques can overcome it?
  grounds: VAE-multinomial works because Bayesian latent variable models compensate for per-user sparsity

- Can a linear model beat deep collaborative filtering?
  Does a shallow linear autoencoder with a zero-diagonal constraint outperform deeper neural models on collaborative filtering tasks? This challenges the field's assumption that depth and nonlinearity drive performance.
  complements: same right-prior-beats-depth lesson; likelihood choice and constraint choice both prove structural priors dominate capacity
Original note title: multinomial likelihoods outperform Gaussian and logistic for click data because items must compete for limited probability mass