Can neural networks explore efficiently at recommendation scale?
Exploration—discovering unknown user preferences—normally requires expensive posterior uncertainty estimates. Can a neural architecture make Thompson sampling practical for real-world recommenders without prohibitive computational cost?
Supervised neural networks form the backbone of most recommenders, but they only exploit recognized user interests. Discovering unknown user preferences requires exploration — and the standard exploration framework (contextual bandits with Thompson sampling) requires posterior uncertainty estimates, which are computationally prohibitive for large neural networks at recommendation scale.
Zhu et al. propose the Epistemic Neural Recommendation (ENR) architecture, an epistemic neural network designed to make Thompson sampling practical at scale. Epistemic neural networks separate aleatoric uncertainty (irreducible noise in the outputs) from epistemic uncertainty (uncertainty about the model's parameters). The latter is what Thompson sampling needs: sample a parameter setting from the posterior, act greedily under that setting, observe outcomes, update.
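The sample-act-observe-update loop can be sketched with a toy contextual bandit. Here a small bootstrapped ensemble stands in for the posterior over parameters; all names, shapes, and the ensemble approximation are illustrative assumptions, not the ENR architecture itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy contextual bandit: hidden linear user preferences, and an ensemble
# of linear models as a crude stand-in for the parameter posterior.
# (Illustrative sketch under stated assumptions, not the paper's method.)
n_items, dim, n_members = 3, 4, 5
w_true = rng.normal(size=(n_items, dim))               # hidden user preferences
ensemble = rng.normal(size=(n_members, n_items, dim))  # crude posterior proxy

def thompson_step(context, lr=0.5):
    k = rng.integers(n_members)        # 1. sample a parameter setting
    scores = ensemble[k] @ context     # 2. score each item under that draw
    item = int(np.argmax(scores))      # 3. act greedily w.r.t. the sample
    reward = float(w_true[item] @ context + 0.1 * rng.normal())
    # 4. update the sampled member toward the observed reward (one SGD step)
    err = reward - float(ensemble[k, item] @ context)
    ensemble[k, item] += lr * err * context
    return item, reward

for _ in range(200):
    thompson_step(rng.normal(size=dim))
```

Because each round commits to one sampled parameter setting, exploration comes from posterior disagreement rather than from injected action noise.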
Empirically, ENR significantly boosts click-through rates and user ratings by at least 9% and 6% respectively compared to state-of-the-art neural contextual bandit algorithms. It achieves equivalent performance with at least 29% fewer user interactions than the best-performing baseline. Computationally, it demands orders of magnitude fewer resources than other neural contextual bandit baselines — moving Thompson-sampling-based exploration from research-only to production-feasible.
The general principle: when a Bayesian technique seems too expensive at scale, ask whether the expensive part is genuinely necessary or whether a structural approximation captures what's needed. Epistemic networks make a focused commitment to estimating only the parameter uncertainty Thompson sampling actually uses, dropping the rest. The architectural simplification is what unlocks scale.
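One way to picture the structural approximation is an epinet-style additive head: a fixed base network carries the point estimate, and a small head conditioned on a random "epistemic index" z produces the parameter-uncertainty sample. Shapes and names below are assumptions for illustration; one index draw plays the role of one posterior draw, at the cost of a single extra forward pass through the small head.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical shapes: a large base linear "network" plus a small
# index-conditioned head whose output perturbs the base scores.
dim, index_dim, n_items = 8, 3, 4
W_base = rng.normal(size=(n_items, dim))                   # large point-estimate net
W_epi = 0.1 * rng.normal(size=(n_items, dim * index_dim))  # small epistemic head

def predict(context, z):
    base = W_base @ context                  # exploit-only scores
    feats = np.outer(context, z).ravel()     # context crossed with the index
    return base + W_epi @ feats              # index-dependent perturbation

z = rng.normal(size=index_dim)               # sampling z ~ N(0, I) acts as the posterior draw
scores = predict(rng.normal(size=dim), z)    # one Thompson sample of item scores
```

The base network is trained and served once; only the cheap head varies with the index, which is what keeps uncertainty estimation affordable at scale.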
Source: Recommenders Architectures
Related concepts in this collection
- Can bandit algorithms beat collaborative filtering for news?
  News recommendation faces constant content churn and cold-start users—settings where traditional collaborative filtering struggles. Can a contextual bandit approach like LinUCB explicitly balance exploration and exploitation better than static methods?
  extends: ENR scales the LinUCB framework beyond linear-reward assumptions while preserving the bandit framing
- When can greedy bandits skip exploration entirely?
  Under what conditions does natural randomness in incoming contexts eliminate the need for active exploration in contextual bandits? This matters for high-stakes domains like medicine where exploration carries real costs.
  tension with: ENR scales exploration; greedy-first avoids it under context diversity — design choice depends on context-distribution structure
- Can implicit feedback reveal both preference and confidence?
  When users take implicit actions like purchases or watches, do those signals carry two separable pieces of information: what they prefer and how certain we should be? Explicit ratings can't make that distinction.
  complements: epistemic uncertainty in ENR is the bandit-style confidence signal that exploration acts on
- Why do academic recommenders fail when deployed in production?
  Academic recommendation models assume static test sets known at training time, but real platforms continuously receive new users, items, and interactions. Understanding this gap reveals what production systems actually need.
  complements: bandit framing assumes inductive learning; ENR is the production-scale exploration primitive for inductive recommenders
Original note title
scalable neural contextual bandits enable sample-efficient exploration via epistemic neural networks supporting Thompson sampling at scale