How does soft parameter sharing in MMoE improve multi-objective ranking systems?
This explores why Multi-gate Mixture-of-Experts (MMoE) — where ranking objectives share a pool of expert sub-networks softly rather than each getting its own isolated tower — helps systems that must optimize several competing goals at once (clicks, watch time, satisfaction).
The corpus has one note that lands squarely on this question, on YouTube's multi-objective video ranker, plus a cluster of adjacent material on what it actually means to optimize for ranking, which is where the more interesting picture comes from. Up front: if you want the canonical case study, the note "Why do ranking systems need to model selection bias explicitly?" is the doorway. The rest of the corpus doesn't dwell on MMoE's mechanics, so the honest answer is partly lateral.
The core idea behind soft sharing: when you train one objective (say, "will they click") alongside another that pulls in a different direction ("will they be satisfied an hour later"), forcing both through fully shared layers makes them fight over the same weights, while giving each a fully separate network throws away everything they have in common. MMoE splits the difference: a bank of expert sub-networks is shared, but each objective has its own gating network that decides how much to lean on each expert. Objectives that overlap can borrow the same experts; objectives that conflict can route around each other. That's the "soft" part: sharing is learned and per-objective, not hard-wired.
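The gating idea can be sketched in a few lines of plain Python. This is a minimal illustration, not the YouTube system: the sizes, the random weights, the single-layer experts, and the one-unit task towers are all assumptions made to keep the example short, and names such as `mmoe_forward` are hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, x):
    """Multiply a weight matrix (list of rows) by the vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def random_matrix(rng, rows, cols):
    """Small random weights; stands in for trained parameters."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def mmoe_forward(x, expert_weights, gate_weights, tower_weights):
    """Return per-task logits and the gate weights each task put on the experts."""
    # Every task sees the same bank of experts (single ReLU layers here).
    expert_outs = [[max(0.0, h) for h in linear(W, x)] for W in expert_weights]
    hidden_dim = len(expert_outs[0])
    logits, gates = {}, {}
    for task, G in gate_weights.items():
        # Task-specific gate: a softmax over experts, computed from the input.
        g = softmax(linear(G, x))
        # Soft sharing: the task representation is a gate-weighted mix of experts.
        mixed = [sum(g[k] * expert_outs[k][j] for k in range(len(g)))
                 for j in range(hidden_dim)]
        # Task tower reduced to a single linear unit for brevity (assumption).
        logits[task] = sum(w * h for w, h in zip(tower_weights[task], mixed))
        gates[task] = g
    return logits, gates


rng = random.Random(0)
n_in, n_hidden, n_experts = 4, 3, 3  # illustrative sizes
tasks = ["click", "satisfaction"]
expert_weights = [random_matrix(rng, n_hidden, n_in) for _ in range(n_experts)]
gate_weights = {t: random_matrix(rng, n_experts, n_in) for t in tasks}
tower_weights = {t: [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for t in tasks}

logits, gates = mmoe_forward([1.0, 0.5, -0.2, 0.3],
                             expert_weights, gate_weights, tower_weights)
for t in tasks:
    print(t, "logit:", round(logits[t], 4), "gates:", [round(g, 3) for g in gates[t]])
```

Both tasks read from the same `expert_outs`, but each mixes them with its own gate vector, so after training two objectives can end up leaning on different experts without either owning a private network.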
What the YouTube note adds — and what's easy to miss — is that MMoE alone isn't enough. The same system needs a separate shallow "position tower" to strip out selection bias, because the training data is itself the product of the model's past rankings. Without it, the ranker converges on degenerate equilibria that just amplify its own previous decisions. The lesson worth taking away: handling conflicting objectives (MMoE) and handling biased feedback loops (debiasing) are two different problems, and solving one doesn't solve the other.
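One common way to wire such a position tower, shown here as a sketch under stated assumptions: the tower's output is added to the main model's logit during training and dropped at serving time. The note does not spell out this wiring, so treat it as an assumption; the lookup table standing in for the shallow tower and its values are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def position_bias_logit(position, bias_by_position):
    """Shallow tower reduced to a per-position lookup (assumption for brevity)."""
    return bias_by_position[position]


def predict_click(main_logit, position=None, bias_by_position=None):
    """Training passes the logged position; serving omits it."""
    if position is None:
        # Serving: score on content and user features alone.
        return sigmoid(main_logit)
    # Training: the position term absorbs "clicked because it was shown first".
    return sigmoid(main_logit + position_bias_logit(position, bias_by_position))


# Hypothetical learned bias values: top slots get clicked more regardless of content.
bias_by_position = {1: 1.2, 2: 0.4, 3: -0.3}
main_logit = 0.2  # what the main ranker believes about this item

train_p_top = predict_click(main_logit, 1, bias_by_position)
train_p_low = predict_click(main_logit, 3, bias_by_position)
serve_p = predict_click(main_logit)
print(train_p_top, train_p_low, serve_p)
```

Because the position term explains part of each logged click during training, the main logit is not rewarded merely for having been shown in a top slot, which is the feedback loop the note warns about.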
The lateral thread is about what "ranking objective" even means once you fix the architecture. The note "Why does multinomial likelihood work better for ranking recommendations?" shows that the loss function quietly encodes a ranking objective: multinomial likelihood wins because forcing items to compete for probability mass mirrors the top-N goal you actually care about. From the opposite direction, "Can recommendation metrics train language models directly?" and "Can reinforcement learning align summarization with ranking goals?" both use ranking-quality signals as RL rewards (metrics like NDCG and Recall in the first, downstream relevance scores in the second) instead of baking objectives into a multi-tower architecture. So there are at least three places to put your multi-objective trade-off: in the architecture (MMoE gates), in the loss (multinomial competition), or in an RL reward (optimize the metric end-to-end).
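The loss and reward options can be made concrete with a small sketch. The functions below are illustrative, with hypothetical names and toy scores: a multinomial log-likelihood in which items compete through a softmax, and an NDCG@k function of the kind that could serve as a scalar RL reward.

```python
import math


def log_softmax(scores):
    """Log of softmax probabilities, computed stably."""
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_z for s in scores]


def multinomial_log_likelihood(scores, interacted):
    """Sum of log-probabilities of the interacted item indices under one softmax."""
    log_probs = log_softmax(scores)
    return sum(log_probs[i] for i in interacted)


def ndcg_at_k(ranked_relevance, k):
    """NDCG@k for relevance labels listed in the order the model ranked them."""
    def dcg(rels):
        return sum(r / math.log2(pos + 2) for pos, r in enumerate(rels[:k]))
    ideal = dcg(sorted(ranked_relevance, reverse=True))
    return dcg(ranked_relevance) / ideal if ideal > 0 else 0.0


# Loss-side trade-off: raising a non-interacted item's score hurts the
# likelihood of the interacted item, because all items share probability mass.
ll_before = multinomial_log_likelihood([2.0, 0.5, 0.1], interacted=[0])
ll_after = multinomial_log_likelihood([2.0, 3.0, 0.1], interacted=[0])

# Reward-side trade-off: the metric itself scores a ranking.
reward_good = ndcg_at_k([1, 0, 0], k=3)  # relevant item ranked first
reward_bad = ndcg_at_k([0, 0, 1], k=3)   # relevant item ranked last
print(ll_before, ll_after, reward_good, reward_bad)
```

In the first case the trade-off lives in how the loss couples items together; in the second it lives entirely in the scalar the policy is rewarded with, and the architecture is left free.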
That's the thing you might not have known you wanted: MMoE is one answer to a recurring design choice — where does the trade-off between competing goals live? Architecture is just the most visible option, and the corpus quietly shows the loss-function and reward-signal alternatives sitting right next to it.
Sources (4 notes)
YouTube's multi-objective ranker uses MMoE for conflicting objectives and a shallow position tower to remove selection bias from training data. Without both mechanisms, models converge on degenerate equilibria that amplify their own past decisions.
Liang et al. show that switching VAE likelihoods from Gaussian/logistic to multinomial achieves state-of-the-art results because enforced probability competition between items directly aligns training with top-N ranking objectives. Rebalancing KL regularization further improves performance.
Rec-R1 demonstrates that LLMs can be trained directly on rule-based recommendation metrics like NDCG and Recall as RL reward signals, eliminating the need for SFT distillation from proprietary models while remaining model-agnostic across different retriever architectures.
ReLSum trains summarizers using downstream relevance scores as RL rewards, producing dense, attribute-focused summaries instead of fluent prose. This alignment to the actual ranking metric improves recall, NDCG, and user engagement in production e-commerce search.