How do portfolio-of-rankers and MMoE compare as architectural solutions?

This compares two ways to handle competing objectives in recommendation and ranking systems: a portfolio that runs several specialized rankers in parallel, and MMoE (multi-gate mixture-of-experts), which shares a pool of experts and lets per-task gates decide how to draw on them. The corpus doesn't treat either by name, so this is a lateral read of the principle underneath both.

First, the honest part: the collection has no note that puts portfolio-of-rankers and MMoE head to head, or that names either directly. What it does have is a clear, repeated finding about *why* architectures like these work at all, and that finding reframes the comparison more usefully than a feature checklist would. The recurring lesson is that wins come not from how much capacity you have but from how you *allocate and constrain* it. The recommender-systems work makes the point almost bluntly: removing hidden layers, enforcing constraints on self-similarity, and matching the likelihood to the task beat deeper, higher-capacity models (What architectural choices actually improve recommender system performance?). Capacity isn't the lever; problem-specific structure is.

Read through that lens, both designs are answers to the same problem — multiple objectives fighting over one shared representation — and they differ mainly in *where* they put the constraint. The closest thing the corpus has to MMoE's core mechanism is the mixture-of-experts result from multimodal training: when vision and language compete, the failure isn't that they're incompatible, it's that dense architectures force them through one rigid block of shared capacity. MoE fixes it by allocating capacity per token, so the competing demands stop colliding (Can we solve modality competition through architectural design?). MMoE is the same idea pointed at multi-task ranking: shared experts, but per-task gates so each objective pulls a different blend instead of being averaged into a compromise. A portfolio-of-rankers makes the opposite bet — it hard-separates capacity up front (distinct specialized rankers) rather than learning a soft, gated split, trading MMoE's parameter sharing and learned routing for isolation and interpretability.
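To make the "where the constraint sits" contrast concrete, here is a minimal NumPy sketch of the two forward passes. It is not drawn from the corpus: the function names, the weight shapes, the single-layer ReLU experts, and the linear task heads are all illustrative assumptions, and training is omitted entirely.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def mmoe_forward(x, expert_w, gate_w, tower_w):
    """Hypothetical MMoE forward pass (illustrative shapes).

    x:        (batch, d_in)
    expert_w: (n_experts, d_in, d_hid)   shared by all tasks
    gate_w:   (n_tasks, d_in, n_experts) one gate per task
    tower_w:  (n_tasks, d_hid)           one linear head per task
    """
    # Shared experts: every task sees the same expert outputs.
    experts = np.maximum(0.0, np.einsum("bd,edh->beh", x, expert_w))
    # Per-task gates: a softmax over experts, computed from the input.
    gates = softmax(np.einsum("bd,tde->bte", x, gate_w), axis=-1)
    # Each task blends the shared experts with its own gate weights.
    mixed = np.einsum("bte,beh->bth", gates, experts)
    scores = np.einsum("bth,th->bt", mixed, tower_w)
    return scores, gates

def portfolio_forward(x, ranker_w1, ranker_w2):
    """Hypothetical portfolio: one fully separate ranker per task.

    ranker_w1: (n_tasks, d_in, d_hid)
    ranker_w2: (n_tasks, d_hid)
    """
    # No parameters are shared; isolation is fixed by design, not learned.
    hidden = np.maximum(0.0, np.einsum("bd,tdh->bth", x, ranker_w1))
    return np.einsum("bth,th->bt", hidden, ranker_w2)

# Toy usage with random weights: 4 inputs, 2 tasks, 3 shared experts.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
mmoe_scores, gates = mmoe_forward(
    x,
    rng.normal(size=(3, 8, 5)),
    rng.normal(size=(2, 8, 3)),
    rng.normal(size=(2, 5)),
)
portfolio_scores = portfolio_forward(
    x, rng.normal(size=(2, 8, 5)), rng.normal(size=(2, 5))
)
```

Both functions return one score per task per input. The difference is that `mmoe_forward` learns a soft split over a shared expert pool (the `gates` tensor), while `portfolio_forward` has no shared parameters at all.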

The corpus also hints at why neither architecture is the whole story: the *objective* you train against can matter more than the routing. Switching a recommender's likelihood to multinomial — so items actually compete for probability mass — delivered state-of-the-art ranking because it aligned the loss with the top-N goal, independent of how fancy the model was (Why does multinomial likelihood work better for ranking recommendations?). That's a useful caution for any architecture debate: a portfolio or an MMoE built on a misaligned loss will underperform a simpler model with the right one.
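The phrase "items actually compete for probability mass" can be illustrated with a small NumPy sketch. This shows only the loss, not the VAE from the cited note; the function names and toy numbers are assumptions, and the Gaussian variant is simplified to a unit-variance squared error.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the item axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def multinomial_nll(logits, clicks):
    # Negative multinomial log-likelihood, dropping the constant that
    # depends only on the click counts. Items share one softmax, so
    # raising one item's score lowers every other item's probability.
    return -(clicks * log_softmax(logits)).sum(axis=-1)

def gaussian_nll(logits, clicks):
    # Unit-variance Gaussian likelihood reduces to squared error.
    # Each item is fit independently; there is no competition.
    return 0.5 * ((clicks - logits) ** 2).sum(axis=-1)

# Toy example: one user, three items, items 0 and 2 were clicked.
logits = np.array([[2.0, 0.5, -1.0]])
clicks = np.array([[1.0, 0.0, 1.0]])

base = multinomial_nll(logits, clicks)
# Adding a constant to every score leaves the ranking unchanged...
shifted = multinomial_nll(logits + 3.0, clicks)
# ...while boosting the unclicked item takes mass from the clicked ones.
boosted = multinomial_nll(np.array([[2.0, 3.0, -1.0]]), clicks)

gauss_base = gaussian_nll(logits, clicks)
gauss_shifted = gaussian_nll(logits + 3.0, clicks)
```

The multinomial loss depends only on relative scores, which is what top-N ranking cares about: `shifted` equals `base`, and `boosted` is larger. The squared-error loss changes under the same constant shift even though the ranking does not.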

So the comparison the corpus *can* support is this: MMoE-style gating and portfolio separation are two points on a spectrum from soft, learned capacity allocation to hard, designed isolation — and the collection's evidence says the deciding factor is usually which one imposes the better inductive bias for your specific competition between objectives, not which one has more raw capacity. To actually adjudicate portfolio-vs-MMoE with named benchmarks, you'd be reaching past what's curated here.

Sources (3 notes)

What architectural choices actually improve recommender system performance?

Research shows that architectural choices like removing hidden layers, enforcing constraints on self-similarity, and using appropriate likelihood functions deliver better results than deeper or more complex models. This suggests that problem-specific design decisions matter more than raw representational capacity.

Can we solve modality competition through architectural design?

Modality competition arises from caption distributional shift and rigid dense capacity allocation, not from vision and language being fundamentally incompatible. Mixture of Experts resolves the architectural bottleneck by allocating capacity per token, enabling modalities to coexist without competing.

Why does multinomial likelihood work better for ranking recommendations?

Liang et al. show that switching VAE likelihoods from Gaussian/logistic to multinomial achieves state-of-the-art results because enforced probability competition between items directly aligns training with top-N ranking objectives. Rebalancing KL regularization further improves performance.
