Recommender Systems

What dominates AI compute in production systems today?

While public discussion centers on large language models, Facebook's infrastructure data reveals a different story about which AI workloads actually consume the most compute cycles in real production environments.

Note · 2026-05-03 · sourced from Recommenders Personalized
What breaks when specialized AI models reach real users? How do recommendation feeds shape what people see and believe?

Public discussion of AI compute centers on training and inference for large language models. Facebook's published architecture analysis tells a different story: DNN-based personalized recommendation models consume up to 79% of AI inference cycles in their production data centers. Just three model classes (RMC1, RMC2, RMC3) account for up to 65% of inference cycles, despite hundreds of recommendation models running across the system.

These models follow a distinct architectural pattern that drives their compute profile. Inputs combine dense features (continuous, like user age) with sparse categorical features (like preferred genres or device types). Sparse features are encoded as multi-hot vectors with potentially millions of categories, but only a few entries are active per user. Mapping these to dense embedding vectors requires embedding-table lookups — operations that are memory-bound rather than compute-bound, which inverts the compute profile of more familiar transformer or convnet workloads.
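The memory-bound character of that lookup step can be sketched in a few lines. This is an illustrative toy, not Facebook's implementation: the table size, embedding width, and category IDs below are assumed numbers, and sum-pooling stands in for whatever pooling the real models use.

```python
import numpy as np

# Assumed, illustrative sizes -- not from the source.
NUM_CATEGORIES = 1_000_000   # cardinality of one sparse feature
EMBED_DIM = 64               # width of each embedding vector

rng = np.random.default_rng(0)
table = rng.standard_normal((NUM_CATEGORIES, EMBED_DIM)).astype(np.float32)

def lookup_and_pool(active_ids):
    """Gather the rows for a user's few active categories, then sum-pool.

    The gather is a handful of random-access reads into a large table
    with almost no arithmetic attached, so throughput is limited by
    memory bandwidth rather than FLOPs -- the inverse of a dense
    matmul-heavy transformer or convnet layer.
    """
    return table[active_ids].sum(axis=0)

# A multi-hot user: only 3 of 1,000,000 categories are active.
active = np.array([42, 8_675, 309_117])
pooled = lookup_and_pool(active)
print(pooled.shape)  # (64,)
```

The dense features, by contrast, feed ordinary fully connected layers; it is the interleaving of these two regimes that gives the RMC models their distinctive compute profile.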

The implication is that production AI infrastructure is shaped by recommendation, not by the model types that get research attention. Embedding-table operations, sparse feature handling, and the storage capacity for billion-parameter embedding tables are the binding engineering constraints. McKinsey and TechEmergence estimated that recommendation drives up to 35% of Amazon's revenue; Netflix attributes 75% of movies watched to recommendations, and YouTube attributes 60% of videos consumed. This economic gravity is what makes recommendation the dominant inference workload in production, yet methods papers tend to underweight it relative to the visibility of LLM compute.
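The storage constraint is easy to feel with back-of-envelope arithmetic. The counts below (tables per model, rows per table, embedding width) are assumed round numbers chosen only to show how embedding tables reach billions of parameters; the source gives no specific breakdown.

```python
# Back-of-envelope embedding-table sizing -- all numbers are
# illustrative assumptions, not figures from the source.
num_tables = 10             # sparse features, one table each
rows_per_table = 10_000_000 # categories per sparse feature
embed_dim = 64              # embedding vector width
bytes_per_param = 4         # fp32

params = num_tables * rows_per_table * embed_dim
gib = params * bytes_per_param / 2**30
print(f"{params / 1e9:.1f}B parameters, {gib:.0f} GiB in fp32")
# -> 6.4B parameters, 24 GiB in fp32
```

Even at these modest assumed sizes the embedding tables alone dwarf the dense layers, which is why capacity and bandwidth for table storage, not raw FLOPs, dominate the hardware conversation.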


