Monolith: Real Time Recommendation System With Collisionless Embedding Table

Paper · arXiv 2209.07663 · Published September 16, 2022
Recommenders · Architectures

“The past decade has witnessed a boom of businesses powered by recommendation techniques. In pursuit of a better customer experience, delivering personalized content to each individual user as a real-time response is a common goal of these business applications. To this end, information from a user’s latest interactions is often used as the primary input for training a model, as it best depicts the user’s portrait and supports predictions of the user’s interests and future behaviors.

Deep learning has been dominating recommendation models [5, 6, 10, 12, 20, 21], as the gigantic amount of user data is a natural fit for massively data-driven neural models. However, efforts to leverage the power of deep learning in industry-level recommendation systems are constantly confronted with problems arising from the unique characteristics of data derived from real-world user behavior. These data are drastically different from those used in conventional deep learning problems like language modeling or computer vision in two aspects:

(1) The features are mostly sparse, categorical and dynamically changing;

(2) The underlying distribution of training data is non-stationary, a.k.a. Concept Drift [8].

Such differences have posed unique challenges to researchers and engineers working on recommendation systems.

1.1 Sparsity and Dynamism

The data for recommendation mostly contain sparse categorical features, some of which appear with low frequency. The common practice of mapping them to a high-dimensional embedding space would give rise to a series of issues:

• Unlike language models, where the number of word-pieces is limited, the numbers of users and ranking items are orders of magnitude larger. Such an enormous embedding table would hardly fit into a single host’s memory;

• Worse still, the size of the embedding table is expected to grow over time as more users and items are admitted, while frameworks like [1, 17] use fixed-size dense variables to represent the embedding table.

In practice, many systems adopt low-collision hashing [3, 6] as a way to reduce the memory footprint and to allow the set of IDs to grow. This relies on the over-idealistic assumptions that IDs in the embedding table are distributed evenly in frequency and that collisions are harmless to model quality. Unfortunately, this is rarely true for a real-world recommendation system, where a small group of users or items have significantly more occurrences. With the organic growth of the embedding table, the chance of hash key collisions increases and leads to deterioration of model quality [3]. Therefore, it is a natural demand for production-scale recommendation systems to have the capacity to capture as many features as possible in their parameters, and also the capability to elastically adjust the number of users and items they book-keep.”
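The contrast described above can be sketched in a few lines: a fixed-size hashed table forces distinct IDs to share embedding rows once the ID space outgrows it, while a table keyed directly by the raw ID gives every user or item its own row and can grow or shrink elastically. This is a toy illustration of the idea, not Monolith's actual implementation; all names and sizes here are made up for the sketch.

```python
import hashlib
import numpy as np

TABLE_SIZE = 1000  # fixed-size hashed embedding table (illustrative size)
DIM = 4            # illustrative embedding dimension

def hashed_slot(raw_id: str, table_size: int = TABLE_SIZE) -> int:
    """Low-collision hashing: map a raw ID to a row of a fixed-size table."""
    digest = hashlib.md5(raw_id.encode()).hexdigest()  # stable across runs
    return int(digest, 16) % table_size

# Once IDs outnumber rows, distinct IDs inevitably share embedding rows.
ids = [f"item:{i}" for i in range(5000)]
occupied = {hashed_slot(x) for x in ids}
collisions = len(ids) - len(occupied)  # 5000 IDs crammed into <= 1000 rows

class CollisionlessTable:
    """Toy collisionless table: one lazily allocated row per raw ID."""

    def __init__(self, dim: int, seed: int = 0):
        self.dim = dim
        self.rows: dict[str, np.ndarray] = {}  # raw ID -> its own row
        self.rng = np.random.default_rng(seed)

    def lookup(self, raw_id: str) -> np.ndarray:
        if raw_id not in self.rows:  # admit a new user/item on first sight
            self.rows[raw_id] = self.rng.normal(0.0, 0.01, self.dim)
        return self.rows[raw_id]

    def evict(self, raw_id: str) -> None:
        """Elastically shrink the table by dropping a stale ID."""
        self.rows.pop(raw_id, None)

table = CollisionlessTable(DIM)
for x in ids:
    table.lookup(x)
assert len(table.rows) == len(ids)  # no two IDs ever share a row
```

The collisionless design trades a fixed dense tensor for hash-map-style storage, so memory grows with the live ID set; that is why a production system pairs it with admission and eviction policies to keep the footprint bounded.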