The Netflix Recommender System: Algorithms, Business Value, and Innovation

Internet TV is about choice: what to watch, when to watch, and where to watch, compared with linear broadcast and cable systems that offer whatever is now playing on perhaps 10 to 20 favorite channels. But humans are surprisingly bad at choosing between many options, quickly getting overwhelmed and choosing “none of the above” or making poor choices (e.g., see Schwartz [2015]). At the same time, a benefit of Internet TV is that it can carry videos from a broader catalog appealing to a wide range of demographics and tastes, and including niche titles of interest only to relatively small groups of users.

Consumer research suggests that a typical Netflix member loses interest after perhaps 60 to 90 seconds of choosing, having reviewed 10 to 20 titles (perhaps 3 in detail) on one or two screens. The user either finds something of interest or the risk of the user abandoning our service increases substantially. The recommender problem is to make sure that on those two screens each member in our diverse pool will find something compelling to view, and will understand why it might be of interest. Historically, the Netflix recommendation problem has been thought of as equivalent to the problem of predicting the number of stars that a person would rate a video after watching it, on a scale from 1 to 5. We indeed relied on such an algorithm heavily when our main business was shipping DVDs by mail, partly because in that context, a star rating was the main feedback that we received that a member had actually watched the video. We even organized a competition aimed at improving the accuracy of the rating prediction, resulting in algorithms that we use in production to predict ratings to this day [Netflix Prize 2009].
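The Netflix Prize popularized matrix-factorization approaches to this star-rating prediction problem. As a minimal sketch (not Netflix's actual production model, and with toy data, dimensions, and hyperparameters chosen purely for illustration), one can factor the sparse user-by-video rating matrix into latent factors and predict unseen ratings from their inner product:

```python
import numpy as np

# Toy rating-prediction sketch: factorize a sparse user-by-video star-rating
# matrix into latent factors via SGD, then predict ratings as inner products.
# Illustrates the matrix-factorization family popularized by the Netflix
# Prize; all numbers here are made up.

rng = np.random.default_rng(0)
n_users, n_videos, k = 5, 4, 2

# Observed (user, video, stars) triples on a 1-5 scale.
ratings = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (2, 3, 1.0), (3, 2, 4.0)]

P = 0.1 * rng.standard_normal((n_users, k))   # user latent factors
Q = 0.1 * rng.standard_normal((n_videos, k))  # video latent factors

lr, reg = 0.05, 0.02
for _ in range(200):                          # SGD over observed ratings
    for u, v, r in ratings:
        err = r - P[u] @ Q[v]
        P[u] += lr * (err * Q[v] - reg * P[u])
        Q[v] += lr * (err * P[u] - reg * Q[v])

predicted = P[0] @ Q[0]  # approaches the observed 5-star rating
```

In practice the Prize-winning systems blended many such models with additional bias terms and temporal effects; this sketch shows only the core idea.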

But the days when stars and DVDs were the focus of recommendations at Netflix have long passed. Now, we stream the content, and have vast amounts of data that describe what each Netflix member watches, how each member watches (e.g., the device, time of day, day of week, intensity of watching), the place in our product in which each video  was discovered, and even the recommendations that were shown but not played in each  session. These data and our resulting experiences improving the Netflix product have  taught us that there are much better ways to help people find videos to watch that focusing only on those with a high predicted star rating 

Now, our recommender system consists of a variety of algorithms that collectively define the Netflix experience, most of which come together on the Netflix homepage. This is the first page that a Netflix member sees upon logging onto one’s Netflix profile on any device (TV, tablet, phone, or browser)—it is the main presentation of recommendations, where 2 of every 3 hours streamed on Netflix are discovered.

An example of our current TV homepage is shown in Figure 1. It has a matrix-like layout. Each entry in the matrix is a recommended video, and each row of videos contains recommendations with a similar “theme.” Rows are labeled according to their theme to make the theme transparent and (we think) more intuitive to our members.
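The matrix-like layout described above can be represented as an ordered list of themed rows, each holding an ordered list of recommended videos. The theme names and video identifiers below are illustrative, not actual Netflix data:

```python
# The homepage as an ordered list of themed rows; each row is an ordered
# list of recommended videos. Themes and video IDs are made up.
homepage = [
    {"theme": "Suspenseful Movies", "videos": ["v17", "v4", "v52"]},
    {"theme": "Because You Watched Narcos", "videos": ["v9", "v23"]},
    {"theme": "Trending Now", "videos": ["v3", "v41", "v8"]},
]

# Each row carries its theme label, shown to the member above the row.
labels = [row["theme"] for row in homepage]
```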

2.1. Personalized Video Ranker: PVR

There are typically about 40 rows on each homepage (depending on the capabilities of the device), and up to 75 videos per row; these numbers vary somewhat across devices because of hardware and user experience considerations. The videos in a given row typically come from a single algorithm. Genre rows such as Suspenseful Movies, shown on the left of Figure 1, are driven by our personalized video ranker (PVR) algorithm. As its name suggests, this algorithm orders the entire catalog of videos (or subsets selected by genre or other filtering) for each member profile in a personalized way. The resulting ordering is used to select the order of the videos in genre and other rows, and is the reason why the same genre row shown to different members often has completely different videos. Because we use PVR so widely, it must be good at general-purpose relative rankings throughout the entire catalog; this limits how personalized it can actually be. Equivalently, PVR works better when we blend personalized signals with a pretty healthy dose of (unpersonalized) popularity, which we use to drive the recommendations in the Popular row shown on the left of Figure 2. See Amatriain and Basilico [2012] for more on personalized video ranking.
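The blend of personalized signals with a dose of unpersonalized popularity could be sketched as a simple weighted score used to order the catalog (or any genre-filtered subset) per member. The linear form, the weight, and all names below are assumptions for illustration, not Netflix's actual formula:

```python
# Hypothetical PVR-style blending: each video's final score mixes a
# member-specific personalized score with an unpersonalized popularity
# score. The weight w controls how personalized the ranking is.

def blended_score(personal, popularity, w=0.7):
    """Linear blend of personalized and popularity scores."""
    return w * personal + (1.0 - w) * popularity

def rank_catalog(catalog, personal_scores, popularity_scores, w=0.7):
    """Order a catalog (or any genre-filtered subset) for one member."""
    return sorted(
        catalog,
        key=lambda v: blended_score(personal_scores[v], popularity_scores[v], w),
        reverse=True,
    )

catalog = ["v1", "v2", "v3"]
personal = {"v1": 0.9, "v2": 0.2, "v3": 0.5}  # member-specific estimates
popular = {"v1": 0.1, "v2": 0.8, "v3": 0.6}   # same for every member

ranking = rank_catalog(catalog, personal, popular, w=0.7)
# The highly personalized pick "v1" still wins, but popularity lifts "v3"
# above "v2" despite "v3" having only a moderate personalized score.
```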

2.2. Top-N Video Ranker

We also have a Top N video ranker that produces the recommendations in the Top Picks row shown on the right of Figure 1. The goal of this algorithm is to find the best few personalized recommendations in the entire catalog for each member, that is, focusing only on the head of the ranking, a freedom that PVR does not have because it gets used to rank arbitrary subsets of the catalog. Accordingly, our Top N ranker is optimized and evaluated using metrics and algorithms that look only at the head of the catalog ranking that the algorithm produces, rather than at the ranking for the entire catalog (as is the case with PVR). Otherwise the Top N ranker and PVR share similar attributes, for example, combining personalization with popularity, and identifying and incorporating viewing trends over different time windows ranging from a day to a year.
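A metric that "looks only at the head of the ranking" can be illustrated with recall@N, which scores a ranker solely on what appears in the first N positions. The paper does not name the exact metric Netflix uses, so this is a representative example:

```python
# Head-of-ranking evaluation sketch: recall@N ignores everything below
# position N, unlike metrics over the whole catalog ranking.

def recall_at_n(ranked_videos, relevant, n):
    """Fraction of relevant videos that appear in the top n positions."""
    head = set(ranked_videos[:n])
    return len(head & relevant) / len(relevant)

ranked = ["a", "b", "c", "d", "e"]   # ranker output, best first
relevant = {"a", "d", "e"}           # what the member actually wanted

r3 = recall_at_n(ranked, relevant, 3)  # only "a" is in the top 3 -> 1/3
```

Optimizing r3 rewards getting a few strong picks to the very top, which is exactly the freedom the Top N ranker has and PVR does not.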

2.3. Trending Now

We have also found that shorter-term temporal trends, ranging from a few minutes to perhaps a few days, are powerful predictors of the videos that our members will watch, especially when combined with the right dose of personalization. This gives us a trending ranker [Padmanabhan et al. 2015] used to drive the Trending Now row (left of Figure 2). There are two types of trends that this ranker identifies nicely: (1) those that repeat every several months (e.g., yearly) yet have a short-term effect when they occur, such as the uptick in romantic video watching around Valentine’s Day in North America, and (2) one-off, short-term events, such as a big hurricane approaching a densely populated area: coverage by many media outlets drives increased short-term interest in documentaries and movies about hurricanes and other natural disasters.
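One simple way to capture such trends (a sketch under assumed window sizes and weights; the paper does not specify the actual model) is to compare a video's short-window play rate against its longer-term baseline, then blend that trend strength with a personalized score:

```python
# Minimal trend-detection sketch: a video "trends" when its recent play
# rate outpaces its longer-term baseline. Window sizes, the blend weight,
# and all data below are illustrative assumptions.

def trend_score(plays_last_hour, plays_per_hour_baseline):
    """Ratio of recent play rate to baseline rate (>1 means trending up)."""
    return plays_last_hour / max(plays_per_hour_baseline, 1e-9)

def trending_rank(videos, recent, baseline, personal, w=0.5):
    """Blend trend strength with per-member relevance, then sort."""
    def score(v):
        return w * trend_score(recent[v], baseline[v]) + (1 - w) * personal[v]
    return sorted(videos, key=score, reverse=True)

videos = ["hurricane_doc", "romcom", "drama"]
recent = {"hurricane_doc": 50, "romcom": 10, "drama": 12}    # plays this hour
baseline = {"hurricane_doc": 5, "romcom": 10, "drama": 12}   # typical plays/hour
personal = {"hurricane_doc": 0.2, "romcom": 0.6, "drama": 0.9}

ranking = trending_rank(videos, recent, baseline, personal)
# A 10x spike lifts the documentary to the top even though this member's
# personalized score for it is low.
```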

2.4. Continue Watching

Given the importance of episodic content viewed over several sessions, as well as the freedom to view non-episodic content in small bites, another important video ranking algorithm is the continue watching ranker that orders the videos in the Continue Watching row (see right of Figure 2). Most of our rankers sort unviewed titles on which we have only inferred information. In contrast, the continue watching ranker sorts the subset of recently viewed titles based on our best estimate of whether the member intends to resume watching or rewatch, or whether the member has abandoned something not as interesting as anticipated. The signals that we use include the time elapsed since viewing, the point of abandonment (mid-program vs. beginning or end), whether different titles have been viewed since, and the devices used. In general, our different video ranking algorithms use different mathematical and statistical models, different signals and data as input, and require different model trainings designed for the specific purpose each ranker serves.
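The signals listed above could feed a resume-intent score along the following lines. The feature encodings, decay constant, and weights are all hypothetical; the real ranker is a trained model, not a hand-tuned formula:

```python
import math

# Hypothetical continue-watching score using the signals named in the text:
# time since viewing, point of abandonment, and titles viewed since.

def resume_score(hours_since_view, stop_fraction, titles_viewed_since):
    """Estimate intent to resume: recent, mid-program stops score highest."""
    recency = math.exp(-hours_since_view / 48.0)      # decays over ~2 days
    # Stopping mid-program suggests intent to resume; stopping at the very
    # beginning or end suggests abandonment or completion.
    mid_program = 1.0 - abs(stop_fraction - 0.5) * 2.0
    distraction = 1.0 / (1.0 + titles_viewed_since)   # other titles since
    return recency * (0.5 + 0.5 * mid_program) * distraction

# (title, hours since viewing, fraction watched at stop, titles viewed since)
row = sorted(
    [("show_a", 2, 0.55, 0), ("show_b", 90, 0.5, 4), ("movie_c", 3, 0.98, 1)],
    key=lambda t: resume_score(*t[1:]),
    reverse=True,
)
# show_a: stopped mid-episode two hours ago, nothing watched since -> first.
# movie_c: nearly finished, so likely completed rather than abandoned.
# show_b: stale and followed by several other titles -> last.
```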

2.5. Video-Video Similarity

Because You Watched (BYW) rows are another type of categorization. A BYW row anchors its recommendations to a single video watched by the member. The video-video similarity algorithm, which we refer to simply as “sims,” drives the recommendations in these rows. An example row is shown on the left of Figure 1. The sims algorithm is an unpersonalized algorithm that computes a ranked list of videos—the similars—for every video in our catalog. Even though the sims ranking is not personalized, the choice of which BYW rows make it onto a homepage is personalized, and the subset of BYW videos recommended in a given BYW row benefits from personalization, depending on what subsets of the similar videos we estimate that the member would enjoy (or has already watched).
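An unpersonalized video-video similarity of this kind is often computed from co-watch patterns. As a stand-in for the actual sims algorithm (which the paper does not detail), here is cosine similarity over binary watch vectors:

```python
import math

# Unpersonalized video-video similarity sketch: cosine similarity over
# binary watch vectors (which members watched which video). Co-watch
# cosine is an assumed stand-in for the actual "sims" algorithm.

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# Rows: videos; columns: members (1 = watched). Toy data.
watches = {
    "v1": [1, 1, 0, 1],
    "v2": [1, 1, 0, 0],
    "v3": [0, 0, 1, 0],
}

def sims(anchor, catalog):
    """Ranked list of similars for one anchor video (anchor excluded)."""
    others = [v for v in catalog if v != anchor]
    return sorted(others, key=lambda v: cosine(catalog[anchor], catalog[v]),
                  reverse=True)

similars = sims("v1", watches)  # "v2" shares two watchers with "v1"
```

The same precomputed list serves every member; personalization then decides which BYW rows appear and which of the similars to show, as the text describes.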

2.6. Page Generation: Row Selection and Ranking

The videos chosen for each row represent our estimate of the best choices of videos to put in front of a specific user. But most members have different moods from session to session, and many accounts are shared by more than one member of a household. By offering a diverse selection of rows, we hope to make it easy for a member to skip videos that would be good choices for a different time, occasion, or member of the household, and quickly identify something immediately relevant.

The page generation algorithm uses the output of all the algorithms already described to construct every single page of recommendations, taking into account the relevance of each row to the member as well as the diversity of the page. A typical member has tens of thousands of rows that could go on one’s homepage, making it challenging to manage the computations required to evaluate them. For this reason, before 2015, we used a rule-based approach that would define what type of row (e.g., genre row, BYW row, Popular row) would go in each vertical position of the page. This page layout was used to construct all homepages for all members. Today, we have a fully personalized and mathematical algorithm that can select and order rows from a large pool of candidates to create an ordering optimized for relevance and diversity. Our current algorithm does not use a template, and is thus freer to optimize the experience, for example, choosing not to have any BYW row on one homepage while devoting half of the page to BYW rows on another. A recent blog post [Alvino and Basilico 2015] discusses this algorithm in more detail.
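Template-free selection that balances relevance against diversity can be sketched as a greedy procedure over a candidate pool. The relevance scores, the diversity penalty, and the row types below are illustrative assumptions, not the actual algorithm:

```python
# Greedy page-construction sketch: pick rows one at a time, trading off
# each candidate's relevance against a penalty for repeating a row type
# already on the page. All scores and penalties are made up.

def build_page(candidates, n_rows, diversity_penalty=0.3):
    """candidates: list of (row_id, row_type, relevance) tuples."""
    page, type_counts = [], {}
    pool = list(candidates)
    for _ in range(n_rows):
        def adjusted(row):
            _, row_type, relevance = row
            return relevance - diversity_penalty * type_counts.get(row_type, 0)
        best = max(pool, key=adjusted)
        page.append(best[0])
        type_counts[best[1]] = type_counts.get(best[1], 0) + 1
        pool.remove(best)
    return page

pool = [
    ("byw_breaking_bad", "BYW", 0.9),
    ("byw_narcos", "BYW", 0.85),
    ("genre_thrillers", "genre", 0.7),
    ("trending_now", "trending", 0.6),
]
page = build_page(pool, n_rows=3)
# The second BYW row is penalized after the first is chosen, so the genre
# and trending rows beat it despite lower raw relevance.
```

Note that with the penalty set to zero, the page collapses to pure relevance ordering (here, two BYW rows on top), which mirrors the text's point that the algorithm is free to devote much or none of the page to one row type.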