Collaborative Filtering Bandits
Recommender Systems are an essential part of many successful on-line businesses, from e-commerce to on-line streaming, and beyond. Moreover, Computational Advertising can be seen as a recommendation problem where the user preferences highly depend on the current context. In fact, many recommendation domains, such as YouTube video recommendation or news recommendation, do not fit the classical description of a recommendation scenario, whereby a set of users with essentially fixed preferences interact with a fixed set of items. In this classical setting, the well-known cold-start problem, namely, the lack of accumulated interactions by users on items, needs to be addressed, for instance, by turning to hybrid recommendation methods (e.g., [19]).

In practice, many relevant recommendation domains are dynamic, in the sense that user preferences and the set of active users change with time. Recommendation domains can be distinguished by how much and how often user preferences and the content universe change (e.g., [24]). In highly dynamic recommendation domains, such as news, ads, and videos, active users and user preferences are fluid, hence classical collaborative filtering-type methods, such as Matrix- or Tensor-Factorization, break down. In these settings, it is essential for the recommendation method to adapt to the shifting preference patterns of the users.
Exploration-exploitation methods, a.k.a. multi-armed bandits, have been shown to be an excellent solution for these dynamic domains (see, e.g., the news recommendation evidence in [23]). While effective, standard contextual bandits do not take collaborative information into account: two users who have interacted with similar items in the past are not deemed to have similar tastes on that basis alone, and items that have been chosen by the same group of users are likewise not considered similar. It is this significant limitation in the current bandit methodology that we try to address in this work. Past efforts on this problem were based on combining online clustering-like algorithms on the graph or network structure of the data with multi-armed bandit methods (see Section 3).
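To make the combination of clustering and bandits concrete, the following is a minimal illustrative sketch (not the paper's actual algorithm): each user cluster shares a single LinUCB-style ridge-regression model, so reward feedback from one user immediately sharpens the estimates used for every other user in the same cluster. The class name, the `alpha` exploration parameter, and the fixed cluster assignment are all assumptions made for illustration.

```python
import numpy as np

class ClusteredLinUCB:
    """Illustrative sketch: one shared linear bandit model per user cluster.

    Each cluster keeps the usual LinUCB statistics
    A = I + sum_t x_t x_t^T  and  b = sum_t r_t x_t,
    so feedback from any user in a cluster updates the whole cluster.
    """

    def __init__(self, n_clusters, dim, alpha=1.0):
        self.alpha = alpha  # exploration strength (hypothetical default)
        self.A = [np.eye(dim) for _ in range(n_clusters)]
        self.b = [np.zeros(dim) for _ in range(n_clusters)]

    def select(self, cluster, arms):
        """Return the index of the arm (feature vector) with the largest
        upper confidence bound under the cluster's shared model."""
        A_inv = np.linalg.inv(self.A[cluster])
        theta = A_inv @ self.b[cluster]  # ridge-regression estimate
        scores = [x @ theta + self.alpha * np.sqrt(x @ A_inv @ x)
                  for x in arms]
        return int(np.argmax(scores))

    def update(self, cluster, x, reward):
        """Rank-one update of the chosen cluster's statistics."""
        self.A[cluster] += np.outer(x, x)
        self.b[cluster] += reward * x
```

In a full clustering-of-bandits method, the cluster assignments themselves would also be updated online as evidence accumulates; here they are held fixed to keep the sketch short.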
Commercial large-scale search engines and information retrieval systems are examples of highly dynamic environments where users and items could be described in terms of their membership in some preference cluster. For instance, in a music recommendation scenario, we may have groups of listeners (the users) clustered around music genres, with the clustering changing across different genres. On the other hand, the individual songs (the items) could naturally be grouped by sub-genre or performer, because they tend to be preferred by the same group of users. Evidence has been collected which suggests that, at least in specific recommendation scenarios, like movie recommendation, data are well modeled by clustering at both user and item sides (e.g., [31]).
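The idea that the user clustering can change from one group of items to another can be sketched in a few lines. The function below is a hypothetical illustration, not a method from the literature: it partitions users by the cosine similarity of their ratings restricted to one group of items (e.g., one genre), using connected components of a thresholded similarity graph, so that different item groups can induce different user partitions. The `threshold` parameter and the union-find construction are assumptions made for this sketch.

```python
import numpy as np

def cluster_users(R, item_subset, threshold=0.8):
    """Partition users by cosine similarity of their ratings on one
    item group. R is a (users x items) array; item_subset lists the
    columns that define the group. Returns one cluster label per user,
    computed as connected components via a simple union-find."""
    X = R[:, item_subset].astype(float)
    n = X.shape[0]
    parent = list(range(n))

    def find(i):
        # Path-halving union-find lookup.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    norms = np.linalg.norm(X, axis=1)
    for i in range(n):
        for j in range(i + 1, n):
            if norms[i] > 0 and norms[j] > 0:
                sim = X[i] @ X[j] / (norms[i] * norms[j])
                if sim > threshold:
                    parent[find(i)] = find(j)  # merge the two clusters

    roots = [find(i) for i in range(n)]
    labels = {r: k for k, r in enumerate(dict.fromkeys(roots))}
    return [labels[r] for r in roots]
```

Running this on the same rating matrix with two different item subsets can yield two different user partitions, which is exactly the two-sided clustering structure described above.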