Large Scale Product Graph Construction for Recommendation in E-commerce

Paper · arXiv 2010.05525 · Published October 12, 2020

“On the other hand, Taobao has an enormous amount of real user behavior data from billions of customers, which are much stronger and reliable signals for capturing product relationships than text data. Thus we focus on utilizing the user behavior data directly to construct the product graph via an unsupervised approach similar to similarity based collaborative filtering.

When we try to adapt traditional approaches such as item-item CF with local similarities [23, 14], we face the following challenges:

Accuracy: The traditional local similarity calculations do not consider any inner structure of user behavior graph, which has been shown to be useful for other link prediction problems. Thus the prediction accuracy is limited.

Robustness: User behavior data (i.e. user-item click graph) contain many noisy, casual or accidental clicks, which could affect the reliability of predictions.

Sparsity: Though the user behavior data in Taobao is huge, the ratio of user purchase is relatively small. User co-purchasing data is very sparse, thus capturing complementary relationships is very difficult.

Direction: The complementary relationship is asymmetry. We also need to consider the direction of relationship on the basis of co-purchasing graph.

Scalability: The computation complexity of traditional approaches grows with both the number of customers and products. Given billions of users and products in Taobao, scalability is a major challenge.

In this paper, we propose a new algorithm called Swing, which can utilize the inner structure–Swing of user-item behavior graph, to construct substitute product graph. Swing is a quasi-local structure and is designed to be much more stable than a single edge used in traditional CF approaches. It provides much more reliable calculation propagation over a user-item bi-partite graph, and helps to reduce the influence of noisy user behavior data. Then we propose a new algorithm called Surprise to construct complementary product graph. Surprise algorithm solves the sparsity problem on the user co-purchasing graph by utilizing products’ category in- formation and product clusters constructed based on Swing algorithm. Furthermore, it considers the time sensitivity and temporal order of co-purchased products. Both methods are implemented for parallel running with commonly used large scale distributed computing framework such as Map- Reduce or Spark, thus scalability is not a problem. Based on that, we can build the product substitute graph and prod- uct complementary graph in Taobao, which provide basic indexing service to generate candidate products for further recommendation ranking modules.”