User-Centric Conversational Recommendation with Multi-Aspect User Modeling

Paper · arXiv 2204.09263 · Published April 20, 2022

Conversational recommender systems (CRS) aim to provide high-quality recommendations in conversations. However, most conventional CRS models focus mainly on understanding the dialogue of the current session, ignoring other rich multi-aspect information about the central subjects (i.e., users) in recommendation. In this work, we highlight that the user’s historical dialogue sessions and look-alike users are essential sources of user preferences besides the current dialogue session in CRS. To systematically model the multi-aspect information, we propose a User-Centric Conversational Recommendation (UCCR) model, which returns to the essence of user preference learning in CRS tasks. Specifically, we propose a historical session learner to capture users’ multi-view preferences from knowledge, semantic, and consuming views as supplements to the current preference signals. A multi-view preference mapper then learns the intrinsic correlations among different views in current and historical sessions via self-supervised objectives. We also design a temporal look-alike user selector to understand users via their similar users.

Generally, CRS methods can be roughly divided into a recommender module and a dialogue module [2, 39]. The dialogue module converses with users through natural language. The recommender module, in turn, learns user preferences from the dialogue contents and provides appropriate recommendations for users. In generative CRS [18, 39], the recommended items are integrated into the natural language replies given to users. Unlike traditional recommender systems [3], CRS mainly captures user preferences from the current dialogue session, and thus has to handle both natural language understanding and user modeling [6].

However, most existing CRS methods pay too much attention to the current dialogue session and only learn the preferences reflected by that session (although we admit that it is indeed an important source of user preferences), ignoring the central subjects in CRS, i.e., users.

In this work, we attempt to emphasize users and improve the model’s ability to learn user-centric preferences in CRS. As in Fig. 1, user preferences in real-world CRS can be extracted mainly from three aspects: (1) The user’s current dialogue session, which is the main information widely adopted by conventional CRS models. (2) The user’s historical dialogue sessions, which store the user’s historical preferences from multiple views. This historical information is beneficial since users tend to have preferences similar to those shown in their historical behaviors, an observation inspired by the idea of item-CF [24]. (3) The user’s look-alike users, which can be retrieved by the relevance of user profiles or user historical behaviors. This aspect learns a user’s preferences via similar users, following the idea of user-CF [36]. The newly introduced information on historical dialogue sessions and look-alike users is especially beneficial when the current session contains little information.
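To make the look-alike aspect concrete, the following is a minimal sketch of retrieving similar users from their historical behaviors. It is not the selector proposed in this paper: it assumes each user is summarized by a hypothetical behavior vector and uses plain cosine similarity with a top-k cut-off. The names `cosine`, `top_k_lookalikes`, and `user_vectors`, as well as the toy data, are illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def top_k_lookalikes(target_id, user_vectors, k=2):
    """Return the ids of the k users most similar to target_id, most similar first."""
    target = user_vectors[target_id]
    scored = [
        (cosine(target, vec), uid)
        for uid, vec in user_vectors.items()
        if uid != target_id  # a user is never their own look-alike
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [uid for _, uid in scored[:k]]


# Hypothetical toy data: each vector counts how often a user mentioned
# [fantasy, romance, horror] items in past sessions.
user_vectors = {
    "alice": [3.0, 2.0, 0.0],
    "bob": [2.0, 2.0, 0.0],
    "carol": [0.0, 0.0, 4.0],
    "dave": [4.0, 1.0, 1.0],
}
lookalikes = top_k_lookalikes("alice", user_vectors, k=2)
print(lookalikes)
```

In this toy setting the users whose past mentions overlap most with the target rank first, while a user with disjoint interests receives a similarity of zero. A profile-based variant would only change how the vectors are built.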

However, incorporating multi-aspect user information in CRS is non-trivial, since it is challenging to decide how much we should learn from historical and look-alike features without disturbing the modeling of the current session. Unlike users in classical recommender systems, users in CRS actively interact with the system via natural language. Hence, their intentions are stated more explicitly and definitely in the current session, and the historical and look-alike features should be considered under the constraints of the current user intentions. We aim to use multi-aspect features so that, in the example of Fig. 1, the model captures both the basic intention (fantasy) expressed in the current session and the hidden preference (romance) revealed by the historical and look-alike features.

In light of the observations above, we propose a novel User-Centric Conversational Recommendation (UCCR) framework to jointly model users’ multi-aspect information in CRS. Specifically, UCCR learns the multi-aspect user preferences mainly from three information sources: the user’s current dialogue session, historical dialogue sessions, and look-alike users. UCCR mainly consists of four parts: (1) We first design a historical session learner to capture users’ diverse preferences in their historical sessions besides learning from the current session. Precisely, we extract multi-view user preferences from the dialogues, including the word-level semantic view, entity-level knowledge view, and item-level consuming view. The correlations between the current and historical information are also considered in historical preference learning. (2) We propose a multi-view preference mapper to learn the intrinsic correlations among different views in the current/historical sessions. The main idea is that two views of the same user should be closely related, since they reflect similar preferences of that user. We design three self-supervised cross-view objectives between these views as supplements to the supervised losses, which enable more sufficient training of users’ multi-view preferences. (3) For the look-alike user aspect, we use similar users’ preferences as a user-CF-based supplement for understanding the target user. Basic user profiles and historical user behaviors, which are essential sources of personalization, can be used to calculate user similarity. A temporal look-alike user selector is designed for more precise user generalization. (4) Finally, multi-aspect user-centric modeling is conducted to jointly encode multi-aspect multi-view user preferences into the final user representation.
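The idea behind the cross-view objectives in part (2) can be illustrated with a small sketch. The exact losses used by UCCR are not given in this section, so the code below swaps in a generic InfoNCE-style contrastive loss as a stand-in: row i of one view is pulled toward row i of another view of the same user and pushed away from the other users' rows. The function names, the temperature value, and the toy embeddings are all assumptions made for illustration.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce(view_a, view_b, temperature=0.5):
    """Mean InfoNCE loss; row i of view_a is the positive pair of row i of view_b."""
    losses = []
    for i, anchor in enumerate(view_a):
        logits = [dot(anchor, other) / temperature for other in view_b]
        # Numerically stable log-sum-exp over all candidates in view_b.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


# Hypothetical embeddings for two users in two views (e.g. entity and word views).
entity_view = [[1.0, 0.0], [0.0, 1.0]]
word_view_aligned = [[1.0, 0.0], [0.0, 1.0]]     # each user matches their own row
word_view_misaligned = [[0.0, 1.0], [1.0, 0.0]]  # rows swapped between users

aligned_loss = info_nce(entity_view, word_view_aligned)
misaligned_loss = info_nce(entity_view, word_view_misaligned)
print(aligned_loss, misaligned_loss)
```

When the two views of each user agree, the loss is lower than when the rows are swapped between users, which is the behavior a cross-view self-supervised objective is meant to reward. In a full model such a term would be added to the supervised recommendation loss with a weighting coefficient.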

Through UCCR, these multi-aspect features are properly incorporated under the guiding principle of user-centric modeling. Compared with conventional CRS models that focus on current session understanding, our UCCR comprehensively understands users from multiple aspects (current dialogue session, historical dialogue sessions, and look-alike users) and multiple views (word, entity, and item views), which returns to the essence of user understanding in recommendations. We summarize the contributions of this work as follows:

• We emphasize user-centric modeling in CRS, and systematically highlight and verify the significance of historical dialogue sessions and look-alike users, returning to the essence of user understanding in CRS. To the best of our knowledge, we are the first to jointly model the current dialogue session, historical dialogue sessions, and look-alike users in a user-centric manner in CRS.

• We propose a set of techniques to precisely extract useful user preferences related to the current user intentions from multiple views, including a historical session learner, a multi-view preference mapper, and a temporal look-alike user selector.