Conversational Recommendation: A Grand AI Challenge

Paper · arXiv 2203.09126 · Published March 17, 2022

First, while conventional recommendations rely on push communication, conversational recommender systems (CRS) support multi-turn and mixed-initiative interaction patterns. Moreover, natural-language-based CRS in particular combine aspects of recommendation and search, i.e., they allow users to make queries. Generally, CRS often target problem settings where no long-term user profiles are available and where user requirements and the user's context are acquired interactively.

These early systems were, however, often hampered by the limited capabilities of the NLP technology available at that time. Since then, NLP technology has greatly improved and is now one main pillar of academic CRS that are implemented as chatbots (Iovine et al., 2020; Qiu et al., 2017). In the majority of today's real-world chatbot applications, however, the system's responses are mostly based on pre-defined templates, and the main tasks where AI technology comes into play, besides language processing, include the detection of the user's intents and the recognition of entities in the user utterances. To avoid the bottleneck that comes with defining the templates, recent end-to-end learning systems commonly train complex models on large corpora of recorded recommendation dialogues between humans.
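The template-based design described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper or from any particular chatbot framework: the intent names, the keyword patterns that stand in for a learned intent classifier, the genre list that stands in for an entity recognizer, and the response templates.

```python
import re

# Hypothetical response templates, keyed by intent name.
TEMPLATES = {
    "request_recommendation": "How about a {genre} movie? I can suggest a few titles.",
    "greeting": "Hello! What kind of movie are you in the mood for?",
    "fallback": "Sorry, I did not understand that. Could you rephrase?",
}

# Toy keyword patterns standing in for a trained intent classifier.
INTENT_PATTERNS = {
    "greeting": re.compile(r"\b(hi|hello|hey)\b", re.IGNORECASE),
    "request_recommendation": re.compile(r"\b(recommend|suggest|good)\b", re.IGNORECASE),
}

# Toy gazetteer standing in for an entity recognizer.
GENRES = ("sci-fi", "comedy", "drama", "horror")


def detect_intent(utterance: str) -> str:
    """Return the first intent whose pattern matches, else 'fallback'."""
    for intent, pattern in INTENT_PATTERNS.items():
        if pattern.search(utterance):
            return intent
    return "fallback"


def extract_genre(utterance: str):
    """Return the first known genre mentioned in the utterance, or None."""
    lowered = utterance.lower()
    for genre in GENRES:
        if genre in lowered:
            return genre
    return None


def respond(utterance: str) -> str:
    """Fill a pre-defined template based on the detected intent and entity."""
    intent = detect_intent(utterance)
    genre = extract_genre(utterance)
    if intent == "request_recommendation" and genre is None:
        # A required entity is missing, so the system asks for it.
        return "Which genre do you prefer?"
    return TEMPLATES[intent].format(genre=genre)


print(respond("What is a good sci-fi movie?"))
```

Every new intent or phrasing needs another hand-written pattern and template, which is the bottleneck that end-to-end learning approaches try to avoid.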

A conversational recommender system can be characterized as “a software system that supports its users in achieving recommendation-related goals through a multi-turn dialogue” (Jannach et al., 2021). A recommendation-related goal in that context can, for example, be to help users find relevant items or, more generally, to make better decisions. However, there can also be more indirect goals like helping users to understand the space of options or explaining to them why a certain option is a good choice for them.

Following this definition, a CRS is a task-oriented system. This differentiates CRS from general conversational AI systems, including the famous ELIZA system from the 1960s. In some ways, building a CRS might therefore appear to be an easier task, because the conversation to be supported is usually bounded to a few pre-defined tasks and dialogue situations. Moreover, the competence of a particular CRS can be limited to a certain domain, e.g., movies.

“…achieving a certain naturalness of conversational recommendation dialogues can be challenging. For example, in case of a CRS supporting natural language interactions, the virtual CRS agent should probably be able to respond to chit-chat ("phatic") user utterances. Furthermore, a conversation between humans, which a CRS might aim to mimic, is much richer than just answering questions like "What is a good sci-fi movie?". In such a conversation, the initiative might also switch between dialogue partners, thus requiring a system that supports both user-driven, system-driven, or mixed-initiative dialogues. Moreover, the system must be able to respond to a variety of possible user intents, e.g., providing or revising preference information, asking for explanations, or rejecting a recommendation. Finally, the CRS must be able to keep track of the ongoing dialogue and possibly even past interactions with the user, as done in (Thompson et al., 2004) or (Ricci and Nguyen, 2007)”
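The requirements listed in this passage (several user intents, switching initiative, and tracking the ongoing dialogue) can be made concrete with a small dialogue-state sketch. It is only an illustration under stated assumptions: the `DialogueState` class, the intent labels, the `update` and `next_system_move` functions, and the toy catalog are hypothetical and do not reproduce the approaches of the cited works.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical toy catalog: genre -> candidate titles.
CATALOG = {
    "sci-fi": ["Arrival", "Blade Runner"],
    "comedy": ["Groundhog Day"],
}


@dataclass
class DialogueState:
    """Tracks what the system knows about the ongoing conversation."""
    preferences: dict = field(default_factory=dict)
    rejected: set = field(default_factory=set)
    last_recommendation: Optional[str] = None
    history: list = field(default_factory=list)


def next_system_move(state: DialogueState) -> str:
    """System initiative: ask for missing information or recommend."""
    genre = state.preferences.get("genre")
    if genre is None:
        return "Which genre are you in the mood for?"
    candidates = [t for t in CATALOG.get(genre, []) if t not in state.rejected]
    if not candidates:
        state.last_recommendation = None
        return f"I have no more {genre} titles. Would you like another genre?"
    state.last_recommendation = candidates[0]
    return f"How about {candidates[0]}?"


def update(state: DialogueState, intent: str, slots: Optional[dict] = None) -> str:
    """Apply one user turn (an already detected intent plus slots) to the state."""
    slots = slots or {}
    state.history.append((intent, slots))
    if intent in ("provide_preference", "revise_preference"):
        state.preferences.update(slots)
    elif intent == "reject":
        if state.last_recommendation is not None:
            state.rejected.add(state.last_recommendation)
    elif intent == "ask_explanation":
        if state.last_recommendation is None:
            return "I have not recommended anything yet."
        genre = state.preferences.get("genre")
        return f"I suggested {state.last_recommendation} because you asked for {genre}."
    elif intent == "chit_chat":
        # Phatic utterance: respond socially without changing the state.
        return "Happy to chat! Tell me when you want a recommendation."
    return next_system_move(state)


state = DialogueState()
print(update(state, "provide_preference", {"genre": "sci-fi"}))
print(update(state, "reject"))
print(update(state, "ask_explanation"))
```

Keeping the state in one object separate from the turn-handling logic means it could be stored between sessions, which is one way to support the past interactions mentioned in the passage.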