Large Language Models for User Interest Journeys
Large language models (LLMs) have shown impressive capabilities in natural language understanding and generation. Their potential for deeper user understanding and improved personalized user experiences on recommendation platforms is, however, largely untapped. This paper aims to address this gap. Recommender systems today capture users’ interests by encoding their historical activities on the platforms. The resulting user representations are hard to examine or interpret. On the other hand, if we were to ask people about the interests they pursue in their lives, they might talk about their hobbies, like I just started learning the ukulele, or their relaxation routines, e.g., I like to watch Saturday Night Live, or I want to plant a vertical garden. We argue, and demonstrate through extensive experiments, that LLMs as foundation models can reason through user activities and describe users’ interests in nuanced and interesting ways, similar to how a human would. We define interest journeys as persistent and overarching user interests, in other words, the non-transient ones. These are the interests that we believe will benefit most from nuanced and personalized descriptions. We introduce a framework in which we first perform personalized extraction of interest journeys, and then summarize the extracted journeys via LLMs, using techniques like few-shot prompting, prompt-tuning, and fine-tuning. Together, our results in prompting LLMs to name extracted user journeys on a large-scale industrial platform demonstrate the great potential of these models in providing deeper, more interpretable, and controllable user understanding. We believe LLM-powered user understanding can be a stepping stone to entirely new user experiences on recommendation platforms that are journey-aware and assistive, enabling frictionless conversation down the line.
If one were to ask a friend for recommendations around any of their journeys, the friend would probably ask them to first describe their interests or needs in detail. Once the friend hears a reply, like I want to know the history of stand-up comedy and the most famous stand-up comedian at this time, or I started playing the ukulele a month ago and I want to improve my strumming skills, they would be in a much better position to give good recommendations. In contrast, recommender systems today make recommendations by predicting the next item a user might want to interact with, given their historical activities [1, 11, 41].
We argue that this type of collaborative filtering-based approach [11, 20] falls short of capturing users’ higher-level, semantic preferences. In order for recommender systems to truly assist users through their real-life journeys, they need to be able to understand and reason about the interests, needs, and goals users want to pursue [19, 33, 37, 49]. However, the task presents several challenges. First, users often do not explicitly spell out their interests, needs, and real-life goals to the recommenders. As a result, the recommenders need to infer them from the historical activities the users engaged in on the platform. Second, users can have multiple journeys intertwined in their activity history at any time. Third, and most importantly, journeys are personalized and nuanced. Two users who are both into stand-up comedy can be interested in completely different aspects of it (e.g., documentaries on the history of stand-up comedy vs. Saturday Night Live skits). This is where Large Language Models (LLMs) come into play. LLMs have demonstrated impressive capabilities for natural language understanding and generation, achieving state-of-the-art performance in a variety of tasks, from coding to essay writing to question answering [10, 13, 16, 50]. What if we power recommender systems with LLMs that can reason through user activities on the platform?
To this end, we propose to build a personalized user journey profile which 1) uses personalized clustering to uncover coherent user journeys, i.e., persistent user interests, needs, and goals, from a long sequence of user interaction logs, and 2) leverages the capabilities of LLMs, aligned to the user interest journey domain through prompt-tuning [27] and fine-tuning [60] on different data sources, to describe the extracted journeys with interpretable and nuanced names (a minimal prompting sketch of this naming step follows the contribution list below). Together, we make the following contributions:
• A first demonstration of the capabilities of LLMs to uncover and describe in natural language the interests, needs, and goals users pursue, similar to how people would describe them, e.g., hydroponic gardening, playing the ukulele as a beginner, cooking Italian recipes (Figure 1). We posit that this will unlock unique user experiences and enable recommenders to assist users throughout their journeys.
• A thorough research study shedding light on the different factors impacting the quality of the generated journey names, e.g., the prompting technique, the underlying domain data used for prompting, the LLM architecture and size, and the journey extraction technique.
• An at-scale user research study confirming that users do pursue real-life journeys on recommendation platforms. About 66% of survey respondents had recently used the platform to pursue a journey they valued. Of those, about 8 in 10 consumed content relevant to a journey for more than a month, and half said some journeys last for more than a year. People reported exploring multiple journeys simultaneously, with 7 in 10 pursuing one to three journeys. When asked to describe a journey, rather than naming a broad topic, respondents used much more nuanced phrases, e.g., "designing hydroponic systems for small spaces." Another aspect the study uncovered was the level of specificity people identify with: for example, "greenhouse designs for cold climates" was deemed irrelevant by someone pursuing indoor gardening.
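As a concrete illustration of the naming step referenced above, the sketch below assembles a few-shot prompt that asks an LLM to name a cluster of user activities. This is a minimal sketch, not the paper's actual prompts or data: the example journeys, activity titles, and the call_llm stub are hypothetical placeholders for whichever text-generation model and prompt format are actually used.

```python
# Hedged sketch: few-shot prompting for journey naming.
# All examples and the call_llm stub are illustrative placeholders.

FEW_SHOT_EXAMPLES = [
    # (activity titles from one journey cluster, human-written journey name)
    (["ukulele strumming basics", "easy ukulele chords for beginners",
      "how to hold a ukulele"],
     "learning the ukulele as a beginner"),
    (["DIY hydroponic lettuce tower", "nutrient mix for hydroponics",
      "small-space vertical garden ideas"],
     "hydroponic gardening in small spaces"),
]

def build_prompt(journey_activities):
    """Assemble a few-shot prompt: worked examples, then the new cluster."""
    lines = ["Name the user's interest journey from their activity titles.", ""]
    for activities, name in FEW_SHOT_EXAMPLES:
        lines.append("Activities: " + "; ".join(activities))
        lines.append(f"Journey name: {name}")
        lines.append("")
    lines.append("Activities: " + "; ".join(journey_activities))
    lines.append("Journey name:")
    return "\n".join(lines)

def call_llm(prompt):
    # Placeholder: swap in a real text-generation call (few-shot,
    # prompt-tuned, or fine-tuned model).
    return "<journey name generated by the LLM>"

if __name__ == "__main__":
    cluster = ["history of stand-up comedy", "best SNL sketches of the 90s",
               "top stand-up specials this year"]
    print(build_prompt(cluster))
    print(call_llm(build_prompt(cluster)))
```

In this sketch the few-shot examples play the role of the domain data discussed above; prompt-tuning or fine-tuning would replace the hand-written examples with learned soft prompts or updated model weights.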
2.2 Guidelines for human-AI interaction
Within the past few years, the emergence of AI as a design material [41, 46, 62, 191] has necessitated guidelines that inform its use. A growing body of work within the human-centered AI research community has proposed best practices for human-AI interaction in the form of design guidelines (e.g., [7, 11, 104, 186, 192]), formal studies (e.g., [24, 102]), toolkits (e.g., [110]), and reviews (e.g., [59, 72, 180, 188]).
Some of these guidelines claim universal applicability or general applicability to AI-infused systems (e.g., [7, 153]). Other guidelines focus on specific types of AI technologies (e.g., text-to-image models [104]), specific domains of use (e.g., creative writing [24]), or specific issues regarding the use of AI, including ethics [11, 59, 69, 72], fairness [110], human rights [50], explainability [119], and user trust [186]. Finally, as more consumer products incorporate AI technologies, industry leaders including Google [133], Microsoft [7, 95], and Apple [9] have developed and published their own guidelines; Wright et al. [188] provide a comparative analysis of these guidelines.
3 WHY GENERATIVE AI NEEDS DESIGN PRINCIPLES
Generative AI technologies have introduced a new paradigm of human-computer interaction, what Nielsen refers to as “intent-based outcome specification” [127]. In this paradigm, users specify what they want, often using natural language, but not how it should be produced. One challenge of this paradigm stems from the distinguishing characteristic of generative AI: it generates artifacts as outputs, and those outputs may vary in character or quality even when a user’s input does not change. This characteristic has been described by Weisz et al. [182] as generative variability, and it provides what Alvarado and Waern [6] describe as an “algorithmic experience,” raising questions about appropriate types of user control, levels of algorithmic transparency, and user awareness of how the algorithms work and how to interact with them effectively.
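To make generative variability concrete, the toy sketch below repeatedly samples from a fixed next-token distribution. The vocabulary and probabilities are invented purely for illustration; the point is only that identical input can yield different outputs whenever decoding involves sampling rather than a deterministic choice.

```python
import math
import random

# Toy next-token distribution for a single, fixed prompt (invented numbers).
vocab_logits = {"castle": 2.1, "forest": 1.8, "nebula": 1.3, "harbor": 0.9}

def sample_next_token(logits, temperature=1.0):
    """Softmax sampling: higher temperature flattens the distribution."""
    scaled = {tok: v / temperature for tok, v in logits.items()}
    z = sum(math.exp(v) for v in scaled.values())
    probs = {tok: math.exp(v) / z for tok, v in scaled.items()}
    return random.choices(list(probs), weights=list(probs.values()))[0]

# Same "prompt", same settings -- yet each run of the generator can differ.
print([sample_next_token(vocab_logits, temperature=0.8) for _ in range(5)])
```

Under this view, two clicks of a “generate” button are two independent draws from the same distribution, which is why replicability is not guaranteed even when nothing about the user’s input changes.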
With generative AI applications, users will need to develop a new set of skills to work with (not against) generative variability by learning how to create specifications that result in artifacts that match their desired intent. One emerging skill revolves around crafting effective natural language prompts, known as in-context learning [20, 40, 189] or prompt engineering [185, 193]. This process is typically informal and relies on trial-and-error [104, 131, 165, 190]. The use of open-ended natural language, rather than a fixed vocabulary of commands, leads to new design challenges. For example, Nielsen argues, “users should not have to wonder whether different words, situations, or actions mean the same thing” [125, p.156]; given the innumerable ways that users can express their intent in a natural language prompt, how can generative AI applications help users achieve desired results? Is it necessarily a “mistake” or “error” when a user’s prompt results in an output that they didn’t anticipate or like? Does it violate the consistency heuristic when it is difficult for users to achieve replicable results (e.g., [116, 135, 150]), because each click of the “generate” button results in different outputs, even for the same input?
Existing human-AI design guidelines fail to address the unique design challenges of generative AI: they do not cover generative use cases or the new considerations stemming from generative variability, nor do they address the new or amplified ethical issues raised by these models’ generative nature.
We begin by presenting our final set of six design principles and their corresponding strategies in Table 1, along with our overall design framework in Figure 1. We also provide extended descriptions and examples of each principle and strategy in Appendix A. In the rest of this paper, we describe the process we used to develop and validate these principles and strategies. The principles are generally presented as high-level “design for...” statements that indicate the characteristics that are important to consider when making design decisions. Three principles focus on aspects of existing AI systems that have new interpretations through the lens of generative AI: Design Responsibly, Design for Mental Models, and Design for Appropriate Trust & Reliance. Three principles identify unique aspects of generative AI UX: Design for Generative Variability, Design for Co-Creation, and Design for Imperfection.
These principles and strategies can be employed to support two user goals: (1) optimization, in which the user seeks to produce an output that satisfies some task-specific criteria; and (2) exploration, in which the user engages the generative process to explore a domain, seek inspiration, and discover alternative possibilities in support of their own ideation.