Recent Trends in Personalized Dialogue Generation: A Review of Datasets, Methodologies, and Evaluations
Enhancing user engagement through personalization in conversational agents has gained significance, especially with the advent of large language models that generate fluent responses. Personalized dialogue generation, however, is multifaceted and varies in its definition – ranging from instilling a persona in the agent to capturing users’ explicit and implicit cues. This paper seeks to systematically survey the recent landscape of personalized dialogue generation, including the datasets employed, methodologies developed, and evaluation metrics applied.
It is not obvious how a persona should be represented. Through our literature survey, we found that datasets represent persona information in different ways, and that these representations fall into three categories: (1) persona descriptions, (2) key-value attributes, and (3) user IDs with comment histories.
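To make these categories concrete, the sketch below renders each representation as a simple data structure. The field names and values are hypothetical illustrations, not drawn verbatim from any particular dataset.

```python
# Illustrative sketches of the three persona representations.
# All field names and values are hypothetical examples.

# (1) Persona description: free-form sentences (as in PersonaChat).
persona_description = [
    "I am a vegetarian.",
    "I work as a nurse.",
    "I have two dogs.",
]

# (2) Key-value attributes: a sparse structured profile (as in WD-PB).
persona_attributes = {
    "gender": "female",
    "location": "Beijing",
    "age": "25",
}

# (3) User ID and comment history: the persona is implicit in past posts.
persona_history = {
    "user_id": "u_10293",
    "comments": [
        "Just finished a 10k run, new personal best!",
        "Any tips for training in cold weather?",
    ],
}
```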
Most datasets employ descriptive sentences as the persona representation (Dinan et al., 2020; Mazaré et al., 2018; Smith et al., 2020; Zhong et al., 2020; Wu et al., 2021a; Xu et al., 2022a,b; Liu et al., 2022; Jang et al., 2022; Ahn et al., 2023; Kwon et al., 2023). For example, PersonaChat (Zhang et al., 2018a) contains 5 descriptive sentences for each speaker. These datasets were primarily built by recruiting annotators to chat based on given persona descriptions, thereby avoiding privacy concerns. Mazaré et al. (2018) extracted persona descriptions from Reddit using heuristic rules to gather a large dataset, an approach later adopted by Zhong et al. (2020) and Ahn et al. (2023).
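As an illustration of this extraction strategy, the following is a minimal sketch of rule-based persona-sentence filtering in the spirit of Mazaré et al. (2018). The rules and thresholds here are simplified assumptions; the original paper defines its own specific criteria.

```python
import re

# Matches standalone first-person markers ("I", "my"), case-insensitively.
FIRST_PERSON = re.compile(r"\b(i|my)\b", re.IGNORECASE)

def is_persona_sentence(sentence: str, min_words: int = 4, max_words: int = 20) -> bool:
    """Keep sentences that are short, declarative, and first-person.
    The length bounds are illustrative assumptions, not the paper's values."""
    words = sentence.split()
    if not (min_words <= len(words) <= max_words):
        return False
    return bool(FIRST_PERSON.search(sentence))

def extract_persona(comments: list[str]) -> list[str]:
    """Collect candidate persona sentences from a user's comment history."""
    persona = []
    for comment in comments:
        # Naive sentence splitting; a real pipeline would use a proper tokenizer.
        for sentence in re.split(r"(?<=[.!?])\s+", comment):
            if is_persona_sentence(sentence):
                persona.append(sentence.strip())
    return persona

print(extract_persona(["I love hiking in the mountains. The weather was bad today."]))
# -> ['I love hiking in the mountains.']
```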
Some datasets represent personal information using sparse key-value attributes (Qian et al., 2018; Zheng et al., 2019; Wu et al., 2021b; Gao et al., 2023). Table 2 shows examples of key-value attributes. For example, WD-PB (Qian et al., 2018) defines 6 attribute keys, such as gender, location, and age, with the corresponding values specifying each speaker's profile.
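One common way such sparse attributes are consumed by a dialogue model is to linearize them into the input context. The template below is a hypothetical illustration of this practice, not the procedure of WD-PB or any specific system.

```python
# Hypothetical linearization of key-value persona attributes into model input.
# The attribute keys mirror those described for WD-PB (gender, location, age);
# the template wording is an illustrative assumption.

def linearize_attributes(attributes: dict[str, str]) -> str:
    """Render a sparse key-value profile as a text prefix for a dialogue model."""
    parts = [f"{key}: {value}" for key, value in attributes.items() if value]
    return "speaker profile [" + "; ".join(parts) + "]"

profile = {"gender": "male", "location": "Shanghai", "age": "30"}
print(linearize_attributes(profile))
# -> speaker profile [gender: male; location: Shanghai; age: 30]
```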
Personalized dialogue generation is an open-domain task: human speakers may talk about whatever topics they like. However, our literature survey revealed that there are domain differences between datasets.