Personalized Dialogue Generation with Persona-Adaptive Attention
Persona-based dialogue systems aim to generate consistent responses based on the historical context and a predefined persona. Unlike conventional dialogue generation, persona-based dialogue needs to consider both the dialogue context and the persona, which poses a challenge for coherent training. Specifically, it requires a delicate balance of weights between the context and the persona. To achieve this, in this paper, we propose an effective framework with Persona-Adaptive Attention (PAA), which adaptively integrates the weights from the persona and context information via our designed attention. In addition, a dynamic masking mechanism is applied to the PAA to not only drop redundant information in the context and persona but also serve as a regularizer that alleviates overfitting.
One challenge in persona-based dialogue generation is that the related datasets are usually small. Since collecting persona-based dialogues requires crowd workers to chat with each other based on provided persona profiles, building such high-quality datasets is expensive and time-consuming, which in turn restricts their size. For example, the ConvAI2 dataset (Dinan et al. 2019) contains only 131k utterances with fewer than 5k unique personas, much smaller than open-domain dialogue datasets such as Pushshift.io Reddit (Baumgartner et al. 2020), which has roughly 1.2B utterances.
Another challenge is to choose the weights between the persona and the context. Unlike open-domain dialogue models that generate responses by considering the dialogue context alone, persona-based dialogue systems need to additionally take personalized background descriptions into account along with the dialogue context. The weights between context and persona should be dynamically adjusted by the dialogue system under different situations. For example, given the user utterance “How are you?”, the context-preferred answer is likely to be “I am fine.”, which is safe but bland. A persona-preferred answer, by contrast, would fuse persona information into the response, such as “I am spending time with my four sisters”. Under such circumstances, the persona-preferred answer is more informative and meaningful. On the other hand, the system sometimes needs to focus on the context to keep the conversation interactive and engaging. For instance, if the user says “I have two greyhounds. Their names are Tom and Jerry.”, the system should focus on the context and answer “That’s cute! How old are they?”, which encourages the user to keep chatting.
The above two scenarios show that the weights between the context and the persona should be adjusted dynamically, which is important for a dialogue model that aims to build long-term relationships with users.
• We propose PAA within an encoder-decoder framework. The framework models the persona and the context with two separate transformer encoders, whose representations are fused in the persona-prompted decoder by the proposed PAA mechanism (a minimal layout sketch follows this list).
• Extensive experiments on the ConvAI2 dataset show that the proposed model matches or outperforms strong baseline methods, achieving up to about a 30% improvement in terms of the perplexity metric.
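To make the two-encoder layout concrete, the following is a minimal PyTorch sketch of the data flow described in the first bullet. Module names, sizes, and the placeholder decoder are assumptions for illustration only; positional encodings and causal masks are omitted for brevity, and in the actual framework the standard cross-attention is replaced by the PAA fusion (a PAA-style layer is sketched after the next paragraph).

```python
import torch
import torch.nn as nn

class TwoStreamEncoderDecoder(nn.Module):
    """Illustrative skeleton: two encoders (persona, context) and a prompted decoder."""

    def __init__(self, vocab_size=32000, d_model=512, nhead=8, num_layers=6):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Two separate transformer encoders: one for persona sentences, one for dialogue context.
        self.persona_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True), num_layers)
        self.context_encoder = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead, batch_first=True), num_layers)
        # Placeholder decoder: the real framework replaces the single cross-attention
        # over one memory with the PAA fusion over the two memories.
        self.decoder = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model, nhead, batch_first=True), num_layers)
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, persona_ids, context_ids, response_ids):
        persona_h = self.persona_encoder(self.embed(persona_ids))
        context_h = self.context_encoder(self.embed(context_ids))
        # The persona is also prepended to the response as a prompt for the decoder.
        tgt = self.embed(torch.cat([persona_ids, response_ids], dim=1))
        memory = torch.cat([persona_h, context_h], dim=1)  # placeholder fusion only
        return self.lm_head(self.decoder(tgt, memory))
```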
To address the aforementioned second challenge, in this paper, we design the Persona-Adaptive Attention (PAA) to dynamically learn the weights of the persona and context information in the proposed framework. To enhance the persona information in the PAA, we prepend the persona to the decoder input as a prompt so that the weights can capture more persona-related information. To balance the context and persona information, the PAA takes the two cross-attention signals (over the persona and the context) together with the self-attention from the persona-prompted decoder to compute the weights for combining the latent representations of the context and the persona. Moreover, inspired by findings (Welleck et al. 2019; Cao et al. 2022) that not all context and persona information is useful for generating the response, we apply two dynamic masks to the weighted latent representations, which not only remove redundant information but also act as a regularizer in the PAA.
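The following is one plausible PyTorch realization of the fusion step described above. The exact gating formula, mask construction, and layer composition in the paper may differ; the names `PAALayer`, `gate`, and `mask_rate` are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class PAALayer(nn.Module):
    """Decoder sub-layer that adaptively balances persona and context cross-attention."""

    def __init__(self, d_model=512, nhead=8, mask_rate=0.2):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.persona_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        self.context_attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        # Produces a per-position weight in (0, 1) balancing the two streams.
        self.gate = nn.Linear(3 * d_model, 1)
        self.mask_rate = mask_rate  # assumed hyper-parameter for the dynamic masks
        self.norm = nn.LayerNorm(d_model)

    def forward(self, dec_h, persona_h, context_h):
        # Self-attention over the persona-prompted decoder states (causal mask omitted).
        s, _ = self.self_attn(dec_h, dec_h, dec_h)
        # Two cross-attentions: one over the persona encoder, one over the context encoder.
        p, _ = self.persona_attn(s, persona_h, persona_h)
        c, _ = self.context_attn(s, context_h, context_h)
        # Adaptive weight computed from the self- and cross-attention representations.
        alpha = torch.sigmoid(self.gate(torch.cat([s, p, c], dim=-1)))
        # Dynamic masks: during training, randomly drop part of each weighted stream,
        # removing redundant information and acting as a regularizer.
        if self.training:
            keep_p = (torch.rand_like(alpha) > self.mask_rate).float()
            keep_c = (torch.rand_like(alpha) > self.mask_rate).float()
        else:
            keep_p = keep_c = torch.ones_like(alpha)
        fused = keep_p * alpha * p + keep_c * (1.0 - alpha) * c
        return self.norm(dec_h + fused)
```

A full decoder layer would additionally include causal masking, feed-forward sublayers, and residual normalization around each attention; the sketch isolates only the adaptive weighting and the dynamic masking.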
As shown in Table 5, performance continues to increase significantly as models are given massive amounts of external data and large-scale parameters, but these gains come at the cost of considerable hardware and human resources. Our model is much smaller than the large-scale language model fine-tuned on the ConvAI2 dataset (254M vs. 2.7B parameters), while remaining comparable to the best results obtained by the large models.