From Persona to Person: Enhancing the Naturalness with Multiple Discourse Relations Graph Learning in Personalized Dialogue Generation
Abstract. In dialogue generation, the naturalness of responses is crucial for effective human-machine interaction. Personalized response generation poses even greater challenges, as responses must remain coherent with the dialogue context and consistent with the user’s personal traits or persona descriptions. We propose MUDI (Multiple Discourse Relations Graph Learning) for personalized dialogue generation. We utilize a Large Language Model to assist in annotating discourse relations and to transform dialogue data into structured dialogue graphs. Our graph encoder, the proposed DialogueGAT model, then captures implicit discourse relations within this structure, along with persona descriptions. During the personalized response generation phase, novel coherence-aware attention strategies enhance the decoder’s consideration of discourse relations. Our experiments demonstrate significant improvements in the quality of personalized responses, which more closely resemble human dialogue exchanges.
A significant drawback of traditional dialogue systems is their limited ability to personalize responses based on specific user characteristics or preferences. This limitation often results in generic and less engaging interactions that fail to meet individual user needs effectively [9]. Previous work [18] defined this problem as the naturalness issue of dialogue systems. One effective solution for enhancing the naturalness of dialogue systems is to integrate personality into the agents, referred to as a "persona". Typically, a persona comprises several sentences describing the interlocutor’s facts or background. This information is crucial for building a trustworthy and confident dialogue system. By endowing chatbot agents with human-like traits, the interactions become more realistic. Given these benefits, Personalized Dialogue Generation has emerged as a prominent research topic in recent years, focusing on improving user engagement and satisfaction within dialogue systems. This surge in interest is largely driven by the availability of large-scale personalized dialogue datasets, such as those from Zhang et al. [22] and Dinan et al. [5]. These datasets have significantly advanced efforts to enhance persona consistency and context understanding in generated responses. Innovative methods, such as that of Liu et al. [14], have concentrated on improving dialogue-system consistency by modeling interlocutor understanding [8,17].
Despite advances, challenges remain in enhancing engagement, coherence, and persona consistency. The focus has been predominantly on the trade-offs between persona consistency and discourse coherence. These challenges are primarily twofold. Firstly, many existing methods rely on sophisticated structures or external natural language inference (NLI) datasets to learn persona consistency. This approach, while effective, can sometimes lead the model to overly prioritize persona information at the expense of neglecting the broader dialogue context.
Secondly, many dialogue-generation models assume that fluency alone can measure a dialogue’s coherence and fail to adequately consider the importance of discourse relations. Discourse coherence, which concerns how utterances are interconnected and how the dialogue is organized to effectively convey information, is essential for effective conversation. Discourse coherence can be divided into local and global coherence. Local coherence refers to the logical connections between adjacent sentences, ensuring that they relate to each other and form a coherent sequence. Global coherence, on the other hand, extends beyond immediate sentence pairs to encompass higher-level relationships across the entire dialogue. This macro-linguistic capability allows conversational agents to maintain topic consistency and effectively convey meaning throughout an interaction. Poor global coherence can significantly impair the user’s understanding of the discourse as a cohesive whole. As illustrated in Figure 1, the dialogue demonstrates various common issues encountered in personalized dialogue systems, including local and global incoherence as well as persona inconsistency. This study focuses on improving the generation of responses that are coherent with the context and consistent with the persona, thus enhancing the naturalness of personalized dialogue generation. Our method is suitable for applications like customer service or healthcare assistants, where maintaining coherence and persona consistency is crucial for user trust.
The introduction of the PersonaChat dataset [22] marked a significant milestone in dialogue generation, and Dinan et al. [5] expanded it into the ConvAI2 dataset, which serves as a key benchmark for persona-based dialogue tasks. Jang et al. further enriched this domain by proposing the FoCus dataset, which incorporates background knowledge alongside persona traits. Before the advent of large personalized dialogue datasets [22], researchers explored diversifying generated responses by incorporating speaker information into models through learned speaker embeddings.
For persona-related approaches, Zhang et al. employed LSTMs to fuse persona with contextual information [22]. TransferTransfo fine-tuned GPT-2 with concatenated persona and dialogue inputs [19], while BoB utilized three BERT models and the MNLI dataset to improve response relevance and consistency [17]. P2BOT introduced a unique architecture that enhances mutual persona perception [14]. Despite these advancements, maintaining persona consistency alongside coherence in responses remains a challenge. A few studies, such as LMEDR, have attempted to address this by leveraging entailment and latent memory for discourse understanding [4], yet their effectiveness in improving coherence is limited. Therefore, while current methods align responses with personas, there is substantial scope for improving discourse coherence evaluation.
The goal is to estimate the probability distribution p(r|C, P), which incorporates persona information and dialogue history. An ideal personalized response should be natural and consistent with the persona. To ensure coherence, we incorporate the discourse relations in dialogue. With specific response types T = {t1, t2, ..., t|T|} identified, we extend the objective to p(r|C, P, T).
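Assuming a standard left-to-right autoregressive decoder (a conventional factorization we state for clarity, not a detail given above), the extended objective decomposes over response tokens r = (r_1, ..., r_{|r|}) as:

```latex
p(r \mid C, P, T) \;=\; \prod_{i=1}^{|r|} p\left(r_i \,\middle|\, r_{<i},\, C,\, P,\, T\right)
```

Each token is thus conditioned jointly on the dialogue history C, the persona P, and the identified response types T.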
3.2 Discourse Coherence Learning

Coherence Relations Annotation
To facilitate the model’s understanding of coherence, we employ LLaMA-3-70B to assist in annotating coherence relations. STAC [1] proposes 16 discourse relations; to these, we add a topic-shift relation to represent coherent topic transitions between conversations. Each pair of utterances can be annotated with up to three different relations.

Dialogue Graph Modeling

To capture discourse coherence in conversations for response generation, inspired by prior graph-based discourse modeling [6,7], we use a graph encoder to learn the interactive relationships between discourses.
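As a concrete illustration of the two steps above, LLM-assisted annotation followed by graph construction, consider the sketch below. The relation inventory listing, the data layout, and the helper names (`parse_relations`, `build_dialogue_graph`) are our own assumptions for illustration; the exact prompt and output format used with LLaMA-3-70B are not specified here.

```python
# Sketch: turn LLM-annotated utterance pairs into a typed dialogue graph.
# Relation names and data layout are illustrative assumptions.

# 16 STAC relations plus the added Topic-shift (subset listed for brevity).
RELATIONS = {"Acknowledgement", "Comment", "Continuation", "Contrast",
             "Elaboration", "Explanation", "Question-answer_pair",
             "Result", "Topic-shift"}

MAX_RELATIONS_PER_PAIR = 3  # each utterance pair gets up to three labels

def parse_relations(llm_output: str) -> list[str]:
    """Keep only known relation labels from a comma-separated LLM reply."""
    labels = [tok.strip() for tok in llm_output.split(",")]
    return [l for l in labels if l in RELATIONS][:MAX_RELATIONS_PER_PAIR]

def build_dialogue_graph(utterances, annotations):
    """annotations maps utterance-index pairs (i, j) to raw LLM output.
    Returns node indices plus typed edges (i, j, relation)."""
    edges = []
    for (i, j), raw in annotations.items():
        for rel in parse_relations(raw):
            edges.append((i, j, rel))
    return {"nodes": list(range(len(utterances))), "edges": edges}

graph = build_dialogue_graph(
    ["I love hiking.", "Me too! Which trails?", "Mostly coastal ones."],
    {(0, 1): "Comment, Question-answer_pair",
     (1, 2): "Question-answer_pair, Foo"},  # unknown labels are dropped
)
```

Capping at three relations per pair and discarding labels outside the inventory mirrors the constraint stated above while guarding against free-form LLM output.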
To account for sentence-level semantics, we use Sentence-BERT as an encoder to extract contextualized global semantics of utterances and personas, thereby initializing the node features. Existing GNN models are not designed to fully capture dialogue structure and long-term interactions. To overcome this, we enhance GATv2 [3] with two key features: Order information and Turn information, integrated via attention mechanisms in our proposed DialogueGAT.
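To make the idea concrete, the following NumPy sketch shows a GATv2-style attention score augmented with order and turn signals. This is a deliberate simplification, not the paper's exact parameterization: we append a scalar order distance and a same-turn flag to the concatenated node features, and the weights are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)

def leaky_relu(x, alpha=0.2):
    return np.where(x > 0, x, alpha * x)

def dialogue_gat_scores(h, order, turn, W, a):
    """GATv2-style attention over all node pairs, with order/turn
    signals appended to the concatenated features (our simplification)."""
    n = h.shape[0]
    scores = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            # Feature vector: [h_i || h_j || order distance || same-turn flag]
            feat = np.concatenate([
                h[i], h[j],
                [abs(order[i] - order[j])],   # Order information
                [float(turn[i] == turn[j])],  # Turn information
            ])
            scores[i, j] = a @ leaky_relu(W @ feat)
    # Row-wise softmax yields attention weights over neighbors.
    ex = np.exp(scores - scores.max(axis=1, keepdims=True))
    return ex / ex.sum(axis=1, keepdims=True)

# Toy example: 4 utterance nodes with 8-dim Sentence-BERT-like features.
h = rng.normal(size=(4, 8))
order = np.array([0, 1, 2, 3])      # position of each utterance
turn = np.array([0, 0, 1, 1])       # speaker turn of each utterance
W = rng.normal(size=(16, 2 * 8 + 2))
a = rng.normal(size=16)
alpha = dialogue_gat_scores(h, order, turn, W, a)
```

Applying the score function after (rather than before) the LeakyReLU is what distinguishes GATv2's dynamic attention from the original GAT.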
To model the sequential nature of dialogues, we introduce auxiliary edges that connect each utterance to its k-hop neighboring utterances based on their order.
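The auxiliary-edge construction above can be sketched as follows; the directed-edge representation and the default `k = 2` are our own assumptions for illustration.

```python
def khop_auxiliary_edges(num_utterances: int, k: int = 2) -> list[tuple[int, int]]:
    """Connect each utterance node to its k-hop successors in dialogue
    order, returning directed (earlier -> later) edge pairs."""
    edges = []
    for i in range(num_utterances):
        for hop in range(1, k + 1):
            j = i + hop
            if j < num_utterances:
                edges.append((i, j))
    return edges

# 5 utterances, k=2: each node links to the next one and two utterances.
print(khop_auxiliary_edges(5, k=2))
# [(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (2, 4), (3, 4)]
```

These edges supplement the relation-typed edges, letting messages flow along the temporal order of the dialogue even where no discourse relation was annotated.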
This indicates that our approach enables the model to generate responses with enhanced local coherence. Furthermore, MUDI achieves excellent results in global coherence, which evaluates the coherence between the entire dialogue context and the response (right-side scores).
For the personalization evaluation, PAA achieves significantly higher Personalization scores than the other methods. Upon further examination, we found that this is because PAA frequently generates sentences that are exact restatements of the persona description, often ignoring their relevance to the query.