Cue-CoT: Chain-of-thought Prompting for Responding to In-depth Dialogue Questions with LLMs
However, most previous works prompt LLMs to generate a response directly from the dialogue context, overlooking the underlying linguistic cues about the user's status exhibited in that context. Such in-depth dialogue scenarios are challenging for existing LLMs, which struggle to figure out the user's hidden needs and respond satisfactorily in a single inference step. To this end, we propose a novel linguistic cue-based chain-of-thoughts framework (Cue-CoT), which enhances LLM inference with an intermediate reasoning step that identifies cues exhibited in the dialogue, aiming to provide a more personalized and engaging response. To evaluate the approach, we build a benchmark
we design a linguistic cue-based chain-of-thoughts framework (Cue-CoT) consisting of two variants, O-Cue CoT and M-Cue CoT: the former outputs intermediate reasoning results together with the final response in a single step, while the latter reasons step by step, as shown in Figure 1.
we can prompt the LLMs to generate the user status and the final response simultaneously given the dialogue context, forcing the LLMs to reason based on the user status. However, it is important to note that generating intermediate reasoning results together with the response may reduce the length of each output, particularly when multiple or complex reasoning results are involved, sacrificing detail and explanation.
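The contrast between the two variants can be sketched as two prompting routines, assuming a hypothetical `llm(prompt)` completion function; the prompt wording below is illustrative, not the paper's exact templates:

```python
def llm(prompt: str) -> str:
    # Placeholder for a real LLM API call; returns a canned string
    # so this sketch is self-contained and runnable.
    return f"<completion for: {prompt[:40]}...>"

def o_cue_cot(context: str) -> str:
    """O-Cue CoT: one call produces the inferred user status and the
    response together, so both share a single output budget."""
    prompt = (
        "Dialogue context:\n" + context +
        "\nFirst infer the user's status exhibited in the context, "
        "then write a response based on that status."
    )
    return llm(prompt)

def m_cue_cot(context: str) -> str:
    """M-Cue CoT: reason step by step -- first infer the user status,
    then condition the response on that intermediate result."""
    status = llm(
        "Dialogue context:\n" + context +
        "\nDescribe the user's status exhibited in the context."
    )
    response = llm(
        "Dialogue context:\n" + context +
        "\nUser status:\n" + status +
        "\nWrite a personalized and engaging response."
    )
    return response
```

Because M-Cue CoT devotes a full generation to the status before generating the response, neither output competes with the other for length.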
For the former, we first identify, for each dialogue sample in the test set, all system utterances labeled as empathic comfort. From these, the longest utterance is chosen as the ground-truth response, and the preceding utterances serve as the corresponding dialogue context. This approach ensures fairness and comparability in evaluating the performance of LLMs, particularly because they tend to generate lengthy responses. For ED, there are two roles in the dialogue: a Listener, who is actively listening, and a Speaker, who is speaking and conveying information. We follow the setting of the original paper (Rashkin et al., 2019) and directly use all samples in the test set. Neither the situation description written by the Speaker nor the emotion label is included (just as they were not given to the Listener during dialogue collection). Thus, the collected empathetic dialogue datasets provide a standard benchmark for evaluating the
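The selection procedure above can be sketched as follows; the dialogue representation and key names (`role`, `text`, `label`) are assumptions for illustration, not the benchmark's actual schema:

```python
def build_sample(dialogue):
    """Pick the longest system utterance labeled 'empathic comfort' as the
    ground-truth response; all preceding utterances form the context.

    `dialogue` is a list of turn dicts with hypothetical keys:
    role ('user' or 'system'), text, and label.
    """
    # Indices of all system turns labeled as empathic comfort.
    candidates = [
        i for i, turn in enumerate(dialogue)
        if turn["role"] == "system" and turn["label"] == "empathic comfort"
    ]
    if not candidates:
        return None  # no usable ground truth in this dialogue

    # Choose the longest such utterance for length-fair comparison with
    # typically verbose LLM outputs.
    best = max(candidates, key=lambda i: len(dialogue[i]["text"]))
    return {
        "context": dialogue[:best],        # preceding utterances
        "response": dialogue[best]["text"], # ground-truth response
    }
```

Choosing the longest labeled utterance keeps the human reference comparable in length to model outputs, reducing a trivial length bias in evaluation.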