Synthetic Dialogue Dataset Generation using LLM Agents

Paper · arXiv 2401.17461 · Published January 30, 2024

Linear programming (LP) problems are pervasive in real-life applications. However, despite their apparent simplicity, an untrained user may find it difficult to determine the linear model of their specific problem. We envisage the creation of a goal-oriented conversational agent that will engage in conversation with the user to elicit all information required so that a subsequent agent can generate the linear model. In this paper, we present an approach for the generation of sample dialogues that can be used to develop and train such a conversational agent. Using prompt engineering, we develop two agents that “talk” to each other, one acting as the conversational agent, and the other acting as the user.

Many individuals, particularly those without specialized mathematical backgrounds, often struggle to formulate the appropriate linear models for their specific problem instances. This barrier hinders the broader utilization of LP techniques, especially among non-experts.

We envisage a goal-oriented conversational agent capable of assisting users in constructing accurate linear models for their unique problem scenarios. This conversational agent would engage users in a dialogue, eliciting relevant information pertaining to the problem, and subsequently generate the corresponding linear model. This paper focuses on an essential aspect of creating such an agent: the generation of synthetic dialogues that can be employed to train and evaluate the conversational agent’s performance.

Our methodology leverages prompt engineering to construct two distinct agents: one simulating the conversational agent’s behavior, and the other emulating the user’s responses during problem-solving interactions. The agents are designed to engage in purposeful dialogues aimed at extracting the necessary information from the user to construct a valid linear model. To facilitate this process, we utilize a set of text descriptions of linear problems, accessible only to the user agent, sourced from the NL4Opt dataset (Ramamonjison et al., 2022, 2023). These text descriptions serve as the basis for the dialogues and enable the conversational agent to iteratively gather the critical information required for problem formulation.
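The two-agent setup described above can be sketched as a simple turn-taking loop. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names, the `END_TOKEN` marker, the turn limit, and the scripted stand-ins for the two LLM-backed agents are all hypothetical, and the problem description is a made-up placeholder rather than an NL4Opt entry. The one property the sketch does preserve from the text is the information asymmetry: only the user agent receives the problem description, while the conversational agent sees nothing but the dialogue history.

```python
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (speaker, utterance)

# Hypothetical marker the conversational agent emits when it has enough information.
END_TOKEN = "<END>"


def generate_dialogue(
    assistant_reply: Callable[[List[Turn]], str],
    user_reply: Callable[[str, List[Turn]], str],
    problem_description: str,
    max_turns: int = 10,
) -> List[Turn]:
    """Alternate between the two agents until the assistant ends the dialogue.

    The assistant is called with the history only; the user agent is called
    with the problem description as well as the history.
    """
    history: List[Turn] = []
    for _ in range(max_turns):
        question = assistant_reply(history)
        history.append(("assistant", question))
        if END_TOKEN in question:
            break
        answer = user_reply(problem_description, history)
        history.append(("user", answer))
    return history


# Scripted stand-ins for the prompt-engineered agents (assumption: in practice
# each would wrap a call to a language model with its own prompt).
QUESTIONS = [
    "What are you trying to maximize or minimize?",
    "Which quantities can you decide on?",
    "What limits or requirements apply to those quantities?",
]


def scripted_assistant(history: List[Turn]) -> str:
    # Count how many questions have already been asked and pick the next one.
    asked = sum(1 for speaker, _ in history if speaker == "assistant")
    if asked < len(QUESTIONS):
        return QUESTIONS[asked]
    return "Thank you, I have everything I need. " + END_TOKEN


def scripted_user(problem_description: str, history: List[Turn]) -> str:
    # A real user agent would answer the last question from the description.
    last_question = history[-1][1]
    return f"Answering '{last_question}' based on: {problem_description}"


if __name__ == "__main__":
    # Made-up placeholder description, used only to exercise the loop.
    description = "A bakery chooses how many loaves and cakes to bake each day."
    for speaker, utterance in generate_dialogue(
        scripted_assistant, scripted_user, description
    ):
        print(f"{speaker}: {utterance}")
```

Passing the agents in as plain callables keeps the loop independent of any particular model or API, so the scripted stand-ins can be replaced by model-backed functions without changing the dialogue logic.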