ToolFlow: Boosting LLM Tool-Calling Through Natural and Coherent Dialogue Synthesis
Supervised fine-tuning (SFT) is a common method for enhancing the tool-calling capabilities of Large Language Models (LLMs), with the training data often being synthesized. The current data synthesis process generally involves sampling a set of tools, formulating a requirement based on these tools, and generating the call statements. However, randomly sampled tools lack relevance, making them difficult to combine and thus reducing the diversity of the data. Additionally, current work overlooks the coherence between dialogue turns, leading to a gap between the synthesized data and real-world scenarios. To address these issues, we propose a Graph-based Sampling strategy to sample more relevant tool combinations, and a Planned-generation strategy to create plans that guide the synthesis of coherent dialogues.
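The graph-based sampling idea can be sketched as follows. This is a minimal illustration, not the paper's implementation: the toy graph, its relevance criterion (here, assumed input/output compatibility between tools), and the random-walk expansion are all hypothetical.

```python
import random

# Hypothetical toy tool graph: nodes are tool names, and an edge links two
# tools assumed to be relevant to each other (e.g., one tool's output can
# feed another's input). The actual relevance criterion is an assumption here.
TOOL_GRAPH = {
    "search_flights": ["book_flight", "get_weather"],
    "book_flight": ["search_flights", "send_email"],
    "get_weather": ["search_flights"],
    "send_email": ["book_flight"],
}

def sample_tool_combo(graph, k, seed=None):
    """Sample k tools that form a connected subgraph, so every sampled
    tool is relevant to at least one other tool in the combination."""
    rng = random.Random(seed)
    combo = [rng.choice(sorted(graph))]
    while len(combo) < k:
        # Frontier: unvisited neighbors of any tool already in the combo.
        frontier = [n for t in combo for n in graph[t] if n not in combo]
        if not frontier:  # the connected component is exhausted
            break
        combo.append(rng.choice(frontier))
    return combo
```

In contrast to uniform random sampling over the whole tool pool, every tool added here shares an edge with one already selected, so the resulting combination can plausibly support a compound requirement.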
A typical tool-calling data synthesis process involves three steps: (1) selecting candidate tool(s), (2) generating requirements based on those tools, and (3) creating the call statements (Tang et al., 2023; Liu et al., 2024b). However, the data synthesized through this method often lacks realism and naturalness. Randomly sampled tools frequently fail to interconnect, making it difficult to combine them for complex tasks. Consequently, the requirements for subsequent synthesis tend to be simplistic, which reduces the diversity and complexity of the data. Furthermore, much of the existing research focuses solely on generating single-turn tool-calling instructions, neglecting the coherence between dialogue turns (Qin et al., 2023; Yang et al., 2023). In real-world interactions, LLMs typically engage with users through dialogues rather than single-round Q&A sessions. This creates a gap between Q&A-type training data and its practical application, ultimately diminishing the naturalness of the synthesized data.