Agentic and Multi-Agent Systems

Why does random tool sampling produce unrealistic synthetic training data?

Tool-calling datasets generated through random sampling and single-turn framing lack the complexity and coherence of real deployment. This note explores which structural choices in data synthesis determine whether models can learn realistic tool composition.

Note · 2026-05-03 · sourced from Action Models

The standard pipeline for generating tool-calling training data — sample tools, formulate a requirement, generate the call statement — has two defects that together cap the realism of the resulting data. First, randomly sampled tools frequently fail to interconnect, which means the synthesized requirements default to simplistic single-tool tasks because there is no plausible composition path across the random set. This collapses both diversity and complexity in the resulting dataset.
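To make the failure mode concrete, here is a minimal sketch of the standard pipeline's sampling step. The five-tool registry, the entity types, and the output-feeds-input test are all hypothetical illustrations, not ToolFlow's actual schema or code.

```python
import random

# Hypothetical tool registry (not from the source): each tool lists the
# entity types it consumes ("in") and produces ("out").
TOOLS = {
    "search_flights":   {"in": {"city", "date"}, "out": {"flight_id"}},
    "book_flight":      {"in": {"flight_id"},    "out": {"booking_id"}},
    "get_weather":      {"in": {"city", "date"}, "out": {"forecast"}},
    "convert_currency": {"in": {"amount"},       "out": {"amount"}},
    "send_email":       {"in": {"text"},         "out": {"status"}},
}

def random_sample(k=3):
    # Step 1 of the standard pipeline: draw k tools uniformly at random.
    return random.sample(list(TOOLS), k)

def has_composition_path(tool_names):
    # Crude proxy for "these tools interconnect": some tool's output
    # type overlaps another tool's input type.
    return any(
        TOOLS[a]["out"] & TOOLS[b]["in"]
        for a in tool_names for b in tool_names if a != b
    )

# Many random draws admit no composition at all, so the downstream
# "formulate a requirement" step falls back to single-tool tasks.
draws = [random_sample() for _ in range(1000)]
print(sum(has_composition_path(d) for d in draws), "of 1000 draws compose")
```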

Second, the dominant framing treats tool calls as single-turn Q&A rather than dialogue. Real users interact through multi-turn conversation, so models trained on Q&A-shaped data face a distribution gap at deployment that surfaces as unnatural behavior across turns.
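The contrast between the two data shapes, sketched with hypothetical fields and tool names for illustration only:

```python
# Single-turn Q&A framing: one requirement, one call, no conversational state.
single_turn_sample = {
    "query": "Find flights to Denver on June 3",
    "call": {"tool": "search_flights", "args": {"city": "Denver", "date": "2026-06-03"}},
}

# Multi-turn framing: the requirement unfolds across turns, and later calls
# depend on values surfaced earlier in the dialogue.
multi_turn_sample = {
    "turns": [
        {"role": "user", "text": "I need to get to Denver on June 3."},
        {"role": "assistant", "call": {"tool": "search_flights",
                                       "args": {"city": "Denver", "date": "2026-06-03"}}},
        {"role": "user", "text": "Book the cheapest one."},
        {"role": "assistant", "call": {"tool": "book_flight",
                                       "args": {"flight_id": "<from previous result>"}}},
    ],
}
```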

ToolFlow's response is two-part. Graph-Based Sampling selects tools that are actually relevant to each other, so a synthesized requirement can credibly combine them, lifting the complexity ceiling that random sampling imposes. Planned-Generation creates a plan that guides the dialogue across turns, so coherence between turns becomes a property of the generation process rather than an accident.
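A rough sketch of what these two moves could look like, assuming the hypothetical registry above and output/input overlap as the relevance edge; ToolFlow's actual relevance criterion and planning prompt may differ.

```python
import random
from collections import defaultdict

def build_tool_graph(tools):
    # Relevance edge (assumption): one tool's output type overlaps another's
    # input type. A real system might also use embedding similarity.
    graph = defaultdict(set)
    for a in tools:
        for b in tools:
            if a != b and tools[a]["out"] & tools[b]["in"]:
                graph[a].add(b)
                graph[b].add(a)
    return graph

def graph_sample(graph, k=3):
    # Walk the relevance graph so every new tool connects to the set so far,
    # in contrast to uniform random sampling over the whole registry.
    start = random.choice(list(graph))
    sampled, frontier = {start}, set(graph[start])
    while len(sampled) < k and frontier:
        nxt = random.choice(sorted(frontier))
        sampled.add(nxt)
        frontier = (frontier | graph[nxt]) - sampled
    return sorted(sampled)

def draft_plan(tool_set):
    # Planned-Generation stub: one intended call per turn, giving the dialogue
    # generator a cross-turn thread to follow instead of improvising each turn.
    return [{"turn": i + 1, "intended_call": t} for i, t in enumerate(tool_set)]

# Usage (with the hypothetical TOOLS registry from the earlier sketch):
#   graph = build_tool_graph(TOOLS)
#   tool_set = graph_sample(graph, k=2)   # e.g. ['book_flight', 'search_flights']
#   plan = draft_plan(tool_set)
```

Because every tool in the sampled set is reachable from the others in the relevance graph, the requirement-formulation step can always find a plausible composition, and the per-turn plan gives the dialogue generator something to stay coherent against.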

The implication for anyone synthesizing agent training data: the choice of how tools are sampled is not a hyperparameter but a structural determinant of how complex the synthesized tasks can be. And single-turn framing is not just simpler — it is a different distribution from real deployment, which is multi-turn and coherent across turns.

This is the data-side counterpart to the deployment-side critique in "Where do traditional function calling systems actually break down?": random sampling at synthesis produces simplistic tasks, which (combined with single-turn framing) yields models that fail to compose calls across turns. ToolFlow's graph-sampling move parallels "Can synthetic dialogues become realistic through layered diversity?": multiplicative structured sampling beats single-axis random sampling for dialogue synthesis generally.


Source: Action Models

Original note: tool-calling data synthesis fails through random tool sampling and single-turn framing — graph-based sampling and planned dialogue restore realism