MCP-Zero: Proactive Toolchain Construction for LLM Agents from Scratch
Function-calling has enabled large language models (LLMs) to act as tool-using agents, but injecting thousands of tool schemas into the prompt is costly and error-prone. We introduce MCP-Zero, a proactive agent framework that lets the LLM itself decide when and which external tools to retrieve, thereby assembling a task-specific toolchain from scratch. The framework is built upon three components: (1) Proactive Tool Request, where the model emits a structured ⟨tool assistant⟩ block that explicitly specifies the desired server and task; (2) Hierarchical Vector Routing, a coarse-to-fine retrieval algorithm that first selects candidate servers and then ranks tools within each server by semantic similarity; (3) Iterative Proactive Invocation, which enables multi-round, cross-domain toolchain construction with minimal context overhead and allows the model to iteratively revise its request when the returned tools are insufficient.
The rapid advancement of large language models (LLMs) has catalyzed a paradigm shift from pure text understanding and generation to sophisticated tool-using agents capable of interacting with external systems [1, 8, 30]. With the introduction of function-calling mechanisms, LLMs have transcended the boundaries of their parametric knowledge, leveraging external tools, third-party APIs, and code execution environments to accomplish complex reasoning chains and real-world tasks [14, 18, 21, 26]. However, as the ecosystem of available tools continues to expand rapidly, the conventional approach of injecting comprehensive tool schemas into the system prompt has emerged as a critical bottleneck (Figure 1a). Mainstream practice embeds the complete JSON Schemas of all available tools in the system prompt, leading to substantial context overhead, as shown in Figure 2. For instance, the GitHub MCP server encompasses 26 tools requiring over 4,600 tokens, substantially compressing the context window available for actual task content.
To mitigate context window constraints, recent approaches have adopted retrieval-based strategies that match and inject only the most relevant tool based on semantic similarity with the user query [7, 15]. However, these methods face significant limitations on complex, multi-step tasks that require coordination across multiple domains (Figure 1b). For example, a query like “Debug the file” requires coordinating tools from different domains: filesystem access to read the file, code generation to apply updates, and command execution to run the debugger. Single-tool retrieval fails to capture such workflows, because the initial query alone cannot determine all required tools across domains. In summary, while retrieval methods are effective in single-step, single-domain scenarios, they exhibit fundamental limitations in realistic multi-turn, multi-domain agent environments: (1) passive retrieval: external systems select tools based on the initial query rather than allowing the model to actively express its evolving needs; (2) semantic misalignment: colloquial user inputs and formal API documentation form distributional mismatches that reduce retrieval precision; and (3) single-round invocation: tool retrieval is performed only once per query, failing to accommodate the progressive refinement of subtask requirements or iterative correction when the initially retrieved tools prove inadequate.
Modern LLMs possess powerful capabilities in chain-of-thought reasoning, self-reflection, and planning. Rather than having external systems passively select tools based on initial queries, we propose letting the model proactively analyze the context, identify capability gaps, and request tools when external assistance is needed. Based on this insight, we introduce MCP-Zero (Figure 1c), a proactive retrieval framework with the following key components:
Proactive Tool Request. Unlike traditional approaches that passively wait for external retrieval, we return the authority of tool requirement specification to the LLM itself. When external tool assistance is needed, the model proactively generates structured tool request declarations in the following format:
<tool assistant>
server: ... # Platform/permission domain
tool: ... # Operation type + target
</tool assistant>
This mechanism enables spontaneous expression of needs while keeping requests semantically consistent with tool documentation, avoiding the colloquial-query mismatch described above and improving retrieval quality.
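To make the format concrete, the following sketch extracts such request blocks from a model response. The regular expression, the ToolRequest dataclass, and the function name are our illustrative choices for the block format shown above, not part of a fixed MCP-Zero interface.

```python
import re
from dataclasses import dataclass

@dataclass
class ToolRequest:
    server: str  # desired platform/permission domain, e.g. "github"
    tool: str    # desired operation + target, e.g. "create issue in repository"

# Matches the <tool assistant> block shown above; inline "#" comments
# after each field are tolerated and stripped.
BLOCK_RE = re.compile(
    r"<tool assistant>\s*"
    r"server:\s*(?P<server>[^\n#]+)(?:#[^\n]*)?\n"
    r"\s*tool:\s*(?P<tool>[^\n#]+)(?:#[^\n]*)?\s*"
    r"</tool assistant>"
)

def extract_tool_requests(model_output: str) -> list[ToolRequest]:
    """Scan a model response for proactive tool request blocks."""
    return [
        ToolRequest(server=m["server"].strip(), tool=m["tool"].strip())
        for m in BLOCK_RE.finditer(model_output)
    ]
```

Each extracted request is then embedded and routed against the tool index, as described next.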
Hierarchical Vector Routing. The system employs two-stage retrieval: it first filters candidate servers by platform requirements, then matches specific tools within the selected servers. Only the top-k tool descriptions are returned, significantly reducing context overhead.
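A minimal sketch of this coarse-to-fine routing, assuming pre-computed embeddings: server_embs holds one embedding per server summary, and tools_per_server maps each server index to (tool embeddings, tool schemas). These names and the data layout are placeholders, not MCP-Zero's actual implementation.

```python
import numpy as np

def cosine_top_k(query_vec: np.ndarray, matrix: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k rows of `matrix` most similar to `query_vec`."""
    sims = matrix @ query_vec / (
        np.linalg.norm(matrix, axis=1) * np.linalg.norm(query_vec) + 1e-9
    )
    return np.argsort(-sims)[:k]

def route(server_vec, tool_vec, server_embs, tools_per_server,
          k_servers=3, k_tools=5):
    """Stage 1: filter candidate servers; stage 2: rank tools within them."""
    results = []
    for s in cosine_top_k(server_vec, server_embs, k_servers):
        tool_embs, tool_schemas = tools_per_server[s]
        for t in cosine_top_k(tool_vec, tool_embs, k_tools):
            results.append(tool_schemas[t])
    return results  # only these top-k schemas are injected into context
```

In this sketch, the server and tool fields of the request are embedded separately, so each stage compares like with like: server summaries against the server field, tool documentation against the tool field.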
Iterative Proactive Invocation. The model can initiate multiple tool requests throughout a conversation, one for each subtask, enabling cross-server toolchain construction from scratch while introducing only the necessary tools at each step. If the returned tools are insufficient, the model can refine its request and re-initiate retrieval, providing fault tolerance and self-correction.
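Putting the pieces together, an agent loop in the spirit of this design might look like the sketch below; llm and retrieve are stand-ins for a chat-model call and the routing sketch above, and the message-injection details are our assumptions rather than the paper's exact implementation.

```python
def run_agent(task: str, llm, retrieve, max_rounds: int = 8) -> str:
    """Iteratively let the model request tools and build its toolchain.

    `llm` maps a message list to a text reply; `retrieve` maps a
    ToolRequest to a short list of tool schema strings.
    """
    messages = [{"role": "user", "content": task}]
    reply = ""
    for _ in range(max_rounds):
        reply = llm(messages)
        messages.append({"role": "assistant", "content": reply})
        requests = extract_tool_requests(reply)  # parser from the first sketch
        if not requests:
            break  # no capability gap expressed; the model proceeds or finishes
        for req in requests:
            schemas = retrieve(req)  # only the top-k schemas enter the context
            messages.append({
                "role": "user",
                "content": "Retrieved tools:\n" + "\n".join(schemas),
            })
        # If the retrieved tools prove insufficient, the next reply can emit
        # a revised <tool assistant> block, re-triggering retrieval.
    return reply
```

Each round adds only the few schemas the model asked for, so the context grows with the task's actual needs rather than with the full tool catalog.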