HierSearch: A Hierarchical Enterprise Deep Search Framework Integrating Local and Web Searches

Paper · arXiv 2508.08088 · Published August 11, 2025

Recently, large reasoning models (LRMs) have demonstrated strong mathematical and coding abilities, and deep search leverages their reasoning capabilities in challenging information retrieval tasks. Existing deep search works are generally limited to a single knowledge source, either local or the Web. However, enterprises often require private deep search systems that can leverage search tools over both local corpora and the Web. Simply training an agent equipped with multiple search tools using flat reinforcement learning (RL) is a straightforward idea, but it suffers from low training-data efficiency and poor mastery of complex tools. To address these issues, we propose HierSearch, a hierarchical agentic deep search framework trained with hierarchical RL. At the low level, a local deep search agent and a Web deep search agent are trained to retrieve evidence from their corresponding domains. At the high level, a planner agent coordinates the low-level agents and provides the final answer. Moreover, to prevent direct answer copying and error propagation, we design a knowledge refiner that filters out hallucinations and irrelevant evidence returned by the low-level agents. Experiments show that HierSearch outperforms flat RL as well as various deep search and multi-source retrieval-augmented generation baselines on six benchmarks spanning the general, finance, and medical domains.

Existing deep search works typically equip LRMs with either a local corpus search tool (Chen et al. 2025; DeepSeek-AI et al. 2025; Song et al. 2025) or a Web search tool (Li et al. 2025a,b; Zheng et al. 2025). However, a common scenario for most enterprises is that their private deep search system must interact with both local and Web knowledge sources (Yu et al. 2025). Specifically, enterprises often possess private domain-specific documents, and existing methods for building private RAG systems usually process these documents into a text-chunk corpus and construct knowledge graphs over them (Edge et al. 2024; Guo et al. 2024; Zhao et al. 2025). Web knowledge sources generally include search engines and web pages. Broadly speaking, local knowledge sources are more professional and targeted, while Web knowledge sources are more comprehensive and timely (Zhao et al. 2024b; Wang et al. 2024a). This multi-source deep search scenario challenges existing methods: deep search agents need to select among knowledge sources based on the user question and each source's characteristics, and to cross-supplement knowledge missing from one source with another.

A straightforward solution to the above challenge is to equip a single deep search agent with the search tools of all knowledge sources and conduct flat reinforcement learning (RL). However, flat RL is unsuitable for the following reasons: (1) Numerous search tools produce a large action space during RL, leading to low training efficiency and instability. (2) Search tools within the same knowledge source have stronger synergy (e.g., browsing a Web page via a URL returned by a search engine, or retrieving text chunks that mention an entity found in the knowledge graph), while synergy between tools across different knowledge sources is weaker; flat RL fails to exploit this structure. (3) Preliminary experiments show that during flat RL, rewards encourage the agent to search frequently in easily retrievable knowledge sources and rarely in hard ones (Web search is harder in our setting due to its wider scope and greater noise). Consequently, flat RL explores the tools of the difficult knowledge source too little, and its training efficiency on that source is poor.

To address the above issues, we propose a hierarchical agentic deep search paradigm, HierSearch, which comprises a local deep search agent, a Web deep search agent, and a planner agent. The two deep search agents interact directly with the search tools of their respective knowledge sources and retrieve evidence for the planner agent. Specifically, the local deep search agent has access to the local text-chunk corpus and the local knowledge graph, while the Web deep search agent has access to a Web search engine and online web pages. The planner agent drafts search plans, coordinates the search agents, analyzes the evidence they provide, and produces the final answer.
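The division of labor above can be sketched as follows. This is an illustrative outline only, not the authors' implementation: all class names are hypothetical, and the search tools are stubbed with dictionary lookups in place of a real retriever, knowledge graph, and search engine.

```python
# Hypothetical sketch of the HierSearch hierarchy. Low-level agents wrap the
# tools of one knowledge source; the planner only ever talks to the agents.
from dataclasses import dataclass, field

@dataclass
class LocalSearchAgent:
    """Low-level agent over the local text-chunk corpus and knowledge graph."""
    chunks: dict = field(default_factory=dict)  # keyword -> text chunk
    kg: dict = field(default_factory=dict)      # entity -> related chunks

    def search(self, query: str) -> list[str]:
        hits = [text for key, text in self.chunks.items() if key in query]
        # KG hop: pull chunks linked to entities mentioned in the query
        for entity, related in self.kg.items():
            if entity in query:
                hits.extend(related)
        return hits

@dataclass
class WebSearchAgent:
    """Low-level agent over a search engine and page browser (stubbed)."""
    index: dict = field(default_factory=dict)   # query term -> page text

    def search(self, query: str) -> list[str]:
        return [page for term, page in self.index.items() if term in query]

class PlannerAgent:
    """High-level agent: routes the question, merges evidence, answers."""
    def __init__(self, local: LocalSearchAgent, web: WebSearchAgent):
        self.local, self.web = local, web

    def answer(self, question: str) -> list[str]:
        evidence = self.local.search(question)  # professional, targeted
        if not evidence:                        # cross-supplement from the Web
            evidence = self.web.search(question)
        return evidence
```

In the actual framework the routing decision is learned by the planner via RL rather than hard-coded as a local-first fallback; the sketch only fixes the interfaces between levels.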

Accordingly, we train this hierarchical agentic framework with a hierarchical reinforcement learning (HRL) algorithm (Pateria et al. 2022), using Group Relative Policy Optimization (GRPO) (Shao et al. 2024) and rule-based rewards. HRL overcomes the above challenges in two stages: (1) In the first stage, we train the low-level agents, i.e., the local deep search agent and the Web deep search agent, separately. Each masters the search tools of its own domain well, because the number of tools is limited and the tools are closely related. (2) In the second stage, we train the high-level planner agent equipped with both deep search agents. The well-trained deep search agents hide the complex interaction with search tools and greatly lower the difficulty of knowledge acquisition, so the planner agent learns search planning and knowledge integration across multiple knowledge sources faster and better.
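The group-relative advantage at the core of GRPO (Shao et al. 2024) can be sketched as below: for each question, a group of rollouts is sampled, each is scored with a rule-based reward, and rewards are normalized within the group so no learned value model is needed. The surrounding clipped policy-gradient objective and the paper's specific reward rules are omitted here.

```python
# Minimal sketch of GRPO's group-relative advantage. Each rollout's advantage
# is its reward standardized against the other rollouts in the same group.
from statistics import mean, pstdev

def group_relative_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    """Advantage of rollout i = (r_i - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the sampled group
    return [(r - mu) / (sigma + eps) for r in rewards]
```

Because the baseline is the group mean, rollouts that beat their siblings on the rule-based reward get positive advantages and the rest get negative ones, which is what drives the policy update in both training stages.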

In the planner agent’s training stage, we find that directly providing the complete trajectories of the deep search agents introduces irrelevant search results and the agents’ hallucinated reasoning content. To address this, we design a reasoning-aware knowledge refiner. It first selects the evidence that contributed to each round of a deep search agent’s reasoning, and then, from an overall perspective, selects the evidence helpful to the agent’s conclusion.