StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization

Paper · arXiv 2410.08815 · Published October 11, 2024

Retrieval-augmented generation (RAG) is a key means of effectively enhancing large language models (LLMs) on many knowledge-based tasks. However, existing RAG methods struggle with knowledge-intensive reasoning tasks, because the useful information these tasks require is badly scattered. This scattering makes it difficult for existing RAG methods to accurately identify key information and perform global reasoning over such noisy augmentation. In this paper, motivated by cognitive theories holding that humans convert raw information into various kinds of structured knowledge when tackling knowledge-intensive reasoning, we propose a new framework, StructRAG, which identifies the optimal structure type for the task at hand, reconstructs the original documents into this structured format, and infers answers from the resulting structure.

Knowledge-intensive reasoning tasks often require a large amount of useful information that is dispersed across many locations in the provided documents.

From a human perspective, people do not solve knowledge-intensive reasoning tasks by simply reading raw texts (Johnson-Laird, 1986; Paivio, 1990). As suggested by cognitive load theory, humans typically summarize scattered information from documents into structured knowledge, which they then use to shorten the reasoning path and make more accurate judgements (Sweller, 1988; Chandler & Sweller, 1991). Furthermore, cognitive fit theory shows that humans prefer different types of structured knowledge for different tasks, such as tables for statistical analysis and graphs for long-chain inference (Vessey, 1991; Umanath & Vessey, 1994). In recent years, the rapid development of LLMs has laid the foundation for using these models directly to construct various knowledge structures (Li et al., 2023; Jain et al., 2024). Meanwhile, many studies suggest that LLMs resemble humans in how they use information and solve complex problems (Wei et al., 2022; Li et al., 2024b). These findings inspire us to explore whether LLMs can adopt human-like thinking processes to transform scattered information into various structured formats during inference, thereby better serving knowledge-intensive reasoning tasks.

The StructRAG framework consists of three modules that sequentially identify the most suitable structure type, construct structured knowledge in that format, and use that structured knowledge to infer the final answer. First, because different structure types suit different tasks, a hybrid structure router determines the most appropriate structure type based on the question and the document information of the current task. Second, because constructing structured knowledge is complex and requires strong comprehension and generation abilities, an LLM-based scattered knowledge structurizer converts the raw documents into structured knowledge of the selected type. Third, the resulting structured knowledge is used to infer the final answer.
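To make the three-stage flow concrete, here is a minimal Python sketch under stated assumptions: `llm` is a hypothetical text-in/text-out callable standing in for whatever model is used, the prompt wording, the first-sentence document summary, and the fallback to `chunk` are illustrative choices, and all function and class names are invented for this example rather than taken from the StructRAG code.

```python
from dataclasses import dataclass
from typing import Callable, List

LLM = Callable[[str], str]  # hypothetical: any prompt -> completion function

STRUCTURE_TYPES = ("table", "graph", "algorithm", "catalogue", "chunk")


@dataclass
class StructRAGResult:
    structure_type: str
    structured_knowledge: str
    answer: str


def struct_rag(question: str, documents: List[str], llm: LLM) -> StructRAGResult:
    # Stage 1 (router): choose a structure type from the question plus a short
    # summary of each document (here simply its first sentence).
    summary = "\n".join(doc.split(".")[0].strip() for doc in documents)
    choice = llm(
        f"Choose one of {', '.join(STRUCTURE_TYPES)}.\n"
        f"Question: {question}\nDocuments:\n{summary}"
    ).strip().lower()
    structure_type = choice if choice in STRUCTURE_TYPES else "chunk"  # assumed fallback

    # Stage 2 (structurizer): rewrite the raw documents in the chosen format.
    structured = llm(
        f"Rewrite the documents below as a {structure_type}.\n" + "\n".join(documents)
    )

    # Stage 3 (utilizer): answer from the structured knowledge only.
    answer = llm(f"Using this {structure_type}:\n{structured}\nAnswer: {question}")
    return StructRAGResult(structure_type, structured, answer)
```

Passing a stub function as `llm` is enough to exercise the control flow; in practice each stage would be a separately prompted model call, and the router, as described next, would be a trained model.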

A core aspect of StructRAG is the hybrid structure router's ability to accurately select the most suitable structure type for each input task. To equip the router with this capability, we propose a dedicated training method. Inspired by the successful use of reinforcement learning to train LLMs for decision-making tasks (Havrilla et al., 2024; OpenAI, 2024), we train the router with the DPO algorithm, which follows reinforcement learning principles without requiring an additional reward model (Rafailov et al., 2023; Allam, 2024). However, there is insufficient training data for the model to learn how to choose the optimal structure type, and collecting enough such data in the real world is also challenging. To address this, we introduce a novel pipeline for constructing preference training data, consisting of task synthesis, solution simulation, and preference judgment, that produces high-quality synthetic data and thereby improves the router's ability to select the appropriate structure type.
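For reference, the standard DPO objective (Rafailov et al., 2023) for a single preference pair, -log σ(β[(log π(y_w|x) - log π_ref(y_w|x)) - (log π(y_l|x) - log π_ref(y_l|x))]), can be written as a small function. This is a sketch, not the paper's training code: it assumes the chosen (y_w) and rejected (y_l) outputs are structure-type answers from the synthetic preference data, that their summed log-probabilities under the router being trained and under a frozen reference model are already computed, and that `beta = 0.1` is merely a placeholder value.

```python
import math


def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    """-log sigmoid(beta * margin) for one (chosen, rejected) preference pair."""
    # Implicit reward margin: how much more the policy (relative to the frozen
    # reference) favors the chosen structure type over the rejected one.
    margin = (policy_chosen_logp - ref_chosen_logp) - (policy_rejected_logp - ref_rejected_logp)
    x = beta * margin
    # -log sigmoid(x) = log(1 + exp(-x)), evaluated without overflow for either sign of x.
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))
```

A loss below ln 2 (about 0.693) means the router already prefers the chosen structure type more strongly than the reference model does; real training would compute this over batches of token log-probabilities with automatic differentiation.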

From a human perspective, when solving knowledge-intensive reasoning tasks, individuals tend to use the type of structured knowledge that best matches the specific requirements of the task at hand. To this end, StructRAG incorporates a hybrid structure router $R$ to select the optimal structure type. Specifically, because it is impractical to process the entire document set at once, the router uses the question $q$ and the core content $C$ of the documents $D$ to make its decision and generate the most suitable structure type $t$:

$$ t = R(q, C), \quad \text{where } C = \{c^{(i)}\}_{i=1}^{m} \tag{2} $$

The core content $C$ consists of the titles or the first few sentences of each document $d^{(i)}$. In our work, there are five candidate structure types for five kinds of knowledge-intensive tasks: table for statistical tasks, graph for long-chain tasks, algorithm for planning tasks, catalogue for summarizing tasks, and chunk for simple single-hop tasks.
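The router's inputs and output can be sketched in Python as follows. The five type names and the task kinds they serve come from the text above; everything else is an assumption made for illustration: the rule of using a document's title when present and its first two sentences otherwise, the hypothetical prompt wording in `router_prompt`, and the parsing rule in `parse_structure_type` that takes the earliest candidate type mentioned in the model's output and falls back to `chunk`.

```python
import re
from typing import Dict, List

# The five candidate structure types from the text, keyed by the task kind each serves.
STRUCTURE_FOR_TASK: Dict[str, str] = {
    "statistical": "table",
    "long-chain": "graph",
    "planning": "algorithm",
    "summarizing": "catalogue",
    "single-hop": "chunk",
}
CANDIDATE_TYPES = tuple(STRUCTURE_FOR_TASK.values())


def core_content(docs: List[Dict[str, str]], n_sentences: int = 2) -> List[str]:
    """Build C = {c^(i)}: each document's title if it has one, else its first sentences."""
    core = []
    for doc in docs:
        title = doc.get("title", "").strip()
        if title:
            core.append(title)
        else:
            sentences = re.split(r"(?<=[.!?])\s+", doc["text"].strip())
            core.append(" ".join(sentences[:n_sentences]))
    return core


def router_prompt(question: str, core: List[str]) -> str:
    """Input to the router R(q, C); the wording is a hypothetical example."""
    listing = "\n".join(f"[{i}] {c}" for i, c in enumerate(core, start=1))
    return (f"Question: {question}\nDocuments:\n{listing}\n"
            f"Reply with one structure type from: {', '.join(CANDIDATE_TYPES)}.")


def parse_structure_type(output: str) -> str:
    """Map the router's free-text output to t: the earliest candidate type it mentions."""
    text = output.lower()
    hits = [(text.find(t), t) for t in CANDIDATE_TYPES if t in text]
    return min(hits)[1] if hits else "chunk"
```

Passing only titles or leading sentences keeps the router's input short regardless of how long the documents are, which is the practical reason the text gives for not feeding it the full document set.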

After identifying the most suitable structure type, StructRAG extracts the textual knowledge scattered across the raw documents and reconstructs it into structured knowledge. This process requires a comprehensive understanding of all the raw documents and precise formatting of the information, making it a challenging and flexible problem.
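One way to picture the structurizer's two demands, reading all documents and formatting precisely, is a type-specific prompt paired with a format check on the output. The sketch below is hypothetical throughout: the per-type instructions in `FORMAT_INSTRUCTIONS`, the function names, and the choice of a Markdown table for the `table` type are illustrative assumptions, not StructRAG's actual prompts.

```python
from typing import List

# Hypothetical formatting instructions, one per candidate structure type.
FORMAT_INSTRUCTIONS = {
    "table": "a Markdown table with a header row and one record per row",
    "graph": "one (head, relation, tail) triple per line",
    "algorithm": "numbered steps, one action per step",
    "catalogue": "a hierarchical outline with numbered sections",
    "chunk": "short, self-contained text passages",
}


def structurizer_prompt(question: str, structure_type: str, docs: List[str]) -> str:
    """Ask the LLM to pull question-relevant facts out of all documents into one format."""
    if structure_type not in FORMAT_INSTRUCTIONS:
        raise ValueError(f"unknown structure type: {structure_type}")
    joined = "\n\n".join(f"Document {i}: {d}" for i, d in enumerate(docs, start=1))
    return (f"Extract the information relevant to: {question}\n"
            f"Reconstruct it as {FORMAT_INSTRUCTIONS[structure_type]}.\n\n{joined}")


def parse_markdown_table(text: str) -> List[List[str]]:
    """Parse a pipe-delimited table, skipping separator rows such as |---|:--:|.

    Raises ValueError when rows have different widths, i.e. imprecise formatting.
    """
    rows: List[List[str]] = []
    for line in text.strip().splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # ignore any prose the model wrote around the table
        cells = [c.strip() for c in line.strip("|").split("|")]
        if all(c and set(c) <= set("-:") for c in cells):
            continue  # separator row
        rows.append(cells)
    if any(len(r) != len(rows[0]) for r in rows):
        raise ValueError("inconsistent column count in structurized table")
    return rows
```

Rejecting a table whose rows disagree in width is one cheap way to catch formatting errors before the structured knowledge reaches the answering stage.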