Revisiting RAG Ensemble: A Theoretical and Mechanistic Analysis of Multi-RAG System Collaboration
In terms of theoretical analysis, we provide the first explanation of the RAG ensemble framework from the perspective of information entropy. In terms of mechanism analysis, we explore the RAG ensemble framework at both the pipeline and module levels. We carefully select four different pipelines (Branching, Iterative, Loop, and Agentic) and three different modules (Generator, Retriever, and Reranker) to address seven different research questions.
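The entropy perspective can be made concrete with a toy sketch (ours, purely illustrative, not the paper's formal analysis): a system whose answer distribution is peaked carries low Shannon entropy, while an uncertain system's flat distribution carries high entropy.

```python
import math

def shannon_entropy(probs):
    """Shannon entropy H(p) = -sum p_i * log(p_i), in nats."""
    return -sum(p * math.log(p) for p in probs if p > 0)

# A confident (peaked) answer distribution vs. an uncertain (flat) one
# over the same three candidate answers.
confident = [0.9, 0.05, 0.05]
uncertain = [1 / 3, 1 / 3, 1 / 3]
print(shannon_entropy(confident) < shannon_entropy(uncertain))  # True
```

Comparing such entropies across single systems and their aggregate is one way to quantify how much uncertainty an ensemble resolves.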
These results collectively demonstrate that single-RAG systems struggle with task generalization, whether measured by performance or by output perplexity. This motivates our core research question: how can we aggregate multiple RAG systems to enhance generalization on complex, heterogeneous tasks? One intuitive approach is to adaptively fine-tune the model to strengthen its RAG ability. However, such methods may interfere with the model's inherent capabilities and incur higher training costs. Another common strategy is to treat the model as a router that selects the single best RAG system's answer and discards the remaining systems. However, the unselected answers may still contain information valuable to the task. Recent research has begun to explore component-level ensemble methods. Some studies show that meta-search engines, by aggregating results from multiple search engines, provide more relevant information [41, 49]. Many other studies focus on model-level ensemble strategies. We argue that, compared with routing, an ensemble strategy makes fuller use of the useful information in each subsystem and thus improves the quality of the final result. However, existing methods mainly ensemble multiple components at a single level, whereas RAG tasks involve more complex input flows and system structures. Unfortunately, in terms of both theoretical modeling and mechanism explanation, systematic research on ensembles across multiple RAG systems is still lacking, which significantly limits their development and application.
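The contrast between routing and ensembling can be sketched in a few lines (an illustrative toy, not the paper's method; the `route` and `ensemble` functions and the confidence scores are hypothetical):

```python
# Router vs. ensemble over the outputs of several RAG systems.

def route(answers, scores):
    """Routing: keep only the highest-scoring system's answer;
    everything else is discarded."""
    best = max(range(len(answers)), key=lambda i: scores[i])
    return answers[best]

def ensemble(answers, scores):
    """Ensembling: retain every answer, weighted by normalized score,
    so information from unselected systems still contributes."""
    total = sum(scores)
    return [(a, s / total) for a, s in zip(answers, scores)]

answers = ["Paris", "Paris", "Lyon"]   # outputs of three RAG systems
scores = [0.5, 0.3, 0.2]               # e.g. confidence estimates
print(route(answers, scores))          # only one answer survives
print(ensemble(answers, scores))       # all three retained with weights
```

The router throws away two of the three outputs; the ensemble preserves all of them, which is the property the argument above appeals to.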
(2) From a mechanism analysis perspective: To achieve a comprehensive exploration of RAG ensemble, we conduct in-depth investigations of seven different research questions at both the pipeline and module levels. At the pipeline level, we carefully select four different RAG pipelines (Branching, Iterative, Loop, and Agentic) for ensemble research. Additionally, we conduct ensemble experiments on closed-source RAG frameworks to further explore the characteristics of RAG ensemble. At the module level, we conduct experimental research on the retriever, reranker, and generator of the standard RAG framework. We carefully select three retrievers and five generation models for the experiments, and delve into the characteristics of applying generative rerankers to ensemble tasks. Moreover, our experiments cover a wide range of task sets, including single-hop, multi-hop, multiple-choice, summarization, and vertical-domain tasks, each accompanied by detailed ensemble analysis.
Our main findings include:
• RAG ensemble demonstrates clear advantages across both framework types and ensemble granularities, reflecting the strong generalizability of the RAG ensemble method.
• In a significant portion of ensemble tasks, the RAG ensemble method exhibits scaling-up characteristics: increasing the amount of external information has a notable positive impact on the final ensemble result. However, this benefit depends on the model having strong resistance to information interference.
• The ensemble model shows a preference for certain groups of input information, and this preference becomes more pronounced as task difficulty increases.
Model Ensemble in LLMs. Ensembles of LLMs have significantly outperformed individual models by leveraging the strengths of different systems. Existing ensemble approaches can be mainly categorized into three types: (1) a series of studies fine-tune external routers to select the most suitable LLM for a given input, enabling model selection before inference [40, 53]; (2) another line of work has multiple models process inputs jointly and combine their outputs during decoding, showcasing strong collaborative potential [20, 36]; and (3) some researchers let each model process inputs independently and then select the best response [5, 25]. To further improve the efficiency of LLM ensembles, some works employ input compression [26, 34, 39] and speculative decoding [21, 69] to accelerate model inference. Unfortunately, these studies have not systematically examined the application of ensemble techniques in the RAG domain, and integrating information at the model level alone is insufficient to bridge system gaps. Our study makes the first attempt to integrate all external knowledge and outputs from different RAG systems to maximize performance.
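The three paradigms above can be sketched schematically (our toy illustration; the stand-in models, router, and judge are hypothetical, and real decoding-time fusion operates on full vocabularies step by step):

```python
# (1) Pre-inference routing: only the router-chosen model runs.
def route_then_generate(query, models, router):
    return models[router(query)](query)

# (2) Decoding-time fusion: average per-token probabilities across
# models at one decoding step (one common fusion rule).
def fuse_token_distributions(dists):
    vocab = set().union(*dists)
    n = len(dists)
    return {t: sum(d.get(t, 0.0) for d in dists) / n for t in vocab}

# (3) Post-hoc selection: every model answers; a judge keeps one.
def select_best_output(outputs, judge):
    return max(outputs, key=judge)

# Toy usage with stand-in components.
models = [lambda q: "answer from model A", lambda q: "answer from model B"]
router = lambda q: 0                       # always picks model 0
print(route_then_generate("q", models, router))
print(fuse_token_distributions([{"x": 0.6, "y": 0.4},
                                {"x": 0.2, "z": 0.8}]))
print(select_best_output(["short", "a longer answer"], len))
```

Paradigms (1) and (3) discard information from the non-chosen models, while (2) fuses it, which is why the distinction matters for the RAG-level ensemble studied here.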