Ranking Free RAG: Replacing Re-ranking with Selection in RAG for Sensitive Domains

Paper · arXiv 2505.16014 · Published May 21, 2025
Tags: RAG, Domain Specialization

Traditional Retrieval-Augmented Generation (RAG) pipelines rely on similarity-based retrieval and re-ranking, which depend on heuristics such as top-k and lack explainability, interpretability, and robustness against adversarial content. To address this gap, we propose METEORA, a novel method that replaces re-ranking in RAG with a rationale-driven selection approach. METEORA operates in two stages. First, a general-purpose LLM is preference-tuned with direct preference optimization to generate rationales conditioned on the input query. These rationales then guide the evidence chunk selection engine, which selects relevant evidence in three stages: pairing individual rationales with corresponding retrieved evidence for local relevance, global selection with elbow detection for a query-adaptive cutoff, and context expansion via neighboring evidence. This process eliminates the need for top-k heuristics. The rationales are also used in a consistency check by a Verifier LLM to detect and filter poisoned or misleading content for safe generation. The framework provides an explainable and interpretable evidence flow by using rationales consistently across both selection and verification. In our evaluation across six datasets spanning the legal, financial, and academic research domains, METEORA improves generation accuracy by 33.34% while using approximately 50% fewer evidence chunks than state-of-the-art re-ranking methods. In adversarial settings, METEORA significantly improves the F1 score from 0.10
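The query-adaptive cutoff mentioned above can be illustrated with a minimal sketch. This is not the paper's implementation; it assumes a simple "largest drop" notion of the elbow over descending relevance scores, which replaces a fixed top-k with a per-query cutoff.

```python
# Hypothetical sketch of a query-adaptive cutoff via elbow detection.
# Instead of a fixed top-k, keep evidence chunks up to the steepest
# drop in their (descending) relevance scores.

def elbow_cutoff(scores):
    """Return how many chunks to keep before the largest score gap."""
    ranked = sorted(scores, reverse=True)
    # Gap between consecutive scores; the elbow is the largest gap.
    gaps = [ranked[i] - ranked[i + 1] for i in range(len(ranked) - 1)]
    if not gaps:
        return len(ranked)
    return gaps.index(max(gaps)) + 1

scores = [0.91, 0.88, 0.85, 0.41, 0.39, 0.12]
k = elbow_cutoff(scores)  # steepest drop is 0.85 -> 0.41, so keep 3
```

Because the cutoff is derived from the score distribution of each query's candidates, queries with many relevant chunks keep more evidence and queries with few keep less, without any tuned k.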

A typical RAG pipeline comprises a retriever, a re-ranker, and a generator. The re-ranker is crucial for selecting the contextually relevant top-k chunks, or evidence, before passing them to the generator [Glass et al., 2022].

Despite the growing popularity of RAG, its black-box nature poses challenges in sensitive domains like law, finance, and academic research [Zhou et al., 2024, Xue et al., 2024]. Existing re-ranking methods suffer mainly from three limitations. First, they lack interpretability: they rely on opaque similarity scores and a manually defined number k to select evidence, without justifying why particular evidence was chosen. Second, re-rankers are not robust to adversarial attacks such as the injection of irrelevant or poisoned content, which can corrupt the selected context and negatively impact the final generation. Third, these methods depend on heuristic decisions, particularly the choice of top-k, which is often query-specific and difficult to determine in advance. Selecting too few evidence chunks may omit critical context, while selecting too many can introduce noise and degrade answer quality [Leng et al., 2024]. Such limitations raise serious concerns in high-stakes domains where factual accuracy and robustness are critical [Barron et al., 2024, Bhushan et al., 2025].

Recent work such as RAG2 [Sohn et al., 2025] and RankRAG [Yu et al., 2024] has attempted to address these challenges. RAG2 uses rationales to improve retriever capabilities to identify relevant evidence. However, it still relies on a re-ranking step to reorder retrieved evidence, thereby inheriting the limitations of traditional re-rankers, including a dependence on fixed top-k cutoffs and opaque scoring mechanisms. Similarly, RankRAG instruction-tunes a single LLM to both rank retrieved contexts and generate answers based on the retrieved passages. Because it leverages the LLM’s parametric knowledge, RankRAG suffers from limited interpretability and lacks robust filtering mechanisms.

The rationale generator aims to generate a set of rationales conditioned on the query. Figure 3 shows an example of a rationale generated for a real-world query from the PrivacyQA dataset. To enable effective learning without manual annotation while ensuring high-quality rationale generation, we use an off-the-shelf LLM and preference-tune it to generate rationales aligned with a query. We automatically construct a preference dataset by pairing the query with rationales that led to correct evidence selection as positive samples (rw) and with other rationales as negative samples (rl).
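The annotation-free preference-pair construction described above can be sketched as follows. This is a simplified illustration, not the paper's code: the helper `build_preference_pairs` and its inputs are hypothetical, and "correct evidence selection" is approximated as overlap with a gold evidence set.

```python
# Hypothetical sketch: automatically building DPO preference pairs
# (query, chosen rationale r_w, rejected rationale r_l) without manual
# annotation. A rationale counts as "chosen" if the evidence it led
# to overlaps the gold evidence for the query.

def build_preference_pairs(query, rationales, selected_evidence, gold_evidence):
    """rationales: list of rationale strings.
    selected_evidence: rationale -> set of chunk ids it selected.
    gold_evidence: set of chunk ids known to support the answer."""
    chosen = [r for r in rationales if selected_evidence[r] & gold_evidence]
    rejected = [r for r in rationales if not selected_evidence[r] & gold_evidence]
    # Pair every correct rationale (r_w) with every incorrect one (r_l).
    return [
        {"prompt": query, "chosen": rw, "rejected": rl}
        for rw in chosen
        for rl in rejected
    ]
```

Triples in this `{"prompt", "chosen", "rejected"}` shape are the standard input format for DPO trainers such as the one in Hugging Face TRL.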

Pairing Rationales with Evidence. Rationale-based pairing computes a similarity score (Ev) between each rationale and each evidence chunk ej from the documents, and selects the evidence with the highest match. Rationales and evidence are encoded with an SBERT model, and cosine similarity is computed between the embeddings. This pairing yields higher precision.
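The pairing step can be sketched as below. Note this is an illustrative reconstruction: in METEORA the vectors would come from an SBERT encoder (e.g. via the `sentence-transformers` library), whereas here pre-computed embeddings are passed in so the sketch stays self-contained.

```python
import math

# Hypothetical sketch of rationale-evidence pairing: each rationale
# is matched to the evidence chunk e_j with the highest cosine
# similarity between their (assumed SBERT) embeddings.

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def pair_rationales(rationale_vecs, evidence_vecs):
    """For each rationale embedding, return the index of its
    best-matching evidence chunk."""
    return [
        max(range(len(evidence_vecs)),
            key=lambda j: cosine(r, evidence_vecs[j]))
        for r in rationale_vecs
    ]
```

Matching each rationale to its single best chunk gives the locally most relevant evidence; the global, query-adaptive cutoff is then applied over the pooled matches.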

Example

Query: TickTick: To Do List with Reminder, Day Planner’s privacy policy; can it view my real name?

Rationale: Search for terms like “real name”, “PII”, or “user information”, especially in sections covering data collection, use, or disclosure. Flagging Instructions:

Flag the chunk if it contains internally inconsistent language about real name usage, or if it contradicts other verified parts of the policy.

To make RAG robust against adversarial content, METEORA incorporates a Verifier LLM that filters the selected evidence Es before generation. The Verifier evaluates each evidence chunk using the input query and its associated rationale, which includes embedded Flagging Instructions. Evidence chunks are flagged for (i) factual violations, when the content contradicts established facts; (ii) contradictions, when a chunk is logically inconsistent with other verified chunks; and (iii) instruction violations, when a chunk fails to meet the criteria embedded in the rationale. Flagged chunks are discarded, and only the filtered set is passed to the generator.
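The filtering logic of the verification step can be sketched as follows. In METEORA the three checks are judgments made by the Verifier LLM; in this hypothetical sketch they are stubbed as predicate callables so only the discard-and-keep control flow is shown.

```python
# Hypothetical sketch of the Verifier filtering step. Each check
# stands in for one LLM judgment: factual violation, contradiction,
# or instruction violation.

def verify_evidence(evidence_chunks, checks):
    """Discard any chunk flagged by at least one check; return the
    filtered set passed on to the generator."""
    kept = []
    for chunk in evidence_chunks:
        flagged = any(check(chunk) for check in checks)
        if not flagged:
            kept.append(chunk)
    return kept

# Toy stand-in for an LLM flag: mark chunks with an injected marker.
checks = [lambda c: "POISONED" in c]
safe = verify_evidence(["The policy permits name collection.",
                        "POISONED: names are never stored."], checks)
```

Because the same rationale drives both selection and verification, a flagged chunk can be traced back to the specific instruction it violated, which is what makes the evidence flow auditable.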