Knowledge Retrieval and RAG

Can tailoring queries per document improve debatable summarization?

When summarizing documents with opposing perspectives on a topic, does adapting the query to each document's unique content retrieve more balanced viewpoints than using a single uniform query?

Note · 2026-02-23 · sourced from Agents Multi
What makes multi-agent teams actually perform better? How should retrieval and reasoning integrate in RAG systems?

When a query has opposing but equally valid perspectives across documents ("Is law school worth it?"), standard summarization fails in two specific ways. First, using the same query to retrieve contexts from every document misses document-specific perspectives: a query about "career outcomes" may never retrieve a document's strongest arguments about "personal growth." Second, merging free-form intermediate outputs forces the model to spend extra reasoning extracting, classifying, and comparing perspectives, which distracts from generating a balanced summary.
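
The first failure mode is easy to see with raw embedding similarity. A minimal sketch, assuming the sentence-transformers library with an arbitrary off-the-shelf model; the queries and passage are invented for illustration:

```python
# Illustration of the uniform-query failure mode: a single query embedding
# scores low against a document whose strongest content lies elsewhere.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # model choice is arbitrary

uniform_query = "What are the career outcomes of attending law school?"
tailored_query = "Does law school foster personal growth and resilience?"
passage = ("Graduates describe law school as transformative for personal "
           "growth: it taught them discipline, resilience, and how to "
           "argue both sides of any question.")

q_uniform, q_tailored, p = model.encode([uniform_query, tailored_query, passage])
print("uniform query vs passage: ", util.cos_sim(q_uniform, p).item())
print("tailored query vs passage:", util.cos_sim(q_tailored, p).item())
# The tailored query typically scores noticeably higher, so a top-k
# retriever driven only by the uniform query may never surface this passage.
```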

MODS (Moderating a Mixture of Document Speakers) applies a panel discussion metaphor. Each document gets its own Speaker LLM that responds to tailored queries using only its document's content. A Moderator LLM plans an agenda of topics, selects relevant speakers per topic, and tailors a specific query to each selected speaker. Speakers retrieve their document's context relevant to the tailored query and report both "yes" and "no" perspectives. The moderator tracks all perspectives in a structured outline, which guides the final summary.
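
A minimal sketch of that loop, assuming hypothetical `llm` and `retrieve` placeholders; the prompts, the `Perspective` record, and `mods_summarize` are illustrative names, not the paper's interfaces:

```python
from dataclasses import dataclass

def llm(prompt: str) -> str:
    """Placeholder for any chat-completion call (API or local model)."""
    raise NotImplementedError

def retrieve(document: str, query: str, k: int = 3) -> str:
    """Placeholder for embedding-based retrieval within a single document."""
    raise NotImplementedError

@dataclass
class Perspective:
    topic: str
    doc_id: int
    stance: str      # "yes" or "no"
    text: str

def mods_summarize(question: str, documents: list[str]) -> str:
    # 1. Moderator plans an agenda of discussion topics.
    agenda = llm(f"List the key discussion topics for: {question}").splitlines()

    outline: list[Perspective] = []
    for topic in agenda:
        # 2. Moderator selects the speakers whose documents are relevant.
        selected = [
            i for i, doc in enumerate(documents)
            if "yes" in llm(
                f"Topic: {topic}\nDocument preview: {doc[:500]}\n"
                "Is this document relevant to the topic? Answer yes or no."
            ).lower()
        ]
        for doc_id in selected:
            # 3. Moderator tailors a query to THIS speaker's content.
            tailored = llm(
                f"Question: {question}\nTopic: {topic}\n"
                f"Document preview: {documents[doc_id][:500]}\n"
                "Write one query targeting what this document uniquely says."
            )
            # 4. Speaker answers from its own document only, both stances.
            context = retrieve(documents[doc_id], tailored)
            for stance in ("yes", "no"):
                outline.append(Perspective(topic, doc_id, stance, llm(
                    f"Context (sole source):\n{context}\n"
                    f"State the '{stance}' perspective on: {tailored}"
                )))

    # 5. Moderator writes the balanced summary from the structured outline.
    notes = "\n".join(
        f"[{p.topic} | doc{p.doc_id} | {p.stance}] {p.text}" for p in outline
    )
    return llm(f"Write a balanced summary answering '{question}' using:\n{notes}")
```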

The results are substantial: 38-58% improvement in topic paragraph coverage and balance over baselines. The mechanism is the tailored query — by asking each document-speaker a question aligned to that document's unique expertise, MODS retrieves perspectives that a uniform query would miss. This is a retrieval problem disguised as a summarization problem.

The design insight generalizes beyond debatable summarization. Any task where multiple sources have different relevant expertise benefits from source-specific querying rather than uniform querying. Relative to "Do hierarchical retrieval architectures outperform flat ones on complex queries?", MODS extends the principle: not just separating planning from synthesis, but also specializing the query per source.

The connection to "Does including all conversation history actually help retrieval?" is direct: MODS solves at the document level the same problem that selective history solves at the conversation level. Irrelevant context degrades retrieval, and the fix is source-aware filtering.

Complementary approach: reranking-based perspective summarization. Where MODS specializes at the retrieval stage (tailored queries per document), reranking-based methods operate at the generation stage: generate multiple candidate summaries, then rerank them for coverage and faithfulness. Reranking consistently outperforms prompting frameworks for perspective summarization, even when prompting is scaled to high-resource settings. DPO on reranked self-generated summaries further boosts both attributes, with the most pronounced gains in faithfulness. Additionally, LM-based evaluation metrics (AlignScore, prompting-based scoring) substantially outperform traditional metrics (ROUGE, BERTScore) for measuring perspective summary quality. MODS and reranking address different bottlenecks: MODS ensures diverse perspectives are retrieved; reranking ensures the generated summary faithfully represents them.
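
A minimal sketch of the generation-stage alternative, reusing the placeholder `llm` from the sketch above. The sampling count and the LM-as-judge scoring prompt are assumptions; the actual setup may use trained metrics such as AlignScore instead:

```python
def rerank_summaries(question: str, source_text: str, n: int = 8) -> str:
    # 1. Sample candidates: run the same prompt n times (temperature > 0).
    candidates = [
        llm(f"Summarize all perspectives on '{question}' in:\n{source_text}")
        for _ in range(n)
    ]

    # 2. Judge each candidate on perspective coverage and faithfulness.
    def score(summary: str) -> float:
        verdict = llm(
            "Rate from 0 to 10 how well the summary (a) covers every "
            "perspective in the source and (b) stays faithful to it. "
            "Answer with a single number.\n"
            f"Source:\n{source_text}\n\nSummary:\n{summary}"
        )
        try:
            return float(verdict.strip().split()[0])
        except (ValueError, IndexError):
            return 0.0

    # 3. Keep the highest-scoring candidate; DPO would further train the
    #    model on preferred/rejected pairs drawn from this same ranking.
    return max(candidates, key=score)
```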


Source: Agents Multi

Original note title: debatable query-focused summarization requires per-document speaker specialization with moderator orchestration