Can tailoring queries per document improve debatable summarization?
When summarizing documents with opposing perspectives on a topic, does adapting the query to each document's unique content retrieve more balanced viewpoints than using a single uniform query?
When a query has opposing but equally valid perspectives across documents ("Is law school worth it?"), standard summarization fails in two specific ways. First, using the same query to retrieve contexts from every document misses document-specific perspectives — a query about "career outcomes" may not retrieve a document's strongest arguments about "personal growth." Second, merging free-form intermediate outputs requires extra reasoning to extract, classify, and compare perspectives, distracting from balanced summary generation.
MODS (Moderating a Mixture of Document Speakers) applies a panel discussion metaphor. Each document gets its own Speaker LLM that responds to tailored queries using only its document's content. A Moderator LLM plans an agenda of topics, selects relevant speakers per topic, and tailors a specific query to each selected speaker. Speakers retrieve their document's context relevant to the tailored query and report both "yes" and "no" perspectives. The moderator tracks all perspectives in a structured outline, which guides the final summary.
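The moderator-speaker loop above can be sketched in code. This is a minimal, hypothetical sketch of the control flow only, not the paper's actual implementation: `llm` is a stub standing in for any chat-completion call, and all class and method names (`Speaker.respond`, `Moderator.plan_agenda`, `tailor_query`) are illustrative assumptions.

```python
from dataclasses import dataclass


def llm(prompt: str) -> str:
    """Stub for an LLM call; a real system would hit a chat-completion API."""
    return f"response to: {prompt[:40]}"


@dataclass
class Speaker:
    """One speaker per document, answering only from its own content."""
    doc_id: str
    document: str

    def respond(self, tailored_query: str) -> dict:
        # Retrieve context from this speaker's document only (here, the whole
        # document; a real system retrieves passages relevant to the query),
        # then report both supporting ("yes") and opposing ("no") perspectives.
        context = self.document
        return {
            "doc_id": self.doc_id,
            "yes": llm(f"Using only: {context}\nSupport: {tailored_query}"),
            "no": llm(f"Using only: {context}\nOppose: {tailored_query}"),
        }


class Moderator:
    """Plans an agenda, tailors a query per speaker, and tracks an outline."""

    def __init__(self, speakers: list):
        self.speakers = speakers
        self.outline = []  # structured record of perspectives per topic

    def plan_agenda(self, query: str) -> list:
        # Real system: the moderator LLM proposes discussion topics; the two
        # topics below are hard-coded placeholders from the running example.
        return ["career outcomes", "personal growth"]

    def tailor_query(self, topic: str, speaker: Speaker) -> str:
        # Adapt the topic to this document's unique content; real system:
        # the moderator LLM writes a document-specific question.
        return f"{topic}, as argued in document {speaker.doc_id}"

    def run(self, query: str) -> list:
        for topic in self.plan_agenda(query):
            # Real system: the moderator selects only the relevant speakers.
            for sp in self.speakers:
                reply = sp.respond(self.tailor_query(topic, sp))
                self.outline.append({"topic": topic, **reply})
        return self.outline  # the outline then guides final summary generation
```

The key design point this sketch makes concrete: the query string each speaker sees is built per speaker inside `tailor_query`, rather than one uniform query broadcast to every document.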
The results are substantial: 38-58% improvement in topic paragraph coverage and balance over baselines. The mechanism is the tailored query — by asking each document-speaker a question aligned to that document's unique expertise, MODS retrieves perspectives that a uniform query would miss. This is a retrieval problem disguised as a summarization problem.
The design insight generalizes beyond debatable summarization. Any task where multiple sources hold different relevant expertise benefits from source-specific querying rather than uniform querying. MODS extends the principle behind "Do hierarchical retrieval architectures outperform flat ones on complex queries?": not just separating planning from synthesis, but also specializing the query per source.
The connection to "Does including all conversation history actually help retrieval?" is direct: MODS solves at the document level the same problem that selective history solves at the conversation level. Irrelevant context degrades retrieval, and the fix is source-aware filtering.
Complementary approach — reranking-based perspective summarization: Where MODS specializes at the retrieval stage (tailored queries per document), reranking-based methods operate at the generation stage: generate multiple candidate summaries, then rerank for coverage and faithfulness. Reranking consistently outperforms prompting frameworks for perspective summarization — even when prompting is scaled to high-resource settings. DPO on reranked self-generated summaries further boosts both attributes, with the most pronounced gains in faithfulness. Additionally, LM-based evaluation metrics (AlignScore, prompting-based scoring) substantially outperform traditional metrics (ROUGE, BERTScore) for measuring perspective summary quality. MODS and reranking address different bottlenecks: MODS ensures diverse perspectives are retrieved, reranking ensures the generated summary faithfully represents them.
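The generation-stage pipeline can be sketched as sample-then-rerank: produce several candidate summaries, score each for coverage and faithfulness, and keep the best. The scoring functions below are toy lexical stand-ins chosen for illustration; per the note above, real systems use LM-based metrics such as AlignScore or prompting-based judges, which substantially outperform lexical overlap.

```python
def coverage(summary: str, perspectives: list) -> float:
    """Toy coverage score: fraction of perspectives mentioned verbatim."""
    hits = sum(1 for p in perspectives if p.lower() in summary.lower())
    return hits / max(len(perspectives), 1)


def faithfulness(summary: str, sources: list) -> float:
    """Toy faithfulness proxy: fraction of summary words found in the sources."""
    source_words = set(" ".join(sources).lower().split())
    words = summary.lower().split()
    return sum(1 for w in words if w in source_words) / max(len(words), 1)


def rerank(candidates: list, perspectives: list, sources: list) -> str:
    """Pick the candidate with the best combined score (unweighted sum here;
    real systems may weight the attributes or rerank with an LLM judge)."""
    return max(candidates, key=lambda c: coverage(c, perspectives)
                                         + faithfulness(c, sources))
```

In a DPO setup, the reranked best and worst candidates would form the preferred/rejected pair for preference training on self-generated summaries.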
Source: Agents Multi
Related concepts in this collection
-
Do hierarchical retrieval architectures outperform flat ones on complex queries?
Explores whether separating query planning from answer synthesis into distinct architectural components improves performance on multi-hop retrieval tasks compared to unified single-pass approaches.
MODS extends hierarchical architecture to source-specific querying
-
Does including all conversation history actually help retrieval?
Conversational search systems typically use all previous context to understand current queries. But do topic switches in multi-turn conversations inject noise that degrades performance rather than helps it?
same principle at document level
-
Can AI agents communicate efficiently in joint decision problems?
When humans and AI must collaborate to solve optimization problems under asymmetric information, what communication patterns enable effective coordination? Current LLMs struggle with this—why?
moderator-speaker as asymmetric information management
-
Why does vanilla RAG produce shallow and redundant results?
Standard RAG systems get stuck in a single semantic neighborhood because their initial query determines what documents are discoverable. The question asks whether fixed retrieval strategies fundamentally limit knowledge depth compared to iterative exploration.
MODS as an alternative to iterative expansion
Original note title
debatable query-focused summarization requires per-document speaker specialization with moderator orchestration