MODS: Moderating a Mixture of Document Speakers to Summarize Debatable Queries in Document Collections

Paper · arXiv 2502.00322 · Published February 1, 2025

Query-focused summarization (QFS) gives a summary of documents to answer a query. Past QFS work assumes queries have one answer, ignoring debatable ones (Is law school worth it?). We introduce Debatable QFS (DQFS), a task to create summaries that answer debatable queries via documents with opposing perspectives; summaries must comprehensively cover all sources and balance perspectives, favoring no side. These goals elude LLM QFS systems, which: 1) lack structured content plans, failing to guide LLMs to write balanced summaries, and 2) use the same query to retrieve contexts across documents, failing to cover all perspectives specific to each document’s content. To overcome this, we design MODS, a multi-LLM framework mirroring human panel discussions. MODS treats documents as individual Speaker LLMs and has a Moderator LLM that picks speakers to respond to tailored queries for planned topics. Speakers use tailored queries to retrieve relevant contexts from their documents and supply perspectives, which are tracked in a rich outline, yielding a content plan to guide the final summary.

We propose debatable QFS (DQFS). As input, DQFS takes a set of documents and a debatable query, defined as a yes/no query for which the documents hold opposing, equally valid "yes" and "no" perspectives (Fig 1). Such queries are broad (Is law school worth it?), and decomposing these broad concepts into more specific topics (cost, job market) helps answer them comprehensively.
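To make the task input concrete, a DQFS instance can be modeled as a query paired with its document collection. This is a minimal sketch; the class and field names are illustrative, not from the paper.

```python
from dataclasses import dataclass

@dataclass
class DQFSInstance:
    # A debatable yes/no query, e.g. "Is law school worth it?"
    query: str
    # Source documents holding opposing "yes"/"no" perspectives on the query
    documents: list[str]

inst = DQFSInstance(
    query="Is law school worth it?",
    documents=[
        "Doc arguing tuition costs outweigh salary gains...",
        "Doc arguing the degree opens career doors...",
    ],
)
```

A DQFS system must then produce a summary that covers every document in `documents` while favoring neither side of `query`.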

Multi-LLM summarizers (Chang et al., 2024; Adams et al., 2023), which use LLMs to summarize documents individually into intermediate outputs before merging them with another LLM call, are better choices, as they represent documents more equally. However, they have two key issues. First, they use the same topic or query as input to summarize each document, which is subpar when retrieval is used during summarization to reduce LLM costs. A query unaligned with a document's unique content and expertise will fail to retrieve all of its most relevant contexts (Sachan et al., 2022); this reduces the total number of perspectives in the intermediate output, resulting in lower coverage. Second, their intermediate outputs are unstructured, free-form texts, which are hard for the LLM to combine into a final output. Free-form text requires extra reasoning to extract, classify, and compare the texts' perspectives (Barrow et al., 2021), steps that distract from the final goal of generating a balanced summary.

To address these issues, we build MODS (Fig 2), a multi-LLM system using a Mixture of Document Speakers. Inspired by panel discussions (Doumont et al., 2014), MODS has a Speaker LLM for each document that responds to queries using its document, and a Moderator LLM that decides when and how speakers respond. Specifically, MODS: 1) plans an agenda of topics for the outline (§4.1); 2) picks a subset of speakers with relevant perspectives for each topic and tailors a query to each (§4.2); and 3) asks each speaker to retrieve its document's context relevant to the tailored query and give that context's "yes" and "no" perspectives for the topic.
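The moderator–speaker control flow above can be sketched as follows. This is a hedged outline of the described steps, not the authors' implementation: the `llm` and `retrieve` callables are placeholder stand-ins (a real system would call an actual LLM and a dense retriever), and the prompts are illustrative.

```python
def llm(prompt: str) -> str:
    """Placeholder for an LLM call; here it just echoes the prompt's first line."""
    return prompt.splitlines()[0]

def retrieve(document: str, query: str) -> str:
    """Placeholder retriever; in practice, return the passages of `document`
    most relevant to `query` (e.g. via dense retrieval)."""
    return document[:200]

def mods(documents: list[str], debatable_query: str) -> str:
    # 1) The Moderator plans an agenda of topics for the outline.
    topics = llm(f"List topics for: {debatable_query}").split(";")
    outline = []
    for topic in topics:
        entry = {"topic": topic, "yes": [], "no": []}
        # 2) The Moderator picks speakers with relevant perspectives and
        #    tailors a query to each. (For brevity, this sketch queries every
        #    speaker; MODS selects only a relevant subset per topic.)
        for doc in documents:
            tailored = llm(f"Tailor a query on '{topic}' to: {doc[:100]}")
            # 3) The Speaker retrieves its document's context for the tailored
            #    query and supplies the context's "yes"/"no" perspectives.
            context = retrieve(doc, tailored)
            entry["yes"].append(llm(f"'Yes' perspective on {topic} given: {context}"))
            entry["no"].append(llm(f"'No' perspective on {topic} given: {context}"))
        outline.append(entry)
    # The outline is a structured content plan guiding the final summary.
    return llm(f"Write a balanced summary from this outline: {outline}")
```

The key design point this sketch reflects is that each speaker receives a query tailored to its own document, and perspectives are tracked in a structured outline rather than merged as free-form text.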

Our contributions are: 1) We propose debatable query-focused summarization, a new task to help users navigate yes/no queries in documents with opposing perspectives. 2) We design MODS, a multi-LLM DQFS system that treats documents as individual LLM speakers, uses a moderator to tailor queries to apt speakers, and tracks speaker perspectives in an outline. 3) We release DebateQFS, a dataset for DQFS, and citation metrics to capture summary coverage and balance. 4) Experiments show MODS beats baselines by 38-58% in topic paragraph coverage and balance, while annotators find MODS's summaries maintain readability and better balance perspectives.
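To illustrate what citation-based coverage and balance metrics might look like, here is a minimal sketch; these formulas are illustrative assumptions in the spirit of the goals above, not the paper's exact metric definitions.

```python
def coverage(cited_docs: set[int], num_docs: int) -> float:
    """Illustrative coverage: fraction of source documents cited at least
    once in the summary (assumed definition, not the paper's)."""
    return len(cited_docs) / num_docs

def balance(yes_citations: int, no_citations: int) -> float:
    """Illustrative balance: 1.0 when citations split evenly between the
    'yes' and 'no' sides, 0.0 when entirely one-sided."""
    total = yes_citations + no_citations
    if total == 0:
        return 0.0
    return 1.0 - abs(yes_citations - no_citations) / total
```

Under this sketch, a summary citing 3 of 4 documents scores 0.75 coverage, and one citing both sides equally scores 1.0 balance.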