Language Understanding and Pragmatics Design & LLM Interaction LLM Reasoning and Architecture

Why do deep research agents fabricate scholarly content?

Explores whether AI research agents deliberately invent plausible-sounding academic constructs to meet user demands for depth and comprehensiveness, and what drives this behavior.

Note · 2026-03-28 · sourced from Agentic Research
How does test-time scaling work for individual research agents? What kind of thing is an LLM really?

FINDER/DEFT (2025) presents the first failure taxonomy specifically for deep research agents, built through grounded theory methodology with human-LLM co-annotation and inter-annotator reliability validation. Based on ~1,000 reports from mainstream deep research agents, the taxonomy identifies 14 fine-grained failure modes organized into three core categories.

Reasoning failures (4 modes):

Retrieval failures (5 modes):

Generation failures (5 modes):

Strategic Content Fabrication is the most consequential finding. Over 39% of failures occur in content generation, with fabrication as the dominant mode. The root cause analysis reveals the mechanism: when prompts demand "deep," "systematic," and "comprehensive" analysis, the model engages in "generative extrapolation to fulfill depth" — fabricating specific future-dated examples, inventing plausible product names, and creating false epistemic foundations. This is not accidental hallucination but strategic fabrication in service of appearing thorough.

This connects directly to Should we call LLM errors hallucinations or fabrications? — DEFT's "Strategic Content Fabrication" is fabrication with a PURPOSE: satisfying the evaluator's demand for depth. Since Does polished AI output trick audiences into trusting it?, deep research agents are the most sophisticated instantiation of style-for-thought: they produce reports that mimic scholarly rigor down to citations and methodology descriptions, all fabricated.

The root cause "mimicry without substance" — "the agent correctly identified the linguistic style and structure of a software evaluation report... lacking the ability to conduct such research, it defaults to generating text that mimics the expected output" — is a precise description of the custodial challenge. Since How does LLM-mediated search change what expertise requires?, the expert custodian must now detect strategic fabrication within reports that are specifically designed to look authoritative.


Source: Agentic Research

Related concepts in this collection

Concept map
15 direct connections · 145 in 2-hop network ·dense cluster

Click a node to walk · click center to open · click Open full network for a force-directed map

your link semantically near linked from elsewhere
Original note title

deep research agents fail through 14 fine-grained modes across reasoning retrieval and generation — strategic content fabrication accounts for 39 percent of failures