Agentic Systems and Planning

Where does AI assistance become unreliable in research?

This note asks whether AI capability in research tasks follows a sharp boundary, and what determines which side of that boundary a task falls on. The answer matters because it shows where humans must stay in control.

Note · 2026-05-28 · sourced from Agentic Research

The roadmap's first finding is that AI capability is not uniformly distributed across research work — it is sharply stage-dependent. Where tasks are structured, externally checkable, and tool-mediated (literature retrieval, drafting, figure generation, review support), AI is reliable. Where tasks demand genuine novelty, implicit domain knowledge, long-horizon reasoning, or scientific judgment (open-ended ideation, research-level experiments), capability drops sharply and autonomy becomes unreliable.

This is more useful than a blanket "AI is/isn't good at research" claim because it predicts where to draw the human-machine boundary rather than whether to draw one. The survey documents the failure pattern concretely: generated ideas often degrade after implementation, performance on research code lags far behind performance on pattern-matching benchmarks, and end-to-end autonomous systems have not consistently reached major-venue acceptance standards.

The counterpoint is that the boundary moves — yesterday's "unreliable autonomy" zone (e.g. coding) keeps shrinking. But the boundary's shape is stable even as it shifts: it always tracks checkability. Tasks with an external oracle to verify against fall on the reliable side; tasks requiring judgment with no ground truth stay on the unreliable side. Therefore the design principle is durable even though the specific task assignments are not — which is why this pairs naturally with the lifecycle verification gap: the boundary is exactly the line where verification becomes impossible.
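
As an illustration only, the checkability principle can be sketched as a routing rule: a task whose output can be verified by an external oracle runs autonomously and is accepted or rejected by that oracle, while a task with no oracle is always escalated to a human. Everything in this sketch (`Task`, `oracle`, `run`, the status strings, the example tasks) is a hypothetical name chosen for this note, not something defined in the roadmap.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Task:
    name: str
    # Hypothetical: an external check on the output, or None when the task
    # has no ground truth to verify against (for example, open-ended ideation).
    oracle: Optional[Callable[[str], bool]] = None


def run(task: Task, produce: Callable[[Task], str]) -> Tuple[str, str]:
    """Run the AI step, then decide by checkability who signs off on it."""
    output = produce(task)
    if task.oracle is None:
        # No external check exists, so a human must judge the result.
        return ("needs_human_review", output)
    # An external check exists, so the oracle decides.
    status = "accepted" if task.oracle(output) else "rejected"
    return (status, output)


def fake_model(task: Task) -> str:
    # Stand-in for an AI call; returns a fixed string for the demo.
    return "10.1000/example-doi" if task.name == "retrieve_citation" else "a new idea"


# A checkable task: the oracle here is a toy format check, assumed for the demo.
retrieval = Task("retrieve_citation", oracle=lambda out: out.startswith("10."))
# An uncheckable task: no oracle is available.
ideation = Task("propose_hypothesis")

print(run(retrieval, fake_model)[0])
print(run(ideation, fake_model)[0])
```

The design choice worth noting is that the routing depends only on whether an oracle exists, not on which task it is. As the boundary moves, a task changes sides by gaining an oracle while the rule itself stays the same.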


— "AI for Auto-Research: Roadmap & User Guide", https://arxiv.org/abs/2605.18661

Original note title

a sharp stage-dependent boundary separates reliable ai assistance from unreliable autonomy in research