Research interest in agents is rising, particularly in Artificial Intelligence (AI) techniques such as deep multi-modal representation learning, deep graph learning, and deep reinforcement learning, w…
Great scientists have strong judgement and foresight, closely tied to what we call scientific taste. Here, we use the term to refer to the capacity to judge and propose research ideas with high potent…
By its nature, intelligence is high-dimensional and relational, not a single quantity that must be unambiguously less or greater than human scale. In fact, it is unclear what we even mean by “human sc…
LLM-powered AI agents are rapidly becoming more capable and more widely deployed (Masterman et al., 2024; Kasirzadeh & Gabriel, 2025). Unlike conventional chat assistants, these systems are increasing…
If autoresearch is itself a form of research, then autoresearch can be applied to research itself. We take this idea literally: we use an autoresearch loop to optimize the autoresearch loop. Every exi…
Abstract: Large language models (LLMs) have rapidly evolved from text generators into powerful problem solvers. Yet, many open tasks demand critical thinking, multi-source evidence, and verifiable outputs, whi…
First, data-wise, most existing QA datasets feature relatively simple questions that do not reflect true “hard-to-find” cases. For example, questions in HotpotQA [Yang et al., 2018] can often …
Deep research systems represent an emerging class of agentic information retrieval methods that generate comprehensive, well-supported reports in response to complex queries. However, most existing frameworks …
Deep Research Agents (DRAs) aim to automatically produce analyst-level reports through iterative information retrieval and synthesis. However, most existing DRAs were validated on question-answering b…
AI agents are already operating in production across many industries, yet there is limited public understanding of the technical strategies that make deployments successful. We present the first large…
Existing Large Reasoning Models (LRMs) have shown the potential of reinforcement learning (RL) to enhance the complex reasoning capabilities of Large Language Models (LLMs). While they achieve remarka…
Existing question answering (QA) datasets are no longer challenging for the most powerful Large Language Models (LLMs). Traditional QA benchmarks like TriviaQA, NaturalQuestions, ELI5, and HotpotQA mainly s…
Efficiently acquiring external knowledge and up-to-date information is essential for effective reasoning and text generation in large language models (LLMs). Prompting advanced LLMs with reasoning cap…
Recent work reports strong performance from multi-agent LLM systems (MAS), but these gains are often confounded by increased test-time computation. When computation is normalized, single-agent systems…
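The confound described above can be illustrated with a tiny sketch: raw accuracy appears to favor the multi-agent system, but once the single agent is granted the same token budget (for example via best-of-n sampling), the comparison can reverse. All system names and numbers below are invented for illustration only:

```python
# Hypothetical evaluation records: (accuracy, total inference tokens per query).
# None of these figures come from the papers above; they only show the shape
# of a compute-normalized comparison.
runs = {
    "single_agent_1_sample":  (0.62, 2_000),
    "mas_4_agents":           (0.70, 8_000),
    "single_agent_best_of_4": (0.71, 8_000),  # budget-matched to the MAS
}

# Normalize by admitting only systems within a fixed per-query token budget,
# then compare accuracy on equal footing.
budget = 8_000
eligible = {name: acc for name, (acc, toks) in runs.items() if toks <= budget}
best = max(eligible, key=eligible.get)
print(best)
```

Under the matched budget, the best-of-4 single agent edges out the multi-agent system in this toy example, which is the pattern the compute-normalization critique predicts.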
Multi-agent systems (MAS) decompose complex tasks and delegate subtasks to different large language model (LLM) agents and tools. Prior studies have reported the superior accuracy of MAS a…
Agents, language model (LM)-based systems capable of reasoning, planning, and acting, are becoming the dominant paradigm for real-world AI applications. Despite this widespread adoption, the p…