AgentRxiv: Towards Collaborative Autonomous Research

Paper · arXiv 2503.18102 · Published March 23, 2025

To address these challenges, we introduce AgentRxiv, a framework that lets LLM agent laboratories upload and retrieve reports from a shared preprint server in order to collaborate, share insights, and iteratively build on each other’s research. We task agent laboratories with developing new reasoning and prompting techniques and find that agents with access to their prior research achieve larger performance improvements than agents operating in isolation (11.4% relative improvement over baseline on MATH-500). We find that the best-performing strategy generalizes to benchmarks in other domains (improving by 3.3% on average). Multiple agent laboratories sharing research through AgentRxiv are able to work together towards a common goal, progressing more rapidly than isolated laboratories and achieving higher overall accuracy (13.7% relative improvement over baseline on MATH-500). These findings suggest that autonomous agents may play a role in designing future AI systems alongside humans.

In an effort to accelerate the process of scientific discovery, recent work has explored the ability of LLM agents to perform autonomous research (Lu et al., 2024b; Schmidgall et al., 2025; Swanson et al., 2024). The AI Scientist framework (Lu et al., 2024b) is a large language model (LLM)-based system that generates research ideas in machine learning, writes research code, runs experiments, and produces a scientific paper, using automated peer review to evaluate the work. Virtual Lab (Swanson et al., 2024) uses a multi-agent system of LLM-based experts from different backgrounds (e.g., chemist or biologist) working together with human scientists to produce novel nanobody binders for SARS-CoV-2, where the discovered nanobodies demonstrate promising efficacy in wet-lab validations. Finally, Agent Laboratory (Schmidgall et al., 2025) is a multi-agent autonomous research system that is able to incorporate human feedback, at greatly reduced cost compared to Lu et al. (2024b). While these works demonstrate progress toward accelerated scientific discovery, they often operate in isolation and do not support the continuous, cumulative development of research over time that reflects the nature of science. Therefore, we aim to provide a unified platform that enables agents to build upon the research of other agents.

In this study, we introduce AgentRxiv, an autonomous research collaboration framework that supports LLM agents in generating, sharing, and building upon scientific research. By implementing a centralized, open-source preprint server for autonomous agents, AgentRxiv enables systematic sharing of research findings, allowing agents to cumulatively build on previous work. AgentRxiv also supports parallel research across multiple agentic systems, enabling scalability with available computational resources. When incorporating AgentRxiv, each generation of papers shows measurable improvements. For example, accuracy on the MATH-500 benchmark increased from 70.2% to 78.2% with the best discovered reasoning technique, using gpt-4o mini as the base model.
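The accuracy figures above can be related to the relative improvement quoted in the abstract with a short calculation. The sketch below is illustrative only: the helper name `relative_improvement` is hypothetical, and it assumes that the 11.4% figure refers to the same 70.2% to 78.2% change on MATH-500, which the arithmetic appears to support.

```python
def relative_improvement(baseline: float, improved: float) -> float:
    """Return the gain of `improved` over `baseline` as a percentage of `baseline`."""
    return (improved - baseline) / baseline * 100.0


# MATH-500 accuracies (in percent) reported in the text for gpt-4o mini.
baseline_acc = 70.2
best_acc = 78.2

# Absolute gain in percentage points versus gain relative to the baseline.
absolute_gain = best_acc - baseline_acc
relative_gain = relative_improvement(baseline_acc, best_acc)

print(f"absolute gain: {absolute_gain:.1f} percentage points")
print(f"relative gain: {relative_gain:.1f}%")
```

Under this reading, the 8.0 percentage point gain and the 11.4% relative improvement describe the same result, one measured in absolute accuracy and the other as a fraction of the baseline.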