Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses
Search agents are often trained as policies over growing transcripts: the model must decide how to search while also remembering what it has seen, which evidence is useful, which constraints remain open, and which claims have actually been checked. We argue that this formulation puts too much routine state management inside the policy: reinforcement learning is forced to optimize both semantic search decisions and recoverable bookkeeping that the environment can maintain more reliably. We introduce Harness-1, a 20B search agent (retrieval subagent) trained with reinforcement learning inside a stateful search harness. The harness maintains environment-side working memory, including a candidate pool, an importance-tagged curated set, compact evidence links, verification records, compressed and deduplicated observations, and budget-aware context rendering. The policy retains the semantic decisions: what to search, which documents to keep or discard, what to verify, and when to stop. Across eight retrieval benchmarks spanning web, finance, patents, and multi-hop QA, Harness-1 achieves 0.730 average curated recall, outperforming the next strongest open search subagent by +11.4 points and remaining competitive with much larger frontier-model searchers.
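To make the division of labor concrete, the following is a minimal sketch of environment-side working memory, not the Harness-1 implementation. The class `SearchHarness`, its method names, the integer importance tags, the character-based budget, and the substring check used for verification are all hypothetical choices made for illustration. The policy would call `keep`, `discard`, and `verify`, while deduplication, record keeping, and budgeted rendering happen in the environment.

```python
from dataclasses import dataclass, field


@dataclass
class SearchHarness:
    """Hypothetical environment-side working memory for one search episode."""

    char_budget: int = 400                        # assumed budget unit: characters
    pool: dict = field(default_factory=dict)      # doc_id -> text of every unique document seen
    curated: dict = field(default_factory=dict)   # doc_id -> importance tag set by the policy
    verified: dict = field(default_factory=dict)  # claim -> doc_id whose text supported it

    def observe(self, results):
        """Add (doc_id, text) results to the pool and return only the unseen ids."""
        fresh = []
        for doc_id, text in results:
            if doc_id not in self.pool:
                self.pool[doc_id] = text
                fresh.append(doc_id)
        return fresh

    def keep(self, doc_id, importance):
        """Policy decision: move a pooled document into the curated set."""
        if doc_id not in self.pool:
            raise KeyError(f"unknown document: {doc_id}")
        self.curated[doc_id] = importance

    def discard(self, doc_id):
        """Policy decision: drop a document from the curated set."""
        self.curated.pop(doc_id, None)

    def verify(self, claim, doc_id):
        """Record a claim only if the source text contains it (placeholder check)."""
        supported = claim.lower() in self.pool[doc_id].lower()
        if supported:
            self.verified[claim] = doc_id
        return supported

    def render(self):
        """Render curated documents, most important first, within the budget."""
        lines, used = [], 0
        ranked = sorted(self.curated.items(), key=lambda kv: -kv[1])
        for doc_id, importance in ranked:
            line = f"[{importance}] {doc_id}: {self.pool[doc_id][:60]}"
            if used + len(line) > self.char_budget:
                break
            lines.append(line)
            used += len(line)
        return "\n".join(lines)


harness = SearchHarness(char_budget=40)
harness.observe([("d1", "Acme filed patent 123 in 2019."), ("d2", "Unrelated.")])
harness.keep("d1", importance=2)
harness.verify("patent 123", "d1")
print(harness.render())
```

In this sketch the policy never has to recall from the transcript which documents it has already seen or which claims it has checked, because the rendered context restates that state on every turn.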
Introduction. Search agents are usually described as language models that call retrieval tools: given a question, they issue queries, read returned evidence, decide what is still missing, and return documents to a downstream answerer. This view follows a broad line of work on reasoning-and-acting agents, iterative retrieval, active retrieval, and tool-use training [51, 40, 17, 33, 15]. It captures the visible behavior of the agent, but not the search state that must be built along the way. A successful multi-turn search episode requires the agent to remember which documents have been seen, which candidates are worth keeping, which constraints remain uncovered, which entities connect separate pieces of evidence, and which claims have actually been checked against source text. This distinction becomes especially important for reinforcement learning. Recent work has shown that LLMs can be trained to interact with search engines and retrieval systems through RL, improving query generation, multi-turn search, and downstream retrieval utility [14, 18, 16, 45].
Discussion / Conclusion. We introduced Harness-1, a 20B search agent trained with reinforcement learning inside a stateful harness. The key idea is to separate semantic search decisions from recoverable bookkeeping: the policy decides what to search, curate, verify, and submit, while the harness maintains candidate pools, importance-tagged evidence, verification records, evidence links, and budget-aware context rendering. Across eight benchmarks, Harness-1 achieves the strongest average curated recall (0.730) among the open search agents we evaluate and remains competitive with much larger frontier-model searchers. Its gains on held-out transfer benchmarks, together with the component ablations, suggest that the harness is not merely an implementation detail, but a central part of what the policy learns to use. These results point to stateful harness design as an important direction for retrieval-agent RL.
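As a reading aid for the headline metric, the sketch below shows one plausible way to compute an average curated recall. It rests on two assumptions that this text does not state: that curated recall is the fraction of relevant documents present in the final curated set, and that the average is an unweighted mean over benchmarks. The function names and toy inputs are hypothetical and are not the paper's data.

```python
def curated_recall(curated_ids, relevant_ids):
    """Fraction of relevant documents that appear in the curated set (assumed definition)."""
    relevant = set(relevant_ids)
    if not relevant:
        raise ValueError("relevant_ids must not be empty")
    return len(relevant & set(curated_ids)) / len(relevant)


def macro_average(per_benchmark):
    """Unweighted mean of per-benchmark scores (assumed aggregation)."""
    return sum(per_benchmark.values()) / len(per_benchmark)


# Toy values for illustration only.
scores = {
    "web": curated_recall(["a", "b", "x"], ["a", "b", "c", "d"]),
    "finance": curated_recall(["p", "q"], ["p", "q"]),
}
print(macro_average(scores))
```

Under these assumptions, documents in the curated set that are not relevant do not lower the score, so the metric rewards coverage of the relevant set rather than precision.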