Reinforcement Learning for LLMs · LLM Reasoning and Architecture

Do search steps follow the same scaling rules as reasoning tokens?

Exploring whether the overthinking curve observed in reasoning models also appears in deep research agents. This matters because it could reveal universal scaling laws governing all inference-time compute.

Note · 2026-02-21 · sourced from Deep Research

Writing angle — Medium/LinkedIn post.

Hook: The overthinking papers showed that more reasoning tokens help — until they don't. Now the same curve is showing up in a completely different place: search. Deep research agents improve with more search budget, following the same monotonic-then-degrading relationship. Scaling laws aren't just for training anymore. They're for every inference loop.

The claim: Test-time scaling generalizes from single-query reasoning to multi-step retrieval. The "search budget law" (Agentic Deep Research paper) shows that answer quality scales with search steps in a way that mirrors the relationship between reasoning quality and thinking tokens.
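The monotonic-then-degrading shape can be made concrete with a toy model. The functional form below (logarithmic gains from extra search steps minus a linear noise cost) is an illustrative assumption, not the fitted law from the paper — the point is only that such a curve rises, peaks, and then falls, so there is a finite optimal search budget:

```python
import math

def answer_quality(search_steps: int, gain: float = 1.0, noise_cost: float = 0.05) -> float:
    """Toy model of the hypothesized search budget law: quality rises
    roughly logarithmically with search steps, then degrades as extra
    retrievals inject noise. Illustrative form, not from the paper."""
    return gain * math.log(1 + search_steps) - noise_cost * search_steps

# Sweep budgets: quality improves monotonically at first, then degrades.
budgets = range(1, 101)
best = max(budgets, key=answer_quality)  # interior optimum, not the max budget
```

Under these made-up constants the optimum sits at 19 steps — far below the maximum budget — which is exactly the "more is better, until it isn't" shape the note describes.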

Why it matters:

  1. It means inference-compute optimization now has two levers: reasoning budget and search budget. The old question was "how many tokens should we think?" The new question is "how many retrieval rounds should we run, and how much reasoning per round?"
  2. It raises the same ceiling question: if reasoning has an overthinking threshold, does search? ASearcher's turn-limit finding suggests yes — unrestricted per-turn reasoning in iterative search loops degrades overall answer quality, which means the search version of overthinking exists too.
  3. It reframes DR quality as an infrastructure decision as much as a model decision. A weaker model with a larger search budget can match a stronger model with a smaller one.
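The two-lever framing in point 1 can be sketched as a budget-allocation problem. Everything here is a hypothetical model: the functional forms, the 4096-token budget, and the per-round penalty are assumptions chosen to exhibit the claimed behavior, not values from either paper:

```python
import math

def quality(rounds: int, tokens_per_round: int, round_penalty: float = 0.1) -> float:
    """Toy two-lever model (assumed, not from the papers): diminishing
    returns from both extra search rounds and per-round reasoning, plus a
    small noise penalty per round -- the overthinking analogue for search."""
    return math.log(1 + rounds) * math.log(1 + tokens_per_round) - round_penalty * rounds

TOTAL_BUDGET = 4096  # hypothetical fixed inference-compute budget, in tokens

# Enumerate ways to split the fixed budget between the two levers.
splits = [(r, TOTAL_BUDGET // r) for r in range(1, 65)]
best_rounds, best_tokens = max(splits, key=lambda s: quality(*s))
# The optimum lands strictly in the interior: neither one huge reasoning
# pass (r=1) nor many tiny searches (r=64) wins under this model.
```

The design choice worth noticing: once quality depends multiplicatively on both levers under a shared budget, the question stops being "how many tokens?" and becomes "what split?" — which is the reframing the note argues for.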

The synthesis: "Does search budget scale like reasoning tokens for answer quality?" and "Does limiting reasoning per turn improve multi-turn search quality?" together make the full argument: search has its own test-time-scaling curve, that curve follows a similar shape, and it has its own overthinking variant.



Original note title: the search budget law — why deep research agents follow the same scaling rules as reasoning models