Reinforcement Learning for LLMs · LLM Reasoning and Architecture

Do search steps follow the same scaling rules as reasoning tokens?

Exploring whether the overthinking curve observed in reasoning models also appears in deep research agents. This matters because it could reveal universal scaling laws governing all inference-time compute.

Note · 2026-02-21 · sourced from Deep Research

Writing angle — Medium/LinkedIn post.

Hook: The overthinking papers showed that more reasoning tokens help — until they don't. Now the same curve is showing up in a completely different place: search. Deep research agents improve with more search budget, following the same monotonic-then-degrading relationship. Scaling laws aren't just for training anymore. They're for every inference loop.

The claim: Test-time scaling generalizes from single-query reasoning to multi-step retrieval. The "search budget law" (Agentic Deep Research paper) shows that answer quality scales with search steps in a way that mirrors the relationship between reasoning quality and thinking tokens.
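The monotonic-then-degrading shape can be made concrete with a toy model. The functional form below (logarithmic gains from extra search steps minus a linear noise cost) is an illustrative assumption, not the fitted law from the paper — the point is only that such a curve rises, peaks, and then falls, so there is a finite optimal search budget:

```python
import math

def answer_quality(search_steps: int, gain: float = 1.0, noise_cost: float = 0.05) -> float:
    """Toy model of the hypothesized search budget law: quality rises
    roughly logarithmically with search steps, then degrades as extra
    retrievals inject noise. Illustrative form, not from the paper."""
    return gain * math.log(1 + search_steps) - noise_cost * search_steps

# Sweep budgets: quality improves monotonically at first, then degrades.
budgets = range(1, 101)
best = max(budgets, key=answer_quality)  # interior optimum, not the max budget
```

Under these made-up constants the optimum sits at 19 steps — far below the maximum budget — which is exactly the "more is better, until it isn't" shape the note describes.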

Why it matters:

  1. It means inference-compute optimization now has two levers: reasoning budget and search budget. The old question was "how many tokens should we think?" The new question is "how many retrieval rounds should we run, and how much reasoning per round?"
  2. It raises the same ceiling question: if reasoning has an overthinking threshold, does search? ASearcher's turn-limit finding suggests yes — unrestricted per-turn reasoning in iterative search loops degrades overall answer quality, which means the search version of overthinking exists too.
  3. It reframes DR quality as an infrastructure decision as much as a model decision. A weaker model with a larger search budget can match a stronger model with a smaller one.
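The two-lever framing in point 1 can be sketched as a budget-allocation problem. Everything here is a hypothetical model: the functional forms, the 4096-token budget, and the per-round penalty are assumptions chosen to exhibit the claimed behavior, not values from either paper:

```python
import math

def quality(rounds: int, tokens_per_round: int, round_penalty: float = 0.1) -> float:
    """Toy two-lever model (assumed, not from the papers): diminishing
    returns from both extra search rounds and per-round reasoning, plus a
    small noise penalty per round -- the overthinking analogue for search."""
    return math.log(1 + rounds) * math.log(1 + tokens_per_round) - round_penalty * rounds

TOTAL_BUDGET = 4096  # hypothetical fixed inference-compute budget, in tokens

# Enumerate ways to split the fixed budget between the two levers.
splits = [(r, TOTAL_BUDGET // r) for r in range(1, 65)]
best_rounds, best_tokens = max(splits, key=lambda s: quality(*s))
# The optimum lands strictly in the interior: neither one huge reasoning
# pass (r=1) nor many tiny searches (r=64) wins under this model.
```

The design choice worth noticing: once quality depends multiplicatively on both levers under a shared budget, the question stops being "how many tokens?" and becomes "what split?" — which is the reframing the note argues for.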

The synthesis: "Does search budget scale like reasoning tokens for answer quality?" and "Does limiting reasoning per turn improve multi-turn search quality?" together make the full argument: search has its own test-time-scaling curve, that curve follows a similar shape, and it has its own overthinking variant.



Original note title: the search budget law — why deep research agents follow the same scaling rules as reasoning models