How does test-time scaling work for individual research agents?
Does search budget follow the same scaling curves as reasoning tokens in agentic systems? This note asks whether deep research exhibits test-time scaling laws similar to those seen for reasoning, and what that would imply for inference-compute tradeoffs.
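
One way to make "the same scaling curve" concrete, as a minimal sketch: assume a score grows roughly linearly in the logarithm of the budget, fit that line once for search budget and once for reasoning tokens, and compare the slopes. The log-linear form, the base-2 logarithm, and the helper name `fit_log_linear` are assumptions made for illustration; the question above does not establish any of them.

```python
import math
from typing import Sequence, Tuple


def fit_log_linear(budgets: Sequence[float], scores: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of score = a + b * log2(budget); returns (a, b).

    Hypothetical helper: the log-linear form is an assumed curve shape,
    not a result taken from the text.
    """
    if len(budgets) != len(scores) or len(budgets) < 2:
        raise ValueError("need at least two (budget, score) pairs of equal length")
    if any(b <= 0 for b in budgets):
        raise ValueError("budgets must be positive")

    xs = [math.log2(b) for b in budgets]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n

    # Spread of the log-budgets; zero means every budget was identical.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("budgets must not all be equal")

    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    slope = sxy / sxx                  # score gained per doubling of budget
    intercept = mean_y - slope * mean_x
    return intercept, slope


if __name__ == "__main__":
    # Synthetic placeholder points generated from a known line, NOT measurements.
    search_calls = [1, 2, 4, 8, 16]
    synthetic_scores = [0.2 + 0.05 * math.log2(c) for c in search_calls]
    print(fit_log_linear(search_calls, synthetic_scores))
```

Under these assumptions, the slope is the score gained per doubling of budget. Comparing the slope fitted on search calls with the slope fitted on reasoning tokens, both measured on the same task, gives a first rough check of whether the two budgets scale alike. A poor fit in either case would suggest the log-linear form itself does not hold.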