How should we categorize different test-time scaling approaches?

Test-time scaling research spans multiple strategies for improving model performance at inference. Understanding how these approaches differ—and how they relate—helps researchers and practitioners choose the right method for their constraints.

Note · 2026-02-20 · sourced from Test Time Compute
How should we allocate compute budget at inference time?

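Every test-time scaling (TTS) approach belongs to one of two categories: internal TTS, which improves the model's own reasoning through training (for example, long chain-of-thought models), and external TTS, which spends additional inference compute around a fixed model (for example, Best-of-N sampling scored by a process reward model).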

Internal and external TTS are complementary, not competing: internal TTS makes models better reasoners; external TTS extracts more performance from whatever reasoning capability exists. Combining them (e.g., using Best-of-N to boost a long-CoT model with a PRM) often outperforms either alone.
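As a rough illustration of the external side, Best-of-N can be sketched in a few lines: sample several candidates from a fixed model and keep the one a scorer rates highest. This is a minimal sketch, not a reference implementation. The names `best_of_n`, `make_toy_generator`, and `toy_score` are hypothetical, and the toy generator and scorer stand in for a real (possibly long-CoT) model and a real reward model such as a PRM.

```python
import itertools
from typing import Callable, List


def best_of_n(prompt: str,
              generate: Callable[[str], str],
              score: Callable[[str, str], float],
              n: int = 8) -> str:
    """Sample n candidates and return the one the scorer rates highest."""
    if n < 1:
        raise ValueError("n must be at least 1")
    candidates: List[str] = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: score(prompt, c))


def make_toy_generator(answers: List[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for a model: cycles through fixed answers."""
    cycle = itertools.cycle(answers)

    def generate(prompt: str) -> str:
        return next(cycle)

    return generate


def toy_score(prompt: str, candidate: str) -> float:
    """Hypothetical stand-in for a reward model: rewards the answer '4'."""
    return 1.0 if candidate == "4" else 0.0


generate = make_toy_generator(["5", "3", "4", "22"])
best = best_of_n("What is 2 + 2?", generate, toy_score, n=4)
print(best)
```

With `n=1` the same setup returns whatever the first sample happens to be, which is the sense in which external TTS can only amplify what the underlying model is already able to produce.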

The practical distinction matters for deployment: internal scaling is a training cost paid once, while external scaling is an inference cost paid on every query. At high query volume the economics favor internal scaling, but external scaling remains essential during development, when training is expensive.
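The trade-off between a one-time training cost and a per-query inference cost can be made concrete with a break-even calculation. The sketch below is an assumption-laden simplification: the function `break_even_queries` and all cost figures are hypothetical, costs are in arbitrary compute units, and per-query costs are treated as constant.

```python
import math
from typing import Optional


def break_even_queries(training_cost: float,
                       internal_cost_per_query: float,
                       external_cost_per_query: float) -> Optional[int]:
    """Number of queries after which the one-time training cost is repaid.

    Returns None when external scaling is not more expensive per query,
    in which case the training cost is never recovered under this model.
    """
    saving_per_query = external_cost_per_query - internal_cost_per_query
    if saving_per_query <= 0:
        return None
    return math.ceil(training_cost / saving_per_query)


# Hypothetical figures: training costs 1000 units; an internally scaled
# model costs 1 unit per query; external scaling costs 3 units per query.
queries = break_even_queries(1000.0, 1.0, 3.0)
print(queries)  # 500
```

Below the break-even volume, paying per query is cheaper overall; above it, the one-time training cost wins.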

The finding in the related note "Can non-reasoning models catch up with more compute?" illustrates the limits of external TTS alone: you need the internal foundation before external scaling can amplify it.



Original note title: internal vs external TTS is the primary taxonomic split in test-time scaling research