INQUIRING LINE

What distinguishes intrinsic search from extrinsic search methods?

This explores the dividing line between search that a model runs internally — having absorbed the strategy into its own weights and reasoning — versus search bolted on from the outside as an explicit algorithm (tree search, evolutionary loops) that steers the model from without.


The corpus splits cleanly along this seam, and the more interesting finding is what each side gives up to get its advantage.

The extrinsic camp treats the model as a strong-but-blind move generator and surrounds it with machinery. Mind Evolution runs a genetic algorithm in which LLM calls supply the mutations and crossovers, and an island model deliberately keeps the population diverse so it doesn't collapse onto one answer too early (see "Can evolutionary search beat sampling and revision at inference time?"). AlphaLLM wraps the model in Monte Carlo Tree Search plus three critic models, using the tree's own structure to rank solution paths and manufacture reward signals that would otherwise need human annotation (see "Can tree search replace human feedback in LLM training?"). The appeal here is control and inspectability: the search lives in code you can read and tune. The most striking version pushes this to a second level: a bilevel system whose outer loop reads its own inner-loop search code, spots the bottleneck, and writes *new* search mechanisms at runtime, discovering bandit and combinatorial optimization methods that improved performance on a GPT pretraining task fivefold (see "Can an AI system improve its own search methods automatically?").
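To make the island-model idea concrete, here is a minimal sketch of an extrinsic evolutionary loop. It is not Mind Evolution's implementation: the `mutate` and `crossover` functions are hypothetical stand-ins for LLM calls, the string-matching `fitness` is a toy evaluator, and the ring migration schedule and all parameter names are assumptions chosen for illustration.

```python
import random

TARGET = "search"  # toy goal (assumption); a real system would score plans, not strings
ALPHABET = "abcdefghijklmnopqrstuvwxyz"

def fitness(candidate: str) -> int:
    """Count positions matching the target (stand-in for a task evaluator)."""
    return sum(a == b for a, b in zip(candidate, TARGET))

def mutate(candidate: str, rng: random.Random) -> str:
    """Hypothetical stand-in for an LLM 'revise this solution' call."""
    i = rng.randrange(len(candidate))
    return candidate[:i] + rng.choice(ALPHABET) + candidate[i + 1:]

def crossover(a: str, b: str, rng: random.Random) -> str:
    """Hypothetical stand-in for an LLM 'combine these two solutions' call."""
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def evolve_islands(n_islands=4, island_size=8, generations=200,
                   migrate_every=10, seed=0):
    rng = random.Random(seed)
    # Each island starts with its own random population.
    islands = [
        ["".join(rng.choice(ALPHABET) for _ in TARGET) for _ in range(island_size)]
        for _ in range(n_islands)
    ]
    for gen in range(1, generations + 1):
        for idx, pop in enumerate(islands):
            pop.sort(key=fitness, reverse=True)
            parents = pop[: island_size // 2]  # keep the fitter half unchanged
            children = [
                mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
                for _ in range(island_size - len(parents))
            ]
            islands[idx] = parents + children
        if gen % migrate_every == 0:
            # Ring migration: each island's best replaces the next island's worst.
            bests = [max(pop, key=fitness) for pop in islands]
            for idx, pop in enumerate(islands):
                pop.sort(key=fitness, reverse=True)
                pop[-1] = bests[idx - 1]
    return max((c for pop in islands for c in pop), key=fitness)
```

Because islands evolve separately and exchange only one candidate every few generations, a lucky early solution on one island cannot immediately take over the whole population, which is the diversity-preserving property the paragraph above describes.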

The intrinsic camp asks whether the model can just *be* the search. Meta-CoT instruction-tunes on linearized traces of MCTS and A* so the model learns to run those strategies inside its own generated reasoning, opening the door to optimizing over algorithms themselves rather than single answers (see "Can models learn to internalize search algorithms through training?"). Stream of Search goes further by training on the *messy* process, with dead ends, backtracking, and mistakes serialized into the training string, and reports 25% higher accuracy than training on clean optimal trajectories alone, which it attributes to the model building an internal world model for search instead of memorizing a fixed external recipe (see "Does training on messy search processes improve reasoning?").
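A small sketch may help show what "serializing the messy process" could look like. The example below is an assumption-laden toy, not the Stream of Search data format: it runs a depth-first subset-sum search over positive integers and logs every visited state, dead end, and backtrack into one string of the kind a model could be trained on.

```python
def search_trace(numbers, target):
    """Depth-first subset-sum search that logs every step, including dead ends.

    Assumes all numbers are positive, so overshooting the target is a dead end.
    """
    trace = []

    def dfs(start, chosen, total):
        trace.append(f"state {chosen} sum={total}")
        if total == target:
            trace.append("goal")
            return True
        if total > target:
            trace.append("dead end, backtrack")
            return False
        for i in range(start, len(numbers)):
            if dfs(i + 1, chosen + [numbers[i]], total + numbers[i]):
                return True
        trace.append("exhausted, backtrack")
        return False

    solved = dfs(0, [], 0)
    return solved, "\n".join(trace)


solved, text = search_trace([5, 3, 4], target=7)
print(text)
```

A clean optimal-trajectory dataset would keep only the states on the successful path. The full trace also records the failed branches through `5`, which is the extra signal the messy-process approach trains on.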

What actually distinguishes the two, the corpus suggests, is what happens to *diversity*. Extrinsic search preserves exploration by construction: the island model, the tree's branching, and the outer loop all hold options open. Internalizing search via reinforcement learning quietly destroys that: RL squeezes a search agent's behavioral diversity through the same entropy-collapse mechanism seen in reasoning, with policies converging onto narrow reward-maximizing paths, while plain supervised fine-tuning on diverse demonstrations keeps the breadth alive (see "Does reinforcement learning squeeze exploration diversity in search agents?"). So the intrinsic-vs-extrinsic choice is partly a choice about *how* you internalize: imitate diverse traces and breadth survives; reward-optimize and it narrows.
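One simple way to put a number on "behavioral diversity" is the Shannon entropy of the strategies an agent is observed to use. The sketch below is an illustrative measurement only; the strategy labels are made up, and the corpus does not say this is the metric the cited work uses.

```python
import math
from collections import Counter

def strategy_entropy(actions):
    """Shannon entropy in bits of the empirical distribution over observed actions."""
    counts = Counter(actions)
    n = len(actions)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical logs: a policy that spreads over four strategies versus one
# that has collapsed onto a single strategy.
diverse = ["query", "browse", "refine", "verify"] * 5
collapsed = ["query"] * 20
```

Under this measure the evenly spread log scores 2 bits and the collapsed log scores 0, so a falling value over the course of training would be one visible symptom of the narrowing described above.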

The unexpected upshot is a hard limit on how far internalization can go. There's a class of search, genuine iterative numerical optimization, that models simply *cannot* internalize: they pattern-match a problem to a memorized template and emit plausible-but-wrong numbers rather than actually iterating, a failure that persists across model scale and training approaches (see "Do large language models actually perform iterative optimization?"). And reasoning-tuned models show no consistent edge on constraint-bound numerical tasks such as optimal power flow, because extended chain-of-thought produces more *text*, not more *computation* (see "Do reasoning models actually beat standard models on optimization?"). That's the real boundary line: for combinatorial and strategic search, intrinsic methods can absorb the algorithm; for true numerical iteration, the search has to stay extrinsic, in a tool, a solver, or a loop the model calls but doesn't pretend to be.
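As an illustration of keeping numerical iteration extrinsic, the sketch below shows the kind of small solver a model could call as a tool instead of guessing the answer. The function name, parameters, and the quadratic test problem are all assumptions for the example; plain gradient descent is used only because it is the simplest iterative procedure to show.

```python
def gradient_descent(grad, x0, lr=0.1, tol=1e-8, max_iter=10_000):
    """Minimize a 1-D function by following its gradient until it is near zero.

    Returns the final point and the number of update steps actually taken.
    """
    x = x0
    for step in range(max_iter):
        g = grad(x)
        if abs(g) < tol:  # converged: the gradient is effectively zero
            return x, step
        x -= lr * g
    return x, max_iter


# Toy objective f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
x_min, steps = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```

The point of the design is that every update is actually computed: the returned step count reflects real iterations, whereas a model emitting a number from a template performs none.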


Sources (8 notes)

Can evolutionary search beat sampling and revision at inference time?

Mind Evolution uses genetic algorithms with LLM-generated mutations and crossovers to significantly outperform Best-of-N and Sequential Revision on planning benchmarks. An island model sustains population diversity, preventing the premature convergence that single-trajectory refinement exhibits.

Can tree search replace human feedback in LLM training?

AlphaLLM uses tree search outcomes and three critic models to derive dense reward signals equivalent to human-labeled feedback. Tree structure naturally ranks solution paths by success, replacing the annotation oracle that standard RLHF requires.

Can an AI system improve its own search methods automatically?

An outer loop successfully read inner loop code, identified bottlenecks, and generated new Python mechanisms at runtime, discovering combinatorial optimization and bandit methods that broke the inner loop's deterministic patterns and improved performance on GPT pretraining by 5x.

Can models learn to internalize search algorithms through training?

Meta-CoT demonstrates that instruction-tuning on linearized MCTS and A* traces teaches models to implement search strategies internally. This enables optimization over algorithms themselves rather than specific outputs, potentially unlocking novel reasoning strategies.

Does training on messy search processes improve reasoning?

Stream of Search pretraining, which represents exploration and backtracking as serialized strings, achieves 25% higher accuracy than optimal-trajectory-only training. Models learn internal world models for search and adaptive strategies rather than fixed external methods.

Does reinforcement learning squeeze exploration diversity in search agents?

RL training compresses behavioral diversity in search agents through the same entropy collapse mechanism documented in reasoning—policies converge on narrow reward-maximizing strategies. SFT on diverse demonstrations preserves exploration breadth, suggesting diversity-preservation techniques are essential for RL search scaling.

Do large language models actually perform iterative optimization?

Research shows LLMs cannot perform iterative procedures in latent space. They recognize optimization problems as template-similar and emit plausible-looking but incorrect values, a failure mode that persists across model scale and training approaches.

Do reasoning models actually beat standard models on optimization?

Reasoning variants with extended CoT show no consistent advantage over standard models on constraint-bound numerical tasks like optimal power flow. Extended thinking produces more text, not more iterative computation, suggesting the bottleneck is numeric procedure rather than reasoning steps.
