INQUIRING LINE

What makes external diversity more effective than sequential revision steps?

This explores why spreading exploration across many parallel candidates (or pulling in outside signals) tends to beat refining a single answer step-by-step — and what the corpus says is actually going wrong in the sequential case.


The corpus points to one recurring culprit: **a single trajectory collapses toward its own confidence**, while diversity keeps escape routes open.

The cleanest head-to-head is the planning work where evolutionary search beat both Best-of-N and Sequential Revision (see "Can evolutionary search beat sampling and revision at inference time?"). The reason wasn't a smarter operator; it was an island model that *sustains a population* of competing solutions, preventing the premature convergence that single-trajectory refinement falls into. When you revise one draft over and over, you're hill-climbing from one starting point, whereas a diverse population explores multiple basins at once and recombines the good parts. That only works if the underlying model actually emits varied competent answers, which is why training models to maximize solution diversity (rather than converging on one scalar-best answer) unlocks search procedures that an entropy-collapsed policy simply cannot reach (see "Should training maximize diversity when models feed into search?").
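The contrast between hill-climbing from one draft and sustaining a population can be illustrated with a toy numeric sketch. This is not the LLM-driven method described in the source note: the landscape `fitness`, the functions `sequential_revision` and `island_search`, and all parameter values are hypothetical stand-ins chosen for illustration, with a number standing in for a candidate answer.

```python
import random


def fitness(x: float) -> float:
    """Toy landscape: local peak of height 1 at x=2, global peak of height 3 at x=8."""
    return max(1.0 - abs(x - 2.0), 3.0 - abs(x - 8.0))


def sequential_revision(start: float, steps: int, rng: random.Random) -> float:
    """Revise a single candidate, keeping a revision only if it scores higher."""
    current = start
    for _ in range(steps):
        proposal = current + rng.uniform(-0.5, 0.5)
        if fitness(proposal) > fitness(current):
            current = proposal
    return current


def island_search(n_islands: int, island_size: int, generations: int,
                  migrate_every: int, rng: random.Random) -> float:
    """Evolve several separate populations with occasional ring migration."""
    width = 10.0 / n_islands
    spacing = width / island_size
    # Evenly spread, deterministic starting points (an assumption, for reproducibility).
    islands = [[k * width + (j + 0.6) * spacing for j in range(island_size)]
               for k in range(n_islands)]
    for gen in range(1, generations + 1):
        for i, pop in enumerate(islands):
            children = []
            for _ in range(island_size):
                a, b = rng.sample(pop, 2)
                # Crossover (average two parents) followed by a small mutation.
                children.append((a + b) / 2 + rng.uniform(-0.5, 0.5))
            # Elitist selection: keep the best island_size of parents plus children.
            islands[i] = sorted(pop + children, key=fitness, reverse=True)[:island_size]
        if gen % migrate_every == 0:
            # Ring migration: each island's best replaces the next island's worst.
            bests = [pop[0] for pop in islands]
            for i in range(n_islands):
                islands[(i + 1) % n_islands][-1] = bests[i]
    return max((x for pop in islands for x in pop), key=fitness)


rng = random.Random(0)
stuck = sequential_revision(start=2.0, steps=200, rng=rng)
best = island_search(n_islands=4, island_size=5, generations=30, migrate_every=5, rng=rng)
print(f"sequential revision: x={stuck:.2f}, fitness={fitness(stuck):.2f}")
print(f"island search:       x={best:.2f}, fitness={fitness(best):.2f}")
```

Starting at x=2, the single trajectory cannot leave the local peak, because every small revision scores lower than the current draft. The island run keeps candidates alive in the other basin, and elitist selection means its best score never drops below the best starting candidate.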

But the deeper finding is about *where the diversity or correction comes from*. Revising your own reasoning often backfires: a model revising its own uncertain output tends to amplify confidence in wrong answers rather than fix them. It's the revision *source*, not the act of revising, that determines whether accuracy goes up or down (see "Does revising your own reasoning actually help or hurt?"). External critique guides revision toward truth; internal self-assessment polishes errors. That's the same wall pure self-improvement hits: without an outside anchor, models stall on the generation–verification gap, diversity collapse, and reward hacking, and the methods that *do* work quietly smuggle in something external, such as a past checkpoint, a third-party judge, a tool, or a user correction (see "Can models reliably improve themselves without external feedback?").

So "external diversity" wins on two fronts at once. It supplies the *exploration* that sequential refinement narrows away, and it supplies the *independent signal* that self-revision lacks. Critique models make this concrete during training: step-level critique counteracts the tail-narrowing that creeps in over self-training iterations, keeping solution diversity alive instead of letting the model converge prematurely (see "Do critique models improve diversity during training itself?"). And diversity isn't just a hedge against failure: optimizing for semantic diversity during RL actively *catalyzes* exploration and produces higher-quality outputs than quality-only training, on math as well as creative tasks (see "Can diversity optimization improve quality during language model training?").
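To make "optimizing for semantic diversity alongside quality" concrete, here is a minimal sketch of one way a per-sample reward could be shaped. Multiplying quality by a diversity score is an assumption made for illustration, not a confirmed detail of the source method, and `toy_same_meaning` is a hypothetical string matcher standing in for a learned semantic classifier.

```python
from typing import Callable, List, Sequence


def diversity_scores(samples: Sequence[str],
                     same_meaning: Callable[[str, str], bool]) -> List[float]:
    """Fraction of the other samples that each sample differs from in meaning."""
    n = len(samples)
    if n < 2:
        return [0.0] * n
    scores = []
    for i, s in enumerate(samples):
        distinct = sum(1 for j, t in enumerate(samples)
                       if j != i and not same_meaning(s, t))
        scores.append(distinct / (n - 1))
    return scores


def shaped_rewards(samples: Sequence[str], quality: Sequence[float],
                   same_meaning: Callable[[str, str], bool]) -> List[float]:
    """Assumed combination rule: quality multiplied by diversity."""
    return [q * d for q, d in zip(quality, diversity_scores(samples, same_meaning))]


def toy_same_meaning(a: str, b: str) -> bool:
    """Hypothetical stand-in for a learned classifier: case and punctuation insensitive match."""
    return a.lower().strip(" .") == b.lower().strip(" .")


samples = ["Factor the quadratic.", "factor the quadratic",
           "Complete the square.", "Guess x = 7."]
quality = [1.0, 1.0, 1.0, 0.0]  # illustrative correctness scores
rewards = shaped_rewards(samples, quality, toy_same_meaning)
print(rewards)
```

Under this rule the two near-duplicate correct answers share a reduced reward, the distinct correct answer keeps its full reward, and the distinct but wrong answer earns nothing, so novelty alone is not rewarded.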

The thing you might not have expected: diversity has limits that mirror the revision problem. In multi-agent ideation, cognitive diversity only helps when the agents actually have domain expertise. Diverse-but-ignorant teams underperform a single competent agent, because stimulation without grounding turns into process loss (see "Does cognitive diversity alone improve multi-agent ideation quality?"). So the real lesson isn't "more voices beat one voice." It's that effective improvement needs an *external, competent* signal, whether that arrives as a diverse population or an outside critic, and a lone trajectory grinding on its own output supplies neither.


Sources (7 notes)

Can evolutionary search beat sampling and revision at inference time?

Mind Evolution uses genetic algorithms with LLM-generated mutations and crossovers to significantly outperform Best-of-N and Sequential Revision on planning benchmarks. An island model sustains population diversity, preventing the premature convergence that single-trajectory refinement exhibits.

Should training maximize diversity when models feed into search?

Vector Policy Optimization trains models to emit varied competent solutions rather than converging to one answer. This unlocks search procedures like evolutionary algorithms to explore and combine modes, solving problems that entropy-collapsed policies cannot reach at all.

Does revising your own reasoning actually help or hurt?

Revision guided by external models improves accuracy, but a model revising its own uncertain output typically amplifies confidence in wrong answers rather than correcting them. The revision source, not the revision act itself, determines the outcome.

Can models reliably improve themselves without external feedback?

Pure self-improvement stalls due to the generation-verification gap, diversity collapse, and reward hacking. Reliable improvement methods succeed by smuggling in external anchors: past model versions, third-party judges, user corrections, or tool feedback.

Do critique models improve diversity during training itself?

Step-level critique in the training loop counteracts tail narrowing and maintains solution diversity across self-training iterations. This training-time benefit—preventing premature convergence—is more fundamental than test-time accuracy gains.

Can diversity optimization improve quality during language model training?

DARLING jointly optimizes for quality and semantic diversity using a learned classifier, finding that diversity rewards catalyze exploration and produce higher-quality outputs than quality-only baselines across both creative and mathematical tasks.

Does cognitive diversity alone improve multi-agent ideation quality?

Multi-agent teams substantially outperform solo ideation, but only when members possess genuine senior knowledge. Diverse teams without expertise underperform even a single competent agent, because cognitive stimulation without expertise triggers process losses instead of insight.

Next inquiring lines