Why does training data format shape reasoning strategy more than domain content?

This explores why *how* training data is presented — multiple-choice vs. free-form, the shape of the examples — does more to set a model's reasoning style than *what* the data is about, and what that reveals about where reasoning actually lives in a model.

The headline result is stark: presentation outweighs content by roughly 7.5 to 1. Models trained on multiple-choice data learn to fan out across options in a breadth-first sweep, while free-form training pushes them to follow one line of thought deeply, depth-first (Does training data format shape reasoning strategy more than domain?). The effect is large enough (Cohen's d up to 1.5) that the topic the model is reasoning about barely registers next to the shape of the examples it saw.
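
For readers who want to see what a Cohen's d of that size means in practice, here is a minimal sketch of the standard pooled-standard-deviation formula. The two score lists are invented for illustration and are not data from the cited note; the "breadth score" they stand for is a hypothetical metric.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # statistics.variance is the sample variance (n - 1 in the denominator).
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical per-response "breadth" scores, invented for illustration only.
multiple_choice_trained = [4.0, 5.0, 6.0, 5.0, 5.0]
free_form_trained = [3.0, 4.0, 5.0, 4.0, 4.0]

d = cohens_d(multiple_choice_trained, free_form_trained)
print(f"Cohen's d = {d:.2f}")  # prints "Cohen's d = 1.41"
```

A d near 1.4 means the two group means sit about 1.4 pooled standard deviations apart, which is the scale of separation the format effect is reported to reach.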

The reason becomes clearer once you stop thinking of training as *teaching* reasoning and start thinking of it as *selecting* a reasoning strategy the model already has. Several notes in the corpus argue that base models already contain latent reasoning ability: RL, critique fine-tuning, decoding tweaks, and feature steering all elicit behavior that was already present in the activations rather than building it from scratch (Do base models already contain hidden reasoning ability?). If post-training mostly chooses *when* to deploy reasoning rather than installing *how* to reason (Does RL post-training create reasoning or just deploy it?), then the format of your data acts like a switch: it tells the model which of its pre-existing strategies to default to. Content can't reshape a capability that was never being built in the first place, but format can flip which strategy gets selected.

There's a deeper layer in *where* reasoning comes from during pretraining. Analysis of 5 million pretraining documents found that reasoning draws on broad, transferable *procedural* knowledge (patterns of how-to-do-things scattered across many sources), whereas factual recall leans on narrow, document-specific memorization (Does procedural knowledge drive reasoning more than factual retrieval?). Because reasoning is procedural, it's the *procedure shown by the format* (sweep the options vs. commit to a chain) that transfers, not the domain facts. This is also why reasoning strategies turn out to be such clean, manipulable objects: verbose vs. concise chains of thought occupy distinct linear directions in activation space, steerable with a single vector and no retraining (Can we steer reasoning toward brevity without retraining?). Strategy is a dial, and format sets where the dial lands.
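
To make "steerable with a single vector" concrete, here is a toy sketch of difference-of-means steering, a common way to build such a vector. It is an assumption that this resembles the cited method; the note does not spell out its procedure. The 3-dimensional "activations" and the function names are hypothetical, and a real setup would read and write hidden states inside a model.

```python
def mean_vector(vectors):
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def steering_vector(source_acts, target_acts):
    """Direction from the mean source-style activation to the mean target-style one."""
    source_mean = mean_vector(source_acts)
    target_mean = mean_vector(target_acts)
    return [t - s for s, t in zip(source_mean, target_mean)]


def steer(activation, vector, strength=1.0):
    """Shift one activation along the steering direction."""
    return [a + strength * v for a, v in zip(activation, vector)]


# Toy activations from paired verbose / concise examples, invented for illustration.
verbose_acts = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
concise_acts = [[0.0, 1.0, 2.0], [0.0, 3.0, 2.0]]

to_concise = steering_vector(verbose_acts, concise_acts)
steered = steer([2.0, 0.0, 2.0], to_concise, strength=0.5)
print(to_concise)  # [-2.0, 2.0, 0.0]
print(steered)     # [1.0, 1.0, 2.0]
```

The `strength` parameter is the dial: no weights change, only how far the activation is pushed along one fixed direction.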

The flip side is a warning. If format is what's really being learned, then reasoning learned through one format is brittle when the format changes. Chain-of-thought degrades predictably under distribution shifts — in task, length, *and* format — producing fluent prose that imitates the *form* of reasoning without the underlying logic (Does chain-of-thought reasoning actually generalize beyond training data?). Reasoning also collapses just from longer inputs, well below the context limit, in a way that's task-agnostic (Does reasoning ability actually degrade with longer inputs?). Both findings make the same uncomfortable point: a model that adopted a strategy from a format hasn't necessarily internalized valid reasoning — it's pattern-matched the presentation.

The practical takeaway is non-obvious. Domain adaptation methods each have their own sweet spots and hidden costs, and one of the things they quietly degrade is *format flexibility* (How do domain training techniques actually reshape model behavior?). So if you're trying to shape how a model reasons, curating the *shape* of your examples — and the diversity of formats it sees — may matter more than curating the topics. The model already knows how to reason in several ways; your data format is mostly choosing which way it reaches for first.
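
One way to act on this is to render the same underlying items in more than one format and control the mix explicitly. The sketch below is a hypothetical data-preparation helper, not a method from any of the cited notes; the field names (`question`, `options`, `answer`) and the `mc_fraction` parameter are assumptions made for illustration.

```python
import random

LETTERS = "ABCDEFGH"


def as_multiple_choice(question, options, answer):
    """Render an item as a lettered multiple-choice prompt."""
    lines = [question] + [f"{LETTERS[i]}. {opt}" for i, opt in enumerate(options)]
    return {
        "format": "multiple_choice",
        "prompt": "\n".join(lines),
        "target": LETTERS[options.index(answer)],
    }


def as_free_form(question, answer):
    """Render the same item as an open-ended prompt."""
    return {"format": "free_form", "prompt": question, "target": answer}


def mixed_format_dataset(items, mc_fraction=0.5, seed=0):
    """Assign each item a format at random, with mc_fraction as the expected MC share."""
    rng = random.Random(seed)
    examples = []
    for item in items:
        if rng.random() < mc_fraction:
            examples.append(
                as_multiple_choice(item["question"], item["options"], item["answer"])
            )
        else:
            examples.append(as_free_form(item["question"], item["answer"]))
    return examples


# Hypothetical item, invented for illustration.
items = [{"question": "What is 7 * 8?", "options": ["54", "56", "58"], "answer": "56"}]
all_mc = mixed_format_dataset(items, mc_fraction=1.0)
all_free = mixed_format_dataset(items, mc_fraction=0.0)
```

Holding the items fixed and varying only `mc_fraction` keeps domain content constant while the presentation changes, which is the comparison the argument above cares about.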

Sources (8 notes)

Does training data format shape reasoning strategy more than domain?

Models trained on multiple-choice data adopt breadth-first exploration (Cohen's d up to 1.5), while free-form training produces depth-first reasoning. Format effect dwarfs domain effect, meaning presentation matters far more than content type.

Do base models already contain hidden reasoning ability?

Five independent mechanisms—RL steering, critique fine-tuning, decoding changes, SAE feature steering, and RLVR—all elicit reasoning already present in base model activations. Post-training selects rather than creates reasoning; the bottleneck is elicitation, not capability acquisition.

Does RL post-training create reasoning or just deploy it?

Evidence shows base models already contain reasoning capability in latent form; RL training optimizes deployment timing rather than capability creation. Hybrid models recover 91% of performance gains by routing tokens only, and activation vectors for reasoning strategies exist before any RL.

Does procedural knowledge drive reasoning more than factual retrieval?

Analysis of 5 million pretraining documents shows reasoning relies on broad, transferable procedural knowledge from diverse sources, unlike factual recall which depends on narrow, document-specific memorization of target facts.

Can we steer reasoning toward brevity without retraining?

Activation-Steered Compression extracts a single vector from 50 paired examples to reduce chain-of-thought length by 67% while maintaining accuracy and achieving 2.73x speedup. The method is training-free and generalizes across model sizes and domains.

Does chain-of-thought reasoning actually generalize beyond training data?

DataAlchemy experiments show CoT fails systematically under distributional shifts in task, length, and format. Models produce fluent but logically inconsistent reasoning — imitating reasoning form without valid underlying logic.

Does reasoning ability actually degrade with longer inputs?

FLenQA shows reasoning accuracy drops from 92% to 68% at just 3000 tokens of padding, far below context window capacity. The degradation is task-agnostic, uncorrelated with language modeling performance, and persists even with chain-of-thought prompting.

How do domain training techniques actually reshape model behavior?

Research shows every adaptation method—from parameter-efficient tuning to knowledge graph curricula—has optimal conditions tied to specific domains. The key finding: visible benefits like performance gains often come with hidden degradation in reasoning faithfulness, capability transfer, and format flexibility.
