How much does training data presentation format shape reasoning ability?
This explores whether *how* training data is laid out — multiple-choice vs. free-form, the shape of worked examples, output formatting — shapes a model's reasoning more than *what* the data is about, and the corpus suggests format is a surprisingly dominant lever.
The corpus comes down hard on one side: format matters enormously, often more than content. The cleanest result is that training format shapes a model's reasoning *strategy* about 7.5 times more strongly than the domain it was trained on. Models fed multiple-choice data learn to explore broadly (breadth-first), while free-form data produces depth-first reasoning, with effect sizes (Cohen's d up to 1.5) large enough that presentation, not topic, sets the reasoning style Does training data format shape reasoning strategy more than domain?. A related thread argues that small models can become competitive reasoners largely by learning *output format* rather than absorbing new knowledge. A 1.5B-parameter model with cheap LoRA-only tuning matched much larger full-parameter RL-trained models, which implies that much of what looks like 'learning to reason' is really learning how to organize an answer Can small models reason well by just learning output format?.
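Here is what the effect-size language means in concrete terms. Cohen's d is the difference between two group means divided by their pooled standard deviation, so a d of 1.5 means the groups sit about one and a half standard deviations apart. The sketch below takes one set of hypothetical models, groups it two ways (by training format and by training domain), and computes d for each grouping. The "breadth score" metric, the model records, and every number are invented for illustration. They are not data from the study cited above.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical models: (training format, training domain, breadth score).
# "Breadth score" is an assumed metric (e.g. distinct solution branches explored);
# all values are invented for illustration only.
MODELS = [
    ("multiple_choice", "math", 6.5), ("multiple_choice", "math", 5.0),
    ("multiple_choice", "code", 6.0), ("multiple_choice", "code", 5.5),
    ("free_form", "math", 5.0), ("free_form", "math", 4.5),
    ("free_form", "code", 5.5), ("free_form", "code", 4.2),
]


def scores_where(field_index, value):
    """Collect breadth scores for models whose field at field_index equals value."""
    return [model[2] for model in MODELS if model[field_index] == value]


# The same eight models, grouped two different ways.
d_format = cohens_d(scores_where(0, "multiple_choice"), scores_where(0, "free_form"))
d_domain = cohens_d(scores_where(1, "math"), scores_where(1, "code"))

print(f"format effect d = {d_format:.2f}")
print(f"domain effect d = {d_domain:.2f}")
```

With these toy numbers, the format grouping gives d ≈ 1.56 and the domain grouping gives d ≈ -0.06. The same models look sharply different when grouped by format and nearly identical when grouped by domain. That mirrors the shape of the result described above; it does not reproduce it.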
The unsettling implication is that models may absorb the *form* of reasoning without the substance. Logically invalid chain-of-thought exemplars perform nearly as well as valid ones — it's the structural shape of the reasoning trace, not its logical correctness, that delivers the gains Does logical validity actually drive chain-of-thought gains?. Push that further and the formatting starts to look like imitation: chain-of-thought reasoning degrades predictably the moment you shift task, length, or format away from the training distribution, producing fluent-but-broken logic — the model learned to mimic a presentation pattern, not to infer Does chain-of-thought reasoning actually generalize beyond training data?.
But there's a deeper countercurrent worth knowing about: if format only teaches *form*, where does real reasoning come from? Two notes suggest the capability is already latent. Base models appear to contain reasoning ability that minimal training merely *selects* rather than creates — five independent techniques all elicit reasoning that was already sitting in the activations Do base models already contain hidden reasoning ability?. And at the pretraining scale, what actually drives reasoning generalization is exposure to broad *procedural* knowledge — documents that show how problems get solved — as opposed to the narrow document-specific memorization behind factual recall Does procedural knowledge drive reasoning more than factual retrieval?. So 'presentation format' may be powerful precisely because it's an elicitation knob: it decides which already-present capability gets switched on and how it's expressed, more than it installs new skill.
This reframing is reinforced by how separable knowledge and reasoning turn out to be in the network itself. Knowledge retrieval tends to live in lower layers and reasoning adjustment in higher ones. That split explains why reasoning-style training can sharpen math while degrading knowledge-heavy domains like medicine: the format you train toward biases *which* part of the model you're reshaping Why does reasoning training help math but hurt medical tasks?. Format effects even show up at inference. Reasoning verbosity can be captured by a single steering vector in activation space, extracted from just 50 paired examples, and applying it compresses a chain-of-thought by about two-thirds while maintaining accuracy, with no retraining at all Can we steer reasoning toward brevity without retraining?.
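A common way to build such a direction is a difference of means. You capture hidden states for paired concise and verbose reasoning traces, average the per-pair difference, and add a scaled copy of that vector to a layer's activations at inference time. The sketch below assumes exactly that recipe on toy 4-dimensional vectors. The helper names, the `alpha` scale, and the activation values are hypothetical. The Activation-Steered Compression method cited here may differ in which layer it reads, how it normalizes the vector, or how it applies it.

```python
from statistics import mean


def steering_vector(concise_acts, verbose_acts):
    """Mean of (concise - verbose) activation differences over paired examples."""
    if len(concise_acts) != len(verbose_acts):
        raise ValueError("concise and verbose examples must be paired")
    dims = range(len(concise_acts[0]))
    diffs = [[c[i] - v[i] for i in dims] for c, v in zip(concise_acts, verbose_acts)]
    return [mean([d[i] for d in diffs]) for i in dims]


def steer(hidden, vector, alpha=1.0):
    """Shift a hidden state along the steering direction; model weights are untouched."""
    return [h + alpha * s for h, s in zip(hidden, vector)]


# Toy activations for 3 hypothetical pairs. Real use would capture hidden
# states from a chosen model layer (the cited method reports using 50 pairs).
concise = [[0.9, 0.1, 0.0, 0.5], [1.1, 0.0, 0.2, 0.4], [1.0, 0.2, 0.1, 0.6]]
verbose = [[0.1, 0.1, 0.0, 0.5], [0.2, 0.0, 0.3, 0.3], [0.0, 0.3, 0.1, 0.7]]

v = steering_vector(concise, verbose)
h_steered = steer([0.5, 0.5, 0.5, 0.5], v, alpha=0.5)
print(v)
print(h_steered)
```

Steering only adds a vector to activations, so the weights never change. That is what makes the approach training-free. The scale `alpha` and the choice of layer would need tuning so that outputs shorten without losing accuracy.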
The honest synthesis: presentation format is one of the strongest levers we have over *how* a model reasons — its strategy, verbosity, and apparent competence — but the corpus warns this is largely about shaping and eliciting capability rather than creating it. The thing you didn't know you wanted to know is that a model trained on the 'right' format can look like it reasons well while having learned only the choreography — and the failure shows up the instant you step outside the distribution it was formatted on.
Sources (8 notes)
Models trained on multiple-choice data adopt breadth-first exploration (Cohen's d up to 1.5), while free-form training produces depth-first reasoning. Format effect dwarfs domain effect, meaning presentation matters far more than content type.
A 1.5B parameter model with LoRA-only post-training matched larger full-parameter RL models on reasoning tasks, suggesting RL teaches output format organization rather than new factual knowledge. This efficiency indicates reasoning and knowledge storage are separable capabilities.
Illogical chain-of-thought exemplars matched valid CoT performance on BIG-Bench Hard, showing that structural properties—not logical validity—drive the gains. The model learns the form of reasoning, not genuine inference.
DataAlchemy experiments show CoT fails systematically under distributional shifts in task, length, and format. Models produce fluent but logically inconsistent reasoning — imitating reasoning form without valid underlying logic.
Five independent mechanisms—RL steering, critique fine-tuning, decoding changes, SAE feature steering, and RLVR—all elicit reasoning already present in base model activations. Post-training selects rather than creates reasoning; the bottleneck is elicitation, not capability acquisition.
Analysis of 5 million pretraining documents shows reasoning relies on broad, transferable procedural knowledge from diverse sources, unlike factual recall which depends on narrow, document-specific memorization of target facts.
Two-phase inference model shows knowledge retrieval operates in lower network layers while reasoning adjustment happens in higher layers. This separation explains why reasoning training improves math but can degrade knowledge-intensive domains like medicine.
Activation-Steered Compression extracts a single vector from 50 paired examples to reduce chain-of-thought length by 67% while maintaining accuracy and achieving 2.73x speedup. The method is training-free and generalizes across model sizes and domains.