INQUIRING LINE

How does training data distribution create asymmetric competence across relation types?

This explores how what's abundant (and what's missing) in training data leaves a model lopsided — fluent at some kinds of tasks or knowledge and unreliable at others — and reads 'relation types' broadly as the different categories of competence (procedural vs. factual, structured vs. open-ended, in-distribution vs. out-of-distribution) that training distributes unevenly.


Training data hands a model uneven competence: strong at the kinds of tasks its data rewards, weak at the kinds it underrepresents. The corpus keeps finding the same shape from different angles, and the sharpest version is the split between knowing-how and knowing-that. One analysis of five million pretraining documents shows that reasoning draws on broad, transferable procedural knowledge spread across many sources, while factual recall depends on narrow, document-specific memorization, so a model can be genuinely good at *how to do* a class of problems while being brittle on *the specific facts* a problem needs (see "Does procedural knowledge drive reasoning more than factual retrieval?"). The asymmetry isn't random; it tracks how each kind of competence is represented in the data.

The same lopsidedness shows up the moment you push past the training distribution. Chain-of-thought reasoning degrades predictably under shifts in task, length, or format: models reproduce the *form* of reasoning they saw without the underlying logic, so competence falls off a cliff exactly where the data thins out (see "Does chain-of-thought reasoning actually generalize beyond training data?"). What looks like general skill is often distribution-bounded fluency. And different domains don't even shift the model in the same direction: structured tasks (math, code) drive output entropy *down* while creative tasks drive it *up*, so training them together lets entropy collapse in the structured domains quietly damage open-ended capability. Competence in one relation type actively erodes another unless you sequence them deliberately (see "Does training order reshape how models handle different task types?").
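To make the entropy claim concrete, here is a minimal sketch that is not taken from the Omni-Thinker work. It computes Shannon entropy for two made-up next-token distributions; the four-token vocabulary, the probabilities, and the function name `shannon_entropy` are all illustrative assumptions.

```python
import math


def shannon_entropy(probs):
    """Entropy in nats of a discrete distribution; zero-probability terms contribute 0."""
    return -sum(p * math.log(p) for p in probs if p > 0)


# Hypothetical next-token distributions over a 4-token vocabulary.
peaked = [0.97, 0.01, 0.01, 0.01]  # structured task: one continuation dominates
flat = [0.25, 0.25, 0.25, 0.25]    # open-ended task: many continuations are plausible

# The peaked distribution carries less entropy than the flat one.
print(shannon_entropy(peaked) < shannon_entropy(flat))
```

A policy pushed toward the peaked shape by math and code rewards has less probability mass left to spread across alternatives, which is one way to picture why open-ended tasks could suffer under joint training.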

The interesting twist is that the raw capability is often already there: the asymmetry is in what gets *elicited*, not what exists. Base models carry latent reasoning that minimal training merely selects and surfaces rather than creates (see "Do base models already contain hidden reasoning ability?"). So when a model is incompetent at some relation type, it can mean the data never taught the eliciting move, not that the ability is absent. Post-training choices then sculpt which competences come forward, and can make the asymmetry worse. Training on near-impossible problems teaches degenerate shortcuts that contaminate previously sound capabilities (see "Do overly hard RLVR samples actually harm model capabilities?"), and richer teacher context produces confident, concise traces that students inherit while losing the epistemic caution needed out-of-distribution, buying in-domain polish at the cost of generalization (see "Does richer teacher context hurt student generalization?").
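The source note on overly hard RLVR samples attributes the shortcut problem to group-relative normalization, which treats rare accidental successes as high-advantage trajectories. The sketch below illustrates that arithmetic under a loud assumption: it uses the common mean and standard-deviation form of group normalization, which the note does not spell out. The group size of 16, the binary rewards, and the name `group_relative_advantages` are hypothetical.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own sampling group: (r - mean) / (std + eps).

    Assumed form of group-relative normalization; eps guards against a zero std.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical groups of 16 sampled answers with binary correctness rewards.
hard = [1.0] + [0.0] * 15          # near-impossible problem: one lucky success
moderate = [1.0] * 8 + [0.0] * 8   # moderate problem: half the samples succeed

hard_adv = group_relative_advantages(hard)
moderate_adv = group_relative_advantages(moderate)

# The lone success on the hard problem gets a much larger advantage
# than any single success on the moderate problem.
print(hard_adv[0] > moderate_adv[0])
```

With binary rewards and success rate p, the advantage of a success works out to roughly sqrt((1 - p) / p), so it grows as successes get rarer. If the one success came from a guess or a skipped computation, that behavior is what receives the strongest push.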

The thread worth pulling: competence asymmetry isn't only about facts the model never saw. It's also about *which interactional moves* the training reward distribution suppresses. Preference optimization tuned for single-turn helpfulness rewards confident answers over clarifying questions, leaving grounding behaviors 77.5% below human levels, so models stay competent at sounding helpful while quietly losing the relation type (multi-turn, mutual understanding) that the reward never priced in (see "Does preference optimization harm conversational understanding?"). Across all these, the lesson is one most users never expect: a model's strengths and blind spots are a fairly direct readout of what its training distribution over-weighted and what it left out, and you can often predict where it will fail before you ever test it.


Sources (7 notes)

Does procedural knowledge drive reasoning more than factual retrieval?

Analysis of 5 million pretraining documents shows reasoning relies on broad, transferable procedural knowledge from diverse sources, unlike factual recall which depends on narrow, document-specific memorization of target facts.

Does chain-of-thought reasoning actually generalize beyond training data?

DataAlchemy experiments show CoT fails systematically under distributional shifts in task, length, and format. Models produce fluent but logically inconsistent reasoning — imitating reasoning form without valid underlying logic.

Does training order reshape how models handle different task types?

Omni-Thinker shows structured domains decrease output entropy while creative domains increase it. Scheduling guided by backward transfer (BWT), which trains structured tasks first, yields 6.2% gains over joint training by preventing entropy collapse from damaging open-ended capabilities.

Do base models already contain hidden reasoning ability?

Five independent mechanisms—RL steering, critique fine-tuning, decoding changes, SAE feature steering, and RLVR—all elicit reasoning already present in base model activations. Post-training selects rather than creates reasoning; the bottleneck is elicitation, not capability acquisition.

Do overly hard RLVR samples actually harm model capabilities?

Training on nearly-impossible problems causes models to learn degenerate shortcuts rather than genuine reasoning, and these shortcuts contaminate pre-existing capabilities. Group-relative normalization treats rare accidental successes as high-advantage trajectories, reinforcing answer repetition and computation-skipping instead of sound reasoning patterns.

Does richer teacher context hurt student generalization?

Teachers conditioned on correct answers and verifier output produce confident, concise traces that students inherit. This style suppresses uncertainty expression, optimizing in-domain performance while degrading generalization to out-of-distribution problems that require epistemic caution.

Does preference optimization harm conversational understanding?

RLHF optimizes models for single-turn helpfulness by rewarding confident responses over clarifying questions and understanding checks. This preference alignment systematically leaves grounding acts 77.5% below human levels, creating an alignment tax where models appear helpful but fail silently in multi-turn contexts.

Research prompt for your LLM

Copy into ChatGPT or Claude to take this line of inquiry further — it asks the model to find newer work and re-test which earlier constraints still hold.

You are a research analyst re-testing claims about how training data distribution creates asymmetric competence across relation types in LLMs. The question remains open; treat the findings below as dated.

What a curated library found — and when (findings span 2023–2026, perishable claims):
• Procedural knowledge (reasoning) transfers broadly across sources; factual recall depends on narrow, document-specific memorization, so knowing-how generalizes while knowing-that stays brittle (2024–25).
• Chain-of-thought reasoning degrades predictably when task, length, or format shift out-of-distribution; models reproduce form without underlying logic (2025).
• Structured domains (math, code) collapse output entropy during multi-task training while creative tasks expand it; training together lets structured domains quietly damage open-ended capability unless sequenced deliberately (2025).
• Base models carry latent reasoning; minimal training elicits rather than creates it. Post-training choices sculpt which competences surface — richer teacher context produces confident, concise traces students inherit while losing epistemic caution needed out-of-distribution (2025–26).
• Preference optimization tuned for single-turn helpfulness leaves grounding acts, such as clarifying questions, 77.5% below human levels, eroding multi-turn mutual-understanding competence (2026).

Anchor papers (verify; mind their dates):
• arXiv:2411.12580 (2024) — Procedural Knowledge in Pretraining Drives Reasoning
• arXiv:2508.01191 (2025) — Chain-of-Thought Reasoning as Data Distribution Mirage
• arXiv:2507.14783 (2025) — Omni-Thinker: Multi-Task RL entropy dynamics
• arXiv:2605.28388 (2026) — Sample Difficulty in RLVR and Degenerate Shortcuts

Your task:
(1) RE-TEST EACH CONSTRAINT. For procedural vs. factual split, multi-task entropy collapse, latent capability, and preference-optimization suppression: has newer model scaling, synthetic data, curriculum design, or architectural innovation (e.g., mixture-of-experts, adaptive routing) since RELAXED or OVERTURNED these limits? Separate durable asymmetry (data-induced selection) from perishable limitation (specific training regime). Cite what resolved it.
(2) Surface the strongest CONTRADICTING or SUPERSEDING work from the last ~6 months — especially any showing preference optimization recovers grounding, or multi-task training no longer collapses competence, or factual recall transfers like procedural knowledge under new conditions.
(3) Propose 2 research questions assuming the regime may have moved: (a) Can adaptive weighting of task losses during RLVR prevent entropy collapse and preserve open-ended competence? (b) Does instruction-tuning on explicitly multi-turn, clarification-rich data restore grounding without sacrificing single-turn helpfulness?

Cite arXiv IDs; flag anything you cannot ground in a real paper.
