INQUIRING LINE

Why does decomposition ability transfer across domains but solving ability does not?

This explores why the skill of breaking a problem into steps (decomposition) seems to be portable between fields like math, web tasks, and medicine, while the skill of actually executing each step (solving) stays stubbornly tied to its home domain.


The *planning* layer of reasoning travels well across domains while the *execution* layer doesn't, and the corpus suggests why: decomposition is a structural skill while solving is a knowledge-bound one. The clearest evidence comes from work that physically splits the two. When a separate decomposer model hands subproblems to a separate solver, the decomposition ability transfers to new domains but the solving ability doesn't, and pulling them apart also stops planning and execution from interfering with each other (see "Does separating planning from execution improve reasoning accuracy?"). The split isn't just an engineering convenience; it tracks a real division in what each skill *is*.
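To make the decomposer/solver split concrete, here is a minimal Python sketch. It is a toy under stated assumptions, not the architecture from the cited work: the names `split_on_and`, `make_lookup_solver`, and `run_pipeline` are hypothetical, the "decomposer" is a plain string split, and each "solver" is a lookup table standing in for a domain-tuned model.

```python
from typing import Callable, Dict, List

# Hypothetical interfaces (assumptions for illustration only):
# a decomposer maps a problem to sub-questions, a solver answers one sub-question.
Decomposer = Callable[[str], List[str]]
Solver = Callable[[str], str]


def split_on_and(problem: str) -> List[str]:
    """Toy domain-agnostic decomposer: one sub-question per 'and'-joined clause."""
    return [part.strip() for part in problem.split(" and ") if part.strip()]


def make_lookup_solver(facts: Dict[str, str]) -> Solver:
    """Toy domain-bound solver: it can only answer what its fact table contains."""
    def solve(sub_question: str) -> str:
        return facts.get(sub_question, "unknown")
    return solve


def run_pipeline(problem: str, decompose: Decomposer, solve: Solver) -> List[str]:
    """Plan first, then execute each step independently of the others."""
    return [solve(step) for step in decompose(problem)]


# Two solvers with different "domain knowledge"; the decomposer is shared.
math_solver = make_lookup_solver({"2+2": "4", "3*3": "9"})
med_solver = make_lookup_solver({"normal body temperature in C": "about 37"})

mixed = "2+2 and normal body temperature in C"
print(run_pipeline(mixed, split_on_and, math_solver))  # ['4', 'unknown']
print(run_pipeline(mixed, split_on_and, med_solver))   # ['unknown', 'about 37']
```

The same decomposer produces a usable plan for both questions, while each solver answers only the step its own fact table covers. That is the asymmetry the paragraph above describes, reduced to its simplest form.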

Decomposition transfers because it's compositional structure, and that structure turns out to be domain-agnostic. Pruning studies show networks naturally implement sub-tasks as isolated, reusable subnetworks, and pretraining makes that modular wiring more consistent across architectures and domains (see "Do neural networks naturally learn modular compositional structure?"). The same shape shows up at the behavioral level: agents that extract reusable *sub-task routines* (rather than memorizing whole tasks) gain the most exactly when the train-test gap is widest (see "Can agents learn reusable sub-task routines from past experience?"), and a trained skill *curator* generalizes across different executor backbones and domains while its skill library drifts toward strategic meta-skills (see "Can a separate trained curator improve skill libraries better than frozen agents?"). Decomposition is essentially a reusable grammar of 'how to break things down,' and grammar ports.

Solving doesn't transfer because it bottoms out in domain knowledge, not skill. The sharpest example: R1-distilled reasoning models fail to beat base models on medical tasks, because medicine rewards knowing the right fact more than reasoning well, the opposite of math, and no amount of reasoning fine-tuning closes that gap without domain-specific data (see "Why doesn't mathematical reasoning transfer to medicine?"). So the same problem-solving 'engine' that wins at math stalls in a knowledge-heavy field, because execution requires facts the model simply doesn't have. Decomposition asks 'what are the parts?'; solving asks 'what's the answer to this part?', and the second question is where domain-specific content lives.

There's a deeper twist worth sitting with: the transferable part may already be latent, and training just *selects* it. Five independent mechanisms (RL steering, critique fine-tuning, decoding changes, SAE feature steering, and RLVR) all elicit reasoning that's already present in base-model activations rather than installing it anew (see "Do base models already contain hidden reasoning ability?"), and even a single problem's worth of critique can unlock that structure (see "Can a single problem unlock reasoning through solution critique?"). If the decomposition machinery is shared latent structure waiting to be switched on, that explains both why it generalizes so cheaply and why solving, which needs new content poured in, keeps demanding fresh domain data (see "Can reconstructing expert thinking improve reasoning transfer?").

One last wrinkle the corpus hints at: 'solving' may not even be one thing across domains. Structured domains drive output entropy *down* while open-ended ones drive it *up*, so the execution dynamics are mechanically different by domain type, which is part of why a solver tuned in one regime can actively damage performance in another (see "Does training order reshape how models handle different task types?"). Decomposition stays above that turbulence; solving is stuck in it.
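For readers who want the entropy claim pinned down, the quantity in question can be read as the Shannon entropy of the model's output distribution, H = -Σ p·log2(p). The Python sketch below is only an illustration: the two distributions are made-up numbers standing in for a structured and an open-ended task, and the helper name `shannon_entropy_bits` is an assumption, not something taken from the cited work.

```python
import math
from typing import Sequence


def shannon_entropy_bits(probs: Sequence[float]) -> float:
    """Shannon entropy in bits of a discrete distribution; zero-probability terms are skipped."""
    return sum(-p * math.log2(p) for p in probs if p > 0.0)


# Illustrative (made-up) next-token distributions over four candidates.
peaked = [0.97, 0.01, 0.01, 0.01]  # stands in for a structured task: one answer dominates
flat = [0.25, 0.25, 0.25, 0.25]    # stands in for an open-ended task: many acceptable outputs

print(shannon_entropy_bits(flat))                                 # 2.0
print(shannon_entropy_bits(peaked) < shannon_entropy_bits(flat))  # True
```

Training that keeps pushing the distribution toward the peaked shape lowers entropy, which suits single-answer tasks but removes the spread that open-ended tasks rely on.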


Sources (9 notes)

Does separating planning from execution improve reasoning accuracy?

Modular architectures with separate decomposer and solver models outperform monolithic LLMs, with decomposition ability transferring across domains while solving ability does not. The separation prevents planning-execution interference and produces more generalizable skills.

Do neural networks naturally learn modular compositional structure?

Pruning experiments reveal that neural networks implement compositional subroutines in isolated subnetworks, with ablations affecting only their corresponding function. Pretraining substantially increases the consistency and reliability of this modular structure across architectures and domains.

Can agents learn reusable sub-task routines from past experience?

Agent Workflow Memory induces sub-task routines at finer granularity than full tasks, abstracts away example-specific values, and compounds the routines hierarchically. This produces a 24.6% relative gain on Mind2Web and a 51.1% relative gain on WebArena, with larger gains as train-test gaps widen.

Can a separate trained curator improve skill libraries better than frozen agents?

SkillOS shows that separating a trainable curator from a frozen executor, grouped by task streams, causes skill repositories to shift from generic verbose additions toward actionable execution logic and cross-task meta-strategies. The trained curator generalizes across different executor backbones and domains.

Why doesn't mathematical reasoning transfer to medicine?

R1-distilled reasoning models fail to outperform base models on medical tasks because knowledge accuracy matters more than reasoning quality in medicine—the opposite of math. Fine-tuning cannot close this gap without domain-specific training data.

Do base models already contain hidden reasoning ability?

Five independent mechanisms—RL steering, critique fine-tuning, decoding changes, SAE feature steering, and RLVR—all elicit reasoning already present in base model activations. Post-training selects rather than creates reasoning; the bottleneck is elicitation, not capability acquisition.

Can a single problem unlock reasoning through solution critique?

Critique Fine-Tuning achieves reasoning activation comparable to RLVR using only one problem and teacher-generated critiques of varied solutions, with no reinforcement learning. This demonstrates that exposure to correct versus incorrect reasoning on a single problem is a sufficient activation signal.

Can reconstructing expert thinking improve reasoning transfer?

Training on expert texts augmented with reconstructed thought processes (self-talk, knowledge recall, verification) produces reasoning skills that transfer across domains and adapt depth to problem difficulty, outperforming standard continual pretraining by up to 8 points on hard problems.

Does training order reshape how models handle different task types?

Omni-Thinker shows that structured domains decrease output entropy while creative domains increase it. Scheduling guided by backward transfer (BWT), which trains structured tasks first, yields a 6.2% gain over joint training by preventing entropy collapse from damaging open-ended capabilities.

Next inquiring lines