
Why doesn't mathematical reasoning transfer to medicine?

Can models trained to reason well about mathematics apply those skills to medical domains through fine-tuning? This note examines whether reasoning ability is truly domain-agnostic or constrained by domain-specific knowledge requirements.

Note · 2026-02-21 · sourced from Domain Specialization
How do you build domain expertise into general AI models? What kind of thing is an LLM really? How should researchers navigate LLM reasoning research?

The assumption behind porting reasoning-capable models to specialized domains is that reasoning ability transfers — that a model trained to reason well about mathematics can be steered toward medical reasoning through fine-tuning. The Knowledge or Reasoning paper falsifies this assumption with a specific mechanism.

R1-distilled models — fine-tuned variants of strong base models specifically trained to produce long reasoning chains — do not outperform base models on medical benchmarks when evaluated with domain-specific metrics (the paper's Knowledge Index, KI, and per-step Information Gain, InfoGain). The general reasoning capabilities that make R1-distilled models effective on mathematical tasks transfer to the medical domain through neither SFT nor RL. The limiting factor is domain knowledge, not reasoning architecture.

The mechanism is clarified by the KI/InfoGain framework. In medical tasks, knowledge accuracy (KI) correlates more strongly with final accuracy than reasoning step informativeness (InfoGain) across four of five benchmarks. Mathematical reasoning has the inverse pattern: reasoning quality matters more than factual knowledge retrieval. These are different competency regimes. A model optimized for one regime cannot import its advantages to the other.
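The decomposition can be made concrete with a toy sketch. Everything below is an illustrative assumption — the helper names, the external fact-checker labels, and the answer-probability trace are not the paper's actual implementation:

```python
# Hedged sketch of a KI/InfoGain-style decomposition of one reasoning trace.
# All names, labels, and probabilities are illustrative assumptions.

def knowledge_index(step_is_correct):
    """KI proxy: fraction of reasoning steps whose factual claims hold.

    `step_is_correct` is a list of booleans, assumed to come from an
    external fact-checker or ground-truth annotation.
    """
    return sum(step_is_correct) / len(step_is_correct)

def info_gain(answer_probs):
    """InfoGain proxy: per-step change in the model's probability of the
    correct final answer, measured before step 1 and after each step.
    """
    return [round(after - before, 6)
            for before, after in zip(answer_probs, answer_probs[1:])]

# Toy 4-step trace: three steps are factually correct, and the answer
# probability climbs from 0.2 (before any reasoning) to 0.9.
ki = knowledge_index([True, True, False, True])   # 0.75
gains = info_gain([0.2, 0.4, 0.5, 0.5, 0.9])      # [0.2, 0.1, 0.0, 0.4]
```

Under this toy framing, a knowledge-bound task is one where final accuracy tracks `ki`, while a reasoning-bound task is one where it tracks the accumulated `gains` — the two regimes the paper contrasts.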

This is distinct from Can non-reasoning models catch up with more compute?, which is about inference-compute regime differences within the same training framework. That finding says you can't close the gap by adding more inference-time compute. This finding says you can't close the gap by fine-tuning either — the gap is in the underlying domain knowledge, which fine-tuning on the wrong type of reasoning traces cannot supply.

The practical implication for domain AI deployment: a strong general reasoning model is not a substitute for domain-specific training data. In knowledge-intensive domains, the ceiling is what the model knows, not how it reasons. Systems that assume general reasoning strength translates to domain-specific reliability will be overconfident about their actual performance. Does supervised fine-tuning actually improve reasoning quality? adds that even when SFT improves accuracy in domain tasks, the reasoning quality may degrade — compounding the problem.


Source: Domain Specialization


general reasoning does not transfer to knowledge-intensive domains via sft due to domain knowledge gaps