What actually changes inside a model during RL training?
RL training modifies only sparse regions of model parameters through suppression of incorrect paths rather than broad capability building. Understanding these mechanics reveals how fine-tuning shapes reasoning and what hidden costs accompany optimization.