Does weight decay directly cause contractive behavior near training examples?

This explores whether weight decay (the L2 penalty added during training) is itself the mechanism that makes a model collapse onto, or become locally rigid around, the examples it was trained on, or whether that contractive behavior arises from other forces.

This reads the question as asking whether weight decay is the *direct cause* of a model becoming contractive (locally flattened or rigid) near its training data. Up front: the corpus contains no note that isolates weight decay and tests it as the causal lever, so a clean yes/no isn't available here. What the corpus does offer is a lateral reframing: several of its strongest results show contraction-like behavior emerging *without any explicit regularizer at all*, which complicates the premise that weight decay is doing the work.
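For reference, the mechanical effect of the penalty itself is a shrinkage of the weights toward zero, which is a statement about parameters and not, by itself, about behavior near training inputs. The following is a minimal sketch, assuming plain SGD and a penalty of the form (weight_decay / 2) * ||w||^2; the function name and the hyperparameter values are illustrative and do not come from the corpus.

```python
def sgd_step_with_l2(w, grad, lr=0.1, weight_decay=0.01):
    """One plain-SGD step on data_loss + (weight_decay / 2) * ||w||^2.

    `grad` is the gradient of the data loss alone (hypothetical input).
    """
    # The penalty contributes weight_decay * w_i to each gradient entry,
    # so every weight is pulled toward zero in proportion to its size.
    return [wi - lr * (gi + weight_decay * wi) for wi, gi in zip(w, grad)]


# With a zero data gradient only the penalty acts, and each weight is
# multiplied by the factor (1 - lr * weight_decay).
w = [1.0, -2.0, 0.5]
w_next = sgd_step_with_l2(w, grad=[0.0, 0.0, 0.0])
```

Under adaptive optimizers the L2 penalty and decoupled weight decay are no longer equivalent, so this sketch should be read as the plain-SGD case only.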

The sharpest example is the finding that RL fine-tuning updates only 5–30% of parameters, in subnetworks that are sparse but nearly full-rank, and does so with no explicit regularization term in the loss (see "Does reinforcement learning update only a small fraction of parameters?"). That those updates are nearly identical across random seeds points to structural pressure baked into the optimization dynamics themselves, not into a decay penalty. Note that this is a result about which parameters move, not a direct measurement of behavior near training examples. Still, if you're asking whether a regularizer is *required* to get tight, concentrated changes, this suggests not: the structure shows up on its own.
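To make the sparsity figure concrete, here is a minimal sketch of how the fraction of updated parameters could be measured by comparing a base and a fine-tuned parameter vector. The note does not say how sparsity was computed, so the tolerance `tol`, the function name, and the toy values are all assumptions for illustration.

```python
def update_fraction(base, tuned, tol=0.0):
    """Fraction of parameters whose value changed by more than `tol`."""
    if len(base) != len(tuned):
        raise ValueError("parameter lists must have the same length")
    if not base:
        raise ValueError("parameter lists must be non-empty")
    changed = sum(1 for b, t in zip(base, tuned) if abs(t - b) > tol)
    return changed / len(base)


# Toy parameter vectors: only two of the ten entries differ after tuning.
base = [0.10, -0.40, 0.25, 0.00, 0.70, -0.30, 0.05, 0.90, -0.60, 0.20]
tuned = [0.10, -0.40, 0.31, 0.00, 0.70, -0.30, 0.05, 0.84, -0.60, 0.20]
frac = update_fraction(base, tuned)
```

A count like this says nothing about rank; checking that the update is nearly full-rank would need a separate analysis of the per-layer difference matrices.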

The corpus is also rich on the broader phenomenon of training pressure pulling a model inward toward a narrow region, which is the behavior people often attribute to over-regularization. RL post-training converges onto a single dominant pretraining format and suppresses the alternatives within the first epoch (see "Does RL training collapse format diversity in pretrained models?"); positive-only reinforcement degrades diversity by concentrating probability mass, while negative reinforcement preserves it (see "Does negative reinforcement alone outperform full reinforcement learning?"); and SFT-then-RL on divergent expert data ends in an overfitting phase after readaptation (see "Why does SFT-then-RL training follow a predictable three-phase pattern?"). None of these notes attributes the collapse to weight decay; the drivers they identify are the reward signal, the data mix, and the update rule. That's the lateral takeaway: contraction near training examples in these systems is mostly a *dynamics* story, not a *penalty* story.
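One common way to quantify "concentrating probability mass" is Shannon entropy over the model's output distribution. The sketch below is illustrative only: the notes do not state that entropy was the diversity measure used, and the two toy distributions are made up.

```python
import math


def entropy(probs):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


# Four candidate outputs: evenly spread mass versus mass piled on one option.
spread = [0.25, 0.25, 0.25, 0.25]
concentrated = [0.97, 0.01, 0.01, 0.01]

h_spread = entropy(spread)              # equals log(4), the maximum for 4 outcomes
h_concentrated = entropy(concentrated)  # much lower: diversity has collapsed
```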

There's also a counter-lever worth knowing about, because it's the closest thing the corpus has to a knob that governs how far a model is allowed to move: KL drift from the base model. Keeping drift low (staying close to the base distribution) preserves plasticity and the ability to keep learning, whereas parameter-only approaches that drift further stall when the domain shifts (see "Does staying close to the base model preserve learning ability?"). This is the inverse framing of your question: rather than weight decay forcing local rigidity, *too much* unconstrained movement is what destroys adaptability, and a soft constraint toward the base is what keeps the model supple. So if you came looking for 'is the regularizer the villain,' the corpus gently flips it: the more documented failure mode is uncontrolled drift, with structured contraction often emerging on its own.
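KL drift can be sketched as the KL divergence between the tuned model's output distribution and the base model's on the same input. The note does not specify the direction of the divergence or how it is averaged over inputs, so KL(tuned || base) on a single toy distribution is an assumption here, as are all names and numbers.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) in nats for two discrete distributions on the same support."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            if qi <= 0.0:
                return math.inf  # p puts mass where q has none
            total += pi * math.log(pi / qi)
    return total


# Toy next-token distributions over three tokens.
base_probs = [0.5, 0.3, 0.2]
low_drift = [0.48, 0.32, 0.20]   # stays close to the base model
high_drift = [0.05, 0.05, 0.90]  # has moved far from the base model

kl_low = kl_divergence(low_drift, base_probs)
kl_high = kl_divergence(high_drift, base_probs)
```

In practice such a quantity would be averaged over many prompts and token positions; a single distribution is used here only to show the comparison.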

Sources (5 notes)

Does reinforcement learning update only a small fraction of parameters?

Across seven RL algorithms and ten LLM families, RL induces intrinsic parameter sparsity of 5–30% without explicit regularization. Critically, these sparse updates are nearly full-rank and nearly identical across random seeds, indicating structural rather than arbitrary parameter selection.

Does RL training collapse format diversity in pretrained models?

Controlled experiments show RL consistently amplifies one format distribution from pretraining within the first epoch while collapsing alternatives. The winning format depends on model scale, not necessarily performance, and is largely hidden when starting from proprietary pretrained models.

Does negative reinforcement alone outperform full reinforcement learning?

Training with only negative samples consistently improves Pass@k across the spectrum, often matching full PPO and GRPO. Negative reinforcement suppresses incorrect trajectories while preserving diversity, whereas positive-only reinforcement degrades higher-k performance by concentrating probability mass.

Why does SFT-then-RL training follow a predictable three-phase pattern?

CHORD identifies three distinct training phases: initial capability disruption from policy shift, readaptation to expert patterns, then overfitting. Dynamically weighting SFT as an auxiliary objective within on-policy RL resolves this progression and improves stability.

Does staying close to the base model preserve learning ability?

FST-trained models stay up to 70% closer to their base distribution than parameter-only RL, and this reduced drift preserves the model's ability to learn subsequent tasks effectively. Parameter-only approaches stall when task domains change, while low KL drift enables sustained adaptation.
