Why does adaptation concentrate in low-dimensional subspaces of weights or representations?

This explores why fine-tuning and adaptation seem to need only a tiny slice of a model's parameters or representation space — and what that smallness tells us about what adaptation actually does.

The corpus points to one recurring answer: adaptation isn't writing new knowledge, it's selecting and reweighting capacity the base model already has, which is a much smaller job than it looks. The clearest demonstration is representation intervention. Instead of updating weights at all, ReFT learns task-specific edits on frozen hidden states, and its low-rank linear variant LoReFT matches or beats LoRA while using 10–50x fewer parameters (see "Can editing hidden representations beat weight updates for finetuning?"). If steering a model into a new behavior takes only a handful of directions in representation space, that is strong evidence the behavior was already latent: adaptation just needs to find the right low-dimensional handle to pull.
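To make the "handful of directions" point concrete, here is a minimal sketch of a LoReFT-style edit on a single frozen hidden state, using the low-rank form h + Rᵀ(Wh + b − Rh). The sizes and the values of `R`, `W`, and `b` are hand-picked for illustration, not learned parameters, and the helper names are hypothetical rather than taken from any ReFT library.

```python
# Toy LoReFT-style intervention on one frozen hidden state (pure Python).
# R, W, b are illustrative values chosen by hand, not trained parameters.

def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    """Return the transpose of M as a list of rows."""
    return [list(col) for col in zip(*M)]

def loreft_edit(h, R, W, b):
    """Return h + R^T (W h + b - R h).

    R: r x d projection with orthonormal rows (the edited subspace).
    W, b: r x d matrix and length-r bias giving target values in that subspace.
    """
    current = matvec(R, h)                                   # where h sits in the subspace
    target = [wh + bi for wh, bi in zip(matvec(W, h), b)]    # where we want it to sit
    delta = [t - c for t, c in zip(target, current)]
    shift = matvec(transpose(R), delta)                      # map the change back to d dims
    return [hi + si for hi, si in zip(h, shift)]

h = [1.0, 2.0, 3.0, 4.0]      # frozen hidden state, d = 4
R = [[1.0, 0.0, 0.0, 0.0]]    # rank r = 1: only the first coordinate is editable
W = [[0.0, 0.5, 0.0, 0.0]]
b = [0.25]

h_new = loreft_edit(h, R, W, b)

# Trainable values for this single intervention: R and W (r*d each) plus b (r).
r, d = len(R), len(h)
num_trainable = 2 * r * d + r
```

In this toy case only the component of `h` inside the row space of `R` moves; the other three coordinates pass through unchanged, which is the sense in which the edit is low-dimensional.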

The same story shows up in weight space from a different angle. Transformer² adapts by tuning only the singular values of weight matrices, leaving the singular vectors (the actual directions) fixed, and gets composable expert skills that outperform LoRA with fewer parameters (see "Can models dynamically activate expert skills at inference time?"). That's a striking constraint: you're not rotating the model's representational basis, just rescaling along axes it already has. Both results suggest the base model's pretraining lays down a rich set of directions, and a task corresponds to a low-rank pattern of emphasis over them rather than a wholly new structure.
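A small sketch of what "rescaling along axes it already has" means, assuming a toy 2x2 weight built from a known decomposition W = U diag(σ) Vᵀ. The scale vector `z` stands in for a learned expert vector; its values here are invented for illustration, and how such a vector would actually be trained is outside this sketch.

```python
import math

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def diag(values):
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)] for i in range(n)]

def column(M, j):
    return [row[j] for row in M]

def rotation(theta):
    """2x2 rotation matrix; its columns are orthonormal."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

# Toy "pretrained" weight with known singular vectors and singular values.
U = rotation(0.3)
V = rotation(1.1)
sigma = [3.0, 1.0]
W = matmul(matmul(U, diag(sigma)), transpose(V))

# Hypothetical expert vector: one scale per singular value (illustrative values).
z = [1.5, 0.5]
sigma_adapted = [s * zi for s, zi in zip(sigma, z)]

# Adapted weight: same U and V, only the singular values change.
W_adapted = matmul(matmul(U, diag(sigma_adapted)), transpose(V))
```

The adapted matrix still maps each right singular vector `V[:, i]` onto the same left singular vector `U[:, i]`, just with a different gain, and the number of tuned values equals the number of singular values rather than the number of matrix entries.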

Why would the base model already contain what adaptation needs? Two notes hint at the mechanism. Representational density is *learned* through familiarity during pretraining: models build dense, structured activations for things they've seen a lot of (see "Is representational sparsity learned or intrinsic to neural networks?"). Under unfamiliar or hard inputs, hidden states instead sparsify into a localized, selective filter rather than firing everywhere (see "Do language models sparsify their activations under difficult tasks?"). So the model's own representations are already organized into a small active subspace at any given moment. Adaptation that respects this structure has very little it actually needs to move.

The flip side, what happens when you ignore the low-dimensional structure, is just as telling. Direct full fine-tuning corrupts knowledge stored in lower layers, whereas proxy-tuning leaves base weights untouched and applies its shift only at decoding time, preserving knowledge better while still closing most of the alignment gap (see "Can decoding-time tuning preserve knowledge better than weight fine-tuning?"). Relatedly, models that drift less from their base distribution (low KL drift) retain far more plasticity for the *next* task, while parameter-heavy updates stall when the domain shifts (see "Does staying close to the base model preserve learning ability?"). The lesson is that forgetting is a misallocation problem, not an inherent cost. Fast-Slow Training makes this explicit by routing most task-specific learning into fast textual context and keeping weight updates minimal, reaching the same performance faster with less catastrophic forgetting (see "Can splitting adaptation into two channels reduce forgetting?").
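The decoding-time shift and the notion of drift can be sketched together. The snippet below assumes the usual proxy-tuning arithmetic, where a large base model's next-token logits are offset by the difference between a small tuned model and its untuned counterpart, and then measures how far the shifted distribution sits from the base one with a KL divergence. All logit values are invented, and the KL direction chosen here is an assumption rather than the definition used in the cited notes.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def proxy_tuned_logits(base, expert, anti_expert):
    """Offset base logits by (tuned small model - untuned small model)."""
    return [b + (e - a) for b, e, a in zip(base, expert, anti_expert)]

def kl_divergence(p, q):
    """KL(p || q) for two distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Invented next-token logits over a 3-token vocabulary.
base_logits = [2.0, 1.0, 0.0]          # large base model (weights untouched)
expert_logits = [0.5, 1.5, 0.0]        # small tuned model
anti_expert_logits = [0.5, 0.5, 0.0]   # same small model before tuning

shifted_logits = proxy_tuned_logits(base_logits, expert_logits, anti_expert_logits)
p_base = softmax(base_logits)
p_proxy = softmax(shifted_logits)

# Drift of the steered distribution away from the base distribution.
drift = kl_divergence(p_proxy, p_base)
```

Nothing in the base model is modified here: the adaptation lives entirely in the logit offset, and `drift` gives one scalar handle on how far the steered behavior has moved from the base distribution.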

Put together, the corpus reframes the question. Adaptation concentrates in low-dimensional subspaces not because of a clever trick, but because that's the honest size of the task: the heavy lifting happened in pretraining, and good adaptation is a small, surgical selection over existing capacity. The interesting tension is that low-dimensionality is also where interpretability lives: forcing weight sparsity yields disentangled circuits where neurons map to simple concepts (see "Can sparse weight training make neural networks interpretable by design?"). That suggests the subspaces adaptation prefers and the subspaces humans can read may be closer to the same thing than we assumed.

Sources (8 notes)

Can editing hidden representations beat weight updates for finetuning?

ReFT learns task-specific interventions on frozen model representations rather than updating weights, with LoReFT (low-rank linear subspace variant) dramatically outperforming LoRA across reasoning, instruction-following, and NLU benchmarks while using far fewer parameters.

Can models dynamically activate expert skills at inference time?

Transformer² demonstrates that tuning only singular values within weight matrices produces composable expert vectors that dynamically mix at inference without interference, outperforming LoRA with fewer parameters and enabling continual specialization.

Is representational sparsity learned or intrinsic to neural networks?

During pretraining, neural networks develop dense activations for familiar training data and default to sparse representations for unfamiliar inputs. This trend emerges without task-specific fine-tuning and reflects how models consolidate knowledge through exposure.

Do language models sparsify their activations under difficult tasks?

As task difficulty increases, LLM hidden states become substantially sparser in a localized, systematic way that correlates with task unfamiliarity and reasoning load. This sparsification acts as a selective filter stabilizing performance under OOD shift rather than a failure mode.

Can decoding-time tuning preserve knowledge better than weight fine-tuning?

Proxy-tuning closes 88-91% of the alignment gap while surpassing direct fine-tuning on knowledge tasks by leaving base model weights untouched. Direct fine-tuning corrupts knowledge storage in lower layers, whereas proxy-tuning applies distributional shifts that primarily affect reasoning and style.

Does staying close to the base model preserve learning ability?

Models trained with Fast-Slow Training (FST) stay up to 70% closer to their base distribution than models trained with parameter-only RL, and this reduced drift preserves their ability to learn subsequent tasks effectively. Parameter-only approaches stall when task domains change, while low KL drift enables sustained adaptation.

Can splitting adaptation into two channels reduce forgetting?

Fast-Slow Training routes task-specific lessons into optimized prompts while keeping parameter updates minimal, reaching equivalent performance 1.4–3x faster with substantially less catastrophic forgetting and plasticity loss, demonstrating that forgetting is a misallocation problem rather than an inherent cost.

Can sparse weight training make neural networks interpretable by design?

Training transformers with sparse weights creates compact, human-interpretable circuits where neurons correspond to simple concepts with clear connections. Ablation studies confirm these circuits are necessary and sufficient for task performance, though scaling beyond tens of millions of parameters while maintaining interpretability remains unsolved.
