Can steering vectors be combined with other compression techniques?

This reads as asking whether steering vectors (inference-time edits to a model's hidden activations that nudge its behavior) can stack with techniques that shrink models or distill their knowledge into smaller forms. The corpus has no note dedicated to steering vectors, but it has strong adjacent material on decoding-time interventions and composable compression, and that material bears directly on the underlying composability question.
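To make "stacking" concrete at the mechanical level, here is a minimal sketch that assumes a single toy linear layer, a hypothetical steering direction `v`, and simple symmetric per-tensor int8 weight quantization as the compression step. None of these values or helper names come from the notes. The point is that the two interventions act on different objects: steering edits the activation going into the layer, and quantization edits the layer's weights.

```python
def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: integer codes plus one float scale."""
    max_abs = max(abs(w) for row in weights for w in row)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    codes = [[max(-127, min(127, round(w / scale))) for w in row] for row in weights]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate float weights from the int8 codes."""
    return [[c * scale for c in row] for row in codes]


def matvec(weights, x):
    """Plain matrix-vector product W @ x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def steer(hidden, vector, alpha):
    """Activation steering: h' = h + alpha * v. The weights are never touched."""
    return [h + alpha * v for h, v in zip(hidden, vector)]


# Toy layer and activations (hypothetical values, for illustration only).
W = [[0.8, -0.3, 0.1], [0.2, 0.5, -0.7]]
h = [1.0, 0.5, -0.2]
v = [0.0, 1.0, 0.0]  # hypothetical steering direction

codes, scale = quantize_int8(W)
W_compressed = dequantize(codes, scale)

steered = steer(h, v, alpha=2.0)
full_out = matvec(W, steered)                 # steering on the original weights
compressed_out = matvec(W_compressed, steered)  # same steer on the compressed weights
```

Because the layer is linear, the compressed layer's steered output is its unsteered output plus `alpha * (W_compressed @ v)`, so the two edits compose additively here. Real steering usually targets a residual-stream layer inside a transformer, and real quantizers are typically per-channel or per-group. Both are simplified away in this sketch.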

This explores whether steering vectors compose with compression, and the honest first thing to say is that nothing in this collection is *about* steering vectors by name. But the deeper question is exactly what several notes here circle under different vocabulary: can a lightweight behavioral intervention layer on top of a compression method without the two corrupting each other? The most direct doorway is proxy-tuning (Can decoding-time tuning preserve knowledge better than weight fine-tuning?), which shares the defining property of steering vectors: it changes behavior at inference time while leaving base weights untouched. The mechanism differs, because proxy-tuning shifts the output distribution whereas a steering vector edits intermediate activations, but the interesting finding is *where* the shift lands. Proxy-tuning mostly affects reasoning and style, while direct fine-tuning corrupts knowledge stored in lower layers. That's a clue about composability in general: interventions applied at the output end stay decoupled from where knowledge lives, which is precisely why you'd expect them to stack with a compressed model rather than fight it.
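The decoding-time shift can be sketched in a few lines. This assumes the standard proxy-tuning formulation, in which the large base model's next-token logits are offset by the difference between a small tuned "expert" and its untuned "anti-expert" before the softmax. The logit values below are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def proxy_tune(base_logits, expert_logits, antiexpert_logits):
    """Shift the base model's next-token logits by (expert - anti-expert).
    Only logits are combined; no model weights are modified."""
    shifted = [b + e - a for b, e, a in zip(base_logits, expert_logits, antiexpert_logits)]
    return softmax(shifted)


def argmax(xs):
    return max(range(len(xs)), key=lambda i: xs[i])


# Hypothetical next-token logits over a 3-token vocabulary.
base = [2.0, 1.0, 0.5]
expert = [0.5, 2.0, 0.0]
antiexpert = [0.5, 0.5, 0.0]

p_base = softmax(base)
p_tuned = proxy_tune(base, expert, antiexpert)
print(argmax(p_base), argmax(p_tuned))  # 0 1
```

If the expert and anti-expert agree, the shift vanishes and the base distribution is returned unchanged. That is one way to see why the base model's stored knowledge is left alone.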

The Memory Decoder result (Can retrieval knowledge compress into a tiny parametric model?) makes the composition concrete. It compresses retrieval knowledge into a small transformer that plugs into *any* LLM through output-distribution interpolation, the same output-level interface that proxy-tuning uses. So you have at least two techniques (a decoding-time behavioral shift and a compressed knowledge module) that both act on the model's output distribution. That is the cleanest sign they can be combined: they speak the same language and meet at the same layer.
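A minimal sketch of what "meeting at the same layer" could look like follows. It assumes a kNN-LM-style linear mixture `lam * p_memory + (1 - lam) * p_model` for the memory module and a proxy-tuning-style logit shift for the behavioral steer. The helper names, logits, and `lam=0.3` are hypothetical and illustrative, not settings from either note.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def interpolate(p_model, p_memory, lam):
    """Mix two next-token distributions: lam * p_memory + (1 - lam) * p_model."""
    return [lam * pm + (1 - lam) * pb for pb, pm in zip(p_model, p_memory)]


def steered_with_memory(base, expert, antiexpert, memory_logits, lam=0.3):
    """Stack a decoding-time logit shift with a plug-in memory module.
    lam=0.3 is an arbitrary illustrative value, not a recommended setting."""
    p_steered = softmax([b + e - a for b, e, a in zip(base, expert, antiexpert)])
    p_memory = softmax(memory_logits)
    return interpolate(p_steered, p_memory, lam)


# Hypothetical logits over a 3-token vocabulary.
base = [2.0, 1.0, 0.5]
expert = [0.5, 2.0, 0.0]
antiexpert = [0.5, 0.5, 0.0]
memory = [0.0, 0.0, 3.0]  # hypothetical: the memory module favors a long-tail token

p = steered_with_memory(base, expert, antiexpert, memory)
```

The ordering is a design choice. Here the steer is applied to the base logits before mixing, so the memory module's contribution is not itself steered. Setting `lam` to 0 recovers the steered model alone, and setting it to 1 recovers the memory module alone.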

The corpus also suggests *why* combining mechanisms can beat using either alone. The Engram work (Can lookup memory and computation work together better than either alone?) finds a U-shaped scaling law in which balanced allocation between lookup memory and computation outperforms a pure version of either. Combination isn't just tolerable, it's the better operating point, and the gains show up most in reasoning and code rather than raw retrieval. That's a recurring theme: complementary axes (memory vs. computation, knowledge module vs. behavioral steer) tend to add capability that neither captures alone.

There's a cautionary thread too. Compression in these models is lossy in structured ways. LLMs compress concepts more aggressively than humans and shed fine-grained distinctions (Do LLMs compress concepts more aggressively than humans do?), and identical accuracy can hide fractured internal representations that are vulnerable to perturbation (Can models be smart without organized internal structure?). A steering vector is itself a perturbation, so the unstated risk in stacking it onto a compressed model is that you're pushing on representations that may already be brittle. The combination could work on the benchmark and still break under distribution shift in ways your metrics never show.
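One hedged way to probe this risk is to measure how often the steered top-1 prediction flips under small perturbations as the steering strength grows. You would then run the same sweep with the original weights and with the compressed weights and compare the results. The sketch below is a toy under loud assumptions: a linear readout stands in for the rest of the network, Gaussian noise on the activation stands in for distribution shift, and `flip_rate` and all values are hypothetical.

```python
import random


def argmax(xs):
    return max(range(len(xs)), key=lambda i: xs[i])


def readout(weights, hidden):
    """Linear readout W @ h, a stand-in for the rest of the network."""
    return [sum(w * x for w, x in zip(row, hidden)) for row in weights]


def flip_rate(weights, hidden, vector, alpha, noise_scale, trials=200, seed=0):
    """Fraction of noisy trials whose top-1 prediction differs from the clean
    steered prediction. A higher value means the steered behavior is more fragile."""
    rng = random.Random(seed)
    steered = [h + alpha * v for h, v in zip(hidden, vector)]
    clean = argmax(readout(weights, steered))
    flips = 0
    for _ in range(trials):
        noisy = [s + rng.gauss(0.0, noise_scale) for s in steered]
        if argmax(readout(weights, noisy)) != clean:
            flips += 1
    return flips / trials


# Hypothetical toy values; swap in compressed weights to compare the two sweeps.
W = [[0.8, -0.3, 0.1], [0.2, 0.5, -0.7]]
h = [1.0, 0.5, -0.2]
v = [0.0, 1.0, 0.0]
for alpha in (0.0, 1.0, 2.0):
    print(alpha, flip_rate(W, h, v, alpha, noise_scale=0.3))
```

Noise on activations is only a crude proxy. A more faithful check would use held-out or deliberately shifted inputs, which is the kind of evaluation the note warns standard metrics may skip.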

The thing you might not have known you wanted to know: the reason output-level interventions tend to compose cleanly with compression isn't an accident of engineering. In the proxy-tuning evidence, they touch reasoning and style rather than the lower-layer machinery where knowledge is stored, so they leave the compressed knowledge intact while reshaping behavior on top of it.

Sources (5 notes)

Can decoding-time tuning preserve knowledge better than weight fine-tuning?

Proxy-tuning closes 88-91% of the alignment gap while surpassing direct fine-tuning on knowledge tasks by leaving base model weights untouched. Direct fine-tuning corrupts knowledge storage in lower layers, whereas proxy-tuning applies distributional shifts that primarily affect reasoning and style.

Can retrieval knowledge compress into a tiny parametric model?

Memory Decoder successfully compresses kNN-LM retrieval distributions into a small transformer that plugs into any LLM via output interpolation. It preserves long-tail factual knowledge while maintaining semantic coherence, reducing perplexity by 6.17 points across domains.

Can lookup memory and computation work together better than either alone?

Engram combines O(1) N-gram lookup with Mixture-of-Experts routing, revealing a U-shaped scaling law where balanced allocation to both mechanisms outperforms either alone. Gains appear largest in reasoning and code rather than pure retrieval.

Do LLMs compress concepts more aggressively than humans do?

Applying Rate-Distortion Theory to cognitive datasets shows that LLMs capture broad category structure but lose fine-grained distinctions that humans preserve. LLMs maximize compression efficiency, while humans trade compression for contextual meaning that enables situated action.

Can models be smart without organized internal structure?

Models trained with SGD can contain all the linearly decodable features needed for a task while maintaining fundamentally broken internal organization. This makes them vulnerable to perturbation and distribution shift invisible to standard evaluation metrics.
