How do trait adapters interact with different base model architectures?

This explores whether lightweight 'trait adapters' — small modules that inject personality or behavior into a model — behave the same way across different base architectures, or whether the architecture underneath changes what's possible.

The corpus splits in a way that's worth sitting with: the same idea ('nudge the model's traits without retraining it') generalizes beautifully in one approach and falls apart completely in another, and the difference is *where* the trait lives.

The optimistic case is PsychAdapter (see "Can we control personality in language models without prompting?"). It threads a tiny adapter (under 0.1% extra parameters) through every transformer layer. It reaches 87.3% accuracy on Big Five personality traits and 96.7% on depression and life-satisfaction measures across GPT-2, Gemma, and Llama 3 alike. Because it operates at the architecture level, on the layer stack every transformer shares, it travels across model families and even bypasses prompt resistance. The shared skeleton of the transformer is exactly what makes the adapter portable.
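Here is a minimal NumPy sketch of what 'a tiny adapter through every layer' can look like. This is a hypothetical design, not PsychAdapter's published architecture. Each layer gets one small matrix that projects a trait vector (here, five Big Five-style scores) into a bias added to that layer's hidden states, while the base model stays frozen. Several pieces are assumptions chosen to show how such an adapter stays under a 0.1% parameter budget:

- the class and function names;
- the identity stand-in for transformer layers;
- the toy sizes: 12 layers, hidden size 768, and roughly 124M base parameters, about the size of GPT-2 small.

```python
import numpy as np

class PerLayerTraitAdapter:
    """Hypothetical trait adapter: one (trait_dim x hidden) matrix per layer.

    The trait vector is projected into a per-layer bias that is added to the
    frozen model's hidden states; base weights are never modified.
    """

    def __init__(self, n_layers: int, hidden: int, trait_dim: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Small init so the adapter starts close to a no-op.
        self.proj = [rng.normal(0.0, 0.01, size=(trait_dim, hidden))
                     for _ in range(n_layers)]

    def num_params(self) -> int:
        return sum(p.size for p in self.proj)

    def apply(self, layer: int, hidden_states: np.ndarray, traits: np.ndarray) -> np.ndarray:
        # hidden_states: (seq_len, hidden); traits: (trait_dim,)
        bias = traits @ self.proj[layer]      # (hidden,)
        return hidden_states + bias           # broadcast over sequence positions


def run_stack(adapter, hidden_states, traits, n_layers):
    # Stand-in for a frozen transformer: identity layers, adapter after each one.
    h = hidden_states
    for layer in range(n_layers):
        h = adapter.apply(layer, h, traits)
    return h


n_layers, hidden, trait_dim = 12, 768, 5
base_params = 124_000_000  # rough GPT-2 small size (assumption)
adapter = PerLayerTraitAdapter(n_layers, hidden, trait_dim)
fraction = adapter.num_params() / base_params
print(f"adapter params: {adapter.num_params()}, fraction of base: {fraction:.4%}")

x = np.zeros((4, hidden))
traits = np.array([0.9, 0.1, 0.5, 0.5, 0.2])  # hypothetical Big Five scores
out = run_stack(adapter, x, traits, n_layers)
print(out.shape)
```

The adapter only needs the number of layers and the hidden size, so the same class can be instantiated for any transformer stack. That dependence on shared structure, rather than on one model's specific weights, is the portability argument made above.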

Now the cautionary mirror: subliminal trait transmission (see "Can language models transmit hidden behavioral traits through unrelated data?"), where a trait spreads through data that has no semantic link to it. That effect is *model-specific* and breaks across different architectures. The mechanism isn't a clean structural hook. It's a statistical signature baked into a particular model's weights, so a different base model doesn't 'hear' it. Put the two papers side by side and you get the real answer to the question: an adapter that targets shared structure crosses architectures; a trait that rides on idiosyncratic statistical fingerprints does not.

That framing connects to a deeper map of *how much* you can change a model given your access to it (see "Does model access level determine which specialization techniques work?"). Of the three access tiers (black-box, grey-box, white-box), black-box methods can only activate what's already there. White-box methods, where layer-level adapters live, can inject genuinely new behavior but risk over-specialization. The portability of a trait adapter is really a question of which tier it operates in.

The more sophisticated cousins of trait adapters lean into the architecture rather than fighting it. Transformer² (see "Can models dynamically activate expert skills at inference time?") tunes only the singular values inside weight matrices to make composable 'expert vectors' that mix at inference. SoftCoT (see "Can continuous reasoning avoid forgetting in instruction-tuned models?") freezes the backbone entirely and bolts on a small auxiliary module. Both treat the base model's structure as fixed scaffolding to attach to, not something to overwrite.
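The singular-value idea behind Transformer² can be sketched in a few lines of NumPy. Assume an expert is a vector z that rescales the singular values s of a frozen weight matrix W = U diag(s) V^T. The adapted matrix is then U diag(s * z) V^T, so an expert costs only min(m, n) numbers per matrix. This sketch also assumes that mixing at inference is a plain weighted sum of expert vectors. The helper names and the fixed mixing weights are hypothetical, and the paper's own procedure for choosing weights is not reproduced here.

```python
import numpy as np

def svd_components(W: np.ndarray):
    # Decompose once; U, the base singular values s, and Vt stay frozen.
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    return U, s, Vt

def apply_expert(U, s, Vt, z):
    # W' = U diag(s * z) Vt: only the singular values are rescaled by z.
    return (U * (s * z)) @ Vt

def mix_experts(experts, alphas):
    # Hypothetical inference-time mixing: weighted sum of expert vectors.
    return sum(a * z for a, z in zip(alphas, experts))

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 32))
U, s, Vt = svd_components(W)

# Two hypothetical experts, each just 32 numbers instead of 64 * 32 weights.
z_math = 1.0 + 0.1 * rng.normal(size=s.shape)
z_code = 1.0 + 0.1 * rng.normal(size=s.shape)

z_mix = mix_experts([z_math, z_code], [0.7, 0.3])
W_adapted = apply_expert(U, s, Vt, z_mix)
print(W_adapted.shape, z_mix.size, W.size)
```

Because U and Vt are shared and frozen, every expert acts in the same basis. That shared basis is what makes blending experts by a weighted sum meaningful in this sketch.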

The thing you might not have known you wanted to know: interference is the hidden cost of stacking traits. Core-parameter isolation work (see "Can isolating task-specific parameters prevent multi-task fine-tuning interference?") shows that multiple adaptations collide unless you explicitly identify and freeze each one's 'core' parameter region. So 'how trait adapters interact with architectures' isn't only about whether one adapter ports across models. It's also about whether several adapters can coexist inside *one* architecture without scrambling each other. The architecture isn't a neutral container; it's a contested space.
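The isolation idea can be sketched with flat NumPy vectors standing in for a model's parameters. Several simplifications here are assumptions, not the cited method:

- A task's 'core' region is taken to be its largest-magnitude updates, controlled by `keep_frac`.
- Parameters claimed as core by more than one task simply fall back to averaging. The cited work clusters overlapping tasks instead.
- Non-core parameters are merged by plain averaging, as a stand-in for the geometric merge the cited work describes.

The function names are hypothetical.

```python
import numpy as np

def core_mask(delta: np.ndarray, keep_frac: float) -> np.ndarray:
    # Hypothetical core criterion: the top keep_frac of updates by magnitude.
    k = max(1, int(round(keep_frac * delta.size)))
    threshold = np.sort(np.abs(delta).ravel())[-k]
    return np.abs(delta) >= threshold

def merge_with_isolation(base, deltas, keep_frac=0.1):
    masks = [core_mask(d, keep_frac) for d in deltas]
    claims = np.sum(masks, axis=0)        # how many tasks mark each parameter as core
    merged = np.mean(deltas, axis=0)      # non-core (and contested) params: plain average
    for d, m in zip(deltas, masks):
        owned = m & (claims == 1)         # core to exactly one task
        merged[owned] = d[owned]          # keep that task's update unchanged ("frozen")
    return base + merged, masks

rng = np.random.default_rng(0)
base = np.zeros(100)
delta_a = rng.normal(0.0, 0.01, 100)
delta_a[:10] = 1.0                        # task A's core lives in params 0-9
delta_b = rng.normal(0.0, 0.01, 100)
delta_b[50:60] = -1.0                     # task B's core lives in params 50-59

isolated, masks = merge_with_isolation(base, [delta_a, delta_b])
naive = base + np.mean([delta_a, delta_b], axis=0)
print(isolated[:3], naive[:3])
```

With disjoint cores, each task's most important updates survive the merge intact, whereas naive averaging roughly halves them. In this toy setting, that halving is the interference described above.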

Sources: 6 notes

Can we control personality in language models without prompting?

PsychAdapter modifies every transformer layer with <0.1% additional parameters to achieve 87.3% Big Five accuracy and 96.7% depression/life satisfaction accuracy across GPT-2, Gemma, and Llama 3. This architecture-level approach bypasses prompt resistance entirely.

Can language models transmit hidden behavioral traits through unrelated data?

Research demonstrates that behavioral traits propagate between models via filtered data bearing no semantic relationship to the trait. The effect is model-specific, fails across different architectures, and persists despite rigorous filtering—indicating the mechanism embeds statistical signatures rather than semantic content.

Does model access level determine which specialization techniques work?

Three tiers of access—black-box, grey-box, and white-box—create a hierarchy of specialization power. Black-box techniques can only activate existing knowledge; white-box methods can inject new knowledge but risk over-specialization.

Can models dynamically activate expert skills at inference time?

Transformer² demonstrates that tuning only singular values within weight matrices produces composable expert vectors that dynamically mix at inference without interference, outperforming LoRA with fewer parameters and enabling continual specialization.

Can continuous reasoning avoid forgetting in instruction-tuned models?

SoftCoT avoids catastrophic forgetting by keeping the main LLM frozen while delegating soft thought generation to a small auxiliary model. This architectural separation maintains pre-trained knowledge while enabling continuous reasoning.

Can isolating task-specific parameters prevent multi-task fine-tuning interference?

Research shows that identifying core parameter regions per task, clustering overlapping tasks, and freezing core parameters while geometrically merging non-core parameters consistently outperforms standard multi-task fine-tuning. Temporal task scheduling alone proves insufficient without explicit structural parameter isolation.
