Psychology and Social Cognition

Can we control personality in language models without prompting?

Can lightweight adapter modules enable continuous, fine-grained control over psychological traits in transformer outputs, independent of prompt engineering? This note explores whether architecture-level personality modification outperforms prompt-based approaches.

Note · 2026-02-23 · sourced from Psychology Therapy Practice

PsychAdapter modifies the transformer architecture to accept continuous psychological trait scores as input, enabling generation conditioned on personality, mental health, and demographic variables without consuming context window or relying on prompt engineering. The key difference from prior work: trait influence is applied at every transformer layer via a learned dimension expansion, not just at the input level.
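The per-layer mechanism can be sketched as follows. This is a minimal toy illustration, not the paper's actual implementation: the module names, dimensions, and initialization are assumptions. The core idea it shows is that a small learned matrix expands the trait vector into the hidden dimension at every layer, so conditioning costs no prompt tokens.

```python
import numpy as np

# Toy sketch of PsychAdapter-style trait injection (all names and sizes
# are assumptions for illustration). A vector of Big Five z-scores is
# projected into the hidden dimension by a small learned matrix at
# EVERY layer and added to that layer's hidden states.

HIDDEN_DIM = 64      # toy size; real models use thousands
NUM_LAYERS = 4
NUM_TRAITS = 5       # Big Five: O, C, E, A, N

rng = np.random.default_rng(0)

# One learned expansion matrix per layer: these are the adapter's only
# new parameters, which is why the overhead stays tiny.
trait_projections = [rng.normal(0.0, 0.02, (NUM_TRAITS, HIDDEN_DIM))
                     for _ in range(NUM_LAYERS)]

def apply_layer(hidden, layer_idx, traits):
    """Stand-in for one transformer layer: add the projected trait
    vector to the residual stream (attention/MLP omitted)."""
    trait_bias = traits @ trait_projections[layer_idx]   # (HIDDEN_DIM,)
    return hidden + trait_bias                           # broadcast over tokens

# Condition on high extraversion: (O, C, E, A, N) = (0, 0, +3, 0, 0).
traits = np.array([0.0, 0.0, 3.0, 0.0, 0.0])
hidden = rng.normal(size=(10, HIDDEN_DIM))               # 10 token positions
for layer in range(NUM_LAYERS):
    hidden = apply_layer(hidden, layer, traits)

# Added parameters scale only with layers * traits * hidden_dim.
added_params = NUM_LAYERS * NUM_TRAITS * HIDDEN_DIM
```

Because the trait bias enters the residual stream at every layer rather than only at the input, its influence is not washed out by depth, which is the stated difference from input-level conditioning.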

Training uses social media and blog posts with psychological scores estimated by an empirically trained language-based assessment model. The adapter learns how to weight the psychological scores' contribution to each layer alongside the standard next-word prediction objective. The result: fine-grained, continuous control over personality expression. An input vector of (0, 0, +3, 0, 0) generates text characteristic of high extraversion while remaining average on the other Big Five dimensions. Any combination is possible, including interactions: high openness with low extraversion produces text that captures both traits simultaneously.
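Building these continuous trait vectors is straightforward. The helper below is hypothetical (not from the paper); it just makes the (O, C, E, A, N) ordering and the "unspecified traits default to average" convention explicit, including the interaction case.

```python
# Hypothetical helper (not the paper's API) for building Big Five trait
# vectors in (O, C, E, A, N) order; values are z-scores, 0 = average.
BIG_FIVE = ("openness", "conscientiousness", "extraversion",
            "agreeableness", "neuroticism")

def trait_vector(**levels):
    """Return a 5-dim list of z-scores; unspecified traits default to 0."""
    unknown = set(levels) - set(BIG_FIVE)
    if unknown:
        raise ValueError(f"unknown traits: {sorted(unknown)}")
    return [float(levels.get(t, 0.0)) for t in BIG_FIVE]

high_extraversion = trait_vector(extraversion=3)   # the (0, 0, +3, 0, 0) example
# Interactions: a single vector can encode several traits at once.
open_introvert = trait_vector(openness=2, extraversion=-2)
```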

Expert raters identified the intended trait in generated text with 87.3% average accuracy for the Big Five and 96.7% for depression and life satisfaction. These numbers hold across GPT-2, Gemma (2B), and Llama 3, demonstrating model-agnostic applicability. The total added parameters are less than 0.1% of the base model (55,296 for Gemma 2B vs 2 billion base parameters), making distribution trivial.
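The overhead figure is easy to verify; a quick check using the numbers quoted above:

```python
# Parameter overhead for the Gemma 2B figure quoted in the text.
added = 55_296
base = 2_000_000_000
overhead_pct = added / base * 100
# Roughly 0.0028%, comfortably under the stated 0.1% bound.
```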

Applications extend beyond mental health simulation: customer service training with diverse personalities, crisis worker training with simulated distress levels, machine translation matched to audience education or dialect levels, and research tools that generate coherent text (not isolated words or phrases) for trait analysis. In light of the related note "Do personality traits activate hidden emoji patterns in language models?", PsychAdapter may be activating these pre-existing trait-language circuits through a more precise mechanism than prompting or full fine-tuning.




lightweight psychological trait adapters modify every transformer layer with less than 0.1 percent additional parameters — enabling fine-grained psychological profile control independent of prompting