How do LLMs identify which personality items matter most for trait inference?
This explores how language models decide which signals — survey items, behaviors, text cues — carry the most weight when inferring someone's personality traits, rather than how they store a persona once known.
The honest starting point: the corpus doesn't show models doing explicit item-weighting the way a psychometrician would. Instead it suggests something more interesting: models don't rank items so much as compress them into a different representation where the important relationships are already encoded. In zero-shot profiling work, an LLM handed raw Big Five scores writes a natural-language summary that captures *second-order* patterns (how traits interact, not just their levels), and that summary then predicts nine other psychological scales with structural alignment above R² = 0.89 (Can language summaries unlock hidden psychological patterns?). The combined summary-plus-score prediction beats either alone, which is the tell: the model isn't selecting a few decisive items, it's surfacing emergent structure that the raw items don't expose on their own.
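The "combined beats either alone" pattern can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the data are synthetic, the `summary` feature is only a stand-in for whatever second-order information a language summary carries, and the helper `r_squared` is hypothetical, not the method used in the cited work.

```python
import numpy as np

def r_squared(X, y):
    """In-sample R^2 of an ordinary least squares fit with an intercept."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic data (assumption): two trait scores plus a noisy feature that
# encodes their interaction, standing in for a language summary.
rng = np.random.default_rng(0)
n = 500
scores = rng.normal(size=(n, 2))
interaction = scores[:, 0] * scores[:, 1]
summary = interaction + 0.5 * rng.normal(size=n)

# The outcome depends on a trait level and on the interaction.
outcome = scores[:, 0] + interaction + 0.5 * rng.normal(size=n)

r2_scores = r_squared(scores, outcome)
r2_summary = r_squared(summary, outcome)
r2_both = r_squared(np.column_stack([scores, summary]), outcome)

print(f"scores only:  {r2_scores:.3f}")
print(f"summary only: {r2_summary:.3f}")
print(f"combined:     {r2_both:.3f}")
```

In this toy setup the combined fit has the highest R² because the raw scores carry the trait levels while the summary stand-in carries the interaction. Note that in-sample R² never decreases when predictors are added, so a real comparison would need held-out data.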
A second answer lives below the prompt, in the model's internals. Research on persona vectors finds linear directions in activation space that correspond to specific traits like sycophancy or hallucination (Can we track and steer personality shifts during model finetuning?). If a trait is a direction, then "which items matter" becomes "how strongly does this input project onto that direction", which is a geometric question, not a checklist. PsychAdapter pushes the same logic into architecture, modifying every transformer layer with under 0.1% extra parameters to hit 87.3% Big Five accuracy (Can we control personality in language models without prompting?). Both say trait inference is distributed across the network rather than localized to a handful of salient cues.
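The geometric framing can be made concrete with a small sketch. It assumes a difference-of-means construction for the trait direction and uses random vectors in place of real model activations; the function names `trait_direction` and `projection_scores` are hypothetical, and nothing here is taken from the cited research's code.

```python
import numpy as np

def trait_direction(trait_acts, baseline_acts):
    """Unit vector from the baseline mean activation to the trait mean activation."""
    diff = trait_acts.mean(axis=0) - baseline_acts.mean(axis=0)
    return diff / np.linalg.norm(diff)

def projection_scores(acts, direction):
    """Scalar projection of each activation vector onto the trait direction."""
    return acts @ direction

# Synthetic stand-ins for hidden states (assumption): 16-dimensional vectors,
# with the trait-expressing set shifted along one axis.
rng = np.random.default_rng(0)
dim = 16
shift = np.zeros(dim)
shift[0] = 3.0
trait_acts = rng.normal(size=(200, dim)) + shift
baseline_acts = rng.normal(size=(200, dim))

direction = trait_direction(trait_acts, baseline_acts)
trait_scores = projection_scores(trait_acts, direction)
baseline_scores = projection_scores(baseline_acts, direction)

print(f"mean projection, trait inputs:    {trait_scores.mean():.2f}")
print(f"mean projection, baseline inputs: {baseline_scores.mean():.2f}")
```

Under this framing, an input "matters" for a trait to the extent that its activation has a large projection onto the direction, so weighting is a continuous quantity rather than a selection of items.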
Here's the cross-current worth knowing: this representational machinery is good at *populations* and shaky at *individuals*. Conditioning an LLM on each participant's profile, across the 208,021 participants in the Psych-201 dataset, produced no measurable gain in person-level prediction (Does conditioning LLMs on personal profiles improve prediction?). So whatever items the model is weighting, the weighting generalizes to the average and washes out the idiosyncratic. The one place individual-level inference does work is narrative: persona-driven memory retrieval lets a model predict a specific character's choices when fed an expert persona profile plus psychologically relevant retrieved memories (Can LLMs predict character choices from narrative context?). That's a clue about which items matter most: not trait scores, but situated memories that ground the trait in context.
There's also a thumb on the scale the model brings before it sees any items at all. Open LLMs converge on an ENFJ-like default and resist conditioning away from it (Why do open language models converge on one personality type?, Can open language models adopt different personalities through prompting?). So "which items matter" is never asked on a blank slate — instruction tuning has already pre-weighted toward helpful, structured, supportive readings, which can distort inference toward identity-congruent biases (How accurately can language models simulate human personalities?).
The thing you may not have known you wanted: the most reliable signal of *whether* an inference will hold isn't any personality item at all. It's effect strength. AI persona simulations replicate experimental main effects at a rate that tracks the strength of the original p-value: strong effects reproduce reliably, while marginal ones yield both false positives and false negatives (Can AI personas reliably replicate human experiment results?). Trait inference, in other words, inherits the same calibration as the underlying psychology: the model surfaces what was robust to begin with.
Sources (9 notes)
LLMs generate natural language personality summaries from Big Five scores that encode second-order trait patterns, enabling zero-shot prediction of nine other psychological scales with R² > 0.89 structural alignment. Combined summary-and-score predictions outperform either alone, showing synergistic information.
Research identifies linear directions in LLM activation space corresponding to specific traits like sycophancy and hallucination. These persona vectors predict finetuning-induced personality shifts before they occur and can preventatively steer training to avoid unwanted trait changes.
PsychAdapter modifies every transformer layer with <0.1% additional parameters to achieve 87.3% Big Five accuracy and 96.7% depression/life satisfaction accuracy across GPT-2, Gemma, and Llama 3. This architecture-level approach bypasses prompt resistance entirely.
Across 208,021 participants in the Psych-201 dataset, conditioning LLMs on participant profiles did not meaningfully improve predictions for specific individuals. The standard technique for individuation produces no measurable gains in person-level forecasting.
The LIFECHOICE benchmark (1,462 decisions across 388 novels) shows LLMs predict character choices better when given expert-written persona profiles paired with retrieved memories relevant to the character's psychology. This persona-based approach outperforms automated summarization by 5%.
Near-zero temperature MBTI testing shows all open models default to ENFJ—rare in humans but consistent across AI. This reflects systematic reward for helpful, structured, supportive responses during instruction tuning and alignment.
Research shows most open models fail to adopt prompted personalities, stubbornly retaining their trained ENFJ-like defaults. Only a few flexible models succeed. Combining role and personality conditioning improves results but doesn't fully overcome resistance.
LLMs replicate human responses at 85% fidelity in interviews and 76% of experimental effects in marketing studies. However, this accuracy masks three failure modes: run-to-run instability, resistance to personality conditioning, and identity-congruent cognitive biases that distort simulated reasoning.
Viewpoints AI reproduced 84 of 111 main effects from Journal of Marketing experiments with replication success strongly correlated to original p-value strength. Marginal effects showed unreliable performance with both false positives and negatives.