Does model capability translate to better persona consistency?
As language models become more advanced, do they naturally become better at maintaining consistent personas across conversations? PersonaGym testing across multiple models and thousands of interactions explores whether scaling helps with persona adherence.
The PersonaGym evaluation framework tests 6 open and closed-source LLMs on persona adherence across 200 personas and 10,000 questions. The finding: Claude 3.5 Sonnet achieves only a 2.97% relative improvement in PersonaScore over GPT-3.5 — despite being a much more advanced model by every other measure.
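For concreteness, that 2.97% figure is a relative gap, not an absolute one: the score difference divided by the baseline score. A minimal Python sketch; the PersonaScore values below are hypothetical placeholders chosen only to reproduce the reported percentage, not the paper's actual per-model numbers:

```python
def relative_improvement(new_score: float, baseline_score: float) -> float:
    """Relative improvement of new_score over baseline_score, in percent."""
    return (new_score - baseline_score) / baseline_score * 100

# Placeholder PersonaScore values (NOT from the PersonaGym paper),
# picked so the ratio matches the reported ~2.97% relative gap.
gpt35_score = 4.04    # hypothetical baseline
claude_score = 4.16   # hypothetical advanced model

print(f"{relative_improvement(claude_score, gpt35_score):.2f}%")  # 2.97%
```

The takeaway is how small the numerator is relative to the baseline: a gap this narrow would be unremarkable between adjacent model versions, let alone between model generations.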
This suggests persona consistency is an orthogonal capability that standard training does not improve. Models get better at reasoning, coding, instruction-following, and knowledge retrieval as they scale — but they do not get meaningfully better at maintaining a consistent persona across varied interactions.
The explanation likely connects to how models are trained. Standard training objectives (next-token prediction, RLHF for helpfulness) optimize for response quality on a per-turn basis. Persona consistency requires cross-turn coherence — remembering what you said earlier, maintaining behavioral patterns, avoiding contradiction with your established character. These are different optimization targets that standard training doesn't address.
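A toy sketch of that gap, with entirely hypothetical scoring functions (nothing here mirrors a real training pipeline): a per-turn objective can rate every response highly while the dialogue contradicts itself, because the contradiction is only visible across turns.

```python
# Toy illustration of per-turn quality vs. cross-turn coherence.
# All names and scoring logic are hypothetical stand-ins.

dialogue = [
    "As a vegetarian chef, I never cook meat.",
    "My signature dish is slow-roasted pork shoulder.",  # contradicts turn 1
]

def per_turn_score(response: str) -> float:
    """Stand-in for a per-turn objective (next-token prediction, RLHF
    helpfulness): it judges each response in isolation, so both turns
    above look fine to it."""
    return 1.0 if response.strip() else 0.0

def is_consistent(history: list[str], response: str) -> bool:
    """Stand-in for the missing cross-turn objective: check the new
    response against what the persona already established. A real check
    would need an entailment/contradiction model; this toy version
    hard-codes the one contradiction in the example."""
    claims_vegetarian = any("never cook meat" in turn for turn in history)
    return not (claims_vegetarian and "pork" in response)

print([per_turn_score(t) for t in dialogue])     # [1.0, 1.0]: each turn scores well alone
print(is_consistent(dialogue[:1], dialogue[1]))  # False: the pair is inconsistent
```

The point of the sketch: as long as the loss only ever sees one response at a time, there is no gradient signal that penalizes the second turn, so consistency never gets optimized.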
Combined with the finding in "Can open language models adopt different personalities through prompting?", the problem compounds: models resist persona change AND their base persona-adherence capability doesn't improve with scale. More capability means neither more flexibility nor more consistency.
This finding challenges the assumption that "better models will naturally solve persona problems." Dedicated persona training — whether through the approaches discussed in "Why does supervised learning fail to enforce persona consistency?" or other methods — appears necessary.
Source: Personas Personality
Related concepts in this collection
- Can open language models adopt different personalities through prompting?
  Explores whether open LLMs can be conditioned to mimic target personalities via prompting, or whether they resist and retain their default traits regardless of instructions.
  Relation: models resist change AND don't improve with scale.
- Why does supervised learning fail to enforce persona consistency?
  Supervised learning trains models to generate good responses but never punishes contradictions. This note explores why explicit negative feedback is structurally necessary for dialogue agents to maintain consistent personas, and what training methods can provide it.
  Relation: dedicated training needed since scaling doesn't help.
- Why do specialized models fail outside their domain?
  Deep domain optimization creates sharp performance cliffs at domain boundaries. Specialized models generate plausible-sounding but ungrounded responses when queries fall outside their training scope, and often fail to signal their own ignorance.
  Relation: another case where general capability doesn't transfer to specific competency.
Original note title: persona adherence does not scale with general model capability — advanced models show minimal improvement over basic models