Measuring and Mitigating Persona Distortions from AI Writing Assistance
Hundreds of millions of people use artificial intelligence (AI) for writing assistance. Here, we evaluated how AI writing assistance distorts writer personas – their perceived beliefs, personality, and identity. In three large-scale experiments, writers (N=2,939) wrote political opinion paragraphs with and without AI assistance. Separate groups of readers (N=11,091) blindly evaluated these paragraphs across 29 socially salient dimensions of reader perception, spanning political opinion, writing quality, writer personality, emotions, and demographics. AI writing assistance produced persona distortions across all dimensions: with AI, writers seemed more opinionated, competent, and positive, and their perceived demographic profile shifted towards more privileged groups. Writers objected to many of the observed distortions, yet continued to prefer AI-assisted text even when made aware of them. We successfully mitigated objectionable persona distortions at the model level by training reward models on our experimental data (10,008 paragraphs, 2,903,596 ratings) to steer AI outputs towards faithful representation of writer stance. However, this came at a cost to user acceptance, suggesting an entanglement between desirable and undesirable properties of AI writing assistance that may be difficult to resolve. Together, our findings demonstrate that persona distortions from AI writing assistance are pervasive and persistent even under realistic conditions of human oversight, which carries implications for public discourse, trust, and democratic deliberation that scale with AI adoption.
Written language is a primary medium through which people share information, understand one another, and reach agreement. Because writing carries rich social signals, readers routinely infer a writer’s beliefs, personality, and identity – the writer’s persona – from the text they produce1–4. Today, artificial intelligence (AI) writing tools are transforming the writing process: hundreds of millions of people now use AI to draft and refine text5–7, and AI-assisted writing already pervades the communications and documents on which social and political life depend8–14. Critically, because AI models tend towards particular word choices, tonal registers, and rhetorical patterns15–20, they can reshape text in ways that systematically diverge from what a writer would produce alone. When readers draw inferences about a writer from AI-assisted text that the writer’s own text would not have invited, the result is persona distortion: a systematic misrepresentation of who the writer is and what they believe caused by their use of AI writing assistance.
Persona distortions from AI writing assistance could have far-reaching consequences across the many domains where written communication shapes social life. If AI writing assistance shifts the perceived extremity of political opinions, making writers seem more moderate or more radical, it could fuel misperceptions of public opinion, deepen partisan animosity, and reduce willingness to engage across ideological lines21–23. If it elevates perceived writing quality or apparent expertise, it could lend unearned credibility to weak arguments and misinformation by decoupling surface fluency from genuine competence24,25. If it inflates or dampens the emotional or moral tone of text, it could amplify outrage-driven content and intergroup hostility, or suppress mobilization around genuine grievances26–28. And if it shifts inferences about a writer’s demographic background – such as their perceived education, race, age, or gender – it could mask identity signals that writers intend to convey, distort evaluations of competence, and alter the personal and professional opportunities writers are offered29–31. These distortions need not be dramatic to matter: at the scale of billions of user requests for AI writing assistance5, even modest systematic shifts in how writers are perceived could accumulate into widespread misattribution of credibility, stance, and identity.
Critically, however, the extent to which, and the manner in which, persona distortions from AI writing assistance will impact society depend on three open empirical questions. First, does AI writing assistance actually distort how readers perceive writers and their opinions, and if so, when and in what ways (RQ1)? If distortions are negligible, occur along inconsequential dimensions, or vanish under realistic conditions where writers can freely edit and reject AI-generated text, real-world impacts may be marginal. Second, if distortions occur, do writers find them acceptable or objectionable (RQ2)? If writers oppose the distortions that AI introduces, the normative concern is one of individual agency, as writers are being misrepresented by the very tools they use to communicate. But if writers welcome distortions, the concern becomes collective, as distortions that may benefit individual writers propagate and erode the reliability of text as a signal of belief and identity to readers and institutions32,33. Thus, third, can undesirable distortions be mitigated without decreasing user preference for AI writing assistance (RQ3)? If targeted interventions at the model level can reduce specific distortions, developers have a tractable path to mitigating the risks of AI writing assistance. But if the textual properties that drive undesirable distortions are entangled with those that writers value, then some degree of distortion may be an inherent cost of AI writing assistance.
In our main study, writers were UK adults (N=1,501, census-representative on age, gender, race) who expressed their opinion on three political propositions drawn randomly from a pool of 100 (see Methods). The propositions covered mainstream UK political issues balanced across the political spectrum, from healthcare and immigration to climate policy and civil liberties (see SI:2.8 for the full list). Writers first rated their agreement with each proposition on a 0-100 scale, outlined their reasoning in two or more bullet points, and then expanded the bullet points into a full opinion paragraph of at least 100 words. For each proposition assigned to the writer, one of these three writer inputs (rating, bullets, or paragraph) was passed to one of three AI models (Claude, DeepSeek, or ChatGPT), which generated a paragraph matching the format of the writer’s own (see Methods). To mirror everyday use of writing assistants, we then asked writers to edit the AI-generated paragraph until it reflected their opinion. Writers edited the AI-generated paragraphs only 23% of the time (<30% across all models and input types at p<.01; see SI:4.1), and most edits were minor (median Levenshtein ratio = 0.96). Writers reported moderate-to-high engagement with their assigned propositions (median issue knowledge = 56.0, median issue importance = 65.0, median confidence = 74.0 on 0-100 scales; see SI:3), suggesting that our random assignment of propositions did not systematically force writers to opine on issues they felt uninformed about or indifferent toward.
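As a point of reference for the edit-magnitude statistic, the minimal sketch below computes a similarity ratio between a hypothetical AI draft and a writer-edited version. It uses Python's standard-library difflib as a stand-in for the Levenshtein ratio reported above; the study's exact implementation and the example sentences are illustrative assumptions only.

```python
from difflib import SequenceMatcher

def edit_similarity(ai_draft: str, edited: str) -> float:
    """Similarity between the AI draft and the writer-edited version.

    1.0 means the draft was left unchanged; values near 1.0 (cf. the
    reported median ratio of 0.96) indicate only minor edits.
    """
    return SequenceMatcher(None, ai_draft, edited).ratio()

# Hypothetical example: a single word swap leaves the ratio close to 1.
draft = "The NHS should receive substantially more public funding each year."
edited = "The NHS should receive considerably more public funding each year."
print(round(edit_similarity(draft, edited), 2))
```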
2.1 Writer preference for AI writing
Before addressing our main research questions, we tested a key empirical precondition for our work: that writers often accept and endorse AI-assisted writing as reflective of their own views. If this were not the case, distortions introduced by AI would rarely propagate in the real world.
Thus, after writers had composed their own paragraph and edited the AI-generated version to their satisfaction, we asked them which version they preferred for communicating their opinion. Writers strictly preferred the AI paragraph to their own in a clear majority of cases (2,835 of 4,503 cases, 63.0%), and this preference held across AI models and writer inputs (preference rate >50% across all conditions at p<.01; see SI:4.1). Since writers had already composed their own opinion paragraph, this preference is unlikely to reflect mere convenience. When asked directly, in a majority of cases (1,477 of 2,835, 52.1%) writers said they preferred AI writing because it better reflected their opinion than what they had written themselves. This is consistent with prior work showing that writers maintain a clear sense of control and agency over AI-assisted writing34,35.
Having established that writers routinely endorse AI-assisted text as reflective of their views and prefer it to their own writing, we turned to our first research question: does AI writing assistance systematically distort writer personas (RQ1)? To answer this question, we recruited a separate sample of readers (N=10,017 UK adults) who blindly rated both human-written and AI-assisted paragraphs across 29 dimensions of reader perception (see Methods). Persona distortions were then measured as systematic differences in reader perceptions between human-written paragraphs and AI-generated paragraphs (with human edits) from the same writers. Importantly, when measuring distortions in our main analysis (Figure 2): i) the AI-generated paragraphs included all edits made by writers; and ii) cases where writers strictly dispreferred the AI-generated paragraphs (708 of 4,503 cases, 15.7%) were excluded. These criteria allowed us to measure distortions only where they would plausibly be propagated in the real world. Results are robust to relaxing both restrictions (see SI:5.2).
AI writing assistance produced significant persona distortions across every dimension we measured (p<.001 each after Bonferroni correction across all 29 rating attributes; Figure 2). AI made writers seem more extreme in their political opinions (+4.3 average marginal effect [AME] on a 0-100 scale), less open to changing their views (-0.7), and more confident (+7.4). It elevated perceived writing quality, with paragraphs judged as clearer (+9.0), more informative (+22.7), and more relevant (+8.3). It compressed emotional expression into a narrower, more agreeable register: writers appeared friendlier (+4.5) and more optimistic (+9.5), expressing more hope (+8.9) and excitement (+4.1) but less anger (-3.2), disgust (-3.1), and fear (-0.6). And it shifted inferred writer demographics towards a more privileged profile: writers appeared more educated (×5.3 odds ratio), higher-income (×4.4), and more likely to be perceived as White (×1.1) and as a native English speaker (×4.1). Full effects across all 29 dimensions are shown in Figure 2.
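For intuition about the effect sizes reported above, the sketch below shows how a mean shift on a 0-100 attribute and an odds ratio for a categorical attribute can be estimated from reader ratings. The simulated data, column names, and bare two-group models are illustrative assumptions only; the study's actual analysis (average marginal effects across 29 attributes with Bonferroni correction) is more involved.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 2000  # hypothetical number of reader ratings

# One row per rating, flagging whether the rated paragraph was AI-assisted.
df = pd.DataFrame({"ai_assisted": rng.integers(0, 2, n)})

# Simulate a 0-100 scale attribute (e.g. perceived confidence) and a binary
# perception (e.g. judged university-educated), with an AI effect built in.
df["confidence"] = np.clip(55 + 7 * df["ai_assisted"] + rng.normal(0, 20, n), 0, 100)
df["educated"] = rng.binomial(1, (0.4 + 0.25 * df["ai_assisted"]).to_numpy())

# Scale attribute: in this simple two-group setting, the coefficient on
# ai_assisted is the mean shift on the 0-100 scale.
ols = smf.ols("confidence ~ ai_assisted", data=df).fit()
print("mean shift:", round(ols.params["ai_assisted"], 1))

# Categorical attribute: exponentiating the logistic coefficient gives the
# odds ratio for AI-assisted relative to human-written paragraphs.
logit = smf.logit("educated ~ ai_assisted", data=df).fit(disp=False)
print("odds ratio:", round(float(np.exp(logit.params["ai_assisted"])), 2))
```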
AI writing assistance also homogenised perceived writer personas. Across most dimensions, AI-assisted paragraphs were rated significantly more similarly to each other than their human-written counterparts (significant reduction in standard deviation [scale attributes] or entropy [categorical attributes] for 22 of 29 rating attributes at p<.001 after Bonferroni correction; see SI:5.9). For example, perceived writer confidence varied considerably across human-written paragraphs but converged toward a narrower, more confident range for AI paragraphs (24.1 vs 20.5 SD). This extends prior evidence of homogenisation at the lexical and semantic level15,16,36 by showing that homogenisation also propagates to how readers perceive the people behind AI-assisted text.
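The two dispersion measures used for this comparison can be illustrated directly: standard deviation for scale attributes and Shannon entropy for categorical attributes, each computed separately for human-written and AI-assisted paragraphs. The sketch below applies both to simulated ratings; the distributions and labels are hypothetical and chosen only to mirror the qualitative narrowing reported above.

```python
import numpy as np
from collections import Counter

def sd(ratings) -> float:
    """Spread of a 0-100 scale attribute across paragraphs."""
    return float(np.std(ratings, ddof=1))

def shannon_entropy(labels) -> float:
    """Entropy (in bits) of a categorical attribute across paragraphs."""
    counts = np.array(list(Counter(labels).values()), dtype=float)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(1)

# Hypothetical perceived-confidence ratings: AI-assisted paragraphs cluster
# in a narrower (and higher) range than human-written ones.
human = np.clip(rng.normal(60, 24, 1000), 0, 100)
ai = np.clip(rng.normal(70, 20, 1000), 0, 100)
print(round(sd(human), 1), "vs", round(sd(ai), 1))  # lower SD = homogenisation

# Hypothetical perceived-education labels: less varied for AI-assisted text.
human_edu = rng.choice(["degree", "no degree"], 1000, p=[0.5, 0.5])
ai_edu = rng.choice(["degree", "no degree"], 1000, p=[0.85, 0.15])
print(round(shannon_entropy(human_edu), 2), "vs", round(shannon_entropy(ai_edu), 2))
```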
Taken together, our results suggest that AI writing assistance introduces pervasive and consistent persona distortions, changing how writers are perceived across a wide range of socially salient dimensions. Writing with AI made writers seem more opinionated and more skilled, compressed their emotional expression into a narrower and more agreeable register, and shifted their perceived demographic profile towards more privileged groups. It also homogenised reader perceptions of writers and their opinions. But are these effects unwelcome?