Think Like a Person Before Responding: A Multi-Faceted Evaluation of Persona-Guided LLMs for Countering Hate

Paper · arXiv 2506.04043 · Published June 4, 2025
Sentiment Semantics Toxic DetectionsPersonas PersonalityPhilosophy SubjectivityAlignment

One of the ways in which we might address hate speech is by contextualizing through the use of counternarratives (CN), which can not only reinforce values like tolerance but also dispel misinformation about the target groups.

we present a comprehensive evaluation framework for analyzing LLMsgenerated CNs across four key dimensions: (1) Persona framing (Vanilla, NGO professional, and a Compassionate NGO professional), recognizing that delivery style can influence impact; (2) Model behavior (e.g., refusal rates, verbosity and readability); (3) Affective tone (sentiment and emotion); and (4) Ethical risk (potential for generating hateful content). This multi-dimensional approach

1.        Vanilla: We prompt the LLM without any explicit persona conditioning or additional instructions beyond the default system behavior, using a prompting approach similar to Vallecillo Rodríguez et al. (2024).

2.        NGO-Persona: We instruct the LLM to adopt the persona of an NGO worker attempting to mitigate hateful language online.

1.        NGO-Emotion: We extend the NGO-Persona prompt to also specify the emotional tone of the CN by explicitly directing the model to generate responses that are compassionate.