Beyond Discrete Personas: Personality Modeling Through Journal Intensive Conversations
Large Language Models (LLMs) have significantly improved personalized conversational capabilities. However, existing datasets like Persona Chat, Synthetic Persona Chat, and Blended Skill Talk rely on static, predefined personas. This approach often results in dialogues that fail to capture human personalities’ fluid and evolving nature. To overcome these limitations, we introduce a novel dataset with around 400,000 dialogues and a framework for generating personalized conversations using long-form journal entries from Reddit. Our approach clusters journal entries for each author and filters them by selecting the most representative cluster, ensuring that the retained entries best reflect the author’s personality. We further refine the data by capturing the Big Five personality traits—openness, conscientiousness, extraversion, agreeableness, and neuroticism— ensuring that dialogues authentically reflect an individual’s personality.
Other works have explored integrating psychological models (Azucar et al., 2018; Barlett and Anderson, 2012) like the Big Five (O.C.E.A.N. Model) into CA. However, significant challenges remain in accurately capturing and representing dynamic personality traits.
The detailed prompt provided is as follows: Instruction: Create a 9-turn dialogue in english between two authors based on the journal entries provided below. The dialogue should reflect a natural and engaging conversation, finding common ground between the authors’ experiences, thoughts, or emotions. Ensure that the conversation stays true to the personality traits and tones expressed in the journal entries. Each author should contribute equally, with utterances that are concise, relevant, and no longer than 20 words. journal, journal 2
Conversations generated from such static personas can feel repetitive(Zhang et al., 2020), shallow, and sometimes even contradictory (Nie et al., 2021), failing to engage the user truly. Our research seeks to fill this gap and transform this approach by moving beyond the constraints of discrete personas, instead embracing a model that captures the dynamic nature of personal identity (Schwartz et al., 2011). By leveraging long-form journal entries mined from platforms like Reddit—where individuals share their authentic, unfiltered life experiences, we ensured the preservation of personality traits, achieving greater depth and realism than static personas.