Character is Destiny: Can Role-Playing Language Agents Make Persona-Driven Decisions?
Can Large Language Models (LLMs) simulate humans in making important decisions? Recent research has unveiled the potential of using LLMs to develop role-playing language agents (RPLAs), which mainly mimic the knowledge and tone of various characters. However, imitative decision-making requires a more nuanced understanding of personas. In this paper, we benchmark the ability of LLMs in persona-driven decision-making. Specifically, we investigate whether LLMs can predict characters’ decisions given the preceding stories in high-quality novels. Leveraging character analyses written by literary experts, we construct a dataset, LIFECHOICE, comprising 1,462 characters’ decision points from 388 books. We then conduct comprehensive experiments on LIFECHOICE with various LLMs and RPLA methodologies. The results demonstrate that state-of-the-art LLMs exhibit promising capabilities on this task, yet substantial room for improvement remains. Hence, we further propose the CHARMAP method, which adopts persona-based memory retrieval and significantly advances RPLAs on this task, achieving a 5.03% increase in accuracy.
These memories can enhance the model’s decision-making and reasoning, providing a better personalized experience for users. However, obtaining real user memory data is difficult and raises privacy concerns. We instead model characters from the historical data in high-quality novel texts, asking the model to reproduce the characters’ actual choices in the storyline based on the preceding text, providing the first benchmark for large-scale evaluation of personalized intelligent agents.
Character-driven Motivation Character-driven behavior revolves around the character’s inner world, personality, and transformation. Submotivations of character-driven behavior include Personality and Traits, Emotions and Psychological State, Social Relationships, Values and Beliefs, and Desires and Goals.
4 Experiments
Because our inputs generally exceed 100k tokens, it is difficult for LLMs to handle them directly. Our approach is therefore divided into two steps: 1) Character Profile Construction, which builds the character’s description and memories; 2) Reasoning for Decisions, where different LLMs use the constructed profile to answer the questions.
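The two steps above can be sketched as follows. This is a minimal illustration, not the paper’s code: `build_profile`, `answer_decision`, and the character-budget heuristic are hypothetical names and choices, and `llm` stands in for any chat-model call.

```python
def build_profile(description: str, memories: list[str], budget: int = 5000) -> dict:
    """Step 1: pack the character description and retrieved memory
    segments into a profile that fits a size budget (characters used
    here as a crude proxy for tokens)."""
    kept, used = [], len(description)
    for m in memories:
        if used + len(m) > budget:
            break
        kept.append(m)
        used += len(m)
    return {"description": description, "memories": kept}


def answer_decision(llm, profile: dict, question: str) -> str:
    """Step 2: the LLM reasons over the compact profile instead of
    the full 100k+ token book."""
    prompt = (
        "Character description:\n" + profile["description"] + "\n\n"
        "Relevant memories:\n" + "\n".join(profile["memories"]) + "\n\n"
        "Decision question: " + question + "\n"
        "Answer as this character would:"
    )
    return llm(prompt)
```

In practice the `budget` would be enforced in tokens by the target model’s tokenizer rather than in characters.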
4.1 Character Profile Construction As shown in Figure 1, the character profile consists of two parts. The first part is the character’s description, including their personality, experiences, hobbies, etc. The second part is the character’s memories, i.e., specific segments from the preceding text. Below, we detail how each part is constructed:
Description Construction We adopt two automatic methods to construct character descriptions:
(1) Hierarchical merging (Wu et al., 2021): Books are divided into chunks that fit within the LLM context window. The LLM summarizes each chunk, then iteratively merges and summarizes adjacent summaries to produce the final description.
(2) Incremental updating (Chang et al., 2023): Books are divided into chunks and summarized sequentially; the description is incrementally updated and refined as each newly summarized chunk is folded in. The summarization model for both automatic methods is GPT-3.5. (3) Expert-written descriptions: Using the descriptions from Supersummary, we employ GPT-4 to identify the positions of the decision points and truncate the text, providing only the content before these points. All descriptions are kept within 5k tokens, the maximum length of the human-written descriptions.
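The two automatic strategies can be sketched as below. This is an illustrative skeleton, assuming a `summarize` callable that stands in for the GPT-3.5 summarizer; the chunking and fan-in values are placeholders, not the papers’ actual settings.

```python
def chunk(text: str, size: int) -> list[str]:
    """Split a book into fixed-size chunks that fit the context window."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def hierarchical_merge(chunks: list[str], summarize, fan_in: int = 2) -> str:
    """Hierarchical merging (Wu et al., 2021 style): summarize each chunk,
    then repeatedly merge and re-summarize adjacent summaries until one
    description remains."""
    level = [summarize(c) for c in chunks]
    while len(level) > 1:
        level = [summarize(" ".join(level[i:i + fan_in]))
                 for i in range(0, len(level), fan_in)]
    return level[0]


def incremental_update(chunks: list[str], summarize) -> str:
    """Incremental updating (Chang et al., 2023 style): maintain one
    running description and refine it with each new chunk in order."""
    description = ""
    for c in chunks:
        description = summarize(description + " " + c)
    return description
```

Hierarchical merging makes O(log n) passes over progressively shorter texts, while incremental updating makes a single sequential pass but must re-process the running description at every step.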
Memory Retrieval We use two memory retrieval methods: (1) BM25 (Robertson et al., 2009): Scores documents by query-term frequency, inverse document frequency, and document-length normalization. (2) Embedding-based retrieval: Represents documents and queries as dense vectors and measures semantic similarity by vector distance. For the embedding model, we use OpenAI’s text-embedding-ada-002 (Neelakantan et al., 2022).
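For illustration, a self-contained BM25 scorer over whitespace-tokenized memory segments, with the common default parameters k1 = 1.5 and b = 0.75 (the paper does not specify its parameters; embedding-based retrieval would instead rank segments by cosine similarity of their dense vectors):

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Score each document against the query with Okapi BM25:
    term frequency saturated by k1, document length normalized by b,
    and terms weighted by inverse document frequency."""
    toks = [d.lower().split() for d in docs]
    n = len(docs)
    avgdl = sum(len(t) for t in toks) / n
    df = Counter()                      # document frequency per term
    for t in toks:
        df.update(set(t))
    scores = []
    for t in toks:
        tf = Counter(t)
        s = 0.0
        for w in query.lower().split():
            if w not in tf:
                continue
            idf = math.log((n - df[w] + 0.5) / (df[w] + 0.5) + 1)
            norm = 1 - b + b * len(t) / avgdl
            s += idf * tf[w] * (k1 + 1) / (tf[w] + k1 * norm)
        scores.append(s)
    return scores
```

The top-scoring segments under either scorer are then placed in the character’s memory slot of the profile.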