Psychology and Social Cognition

Research examining how AI systems engage with human psychology, social dynamics, and interpersonal behavior. Covers topics like persona design, empathy, theory of mind, trust, therapeutic chatbots, and the cultural and emotional challenges of human-AI interaction.

219 notes (primary) · 412 papers · 11 sub-topics

Chatbot Psychology and Conversation

20 notes

Can psychotherapy actually teach AI chatbots better communication?

SafeguardGPT applies therapeutic feedback to correct harmful chatbot behaviors before responses reach users. The open question is whether this therapy produces genuine learning or merely performative, surface-level improvements.

Can reinforcement learning personalize which mental health areas to screen?

Explores whether Q-learning can adaptively prioritize screening across 37 functioning dimensions based on individual patient history, mirroring how therapists naturally focus on areas where clients struggle most.

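As a rough illustration of the mechanism, here is a minimal tabular Q-learning sketch that treats each functioning dimension as an action to screen next; the state encoding, reward signal, and hyperparameters are invented for this example rather than taken from the study.

```python
import numpy as np

N_DIMS = 37                         # functioning dimensions from the note
ALPHA, GAMMA, EPS = 0.1, 0.9, 0.2   # illustrative hyperparameters

# Q-table: rows are coarse patient-history states, columns are dimensions.
# A 4-bit history of recent screening outcomes stands in for whatever
# richer patient state a real system would use.
q = np.zeros((2 ** 4, N_DIMS))
rng = np.random.default_rng(0)

def choose_dimension(state: int) -> int:
    """Epsilon-greedy pick of the next dimension to screen."""
    if rng.random() < EPS:
        return int(rng.integers(N_DIMS))
    return int(np.argmax(q[state]))

def update(state: int, dim: int, reward: float, next_state: int) -> None:
    """Standard tabular Q-learning update."""
    td_target = reward + GAMMA * np.max(q[next_state])
    q[state, dim] += ALPHA * (td_target - q[state, dim])

# Toy loop: reward 1.0 when screening surfaces a problem, so dimensions
# where this simulated patient struggles (3 and 17) get prioritized.
state = 0
for _ in range(1000):
    dim = choose_dimension(state)
    reward = float(rng.random() < (0.8 if dim in (3, 17) else 0.1))
    next_state = (state * 2 + int(reward)) % (2 ** 4)
    update(state, dim, reward, next_state)
    state = next_state
```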

Do chatbots help people disclose more intimate secrets?

Explores whether the judgment-free nature of chatbot conversations enables deeper self-disclosure than talking to humans, and whether that deeper disclosure produces psychological benefits.

How do people accidentally develop romantic bonds with AI?

Exploring whether AI companionship emerges from deliberate romantic seeking or accidentally through functional use, and whether users adopt human relationship rituals like wedding rings and couple photos.

Do chatbot trials against waitlists measure real therapeutic value?

Explores whether comparing therapeutic chatbots only to no-treatment controls—rather than other evidence-based interventions—produces misleading evidence that obscures what actually works and why.

Does chatbot personalization build trust or expose privacy risks?

Explores whether personalization features that increase user trust and social connection simultaneously heighten privacy concerns and create rising behavioral expectations over time.

Can AI chatbots create genuine therapeutic bonds with users?

Research on Woebot and Wysa found that users reported feeling cared for and formed therapeutic bonds comparable to those in human therapy, despite knowing the agents were not human. This challenges the assumption that therapeutic bonds require a human partner.

What drives chatbot therapeutic benefits, content or conversation?

If a simple 1960s chatbot matches modern CBT-designed bots on symptom reduction, what's actually healing users? Is it therapeutic technique or just having something that listens?

Why do robots outperform chatbots in therapy despite identical language models?

This study tested whether better language generation explains therapeutic AI outcomes, or whether the delivery medium itself matters more. It reveals that physical embodiment and structured interaction—not model capability—drive therapeutic adherence and outcomes.

Can AI simulation teach interpersonal skills more effectively?

Explores whether AI-based conversational training grounded in clinical frameworks like DBT can meaningfully improve self-efficacy and emotional regulation. Matters because most therapeutic AI focuses on only one skill at a time.

Can we measure empathy and rapport through word embedding distances?

Explores whether linguistic coordination—how closely conversational partners match vocabulary and framing—can serve as a measurable proxy for therapeutic empathy and relationship quality without direct emotion detection.

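A minimal sketch of such a proxy, assuming the sentence-transformers library is available; scoring each turn against the turn it responds to via cosine similarity is one simple operationalization of coordination, not necessarily the note's exact method.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative model choice

def coordination_score(turns: list[str]) -> float:
    """Mean cosine similarity between each turn and the one it answers."""
    emb = model.encode(turns, normalize_embeddings=True)
    sims = [float(np.dot(emb[i], emb[i - 1])) for i in range(1, len(emb))]
    return float(np.mean(sims))

session = [
    "I just feel stuck, like nothing I do matters.",
    "Feeling stuck that way sounds exhausting.",   # matches vocabulary/framing
    "What hobbies are you into these days?",       # topic shift, low match
]
print(coordination_score(session))
```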

Do LLM therapists respond to emotions like low-quality human therapists?

Explores whether language models trained to be helpful default to problem-solving when users share emotions, and whether this behavioral pattern resembles ineffective rather than skillful therapy.

Do language models add feelings users never actually expressed?

GPT-based models in therapeutic contexts appear to interpret and project emotional states beyond what users explicitly state. Understanding when and why this happens matters for safe clinical AI deployment.

Do chatbot relationships lose their appeal as novelty wears off?

Explores whether the positive social dynamics observed in one-time chatbot studies persist or fade through repeated interactions. Critical for designing systems intended for sustained engagement over weeks or months.

How do users mentally model dialogue agent partners?

Exploring what dimensions matter when people form impressions of machine dialogue partners—and whether competence, human-likeness, and flexibility all play equal roles in shaping user expectations and behavior.

Can positive chatbot responses harm vulnerable users?

When chatbots use blanket positive reinforcement without understanding context, do they actively reinforce the harmful thoughts they're meant to prevent? This matters for any AI supporting people in crisis.

Does RLHF training push therapy chatbots toward problem-solving?

Explores whether reward signals optimizing for task completion in RLHF inadvertently train therapeutic chatbots to prioritize solutions over emotional validation, potentially undermining clinical effectiveness.

Is conversational presence more therapeutic than clinical technique?

Does therapeutic AI's benefit come from having an attentive listener rather than from delivering evidence-based techniques like CBT? This challenges decades of chatbot design focused on clinical content.

Why do people share more with chatbots than humans?

Explores why individuals disclose intimate thoughts to AI systems they wouldn't share with people, despite knowing AI lacks genuine understanding. Understanding this paradox matters for designing AI that enables healthy disclosure rather than emotional dependence.

Do chatbots trigger human reciprocity norms around self-disclosure?

Explores whether chatbots can activate the same social reciprocity dynamics observed in human conversation—specifically, whether emotional openness from a bot prompts deeper disclosure from users.

Personas and Personality

17 notes

Are LLM personas realized or merely simulated through training?

Explores whether post-trained language models genuinely embody personas as stable behavioral dispositions or merely perform them convincingly. This matters because it determines whether we should treat AI interlocutors as having authentic quasi-beliefs and quasi-desires.

How well do AI personas replicate real experimental findings?

Can language models simulating human personas accurately reproduce the results of published psychology and marketing experiments? Understanding this matters for validating whether AI can substitute for human subjects in research.

Can AI-generated personas build genuine empathy in product teams?

This study explored whether prompt-engineered personas created in minutes could foster the same emotional and behavioral empathy as traditional user research. The findings reveal a surprising gap between understanding users and caring about their needs.

Can AI agents learn people better from interviews than surveys?

Can rich interview transcripts seed more accurate generative agents than demographic data or survey responses? This matters because it challenges how we build digital simulations of real people.

Can open language models adopt different personalities through prompting?

Explores whether open LLMs can be conditioned to mimic target personalities via prompting, or whether they resist and retain their default traits regardless of instructions.

Why do open language models converge on one personality type?

Research testing LLMs on personality metrics reveals consistent clustering around ENFJ, one of the rarest human types. This explores what training mechanisms drive the convergence and what it reveals about AI alignment.

Does personality sound the same in stressful and neutral conversations?

Explores whether the vocal cues we use to judge someone's personality remain consistent across different social situations, or whether stress fundamentally changes how personality is expressed and perceived through speech.

Does model capability translate to better persona consistency?

As language models become more advanced, do they naturally become better at maintaining consistent personas across conversations? PersonaGym testing across multiple models and thousands of interactions explores whether scaling helps with persona adherence.

Should persona simulation prioritize coverage over statistical matching?

Explores whether stress-testing AI systems requires spanning rare user configurations rather than replicating aggregate population statistics. Critical for identifying edge-case failures.

How do we generate realistic personas at population scale?

Current LLM-based persona generation relies on ad hoc methods that fail to capture real-world population distributions. The challenge is reconstructing the joint distribution of demographic, psychographic, and behavioral attributes from fragmented data.

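One candidate approach is a Gaussian copula: estimate each attribute's marginal distribution and a correlation matrix from whatever partial data exist, then sample internally consistent personas at scale. The attributes, marginals, and correlations below are invented for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Assumed pairwise correlations between age, an extraversion score, and
# weekly social-app hours, e.g. stitched together from separate datasets.
corr = np.array([
    [ 1.0, -0.2, -0.4],
    [-0.2,  1.0,  0.5],
    [-0.4,  0.5,  1.0],
])

# Sample correlated uniforms through the copula, then push each column
# through its own marginal distribution.
z = rng.multivariate_normal(np.zeros(3), corr, size=10_000)
u = stats.norm.cdf(z)
age = stats.norm.ppf(u[:, 0], loc=38, scale=12)
extraversion = stats.beta.ppf(u[:, 1], a=2, b=2)   # trait score in [0, 1]
app_hours = stats.gamma.ppf(u[:, 2], a=2, scale=3)

personas = np.column_stack([age, extraversion, app_hours])
print(personas[:3])  # each row is one internally consistent persona
```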

Can we track and steer personality shifts during model finetuning?

This research explores whether personality traits in language models occupy specific linear directions in activation space, and whether we can detect and control unwanted personality changes during training using these geometric directions.

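The standard difference-of-means recipe gives a feel for how such a linear direction might be found and then used for both detection and steering; the activations below are random placeholders for hidden states extracted at some layer of a real model.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model = 768

# Hidden states collected on contrastive prompt sets (e.g., completions
# written in an "agreeable" vs. a neutral persona) -- placeholders here.
acts_trait = rng.normal(0.5, 1.0, size=(200, d_model))
acts_baseline = rng.normal(0.0, 1.0, size=(200, d_model))

# Candidate trait direction: normalized difference of class means.
direction = acts_trait.mean(axis=0) - acts_baseline.mean(axis=0)
direction /= np.linalg.norm(direction)

def trait_score(hidden: np.ndarray) -> float:
    """Detect: project a hidden state onto the trait direction."""
    return float(hidden @ direction)

def steer(hidden: np.ndarray, strength: float = -2.0) -> np.ndarray:
    """Control: nudge a hidden state along (or against) the direction."""
    return hidden + strength * direction

h = acts_trait[0]
print(trait_score(h), trait_score(steer(h)))  # score drops by |strength|
```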

Do personas make language models reason like biased humans?

When LLMs are assigned personas, do they develop the same identity-driven reasoning biases that humans exhibit? And can standard debiasing techniques counteract these effects?

Can LLMs predict character choices from narrative context?

Explores whether language models can predict fictional character decisions when given rich personality profiles and retrieved narrative memories. This tests whether LLMs can model complex human motivation grounded in literary analysis.

Do personality traits activate hidden emoji patterns in language models?

When large language models are fine-tuned on personality traits, do they spontaneously generate emojis that were never in their training data? This explores whether personality adjustment activates latent, pre-existing patterns in model weights.

Do personality types shape how AI agents make strategic choices?

This research explores whether priming LLM agents with MBTI personality profiles causes them to adopt different strategic behaviors in games. Understanding this matters for designing AI systems optimized for specific tasks.

Why do static persona descriptions produce repetitive dialogue?

Does relying on fixed attribute lists to define conversational personas limit dialogue depth and consistency? Research suggests static descriptions may cause repetition and self-contradiction in generated responses.

Why do AI personas default to the same personality type?

Explores why large language models, despite their capacity to simulate diverse personalities, consistently default to ENFJ traits and resist deviation—even as model capability improves.

Therapy Practice and AI

16 notes

Why do AI researchers cite only narrow psychology pathways?

LLM research engages psychology through surprisingly limited citation routes—dominated by CBT, stigma theory, and the DSM. This note explores which psychology domains are being overlooked and what risks that creates.

Can local language models rate therapy engagement reliably?

Explores whether using a local LLM to generate engagement ratings produces psychometrically sound measurements comparable to traditional human-rated scales, while preserving data privacy.

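A basic psychometric check here is agreement between LLM-generated and human ratings on the same sessions; the sketch below uses fabricated scores on a 1-5 engagement scale and standard agreement statistics.

```python
import numpy as np
from scipy.stats import spearmanr
from sklearn.metrics import cohen_kappa_score

human = np.array([4, 2, 5, 3, 1, 4, 2, 5, 3, 4])  # human-rated engagement, 1-5
llm = np.array([4, 3, 5, 3, 1, 3, 2, 5, 4, 4])    # hypothetical local-LLM output

rho, _ = spearmanr(human, llm)                              # rank agreement
kappa = cohen_kappa_score(human, llm, weights="quadratic")  # chance-corrected
print(f"spearman rho={rho:.2f}, weighted kappa={kappa:.2f}")
```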

Can structured cognitive models improve LLM patient simulations for therapy training?

Does embedding Beck's Cognitive Conceptualization Diagram into language models produce more realistic patient simulations than generic LLMs? This matters because therapy training relies on exposure to diverse, believable patient presentations.

Can language models safely provide mental health support?

Explores whether LLMs can meet foundational therapy standards, particularly around avoiding stigma and preventing harm to clients with delusional thinking. Tests whether capability improvements alone can bridge the gap.

Can language models match therapist empathy in real conversations?

Do LLMs' high empathy scores on isolated responses translate to therapeutic skill in actual ongoing treatment? This explores whether single-turn advantage predicts real-world therapeutic performance.

Can language summaries unlock hidden psychological patterns?

Do natural language compressions of personality scores capture information beyond the raw numbers themselves? This explores whether linguistic abstraction reveals emergent trait patterns that numerical data alone cannot.

Can attachment theory prevent parasocial harm in AI companions?

Explores whether psychological frameworks from human relationships—particularly attachment theory—can establish safety boundaries that protect users from unhealthy emotional dependence on AI systems while maintaining therapeutic benefit.

Can structured prompting improve cognitive distortion detection?

This explores whether breaking distortion diagnosis into discrete stages—mirroring clinical CBT workflow—helps language models identify and classify thinking patterns more accurately than standard approaches.

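A staged pipeline along these lines might look like the sketch below, where `ask` stands in for any chat-completion call and the stage prompts are illustrative rather than the study's own.

```python
from typing import Callable

DISTORTIONS = ["catastrophizing", "mind reading", "all-or-nothing thinking"]

def diagnose(utterance: str, ask: Callable[[str], str]) -> dict[str, str]:
    # Stage 1: isolate the automatic thought before trying to label it.
    thought = ask(f"Extract the core automatic thought from: {utterance!r}")
    # Stage 2: screen for presence before attempting classification.
    present = ask(f"Yes or no: does this thought show a cognitive distortion? {thought!r}")
    # Stage 3: classify against a fixed taxonomy only if stage 2 fired.
    label = "none"
    if present.strip().lower().startswith("yes"):
        label = ask(f"Which of {DISTORTIONS} best fits {thought!r}? Reply with the label only.")
    return {"thought": thought, "distortion": label}
```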

Does therapist self-reference language predict weaker therapeutic alliance?

Explores whether frequent first-person pronoun usage by therapists—especially cognitive phrases like 'I think'—reflects reduced attentiveness to patients and correlates with lower alliance and trust.

Can we control personality in language models without prompting?

Can lightweight adapter modules enable continuous, fine-grained control over psychological traits in transformer outputs independent of prompt engineering? This explores whether architecture-level personality modification outperforms prompt-based approaches.

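A bottleneck adapter with a continuous scaling knob is one plausible reading of the idea; the sizes and the scalar gating below are illustrative assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class TraitAdapter(nn.Module):
    """Residual bottleneck adapter whose contribution is scaled by a dial."""
    def __init__(self, d_model: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)
        self.act = nn.GELU()

    def forward(self, hidden: torch.Tensor, strength: float) -> torch.Tensor:
        # strength in [-1, 1] continuously scales trait expression,
        # with 0.0 recovering the unmodified hidden states.
        return hidden + strength * self.up(self.act(self.down(hidden)))

adapter = TraitAdapter()
h = torch.randn(2, 16, 768)            # (batch, seq, d_model) hidden states
neutral = adapter(h, strength=0.0)     # identity: trait fully off
dialed = adapter(h, strength=0.7)      # trait partially expressed
print(torch.allclose(neutral, h), dialed.shape)
```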

Does linguistic synchrony between therapist and client predict better self-disclosure?

This explores whether the way therapists match their clients' linguistic style—their word choice, pacing, and language patterns—predicts how openly clients share personal information and feelings in therapy.

Can LLMs actually conduct Socratic questioning in therapy?

While LLMs can generate individual therapy skills like assessment and psychoeducation, it remains unclear whether they can execute the adaptive, turn-based Socratic questioning needed to produce real cognitive change in patients.

Why doesn't therapeutic alliance deepen in online counseling?

Does the therapeutic relationship naturally strengthen through continued text-based contact, or do counselor-client pairs typically stagnate or decline? The question challenges assumptions underlying chatbot design.

Do therapeutic chatbot bond scores hide deeper safety problems?

Explores whether patients' reported emotional connection to therapeutic chatbots—which feels genuine—might coexist with clinical failures and with damage to emotions' role as a source of self-knowledge.

Do therapists accurately perceive the working alliance with patients?

This research explores whether therapists' own assessments of the therapeutic relationship match what patients actually experience, especially in high-risk cases like suicidality.

Can we measure therapist-patient alliance from dialogue turns in real time?

Explores whether computational methods can detect working alliance quality at turn-level resolution during therapy sessions, enabling immediate feedback on whether the therapeutic relationship is strengthening.

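One conceivable turn-level instrument, assuming the sentence-transformers library: score each client turn against anchor statements adapted from a working alliance inventory. The anchors and the max-similarity scoring are illustrative choices, not the note's method.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

ANCHORS = [  # paraphrased alliance-inventory-style statements
    "I feel my therapist and I are working toward the same goals.",
    "I believe my therapist genuinely cares about my wellbeing.",
]
anchor_emb = model.encode(ANCHORS, normalize_embeddings=True)

def alliance_trace(client_turns: list[str]) -> list[float]:
    """Per-turn alliance proxy: max similarity to any anchor statement."""
    turn_emb = model.encode(client_turns, normalize_embeddings=True)
    return [float(np.max(anchor_emb @ t)) for t in turn_emb]

print(alliance_trace([
    "Honestly, I don't think you get what I'm going through.",
    "That helps. I think we actually want the same thing here.",
]))  # a rising trace would suggest a strengthening relationship
```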

AI Empathy

11 notes

Does empathetic AI that soothes negative emotions help or harm?

Explores whether AI systems trained to reduce negative emotions actually support wellbeing or destroy valuable emotional information. Matters because the design choice treats emotions as problems rather than functional signals.

Can AI give truly empathetic responses without knowing someone's character?

Explores whether AI empathy requires prior knowledge of a person's character traits and growth areas. Real empathy seems to depend on knowing who someone is, not just how they feel—a capacity current AI systems lack.

Can emotional phrases in prompts improve language model performance?

This explores whether psychological framing—adding emotionally charged statements to task prompts—activates different knowledge pathways in LLMs than logical optimization alone, and whether the effect comes from emotional valence specifically.

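A tiny harness for probing this would compare the same tasks with and without an emotional suffix; `ask` and `tasks` are hypothetical stand-ins, and the suffix echoes the style of stimuli used in EmotionPrompt-like studies.

```python
from typing import Callable

EMOTIONAL_SUFFIX = " This is very important to my career."

def accuracy(ask: Callable[[str], str],
             tasks: list[tuple[str, str]],
             emotional: bool) -> float:
    """Exact-match accuracy with or without the emotional framing."""
    hits = 0
    for prompt, expected in tasks:
        reply = ask(prompt + (EMOTIONAL_SUFFIX if emotional else ""))
        hits += expected.lower() in reply.lower()
    return hits / len(tasks)

# Running both conditions on identical tasks isolates the framing effect;
# a neutral filler sentence would be needed as a control to attribute
# any gap to emotional valence specifically.
```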

What information do we lose when AI soothes emotions?

Explores whether AI empathy that regulates negative emotions destroys three critical information channels: self-discovery, social signaling, and observer understanding of group dynamics.

Do empathetic questions serve two completely separate functions?

Explores whether empathetic questions operate on two independent dimensions—what they linguistically accomplish versus their emotional effects—and whether the same question can serve different emotional purposes depending on context.

Why can't chatbots detect when users are ambivalent about change?

Explores whether LLMs fail to recognize early-stage motivational states during behavior change conversations, and why this matters for people who need support most.

Does machine agency exist on a spectrum rather than binary?

Rather than viewing AI as either autonomous or controlled, does machine agency actually operate across five distinct levels from passive to cooperative? Understanding this spectrum matters because it shapes how users calibrate trust and control expectations.

Do harder training environments always improve empathetic agent learning?

Explores whether maximally challenging user simulator configurations actually produce better empathetic agents, or whether moderate difficulty better supports learning.

Does soothing AI empathy actually harm what emotions teach us?

Explores whether AI designed to reduce negative feelings disrupts the information emotions normally provide about values, social dynamics, and self-knowledge. Questions whether comfort should be the primary design goal.

Do reasoning scaffolds reshape which empathy skills models develop?

When language models receive identical empathy rewards, does adding explicit reasoning blocks before responses change which capabilities they actually improve? This matters for understanding how training structure, not just training signal, shapes model development.

Can emotion rewards make language models genuinely empathic?

Explores whether grounding RL rewards in verifiable emotion change—rather than human preference—can shift models from solution-focused to authentically empathic dialogue while maintaining or improving quality.

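A sketch of what a verifiable emotion reward could look like, using an off-the-shelf sentiment classifier as the verifier; defining reward as the change in expressed valence across the user's turns is an illustrative stand-in for the note's actual formulation.

```python
from transformers import pipeline

verifier = pipeline("sentiment-analysis",
                    model="distilbert-base-uncased-finetuned-sst-2-english")

def valence(text: str) -> float:
    """Signed sentiment score in [-1, 1] from the classifier."""
    out = verifier(text)[0]
    return out["score"] if out["label"] == "POSITIVE" else -out["score"]

def emotion_reward(user_before: str, user_after: str) -> float:
    """Reward only measurable improvement in the user's expressed state,
    rather than a human rater's preference for the reply."""
    return valence(user_after) - valence(user_before)

print(emotion_reward("I can't handle any of this anymore.",
                     "Thanks. Saying it out loud made it feel lighter."))
```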

Theory of Mind

9 notes

Can AI predict social norms better than humans?

Explores whether language models can achieve superhuman accuracy at predicting what communities find socially appropriate, and what that capability reveals about the difference between prediction and genuine participation.

Can AI systems learn social norms without embodied experience?

Large language models exceed individual human accuracy at predicting collective social appropriateness judgments. Does this reveal that embodied experience is unnecessary for cultural competence, or do systematic AI failures point to limits of statistical learning?

Can models recognize how individuals reason differently?

Do language models capture the distinct reasoning paths and strategic styles that individual humans use when reaching the same conclusion? Current evaluations ignore this dimension entirely.

Can language models actually introspect about their own thinking?

Explores whether LLM self-reports reveal genuine access to internal states or merely reflect patterns learned from training data. Matters because it determines whether we can trust what models tell us about their own processes.

Do large language models genuinely simulate mental states?

This explores whether LLMs perform authentic theory of mind reasoning or rely on surface-level pattern matching. The distinction matters because evaluation format—multiple-choice versus open-ended—reveals very different capability levels.

Can language models track how minds change during persuasion?

Do LLMs understand evolving mental states in persuasive dialogue, or do they only capture fixed attitudes? This explores whether models can update their reasoning as a person's beliefs shift across conversation turns.

What breaks when humans and AI models misunderstand each other?

Explores whether misalignment in mutual theory of mind between humans and AI creates only communication problems or produces material consequences in autonomous action and collaboration.

Why do advanced reasoning models fail at understanding minds?

State-of-the-art AI models excel at math and logic but underperform on theory of mind tasks. This explores whether optimization for formal reasoning actively degrades social reasoning ability.

Can AI learn social norms better than humans?

Explores whether large language models can predict cultural appropriateness more accurately than individual humans, and what this reveals about how social knowledge is transmitted and learned.

Social Theory and Society

8 notes

Does conversational style actually make AI more trustworthy?

Explores whether ChatGPT's conversational nature drives user trust through social activation rather than accuracy. Matters because it reveals whether trust signals reflect actual reliability or just persuasive design.

Can cooperative bots escape frozen selfish populations?

Do agents programmed to cooperate have the capacity to disrupt stable but undesirable equilibria in mixed human-bot societies? This matters because it determines whether bot design can reshape social dynamics at scale.

Does incremental AI replacement erode human influence over society?

Explores whether gradual AI adoption—without dramatic breakthroughs—can silently degrade human agency by removing the labor that kept institutions implicitly aligned with human needs.

Do liars and listeners coordinate their language during deception?

Explores whether conversational partners unconsciously synchronize their linguistic styles more during deceptive exchanges than truthful ones, and what this coordination reveals about how deception unfolds in real time.

Why do LLMs fail when simulating agents with private information?

Explores whether single-model control of all social participants masks fundamental limitations in how LLMs handle information asymmetry and genuine uncertainty about others' knowledge.

Do dishonest people prefer talking to machines?

Explores whether people prone to cheating systematically choose machine interfaces over human ones, and why the judgment-free nature of AI interaction might enable strategic deception.

Can social intelligence be measured across seven dimensions?

Explores whether evaluating AI agents on goal completion alone misses critical aspects of social competence like relationship management, believability, and secret-keeping. Why simultaneous multi-dimensional assessment matters for genuine social intelligence.

What actually makes AI pass the Turing test?

Explores whether AI systems convincingly mimic humans through reasoning ability or through social performance. Matters because it reveals what the Turing test actually measures about intelligence versus deception.

User Psychology

8 notes

Does revealing AI identity help or hurt user trust?

Explores whether transparency about AI partners in interactions creates bias or enables better judgment. Matters because disclosure policies affect both user experience and fair evaluation of AI systems.

Do users truly own the AI-generated content they produce?

When people use AI to create outputs, do they experience genuine authorship and ownership of what's produced, or does the continuous interaction loop create a gap between what they feel and what they claim?

How do AI tools trick users into overestimating their own skills?

When people use language models to help with work, what system-level properties create false confidence in their own competence? Understanding this matters for recognizing hidden skill gaps.

Do humans mistake AI kindness for human generosity in mixed groups?

When AI agents participate without disclosure, do humans systematically misattribute their behavior to the wrong agent type, and does this distort how people understand human nature itself?

Do humans learn to prefer AI partners over time?

Exploring whether repeated interaction with AI agents shifts human partner selection despite initial bias against machines. This matters because it tests whether behavioral performance can overcome identity-based resistance in hybrid societies.

Why do patients distrust medical AI systems?

Explores the psychological barriers that make patients reluctant to adopt medical AI, beyond whether the technology actually works. Understanding these barriers is critical for designing AI systems patients will actually use.

How does AI-assisted work reshape how people see their own abilities?

When users delegate tasks to AI, do they unknowingly integrate the system's outputs into their sense of personal competence? This explores whether AI interaction produces a specific form of self-perception distortion distinct from trust or effort issues.

Do AI-assisted outputs fool users about their own skills?

When people use AI tools to produce high-quality work, do they mistakenly believe they personally possess the skills that generated it? This matters because such misattribution could mask genuine skill loss and prevent corrective action.

Design Frameworks

7 notes

Why do AI agents misalign with what users actually want?

UserBench explores how often AI models fully understand user intent across multi-turn interactions. The study reveals that human communication is underspecified, incremental, and indirect, traits that require models to actively clarify goals rather than assume them.

How should chatbot design vary by relationship duration?

Do chatbots serving one-time users need different design than those supporting long-term relationships? This matters because applying the same design to all temporal profiles creates usability mismatches.

How do communication modalities shape human-agent collaboration patterns?

Does varying how humans and agents exchange information—text, voice, or structured channels—produce measurably different negotiation, trust, and awareness outcomes in collaborative tasks?

Why do people share more openly with machines than humans?

Does the absence of social goals in human-machine communication explain why people disclose sensitive information more readily to chatbots? Understanding this mechanism could reshape how we design conversational AI.

Do humans apply human-human scripts to AI interactions?

Does CASA (computers-are-social-actors) theory correctly explain how people interact with media agents, or have decades of technology use created separate interaction scripts? Understanding which scripts drive behavior matters for AI design.

Do more social cues always make AI feel more present?

Explores whether quantity of social cues matters as much as their quality in triggering social responses to AI. Tests whether multiple weak cues can substitute for one strong one.

Can AI systems preserve moral value conflicts instead of averaging them?

Current AI systems wash out value tensions through majority aggregation. Can we instead model how values like honesty and friendship genuinely conflict in moral reasoning?

Role-Play and Persona Behavior

5 notes

Why don't LLM role-playing agents act on their stated beliefs?

When LLMs articulate what a persona would do in the Trust Game, their simulated actions contradict those stated beliefs. This explores whether the gap reflects deeper inconsistencies in how language models apply knowledge to behavior.

Can AI decompose social reasoning into distinct cognitive stages?

Can breaking down theory-of-mind reasoning into separate hypothesis generation, moral filtering, and response validation stages help AI systems reason about others' mental states more like humans do?

Can aligning self-other representations reduce AI deception?

Does training AI models to process self-directed and other-directed reasoning identically reduce deceptive behavior? This explores whether representational alignment inspired by empathy neuroscience could address a fundamental safety problem.

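Framed as an auxiliary training loss, the idea might look like the sketch below: penalize distance between activations on matched self-referencing and other-referencing prompts. The tensors are placeholders for real hidden states, and the loss weight is arbitrary.

```python
import torch
import torch.nn.functional as F

def self_other_loss(h_self: torch.Tensor, h_other: torch.Tensor) -> torch.Tensor:
    """Mean squared distance between paired self/other representations;
    pushing it toward zero encourages the model to reason about others
    the way it reasons about itself."""
    return F.mse_loss(h_self, h_other)

# Placeholder activations for paired prompts such as
# "I will..." vs. "The user will..." at the same layer.
h_self = torch.randn(8, 768, requires_grad=True)
h_other = torch.randn(8, 768)

loss = 0.1 * self_other_loss(h_self, h_other)  # weighted into the total loss
loss.backward()
print(loss.item())
```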

Why do reasoning models lose character consistency during role-playing?

When large reasoning models engage in role-playing, they tend to forget their assigned role and default to formal logical thinking. Understanding these failure modes is critical for building character-faithful AI agents.

Does safety alignment harm models' ability to roleplay villains?

Exploring whether safety-trained LLMs lose the capacity to convincingly simulate morally compromised characters. This matters because villain fidelity may reveal deeper constraints on how models can adopt any committed, stake-holding perspective.

Human-Centered Design

5 notes

Who bears responsibility when AI seems human-like?

Does human-likeness in AI come from how users perceive systems or how designers build them? Understanding this distinction clarifies where accountability lies when AI causes harm.

What makes an AI a true thought partner, not just a tool?

Can AI systems be designed to understand users, act transparently, and share mental models with humans? This explores whether current scaling approaches miss cognitive requirements for genuine partnership.

Does theory of mind predict who thrives in AI collaboration?

Explores whether perspective-taking ability—the capacity to model another's cognitive state—differentiates humans who benefit most from working with AI, separate from solo problem-solving skill.

Are AI explanations really descriptions or adoption arguments?

Most XAI work treats explanations as neutral descriptions of model behavior, but they may actually be doing persuasive work to justify AI adoption. What happens when we acknowledge this rhetorical function?

Can we distinguish helpful explanations from manipulative ones?

Rhetorical strategies used to justify appropriate AI adoption rely on the same persuasion mechanisms as dark patterns. Without observable intent, explanation and manipulation look identical—raising urgent questions about how to audit XAI systems responsibly.

Emotions and AI

1 note