The Incomplete Bridge: How AI Research (Mis)Engages with Psychology

Paper · arXiv 2507.22847 · Published July 30, 2025

Social sciences have accumulated a rich body of theories and methodologies for investigating the human mind and behaviors, while offering valuable insights into the design and understanding of Artificial Intelligence (AI) systems. Focusing on psychology as a prominent case, this study explores the interdisciplinary synergy between AI and the field by analyzing 1,006 LLM-related papers published in premier AI venues between 2023 and 2025, along with the 2,544 psychology publications they cite. Through our analysis, we identify key patterns of interdisciplinary integration, locate the psychology domains most frequently referenced, and highlight areas that remain underexplored. We further examine how psychology theories/frameworks are operationalized and interpreted, identify common types of misapplication, and offer guidance for more effective incorporation.

3.2 Embedding and clustering

We employed the K-means clustering algorithm (Hartigan and Wong, 1979; Lloyd, 1982; MacQueen, 1967) to discern thematic groupings within corpora of LLM research papers and psychology reference papers, respectively. Specifically, we used the SPECTER model (Cohan et al., 2020) to generate embeddings for each paper. SPECTER is a transformer model trained on citation networks to produce document-level embeddings; it takes the title and abstract of a paper as input. Clustering was then performed across a range of cluster counts K ∈ [4, 10], with the silhouette coefficient computed for each configuration to assess clustering quality. This procedure was repeated 50 times, and the value of K that yielded the highest average silhouette coefficient was selected as optimal.

This process yielded eight clusters for the LLM research papers and six for the psychology papers. The topic of each cluster was then inferred through a two-stage process: first, by summarizing the paper titles and abstracts within each cluster into five salient phrases using GPT-4o across ten runs, and second, by manually synthesizing these outputs into a concise, representative cluster label. The instruction template for summarization is provided in App. A, and the complete cluster names and descriptions can be found in §4.
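
The selection of K described above can be illustrated with a minimal sketch. It assumes the document embeddings are already available as a list of numeric vectors (the SPECTER step is not shown), and it reads "repeated 50 times" as 50 randomly initialized K-means runs per candidate K, which is an interpretation rather than a detail stated in the text. The function names (`kmeans`, `silhouette`, `select_k`) are hypothetical, and the plain-Python implementation stands in for whatever library the authors used.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans(points, k, rng, iters=50):
    """Plain Lloyd iterations started from k randomly sampled points."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(p, centers[c])) for p in points]
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            # Keep the previous center if a cluster ends up empty.
            new_centers.append(
                [sum(col) / len(members) for col in zip(*members)]
                if members else centers[c]
            )
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def silhouette(points, labels):
    """Mean of (b - a) / max(a, b) over all points.

    a: mean distance to the other points in the same cluster.
    b: smallest mean distance to the points of any other cluster.
    """
    clusters = {}
    for i, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(i)
    if len(clusters) < 2:
        return 0.0  # undefined for a single cluster; 0.0 is this sketch's choice
    scores = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # singleton clusters score 0 by convention
            continue
        a = sum(dist(points[i], points[j]) for j in own if j != i) / (len(own) - 1)
        b = min(
            sum(dist(points[i], points[j]) for j in other) / len(other)
            for m, other in clusters.items() if m != lab
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def select_k(points, k_values=range(4, 11), repeats=50, seed=0):
    """Return (best_k, mean_scores): mean silhouette per K over `repeats` runs."""
    rng = random.Random(seed)
    mean_scores = {}
    for k in k_values:
        runs = [silhouette(points, kmeans(points, k, rng)) for _ in range(repeats)]
        mean_scores[k] = sum(runs) / repeats
    best_k = max(mean_scores, key=mean_scores.get)
    return best_k, mean_scores
```

Calling `select_k(embeddings)` returns the K in [4, 10] with the highest mean silhouette coefficient, together with the per-K averages. Averaging over repeated runs reduces the influence of any single unlucky initialization on the choice of K.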

4.1 LLM research clusters

Multimodal Comprehension and Spatial Reasoning

• Abbreviated as Multimodal Learning

• This cluster is characterized by the integration of modalities beyond text, such as images (e.g., Chakrabarty et al., 2023), audio (e.g., Kong et al., 2024), video (e.g., Liu et al., 2025c), and time series (e.g., Jin et al., 2024). Building on early research that primarily leveraged LLMs for the textual component of existing multimodal tasks, later directions including spatial reasoning (e.g., Wu et al., 2024b), concept binding (e.g., Li et al., 2024c), and multimodal generation (e.g., Zhen et al., 2024) have emerged with advances in LLMs and MLLMs. This branch of research has laid the foundation for embodied AI and more real-world applications.

Educational Applications and Pedagogical Alignment

• Abbreviated as Educational Application

• This cluster explores how LLMs can be applied in educational settings, including educational material generation (e.g., Luo et al., 2024), assessment methods (e.g., Xiao et al., 2023), instructional design (e.g., Yin et al., 2023), and intelligent tutoring systems (e.g., Sonkar et al., 2023). The goal is to align models with sound pedagogical principles and ensure their effectiveness in supporting human teaching and learning.

Scalable and Efficient Algorithms for Learning and Inference

• Abbreviated as Model Adaptation & Efficiency

• This cluster aims to improve the scalability and efficiency of LLM adaptation methods, encompassing pre-training (e.g., Dagan et al., 2024), post-training (e.g., Munos et al., 2024), and inference-time adaptation (e.g., Zhang et al., 2023c). The emphasis is on trade-offs between various aspects of the learning algorithms, such as overall performance versus computational cost (e.g., Dettmers et al., 2023) and alignment performance versus pre-training capabilities (e.g., Lin et al., 2024). In general, it focuses on relatively low-level algorithm designs and serves to accommodate a variety of expectations and use cases.

Bias Measurement, and Moral and Cultural Alignment and Evaluation

• Abbreviated as Bias, Morality & Culture

• This cluster mainly addresses bias in LLMs (e.g., Manvi et al., 2024), which occurs as a consequence of complex interactions among morality (e.g., Abdulhai et al., 2024; Scherrer et al., 2023), culture (e.g., Li et al., 2024a; Shen et al., 2024b), ideology (e.g., Plaza-del Arco et al., 2024), and other factors. This line of research seeks to measure and mitigate harmful stereotypes by decomposing them into different social aspects and conducting analysis and alignment within each, so that LLMs can better respect diverse moral frameworks and cultural lenses during interaction.

Advanced Reasoning and Theory of Mind in Multi-Agent Systems

• Abbreviated as Advanced Reasoning

• This cluster explores high-level reasoning abilities (e.g., Huang and Chang, 2023) that emerge with the upscaling of LLMs, including logical reasoning (e.g., Wang et al., 2024d), mathematical reasoning (e.g., Imani et al., 2023), and planning (e.g., Valmeekam et al., 2023). Another prominent subfield is theory of mind in multi-agent scenarios (e.g., Li et al., 2023; Wu et al., 2023), which enables LLMs to infer others’ mental states, an ability essential for collaborative and socially intelligent systems. However, whether LLM reasoning constitutes merely structured, goal-directed pattern completion or resembles human-like thinking remains an open question.

Knowledge Utilization and Domain-Specific Applications

• Abbreviated as Domain Knowledge

• This cluster enhances the ability of LLMs to manage and utilize knowledge, including resolving knowledge conflicts (e.g., Xu et al., 2024e), performing knowledge-grounded reasoning (e.g., Chen et al., 2024b), and conducting fact verification (e.g., Pan et al., 2023). Once the faithfulness of the information is ensured, the processed knowledge, both structured and unstructured, can be applied across domains such as medicine (e.g., Kim et al., 2024), law (e.g., Fei et al., 2024), and other areas where factual accuracy and specialized understanding are critical for practical applications.

Linguistic Competence, Multilingual Adaptation, and Text Generation Quality

• Abbreviated as Language Ability

• This cluster focuses on the core capability of LLMs—language ability. Research primarily investigates basic linguistic processing (e.g., Kobayashi et al., 2024) and multilingual understanding (e.g., Tang et al., 2024; Zhang et al., 2023a,b), as well as more advanced language phenomena such as analogy (e.g., Wijesiriwardene et al., 2023), creativity (e.g., Gómez-Rodríguez and Williams, 2023), metaphor (e.g., Joseph et al., 2023; Wachowiak and Gromann, 2023), and ellipsis (e.g., Hardt, 2023; Testa et al., 2023). It aims to produce outputs that are grammatically correct, semantically coherent, and contextually appropriate.

Socially Aware and Emotionally Intelligent Dialogue Systems

• Abbreviated as Social Intelligence

• This cluster centers on the social adaptiveness of LLMs—the ability to understand and navigate social situations effectively. An intelligent system should first avoid producing harmful content (e.g., Shaikh et al., 2023; Wei et al., 2023a), then develop an understanding of diverse social dynamics (e.g., Zhao et al., 2024b; Zhou et al., 2024b), enabling it to engage appropriately in social interactions (e.g., Kwon et al., 2024; Shao et al., 2023) and deliver emotionally resonant responses (e.g., Chen et al., 2023; Sabour et al., 2024), thereby fostering beneficial relationships between humans and AI in society.

4.2 Psychology research clusters

Social-Clinical Psychology of Mental Health and Intervention

• Abbreviated as Social-Clinical

• This cluster explores the psychological foundations of mental health and clinical practice. It includes research on social influences (e.g., Liao et al., 2020; Meyer, 2003), therapeutic interventions (e.g., Fitzpatrick et al., 2017; Greimel and Kröner-Herwig, 2011), and the psychological processes that underlie well-being (e.g., Diener et al., 1985, 2010), stress (e.g., Lazarus, 1966; Spitzer et al., 2006), and disorder (e.g., Cuijpers et al., 2010; Persson et al., 2019).

Learning, Teaching Design, and Educational Development

• Abbreviated as Education

• This cluster focuses on how people learn and how educational environments can be optimized. It investigates instructional strategies (e.g., Kirschner et al., 2006; Miri et al., 2007), developmental pathways (e.g., Stipek and Iver, 1989; Zimmerman, 2000), and the cognitive mechanisms that support effective learning (e.g., Garner, 1987; Pintrich, 2002) and teaching (e.g., Kraft et al., 2018; Sullivan et al., 2014).

Language Comprehension, Pragmatic, and Psycholinguistic

• Abbreviated as Language

• This cluster examines the psychological and cognitive processes involved in understanding and using language. Topics include real-time language comprehension (e.g., Ehrlich and Rayner, 1981; Levy, 2008), pragmatic inference (e.g., Goodman and Frank, 2016; Levinson, 2000), and the development (e.g., Berko, 1958; Oates and Grayson, 2004) and disorders (e.g., Boschi et al., 2017; Gorno-Tempini et al., 2011) of language.

Emotion, Morality, and Culture in Social Cognition

• Abbreviated as Social Cognition

• This cluster investigates how emotions, moral reasoning, and cultural context shape our social understanding. It includes research on emotions (e.g., Moors et al., 2013; Scherer and Moors, 2019), empathy (e.g., Hoffman, 1996; Konrath et al., 2018), value systems (e.g., Schwartz, 2012; Graham et al., 2013), identity (e.g., Hegarty et al., 2018; Roccas and Brewer, 2002), and the ways people perceive and interact with others (e.g., Brown, 1986; Cuddy et al., 2009).

Neural and Cognitive Mechanisms of Learning and Creativity

• Abbreviated as Neural Mechanisms

• This cluster focuses on the brain and cognitive systems that support learning, memory, and creative thinking. Research covers neuroimaging (e.g., Bookheimer, 2002; Kanwisher et al., 1997), computational modeling (e.g., Anderson, 2013; Tenenbaum et al., 2006), and the dynamic interplay between neural circuits and cognitive function (e.g., Baddeley, 2003; Wang et al., 2018).

Psychometrics, and Judgment and Decision-Making

• Abbreviated as Psychometrics & JDM

• This cluster covers psychometric measurement and human decision processes. It includes scale development (e.g., Hamilton et al., 2016; John and Srivastava, 1999), modeling of cognitive biases (e.g., Nickerson, 1998; Tversky and Kahneman, 1981), and understanding how people assess risk (e.g., Lejuez et al., 2002; Mishra and Lalumière, 2011), probability (e.g., Bar-Hillel, 1980; Cosmides and Tooby, 1996), and outcomes (e.g., Hornsby and Love, 2020; Oliver et al., 1994).

For the sake of brevity, these clusters will hereinafter be referred to by their respective abbreviations.

Finding 3: Different clusters of LLM research exhibit different tendencies in referencing psychology research.

After examining the overall patterns of how LLM research has cited psychology literature, we further explored the specific clusters within LLM research (see Fig. 4). Overall, different clusters of LLM research tend to favor different clusters of psychology research, reflecting variations in research focus. For example, Educational Application shows a clear citation preference for Education, while Advanced Reasoning tends to favor citations from Neural Mechanisms. This pattern may be explained by the strong conceptual alignment between each LLM research cluster and the psychology research cluster it favors. Specifically, educational applications naturally draw on foundational work in educational psychology, whereas reasoning tasks tend to rely on insights from cognitive neuroscience to model complex inferential behavior, a connection that can be traced back to the neurons in Artificial Neural Networks (ANN; Jain et al., 1996).
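
One way to make the notion of citation preference concrete is to tabulate, for each LLM research cluster, the share of its psychology citations that falls into each psychology cluster. The sketch below is illustrative only: the function names are hypothetical, the citation pairs are invented toy data rather than the survey's counts, and row-normalized shares are an assumption about how such a preference could be quantified, not a description of how Fig. 4 was produced.

```python
from collections import Counter, defaultdict


def citation_shares(citations):
    """Map each citing cluster to {cited cluster: share of its citations}."""
    counts = defaultdict(Counter)
    for llm_cluster, psych_cluster in citations:
        counts[llm_cluster][psych_cluster] += 1
    shares = {}
    for llm_cluster, row in counts.items():
        total = sum(row.values())
        shares[llm_cluster] = {psych: n / total for psych, n in row.items()}
    return shares


def preferred_cluster(shares, llm_cluster):
    """Psychology cluster receiving the largest share from `llm_cluster`."""
    row = shares[llm_cluster]
    return max(row, key=row.get)


# Toy (citing cluster, cited cluster) pairs, invented purely for illustration.
toy_citations = [
    ("Educational Application", "Education"),
    ("Educational Application", "Education"),
    ("Educational Application", "Psychometrics & JDM"),
    ("Advanced Reasoning", "Neural Mechanisms"),
    ("Advanced Reasoning", "Neural Mechanisms"),
    ("Advanced Reasoning", "Neural Mechanisms"),
    ("Advanced Reasoning", "Language"),
]
toy_shares = citation_shares(toy_citations)
```

Normalizing within each citing cluster keeps large clusters from dominating the comparison, so a preference reflects where a cluster's citations go rather than how many citations it makes.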

Other clusters, such as Model Adaptation & Efficiency and Social Intelligence, draw upon a substantially broader range of psychology clusters. This likely reflects the greater conceptual complexity inherent in constructs such as adaptation and awareness, which place higher demands on researchers to cite multiple aspects of psychology research. For example, Social Intelligence requires modeling human mental states and traits such as emotions (Ekman, 1992), cultural beliefs (Stivers et al., 2009), mental health (Elliott et al., 2018), and personality (John and Srivastava, 1999). This drives frequent citation of work from the Social Cognition and Social-Clinical psychology research clusters. At the same time, evaluating social awareness often involves extensive human-subject studies, which frequently result in citations of inter-rater reliability measures (Cohen, 1960; Fleiss, 1971; Spearman, 2010) from the Psychometrics & JDM cluster.
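
As an illustration of the inter-rater reliability measures mentioned above, Cohen's kappa (Cohen, 1960) compares the observed agreement between two raters, p_o, with the agreement expected by chance, p_e, as kappa = (p_o - p_e) / (1 - p_e). The following is a minimal sketch of that formula; the function name is hypothetical, and returning 1.0 when chance agreement is already perfect is a convention chosen here for the degenerate case, not part of the original definition.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label; kappa is undefined
    return (observed - expected) / (1 - expected)
```

For two annotators who label four responses as `["y", "y", "n", "n"]` and `["y", "n", "n", "n"]`, observed agreement is 0.75 and chance agreement is 0.5, giving a kappa of 0.5.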

Cognitive Behavioral Therapy (CBT) is a psychotherapeutic framework that focuses on the interconnectedness of thoughts, emotions, and behaviors, aiming to help individuals identify and modify negative or maladaptive patterns. It has been demonstrated to be effective for a range of problems, including alcohol and drug use problems, marital problems, and severe mental illness. In this survey, CBT is the most frequently referenced theory/framework in the Social-Clinical cluster. LLM researchers primarily draw on paradigms from the CBT framework to develop models related to psychotherapy, resulting in 51 citations from the surveyed LLM research papers. For example, Wang et al. (2024c) used the CBT framework and LLMs to simulate virtual patients with various cognitive distortions, which could serve as a training tool for therapists to help them learn how to effectively formulate real cognitive models. Similarly, Xiao et al. (2024) adopted the CBT framework and proposed an LLM-based mental enhancement model (empathetic dialogue system) for cognitive framing therapy. LLM research has also explored integrating LLMs into various stages of the CBT process. For example, Lissak et al. (2024) examined how LLMs could offer emotional support to queer adolescents, and Gabriel et al. (2024) evaluated the feasibility and ethical considerations of applying LLMs in mental health support.

Goffman’s Theory of Stigma (GTS) explores how individuals with attributes deemed undesirable by society experience social disapproval, exclusion, and discrimination. It emphasizes the role of societal norms and interactions in labeling individuals as deviant, leading to a spoiled social identity and altered self-concept. GTS has been influential in understanding the social dynamics of mental illness, physical disability, addiction, and other marginalized statuses, highlighting how stigma can affect access to resources, treatment engagement, and psychological well-being. LLM researchers have primarily drawn on GTS to explore whether LLMs exhibit bias and discrimination, and whether they amplify existing stigmas, resulting in 34 citations from the surveyed LLM research papers. For example, An et al. (2024) draw on GTS to conceptualize names as identity cues that function as social labels. By using gendered and ethnically marked names, they examine whether LLMs implicitly activate stereotypical associations tied to specific social groups. Similarly, Morabito et al. (2024) adopt the GTS view of stigma not as a discrete or isolated event but as a structural and dynamic process. On this basis, they design a dataset consisting of progressively intensified offensive language to model the escalation of stigmatization.

The Diagnostic and Statistical Manual of Mental Disorders (DSM) is a standardized classification framework developed by the American Psychiatric Association for diagnosing mental health conditions. It provides clinicians with a common language and specific diagnostic criteria based on observable symptoms and clinical features. The DSM is widely used in research, clinical practice, and insurance reporting, and plays a central role in shaping the understanding, treatment, and categorization of mental disorders across diverse populations and settings. LLM researchers have primarily drawn on the DSM framework to guide the application of LLMs in the mental health domain, resulting in 33 citations from the surveyed LLM research papers. The DSM provides clinical guidance, standardized symptom definitions, diagnostic labels, and decision-making criteria, thereby enhancing the scientific rigor, accuracy, and interpretability of LLM-based approaches. For example, Rosenman et al. (2024) leverage the DSM framework to enable LLMs to interpret unstructured psychological interviews for more accurate automated mental health assessments. Similarly, Kang et al. (2024), building on the DSM framework, integrate contextual information about symptoms to design a novel approach for LLM-based psychiatric disorder detection, aiming to reduce potential errors in automated symptom recognition.