Psyche-R1: Towards Reliable Psychological LLMs through Unified Empathy, Expertise, and Reasoning
Amidst a shortage of qualified mental health professionals, integrating large language models (LLMs) into psychological applications offers a promising way to alleviate the growing burden of mental health disorders. Recent reasoning-augmented LLMs have achieved remarkable performance in mathematics and programming, whereas research in the psychological domain has predominantly emphasized emotional support and empathetic dialogue, paying limited attention to the reasoning mechanisms that help generate reliable responses. Therefore, in this paper, we propose Psyche-R1, the first Chinese psychological LLM that jointly integrates empathy, psychological expertise, and reasoning, built upon a novel data curation pipeline. Specifically, we design a comprehensive data synthesis pipeline that produces over 75k high-quality psychological questions paired with detailed rationales, generated through chain-of-thought (CoT) reasoning and iterative prompt-rationale optimization, along with 73k empathetic dialogues. Subsequently, we employ a hybrid training strategy wherein challenging samples are identified through a multi-LLM cross-selection strategy for group relative policy optimization (GRPO) to improve reasoning ability, while the remaining data is used for supervised fine-tuning (SFT) to enhance empathetic response generation and psychological domain knowledge.
Therefore, in this paper, we propose a novel data curation pipeline and introduce Psyche-R1, which integrates empathy, domain-specific expertise, and reasoning capabilities. Specifically, to construct a high-quality training corpus, we design a comprehensive data synthesis pipeline that first generates psychological questions from various resources. Afterward, we apply chain-of-thought (CoT) prompting to generate an initial detailed rationale for each question, followed by an iterative prompt-rationale optimization process that enhances both the coherence of the reasoning and its alignment with the corresponding question. In parallel, we synthesize 73k empathetic dialogues to support affective understanding. Then, we adopt a multi-LLM cross-selection strategy to categorize questions into challenging and non-challenging subsets based on their inferred complexity. The non-challenging subset is used for supervised fine-tuning (SFT) to enhance empathetic response generation and domain knowledge, while the challenging subset is used for training with group relative policy optimization (GRPO) to improve the model's reasoning capabilities, with both jointly contributing to the development of Psyche-R1.
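The multi-LLM cross-selection step above can be sketched as a simple routing function. This is a minimal illustrative sketch, not the paper's implementation: the function name `cross_select`, the `max_correct` threshold, and the exact-match answer check are all assumptions standing in for the actual criterion, and the stub callables stand in for real LLM queries.

```python
from typing import Callable, List

def cross_select(
    question: str,
    reference: str,
    models: List[Callable[[str], str]],
    max_correct: int = 1,
) -> str:
    """Route a question by how many judge models answer it correctly.

    A question is treated as challenging (routed to GRPO training) when
    at most `max_correct` models reproduce the reference answer;
    otherwise it is non-challenging (routed to SFT). The threshold and
    exact-match comparison are illustrative assumptions.
    """
    n_correct = sum(1 for m in models if m(question).strip() == reference)
    return "grpo" if n_correct <= max_correct else "sft"

# Stub "models" standing in for real LLM calls.
always_right = lambda q: "B"
always_wrong = lambda q: "C"

# Only one of three judges answers correctly -> challenging -> GRPO.
print(cross_select("Which coping strategy ...?", "B",
                   [always_right, always_wrong, always_wrong]))  # grpo
# Two of three judges answer correctly -> non-challenging -> SFT.
print(cross_select("Which coping strategy ...?", "B",
                   [always_right, always_right, always_wrong]))  # sft
```

Routing on agreement among several independent models, rather than a single model's self-assessment, reduces the chance that one model's idiosyncratic failure mislabels an easy question as hard.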