WiseMind: a knowledge-guided multi-agent framework for accurate and empathetic psychiatric diagnosis

Wu, Yuqi; Wan, Guangya; Li, Jingjing; Zhao, Shengming; Ma, Lingfeng; Ye, Tianyi; Zhang, Mike; Pop, Ion; Zhang, Yanbo; Chen, Jie

doi:10.1038/s41746-026-02559-9

Article
Open access
Published: 25 March 2026

Yuqi Wu^1,2,na1, Guangya Wan^3,na1, Jingjing Li^4, Shengming Zhao^1, Lingfeng Ma^1, Tianyi Ye^1, Mike Zhang^5, Ion Pop^5, Yanbo Zhang^5 & Jie Chen^1,2

npj Digital Medicine (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Abstract

Large Language Models (LLMs) offer promising opportunities to support mental healthcare workflows, yet they often lack the structured clinical reasoning needed for reliable diagnosis and may struggle to provide the emotionally attuned communication essential for patient trust. Here, we introduce WiseMind, a novel multi-agent framework, inspired by the theory of Dialectical Behavior Therapy, designed to facilitate psychiatric assessment. By integrating a “Reasonable Mind” Agent for evidence-based logic and an “Emotional Mind” Agent for empathetic communication, WiseMind bridges the gap between instrumental accuracy and humanistic care. Our framework uses a Structured Knowledge Graph guided by the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5) to steer diagnostic inquiries, significantly reducing hallucinations compared to standard prompting methods. Using a combination of virtual standard patients, simulated interactions, and real human interaction datasets, we evaluate WiseMind across three common psychiatric conditions. WiseMind outperforms state-of-the-art LLM methods in both identifying critical diagnostic nodes and establishing accurate differential diagnoses. Across 1206 simulated conversations and 180 real user sessions, the system achieves 85.6% top-1 diagnostic accuracy, approaching reported diagnostic performance ranges of board-certified psychiatrists and surpassing knowledge-enhanced single-agent baselines by 15–54 percentage points. Expert review by psychiatrists further validates that WiseMind generates responses that are not only clinically sound but also psychologically supportive, demonstrating the feasibility of empathetic, reliable AI agents conducting psychiatric assessments under appropriate human oversight.
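The division of labor described in the abstract can be illustrated with a small sketch. It is not the authors' implementation, which is not publicly available: the two-node graph, the placeholder criteria, and every name below (`CriterionNode`, `TOY_GRAPH`, `reasonable_mind_next`, `emotional_mind_phrase`, `run_interview`) are assumptions made for illustration, and simple rules and a fixed template stand in for the LLM-backed agents. The sketch only shows the general idea of a graph choosing which question comes next while a separate step decides how to phrase it.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class CriterionNode:
    """One node of a toy criteria graph (placeholder content, not DSM-5 text)."""
    node_id: str
    question: str
    next_if_yes: Optional[str] = None  # node to visit after a "yes" answer
    next_if_no: Optional[str] = None   # node to visit after a "no" answer


# Hypothetical two-node graph; identifiers and wording are illustrative only.
TOY_GRAPH: Dict[str, CriterionNode] = {
    "A": CriterionNode("A", "Has placeholder criterion A been present?", next_if_yes="B"),
    "B": CriterionNode("B", "Has placeholder criterion B been present?"),
}


def reasonable_mind_next(
    graph: Dict[str, CriterionNode], current_id: str, answer_yes: bool
) -> Optional[str]:
    """Rule-based stand-in for the reasoning agent: follow the graph edge."""
    node = graph[current_id]
    return node.next_if_yes if answer_yes else node.next_if_no


def emotional_mind_phrase(question: str, acknowledged: bool) -> str:
    """Template stand-in for the empathetic agent: soften the question."""
    prefix = "Thank you for sharing that. " if acknowledged else ""
    return prefix + question


def run_interview(
    graph: Dict[str, CriterionNode], start_id: str, answers: List[bool]
) -> List[Tuple[str, str]]:
    """Walk the graph with scripted yes/no answers; return (node_id, utterance) pairs."""
    transcript: List[Tuple[str, str]] = []
    current: Optional[str] = start_id
    for turn, answer in enumerate(answers):
        if current is None:  # the graph has no further node to ask about
            break
        utterance = emotional_mind_phrase(graph[current].question, acknowledged=turn > 0)
        transcript.append((current, utterance))
        current = reasonable_mind_next(graph, current, answer)
    return transcript


if __name__ == "__main__":
    for node_id, utterance in run_interview(TOY_GRAPH, "A", [True, True]):
        print(node_id, "->", utterance)
```

Keeping question selection inside the graph walk means the phrasing step cannot introduce a question the graph does not contain, which is one plausible reading of how a knowledge graph could constrain hallucinated inquiries.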

Data availability

The datasets generated and/or analyzed during the current study are not publicly available due to requirements stated in the Research Ethics Board (REB) agreement, but are available from the corresponding author upon reasonable request. The simulated interview sessions are available at https://github.com/YWU99u/WiseMind-DDx-Psyc.

Code availability

The underlying code for this study is not publicly available for proprietary reasons. However, a secure test link that exposes the model behavior without revealing proprietary components can be provided by the corresponding author upon reasonable request for academic, non-commercial evaluation.

Acknowledgements

Y.W. gratefully acknowledges the generous financial support provided by the China Scholarship Council under grant ID 202308180002.

Author information

These authors contributed equally: Yuqi Wu, Guangya Wan.

Authors and Affiliations

College of Biomedical Engineering, Fudan University, Shanghai, China
Yuqi Wu, Shengming Zhao, Lingfeng Ma, Tianyi Ye & Jie Chen
Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB, Canada
Yuqi Wu & Jie Chen
School of Data Science, University of Virginia, Charlottesville, VA, USA
Guangya Wan
McIntire School of Commerce, University of Virginia, Charlottesville, VA, USA
Jingjing Li
Department of Psychiatry, University of Alberta, Edmonton, AB, Canada
Mike Zhang, Ion Pop & Yanbo Zhang

Contributions

Y.W. served as the lead contributor, responsible for initiating the research ideation, leading the study design and experimental execution, handling all programming aspects, drafting the initial manuscript, and editing the manuscript. G.W. participated substantially in the research ideation, contributed to the study design, experiment design, and programming aspects, and assisted with manuscript construction. J.L. conceived and led the research ideation and research design, proposed the system architecture and evaluation strategy, and provided substantial support for manuscript drafting and editing throughout the submission and revision process. Y.Z. contributed substantially to the study design, provided key medical insights and expertise, was responsible for REB approval, and participated in manuscript editing. J.C. served as the Principal Investigator, coordinating the overall study, providing essential research funding, and contributing significantly to manuscript editing. S.Z., L.M., and T.Y. participated in study discussion and contributed to the evaluation process. M.Z. and I.P. participated in the medical insights discussion and contributed to the final evaluation of the system.

Corresponding authors

Correspondence to Jingjing Li, Yanbo Zhang or Jie Chen.

Ethics declarations

Competing interests

Authors Y.W., S.Z., Y.Z., and J.C. are employees of and hold equity in Shanghai KeyLinkAI Inc., which is developing commercial applications for the technology described in this manuscript. Additionally, Authors Y.W., G.W., J.L., Y.Z., and J.C. are inventors on a pending patent application regarding this technology. The remaining authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Wu, Y., Wan, G., Li, J. et al. WiseMind: a knowledge-guided multi-agent framework for accurate and empathetic psychiatric diagnosis. npj Digit. Med. (2026). https://doi.org/10.1038/s41746-026-02559-9

Received: 23 September 2025
Accepted: 06 March 2026
Published: 25 March 2026
DOI: https://doi.org/10.1038/s41746-026-02559-9