
npj Digital Medicine
An AI-based mental health guardrail and dataset for identifying psychiatric crises in text-based conversations
  • Article
  • Open access
  • Published: 03 April 2026

  • Benjamin W. Nelson1,2,
  • Celeste Wong1,
  • Matthew T. Silvestrini1,
  • Sooyoon Shin1,
  • Alanna Robinson1,
  • Jessica Lee1,
  • Eric Yang1,
  • John Torous2 &
  • Andrew Trister1

npj Digital Medicine (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present that affect the content, and all legal disclaimers apply.

Subjects

  • Diseases
  • Health care
  • Mathematics and computing
  • Medical research

Abstract

Large language models often mishandle psychiatric emergencies, offering harmful or inappropriate advice. This study evaluated the Verily Mental Health Guardrail (VMHG) on two clinician-labeled datasets: the Verily Mental Health Crisis Dataset v1.0, containing 1800 simulated messages, and the NVIDIA Aegis AI Content Safety Dataset, subsetted to 794 mental health-related messages. Performance was benchmarked against OpenAI Omni Moderation Latest and NVIDIA NeMo Guardrails. The VMHG demonstrated high sensitivity (0.990) and specificity (0.992) on the Verily dataset, with an F1-score of 0.939 and high category-level sensitivity (0.917–0.992) and specificity (≥0.978). On the NVIDIA dataset, it maintained strong sensitivity (0.982) and accuracy (0.921) with reduced specificity (0.859). Compared with the NVIDIA and OpenAI guardrails, the VMHG achieved significantly higher sensitivity (all p < 0.001) and comparable specificity (NVIDIA p < 0.001, OpenAI p = 0.094). Overall, the VMHG demonstrated robust, generalizable, and clinically oriented safety performance that prioritizes sensitivity to minimize missed mental health crises.
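For readers less familiar with the headline metrics, sensitivity, specificity, precision, F1-score, and accuracy are all standard confusion-matrix quantities. The sketch below shows how they are derived; the counts used are hypothetical and are not the study's data — they are chosen only to illustrate the computation.

```python
# Illustrative only: computing the metrics reported in the abstract
# (sensitivity, specificity, F1-score, accuracy) from confusion-matrix
# counts. The counts below are hypothetical, not the study's data.

def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute standard binary-classification metrics from a confusion matrix."""
    sensitivity = tp / (tp + fn)                 # true positive rate (recall)
    specificity = tn / (tn + fp)                 # true negative rate
    precision = tp / (tp + fp)                   # positive predictive value
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    return {
        "sensitivity": round(sensitivity, 3),
        "specificity": round(specificity, 3),
        "precision": round(precision, 3),
        "f1": round(f1, 3),
        "accuracy": round(accuracy, 3),
    }

# Hypothetical example: of 100 crisis messages, 99 are flagged (1 missed);
# of 500 non-crisis messages, 4 are incorrectly flagged.
print(classification_metrics(tp=99, fp=4, tn=496, fn=1))
```

A safety-oriented guardrail of this kind deliberately trades specificity for sensitivity: a false alarm costs an unnecessary escalation, while a missed crisis (a false negative) can be catastrophic.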

Data availability

Data from this study are available upon researcher request.

Code availability

Code from this study is available upon researcher request.


Acknowledgements

There was no funding for this study. The authors wish to acknowledge NVIDIA for providing open access to the NVIDIA Aegis AI Content Safety Dataset 2.0.

Author information

Authors and Affiliations

  1. Verily Life Sciences, South San Francisco, CA, USA

    Benjamin W. Nelson, Celeste Wong, Matthew T. Silvestrini, Sooyoon Shin, Alanna Robinson, Jessica Lee, Eric Yang & Andrew Trister

  2. Division of Digital Psychiatry, Department of Psychiatry, Harvard Medical School and Beth Israel Deaconess Medical Center, Boston, MA, USA

    Benjamin W. Nelson & John Torous


Contributions

Study concept and design: B.W.N. Data collection: B.W.N., A.R., J.T., and E.Y. Data analysis and interpretation: C.W., B.W.N., J.T., A.T., M.S., S.S., J.L., and E.Y. Draft writing and review: B.W.N. wrote the initial draft, and all authors reviewed. Draft approval for submission: B.W.N., J.T., and A.T.

Corresponding author

Correspondence to Benjamin W. Nelson.

Ethics declarations

Competing interests

B.W.N., C.W., M.T.S., S.S., A.R., J.L., E.Y., and A.T. report employment and equity ownership in Verily Life Sciences. J.T. reports no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplemental Material RR1 (PDF)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Reprints and permissions

About this article


Cite this article

Nelson, B.W., Wong, C., Silvestrini, M.T. et al. An AI-based mental health guardrail and dataset for identifying psychiatric crises in text-based conversations. npj Digit. Med. (2026). https://doi.org/10.1038/s41746-026-02579-5


  • Received: 22 October 2025

  • Accepted: 15 March 2026

  • Published: 03 April 2026

  • DOI: https://doi.org/10.1038/s41746-026-02579-5

Share this article

Anyone you share the following link with will be able to read this content:

Sorry, a shareable link is not currently available for this article.

Provided by the Springer Nature SharedIt content-sharing initiative

Download PDF

Associated content

Collection

AI‑Enabled Therapies in Mental Health
