
  • Review Article
  • Published:

Depression and anxiety characterization and detection with multimodal deep learning

Abstract

Depression and anxiety are among the most prevalent mental disorders, necessitating accurate characterization for effective diagnosis and treatment. Multimodal deep learning has emerged as an effective approach to enhance diagnostic precision by integrating diverse data sources, including electronic health records, physiological signals and neuroimaging. This Review provides an overview of recent advances in multimodal deep learning for depression and anxiety estimation. Key neural network architectures—such as convolutional neural networks for image analysis, recurrent and transformer models for sequential and textual data, and graph neural networks for capturing complex neuroimaging connectivity patterns—are examined. Challenges in data fusion, feature extraction and model interpretability are discussed, alongside strategies to improve generalizability through transfer learning. Future opportunities are then outlined: large-scale datasets, standardized evaluation protocols and interdisciplinary collaboration to bridge the gap between multimodal deep learning and clinical practice. By summarizing current practices and identifying critical challenges, this Review highlights the transformative potential of multimodal deep learning in advancing the characterization and detection of depression and anxiety.
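The fusion pattern the abstract refers to — modality-specific encoders whose embeddings are combined before a shared prediction head — can be illustrated with a minimal NumPy sketch. This is a toy late-fusion pipeline under stated assumptions, not any specific model from the literature: the encoders, weights and dimensions below are all hypothetical stand-ins (a real system would use a trained transformer for text and a CNN or RNN for physiological signals).

```python
import numpy as np

rng = np.random.default_rng(0)

def encode_text(tokens, dim=8):
    # Stand-in for a transformer text encoder:
    # mean-pool random token embeddings into one vector.
    table = rng.standard_normal((100, dim))
    return table[tokens].mean(axis=0)

def encode_signal(series, dim=8):
    # Stand-in for a CNN/RNN physiological-signal encoder:
    # simple summary statistics tiled to the embedding size.
    return np.array([series.mean(), series.std(),
                     series.min(), series.max()] * (dim // 4))

def late_fuse(embeddings, w, b):
    # Late fusion: concatenate per-modality embeddings,
    # then apply a linear head with a sigmoid output.
    z = np.concatenate(embeddings)
    return 1.0 / (1.0 + np.exp(-(w @ z + b)))

# Hypothetical inputs: token IDs for an interview transcript,
# and a short window of a physiological time series.
text_emb = encode_text(np.array([3, 17, 42]))
sig_emb = encode_signal(rng.standard_normal(256))

w = rng.standard_normal(text_emb.size + sig_emb.size)
score = late_fuse([text_emb, sig_emb], w, 0.0)  # probability-like score in (0, 1)
```

Early fusion would instead concatenate raw or low-level features before any shared layers; attention-based fusion (as in several of the transformer models reviewed here) lets each modality weight the others' representations.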


Fig. 1: Overview of multimodal deep learning approaches for depression and anxiety detection.
Fig. 2: Overview of multimodal deep learning in depression and anxiety.
Fig. 3: An example of a multimodal deep learning framework for depression and anxiety diagnosis.
Fig. 4: The major challenges for multimodal deep learning in depression and anxiety are grouped into two main categories.


References

  1. World Health Organization. Mental Disorders (WHO, 2022).

  2. World Health Organization. Anxiety Disorders (WHO, 2023).

  3. Alonso, J. et al. Treatment gap for anxiety disorders is global: results of the World Mental Health Surveys in 21 countries. Depress. Anxiety 35, 195–208 (2018).

    Article  PubMed  PubMed Central  Google Scholar 

  4. Kroenke, K., Spitzer, R. L. & Williams, J. B. The PHQ-9: validity of a brief depression severity measure. J. Gen. Intern. Med. 16, 606–613 (2001).

    Article  PubMed  PubMed Central  Google Scholar 

  5. Spitzer, R. L., Kroenke, K., Williams, J. B. & Löwe, B. A brief measure for assessing generalized anxiety disorder: the GAD-7. Arch. Intern. Med. 166, 1092–1097 (2006).

    Article  PubMed  Google Scholar 

  6. Sheehan, D. V. et al. The Mini-International Neuropsychiatric Interview (MINI): the development and validation of a structured diagnostic psychiatric interview for DSM-IV and ICD-10. J. Clin. Psychiatry 59, 22–33 (1998).

    PubMed  Google Scholar 

  7. First, M. B. Structured Clinical Interview for the DSM (SCID). In The Encyclopedia of Clinical Psychology (eds. Cautin, R. L. & Lilienfeld, S. O) https://onlinelibrary.wiley.com/doi/10.1002/9781118625392.wbecp351 (John Wiley & Sons, 2015).

  8. Vaswani, A., Shazeer, N., Parmar, N. et al. Attention is all you need. In Proc. Adv. Neural Inf. Process. Syst. (NeurIPS) (eds. Guyon, I. et al.) 30 (2017).

  9. Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. BERT: pre-training of deep bidirectional transformers for language understanding. In Proc. 2019 Conf. North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT) Vol. 1 (Long and Short Papers) 4171–4186 (Association for Computational Linguistics, 2019); https://doi.org/10.18653/v1/N19-1423

  10. Guntuku, S. C., Yaden, D. B., Kern, M. L., Ungar, L. H. & Eichstaedt, J. C. Detecting depression and mental illness on social media: an integrative review. Curr. Opin. Behav. Sci. 18, 43–49 (2017).

    Article  Google Scholar 

  11. Torous, J. et al. The growing field of digital psychiatry: current evidence and the future of apps, social media, chatbots, and virtual reality. World Psychiatry 20, 318–335 (2021).

    Article  PubMed  PubMed Central  Google Scholar 

  12. Guntuku, S. C., Ramsay, J. R., Merchant, R. M. & Ungar, L. H. Language of ADHD in adults on social media. J. Atten. Disord. 23, 1475–1485 (2019).

    Article  PubMed  Google Scholar 

  13. Insel, T. R. Digital phenotyping: technology for a new science of behavior. JAMA 318, 1215–1216 (2017).

    Article  PubMed  Google Scholar 

  14. Tse, N. Y. et al. A mega-analysis of functional connectivity and network abnormalities in youth depression. Nat. Ment. Health 2, 1169–1182 (2024).

    Article  Google Scholar 

  15. LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521, 436–444 (2015).

  16. Mayberg, H. S. et al. Deep brain stimulation for treatment-resistant depression. Neuron 45, 651–660 (2005).

    Article  PubMed  Google Scholar 

  17. Basser, P. J., Mattiello, J. & LeBihan, D. MR diffusion tensor spectroscopy and imaging. Biophys. J. 66, 259–267 (1994).

    Article  PubMed  PubMed Central  Google Scholar 

  18. Kessler, R. C. et al. Lifetime prevalence and age-of-onset distributions of DSM-IV disorders. Arch. Gen. Psychiatry 62, 593–602 (2005).

    Article  PubMed  Google Scholar 

  19. Insel, T. R. et al. Research Domain Criteria (RDoC): toward a new classification framework for research on mental disorders. Am. J. Psychiatry 167, 748–751 (2010).

    Article  PubMed  Google Scholar 

  20. Cuthbert, B. N. The RDoC framework: facilitating transition from ICD/DSM to dimensional approaches that integrate neuroscience and psychopathology. World Psychiatry 13, 28–35 (2014).

    Article  PubMed  PubMed Central  Google Scholar 

  21. Cuthbert, B. N. & Insel, T. R. Toward the future of psychiatric diagnosis: the seven pillars of RDoC. BMC Med. 11, 126 (2013).

    Article  PubMed  PubMed Central  Google Scholar 

  22. Casey, B., Oliveri, M. & Insel, T. A neurodevelopmental perspective on the Research Domain Criteria (RDoC) framework. Biol. Psychiatry 76, 350–353 (2014).

    Article  PubMed  Google Scholar 

  23. Morris, S. E. & Cuthbert, B. N. Research Domain Criteria: cognitive systems, neural circuits, and dimensions of behavior. Dialogues Clin. Neurosci. 14, 29–37 (2012).

    Article  PubMed  PubMed Central  Google Scholar 

  24. Woo, C.-W., Chang, L. J., Lindquist, M. A. & Wager, T. D. Building better biomarkers: brain models in translational neuroimaging. Nat. Neurosci. 20, 365–377 (2017).

    Article  PubMed  PubMed Central  Google Scholar 

  25. Shen, L. & Thompson, P. M. Brain imaging genomics: integrated analysis and machine learning. Proc. IEEE 108, 125–162 (2019).

    Article  Google Scholar 

  26. Tausczik, Y. R. & Pennebaker, J. W. The psychological meaning of words: LIWC and computerized text analysis methods. J. Lang. Soc. Psychol. 29, 24–54 (2010).

    Article  Google Scholar 

  27. Al-Mosaiwi, M. & Johnstone, T. In an absolute state: elevated use of absolutist words is a marker specific to anxiety, depression, and suicidal ideation. Clin. Psychol. Sci. 6, 529–542 (2018).

    Article  PubMed  PubMed Central  Google Scholar 

  28. Cummins, N. et al. A review of depression and suicide risk assessment using speech analysis. Speech Commun. 71, 10–49 (2015).

    Article  Google Scholar 

  29. Cohn, J. F. et al. Detecting depression from facial actions and vocal prosody. In 2009 3rd International Conf. Affective Computing and Intelligent Interaction and Workshops 1–7 (IEEE, 2009).

  30. Baltrušaitis, T., Ahuja, C. & Morency, L.-P. Multimodal machine learning: a survey and taxonomy. IEEE Trans. Pattern Anal. Mach. Intell. 41, 423–443 (2019).

    Article  PubMed  Google Scholar 

  31. Girard, J. M., Cohn, J. F., Mahoor, M. H., Mavadati, S. & Rosenwald, D. P. Social risk and depression: evidence from manual and automatic facial expression analysis. In 2013 10th IEEE International Conf. and Workshops on Automatic Face and Gesture Recognition (FG) 1–8 (IEEE, 2013).

  32. Liu, S. & Gui, R. Fusing multi-scale fmri features using a brain-inspired multi-channel graph neural network for major depressive disorder diagnosis. Biomed. Signal Process. Control. 90, 105837 (2024).

    Article  Google Scholar 

  33. Wang, Q., Li, L., Qiao, L. & Liu, M. Adaptive multimodal neuroimage integration for major depression disorder detection. Front. Neuroinform. 16, 856175 (2022).

    Article  PubMed  PubMed Central  Google Scholar 

  34. Pennebaker, J. W., Mehl, M. R. & Niederhoffer, K. G. Psychological aspects of natural language use: our words, our selves. Annu. Rev. Psychol. 54, 547–577 (2003).

    Article  PubMed  Google Scholar 

  35. Tadesse, M. M., Lin, H., Xu, B. & Yang, L. Detection of depression-related posts in Reddit social media forum. IEEE Access 7, 44883–44893 (2019).

    Article  Google Scholar 

  36. Teodorescu, D., Cheng, T., Fyshe, A. & Mohammad, S. Language and mental health: measures of emotion dynamics from text as linguistic biosocial markers. In Proc. 2023 Conf. Empirical Methods in Natural Language Processing (eds. Bouamor, H. et al.) 3117–3133 (Association for Computational Linguistics, 2023); https://doi.org/10.18653/v1/2023.emnlp-main.188

  37. France, D. J., Shiavi, R. G., Silverman, S., Silverman, M. & Wilkes, M. Acoustical properties of speech as indicators of depression and suicidal risk. IEEE Trans. Biomed. Eng. 47, 829–837 (2000).

    Article  PubMed  Google Scholar 

  38. Harlev, D., Singer, S., Goldshalger, M., Wolpe, N. & Bergmann, E. Acoustic speech features are associated with late-life depression and apathy symptoms: preliminary findings. Alzheimers Dement. 17, e70055 (2025).

    Google Scholar 

  39. Low, L.-S. A., Maddage, N. C., Lech, M., Sheeber, L. B. & Allen, N. B. Detection of clinical depression in adolescents’ speech during family interactions. IEEE Trans. Biomed. Eng. 58, 574–586 (2010).

    Article  PubMed  PubMed Central  Google Scholar 

  40. Little, B. et al. Deep learning-based automated speech detection as a marker of social functioning in late-life depression. Psychol. Med. 51, 1441–1450 (2021).

    Article  PubMed  Google Scholar 

  41. Hershey, S. et al. CNN architectures for large-scale audio classification. In 2017 IEEE International Conf. Acoustics, Speech and Signal Processing (ICASSP) 131–135 (IEEE, 2017).

  42. Fraiwan, M., Fraiwan, L. & Alkhodari, M. et al. Recognition of pulmonary diseases from lung sounds using convolutional neural networks AND long short-term memory. J. Ambient Intell. Human Comput. 13, 4759–4771 (2022).

    Article  Google Scholar 

  43. Katmah, R. et al. A review on mental stress assessment methods using EEG signals. Sensors 21, 5043 (2021).

    Article  PubMed  PubMed Central  Google Scholar 

  44. Fitzgerald, P. J. & Watson, B. O. Gamma oscillations as a biomarker for major depression: an emerging topic. Transl. Psychiatry 8, 177 (2018).

    Article  PubMed  PubMed Central  Google Scholar 

  45. Schiweck, C., Piette, D., Berckmans, D., Claes, S. & Vrieze, E. Heart rate and high frequency heart rate variability during stress as biomarker for clinical depression. a systematic review. Psychol. Med. 49, 200–211 (2019).

    Article  PubMed  Google Scholar 

  46. Chen, S., Yu, Y. & Pan, J. MadNet: EEG-based depression detection using a deep convolution neural network framework with multi-dimensional attention. In International Conf. Artificial Neural Networks 283–294 (Springer, 2023).

  47. Yan, G., Liang, S., Zhang, Y. & Liu, F. Fusing transformer model with temporal features for ECG heartbeat classification. In 2019 IEEE International Conf. Bioinformatics and Biomedicine (BIBM) 898–905 (IEEE, 2019).

  48. Zheng, G. et al. An attention-based multi-modal mri fusion model for major depressive disorder diagnosis. J. Neural Eng. 20, 066005 (2023).

    Article  Google Scholar 

  49. Gao, J., Li, P., Chen, Z. & Zhang, J. A survey on deep learning for multimodal data fusion. Neural Comput. 32, 829–864 (2020).

    Article  PubMed  Google Scholar 

  50. Ramachandram, D. & Taylor, G. W. Deep multimodal learning: a survey on recent advances and trends. IEEE Signal Process. Mag. 34, 96–108 (2017).

    Article  Google Scholar 

  51. Tsai, Y.-H. H., Liang, P. P., Zadeh, A., Morency, L.-P. & Salakhutdinov, R. Multimodal transformer for unaligned multimodal language sequences. In Proc. Conf. Computational Linguistics (ACL) 6558–6569 (2019).

  52. Cai, C., He, Y., Sun, L., Lian, Z., Liu, B., Tao, J., Xu, M. & Wang, K. Multimodal sentiment analysis based on recurrent neural network and multimodal attention. In Proc. 2nd Multimodal Sentiment Analysis Challenge 61–67 (Association for Computing Machinery, 2021); https://doi.org/10.1145/3475957.3484454.

  53. Tay, Y., Dehghani, M., Bahri, D. & Metzler, D. Efficient transformers: a survey. ACM Comput. Surv. 55, 1–28 (2022).

    Article  Google Scholar 

  54. Liu, Z., Mao, H., Wu, C.-Y., Feichtenhofer, C., Darrell, T. & Xie, S. A ConvNet for the 2020s. In 2022 IEEE/CVF Conf. Computer Vision and Pattern Recognition (CVPR) 11966–11976 (IEEE/CVF, 2022); https://doi.org/10.1109/CVPR52688.2022.01167

  55. Su, Y. & Kuo, C. J. Recurrent neural networks and their memory behavior: a survey. APSIPA Trans. Signal Inf. Process. https://doi.org/10.1561/116.00000123 (2022).

  56. Sharma, K. et al. A survey of graph neural networks for social recommender systems. ACM Comput. Surv. 56, 1–34 (2024).

    Article  Google Scholar 

  57. Bommasani, R. et al. On the opportunities and risks of foundation models. Preprint at https://arxiv.org/abs/2108.07258 (2021).

  58. Shen, S., Yao, Z., Li, C., Darrell, T., Keutzer, K. & He, Y. Scaling vision-language models with sparse mixture of experts. In Findings of the Association for Computational Linguistics: EMNLP 2023 (eds. Bouamor, H. et al.) 11329–11344 (Association for Computational Linguistics, 2023); https://doi.org/10.18653/v1/2023.findings-emnlp.758

  59. Sahili, Z. A., Patras, I. & Purver, M. Multimodal machine learning in mental health: a survey of data, algorithms, and challenges. Preprint at https://arxiv.org/abs/2407.16804 (2024).

  60. DeVault, D. et al. Simsensei Kiosk: a virtual human interviewer for healthcare decision support. In Proc. 2014 International Conf. Autonomous Agents and Multi-agent Systems 1061–1068 (2014).

  61. Ringeval, F. et al. AVEC 2019 workshop and challenge: state-of-mind, detecting depression with AI, and cross-cultural affect recognition. In Proc. 9th International on Audio/Visual Emotion Challenge and Workshop 3–12 (2019).

  62. Yoon, J., Kang, C., Kim, S. & Han, J. D-vlog: multimodal vlog dataset for depression detection. Proc. AAAI Conf. Artif. Intell. 36, 12226–12234 (2022).

    Google Scholar 

  63. Zhu, F., Zhang, J., Dang, R., Hu, B. & Wang, Q. MTNet: multimodal transformer network for mild depression detection through fusion of EEG and eye tracking. Biomed. Signal Process. Control. 100, 106996 (2025).

    Article  Google Scholar 

  64. Zhou, L. et al. TAMFN: time-aware attention multimodal fusion network for depression detection. IEEE Trans. Neural Syst. Rehabil. Eng. 31, 669–679 (2022).

    Article  Google Scholar 

  65. Fang, M., Peng, S., Liang, Y., Hung, C.-C. & Liu, S. A multimodal fusion model with multi-level attention mechanism for depression detection. Biomed. Signal Process. Control 82, 104561 (2023).

    Article  Google Scholar 

  66. Ilias, L., Mouzakitis, S. & Askounis, D. Calibration of transformer-based models for identifying stress and depression in social media. IEEE Trans. Comput. Soc. Syst. 11, 1979–1990 (2023).

    Article  Google Scholar 

  67. Sadeghi, M. et al. Harnessing multimodal approaches for depression detection using large language models and facial expressions. npj Ment. Health Res. 3, 66 (2024).

    Article  PubMed  PubMed Central  Google Scholar 

  68. Victor, E., Aghajan, Z. M., Sewart, A. R. & Christian, R. Detecting depression using a framework combining deep multimodal neural networks with a purpose-built automated evaluation. Psychol. Assess. 31, 1019 (2019).

    Article  PubMed  Google Scholar 

  69. Li, Z. et al. MHA: a multimodal hierarchical attention model for depression detection in social media. Health Inf. Sci. Syst. 11, 6 (2023).

    Article  PubMed  PubMed Central  Google Scholar 

  70. Cai, H., Yuan, Z. & Gao, Y. et al. A multi-modal open dataset for mental-disorder analysis. Sci. Data 9, 178 (2022).

    Article  PubMed  PubMed Central  Google Scholar 

  71. Yan, C.-G. et al. Reduced default mode network functional connectivity in patients with recurrent major depressive disorder. Proc. Natl Acad. Sci. USA 116, 9078–9083 (2019).

    Article  PubMed  PubMed Central  Google Scholar 

  72. Liu, S. et al. An objective quantitative diagnosis of depression using a local-to-global multimodal fusion graph neural network. Patterns 5, (2024).

  73. He, M., Bakker, E. M. & Lew, M. S. DPD (depression detection) net: a deep neural network for multimodal depression detection. Health Inf. Sci. Syst. 12, 1–17 (2024).

    Article  Google Scholar 

  74. Li, X. et al. BrainGNN: interpretable brain graph neural network for fmri analysis. Med. Image Anal. 74, 102233 (2021).

    Article  PubMed  PubMed Central  Google Scholar 

  75. Thapaliya, B. et al. Brain networks and intelligence: a graph neural network based approach to resting state fMRI data. Med. Image Anal. 101, 103433 (2025).

    Article  PubMed  Google Scholar 

  76. Thapaliya, B., Akbas, E., Sapkota, R. et al. SELF-clustering graph transformer approach to model resting state functional brain activity. In 2025 IEEE 22nd International Symposium on Biomedical Imaging (ISBI) https://doi.org/10.1109/ISBI60581.2025.10980889 (IEEE, 2025).

  77. Cui, H. et al. BrainGB: a benchmark for brain network analysis with graph neural networks. IEEE Trans. Med. Imaging 42, 493–506 (2022).

    Article  Google Scholar 

  78. Mo, H., Hui, S. C., Liao, X., Li, Y., Zhang, W. & Ding, S. A multimodal data-driven framework for anxiety screening. IEEE Trans. Instrum. Meas. 73, 4003113 (2024).

  79. Shadid, M., Afnan, M. S. & Patwary, M. J. TI-Fusion: a multimodal anxiety disorder detection method. In 2023 6th International Conf. Electrical Information and Communication Technology (EICT) 1–6 (IEEE, 2023).

  80. Lai, S. & Li, Z. Detection of potential anxiety in social media based on multimodal fusion with deep learning methods. In 2023 IEEE International Conf. Bioinformatics and Biomedicine (BIBM) 560–566 (IEEE, 2023).

  81. Kamakshi, K. & Rengaraj, A. Early detection of stress and anxiety based seizures in position data augmented EEG signal using hybrid deep learning algorithms. IEEE Access 12, 35351–35365 (2024).

    Article  Google Scholar 

  82. Bruin, W. B. et al. Brain-based classification of youth with anxiety disorders: transdiagnostic examinations within the ENIGMA-anxiety database using machine learning. Nat. Ment. Health 2, 104–118 (2024).

    Article  Google Scholar 

  83. Aldayel, M. & Al-Nafjan, A. A comprehensive exploration of machine learning techniques for EEG-based anxiety detection. PeerJ Comput. Sci. 10, e1829 (2024).

    Article  PubMed  PubMed Central  Google Scholar 

  84. Zhou, E., Wang, W., Ma, S., Xie, X., Kang, L., Xu, S., Deng, Z., Gong, Q., Nie, Z., Yao, L., Bu, L., Wang, F. & Liu, Z. Prediction of anxious depression using multimodal neuroimaging and machine learning. NeuroImage 285, 120499 (2024).

  85. Diep, B., Stanojevic, M. & Novikova, J. Multi-modal deep learning system for depression and anxiety detection. In Empowering Communities: A Participatory Approach to AI for Mental Health (2022).

  86. Tasnim, M., Ehghaghi, M., Diep, B. & Novikova, J. DEPAC: a corpus for depression and anxiety detection from speech. In Proc. Eighth Workshop on Computational Linguistics and Clinical Psychology (eds. Zirikly, A. et al.) 1–16 (Association for Computational Linguistics, 2022).

  87. Xie, W. et al. Multimodal fusion diagnosis of depression and anxiety based on CNN-LSTM model. Comput. Med. Imaging Graph. 102, 102128 (2022).

    Article  PubMed  Google Scholar 

  88. Qin, J., Liu, C., Tang, T., Liu, D., Wang, M., Huang, Q. & Zhang, R. Mental-perceiver: audio-textual multi-modal learning for estimating mental disorders. Proc. AAAI Conf. Artif. Intell. 39, 25029–25037 (2025).

    Google Scholar 

  89. Ajith, M. et al. A deep learning approach for mental health quality prediction using functional network connectivity and assessment data. Brain Imaging Behav. 18, 630–645 (2024).

    Article  PubMed  Google Scholar 

  90. He, L., Chen, K., Zhao, J., Wang, Y., Pei, E., Chen, H., Jiang, J., Zhang, S., Zhang, J., Wang, Z., He, T. & Tiwari, P. LMVD: A large-scale multimodal vlog dataset for depression detection in the wild. Inf. Fusion 126, 103632 (2026).

  91. Diagnostic and Statistical Manual of Mental Disorders, Third Edition (DSM-III) (American Psychiatric Association, 1980).

  92. Sawadogo, M.A.L., Pala, F. & Singh, G. et al. PTSD in the wild: a video database for studying post-traumatic stress disorder recognition in unconstrained environments. Multimed. Tools Appl. 83, 42861–42883 (2024).

    Article  Google Scholar 

  93. Çiftçi, E., Kaya, H., Güleç, H. & Salah, A. A. The Turkish audio-visual bipolar disorder corpus. In 1st Asian Conf. Affective Computing and Intelligent Interaction (ACII Asia 2018) https://doi.org/10.1109/ACIIAsia.2018.8470362 (IEEE, 2018).

  94. Cosma, A. & Radoi, E. PsyMo: a dataset for estimating self-reported psychological traits from gait. In IEEE/CVF Winter Conf. Applications of Computer Vision (WACV) 4591–4601 (IEEE, 2024); https://doi.org/10.1109/WACV57701.2024.00454

  95. Rohanian, M., Hough, J. & Purver, M. Detecting depression with word-level multimodal fusion. In Proc. Interspeech 2019 1443–1447 (2019); https://doi.org/10.21437/Interspeech.2019-2283

  96. Xia, Y. et al. A depression detection model based on multimodal graph neural network. Multimed. Tools Appl. 83, 63379–63395 (2024).

    Article  Google Scholar 

  97. Ansari, G. et al. Multimodal depression detection system using machine learning. In 2023 Second International Conf. Informatics (ICI) 1–7 (IEEE, 2023).

  98. Xing, T. et al. An adaptive multi-graph neural network with multimodal feature fusion learning for MDD detection. Sci. Rep. 14, 28400 (2024).

    Article  PubMed  PubMed Central  Google Scholar 

  99. Fitzgerald, P. & Watson, B. Gamma oscillations as a biomarker for major depression: an emerging topic. Transl. Psychiatry 8, 177 (2018).

    Article  PubMed  PubMed Central  Google Scholar 

  100. OpenAI et al. GPT-4 technical report. Preprint at https://arxiv.org/abs/2303.08774 (2023).

  101. Guo, D., Yang, D. & Zhang, H. et al. DeepSeek-R1 incentivizes reasoning in LLMs through reinforcement learning. Nature 645, 633–638 (2025).

    Article  PubMed  PubMed Central  Google Scholar 

  102. Voigt, P. & von dem Bussche, A. The EU General Data Protection Regulation (GDPR): A Practical Guide 2nd edn. (Springer, 2024).

  103. Moore, W. & Frye, S. Review of HIPAA Part 1: history protected health information AND privacy AND security rules. J. Nucl. Med. Technol. 47, 269–272 (2019).

    Article  PubMed  Google Scholar 

  104. McMahan, B., Moore, E., Ramage, D., Hampson, S. & Aguera y Arcas, B. Communication-efficient learning of deep networks from decentralized data. In Proc. 20th International Conf. Artificial Intelligence and Statistics (AISTATS) 1273–1282 (2017).

  105. Breen, L. J. et al. A co-designed systematic review and meta-analysis of the efficacy of grief interventions for anxiety and depression in young people. J. Affect. Disord. 335, 289–297 (2023).

    Article  PubMed  Google Scholar 

  106. Khalil, S. S., Tawfik, N. S. & Spruit, M. Federated learning for privacy-preserving depression detection with multilingual language models in social media posts. Patterns 5, (2024).

  107. Finn, C., Abbeel, P. & Levine, S. Model-agnostic meta-learning for fast adaptation of deep networks. In Proc. 34th International Conf. Machine Learning (ICML) 1126–1135 (2017).

  108. Zadeh, A. B., Liang, P. P., Poria, S., Cambria, E. & Morency, L.-P. Multimodal language analysis in the wild: CMU-MOSEI dataset and interpretable dynamic fusion graph. In Proc. 56th Annual Meeting of the Association for Computational Linguistics Vol. 1 (Long Papers) 2236–2246 (2018).

  109. Busso, C. et al. IEMOCAP: interactive emotional dyadic motion capture database. Lang. Resour. Eval. 42, 335–359 (2008).

    Article  Google Scholar 

  110. Wray, N. R. et al. Genome-wide association analyses identify 44 risk variants and refine the genetic architecture of major depression. Nat. Genet. 50, 668–681 (2018).

    Article  PubMed  PubMed Central  Google Scholar 

  111. Ji, Y., Zhou, Z., Liu, H. & Davuluri, R. V. DNABERT: pre-trained bidirectional encoder representations from transformers model for dna-language in genome. Bioinformatics 37, 2112–2120 (2021).

    Article  PubMed  PubMed Central  Google Scholar 

  112. Zhou, Z., Ji, Y., Li, W., Dutta, P., Davuluri, R. V. & Liu, H. DNABERT-2: efficient foundation model and benchmark for multi-species genomes. Preprint at https://openreview.net/forum?id=oMLQB4EZE1 (2024).

  113. Thapaliya, B., Calhoun, V. D. & Liu, J. Environmental and genome-wide association study on children anxiety and depression. In 2021 IEEE International Conf. Bioinformatics and Biomedicine (BIBM) 2330–2337 (IEEE, 2021).

  114. Thapaliya, B. et al. Cross-continental environmental and genome-wide association study on children and adolescent anxiety and depression. Front. Psychiatry 15, 1384298 (2024).

    Article  PubMed  PubMed Central  Google Scholar 

  115. Kim, H., Cheon, E., Bai, D., Lee, Y. & Koo, B. Stress and heart rate variability: a meta-analysis and review of the literature. Psychiatry Investig 15, 235–245 (2018).

    Article  PubMed  PubMed Central  Google Scholar 

  116. Thapaliya, B. et al. DSAM: a deep learning framework for analyzing temporal and spatial dynamics in brain networks. Med. Image Anal. 103462 (2025).

  117. Chen, J. et al. Dynamic fusion of genomics and functional network connectivity in UK biobank reveals static and time-varying SNP manifolds. Preprint at medRxiv https://doi.org/10.1101/2024.01.09.24301013 (2024).

  118. Neverova, N., Wolf, C., Taylor, G. & Nebout, F. ModDrop: adaptive multi-modal gesture recognition. IEEE Trans. Pattern Anal. Mach. Intell. 38, 1692–1706 (2016).

    Article  Google Scholar 

  119. Ma, Z., Liu, H., Wang, Y., Zhang, Z. & Wang, F. SMIL: multimodal learning with missing modalities. In Proc. AAAI Conf. Artificial Intelligence Vol. 35, 11508–11516 (2021).

  120. Wise, T., Radua, J. & Via, E. et al. Common and distinct patterns of grey-matter volume alteration in major depression and bipolar disorder: evidence from voxel-based meta-analysis. Mol. Psychiatry 22, 1455–1463 (2017).

    Article  PubMed  Google Scholar 

  121. Zhuang, L., Wayne, L., Ya, S. & Zhao, J. A robustly optimized BERT pre-training approach with post-training. In Proc. 20th Chinese National Conference on Computational Linguistics 1218–1227 (Chinese Information Processing Society of China, 2021).

  122. De Choudhury, M., Gamon, M., Counts, S. & Horvitz, E. Predicting depression via social media. In Proc. International AAAI Conf. Web and Social Media Vol. 7, 128–137 (2013).

  123. Andalibi, N., Ozturk, P. & Forte, A. Sensitive self-disclosures, responses, and social support on Instagram: the case of #depression. In Proc. 2017 ACM Conf. Computer Supported Cooperative Work and Social Computing 1485–1500 (ACM, 2017).

  124. Conneau, A. & Lample, G. Cross-lingual language model pretraining. Adv. Neural Inf. Process. Syst. Vol 32 (eds. Wallach, H. et al) https://proceedings.neurips.cc/paper_files/paper/2019/file/c04c19c2c2474dbf5f7ac4372c5b9af1-Paper.pdf (2019).

  125. Yamaguchi, A., Villavicencio, A. & Aletras, N. An empirical study on cross-lingual vocabulary adaptation for efficient language model inference. In Findings of the Assoc. Computational Linguistics: EMNLP 2024 6760–6785 (Association for Computational Linguistics, 2024); https://doi.org/10.18653/v1/2024.findings-emnlp.396

  126. Baevski, A., Zhou, H., Mohamed, A. & Auli, M. wav2vec 2.0: a framework for self-supervised learning of speech representations. In Adv. Neural Inf. Process. Syst. Vol. 33 12449–12460 (Curran Associates, 2020).

  127. Hsu, W.-N., Bolte, B., Tsai, Y.-H. H. et al. HuBERT: Self-supervised speech representation learning by masked prediction of hidden units. IEEE/ACM Trans. Audio Speech Lang. Process. 29, 3451–3460 (2021).

  128. Flores, R., Tlachac, M., Shrestha, A. & Rundensteiner, E. A. WavFace: a multimodal transformer-based model for depression screening. IEEE J. Biomed. Health Inform. https://doi.org/10.1109/JBHI.2025.3529348 (2025).

    Article  PubMed  Google Scholar 

  129. Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G. & Sutskever, I. Learning transferable visual models From natural language supervision. In Proc. 38th International Conf. Machine Learning (PMLR) 8748–8763 (2021).

  130. Zafar, A., Aftab, D., Qureshi, R., Wang, Y. & Yan, H. Multi-explainable TemporalNet: an interpretable multimodal approach using temporal convolutional network for user-level depression detection. In 2024 IEEE/CVF Conf. Computer Vision and Pattern Recognition Workshops (CVPRW) 2258–2265 (IEEE, 2024); https://doi.org/10.1109/CVPRW63382.2024.00231

  131. Lin, Z., Feng, M., dos Santos, C. N. et al. A structured self-attentive sentence embedding. In Proc. International Conf. Learning Representations (ICLR) 1142–1156 (Curran Associates, 2017).

  132. Wu, Z. et al. A comprehensive survey on graph neural networks. IEEE Trans. Neural Netw. Learn. Syst. 32, 4–24 (2021).

    Article  PubMed  Google Scholar 

  133. Edge, D. et al. From local to global: a graph RAG approach to query-focused summarization. Preprint at https://arxiv.org/abs/2404.16130 (2024).

  134. Chen, T., Kornblith, S., Norouzi, M. & Hinton, G. A simple framework for contrastive learning of visual representations. In Proc. 37th International Conf. Machine Learning (ICML) 1597–1607 (2020).

  135. Rudin, C. Stop explaining black box machine learning models for high-stakes decisions and use interpretable models instead. Nat. Mach. Intell. 1, 206–215 (2019).

    Article  PubMed  PubMed Central  Google Scholar 

  136. Mosca, E., Szigeti, F., Tragianni, S., Gallagher, D. & Groh, G. SHAP-based explanation methods: a review for NLP interpretability. In Proc. 29th Int. Conf. Computational Linguistics (COLING 2022) (eds. Calzolari, N. et al.) 4593–4603 (International Committee on Computational Linguistics, 2022).

  137. Garreau, D. & Luxburg, U. Explaining the explainer: a first theoretical analysis of LIME. In International Conf. Artificial Intelligence and Statistics 1287–1296 (PMLR, 2020).

  138. Lundberg, S. M. & Lee, S.-I. A unified approach to interpreting model predictions. In Adv. Neural Inf. Process. Syst. Vol. 30 (eds. Guyon, I. et al.) 4768–4777 (Curran Associates, 2017).

  139. Ukwuoma, C. C. et al. Enhancing histopathological medical image classification for early cancer diagnosis using deep learning and explainable AI-LIME & SHAP. Biomed. Signal Process. Control. 100, 107014 (2025).


  140. Nahiduzzaman, M. et al. A hybrid explainable model based on advanced machine learning and deep learning models for classifying brain tumors using MRI images. Sci. Rep. 15, 1649 (2025).


  141. Lundberg, S. M. et al. Explainable machine-learning predictions for the prevention of hypoxaemia during surgery. Nat. Biomed. Eng. 2, 749–760 (2018).


  142. Yang, C., Rangarajan, A. & Ranka, S. Visual explanations from deep 3D convolutional neural networks for Alzheimer’s disease classification. In AMIA Annual Symposium Proc. 2018 1571–1580 (2018).

  143. Raab, D., Theissler, A. & Spiliopoulou, M. XAI4EEG: spectral and spatio-temporal explanation of deep learning-based seizure detection in EEG time series. Neural Comput. Appl. 35, 10051–10068 (2023).


  144. Qureshi, S. A., Saha, S., Hasanuzzaman, M. & Dias, G. Multitask representation learning for multimodal estimation of depression level. IEEE Intell. Syst. 34, 45–52 (2019).


  145. Liu, X., Shen, H., Li, H., Tao, Y. & Yang, M. Multimodal depression detection based on self-attention network with facial expression and pupil. IEEE Trans. Comput. Soc. Syst. https://doi.org/10.1109/TCSS.2024.3405949 (2024).


  146. Ceccarelli, F. & Mahmoud, M. Multimodal temporal machine learning for bipolar disorder and depression recognition. Pattern Anal. Appl. 25, 493–504 (2022).


  147. Afzal Aghaei, A. & Khodaei, N. Automated depression recognition using multimodal machine learning: a study on the DAIC-WOZ dataset. Comput. Math. Comput. Model. Appl. 2, 45–53 (2023).


  148. Flores, R., Tlachac, M., Toto, E. & Rundensteiner, E. AudiFace: multimodal deep learning for depression screening. In Proc. 7th Machine Learning for Healthcare Conf., Proc. Machine Learning Research Vol. 182 (eds. Lipton, Z. et al.) 609–630 (PMLR, 2022).

  149. Tao, Y., Yang, M., Li, H., Wu, Y. & Hu, B. DepMSTAT: multimodal spatio-temporal attentional transformer for depression detection. IEEE Trans. Knowl. Data Eng. 36, 2956–2966 (2024).


  150. Li, Y. et al. FPT-former: a flexible parallel transformer of recognizing depression by using audiovisual expert-knowledge-based multimodal measures. Int. J. Intell. Syst. 2024, 1564574 (2024).


  151. Zhang, L. et al. DepITCM: an audio-visual method for detecting depression. Front. Psychiatry 15, 1466507 (2025).


  152. Mohammad, F. & Mansoor, K. M. A. MDD: a unified multimodal deep learning approach for depression diagnosis based on text and audio speech. Comput. Mater. Continua 81 (2024).

  153. Zhang, Z., Lin, W., Liu, M. & Mahmoud, M. Multimodal deep learning framework for mental disorder recognition. In 15th IEEE International Conf. Automatic Face and Gesture Recognition (FG 2020) 344–350 (IEEE, 2020).

  154. Ye, J., Zhang, J. & Shan, H. DepMamba: progressive fusion Mamba for multimodal depression detection. In ICASSP 2025—2025 IEEE International Conf. Acoustics, Speech and Signal Processing https://doi.org/10.1109/ICASSP49660.2025.10889975 (IEEE, 2025).

  155. Cohan, A., Desmet, B., Yates, A., Soldaini, L., MacAvaney, S. & Goharian, N. SMHD: a large-scale resource for exploring online language usage for multiple mental health conditions. In Proc. 27th International Conf. Computational Linguistics 1485–1497 (Association for Computational Linguistics, 2018).

  156. Rastogi, A., Liu, Q. & Cambria, E. Stress detection from social media articles: new dataset benchmark and analytical study. In 2022 International Joint Conf. Neural Networks (IJCNN) 1–8 (IEEE, 2022).


Author information


Contributions

T.L., L.C. and Z.Q. contributed equally. T.L.: literature review/collection, drafting and revision, and figures. L.C.: literature collection and figures. Z.Q.: literature collection and tables. Yiran Wang: figures. W.N., Y.W.L., Yuchen Wang and H.Z.: literature collection. W.C.C., M.T. and Z.Z.: supervision and review. L.X. and K.-C.W.: project conception, overall supervision, major revision and responsibility for the paper.

Corresponding authors

Correspondence to LI Xiangtao or Ka-Chun Wong.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Mental Health thanks Mohsen Sadat Shahabi and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Lu, T., Cho, L., Qiu, Z. et al. Depression and anxiety characterization and detection with multimodal deep learning. Nat. Mental Health (2026). https://doi.org/10.1038/s44220-026-00632-6

