Abstract
Depression and anxiety are among the most prevalent mental disorders, and their effective diagnosis and treatment depend on accurate characterization. Multimodal deep learning has emerged as an effective approach to enhancing diagnostic precision by integrating diverse data sources, including electronic health records, physiological signals and neuroimaging. This Review provides an overview of recent advances in multimodal deep learning for depression and anxiety estimation. Key neural network architectures are examined: convolutional neural networks for image analysis, recurrent and transformer models for sequential and textual data, and graph neural networks for capturing complex neuroimaging connectivity patterns. Challenges in data fusion, feature extraction and model interpretability are discussed, alongside strategies for improving generalizability through transfer learning. Future challenges and opportunities are then considered, including large-scale datasets, standardized evaluation protocols and interdisciplinary collaboration to bridge the gap between multimodal deep learning and clinical practice. By summarizing current practices and identifying critical challenges, this Review highlights the transformative potential of multimodal deep learning in advancing the characterization and detection of depression and anxiety.
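The multimodal fusion idea summarized above can be illustrated schematically: per-modality encoders project heterogeneous inputs into a shared embedding space, the embeddings are concatenated, and a single head scores each subject. The following is a minimal numpy sketch, not drawn from any model in this Review; all feature dimensions, weights and the linear-ReLU encoders are hypothetical placeholders for real modality-specific networks.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W):
    """Toy per-modality encoder: linear projection followed by ReLU."""
    return np.maximum(x @ W, 0.0)

# Hypothetical feature matrices for 4 subjects across three modalities.
text_feats = rng.normal(size=(4, 300))   # e.g. text embeddings
audio_feats = rng.normal(size=(4, 40))   # e.g. acoustic descriptors
fmri_feats = rng.normal(size=(4, 100))   # e.g. connectivity features

# Project each modality into a shared 16-dimensional space.
W_text, W_audio, W_fmri = (rng.normal(size=(d, 16)) * 0.1
                           for d in (300, 40, 100))

# Intermediate fusion: concatenate the per-modality embeddings.
fused = np.concatenate([encode(text_feats, W_text),
                        encode(audio_feats, W_audio),
                        encode(fmri_feats, W_fmri)], axis=1)  # shape (4, 48)

# A single linear head plus sigmoid gives one risk score per subject.
w_head = rng.normal(size=(48,)) * 0.1
scores = 1.0 / (1.0 + np.exp(-(fused @ w_head)))
print(scores.shape)  # (4,)
```

In practice the toy encoders would be replaced by the architectures discussed in this Review (for example, a transformer for text, a convolutional network for spectrograms, a graph neural network for connectivity matrices), and the head would be trained end to end with the encoders.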
Author information
Authors and Affiliations
Contributions
T.L., L.C. and Z.Q. contributed equally. T.L.: literature review/collection, drafting and revision, and figures. L.C.: literature collection and figures. Z.Q.: literature collection and tables. Yiran Wang: figures. W.N., Y.W.L., Yuchen Wang and H.Z.: literature collection. W.C.C., M.T. and Z.Z.: supervision and review. L.X. and K.-C.W.: project conception, overall supervision, major revision and responsibility for the paper.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Mental Health thanks Mohsen Sadat Shahabi and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Fig. 1 and Table 1.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Lu, T., Cho, L., Qiu, Z. et al. Depression and anxiety characterization and detection with multimodal deep learning. Nat. Mental Health (2026). https://doi.org/10.1038/s44220-026-00632-6