Abstract
Speech provides a rich behavioral signal of psychosis, yet its diagnostic use remains limited because speech patterns vary widely across individuals and contexts. We model this variability as uncertainty, capturing how consistently speech features indicate symptom expression. We introduce a multimodal model that integrates acoustic and linguistic information to predict symptom severity and psychosis-related traits across the spectrum, from high schizotypy to clinical psychosis. By estimating uncertainty for each modality, the model learns when to rely on specific signals, adapting to speech quality and task context to improve accuracy and interpretability. Using German-language speech from 114 participants (32 with early psychosis and 82 with low or high schizotypy) recorded across structured and narrative tasks, the model achieved an F1-score of 83% with an expected calibration error (ECE) of 0.045, demonstrating robust and well-calibrated performance. Uncertainty estimation further revealed which speech markers most reliably indicated symptoms, including pitch variability, fluency disruptions, and spectral instability.
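The calibration figure reported above can be made concrete with a short example. The following is a minimal sketch, assuming the common equal-width-bin definition of expected calibration error; the binning scheme and number of bins used by the authors are not stated in this section, and the function name `expected_calibration_error` and the default `n_bins=10` are illustrative choices rather than the authors' implementation.

```python
from typing import Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    n_bins: int = 10,  # assumed bin count; not taken from the article
) -> float:
    """Equal-width-bin ECE: weighted mean of |accuracy - confidence| per bin."""
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    if len(confidences) == 0:
        raise ValueError("need at least one prediction")

    # Each bin accumulates [sum of confidences, number correct, count].
    bins = [[0.0, 0, 0] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # Bin k covers [k/n_bins, (k+1)/n_bins); confidence 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx][0] += conf
        bins[idx][1] += int(hit)
        bins[idx][2] += 1

    total = len(confidences)
    ece = 0.0
    for conf_sum, hits, count in bins:
        if count == 0:
            continue  # empty bins contribute nothing
        avg_conf = conf_sum / count
        accuracy = hits / count
        ece += (count / total) * abs(accuracy - avg_conf)
    return ece


# Toy data (hypothetical, not from the study): predicted-class confidence
# and whether each prediction was right.
toy_conf = [0.9, 0.9, 0.9, 0.9, 0.6, 0.6]
toy_hit = [True, True, True, False, True, False]
toy_ece = expected_calibration_error(toy_conf, toy_hit)
```

Under this definition, a value near zero means that predictions made with, say, 90% confidence are correct about 90% of the time, which is the sense in which a low ECE indicates well-calibrated output.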
Data availability
Data used to support these findings are available from the corresponding author upon reasonable request.
Acknowledgements
We are grateful to all participants for their contributions. We also thank Anna Steiner, Linus Hany, and Ueli Stocker for their help with data collection. This work was supported by the European Union (GA 101080251—TRUSTING) and by the Swiss National Science Foundation (POZHP1_191938/1).
Author information
Contributions
M.R., R.H., and P.H. had full access to all of the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. M.R., R.H., F.N., N.D., Y.P., W.S., I.S., W.H., N.L., M.K., and P.H. made substantial contributions to the conception, design, and analysis of the work, as well as to the drafting and final approval of the manuscript.
Ethics declarations
Competing interests
P.H. has received grants and honoraria from Novartis, Lundbeck, Takeda, Mepha, Janssen, Boehringer Ingelheim, Neurolite and OM Pharma outside of this work. No other conflicts of interest were reported.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access: This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Rohanian, M., Hüppi, R., Nooralahzadeh, F. et al. Uncertainty modeling in multimodal speech analysis across the psychosis spectrum. npj Digit. Med. (2026). https://doi.org/10.1038/s41746-025-02309-3
DOI: https://doi.org/10.1038/s41746-025-02309-3


