
npj Digital Medicine
A device-invariant multi-modal learning framework for respiratory disease classification
  • Article
  • Open access
  • Published: 26 February 2026


  • Mo Yang1,
  • Xuefei Liu2 na1,
  • Wei Du2 na1,
  • Yang Liu1,
  • Wenyu Zhu1,
  • Zhaoyang Bu1,
  • Jiaxuan Mao1,
  • Qian Wang1,
  • Si Chen1,
  • Min Zhou2,3 &
  • Jie-ming Qu2,3 

npj Digital Medicine, Article number: (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note that errors affecting the content may be present, and all legal disclaimers apply.

Subjects

  • Computational biology and bioinformatics
  • Diseases
  • Engineering
  • Health care
  • Mathematics and computing

Abstract

Recent advances in deep-learning-based cough sound analysis enable smartphone-based respiratory disease screening suitable for self-managed care in the home, yet their utility is limited by device heterogeneity, population diversity, and the challenges of multimodal integration. We propose a device-invariant, multimodal deep learning framework that jointly models cough acoustics, demographic data, and symptom descriptions for multi-label classification of adult respiratory diseases. To address device effects, an adversarial branch embedded in the audio encoder enforces device-invariant feature learning, while an invariant risk minimization (IRM)-augmented loss improves robustness to non-structural distribution shifts. To evaluate the proposed method, we curated a real-world, multi-center dataset of over 10,000 cases spanning seven major respiratory conditions. On individual disease identification for chronic obstructive pulmonary disease (COPD), lower respiratory tract infection (LRTI), and pulmonary shadows (PS), our method achieves areas under the receiver operating characteristic curve (AUROC) of 0.9698, 0.8483, and 0.8720, respectively. It also shows promising results in identifying the presence of comorbidities across the seven respiratory diseases, with an overall AUROC of 0.8907. More importantly, extensive experiments demonstrate that our method mitigates device effects and facilitates cross-device generalization for cough-based respiratory disease diagnosis. This work demonstrates a scalable and transferable AI-based approach to cough-driven respiratory screening, underscoring the importance of multimodal fusion and robust representation learning for clinical applicability.
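The adversarial branch described in the abstract is typically realized with a gradient-reversal layer in the style of domain-adversarial training: a device classifier learns to predict the recording device from audio features, while reversed gradients push the encoder toward features the device classifier cannot exploit. The paper's implementation is not published, so the following is a minimal PyTorch sketch of that general technique only; the feature dimension, the three-device head, and the `lambd` scaling are illustrative assumptions, not the authors' code.

```python
import torch

class GradReverse(torch.autograd.Function):
    """Identity in the forward pass; negates (and scales) gradients in backward."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Flip the gradient so the encoder is trained to *confuse* the device head.
        return -ctx.lambd * grad_output, None

def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)

# Tiny demo: encoder features feed a device classifier through the reversal layer.
feat = torch.randn(4, 8, requires_grad=True)      # stand-in for encoder output
device_head = torch.nn.Linear(8, 3)               # hypothetical: 3 device classes
logits = device_head(grad_reverse(feat, lambd=0.5))
loss = logits.sum()                               # stand-in for the device loss
loss.backward()
# feat.grad now points *against* improving device prediction.
```

In a full training loop, the device-classification loss from this branch would be added to the diagnostic loss, so minimizing the total objective simultaneously improves disease prediction and suppresses device-identifying information in the shared features.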

Data availability

The datasets generated and/or analyzed during the current study are not publicly available, due both to sensitive clinical information collected under institutional and regulatory data-use agreements and to proprietary components that cannot be openly released; they are available from the corresponding author upon reasonable request.

Code availability

The code developed in this study is proprietary, with substantial commercial potential and ongoing commercialization activities; due to these intellectual property protections, the source code cannot be shared at this time. All model development, training, and analysis were conducted using Python 3.10 with PyTorch 2.1.0 (available at https://pytorch.org/get-started/previous-versions/). Specific training configurations and parameters used to generate and analyze the datasets are detailed in the “Methods” section.
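Although the study code itself is unavailable, the headline metric is standard: AUROC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, which is the normalized Mann-Whitney U statistic. A dependency-free sketch of that computation, useful for sanity-checking reported values on one's own predictions (not the authors' evaluation code):

```python
def auroc(labels, scores):
    """AUROC via the rank-sum (Mann-Whitney U) formulation; ties get midranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Group tied scores and assign them their average (mid) rank.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        midrank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = midrank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    # U statistic for positives, normalized by the number of pos/neg pairs.
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

print(auroc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # → 0.75
```

For the paper's multi-label setting, one would compute this per disease and then aggregate (e.g., macro-average) to obtain an overall figure comparable to the reported 0.8907.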


Acknowledgements

This work was supported by the National Key R&D Program of China (2022YFC2010005).

Author information

Author notes
  1. These authors contributed equally: Xuefei Liu, Wei Du.

Authors and Affiliations

  1. Research & Development Department, Luca Healthcare, Shanghai, China

    Mo Yang, Yang Liu, Wenyu Zhu, Zhaoyang Bu, Jiaxuan Mao, Qian Wang & Si Chen

  2. Department of Pulmonary and Critical Care Medicine, Ruijin Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China

    Xuefei Liu, Wei Du, Min Zhou & Jie-ming Qu

  3. Institute of Respiratory Diseases, Shanghai Jiao Tong University School of Medicine, Shanghai, China

    Min Zhou & Jie-ming Qu


Contributions

M.Y. conceived and designed the study. X.L., W.D., Y.L., W.Z., Z.B., and J.M. collected the data. M.Y., W.Z., and Z.B. analyzed the data. M.Y., Q.W. and Y.L. drafted the manuscript. Q.W., S.C., M.Z., and J.Q. revised the draft. Q.W. supervised the study.

Corresponding authors

Correspondence to Qian Wang, Si Chen, Min Zhou or Jie-ming Qu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article


Cite this article

Yang, M., Liu, X., Du, W. et al. A device-invariant multi-modal learning framework for respiratory disease classification. npj Digit. Med. (2026). https://doi.org/10.1038/s41746-026-02445-4


  • Received: 05 September 2025

  • Accepted: 07 February 2026

  • Published: 26 February 2026

  • DOI: https://doi.org/10.1038/s41746-026-02445-4



Associated content

Collection

Multimodal AI for Digital Medicine
