Abstract
Recent advances in deep learning-based cough sound analysis enable smartphone-based respiratory disease screening suitable for self-managed care in home settings, yet their utility is limited by device heterogeneity, population diversity, and the challenges of multimodal integration. We propose a device-invariant, multimodal deep learning framework that jointly models cough acoustics, demographic data, and symptom descriptions for multi-label classification of adult respiratory diseases. To address device effects, an adversarial branch embedded in the audio encoder enforces device-invariant feature learning, while an invariant risk minimization-augmented loss enhances robustness to non-structural shifts. To evaluate the proposed method, we curated a real-world, multi-center dataset of over 10,000 cases spanning seven major respiratory conditions. On individual disease identification for chronic obstructive pulmonary disease (COPD), lower respiratory tract infection (LRTI), and pulmonary shadows (PS), our method achieves superior performance with areas under the receiver operating characteristic curve (AUROC) of 0.9698, 0.8483, and 0.8720, respectively. It also shows promising results in identifying the presence of comorbidities across the seven respiratory diseases, with an overall AUROC of 0.8907. More importantly, extensive experiments demonstrate that our method mitigates device effects and facilitates cross-device generalization for cough-based respiratory disease diagnosis. This work demonstrates a scalable and transferable AI-based approach to cough-driven respiratory screening, underscoring the importance of multimodal fusion and robust representation learning in advancing clinical applicability.
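The device-invariance mechanism described above follows the standard gradient-reversal recipe for domain-adversarial training. A minimal PyTorch sketch is given below; it is illustrative only — the module names, layer sizes, and the dummy-scale form of the IRM penalty are assumptions for exposition, not the authors' released code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GradReverse(torch.autograd.Function):
    """Identity on the forward pass; reverses (and scales) gradients on backward."""

    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # Negate the gradient so the upstream encoder is pushed to
        # *confuse* the device classifier rather than help it.
        return -ctx.lambd * grad_output, None


class DeviceAdversarialBranch(nn.Module):
    """Device classifier fed through a gradient-reversal layer, so that
    minimizing its loss drives the shared audio encoder toward
    device-invariant features (hypothetical dimensions)."""

    def __init__(self, feat_dim=128, n_devices=4, lambd=1.0):
        super().__init__()
        self.lambd = lambd
        self.classifier = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.GELU(), nn.Linear(64, n_devices)
        )

    def forward(self, features):
        return self.classifier(GradReverse.apply(features, self.lambd))


def irm_penalty(logits, labels):
    """IRM-style penalty: squared gradient of the per-environment risk
    with respect to a dummy scale on the logits (after Arjovsky et al.)."""
    scale = torch.ones(1, requires_grad=True)
    loss = F.binary_cross_entropy_with_logits(logits * scale, labels)
    (grad,) = torch.autograd.grad(loss, scale, create_graph=True)
    return (grad ** 2).sum()
```

During training, the multi-label task loss, the (gradient-reversed) device-classification loss, and the IRM penalty would be combined into a single objective: the reversal makes the encoder maximize device confusion while the branch itself minimizes it.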
Data availability
The datasets generated and/or analyzed during the current study are not publicly available due to the inclusion of sensitive clinical information collected under institutional and regulatory data-use agreements, as well as proprietary components that cannot be openly released, but are available from the corresponding author upon reasonable request.
Code availability
The code developed in this study is proprietary with substantial commercial potential and, owing to intellectual property protections and ongoing commercialization activities, cannot be made publicly available at this time. All model development, training, and analysis were conducted using Python 3.10 with PyTorch 2.1.0 (available at https://pytorch.org/get-started/previous-versions/). Specific training configurations and parameters used to generate and analyze the datasets are detailed in the “Methods” section.
Acknowledgements
This work was supported by the National Key R&D Program of China (2022YFC2010005).
Author information
Authors and Affiliations
Contributions
M.Y. conceived and designed the study. X.L., W.D., Y.L., W.Z., Z.B., and J.M. collected the data. M.Y., W.Z., and Z.B. analyzed the data. M.Y., Q.W., and Y.L. drafted the manuscript. Q.W., S.C., M.Z., and J.Q. revised the draft. Q.W. supervised the study.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Yang, M., Liu, X., Du, W. et al. A device-invariant multi-modal learning framework for respiratory disease classification. npj Digit. Med. (2026). https://doi.org/10.1038/s41746-026-02445-4
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41746-026-02445-4