Abstract
Multi-label disease diagnosis in chest X-rays necessitates simultaneous consideration of both global organ structures and local lesion characteristics. However, current methodologies primarily utilize single-branch architectures and lack effective attention guidance mechanisms, which complicates the balance between global context and local details. Furthermore, multi-label datasets for chest X-rays often suffer from significant class imbalance. We propose CR-MSNet, a dual-branch multi-scale attention network designed for multi-label chest X-ray classification. The global branch is constructed using CoAtNet-2-rw to capture holistic semantic representations, while the local branch employs a residual convolutional neural network to extract detailed lesion features. We incorporate a cross-attention mechanism to facilitate adaptive interaction and information exchange between global and local representations. Additionally, we propose a Parallel Multi-Scale Channel-Spatial Attention (PMS-CSA) module to enhance both key semantic channels and potential lesion regions, thereby increasing the discriminative power of feature representations. A two-stage training strategy with an adjusted loss function is implemented to effectively alleviate the detrimental effects of class imbalance on model performance. Experimental results indicate that CR-MSNet achieves a macro-average AUC of 0.847 on the ChestX-ray14 dataset, confirming its effectiveness and potential for application in multi-label classification tasks for chest X-rays. By integrating a dual-branch architecture with multi-scale attention mechanisms, this study demonstrates the critical role of attention-guided feature interactions in reconciling global and local representations.
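The abstract does not give implementation details of the cross-attention interaction between the two branches, but the idea of letting global tokens attend to local lesion features (and vice versa) can be sketched as follows. This is a minimal illustration only: the class name, token counts, embedding width, and head count are all hypothetical choices, not values from the paper.

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Hypothetical sketch of bidirectional cross-attention between
    global-branch and local-branch feature tokens. Layer choices and
    dimensions are illustrative, not taken from CR-MSNet itself."""

    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        # Global tokens query local tokens, and vice versa.
        self.g2l = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.l2g = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_g = nn.LayerNorm(dim)
        self.norm_l = nn.LayerNorm(dim)

    def forward(self, g: torch.Tensor, l: torch.Tensor):
        # g: (B, Ng, dim) global-branch tokens; l: (B, Nl, dim) local-branch tokens
        g_enh, _ = self.g2l(query=g, key=l, value=l)  # global attends to local detail
        l_enh, _ = self.l2g(query=l, key=g, value=g)  # local attends to global context
        # Residual connection plus normalization keeps each branch's own features.
        return self.norm_g(g + g_enh), self.norm_l(l + l_enh)

if __name__ == "__main__":
    fuse = CrossAttentionFusion(dim=256, heads=4)
    g = torch.randn(2, 49, 256)   # e.g. a 7x7 global feature map, flattened
    l = torch.randn(2, 196, 256)  # e.g. a 14x14 local feature map, flattened
    g2, l2 = fuse(g, l)
    print(g2.shape, l2.shape)
```

Each branch keeps its own token sequence after fusion; a classifier head could then pool both sequences before the final multi-label sigmoid outputs.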
Data availability
The datasets analyzed during the current study are available at the following link: https://www.kaggle.com/datasets/nih-chest-xrays/data
Acknowledgements
The authors gratefully acknowledge all individuals who contributed directly or indirectly to this work.
Author information
Contributions
**Yu Wang**: Conceptualization, Methodology, Writing—original draft. **Caiyin Bao**: Data curation, Visualization. **Zichen Wang**: Validation. **Yupeng Shi**: Investigation, Formal analysis. **Jianlan Yang**: Supervision, Writing—review & editing. All authors have read and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Wang, Y., Bao, C., Wang, Z. et al. CR-MSNet: a dual-branch multi-scale attention network for multi-label chest X-ray classification. Sci Rep (2026). https://doi.org/10.1038/s41598-026-44591-5