Class-attention pooling and token sparsity based vision transformers for chest X-ray interpretation
  • Article
  • Open access
  • Published: 10 February 2026


  • Vaibhav Lokunde1,
  • Keerthan Sundar1,
  • Anuj Khokhar1,
  • Bhawana Tyagi1,
  • Naga Priyadarsini R1 &
  • MohanKumar B1 

Scientific Reports (2026)


We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Computational biology and bioinformatics
  • Diseases
  • Health care
  • Mathematics and computing
  • Medical research

Abstract

Chest radiography is the most widely used imaging technique for diagnosing lung diseases, but interpreting chest X-rays can be difficult because abnormalities are often subtle and image quality varies. While deep learning models, especially Convolutional Neural Networks (CNNs), have shown strong performance in detecting pneumonia and other conditions, Vision Transformers (ViTs) have recently surpassed CNNs on several chest X-ray benchmarks by dividing images into small patches and learning global relationships among them. However, standard ViTs can attend to irrelevant regions, making their decisions less interpretable. To address this, we propose an enhanced ViT model tailored for chest X-ray analysis that prioritizes both accuracy and explainability. Our model introduces a class-attention pooling technique in which each disease-specific class token learns to highlight the regions of the image relevant to that disease, improving disease-wise focus. Token sparsity and random token dropping further encourage the model to attend only to the most informative patches, improving robustness to noise. A convolutional stem added before patch creation extracts fine local features such as edges and textures, ensuring that lung-specific patterns are captured early. Additionally, each X-ray is preprocessed with Contrast Limited Adaptive Histogram Equalization (CLAHE), which enhances local contrast and makes subtle lesions more visible. The model is trained with mixed-precision computation, a warm-up cosine learning-rate schedule, and the AdamW optimizer, enabling stable and efficient training on large datasets. Evaluated on the publicly available Tuberculosis Chest X-ray and Pulmonary Chest X-ray datasets, the proposed framework achieved 99.19% training accuracy, 97.78% validation accuracy, an F1-score of 0.94, and an AUC of 0.99, outperforming the baseline ViT.
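The class-attention pooling idea described above can be sketched in a few lines: each class-specific query attends over the patch tokens and pools them into one embedding per class. The numpy sketch below is illustrative only — single head, no learned projections, and random arrays standing in for trained weights; it is not the authors' implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def class_attention_pool(patch_tokens, class_queries):
    """Pool patch tokens with one learned query per disease class.

    patch_tokens:  (num_patches, dim) encoder outputs
    class_queries: (num_classes, dim) learned class tokens
    Returns: (num_classes, dim) pooled embeddings, and the
             (num_classes, num_patches) attention map used for pooling.
    """
    d = patch_tokens.shape[-1]
    scores = class_queries @ patch_tokens.T / np.sqrt(d)  # (C, N)
    attn = softmax(scores, axis=-1)                       # each row sums to 1
    pooled = attn @ patch_tokens                          # (C, dim)
    return pooled, attn

rng = np.random.default_rng(0)
tokens = rng.normal(size=(196, 64))   # e.g. 14x14 patches, toy embedding dim
queries = rng.normal(size=(3, 64))    # hypothetical classes: normal / TB / other
pooled, attn = class_attention_pool(tokens, queries)
print(pooled.shape, attn.shape)       # (3, 64) (3, 196)
```

The per-class attention rows (`attn`) are also what makes this pooling interpretable: each row is a distribution over patches that can be reshaped into a heatmap for that class.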
We note that these scores were obtained with an image-level split, owing to limitations of the datasets, so performance may be overestimated relative to patient-level validation. Grad-CAM heatmaps further confirm that the model focuses on clinically relevant areas such as opacities and nodules, reinforcing interpretability and trust. Overall, the improved ViT framework offers both high diagnostic accuracy and clear visual explanations, suggesting its potential as an AI assistant to radiologists for efficient detection of lung diseases.
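The warm-up cosine learning-rate schedule mentioned in the abstract can be written as a plain function: a linear ramp to the base rate, then cosine decay. This is a common formulation sketched under assumed settings — the step counts and rates below are illustrative, not the paper's hyperparameters.

```python
import math

def warmup_cosine_lr(step, total_steps, warmup_steps, base_lr, min_lr=0.0):
    """Linear warm-up to base_lr over warmup_steps, then cosine decay to min_lr."""
    if step < warmup_steps:
        # Ramp from base_lr/warmup_steps up to base_lr
        return base_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, in [0, 1]
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))

# Illustrative values: 1000 total steps, 100 warm-up steps, base LR 1e-3
print(warmup_cosine_lr(0, 1000, 100, 1e-3))     # 1e-05 (start of warm-up)
print(warmup_cosine_lr(100, 1000, 100, 1e-3))   # 0.001 (peak, warm-up done)
print(warmup_cosine_lr(1000, 1000, 100, 1e-3))  # 0.0   (fully decayed)
```

In a PyTorch training loop this value would typically be assigned to each parameter group of a `torch.optim.AdamW` optimizer before the optimizer step.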

Data availability

Dataset 1: Tuberculosis (TB) Chest X-ray Database: https://www.kaggle.com/datasets/tawsifurrahman/tuberculosis-tb-chest-xray-dataset
Dataset 2: Pulmonary Chest X-Ray Abnormalities: https://www.kaggle.com/datasets/kmader/pulmonary-chest-xray-abnormalities

Abbreviations

ViT:

Vision transformer

CLAHE:

Contrast limited adaptive histogram equalization

EMA:

Exponential moving average

Grad-CAM:

Gradient-weighted class activation mapping

CE:

Cross entropy

LR:

Learning rate

ML:

Machine learning

AI:

Artificial intelligence

AUC:

Area under the ROC curve

F1:

F1 score


Funding

Open access funding provided by Vellore Institute of Technology.

Author information

Authors and Affiliations

  1. School of Computer Science & Engineering, Vellore Institute of Technology, Vellore, Tamil Nadu, India

    Vaibhav Lokunde, Keerthan Sundar, Anuj Khokhar, Bhawana Tyagi, Naga Priyadarsini R & MohanKumar B


Contributions

Vaibhav Lokunde, Keerthan Sundar & Anuj Khokhar: Conceptualization, Methodology, Formal analysis, Software, Writing – review & editing. Bhawana Tyagi, Naga Priyadarsini R & MohanKumar B: Supervision, Writing – review & editing.

Corresponding author

Correspondence to Bhawana Tyagi.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article


Cite this article

Lokunde, V., Sundar, K., Khokhar, A. et al. Class-attention pooling and token sparsity based vision transformers for chest X-ray interpretation. Sci Rep (2026). https://doi.org/10.1038/s41598-026-37109-6

Download citation

  • Received: 04 November 2025

  • Accepted: 19 January 2026

  • Published: 10 February 2026

  • DOI: https://doi.org/10.1038/s41598-026-37109-6


Keywords

  • Vision transformer
  • Chest X-ray classification
  • Explainable AI
  • Class-attention pooling
  • Token-level sparsity
  • Robust training
  • Grad-CAM
