Abstract
Deep neural networks excel in radiological image classification but frequently suffer from poor interpretability, limiting clinical acceptance. We present MedicalPatchNet, an inherently self-explainable architecture for chest X-ray classification that transparently attributes decisions to distinct image regions. MedicalPatchNet splits images into non-overlapping patches, independently classifies each patch, and aggregates predictions, enabling intuitive visualization of each patch’s diagnostic contribution without post-hoc techniques. Trained on the CheXpert dataset (223,414 images), MedicalPatchNet matches the classification performance (AUROC 0.907 vs. 0.908) of EfficientNetV2-S while offering substantially better interpretability, achieving higher pathology localization accuracy (mean hit-rate 0.485 vs. 0.376 with Grad-CAM) on the CheXlocalize dataset. By providing explicit, reliable explanations accessible even to non-AI experts, MedicalPatchNet mitigates the risks associated with shortcut learning and thus improves clinical trust. Our model, together with reproducible training and inference scripts, contributes to safer, explainable AI-assisted diagnostics across medical imaging domains. We make the code publicly available: https://github.com/TruhnLab/MedicalPatchNet
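The split-classify-aggregate mechanism summarized above can be sketched in a few lines. The following is a minimal NumPy illustration, not the published implementation: in the actual model each patch is classified by a deep network, whereas `classify_patch` here is a hypothetical single linear layer with sigmoid outputs, and all function names, patch sizes, and class counts are illustrative.

```python
import numpy as np

def split_into_patches(image, patch_size):
    """Split a (H, W) image into non-overlapping (patch_size, patch_size) patches."""
    h, w = image.shape
    assert h % patch_size == 0 and w % patch_size == 0
    return (image
            .reshape(h // patch_size, patch_size, w // patch_size, patch_size)
            .transpose(0, 2, 1, 3)          # group the two patch-grid axes together
            .reshape(-1, patch_size, patch_size))

def classify_patch(patch, weights, bias):
    """Toy stand-in for the per-patch classifier: linear layer + sigmoid (multi-label scores)."""
    logits = patch.reshape(-1) @ weights + bias
    return 1.0 / (1.0 + np.exp(-logits))

def predict(image, weights, bias, patch_size):
    """Classify every patch independently, then average into the image-level prediction."""
    patches = split_into_patches(image, patch_size)
    per_patch = np.stack([classify_patch(p, weights, bias) for p in patches])
    # The per-patch score map *is* the explanation: one score per patch per class,
    # so each region's contribution to the final prediction is directly readable.
    return per_patch.mean(axis=0), per_patch

# Illustrative sizes: a 128x128 image, 32x32 patches, 5 pathology classes.
rng = np.random.default_rng(0)
n_classes, patch_size = 5, 32
weights = rng.normal(scale=1e-3, size=(patch_size * patch_size, n_classes))
bias = np.zeros(n_classes)
image = rng.random((128, 128))

image_pred, patch_scores = predict(image, weights, bias, patch_size)
print(image_pred.shape, patch_scores.shape)  # → (5,) (16, 5)
```

Because the image-level prediction is an exact average of the patch scores, the visualization requires no gradient-based post-hoc attribution: reshaping `patch_scores` back to the 4x4 patch grid yields a heatmap per class.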
Data availability
The CheXpert dataset41 used for training is publicly available from the Stanford ML Group at https://stanfordmlgroup.github.io/competitions/chexpert/. The CheXlocalize dataset25, used for evaluation, is publicly available from the Stanford AIMI group at https://stanfordaimi.azurewebsites.net/datasets/23c56a0d-15de-405b-87c8-99c30138950c. The source code for our model, training, and evaluation is publicly available on GitHub at github.com/TruhnLab/MedicalPatchNet. The model weights are publicly available on Hugging Face: https://huggingface.co/patrick-w/MedicalPatchNet
References
Shahid, N., Rappon, T. & Berta, W. Applications of artificial neural networks in health care organizational decision-making: A scoping review. PLoS One 14, e0212356 (2019).
Marinovich, M. L. et al. Artificial intelligence (AI) for breast cancer screening: Breastscreen population-based cohort study of cancer detection. eBioMedicine 90, 104498. https://doi.org/10.1016/j.ebiom.2023.104498 (2023).
Xia, H., Daley, B. J., Petrie, A. & Zhao, X. A neural network model for mortality prediction in ICU. In 2012 Computing in Cardiology, 261–264 (2012).
Yin, L. et al. Convolution-transformer for image feature extraction. Comput. Model. Eng. Sci. 141, 87–106. https://doi.org/10.32604/cmes.2024.051083 (2024).
Jiang, R. et al. A transformer-based weakly supervised computational pathology method for clinical-grade diagnosis and molecular marker discovery of gliomas. Nat. Mach. Intell. 6, 876–891. https://doi.org/10.1038/s42256-024-00868-w (2024).
Feng, X. et al. Autofe-pointer: Auto-weighted feature extractor based on pointer network for dna methylation prediction. Int. J. Biol. Macromol. 311, 143668. https://doi.org/10.1016/j.ijbiomac.2025.143668 (2025).
Shi, S. & Liu, W. B2-vit net: Broad vision transformer network with broad attention for seizure prediction. IEEE Trans. Neural Syst. Rehabil. Eng. 32, 178–188 (2023).
Li, H. et al. Ucfnnet: Ulcerative colitis evaluation based on fine-grained lesion learner and noise suppression gating. Comput. Methods Programs Biomed. 247, 108080. https://doi.org/10.1016/j.cmpb.2024.108080 (2024).
Gan, X. Graphservice: Topology-aware constructor for large-scale graph applications. ACM Trans. Archit. Code Optim. 22, 2:1–2:24. https://doi.org/10.1145/3689341 (2025).
Gan, X. TianheGraph: Topology-aware graph processing. ACM Trans. Archit. Code Optim. 22, 112:1–112:24. https://doi.org/10.1145/3750450 (2025).
Xu, G. et al. Anonymity-enhanced sequential multi-signer ring signature for secure medical data sharing in IoMT. IEEE Trans. Inf. Forensics Secur. https://doi.org/10.1109/TIFS.2025.3574959 (2025).
Xu, K. et al. Clinical features, diagnosis, and management of COVID-19 vaccine-associated Vogt-Koyanagi-Harada disease. Hum. Vaccin. Immunother. 19, 2220630 (2023).
McKinney, S. M. et al. International evaluation of an AI system for breast cancer screening. Nature 577, 89–94 (2020).
Elton, D. C. Self-explaining AI as an alternative to interpretable AI. In (eds Goertzel, B., Panov, A. I., Potapov, A. & Yampolskiy, R.) Artificial General Intelligence - 13th International Conference, AGI 2020, St. Petersburg, Russia, September 16–19, 2020, Proceedings, vol. 12177 of Lecture Notes in Computer Science, 95–106 (Springer, 2020). https://doi.org/10.1007/978-3-030-52152-3_10.
Makino, T. et al. Differences between human and machine perception in medical diagnosis. Sci. Rep. 12, 6877. https://doi.org/10.1038/s41598-022-10526-z (2022).
Selvaraju, R. R. et al. Grad-cam: Visual explanations from deep networks via gradient-based localization. In IEEE International Conference on Computer Vision, ICCV 2017, Venice, Italy, October 22–29, 618–626. https://doi.org/10.1109/ICCV.2017.74 (IEEE Computer Society, 2017).
Chattopadhyay, A., Sarkar, A., Howlader, P. & Balasubramanian, V. N. Grad-cam++: Generalized gradient-based visual explanations for deep convolutional networks. In 2018 IEEE Winter Conference on Applications of Computer Vision, WACV 2018, Lake Tahoe, NV, USA, March 12–15, 2018, 839–847. https://doi.org/10.1109/WACV.2018.00097 (IEEE Computer Society, 2018).
Muhammad, M. B. & Yeasin, M. Eigen-cam: Class activation map using principal components. In 2020 International Joint Conference on Neural Networks, IJCNN 2020, Glasgow, United Kingdom, July 19–24, 1–7. https://doi.org/10.1109/IJCNN48605.2020.9206626 (IEEE, 2020).
Sundararajan, M., Taly, A. & Yan, Q. Axiomatic attribution for deep networks. In Precup, D. & Teh, Y. W. (eds.) Proceedings of the 34th International Conference on Machine Learning, vol. 70 of Proceedings of Machine Learning Research, 3319–3328 (PMLR, 2017).
Shrikumar, A., Greenside, P. & Kundaje, A. Learning important features through propagating activation differences. In (eds Precup, D. & Teh, Y. W.) Proceedings of the 34th International Conference on Machine Learning, vol. 70 of Proceedings of Machine Learning Research, 3145–3153 (PMLR, 2017).
Bach, S. et al. On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS One 10, e0130140 (2015).
Zeiler, M. D. & Fergus, R. Visualizing and understanding convolutional networks. In (eds Fleet, D. et al.) Computer Vision - ECCV 2014, vol. 8689 of Lecture Notes in Computer Science, 818–833 (Springer, 2014).
Muhammad, D. & Bendechache, M. Unveiling the black box: A systematic review of explainable artificial intelligence in medical image analysis. Comput. Struct. Biotechnol. J. 24, 542–560. https://doi.org/10.1016/j.csbj.2024.08.005 (2024).
Rudin, C. Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nat. Mach. Intell. 1, 206–215. https://doi.org/10.1038/S42256-019-0048-X (2019).
Saporta, A. et al. Benchmarking saliency methods for chest x-ray interpretation. Nat. Mach. Intell. 4, 867–878. https://doi.org/10.1038/s42256-022-00536-x (2022).
Mittelstadt, B., Russell, C. & Wachter, S. Explaining explanations in AI. In Proceedings of the Conference on Fairness, Accountability, and Transparency, FAT* ’19, 279–288. https://doi.org/10.1145/3287560.3287574 (Association for Computing Machinery, New York, NY, USA, 2019).
Hou, J. et al. Self-explainable ai for medical image analysis: A survey and new outlooks. arXiv preprint arXiv:2410.02331 (2024).
Chen, C. et al. This looks like that: Deep learning for interpretable image recognition. In (eds. Wallach, H. M. et al.) Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019, NeurIPS 2019, December 8–14. Vancouver, BC, Canada, 8928–8939 (2019).
Zhang, X. et al. smri-patchnet: A novel efficient explainable patch-based deep learning network for Alzheimer’s disease diagnosis with structural MRI. IEEE Access 11, 108603–108616. https://doi.org/10.1109/ACCESS.2023.3321220 (2023).
Oh, Y., Park, S. & Ye, J. C. Deep learning COVID-19 features on CXR using limited training data sets. IEEE Trans. Med. Imaging 39, 2688–2700. https://doi.org/10.1109/TMI.2020.2993291 (2020).
Szczepanski, T., Sitek, A., Trzcinski, T. & Plotka, S. POTHER: patch-voted deep learning-based chest x-ray bias analysis for COVID-19 detection. In (eds Groen, D. et al.) Computational Science - ICCS 2022–22nd International Conference, London, UK, June 21–23, 2022, Proceedings, Part II, vol. 13351 of Lecture Notes in Computer Science, 441–454. https://doi.org/10.1007/978-3-031-08754-7_51 (Springer, 2022).
Ilse, M., Tomczak, J. M. & Welling, M. Attention-based deep multiple instance learning. In (eds Dy, J. G. & Krause, A.) Proceedings of the 35th International Conference on Machine Learning, ICML 2018, Stockholmsmässan, Stockholm, Sweden, July 10–15, 2018, vol. 80 of Proceedings of Machine Learning Research, 2132–2141 (PMLR, 2018).
Shao, Z. et al. Transmil: Transformer based correlated multiple instance learning for whole slide image classification. In (eds Ranzato, M., Beygelzimer, A., Dauphin, Y. N., Liang, P. & Vaughan, J. W.) Advances in Neural Information Processing Systems 34: Annual Conference on Neural Information Processing Systems 2021, NeurIPS 2021, December 6–14, 2021, virtual, 2136–2147 (2021).
Lu, M. Y. et al. Data-efficient and weakly supervised computational pathology on whole-slide images. Nat. Biomed. Eng. 5, 555–570 (2021).
Zhu, K., Xiong, N. N. & Lu, M. A survey of weakly-supervised semantic segmentation. In 9th Intl Conference on Big Data Security on Cloud, BigDataSecurity, IEEE Intl Conference on High Performance and Smart Computing, HPSC and IEEE Intl Conference on Intelligent Data and Security, IDS 2023, New York, NY, USA, May 6–8, 2023, 10–15. https://doi.org/10.1109/BIGDATASECURITY-HPSC-IDS58521.2023.00013 (IEEE, 2023).
Ciga, O. & Martel, A. L. Learning to segment images with classification labels. Med. Image Anal. 68, 101912. https://doi.org/10.1016/j.media.2020.101912 (2021).
Abnar, S. & Zuidema, W. H. Quantifying attention flow in transformers. In (eds Jurafsky, D., Chai, J., Schluter, N. & Tetreault, J. R.) Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5–10, 4190–4197. https://doi.org/10.18653/V1/2020.ACL-MAIN.385 (Association for Computational Linguistics, 2020).
Zech, J. R. et al. Variable generalization performance of a deep learning model to detect pneumonia in chest radiographs: A cross-sectional study. PLoS Med. 15, e1002683 (2018).
DeGrave, A. J., Janizek, J. D. & Lee, S. AI for radiographic COVID-19 detection selects shortcuts over signal. Nat. Mach. Intell. 3, 610–619. https://doi.org/10.1038/s42256-021-00338-7 (2021).
Tan, M. & Le, Q. V. Efficientnetv2: Smaller models and faster training. In (eds. Meila, M. & Zhang, T.) Proceedings of the 38th International Conference on Machine Learning, ICML 2021, 18–24 July 2021, Virtual Event, vol. 139 of Proceedings of Machine Learning Research, 10096–10106 (PMLR, 2021).
Irvin, J. et al. Chexpert: A large chest radiograph dataset with uncertainty labels and expert comparison. In The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, The Thirty-First Innovative Applications of Artificial Intelligence Conference, IAAI 2019, The Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019, Honolulu, Hawaii, USA, January 27 - February 1, 2019, 590–597. https://doi.org/10.1609/AAAI.V33I01.3301590 (AAAI Press, 2019).
Jain, S. et al. Visualchexbert: addressing the discrepancy between radiology report labels and image labels. In (eds Ghassemi, M., Naumann, T. & Pierson, E.) ACM CHIL ’21: ACM Conference on Health, Inference, and Learning, Virtual Event, USA, April 8–9, 2021, 105–115. https://doi.org/10.1145/3450439.3451862 (ACM, 2021).
Deng, J. et al. Imagenet: A large-scale hierarchical image database. In 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2009), 20–25 June 2009, Miami, Florida, USA, 248–255. https://doi.org/10.1109/CVPR.2009.5206848 (IEEE Computer Society, 2009).
Loshchilov, I. & Hutter, F. Decoupled weight decay regularization. In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6–9, 2019 (OpenReview.net, 2019).
Smith, L. N. & Topin, N. Super-convergence: Very fast training of neural networks using large learning rates (2018). arXiv:1708.07120.
Nauta, M., Schlötterer, J., van Keulen, M. & Seifert, C. Pip-net: Patch-based intuitive prototypes for interpretable image classification. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2023, Vancouver, BC, Canada, June 17–24, 2744–2753. https://doi.org/10.1109/CVPR52729.2023.00269 (IEEE, 2023).
Liu, Z. et al. A convnet for the 2020s. In IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR 2022, New Orleans, LA, USA, June 18–24, 2022, 11966–11976. https://doi.org/10.1109/CVPR52688.2022.01167 (IEEE, 2022).
Acknowledgements
The authors gratefully acknowledge the computing time provided to them at the NHR Center NHR4CES at RWTH Aachen University (project number p0021834). This is funded by the Federal Ministry of Education and Research, and the state governments participating on the basis of the resolutions of the GWK for national high performance computing at universities (www.nhr-verein.de/unsere-partner). The data used in this publication was managed using the research data management platform Coscine (http://doi.org/10.17616/R31NJNJZ) with storage space of the Research Data Storage (RDS) (DFG: INST222/1261-1) and DataStorage.nrw (DFG: INST222/1530-1) granted by the DFG and Ministry of Culture and Science of the State of North Rhine-Westphalia. We used generative AI tools for language editing and rephrasing; all scientific content, data, analyses, and conclusions were written and verified by the authors.
Funding
Open Access funding enabled and organized by Projekt DEAL. This research is supported by the Deutsche Forschungsgemeinschaft - DFG (NE 2136/7-1, NE 2136/3-1, TR 1700/7-1), the German Federal Ministry of Research, Technology and Space (Transform Liver - 031L0312C, DECIPHER-M, 01KD2420B) and the European Union Research and Innovation Programme (ODELIA - GA 101057091).
Author information
Authors and Affiliations
Contributions
P.W. conceptualized the study, developed the software, conducted the experiments, and wrote the manuscript. D.T. conceptualized the study, provided supervision, and reviewed the manuscript. C.K., J.N.K., and S.N. provided supervision and contributed to reviewing the manuscript. All authors read and approved the final manuscript.
Corresponding author
Ethics declarations
Competing interests
D.T. received honoraria for lectures by Bayer, GE, Roche, AstraZeneca, and Philips and holds shares in StratifAI GmbH, Germany and in Synagen GmbH, Germany. J.N.K. declares consulting services for Panakeia, AstraZeneca, MultiplexDx, Mindpeak, Owkin, DoMore Diagnostics, and Bioptimus. Furthermore, he holds shares in StratifAI, Synagen, Tremont AI, and Ignition Labs, has received an institutional research grant from GSK, and has received honoraria from AstraZeneca, Bayer, Daiichi Sankyo, Eisai, Janssen, Merck, MSD, BMS, Roche, Pfizer, and Fresenius.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Wienholt, P., Kuhl, C., Kather, J.N. et al. MedicalPatchNet: a patch-based self-explainable AI architecture for chest X-ray classification. Sci Rep (2026). https://doi.org/10.1038/s41598-026-40358-0
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41598-026-40358-0