Abstract
Explainability has increasingly become a core requirement for intelligent medical devices. Current medical artificial intelligence (AI) technologies suffer from the ‘interpretability gap’ despite tremendous efforts to enhance explainability. Here we propose class-association manifold learning, a generative approach that enhances the explainability of medical AI models. Our method efficiently decouples common decision-related patterns from individual backgrounds, enabling us to represent global class-associated knowledge in a low-dimensional mapping while preserving near-perfect diagnostic accuracy. The extracted knowledge is further used to enable AI-generated modifications on arbitrary samples and to visualize differential diagnosis rules. Moreover, we develop a topology map to model the entire decision rule set, so that the logic underlying black-box models can be intuitively explicated by traversing the map and generating virtual contrastive examples. Extensive experiments show that our method not only achieves higher accuracy in explaining the behaviour of medical AI models but also helps to extract medically compliant knowledge that is unknown during model training, thus providing a potential means of assisting clinical rule and medical knowledge discovery with AI techniques.
Data availability
The retinal optical coherence tomography (OCT) and chest X-ray image datasets are available at https://data.mendeley.com/datasets/rscbjbr9sj/2. The Pathologic Myopia Challenge (PALM) dataset can be found at https://ieee-dataport.org/documents/palm-pathologic-myopia-challenge. The OIA-DDR dataset is available at https://github.com/nkicsl/DDR-dataset. The Brain Tumor dataset 1 can be downloaded from https://www.kaggle.com/datasets/ahmedhamada0/brain-tumor-detection. The Brain Tumor dataset 2 can be found at https://www.kaggle.com/datasets/dschettler8845/brats-2021-task1. The Retinal Fundus Multi-Disease Image Dataset (RFMID) is available for download at https://riadd.grand-challenge.org/download-all-classes/. The Derm7pt dataset is available at https://derm.cs.sfu.ca/Download.html. The MIT-BIH dataset can be accessed at https://physionet.org/content/mitdb/1.0.0/. The BRCA dataset is available at https://www.kaggle.com/datasets/samdemharter/brca-multiomics-tcga. The NIH-CXR dataset can be downloaded from https://nihcc.app.box.com/v/ChestXray-NIHCC. The MIMIC-CXR dataset is accessible at https://physionet.org/content/mimic-cxr/2.0.0/. The CheXpert dataset can be obtained from https://stanfordaimi.azurewebsites.net/datasets/8cbd9ed4-2eb9-4565-affc-111cf4f7ebe2.
Code availability
The code for this work is available via GitHub at https://github.com/xrt11/XAI-CAML. Enquiries about using the code on other datasets are welcome.
Acknowledgements
We thank the clinicians from the Zhongshan Ophthalmic Center, Sun Yat-sen University, for helping us by participating in the blinded expert evaluations. This work was supported by the Strategic Priority Research Program (Pre-research Project) of the Chinese Academy of Sciences (XDA0510201 to Y.C.), the Shenzhen Science and Technology Program (KQTD20200820113106007 to Y.P.), the Shenzhen Key Laboratory of Intelligent Bioinformatics (ZDSYS20220422103800001 to Y.P.) and the National Natural Science Foundation of China (U22A2041 to Y.P. and Y.C.). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.
Author information
Contributions
R. Xie and Y.C. conceived the idea. R. Xie and Y.C. designed the experiments. R. Xie, X.H., L.J. and J.C. conducted the experiments. R. Xie, R. Xiao and Y.C. collected the datasets. R. Xie and M.H.W. analysed the data and experimental results. R. Xie and Y.C. wrote the paper. J.T. and Y.P. participated in discussions and provided critical guidance for the methods, experiments and the writing. B.Y. and Y.L. designed and built the scoring website. Y.C. and Y.P. offered computing resources and financial support. All authors reviewed and approved the final version of the paper.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Biomedical Engineering thanks the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Experimental results for cross-dataset validation.
Cross-dataset experiments on the NIH-CXR and CheXpert datasets. a. Cross-dataset training results. Classification accuracy for Consolidation on the CheXpert test set using different methods, where ‘All’ represents results obtained using the complete CheXpert training data, ‘0%’ to ‘15%’ denote results obtained using the NIH-CXR dataset combined with x% of the CheXpert training set, and ‘Decline’ indicates the performance degradation when comparing models trained solely on the cross-dataset NIH-CXR (‘0%’) versus those trained entirely on CheXpert (‘All’). b. Class-associated manifolds obtained on the CheXpert test set (red: Consolidation class; blue: normal class), where the left manifold is derived from training on the NIH-CXR dataset and the right manifold from training on the CheXpert training set.
Extended Data Fig. 2 Experimental results showing model behavior on inputs with spurious correlations.
Examples generated by the CAML method on the NIH-CXR and CheXpert datasets. Left and right inputs (with class labels shown above) provide identity (ID) and class (CL) codes respectively, with synthesized images in the center. Red arrows highlight artifacts. Below, we detail the types of these artifacts, their association with the corresponding diseases, and the desired behavior of the explanation model (whether such artifacts should appear in generated samples, assuming a well-calibrated black-box classifier):
First row (left): likely metallic foreign body, which has no direct association with pneumothorax and should be excluded from generated examples.
First row (right): likely chest tube, which is associated with pneumothorax but not causal (chest tubes are a standard treatment for pneumothorax, and in the NIH-CXR dataset, some pneumothorax-labeled images show indwelling chest tubes indicating active treatment), and should also be excluded.
Second row (left): cardiac pacemaker, which has no direct association with pneumothorax and should be retained.
Second row (right): likely metallic foreign body, which has no direct association with pneumothorax and should be retained.
Third row (left): cardiac pacemaker, which has no direct association with pneumothorax and should be excluded.
Third row (right): likely chest tube, which is associated with pleural effusion but not causal (chest tubes are often used to drain fluid from the pleural space and relieve lung compression, and in the dataset, some pleural effusion-labeled images show indwelling chest tubes indicating active treatment), and should be excluded.
Fourth row (left): cardiac pacemaker, which has no direct association with pleural effusion and should be excluded.
Fourth row (right): likely chest tube, which is associated with pleural effusion but not causal and should be retained.
It can be observed that when the receiver (the sample providing the ID code) contains these artifacts, the generated samples retain the receiver’s artifacts even if the donor (the sample providing the CL code) does not have such artifacts. Conversely, when the receiver does not contain these artifacts but the donor does, the generated samples do not exhibit these artifacts. This indicates that these artifacts are not treated as class-associated features in these examples.
Extended Data Fig. 3 Experimental results showing shortcut learning successfully detected by CAML.
Cases revealing classifier shortcut learning behavior through CAML. a. Class-associated manifold from the PALM dataset. b. A series of images generated along the path, with classifier predictions displayed above each image (values in parentheses indicate the predicted probability of pathological myopia). The classifier was trained on the PALM dataset with brightness enhancement applied to the pathological myopia samples. During the counterfactual generation process, only the brightness of the image changes, yet the classifier’s prediction shifts, indicating that the classifier uses brightness as a shortcut feature for identifying the pathological class.
Supplementary information
Supplementary Information
Supplementary Appendices 1–6 and Figs. 1–18.
Supplementary Data 1
Source data for Supplementary Figs. 1–4 and 12.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Xie, R., He, X., Jiang, L. et al. Bridging the interpretability gap for medical artificial intelligence models using class-association manifold learning. Nat. Biomed. Eng. (2026). https://doi.org/10.1038/s41551-026-01676-w
DOI: https://doi.org/10.1038/s41551-026-01676-w


