Bridging the interpretability gap for medical artificial intelligence models using class-association manifold learning

Abstract

Explainability has increasingly become a core requirement for intelligent medical devices. Current medical artificial intelligence (AI) technologies suffer from the ‘interpretability gap’ despite tremendous efforts to enhance explainability. Here we propose class-association manifold learning (CAML), a generative approach that enhances the explainability of medical AI models. Our method efficiently decouples common decision-related patterns from individual backgrounds, enabling us to represent global class-associated knowledge in a low-dimensional mapping while preserving near-perfect diagnostic accuracy. The extracted knowledge is further used to enable AI-generated modifications on arbitrary samples and to visualize differential diagnosis rules. Moreover, we develop a topology map to model the entire decision rule set, so that the logic underlying black-box models can be intuitively explicated by traversing the map and generating virtual contrastive examples. Extensive experiments show that our method not only achieves higher accuracy in explaining the behaviour of medical AI models but also helps with extracting medical-compliant knowledge that is unknown during model training, thus providing a potential means of assisting clinical rule and medical knowledge discovery with AI techniques.

Fig. 1: Overview of CAML.
Fig. 2: Successful extraction of global decision-rule patterns by CAE.
Fig. 3: Global decision rules and potential knowledge exhibited on the class-association manifold.
Fig. 4: Distribution of multiple concept annotations in the class-associated manifold (left) learned from the Derm7pt dataset and MIMIC-CXR dataset compared with the disease classification labels (right).
Fig. 5: Results of adopting CAE for explaining individual images and comparison with existing methods.
Fig. 6: Reliability evaluation on the OCT dataset.
Fig. 7: The experimental results on the non-image datasets.
Fig. 8: Method flowchart of the CAML framework.

Data availability

The retinal optical coherence tomography (OCT) and chest X-ray image datasets are available at https://data.mendeley.com/datasets/rscbjbr9sj/2. The Pathologic Myopia Challenge (PALM) dataset can be found at https://ieee-dataport.org/documents/palm-pathologic-myopia-challenge. The OIA-DDR dataset is available at https://github.com/nkicsl/DDR-dataset. Brain Tumor dataset 1 can be downloaded from https://www.kaggle.com/datasets/ahmedhamada0/brain-tumor-detection. Brain Tumor dataset 2 can be found at https://www.kaggle.com/datasets/dschettler8845/brats-2021-task1. The Retinal Fundus Multi-Disease Image Dataset (RFMID) is available for download at https://riadd.grand-challenge.org/download-all-classes/. The Derm7pt dataset is available at https://derm.cs.sfu.ca/Download.html. The MIT-BIH dataset can be accessed at https://physionet.org/content/mitdb/1.0.0/. The BRCA dataset is available at https://www.kaggle.com/datasets/samdemharter/brca-multiomics-tcga. The NIH-CXR dataset can be downloaded from https://nihcc.app.box.com/v/ChestXray-NIHCC. The MIMIC-CXR dataset is accessible at https://physionet.org/content/mimic-cxr/2.0.0/. The CheXpert dataset can be obtained from https://stanfordaimi.azurewebsites.net/datasets/8cbd9ed4-2eb9-4565-affc-111cf4f7ebe2.

Code availability

The code for this work is available via GitHub at https://github.com/xrt11/XAI-CAML. Enquiries about using the code on your own datasets are welcome.

Acknowledgements

We thank the clinicians from the Zhongshan Ophthalmic Center, Sun Yat-sen University, for helping us by participating in the blinded expert evaluations. This work was supported by the Strategic Priority Research Program (Pre-research Project) of the Chinese Academy of Sciences (XDA0510201 to Y.C.), the Shenzhen Science and Technology Program (KQTD20200820113106007 to Y.P.), the Shenzhen Key Laboratory of Intelligent Bioinformatics (ZDSYS20220422103800001 to Y.P.) and the National Natural Science Foundation of China (U22A2041 to Y.P. and Y.C.). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information

Contributions

R. Xie and Y.C. conceived the idea. R. Xie and Y.C. designed the experiments. R. Xie, X.H., L.J. and J.C. conducted the experiments. R. Xie, R. Xiao and Y.C. collected the datasets. R. Xie and M.H.W. analysed the data and experimental results. R. Xie and Y.C. wrote the paper. J.T. and Y.P. participated in discussions and provided critical guidance for the methods, experiments and the writing. B.Y. and Y.L. designed and built the scoring website. Y.C. and Y.P. offered computing resources and financial support. All authors reviewed and approved the final version of the paper.

Corresponding authors

Correspondence to Jinling Tang, Yi Pan or Yunpeng Cai.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Biomedical Engineering thanks the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Experiment results for cross-dataset validation.

Cross-dataset experiments on the NIH-CXR and CheXpert datasets. a. Cross-dataset training results. Classification accuracy for Consolidation on the CheXpert test set using different methods, where ‘All’ represents results obtained using the complete CheXpert training data, ‘0%’ to ‘15%’ denote results obtained using the NIH-CXR dataset combined with x% of the CheXpert training set, and ‘Decline’ indicates the performance degradation when comparing models trained solely on the cross-dataset NIH-CXR (‘0%’) versus those trained entirely on CheXpert (‘All’). b. Class-associated manifolds obtained on the CheXpert test set (red: Consolidation class; blue: normal class), where the left manifold is derived from training on the NIH-CXR dataset and the right manifold from training on the CheXpert training set.

Extended Data Fig. 2 Experimental results showing model behavior on inputs with spurious correlations.

Examples generated by the CAML method on the NIH-CXR and CheXpert datasets. Left and right inputs (with class labels shown above) provide identity (ID) and class (CL) codes respectively, with synthesized images in the center. Red arrows highlight artifacts. Below, we detail the types of these artifacts, their association with the corresponding diseases, and the desired behavior of the explanation model (whether such artifacts should appear in generated samples, assuming a well-calibrated black-box classifier):

  • First row (left): likely metallic foreign body, which has no direct association with pneumothorax and should be excluded from generated examples.
  • First row (right): likely chest tube, which is associated with pneumothorax but not causal (chest tubes are a standard treatment for pneumothorax, and in the NIH-CXR dataset, some pneumothorax-labeled images show indwelling chest tubes indicating active treatment), and should also be excluded.
  • Second row (left): cardiac pacemaker, which has no direct association with pneumothorax and should be retained.
  • Second row (right): likely metallic foreign body, which has no direct association with pneumothorax and should be retained.
  • Third row (left): cardiac pacemaker, which has no direct association with pneumothorax and should be excluded.
  • Third row (right): likely chest tube, which is associated with pleural effusion but not causal (chest tubes are often used to drain fluid from the pleural space and relieve lung compression, and in the dataset, some pleural effusion-labeled images show indwelling chest tubes indicating active treatment), and should be excluded.
  • Fourth row (left): cardiac pacemaker, which has no direct association with pleural effusion and should be excluded.
  • Fourth row (right): likely chest tube, which is associated with pleural effusion but not causal and should be retained.

It can be observed that when the receiver (the sample providing the ID code) contains these artifacts, the generated samples retain the receiver’s artifacts even if the donor (the sample providing the CL code) does not have such artifacts. Conversely, when the receiver does not contain these artifacts but the donor does, the generated samples do not exhibit these artifacts. This indicates that these artifacts are not treated as class-associated features in these examples.
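
The receiver/donor behaviour described above can be illustrated with a minimal toy sketch. It is not the CAML implementation: the `encode` and `decode` functions, the `ID_DIM` split and the five-element feature vectors are all hypothetical stand-ins for learned networks, chosen only to show why an artifact carried in the ID code follows the receiver and not the donor.

```python
from dataclasses import dataclass
from typing import List, Tuple

ID_DIM = 3  # toy assumption: the first 3 features are individual-specific


@dataclass(frozen=True)
class Codes:
    id_code: Tuple[float, ...]  # individual (background) code
    cl_code: Tuple[float, ...]  # class-associated code


def encode(sample: List[float]) -> Codes:
    """Toy stand-in for a learned encoder: split a vector into ID and CL parts."""
    return Codes(tuple(sample[:ID_DIM]), tuple(sample[ID_DIM:]))


def decode(id_code: Tuple[float, ...], cl_code: Tuple[float, ...]) -> List[float]:
    """Toy stand-in for a learned decoder: concatenate the two codes."""
    return list(id_code) + list(cl_code)


def swap_generate(receiver: List[float], donor: List[float]) -> List[float]:
    """Combine the receiver's ID code with the donor's CL code."""
    return decode(encode(receiver).id_code, encode(donor).cl_code)


# Hypothetical samples: feature 0 marks an artifact, features 3-4 carry the class pattern.
receiver = [1.0, 0.2, 0.7, 0.0, 0.0]  # has artifact, "normal" class pattern
donor = [0.0, 0.9, 0.1, 1.0, 1.0]     # no artifact, "disease" class pattern

generated = swap_generate(receiver, donor)
# generated == [1.0, 0.2, 0.7, 1.0, 1.0]: artifact kept, class pattern taken from donor
reverse = swap_generate(donor, receiver)
# reverse == [0.0, 0.9, 0.1, 0.0, 0.0]: the donor's artifact does not transfer
```

In this toy setting the artifact survives only because it was placed in the ID part by construction; in a trained model, whether an artifact ends up in the ID or CL code is an empirical question, which is what the figure examines.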

Extended Data Fig. 3 Experimental results showing shortcut learning detected by CAML.

Cases revealing classifier shortcut learning behavior through CAML. a. Class-associated manifold from the PALM dataset. b. A series of images generated along the path, with classifier predictions displayed above each image (values in parentheses indicate the predicted probability of pathological myopia). The classifier was trained on the PALM dataset with brightness enhancement applied to the pathological myopia samples. During the counterfactual generation process, only the brightness of the image changes, yet the classifier’s prediction shifts, indicating that the classifier uses brightness as a shortcut feature for identifying the pathological class.
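
The reasoning in this caption (only a nuisance factor changes along the generated path, yet the prediction flips) can be written as a simple check. The sketch below is a toy illustration under stated assumptions, not the procedure used in the paper: images are flattened lists of pixel values, `toy_classifier` is a hypothetical classifier that depends only on mean brightness, and `structure` is an assumed way of testing that nothing except brightness differs between images.

```python
import math
from typing import Callable, List, Sequence

Image = List[float]  # toy flattened greyscale image


def brightness(img: Image) -> float:
    """Mean pixel value, used here as the nuisance factor."""
    return sum(img) / len(img)


def structure(img: Image) -> List[float]:
    """Brightness-removed pattern: pixel values minus their mean (rounded)."""
    mean = brightness(img)
    return [round(p - mean, 6) for p in img]


def toy_classifier(img: Image) -> float:
    """Hypothetical shortcut classifier: probability depends only on brightness."""
    return 1.0 / (1.0 + math.exp(-20.0 * (brightness(img) - 0.5)))


def shortcut_suspected(
    path: Sequence[Image],
    classifier: Callable[[Image], float],
    threshold: float = 0.5,
) -> bool:
    """True if only brightness varies along the path yet the predicted label flips."""
    same_structure = all(structure(img) == structure(path[0]) for img in path)
    labels = {classifier(img) >= threshold for img in path}
    return same_structure and len(labels) > 1


# Hypothetical path: the same pattern rendered at increasing brightness.
base = [0.1, 0.3, 0.2, 0.4]
path = [[p + shift for p in base] for shift in (0.0, 0.1, 0.2, 0.3, 0.4)]
probabilities = [toy_classifier(img) for img in path]
flagged = shortcut_suspected(path, toy_classifier)
```

Here `flagged` is true because the toy classifier crosses the 0.5 threshold as brightness rises while the underlying pattern stays fixed; a classifier whose output is constant along the same path would not be flagged.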

Supplementary information

Supplementary Information

Supplementary Appendices 1–6 and Figs. 1–18.

Reporting Summary

Peer Review File

Supplementary Data 1

Source data for Supplementary Figs. 1–4 and 12.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Xie, R., He, X., Jiang, L. et al. Bridging the interpretability gap for medical artificial intelligence models using class-association manifold learning. Nat. Biomed. Eng (2026). https://doi.org/10.1038/s41551-026-01676-w

  • DOI: https://doi.org/10.1038/s41551-026-01676-w
