Bridging the interpretability gap for medical artificial intelligence models using class-association manifold learning

Xie, Ruitao; He, Xiaoxi; Jiang, Limai; Wang, Mini Han; Chen, Jingbang; Xiao, Rui; Yang, Bokai; Li, Ye; Tang, Jinling; Pan, Yi; Cai, Yunpeng

doi:10.1038/s41551-026-01676-w

Article
Published: 18 May 2026

Ruitao Xie^1,2,3 (ORCID: orcid.org/0009-0000-4360-8364),
Xiaoxi He^1,
Limai Jiang^1,3,
Mini Han Wang^4,5,6,
Jingbang Chen^7,
Rui Xiao^1,
Bokai Yang^1,8,
Ye Li^1,
Jinling Tang^2 (ORCID: orcid.org/0000-0002-4516-8179),
Yi Pan^2,9 (ORCID: orcid.org/0000-0002-2766-3096) &
Yunpeng Cai^1,2 (ORCID: orcid.org/0000-0001-8797-4243)

Nature Biomedical Engineering (2026)

Abstract

Explainability has increasingly become a core requirement for intelligent medical devices. Despite tremendous efforts to enhance explainability, current medical artificial intelligence (AI) technologies still suffer from an ‘interpretability gap’. Here we propose class-association manifold learning, a generative approach that enhances the explainability of medical AI models. Our method efficiently decouples common decision-related patterns from individual backgrounds, enabling us to represent global class-associated knowledge in a low-dimensional mapping while preserving near-perfect diagnostic accuracy. The extracted knowledge is further used to enable AI-generated modifications of arbitrary samples and to visualize differential diagnosis rules. Moreover, we develop a topology map that models the entire decision rule set, so that the logic underlying black-box models can be intuitively explicated by traversing the map and generating virtual contrastive examples. Extensive experiments show that our method not only explains the behaviour of medical AI models more accurately than existing methods but also helps extract medically compliant knowledge that was unknown during model training, thus providing a potential means of assisting clinical rule and medical knowledge discovery with AI techniques.
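This preview does not describe how the topology map is constructed or traversed. As a hedged sketch only, the Python below approximates such a map with a k-nearest-neighbour graph over two-dimensional coordinates on the class-association manifold, then uses Dijkstra's algorithm to find the shortest route from one sample to the nearest sample of another class. Each node on that route is a candidate point at which a virtual contrastive example could be generated. The k-NN construction and the names `knn_graph`, `shortest_route` and `contrastive_route` are illustrative assumptions, not the authors' CAML implementation.

```python
import heapq
import math
from typing import Optional

Point = tuple[float, float]


def knn_graph(points: list[Point], k: int) -> dict[int, dict[int, float]]:
    """Link each manifold point to its k nearest neighbours (undirected, Euclidean weights).

    Assumption: this k-NN graph stands in for the paper's topology map.
    """
    graph: dict[int, dict[int, float]] = {i: {} for i in range(len(points))}
    for i, p in enumerate(points):
        neighbours = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        for d, j in neighbours[:k]:
            graph[i][j] = d
            graph[j][i] = d
    return graph


def shortest_route(graph: dict[int, dict[int, float]], source: int,
                   targets: set[int]) -> Optional[list[int]]:
    """Dijkstra's algorithm from `source` to whichever node in `targets` is reached first."""
    best = {source: 0.0}
    prev: dict[int, int] = {}
    heap = [(0.0, source)]
    done: set[int] = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u in targets:
            # Walk predecessor links back to the source, then reverse.
            route = [u]
            while route[-1] != source:
                route.append(prev[route[-1]])
            return route[::-1]
        for v, w in graph[u].items():
            if d + w < best.get(v, math.inf):
                best[v] = d + w
                prev[v] = u
                heapq.heappush(heap, (d + w, v))
    return None  # no sample of the target class is reachable


def contrastive_route(points: list[Point], labels: list[str], start: int,
                      target_label: str, k: int = 3) -> Optional[list[int]]:
    """Manifold points to visit when morphing sample `start` towards class `target_label`."""
    targets = {i for i, y in enumerate(labels) if y == target_label}
    return shortest_route(knn_graph(points, k), start, targets)
```

For example, `contrastive_route(coords, labels, start=0, target_label="disease")` returns the indices of manifold points lying between sample 0 and the nearest disease-class sample; decoding the class-associated codes at those points would give a sequence of contrastive examples. Shortest paths keep each step small, which is one plausible reading of "traversing the map", not necessarily the authors' choice.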

**Fig. 2: Successful extraction of global decision-rule patterns by CAE.**

**Fig. 3: Global decision rules and potential knowledge exhibited on the class-association manifold.**

**Fig. 4: Distribution of multiple concept annotations on the class-associated manifold (left) learned from the Derm7pt and MIMIC-CXR datasets, compared with the disease classification labels (right).**

**Fig. 5: Results of adopting CAE for explaining individual images and comparison with existing methods.**

**Fig. 6: Reliability evaluation on the OCT dataset.**

**Fig. 7: The experimental results on the non-image datasets.**

**Fig. 8: Method flowchart of the CAML framework.**

Data availability

The retinal optical coherence tomography (OCT) and chest X-ray image datasets are available at https://data.mendeley.com/datasets/rscbjbr9sj/2. The Pathologic Myopia Challenge (PALM) dataset can be found at https://ieee-dataport.org/documents/palm-pathologic-myopia-challenge. The OIA-DDR dataset is available at https://github.com/nkicsl/DDR-dataset. Brain Tumor dataset 1 can be downloaded from https://www.kaggle.com/datasets/ahmedhamada0/brain-tumor-detection. Brain Tumor dataset 2 can be found at https://www.kaggle.com/datasets/dschettler8845/brats-2021-task1. The Retinal Fundus Multi-Disease Image Dataset (RFMID) is available for download at https://riadd.grand-challenge.org/download-all-classes/. The Derm7pt dataset is available at https://derm.cs.sfu.ca/Download.html. The MIT-BIH dataset can be accessed at https://physionet.org/content/mitdb/1.0.0/. The BRCA dataset is available at https://www.kaggle.com/datasets/samdemharter/brca-multiomics-tcga. The NIH-CXR dataset can be downloaded from https://nihcc.app.box.com/v/ChestXray-NIHCC. The MIMIC-CXR dataset is accessible at https://physionet.org/content/mimic-cxr/2.0.0/. The CheXpert dataset can be obtained from https://stanfordaimi.azurewebsites.net/datasets/8cbd9ed4-2eb9-4565-affc-111cf4f7ebe2.

Code availability

The code for this work is available via GitHub at https://github.com/xrt11/XAI-CAML. Enquiries about applying the code to other datasets are welcome.

Acknowledgements

We thank the clinicians from the Zhongshan Ophthalmic Center, Sun Yat-sen University, for helping us by participating in the blinded expert evaluations. This work was supported by the Strategic Priority Research Program (Pre-research Project) of the Chinese Academy of Sciences (XDA0510201 to Y.C.), the Shenzhen Science and Technology Program (KQTD20200820113106007 to Y.P.), the Shenzhen Key Laboratory of Intelligent Bioinformatics (ZDSYS20220422103800001 to Y.P.) and the National Natural Science Foundation of China (U22A2041 to Y.P. and Y.C.). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Author information

Authors and Affiliations

Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
Ruitao Xie, Xiaoxi He, Limai Jiang, Rui Xiao, Bokai Yang, Ye Li & Yunpeng Cai
Faculty of Computer Science and Artificial Intelligence, Shenzhen University of Advanced Technology, Shenzhen, China
Ruitao Xie, Jinling Tang, Yi Pan & Yunpeng Cai
University of Chinese Academy of Sciences, Beijing, China
Ruitao Xie & Limai Jiang
Faculty of Medicine, Chinese University of Hong Kong, Hong Kong, China
Mini Han Wang
Zhuhai People’s Hospital (The Affiliated Hospital of Beijing Institute of Technology, Zhuhai Clinical Medical College of Jinan University), Zhuhai, China
Mini Han Wang
Zhuhai Institute of Advanced Technology, Chinese Academy of Sciences, Zhuhai, China
Mini Han Wang
Department of Statistics and Data Science, University of California, Los Angeles, Los Angeles, CA, USA
Jingbang Chen
Institute of Intelligence Science and Engineering, Shenzhen Polytechnic University, Shenzhen, China
Bokai Yang
Shenzhen Key Laboratory of Intelligent Bioinformatics, Shenzhen Institutes of Advanced Technology, Shenzhen, China
Yi Pan

Contributions

R. Xie and Y.C. conceived the idea. R. Xie and Y.C. designed the experiments. R. Xie, X.H., L.J. and J.C. conducted the experiments. R. Xie, R. Xiao and Y.C. collected the datasets. R. Xie and M.H.W. analysed the data and experimental results. R. Xie and Y.C. wrote the paper. J.T. and Y.P. participated in discussions and provided critical guidance for the methods, experiments and the writing. B.Y. and Y.L. designed and built the scoring website. Y.C. and Y.P. offered computing resources and financial support. All authors reviewed and approved the final version of the paper.

Corresponding authors

Correspondence to Jinling Tang, Yi Pan or Yunpeng Cai.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Biomedical Engineering thanks the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Experiment results for cross-dataset validation.

Cross-dataset experiments on the NIH-CXR and CheXpert datasets. a. Cross-dataset training results. Classification accuracy for Consolidation on the CheXpert test set using different methods, where ‘All’ represents results obtained using the complete CheXpert training data, ‘0%’ to ‘15%’ denote results obtained using the NIH-CXR dataset combined with x% of the CheXpert training set, and ‘Decline’ indicates the performance degradation when comparing models trained solely on the cross-dataset NIH-CXR (‘0%’) versus those trained entirely on CheXpert (‘All’). b. Class-associated manifolds obtained on the CheXpert test set (red: Consolidation class; blue: normal class), where the left manifold is derived from training on the NIH-CXR dataset and the right manifold from training on the CheXpert training set.
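The caption does not say whether ‘Decline’ is an absolute or a relative drop. Assuming it is the absolute difference in accuracy between the ‘All’ and ‘0%’ settings, it can be computed per method as in this minimal sketch; the function names and the nested-dictionary layout of `results` are hypothetical choices for illustration.

```python
def accuracy_decline(acc_by_setting: dict[str, float]) -> float:
    """Assumed definition: accuracy with all CheXpert training data ('All') minus
    accuracy when training only on NIH-CXR ('0%'). Positive values mean a drop."""
    return acc_by_setting["All"] - acc_by_setting["0%"]


def decline_table(results: dict[str, dict[str, float]]) -> dict[str, float]:
    """Apply accuracy_decline to every method; `results` maps method -> {setting: accuracy}."""
    return {method: accuracy_decline(accs) for method, accs in results.items()}
```

Under this reading, a smaller decline indicates a method whose accuracy transfers better from NIH-CXR training to the CheXpert test set; intermediate settings such as ‘5%’ are simply ignored by the calculation.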

Extended Data Fig. 2 Experiment results showing model behavior on inputs with spurious correlations.

Examples generated by the CAML method on the NIH-CXR and CheXpert datasets. Left and right inputs (with class labels shown above) provide identity (ID) and class (CL) codes respectively, with synthesized images in the center. Red arrows highlight artifacts. Below, we detail the types of these artifacts, their association with the corresponding diseases, and the desired behavior of the explanation model (whether such artifacts should appear in generated samples, assuming a well-calibrated black-box classifier): First row (left): likely metallic foreign body, which has no direct association with pneumothorax and should be excluded from generated examples; First row (right): likely chest tube, which is associated with pneumothorax but not causal (chest tubes are a standard treatment for pneumothorax, and in the NIH-CXR dataset, some pneumothorax-labeled images show indwelling chest tubes indicating active treatment), and should also be excluded; Second row (left): cardiac pacemaker, which has no direct association with pneumothorax and should be retained; Second row (right): likely metallic foreign body, which has no direct association with pneumothorax and should be retained; Third row (left): cardiac pacemaker, which has no direct association with pneumothorax and should be excluded; Third row (right): likely chest tube, which is associated with pleural effusion but not causal (chest tubes are often used to drain fluid from the pleural space and relieve lung compression, and in the dataset, some pleural effusion-labeled images show indwelling chest tubes indicating active treatment), and should be excluded; Fourth row (left): cardiac pacemaker, which has no direct association with pleural effusion and should be excluded; Fourth row (right): likely chest tube, which is associated with pleural effusion but not causal and should be retained. 
It can be observed that when the receiver (the sample providing the ID code) contains these artifacts, the generated samples retain the receiver’s artifacts even if the donor (the sample providing the CL code) does not have such artifacts. Conversely, when the receiver does not contain these artifacts but the donor does, the generated samples do not exhibit these artifacts. This indicates that these artifacts are not treated as class-associated features in these examples.
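The receiver/donor swap test in this caption can be illustrated with a toy model. In this hedged sketch a sample is a plain feature vector, and a hypothetical encoder treats the first `n_id` features as the individual (ID) code and the rest as the class-associated (CL) code; the real CAML encoders are learned networks, not a fixed split. `synthesize` pairs the receiver's ID code with the donor's CL code, and `artifact_follows_receiver` applies the caption's criterion: an artifact that tracks the receiver, never the donor, is not being treated as class-associated. `detect_artifact` is a hypothetical stand-in for an artifact annotation.

```python
from typing import Callable, Sequence

Vector = list[float]


def encode(sample: Sequence[float], n_id: int) -> tuple[Vector, Vector]:
    """Hypothetical toy encoder: split a sample into (ID code, CL code)."""
    return list(sample[:n_id]), list(sample[n_id:])


def decode(id_code: Vector, cl_code: Vector) -> Vector:
    """Hypothetical toy decoder: recombine an ID code and a CL code into one sample."""
    return id_code + cl_code


def synthesize(receiver: Sequence[float], donor: Sequence[float], n_id: int) -> Vector:
    """Keep the receiver's individual background and transplant the donor's class pattern."""
    id_code, _ = encode(receiver, n_id)
    _, cl_code = encode(donor, n_id)
    return decode(id_code, cl_code)


def artifact_follows_receiver(
    pairs: list[tuple[Sequence[float], Sequence[float]]],
    n_id: int,
    detect_artifact: Callable[[Sequence[float]], bool],
) -> bool:
    """True if, whenever receiver and donor disagree on the artifact,
    the synthesized sample matches the receiver."""
    for receiver, donor in pairs:
        has_r, has_d = detect_artifact(receiver), detect_artifact(donor)
        if has_r == has_d:
            continue  # uninformative pair: both or neither carry the artifact
        if detect_artifact(synthesize(receiver, donor, n_id)) != has_r:
            return False
    return True
```

In the toy, an artifact stored in the first feature sits in the ID code and therefore stays with the receiver, matching the behaviour reported above; an artifact stored in the last feature would instead move with the donor, which is what the caption treats as undesirable for non-causal artifacts.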

Extended Data Fig. 3 Experiment results showing shortcut learning successfully detected by CAML.

Cases revealing classifier shortcut learning behavior through CAML. a. Class-associated manifold from the PALM dataset. b. A series of images generated along the path, with the classifier’s predictions displayed above each image (values in parentheses indicate the predicted probability of pathological myopia); the classifier was trained on the PALM dataset in which pathological myopia samples underwent brightness enhancement. During the counterfactual generation process, only the image brightness changes, yet the classifier’s prediction shifts, indicating that the classifier uses brightness as a shortcut feature for identifying the pathological class.
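A path-based probe in the spirit of panel b can be sketched under stated assumptions: codes are interpolated linearly between two points on the manifold; `decode`, `classify` and `attribute` are hypothetical callables standing in for the generator, the black-box classifier and a low-level image measurement (here mean brightness); and the thresholds `min_shift` and `min_corr` are illustrative values, not ones reported by the authors. A possible shortcut is flagged when the predicted probability moves substantially and closely tracks the low-level attribute.

```python
import math
from typing import Callable, Sequence


def lerp(a: Sequence[float], b: Sequence[float], t: float) -> list[float]:
    """Linear interpolation between code vectors a and b at fraction t in [0, 1]."""
    return [x + t * (y - x) for x, y in zip(a, b)]


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def probe_path(
    start: Sequence[float],
    end: Sequence[float],
    decode: Callable[[list[float]], list[float]],
    classify: Callable[[list[float]], float],
    attribute: Callable[[list[float]], float],
    steps: int = 8,
) -> tuple[list[float], list[float]]:
    """Decode images along the path; record classifier probability and attribute per step."""
    probs, attrs = [], []
    for i in range(steps + 1):
        image = decode(lerp(start, end, i / steps))
        probs.append(classify(image))
        attrs.append(attribute(image))
    return probs, attrs


def flags_shortcut(probs: list[float], attrs: list[float],
                   min_shift: float = 0.3, min_corr: float = 0.9) -> bool:
    """Flag a possible shortcut: a large prediction shift that tracks a low-level attribute."""
    return max(probs) - min(probs) >= min_shift and pearson(attrs, probs) >= min_corr
```

The correlation check alone cannot show that brightness is the *only* change along the path; in the figure that conclusion rests on visual inspection of the generated images, so a flag from this sketch is a prompt for inspection rather than proof of shortcut learning.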

Supplementary information

Supplementary Information

Supplementary Appendices 1–6 and Figs. 1–18.

Reporting Summary

Peer Review File

Supplementary Data 1

Source data for Supplementary Figs. 1–4 and 12.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Xie, R., He, X., Jiang, L. et al. Bridging the interpretability gap for medical artificial intelligence models using class-association manifold learning. Nat. Biomed. Eng. (2026). https://doi.org/10.1038/s41551-026-01676-w

Received: 19 September 2024
Accepted: 01 April 2026
Published: 18 May 2026
Version of record: 18 May 2026
DOI: https://doi.org/10.1038/s41551-026-01676-w