Boosting pathology foundation models via few-shot prompt-tuning for rare cancer subtyping

He, Dexuan; Zhou, Xiao; Guan, Wenbin; Zhang, Liyuan; Zhang, Xiaoman; Xu, Sinuo; Wang, Ge; Wang, Lifeng; Yuan, Xiaojun; Ma, Jing; Sun, Xin; Wang, Yanfeng; Sun, Kun; Zhang, Ya; Xie, Weidi

doi:10.1038/s41467-026-71715-2

Article
Open access
Published: 11 April 2026

Nature Communications (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Abstract

Rare cancers comprise 20–25% of malignancies (over 70% in pediatric oncology) but face major diagnostic challenges due to limited expert availability. While pathology vision-language models show promising zero-shot capabilities for common cancers, their performance on rare cancers remains limited. Existing multi-instance learning (MIL) methods rely solely on visual features, overlooking cross-modal knowledge and compromising the interpretability that is critical for rare cancer diagnosis. To address this, we propose PathPT, a framework that exploits vision-language foundation models through spatially-aware visual aggregation and task-specific prompt tuning. PathPT converts WSI-level supervision into fine-grained tile-level guidance, preserving tumor localization and enabling cross-modal reasoning. Across eight rare and three common cancer datasets, spanning 56 subtypes and 3958 WSIs, PathPT consistently outperforms state-of-the-art methods under data-scarce settings. It achieves substantial gains in both subtyping accuracy and cancerous region grounding ability, providing a scalable, interpretable AI solution to improve rare cancer subtyping where access to specialized expertise is limited.
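
The idea of turning a single WSI-level label into tile-level guidance with a vision-language model can be illustrated with a toy sketch. This is not the PathPT implementation; it only assumes that each tile and each class prompt already has an embedding in a shared space. The function names, the temperature value, the "normal" class at index 0, and the rule that non-normal tiles inherit the slide label are all hypothetical choices made for illustration. In an actual prompt-tuning setup the prompt embeddings would be learned rather than fixed as they are here.

```python
import math

def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def tile_class_probs(tile_emb, prompt_embs, temperature=0.07):
    """Zero-shot class probabilities for one tile from tile/prompt cosine similarity.
    The temperature of 0.07 is an assumed value, not taken from the paper."""
    tile = l2_normalize(tile_emb)
    logits = []
    for prompt in prompt_embs:
        cos = sum(a * b for a, b in zip(tile, l2_normalize(prompt)))
        logits.append(cos / temperature)
    return softmax(logits)

def tile_pseudo_labels(tile_embs, prompt_embs, slide_label, normal_idx=0):
    """Hypothetical rule: tiles that look normal stay 'normal';
    every other tile inherits the slide-level label."""
    labels = []
    for emb in tile_embs:
        probs = tile_class_probs(emb, prompt_embs)
        pred = max(range(len(probs)), key=probs.__getitem__)
        labels.append(normal_idx if pred == normal_idx else slide_label)
    return labels

def slide_prediction(tile_embs, prompt_embs, normal_idx=0):
    """Aggregate tile probabilities and pick the strongest non-normal class."""
    totals = [0.0] * len(prompt_embs)
    for emb in tile_embs:
        for cls, p in enumerate(tile_class_probs(emb, prompt_embs)):
            totals[cls] += p
    candidates = [c for c in range(len(prompt_embs)) if c != normal_idx]
    return max(candidates, key=lambda c: totals[c])

# Toy embeddings: class 0 = normal, classes 1 and 2 = two tumor subtypes.
prompts = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
tiles = [[0.9, 0.1, 0.0], [0.1, 0.2, 0.9], [0.0, 0.3, 0.8]]

print(tile_pseudo_labels(tiles, prompts, slide_label=2))
print(slide_prediction(tiles, prompts))
```

Because every tile keeps its own label and probability vector, the same per-tile outputs that drive the slide-level decision can also be laid out spatially to indicate where tumor-like regions sit on the slide.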

Data availability

The test datasets of TCGA-BRCA, TCGA-BRAIN (including TCGA-GBM and TCGA-LGG), TCGA-SARC, TCGA-UCS, and TCGA-THYM for cancer subtyping used in this study are available in the TCGA database (https://portal.gdc.cancer.gov/), the EBRAINS database (https://data-proxy.ebrains.eu/datasets/), and UBC-OCEAN. The test datasets for cancer region segmentation used in this study are available in CAMELYON16, PANDA, and AGGC22. The rare pediatric cancer WSI data (KidRare) generated in this study have been deposited in the Hugging Face database (https://huggingface.co/datasets/Firehdx233/KidRare/). The KidRare data are available under restricted access to ensure they are used exclusively for non-commercial, academic research purposes. Access can be obtained by submitting the data access request form detailing the user’s full name, affiliation, and intended research use. Source data generated in this study are provided with this paper.

Code availability

The source code for PathPT is available at https://github.com/MAGIC-AI4Med/PathPT.

Acknowledgements

This work was supported by the Scientific Research Innovation Capability Support Project for Young Faculty (ZYGXQNJSKYCXNLZCXM-I22 to W.X.), the National Natural Science Foundation of China (No. 24Z031503678 to W.X.), the Science and Technology Innovation Action Plan of Shanghai Municipality (No. 24QA2703800 to W.X.), and the China Postdoctoral Science Foundation (Certificate Number: 2023M741850 to X.Z.).

Author information

These authors contributed equally: Dexuan He, Xiao Zhou, Wenbin Guan.

Authors and Affiliations

School of Artificial Intelligence, Shanghai Jiao Tong University, Shanghai, China
Dexuan He, Liyuan Zhang, Sinuo Xu, Yanfeng Wang, Ya Zhang & Weidi Xie
Shanghai Artificial Intelligence Laboratory, Shanghai, China
Xiao Zhou, Ya Zhang & Weidi Xie
Department of Pathology, Xinhua Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
Wenbin Guan & Lifeng Wang
Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
Xiaoman Zhang
Department of Oral Pathology, Shanghai Ninth People’s Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
Ge Wang
Department of Pediatric Hematology/Oncology, Xinhua Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
Xiaojun Yuan
Department of Pathology, Shanghai Children’s Medical Center, Shanghai Jiao Tong University School of Medicine, Shanghai, China
Jing Ma
Clinical Research and Innovation Unit, Xinhua Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
Xin Sun
Department of Pediatric Cardiology, Xinhua Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China
Kun Sun
Engineering Research Centre of Techniques and Instruments for Diagnosis and Treatment of Congenital Heart Disease, Ministry of Education, Shanghai, China
Kun Sun
Institute of Artificial Intelligence for Medicine, School of Medicine, Shanghai Jiao Tong University, Shanghai, China
Ya Zhang

Contributions

D.H. and X.Z. processed the data, developed the code, performed the experiments, and wrote the manuscript. W.G., L.W., X.Y., and J.M. were responsible for pathology data collection and scanning. L.Z., S.X., and G.W. contributed to pathology data processing, with L.Z. additionally assisting in coding and experiments. X.M.Z. provided valuable advice and revisions for the manuscript. W.X. directly led and supervised the project. X.S., Y.W., K.S., and Y.Z. provided institutional leadership, overall project supervision, and guidance. All authors reviewed and approved the final manuscript.

Corresponding authors

Correspondence to Kun Sun, Ya Zhang or Weidi Xie.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks Issam El Naqa and Xiaoxi Pan for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information (PDF)

Transparent Peer Review file (PDF)

Reporting Summary (PDF)

Source data

Source Data (XLSX)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

He, D., Zhou, X., Guan, W. et al. Boosting pathology foundation models via few-shot prompt-tuning for rare cancer subtyping. Nat Commun (2026). https://doi.org/10.1038/s41467-026-71715-2

Received: 30 August 2025
Accepted: 26 March 2026
Published: 11 April 2026
DOI: https://doi.org/10.1038/s41467-026-71715-2