Abstract
Rare cancers comprise 20–25% of all malignancies (over 70% in pediatric oncology) but pose major diagnostic challenges owing to the limited availability of specialist expertise. While pathology vision-language models show promising zero-shot capabilities for common cancers, their performance on rare cancers remains limited. Existing multiple instance learning (MIL) methods rely solely on visual features, overlooking cross-modal knowledge and compromising the interpretability that is critical for rare cancer diagnosis. To address this, we propose PathPT, a framework that exploits vision-language foundation models through spatially aware visual aggregation and task-specific prompt tuning. PathPT converts whole-slide image (WSI)-level supervision into fine-grained tile-level guidance, preserving tumor localization and enabling cross-modal reasoning. Across eight rare and three common cancer datasets, spanning 56 subtypes and 3958 WSIs, PathPT consistently outperforms state-of-the-art methods under data-scarce settings. It achieves substantial gains in both subtyping accuracy and cancerous-region grounding, providing a scalable, interpretable AI solution for rare cancer subtyping where access to specialized expertise is limited.
Data availability
The test datasets for cancer subtyping used in this study are available from the TCGA database (TCGA-BRCA, TCGA-BRAIN (including TCGA-GBM and TCGA-LGG), TCGA-SARC, TCGA-UCS, and TCGA-THYM; https://portal.gdc.cancer.gov/), the EBRAINS database (https://data-proxy.ebrains.eu/datasets/), and UBC-OCEAN. The test datasets for cancer region segmentation used in this study are available from CAMELYON16, PANDA, and AGGC22. The rare pediatric cancer WSI data (KidRare) generated in this study have been deposited in the Hugging Face database (https://huggingface.co/datasets/Firehdx233/KidRare/). The KidRare data are available under restricted access to ensure that they are used exclusively for non-commercial, academic research purposes. Access can be obtained by submitting the data access request form, which details the user's full name, affiliation, and intended research use. Source data are provided with this paper.
Code availability
The source code for PathPT is available at https://github.com/MAGIC-AI4Med/PathPT.
Acknowledgements
This work was supported by the Scientific Research Innovation Capability Support Project for Young Faculty (ZYGXQNJSKYCXNLZCXM-I22 to W.X.), the National Natural Science Foundation of China (No. 24Z031503678 to W.X.), the Science and Technology Innovation Action Plan of Shanghai Municipality (No. 24QA2703800 to W.X.), and the China Postdoctoral Science Foundation (Certificate Number: 2023M741850 to X.Z.).
Author information
Contributions
D.H. and X.Z. processed the data, developed the code, performed the experiments, and wrote the manuscript. W.G., L.W., X.Y., and J.M. were responsible for pathology data collection and scanning. L.Z., S.X., and G.W. contributed to pathology data processing, with L.Z. additionally assisting in coding and experiments. X.M.Z. provided valuable advice and revisions for the manuscript. W.X. directly led and supervised the project. X.S., Y.W., K.S., and Y.Z. provided institutional leadership, overall project supervision, and guidance. All authors reviewed and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information: Nature Communications thanks Issam El Naqa and Xiaoxi Pan for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
He, D., Zhou, X., Guan, W. et al. Boosting pathology foundation models via few-shot prompt-tuning for rare cancer subtyping. Nat Commun (2026). https://doi.org/10.1038/s41467-026-71715-2
DOI: https://doi.org/10.1038/s41467-026-71715-2


