Abstract
Pathological examination stands as the cornerstone of cancer diagnosis, impacting millions worldwide annually. With a global shortage of pathologists, artificial intelligence (AI) has rapidly emerged to automate the diagnostic process. However, conventional AI models require substantial labeled data for each disease, posing huge challenges to scalability and practicality. To address this, we introduce PRET (pan-cancer recognition without examples training), a few-shot system that achieves flexible, scalable, and effective cancer recognition across diverse organs, hospitals and tasks without training. Evaluated on 23 international benchmarks comprising 4,484 whole-slide images, our method outperforms existing approaches across 20 tasks, achieving over 97% area under the curve on 15 benchmarks with a maximum improvement of 36.76%. Notably, PRET delivers clinical-grade diagnostic performance in lymph node metastasis detection using only eight slide examples, outperforming 11 pathologists. By offering a flexible and cost-effective solution for pan-cancer recognition, PRET paves the way for accessible and equitable AI-based pathology systems, particularly benefiting minority populations and underserved regions.
Data availability
The in-house datasets from GDPH and QPCH are publicly available (https://huggingface.co/datasets/yili7eli/PRET/tree/main), including ESCC, PTC, CRC, GC, LC, BC, lymphoma, NSCLC-HQ and PTC-QP. The visual prompts for both the in-house and open datasets were also released, with data lists to reproduce the data splits. The public datasets CAMELYON16 (ref. 28) and CAMELYON17 (ref. 30) are available online (https://camelyon16.grand-challenge.org/), and CAMELYON16-C (ref. 31) was generated by applying scanning corruptions, with code available on GitHub (https://github.com/superjamessyx/robustness_benchmark). TCGA datasets can be found at the National Institutes of Health Genomic Data Commons (https://portal.gdc.cancer.gov/), including NSCLC, RCC, ESCA and SARC. Source data are provided with this paper.
Code availability
All involved model weights and Python packages are available online. Our code is publicly available on GitHub (https://github.com/xmed-lab/PRET), with detailed instructions, comments and evaluation scripts.
References
Ferlay, J. et al. Cancer statistics for the year 2020: an overview. Int. J. Cancer 149, 778–789 (2021).
Benediktsson, H., Whitelaw, J. & Roy, I. Pathology services in developing countries: a challenge. Arch. Pathol. Lab. Med. 131, 1636–1639 (2007).
Metter, D. M., Colgan, T. J., Leung, S. T., Timmons, C. F. & Park, J. Y. Trends in the US and Canadian pathologist workforces from 2007 to 2017. JAMA Netw. Open 2, e194337 (2019).
Märkl, B., Füzesi, L., Huss, R., Bauer, S. & Schaller, T. Number of pathologists in Germany: comparison with European countries, USA, and Canada. Virchows Arch. 478, 335–341 (2021).
Bera, K., Schalper, K. A., Rimm, D. L., Velcheti, V. & Madabhushi, A. Artificial intelligence in digital pathology—new tools for diagnosis and precision oncology. Nat. Rev. Clin. Oncol. 16, 703–715 (2019).
Song, A. H. et al. Artificial intelligence for digital and computational pathology. Nat. Rev. Bioeng. 1, 930–949 (2023).
Campanella, G. et al. Clinical-grade computational pathology using weakly supervised deep learning on whole slide images. Nat. Med. 25, 1301–1309 (2019).
Huang, S.-C. et al. Deep neural network trained on gigapixel images improves lymph node metastasis detection in clinical settings. Nat. Commun. 13, 3347 (2022).
Kundra, R. et al. OncoTree: a cancer classification system for precision oncology. JCO Clin. Cancer Inform. 5, 221–230 (2021).
Lu, M. Y. et al. Data-efficient and weakly supervised computational pathology on whole-slide images. Nat. Biomed. Eng. 5, 555–570 (2021).
Lu, M. Y. et al. A visual-language foundation model for computational pathology. Nat. Med. 30, 863–874 (2024).
Wang, X. et al. A pathology foundation model for cancer diagnosis and prognosis prediction. Nature 634, 970–978 (2024).
Xu, H. et al. A whole-slide foundation model for digital pathology from real-world data. Nature 630, 181–188 (2024).
Arslan, S. et al. A systematic pan-cancer study on deep learning-based prediction of multi-omic biomarkers from routine pathology images. Commun. Med. 4, 48 (2024).
Huang, Z., Bianchi, F., Yuksekgonul, M., Montine, T. J. & Zou, J. A visual–language foundation model for pathology image analysis using medical Twitter. Nat. Med. 29, 2307–2316 (2023).
Chen, R. J. et al. Towards a general-purpose foundation model for computational pathology. Nat. Med. 30, 850–862 (2024).
Kang, M., Song, H., Park, S., Yoo, D. & Pereira, S. Benchmarking self-supervised learning on diverse pathology datasets. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (eds Brown, M. S. et al.) 3344–3354 (IEEE, 2023).
Ilse, M., Tomczak, J. & Welling, M. Attention-based deep multiple instance learning. In Proc. 35th International Conference on Machine Learning (eds Dy, J. & Krause, A.) 2127–2136 (PMLR, 2018).
Chen, R. J. et al. Scaling vision transformers to gigapixel images via hierarchical self-supervised learning. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (eds Chellappa, R. et al.) 16144–16155 (IEEE, 2022).
Snell, J., Swersky, K. & Zemel, R. Prototypical networks for few-shot learning. In Proc. 31st International Conference on Neural Information Processing Systems (eds von Luxburg, U. et al.) 4080–4090 (ACM, 2017).
Wang, Y., Chao, W.-L., Weinberger, K. Q. & van der Maaten, L. SimpleShot: revisiting nearest-neighbor classification for few-shot learning. Preprint at https://arxiv.org/abs/1911.04623 (2019).
Brown, T. et al. Language models are few-shot learners. In Proc. 34th International Conference on Neural Information Processing Systems (eds Larochelle, H. et al.) 1877–1901 (Curran Associates, Inc., 2020).
Wang, X., Wang, W., Cao, Y., Shen, C. & Huang, T. Images speak in images: a generalist painter for in-context visual learning. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (eds Brown, M. S. et al.) 6830–6839 (IEEE, 2023).
Zhang, J., Wang, B., Li, L., Nakashima, Y. & Nagahara, H. Instruct me more! Random prompting for visual in-context learning. In Proc. IEEE/CVF Winter Conference on Applications of Computer Vision (eds Souvenir, R. et al.) 2585–2594 (IEEE, 2024).
Sheng, D. et al. Towards more unified in-context visual understanding. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (eds Camps, O. et al.) 13362–13372 (IEEE, 2024).
Zhao, H. et al. MMICL: empowering vision-language model with multi-modal in-context learning. In Proc. 12th International Conference on Learning Representations (ed. Kim, B.) (ICLR, 2024).
Ferber, D. et al. In-context learning enables multimodal large language models to classify cancer pathology images. Nat. Commun. 15, 10104 (2024).
Bejnordi, B. E. et al. Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. JAMA 318, 2199–2210 (2017).
Qu, L. et al. The rise of AI language pathologists: exploring two-level prompt learning for few-shot weakly-supervised whole slide image classification. In Proc. 37th International Conference on Neural Information Processing Systems (eds Oh, A. et al.) 67551–67564 (Curran Associates, Inc., 2023).
Litjens, G. et al. 1399 H&E-stained sentinel lymph node sections of breast cancer patients: the CAMELYON dataset. Gigascience 7, giy065 (2018).
Zhang, Y. et al. Benchmarking the robustness of deep neural networks to common corruptions in digital pathology. In Proc. 25th International Conference on Medical Image Computing and Computer-Assisted Intervention (eds Wang, L. et al.) 242–252 (Springer, 2022).
He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (eds Tuytelaars, T. et al.) 770–778 (IEEE, 2016).
Deng, J. et al. ImageNet: a large-scale hierarchical image database. In Proc. IEEE Conference on Computer Vision and Pattern Recognition (eds Huttenlocher, D. et al.) 248–255 (IEEE, 2009).
Vorontsov, E. et al. A foundation model for clinical-grade computational pathology and rare cancers detection. Nat. Med. 30, 2924–2935 (2024).
Ma, J. et al. A generalizable pathology foundation model using a unified knowledge distillation pretraining framework. Nat. Biomed. Eng. https://doi.org/10.1038/s41551-025-01488-4 (2025).
Ding, T. et al. A multimodal whole-slide foundation model for pathology. Nat. Med. 31, 3749–3761 (2025).
Shao, Z. et al. TransMIL: transformer based correlated multiple instance learning for whole slide image classification. In Proc. 35th International Conference on Neural Information Processing Systems (eds Ranzato, M. et al.) 2136–2147 (Curran Associates, Inc., 2021).
Li, H. et al. Task-specific fine-tuning via variational information bottleneck for weakly-supervised pathology whole slide image classification. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (eds Brown, M. S. et al.) 7454–7463 (IEEE, 2023).
Tang, W. et al. Multiple instance learning framework with masked hard instance mining for whole slide image classification. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (eds Brown, M. S. et al.) 4078–4087 (IEEE, 2023).
Chen, Y. et al. dMIL-Transformer: multiple instance learning via integrating morphological and spatial information for lymph node metastasis classification. IEEE J. Biomed. Health Inform. 27, 4433–4443 (2023).
Xiang, J. et al. A vision-language foundation model for precision oncology. Nature 638, 769–778 (2025).
Coudray, N. et al. Classification and mutation prediction from non-small cell lung cancer histopathology images using deep learning. Nat. Med. 24, 1559–1567 (2018).
Liang, J. et al. Deep learning supported discovery of biomarkers for clinical prognosis of liver cancer. Nat. Mach. Intell. 5, 408–420 (2023).
Zhou, Y., Li, X., Wang, Q. & Shen, J. Visual in-context learning for large vision-language models. In Findings of the Association for Computational Linguistics (eds Ku, L.-W. et al.) 15890–15902 (ACL, 2024).
Li, C. et al. LLaVA-med: training a large language-and-vision assistant for biomedicine in one day. In Proc. 37th International Conference on Neural Information Processing Systems (eds Oh, A. et al.) 28541–28564 (Curran Associates, Inc., 2023).
Li, Y. et al. Few-shot lymph node metastasis classification meets high performance on whole slide images via the informative non-parametric classifier. In Proc. 25th International Conference on Medical Image Computing and Computer-Assisted Intervention (eds Linguraru, M. G. et al.) 109–119 (Springer, 2024).
Caron, M. et al. Emerging properties in self-supervised vision transformers. In Proc. IEEE/CVF International Conference on Computer Vision (eds Berg, T. et al.) 9650–9660 (IEEE, 2021).
Dosovitskiy, A. et al. An image is worth 16x16 words: Transformers for image recognition at scale. In Proc. 9th International Conference on Learning Representations (ed. Mohamed, S.) (ICLR, 2021).
Xu, Y. et al. A multimodal knowledge-enhanced whole-slide pathology foundation model. Nat. Commun. 16, 11406 (2025).
Wang, X. et al. Transformer-based unsupervised contrastive learning for histopathological image classification. Med. Image Anal. 81, 102559 (2022).
Oquab, M. et al. DINOv2: Learning robust visual features without supervision. Trans. Mach. Learn. Res. https://openreview.net/forum?id=a68SUt6zFt (2024).
Radford, A. et al. Learning transferable visual models from natural language supervision. In Proc. International Conference on Machine Learning (eds Meila, M. & Zhang, T.) 8748–8763 (PMLR, 2021).
Ding, J. et al. LongNet: scaling transformers to 1,000,000,000 tokens. In Proc. 10th International Conference on Learning Representations (eds Hofmann, K. & Rush, A.) (ICLR, 2023).
Komura, D. et al. Universal encoding of pan-cancer histology by deep texture representations. Cell Rep. 38, 110424 (2022).
Riasatian, A. et al. Fine-tuning and training of DenseNet for histopathology image representation using TCGA diagnostic slides. Med. Image Anal. 70, 102032 (2021).
Lu, M. Y. et al. AI-based pathology predicts origins for cancers of unknown primary. Nature 594, 106–110 (2021).
Acknowledgements
Y.L., T.X., Qixiang Zhang and X.L. received support from the Research Grants Council (RGC) of the Hong Kong Special Administrative Region (project nos. R6005-24, AoE/E-601/24-N and T45-401/22-N), the Hong Kong Joint Research Scheme of the National Natural Science Foundation of China/RGC (project no. N_HKUST654/24), the Hong Kong Innovation and Technology Fund under Project PRP/041/22FX and the National Natural Science Foundation of China (grant no. 62306254). T.X. also received support from the Hong Kong PhD Fellowship Scheme. Z.N. and Qingling Zhang are supported by grants from the Key R&D Program Projects in Guangdong Province (2021B0101420005 to Qingling Zhang), the National Natural Science Foundation of China (grant no. 82173033 to Qingling Zhang), the High-level Hospital Construction Project (DFJHBF202108 and YKY-KF202204 to X.-W.B. and Qingling Zhang) and the Guangdong Provincial Key Laboratory of AI in Medical Image Analysis and Application (2022B1212010011 to Z.L.). K.-H.Y. is supported in part by the National Institute of General Medical Sciences (grant no. R35GM142879), the National Heart, Lung and Blood Institute (grant no. R01HL174679), the Department of Defense Peer-Reviewed Cancer Research Program Career Development Award (HT9425-231-0523), the American Cancer Society Research Scholar Grant (RSG-24-1253761-01-ESED) and the Harvard Medical School Dean’s Innovation Award. K.Z. is supported by the National Natural Science Foundation of China (W2431057), the Macau Science and Technology Development Fund, Macau (0007/2020/AFJ, 0070/2020/A2 and 0003/2021/AKP) and Guangzhou National Laboratory (YW-SLJC0201). We would like to thank Q. Shao from the Hong Kong University of Science and Technology for his valuable suggestions.
Author information
Authors and Affiliations
Contributions
X.L. and Qingling Zhang planned the study. Y.L. designed the PRET methods guided by X.L. and conducted the experiments. Z.N. and Qingling Zhang collected the in-house data required for this study and organized the data labeling. T.X. and Qixiang Zhang implemented some baseline methods. Y.L., X.L., Qingling Zhang and K.-H.Y. contributed to the experimental design. The in-house WSIs were collected by Z.N. and Qingling Zhang, with support from M.Y. and X.-W.B. Data labeling was performed by Z.N., Z.L., F.F., B.Z., X.Q., L.S., J.Q., L.X., C.F., T.Q. and Q.W., who also accounted for the labeling costs. X.L. supervised the study. Y.L. wrote the paper, with contributions from Z.N., X.L. and K.Z. All authors discussed the results and contributed to the final manuscript.
Corresponding authors
Ethics declarations
Competing interests
Qingling Zhang, Z.N., X.L. and Y.L. are inventors on a pending Chinese patent application (application no. 202510130037.4, ‘A pan-cancer AI pathology diagnosis system using few examples without training’; applicant: GDPH, Guangdong Academy of Medical Sciences) related to the PRET algorithm described in this paper. K.-H.Y. is an inventor on US patent 10,832,406. This patent is assigned to Harvard University and is not directly related to this paper. The other authors declare no competing interests.
Peer review
Peer review information
Nature Cancer thanks Jana Lipkova, Sheng Wang and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Method details of PRET and comparison with other settings.
a. PRET contains six major components: the extractor, tagger, miner, classifier, aggregator and post-processor. b. Linear probing methods require task-specific parameter fine-tuning; the fine-tuned parameters reside in the multiple instance learning (MIL) module and the linear classifier. Both the training and testing examples are embedded into slide-level global features. c. The KNN classifier likewise uses slide-level features for each example slide, without local patch-level features; the MIL module becomes global pooling and the linear classifier becomes a KNN classifier. d. MI-Prototype20 and MI-SimpleShot16 apply a prototypical classifier, where the prototypes are the mean features of the examples. Note that the prototypes are dataset-level features whose number equals the number of classes; they are used to compute similarities with the patch-level features of the test slide via a top-k operation. e. The proposed PRET preserves the rich information of patch-level features from both the example and test slides, with novel modules to leverage the visual in-context: a visual in-context tagger to process visual prompts, a miner to explore discriminative test patches, an informative in-context classifier to classify patches, and an attention aggregator to obtain the slide-level prediction. f. Comparison among methods in multiple aspects. PRET is unique in exploiting local information in the example slides and in its effective prompt design and utilization, with further advantages in being training-free, supporting pan-cancer recognition and segmentation, achieving high performance, and using local information in the test slide.
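The prototypical baseline described in panel d can be sketched as follows. This is an illustrative reimplementation, not the authors' code: class prototypes are mean example features, and a test slide is scored by the mean of its top-k patch-to-prototype similarities. The function name and the cosine-similarity choice are assumptions for illustration.

```python
import numpy as np

def prototype_topk_score(example_feats, example_labels, test_feats, k=20):
    """Score a test slide against dataset-level class prototypes
    (mean example features), using the mean of the top-k patch
    similarities per class, as in the MI-Prototype/MI-SimpleShot setup."""
    classes = np.unique(example_labels)
    # dataset-level prototypes: one mean feature vector per class
    prototypes = np.stack([example_feats[example_labels == c].mean(axis=0)
                           for c in classes])
    # cosine similarity between every test patch and every prototype
    t = test_feats / np.linalg.norm(test_feats, axis=1, keepdims=True)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    sim = t @ p.T                       # shape: (num_patches, num_classes)
    # slide-level score: mean of the top-k patch similarities per class
    topk = np.sort(sim, axis=0)[-k:]
    return classes, topk.mean(axis=0)
```

In contrast to this baseline, PRET (panel e) keeps patch-level features on both sides rather than collapsing the examples into a single prototype per class.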
Extended Data Fig. 2 Dataset characteristics and labeling details.
a. Statistics of patient age for the proposed datasets (n = 906). b. Statistics of patient sex for the proposed datasets (n = 906). c. Density map of tumor size (n = 565). The x-axis indicates the number of tumor patches per slide, reflecting the tumor size, with frequency on the y-axis. d. Examples of slides and visual prompts. The slide label is a single number, while the other visual prompts are boxes or outlines, as shown in these examples.
Extended Data Fig. 3 Pan-cancer recognition performances with multiple visual prompts.
The proposed PRET surpasses the baselines on almost all datasets and prompts in terms of the AUC metric. The gray error bars indicate the standard deviation, and the gray dots are the individual results for n = 5 independent experiments. The p value indicates the significance of PRET outperforming the best baseline under a two-sided Wilcoxon test, reporting the median value across all repeated experiments and varied shots.
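The significance reporting above relies on a two-sided Wilcoxon signed-rank test over paired per-run results. Below is a minimal pure-Python sketch of the exact version of that test (assuming no tied absolute differences); the paired AUC values in the usage example are illustrative, not taken from the paper.

```python
from itertools import product

def wilcoxon_two_sided(x, y):
    """Exact two-sided Wilcoxon signed-rank test for small paired samples
    with no tied absolute differences (a minimal pure-Python sketch; a
    real analysis would use a statistics library)."""
    d = [a - b for a, b in zip(x, y) if a != b]
    n = len(d)
    total = n * (n + 1) // 2
    # rank the differences by absolute value (rank 1 = smallest)
    rank = {v: r + 1 for r, v in enumerate(sorted(d, key=abs))}
    w = min(sum(rank[v] for v in d if v > 0),
            sum(rank[v] for v in d if v < 0))
    # enumerate all 2^n sign assignments for the exact null distribution
    hits = 0
    for signs in product([0, 1], repeat=n):
        w_plus = sum(r for bit, r in zip(signs, range(1, n + 1)) if bit)
        if min(w_plus, total - w_plus) <= w:
            hits += 1
    return hits / 2 ** n

# Illustrative paired AUCs (not from the paper): PRET vs. best baseline
pret = [0.970, 0.965, 0.980, 0.955, 0.972, 0.961, 0.983, 0.949]
base = [0.900, 0.910, 0.930, 0.890, 0.905, 0.915, 0.920, 0.895]
p = wilcoxon_two_sided(pret, base)  # 2/256 = 0.0078125: significant at 0.05
```

Because all eight illustrative differences favor PRET, the exact two-sided p value is 2/2^8.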
Extended Data Fig. 4 Visualization and case study for cancer screening and subtyping.
The classification tasks do not require accurate pixel-level localization, unlike the segmentation task; instead, the score maps should be distinguishable among classes. The score maps between positive and negative examples are much more distinguishable for the proposed PRET than for the other methods, so PRET produces fewer false positives. These cases include ESCC for cancer screening and NSCLC for cancer subtyping, involving multiple visual prompt types, where L indicates the slide label, and B and R denote the bounding box and rough mask, respectively.
Extended Data Fig. 5 Experiment results about foundation models, hyperparameters, and segmentation.
a. PRET is designed for foundation models, showing much larger performance gaps when compared with a non-foundation model (ResNet-5032 pretrained on ImageNet). Results are shown for 1–4 shots. The error bars show the standard deviation across n = 5 independent experiments, and the gray dots are the individual results for different data splits. b. Hyperparameter grid search on the NSCLC validation set with slide labels at 8-shot. The result fluctuates within 2.5%, showcasing the model's robustness. Among these hyperparameters, the uncertain range factor te in Supplementary Algorithm 1 for the local visual in-context tagger (LVIT) and the discriminative instance miner (DIM) is relatively influential, since it controls the quantity of visual in-context. {v1, v2, v3, v4, v5} are {0, 0.02, 0.04, 0.06, 0.08} for the uncertain range factor of the LVIT on the examples, {0.1, 0.15, 0.2, 0.25, 0.3} for that of the DIM on the test slides, and {20, 30, 40, 50, 60} for the top k of the informative in-context classifier (IIC). They are set to {0.86, 0.87, 0.88, 0.89, 0.9}, {1000, 2000, 3000, 4000, 5000} and {1, 5, 10, 20, 30} for the related threshold tr, the number of high-score instances n and the softmax temperature τ in the attention aggregator (AA), respectively. c. The results on different data splits and hyperparameter groups. Thanks to this robustness, default hyperparameters without grid search, as well as those searched on the RCC dataset, also perform well, showing performances close to the searched hyperparameter group. Different data splits reflect varied example quality, which is less robust than the hyperparameters; thus we report the average results for better stability. d. The proposed PRET surpasses the baselines on almost all datasets and prompts in terms of the DICE metric. The gray error bars indicate the standard deviation across n = 5 independent experiments, and the gray dots are the individual results for different data splits.
The p value indicates the significance of PRET outperforming the best baseline under a two-sided Wilcoxon test, reporting the median value across all repeated experiments and varied shots.
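The grid search in panel b can be sketched generically as follows. The value lists are those quoted in the caption, but the key names, the `evaluate` callback and the exhaustive Cartesian-product strategy are illustrative assumptions, not the authors' implementation.

```python
from itertools import product

# Hyperparameter grid quoted in panel b; the key names are illustrative
# stand-ins for the symbols in Supplementary Algorithm 1.
GRID = {
    "te_lvit": [0, 0.02, 0.04, 0.06, 0.08],    # LVIT uncertain range (examples)
    "te_dim":  [0.1, 0.15, 0.2, 0.25, 0.3],    # DIM uncertain range (test slides)
    "top_k":   [20, 30, 40, 50, 60],           # IIC top k
    "tr":      [0.86, 0.87, 0.88, 0.89, 0.9],  # related threshold
    "n_high":  [1000, 2000, 3000, 4000, 5000], # high-score instance count
    "tau":     [1, 5, 10, 20, 30],             # AA softmax temperature
}

def grid_search(evaluate, grid):
    """Score every configuration in the Cartesian product of the grid and
    return the best one; `evaluate` is a hypothetical callback returning,
    for example, validation AUC for a single configuration."""
    names = list(grid)
    best_cfg, best_score = None, float("-inf")
    for values in product(*(grid[n] for n in names)):
        cfg = dict(zip(names, values))
        score = evaluate(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```

Because PRET is training-free, each `evaluate` call here would be a forward pass over the validation set only, which is what makes such a search affordable.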
Extended Data Fig. 6 Visualization and comparison of tumor segmentation task based on weak visual prompts.
Qualitative illustrations comparing PRET to MI-Prototype and MI-SimpleShot on the PTC and BC datasets using 8 examples. The tumor region is colored yellow. Note that the blue area in the first “BC-L” row is staining on the slide from a blue marker pen.
Extended Data Fig. 7 Visualization of in-context tagger.
Our method successfully tags the instances of the examples that are close to the manually labeled regions via weak visual prompts. This in-context tagger module is used to label the visual in-context of the examples, rather than to perform weak segmentation on the test slides. Different visual prompts support varied conditions and provide more choices for pathologists. The bounding box and rough mask show fewer responses outside the manual mask, with better score maps, while the slide label produces more positive regions. The dataset used is NSCLC (subtyping LUAD and LUSC) in the 8-shot setting.
Extended Data Fig. 8 PRET is scalable and achieves comparable performance to many-shot methods with much less data.
a. PRET is scalable when more examples are applied. We increased the example size (+16 and +32) and observed continuous growth in performance. These results indicate that PRET is scalable and maintains its advantages over baseline methods (p < 0.001). The involved datasets are RCC, NSCLC and lymphoma under the slide label, to support more examples. b. PRET achieves performance comparable to many-shot fine-tuned methods using much less data (for example, 8 vs. 128 shots or the full data). The many-shot methods are mainstream fine-tuned multiple instance learning methods, including ABMIL18, TransMIL37, CLAM-SB10 and CLAM-MB10. PRET applies only 8 examples per class (8-shot) without training, while the many-shot methods are trained with much more data, at the 128-shot and full data scales. The prompt type corresponds to Fig. 2b, and the calculation of the p value follows the above principle. PRET achieves higher results than many-shot methods at the same data scale on all three benchmarks. Moreover, our performance is higher than that of many-shot methods at the full data scale on the CAMELYON1628 lymph node metastasis detection task (p < 0.001), while achieving comparable results on the rest of the benchmarks (p < 0.001). a, b. The gray error bars indicate the standard deviation across n = 5 independent experiments, and the gray dots are the individual results. The p value indicates the significance of PRET outperforming the best baseline under a two-sided Wilcoxon test, reporting the median value across all repeated experiments and varied shots.
Supplementary information
Supplementary Information
Supplementary Algorithms, Baselines and Tables 1–10.
Source data
Source Data Fig. 1
Statistical source data.
Source Data Fig. 2
Statistical source data.
Source Data Fig. 3
Statistical source data.
Source Data Fig. 4
Statistical source data.
Source Data Fig. 5
Statistical source data.
Source Data Extended Data Fig. 2
Statistical source data.
Source Data Extended Data Fig. 3
Statistical source data.
Source Data Extended Data Fig. 5
Statistical source data.
Source Data Extended Data Fig. 8
Statistical source data.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, Y., Ning, Z., Xiang, T. et al. PRET is a few-shot system for pan-cancer recognition without example training. Nat Cancer (2026). https://doi.org/10.1038/s43018-026-01141-2
DOI: https://doi.org/10.1038/s43018-026-01141-2