Leveraging large language and vision models for knowledge extraction from large-scale image–text colonoscopy records

Abstract

The development of artificial intelligence systems for colonoscopy analysis often necessitates expert-annotated image datasets. However, limitations in dataset size and diversity impede model performance and generalization. Image–text colonoscopy records from routine clinical practice, comprising millions of images and text reports, serve as a valuable data source, although annotating them is labour intensive. Here we leverage recent advancements in large language and vision models and propose EndoKED, a data mining paradigm for deep knowledge extraction and distillation. EndoKED automates the transformation of raw colonoscopy records into image datasets with pixel-level annotation. We apply EndoKED to multicentre datasets of raw colonoscopy records (~1 million images), showing its superior performance in detecting polyps at the report and image levels, as well as annotating polyps at the pixel level. The state-of-the-art performance and generalization ability of polyp segmentation models are achieved through EndoKED pretraining. Furthermore, the EndoKED vision backbone enables data-efficient learning for optical biopsy, achieving expert-level performance in internal, external and prospective validation datasets.
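The abstract describes labels at three levels (report, image and pixel) but does not spell out how they are linked. As an illustrative sketch only, and not the authors' implementation, the relation between a report-level label and image-level predictions can be written under a standard multiple-instance assumption: a record is positive if at least one of its images shows a polyp. The function names, the max-pooling rule and the 0.5 threshold are all hypothetical.

```python
from typing import List


def report_level_score(image_scores: List[float]) -> float:
    """Max pooling (assumed): a record is as suspicious as its most suspicious image."""
    if not image_scores:
        raise ValueError("a record needs at least one image")
    return max(image_scores)


def propagate_report_label(
    image_scores: List[float],
    report_has_polyp: bool,
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> List[int]:
    """Derive per-image pseudo-labels that stay consistent with the report label."""
    if not image_scores:
        return []
    if not report_has_polyp:
        # A negative report implies that every image in the record is negative.
        return [0] * len(image_scores)
    labels = [1 if score >= threshold else 0 for score in image_scores]
    if not any(labels):
        # A positive report must contain at least one positive image,
        # so fall back to the highest-scoring image.
        labels[image_scores.index(max(image_scores))] = 1
    return labels
```

For example, `propagate_report_label([0.1, 0.2, 0.3], True)` marks only the third image as positive, because the report says a polyp was found even though no image passes the threshold.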

Fig. 1: Overview of the EndoKED design and applications to polyp diagnosis.
Fig. 2: Network design for cross-scale knowledge distillation.
Fig. 3: Datasets for training and validating the EndoKED framework.
Fig. 4: Performance of the EndoKED framework in report abstraction, image-level detection, pixel-level segmentation and model pretraining.
Fig. 5: Accuracy and data efficiency of the EndoKED-PATH optical biopsy model.
Fig. 6: Visualization of the distribution of polyp images in the feature space using Uniform Manifold Approximation and Projection (UMAP).

Data availability

The main data supporting the findings of this study are available within the article and Supplementary Information. The public datasets used for segmentation can be accessed through the following web links: CVC-ClinicDB (https://polyp.grand-challenge.org/CVCClinicDB/), Kvasir-SEG (https://datasets.simula.no/kvasir-seg/), ETIS (https://polyp.grand-challenge.org/ETISLarib/), CVC-ColonDB (http://vi.cvc.uab.es/colon-qa/cvccolondb/), CVC-300 (https://pages.cvc.uab.es/CVC-Colon/) and PolypGen (https://www.synapse.org/#!Synapse:syn26376615). The large-scale clinical pretraining datasets are not publicly available owing to patient privacy constraints. Access to these restricted colonoscopy reports for non-commercial use may be granted upon reasonable request to corresponding author S.W. All requests will be reviewed and are subject to a formal data sharing agreement, consent from each participating medical centre and ethics approval. Source data are provided with this paper.
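For convenience, the public segmentation datasets listed above can be kept in a small machine-readable registry. This is a minimal sketch: the URLs are copied from the statement above, while the dictionary name and the `dataset_url` helper are hypothetical and not part of the EndoKED repository.

```python
# Public polyp segmentation datasets named in the data availability statement.
PUBLIC_SEGMENTATION_DATASETS = {
    "CVC-ClinicDB": "https://polyp.grand-challenge.org/CVCClinicDB/",
    "Kvasir-SEG": "https://datasets.simula.no/kvasir-seg/",
    "ETIS": "https://polyp.grand-challenge.org/ETISLarib/",
    "CVC-ColonDB": "http://vi.cvc.uab.es/colon-qa/cvccolondb/",
    "CVC-300": "https://pages.cvc.uab.es/CVC-Colon/",
    "PolypGen": "https://www.synapse.org/#!Synapse:syn26376615",
}


def dataset_url(name: str) -> str:
    """Look up a dataset URL by name, ignoring case and surrounding whitespace."""
    key = name.strip().lower()
    for dataset, url in PUBLIC_SEGMENTATION_DATASETS.items():
        if dataset.lower() == key:
            return url
    raise KeyError(f"unknown dataset: {name!r}")
```

The clinical pretraining datasets are deliberately absent from this registry, because they are available only on request.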

Code availability

The code (ref. 51) and trained models of this study are publicly available via GitHub at https://github.com/shuowang26/EndoKED.

References

  1. Arnold, M. et al. Global patterns and trends in colorectal cancer incidence and mortality. Gut 66, 683–691 (2017).

  2. Bray, F. et al. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 68, 394–424 (2018).

  3. Atkin, W. et al. Long term effects of once-only flexible sigmoidoscopy screening after 17 years of follow-up: the UK Flexible Sigmoidoscopy Screening randomised controlled trial. Lancet 389, 1299–1311 (2017).

  4. Le Berre, C. et al. Application of artificial intelligence to gastroenterology and hepatology. Gastroenterology 158, 76–94.e2 (2020).

  5. Ali, S. Where do we stand in AI for endoscopic image analysis? Deciphering gaps and future directions. NPJ Digit. Med. 5, 184 (2022).

  6. Messmann, H. et al. Expected value of artificial intelligence in gastrointestinal endoscopy: European Society of Gastrointestinal Endoscopy (ESGE) Position Statement. Endoscopy 54, 1211–1231 (2022).

  7. Ali, S. et al. A multi-centre polyp detection and segmentation dataset for generalisability assessment. Sci. Data 10, 75 (2023).

  8. Ali, S. et al. Assessing generalisability of deep learning-based polyp detection and segmentation methods through a computer vision challenge. Sci. Rep. 14, 2032 (2024).

  9. Ali, S. et al. Deep learning for detection and segmentation of artefact and disease instances in gastrointestinal endoscopy. Med. Image Anal. 70, 102002 (2021).

  10. Ahmad, O. F. et al. Artificial intelligence and computer-aided diagnosis in colonoscopy: current evidence and future directions. Lancet Gastroenterol. Hepatol. 4, 71–80 (2019).

  11. Gupta, M. & Mishra, A. A systematic review of deep learning based image segmentation to detect polyp. Artif. Intell. Rev. 57, 7 (2024).

  12. Lieberman, D. et al. Standardized colonoscopy reporting and data system: report of the Quality Assurance Task Group of the National Colorectal Cancer Roundtable. Gastrointest. Endosc. 65, 757–766 (2007).

  13. Bernal, J. et al. Comparative validation of polyp detection methods in video colonoscopy: results from the MICCAI 2015 Endoscopic Vision Challenge. IEEE Trans. Med. Imaging 36, 1231–1249 (2017).

  14. Hassan, C., Balsamo, G., Lorenzetti, R., Zullo, A. & Antonelli, G. Artificial intelligence allows leaving-in-situ colorectal polyps. Clin. Gastroenterol. Hepatol. 20, 2505–2513.e4 (2022).

  15. Radford, A. et al. Learning transferable visual models from natural language supervision. In Proc. 38th International Conference on Machine Learning (eds Meila, M. et al.) 8748–8763 (PMLR, 2021).

  16. Tiu, E. et al. Expert-level detection of pathologies from unannotated chest X-ray images via self-supervised learning. Nat. Biomed. Eng. 6, 1399–1406 (2022).

  17. Zhang, X., Wu, C., Zhang, Y., Xie, W. & Wang, Y. Knowledge-enhanced visual-language pre-training on chest radiology images. Nat. Commun. 14, 4542 (2023).

  18. Moor, M. et al. Foundation models for generalist medical artificial intelligence. Nature 616, 259–265 (2023).

  19. Adams, L. C. et al. Leveraging GPT-4 for post hoc transformation of free-text radiology reports into structured reporting: a multilingual feasibility study. Radiology 307, e230725 (2023).

  20. Kirillov, A. et al. Segment anything. In Proc. IEEE/CVF International Conference on Computer Vision (eds Agapito, L. et al.) 4015–4026 (IEEE, 2023).

  21. Wu, J. et al. Medical SAM adapter: adapting Segment Anything Model for medical image segmentation. Med. Image Anal. 102, 103547 (2025).

  22. Brown, T. et al. Language models are few-shot learners. In Advances in Neural Information Processing Systems (eds Larochelle, H. et al.) 1877–1901 (Curran Associates, 2020).

  23. Bai, Y. et al. Constitutional AI: harmlessness from AI feedback. Preprint at https://arxiv.org/abs/2212.08073 (2022).

  24. Wang, W. et al. ERNIE 3.0: large-scale knowledge enhanced pre-training for language understanding and generation. Preprint at https://arxiv.org/abs/2107.02137 (2021).

  25. Pogorelov, K. et al. KVASIR: a multi-class image dataset for computer aided gastrointestinal disease detection. In Proc. 8th ACM on Multimedia Systems Conference (eds Chen, K. T. et al.) 164–169 (ACM, 2017).

  26. Bernal, J. et al. WM-DOVA maps for accurate polyp highlighting in colonoscopy: validation vs. saliency maps from physicians. Comput. Med. Imaging Graph. 43, 99–111 (2015).

  27. Bernal, J., Sánchez, J. & Vilariño, F. Towards automatic polyp detection with a polyp appearance model. Pattern Recognit. 45, 3166–3182 (2012).

  28. Vázquez, D. et al. A benchmark for endoluminal scene segmentation of colonoscopy images. J. Healthc. Eng. 2017, 4037190 (2017).

  29. Silva, J., Histace, A., Romain, O., Dray, X. & Granado, B. Toward embedded detection of polyps in WCE images for early diagnosis of colorectal cancer. Int. J. Comput. Assist. Radiol. Surg. 9, 283–293 (2014).

  30. Falk, T. et al. U-Net: deep learning for cell counting, detection, and morphometry. Nat. Methods 16, 67–70 (2019).

  31. Zhou, Z. et al. UNet++: a nested U-Net architecture for medical image segmentation. In Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support (eds Stoyanov, D. et al.) 3–11 (Springer, 2018).

  32. Lei, J. et al. C2FNet: a coarse-to-fine network for multi-view 3D point cloud generation. IEEE Trans. Image Process. 31, 6707–6718 (2022).

  33. Yin, Z. Duplex contextual relation network for polyp segmentation. In Proc. IEEE International Symposium on Biomedical Imaging (eds Awate, S. P. et al.) 1–5 (IEEE, 2022).

  34. Zhang, R. et al. Lesion-aware dynamic kernel for polyp segmentation. In Proc. Medical Image Computing and Computer Assisted Intervention (eds Wang, L. et al.) 99–109 (Springer, 2022).

  35. Dong, B. et al. Polyp-PVT: polyp segmentation with pyramid vision transformers. CAAI Artif. Intell. Res. 2, 9150015 (2023).

  36. Misawa, M. et al. Development of a computer-aided detection system for colonoscopy and a publicly accessible large colonoscopy video database (with video). Gastrointest. Endosc. 93, 960–967.e3 (2021).

  37. Ma, Y., Chen, X., Cheng, K., Li, Y. & Sun, B. LDPolypVideo benchmark: a large-scale colonoscopy video dataset of diverse polyps. In Proc. Medical Image Computing and Computer Assisted Intervention (eds de Bruijne, M. et al.) 387–396 (Springer, 2021).

  38. Wang, H. et al. EndoBoost: a plug-and-play module for false positive suppression during computer-aided polyp detection in real-world colonoscopy (with dataset). Preprint at https://arxiv.org/abs/2212.12204 (2022).

  39. Jha, D. et al. Kvasir-seg: a segmented polyp dataset. In International Conference on Multimedia Modeling (eds Ro, Y. M. et al.) 451–462 (Springer, 2020).

  40. Ren, G. et al. Towards automated polyp segmentation using weakly- and semi-supervised learning and deformable transformers. In Proc. IEEE/CVF International Conference on Computer Vision (eds Agapito, L. et al.) 4355–4364 (IEEE, 2023).

  41. Cheng, J. et al. SAM-Med2D. Preprint at https://arxiv.org/abs/2308.16184 (2023).

  42. Mazurowski, M. A. et al. Segment Anything Model for medical image analysis: an experimental study. Med. Image Anal. 89, 102918 (2023).

  43. Zhang, Y. et al. SamDSK: combining Segment Anything Model with domain-specific knowledge for semi-supervised learning in medical image segmentation. In Chinese Conference on Pattern Recognition and Computer Vision (eds Lin, Z. et al.) 343–357 (Springer, 2024).

  44. Ahmad, O. F. et al. Establishing key research questions for the implementation of artificial intelligence in colonoscopy: a modified Delphi method. Endoscopy 53, 893–901 (2021).

  45. Chen, P.-J. et al. Accurate classification of diminutive colorectal polyps using computer-aided analysis. Gastroenterology 154, 568–575 (2018).

  46. Byrne, M. F. et al. Real-time differentiation of adenomatous and hyperplastic diminutive colorectal polyps during analysis of unaltered videos of standard colonoscopy using a deep learning model. Gut 68, 94–100 (2019).

  47. Genta, R. M. & Sonnenberg, A. Big data in gastroenterology research. Nat. Rev. Gastroenterol. Hepatol. 11, 386–390 (2014).

  48. Ilse, M., Tomczak, J. & Welling, M. Attention-based deep multiple instance learning. In Proc. 35th International Conference on Machine Learning (eds Dy, J. et al.) 2127–2136 (PMLR, 2018).

  49. Li, B., Li, Y. & Eliceiri, K. W. Dual-stream multiple instance learning network for whole slide image classification with self-supervised contrastive learning. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (eds Forsyth, D. et al.) 14318–14328 (IEEE, 2021).

  50. Ru, L., Zheng, H., Zhan, Y. & Du, B. Token contrast for weakly-supervised semantic segmentation. In Proc. IEEE/CVF Conference on Computer Vision and Pattern Recognition (eds Geiger, A. et al.) 3093–3102 (IEEE, 2023).

  51. Yang, Z. & Wang, S. shuowang26/EndoKED: zendo (Version 20250724). Zenodo https://doi.org/10.5281/zenodo.16370348/ (2025).

Acknowledgements

S.W. was supported in part by the National Key Research and Development Program of China (number 2024YFF1207500), the Shanghai Municipal Education Commission Project for Promoting Research Paradigm Reform and Empowering Disciplinary Advancement through Artificial Intelligence (24RGZNC01) and the International Science and Technology Cooperation Program of the Shanghai Action Plan for Science (23410710400). Y. Zhu acknowledges support from the National Natural Science Foundation of China (82203193). Q.L. was supported by the National Natural Science Foundation of China (82170555) and the Shanghai Academic/Technology Research Leader Program (22XD1422400).

Author information

Contributions

S.W., Y. Zhu, Q.L., P.Z. and Y.G. conceptualized the study. S.W. and Y. Zhu led the study design, data analysis and paper drafting. Z.Y., X.L., P.F., H.W. and Y. Zhang also performed data analysis and interpretation, while Z.Y. and X.L. assisted with drafting the paper. M.W., Z.S., Q.L., P.Z. and Y.G. supervised the project and critically revised the paper.

Corresponding authors

Correspondence to Shuo Wang, Quanlin Li or Pinghong Zhou.

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethics statement

The study was approved by the ethics committees of Zhongshan Hospital, Xiamen Branch of Zhongshan Hospital, Zhengzhou Central Hospital and No. 988 Hospital of Joint Logistics Support Force.

Peer review

Peer review information

Nature Biomedical Engineering thanks Sharib Ali, Thomas de Lange and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Examples of polyp segmentation on challenging samples.

Qualitative comparison of nine state-of-the-art segmentation models (columns, model names at bottom) on various colonoscopy images (top row). The second row displays the ground truth segmentations (expert annotation). The third row shows the output of the EndoKED-SEG model for reference. The fourth row shows the baseline performance of each model (w/o EndoKED pre-training). The fifth row shows the performance of the same models after pre-training with the proposed distilled annotations (with EndoKED pre-training). The pre-trained models consistently produce more accurate and robust segmentation masks that are closer to the expert annotations, with fewer false positives and artifacts.

Source data

Source Data Fig. 4

Precision-Recall (PR) curve data for Fig. 4c,d.

Source Data Fig. 5

ROC curve data for Fig. 5a and data-efficiency curve data for Fig. 5b.

Source Data Fig. 6

Embedding UMAP data for Fig. 6.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Wang, S., Zhu, Y., Yang, Z. et al. Leveraging large language and vision models for knowledge extraction from large-scale image–text colonoscopy records. Nat. Biomed. Eng. (2025). https://doi.org/10.1038/s41551-025-01500-x

  • DOI: https://doi.org/10.1038/s41551-025-01500-x
