Scientific Data
InfoColon: A dataset for consecutive informative frames in Colonoscopy
  • Data Descriptor
  • Open access
  • Published: 26 March 2026

  • Taemin Choi 1
  • Hee Seok Moon 2
  • Seunghyun Jang 3
  • Chang Min Park 4,5 (ORCID: orcid.org/0000-0003-1884-3738)
  • Dongheon Lee 4,5,6 (ORCID: orcid.org/0000-0002-3121-7099)
  • Eun Hyo Jin 7,8

Scientific Data (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Large intestine
  • Translational research

Abstract

The presence of uninformative frames in colonoscopy videos is a major factor that reduces the accuracy and efficiency of various video analysis applications. To address this issue, research on informative frame classification has been conducted, but the lack of a publicly available dataset has made reproducibility difficult. In this study, we propose a novel dataset, InfoColon, which integrates video data collected from multiple medical institutions with major public colonoscopy datasets. All colonoscopy frames were labeled as either an informative frame or one of six types of uninformative frames. We also propose an active learning method to efficiently label large amounts of data with a small initial labeled dataset. Using the constructed InfoColon, we demonstrate the potential for its application in consecutive informative frame classification and 3D reconstruction. We expect that the proposed InfoColon will be valuable for various applications involving colonoscopy video analysis.
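
To illustrate what "consecutive informative frames" means in practice, the following minimal Python sketch groups per-frame labels into runs of informative frames. The label encoding is an assumption made for this example only (0 for informative, 1 to 6 for the six uninformative types); the function name `informative_segments` and the `min_length` parameter are hypothetical and are not taken from the InfoColon code base.

```python
from itertools import groupby

# Assumption for illustration: 0 = informative frame,
# 1-6 = the six uninformative frame types.
INFORMATIVE = 0


def informative_segments(labels, min_length=1):
    """Return (start, end) index pairs, end exclusive, for every run of
    consecutive informative frames that is at least min_length long."""
    segments = []
    index = 0
    # groupby yields maximal runs of frames sharing the same key value.
    for is_informative, run in groupby(labels, key=lambda lab: lab == INFORMATIVE):
        run_length = sum(1 for _ in run)
        if is_informative and run_length >= min_length:
            segments.append((index, index + run_length))
        index += run_length
    return segments


if __name__ == "__main__":
    frame_labels = [0, 0, 3, 0, 0, 0, 5, 5, 0]
    print(informative_segments(frame_labels, min_length=2))  # [(0, 2), (3, 6)]
```

A minimum run length is one plausible way to select clips long enough for downstream tasks such as 3D reconstruction; the appropriate threshold depends on the application.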

Data availability

The colonoscopy videos, 7-class labels, calibration videos, and parameters for InfoColon have been uploaded and made publicly available on Synapse (https://www.synapse.org/InfoColon). Users must adhere to the data usage terms and conditions of the Synapse platform, and any research utilizing this dataset must cite the present paper.

Code availability

The code required for the data processing, model, and evaluation used in this study has been made publicly available at the following address: https://github.com/Choi-Tae-min/InfoColon.

Acknowledgements

This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (2021R1I1A3047535), and by an Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) [No. RS-2021-II211343, Artificial Intelligence Graduate School Program (Seoul National University)].

Author information

Authors and Affiliations

  1. Interdisciplinary Program in Medical Informatics, Seoul National University Graduate School, Seoul, Republic of Korea

    Taemin Choi

  2. Division of Gastroenterology, Department of Internal Medicine, Chungnam National University College of Medicine, Daejeon, Korea

    Hee Seok Moon

  3. Interdisciplinary Program in Bioengineering, Seoul National University Graduate School, Seoul, Republic of Korea

    Seunghyun Jang

  4. Department of Radiology, Seoul National University College of Medicine, Seoul National University Hospital, Seoul, Republic of Korea

    Chang Min Park & Dongheon Lee

  5. Institute of Medical and Biological Engineering, Seoul National University Medical Research Center, Seoul, Republic of Korea

    Chang Min Park & Dongheon Lee

  6. Interdisciplinary Program in Artificial Intelligence, Seoul National University, Seoul, Republic of Korea

    Dongheon Lee

  7. Department of Internal Medicine, Seoul National University College of Medicine, Seoul, Republic of Korea

    Eun Hyo Jin

  8. Department of Internal Medicine, Healthcare Research Institute, Seoul National University Hospital Healthcare System Gangnam Center, Seoul, Republic of Korea

    Eun Hyo Jin

Contributions

Taemin Choi was responsible for dataset construction, experiment design, and analysis. Hee Seok Moon and Eun Hyo Jin conducted clinical video acquisition and validated the data labeling. Seunghyun Jang and Chang Min Park assisted with dataset construction. Dongheon Lee was responsible for research planning, overall project supervision, and guiding the methodology development. All authors were involved in manuscript preparation and approved the final manuscript.

Corresponding authors

Correspondence to Dongheon Lee or Eun Hyo Jin.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Note (DOCX)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Choi, T., Moon, H.S., Jang, S. et al. InfoColon: A dataset for consecutive informative frames in Colonoscopy. Sci Data (2026). https://doi.org/10.1038/s41597-026-07060-2

  • Received: 26 September 2025

  • Accepted: 06 March 2026

  • Published: 26 March 2026

  • DOI: https://doi.org/10.1038/s41597-026-07060-2

Scientific Data (Sci Data)

ISSN 2052-4463 (online)
