A Smartphone-based Comprehensive Dataset of Annotated Oral Cavity Images for Enhanced Oral Disease Diagnosis
  • Data Descriptor
  • Open access
  • Published: 13 March 2026


  • P. D. Madan Kumar1,
  • K. Ranganathan1,
  • C. Lavanya1,
  • S. Rajeshwari1,
  • Anwesh Nayak2,
  • Ramesh Kestur2,
  • Raghuram Bharadwaj Diddigi2 &
  • Sushree S. Behera2 

Scientific Data (2026)


We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Abstract

This study introduces SMART-OM, a SMARTphone-based, expert-annotated dataset of Oral Mucosa images, collected to facilitate the development of Artificial Intelligence and Machine Learning (AI/ML) technologies for automated diagnosis of Oral Cancer (OC) and Oral Potentially Malignant Disorders (OPMD). The dataset comprises 2,469 images from 331 subjects, spanning four distinct classes: healthy/normal, variations from normal, OPMD, and OC. The images were captured using Android and iOS smartphone cameras under real-world clinical conditions in visible light. Each image was annotated by expert dental surgeons using the open-source VGG Image Annotator. Detailed patient metadata, including clinical diagnosis, age, sex, and lifestyle-based risk indicators such as smoking, smokeless tobacco use, alcohol consumption, and areca nut chewing, was recorded via a customized Jotform. The data collection and handling procedures adhere to the ethical guidelines outlined in the Declaration of Helsinki and its amendments for research involving human subjects, with informed consent obtained from each subject. The SMART-OM dataset is intended to advance research and development of AI/ML algorithms for automated oral lesion detection.
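Annotations produced by the VGG Image Annotator (VIA 2.x) are exported as JSON, keyed per image, with polygon regions stored under `shape_attributes` and labels under `region_attributes`. A minimal sketch of flattening such an export into per-region records is shown below; the file name, the `"class"` attribute key, and the example values are illustrative assumptions, not the dataset's actual schema.

```python
import json

# Illustrative VIA 2.x-style export: one image with one polygon region.
# The attribute key "class" and the values below are assumptions.
via_export = {
    "lesion_01.jpg123456": {
        "filename": "lesion_01.jpg",
        "size": 123456,
        "regions": [
            {
                "shape_attributes": {
                    "name": "polygon",
                    "all_points_x": [10, 120, 90],
                    "all_points_y": [15, 30, 140],
                },
                "region_attributes": {"class": "OPMD"},
            }
        ],
        "file_attributes": {},
    }
}

def extract_regions(export):
    """Flatten a VIA export into (filename, label, polygon) records."""
    records = []
    for entry in export.values():
        for region in entry.get("regions", []):
            shape = region["shape_attributes"]
            polygon = list(zip(shape["all_points_x"], shape["all_points_y"]))
            label = region["region_attributes"].get("class", "unknown")
            records.append((entry["filename"], label, polygon))
    return records

# Round-trip through json to mimic reading a real export file.
records = extract_regions(json.loads(json.dumps(via_export)))
print(records)
```

In practice the export would be loaded with `json.load(open(path))`; the round-trip above only keeps the sketch self-contained.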


Data availability

The SMART-OM dataset has been deposited in the figshare repository15 and can be accessed at https://doi.org/10.6084/m9.figshare.31341790.

Code availability

The GitHub repository containing the code for technical analysis, model training and inference, and hyperparameter tuning can be accessed at https://github.com/Anwesh2000/SMART_OM_Dataset_Technical_Validation32.
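Because the dataset contains multiple images per subject (2,469 images from 331 subjects), evaluation splits for model training should be made at the subject level, so that no subject's images leak across the train and validation sets. A minimal sketch under that assumption follows; the `(subject_id, image_path)` pairing and all names are illustrative, not the repository's actual code.

```python
import random
from collections import defaultdict

def split_by_subject(samples, val_fraction=0.2, seed=42):
    """Split (subject_id, image_path) pairs into train/val image lists,
    keeping every image of a given subject in exactly one split."""
    by_subject = defaultdict(list)
    for subject_id, image_path in samples:
        by_subject[subject_id].append(image_path)

    subjects = sorted(by_subject)
    random.Random(seed).shuffle(subjects)  # reproducible shuffle
    n_val = max(1, int(len(subjects) * val_fraction))

    val = [p for s in subjects[:n_val] for p in by_subject[s]]
    train = [p for s in subjects[n_val:] for p in by_subject[s]]
    return train, val

# Toy example: 10 subjects, 3 images each (hypothetical IDs and paths).
samples = [(f"S{i:03d}", f"img_{i}_{j}.jpg")
           for i in range(10) for j in range(3)]
train, val = split_by_subject(samples)
```

With a 0.2 validation fraction, 2 of the 10 toy subjects (all 3 of their images each) land in the validation set, and the remaining 8 subjects' images form the training set.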

References

  1. Rai, P. et al. Oral cancer in Asia: a systematic review. Advances in Oral and Maxillofacial Surgery 8, 100366 (2022).

  2. Sankaranarayanan, R., Ramadas, K., Amarasinghe, H., Subramanian, S. & Johnson, N. Oral cancer: prevention, early detection, and treatment. In Cancer: Disease Control Priorities, 3rd ed., vol. 3, 85–99 (The International Bank for Reconstruction and Development/The World Bank, 2015).

  3. Jain, A. K. Oral cancer screening: insights into epidemiology, risk factors, and screening programs for improved early detection. Cancer Screening and Prevention 3(2), 97–105 (2024).

  4. Mira, E. S. et al. Early diagnosis of oral cancer using image processing and artificial intelligence. Fusion: Practice & Applications 14(1) (2024).

  5. Chaudhary, N. et al. High-resolution AI image dataset for diagnosing oral submucous fibrosis and squamous cell carcinoma. Scientific Data 11(1), 1050 (2024).

  6. Talwar, V. et al. AI-assisted screening of oral potentially malignant disorders using smartphone-based photographic images. Cancers 15(16), 4120 (2023).

  7. Di Fede, O., Panzarella, V., Buttacavoli, F., La Mantia, G. & Campisi, G. Doctoral: a smartphone-based decision support tool for the early detection of oral potentially malignant disorders. Digital Health 9, 20552076231177141 (2023).

  8. Dixit, S., Kumar, A. & Srinivasan, K. A current review of machine learning and deep learning models in oral cancer diagnosis: recent technologies, open challenges, and future research directions. Diagnostics 13(7), 1353 (2023).

  9. Song, B. et al. Bayesian deep learning for reliable oral cancer image classification. Biomedical Optics Express 12(10), 6422–6430 (2021).

  10. Fu, Q. et al. A deep learning algorithm for detection of oral cavity squamous cell carcinoma from photographic images: a retrospective study. EClinicalMedicine 27 (2020).

  11. Sengupta, N., Sarode, S. C., Sarode, G. S. & Ghone, U. Scarcity of publicly available oral cancer image datasets for machine learning research. Oral Oncology 126, 105737 (2022).

  12. Barot, S. Oral cancer (lips and tongue) images. https://www.kaggle.com/datasets/shivam17299/oral-cancer-lips-and-tongue-images (2020).

  13. Piyarathne, N. S. et al. A comprehensive dataset of annotated oral cavity images for diagnosis of oral cancer and oral potentially malignant disorders. Oral Oncology 156, 106946 (2024).

  14. Dutta, A. & Zisserman, A. The VIA annotation software for images, audio and video. In Proceedings of the 27th ACM International Conference on Multimedia, 2276–2279 (2019).

  15. Madan Kumar, P. D. et al. SMART-OM: a SMARTphone-based expert annotated dataset of Oral Mucosa images. figshare https://doi.org/10.6084/m9.figshare.31341790.v1 (2026).

  16. Radford, A. et al. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning, 8748–8763 (PMLR, 2021).

  17. Wiggins, W. F. & Tejani, A. S. On the opportunities and risks of foundation models for natural language processing in radiology. Radiology: Artificial Intelligence 4(4), e220119 (2022).

  18. Kirillov, A. et al. Segment Anything. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 4015–4026 (2023).

  19. Azad, B. et al. Foundational models in medical imaging: a comprehensive survey and future vision. Preprint at arXiv:2310.18689 (2023).

  20. Ito, F. A. et al. Standardization in oral photography. In Clinical Decision-Making in Oral Medicine: A Concise Guide to Diagnosis and Treatment, 11–16 (Springer International Publishing, 2023).

  21. Lin, I., Datta, M., Laronde, D. M., Rosin, M. P. & Chan, B. Intraoral photography recommendations for remote risk assessment and monitoring of oral mucosal lesions. International Dental Journal 71(5), 384–389 (2021).

  22. Rajendran, S. et al. Image collection and annotation platforms to establish a multi-source database of oral lesions. Oral Diseases 29(5), 2230–2238 (2023).

  23. Casaglia, A., De Dominicis, P., Arcuri, L., Gargari, M. & Ottria, L. Dental photography today. Part 1: basic concepts. Oral & Implantology 8(4), 122 (2016).

  24. Momin, S. et al. Comparison of image quality, color accuracy, and resolution in intraoral photography using digital single lens reflex camera and smartphone cameras: a pilot study. Journal of Dental Sciences (2025).

  25. Piemonte, E. D., Gilligan, G. M., Costa, M. F. G. & Lazos, J. P. How to improve photographs with smartphones for oral telemedicine. Exploration of Digital Health Technologies 2(5), 249–258 (2024).

  26. Shahrul, A. I., Shukor, N. & Norman, N. H. Technique for orthodontic clinical photographs using a smartphone. International Journal of Dentistry 2022(1), 2811684 (2022).

  27. Ferreira, C. D. A. P., Pereira, F. D. A. V., Rodrigues, M. V. B., Cabral, J. L. D. O. A. & de Campos Tuña, I. T. Art and science of dental photography: suggested photographic protocol with cellular device. Observatório de la Economía Latinoamericana 22(8), e6167 (2024).

  28. Dosovitskiy, A. et al. An image is worth 16x16 words: transformers for image recognition at scale. Preprint at arXiv:2010.11929 (2020).

  29. He, K., Zhang, X., Ren, S. & Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 770–778 (2016).

  30. Simonyan, K. & Zisserman, A. Very deep convolutional networks for large-scale image recognition. Preprint at arXiv:1409.1556 (2014).

  31. Tan, M. & Le, Q. EfficientNet: rethinking model scaling for convolutional neural networks. In International Conference on Machine Learning, 6105–6114 (PMLR, 2019).

  32. SMART-OM Dataset Technical Validation. https://github.com/Anwesh2000/SMART_OM_Dataset_Technical_Validation (accessed 8 October 2025).

Download references

Acknowledgements

The authors acknowledge the use of AI-assisted tools to aid in rephrasing sections of the manuscript. This study is part of an Indian Council of Medical Research (ICMR), India project (Project ID IIRP-2023-1049) funded by Small Extramural Grants – 2023.

Author information

Authors and Affiliations

  1. Ragas Dental College and Hospital Chennai, Uthandi, 600119, India

    P. D. Madan Kumar, K. Ranganathan, C. Lavanya & S. Rajeshwari

  2. International Institute of Information Technology Bangalore, Bangalore, 560100, India

    Anwesh Nayak, Ramesh Kestur, Raghuram Bharadwaj Diddigi & Sushree S. Behera


Contributions

P.D.M.K., K.R., C.L., and S.R. contributed to data acquisition, data interpretation, image annotation, drafting, expert validation, and revision of the manuscript. A.N. performed technical validation and was responsible for model development, training, and evaluation of the deep learning models. R.K., R.B.D., and S.S.B. contributed to the conceptualization of the technical framework and study design, provided overall mentoring, and were involved in drafting and critical revision of the manuscript. All authors reviewed and approved the final manuscript.

Corresponding author

Correspondence to Sushree S. Behera.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Reprints and permissions

About this article


Cite this article

Madan Kumar, P.D., Ranganathan, K., Lavanya, C. et al. A Smartphone-based Comprehensive Dataset of Annotated Oral Cavity Images for Enhanced Oral Disease Diagnosis. Sci Data (2026). https://doi.org/10.1038/s41597-026-06954-5

Download citation

  • Received: 10 October 2025

  • Accepted: 23 February 2026

  • Published: 13 March 2026

  • DOI: https://doi.org/10.1038/s41597-026-06954-5

