Background & summary

The World Health Organization (WHO) has expressed deep concern over the significant burden of oral diseases, including dental caries, which affects 3.5 billion people worldwide1. The prevalence of dental caries continues to increase in Low- and Middle-Income Countries (LMICs), particularly among children and adolescents2,3. This may be due to limited access to dental services, lack of awareness, and neglect of oral hygiene measures.

Conventional diagnostic methods, such as visual examination and X-ray imaging, have long served as the cornerstone of caries detection in dentistry. However, these approaches are not always reproducible, as they rely on the clinician’s experience and judgment4. Although diagnostic studies have shown that trained dentists can generally achieve good intra- and inter-examiner reliability, contradictory diagnoses are repeatedly observed in daily practice5. Factors such as lighting conditions, limited visibility, and variations in clinical settings can contribute to these inconsistencies6. As a result, there is an increasing demand for innovative solutions to enhance diagnostic accuracy and improve reproducibility. To address this challenge, the integration of Artificial Intelligence (AI) into dental caries detection has advanced considerably, leveraging Deep Learning (DL) and Machine Learning (ML) techniques7,8.

The success of AI models in accurately detecting dental caries depends heavily on the quality and quantity of the datasets9. A critical requirement is the availability of large-scale clinical and radiographic datasets. While radiographic datasets are widely accessible and well studied, there is a significant gap in publicly available datasets of annotated intraoral images6,10,11,12. Studies such as Yoon et al. have utilized a dataset of over 24,000 intraoral images, each annotated with bounding boxes marking carious regions13. A notable limitation is that this dataset is not open source, restricting its accessibility to the wider research community. In contrast, another study reported a publicly available dataset of 718 intraoral images annotated for carious teeth, highlighting the importance of well-annotated datasets14. However, the relatively small size of this dataset presents a limitation, as AI models typically require larger datasets to achieve robust and accurate performance. To address these concerns, this work provides a comprehensive dataset of annotated intraoral images, which has been made openly available on Zenodo15.

Methods

Ethical approval

The collection and curation of this dataset were conducted in accordance with the ethical guidelines set by the Ethical Review Committee (ERC) of The Aga Khan University Hospital (AKUH). The images in this dataset were collected during a previously conducted study (ERC#2024-9433-29648)16. Written informed consent and assent were obtained from all participants in both English and the local languages (Urdu and Sindhi) for the use and open sharing of their anonymized data. For participants under the age of 18, assent was obtained along with consent from their parents or legal guardians prior to data collection. To ensure privacy and confidentiality, all patient metadata were anonymized using a custom Python script.
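The anonymization script itself is not distributed with the dataset; the snippet below is a minimal sketch of how such anonymization could be performed in Python, assuming patient identifiers are encoded in the original filenames (the folder names, salt value, and naming scheme are illustrative assumptions, not the authors' actual implementation). Re-encoding the pixel data also discards any embedded EXIF metadata.

```python
import hashlib
from pathlib import Path
from PIL import Image

SALT = "replace-with-project-secret"     # illustrative placeholder, not the study's value
SRC, DST = Path("raw_images"), Path("anonymized_images")  # hypothetical folders

DST.mkdir(exist_ok=True)
for img_path in SRC.glob("*.jpg"):
    # Derive a non-reversible ID from the original filename (assumed to hold the patient ID).
    anon_id = hashlib.sha256((SALT + img_path.stem).encode()).hexdigest()[:12]
    with Image.open(img_path) as img:
        clean = Image.new(img.mode, img.size)
        clean.putdata(list(img.getdata()))   # copy pixels only, dropping EXIF metadata
        clean.save(DST / f"{anon_id}.jpg", quality=95)
```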

Data inclusion criteria

Adolescents (10–18 years) and young adults (19–24 years), both males and females.

Data exclusion criteria

  1. Images with suboptimal diagnostic quality, such as blurred images and those obscured by saliva or blood.

  2. Images with developmental dental anomalies, such as cleft lip/palate.

  3. Patients with limited mouth opening.

  4. Patients who did not give consent.

Data collection

Study setting

The data collection process was carried out in the government schools of Mithi, Sindh, Pakistan.

Mobile application for data collection

A mobile application was developed as part of the authors’ previous study16. Firebase cloud storage was integrated into the application, enabling a streamlined data pipeline for efficient image transfer from the data collection sites to AKUH. In this pipeline, images were captured and uploaded from the application to cloud storage and subsequently imported into secure servers at AKUH for annotation and model development.
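The exact import mechanism is not described beyond the use of Firebase cloud storage. As a minimal sketch, and assuming the firebase_admin Python SDK with placeholder credential, bucket, and folder names (none of which are specified in the paper), images could be pulled from the cloud bucket onto a local server as follows:

```python
from pathlib import Path

import firebase_admin
from firebase_admin import credentials, storage

# Service-account file, bucket name, and prefix are placeholders, not the study's actual values.
cred = credentials.Certificate("service_account.json")
firebase_admin.initialize_app(cred, {"storageBucket": "example-project.appspot.com"})

local_root = Path("incoming_images")
local_root.mkdir(exist_ok=True)
for blob in storage.bucket().list_blobs(prefix="intraoral/"):
    if blob.name.endswith(".jpg"):
        target = local_root / Path(blob.name).name
        blob.download_to_filename(str(target))   # copy each image to the local server
```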

Training of the data collectors

Dentists underwent a comprehensive hands-on training session at AKUH prior to the data collection process. This included a detailed demonstration of the mobile application workflow to ensure the data collectors could use it proficiently. The data collectors also received hands-on training in capturing intraoral images with and without cheek retractors to ensure consistency.

The data collectors were instructed to obtain informed consent and assent and then perform ultrasonic scaling for each participant. Images were to be captured using a Samsung Galaxy A23 (Android version 14.0) mobile device from various angles of the maxilla and mandible, including occlusal, frontal, and lateral views. Before capturing the images, the data collectors were instructed to clear visible saliva from the teeth, gingiva, buccal and lingual vestibules, and floor of the mouth.

Pilot activity

To validate the robustness of the data collection process, a pilot activity was first conducted. This preliminary step aimed to test the feasibility of the process, identify potential challenges, and ensure consistency in capturing high-quality images. In the pilot phase, a sample of 101 participants was chosen to provide a representative subset of the target population, yielding a total of 505 intraoral images, all captured without a cheek retractor. The insights gained from this pilot phase were instrumental in refining the methodology before transitioning to the full data collection process.

Annotations

The intraoral images retained in the cloud storage were transferred to an HP Desktop Pro (G3 Intel(R) Core(TM) i5-9400, built-in Intel(R) UHD Graphics 630) and visually analyzed on an HP P19b G4 WXGA second-grade monitor display (1366 × 768)17. Two dentists (A.K. and A.N.), each with more than two years of experience, carried out the annotation process at AKUH using the open-source tool LabelMe V5.4.1 (Massachusetts Institute of Technology - MIT, Cambridge, United States) (https://github.com/wkentaro/labelme). Before commencing the annotation process, these annotators received calibration training from a specialist dentist (N.A.). The training focused on accurately identifying and labeling all carious teeth visible in the intraoral images received during the pilot activity. The annotators drew a bounding box around each tooth with visible decay, recording the height, width, and corner coordinates of each box. Disagreements between the two annotators were resolved by a third dentist (N.A.), ensuring the accuracy and consistency of the annotations. Figure 1 illustrates the dataset curation pipeline, including data collection, filtering, and annotation.

Fig. 1

The data production process, including data collection, data filtering and annotations. W/R: With Retractor, W/O-R: Without Retractor.

Data records

The dataset15 consists of intraoral images in JPG format, annotated using the LabelMe software. Each annotation file includes detailed information about the spatial positioning of teeth and cavities in the images. This includes the coordinates of bounding boxes marking regions of interest and the location of decay, with ‘d’ and ‘D’ denoting primary and permanent tooth decay, respectively. The annotation files are saved in .json format, adhering to the standardized LabelMe structure. The JSON annotation files were also converted into You Only Look Once (YOLO) (.txt), Pattern Analysis, Statistical Modeling, and Computational Learning Visual Object Classes (PASCAL VOC) (.xml), and Common Objects in Context (COCO) (.json) formats using Python, thereby making them available in various formats for future studies.
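The conversion scripts are not bundled with the dataset. As an illustration of the underlying transformation, the sketch below converts a LabelMe rectangle annotation into a YOLO-format label, assuming the class-ID mapping d = 0 and D = 1 (the mapping actually used for the released files may differ).

```python
import json
from pathlib import Path

CLASS_IDS = {"d": 0, "D": 1}  # assumed mapping: primary vs. permanent tooth decay

def labelme_to_yolo(json_path: Path, out_dir: Path) -> None:
    data = json.loads(json_path.read_text())
    w, h = data["imageWidth"], data["imageHeight"]
    lines = []
    for shape in data["shapes"]:
        if shape.get("shape_type") != "rectangle":
            continue
        (x1, y1), (x2, y2) = shape["points"]
        # YOLO expects the normalized box centre and size.
        xc, yc = (x1 + x2) / (2 * w), (y1 + y2) / (2 * h)
        bw, bh = abs(x2 - x1) / w, abs(y2 - y1) / h
        lines.append(f"{CLASS_IDS[shape['label']]} {xc:.6f} {yc:.6f} {bw:.6f} {bh:.6f}")
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / f"{json_path.stem}.txt").write_text("\n".join(lines))
```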

Technical validation

To ensure the reliability and quality of the dataset, a thorough validation process was conducted. First, to maintain consistency in image capture, the mobile application incorporated a built-in grid system, ensuring a uniform aspect ratio and standardized framing across all views. Images were taken with patients seated on a dental chair, and lighting conditions were standardized using the dental unit’s overhead light to ensure adequate illumination and minimize variability in image quality.

Second, all image and annotation files were reviewed for missing or corrupted data. Any files flagged during this process were rechecked and either corrected or removed. All patient-identifying information was thoroughly de-identified using a custom Python code. Annotation files were also verified for accuracy across all supported formats (YOLO, PASCAL VOC, and COCO), and bounding box coordinates and labels were checked to ensure compliance with the specifications of each format.
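The checks described above can be illustrated with the following sketch (not the authors' actual script): corrupted images are detected with Pillow, and YOLO label files are verified to contain five fields per line with normalized coordinates.

```python
from pathlib import Path
from PIL import Image

def image_is_valid(path: Path) -> bool:
    try:
        with Image.open(path) as img:
            img.verify()          # raises if the file is truncated or corrupted
        return True
    except Exception:
        return False

def yolo_label_is_valid(path: Path) -> bool:
    for line in path.read_text().splitlines():
        parts = line.split()
        if len(parts) != 5:       # class id plus four box values
            return False
        _, *coords = parts
        if not all(0.0 <= float(c) <= 1.0 for c in coords):
            return False
    return True
```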

Third, to assess the consistency of annotations, inter-rater reliability was evaluated between the two annotators (A.K. and A.N.). Cohen’s Kappa coefficient, calculated on 10% of the total dataset (i.e., 631 of the 6,313 intraoral images), was 0.89, indicating a high level of agreement. The specific images used for this assessment are included in the dataset. This rigorous validation ensures that the dataset is suitable for training and benchmarking DL models for dental caries detection.
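Cohen's Kappa can be computed with scikit-learn. The sketch below assumes agreement was scored per image (caries present vs. absent); this is a simplification for illustration, as the paper does not state the exact unit of agreement, and the label vectors shown are hypothetical.

```python
from sklearn.metrics import cohen_kappa_score

# Hypothetical per-image judgements for the 10% reliability subset (1 = caries, 0 = no caries).
annotator_ak = [1, 0, 1, 1, 0, 1, 0, 0]
annotator_an = [1, 0, 1, 0, 0, 1, 0, 1]

kappa = cohen_kappa_score(annotator_ak, annotator_an)
print(f"Cohen's Kappa: {kappa:.2f}")
```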

Data description

Root folder

The dataset is available on the Zenodo repository15 and is organized into three main folders: images, annotations, and benchmark, as shown in Fig. 2. The ‘images’ folder has three subfolders: one for ‘pilot activity images’, one for images taken with a cheek retractor (‘W/R’), and one for images taken without a cheek retractor (‘W/O-R’). The W/R and W/O-R subfolders are further divided into folders based on different views (maxillary occlusal, mandibular occlusal, left lateral, right lateral, and frontal) and labeled accordingly. The annotations folder is divided into four subfolders for different annotation formats: LabelMe, YOLO, PASCAL VOC, and COCO. Each format folder, except for COCO, contains subfolders labeled ‘Pilot’, ‘WR’, and ‘W/O-R’. This is because, in all formats except COCO, each image has a corresponding label file, whereas in the COCO format, an entire folder of images is described by a single annotation file. These subfolders are further organized by views.
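As a quick orientation aid, the sketch below walks the downloaded ‘images’ folder and counts the JPG files in each view subfolder; the on-disk folder names are assumed to follow the structure described above.

```python
from pathlib import Path

images_root = Path("images")  # root of the downloaded Zenodo 'images' folder (assumed path)
for view_dir in sorted(p for p in images_root.rglob("*") if p.is_dir()):
    n_images = len(list(view_dir.glob("*.jpg")))
    if n_images:
        print(f"{view_dir.relative_to(images_root)}: {n_images} images")
```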

Fig. 2

The folder structure of the dataset available on Zenodo. W/R: With Retractor, W/O-R: Without Retractor, YOLO: You Only Look Once, PASCAL VOC: Pattern Analysis, Statistical Modeling, and Computational Learning Visual Object Classes, COCO: Common Objects in Context, JSON: JavaScript Object Notation.

Benchmark folder

The benchmark folder as shown in Fig. 3 is organized into three subfolders: train, test, and valid. Each subfolder is further divided into the following categories: Images, YOLO, PASCAL VOC, LabelMe, and COCO. These categories contain the images and their corresponding annotation files, ensuring compatibility with various annotation formats. This clear and structured organization makes the dataset easy to navigate and use for research and training AI models.

Fig. 3

The folder structure of the benchmark dataset. W/R: With Retractor, W/O-R: Without Retractor, YOLO: You Only Look Once, PASCAL VOC: Pattern Analysis, Statistical Modeling, and Computational Learning Visual Object Classes, COCO: Common Objects in Context, JSON: JavaScript Object Notation.

Image views

A total of ten images were captured for each patient, five with cheek retractors and five without, as shown in Fig. 4. The first set, stored in the folder labeled ‘W/R’ in the dataset, was captured with cheek retractors, while the images in the second folder, labeled ‘W/O-R’, were captured using a tongue depressor only. Each set includes five images per patient in the following views:

  1. An occlusal view of the maxillary arch (first image).

  2. An occlusal view of the mandibular arch (second image), taken after posterior retraction of the tongue.

  3. A buccal view of the left lateral teeth (third image).

  4. A buccal view of the right lateral teeth (fourth image).

  5. A frontal view with the dentition in habitual occlusion (fifth image).

Fig. 4

Various views of intraoral images, both with and without the use of cheek retractors.

Data results

Five models, YOLOv5s, YOLOv8s, YOLOv11, SSD-MobileNet-v2-FPNLite-320, and Faster R-CNN, were trained and evaluated on the curated dataset. The dataset comprised 6,313 images, which were split into 5,050 for training, 631 for validation, and 631 for testing. Default data augmentation techniques were applied to enhance generalizability, and all models were trained using default parameters to ensure a fair comparison. The specifications of the trained models are presented in Table 1.

Table 1 Performance metrics and training configurations of different AI models used for dental caries detection, detailing the model names, number of epochs, batch sizes, GPUs utilized, evaluation metrics, and optimization algorithms.
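As a reproducibility aid, the sketch below shows how one of these detectors (YOLOv8s) could be trained on the benchmark split with the Ultralytics package; the data file name and hyperparameters are placeholders mirroring the library defaults rather than the exact configuration reported in Table 1.

```python
from ultralytics import YOLO

# 'caries.yaml' is a hypothetical YOLO data file pointing at the benchmark
# train/valid/test folders and listing the two classes (d, D).
model = YOLO("yolov8s.pt")
model.train(data="caries.yaml", epochs=100, imgsz=640, batch=16)

metrics = model.val(split="test")   # evaluate on the held-out test split
print(metrics.box.map50)            # mAP@0.5, one of the metrics in Table 1
```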

Based on the results of the descriptive analysis, YOLOv8s outperformed all other models, achieving the highest values in key evaluation metrics such as precision, recall, and mean Average Precision (mAP). This establishes it as the most suitable model for tasks demanding high accuracy and recall. YOLOv5s offers a practical alternative in scenarios where shorter training times are prioritized. While YOLOv11 is robust, it required more training epochs to approach the performance level of YOLOv8s.

Limitations

A major limitation of this dataset is the absence of annotations for dental fluorosis, despite its high prevalence in the region of data collection. This omission was due to the study’s primary focus on dental caries and the inherent difficulty of distinguishing fluorosis from non-cavitated carious lesions using only intraoral images. The visual similarity between the two conditions, particularly in 2D images without supporting clinical context, may lead to mislabeling or misinterpretation. Future datasets should incorporate separate annotations for fluorosis, ideally supported by clinical examination to enable accurate differentiation. Another limitation is that the dataset comprises images captured exclusively with a single type of mobile phone camera, which may limit generalizability to images acquired with other devices. Furthermore, the dataset excludes individuals under the age of 10, limiting its applicability for detecting caries in children, particularly in the primary dentition. Future work should focus on examining caries in the primary dentition to address this gap.