A multimodal dental dataset facilitating machine learning research and clinic services

Huang, Yunyou; Liu, Wenjing; Yao, Caiqin; Miao, Xiuxia; Guan, Xianglong; Lu, Xiangjiang; Liang, Xiaoshuang; Ma, Li; Tang, Suqin; Zhang, Zhifei; Zhan, Jianfeng

doi:10.1038/s41597-024-04130-1

Data Descriptor
Open access
Published: 27 November 2024

Scientific Data volume 11, Article number: 1291 (2024)

Abstract

Oral diseases affect nearly 3.5 billion people, and medical resources are limited, which makes access to oral health services nontrivial. Imaging-based machine learning is one of the most promising technologies for improving oral medical services and reducing patient costs. The development of machine learning technology requires publicly accessible datasets. However, previous public dental datasets have several limitations: a small volume of computed tomography (CT) images, a lack of multimodal data, and a lack of complexity and diversity of data. These issues are detrimental to the development of the field of dentistry. To address these problems, this paper introduces a new dental dataset that contains 169 patients, three commonly used dental image modalities, and images of various health conditions of the oral cavity. The proposed dataset has good potential to facilitate research on oral medical services, such as reconstructing 3D structures, assisting clinicians in diagnosis and treatment, image translation, and image segmentation.

Background & Summary

According to the Global Burden of Disease Study in 2019 (https://ghdx.healthdata.org/gbd-results-tool), oral diseases affect nearly 3.5 billion people, posing a large health burden for society. The World Health Organization (WHO) report also states that oral care is expensive and usually outside the components of universal health coverage. People, particularly in low- and middle-income countries, cannot afford such services (https://www.who.int/team/noncommunicable-diseases/global-status-report-on-oral-health-2022). As there are vast numbers of sufferers and a shortage of medical resources, accurate, inexpensive, and accessible methods for diagnosing and treating disease are highly important for three purposes: (1) to improve the dental health care service; (2) to reduce patient costs; (3) to cover more patients, particularly those in remote areas. However, traditional dental diagnosis and treatment methods (e.g., dentists taking X-rays of patients and then obtaining diagnosis and treatment strategies manually)¹ have difficulty meeting the requirements of high efficiency, low cost, and accessibility. As one of the most promising technologies for improving medical services and reducing health burdens, imaging-based machine learning technology has been widely introduced into dentistry for assisting clinicians in diagnosis and treatment, image translation, and image segmentation, etc^{2,3,4,5,6,7,8,9,10}.

Owing to the advantages of radiography in bone imaging, three types of radiographs, shown in Fig. 1, have become the most common means of helping dentists reach a diagnosis and develop treatment strategies. First, CT is a 3D image that provides high-resolution anatomical information on the patient’s case¹¹ and is most commonly used for implant planning because it provides accurate information on the height and width of the jaw and the position of important structures^12,13. However, CT machines produced in European and American countries cost at least 74,350 dollars, and the radiation dose is 58.9–1025.4 μSv¹⁴. Second, extraoral radiographs (e.g., panoramic radiographs (PaX-ray)) show both the mandible and the surrounding oral and maxillofacial structures, including the temporomandibular joints¹², with a radiation dose of 5.5–22.0 μSv¹⁴. However, because a panoramic radiograph is a two-dimensional image, it cannot provide accurate measurements of the jaw, and the inherent magnification and overlapping of teeth vary depending on the machine¹². Third, intraoral X-rays such as periapical radiographs (PeX-ray) provide information on the entire tooth from the crown to the root and are often used to rule out lesions at the apex of the tooth, which may occur when the tooth has become nonvital¹². Intraoral X-rays are much cheaper than the two radiographs above, and the radiation dose is < 5 μSv¹⁴. However, a PeX-ray can only provide bone information within 2–3 mm around the apex of the tooth and is limited to teeth in one arch¹².

The multimodal dental dataset is expected to facilitate advancements in fields such as assisting doctors in diagnosis, image translation, and image segmentation. CBCT images, panoramic radiographs, and periapical radiographs are the three most commonly used types of clinical dental imaging. Researchers can leverage this image data to train models that assist doctors in diagnosis and treatment, including the diagnosis of oral diseases^15,16, automatic measurements in orthodontics^17,18, and preoperative planning for dental implants etc^19,20. The dataset provides a CSV file indicating whether CBCT files contain dental implants, which can be used for implant detection tasks in dentistry. However, it lacks specific disease annotations. Researchers can annotate the dataset according to their tasks, thereby facilitating model training and evaluation.

One of the common tasks in image translation is to convert images from one domain to another. According to the aforementioned report by the WHO, three-quarters of the world’s population threatened by oral diseases reside in low- and middle-income regions. Compared to CBCT, panoramic and periapical radiographs are much cheaper and expose patients to much less radiation. If it were possible to reconstruct CT scans from panoramic or a small number of periapical radiographs, it would significantly reduce patients’ radiation exposure and costs, which is particularly important for low- and middle-income populations. Research on reconstructing CT scans from panoramic images already exists²¹, and the dataset containing data from different modalities can further advance this task.

Similarly, the multimodal oral dataset supports segmentation tasks for teeth and alveolar bone^22,23. Segmentation is performed predominantly on oral CT data, yet publicly available CBCT datasets are scarce. Our dataset consists of 329 CBCT images from 169 patients, which helps address this shortage and enables researchers to explore segmentation tasks effectively.

Despite the high potential of imaging-based machine learning to contribute to dentistry research and clinical practice, oral image datasets available for machine learning research are limited. We surveyed all studies mentioned in three recent overviews of dentistry, which cover 74 works, and only 2 studies were based on publicly available oral datasets^7,24,25. These two datasets are the Tufts Dental Database (Panetta et al., 2022)²⁶ and the Virtual Skeleton Database (Kistler et al., 2013)²⁷. Keeping datasets private prevents third parties from objectively evaluating and extending a study, which is detrimental to the development of the field of dentistry. To our knowledge, there are five publicly available datasets, as shown in Table 1. However, current publicly available oral datasets also have several limitations.

First, the limited number of cases in available datasets, especially CT data, poses a challenge to the development of data-driven deep learning. As shown in Table 1, most public datasets consist mainly of PaX-rays, and only one CT dataset is available. In addition, among the 74 studies we surveyed, only 13 (17.6%) used CT data from private datasets. Furthermore, only 5 (6.8%) were studies that included more than 100 patients and made use of CT data^7,24,25.
Second, the absence of paired data for different modalities in these datasets makes it impossible to compare techniques across modalities. Furthermore, these datasets do not support the development of multi-scenario applications, which refer to applications that must be used in various scenarios due to different medical or other conditions, each requiring data from different modalities. As Table 1 shows, no publicly available datasets currently contain data for all the modalities mentioned above.
Third, current public datasets lack diversity and complexity and are often biased towards overhealth and overdisease²⁸, which renders them unable to accurately represent the real clinical setting. In addition, models trained on datasets with these flaws suffer from data drift, resulting in a good performance during training but poor performance during a real deployment. Ultimately, techniques developed based on these datasets are difficult to implement in actual clinical settings.

Table 1 Publicly available oral datasets.

Based on the above considerations, we present a publicly accessible multimodal dental dataset²⁹ that is useful for machine learning research and clinic services. First, the dataset includes 329 CBCT images. All CBCT image data were collected from 169 patients using Smart3D-X (Beijing Langshi Instrument Co., Ltd., Beijing, China) (Fig. 3a). A total of 67 patients had more than one CBCT image taken at different times. Second, this dataset covers the three most common data modalities: CBCT images, panoramic radiographs, and periapical radiographs. The periapical radiographs are generated from the CBCT images using cxr-ct (https://github.com/KendallPark/cxr-ct) (Fig. 3a). In this dataset²⁹, 188 CBCT images have paired periapical radiographs. All panoramic radiographs have paired CBCT images and periapical radiographs. Finally, to preserve the characteristics of the real clinical setting (such as variety), the dataset contains various types of patients (e.g., patients with a fully edentulous upper jaw, full dentures, irregular teeth, or implanted teeth), as shown in Fig. 2. We encourage other researchers in the field to use it to develop and test their methods for assisting clinicians in diagnosis and treatment, image translation, image segmentation, etc.

Methods

Ethics statement

This research has received approval from the Ethics Committee of Guilin Medical University (Approval No: GLMC20230502). The approved content encompasses the collection of imaging data, reconstruction of patients’ oral three-dimensional models, and sharing of imaging data. Within this dataset, all personally identifiable information, except for the patient’s gender and age, has been either removed or regenerated to align with U.S. HIPAA regulations. Moreover, the dataset is exclusively restricted for legitimate scientific research purposes. Additionally, informed consent has been obtained from the patients.

Patient characteristics

Considering the potential hazards of radiological imaging, this study did not design a prospective experiment that would recruit volunteers for unnecessary radiological examinations; instead, it used existing patient data. We collected data from all adult patients who visited dental hospitals from 2021 to 2022. After excluding data with quality issues, we sought informed consent from the patients and ultimately obtained it from 169 patients, as shown in Table 2. Eight patients had data for all three modalities.

Table 2 Gender and age distribution in Multi-modal dental dataset.

Data collection

The dataset²⁹ contains 329 volumetric oral cavity CBCT scans, encompassing data from 169 patients, along with 8 panoramic radiographs, each corresponding to a different patient. Additionally, there are 16,203 periapical radiographs available, with three different angle views for each tooth, totaling 5,401 teeth, corresponding to 188 CBCT files.
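The counts in this paragraph can be cross-checked with a few lines of arithmetic. The Python sketch below only restates numbers quoted above; the variable names are illustrative and not taken from the dataset files.

```python
# Counts quoted in the Data collection paragraph.
teeth = 5401
views_per_tooth = 3
cbct_files_with_pex = 188

# Three views per tooth should reproduce the reported image count.
periapical_images = teeth * views_per_tooth
# Average number of teeth covered per CBCT file with periapical data.
mean_teeth_per_file = teeth / cbct_files_with_pex

print(periapical_images)              # 16203
print(round(mean_teeth_per_file, 1))  # 28.7
```

The product matches the 16,203 periapical radiographs reported, and the mean of roughly 28.7 teeth per file is plausible for adult dentition.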

CBCT is a variation of traditional CT that uses a cone-shaped X-ray beam to capture data of the oral cavity and creates a 3D representation of the inside of the oral cavity^30,31. Compared to traditional CT, CBCT has many advantages, such as low cost, easy accessibility, and low radiation exposure, and it has been widely used in the field of dentistry³². All CBCT images in the dataset come from a CBCT machine that uses a two-dimensional flat panel detector to collect cone-beam projection data and a large-diameter cone X-ray beam for scanning, rotating synchronously by 180°–360° around the patient’s head to acquire volumetric image data of the entire scanned area³³ (Fig. 3a). All images are reconstructed using the Filtered Back-Projection (FBP) reconstruction method, and the T-MAR artifact correction function is used to automatically identify high-density substances in the mouth and remove artifacts by deep learning (Fig. 3b). Among the 329 CBCT images we collected, 327 have an output size of 640 × 640, a slice thickness of 0.25 mm, and a pixel spacing of 0.25 mm × 0.25 mm. The remaining 2 images have an output size of 550 × 550, a slice thickness of 0.15 mm, and a pixel spacing of 0.15 mm × 0.15 mm. All images are saved in the Digital Imaging and Communications in Medicine (DICOM) format³⁴.
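As a rough illustration of what these settings imply, the in-plane field of view of a slice is the number of pixels per side multiplied by the pixel spacing. The sketch below applies that to the two CBCT configurations given above; the function name is hypothetical, and the number of slices per volume is not stated in the text, so only the in-plane extent is computed.

```python
def in_plane_extent_mm(pixels_per_side, pixel_spacing_mm):
    """Physical edge length of a square slice: pixel count times pixel spacing."""
    return pixels_per_side * pixel_spacing_mm

# (pixels per side, pixel spacing in mm) for the two configurations in the text.
configs = {
    "640 x 640 @ 0.25 mm": (640, 0.25),
    "550 x 550 @ 0.15 mm": (550, 0.15),
}

for label, (n_pixels, spacing) in configs.items():
    extent = in_plane_extent_mm(n_pixels, spacing)
    print(f"{label}: {extent:.1f} mm per side")
```

Under these settings the 640 × 640 images span about 160 mm per side and the 550 × 550 images about 82.5 mm per side.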

A panoramic radiograph also uses a cone-shaped X-ray beam to capture data of the oral cavity and creates a single flat 2D image of the curved structure of the entire mouth (Fig. 3a). Compared to CBCT, the panoramic radiograph generates only approximately 1/40 of the radiation but lacks spatial structure information. All panoramic radiograph images in the dataset are obtained from the CBCT machine using the principles of narrow-slot and circular orbital tomography. The machine rotates 180° around the patient for data acquisition. The output size of the images is set to 1468 × 2904, the thickness of these images is 0.075 mm, and the pixel spacing is 0.075 mm × 0.075 mm.

A periapical radiograph uses an X-ray beam to capture data of the oral cavity and creates a 2D image of the teeth. Compared to the other radiographs mentioned above, the periapical radiograph focuses only on a small part of the oral cavity (usually covering 3–4 teeth) through the built-in film or intraoral X-ray sensors, generating little radiation. In the real clinical setting, periapical radiograph images are obtained with a portable handheld X-ray generator and the built-in film (the film usually measures 40 mm × 30 mm). However, it is nontrivial to collect many periapical radiograph images, particularly paired CBCT, panoramic, and periapical radiographs. First, taking radiographs multiple times exposes patients to unnecessary radiation doses. Second, although obtaining a single dental film is simple, 10–30 acquisitions are required to obtain complete oral information, ensure the complexity and diversity of data, and meet the needs of developing machine learning technology. Finally, in current dental hospitals, the built-in film is usually handed over to the patient and is not stored as data in the hospital.

Because a CT image is obtained by a rotating X-ray source, it contains all the information of a single X-ray image. Thus, many researchers have focused on using CT images to generate the corresponding X-ray images and have achieved good results^35,36,37. In this study, we generated periapical radiographs from CBCT images using the Siddon-Jacobs ray-tracing algorithm^38,39, which is one of the methods for computing a digitally reconstructed radiograph (DRR). The algorithm simulates the process of X-rays passing through the human body and being attenuated by human tissue to generate radiographic images. Due to its convenience and efficiency, it is the most commonly used method for computing DRRs^40,41,42. Furthermore, research has shown that images generated using this algorithm exhibit errors within an acceptable range when compared to real images⁴³. The dataset is continuously updated, and in the future we will integrate emerging technologies to generate periapical radiographs.
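To make the idea of a DRR concrete, the following Python sketch approximates the attenuation line integral along a single ray through a voxel grid and converts it to a transmitted fraction with the Beer-Lambert law. It is a simplified illustration and not the Siddon-Jacobs algorithm: it samples the ray at uniform steps with a nearest-voxel lookup, whereas Siddon-Jacobs computes exact intersection lengths with each voxel. The function names, the toy volume, and the attenuation value are all assumptions made for illustration.

```python
import math

def ray_sum(volume, spacing_mm, start, end, n_samples=200):
    """Approximate the integral of attenuation along the ray from start to end.

    volume[z][y][x] holds attenuation coefficients in 1/mm, spacing_mm is the
    isotropic voxel edge length, and start/end are (x, y, z) points in mm.
    """
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    step = math.dist(start, end) / n_samples
    total = 0.0
    for i in range(n_samples):
        t = (i + 0.5) / n_samples  # midpoint of the i-th ray segment
        x, y, z = (s + t * (e - s) for s, e in zip(start, end))
        ix, iy, iz = int(x // spacing_mm), int(y // spacing_mm), int(z // spacing_mm)
        # Samples outside the volume contribute no attenuation.
        if 0 <= ix < nx and 0 <= iy < ny and 0 <= iz < nz:
            total += volume[iz][iy][ix] * step
    return total

def transmitted_fraction(line_integral):
    """Beer-Lambert law: fraction of incident intensity reaching the detector."""
    return math.exp(-line_integral)

# Toy example: a uniform 4 x 4 x 4 block of 1 mm voxels with attenuation 0.1 / mm.
block = [[[0.1] * 4 for _ in range(4)] for _ in range(4)]
integral = ray_sum(block, 1.0, start=(-2.0, 2.5, 2.5), end=(6.0, 2.5, 2.5))
print(round(integral, 6), round(transmitted_fraction(integral), 4))
```

Repeating this computation for one ray per detector pixel yields a simulated radiograph; a production implementation would rely on an exact path-length method such as the one cited above.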

Figure 4 shows that periapical radiograph generation consists of four steps. First, a 60 mm × 50 mm × 50 mm cube is cut out from the 3D CBCT image around the patient’s tooth. The midline l_m passes through the teeth, and the line l_s of the cube is tangent to the outside of the face. For each patient, 20–32 cubes are obtained. Second, to apply the Siddon-Jacobs ray-tracing algorithm, each cube is rotated so that all cubes share the same orientation: the outside of the face points in the positive direction of the y-axis, and the teeth point in the positive direction of the z-axis. Third, the X-ray process is simulated by propagating incident X-ray photons (from a radiation source) through the cube using the Siddon-Jacobs ray-tracing algorithm of the Insight Segmentation and Registration Toolkit (ITK) imaging package. When using this algorithm, we set the distance between the X-ray source and the cube to 1000 mm plus a random value of 0–5 mm, and we use three X-ray incidence angles (20–25 degrees to the left, 5–10 degrees to the left, and 20–25 degrees to the right) to generate periapical radiographs from different angles. Finally, films usually come in two sizes in practice; considering the size of the built-in film for adults, each generated periapical radiograph is cropped to 40 mm × 30 mm. The dataset contains a total of 329 CBCT files. We attempted to label each tooth in every file; however, severe tooth loss in some CBCT image files hindered accurate annotation of each tooth’s position. In addition, some teeth were incomplete after being cut out according to the annotations. These data were removed following expert quality control procedures. Even after removal, periapical data remain for 188 CBCT files, comprising 16,203 images of 5,401 teeth from three different angles. For machine learning, this still represents a considerable amount of data.
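The geometric settings in the third step can be sketched as a small parameter sampler. The Python example below is an assumption-laden illustration, not the released code: the sign convention (negative angles for "left"), the view names, drawing one distance offset per view, and the use of a 0.25 mm voxel size to convert the cube to voxels are all choices made here for clarity.

```python
import random

CUBE_MM = (60.0, 50.0, 50.0)  # cube cut out around each tooth (from the text)
VOXEL_MM = 0.25               # voxel size of most CBCT images in the dataset

# Assumed convention: negative angles mean "to the left", positive "to the right".
ANGLE_RANGES_DEG = {
    "left_20_25": (-25.0, -20.0),
    "left_5_10": (-10.0, -5.0),
    "right_20_25": (20.0, 25.0),
}

def cube_shape_voxels(cube_mm=CUBE_MM, voxel_mm=VOXEL_MM):
    """Number of voxels along each edge of the cube."""
    return tuple(round(edge / voxel_mm) for edge in cube_mm)

def sample_views(rng):
    """Draw one incidence angle and one source distance for each of the three views."""
    views = []
    for name, (low, high) in ANGLE_RANGES_DEG.items():
        views.append({
            "view": name,
            "angle_deg": rng.uniform(low, high),
            # 1000 mm base distance plus a random value of 0-5 mm.
            "source_distance_mm": 1000.0 + rng.uniform(0.0, 5.0),
        })
    return views

rng = random.Random(42)  # fixed seed so the sketch is repeatable
views = sample_views(rng)
print(cube_shape_voxels())
for view in views:
    print(view)
```

At 0.25 mm voxels the cube corresponds to 240 × 200 × 200 voxels, and each tooth yields three simulated views, which is consistent with three periapical radiographs per tooth.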

Privacy

To ensure the protection of patient privacy, all demographic-sensitive information of patients, except for gender and age, has been either deleted or replaced with new values. Patient names and IDs have been replaced with randomly generated new IDs. Other IDs in the files, such as StudyInstanceUID, have also been regenerated. The date of birth has been removed, and other dates (e.g., study time, etc.) have been randomly offset to fall between 2200 and 2300. However, the chronological order of timestamps for multiple visits by each patient has been retained. The dataset does not include individuals under the age of 18 or patients aged 89 and above.
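One way to offset dates while keeping the order of a patient's visits is to apply a single random shift to all of that patient's dates. The Python sketch below illustrates this idea; it is an assumption about how such a shift could be implemented, not a description of the authors' procedure, and the function name and example dates are hypothetical.

```python
import random
from datetime import date

def shift_patient_dates(visit_dates, rng):
    """Shift all dates of one patient by the same offset into the 2200s.

    Using one offset per patient keeps both the order of visits and the
    intervals between them.
    """
    first = min(visit_dates)
    # Assumed choice: place the first visit in a random year from 2200 to 2290,
    # leaving headroom so later visits stay before 2300.
    target_year = rng.randint(2200, 2290)
    target_first = date(target_year, first.month, min(first.day, 28))
    offset = target_first - first
    return [d + offset for d in visit_dates]

rng = random.Random(0)
original = [date(2021, 3, 14), date(2021, 9, 2), date(2022, 1, 20)]  # made-up visits
shifted = shift_patient_dates(original, rng)
print(shifted)
```

Because every date of a patient moves by the same amount, follow-up intervals between CBCT scans remain usable for longitudinal analysis even though the absolute dates are no longer real.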

Data statistics

The demographics of the patients are summarized in Table 2 and Fig. 5a. As shown in Fig. 5a, the number of patients who chose to have CBCT images taken was far greater than the number of patients who chose to have PaX-rays taken. There are two possible reasons for this disparity. First, PaX-ray is two-dimensional and has significant limitations, including distortion and a lack of spatial structure information; thus, oral surgeons prefer to take CBCT images so that the lesion can be evaluated in three dimensions and its pathology can be determined¹². Second, if the patient has a plan for dental implants, CBCT is the primary choice. Among patients with CBCT images, the number of female patients exceeds that of male patients, and this trend is also observed among patients who have had multiple visits (Table 3).

Table 3 Gender and age distribution in patients with multiple visits.

The age distribution of the patients is shown in Fig. 5b using a boxplot, which indicates that the patients who underwent panoramic radiographs were younger. Table 4 presents the imaging settings for different types of images. Peak kilovoltage (kVp) and X-ray tube current affect the radiation exposure dose, and slice thickness represents the axial resolution³⁴. As shown in Table 4, the slice thickness is mostly set to 0.25 mm, accounting for 99.4% of CBCT images (327 of 329); the remaining two were set to 0.15 mm. For all CBCT images, the kVp was 100, and the X-ray tube current was typically 6–8 mA. For panoramic radiographs, the slice thickness was 0.25 mm, the kVp was 100, and the X-ray tube current was set to 10 mA.

Table 4 Scan settings used to acquire the Multi-modal dental dataset.

Data Records

The multimodal dental dataset²⁹ has been released on PhysioNet for users to download. As illustrated in Fig. 6, the dataset is structured hierarchically. CBCT images and panoramic and periapical radiographs are organized into separate folders at the top-level directory. CBCT images and panoramic radiographs are saved in DICOM format, while periapical radiographs are generated by cutting dental slices from CBCT images and irradiating them from three angles. The resulting periapical radiographs are stored in TIF format under the PeX-ray folder within three subfolders. The files within these three folders are named using the patient’s ID followed by an underscore and a number to indicate the patient’s visit number. For example, ‘0006_0’ denotes data from the first visit of patient 0006. In addition to these three folders, the remaining CSV files are as follows: CBCT_Info.csv, PaX_Info.csv, PeX_Info.csv, Patient_Statistics_Info.csv, and Implant_Marking_Info.csv. CBCT_Info.csv contains patient age and gender information for each CBCT file, along with details about the CBCT file itself, such as tube current and tube voltage used for the CBCT image acquisition, as well as the dimensions of the file, etc. PaX_Info.csv is similar to CBCT_Info.csv but specifically records information related to panoramic radiographs. PeX_Info.csv provides statistics on the number of periapical radiographs from different angles for each patient. Patient_Statistics_Info.csv offers patient-level statistics, indicating whether each patient has data for these three modalities and the corresponding file names. Implant_Marking_Info.csv marks whether patients have dental implants. The specific meaning of each column in every CSV file is detailed in Table 5. All files are sorted in ascending order based on patient ID.
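The naming convention described above can be parsed with a short helper. The Python sketch below assumes that a record name has exactly the form 'patientID_visitNumber' with any file extension already removed; the function names and the sample names other than '0006_0' are hypothetical.

```python
from collections import defaultdict

def parse_record_name(stem):
    """Split a name such as '0006_0' into (patient_id, visit_index).

    The patient ID is kept as a string to preserve leading zeros; the visit
    index is zero-based, so 0 is the first visit.
    """
    patient_id, visit = stem.split("_", 1)
    return patient_id, int(visit)

def group_visits(stems):
    """Map each patient ID to the sorted list of visit indices found."""
    visits = defaultdict(list)
    for stem in stems:
        patient_id, visit = parse_record_name(stem)
        visits[patient_id].append(visit)
    return {pid: sorted(v) for pid, v in visits.items()}

print(parse_record_name("0006_0"))                     # ('0006', 0)
print(group_visits(["0006_1", "0006_0", "0012_0"]))    # {'0006': [0, 1], '0012': [0]}
```

Grouping by patient ID in this way makes it straightforward to find the patients with more than one CBCT visit or to join records against the patient-level CSV files.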

Table 5 The specific meaning of each column in CSV files.

Technical Validation

To obtain high-quality standard data, quality control and calibration of the CBCT scanning device are essential for CBCT and panoramic radiographs. Therefore, an autocalibration procedure is executed daily to ensure calibrated and accurate performance of the CBCT scanner³¹. Additionally, the manufacturer of the CBCT scanner conducts an annual quality control service to maintain the high quality of the CBCT image.

For periapical radiographs, the performance of cutting CBCT to generate periapical radiographs and the Siddon-Jacobs ray tracing algorithm is key to obtaining standard high-quality data. Therefore, we organized 13 people to label the CBCT images and record the labeled data in the file. One person was selected as the person in charge; the 13 people were divided into groups of two, and the remaining one was in a separate group. Each person was required to label approximately 25 CBCT images. After labeling, the quality of the labels was checked by mutual inspection within each group. The process for checking the labels was as follows: first, we sliced the CBCT images based on these labels and then used 3D Slicer software to inspect each slice to see whether the tooth corresponding to the label was in the middle of the slice and whether the height of the section included the entire portion of the tooth. If there were quality issues with the labeled data, the person who labeled the file had to re-label it. After the checking was completed, we sent all the labeled files to the responsible person, who checked all the labeled files again to ensure the correctness of the labels and the quality of the obtained periapical radiographs. There are currently 188 labeled files that meet the requirements, and the rest will be updated in the future.

Usage Notes

Currently, there is a shortage of publicly available dental datasets, particularly lacking CBCT data. The establishment of the multimodal dental dataset²⁹ aims to provide a broader range of diverse data types to facilitate the advancement of machine learning in dental healthcare services. To access the data, researchers are required to complete the following steps:

Become a credentialed user of the PhysioNet platform.
Complete the mandatory training.
Submit a data access request and await approval.

Once the application is approved, researchers will be granted access to the data.

Code availability

The code developed in this study is publicly available on GitHub (https://github.com/wenjing567/dental-cxr-ct.git).

References

Rashid, U. et al. A hybrid mask rcnn-based tool to localize dental cavities from real-time mixed photographic images. PeerJ Computer Science 8, e888 (2022).
Article PubMed PubMed Central Google Scholar
Cui, Z., Li, C. & Wang, W. Toothnet: automatic tooth instance segmentation and identification from cone beam ct images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 6368–6377 (2019).
Lee, D.-W., Kim, S.-Y., Jeong, S.-N. & Lee, J.-H. Artificial intelligence in fractured dental implant detection and classification: evaluation using dataset from two dental hospitals. Diagnostics 11, 233 (2021).
Article PubMed PubMed Central Google Scholar
Zhang, X. et al. Development and evaluation of deep learning for screening dental caries from oral photographs. Oral diseases 28, 173–181 (2022).
Article PubMed Google Scholar
Hwang, J.-J., Jung, Y.-H., Cho, B.-H. & Heo, M.-S. An overview of deep learning in the field of dentistry. Imaging science in dentistry 49, 1–7 (2019).
Article PubMed PubMed Central Google Scholar
Nguyen, T. T., Larrivée, N., Lee, A., Bilaniuk, O. & Durand, R. Use of artificial intelligence in dentistry: current clinical trends and research advances. J Can Dent Assoc 87, 1488–2159 (2021).
Google Scholar
Khanagar, S. B. et al. Developments, application, and performance of artificial intelligence in dentistry–a systematic review. Journal of dental sciences 16, 508–522 (2021).
Article PubMed Google Scholar
Carrillo-Perez, F. et al. Applications of artificial intelligence in dentistry: A comprehensive review. Journal of Esthetic and Restorative Dentistry 34, 259–280 (2022).
Article PubMed Google Scholar
Paavilainen, P., Akram, S. U. & Kannala, J. Bridging the gap between paired and unpaired medical image translation. In Deep Generative Models, and Data Augmentation, Labelling, and Imperfections, 35–44 (Springer, 2021).
Jang, W. S. et al. Accurate detection for dental implant and peri-implant tissue by transfer learning of faster r-cnn: a diagnostic accuracy study. BMC Oral Health 22, 1–7 (2022).
Article Google Scholar
Dillenseger, J.-L., Laguitton, S. & Delabrousse, E. Fast simulation of ultrasound images from a ct volume. Computers in biology and medicine 39, 180–186 (2009).
Article PubMed Google Scholar
Koenig, L. J. Imaging of the jaws. In Seminars in Ultrasound, CT and MRI (Elsevier, 2015).
Abrahams, J. J. Dental ct imaging: a look at the jaw. Radiology 219, 334–345 (2001).
Article CAS PubMed Google Scholar
Brooks, S. L. Cbct dosimetry: orthodontic considerations. In Seminars in Orthodontics, vol. 15, 14–18 (Elsevier, 2009).
Zhang, K., Wu, J., Chen, H. & Lyu, P. An effective teeth recognition method using label tree with cascade network structure. Computerized Medical Imaging and Graphics 68, 61–70 (2018).
Article PubMed Google Scholar
Lee, J.-H., Kim, D.-H., Jeong, S.-N. & Choi, S.-H. Detection and diagnosis of dental caries using a deep learning-based convolutional neural network algorithm. Journal of dentistry 77, 106–111 (2018).
Article PubMed Google Scholar
Mamatha, J. et al. Cone beam computed tomography-dawn of a new imaging modality in orthodontics. Journal of International Oral Health: JIOH 7, 96 (2015).
CAS PubMed PubMed Central Google Scholar
Kapila, S. & Nervina, J. Cbct in orthodontics: assessment of treatment outcomes and indications for its use. Dentomaxillofacial radiology 44, 20140282 (2015).
Article CAS PubMed Google Scholar
Bornstein, M. M., Scarfe, W. C., Vaughn, V. M. & Jacobs, R. Cone beam computed tomography in implant dentistry: a systematic review focusing on guidelines, indications, and radiation dose risks. International journal of oral & maxillofacial implants 29 (2014).
Harris, D. et al. Eao guidelines for the use of diagnostic imaging in implant dentistry 2011. a consensus workshop organized by the european association for osseointegration at the medical university of warsaw. Clinical oral implants research 23, 1243–1253 (2012).
Article PubMed Google Scholar
Song, W., Liang, Y., Yang, J., Wang, K. & He, L. Oral-3d: reconstructing the 3d structure of oral cavity from panoramic x-ray. In Proceedings of the AAAI conference on artificial intelligence, vol. 35, 566–573 (2021).
Cui, Z. et al. A fully automatic ai system for tooth and alveolar bone segmentation from cone-beam ct images. Nature communications 13, 2096 (2022).
Article ADS CAS PubMed PubMed Central Google Scholar
Polizzi, A. et al. Tooth automatic segmentation from cbct images: a systematic review. Clinical Oral Investigations 27, 3363–3378 (2023).
Article PubMed Google Scholar
Corbella, S., Srinivas, S. & Cabitza, F. Applications of deep learning in dentistry. Oral Surgery, Oral Medicine, Oral Pathology and Oral Radiology 132, 225–238 (2021).
Article PubMed Google Scholar
Ren, R., Luo, H., Su, C., Yao, Y. & Liao, W. Machine learning in dental, oral and craniofacial imaging: a review of recent progress. PeerJ 9, e11451 (2021).
Article PubMed PubMed Central Google Scholar
Panetta, K., Rajendran, R., Ramesh, A., Rao, S. P. & Agaian, S. Tufts dental database: a multimodal panoramic x-ray dataset for benchmarking diagnostic systems. IEEE journal of biomedical and health informatics 26, 1650–1659 (2021).
Article Google Scholar
Kistler, M. et al. The virtual skeleton database: an open access repository for biomedical research and collaboration. Journal of medical Internet research 15, e2930 (2013).
Article Google Scholar
Panetta, K., Rajendran, R., Ramesh, A., Rao, S. P. & Agaian, S. Tufts dental database: A multimodal panoramic x-ray dataset for benchmarking diagnostic systems. IEEE Journal of Biomedical and Health Informatics (2021).
Liu, W., Huang, Y. & Tang, S. A multimodal dental dataset facilitating machine learning research and clinic services. physionet https://doi.org/10.13026/h1tt-fc69 (2024).
Alamri, H. M., Sadrameli, M., Alshalhoob, M. A. & Alshehri, M. Applications of cbct in dental practice: a review of the literature. General dentistry 60, 390–400 (2012).
PubMed Google Scholar
Afshar, P. et al. Covid-ct-md, covid-19 computed tomography scan dataset applicable in machine learning and deep learning. Scientific Data 8, 1–8 (2021).
Article CAS Google Scholar
De Vos, W., Casselman, J. & Swennen, G. Cone-beam computerized tomography (cbct) imaging of the oral and maxillofacial region: a systematic review of the literature. International journal of oral and maxillofacial surgery 38, 609–625 (2009).
Scarfe, W. C., Farman, A. G. & Sukovic, P. Clinical applications of cone-beam computed tomography in dental practice. Journal of the Canadian Dental Association 72, 75–80 (2006).
Afshar, P. et al. Covid-ct-md, covid-19 computed tomography scan dataset applicable in machine learning and deep learning. Scientific Data 8, 121 (2021).
Moturu, A. & Chang, A. Creation of synthetic x-rays to train a neural network to detect lung cancer. Journal Beyond Sciences Initiative, University of Toronto, in Toronto (2018).
Teixeira, B. et al. Generating synthetic x-ray images of a person from the surface geometry. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 9059–9067 (2018).
Henzler, P., Rasche, V., Ropinski, T. & Ritschel, T. Single-image tomography: 3d volumes from 2d cranial x-rays. In Computer Graphics Forum, vol. 37, 377–388 (Wiley Online Library, 2018).
Jacobs, F., Sundermann, E., De Sutter, B., Christiaens, M. & Lemahieu, I. A fast algorithm to calculate the exact radiological path through a pixel or voxel space. Journal of computing and information technology 6, 89–94 (1998).
Siddon, R. L. Fast calculation of the exact radiological path for a three-dimensional ct array. Medical physics 12, 252–255 (1985).
Liu, S. et al. 2d/3d multimode medical image registration based on normalized cross-correlation. Applied Sciences 12, 2828 (2022).
Quan, T. M. et al. Xpgan: X-ray projected generative adversarial network for improving covid-19 image classification. In 2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI), 1509–1513 (IEEE, 2021).
Akbari-Shandiz, M. et al. Mri vs ct-based 2d-3d auto-registration accuracy for quantifying shoulder motion using biplane video-radiography. Journal of biomechanics 82, 375–380 (2019).
Bollet, M. A. et al. Can digitally reconstructed radiographs (drrs) replace simulation films in prostate cancer conformal radiotherapy? International Journal of Radiation Oncology* Biology* Physics 57, 1122–1130 (2003).
Abdi, A. H., Kasaei, S. & Mehdizadeh, M. Automatic segmentation of mandible in panoramic x-ray. Journal of Medical Imaging 2, 044003 (2015).
Wang, C.-W. et al. A benchmark for comparison of dental radiography analysis algorithms. Medical image analysis 31, 63–76 (2016).

Acknowledgements

This study was supported by the Project of Guangxi Science and Technology, China (No. GuiKeAD20297004) and the National Natural Science Foundation of China (Grant No. U21A20474 and Grant No. 61967002).

Author information

Authors and Affiliations

Key Lab of Education Blockchain and Intelligent Technology, Ministry of Education, Guangxi Normal University, Guilin, 541004, China
Yunyou Huang, Wenjing Liu, Xiuxia Miao, Xianglong Guan, Xiangjiang Lu, Xiaoshuang Liang & Suqin Tang
Guangxi Key Lab of Multi-Source Information Mining and Security, Guangxi Normal University, Guilin, 541004, China
Yunyou Huang, Wenjing Liu, Xiuxia Miao, Xianglong Guan, Xiangjiang Lu & Xiaoshuang Liang
The International Open Benchmark Council, 19801, Delaware, USA
Yunyou Huang & Jianfeng Zhan
Guilin Medical University, Guilin, 541199, China
Wenjing Liu & Li Ma
The Second Nanning People’s Hospital, Nanning, 530031, China
Caiqin Yao
Department of Physiology and Pathophysiology, Capital Medical University, Beijing, 100069, China
Zhifei Zhang
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100086, China
Jianfeng Zhan
University of Chinese Academy of Sciences, Beijing, 100086, China
Jianfeng Zhan

Authors

Yunyou Huang
Wenjing Liu
Caiqin Yao
Xiuxia Miao
Xianglong Guan
Xiangjiang Lu
Xiaoshuang Liang
Li Ma
Suqin Tang
Zhifei Zhang
Jianfeng Zhan

Contributions

Y.H. conceptualized this study, collected and analyzed the data, and wrote the manuscript. W.L. conceptualized this study and revised the manuscript. C.Y. conceptualized this study and analyzed the data. W.L., X.M., X.G., X.L., X.L., and L.M. labeled and processed the data. L.M., S.T., Z.Z., and J.Z. directed the project and revised the manuscript. All authors have read and approved the final manuscript.

Corresponding authors

Correspondence to Li Ma, Suqin Tang, Zhifei Zhang or Jianfeng Zhan.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Huang, Y., Liu, W., Yao, C. et al. A multimodal dental dataset facilitating machine learning research and clinic services. Sci Data 11, 1291 (2024). https://doi.org/10.1038/s41597-024-04130-1

Received: 13 October 2023
Accepted: 13 November 2024
Published: 27 November 2024
DOI: https://doi.org/10.1038/s41597-024-04130-1
