Abstract
This study aimed to develop and validate a machine learning model based on Cone Beam Computed Tomography (CBCT) radiomic features for the early detection of potential carotid atherosclerosis in periodontitis patients. Data from 279 observations, each with 206 features, were used to distinguish periodontitis patients with concomitant carotid atherosclerosis from those without. Baseline characteristics were compared on the original data. To address class imbalance, the Synthetic Minority Over-sampling Technique (SMOTE) was then applied (dup_size = 1), increasing the dataset to 390 observations for model development. Feature selection used a two-stage bootstrap procedure (n_bootstrap = 1000). In each first-stage iteration, a dataset was created by resampling with replacement; redundant features were removed using Spearman's rank correlation (correlation coefficient > 0.8), and Lasso regression with ten-fold cross-validation retained features with non-zero coefficients. This identified 26 high-frequency features (selected > 500 times), which underwent a second round of bootstrap analysis in which Logistic Regression combined with the Akaike Information Criterion (AIC) yielded a final set of 20 features.
Three machine learning models, Logistic Regression (LR), Support Vector Machine (SVM), and Random Forest (RF), were developed and evaluated using five-fold cross-validation. The RF model performed best, achieving an Area Under the Curve (AUC) of 0.892, sensitivity of 0.957, specificity of 0.710, and accuracy of 0.859. Receiver Operating Characteristic (ROC) curves and calibration plots indicated good predictive performance and calibration for all three models. Decision curve analysis showed that the RF model provided the highest net benefit across a range of risk thresholds. These results suggest that a CBCT radiomics-based RF model may be a useful tool for the early detection of carotid atherosclerosis in periodontitis patients.
Introduction
In cardiovascular disease (CVD) management, early detection remains pivotal for reducing morbidity and mortality1,2,3. Among the manifestations of atherosclerosis, carotid artery stenosis is a critical risk factor for ischemic stroke and poses significant challenges to global healthcare systems4,5. Emerging evidence highlights a link between periodontitis, an inflammatory condition affecting the tooth-supporting structures, and an increased risk of cardiovascular events, including carotid atherosclerosis6,7,8,9,10,11,12. Moreover, in mice with atherosclerosis, periodontal inflammation was more severe, and the distinctive lipid-profile changes observed in these mice were closely related to this association13. Despite these associations, there remains a substantial unmet need for effective early detection methods tailored to this high-risk population.
Cone Beam Computed Tomography (CBCT), traditionally used in dental imaging, is increasingly recognized for its potential to provide detailed anatomical information relevant to oral health14,15. Radiomics, the extraction of large numbers of quantitative features from medical images, enables more precise quantitative characterization of disease and may improve diagnostic accuracy and patient outcomes. However, the application of CBCT radiomic features in predictive models for detecting carotid atherosclerosis among periodontitis patients remains largely unexplored.
This study aims to develop and validate a machine learning model that uses CBCT radiomic features to detect early signs of carotid atherosclerosis in patients with periodontitis. We combine three machine learning algorithms (Random Forest, Support Vector Machine, and Logistic Regression) with SMOTE oversampling to address class imbalance and with a feature selection procedure based on Spearman's rank correlation, Lasso regression, and the Akaike Information Criterion (AIC). This approach is intended to improve the identification of individuals at heightened risk and to shed light on the interplay between periodontal and systemic vascular health.
Materials and methods
Datasets
This retrospective study was approved by the Ethics Committee of Nanjing Drum Tower Hospital (approval number: 2023-461-02), and the requirement for written informed consent was waived. In this study, a total of 279 patients were included for analysis, comprising 168 cases of carotid artery stenosis with periodontitis and 111 cases of periodontitis alone. To address the issue of data imbalance, we applied the Synthetic Minority Over-sampling Technique (SMOTE) with a duplication size parameter set to 1 (dup_size = 1). After resampling, the number of periodontitis-only cases increased to 222, resulting in a total sample size of 390 cases. It is important to note that all baseline characteristic analyses (Table 1) were performed on the original, non-augmented dataset (n = 279) to accurately reflect the real-world patient population. The SMOTE procedure was applied exclusively to the feature dataset for the purpose of training and validating the machine learning models, after the baseline comparison was completed. The study was conducted in accordance with the principles set forth in the Declaration of Helsinki.
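The oversampling step can be illustrated with a minimal NumPy sketch of SMOTE. It is not the authors' implementation: the function name `smote_oversample`, k = 5 neighbours, Euclidean distance, and the simulated data are assumptions, and dup_size = 1 is interpreted as one synthetic case per minority-class case, which matches the reported increase from 111 to 222 periodontitis-only cases.

```python
import numpy as np


def smote_oversample(X_min, dup_size=1, k=5, seed=0):
    """SMOTE-style oversampling (hypothetical helper): dup_size synthetic rows per minority row.

    Each synthetic row lies on the segment between a minority sample and one
    of its k nearest minority-class neighbours (Euclidean distance).
    """
    rng = np.random.default_rng(seed)
    n = X_min.shape[0]
    # Pairwise Euclidean distances within the minority class only
    dist = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=-1)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    k = min(k, n - 1)
    neighbours = np.argsort(dist, axis=1)[:, :k]  # k nearest neighbours per row
    synthetic = []
    for _ in range(dup_size):
        # Pick one random neighbour per sample and interpolate towards it
        chosen = neighbours[np.arange(n), rng.integers(0, k, size=n)]
        gap = rng.random((n, 1))
        synthetic.append(X_min + gap * (X_min[chosen] - X_min))
    return np.vstack(synthetic)


# Simulated data mirroring the cohort sizes: 168 cases (label 1) and
# 111 periodontitis-only controls (label 0), 206 features each.
rng = np.random.default_rng(42)
X = rng.normal(size=(279, 206))
y = np.array([1] * 168 + [0] * 111)

X_syn = smote_oversample(X[y == 0], dup_size=1)
X_res = np.vstack([X, X_syn])
y_res = np.concatenate([y, np.zeros(len(X_syn), dtype=int)])
print(X_res.shape, np.bincount(y_res))  # (390, 206) [222 168]
```

When a model is evaluated by cross-validation, a common design choice is to generate synthetic cases inside each training fold only, so that no synthetic point derived from a test case leaks into training.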
CBCT-Based diagnosis of periodontitis
The diagnosis of periodontitis was established radiographically using CBCT scans, following the consensus criteria of the 2018 Classification of Periodontal Diseases16. The evaluation was independently performed by two blinded examiners (with inter-examiner reliability reported as Kappa = 0.75).
Periodontitis was defined by the presence of interdental bone loss (IBL) affecting at least two non-adjacent teeth. Bone loss was quantified by measuring the distance from the Cementoenamel Junction (CEJ) to the Alveolar Bone Crest (ABC) on multiplanar reconstructed images. A site was considered positive for periodontitis-associated bone loss if the CEJ-ABC distance was ≥ 3 mm at its deepest point, in the absence of other clear causes (e.g., periapical pathology). Subjects meeting this criterion in at least two quadrants were included in the periodontitis group. This binary classification (presence/absence of periodontitis) formed the basis for the subsequent association analysis with carotid stenosis.
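The radiographic case definition above can be written as a small decision rule. The sketch below is one possible reading of it, with its assumptions stated: teeth are identified by two-digit FDI numbers (first digit = quadrant); the input already holds the deepest CEJ-ABC distance per tooth after excluding sites with other clear causes of bone loss; two teeth are treated as adjacent when they are neighbours in the same quadrant or the two central incisors of the same jaw; and a subject is positive when affected teeth occur in at least two quadrants and include at least one non-adjacent pair. The function names are hypothetical.

```python
from itertools import combinations

THRESHOLD_MM = 3.0  # CEJ-ABC distance at the deepest point counted as bone loss


def adjacent(a, b):
    """True if FDI teeth a and b are neighbours (same quadrant or across the midline)."""
    qa, pa = divmod(a, 10)
    qb, pb = divmod(b, 10)
    if qa == qb:
        return abs(pa - pb) == 1
    # Central incisors of the same jaw touch across the midline (11/21, 31/41)
    return pa == pb == 1 and {qa, qb} in ({1, 2}, {3, 4})


def has_periodontitis(cej_abc_mm):
    """cej_abc_mm: dict mapping FDI tooth number -> deepest CEJ-ABC distance (mm)."""
    positive = [t for t, d in cej_abc_mm.items() if d >= THRESHOLD_MM]
    quadrants = {t // 10 for t in positive}
    non_adjacent_pair = any(not adjacent(a, b) for a, b in combinations(positive, 2))
    return len(quadrants) >= 2 and non_adjacent_pair


print(has_periodontitis({16: 3.5, 26: 4.0}))  # True: two quadrants, non-adjacent
print(has_periodontitis({11: 3.2, 21: 3.1}))  # False: adjacent central incisors
print(has_periodontitis({16: 3.5, 14: 3.3}))  # False: single quadrant
```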
Inclusion and exclusion criteria
Inclusion criteria
Patients were included if they met all of the following criteria:
a. Diagnosed with carotid artery stenosis by carotid duplex ultrasound or computed tomography angiography (CTA);
b. Aged between 18 and 85 years (inclusive);
c. Underwent a contemporaneous jawbone cone-beam computed tomography (CBCT) scan.
Exclusion criteria
Patients were excluded based on any of the following criteria:
a. Presence of severe artifacts or poor-quality jawbone CBCT images that would compromise accurate segmentation of the region of interest (ROI);
b. History of jawbone tumors, cysts, osteomyelitis, osteoradionecrosis, or major jaw surgery;
c. Acute orofacial infection or surgery in the relevant area within the 3 months prior to the study;
d. Unavailability of key clinical data or data required for assessing plaque stability;
e. Diagnosis of a systemic disease known to severely affect bone metabolism (e.g., hyperparathyroidism).
Grouping method
The diagnosis of periodontitis was confirmed by oral medicine specialists using CBCT, while carotid artery stenosis was diagnosed by vascular surgeons based on carotid ultrasound or carotid artery angiography. Patients with both periodontitis and carotid atherosclerosis were grouped together, and those with periodontitis alone formed a separate group.
CBCT image acquisition
For CBCT imaging, we used the NewTom VGI tomograph (NewTom, Verona, Italy), with patient positioning referenced to the Frankfurt plane and support for 360-degree rotation. The system automatically adjusts the X-ray dose to the patient's anatomy to optimise image quality while minimising radiation exposure. Imaging parameters were 110 kVp and 40–50 mA, high-resolution mode with a 0.3 mm focal spot, an exposure time of 5.4 s, and a slice thickness of 0.25 mm, providing detailed visualisation of anatomical structures.
Whole periodontal region segmentation
The mandibular and maxillary regions were segmented and subsequently merged into a single region of interest (ROI). Segmentation was performed in 3D Slicer (www.slicer.org, version 5.6.2) by an oral and maxillofacial specialist with 10 years of experience. The ROI for periodontal analysis was defined as follows.
Maxilla: the upper boundary is the line connecting the lower wall of the maxillary sinus and the floor of the nose to the lower wall of the contralateral maxillary sinus; the lower boundary is the line connecting the cementoenamel junctions (CEJs) of all teeth.
Mandible: the upper boundary is the line connecting the CEJs of all teeth; the lower boundary is the line connecting the inferior alveolar canals bilaterally in the posterior region and the bilateral mental foramina in the anterior region.
Multiplanar reconstruction (MPR) views (axial, sagittal, coronal) for both maxilla and mandible: the boundaries include the labial and buccal alveolar bone plates and the lingual or palatal bone plates (Fig. 1).
The segmentation results were independently reviewed, and any inaccuracies were corrected to ensure the quality of the delineated anatomical regions.
A schematic illustration from a Cone Beam Computed Tomography (CBCT) scan of a patient, highlighting the outlines of the periodontal tissues. A: 3D view of the regions of interest. B: Axial view of the CBCT. C: Coronal view of the CBCT. D: Sagittal view of the CBCT.
Feature selection
The PyRadiomics package in Python (version 3.7.16) was used to extract features from the regions of interest (ROIs), with a total of 206 features extracted for each patient. To identify the most relevant features for our analysis, we employed a feature selection procedure based on the bootstrap method, described in detail below.
Bootstrap sampling and initial feature screening
We performed bootstrap sampling with n_bootstrap = 1000. Each bootstrap sample dataset was generated by randomly sampling with replacement from the original dataset, where the number of samples equaled the number of rows in the original dataset (nrow(data)). Within each bootstrap iteration, the following operations were conducted:
Spearman rank correlation analysis
To eliminate redundancy among features, we calculated Spearman rank correlation coefficients between all pairs of variables. When a pair had a correlation coefficient greater than 0.8, the two variables were considered redundant, and the one with the weaker correlation with the outcome variable was removed.
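The redundancy filter can be sketched in NumPy as follows, with Spearman's ρ computed as the Pearson correlation of average ranks. Assumptions not stated in the text: redundancy is judged on |ρ|, the outcome correlation is Spearman's ρ with the binary label, and pairs are resolved greedily by visiting features from the strongest to the weakest outcome correlation. Function names and the simulated data are hypothetical.

```python
import numpy as np


def average_ranks(x):
    """Ranks 1..n, with tied values sharing their average rank."""
    _, inverse, counts = np.unique(x, return_inverse=True, return_counts=True)
    upper = np.cumsum(counts)              # highest rank in each tie group
    group_rank = upper - (counts - 1) / 2  # mean of the ranks in the group
    return group_rank[inverse]


def spearman_filter(X, y, threshold=0.8):
    """Indices of features kept after removing redundant (|rho| > threshold) features.

    Features are visited from strongest to weakest |rho| with the outcome; a
    feature is dropped if it is redundant with an already-kept feature, which
    by construction correlates more strongly with the outcome.
    """
    R = np.column_stack([average_ranks(col) for col in X.T])
    rho = np.corrcoef(R, rowvar=False)  # Pearson on ranks = Spearman's rho
    ry = average_ranks(y)
    rho_y = np.array([np.corrcoef(R[:, j], ry)[0, 1] for j in range(X.shape[1])])
    kept = []
    for j in np.argsort(-np.abs(rho_y)):
        if all(abs(rho[j, k]) <= threshold for k in kept):
            kept.append(int(j))
    return sorted(kept)


rng = np.random.default_rng(0)
n = 200
signal = rng.normal(size=n)
y = (signal + rng.normal(scale=0.5, size=n) > 0).astype(int)
X = np.column_stack([
    signal,                                   # informative feature
    signal + rng.normal(scale=0.05, size=n),  # near-duplicate of column 0
    rng.normal(size=n),                       # unrelated feature
])
kept = spearman_filter(X, y)
print(kept)  # one of columns 0/1, plus column 2
```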
Lasso regression
Lasso (Least Absolute Shrinkage and Selection Operator) regression was applied to further reduce the dimensionality of the predictor variables and select the most relevant features. To determine the optimal number of variables and avoid overfitting, we performed ten-fold cross-validation to select the optimal regularization parameter λ. Variables with non-zero coefficients in the final Lasso model were retained.
The above process was repeated 1000 times, and the number of times each feature was selected across all bootstrap iterations was recorded. High-frequency features (selected in more than 500 iterations) were carried forward to the second stage of the bootstrap process.
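The first-stage loop can be sketched in Python with scikit-learn. This is an illustrative analogue, not the study's code: the Spearman filter is omitted for brevity, the Lasso model for the binary outcome is taken to be L1-penalised logistic regression (`LogisticRegressionCV` with `penalty="l1"`), features are standardised, and C (the inverse of λ) is chosen by 10-fold cross-validated log-loss; all of these are assumptions. Function names and the simulated data are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV
from sklearn.preprocessing import StandardScaler


def lasso_select(X, y, seed=0):
    """Indices with non-zero coefficients in an L1-penalised logistic model.

    C (= 1/lambda) is picked from a grid of 10 values by 10-fold CV log-loss.
    """
    Xs = StandardScaler().fit_transform(X)  # assumed: features standardised first
    model = LogisticRegressionCV(
        Cs=10, cv=10, penalty="l1", solver="liblinear",
        scoring="neg_log_loss", random_state=seed,
    )
    model.fit(Xs, y)
    return np.flatnonzero(model.coef_[0])


def bootstrap_selection_counts(X, y, n_bootstrap=1000, seed=0):
    """How often each feature is selected across bootstrap resamples."""
    rng = np.random.default_rng(seed)
    n = len(y)
    counts = np.zeros(X.shape[1], dtype=int)
    for b in range(n_bootstrap):
        idx = rng.integers(0, n, size=n)  # n rows drawn with replacement
        counts[lasso_select(X[idx], y[idx], seed=b)] += 1
    return counts


# Simulated example: only the first two of eight features carry signal.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 8))
y = (2 * X[:, 0] - 2 * X[:, 1] + rng.normal(size=300) > 0).astype(int)
n_boot = 10  # kept small for speed; the study used 1000
counts = bootstrap_selection_counts(X, y, n_bootstrap=n_boot)
high_freq = np.flatnonzero(counts > n_boot / 2)  # analogue of "> 500 of 1000"
print(counts, high_freq)
```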
Secondary bootstrap process with logistic regression and AIC
In the second stage, we performed another round of bootstrap sampling (n_bootstrap = 1000) to refine the feature selection:
Logistic regression with AIC-Based variable selection
Using the features with higher selection frequencies from the first stage, we constructed multivariate logistic regression models. A stepwise regression approach was applied to identify the optimal combination of variables, guided by the Akaike Information Criterion (AIC). The frequency of each variable combination appearing across all bootstrap iterations was recorded.
This process was repeated 1000 times, and the variable combination with the highest selection frequency was chosen for subsequent modeling.
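The second stage can be sketched with a hand-coded logistic regression (Newton-Raphson, i.e. IRLS) and AIC = 2k - 2 ln L, where k counts the intercept and coefficients. The text does not state the stepwise direction, so backward elimination is assumed here; the function names and simulated data are hypothetical.

```python
from collections import Counter

import numpy as np


def logistic_aic(X, y, n_iter=25):
    """Fit a logistic regression by Newton-Raphson (IRLS) and return its AIC."""
    Xd = np.column_stack([np.ones(len(y)), X])  # intercept + predictors
    beta = np.zeros(Xd.shape[1])
    for _ in range(n_iter):
        p = 1 / (1 + np.exp(-Xd @ beta))
        hessian = Xd.T @ (Xd * (p * (1 - p))[:, None])
        beta += np.linalg.solve(hessian + 1e-9 * np.eye(len(beta)), Xd.T @ (y - p))
    p = np.clip(1 / (1 + np.exp(-Xd @ beta)), 1e-12, 1 - 1e-12)
    log_lik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
    return 2 * len(beta) - 2 * log_lik  # AIC = 2k - 2 ln L


def backward_stepwise_aic(X, y):
    """Drop one feature at a time while doing so lowers the AIC."""
    current = list(range(X.shape[1]))
    best = logistic_aic(X[:, current], y)
    while current:
        trials = [(logistic_aic(X[:, [c for c in current if c != j]], y), j)
                  for j in current]
        aic, drop = min(trials)
        if aic >= best:
            break
        best = aic
        current.remove(drop)
    return tuple(current)


def most_frequent_combination(X, y, n_bootstrap=1000, seed=0):
    """Bootstrap the stepwise selection and return (feature tuple, frequency)."""
    rng = np.random.default_rng(seed)
    combos = Counter()
    for _ in range(n_bootstrap):
        idx = rng.integers(0, len(y), size=len(y))
        combos[backward_stepwise_aic(X[idx], y[idx])] += 1
    return combos.most_common(1)[0]


# Simulated example: features 0 and 1 drive the outcome, 2-4 are noise.
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 5))
prob = 1 / (1 + np.exp(-(2 * X[:, 0] - 2 * X[:, 1])))
y = (rng.random(300) < prob).astype(float)
combo, freq = most_frequent_combination(X, y, n_bootstrap=30)
print(combo, freq)
```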
Statistical analysis
Statistical analysis was conducted using SPSS software (version 27.0) and Python (version 3.7.16). For comparisons between two groups, normally distributed continuous variables were reported as mean ± standard deviation (x̄ ± s) and compared using independent-samples t-tests, whereas non-normally distributed continuous variables were presented as median (interquartile range) [M(IQR)] and compared using the Mann-Whitney U test. Categorical variables were expressed as counts or percentages and compared using the chi-squared (χ²) test.
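The test-selection logic above can be sketched with SciPy. The text does not say how normality was assessed, so the Shapiro-Wilk test at α = 0.05 is an assumption, as are the function names, the simulated data, and the hypothetical 2 × 2 counts; note that `scipy.stats.chi2_contingency` applies Yates' continuity correction to 2 × 2 tables by default.

```python
import numpy as np
from scipy import stats


def compare_continuous(a, b, alpha=0.05):
    """Independent-samples t-test if both groups pass Shapiro-Wilk, else Mann-Whitney U."""
    if stats.shapiro(a).pvalue > alpha and stats.shapiro(b).pvalue > alpha:
        return "t-test", stats.ttest_ind(a, b).pvalue
    return "Mann-Whitney U", stats.mannwhitneyu(a, b, alternative="two-sided").pvalue


def compare_categorical(table):
    """Chi-squared test of independence on a groups-by-categories count table."""
    chi2, p, dof, expected = stats.chi2_contingency(np.asarray(table))
    return p


# Simulated, clearly non-normal (exponential) values for the two groups
rng = np.random.default_rng(3)
test_name, p = compare_continuous(rng.exponential(size=168), rng.exponential(size=111))
print(test_name, round(p, 3))

# Hypothetical counts (rows: groups; columns: yes/no for a binary characteristic)
print(compare_categorical([[30, 138], [10, 101]]))
```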
Results
A total of 279 patients were included in the study, of which 168 patients had carotid stenosis combined with periodontitis, and 111 patients had periodontitis alone. The baseline characteristics of the two groups are presented in Table 1. Among the baseline characteristics, a history of previous stroke showed a statistically significant difference between the two groups (P = 0.04), while no significant differences were observed in other variables.
Feature selection and stability analysis
A rigorous two-stage bootstrap approach (n = 1000 iterations) was employed to identify robust radiomic features predictive of concomitant carotid artery stenosis in periodontitis patients.
First-stage feature reduction
Spearman rank correlation (ρ > 0.8 threshold) and LASSO regression with 10-fold cross-validation were applied to eliminate redundant and non-predictive variables. This process yielded 26 high-frequency features (selection count > 500), predominantly comprising morphometric parameters (Original_shape_Sphericity_1/2, Original_shape_Elongation_1/2, and minimum axis lengths); first-order intensity statistics (Original_firstorder_Kurtosis_1, median values, and interquartile ranges); and higher-order texture features, namely gray-level co-occurrence matrix (GLCM) metrics (DifferenceVariance, MaximumProbability), gray-level size zone matrix (GLSZM) large-area emphasis descriptors, and neighborhood gray-tone difference matrix (NGTDM) contrast and strength.
Notably, corresponding morphometric and textural features were selected for both the maxillary and mandibular regions (Table 2), which may suggest systemic rather than site-specific pathophysiological influences.
A multivariate logistic regression model with AIC-based stepwise refinement was then iterated over a second bootstrap (n = 1000), and the most frequently selected combination comprised 20 features (Table 3). The final signature retained 85% of the first-stage features, supporting their selection stability, and also included additional texture features.
Predictive model performance
Three machine learning models were compared using stratified five-fold cross-validation; the cross-validation performance metrics are summarized in Table 4.
Cross-validation performance
We evaluated three machine learning models, Logistic Regression (LR), Support Vector Machine (SVM), and Random Forest (RF), using the Area Under the Curve (AUC), sensitivity, specificity, and accuracy in each cross-validation fold; the results are summarized in Table 4. The highest per-fold AUC was 0.874 for LR (Fold 2), 0.897 for SVM (Fold 3), and 0.891 for RF (Fold 3).
Overall, the Random Forest model achieved higher AUC values than LR and SVM in most folds, indicating better discrimination between positive and negative cases. In Fold 3, the RF model achieved an AUC of 0.891, a sensitivity of 0.957, and a specificity of 0.710 (Fig. 2), combining high sensitivity with moderate specificity.
Comprehensive Evaluation of Machine Learning Model Performance: A Comparative Analysis Based on AUC, Sensitivity, Specificity, and Accuracy. (A) Comparison of AUC across cross-validation folds. (B) Comparison of sensitivity across cross-validation folds. (C) Comparison of specificity across cross-validation folds. (D) Comparison of accuracy across cross-validation folds.
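The per-fold evaluation reported above can be sketched with scikit-learn's `StratifiedKFold`. Model hyperparameters, feature scaling for LR and SVM, Platt-scaled SVM probabilities (`probability=True`), and the 0.5 classification threshold are assumptions, since the text does not report them; the function name and simulated data are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix, roc_auc_score
from sklearn.model_selection import StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

MODELS = {
    "LR": lambda: make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "SVM": lambda: make_pipeline(StandardScaler(), SVC(probability=True, random_state=0)),
    "RF": lambda: RandomForestClassifier(n_estimators=200, random_state=0),
}


def cross_validate_models(X, y, n_splits=5, seed=0, threshold=0.5):
    """Per-fold AUC, sensitivity, specificity and accuracy for each model."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    results = {name: [] for name in MODELS}
    for fold, (tr, te) in enumerate(skf.split(X, y), start=1):
        for name, make_model in MODELS.items():
            model = make_model().fit(X[tr], y[tr])
            prob = model.predict_proba(X[te])[:, 1]  # predicted P(class 1)
            pred = (prob >= threshold).astype(int)
            tn, fp, fn, tp = confusion_matrix(y[te], pred, labels=[0, 1]).ravel()
            results[name].append({
                "fold": fold,
                "AUC": roc_auc_score(y[te], prob),
                "sensitivity": tp / (tp + fn),
                "specificity": tn / (tn + fp),
                "accuracy": accuracy_score(y[te], pred),
            })
    return results


# Simulated data with the augmented cohort's size (390 rows, 20 features)
rng = np.random.default_rng(4)
X = rng.normal(size=(390, 20))
y = (X[:, 0] + X[:, 1] + rng.normal(size=390) > 0).astype(int)
results = cross_validate_models(X, y)
for name, folds in results.items():
    print(name, round(float(np.mean([f["AUC"] for f in folds])), 3))
```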
Using the selected features, we developed three machine learning models (LR, SVM, and RF) and evaluated their discrimination and calibration using ROC curves and calibration plots, respectively. Figure 3A shows the ROC curves, plotting sensitivity against 1 - specificity for the LR-model (red), SVM-model (blue), and RF-model (green); the dashed line represents chance performance (AUC = 0.5). The RF-model had the highest AUC, indicating the best overall discrimination of the three models. Figure 3B shows the calibration curves, comparing predicted probabilities with observed event frequencies. The RF-model lies closest to the diagonal, indicating the best calibration, while the LR-model and SVM-model show reasonable but less precise calibration. These results suggest that the Random Forest model is a promising candidate for supporting clinical decision-making, with good discrimination and calibration.
ROC and Calibration Curves for Three Machine Learning Models. (A) ROC Curves: The Receiver Operating Characteristic (ROC) curves evaluate the diagnostic accuracy of three machine learning models: Logistic Regression (LR), Support Vector Machine (SVM), and Random Forest (RF). The sensitivity is plotted against 1 - specificity. The LR-model, SVM-model, and RF-model are represented by red, blue, and green lines, respectively. The dashed line represents chance performance (AUC = 0.5). (B) Calibration Curves: The calibration curves assess the agreement between predicted probabilities and observed frequencies for the same models. The observed event percentage is plotted against the predicted probability (midpoint of bin). The LR-model, SVM-model, and RF-model are represented by red, blue, and green lines, respectively. Perfect calibration follows the diagonal.
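A calibration curve of the kind shown in Fig. 3B groups predictions into bins and compares each bin's midpoint with the observed event percentage. The sketch below assumes ten equal-width bins; the bin count, function name, and toy data are hypothetical, and perfect calibration corresponds to an observed percentage of 100 × midpoint.

```python
import numpy as np


def calibration_table(y_true, y_prob, n_bins=10):
    """Observed event percentage per equal-width bin of predicted probability.

    Returns (bin midpoints, observed %, counts) for non-empty bins only.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Bin index from the interior edges; clip so that p == 1.0 lands in the last bin
    idx = np.clip(np.digitize(y_prob, edges[1:-1]), 0, n_bins - 1)
    mids, observed, counts = [], [], []
    for b in range(n_bins):
        in_bin = idx == b
        if in_bin.any():
            mids.append((edges[b] + edges[b + 1]) / 2)
            observed.append(100 * y_true[in_bin].mean())
            counts.append(int(in_bin.sum()))
    return np.array(mids), np.array(observed), np.array(counts)


mids, observed, counts = calibration_table([0, 0, 1, 1], [0.05, 0.15, 0.85, 0.95])
print(mids, observed, counts)
```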
Clinical decision impact
We evaluated the clinical utility of the three models, Logistic Regression (LR), Support Vector Machine (SVM), and Random Forest (RF), using Decision Curve Analysis (DCA). The RF-model showed the highest net benefit in the lower risk threshold range (0.0 to 0.2). As the risk threshold increased, the LR-model and SVM-model performed comparably, while the RF-model maintained a slight advantage that persisted in the higher threshold range (0.8 to 1.0). Overall, the decision curve analysis indicates that the Random Forest model provides the greatest net benefit across a wide range of risk thresholds, supporting its potential as a tool for clinical decision-making (Fig. 4).
Decision Curve Analysis for Three Different Machine Learning Models. The decision curve analysis (DCA) evaluates the clinical utility of three machine learning models: logistic regression (LR), support vector machine (SVM), and random forest (RF). The standardized net benefit is plotted against the high-risk threshold. The LR-model, SVM-model, and RF-model are represented by red, green, and blue lines, respectively. “All” represents the strategy of treating all patients, while “None” indicates no patients are treated. The RF-model shows a higher net benefit across most risk thresholds, indicating superior clinical utility.
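Decision curve analysis rests on the net benefit at a threshold probability p_t: NB = TP/n - (FP/n) × p_t/(1 - p_t), where TP and FP count true and false positives among patients whose predicted risk is at least p_t. The standardized net benefit plotted in Fig. 4 divides NB by the outcome prevalence; the "All" strategy has NB = prevalence - (1 - prevalence) × p_t/(1 - p_t), and "None" has NB = 0. The sketch below implements these formulas in NumPy; the function names and toy data are hypothetical.

```python
import numpy as np


def net_benefit(y_true, y_prob, thresholds):
    """Net benefit of treating every patient whose predicted risk is >= p_t (0 <= p_t < 1)."""
    y_true = np.asarray(y_true)
    y_prob = np.asarray(y_prob)
    n = len(y_true)
    values = []
    for pt in thresholds:
        treat = y_prob >= pt
        tp = np.sum(treat & (y_true == 1))
        fp = np.sum(treat & (y_true == 0))
        values.append(tp / n - fp / n * pt / (1 - pt))
    return np.array(values)


def standardized_net_benefit(y_true, y_prob, thresholds):
    """Net benefit divided by prevalence (maximum attainable value is 1)."""
    return net_benefit(y_true, y_prob, thresholds) / np.mean(y_true)


def treat_all_net_benefit(y_true, thresholds):
    """Net benefit of the 'All' strategy; the 'None' strategy is always 0."""
    prevalence = np.mean(y_true)
    pt = np.asarray(thresholds, dtype=float)
    return prevalence - (1 - prevalence) * pt / (1 - pt)


# Toy data: two events, two non-events
y_demo = [1, 1, 0, 0]
p_demo = [0.9, 0.6, 0.4, 0.1]
thresholds = [0.2, 0.5]
print(net_benefit(y_demo, p_demo, thresholds))
print(treat_all_net_benefit(y_demo, thresholds))
```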
Discussion
The main finding of this study is the development of a random forest model that uses radiomic features extracted from jawbone cone beam computed tomography (CBCT) scans to enable the early detection of carotid atherosclerosis in patients with periodontitis. By analysing 279 observations, each characterised by 206 features, and addressing class imbalance with SMOTE oversampling, we identified a final set of 20 features through a rigorous feature selection process. In five-fold cross-validation, the model performed well, achieving an AUC of 0.892, sensitivity of 0.957, and specificity of 0.710. Compared with the logistic regression and support vector machine models, the random forest model showed better predictive accuracy and a higher net benefit across a range of risk thresholds in decision curve analysis. These findings support the potential value of CBCT-based radiomic features for the early identification of carotid atherosclerosis in patients with periodontitis.
Mounting epidemiological and mechanistic evidence links periodontitis to accelerated carotid atherosclerosis, likely mediated by chronic inflammation, bacterial dissemination, and endothelial dysfunction6,7,8,9,17,18. Previous clinical studies have observed that patients with more severe periodontal disease (PD) face an increased risk of coronary atherosclerotic heart disease, peripheral arterial disease, and a first cerebrovascular event19,20. Moreover, patients with cardiovascular and cerebrovascular diseases are also at a higher risk of recurrent related events21. Measurements of flow-mediated vasodilation, arterial stiffness (e.g., pulse wave velocity), intima-media thickness, and arterial calcification scores have revealed significant endothelial dysfunction in patients with periodontitis22. To our knowledge, no prior research has explored CBCT radiomics as a predictor of carotid pathology, and our study addresses this gap. Advances in artificial intelligence and radiomic analysis now enable the decoding of complex, non-intuitive patterns in medical imaging. By applying these techniques to routine dental CBCT scans, our results suggest that morphometric and textural alterations in periodontal structures may reflect systemic vascular risk, pointing towards a non-invasive diagnostic approach. This approach builds on existing dental imaging infrastructure and could reduce the need for additional costly or invasive tests.
Recent advances in predicting carotid atherosclerosis have been achieved through advanced computational techniques. One study integrated demographic, clinical, and molecular data with ultrasonographic measurements, employing neural network algorithms and hierarchical clustering to identify four subclinical carotid atherosclerosis endotypes, ranging from mild to severe, within the IMPROVE cohort (n = 3340)23. This approach not only improved ASCVD risk discrimination but also demonstrated potential applications in precision medicine for ASCVD prevention. Another study used deep learning (DL) models to predict cardiovascular disease and coronary artery disease risk from carotid plaque features24. Among 459 participants undergoing coronary angiography, contrast-enhanced ultrasound, and B-mode carotid imaging, parameters such as maximum plaque height, total plaque area, carotid intima-media thickness (cIMT), and intraplaque neovascularization (IPN) were analysed. The DL models outperformed traditional machine learning methods, achieving a 21% improvement in AUC and an approximately 17% increase in c-index compared with the Cox proportional hazards model (CPHM). Notably, IPN showed significant predictive power for cardiovascular events (p < 0.0001). In addition, advances in cardiovascular modelling have enabled the creation of patient-specific three-dimensional carotid artery models25. That study combined non-imaging and imaging data with simulated haemodynamic data to develop a prognostic model for carotid stenosis progression, achieving 71% accuracy with a neural network classifier; its novelty lies in the problem definition and the extensive use of simulated data as input for the predictive model. Together, these studies highlight the potential of integrating advanced machine learning techniques with detailed clinical and imaging data to improve the prediction and management of carotid atherosclerosis.
In our study, the high sensitivity of the Random Forest model (0.957) highlights its potential as a screening tool. The frequent selection of intensity-based features, such as Kurtosis and Skewness, may reflect a link between oral inflammation and atherosclerosis and could inform our understanding of the pathways connecting periodontal health to cardiovascular disease. A cross-sectional study conducted in Japan found that both the absence of regular dental check-ups and the presence of periodontitis were significantly associated with atherosclerosis among community-dwelling residents26, suggesting that maintaining good oral health may help reduce the risk of cardiovascular diseases. Further research has highlighted the critical role of inflammatory responses in this association. A case-control study indicated that systemic inflammatory responses triggered by inflammatory cytokines, bacterial pathogens, and altered lipoprotein metabolism in patients with periodontitis may promote the development and progression of atherosclerosis27. The same study showed that clinical attachment loss (CAL) and carotid intima-media thickness (IMT) were significantly higher in individuals with periodontitis than in healthy controls, underscoring the role of inflammation in the atherosclerotic process. A study of Korean adults further supported this view, showing a close relationship between severe periodontitis and early atherosclerotic vascular disease, particularly among non-smokers28. As periodontitis severity increased, the adjusted mean carotid intima-media thickness (cIMT) increased significantly while the ankle-brachial index (ABI) decreased, indicating a dose-dependent association between periodontitis and atherosclerosis.
Evidence from studies on specific populations also provided additional insights. For instance, among individuals with heterozygous familial hypercholesterolemia (hFH), severe periodontitis was associated with higher diastolic blood pressure (DBP), suggesting that severe periodontitis may be an important factor contributing to elevated cardiovascular risk in this population29.
Taken together, these studies suggest that periodontitis may accelerate the progression of atherosclerosis by inducing systemic inflammatory responses, leading to dyslipidemia, elevated blood glucose levels, and changes in other traditional cardiovascular risk factors. Specifically, periodontal pathogens and their by-products may reach distant organs via the bloodstream, triggering chronic low-grade inflammation that impairs vascular endothelial function and ultimately promotes the occurrence and development of atherosclerosis.
The Random Forest (RF) model (AUC: 0.892) outperformed Logistic Regression (LR) and Support Vector Machine (SVM) in discriminative accuracy and, as shown by Decision Curve Analysis (DCA), in clinical utility. This advantage likely stems from RF's ability to handle non-linear relationships and feature interactions, which may better capture the complex biological interplay between periodontitis and carotid stenosis. Notably, in the third fold, the model maintained high sensitivity (0.957), suggesting its potential as a screening tool for high-risk patients. However, its specificity ranged from 0.667 to 0.743 across folds (0.710 in the third fold), indicating some variability and implying that false positives could limit diagnostic precision in low-prevalence populations. Although SVM showed a slightly higher AUC than RF in the third fold (0.897 vs. 0.892), RF overall achieved a good balance between sensitivity and specificity.
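The effect of prevalence on false positives can be made concrete with Bayes' rule, PPV = sens × prev / (sens × prev + (1 - spec) × (1 - prev)). The sketch below uses the third-fold RF sensitivity (0.957) and specificity (0.710) reported above and compares the class balance of the original cohort (168/279) with a hypothetical 10% prevalence chosen only for illustration.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


SENS, SPEC = 0.957, 0.710  # third-fold RF values reported in this study
for label, prev in [("original cohort (168/279)", 168 / 279),
                    ("hypothetical 10% prevalence", 0.10)]:
    print(f"{label}: PPV = {ppv(SENS, SPEC, prev):.3f}")
```

At these values PPV falls from about 0.83 to about 0.27, so roughly three in four positive results would be false positives at 10% prevalence.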
Our feature selection methodology, combining correlation filtering, LASSO regression, and bootstrap stability analysis, was designed to improve the stability and generalizability of the radiomic signature30,31. Nevertheless, the clinical translation of radiomics in periodontitis remains nascent. While previous research has focused on imaging biomarkers for periodontal bone loss32, this study extends their application to systemic vascular comorbidity, offering a potential new diagnostic approach.
The high-frequency selection of Original_firstorder_Kurtosis and Original_firstorder_Skewness suggests that intensity distribution abnormalities in periodontal tissues may serve as early markers of vascular involvement. However, the exact mechanistic basis for these radiomic patterns warrants further investigation, particularly whether they reflect localized inflammation or systemic endothelial dysfunction.
Despite these advances, several limitations must be acknowledged. First, the retrospective, single-center design limits causal inference and introduces potential selection bias. Second, the absence of an external validation cohort constrains the assessment of the model's generalizability to other populations and imaging protocols. Third, the reliance on SMOTE-augmented data, while mitigating class imbalance, may introduce synthetic patterns that do not fully represent biological variability; future studies with larger, naturally balanced cohorts are needed. Fourth, we evaluated only three conventional machine learning models (LR, SVM, RF). Although these provided strong performance, future work should explore other architectures such as Artificial Neural Networks (ANN), K-Nearest Neighbors (KNN), or Decision Trees (DT) to potentially achieve higher accuracy or different insights. Fifth, important clinical confounders such as detailed medication use (e.g., statins, anti-inflammatory drugs), periodontal severity indices (e.g., full-mouth plaque/bleeding scores), and carotid plaque characteristics were not integrated into the current model, which may affect predictive precision. Sixth, the radiomic feature stability requires further investigation regarding test-retest reproducibility, inter-observer segmentation variability, and robustness across different CBCT scanners. Finally, while the bootstrap feature selection is rigorous, the biological interpretability of the selected radiomic features and their direct link to the pathophysiology connecting periodontitis and atherosclerosis warrant deeper mechanistic investigation.
Conclusion
This study developed and internally validated a random forest model based on CBCT radiomic features for the early detection of carotid atherosclerosis in periodontitis patients. In five-fold cross-validation, the model demonstrated good predictive performance (AUC = 0.892, sensitivity = 0.957, specificity = 0.710). In decision curve analysis, it also provided the highest net benefit of the three models across a range of risk thresholds, indicating its potential value for early detection.
Data availability
All data generated or analyzed during this study are included in this published article. The raw data supporting the findings of this study are available from the corresponding author upon reasonable request.
References
Kim, H. L. & Kim, S. H. Pulse wave velocity in atherosclerosis. Front. Cardiovasc. Med. 6, 41 (2019).
Chong, B. et al. Global burden of cardiovascular diseases: projections from 2025 to 2050. Eur. J. Prev. Cardiol. 32 (11), 1001–1015 (2025).
Zhou, X. D. et al. Global burden of disease attributable to metabolic risk factors in adolescents and young adults aged 15–39, 1990–2021. Clin. Nutr. 43 (12), 391–404 (2024).
Bonati, L. H., Jansen, O., de Borst, G. J. & Brown, M. M. Management of atherosclerotic extracranial carotid artery stenosis. Lancet Neurol. 21 (3), 273–283 (2022).
Saba, L. et al. Imaging biomarkers of vulnerable carotid plaques for stroke risk prediction and their potential clinical implications. Lancet Neurol. 18 (6), 559–572 (2019).
Malvicini, G. et al. Association between apical periodontitis and secondary outcomes of atherosclerotic cardiovascular disease: A case-control study. Int. Endod J. 57 (3), 281–296 (2024).
Leng, Y. R. et al. Periodontal disease is associated with the risk of cardiovascular disease independent of sex: A meta-analysis. Front. Cardiovasc. Med. 10, 1114927 (2023).
Sanz, M. et al. Periodontitis and cardiovascular diseases: consensus report. J. Clin. Periodontol. 47 (3), 268–288 (2020).
Schenkein, H. A., Papapanou, P. N., Genco, R. & Sanz, M. Mechanisms underlying the association between periodontitis and atherosclerotic disease. Periodontol 2000. 83 (1), 90–106 (2020).
Carra, M. C., Rangé, H., Caligiuri, G. & Bouchard, P. Periodontitis and atherosclerotic cardiovascular disease: A critical appraisal. Periodontol 2000. (2023).
Kim, J. Y., Lee, K. Y. H., Lee, M. G. & Kim, S. J. Periodontitis and atherosclerotic cardiovascular disease. Mol. Cells 47 (12), 100146 (2024).
Lu, L. J. et al. The role of periodontitis in the development of atherosclerotic cardiovascular disease in participants with the components of metabolic syndrome: a systematic review and meta-analysis. Clin. Oral Invest. 28 (6), 339 (2024).
Fan, T. T. et al. Study on the effect of periodontitis on renal tissue in atherosclerotic mice. J. Periodontal Res. 58 (3), 655–667 (2023).
Bayrakdar, S. K. et al. A deep learning approach for dental implant planning in cone-beam computed tomography images. BMC Med. Imaging 21 (1), 86 (2021).
Huang, H. R., Chen, D., Lippuner, K. & Hunziker, E. B. Human bone typing using quantitative Cone-Beam computed tomography. Int. Dent. J. 73 (2), 259–266 (2023).
Caton, J. G. et al. A new classification scheme for periodontal and peri-implant diseases and conditions - Introduction and key changes from the 1999 classification. J. Periodontol. 89, S1–S8 (2018).
Tong, C., Wang, Y. H. & Chang, Y. C. Increased risk of carotid atherosclerosis in male patients with chronic periodontitis: A nationwide Population-Based retrospective cohort study. Int. J. Env Res. Pub He 16 (15), 2635 (2019).
Zhou, L. J. et al. Periodontitis exacerbates atherosclerosis through Fusobacterium nucleatum-promoted hepatic glycolysis and lipogenesis. Cardiovasc. Res. 119 (8), 1706–1717 (2023).
Carasol, M., Aguilera, E. M. & Ruilope, L. M. Oral health, hypertension and cardiovascular diseases. Hiperten Riesgo Vasc. 40 (4), 167–170 (2023).
Zhong, Y. et al. The Oral-Gut-Brain axis: the influence of microbes as a link of periodontitis with ischemic stroke. CNS Neurosci. Ther. 30 (12), e70152 (2024).
Sharma, A. et al. The relationship between periodontal disease (PD) and recurrent vascular events in ischemic Stroke/Transient ischemic attack (TIA) patients: A Hospital-Based cohort study. Cureus J. Med. Sci. 15 (3), e36530 (2023).
Hansen, P. R. & Holmstrup, P. Cardiovascular diseases and periodontitis. Adv. Exp. Med. Biol. 1373, 261–280 (2022).
Chen, Q. S. et al. A machine learning based approach to identify carotid subclinical atherosclerosis endotypes. Cardiovasc. Res. 119 (16), 2594–2606 (2023).
Bhagawati, M. et al. Deep learning approach for cardiovascular disease risk stratification and survival analysis on a Canadian cohort. Int. J. Cardiovas Imag. 40 (6), 1283–1303 (2024).
Siogkas, P. K. et al. A machine learning model for the prediction of the progression of carotid arterial stenoses. IEEE Eng. Med. Bio. 2023, 1–4 (2023).
Yamada, S. et al. Regular dental visits, periodontitis, tooth loss, and atherosclerosis: the Ohasama study. J. Periodontal Res. 57 (3), 615–622 (2022).
Soronzonbold, A. et al. Measurement of atherosclerosis markers in individuals with periodontitis. J. Periodontal Implan. 54 (1), 37–43 (2024).
Ahn, Y. B. et al. Periodontitis is associated with the risk of subclinical atherosclerosis and peripheral arterial disease in Korean adults. Atherosclerosis 251, 311–318 (2016).
Vieira, C. L. Z. et al. Severe periodontitis is associated with diastolic blood pressure elevation in individuals with heterozygous familial hypercholesterolemia: A pilot study. J. Periodontol. 82 (5), 683–688 (2011).
Zhang, X. Y. & Cheng, G. Simultaneous inference for High-Dimensional linear models. J. Am. Stat. Assoc. 112 (518), 757–768 (2017).
Gentzkow, M., Kelly, B. & Taddy, M. Text as data. J. Econ. Lit. 57 (3), 535–574 (2019).
Yu, B. & Wang, C. Y. Osteoporosis and periodontal diseases - An update on their association and mechanistic links. Periodontol 2000. 89 (1), 99–113 (2022).
Acknowledgements
We thank the participants whose involvement made this study possible. We also express our sincere gratitude to the advisors of this research for their guidance and dedicated efforts.
Funding
This work was supported by the National Natural Science Foundation of China (Nos. 81870348 and 82200543), the Qing Lan Project of Jiangsu Province (2024), and the Key Open Innovation Research Project of Nanjing University of Chinese Medicine (JBGS202403).
Author information
Contributions
ZMQ and YY conceptualised the analysis. LZ, YSX, ZZ, YT and QT conducted the data search, critically analysed the selected documents, and drafted the manuscript. CJ, CQ, SM and CZP contributed to revising and finalising the draft manuscript. All authors have read and agreed to the published version of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethics approval and consent to participate
This retrospective study was approved by the Ethics Committee of Nanjing Drum Tower Hospital (approval number: 2023-461-02), and the requirement for written informed consent was waived.
Consent for publication
Not applicable.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Zhang, M., Cai, J., Cao, Q. et al. Radiomic features and carotid stenosis in periodontitis: a two-stage bootstrap and multimodal machine learning study. Sci Rep 16, 8177 (2026). https://doi.org/10.1038/s41598-026-38463-1
DOI: https://doi.org/10.1038/s41598-026-38463-1



