Abstract
Cardiovascular disease (CVD) remains the most common cause of death worldwide. Carotid plaque is an indicator of subclinical CVDs. Metabolic dysfunction-associated steatotic liver disease (MASLD) is a risk factor for atherosclerotic CVDs. We aimed to develop and validate a predictive model for carotid plaque occurrence in annual health check-up populations, to integrate health check-up indicators with machine learning (ML) algorithms and LASSO-based feature selection and leverage advanced interpretability frameworks to elucidate the contribution of individual risk factors. In this retrospective cohort study, we enrolled 4,973 MASLD patients, among whom 1,178 were diagnosed with carotid plaques using carotid ultrasound. Collected baseline data included demographic indicators, clinical histories, blood biochemical parameters, and liver function test indicators. A predictive model for carotid plaques was developed and validated using five ML algorithms. Model performance was evaluated based on the area under the curve, sensitivity, specificity, accuracy, and F1 Score. For model interpretability, we adopted the Shapley Additive Explanations (SHAP) framework to quantify the contribution of individual features to the prediction outcomes. Among the five ML algorithm models, the support vectors machine model demonstrated superior discriminative capability, higher goodness-of-fit, and greater clinical utility compared to other ML algorithm models. Moreover, age, systolic blood pressure, total cholesterol, sex, and fasting plasma glucose were the most important risk factors associated with carotid plaques in the MASLD population. This study demonstrated the feasibility of constructing a predictive model for carotid plaques in MASLD populations using health check-up indicators combined with ML algorithms. The application of SHAP methods enhanced model interpretability by quantifying the contribution of individual risk factors to prediction outcomes, enabling clinicians to identify high risk MASLD patients prone to carotid plaque development, so that they can adjust interventions accordingly.
Similar content being viewed by others
Introduction
Cardiovascular diseases (CVDs), principally ischemic heart disease and stroke, are the leading causes of global mortality and are major contributors to disabilities1. Carotid plaques serve as both a critical subclinical marker of atherosclerosis and a predictor of adverse cardiovascular events2,3. Metabolic dysfunction-associated steatotic liver disease (MASLD), affecting 25–30% of adults globally, is increasingly recognized as an independent risk factor for CVDs, with studies reporting a 40–60% prevalence of carotid plaques in MASLD populations4. Despite this strong association, current cardiovascular risk stratification tools, such as the Framingham Risk Score, fail to incorporate MASLD -specific biomarkers or leverage advanced predictive analytics, leading to suboptimal risk discrimination in this high-risk cohort5.
Recent advances in machine learning (ML) algorithms have demonstrated promising results in clinical prediction models but have been hindered by the “black box” nature of algorithms, limiting their clinical interpretabilities and actionable insights for personalized interventions6. The Shapley Additive Explanation (SHAP) is a model-agnostic interpretability method based on the Shapley value concept, derived from cooperative game theory, designed to quantify the contribution of parameters for ML prediction, and to enhance model transparency. By decomposing prediction results into feature contributions, the SHAP transforms complex “black-box” models into explainable frameworks, providing both global and local interpretabilities7,8.
However, existing MASLD -focused studies predominantly rely on traditional regression models, which inadequately capture nonlinear interactions among metabolic, inflammatory, and hemodynamic risk factors9. Furthermore, while SHAP has been validated in other medical applications to improve model transparency, its application in MASLD -related cardiovascular risk prediction remains unexplored. In this study, we developed and validated a prediction model for the occurrence of carotid plaques in the MASLD population using ML algorithms based on health check-up indicators. The SHAP values were subsequently used to interpret the model’s predictions, revealing the marginal contributions of individual features and their interactions to the risks of carotid plaque development.
Methods
Study participants
Participants were enrolled from the annual health check-up population at the Second Hospital of Hebei Medical University, between January 2024 and December 2024. The inclusion criteria were: (1) participants aged ≥ 18 years, and (2) participants with liver ultrasound and carotid ultrasound results with clear diagnostic outcomes. Exclusion criteria included (1) age < 18 years; (2) history of cardiovascular and cerebrovascular diseases or malignant tumors; (3) participants with coexisting etiologies for chronic liver diseases, including hemochromatosis, autoimmune liver disease, chronic viral hepatitis, alpha-1 antitrypsin deficiency, Wilson’s disease, and drug-induced liver injury10; (4) Participants missing essential clinical examination indicators (blood tests, biochemistry, anthropometrics); and (5) participants without MASLD. Finally, 4,973 participants were enrolled in this study (Fig. 1).
This study was approved by the Institutional Review Board (IRB) of the Second Hospital of Hebei Medical University (Approval No. 2022-R341). Informed consent was waived by the Institutional Review Board (IRB) of the Second Hospital of Hebei Medical University owing to the study’s retrospective nature. Identifying information was anonymized, adhering to the ethical principles of the Declaration of Helsinki.
Potential risk features and outcomes
All potential risk factors for carotid plaque reported in recent literature were systematically evaluated. Based on data availability in the study cohort, 26 variables were selected and categorized as follows: demographic characteristics (sex and age), anthropometric measurements [height, weight, body mass index, systolic blood pressure (SBP), diastolic blood pressure (DBP), and pulse]. Blood biochemical indicators were total cholesterol (TC), triglycerides (TG), high-density lipoprotein cholesterol (HDL-C), and low-density lipoprotein cholesterol (LDL-C)]. Metabolic markers were fasting blood glucose (FBG) and uric acid. Hepatic function indicators were alanine transaminase (ALT), aspartate aminotransferase (AST), total protein (TP), albumin (ALB), globulin (GLB), A/G ratio, total bilirubin (TBIL), direct bilirubin (DBIL), and indirect bilirubin (IBIL)]. Renal functions included serum creatinine. Comorbid conditions were hypertension, diabetes mellitus (DM), and hyperlipidemia.
The outcome was whether the participant was diagnosed with MASLD and carotid plaques. The ultrasonographic manifestations of MASLD were primarily characterized by the basic symptoms of steatosis, which increased echogenicity of the liver parenchyma in comparison to the cortex of the right kidney, because intracellular accumulation of fat vacuoles reflected the ultrasound beam11. Furthermore, diagnosis of MASLD also met the requirements of the multi-society Delphi consensus statement on new fatty liver disease nomenclature12. Carotid artery plaque diagnosis by ultrasound was based on guidelines and standards from the 2020 American Society of Echocardiography13.
Data processing and feature selection
The dataset was randomly divided into a training cohort (n = 3,480) and validation cohort (n = 1,493), using a 7:3 ratio for model training and internal validation, respectively. All continuous variables underwent Z-score normalization to standardize feature scales, ensuring comparability across parameters. Variables exhibiting significant skewness (Shapiro-Wilk P < 0.05) were retained without transformation, leveraging the inherent robustness of tree-based methods to non-normality. The feature selection process and model development were conducted using the training cohort, while the independent validation cohort was used for evaluating the model’s performance.
Correlation between variables was evaluated using Pearson’s (parametric) and Spearman’s (nonparametric) tests. Variables exhibiting absolute correlation coefficients ≥ 0.8 were excluded to mitigate multicollinearity risks. To identify the most predictive features and improve analytical robustness, the least absolute shrinkage and selection operator (LASSO) regression was used for feature selection. This regularization technique applies L1 penalty to shrink less important coefficients toward zero, effectively selecting key variables while mitigating overfitting. The selected features preserved clinically relevant signal strengths, thereby enhancing comparability of latent patterns across datasets.
Model construction and evaluation
We used three ML algorithms [support vectors machine (SVM), Logistic Regression (LR), Decision Tree (DT), Random Forest (RF) and eXtreme Gradient Boost (XGBoost)] to construct the model based on the features selected by LASSO regression. Hyperparameter optimization was performed via 5-fold cross-validated grid search to ensure robust model performance and reproducibility. For the five ML algorithms evaluated: Support Vector Machine (SVM): Optimized regularization parameter (C: [0.1, 1, 10]) and kernel coefficient (gamma: [0.01, 0.1, 1]); Logistic Regression (LR): Tuned penalty strength (L2 regularization, C: [0.01, 0.1, 1]); Decision Tree (DT): Evaluated maximum depth3,5,10 and minimum samples per leaf5,10; Random Forest (RF): Optimized number of trees ([100, 200, 300, 400, 500]), features per split ([sqrt(p), log2(p)] where p = number of features), and node splitting criteria ([Gini impurity, entropy]); XGBoost: Calibrated learning rate (η: [0.01, 0.05, 0.1, 0.2]), maximum depth2,3,4,5,6, subsample ratio ([0.6, 0.7, 0.8, 0.9, 1.0]), column sampling ([0.6, 0.7, 0.8, 0.9, 1.0]), and minimum loss reduction ([0, 1, 3]).Final configurations were selected by maximizing the AUC on the training cohort through systematic evaluation of all hyperparameter combinations. The optimal parameters were: SVM: C = 10, gamma = 0.1;LR: C = 0.1;DT: max_depth = 5, min_samples_leaf = 10;RF: ntree = 260, mtry = sqrt(p); XGBoost: eta = 0.1, max_depth = 3. The area under the curve (AUC) of receiver operating characteristics (ROC), sensitivity, specificity, accuracy, and F1 score were used to provide a comprehensive assessment of model predictive performance. Furthermore, we used the calibration curve and Decision Curve Analysis (DCA) to evaluate the model’s performance. Calibration curves served as an important tool for assessing predictive performance, by juxtaposing predicted probabilities against actual outcomes, thereby validating the model’s alignment with real-world risk profiles. DCA amalgamated predictive efficacy and clinical utility were used to evaluate a model’s practical value across diverse decision thresholds. By quantifying the net benefit (the trade-off between true positives and false positives), DCA identified thresholds where a model’s discriminative power and clinical relevance outperformed simpler heuristics (e.g., “universal treatment” or “no treatment”)14,15. Confidence intervals (CI) for performance metrics were derived via 1,000 bootstrap iterations using percentile methods. Stratified sampling maintained original class distributions during resampling.
Model interpretation
The interpretation of predictive outcomes and the elucidation of feature importance remained critical challenges in machine learning, particularly for complex models where decision-making processes were often opaque. To address these challenges, we employed the SHAP framework, a theoretically grounded post-hoc explanation technique rooted in cooperative game theory, to quantify the contribution of each input feature to model predictions. This choice leveraged Shapley values to ensure fairness and consistency in feature attributions, addressing limitations associated with traditional feature importance metrics.
Statistical analysis
Statistical analysis was conducted using R software version 3.4.3 (http://www.r-project.org). Continuous variables were tested for normality using the Shapiro-Wilk test. If the data conformed to a normal distribution, means ± standard deviations (x ̅±s) were reported, and group comparisons were performed using independent sample t-tests. If the data were non-normally distributed, medians and interquartile ranges (M, IQR) were presented, with group comparisons conducted using the Mann-Whitney U test. Categorical variables are expressed as counts and percentages (n, %), and group differences were analyzed using the chi square test or Fisher’s exact test.
Results
Baseline characteristics
There were 4,973 participants enrolled in this study, with 3,462 males (69.62%) and 1,511 females (30.38%). Among the participants, 1,178 (23.69%) were diagnosed with carotid plaques. Compared to the negative group, the positive group had higher age, SBP, DBP, LDL-C, TC, FPG, urea, UA, TP, ALB, A/G ratio, and AST levels. In addition, the positive group had higher levels of hypertension, DM, and hyperlipemia; but a lower proportion of males (all, p ≤ 0.05) (Table 1).
Comparison of model performance
Table 2 compares five ML algorithms (SVM, DT, LR, RF and XGBoost) across training and validation cohorts. Evaluation metrics included the AUC, sensitivity, specificity, accuracy and F1 score. Among the models, SVM demonstrated superior discriminative capability (validation AUC = 0.813), while XGBoost achieved marginally higher AUC (0.829) but lower specificity (0.744 vs. 0.785) (Fig. 2A and B). SVM maintained robust performance across sensitivity (0.674), specificity (0.785), accuracy (0.737), and F1 score (0.773), exhibiting the most balanced clinical utility profile. Calibration curves confirmed SVM’s exceptional reliability, with training and validation predictions closely aligning with observed outcomes (Fig. 2C and D). This concordance with the diagonal reference line indicates precise probability calibration. Decision curve analysis further established SVM’s clinical superiority, demonstrating the highest net benefit across decision thresholds (Fig. 2E and F). Notably, RF achieved perfect training metrics (AUC = 1, sensitivity = 1, specificity = 1) but showed reduced validation performance (AUC = 0.828), indicating potential overfitting. Despite XGBoost’s strong AUC (0.829), its lower specificity reduced clinical utility for preventive applications requiring minimal false positives. Consequently, SVM was selected as the optimal model for carotid plaque prediction in MASLD populations, balancing discrimination (AUC), calibration reliability, and clinical applicability.
Based on the SVM model performance, feature importance was quantified using the absolute SHAP values. The negative and positive contributions of each feature are represented by purple and yellow markers, respectively. The horizontal position of each data point reflects its SHAP value, where higher values indicate stronger contributions to increased predicted probabilities of carotid plaque occurrences, while lower values correspond to reduced risk predictions. Figure 3A and B shows the top 15 features ranked by contribution weights, which are displayed to simplify the interpretation of complex model outputs. This SHAP-based visualization framework enhanced clinical translatability by explicitly mapping the quantitative relationships between key predictors (e.g., age and SBP) and atherosclerotic risk stratification.
Performance of the predictive model based on three machine learning (ML) algorithm models of the training and test cohorts. (A) Receiver Operating Characteristic (ROC) curve of three ML algorithm models of the training cohort. (B) ROC curve of three ML algorithm models of the test cohort. (C) Calibration curve of three ML algorithm models of the training cohort. (D) Calibration curves of three ML algorithm models of the test cohort. (E) Decision Curve Analysis (DCA) curve of three ML algorithm models of the training cohort. (F) DCA curve of three ML algorithm models of the test cohort.
Shapley Additive Explanations (SHAP) interpretation of the support vectors machine model. (A) SHAP summary plot. The importance rankings of the model prediction features, and the horizontal coordinates represent the mean absolute SHAP values computed across all samples, where a larger absolute SHAP value indicates a stronger influence of the corresponding features of the model’s predictions. (B) SHAP features importance. Each point represents a feature value, and different colors represent the final influence of the feature on the model output results, where yellow represents a larger value and purple represents a smaller value.
Discussion
In recent years, numerous studies have reported a significant association between MASLD and carotid plaque formation16,17,18. This growing body of evidence underscores the critical role of MASLD as a predictor of subclinical atherosclerosis and cardiovascular risk, particularly through mechanisms involving metabolic dysregulation and systemic inflammation16. Notably, longitudinal cohort studies have reported that both the presence and progression of MASLD, especially in advanced fibrosis stages, are independently associated with increased carotid plaque burden19. These findings highlight the need for integrated cardiovascular risk assessment in MASLD patients, to mitigate the burden of ischemic cerebrovascular events.
ML algorithms have revolutionized clinical prediction model development by facilitating the integration of complex, high dimensional data, to improve diagnostic accuracy, risk stratification, and therapeutic decision-making20,21. However, the development of predictive models for carotid plaques in the MASLD population, which integrates health check-up indicators with ML algorithms, remains poorly understood. Deng Y, et al.9 developed a prediction model for carotid plaques based on the health check-up indicators of 5.4 million adults with fatty liver disease, combined with ML algorithms. The predictive model showed good performance in an internal validation set (AUC = 0.831) and external validation set (AUC = 0.801). The model also graphically showed good calibration capabilities. However, this predictive model has not undergone validation using DCA to assess its clinical utility, nor has it undergone SHAP analysis for interpretability characterization. In addition, the study population comprised individuals with fatty liver disease, rather than MASLD.
In the present study, we constructed a predictive model for carotid plaques in the MASLD population, based on health check-up indicators, which showed efficiency, straightforwardness, and practicality in clinical applications. Among the three ML algorithms, SVM demonstrated the best performance, achieving an AUC of 0.813 in the validation cohort, The model exhibited favorable calibration characteristics, goodness-of-fit properties, and clinical usefulness as shown by the calibration and DCA curves, which exhibited efficiency, straightforwardness, and practicality in clinical applications. Notably, while SVM demonstrated strong discriminative capability (AUC = 0.813), XGBoost achieved marginally superior performance in the validation cohort (AUC = 0.829). This aligns with prior studies where boosting algorithms outperformed kernel-based methods in complex biomedical prediction tasks22. However, SVM maintained higher specificity (0.785 vs. 0.744), suggesting greater reliability in identifying low-risk individuals. Given the clinical priority of minimizing false positives in preventive cardiology, SVM’s balanced performance profile – coupled with its superior interpretability via SHAP – justified its selection as the primary predictive tool. To resolve persistent challenges in ML algorithm interpretability, and to enable transparent visualization of prediction determinants, we implemented SHAP in our SVM model, to systematically assess both global (feature importance rankings) and local (individual prediction analyses) interpretabilities. This integration enabled us to identify critical features driving predictions, such as age, SBP, TC, sex, and FBG, which were important features in prediction of carotid plaques in the MASLD population, while also revealing nonlinear relationships and interaction effects that may be masked by simpler methods. The computed SHAP values quantified directional feature contributions, with positive scores indicating elevated carotid plaque risks associated with specific feature values in the MASLD population, while negative values indicated protective effects. Through tree-specific implementation of the SHAP’s additive feature attribution framework, our methodology enabled granular visualization of nonlinear decision pathways, bridging the gap between algorithmic complexity and clinical reasoning.
The results in the present study, which showed that age, SBP, TC, sex, and FBG were the most important features of carotid plaques in MASLD populations, were consistent with previous studies9,23. Aging induces cumulative metabolic and vascular stress, fostering carotid plaque formation through three principal mechanisms: (1) age-related reactive oxygen species accumulation and elevated pro-inflammatory cytokines (TNF-α and IL-6), which disrupt endothelial integrity, promote lipid oxidation, and drive foam cell formation, initiating atherosclerotic plaque development24; (2) declining nitric oxide bioavailability and arterial stiffening impair vasodilation, enhance platelet aggregation, and potentiate thrombotic events25; and (3) reduced insulin sensitivity exacerbates hepatic steatosis and systemic metabolic imbalance, accelerating atherosclerosis progression26. Hypertension is a well-established risk factor for carotid atherosclerosis27,28. Elevated SBP accelerates vascular endothelial injury and promotes LDL-C infiltration into the arterial wall, leading to plaque initiation and progression27. In addition, elevated SBP results in arterial stiffness and promotes cytokine release (IL-6, TNF-α), enhancing monocyte adhesion and intraplaque inflammation29. Furthermore, high cholesterol is a major risk factor for carotid artery disease, which can lead to narrowing or blockage of the carotid arteries supplying blood to the brain; and which also plays a role in the progression of carotid artery stenosis and overall cardiovascular risk30. The mechanism involves: (1) small, dense LDL particles infiltrating arterial intima, undergoing oxidation, and being phagocytosed by macrophages to form foam cells, which are the core of atherosclerotic plaques31; and (2) reduced HDL-C, impairing reverse cholesterol transport, and failing to clear lipid-laden macrophages from plaque sites32. In addition, males with higher occurrences of carotid plaques are characterized by the following: (1) higher testosterone levels in males promote insulin resistance and accelerate atherosclerosis, but females with higher estrogen can enhance or reverse cholesterol transport and reduce vascular inflammation33. (2) In male MASLD patients, oxidized LDL preferentially infiltrates the arterial wall of males, where it is engulfed by macrophages via scavenger receptors, leading to foam cell formation and the release of pro-inflammatory chemokines (e.g., MCP-1)34. This process recruits’ monocytes to the arterial intima, causing plaque initiation and progression, while reduced paraoxonase-1 activity impairs HDL’s antioxidant capacity, diminishing its ability to detoxify oxidized LDL and mitigate endothelial dysfunction. This further exacerbates vascular inflammation and plaque vulnerability35. Some studies had reported that elevated FBG increases advanced glycation end-products (AGEs), which bind to the receptor for AGEs on endothelial cells, activating the NF-κB pathway and promoting inflammatory cytokines (e.g., TNF-α and IL-6). This accelerates LDL oxidation and foam cell formation36.
Limitations
There were several limitations in this study. First, this study has several inherent limitations. The single-center retrospective design and cross-sectional nature preclude establishing causal or temporal relationships between MASLD and carotid plaque occurrence. The restricted one-year temporal scope constrains longitudinal assessment. Potential selection bias may affect population representativeness and residual confounding; Model generalizability requires validation in multi-ethnic cohorts and prospective settings; Future research should prioritize multicenter prospective cohorts to validate prediction performance temporally, establish causal mechanisms, and explore advanced architectures (e.g., transformer networks) for feature representation learning. Second, the diagnosis of MASLD was defined by ultrasonography rather than liver biopsy. However, studies have reported a strong correlation between ultrasonographic findings and histopathological results from liver biopsies, particularly in detecting hepatic steatosis37. Finally, SHAP analysis does not quantify the importance of predictors in real quant problems, but rather their importance to the model’s predictions38.
Conclusions
This study developed and validated predictive models for carotid plaque occurrence in patients with MASLD by integrating demographic data, blood biochemical indices, and clinical parameters from annual health check-up populations, using ML algorithms. The SVM-based models showed high accuracy and robust reliability in predicting carotid plaque development. Furthermore, we used advanced SHAP technology for model interpretation and visualization, to facilitate a precise characterization of risk factors for carotid plaques in MASLD populations. By combining SHAP technology with ML algorithms, we quantified the contributions of key risk factors such as age, SBP, TC, sex, and FBG to carotid plaque risk, thereby possibly providing precise clinical insights into the management of carotid plaques associated with MASLD patients.
Data availability
The datasets generated and/or analyzed during this study are available from the corresponding author upon reasonable request.
References
Roth, G. A. et al. Global burden of cardiovascular diseases and risk factors, 1990–2019: update from the GBD 2019 Study. J. Am. Coll. Cardiol. 76(25), 2982–3021 (2020).
Johri, A. M. et al. Maximum plaque height in carotid ultrasound predicts cardiovascular disease outcomes: a population-based validation study of the American society of echocardiography’s grade II-III plaque characterization and protocol. Int. J. Cardiovasc. Imaging 37(5), 1601–1610 (2021).
Genkel, V. et al. Carotid total plaque area as an independent predictor of short-term subclinical polyvascular atherosclerosis progression and major adverse cardiac and cerebrovascular events. Ther. Adv. Cardiovasc. Dis. 17, 17539447231194860 (2023).
Teng, M. L. et al. Global incidence and prevalence of nonalcoholic fatty liver disease. Clin. Mol. Hepatol. 29(Suppl), S32–S42 (2023).
Lee, T. B. Jr. et al. Biomarkers of hepatic dysfunction and cardiovascular risk. Curr. Cardiol. Rep. 25(12), 1783–1795 (2023).
Zihni, E. et al. Opening the black box of artificial intelligence for clinical decision support: A study predicting stroke outcome. PLoS One. 15(4), e0231166 (2020).
Loh, H. W. et al. Application of explainable artificial intelligence for healthcare: A systematic review of the last decade (2011–2022). Comput. Methods Programs Biomed. 226, 107161 (2022).
Ali, S. et al. The enlightening role of explainable artificial intelligence in medical & healthcare domains: A systematic literature review. Comput. Biol. Med. 166, 107555 (2023).
Deng, Y. et al. Combinatorial use of machine learning and logistic regression for predicting carotid plaque risk among 5.4 million adults with fatty liver disease receiving health check-ups: population-based cross-sectional study. JMIR Public. Health Surveill 9, e47095 (2023).
Chalasani, N. et al. The diagnosis and management of nonalcoholic fatty liver disease: Practice guidance from the American Association for the Study of Liver Diseases. Hepatology 67(1), 328–357 (2018).
Petzold, G. Role of ultrasound methods for the assessment of NAFLD. J. Clin. Med. 11(15), 4581 (2022).
Rinella, M. E. et al. A multi-society Delphi consensus statement on new fatty liver disease nomenclature. Hepatology 78(6), 1966–1986 (2023).
Johri, A. M. et al. Recommendations for the assessment of carotid arterial plaque by ultrasound for the characterization of atherosclerosis and evaluation of cardiovascular risk: from the American Society of Echocardiography. J. Am. Soc. Echocardiogr. 33(8), 917–933 (2020).
Zhao, L. et al. Understanding decision curve analysis in clinical prediction model research. Postgrad. Med. J. 100(1185), 512–515 (2024).
Piovani, D. et al. Optimizing clinical decision making with decision curve analysis: insights for clinical investigators. Healthcare 11(16), 2244 (2023).
Yu, X. et al. High NAFLD fibrosis score in non-alcoholic fatty liver disease as a predictor of carotid plaque development: a retrospective cohort study based on regular health check-up data in China. Ann. Med. 53(1), 1621–1631 (2021).
Zhang, Y. et al. Association between non-alcoholic fatty liver disease and silent carotid plaque in Chinese aged population: a cross-sectional study. Ann. Palliat. Med. 9(2), 182–189 (2020).
Xu, T. et al. CT-diagnosed non-alcoholic fatty liver disease as a risk predictor of symptomatic carotid plaque and cerebrovascular symptoms. Angiology. 33197241227501 (2024).
Tan, S. H. & Zhou, X. L. Early-stage non-alcoholic fatty liver disease in relation to atherosclerosis and inflammation. Clinics 78, 100301 (2023).
Adlung, L. et al. Machine learning in clinical decision-making. Medicine 2(6), 642–665 (2021).
Haug, C. J. & Drazen, J. M. Artificial intelligence and machine learning in clinical medicine, 2023. N. Engl. J. Med. 388(13), 1201–1208 (2023).
Aravkin, A. Y., Bottegal, G. & Pillonetto, G. Boosting as a kernel-based method. Mach. Learn. 108, 1951–1974 (2019).
Wei, Y. et al. Application of machine learning algorithms in predicting carotid artery plaques using routine health assessments. Front. Cardiovasc. Med. 11, 1454642 (2024).
Anik, M. I. et al. Role of reactive oxygen species in aging and age-related diseases: a review. ACS Appl. Bio Mater. https://doi.org/10.1021/acsabm.2c00411 (2022) (Epub ahead of print.).
Chen, J. Y. et al. Nitric oxide bioavailability dysfunction involves in atherosclerosis. Biomed. Pharmacother. 423–428. (2018).
Vesković, M. et al. The interconnection between hepatic insulin resistance and metabolic Dysfunction-associated steatotic liver disease-The transition from an adipocentric to liver-centric approach. Curr. Issues Mol. Biol. 45(11), 9084–9102 (2023).
Zhang, Y. et al. Endothelial function and arterial stiffness indexes in subjects with carotid plaque and carotid plaque length: A subgroup analysis showing the relationship with hypertension and diabetes. J. Stroke Cerebrovasc. Dis. 32(3), 106986 (2023).
Chen, J. et al. Risk factors for carotid plaque formation in type 2 diabetes mellitus. J. Transl Med. 22(1), 18 (2024).
Loperena, R. et al. Hypertension and increased endothelial mechanical stretch promote monocyte differentiation and activation: roles of STAT3, Interleukin 6 and hydrogen peroxide. Cardiovasc. Res. 114(11), 1547–1563 (2018).
Paraskevas, K. I. et al. Cholesterol, carotid artery disease and stroke: what the vascular specialist needs to know. Ann. Transl. Med. 8(19), 1265 (2020).
Jin, X. et al. Small, dense low-density lipoprotein-cholesterol and atherosclerosis: relationship and therapeutic strategies. Front Cardiovasc. Med. 8, 804214 (2022).
Ouimet, M., Barrett, T. J. & Fisher, E. A. HDL and reverse cholesterol transport. Circ. Res. 124(10), 1505–1518 (2019).
Song, M. J. & Choi, J. Y. Androgen dysfunction in non-alcoholic fatty liver disease: role of sex hormone binding globulin. Front. Endocrinol. 13, 1053709 (2022).
Cupido, A. J. et al. Low-density lipoprotein cholesterol attributable cardiovascular disease risk is sex specific. J. Am. Heart Assoc. 11(12), e024248 (2022).
Kang, H. et al. The entry and egress of monocytes in atherosclerosis: A biochemical and biomechanical driven process. Cardiovasc. Ther. 2021, 6642927 (2021).
Chen, J. et al. Advanced glycation end products measured by skin autofluorescence and subclinical cardiovascular disease: the Rotterdam Study. Cardiovasc. Diabetol. 22(1), 326 (2023).
Liu, F. et al. Multiparametric US for identifying metabolic dysfunction-associated steatohepatitis: A prospective multicenter study. Radiology 310(3), e232416 (2024).
Ponce-Bobadilla, A. V. et al. Practical guide to SHAP analysis: Explaining supervised machine learning model predictions in drug development. Clin. Transl Sci. 17(11), e70056 (2024).
Acknowledgements
We wish to thank the study participants for their cooperation and participation.
Funding
This study supported by the Medical Science Research Project of Hebei (No.20230505).
Author information
Authors and Affiliations
Contributions
Shu-Mei Zhai conceptualization, Methodology, Investigation, writing-original draft; Han Zhang collected clinical data and laboratory indicators; Xiao-Long Wang was responsible for data analysis and visualization; Yu-Qiang Zuo conceptualization, formal analysis, writing-review and editing. We confirmed than this manuscript has not been published elsewhere and is not under consideration in whole or in part by another journal. All authors have approved the manuscript and agree with submission.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Zhai, SM., Wang, XL., Zhang, H. et al. Predicting carotid plaques in metabolic dysfunction-associated steatotic liver disease using machine learning and SHAP interpretation. Sci Rep 15, 36074 (2025). https://doi.org/10.1038/s41598-025-19959-8
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41598-025-19959-8