Introduction

The recent increase in the incidence of coronary heart disease (CHD) has made it a critical contributor to the global disease burden. Thus, reducing the global burden of CHD remains a public health priority1. Studies have explored hypertension, hyperlipidemia, hyperglycemia, smoking, drinking, and obesity as risk factors for CHD2. However, the association between klotho and CHD has not been investigated in detail3,4, probably because most studies have used traditional statistical analyses and because klotho biomarkers have not been widely applied in routine clinical practice. Therefore, a new analytical approach might determine the association between CHD and klotho more accurately.

Klotho affects several metabolic pathways that are essential for the pathogenesis and prevention of cardiovascular diseases. It inhibits lipid peroxidation and inflammation and prevents endothelial injury and vascular calcification5. Some studies have found that klotho protein levels are an independent risk factor for CHD and have clinical value for predicting this disease6. However, the prediction of CHD onset using klotho has not been thoroughly investigated.

Machine learning (ML) broadly refers to the process of fitting predictive models to data or identifying informative groups within data. The aim of ML is to approximate or imitate the ability of humans to objectively recognize patterns using computation. ML is applied in almost all fields of biology7. We aimed to use ML to construct a tool that predicts the risk of CHD from klotho levels and the values of clinically relevant indicators.

Results

We assigned the 11,583 participants into groups with (n = 1074) and without (n = 10,509) CHD (Table 1). Figure 1 shows a flowchart of the participants through the study. The two groups differed significantly in all variables except alcohol consumption (p < 0.05). We entered 13 potential predictive variables into feature screening with least absolute shrinkage and selection operator (LASSO) regression. Dummy coding of the categorical variables expanded the 13 variables into 20. The lambda value was determined by 5-fold cross-validation, and the 13 initial variables were selected (Figure S1). These comprised education, marital status, race, poverty income ratio (PIR), sex, body mass index, smoking, alcohol consumption, diabetes mellitus, hypertension, hyperlipidemia, age, and klotho levels.

The data were randomly split into training and test sets at a ratio of 70:30, and the training set was resampled. After hyperparameter optimization, the logistic regression (LR), neural network (NNet), random forest (RF), support vector machine (SVM), and eXtreme Gradient Boosting (XGBoost) models were constructed in the training set. LR is a traditional multivariate statistical learning model that predicts a class output of either “0” or “1.” It is a popular ML algorithm that is grouped under supervised learning techniques8. Thus, we used LR as a benchmark for the predictive performance of the other ML models.

The areas under the receiver operating characteristic (ROC) curves (AUCs) of the RF, XGBoost, LR, SVM, and NNet models in the test set were 0.7394, 0.7460, 0.7807, 0.7841, and 0.6957, respectively (Table 2). Table 2 shows that the SVM model outperformed the RF, XGBoost, and NNet models in terms of discrimination. The LR model had similarly good discrimination but was markedly less well calibrated than the SVM model (Brier score: 0.4441 vs. 0.1847). Among the five ML models, only the RF and SVM models had Brier scores < 0.25. Compared with the RF model, the SVM model had a higher AUC (0.7841 vs. 0.7394) and better precision (0.9669), with a Brier score of 0.1847, but lower accuracy (0.7009), recall (0.6942), and F1 score (0.8091). The RF model also outperformed the LR model on all metrics except AUC and precision. The ROC curves in Fig. 2 show that the SVM model had the largest AUC.

We visualized the relationship between the important feature klotho and CHD by fitting restricted cubic splines (RCS) in a binary LR model and then assessing potential nonlinearity. Figure 3 shows that klotho did not have significant nonlinearity, suggesting a mainly negative linear relationship with CHD. The cutoff was 787.2 (odds ratio: 1), and the risk of CHD increased when the klotho value was < 787.2. Figure 4a presents the RCS analysis of the relationship between age and klotho levels in the CHD group. The model demonstrated no significant nonlinear relationship (p > 0.01), with klotho levels showing an overall declining trend as age increased. Figure 4b illustrates the RCS analysis for the non-CHD group, which revealed a statistically significant nonlinear association (p < 0.01) between age and klotho levels. A distinct inflection point was identified, suggesting a threshold effect. Notably, the decline in klotho levels accelerated markedly beyond the age of 65, highlighting a critical age-dependent reduction in this population.

Figure 5 shows the importance ranking of the variables in the RF model. Additionally, we conducted a SHAP (SHapley Additive exPlanations) analysis to elucidate feature contributions in the RF model, revealing key drivers of its predictive performance and enhancing the interpretability of complex decision pathways (Figure S2).

Table 1 Characteristics of the study participants with and without CHD from 2007–2016 in the US NHANES.

Fig. 1 Flowchart of participants through the study. LR, logistic regression; NNet, neural network; RF, random forest; SVM, support vector machine; XGBoost, eXtreme Gradient Boosting; ACC, accuracy; AUC, area under the receiver operating characteristic curve.

Table 2 Evaluation of models.

Fig. 2 Receiver operating characteristic curves of the five models.

Fig. 3 Analysis of klotho and CHD using RCS. CHD, coronary heart disease; RCS, restricted cubic splines.

Fig. 4 Analysis of age and klotho using RCS. CHD, coronary heart disease; RCS, restricted cubic splines.

Fig. 5 Important features for the prediction of CHD. CHD, coronary heart disease; PIR, ratio of family income to poverty; BMI, body mass index; DM, diabetes mellitus.

Discussion

The number of deaths due to heart disease peaked during the mid-1960s, followed by a marked decline that continues today9. Nonetheless, CHD remains a major public health challenge that contributes to the global disease burden. Despite improvements in preventive methods and treatments, CHD still imposes immense health and economic burdens in the United States and around the world10. Improved diagnosis could help reduce this burden.

Klotho is an aging-suppressor gene11 that has been found to predict the risk of developing cardiovascular disease12,13. Atherosclerosis is the key pathological factor in CHD. According to Wang K et al.14,15,16,17, klotho is regarded as a cardiovascular protective factor because of its anti-atherosclerotic, anti-inflammatory, and antioxidant effects, which provides a further theoretical basis for the prevention and treatment of CHD. Serum klotho concentrations display a consistent negative association with cardiovascular disease18,19. In recent years, nanomaterial-based delivery of the klotho gene has been used to treat other diseases associated with klotho. These studies suggest that upregulation of the klotho gene may be a possible means of treating and preventing aging-related complications20,21,22, and this approach is likely to be applied in the cardiovascular field in the future. Although klotho protein supplementation holds potential therapeutic value, its side effects and safety still require in-depth research. For example, exogenous klotho protein might be recognized by the immune system as a foreign substance, triggering immune responses that could lead to allergies or inflammation. Additionally, klotho protein interacts with fibroblast growth factor 23 (FGF23)23, and excessive levels could disrupt the balance of phosphate and calcium metabolism. Ongoing studies are exploring the use of nanoparticle technology to deliver klotho protein precisely to target tissues20,21,22, but these remain at the animal research stage, and further studies are needed before clinical application can be realized. Future studies should focus on targeted delivery, dose optimization, and clinical trials to ensure safety and efficacy.

ML is a branch of artificial intelligence (AI) that focuses on how computers learn patterns and rules from data to perform prediction, classification, decision-making, and other tasks. Among AI tools, ML is poised to extend and augment the effectiveness of clinicians and revolutionize patient care24. ML algorithms are extremely useful for predicting cardiovascular diseases25 and have previously been combined with single or serial high-sensitivity cardiac troponin values to determine the probability of acute myocardial infarction26.

To our knowledge, an ML algorithm to predict CHD based on klotho levels has not yet been created. We used the National Health and Nutrition Examination Survey (NHANES) dataset, which offers a large data volume and rich clinical information, and took advantage of the capacity of ML to process large datasets to increase the reliability of the model. Indicators collected in NHANES between 2007 and 2016 served as the ML features for model construction.

When klotho was used as a biomarker, the ML models for CHD had higher predictive accuracy than conventional statistical techniques. RF is a powerful and versatile ML algorithm that leverages the strength of multiple decision trees to produce accurate and robust predictions27. The AUC showed that the model had good discrimination performance. In clinical settings, however, recall, F1 score, and Brier score should be prioritized over AUC alone. A low F1 score indicates that the model fails to balance precision and recall. A model with a high AUC but low recall and F1 score may pose a risk of missed diagnoses in clinical applications, affecting its reliability and practicality. Further, the Brier score of the RF model was < 0.25, indicating good predictive accuracy. The RF model performed better than the LR model, except for the AUC and precision. Therefore, considering the six evaluation indexes comprehensively, RF was the optimal model.

Figure 5 shows the importance ranking of the features used to build the RF model. The top three features were age, klotho, and hypertension, with klotho second only to age. This indicates that klotho plays an important role in the construction of ML models and the prediction of CHD onset. Our RCS analysis revealed that the age-dependent decline in klotho levels among patients with CHD aged over 65 was slower than that among individuals without CHD (Fig. 4). This finding aligns with the conclusions of PAtlanta RS et al., whose study reported higher serum klotho concentrations in older adults with a history of myocardial infarction28, suggesting that disease status may decelerate age-related klotho depletion. Whether this reflects disease-related compensatory mechanisms or is influenced by pharmacological interventions or comorbidities prevalent in older adults requires further investigation. Nevertheless, targeted monitoring of klotho levels in high-risk age groups and exploration of therapeutic time windows for klotho replacement therapy hold clinical relevance.

The discrepancy in feature importance rankings between Fig. 5 (based on the Gini index) and Figure S2 (based on SHAP values) stems from fundamental differences in their evaluation perspectives and underlying logic. The Gini importance metric primarily reflects how much each feature reduces node impurity during splitting in tree-based models, whereas SHAP values quantify the direct and localized impact of features on individual model predictions, accounting for both main effects and interactions. Regarding the difference between the ROC curve of the NNet model and those of the other models in Fig. 2, we attribute this to the heightened sensitivity of the neural network to class imbalance. Despite balancing strategies (e.g., SMOTE oversampling), its generalization capability may still slightly underperform that of ensemble methods such as RF in handling minority-class predictions.
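
To make the Gini-based ranking more concrete, the following minimal Python sketch computes the impurity decrease produced by a single node split, which is the quantity that Gini importance accumulates over all splits on a feature across the trees of a forest. The function names and toy labels are hypothetical and are not taken from our analysis; SHAP values are not implemented here.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a node: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_decrease(parent, left, right):
    """Impurity decrease when `parent` is split into `left` and `right`."""
    n = len(parent)
    weighted_children = (
        (len(left) / n) * gini_impurity(left)
        + (len(right) / n) * gini_impurity(right)
    )
    return gini_impurity(parent) - weighted_children


# Toy node with two CHD (1) and two non-CHD (0) participants; illustrative only.
parent = [1, 1, 0, 0]
perfect_split = gini_decrease(parent, [1, 1], [0, 0])        # classes fully separated
uninformative_split = gini_decrease(parent, [1, 0], [1, 0])  # no separation gained

print(perfect_split, uninformative_split)
```

A split that separates the classes completely removes all of the parent node's impurity, whereas a split that leaves the class mix unchanged contributes nothing to the feature's importance.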

This study has some limitations. Most predictors included in this study were based on self-reports by participants, which could be subject to cognitive deficits and/or recall bias. We divided the entire dataset into training and test sets for internal validation, but we did not have external cohort studies to validate the performance of the prediction tool. The study results should be validated using external datasets, particularly those from other continents. A web-based tool would make clinical application more convenient and would help visualize black-box ML models; however, we did not develop a web-based tool or mobile application on the basis of feature importance. We plan to address these issues in the future.

In conclusion, ML models based on serum klotho levels can predict CHD onset. Among the five ML models, RF showed the best predictive performance. We will pursue external validation in subsequent studies.

Methods

Data sources and study population

This cross-sectional study analyzed nationally representative data from the US National Center for Health Statistics NHANES, which has continued in two-year cycles since 1999 to monitor the health and nutritional status of the US population. All NHANES protocols were approved by the National Center for Health Statistics Ethics Review Board, and written informed consent was obtained from participants in NHANES. Informed consent was not required for the present study because the information was downloaded from NHANES, which is an open secondary data source. As far as we are aware, no patients were involved in the planning, selection, or execution of NHANES.

Data from 50,588 persons who participated in five cycles of NHANES from 2007–2008 to 2015–2016 were selected. Among them, those who were aged < 40 years (n = 31,244) or who lacked serum klotho values (n = 5580) or clinically relevant indicators (n = 2181) were excluded from the analysis. Thus, we analyzed data from 11,583 persons.
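
As a quick consistency check, the participant flow can be reproduced with simple arithmetic. The short Python sketch below uses only the counts stated in the text; the variable names are illustrative and do not correspond to NHANES field names.

```python
# Participant flow as reported in the text; names are illustrative only.
total_participants = 50_588
exclusions = {
    "aged < 40 years": 31_244,
    "no serum klotho value": 5_580,
    "missing clinically relevant indicators": 2_181,
}

analyzed = total_participants - sum(exclusions.values())

# Group sizes reported in the Results.
with_chd, without_chd = 1_074, 10_509

print(f"Analyzed participants: {analyzed}")
print(f"CHD + non-CHD participants: {with_chd + without_chd}")
```

Both totals equal 11,583, so the exclusion counts and the group sizes are mutually consistent.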

Measurement of serum klotho

Samples on dry ice were inspected by personnel at the receiving area of the laboratory and then scanned. The data were compared with information on an accompanying electronic manifest and entered into the laboratory information system. The samples were stored at − 80 °C until defined batches were provided to technicians daily for analyses.

Outcome ascertainment

We identified CHD in questionnaire data reported by participants in NHANES. Experienced physicians subsequently diagnosed CHD using codes I00-09, I11, I13, and I20-51, which were determined according to the International Statistical Classification of Diseases and Related Health Problems (10th revision).

Feature selection and ML

The features pooled in the model comprised clinically relevant indicators and serum klotho levels. These features were entered into the model on the basis of domain knowledge and a literature search for predictors associated with major adverse cardiovascular events that had been identified using traditional statistics. We used the ML algorithms LR, NNet, RF, SVM, and XGBoost to construct classification models (see Supplemental Information for the detailed procedure).
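
The Results describe a random 70:30 split into training and test sets followed by resampling of the training set. A minimal sketch of such a workflow, using only the Python standard library, might look as follows. Stratifying the split by CHD status and using simple random oversampling are assumptions made here for illustration (this sketch does not implement SMOTE, which the Discussion mentions as one balancing strategy), and the function names are hypothetical; the actual procedure is given in the Supplemental Information.

```python
import random


def split_70_30(labels, seed=0):
    """Return (train_idx, test_idx) with about 70% of each class in training."""
    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        cut = round(0.7 * len(idx))
        train_idx += idx[:cut]
        test_idx += idx[cut:]
    return train_idx, test_idx


def oversample_minority(train_idx, labels, seed=0):
    """Randomly duplicate training indices until every class matches the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for i in train_idx:
        by_class.setdefault(labels[i], []).append(i)
    target = max(len(idx) for idx in by_class.values())
    balanced = []
    for idx in by_class.values():
        balanced += idx
        balanced += rng.choices(idx, k=target - len(idx))  # sampled with replacement
    return balanced


# Labels built from the group sizes reported in the Results (1 = CHD, 0 = no CHD).
labels = [1] * 1074 + [0] * 10509
train_idx, test_idx = split_70_30(labels)
balanced_idx = oversample_minority(train_idx, labels)

print(len(train_idx), len(test_idx), len(balanced_idx))
```

Only the training indices are resampled in this sketch; the test set keeps its original class proportions so that the reported metrics reflect the observed prevalence of CHD.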

Statistical analysis

Non-normally distributed continuous variables are presented as medians with interquartile ranges, and categorical variables are presented as percentages (%). Continuous and categorical variables were compared between groups using the Mann–Whitney U test and the chi-square test, respectively.

We evaluated ML model performance using the AUC (derived from ROC curves of sensitivity vs. 1 − specificity), Brier score, accuracy, precision, recall, and F1 score. The AUC is a concordance index or C-statistic that reflects the probability that a risk score would be higher for a randomly selected person who experienced an event than for one who did not. It is measured on a scale from 0.5 (a “coin flip”) to 1 (perfect discrimination)29. The ROC curve was drawn with the true and false positive rates on the y and x axes, respectively. The Brier score combines the discrimination and calibration of a model and was used to evaluate overall model performance; scores closer to zero indicate that the predicted values are closer to the actual values30. The F1 score is an ML metric that summarizes the performance of a model by combining its precision and recall. The following formulas show how the metrics were calculated, where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives, respectively:

  • Accuracy = (TP + TN)/(TP + TN + FP + FN)

  • Precision = TP/(TP + FP)

  • Recall = TP/(TP + FN)

  • F1 score = (2 × precision × recall)/(precision + recall)
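
The formulas above, together with the Brier score and the concordance interpretation of the AUC, can be sketched in plain Python as follows. This is an illustrative sketch rather than the code used in our analysis: the function names, the toy data, and the 0.5 classification threshold are assumptions, and the functions assume that no denominator is zero.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, and FN for binary labels coded 0/1."""
    tp = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 1)
    tn = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(y_true, y_pred) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(y_true, y_pred) if y == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / (precision + recall),
    }


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def auc_concordance(y_true, y_score):
    """Share of (event, non-event) pairs ranked correctly; ties count as 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration only.
y_true = [1, 0, 1, 1, 0, 0]
y_prob = [0.9, 0.2, 0.4, 0.8, 0.6, 0.1]
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]  # 0.5 threshold is an assumption

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
metrics = classification_metrics(tp, tn, fp, fn)
metrics["brier"] = brier_score(y_true, y_prob)
metrics["auc"] = auc_concordance(y_true, y_prob)
print(metrics)
```

A model that always predicts a probability of 0.5 has a Brier score of exactly 0.25 regardless of the outcomes, which is why scores below 0.25 are read as better than an uninformative prediction.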

RCSs capture nonlinear relationships in data through piecewise fitting with smooth transitions while limiting the risk of overfitting. We used RCSs in regression analysis to model the relationships between important continuous variables and CHD because they can capture nonlinearity while maintaining smoothness.
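
For readers unfamiliar with RCSs, the following Python sketch builds a restricted cubic spline basis in the commonly used truncated power form, in which the fitted curve is constrained to be linear beyond the outermost knots. The number and positions of the knots shown are hypothetical and are not those used in our analysis, and the function name is illustrative; the knots are assumed to be distinct.

```python
def rcs_basis(x, knots):
    """Restricted cubic spline basis for a single value x.

    Returns [x, s_1(x), ..., s_{k-2}(x)] for k knots. Each nonlinear term is
    scaled by (t_k - t_1)^2 so that it stays on a scale comparable to x.
    """
    t = sorted(knots)
    k = len(t)
    if k < 3:
        raise ValueError("at least three knots are required")

    def cube_plus(u):
        # Truncated cubic: u^3 when u > 0, otherwise 0.
        return u ** 3 if u > 0 else 0.0

    scale = (t[-1] - t[0]) ** 2
    gap = t[-1] - t[-2]
    basis = [float(x)]
    for j in range(k - 2):
        term = (
            cube_plus(x - t[j])
            - cube_plus(x - t[-2]) * (t[-1] - t[j]) / gap
            + cube_plus(x - t[-1]) * (t[-2] - t[j]) / gap
        )
        basis.append(term / scale)
    return basis


# Hypothetical knot positions for a klotho-like variable; illustrative only.
knots = [500.0, 700.0, 900.0, 1200.0]
for value in (400.0, 800.0, 1300.0):
    print(value, rcs_basis(value, knots))
```

The resulting columns can be entered as covariates in a logistic regression; nonlinearity is then typically assessed by testing whether the coefficients of the nonlinear terms are jointly zero.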

All data were statistically analyzed using SPSSAU software (SPSSAU, Haidian, China) or R 4.4.0 (R Foundation for Statistical Computing, Vienna, Austria).