Introduction

Peritoneal dialysis (PD) is a well-established renal replacement therapy (RRT) modality with economic and clinical advantages for patients with end-stage renal disease (ESRD)1. Clinically, PD provides better preservation of residual kidney function and greater hemodynamic stability, with fewer intradialytic blood pressure fluctuations. It also offers patients flexibility, enabling home-based therapy at lower healthcare cost than hemodialysis.

Despite advances in PD techniques, cardiovascular events remain a leading cause of morbidity and mortality among patients undergoing PD2. The one-year risk of major adverse cardiovascular events (MACE) in these patients is more than five times that of patients without ESRD3,4. Therefore, identifying cardiovascular risk in this population may not only improve survival and quality of life but also reduce treatment costs. The International Society for Peritoneal Dialysis (ISPD) guidelines emphasize the need for early risk stratification and targeted intervention5.

In medicine, the use of machine learning (ML) has been diverse, ranging from image interpretation to risk prediction6,7,8. With the exponential growth of large-scale medical data, ML algorithms are now widely used in medicine, with good predictive performance that is reported to exceed that of traditional statistical models in predicting disease risk.

This study aimed to predict MACE in PD patients using ML algorithms. While prior studies focused on general chronic kidney disease (CKD) populations, this is, to our knowledge, the first to optimize ML models specifically for PD patients, incorporating time-dependent risk stratification (1-year vs. 5-year MACE).

Materials and methods

Ethical consideration

Our study was conducted in accordance with the Declaration of Helsinki. Due to the retrospective nature of the research, the Institutional Review Board of Fujian Provincial Hospital waived the requirement for informed consent. The study protocol was reviewed and approved by the Clinical Ethics Review Committee of Fujian Provincial Hospital (Approval No. 2022068), and all methods adhered to the relevant guidelines and regulations.

Study population

This retrospective study reviewed patients who underwent PD catheterization and continuous ambulatory peritoneal dialysis (CAPD) at the Department of Nephrology, Fujian Provincial Hospital and Fujian Provincial Jinshan Hospital from January 2010 to December 2016. Inclusion criteria: age ≥ 18 years and CAPD duration ≥ 3 months. Exclusion criteria: pregnancy, serious mental illness, serious lung disease, serious heart disease or congenital heart disease, high-dose glucocorticoid and immunosuppressive therapy during active autoimmune disease, or missing data > 10%. Initially, 1,200 patients (715 from Fujian Provincial Hospital and 485 from Fujian Provincial Jinshan Hospital) were screened. After exclusions, 1,144 patients (681 and 463, respectively) entered follow-up until February 2021, with 1,006 patients ultimately included in the analysis. The 606 patients from Fujian Provincial Hospital formed the model development cohort, and the 400 patients from Fujian Provincial Jinshan Hospital formed the external validation cohort. Part of this cohort has also been used in our previous study investigating heart failure hospitalization in PD patients9. The patient selection flowchart is shown in Fig. 1.

Fig. 1

Flowchart of patient selection.

Data collection and measurements

The data were obtained from the electronic medical record and laboratory systems of Fujian Provincial Hospital and Fujian Provincial Jinshan Hospital. We evaluated 86 baseline variables (demographics, comorbidities, vital signs, laboratory tests, imaging, and treatments) at PD initiation to ensure temporal precedence before MACE. All patients underwent transthoracic echocardiography to assess cardiac structure and function (including left ventricular ejection fraction, left ventricular end-diastolic dimensions, interventricular septum thickness, left ventricular diastolic posterior wall thickness, left atrium, and left ventricular mass index) and chest X-ray to evaluate cardiac enlargement and atherosclerosis. For patients with multiple test records, the first result within 3 months of PD initiation was selected as the baseline value. Missing values (< 10%) were imputed using Random Forest to maintain data integrity.
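As a rough illustration of the imputation step, the sketch below fills missing baseline values with a random-forest-based iterative imputer. It assumes scikit-learn's IterativeImputer with a RandomForestRegressor estimator (a missForest-style approach); the text does not specify the implementation used, and the toy matrix, the per-column missingness check and all settings here are hypothetical.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401 (enables IterativeImputer)
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor


def impute_with_random_forest(X, max_missing_frac=0.10, seed=0):
    """Impute NaNs by regressing each incomplete column on the others with a random forest.

    Columns with more than `max_missing_frac` missing values are rejected
    (applying the 10% rule per column is an assumption of this sketch).
    """
    X = np.asarray(X, dtype=float)
    missing_frac = np.isnan(X).mean(axis=0)
    if (missing_frac > max_missing_frac).any():
        raise ValueError("a column exceeds the allowed missing fraction")
    imputer = IterativeImputer(
        estimator=RandomForestRegressor(n_estimators=50, random_state=seed),
        max_iter=5,
        random_state=seed,
    )
    return imputer.fit_transform(X)


# Hypothetical toy data: 40 patients x 4 correlated variables, 5% missing in two columns.
rng = np.random.default_rng(0)
base = rng.normal(size=(40, 1))
X_toy = base + 0.1 * rng.normal(size=(40, 4))
X_toy[[3, 17], 1] = np.nan
X_toy[[8, 25], 2] = np.nan
X_imputed = impute_with_random_forest(X_toy)
```

Observed values pass through unchanged; only the missing cells are replaced by model predictions.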

Definition of outcome

The primary outcome was major adverse cardiovascular events (MACE), assessed at three time points: overall follow-up, 1 year, and 5 years. MACE was defined as a composite of acute myocardial infarction, recurrent angina, cardiac arrest, stroke, all-cause mortality, and cardiovascular-related hospitalizations (including heart failure, severe arrhythmia, and unstable angina). Based on the timing of the first event, patients were categorised into three groups: overall MACE (any MACE during the entire follow-up period), 1-year MACE (MACE within the first year of follow-up), and 5-year MACE (MACE within the first five years of follow-up). Details of overall MACE, 1-year MACE and 5-year MACE are shown in the Suppl. Data.
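The grouping by time of first event can be sketched as a small labelling function. This is an illustrative reading of the definitions above, not the authors' code: the Patient fields, the use of months as the time unit, and the inclusive horizon are assumptions. Patients without an event whose follow-up is shorter than the horizon are left unlabelled, in line with the handling of short follow-up described under Study limitations for the 5-year analysis.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Patient:
    followup_months: float             # total follow-up duration
    first_mace_month: Optional[float]  # months to first MACE; None if no event


def label_mace(p: Patient, horizon_months: Optional[float] = None) -> Optional[int]:
    """Return 1 (event), 0 (no event) or None (not classifiable at this horizon).

    horizon_months=None means overall MACE (any time during follow-up).
    """
    had_event = p.first_mace_month is not None
    if horizon_months is None:
        return int(had_event)
    if had_event and p.first_mace_month <= horizon_months:
        return 1
    # No event within the horizon: a valid control only if followed long enough.
    if p.followup_months >= horizon_months:
        return 0
    return None


# Hypothetical patients: (follow-up, time of first MACE) in months.
cohort = [Patient(70, None), Patient(30, None), Patient(40, 30), Patient(8, 8)]
labels_overall = [label_mace(p) for p in cohort]
labels_1y = [label_mace(p, 12) for p in cohort]
labels_5y = [label_mace(p, 60) for p in cohort]
```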

Statistical analysis and machine learning modeling

Statistical analyses were performed using SPSS 22.0. Normally distributed continuous variables were expressed as mean ± SD, while non-normally distributed variables were reported as medians and quartiles. Group comparisons used t-tests or Mann–Whitney U tests for continuous variables and χ2 tests for categorical variables, with significance set at p < 0.05. We assessed homogeneity of variance and conducted Pearson correlation analysis for significant variables, excluding those with a variance inflation factor (VIF) > 10 or a pairwise correlation coefficient > 0.5 to mitigate multicollinearity.
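The multicollinearity screen can be illustrated with a short NumPy sketch. It uses the identity that the VIF of each variable equals the corresponding diagonal entry of the inverse correlation matrix, and then drops variables greedily until both thresholds are met. The greedy order and the variable names are assumptions made for illustration; the original analysis was run in SPSS and its exact procedure is not described.

```python
import numpy as np


def vif_scores(X):
    """VIF of each column, taken from the diagonal of the inverse correlation matrix."""
    corr = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    return np.diag(np.linalg.inv(corr))


def drop_collinear(X, names, vif_max=10.0, r_max=0.5):
    """Greedily drop variables until no VIF > vif_max and no pairwise |r| > r_max."""
    X = np.asarray(X, dtype=float)
    keep = list(range(X.shape[1]))
    while len(keep) > 1:
        sub = X[:, keep]
        corr = np.abs(np.corrcoef(sub, rowvar=False))
        np.fill_diagonal(corr, 0.0)
        vif = vif_scores(sub)
        if vif.max() > vif_max:
            drop = int(np.argmax(vif))
        elif corr.max() > r_max:
            # From the most correlated pair, drop the member with the larger VIF.
            i, j = np.unravel_index(np.argmax(corr), corr.shape)
            drop = int(i if vif[i] >= vif[j] else j)
        else:
            break
        keep.pop(drop)
    return [names[i] for i in keep]


# Hypothetical variables: var_b is almost a copy of var_a, var_c is unrelated.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = a + 0.05 * rng.normal(size=200)
c = rng.normal(size=200)
kept = drop_collinear(np.column_stack([a, b, c]), ["var_a", "var_b", "var_c"])
```

In this toy case one of the two near-duplicate variables is removed and the unrelated variable is retained.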

Machine learning modeling was performed using Python. The dataset comprised a model development cohort (60%) and an external validation cohort (40%). The model development cohort was further divided into training (80%) and internal validation (20%) subsets. We compared XGBoost, Random Forest and AdaBoost models using five-fold cross-validation, selecting the model with the highest AUC for final evaluation. Key variables were ranked by their impact on outcomes. Both 1-year and 5-year MACE outcomes were assessed. Additional modeling details are provided in Suppl. Data 3.
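A minimal sketch of this workflow (80/20 split, five-fold cross-validated AUC, selection of the best model) is given below using scikit-learn. It is not the authors' code, which is in the repository listed under Code availability. The data are synthetic, the hyperparameters are arbitrary, and scikit-learn's GradientBoostingClassifier is used as a stand-in for XGBoost so that the example has no dependency beyond scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split

# Synthetic stand-in for the development cohort (606 patients; 20 hypothetical features).
X, y = make_classification(n_samples=606, n_features=20, n_informative=10, random_state=0)

# 80/20 split of the development cohort into training and internal validation subsets.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=0
)

candidates = {
    "RandomForest": RandomForestClassifier(n_estimators=100, random_state=0),
    "AdaBoost": AdaBoostClassifier(n_estimators=50, random_state=0),
    # Stand-in for XGBoost (an assumption of this sketch, not the study's model).
    "GradientBoosting": GradientBoostingClassifier(n_estimators=100, random_state=0),
}

# Mean AUC over five stratified folds of the training subset.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_auc = {
    name: cross_val_score(model, X_train, y_train, cv=cv, scoring="roc_auc").mean()
    for name, model in candidates.items()
}
best_name = max(cv_auc, key=cv_auc.get)

# Refit the selected model on the training subset and score the held-out 20%.
best_model = candidates[best_name].fit(X_train, y_train)
val_auc = roc_auc_score(y_val, best_model.predict_proba(X_val)[:, 1])
```

Stratified folds keep the event rate similar across splits, which matters when the outcome is relatively rare, as with 1-year MACE.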

Sensitivity analysis using time-to-event models

To address follow-up heterogeneity and validate robustness, we performed Cox regression using the same predictors. Discrimination was assessed via C-index and time-dependent ROC curves at 1 and 5 years for comparison with the fixed-time machine learning models.
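For readers unfamiliar with the C-index, the sketch below computes Harrell's concordance index for right-censored data in plain Python. It is an illustration under simplifying assumptions (pairs with tied follow-up times are skipped), and the toy follow-up times, event indicators and risk scores are hypothetical.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's concordance index for right-censored data.

    times: follow-up time per patient; events: 1 if MACE observed, 0 if censored;
    risk_scores: higher score = higher predicted risk (e.g. a Cox linear predictor).
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that patient a has the shorter follow-up time.
        a, b = (i, j) if times[i] <= times[j] else (j, i)
        # A pair is comparable only if the earlier time is an observed event
        # and the two times differ (tied times are skipped in this sketch).
        if times[a] == times[b] or not events[a]:
            continue
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical follow-up (months), event indicators and predicted risks.
times = [5, 12, 20, 30, 48]
events = [1, 1, 0, 1, 0]
risk = [0.9, 0.4, 0.7, 0.5, 0.1]
c_index = harrell_c_index(times, events, risk)
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ranking of event times by predicted risk.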

Code availability

The Python code used for data preprocessing, feature engineering, machine learning model development (XGBoost, Random Forest, AdaBoost), hyperparameter tuning, cross-validation, and evaluation is publicly available on GitHub (https://github.com/xuliping999/Papercode).

Results

Baseline characteristics

A total of 606 patients were included in the development cohort for modeling (384 males and 222 females; mean age 52.6 ± 16.1 years). The median follow-up time was 49 months (IQR 16–69). Chronic glomerulonephritis was the primary cause of ESRD (60.1%); diabetic kidney disease accounted for 25.4% of patients and hypertensive nephropathy for 1.8%. MACE occurred in 409 of the 606 patients (67.5%; Table 1 and Suppl. Table 1). To clarify the composition of the primary outcome, the frequency of each MACE component in the development cohort is presented in Suppl. Table 2. Comparisons of characteristics between the model development cohort and the external validation cohort revealed no statistically significant differences (p > 0.05). The comparison of the two cohorts is detailed in Suppl. Table 3.

Table 1 Baseline data table for MACE.

Risk prediction results of MACE at different timepoints

Risk prediction results of overall MACE (any time during follow-up)

In the model development cohort (n = 606), a total of 409 patients developed MACE. XGBoost, Random Forest and AdaBoost were compared by area under the ROC curve (AUC) using five-fold cross-validation; the ROC curves are shown in Fig. 2A. Random Forest achieved the highest AUC of 0.80 (95% CI 0.782–0.832), compared with 0.76 for XGBoost and 0.77 for AdaBoost. The Random Forest model was therefore selected, and its average AUC in the validation set was 0.809. The top predictors were parathyroid hormone (PTH) and congestive heart failure (Fig. 3A).
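To make the AUC and its confidence interval concrete, the sketch below computes the AUC as the probability that a randomly chosen patient with MACE receives a higher score than one without, and derives a percentile bootstrap interval. How the reported interval was obtained is not stated in the text; the bootstrap is one common choice and is an assumption here, as are the toy labels and scores.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if len(set(ys)) < 2:  # resample lacked one class; draw again
            continue
        stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Hypothetical outcomes (1 = MACE) and predicted probabilities.
labels = [1, 1, 1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.65, 0.6, 0.5, 0.4, 0.35, 0.2, 0.1]
point = auc(labels, scores)
ci_low, ci_high = bootstrap_auc_ci(labels, scores)
```

With only ten observations the interval is wide; it narrows as the sample grows.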

Fig. 2

(A) The ROC curve for the random forest model for predicting MACE in the test set. (B) The ROC curve for the random forest model for predicting MACE at year 1 in the test set. (C) The ROC curve for the XGBoost model for predicting MACE at year 5 in the test set. AUC, area under the curve.

Fig. 3

The importance scales are 0–0.08, 0–100, and 0–0.08 for panels A, B and C, respectively, where 0 represents the lowest importance (the smallest decline in model accuracy after exclusion of the variable) and the upper bound (0.08, 100, and 0.08) represents the highest importance (the largest decline in model accuracy after exclusion). (A) Variable importance ranking for overall MACE. (B) Variable importance ranking for MACE at 1-year follow-up. (C) Variable importance ranking for MACE at 5-year follow-up.

Risk prediction results of 1-year MACE

For 1-year MACE (114 patients with events), XGBoost, Random Forest and AdaBoost were compared by AUC using five-fold cross-validation. The XGBoost model outperformed the others (AUC = 0.86 vs. 0.79 for Random Forest and 0.76 for AdaBoost); the ROC curves are shown in Fig. 2B. XGBoost was therefore selected as the optimal model, and its average AUC in the validation set was 0.810. Key risk factors included high-density lipoprotein cholesterol (HDL-C) and age (Fig. 3B).

Risk prediction results of 5-year MACE

For 5-year MACE (331 patients with events), XGBoost, Random Forest and AdaBoost were compared by AUC using five-fold cross-validation. The Random Forest model showed the highest AUC (0.75 vs. 0.68 for XGBoost and 0.71 for AdaBoost); the ROC curves are shown in Fig. 2C. Random Forest was therefore selected as the optimal model, and its average AUC in the validation set was 0.708. Age and creatinine were the dominant predictors (Fig. 3C).

The performance of the machine learning models in predicting overall MACE, 1-year MACE and 5-year MACE is presented in Suppl. Table 4.

Comparison of ML models with Cox regression

In the sensitivity analysis using time-to-event models, Cox regression for overall MACE yielded a C-index of 0.71 (95% CI 0.68–0.74). Key predictors matched those from the machine learning models. Time-dependent ROC analysis gave AUCs of 0.75 at 1 year and 0.77 at 5 years, comparable to the fixed-time models (1-year AUC 0.76, 5-year AUC 0.74).

Model performance on the external validation cohort

The external validation cohort included 400 PD patients from Fujian Provincial Jinshan Hospital, of whom 237 (59%) developed MACE. Machine learning model performance was evaluated again in this cohort. After plotting the validation curve, we found that the calibration curve of the model was close to the ideal curve. The external validation curve is presented in Suppl. Fig. 1. The AUC in external validation was 0.768 (95% CI 0.724–0.816). The model showed a sensitivity of 0.699, specificity of 0.695, accuracy of 0.696, PPV of 0.684, NPV of 0.712, and F1-score of 0.73. The model predictions were in good agreement with the observed outcomes, indicating that the optimized clinical prediction models performed satisfactorily.
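The threshold-based metrics reported above all derive from the four cells of a confusion matrix, as the short sketch below shows. The counts used are hypothetical and chosen only to illustrate the formulas; they are not the study's confusion matrix.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard metrics from true/false positive and negative counts."""
    sensitivity = tp / (tp + fn)   # recall: share of MACE patients flagged
    specificity = tn / (tn + fp)   # share of non-MACE patients not flagged
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    npv = tn / (tn + fn)           # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts for 200 patients, 100 of whom experienced MACE.
metrics = classification_metrics(tp=80, fp=30, tn=70, fn=20)
```

Because the F1-score is the harmonic mean of PPV and sensitivity, it always lies between those two values.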

Discussion

Accurately predicting which PD patients are at high risk of MACE is essential because of their frequent comorbidities. Many studies have focused on building prediction tools for CKD patients, predicting cardiovascular events (AUC = 0.60 to 0.74), renal failure (AUC = 0.79 to 0.84), and all-cause mortality (AUC = 0.70 to 0.82)10. Some studies11,12,13,14 focused on predicting ESRD events in patients with type 2 diabetes, reporting 5-year AUCs ranging from 0.86 to 0.92. In our study, we developed and validated machine learning models to predict the risk of MACE in PD patients. To contextualize our model's performance, we benchmarked it against published risk tools for cardiovascular events in CKD/dialysis populations. These comparisons suggest that our ML approach offers superior discrimination over conventional regression-based tools and general CKD risk scores.

Age emerged as an independent risk factor for MACE in PD patients in our ML analysis. PD is considered a patient-centered therapy requiring adequate self-care capacity, which may decline with advancing age15,16. However, the widespread adoption of assisted PD programmes has substantially broadened treatment eligibility for elderly and physically frail individuals. In our cohort, assisted PD was not systematically implemented, and chronological age remained a strong predictor of MACE. Aging is associated with progressive structural and functional cardiac changes, further elevating MACE risk. Integrating such programmes and evaluating their cardiovascular benefit in older PD patients warrant further study.

PD patients often present with substantial comorbidity. The Charlson Comorbidity Index (CCI)17 is well suited to evaluating comorbidity in PD patients and remains the most widely used tool for quantifying comorbidity burden in this setting. Among the conditions captured by the CCI, diabetes mellitus, heart failure and cerebrovascular disease were the three most frequent in our cohort. In our study, a history of congestive heart failure emerged as a particularly important predictor of MACE, consistent with previous reports. Notably, a CCI score ≥ 5 has been identified as the optimal threshold for predicting cardiovascular hospitalisation and all-cause hospitalisation in PD patients18, underscoring the clinical utility of comorbidity assessment for risk stratification and resource allocation.

Our study revealed significant differences in calcium and PTH levels between patients with and without MACE (p < 0.05). Both play a crucial role in the progression of vascular calcification. Kestenbaum et al.19 found that hyperphosphatemia and secondary hyperparathyroidism were associated with heart failure, myocardial infarction and all-cause death in CKD patients. Additionally, hyperphosphatemia has been established as an independent risk factor for cardiovascular disease and mortality in CKD patients20,21,22. However, unlike prior studies, our analysis did not detect a statistically significant association between serum phosphorus and MACE, possibly due to limited sample size.

The ML algorithm in our study also showed that body mass index (BMI) is an independent risk factor for MACE in PD patients. Kaya et al.23 found that BMI was negatively correlated with dialysis adequacy, while the risks of cardiovascular and cerebrovascular diseases, anemia and other complications were significantly increased in patients with inadequate dialysis. In PD patients, a higher BMI reflects better nutritional status, but because PD adequacy decreases as body surface area increases, the benefit of a higher BMI is offset.

Hypertriglyceridemia and HDL-C are important factors in MACE among PD patients. In our study, the risk factors for MACE identified by the ML models were consistent with previous reports. CKD is often complicated by lipid metabolism disorders, hyperlipidemia, atherosclerosis, and cardiovascular and cerebrovascular diseases24. Among the cardiovascular risk factors, lipid abnormalities are a dominant contributor to the high mortality in PD patients.

Hypoalbuminemia has been identified as a key risk factor for cardiovascular disease and infection in PD patients, a finding consistent with our ML model results. Mehrotra et al.25 showed that serum albumin is an independent predictor of death in PD patients. USRDS data further indicate that in ESRD patients, each 10 g/L decrease in serum albumin increases cardiovascular mortality risk by 39%, with malnourished patients facing a 27% higher cardiovascular death risk compared to well-nourished individuals26.

Estimated glomerular filtration rate (eGFR) and urine volume are key indicators of residual renal function in PD patients. Previous studies suggest that low urine volume is associated with fluid overload, contributing to myocardial hypertrophy, cardiac enlargement, and functional decline27. MACE patients had significantly lower daily urine output than non-MACE patients (754.4 ± 510.3 mL/d vs. 917.6 ± 500.9 mL/d), supporting the role of urine volume as a predictive risk factor in our ML model. Serum creatinine emerged as a key predictor of 5-year MACE in our models, likely reflecting its dual role as both an inverse indicator of residual renal function and a marker of dialysis adequacy. The observed 68% MACE incidence in our cohort is higher than in several previous PD studies, a difference largely attributable to our broad composite definition. Notably, cardiovascular hospitalisation accounted for 70% of all events, and among these, heart failure related to fluid overload is a frequent and PD-specific complication. Inadequate ultrafiltration or poor adherence to fluid restriction can precipitate acute decompensated heart failure, leading to hospital admission even in the absence of atherosclerotic events.

Hypertension represents a significant risk factor for elevated cardiovascular morbidity and mortality in dialysis patients28. According to the ISPD Cardiovascular and Metabolic Guidelines, a blood pressure target of < 140/90 mmHg is recommended for PD patients with persistent hypertension29. Hypertension was present in 91.4% of our cohort, a figure slightly lower than some other PD cohorts. This modest difference likely reflects the use of baseline data collected at dialysis initiation, when residual renal function is better preserved and volume overload less severe, both factors that may attenuate hypertension. Additionally, retrospective underdiagnosis cannot be entirely excluded. Nonetheless, the observed prevalence remains high and aligns with registry data, reinforcing hypertension as a critical therapeutic target in this population.

This study developed ML models to predict MACE in PD patients using baseline clinical and comorbidity data. Notably, our analysis revealed that risk factors such as albumin were less influential than age, BMI, and PTH in predicting MACE in PD patients. The top-performing models (Random Forest and XGBoost) consistently identified age as the most significant factor across both short-term (1-year) and long-term (5-year) predictions, with HDL-C being particularly important for short-term risk and dialysis adequacy markers (creatinine, eGFR) for long-term outcomes. The time-dependent importance of these factors (more predictive in 5-year vs. 1-year models) underscores the need for longitudinal monitoring. PTH's role in calcium-phosphorus metabolism proved significant for both timeframes. Its strong association supports current KDIGO guidelines targeting levels < 300 pg/mL, while the protective effect of HDL-C suggests that lipid management remains crucial despite ESRD. The clinical value of our findings extends beyond prediction accuracy to identifying key modifiable risk factors. Integrating these models into clinical workflows could enable clinicians to intervene in a timely manner, improving patient outcomes and reducing healthcare costs. Future research should focus on implementing these predictive tools in real-world settings and developing targeted management strategies for high-risk patients.

Study limitations

Although this analysis has notable strengths, several limitations, most of them arising from its retrospective design, should be recognized. First, because this is a retrospective study, some clinical data were not collected comprehensively, so potential predictors may have been overlooked. Prospective trials are needed to validate these monitoring protocols and evaluate cost-effectiveness. Second, in our 5-year MACE analysis, the median follow-up time was 49 months, and some patients had follow-up of < 5 years. While we excluded these patients from the 5-year control group to avoid misclassification, this may have reduced statistical power. Prospective follow-up of this patient population will be performed in the future.

Conclusion

This study was specifically designed to identify and characterize significant risk factors for MACE in PD patients through advanced machine learning analysis. By employing rigorous analytical methods on a comprehensive dataset of PD patients, we have systematically evaluated potential risk factors and established their relative importance in MACE development. Timely intervention targeting MACE risk factors may prolong survival and reduce complications.