Introduction

Aortic dissection is a life-threatening disease with an acute onset and a high mortality rate1. It is categorized into several types based on the anatomical structures involved; the most dangerous is acute type A aortic dissection (ATAAD), in which untreated patients face a mortality rate of approximately 1–2% per hour during the first 48 h2. Several surgical methods are available for the treatment of ATAAD. Among them, total arch replacement combined with frozen elephant trunk (TAR + FET) is increasingly used worldwide because it promotes favorable aortic remodeling and facilitates possible secondary distal repair3.

Because ATAAD patients are critically ill, TAR + FET surgery carries considerable risk. Many aortic centers still report high rates of postoperative complications, including mortality4. The surgical procedures performed and the patient’s preoperative condition strongly influence the success of the surgery. The consensus statement from the International Aortic Arch Surgery Study Group (IAASSG) provides a detailed definition of the major adverse outcome (MAO) occurring during hospitalization5.

The purpose of this study was to analyze the incidence of MAO during hospitalization after TAR + FET in ATAAD patients in our center, as well as to develop and validate a risk prediction model for postoperative MAO by combining multiple machine learning models.

Methods

Study design and patient selection

We retrospectively reviewed patients who underwent surgical treatment at our hospital from January 2018 to October 2023. In all patients, computed tomography was performed to confirm the diagnosis, and transthoracic echocardiography was used to evaluate cardiac function. The exclusion criteria were as follows: (1) inability to obtain relevant data (n = 34); (2) ascending aorta replacement alone (n = 77); (3) ascending aorta replacement combined with hemiarch replacement (n = 138); and (4) hybrid total arch repair (n = 92), because this procedure targets high-risk patients and could introduce bias. Finally, we included 635 ATAAD patients, all of whom underwent ascending aortic replacement combined with TAR + FET, as shown in Supplementary Fig. 1. This study was approved by the Ethics Committee of Wuhan Union Hospital (Approval No. 0497, August 25, 2023). The privacy of all subjects was adequately protected; all data were sourced from the hospital medical records system and obtained with written informed consent from the patients. This study adheres to the ethical principles of the Declaration of Helsinki of the World Medical Association (1975 revision).

Study endpoint

According to the IAASSG consensus statement, the study endpoints were defined as in-hospital MAO events corresponding to grade III complications or higher, which mainly included all-cause mortality within 30 days and adverse outcomes of the neurological system, cardiovascular system, respiratory system, renal system, gastrointestinal system and other systems. Table 1 presents the specific information of the group with MAO during hospitalization in this study.

Table 1 In-hospital major adverse outcomes.

Surgical techniques

The TAR + FET procedure at our facility involves the following steps. After the induction of general anesthesia, the right axillary artery is cannulated for cardiopulmonary bypass (CPB). Bilateral antegrade cerebral perfusion (ACP) is established via the right axillary artery and the left common carotid artery, followed by a median sternotomy. Once the temperature reaches 33 °C or ventricular fibrillation occurs, the ascending aorta is clamped, allowing aortic root procedures to proceed. Cooling continues to approximately 25 °C, at which point circulatory arrest is performed. During this arrest, bilateral ACP is established via the left common carotid artery, with clamping of the left subclavian, left common carotid, and innominate arteries. A stented elephant trunk is inserted into the descending aorta’s true lumen and anastomosed to the distal end of a four-branched graft (Maquet M00202175728APO). After anastomosis, air is removed from the descending aorta as needed, and blood perfusion to the lower body is initiated through the graft’s infusion limb, with the left subclavian artery anastomosed to one graft limb. CPB gradually returns to normal flow, and rewarming begins. The left common carotid artery is then anastomosed to the innominate artery, with proximal anastomosis completed during rewarming. After lung reventilation, the ascending aorta is reopened to restore cardiac perfusion. Cooling and rewarming are conducted gradually and uniformly. CPB is terminated once blood gas analysis results are satisfactory.

Development and validation of predictive models

When assembling the candidate predictors, we included all preoperative and surgical procedure-related data. Detailed variable information is presented in Supplementary Tables 1 and 2. All data were randomly divided into a training set (80%) and a validation set (20%). To ensure reproducibility and the reliability of model validation, we provided the source code and set a fixed random seed6. The training and validation sets included 508 and 127 patients, respectively, with 128 (25.2%) and 32 (25.2%) MAO cases. To develop a risk prediction model with high stability, we integrated 10 machine learning algorithms within a leave-one-out cross-validation (LOOCV) framework and generated 190 algorithm combinations. The algorithms were elastic net, Lasso, Ridge, stepwise Cox, CoxBoost, random survival forest (RSF), survival support vector machine, partial least squares regression for Cox, generalized boosted regression modeling (GBM), and supervised principal components. The detailed algorithm combinations are shown in Supplementary Table 3. For each of the 190 combinations, Harrell’s concordance index (C-index) was calculated in both datasets, and the prediction model with the highest average C-index was considered the best.
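As an illustration of the split described above, the following minimal Python sketch performs a seeded 80/20 split. It is not the study's code (the analysis was run in R). The seed value 2023 and the decision to split each outcome class separately are assumptions made here; the latter is used because it reproduces the reported counts (508 and 127 patients, 128 and 32 MAO cases) from the cohort totals of 160 MAO and 475 non-MAO patients.

```python
import random


def stratified_split(labels, train_fraction=0.8, seed=2023):
    """Split indices into training and validation sets, class by class.

    The seed value is hypothetical; fixing it makes the split reproducible.
    """
    rng = random.Random(seed)
    train_idx, valid_idx = [], []
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        n_train = round(len(members) * train_fraction)
        train_idx.extend(members[:n_train])
        valid_idx.extend(members[n_train:])
    return sorted(train_idx), sorted(valid_idx)


# Outcome counts taken from the text: 160 MAO and 475 non-MAO patients.
labels = [1] * 160 + [0] * 475
train_idx, valid_idx = stratified_split(labels)
n_train_mao = sum(labels[i] for i in train_idx)
n_valid_mao = sum(labels[i] for i in valid_idx)
print(len(train_idx), len(valid_idx), n_train_mao, n_valid_mao)
```

Rerunning the function with the same seed returns the same partition, which is the property a fixed random seed is meant to guarantee.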

To allow interpretation of our prediction model, we used Shapley additive explanations (SHAP) to assess the contribution of each predictor to each observation and then averaged these contributions across observations. Discriminative performance was evaluated by the area under the receiver operating characteristic curve (AUROC)7 and the area under the precision-recall curve (AUPRC)8. Calibration was evaluated by the calibration plot and Brier score9. Clinical utility was evaluated by decision curve analysis10.
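The discrimination and calibration metrics named above can be sketched in a few lines of Python. This is an illustrative sketch, not the study's implementation: the function names and the six toy predictions are hypothetical, and the AUPRC is approximated by average precision under the assumption that no two predicted probabilities are tied.

```python
def auroc(y_true, y_prob):
    """Probability that a random case is ranked above a random non-case."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true, y_prob):
    """Step-wise approximation of the AUPRC (assumes untied probabilities)."""
    ranked = sorted(zip(y_prob, y_true), reverse=True)
    hits, precisions = 0, []
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions)


def brier_score(y_true, y_prob):
    """Mean squared difference between predicted risk and observed outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Hypothetical toy data: 1 = MAO, 0 = no MAO.
y_true = [1, 0, 1, 0, 0, 1]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.7, 0.8]
print(auroc(y_true, y_prob))
print(average_precision(y_true, y_prob))
print(brier_score(y_true, y_prob))
```

A higher AUROC or AUPRC indicates better discrimination, whereas a lower Brier score indicates predicted risks that lie closer to the observed outcomes.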

Statistical analysis

For baseline characteristics, categorical variables were presented as frequencies and percentages, while continuous variables were reported as mean and standard deviation or as median and interquartile range. Categorical variables were compared using the chi-square test or Fisher’s exact test. For continuous variables that followed a normal distribution, an independent t-test was employed, while the Mann-Whitney U test was used for those that did not. Data analysis was performed using R software (version 4.4.1). All statistical tests were two-tailed, and a significance level of P < 0.05 was applied.

Results

Patient characteristics and outcomes

This study included 635 ATAAD patients who underwent TAR + FET, with 160 patients classified into the MAO group. The baseline clinical data of both groups are detailed in Supplementary Table 1. Compared with the non-MAO group, a higher proportion of patients in the MAO group had a history of chronic kidney disease (11.88% vs. 3.16%, P < 0.001), coronary heart disease (11.25% vs. 5.68%, P = 0.028), and prior open thoracotomy cardiovascular surgery (13.13% vs. 7.37%, P = 0.039). Additionally, a higher proportion of patients in the MAO group had mechanical ventilation at referral (3.75% vs. 0.63%, P = 0.010), altered mental status (7.50% vs. 1.47%, P < 0.001), and multiple malperfusion sites (27.50% vs. 15.78%, P = 0.002) prior to surgery. No significant differences were observed between the two groups in age, gender, or height. Supplementary Table 2 summarizes preoperative examinations and combined procedural types for all patients. Notably, several examination indicators were significantly higher in the MAO group than in the non-MAO group, including neutrophil count (79.70 ± 14.92 vs. 77.60 ± 13.81, P = 0.006), activated partial thromboplastin time (APTT) (33.57 ± 3.96 vs. 28.15 ± 4.35, P < 0.001), and ascending aorta diameter (45.70 ± 7.31 vs. 44.60 ± 8.03, P = 0.025). Conversely, the estimated glomerular filtration rate (eGFR) was lower in the MAO group (96.12 ± 35.60 vs. 101.42 ± 34.15, P = 0.028). Among the combined surgical types, the MAO group had a higher incidence of aortic root repair (12.50% vs. 6.53%, P = 0.025), Bentall procedures (33.13% vs. 18.74%, P < 0.001), and coronary artery bypass grafting (CABG) (15.63% vs. 9.26%, P = 0.037).

Development and validation of predictive models

All data were randomly divided into a training set and a validation set. Table 2 demonstrates that there were no significant differences in clinical baseline data or examination results between the two groups. A machine learning integration procedure was performed on both datasets to develop a prediction model characterized by high accuracy and stability. We fitted a total of 190 algorithm combinations and calculated the C-index for each model. As shown in Fig. 1, the optimal model was the combination of RSF and GBM, achieving the highest average C-index of 0.919, with scores of 0.957 in the training set and 0.882 in the validation set.
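For readers less familiar with the metric used to rank the algorithm combinations, Harrell's C-index can be sketched as follows. This is a minimal pure-Python illustration under the assumption that each patient has an observed time, an event indicator, and a predicted risk score; the function name and the five toy patients are hypothetical and are not study data.

```python
def harrell_c_index(times, events, risk_scores):
    """Share of comparable pairs in which the patient with the earlier
    observed event also has the higher predicted risk (ties count 0.5)."""
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a pair must be anchored on a patient with an event
        for j in range(n):
            if times[i] < times[j]:  # j was still event-free when i failed
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    return concordant / comparable


# Hypothetical toy data: follow-up time, event flag (1 = event), risk score.
times = [2, 5, 7, 10, 12]
events = [1, 1, 0, 1, 0]
risk = [0.9, 0.4, 0.6, 0.3, 0.1]
print(harrell_c_index(times, events, risk))  # 7 of 8 comparable pairs -> 0.875
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so averaging the index over the training and validation sets favors combinations that rank patients well in both.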

Table 2 Baseline data for training and validation sets.
Fig. 1 The integration procedure based on machine learning. A total of 190 algorithm combinations were constructed using the LOOCV framework, and the C-index of each model was calculated.

As shown in Fig. 2A, the error curve of the entire RSF model was analyzed based on the LOOCV framework, ensuring the stability of variable selection. Variables with importance scores greater than zero were retained. Subsequent GBM analysis of these selected variables showed that the minimum residual value was reached at 8640 iterations, as shown in Fig. 2B, which was taken as the optimal model. To interpret the machine learning model and clarify the relative contributions of candidate variables to MAO prediction, we conducted SHAP analysis. All important variables identified by the RSF and GBM algorithms were ranked according to their mean SHAP feature importance. Only the top-ranked variables are displayed in the SHAP value ranking in Fig. 2C and Supplementary Fig. 2. To enhance the clinical applicability of our prediction model, we selected the eleven variables with an average SHAP value greater than 0.10 (ref. 11): international normalized ratio (INR, mean SHAP: 0.273), creatine kinase-MB (CK-MB, mean SHAP: 0.252), D-dimer (mean SHAP: 0.195), direct bilirubin (mean SHAP: 0.177), hemoglobin (mean SHAP: 0.168), albumin (mean SHAP: 0.159), platelet count (mean SHAP: 0.145), total bilirubin (mean SHAP: 0.141), APTT (mean SHAP: 0.137), neutrophil count (mean SHAP: 0.126), and ascending aorta diameter (mean SHAP: 0.115). We retrained the GBM algorithm with these eleven variables to create a parsimonious model for predicting the risk of MAO following TAR + FET. This parsimonious model can be accessed online through a browser for easy use (https://pmodel.shinyapps.io/pmodel/).
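The threshold-based selection step can be illustrated with a short Python sketch. It assumes that the "average SHAP value" is the mean absolute SHAP value across patients, which is the usual definition of SHAP feature importance but is not stated explicitly in the text. The feature names and the SHAP matrix below are hypothetical placeholders rather than study data.

```python
def mean_abs_shap(shap_rows, feature_names):
    """Average the absolute per-patient SHAP values for each feature."""
    n = len(shap_rows)
    return {name: sum(abs(row[k]) for row in shap_rows) / n
            for k, name in enumerate(feature_names)}


def select_features(importance, threshold=0.10):
    """Keep features whose importance exceeds the threshold, largest first."""
    kept = [(name, value) for name, value in importance.items()
            if value > threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Hypothetical SHAP matrix: one row per patient, one column per feature.
feature_names = ["feature_a", "feature_b", "feature_c"]
shap_rows = [
    [0.30, -0.05, 0.10],
    [-0.20, 0.02, 0.15],
    [0.25, -0.03, -0.20],
    [-0.25, 0.06, 0.15],
]
importance = mean_abs_shap(shap_rows, feature_names)
selected = select_features(importance)
print([name for name, _ in selected])  # ['feature_a', 'feature_c']
```

Absolute values are averaged so that positive and negative contributions do not cancel; a feature that pushes risk up in some patients and down in others still counts as influential.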

Fig. 2 Construction of prediction model. (A) Error curve of RSF. (B) GBM reaches the minimum residual value at 8640 iterations. (C) Variables are ranked in descending order of average SHAP value in the SHAP analysis. A SHAP value greater than 0 indicates that the feature value pushes the prediction toward the outcome. Only the top-ranked variables are shown in the figure. CK-MB, creatine kinase-MB; eGFR, estimated glomerular filtration rate; APTT, activated partial thromboplastin time; INR, international normalized ratio.

We further validated the complete model and the simplified model in both datasets to assess their accuracy and clinical benefit. As shown in Fig. 3A, both models demonstrated high discriminative ability across both datasets; even the lowest AUROC, that of the simplified model in the validation set, was 0.851. The models also performed well on the precision-recall curve, with the simplified model achieving an AUPRC of 0.703 in the validation set, as shown in Fig. 3B. As shown in Fig. 3C, both models were well calibrated, with the simplified model having Brier scores of 0.124 and 0.138 in the training and validation sets, respectively. Finally, decision curve analysis indicated clinical benefit for both models, with the simplified model showing benefit over a risk threshold probability range of 0.2–0.9, as shown in Fig. 3D.
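The quantity plotted on a decision curve is the net benefit at each risk threshold, which the following Python sketch computes using the standard formula, true positives per patient minus false positives per patient weighted by the odds of the threshold. The function names and the eight toy patients are hypothetical; this illustrates the technique and is not the study's code.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit of treating patients whose predicted risk >= threshold."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)


def net_benefit_treat_all(y_true, threshold):
    """Reference strategy: treat every patient regardless of predicted risk."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


# Hypothetical toy data: 1 = MAO, 0 = no MAO.
y_true = [1, 0, 1, 0, 0, 1, 0, 0]
y_prob = [0.9, 0.2, 0.6, 0.4, 0.7, 0.8, 0.1, 0.3]
nb_model = net_benefit(y_true, y_prob, threshold=0.5)
nb_all = net_benefit_treat_all(y_true, threshold=0.5)
print(nb_model, nb_all)  # 0.25 and -0.25 for this toy example
```

A model is considered useful at a given threshold when its net benefit exceeds both the treat-all strategy and the treat-none strategy, whose net benefit is zero.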

Fig. 3 Performance of the complete model and the simplified model in the training and validation set. (A) AUROC of the two models. (B) AUPRC of the two models. (C) Calibration graphs of the two models. (D) Decision curves of the two models. , full risk model in training set; #, full risk model in validation set; †, parsimonious risk model in training set; ‡, parsimonious risk model in validation set; AUROC, area under the receiver operating characteristic curve; AUPRC, area under the precision-recall curve.

Finally, Supplementary Fig. 3 displays the risk scores calculated for each patient in both datasets using the simplified risk prediction model. Patients were divided into high-risk and low-risk groups at the optimal cutpoint determined by the “surv_cutpoint” function from the survminer package in R, which uses maximally selected rank statistics to identify the cutpoint that best separates the survival outcomes. In both the training set and the validation set, the incidence of MAO after TAR + FET was significantly higher in the high-risk group than in the low-risk group (P < 0.001).

Discussion

Despite advancements in surgical techniques and medical equipment, the morbidity and mortality rates for emergency surgery in type A aortic dissection remain high, ranging from approximately 4% to 26%12. Postoperative complications in ATAAD often involve multiple organ damage, significantly impairing patients’ quality of recovery and long-term survival13. Furthermore, studies have shown that MAO patients have significantly lower survival rates than non-MAO patients14. Therefore, this study aimed to develop and validate a new machine learning prediction model that uses demographic data, preoperative laboratory tests, and imaging results to identify patients at high risk following TAR + FET surgery, providing guidance for the prevention of postoperative adverse events and for clinical decision-making.

Machine learning studies for predicting postoperative complications in ATAAD patients remain limited. Lin Feng et al. applied six algorithms, achieving AUC values of approximately 0.72 in both training and validation sets with stable validation results15. In contrast, our study constructed 190 algorithm combinations and selected the most predictive model. Even in the simplified version, our model achieved AUC values above 0.85 for both sets, indicating higher predictive accuracy. Unlike traditional regression models, complex machine learning models exhibit a ‘black box effect,’ providing more accurate predictions but being harder to interpret16. To address this, we applied SHAP analysis for visual interpretation of the best model, a combination of RSF and GBM. As a post-hoc explanation method, SHAP incorporates all feature variables and outputs SHAP values that quantify each feature’s impact on the model17. During the variable selection phase, we considered narrowing the variable range to improve clinical applicability. However, we found that reducing the model to only 4–5 variables compromised its representativeness, and focusing solely on a few laboratory indicators might reduce its clinical relevance. Additionally, considering the potential bias associated with arbitrary variable reduction and the sample size of this model, we ultimately selected variables with an average SHAP value greater than 0.10. Interpretable predictive models can show clinicians which patient characteristics are more likely to lead to outcome events. This helps clinicians better understand the decision-making process for assessing disease severity and implementing personalized prevention. However, the complexity of diseases in actual clinical settings may make it difficult for models to adequately predict outcomes. In the future, it may be possible to improve the explanatory and persuasive power of such models by adopting dynamic models based on longitudinal data.

Coagulation dysfunction in aortic dissection patients often leads to increased intraoperative blood loss and transfusion requirements, extending surgical duration and potentially resulting in intra- or postoperative coagulopathy, such as disseminated intravascular coagulation18. Hao et al. further demonstrated that INR is significantly associated with stroke risk in TAR + FET surgery patients and can serve as an independent predictor for postoperative stroke19. D-dimer levels rise within six hours after aortic dissection onset, peaking at 24 h, with sensitivity exceeding 90% within ten days20. D-dimer is thus widely used as a diagnostic marker to exclude ATAAD and for early risk stratification21. Additionally, it is significantly linked to severe postoperative complications and in-hospital mortality in ATAAD patients22.

As a biomarker of myocardial injury, elevated CK-MB levels may indicate coronary artery involvement in ATAAD23, leading to impaired arterial perfusion and a significantly higher incidence of MAO following dissection repair. This finding is consistent with previous studies24. In our research, serum bilirubin, hemoglobin levels, and various inflammatory markers were also identified as predictors of MAO after aortic dissection surgery. This suggests that preoperative comorbidities, such as anemia25, chronic liver disease26, and chronic kidney disease, may increase the risk of postoperative adverse events. These findings highlight the importance of risk stratification and personalized preoperative management for high-risk populations.

Postoperative outcomes for patients with aortic dissection are often complex and difficult to predict27. While simplified models offer greater clinical practicality, they may also compromise predictive performance. Currently, AI-based predictive models require large-scale patient datasets for training to surpass the accuracy of traditional scoring systems28. Although our predictive model has a smaller sample size than traditional risk prediction tools (such as the German Registry for Acute Type A Aortic Dissection score, Additive EuroSCORE, Logistic EuroSCORE, Parsonnet score, and the Cleveland score)29, this AI algorithm-driven risk model offers a novel perspective and a more effective approach to achieving superior predictive accuracy, discriminative power, and clinical utility in the high-risk setting of aortic dissection surgery. Furthermore, this model provides new insights for surgeons to preoperatively assess the surgical risk of ATAAD patients, facilitating the early identification of high-risk individuals and enabling targeted interventions.

Limitations

This study has several limitations. First, it is based on single-center data and, although data completeness was prioritized, the sample size is relatively small; multicenter studies with larger cohorts will be conducted in the future to further validate the model and its clinical utility. Second, while preoperative variables were included, intraoperative and early postoperative variables, which could affect outcomes, were not considered, potentially limiting model accuracy. Lastly, as the model relies on current clinical data, advances in surgical techniques and instrumentation may necessitate future updates to maintain clinical relevance.

Conclusion

This study successfully developed and validated a predictive model combining RSF and GBM to assess postoperative complication risk in ATAAD patients undergoing TAR + FET. This model aids surgeons in early identification of high-risk patients, enabling personalized, targeted interventions and ultimately enhancing clinical outcomes. However, the model was constructed using only preoperative variables and did not include intraoperative or postoperative data.