STEMI-OP in-hospital mortality prediction algorithms: Frailty-integrated machine learning in older patients undergoing primary PCI

Nguyen, Tan Van; Nguyen, Quyen The; Nguyen, Huong Quynh; Nguyen, Nghia Thuong; Luong, Khai Duc; Do Thi, Lan Hoang; Nguyen, Tu Cam; Vo, Thuan Hoang; Le, Phan Huu; Tran, Phuc Thien; Le, Thanh Dinh

doi:10.1038/s41514-025-00238-9

Open access
Published: 06 June 2025

Tan Van Nguyen ORCID: orcid.org/0000-0002-0234-6596^1,2,
Quyen The Nguyen ORCID: orcid.org/0009-0006-0884-6707^3,
Huong Quynh Nguyen^4,
Nghia Thuong Nguyen^5,
Khai Duc Luong^1,2,
Lan Hoang Do Thi^1,2,
Tu Cam Nguyen^1,2,
Thuan Hoang Vo^1,2,
Phan Huu Le^1,2,
Phuc Thien Tran^1,2 &
…
Thanh Dinh Le ORCID: orcid.org/0009-0009-3153-085X^6

npj Aging volume 11, Article number: 48 (2025)

This article has been updated

Abstract

Despite advances in medical care, older patients with ST-elevation myocardial infarction (STEMI) undergoing primary percutaneous coronary intervention (PCI) still face high in-hospital mortality. Traditional prognostic models, developed primarily in Caucasian populations with few older participants and using classical statistical approaches, may not perform well in Southeast Asian settings. This study addresses the need for artificial intelligence-based risk assessment models, the STEMI-OP algorithms, designed specifically for STEMI patients aged 60 years and older undergoing primary PCI in Vietnam. Machine learning (ML) models were developed and validated using pre- and post-PCI features, with advanced feature selection techniques used to identify key predictors. SHapley Additive exPlanations and Causal Random Forests were employed to improve interpretability and to explore causal relationships between features and outcomes, highlighting key predictors of in-hospital mortality, including the Killip classification, the Clinical Frailty Scale, glucose levels, and creatinine levels. The CatBoost model with ElasticNet regression for pre-PCI prediction and the Random Forest model with Ridge regression for post-PCI prediction performed significantly better than traditional risk scores, achieving AUC values of 92.16% and 95.10%, respectively, and outperforming the GRACE 2.0 score (83.48%) and the CADILLAC score (87.01%). By incorporating frailty and employing advanced ML techniques, the STEMI-OP algorithms produced more precise, personalized risk assessments that could enhance clinical decision-making and improve outcomes for older STEMI patients undergoing primary PCI.

Introduction

Although primary percutaneous coronary intervention (PCI) has been used for decades, in-hospital mortality among older patients with ST-elevation myocardial infarction (STEMI) remains a critical concern. A 20-year registry in Germany showed that in-hospital mortality in women aged ≥75 years changed little, from 25.1% in 2000 to 23.6% in 2019, and remained significantly higher than in younger patients¹. Data from Singapore similarly showed in-hospital mortality of 11.9% in older patients versus 3.6% in younger patients after primary PCI².

Prognostic models predict patient outcomes, guide clinical decisions, and improve care quality. However, significant gaps in the evidence underlying these models can reduce their accuracy and applicability in older patients undergoing primary PCI. Regional differences in healthcare systems, patient demographics, and treatment approaches make it difficult to apply traditional models universally. Most widely recognized mortality risk scores, such as GRACE 2.0, CADILLAC, TIMI, PAMI, and ALPHA, were developed predominantly in Caucasian populations with limited Asian representation, which necessitates local adaptation and validation, particularly in regions such as Southeast Asia^3,4,5,6,7.

Older populations, often characterized by multiple comorbidities and a higher risk of adverse outcomes, have been underrepresented in the clinical trials and registries used for model development. Consequently, traditional risk scores were derived from cohorts comprising mainly younger and less clinically complex patients, raising concerns about their accuracy when applied to older patient groups^3,4,5,6,7. Furthermore, the Southeast Asian cohorts used to validate these prognostic models enrolled a considerable number of younger STEMI patients. Notably, even the most recent machine learning (ML) models designed specifically for STEMI patients undergoing primary PCI in Asia included only a modest proportion of older individuals^8,9,10,11,12,13,14. These limitations further restrict the applicability of these models to older populations in this region.

Additionally, to the best of our knowledge, existing traditional and ML models do not integrate geriatric-specific factors such as frailty, cognitive function, and functional status, which are critical for predicting outcomes in older patients^3,4,5,6,7,8,9,10,11,12,13,14. Frailty, commonly measured with the Clinical Frailty Scale (CFS)¹⁵, has been consistently associated with higher mortality in older patients after primary PCI^16,17,18,19. Omitting such factors from conventional models may lead to incomplete risk assessment and underestimation of mortality in older populations. These evidence gaps highlight the need to incorporate geriatric-specific factors to improve the accuracy of mortality prediction models for older adults undergoing primary PCI in Southeast Asia.

Traditional models assume linear relationships between risk factors and mortality, an assumption that oversimplifies the complexity of these interactions^3,4,5,6,7. ML algorithms offer a more nuanced approach: they can capture non-linear relationships and use ensemble models that combine multiple algorithms to improve prediction accuracy, reduce overfitting, and increase model robustness^20,21,22. For interpretability, tree-based methods such as Random Forest provide feature importance scores that indicate the influence of each variable on the outcome²³. However, as Lundberg et al. noted, these metrics can be inconsistent²⁴. The SHapley Additive exPlanation (SHAP) framework addresses this limitation with a more consistent and interpretable measure of feature importance, attributing each feature's contribution to the model's predictions using Shapley values. In addition, Causal Random Forests (CRF) have emerged as a powerful tool for exploring causal relationships between features and outcomes²⁵. Unlike traditional methods, CRF estimates heterogeneous treatment effects, identifying how variables influence outcomes differently across subgroups. This approach improves understanding of causal mechanisms and supports the development of more robust and generalizable models.
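The Shapley attribution that SHAP builds on can be illustrated with a small, self-contained sketch. It computes exact Shapley values for one prediction by enumerating every feature coalition, replacing features outside a coalition with a fixed baseline value. The toy risk function `toy_risk`, its coefficients, and the baseline patient are hypothetical and chosen only for illustration; they are not the STEMI-OP model. The SHAP library itself uses faster model-specific algorithms (for example, TreeSHAP for tree ensembles) rather than brute-force enumeration, whose cost grows as 2^n.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values for a single prediction by enumerating all coalitions.

    Assumption: a feature "absent" from a coalition takes its baseline value.
    """
    n = len(x)

    def value(coalition):
        # Features in the coalition take the patient's values; the rest take baseline values
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for subset in combinations(others, size):
                s = set(subset)
                # Shapley weight |S|! (n - |S| - 1)! / n!
                weight = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk(z):
    """Hypothetical risk score with a Killip x CFS interaction (illustrative coefficients)."""
    killip, cfs, glucose = z
    return 0.5 * killip + 0.3 * cfs + 0.1 * glucose + 0.2 * killip * cfs


patient = [3, 6, 10]     # hypothetical Killip class, CFS score, glucose
reference = [1, 3, 7]    # hypothetical baseline patient
phi = shapley_values(toy_risk, patient, reference)
# phi is approximately [2.8, 2.1, 0.3]; the interaction term is split between Killip and CFS
```

By construction, the attributions sum to the difference between the patient's prediction and the baseline prediction (7.9 − 2.7 = 5.2 here), which is the additivity property SHAP summary plots rely on.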

By integrating SHAP and CRF and accounting for frailty as a geriatric-specific factor, this study aims to build a transparent and accurate ML risk calculator, the STEMI-OP (STEMI-Older Persons) algorithms, for older patients undergoing primary PCI in Vietnam. This comprehensive strategy is designed to overcome the limitations of traditional and previous ML models, enhancing clinical decision-making and improving patient outcomes.

Results

The baseline characteristics of patients across the four centers in the cohort are detailed in Table 1. The data demonstrate significant inter-center variability in several key clinical parameters, which merits further attention.

Table 1 Patient characteristics

The prevalence of cardiac arrest before or at admission was markedly higher in Centers 3 and 4 (~3.0%) than in Centers 1 and 2 (~1.0%). Center 3 also had significantly lower systolic blood pressure than the other centers. A history of diabetes mellitus was present in approximately 17.0% of patients in Centers 1 and 3, compared with approximately 35.0% in Centers 2 and 4.

The time from symptom onset to hospital admission was notably shorter at Center 4 (median 4 hours) than at the other three centers, where the median was significantly longer (7–8 hours). This disparity reflects the distribution of PCI-capable centers in developing countries such as Vietnam, where they are predominantly located in central urban areas. Consequently, most patients in this cohort experienced prolonged travel times from first medical contact to a PCI-capable center.

Anterior STEMI was the predominant type in Centers 2 and 4 (approximately 55.0%–60.0% of cases), compared with approximately 45.0% in Centers 1 and 3. The incidence of ventricular fibrillation or ventricular tachycardia was disproportionately high in Center 4 (10.9%). Left ventricular ejection fraction (LVEF) was comparable across Centers 1, 3, and 4 (42.0%–45.0%) but notably higher in Center 2 (51.9%).

Ticagrelor use was relatively limited in Centers 3 (26.7%) and 4 (1.7%), where clopidogrel remained the antiplatelet agent of choice. By contrast, ticagrelor use was significantly higher in Centers 1 and 2 (~64.0%). These differences can be attributed to the timing of patient recruitment: Centers 3 and 4 primarily enrolled patients in 2017–2018, when clopidogrel was widely prescribed in Vietnam. After subsequent guideline updates highlighted ticagrelor's superior efficacy over clopidogrel in the primary PCI setting, its use increased significantly in Centers 1 and 2.

Radial artery access was preferred in Centers 1 and 3 (over 90.0%), whereas femoral artery access was predominantly used in Centers 2 and 4 (93.0%–100.0%). Left main coronary artery involvement was observed in approximately 15.0% of cases in Centers 2 and 4, compared to around 5.0% in Centers 1 and 3.

These demographic and clinical variations among centers likely influenced in-hospital mortality, which ranged from 10.0% to 12.0% in most centers but peaked at 17.9% in Center 2.

Pre-PCI models

Feature selection

The performance metrics of the ML models using the complete feature set are outlined in Supplementary Table S1. The CatBoost model performed strongly, with an accuracy of 90.67% and an AUC (area under the receiver operating characteristic curve) of 91.66%, indicating good discrimination of in-hospital mortality. By contrast, although the Random Forest model achieved an AUC of 91.88%, its low sensitivity (43.64%) indicated difficulty identifying true positive cases. These results highlighted the need to improve sensitivity without compromising accuracy or specificity. Feature selection methods help optimize models by reducing complexity and improving computational efficiency, making them more interpretable for clinical use.
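Accuracy and sensitivity, together with the G-Mean and F1 score used to compare feature-selected models, are all derived from the confusion matrix. The sketch below is a minimal illustration assuming a binary outcome coded 1 for in-hospital death and predictions already dichotomized at some threshold (the threshold used in the study is not stated in this section). G-Mean is taken as the geometric mean of sensitivity and specificity. Because this section does not say whether F1 is computed for the death class alone or averaged over both classes, the sketch returns both; with a rare outcome the two can differ substantially.

```python
from math import sqrt


def binary_metrics(y_true, y_pred):
    """Threshold-based metrics for a binary outcome (1 = in-hospital death)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def safe_div(a, b):
        return a / b if b else 0.0

    sensitivity = safe_div(tp, tp + fn)      # recall for deaths
    specificity = safe_div(tn, tn + fp)      # recall for survivors
    precision_pos = safe_div(tp, tp + fp)
    precision_neg = safe_div(tn, tn + fn)

    f1_pos = safe_div(2 * precision_pos * sensitivity, precision_pos + sensitivity)
    f1_neg = safe_div(2 * precision_neg * specificity, precision_neg + specificity)
    n_pos, n_neg = tp + fn, tn + fp
    # Support-weighted F1 averages both classes, so the majority (survivor) class dominates
    f1_weighted = safe_div(n_pos * f1_pos + n_neg * f1_neg, n_pos + n_neg)

    return {
        "accuracy": safe_div(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "g_mean": sqrt(sensitivity * specificity),
        "f1_positive": f1_pos,
        "f1_weighted": f1_weighted,
    }


# Hypothetical example: 2 deaths among 10 patients
metrics = binary_metrics([1, 1, 0, 0, 0, 0, 0, 0, 0, 0],
                         [1, 0, 0, 0, 0, 0, 0, 0, 0, 1])
```

A model can post high accuracy and weighted F1 while missing many deaths, which is why sensitivity and G-Mean are reported alongside them.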

Feature selection showed that reducing the feature set to 10 features preserved robust model performance. Specifically, the CatBoost model trained on features selected through ElasticNet regression achieved an improved AUC of 92.16%, an increased sensitivity of 59.74%, an enhanced G-Mean of 75.33%, and an F1 score of 90.21%. These models outperformed traditional risk-scoring systems such as GRACE 2.0 (AUC 83.48%) and CADILLAC (AUC 87.01%), as shown in Table 2. To assess the statistical significance of these differences, DeLong's test was performed with CatBoost (ElasticNet) as the reference model. The improvements in AUC over all traditional scores were statistically significant (all p values < 0.0001), further supporting the superior discriminative performance of the proposed ML approach. Complete performance comparisons of the pre-PCI ML models across feature selection methods are presented in Supplementary Table S2.
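DeLong's test compares two AUCs estimated on the same patients while accounting for the correlation between them. The following is a minimal, quadratic-time sketch of the standard DeLong variance estimator built from per-patient placement values; it is not the study's code, and the scores in the usage line are hypothetical. Production analyses typically rely on a faster rank-based implementation, for example `roc.test` in the R package pROC.

```python
from math import erfc, sqrt


def _psi(x, y):
    # Mann-Whitney kernel: 1 if the positive case scores higher, 0.5 for ties
    return 1.0 if x > y else 0.5 if x == y else 0.0


def _placements(scores, labels):
    """AUC plus per-positive (v10) and per-negative (v01) placement values."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    v10 = [sum(_psi(x, y) for y in neg) / len(neg) for x in pos]
    v01 = [sum(_psi(x, y) for x in pos) / len(pos) for y in neg]
    return sum(v10) / len(v10), v10, v01


def _cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def delong_test(scores_a, scores_b, labels):
    """Two-sided DeLong test for the difference between two correlated AUCs."""
    auc_a, v10_a, v01_a = _placements(scores_a, labels)
    auc_b, v10_b, v01_b = _placements(scores_b, labels)
    m, n = len(v10_a), len(v01_a)
    var_a = _cov(v10_a, v10_a) / m + _cov(v01_a, v01_a) / n
    var_b = _cov(v10_b, v10_b) / m + _cov(v01_b, v01_b) / n
    cov_ab = _cov(v10_a, v10_b) / m + _cov(v01_a, v01_b) / n
    z = (auc_a - auc_b) / sqrt(var_a + var_b - 2 * cov_ab)
    p_value = erfc(abs(z) / sqrt(2))   # two-sided p value under a standard normal
    return auc_a, auc_b, z, p_value


labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
ml_scores = [0.9, 0.8, 0.7, 0.4, 0.5, 0.3, 0.2, 0.1, 0.35, 0.6]        # hypothetical
score_b = [0.6, 0.4, 0.7, 0.3, 0.5, 0.45, 0.2, 0.55, 0.35, 0.1]       # hypothetical
auc_ml, auc_ref, z, p = delong_test(ml_scores, score_b, labels)
```

The covariance term is what distinguishes DeLong's test from comparing two independent AUCs: both scores are evaluated on the same patients, so their errors are correlated.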

Table 2 Pre-PCI models

Key features selected by each method are detailed in Supplementary Table S3. Notably, the Killip class, the CFS, hemoglobin level, and heart rate were among the most frequently selected features, underscoring their importance across predictive models.
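ElasticNet-based selection can be sketched as fitting an elastic-net penalized model and keeping the features with the largest absolute coefficients. The version below is an illustration under stated assumptions: it fits a least-squares elastic net (the same objective as scikit-learn's `ElasticNet`) to a 0/1 outcome by cyclic coordinate descent on standardized features, and the penalty strength `alpha`, mixing parameter `l1_ratio`, cut-off `k`, and the synthetic data are hypothetical. The study may instead have used a penalized logistic model or different hyperparameters; this section does not say.

```python
import random


def standardize(column):
    """Center to mean 0 and scale to unit (population) standard deviation."""
    mean = sum(column) / len(column)
    sd = (sum((v - mean) ** 2 for v in column) / len(column)) ** 0.5
    if sd == 0:
        return [0.0] * len(column)   # a constant feature carries no information
    return [(v - mean) / sd for v in column]


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def elastic_net_select(X, y, k, alpha=0.05, l1_ratio=0.5, n_iter=200):
    """Minimize (1/2n)||y - Xw||^2 + alpha*l1_ratio*||w||_1 + alpha*(1-l1_ratio)/2*||w||^2,
    then return the indices of the k largest |w| (sorted) and all coefficients."""
    n, p = len(X), len(X[0])
    cols = [standardize([row[j] for row in X]) for j in range(p)]
    y_mean = sum(y) / n
    residual = [v - y_mean for v in y]   # centered y minus X @ w, with w = 0
    w = [0.0] * p
    for _ in range(n_iter):
        for j, col in enumerate(cols):
            # Correlation of feature j with the partial residual (its own term added back);
            # non-constant standardized columns satisfy sum(c * c) / n == 1.
            rho = sum(c * (r + c * w[j]) for c, r in zip(col, residual)) / n
            new_w = soft_threshold(rho, alpha * l1_ratio) / (1.0 + alpha * (1.0 - l1_ratio))
            delta = new_w - w[j]
            if delta != 0.0:
                residual = [r - c * delta for c, r in zip(col, residual)]
                w[j] = new_w
    ranked = sorted(range(p), key=lambda j: abs(w[j]), reverse=True)
    return sorted(ranked[:k]), w


# Synthetic illustration (not study data): the outcome depends on features 0 and 3 only
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(6)] for _ in range(200)]
y = [1 if 2.0 * row[0] - 1.5 * row[3] + random.gauss(0, 0.5) > 0 else 0 for row in X]
selected, coefs = elastic_net_select(X, y, k=2)
print(selected)
```

The L1 part of the penalty drives weak coefficients to exactly zero, while the L2 part keeps groups of correlated predictors from being dropped arbitrarily; ranking by |coefficient| is only meaningful because the features are standardized first.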

Evaluation and interpretation of the top-performing model

The top-performing ML model underwent a comprehensive evaluation to characterize its predictive capabilities and the factors driving its performance. This evaluation comprised feature importance rankings, calibration plots, SHAP summary and dependence plots, and causal effect analyses. The SHAP plots showed how individual features influenced model predictions and how closely feature importance rankings aligned with SHAP values. The causal effect analyses further elucidated the direct and indirect relationships between features and outcomes, reinforcing the robustness and interpretability of the model.

Feature importance

As illustrated in Fig. 1, the feature importance ranking of the CatBoost model with ElasticNet feature selection identified the CFS as the most important predictor of patient outcomes, followed closely by glucose level, Killip class, systolic blood pressure, and the time from symptom onset to hospital admission. Heart rate, creatinine level, hemoglobin level, patient age, and ventricular tachycardia/fibrillation at admission also contributed substantially, albeit with less impact.

Performance of the top-ranking model

The comparative analysis of the CatBoost model with ElasticNet feature selection, the Logistic Regression model with recursive feature elimination (RFE) feature selection, and traditional risk models provides valuable insights into the balance between model complexity and interpretability in predictive modeling. Advanced ML models such as CatBoost excel at capturing intricate, non-linear patterns in data, yielding superior predictive accuracy. However, these benefits come at the cost of reduced interpretability and greater computational demands compared with simpler models such as Logistic Regression.

Logistic Regression, particularly when paired with RFE feature selection, strikes a compelling balance by offering enhanced interpretability, computational efficiency, and competitive performance, making it well-suited for specific clinical applications. Among the Logistic Regression approaches, the RFE-selected model stood out as the top performer. Nonetheless, the CatBoost model with ElasticNet feature selection demonstrated the highest AUC, outperforming all other models, including the RFE-enhanced Logistic Regression.
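Recursive feature elimination (RFE) with logistic regression repeatedly fits the model and drops the feature with the smallest absolute coefficient until a target number of features remains. The sketch below assumes standardized features, an unpenalized logistic regression fitted by plain batch gradient descent, and removal of one feature per round; these choices, the learning rate, the iteration count, and the synthetic data are illustrative assumptions rather than the study's configuration. scikit-learn's `RFE` class implements the same idea for any estimator that exposes coefficients.

```python
import random
from math import exp


def standardize(column):
    mean = sum(column) / len(column)
    sd = (sum((v - mean) ** 2 for v in column) / len(column)) ** 0.5
    return [(v - mean) / sd if sd else 0.0 for v in column]


def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)


def fit_logistic(cols, y, lr=0.5, n_iter=200):
    """Unpenalized logistic regression by batch gradient descent; returns feature weights."""
    n, p = len(y), len(cols)
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        grad_w, grad_b = [0.0] * p, 0.0
        for i in range(n):
            err = sigmoid(b + sum(w[j] * cols[j][i] for j in range(p))) - y[i]
            grad_b += err
            for j in range(p):
                grad_w[j] += err * cols[j][i]
        b -= lr * grad_b / n
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
    return w


def rfe_select(X, y, k):
    """Refit, drop the feature with the smallest |weight|, and repeat until k remain."""
    cols = [standardize([row[j] for row in X]) for j in range(len(X[0]))]
    remaining = list(range(len(cols)))
    while len(remaining) > k:
        w = fit_logistic([cols[j] for j in remaining], y)
        weakest = min(range(len(remaining)), key=lambda idx: abs(w[idx]))
        remaining.pop(weakest)
    return remaining


# Synthetic illustration (not study data): the outcome depends on features 0 and 3 only
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(6)] for _ in range(150)]
y = [1 if 2.0 * row[0] - 1.5 * row[3] + random.gauss(0, 0.5) > 0 else 0 for row in X]
kept = rfe_select(X, y, k=2)
print(kept)
```

Refitting after every removal is what separates RFE from a single ranking pass: once a feature is dropped, the remaining coefficients are re-estimated, so correlated predictors are judged on their contribution given the others still in the model.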

Performance differences were evident in the ROC curves, calibration plots, and precision-recall (PR) curves. The CatBoost model showed the most reliable calibration (Fig. 2b), closely following the ideal diagonal across most probability ranges, with only minor overestimation at higher probabilities. In contrast, the Logistic Regression model with RFE showed moderate calibration, tending to overestimate risk across all levels of observed probability.
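A calibration plot such as Fig. 2b compares mean predicted risk with the observed mortality rate within groups of patients. The minimal sketch below assumes equal-width probability bins (the binning used for Fig. 2b is not described in this section; quantile bins are a common alternative). A bin whose mean predicted risk exceeds its observed event rate lies below the diagonal and indicates overestimation, the pattern described for the Logistic Regression model.

```python
def calibration_bins(y_true, y_prob, n_bins=10):
    """Return (mean predicted risk, observed event rate, count) for each non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for outcome, prob in zip(y_true, y_prob):
        idx = min(int(prob * n_bins), n_bins - 1)   # prob == 1.0 falls in the last bin
        bins[idx].append((outcome, prob))
    points = []
    for members in bins:
        if members:
            mean_pred = sum(p for _, p in members) / len(members)
            observed = sum(o for o, _ in members) / len(members)
            points.append((mean_pred, observed, len(members)))
    return points


def overestimated_bins(points):
    """Bins where predicted risk exceeds observed mortality (points below the diagonal)."""
    return [pt for pt in points if pt[0] > pt[1]]


# Hypothetical predictions for six patients, grouped into two bins
points = calibration_bins([0, 0, 1, 0, 1, 1], [0.05, 0.15, 0.12, 0.85, 0.9, 0.95], n_bins=2)
```

With few deaths per bin, observed rates are noisy, so calibration plots on small cohorts are usually read alongside the bin counts returned here.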

In the PR analysis (Fig. 2c), the CatBoost model consistently maintained higher precision than the other models across all recall levels, particularly in high-recall scenarios. These findings underscore the superior discrimination, calibration, and precision of the CatBoost model with ElasticNet feature selection, highlighting its potential for robust application in clinical settings.
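PR curves like Fig. 2c are built by sweeping a decision threshold over the predicted risks. The sketch below computes precision and recall at each distinct score and summarizes the curve as average precision, the step-wise sum of precision weighted by increments in recall (the definition scikit-learn uses for `average_precision_score`). It is illustrative only; whether the study summarized its PR curves this way is not stated, and the example scores are hypothetical.

```python
def precision_recall_curve(y_true, y_score):
    """(threshold, precision, recall) at every distinct score, highest threshold first."""
    total_pos = sum(y_true)
    if total_pos == 0:
        raise ValueError("recall is undefined without positive cases")
    pairs = sorted(zip(y_score, y_true), reverse=True)
    tp = fp = 0
    curve = []
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Count every case tied at this threshold before emitting a point
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        curve.append((threshold, tp / (tp + fp), tp / total_pos))
    return curve


def average_precision(curve):
    """Step-wise area under the PR curve: sum of precision times the gain in recall."""
    ap, prev_recall = 0.0, 0.0
    for _, precision, recall in curve:
        ap += precision * (recall - prev_recall)
        prev_recall = recall
    return ap


curve = precision_recall_curve([1, 0, 1, 1, 0], [0.9, 0.8, 0.7, 0.3, 0.2])
ap = average_precision(curve)
```

Unlike the ROC curve, precision depends on the proportion of deaths in the cohort, so PR curves are more sensitive to how well a model ranks the relatively few patients who die.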

SHapley Additive exPlanations and Causal Random Forests analysis

The CatBoost model with ElasticNet feature selection was evaluated using SHAP analysis (Figs. 3a and 4) and Causal Random Forests (Fig. 3b), two methods that make distinct yet complementary contributions to explaining the model's predictions.

The SHAP summary plot (Fig. 3a) shows the relative importance of individual features and their contributions to the model's predictions. The Killip class emerged as the most important predictor, followed by the CFS, glucose, systolic blood pressure, and the time from symptom onset to hospital admission. Higher feature values (purple) were associated with increased mortality risk, whereas lower values (green) corresponded to decreased risk. Other variables, including creatinine level, heart rate, patient age, hemoglobin level, and ventricular tachycardia/fibrillation at admission, also contributed substantially. This visualization captures the magnitude and direction of each feature's impact, including the complex, non-linear, and interactive effects that shape the model's predictions.

SHAP dependence plots (Fig. 4) give deeper insight into the relationships between individual features and their contributions to the model. These plots reveal several critical patterns: a strong positive association of the Killip class and the CFS with mortality risk, non-linear effects of glucose and creatinine levels, and an inverted U-shaped effect of systolic blood pressure and heart rate. They also underscore the importance of timely admission, showing a strong association between delayed admission and increased mortality risk.

The Causal Random Forests analysis (Fig. 3b) complements these findings by quantifying the direct impact of each feature on patient outcomes, thereby providing a causal perspective. The presence of ventricular tachycardia/fibrillation at admission exhibited the most substantial causal influence, followed by the Killip class, corroborating its prominent role in the SHAP analysis. Similarly, the CFS, glucose, and creatinine levels also displayed significant causal effects. This alignment reinforces the conclusions drawn from the SHAP analysis, affirming that the most influential features identified by SHAP also hold substantial causal relevance.
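For readers less familiar with causal forests, the quantity they are usually built to estimate is the conditional average treatment effect (CATE) of a binary exposure W on the outcome Y, given covariates X. In the potential-outcomes notation below, Y(1) and Y(0) are the outcomes a patient would have with and without the exposure (for example, ventricular tachycardia/fibrillation at admission). A causal reading of the estimates rests on the unconfoundedness assumption in the second line: given the measured covariates, exposure is independent of the potential outcomes. A single per-feature effect, as plotted in Fig. 3b, would plausibly correspond to the average in the third line. How continuous features such as glucose or creatinine were coded as exposures is not described in this section, so treating them as binarized is an assumption here.

$$\tau(x) = \mathbb{E}\left[\,Y(1) - Y(0) \mid X = x\,\right]$$

$$\{Y(0),\, Y(1)\} \perp\!\!\!\perp W \mid X$$

$$\bar{\tau} = \mathbb{E}\left[\,\tau(X)\,\right]$$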

SHAP analysis explains the magnitude and direction of feature contributions to individual predictions, accounting for both linear and non-linear interactions. In contrast, causal effect analysis focuses on direct relationships between features and outcomes. The strong agreement between the influential features identified by SHAP (such as the Killip class, the CFS, glucose, and creatinine levels) and their corresponding causal effects highlights the robustness and clinical significance of these variables. This synergy enhances the model's interpretability, reliability, and clinical relevance, providing a comprehensive understanding of patient mortality risk.