Introduction

Healthcare-associated infections (HAIs) pose a significant threat to patient care and safety, affecting over 7% of patients globally and resulting in 37,000 deaths annually in Europe and 99,000 in the United States. They also impose substantial healthcare costs1. Among HAIs, device-associated infections represent the most severe risk to intensive care unit (ICU) patients’ life expectancy2. Common examples include ventilator-associated pneumonia (VAP), central line-associated bloodstream infections (CLABSIs), and catheter-associated urinary tract infections (CAUTIs). The incidence of such infections reaches up to 14.7% in countries like China, Korea, and other developing nations2,3,4. Device-associated infections correlate with prolonged hospital stays, increased costs, and higher mortality rates5. Reducing the incidence of these infections by up to 30% could significantly lower healthcare expenses6. Hence, preventing device-associated infections is crucial for reducing healthcare costs and improving patient outcomes.

Currently, several clinical strategies aim to mitigate device-associated infections, including primary education, infection surveillance, and continuous feedback. Effective surveillance is vital for implementing practical prevention measures. Continuous monitoring and management are essential to prevent these infections and enhance the prognosis for high-risk patients, particularly those in ICUs. A study involving 900 acute care hospitals in the United States demonstrated the efficacy of intervention practices in preventing device-associated infections7. Additionally, the International Nosocomial Infection Control Consortium (INICC) network, an online surveillance system, has proven effective in reducing infection rates8. Therefore, early risk assessment and identification of patients at risk for device-associated infections are critical to reducing incidence and improving care quality.

Recently, multivariate logistic analysis has been commonly used to identify risk factors and develop predictive models. However, this method is constrained by its reliance on the linearity assumption between risk factors and outcomes9. In contrast, machine learning (ML), a branch of artificial intelligence, can uncover hidden patterns in complex datasets by training algorithms. ML has significantly influenced clinical decision-making and shown promise in predicting clinical outcomes10,11. The trend of ML applications in HAI prediction is evident, with prior studies developing models to predict specific device-associated infections like VAP, CLABSI, and CAUTI12,13,14. However, these models are limited to predicting individual infection types, not encompassing all potential HAIs. ICU patients often undergo multiple invasive procedures, increasing their infection risk. Therefore, comprehensive predictive models are essential for mitigating device-associated infections and improving survival outcomes in ICU patients post-procedures.

This study endeavors to develop ML models for predicting the risk of device-associated infections and 30-day survival outcomes in ICU patients following invasive device treatments, using data from the Medical Information Mart for Intensive Care (MIMIC)-IV cohort15. These ML models will be incorporated into a web-based application, facilitating clinicians in the early identification of high-risk patients and informed decision-making.

Materials and methods

Study design and participants

Electronic inpatient records were retrieved from the MIMIC-IV database (version 2.2), which includes data on 180,733 inpatients and 50,920 ICU patients admitted to Beth Israel Deaconess Medical Center between 2008 and 2019. This database integrates information from hospital and ICU systems, along with external sources, capturing vital signs, laboratory and microbiological tests, admissions and discharges, medications, length of stay, survival data, and discharge or death records16. Access to this database was granted upon completion of the Collaborative Institutional Training Initiative (CITI Program) training course (certification ID: 12037493). Since the database is publicly available and anonymized, patient informed consent was not required. This study adhered to relevant guidelines to ensure patient privacy, and it was approved by the Ethics Committee of Tengzhou Central People’s Hospital in China (ethical review approval number: 2023-ethical reviews-45). All methodologies and protocols were rigorously conducted in full compliance with the principles set forth in the Declaration of Helsinki.

The study included patients over 18 years old who were first hospitalized in the ICU and underwent at least one invasive procedure, such as invasive mechanical ventilation (IMV), central venous catheter (CVC), or indwelling urinary catheter (IUC). Exclusion criteria were: (1) death within 48 h of ICU admission; (2) invasive procedures conducted in non-ICU departments; (3) failure to meet the Centers for Disease Control and Prevention (CDC) National Healthcare Safety Network (NHSN) guidelines17 for diagnosing VAP, CLABSI, or CAUTI.

Outcomes and definitions

This study targeted three types of device-associated infections: VAP, CLABSI, and CAUTI. The outcomes were 30-day mortality and device-associated infection occurring in the ICU more than 48 h after the corresponding invasive procedure. Invasive device procedures and diagnoses of device-associated infections were evaluated based on the CDC’s NHSN guidelines.

Data collection

Variables documented upon ICU admission were collected, encompassing general information, laboratory examinations, and vital signs. Basic data included age, gender, ethnicity, comorbidities, admission care unit, invasive interventions, device-associated infections, 30-day survival outcomes after the first invasive procedure, and hospitalization outcomes. Laboratory parameters and vital signs were collected within the first 24 h of ICU admission. These included white blood cell (WBC) count, platelet count, anion gap, bicarbonate levels, creatinine levels, chloride levels, glucose levels, hemoglobin levels, potassium levels, sodium levels, blood urea nitrogen (BUN) levels, calcium levels, activated partial thromboplastin time (APTT), prothrombin time (PT), international normalized ratio (INR), systolic blood pressure (SBP), diastolic blood pressure (DBP), mean arterial pressure (MAP), temperature, peripheral blood oxygen saturation (SpO2), heart rate, respiratory rate, Glasgow Coma Scale (GCS), Sequential Organ Failure Assessment (SOFA), Acute Physiology Score III (APS III), and Simplified Acute Physiology Score II (SAPS II). The code for the study is available on GitHub (https://github.com/susu223344/Device-Associated-Infections-Hyperparametric-Search/blob/main/sql).

Development and visualization of ML models

Patients were randomly assigned to training and validation datasets in a 7:3 ratio. Missing data were imputed using the k-nearest neighbor (KNN) strategy, and continuous variables were normalized using a min-max scaler. To mitigate the risk of data leakage, the KNN imputation method and min-max scaling were separately applied to the training and validation datasets following data partitioning18. Two types of ML models were developed: one for device-associated infections and one for 30-day survival outcomes. Seven ML algorithms were applied for predicting device-associated infections: random forest (RF), logistic regression (LR), support vector machine (SVM), extreme gradient boosting (XGBoost), Gaussian naive Bayes (GNB), decision tree (DT), and recurrent neural networks with long short-term memory (LSTM). For 30-day survival outcomes, five algorithms were used: Cox regression, extra survival trees (EST), survival tree (ST), gradient boosting survival tree (GBST), and deep learning survival neural network (DeepSurv)19,20. A grid search approach was employed to optimize each model, identifying the optimal hyperparameters through 10-fold cross-validation (Supplementary Table 1). The code for this grid search is accessible on GitHub (https://github.com/susu223344/Device-Associated-Infections-Hyperparametric-Search/blob/main/Code%201.ipynb). All models were trained and evaluated using 10-fold cross-validation with the identified optimal hyperparameters.
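The preprocessing steps described above can be illustrated with a minimal pure-Python sketch. It is not the authors' code (their notebook is linked above and presumably relies on library implementations). The function names, the toy data, and the choice of k = 3 neighbors are assumptions made only for illustration. The sketch splits the rows 7:3, then imputes and scales each split using only that split's own rows, mirroring the "separately applied" wording of the text.

```python
import math
import random

def split_7_3(rows, seed=0):
    """Shuffle the rows and split them 70/30 into training and validation."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = round(0.7 * len(rows))
    return [rows[i] for i in idx[:cut]], [rows[i] for i in idx[cut:]]

def knn_impute(rows, k=3):
    """Replace each missing value (None) with the mean of that column over
    the k nearest rows of the same dataset that have the column observed."""
    def dist(a, b):
        # Euclidean distance over columns observed in both rows,
        # rescaled to compensate for the columns that had to be skipped.
        shared = [(x - y) ** 2 for x, y in zip(a, b)
                  if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum(shared) * len(a) / len(shared))

    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = sorted(
                (dist(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )[:k]
            filled[i][j] = sum(v for _, v in donors) / len(donors)
    return filled

def min_max_scale(rows):
    """Rescale every column to the [0, 1] range using its own min and max."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(r, lo, hi)]
        for r in rows
    ]

# Toy data: two hypothetical continuous variables with two missing entries.
rng = random.Random(1)
data = [[rng.gauss(60, 15), rng.gauss(37.0, 0.6)] for _ in range(20)]
data[3][1] = None
data[11][0] = None

train, valid = split_7_3(data)
# Each split is imputed and scaled on its own, so no validation row
# contributes to the statistics used for the training rows.
train = min_max_scale(knn_impute(train))
valid = min_max_scale(knn_impute(valid))
print(len(train), len(valid))
```

The point of the design is that the split comes first: every neighbor, minimum, and maximum used for a training row is drawn from the training rows alone.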

Assessment of the device-associated infection ML model was conducted using the area under the curve (AUC) of the receiver operating characteristic (ROC), the area under the precision-recall curve (AUPRC), and the Brier score (BS). Evaluation of the 30-day survival ML model employed concordance index (C-index), integrated Brier score (IBS), and time-dependent AUC. Both AUC and C-index measure discrimination; they are equivalent for binary outcomes without censoring, but the C-index accommodates censored survival times, making it well suited to survival data analysis. C-index and AUC values below 0.60 indicate poor discrimination, between 0.60 and 0.75 indicate potentially helpful discrimination, and above 0.75 indicate clearly useful discrimination21. Additionally, AUPRC complements ROC-AUC and is more informative in class-imbalanced scenarios because it focuses on how well the positive (minority) class is identified. The time-ROC curve, illustrating ROC curves at various time points with evolving AUC, captures the model’s discrimination variations over time, with a time-dependent AUC closer to 1 indicating higher predictive performance. BS and IBS assess prediction accuracy; BS is the mean squared difference between predicted probabilities and observed outcomes, while IBS integrates BS over time for continuous-time model assessment, with values closer to 0 signifying superior predictive accuracy.
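To make these metric definitions concrete, the following pure-Python sketch computes AUC, BS, a step-wise AUPRC (average precision), and Harrell's C-index on a small hypothetical example. The function names and the eight toy predictions are assumptions for illustration only; the study itself would have used library implementations. The final lines illustrate the equivalence noted above: when every event precedes every censored follow-up time, the C-index reduces to the AUC.

```python
def roc_auc(y_true, y_score):
    """Probability that a random positive is scored above a random negative
    (ties count as one half)."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)

def average_precision(y_true, y_score):
    """Step-wise area under the precision-recall curve.
    Assumes that no two scores are tied."""
    ranked = sorted(zip(y_score, y_true), reverse=True)
    tp, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            total += tp / rank  # precision at each newly recalled positive
    return total / sum(y_true)

def harrell_c_index(time, event, risk):
    """Among comparable pairs, the fraction in which the subject who had the
    event earlier also received the higher predicted risk."""
    concordant, comparable = 0.0, 0
    for i in range(len(time)):
        for j in range(len(time)):
            # A pair is comparable when subject i had an observed event
            # strictly before subject j's event or censoring time.
            if event[i] == 1 and time[i] < time[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable

# Hypothetical outcomes (1 = infection) and predicted probabilities.
y = [0, 0, 1, 0, 1, 0, 0, 1]
p = [0.05, 0.10, 0.35, 0.20, 0.80, 0.15, 0.40, 0.30]

auc = roc_auc(y, p)
bs = brier_score(y, p)
ap = average_precision(y, p)

# Binary special case: events at time 1, everyone else censored at time 2.
times = [1 if label == 1 else 2 for label in y]
c_index = harrell_c_index(times, y, p)
print(round(auc, 3), round(bs, 3), round(ap, 3), round(c_index, 3))
```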

Selection of the optimal ML algorithm for the device-associated infection model involved training seven ML models with all variables and comparing metrics such as AUC, AUPRC, and BS, alongside disparities in decision curve analysis (DCA) and calibration curves. This rigorous evaluation identified the most suitable algorithm. A similar process was followed for the 30-day survival model, training five ML models with all relevant variables and selecting the best algorithm based on C-index, IBS, and time-dependent AUC. The Shapley additive explanation (SHAP)22 method evaluated the variable importance of the optimal ML algorithms for both models. The performance of the optimal ML algorithms was tested on the full set of variables and the top 30, top 20, top 15, and top 10 variables, emphasizing fewer variables without compromising performance, thereby determining the final model.

The optimal ML models were converted and implemented into a web-based application. This application displays the predicted probability of device-associated infections and provides insights into the decision-making process using the SHAP method. Additionally, it includes the patients’ 30-day Kaplan-Meier survival curve.

Statistical analysis

Data were presented as medians with interquartile ranges for continuous variables and as counts with percentages for categorical variables. Differences between the two datasets were assessed using the Wilcoxon rank sum test for continuous data and Fisher’s exact test for categorical data. All statistical analyses were performed using R version 4.3.0 and Python version 3.7.0 (version 3.8.17 for the web-based application). A P-value of less than 0.05 was considered statistically significant.
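As an illustration of these descriptive and comparative statistics, the sketch below computes a median with interquartile range and a two-sided Fisher's exact test for a 2x2 table using only the Python standard library. The function names and the toy inputs are hypothetical; the analyses in this study were run with R and Python packages, which are not reproduced here.

```python
from math import comb
from statistics import quantiles

def median_iqr(values):
    """Median and interquartile range with linear interpolation between
    order statistics (the 'inclusive' method of statistics.quantiles)."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return q2, (q1, q3)

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed table."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # The small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-7))

# Hypothetical inputs, not taken from Table 1.
summary = median_iqr([1, 2, 3, 4, 5])
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(summary, round(p_value, 4))
```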

Results

Baseline characteristics

A total of 8574 patients were included in this study, with 6001 in the training dataset and 2573 in the validation dataset. No significant differences were observed between the baseline parameters of the two datasets (P > 0.05). The baseline characteristics are detailed in Table 1. The median age was 69.03 years (interquartile range: 57.33, 80.04), and 4774 participants (55.68%) were male. Following invasive device procedures, 491 patients (5.73%) developed device-associated infections, and 1329 patients (15.50%) died within 30 days.

Table 1 Baseline characteristics and outcomes of the included patients.

Development and assessment of ML models

Figures 1 and 2 compare the performance of seven ML models in predicting device-associated infections when all variables are included. In the validation dataset, XGBoost exhibited the highest discriminative performance with an AUC of 0.812 (95% CI 0.809–0.815), while SVM had the lowest AUC of 0.591 (95% CI 0.558–0.625). All seven ML models displayed low AUPRCs, with XGBoost and RF achieving the highest AUPRCs of 0.230 (95% CI 0.224–0.236) and 0.230 (95% CI 0.224–0.237), respectively, and SVM the lowest at 0.089 (95% CI 0.076–0.101). Among all models, the XGBoost model was most closely calibrated to the perfectly calibrated curve and had the lowest BS of 0.054 (95% CI 0.053–0.054). DCA indicated that XGBoost provided the highest net benefit when the threshold probability ranged between 0.1 and 0.3. At a threshold probability of 0.1, the XGBoost model identified 2 true positives per 100 patients with invasive devices, with no false positives. Based on these findings, XGBoost was selected as the ML model for predicting device-associated infections.
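The DCA quantity behind statements such as "2 true positives per 100 patients ... with no false positives" is the net benefit, which weights false positives by the odds of the threshold probability. The sketch below shows the standard net benefit formula on a hypothetical cohort of 100 patients; the function names and toy predictions are assumptions, and the only figure taken from this study is the overall incidence of 5.73%, used for the "treat all" reference strategy.

```python
def net_benefit(y_true, y_prob, threshold):
    """Net benefit = TP/n - FP/n * pt / (1 - pt) at threshold probability pt."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)

def net_benefit_treat_all(prevalence, threshold):
    """Net benefit of labelling every patient as high risk."""
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Hypothetical cohort of 100 patients: 6 infections, 94 without infection.
y = [1] * 6 + [0] * 94
p = [0.40] * 4 + [0.05] * 2 + [0.20] * 9 + [0.02] * 85

# At pt = 0.1 the model flags 4 true positives and 9 false positives, and
# each false positive costs 0.1 / 0.9 of a true positive.
nb_model = net_benefit(y, p, threshold=0.10)

# Reference strategy using the incidence reported in this study (5.73%).
nb_all = net_benefit_treat_all(0.0573, threshold=0.10)
print(round(nb_model, 4), round(nb_all, 4))
```

In this toy cohort the net benefit of 0.03 corresponds to 3 net true positives per 100 patients, that is, the benefit of finding 3 infections without any false positives. With an incidence of 5.73%, treating every patient as high risk yields a negative net benefit at a threshold of 0.1.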

Fig. 1

Comparison of the discriminative ability among device-associated infection models. (A) ROC curves for the training dataset. (B) ROC curves for the validation dataset. (C) PR curves for the training dataset. (D) PR curves for the validation dataset.

Fig. 2

Calibration curves and DCA of device-associated infection models. (A) Calibration curves for the training dataset. (B) Calibration curves for the validation dataset. (C) DCA for the training dataset. (D) DCA for the validation dataset.

Table 2 and Fig. 3 depict the performance of five ML models in predicting 30-day survival. In the validation dataset, EST demonstrated the highest C-index of 0.777 (95% CI 0.775–0.779), while DeepSurv had the lowest value of 0.678 (95% CI 0.642–0.714). EST and Cox achieved the lowest IBS of 0.086 (95% CI 0.086–0.086), whereas DeepSurv had the highest IBS of 0.101 (95% CI 0.099–0.102). Additionally, EST displayed a higher time-dependent AUC of 0.795 (95% CI 0.789–0.801) compared to the other ML models, particularly beyond five days following invasive device procedures. Based on the performance of all models in the validation cohorts, EST was selected as the ML model for predicting 30-day survival outcomes.

Table 2 Performance metrics for 30-day survival models.
Fig. 3

Comparison of time-dependent AUC among 30-day survival models. (A) Comparison for the training dataset. (B) Comparison for the validation dataset.

Variables selection

Figure 4 depicts the feature importance of the two models when all variables are included. Tables 3 and 4 present model performances with different numbers of variables, sorted by importance. The XGBoost model incorporating the top 10 variables achieved an AUC of 0.810 (95% CI 0.808–0.812), AUPRC of 0.226 (95% CI 0.222–0.230), and BS of 0.053 (95% CI 0.053–0.054), comparable to the model using all variables, which had an AUC of 0.812 (95% CI 0.809–0.815), AUPRC of 0.230 (95% CI 0.224–0.236), and BS of 0.054 (95% CI 0.053–0.054) (Table 3). Similarly, the EST model with the top 10 variables exhibited a C-index of 0.756 (95% CI 0.754–0.757) and a time-dependent AUC of 0.769 (95% CI 0.763–0.775) compared to the C-index of 0.777 (95% CI 0.775–0.779) and time-dependent AUC of 0.795 (95% CI 0.789–0.801) observed in models incorporating all variables (Table 4). For both XGBoost and EST, the top 10 variables were ultimately retained in the final models to minimize the number of variables required.

Fig. 4

Feature importance plots for all variables. (A) Feature importance plot of XGBoost. (B) Feature importance plot of EST.

Table 3 Performance metrics for different variables of XGBoost in the validation dataset.
Table 4 Performance metrics for different variables of EST in the validation dataset.

Interpretability analysis

The SHAP algorithm was utilized to calculate each variable’s contribution to every example, revealing how the data influences the probability and how the model processes this information. Figure 5 illustrates the interpretability analysis of the SHAP algorithm in the XGBoost and EST models. Feature importance was estimated based on the training dataset to assess the global impact of each feature. In the XGBoost model, the top variables were length of stay (LOS) in the ICU before using a central venous catheter (CVC), LOS before using invasive mechanical ventilation (IMV), tracheostomy, medical intensive care unit (MICU)/surgical intensive care unit (SICU), temperature, APS III, LOS before using an indwelling urinary catheter (IUC), MAP, PT, and renal replacement therapy (RRT). In the EST model, the key variables included APS III, MICU/SICU admission, age, cancer, liver disease, SAPS II, anion gap, cerebrovascular disease, myocardial infarction, and the use of two or more devices. SHAP summary plots demonstrated the impact of each feature on the model output in terms of overall distribution. The XGBoost model indicated an increased risk of device-associated infections with prolonged ICU stay before the utilization of devices like CVC and IMV, as well as heightened risks for patients who underwent tracheostomy and were admitted to the MICU/SICU. The EST model revealed an increased risk of mortality within 30 days following an invasive device procedure for patients with elevated APS III scores, MICU/SICU admission, advanced age, and a cancer diagnosis. Decision plots depicted the decision-making process for the top 100 patients, with the prediction line illustrating the accumulation of the ML output value from the base SHAP value to the final value at the top of the figure.
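The additive property underlying the decision plots (a base value plus per-feature contributions equals the model output) can be shown with a brute-force Shapley computation on a toy model. The sketch below is only an illustration: the two-variable risk function, its coefficients, and the baseline values are hypothetical, and the SHAP library uses far more efficient algorithms for tree models rather than this exhaustive enumeration.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values for one example. Features outside a coalition
    are replaced by their baseline value."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight of a coalition of this size that excludes i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

def toy_risk(z):
    """Hypothetical infection risk from ICU days before CVC placement and
    tracheostomy status (0 or 1), with a small interaction term."""
    los_before_cvc, tracheostomy = z
    return (0.02 + 0.01 * los_before_cvc + 0.05 * tracheostomy
            + 0.002 * los_before_cvc * tracheostomy)

patient = [10, 1]    # 10 ICU days before CVC, tracheostomy performed
reference = [2, 0]   # hypothetical reference patient

base_value = toy_risk(reference)
phi = shapley_values(toy_risk, patient, reference)
# Additivity: base value plus all contributions recovers the prediction.
print(round(base_value, 3), [round(v, 3) for v in phi],
      round(toy_risk(patient), 3))
```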

Fig. 5

SHAP analysis of the final XGBoost and EST models. (A) Feature importance plot of XGBoost. (B) Feature importance plot of EST. (C) SHAP summary plot of XGBoost. (D) SHAP summary plot of EST. (E) SHAP decision plot of XGBoost. (F) SHAP decision plot of EST.

Model application

The XGBoost and EST models were deployed through a web-based application (https://susudaihai.streamlit.app/). Users can input variables into the application to predict the probability of device-associated infection. The application generates SHAP force plots, illustrating the contribution of individual features in influencing the model’s predictions. Additionally, it presents the patient’s 30-day Kaplan-Meier survival curve, enabling an assessment of their prognosis.

Discussion

This study preliminarily developed ML-based models to predict the risk of device-associated infections and 30-day survival outcomes in ICU patients following initial invasive device procedures. The XGBoost and EST models were developed for these predictions, focusing on the top 10 variables. In the validation cohort, the XGBoost model achieved an AUC of 0.810 (95% CI 0.808–0.812), and the EST model achieved a C-index of 0.756 (95% CI 0.754–0.757). These models enhance personalized surveillance by providing individualized probabilities of device-associated infections and 30-day survival outcomes.

Device-associated infections have garnered significant attention in preventing HAIs over recent decades. A survey of 183 hospitals in the USA indicated that 4% of inpatients had at least one HAI, with device-associated infections accounting for 25.6% of these infections. These infections place a substantial burden on patients23. Previous studies have primarily focused on predicting specific device-associated infections through model construction. Various ML algorithms have been proposed to predict the risk of VAP in ICU patients. For instance, Giang et al.12 and Liang et al.24 demonstrated that XGBoost (AUC: 0.854) and RF (AUC: 0.84) as ensemble ML algorithms performed well in predicting VAP. Wu et al.25 used the LR algorithm to predict VAP in patients with acute respiratory distress syndrome, achieving an AUC of 0.744. Han et al.26 established an LR model with an AUC of 0.817 to predict mortality in ICU patients with VAP. Additionally, other studies13,27,28,29 have used LR, XGBoost, RF, and deep learning algorithms to build ML models for predicting CLABSI in hospitalized patients, with AUCs ranging from 0.722 to 0.993. Furthermore, several studies14,30,31 have utilized neural networks, ensemble learning models, and XGBoost algorithms to develop ML models for predicting CAUTI in hospitalized patients, with AUCs ranging from 0.758 to 0.904. However, these models only predicted one of the outcomes: VAP, CLABSI, or CAUTI. A study conducted in Poland32 among ICU patients found that the incidence density for CLABSI was 8.0 per 1000 catheter days, VAP was 15.2 per 1000 ventilator days, and CAUTI was 3.0 per 1000 catheter days. Predicting just one outcome is insufficient for reducing the risk of HAIs in ICU patients. Using models from three different studies to predict the risk of device-related infections would present clinicians with numerous variables and inconsistent inclusion criteria, complicating clinical decision-making. Moreover, these models did not integrate patient mortality outcomes.
In the ICU, both patient survival probability and device-associated infections are crucial for clinicians, as some ICU patients face poor prognoses following invasive device procedures. A model that can predict both outcomes would significantly enhance patient prognosis. The XGBoost model in this study achieved an AUC of 0.810, closely approximating the AUCs of the aforementioned models specific to device-related infections. Requiring only 10 variables, XGBoost facilitates easier data acquisition compared to similar studies. Additionally, it provides comprehensive predictions for VAP, CLABSI, and CAUTI, enhancing both the breadth and efficiency of predictions. The model includes variables such as patients’ ICU length of stay before employing IMV, CVC, and IUC, enabling clinicians to input diverse device usage times. This aids in predicting potential risks of device-related infections and offers insights into optimal device utilization timings and necessity. This particular aspect distinguishes our study from others.

The XGBoost algorithm achieved the highest AUC among the seven classification models included in the study. For unbalanced data, AUPRC is typically more discriminative than AUC33, and XGBoost also achieved the highest AUPRC. Although predictive models often struggle to show high net benefits across all risk thresholds, our DCA curve demonstrated a clear net benefit for XGBoost within the 0.1–0.3 risk threshold range. Considering all these factors, XGBoost was ultimately selected as the optimal classification model, aligning with findings from previous studies. Among the five survival models examined, EST was chosen for its highest C-index and time-dependent AUC.

The SHAP algorithm revealed that LOS before using CVC and LOS before using IMV were the most significant variables in the device-associated infection model. As an intravenous catheter, CVC is inserted into centrally-located veins to administer medications, fluids, and nutrition, and to draw blood. Most ICU patients require CVC due to advanced care needs. However, infection is a common complication associated with CVC, primarily due to bacteria infiltrating the catheter and forming biofilm colonies resistant to antibiotics. These biofilms can quickly spread to the bloodstream, leading to catheter-related bloodstream infection (CRBSI)34,35,36. Delayed initiation of CVC and IMV in the ICU correlates with an increased risk predicted by the ML model. This heightened risk is likely due to prolonged ICU stays, which elevate the chances of bacterial colonization and infection. Additionally, the use of invasive devices, such as CVC and IMV, compromises the patient’s protective barriers.

The 30-day survival model identified APS III, MICU/SICU admission, advanced age, and a cancer diagnosis as key variables increasing mortality risk within 30 days after invasive device procedures. Consistent with previous analyses, APS III and age are significant factors affecting ICU mortality37. This study demonstrates that patients admitted to the MICU/SICU with higher APS III scores not only face a higher risk of device-associated infections but also exhibit increased 30-day mortality rates. It is important to note that predictive models and causal studies differ, and identifying significant variables in interpretable findings does not always indicate causal associations38.

The ML model assists ICU physicians in making informed decisions on optimal device insertion timing to reduce the risk of device-associated infections. This assistance includes: (1) For patients newly admitted to the ICU without prior device usage, physicians can input varying durations of CVC and IMV usage based on the patient’s current fixed variables. This predicts the infection risk at different intervals, facilitating the determination of the latest suitable time for device implementation and the appropriate equipment category. (2) For patients with existing ICU tenure and prior device usage, physicians can input different device usage durations and adjust tracheostomy status to forecast infection risks. This helps in assessing the necessity of additional equipment and tracheostomy. (3) In scenarios where device usage duration is fixed, integrating forecasts of infection risk and 30-day mortality risk enables comprehensive evaluation of the patient’s prognosis, allowing timely intervention.

Conclusions

This study preliminarily developed an integrated ML model that predicts device-associated infections and 30-day survival outcomes following invasive device procedures in ICU patients. The ML model is accessible through a user-friendly web-based application. The clinical applicability of this integrated tool requires further validation in future studies.

Limitations

This study has several limitations. Firstly, while the MIMIC database contains extensive patient data and has facilitated the development of numerous ML algorithms, it is important to note that it is a single-center database, unlike multi-center databases such as eICU. Some of the data in the MIMIC database originate from two distinct systems, MetaVision and CareVue, which focus solely on ICU data and lack information about patients once they transition from the ICU to general wards. Moreover, the racial distribution within the MIMIC database is significantly imbalanced, posing substantial challenges in applying the findings across diverse global populations, particularly among Asians. Secondly, difficulties in collecting sufficient external validation data hindered the generalizability of our model to different settings. Thirdly, the incidence of device-associated infections was only 5.73%, resulting in substantial data imbalance. While the AUC reached 0.810, the AUPRC was merely 0.226, indicating a limitation in XGBoost’s capacity to accurately identify true positive cases. Thus, the practical effectiveness of the model warrants further investigation. Finally, this ML model predicts only 30-day survival outcomes rather than long-term prognosis and is specifically designed for ICU patients following invasive device procedures. It cannot be applied to predict infection risk and survival outcomes for non-ICU patients or those undergoing other types of operations.