Development and validation of a machine learning-based model to predict the risk of hospitalization death in hospitalized patients with AECOPD

Zhang, Yan; Zheng, Shuping; Wang, Dan; Lin, Fanjie; Ma, Yu; Qiang, Zhihui; Zhang, Tianyi; Zhong, Haicheng; Zhou, Miaomiao; Li, Zhuoyang; Chen, Penggang; Feng, Jieyu; Lu, Wenju; Liu, Yun

doi:10.1038/s41598-025-19810-0


Article
Open access
Published: 14 October 2025


Yan Zhang¹,
Shuping Zheng¹,
Dan Wang¹,
Fanjie Lin²,
Yu Ma¹,
Zhihui Qiang¹,
Tianyi Zhang¹,
Haicheng Zhong¹,
Miaomiao Zhou¹,
Zhuoyang Li¹,
Penggang Chen³,
Jieyu Feng⁴,
Wenju Lu² &
Yun Liu¹

Scientific Reports volume 15, Article number: 35918 (2025)


Abstract

Acute exacerbation of chronic obstructive pulmonary disease (AECOPD) is a leading cause of hospitalization and death in COPD patients. Machine learning (ML) approaches are powerful but suffer from a “black box” problem: their predictions are difficult to interpret directly. Herein, we conducted a multicentre, retrospective cohort study in two tertiary hospitals in China, primarily utilizing echocardiographic variables to build and validate an explainable ML-based prediction model for hospitalization death in patients with AECOPD. For model explainability, we utilized a model-agnostic SHapley Additive exPlanations (SHAP) explainer to interpret the output of our final model. Our results showed that the light gradient boosting machine (LightGBM) model achieved the best performance among the 11 ML models. After reducing features according to the feature importance rank, an explainable final LightGBM model was established with 9 features (AUC = 0.956, accuracy = 0.921, sensitivity = 0.891, specificity = 0.933, PPV = 0.852, NPV = 0.952, F1 score = 0.871). The LightGBM model mitigated the “black-box” concern via the global and local explanations of the SHAP method. To facilitate its use by clinicians, the final explainable model has been translated into a convenient application, which is available as a publicly accessible web tool. These findings hold promise for guiding clinical management and improving patient outcomes.


Introduction

Acute exacerbation of chronic obstructive pulmonary disease (AECOPD) is characterized by rapid worsening of respiratory symptoms, accelerated decline in airway function, and reduced quality of life^1,2. Notably, AECOPD represents the leading cause of hospitalization and death in chronic obstructive pulmonary disease (COPD) patients, resulting in more than three million deaths annually^3,4, and imposes a significant financial burden on health care systems⁵. Although mortality rates following hospitalization for acute exacerbation of COPD are declining, reported rates still vary from 23 to 80%. Progressive respiratory failure, cardiovascular disease (CVD), malignancies and other diseases are the primary causes of death in people with COPD hospitalized for an exacerbation^6,7,8. COPD poses a significant but heterogeneous burden to individuals and healthcare systems. Policymakers can minimize this burden by developing targeted policies directed at the subpopulations most likely to benefit; such targeting also offers hospitals administrative benefits that can enhance overall patient care, resource allocation, and operational efficiency⁹. Therefore, timely identification of patients at high risk of death after hospitalization for acute exacerbation of COPD, which may reduce the associated mortality and financial burden, is highly important.

At present, clinical scoring systems such as the modified Medical Research Council (mMRC) score and COPD Assessment Test (CAT) questionnaire are used to assess AECOPD. Many components of these scoring systems rely on subjective clinical assessments, focus primarily on clinical signs and symptoms, and often do not incorporate a comprehensive set of biomarkers that could provide more objective insights into the underlying pathophysiology of AECOPD^10,11. Thus, these scoring systems have several limitations that make objective measurement of AECOPD challenging. An increasing number of studies have confirmed that blood cell counts, e.g., eosinophil, platelet, and lymphocyte counts, are associated with AECOPD^12,13,14,15,16. Other studies have reported that circulating biomarkers of inflammation, e.g., C-reactive protein (CRP)¹⁷ and fibrinogen^18,19, can be used to predict the risk of AECOPD. In addition, previous studies have suggested that a low blood eosinophil count significantly increases the risk of in-hospital mortality in hospitalized patients with AECOPD^20,21. Although numerous clinical studies have searched for novel, easy-to-measure prognostic biomarkers to develop more effective predictive models and enhance risk prediction over clinical assessment, the search for a single biomarker seems to have been unsuccessful owing to the complex pathophysiology of COPD. CVD is the most common comorbidity in patients with COPD and is a common underlying cause of COPD exacerbation²². Therefore, cardiovascular risk factors can predict adverse outcomes of COPD patients. However, few studies have focused on the role of baseline echocardiographic abnormalities in patients with AECOPD.

Electronic medical records (EMRs) have been gaining widespread use in hospitals for many years, which has made it possible for clinicians and researchers to collect the clinical data of patients more accurately and conveniently. To date, numerous studies have applied machine learning (ML) approaches to facilitate disease prediction, and most have shown good predictive value, making ML-based models valuable tools for implementation in clinical practice²³. ML is a powerful computational approach that can be trained to handle highly variable datasets and capture complex relationships between variables. In recent years, an increasing number of studies have concentrated on AECOPD prediction via the ML approach to identify which features are most important for case identification and predicting exacerbations^24,25,26,27. ML methods have the potential to improve predictive modeling of health outcomes²⁸, and have been used to improve prediction of 5-year all-cause mortality in subjects undergoing CT coronary angiography²⁹ and cardiac motion MRI³⁰. These examples highlight the growing body of research leveraging ML for clinical predictions and underscore the importance of our work in this context. Although ML techniques owe their power to model complexity, that same complexity makes an ML model difficult to interpret directly, the so-called “black box” problem³¹. To overcome the “black-box” issue, in which little explanation is given of how predictions are derived, the SHapley Additive exPlanation (SHAP) method, which can rank the importance of input features, has been utilized to explain ML models and visualize individual variable predictions³². By decomposing the model’s predictions into contributions from individual features, SHAP enhances the interpretability of the ML model. This transparency is crucial for clinical applications, where understanding the reasoning behind predictions can inform decision-making. SHAP helps identify key drivers of the outcome, which can lead to actionable insights^23,32. The SHAP method is a unified approach for explaining the outputs of ML models and has been applied in earlier studies, including a study on predicting the first exacerbation of COPD²⁷. However, no studies have used the SHAP method to explain prediction models for hospitalization death in hospitalized patients with AECOPD.

In the present study, we aimed to develop and validate an explainable ML-based model for hospitalization death in hospitalized patients with AECOPD by analysing and mining medical big data, elucidating feature importance and explaining the model via the SHAP method. Furthermore, this method enables early and accurate identification of AECOPD patients at high risk of hospitalization death so that prompt therapeutic measures can be initiated and prognoses are improved in clinical settings.

Results

Patient characteristics

A total of 2924 patients with AECOPD who underwent transthoracic echocardiography were identified. Among them, 275 patients who failed to meet the inclusion criteria were excluded. Finally, 2649 patients were included for analysis. During the multiple model comparison, we employed stratified sampling to randomly split our data set into 70% and 30% partitions for the training set (1854 patients) and the validation set (795 patients), respectively. Details of the study design are displayed in Fig. 1.
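The stratified 70/30 split described above can be sketched as follows. This is a minimal standard-library illustration, not the authors' code: the function name `stratified_split`, the seed, and the per-class rounding rule are assumptions. Only the class counts (796 deaths, 1853 survivors) are taken from the text.

```python
import random
from collections import defaultdict

def stratified_split(labels, train_fraction=0.7, seed=0):
    """Split indices into train/validation parts, preserving the class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, valid_idx = [], []
    for members in by_class.values():
        rng.shuffle(members)  # random assignment within each class
        n_train = round(train_fraction * len(members))  # assumed rounding rule
        train_idx.extend(members[:n_train])
        valid_idx.extend(members[n_train:])
    return sorted(train_idx), sorted(valid_idx)

# Outcome labels with the class counts reported in the text (1 = died).
labels = [1] * 796 + [0] * 1853
train_idx, valid_idx = stratified_split(labels)
train_deaths = sum(labels[i] for i in train_idx)
valid_deaths = sum(labels[i] for i in valid_idx)

print(len(train_idx), len(valid_idx))  # 1854 795
print(round(train_deaths / len(train_idx), 4),
      round(valid_deaths / len(valid_idx), 4))
```

Splitting each outcome class separately keeps the proportion of deaths nearly identical in the two partitions, which matters when one class is the minority.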

The baseline characteristics of the derivation cohort (n = 2649), training cohort (n = 1854), and validation cohort (n = 795) are described in Table 1. Among the 2649 participants in the derivation cohort, 79.92% were male and 53.11% had a history of smoking. Consistent with previous studies, most of the patients were elderly, and their median age was 75.0 years. In the training cohort, the median age mirrored that of the derivation cohort at 75.0 years; 79.50% of them were male and 52.86% had a history of smoking. For the validation cohort, the median age was also 75.0 years, 80.88% were male, and 53.71% had a history of smoking.

Table 1 Demographic and clinical characteristics of the cohort.

Full size table

In the population, 796 (30.05%) patients died during hospitalization, and 1853 (69.95%) patients survived hospitalization. A comparison of the demographic and clinical characteristics between the survival group and the death group is presented in Supplementary Table S1.

Model development and performance comparison

The 28 variables from the training cohort were used to generate 11 ML models to predict the risk of hospitalization death in hospitalized patients with AECOPD. Among the 11 models, the LightGBM model (AUC = 0.962) had the best predictive performance, followed by the GBM model (AUC = 0.951) and the XGboost model (AUC = 0.945). The discriminative performances of these 11 models are listed in Table 2. The sensitivity, specificity, PPV, NPV, accuracy, and F1 score were calculated at the optimal cut-off value that maximized the Youden index. The ROC curves and the SHAP summary plots of the top 20 features for the five best-performing ML models are presented in Fig. 2A and Fig. 3A–E. As shown in Fig. 2B, the LightGBM model retained a nearly optimal AUC and the best predictive ability among these five models as features were removed according to the feature importance rank.
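To make the two quantities used in this comparison concrete, the sketch below computes an AUC as a rank statistic and selects the cut-off that maximizes the Youden index (sensitivity + specificity − 1). The labels and scores are invented toy values, and the function names are hypothetical; nothing here reproduces the study data or the authors' software.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    # Ties between a positive and a negative count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def youden_threshold(y_true, scores):
    """Return (threshold, sensitivity, specificity) maximizing the Youden index.

    A case is predicted positive when its score is >= threshold.
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best = None
    for t in sorted(set(scores)):
        tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= t)
        tn = sum(1 for y, s in zip(y_true, scores) if y == 0 and s < t)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, t, sens, spec)
    return best[1], best[2], best[3]

# Toy data (illustrative only): 1 = died, 0 = survived.
y_true = [0, 0, 0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.60, 0.70, 0.35, 0.90]

auc = roc_auc(y_true, scores)
threshold, sens, spec = youden_threshold(y_true, scores)
print(f"AUC = {auc:.3f}, cut-off = {threshold}, "
      f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

The pairwise AUC is quadratic in the sample size, which is fine for illustration; production code would normally rely on a sorted-rank implementation.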

Table 2 Performance of the ML models for predicting hospitalization death in hospitalized patients with AECOPD.

Full size table

Identification of the final model

Through multi-model comparison, it was found that the LightGBM model performed best, and the final model was identified during feature reduction of the LightGBM model. The 28-feature model was significantly better than the 3-feature model (△AUC = 0.020, P = 0.023) and the 6-feature model (△AUC = 0.011, P = 0.027); however, it was not significantly better than the 9-feature model (△AUC = 0.007, P = 0.086) and the 12-feature model (△AUC = 0.003, P = 0.351). This comparison of various features in the LightGBM model is displayed in Supplementary Table S2 and Supplementary Fig. S1A-B. To identify the appropriate number of features for this model, an 8-feature model (△AUC = 0.020, P = 0.023) and a 10-feature model (△AUC = 0.011, P = 0.027) were further analysed and were not significantly different from the 28-feature model (Supplementary Fig. S1C). Hence, we focused on the 9-feature LightGBM model, and it was selected as the final model for further analysis according to the significance of the following variables: PAD, PV, cTnT, DD, Vmax, NEUT%, NT-proBNP, EF, and RAD.

Diagnostic performance of the final LightGBM model

The final LightGBM model achieved an AUC of 0.956 with a sensitivity of 0.891, a specificity of 0.933, a PPV of 0.852, an NPV of 0.952, an accuracy of 0.921, and an F1 score of 0.871 for predicting hospitalization death in hospitalized patients with AECOPD. We also performed a decision curve analysis (DCA) to further assess the clinical utility of our model, as shown in Fig. 4A. The DCA revealed that our 9-feature final model achieved a higher net benefit across a broad range of threshold probabilities. Specifically, the optimal decision threshold for our model was determined to be 0.229 based on the Youden index. At this threshold, the model demonstrated a sensitivity of 89.1% and a specificity of 93.3%. In clinical practice, this cut-off value would prioritize early intervention for patients with a predicted risk of death ≥ 22.9%. The net benefit corresponding to the cut-off value of 0.229 was calculated to be 0.253. Moreover, the area under the precision‒recall (P–R) curve of the 9-feature model was only marginally lower than that of the 28-feature model (Fig. 4B–F), indicating that the 9-feature model has high clinical utility. We then used decile binning (10 bins of equal sample size) to create the calibration curve. As can be seen from Supplementary Fig. S2, the calibration curve of the LightGBM model is quite close to the ideal diagonal line. The Brier score of the 9-feature model is 0.06; a value below 0.1 indicates that the average squared difference between the predicted probabilities and the observed outcomes is relatively small. Overall, the calibration curve demonstrates strong consistency between the predicted probabilities and the observed outcomes. These results indicate that our model can provide accurate and clinically relevant predictions, thereby assisting clinicians in making better-informed decisions regarding the management of patients with AECOPD.
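As a rough consistency check on the figures above, the sketch below rebuilds the confusion-matrix proportions from the reported sensitivity and specificity and then applies the standard decision-curve formula, net benefit = TP/N − FP/N × p_t/(1 − p_t). One assumption is made explicitly: the outcome prevalence is taken as the whole-cohort death rate (796/2649), because the validation-set confusion matrix is not given in the text. The function names are hypothetical.

```python
def confusion_rates(sensitivity, specificity, prevalence):
    """Return (TP, FP, TN, FN) as proportions of the whole sample."""
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    tn = specificity * (1 - prevalence)
    fp = (1 - specificity) * (1 - prevalence)
    return tp, fp, tn, fn

def net_benefit(tp, fp, threshold):
    """Decision-curve net benefit at threshold probability `threshold`."""
    return tp - fp * threshold / (1 - threshold)

# Reported values; the prevalence is an assumption (whole-cohort death rate).
prevalence = 796 / 2649
tp, fp, tn, fn = confusion_rates(0.891, 0.933, prevalence)

ppv = tp / (tp + fp)
npv = tn / (tn + fn)
accuracy = tp + tn
f1 = 2 * tp / (2 * tp + fp + fn)
nb = net_benefit(tp, fp, threshold=0.229)

print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}, accuracy = {accuracy:.3f}, "
      f"F1 = {f1:.3f}, net benefit = {nb:.3f}")
```

Under that prevalence assumption, the derived values agree with the reported PPV, NPV, accuracy, F1 score and net benefit to within rounding.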

In addition, to validate the appropriate sample size for this study and the robustness of this model to site variation, we further performed five-fold and ten-fold cross validations. As presented in Supplementary Fig. S3, the final model displayed mean AUCs of 0.960 ± 0.009 and 0.962 ± 0.011 in the five-fold and ten-fold cross validations, respectively. The predictive values of PAD, PV, cTnT, DD, Vmax, NEUT%, NT-proBNP, EF, and RAD were further investigated and compared with the 9-feature final model (Supplementary Fig. S4A), and they all performed worse than the final model. The DCA curves also revealed that the final model had greater clinical utility than each variable (Supplementary Fig. S4B).

Model explanation

The SHAP method provides two types of explanations: a global explanation of the model at the feature level and a local explanation at the individual level. A global explanation could provide consistent and accurate attribution values for each feature and describe the overall functionality of the model. The local explanation details how a certain prediction is made for a patient by inputting individualized data. The SHAP approach can interpret the output of the final model by calculating the contribution of each variable to the prediction. As illustrated in Fig. 5A, B, the SHAP summary plots using the average SHAP values were used to evaluate the contribution of each feature to the model and are presented in descending order. Moreover, the SHAP dependence plot can be used to understand how a single feature affects the output of the prediction model. The real values versus the SHAP values of these 9 features are shown in Fig. 5C. SHAP values that are higher than zero correspond to a positive class prediction in the model, i.e., a higher risk of hospitalization death in hospitalized patients with AECOPD.
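The additive attribution behind these plots can be illustrated with an exact Shapley computation on a tiny hand-written model. This is a brute-force sketch for intuition only: the SHAP library uses far more efficient algorithms, and the toy risk function, its coefficients, and the binary feature names (`high_DD`, `high_cTnT`, `low_EF`) are hypothetical and are not the fitted LightGBM model.

```python
from itertools import permutations

def shapley_values(predict, instance, baseline):
    """Exact Shapley values by averaging marginal contributions over all orderings."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)       # start from the reference input
        previous = predict(current)
        for name in order:
            current[name] = instance[name]   # reveal one feature at a time
            value = predict(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}

def toy_risk(x):
    """Hypothetical risk score with one interaction term (illustration only)."""
    return (0.1 + 0.2 * x["high_DD"] + 0.3 * x["high_cTnT"]
            + 0.2 * x["high_DD"] * x["low_EF"])

baseline = {"high_DD": 0, "high_cTnT": 0, "low_EF": 0}  # reference patient
patient = {"high_DD": 1, "high_cTnT": 1, "low_EF": 1}   # patient to explain

phi = shapley_values(toy_risk, patient, baseline)
for name, value in phi.items():
    print(f"{name}: {value:+.3f}")

# Local accuracy: the contributions sum to f(patient) - f(baseline).
print(sum(phi.values()), toy_risk(patient) - toy_risk(baseline))
```

Positive contributions push the prediction towards the positive class, which corresponds to the reading of SHAP values above zero described in the text; the interaction term is shared equally between the two features involved.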

The waterfall plot (Fig. 6A, B) shows a patient who survived hospitalization and displays the actual measured values of the features. According to the prediction model, the decision for this case leaned towards “non-death” with a probability of 90.3% and “death” with a probability of 9.7%. We observed a similar pattern for a patient who died during hospitalization (Fig. 6C, D), for whom the plot also reveals the features pushing or pulling the decision towards the “death” class and their actual measured values. Figure 6C represents this patient in the “non-death” class with a probability of 2.4%, and Fig. 6D represents this patient in the “death” class with a probability of 97.6%. Additionally, in the force plot (Fig. 6E), the x-axis represents each patient and the y-axis represents the contributions of the features; a larger red area for an individual patient represents a greater probability of “death.”

Discussion

To our knowledge, this is the first retrospective, multicentre cohort study to develop and validate a clinical prediction model for hospitalization death in hospitalized patients with AECOPD, primarily utilizing readily available clinical and laboratory data, with a focus on echocardiographic variables, via ML algorithms. We compared the baseline characteristics of hospitalized patients with AECOPD and identified a set of predictive variables to establish a prediction model. The established prediction model was accurate in both the training and validation cohorts and could be used to assist clinicians in early, accurate identification and personalized treatment of AECOPD.

Currently, there is a lack of consensus regarding the definition of AECOPD; in most international guidelines, the diagnosis of an exacerbation relies exclusively on the clinical presentation of a patient complaining of an acute change in symptoms³³. Clinical scoring systems for COPD, such as the mMRC score and CAT questionnaire, have been widely used to assess the severity and predict outcomes in patients with AECOPD^10,11. Additionally, various scoring systems have been proposed to classify the risk of clinical deterioration or mortality in hospitalized patients with AECOPD^3,34,35. However, these scoring systems currently lack objective measurements. With the advancement of ML, clinicians and researchers are now able to handle complex and extensive data, turning high volumes of data into feasible models to improve their ability to diagnose diseases. Moreover, sophisticated ML algorithms combined with EMR data can facilitate the development of clinical prediction models³⁶. Among the 11 ML models, the LightGBM model had the best AUC, showed good net benefit across a broad range of threshold probabilities, and retained its performance during feature reduction; it therefore has the potential to identify the risk of hospitalization death in hospitalized patients with AECOPD. Advanced algorithms such as light gradient boosting trees analyse and interpret massive amounts of patient data to improve forecasts, early detection, and personalized treatment strategies. The algorithm learns patterns and relationships in data by training an ML model on a dataset with known outcomes. LightGBM improves on gradient boosting in training efficiency and prediction accuracy for large datasets³⁷. Several studies have shown that the LightGBM method has excellent predictive value in the field of medicine^37,38,39. In this study, we employed the LightGBM algorithm to develop a final model with 9 features. These features are easily obtained or evaluated, which makes this model promising as an early discriminative tool for assessing disease severity, guiding further examinations and aiding treatment decisions.

More features may provide more information for the prediction model; however, noncausal features may reduce the accuracy of the prediction, and a large number of features may limit the clinical use of the model. In addition, there is a lack of guidelines or consensus for selecting features for the prediction model⁴⁰. Although there is no evidence demonstrating how many features should be included in the model, the SHAP method was employed to assist feature selection. Our final model was established through a comparison of 11 ML models and feature reduction. In addition, it performed well in predicting mortality in COPD patients.

Compared with traditional single markers, the final model we developed had superior predictive ability. Blood eosinophil counts are widely studied, but few studies have examined the prognostic value of blood neutrophil counts in patients with AECOPD. In fact, only a minority of COPD patients exhibit eosinophilia; COPD is predominantly characterized as a neutrophilic inflammatory disorder^41,42. Increased neutrophil counts reflect systemic inflammation, which can exacerbate respiratory symptoms and lead to multi-organ dysfunction, contributing to higher mortality risk in AECOPD. A recent study demonstrated that high blood neutrophil counts may be useful indicators of the risk of exacerbations and mortality in COPD patients⁴³, which is consistent with our findings. Furthermore, activation of the blood coagulation system is a common observation in inflammatory diseases and is increased during COPD exacerbations. Researchers have confirmed that coagulation markers are potential predictors of later COPD exacerbation and mortality, and higher DD levels in stable COPD patients predict higher mortality⁴⁴. Elevated DD levels indicate a hypercoagulable state, common in AECOPD patients because of inflammation and hypoxia, which can lead to pulmonary emboli formation, worsening respiratory function, and increased risk of death. Thus, the correlation of DD with the poor prognosis of COPD patients is well recognized, and DD was accordingly included in our model development. Although the underlying mechanism linking AECOPD and CVD has not been elucidated, some studies have shown that AECOPD increases the risk of subsequent CVD^45,46,47, which is one of the causes of death in COPD patients hospitalized for an exacerbation. The elevation of cTnT and NT-proBNP levels suggests myocardial injury and cardiac stress or dysfunction in AECOPD patients, increasing the risk of death via impaired cardiac function and predicting poor outcomes. 
Including the features of cTnT and NT-proBNP was beneficial for strengthening the predictive ability of the final model due to their critical significance in CVD diagnoses. Interestingly, the final model also consisted of PAD, PV, Vmax, EF, and RAD. Although there is no evidence demonstrating their ability to independently predict AECOPD, these echocardiographic variables may be associated with an increased risk of AECOPD. Indeed, increased PAD, PV, and Vmax indicate right heart issues like pulmonary hypertension and right ventricular dysfunction in AECOPD patients, leading to strain, failure, and increased mortality risk⁴⁸. Reduced EF and enlarged RAD signal left ventricular dysfunction and right atrial dilation in AECOPD patients, worsening respiratory failure and contributing to higher mortality risk. Many studies have shown that the presence of echocardiographic abnormalities in the general population as well as in those with cardiovascular risk factors has greater prognostic value than clinical factors do alone in predicting the risk of stroke, sudden cardiac death, cardiovascular morbidity and mortality^49,50. Similarly, our study revealed that echocardiographic abnormalities may be able to identify a greater risk of hospitalization death in hospitalized patients with AECOPD. Therefore, these clinical variables could contribute to the final model, and their combination may be superior to a single marker in predicting the risk of hospitalization death in hospitalized patients with AECOPD.

Owing to the lack of explanation of how predictions are derived, the ML technique has been described as a “black box”. Clinicians are often hesitant to use ML to make medical decisions because they believe that it is based on opaque information. This points to another advantage of this study: we utilized the SHAP approach to explain the “black box” of ML models. The SHAP method explains the model via a global explanation, which describes the overall functionality of the model, and a local explanation, which details how a certain prediction is made for a hospitalized patient with AECOPD from the individualized input data. Moreover, with a convenient tool based on the Streamlit framework, this prediction model can be used on a webpage and shared with more clinicians. Consequently, the model with the SHAP method can be useful for assessing high-risk patients with AECOPD and may assist clinicians in directing personalized AECOPD strategies in an understandable manner.

While our study showed promising results, it is important to acknowledge its limitations. First, the final explainable model was not validated in external validation cohorts. In the future, we intend to include more patients in prospective multicentre studies to validate the model’s performance. Second, our model was developed using data mainly from Chinese patients, and whether the model performs well in various global populations remains unclear, necessitating further validation in diverse racial groups to ensure its generalizability across clinical settings. Third, because we included only patients with complete echocardiographic data, 30.05% of patients died during hospitalization, which may bias this study towards positive results. We will continue to refine our model and consider incorporating additional data sources in future studies to further improve its generalizability and applicability. Fourth, as is common in hospitalized patients with AECOPD, pulmonary function tests were unavailable for most patients in this study because of their critical condition, which made it difficult to incorporate pulmonary function data into the prediction model. Nevertheless, the cost of prediction and medical expenses can be reduced to a large extent for patients. Fifth, our current dataset does not include information on comorbidity indices or frailty scores; future studies should consider these variables to further refine and validate predictive models. Sixth, we encountered data sparsity in several variables; future work should focus on improving data collection processes and ensuring more complete datasets to enhance model robustness. The real-time application of our ML model in clinical settings also presents several challenges, which we plan to explore in future work in collaboration with more clinicians to develop practical solutions for real-time deployment. In addition, our study has a class imbalance issue, in that the number of patients who died during hospitalization is significantly lower than the number who survived; we will continue to explore advanced techniques for handling class imbalance, such as synthetic data generation and ensemble methods, to improve the model’s performance and ensure balanced prediction accuracy across classes. Seventh, data drift and changes in clinical practice are potential issues; future efforts will focus on data collection and monitoring, regular model retraining, incorporation of additional variables and adaptation to clinical practice. Finally, hospitalized patients with AECOPD frequently have more than one disease, and because there are many disease categories in clinical settings, incorporating the disease category into the prediction model is difficult. Thus, we did not include the disease category in the model.

In conclusion, we successfully developed an ML model that can predict the risk of hospitalization death in hospitalized patients with AECOPD early, exclusively utilizing clinically available data, especially echocardiographic variables. With high accuracy and reliability, the final LightGBM model can guide clinicians to take appropriate preventive measures to achieve personalized treatment and improve the clinical prognosis of high-risk patients with AECOPD.

Methods

Study population

This multicentre, retrospective study included AECOPD patients admitted to the hospital for the derivation and validation of the prediction model. The cohort consisted of hospitalized patients with AECOPD from the Department of Respiratory and Critical Care Medicine of two separate tertiary hospitals, the Second Affiliated Hospital of Xi’an Jiaotong University (between April 2017 and August 2023), and the First Affiliated Hospital of Guangzhou Medical University (between February 2012 and July 2023). The diagnosis of COPD was based on the criteria established by the Global Initiative for Chronic Obstructive Lung Disease (GOLD): forced spirometry showing the presence of a post-bronchodilator FEV1/FVC < 0.7 in any patient who has dyspnea, chronic cough or sputum production, a history of recurrent lower respiratory tract infections and/or a history of exposure to risk factors for the disease. According to the GOLD guidelines, AECOPD is defined on the basis of patient symptoms: acute deterioration of respiratory symptoms necessitating additional treatment¹¹. Medical records were reviewed by respiratory physicians to confirm AECOPD and exclude alternative diagnoses. Adult patients who underwent transthoracic echocardiography were included. The exclusion criteria were as follows: (1) complicated with various diseases such as cardiovascular and cerebrovascular diseases, kidney diseases, autoimmune diseases and tumors; (2) insufficient clinical and laboratory data.

Ethics statement

This study was performed in accordance with the Declaration of Helsinki, with the approval of the Institutional Ethics Review Board of the Second Affiliated Hospital of Xi’an Jiaotong University (Ethics Approval NO.186 in 2024). Due to the retrospective nature of the study, the Institutional Ethics Review Board of the Second Affiliated Hospital of Xi’an Jiaotong University waived the need of obtaining informed consent.

Data collection and processing

We utilized the Big Data Platform for Respiratory Medicine, an integrated database of the EMR system, to construct prediction models. All the staff involved participated in training sessions on data extraction. We extracted clinical and laboratory data, including demographic characteristics, vital sign measurements, blood test parameters and echocardiographic variables. All data of eligible patients obtained from the database were systematically extracted into standardized forms and meticulously reviewed.

In the following analyses, parameters with more than 30% missing values were excluded to minimize the bias resulting from missing data, and the remaining missing data were imputed via the K-nearest neighbour (KNN) algorithm. Given that multicollinearity among parameters may affect prediction accuracy, when two parameters were highly correlated (correlation coefficient > 0.6) in Spearman’s correlation analyses, we eliminated the one that was less correlated with the outcome. Finally, 28 variables, including sex, age, smoking status, systolic blood pressure (SBP), diastolic blood pressure (DBP), neutrophil percentage (NEUT%), eosinophil percentage (EOS%), monocyte percentage (MONO%), eosinophil (EO), monocyte (MONO), lymphocyte (LYMP), platelet (PLT), haematocrit (HCT), mean platelet volume (MPV), red cell distribution width (RDW), albumin (ALB), N-terminal prohormone of brain natriuretic peptide (NT-proBNP), cardiac troponin T (cTnT), prothrombin time activity (PTA), thrombin time (TT), D-dimer (DD), right ventricle diameter (RVD), right atrial diameter (RAD), pulmonary artery diameter (PAD), pulmonary valve (PV) flow velocity, maximum tricuspid regurgitation velocity (Vmax), pulmonary arterial systolic pressure (PASP), and ejection fraction (EF) were utilized in ML algorithms to generate the prediction models.
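The collinearity filter described above can be sketched in plain Python as follows. This is an illustrative reading of the rule, not the authors' code: the column names `x1` to `x3` and their values are invented, the pairs are examined in column order, and ties in outcome correlation are resolved by keeping the first column; all of these are assumptions. KNN imputation is not shown.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))

def drop_collinear(features, outcome, cutoff=0.6):
    """For each highly correlated pair, drop the column less correlated with outcome."""
    kept = dict(features)
    names = list(features)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if first not in kept or second not in kept:
                continue  # one of the pair has already been removed
            if abs(spearman(features[first], features[second])) > cutoff:
                r_first = abs(spearman(features[first], outcome))
                r_second = abs(spearman(features[second], outcome))
                del kept[second if r_first >= r_second else first]
    return list(kept)

# Invented toy columns (not study data); x1 and x2 are strongly correlated.
features = {
    "x1": [1, 2, 3, 4, 5, 6],
    "x2": [2, 1, 4, 3, 6, 5],
    "x3": [3, 6, 1, 5, 2, 4],
}
outcome = [0, 0, 0, 1, 1, 1]
selected = drop_collinear(features, outcome)
print(selected)
```

Because the result can depend on the order in which pairs are visited, a real pipeline would need to fix and document that order.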

Model development and comparison

To guard against overfitting, the combined cohort from the two independent tertiary hospitals was divided into a training cohort (70%) and a validation cohort (30%).
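A 70/30 partition of this kind might look like the sketch below. The text does not state whether the split was stratified by outcome or which random seed was used; both `stratify=y` and `seed=42` are assumptions made here so that the death rate is preserved in each cohort and the split is reproducible.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def split_cohort(X, y, seed=42):
    """70/30 split that preserves the death rate in both cohorts."""
    return train_test_split(
        X, y, test_size=0.30, stratify=y, random_state=seed
    )


# Toy cohort: 100 patients, 20 in-hospital deaths.
X = np.arange(200).reshape(100, 2)
y = np.array([1] * 20 + [0] * 80)
X_train, X_val, y_train, y_val = split_cohort(X, y)
print(len(X_train), len(X_val), y_train.sum(), y_val.sum())
```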

The 28 variables mentioned above were used to develop the prediction models. Eleven ML models, namely, eXtreme gradient boosting (XGBoost), adaptive boosting (AdaBoost), light gradient boosting machine (LightGBM), gradient boosting machine (GBM), random forest (RF), logistic regression (LR), K-nearest neighbour (KNN), decision tree (DT), artificial neural network (ANN), extra tree (ET), and support vector machine (SVM), were used to predict the risk of hospitalization death in hospitalized patients with AECOPD. XGBoost is an ensemble learning model based on the gradient boosting algorithm; it performs well in terms of efficiency and accuracy when processing structured data and large-scale datasets⁵¹. AdaBoost adjusts the weight of each sample according to its classification error, pays more attention to samples with higher classification errors, and determines the final classification result through weighted voting⁵²,⁵³. LightGBM is an ensemble model that uses decision trees as weak learners. It relies on histogram-based binning of feature values, which makes training more efficient than in conventional gradient boosting implementations⁵⁴. The GBM is a gradient descent-based formulation of boosting methods. It can be considered an optimization model that trains a series of weak learners, each of which sequentially reduces a predefined loss function⁵⁵. RF is an ensemble classifier that aggregates the results of multiple single decision trees into one result; it is widely used as a classification model that compensates for the tendency of a single decision tree to overfit and reduces variance⁵⁶,⁵⁷. LR is a learning algorithm with a logistic function at its core. By assessing the relationship between a dependent variable and one or more independent variables, it derives classification probabilities via the logistic function⁵⁸.
KNN is an instance-based learning algorithm that is suitable for problems with multicategory classification and nonlinear decision boundaries. It measures the distance between samples and classifies a new sample into the most common category among its K nearest neighbours⁵⁹. A DT model, with its clear cut-off points, is a user-friendly tool that can significantly aid treatment decision-making in clinical settings⁶⁰. ANNs are effective tools for nonlinear multivariate modelling and are capable of learning trends in historical data⁶¹. ET is a variant of RF that increases randomness by drawing random split points for the candidate features at each node and choosing the best of these splits⁶². SVM is a trainable learning method grounded in statistical learning theory, which studies how well a model can learn from a limited number of training samples; it classifies samples by finding a maximum-margin decision boundary⁶³.
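To make the comparison concrete, the sketch below fits nine of the eleven model families with their scikit-learn implementations and ranks them by validation AUC. It is only an illustration on synthetic data: XGBoost and LightGBM live in separate packages and are omitted here, all hyperparameters are library defaults rather than the study's settings, and feature scaling (normally advisable for KNN, SVM, ANN and LR) is left out for brevity.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (AdaBoostClassifier, ExtraTreesClassifier,
                              GradientBoostingClassifier,
                              RandomForestClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def candidate_models(seed=0):
    """Default-configured stand-ins for the model families named in the text."""
    return {
        "AdaBoost": AdaBoostClassifier(random_state=seed),
        "GBM": GradientBoostingClassifier(random_state=seed),
        "RF": RandomForestClassifier(random_state=seed),
        "LR": LogisticRegression(max_iter=1000),
        "KNN": KNeighborsClassifier(),
        "DT": DecisionTreeClassifier(random_state=seed),
        "ANN": MLPClassifier(max_iter=500, random_state=seed),
        "ET": ExtraTreesClassifier(random_state=seed),
        "SVM": SVC(probability=True, random_state=seed),
    }


def compare_models(X_train, y_train, X_val, y_val):
    """Fit every candidate and return validation AUCs, best first."""
    scores = {}
    for name, model in candidate_models().items():
        model.fit(X_train, y_train)
        prob = model.predict_proba(X_val)[:, 1]  # probability of class 1
        scores[name] = roc_auc_score(y_val, prob)
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


# Synthetic data standing in for the 28-variable cohort.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)
ranking = compare_models(X_tr, y_tr, X_va, y_va)
```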

The performances of the ML models were assessed by the area under the receiver operating characteristic (ROC) curve (AUC), sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), accuracy, and F1 score. Furthermore, decision curve analysis (DCA), precision–recall (P–R) curve analysis and calibration were used to assess the diagnostic performance of the model. In addition, to guard against overfitting, five-fold and ten-fold cross validations were used in the validation of the prediction model.
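For reference, the threshold-dependent metrics listed above all follow from the four cells of the confusion matrix. The helper below is a plain-Python sketch; the function name and the toy labels are illustrative, with 1 taken to mean in-hospital death.

```python
def classification_metrics(y_true, y_pred):
    """Confusion-matrix metrics for binary labels (1 = death, 0 = survival)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),   # recall among deaths
        "specificity": ratio(tn, tn + fp),   # recall among survivors
        "ppv": ratio(tp, tp + fp),           # precision
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, len(pairs)),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
    }


# Toy example: 3 deaths and 5 survivors.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
metrics = classification_metrics(y_true, y_pred)
```

Here two of three deaths are detected (sensitivity 2/3) and four of five survivors are correctly classified (specificity 0.8).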

Model explanation

The SHAP score⁶⁴ can provide a quantitative description of the overall relationship between hospitalization death and all features according to the premodel; it was calculated to assess the importance of each feature and to reduce the features of the prediction model in order of importance. Starting with the most important feature, we added features one at a time in accordance with the feature importance rank. We compared the AUCs of the resulting models via R software version 4.0.3 and ultimately selected the number of features that generated the highest AUC across the feature inclusion iterations. The final model with the best predictive ability was chosen for further analysis. In addition, the SHAP method provided both global and local explanations of the model.
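The sequential feature inclusion procedure can be sketched as below. Two substitutions are made for the sake of a self-contained example and should be read as assumptions: the ranking comes from the impurity-based `feature_importances_` of a scikit-learn gradient boosting model rather than from SHAP values, and that same model family stands in for LightGBM. The function `forward_inclusion` is a hypothetical name.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split


def forward_inclusion(X_train, y_train, X_val, y_val, ranked, make_model):
    """Add features in importance order; keep the prefix with the highest AUC."""
    aucs = []
    for k in range(1, len(ranked) + 1):
        cols = ranked[:k]                      # top-k features so far
        model = make_model().fit(X_train[:, cols], y_train)
        prob = model.predict_proba(X_val[:, cols])[:, 1]
        aucs.append(roc_auc_score(y_val, prob))
    best_k = int(np.argmax(aucs)) + 1          # first maximum wins on ties
    return ranked[:best_k], aucs


def make_model():
    return GradientBoostingClassifier(random_state=0)


# Synthetic data: 8 features, of which 3 carry signal.
X, y = make_classification(n_samples=400, n_features=8, n_informative=3,
                           n_redundant=0, random_state=0)
X_tr, X_va, y_tr, y_va = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)
# Importance rank from a full model (stand-in for the SHAP-based rank).
full = make_model().fit(X_tr, y_tr)
ranked = [int(i) for i in np.argsort(full.feature_importances_)[::-1]]
selected, aucs = forward_inclusion(X_tr, y_tr, X_va, y_va, ranked, make_model)
```

`selected` holds the shortest prefix of the ranking that reaches the maximum validation AUC, mirroring the choice of feature count described in the text.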

Statistical analysis

R software version 4.0.3 and Python version 3.6.5 were used to conduct the data analyses. Patient data are presented as continuous or categorical variables. The Shapiro‒Wilk test was used to assess whether the data followed a normal distribution. Continuous predictors with skewed distributions are described as medians with interquartile ranges and were compared via the Mann–Whitney U test or Kruskal–Wallis H test, whereas categorical predictors are described as counts with percentages and were compared via the chi-square test or Fisher’s exact test. The AUC was used to evaluate the predictive power, and the optimal cut-off was estimated via the Youden index (sensitivity + specificity − 1). DCA, P–R and calibration curve analyses were conducted with R software version 4.0.3. All statistical tests were two-sided, and P < 0.05 was considered statistically significant.
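The Youden index selects the probability cut-off that maximizes sensitivity + specificity − 1. A minimal plain-Python sketch follows; the function name and toy scores are illustrative, a patient is classified as positive when the score is at or above the threshold, and ties are resolved in favour of the lowest threshold (an assumption, since the text does not specify a tie rule).

```python
def youden_cutoff(y_true, scores):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best_j, best_thr = float("-inf"), None
    for thr in sorted(set(scores)):            # every observed score is a candidate
        tp = sum(1 for t, s in zip(y_true, scores) if t == 1 and s >= thr)
        tn = sum(1 for t, s in zip(y_true, scores) if t == 0 and s < thr)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:                         # strict: keep the lowest tied threshold
            best_j, best_thr = j, thr
    return best_thr, best_j


# Toy predicted risks for three survivors (0) and three deaths (1).
y_true = [0, 0, 0, 1, 1, 1]
scores = [0.1, 0.3, 0.6, 0.4, 0.7, 0.9]
cutoff, j_stat = youden_cutoff(y_true, scores)
```

In this example thresholds 0.4 and 0.7 both reach J = 2/3, and the lower one is returned.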

Finally, a web-based tool was developed to facilitate use of the model in clinical scenarios. The web application is accessible online at https://winxwqz6savjnnzmfdkwgl.streamlit.app/.

Data availability

The datasets used and analyzed during the present study are available from the corresponding author on reasonable request.

References

Agustí, A. et al. Global initiative for chronic obstructive lung disease 2023 report: GOLD executive summary. Am. J. Respir. Crit. Care Med. 207, 819–837. https://doi.org/10.1164/rccm.202301-0106PP (2023).
Hurst, J. R. et al. Understanding the impact of chronic obstructive pulmonary disease exacerbations on patient health and quality of life. Eur. J. Intern. Med. 73, 1–6. https://doi.org/10.1016/j.ejim.2019.12.014 (2020).
Celli, B. R. et al. An updated definition and severity classification of chronic obstructive pulmonary disease exacerbations: The Rome proposal. Am. J. Respir. Crit. Care Med. 204, 1251–1258. https://doi.org/10.1164/rccm.202108-1819PP (2021).
Ko, F. W. et al. Acute exacerbation of COPD. Respirology Carlton Vic. 21, 1152–1165. https://doi.org/10.1111/resp.12780 (2016).
Foo, J. et al. Continuing to confront COPD international patient survey: Economic impact of COPD in 12 countries. PLoS ONE 11, e0152618. https://doi.org/10.1371/journal.pone.0152618 (2016).
Gudmundsson, G. et al. Long-term survival in patients hospitalized for chronic obstructive pulmonary disease: A prospective observational study in the Nordic countries. Int. J. Chron. Obstruct. Pulmon. Dis. 7, 571–576. https://doi.org/10.2147/copd.S34466 (2012).
Eriksen, N. & Vestbo, J. Management and survival of patients admitted with an exacerbation of COPD: Comparison of two Danish patient cohorts. Clin. Respir. J. 4, 208–214. https://doi.org/10.1111/j.1752-699X.2009.00177.x (2010).
Groenewegen, K. H., Schols, A. M. & Wouters, E. F. Mortality and mortality-related factors after hospitalization for acute exacerbation of COPD. Chest 124, 459–467. https://doi.org/10.1378/chest.124.2.459 (2003).
Bond, E. G. et al. Understanding resource utilization and mortality in COPD to support policy making: A microsimulation study. PLoS ONE 15, e0236559. https://doi.org/10.1371/journal.pone.0236559 (2020).
Agustí, A. et al. Global initiative for chronic obstructive lung disease 2023 report: GOLD executive summary. Eur. Respir. J. https://doi.org/10.1183/13993003.00239-2023 (2023).
Singh, D. et al. Global strategy for the diagnosis, management, and prevention of chronic obstructive lung disease: The GOLD science committee report 2019. Eur. Respir. J. https://doi.org/10.1183/13993003.00164-2019 (2019).
Ruiying, W., Zhaoyun, & Jianying, X. Clinical features and three-year prognosis of AECOPD patients with different levels of blood eosinophils. Heart Lung J. Crit. Care 56, 29–39. https://doi.org/10.1016/j.hrtlng.2022.05.012 (2022).
Yu, S., Zhang, J., Fang, Q. & Tong, Z. Blood eosinophil levels and prognosis of hospitalized patients with acute exacerbation of chronic obstructive pulmonary disease. Am. J. Med. Sci. 362, 56–62. https://doi.org/10.1016/j.amjms.2021.02.013 (2021).
Fawzy, A. et al. Association of platelet count with all-cause mortality and risk of cardiovascular and respiratory morbidity in stable COPD. Respir. Res. 20, 86. https://doi.org/10.1186/s12931-019-1059-1 (2019).
Harrison, M. T. et al. Thrombocytosis is associated with increased short and long term mortality after exacerbation of chronic obstructive pulmonary disease: A role for antiplatelet therapy?. Thorax 69, 609–615. https://doi.org/10.1136/thoraxjnl-2013-203996 (2014).
Hu, Y., Long, H., Cao, Y. & Guo, Y. Prognostic value of lymphocyte count for in-hospital mortality in patients with severe AECOPD. BMC Pulm. Med. 22, 376. https://doi.org/10.1186/s12890-022-02137-1 (2022).
Li, H. et al. C-reactive protein to serum albumin ratio as a novel biomarker to predict prognosis in patients with chronic obstructive pulmonary disease. Clin. Labor. https://doi.org/10.7754/Clin.Lab.2020.200630 (2021).
Mohan, M., Parthasarathi, A., Siddaiah, J. B., Mahesh, P. A. & SK, C. Fibrinogen: A feasible biomarker in identifying the severity and acute exacerbation of chronic obstructive pulmonary disease. Cureus 13, e16864. https://doi.org/10.7759/cureus.16864 (2021).
Sun, W. et al. Fibrinogen, a promising marker to evaluate severity and prognosis of acute exacerbation of chronic obstructive pulmonary disease: A retrospective observational study. Int. J. Chron. Obstruct. Pulmon. Dis. 17, 1299–1310. https://doi.org/10.2147/copd.S361929 (2022).
Holland, M., Alkhalil, M., Chandromouli, S., Janjua, A. & Babores, M. Eosinopenia as a marker of mortality and length of stay in patients admitted with exacerbations of chronic obstructive pulmonary disease. Respirology Carlton Vic. 15, 165–167. https://doi.org/10.1111/j.1440-1843.2009.01651.x (2010).
MacDonald, M. I. et al. Low and high blood eosinophil counts as biomarkers in hospitalized acute exacerbations of COPD. Chest 156, 92–100. https://doi.org/10.1016/j.chest.2019.02.406 (2019).
Andrijevic, I. et al. N-terminal prohormone of brain natriuretic peptide (NT-proBNP) as a diagnostic biomarker of left ventricular systolic dysfunction in patients with acute exacerbation of chronic obstructive pulmonary disease (AECOPD). Lung 196, 583–590. https://doi.org/10.1007/s00408-018-0137-3 (2018).
Rui, F. et al. Development of a machine learning-based model to predict hepatic inflammation in chronic hepatitis B patients with concurrent hepatic steatosis: A cohort study. EClinicalMedicine 68, 102419. https://doi.org/10.1016/j.eclinm.2023.102419 (2024).
Wu, C. T. et al. Acute Exacerbation of a chronic obstructive pulmonary disease prediction system using wearable device data, machine learning, and deep learning: Development and cohort study. JMIR Mhealth Uhealth 9, e22591. https://doi.org/10.2196/22591 (2021).
Yin, H. et al. A machine learning model for predicting acute exacerbation of in-home chronic obstructive pulmonary disease patients. Comput. Methods Programs Biomed. 246, 108005. https://doi.org/10.1016/j.cmpb.2023.108005 (2024).
Wang, C. et al. Comparison of machine learning algorithms for the identification of acute exacerbations in chronic obstructive pulmonary disease. Comput. Methods Programs Biomed. 188, 105267. https://doi.org/10.1016/j.cmpb.2019.105267 (2020).
Kor, C. T. et al. Explainable machine learning model for predicting first-time acute exacerbation in patients with chronic obstructive pulmonary disease. J. Pers. Med. https://doi.org/10.3390/jpm12020228 (2022).
Saria, S., Butte, A. & Sheikh, A. Better medicine through machine learning: What’s real, and what’s artificial?. PLoS Med 15, e1002721. https://doi.org/10.1371/journal.pmed.1002721 (2018).
Motwani, M. et al. Machine learning for prediction of all-cause mortality in patients with suspected coronary artery disease: A 5-year multicentre prospective registry analysis. Eur. Heart J. 38, 500–507. https://doi.org/10.1093/eurheartj/ehw188 (2017).
Dawes, T. J. W. et al. Machine learning of three-dimensional right ventricular motion enables outcome prediction in pulmonary hypertension: A cardiac MR imaging study. Radiology 283, 381–390. https://doi.org/10.1148/radiol.2016161315 (2017).
Azodi, C. B., Tang, J. & Shiu, S. H. Opening the black box: Interpretable machine learning for geneticists. Trends Genet. 36, 442–455. https://doi.org/10.1016/j.tig.2020.03.005 (2020).
Hu, J. et al. Identification and validation of an explainable prediction model of acute kidney injury with prognostic implications in critically ill children: A prospective multicenter cohort study. EClinicalMedicine 68, 102409. https://doi.org/10.1016/j.eclinm.2023.102409 (2024).
Montes de Oca, M. & Laucho-Contreras, M. E. Is it time to change the definition of acute exacerbation of chronic obstructive pulmonary disease? What do we need to add?. Med. Sci. https://doi.org/10.3390/medsci6020050 (2018).
Zhang, J. et al. A simple clinical risk score (ABCDMP) for predicting mortality in patients with AECOPD and cardiovascular diseases. Respir. Res. 25, 89. https://doi.org/10.1186/s12931-024-02704-6 (2024).
Gomes, L., Pereira, S., Sousa-Pinto, B. & Rodrigues, C. Performance of risk scores in patients with acute exacerbations of COPD. Jornal brasileiro de pneumologia : publicacao oficial da Sociedade Brasileira de Pneumologia e Tisilogia 49, e20230032. https://doi.org/10.36416/1806-3756/e20230032 (2023).
Goecks, J., Jalili, V., Heiser, L. M. & Gray, J. W. How machine learning will transform biomedicine. Cell 181, 92–101. https://doi.org/10.1016/j.cell.2020.03.022 (2020).
Ramalingam, K. et al. Light gradient boosting-based prediction of quality of life among oral cancer-treated patients. BMC Oral Health 24, 349. https://doi.org/10.1186/s12903-024-04050-x (2024).
Wang, X. et al. Unraveling variations and enhancing prediction of successful sphincter-preserving resection for low rectal cancer: A post hoc analysis of the multicentre LASRE randomized clinical trial. Int. J. Surg. 110, 4031–4042. https://doi.org/10.1097/js9.0000000000001014 (2024).
Park, Y. W. et al. A fully automatic multiparametric radiomics model for differentiation of adult pilocytic astrocytomas from high-grade gliomas. Eur. Radiol. 32, 4500–4509. https://doi.org/10.1007/s00330-022-08575-z (2022).
Li, Y., Sperrin, M., Ashcroft, D. M. & van Staa, T. P. Consistency of variety of machine learning and statistical models in predicting clinical risks of individual patients: Longitudinal cohort study using cardiovascular disease as exemplar. BMJ 371, m3919. https://doi.org/10.1136/bmj.m3919 (2020).
Cowburn, A. S., Condliffe, A. M., Farahi, N., Summers, C. & Chilvers, E. R. Advances in neutrophil biology: Clinical implications. Chest 134, 606–612. https://doi.org/10.1378/chest.08-0422 (2008).
Stockley, J. A., Walton, G. M., Lord, J. M. & Sapey, E. Aberrant neutrophil functions in stable chronic obstructive pulmonary disease: The neutrophil as an immunotherapeutic target. Int. Immunopharmacol. 17, 1211–1217. https://doi.org/10.1016/j.intimp.2013.05.035 (2013).
Lonergan, M. et al. Blood neutrophil counts are associated with exacerbation frequency and mortality in COPD. Respir. Res. 21, 166. https://doi.org/10.1186/s12931-020-01436-7 (2020).
Husebø, G. R. et al. Coagulation markers as predictors for clinical events in COPD. Respirology Carlton, Vic. 26, 342–351 (2021).
Donaldson, G. C., Hurst, J. R., Smith, C. J., Hubbard, R. B. & Wedzicha, J. A. Increased risk of myocardial infarction and stroke following exacerbation of COPD. Chest 137, 1091–1097. https://doi.org/10.1378/chest.09-2029 (2010).
Halpin, D. M. et al. Risk of nonlower respiratory serious adverse events following COPD exacerbations in the 4-year UPLIFT® trial. Lung 189, 261–268. https://doi.org/10.1007/s00408-011-9301-8 (2011).
Kunisaki, K. M. et al. Exacerbations of chronic obstructive pulmonary disease and cardiac events. A post hoc cohort analysis from the summit randomized clinical trial. Am. J. Respir. Crit. Care Med. 198, 51–57. https://doi.org/10.1164/rccm.201711-2239OC (2018).
Kovacs, G. et al. Severe pulmonary hypertension in COPD: Impact on survival and diagnostic approach. Chest 162, 202–212. https://doi.org/10.1016/j.chest.2022.01.031 (2022).
Konety, S. H. et al. Echocardiographic predictors of sudden cardiac death: The atherosclerosis risk in communities study and cardiovascular health study. Circ. Cardiovasc. Imaging https://doi.org/10.1161/circimaging.115.004431 (2016).
Lundorff, I. et al. Echocardiographic predictors of cardiovascular morbidity and mortality in women from the general population. Eur. Heart J. Cardiovasc. Imaging 22, 1026–1034. https://doi.org/10.1093/ehjci/jeaa167 (2021).
Ester, M., Kriegel, H. P. & Xu, X. XGBoost: A scalable tree boosting system. In: Proceedings of the 22Nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (vol, pp 785, 2016). Geographical Analysis, https://doi.org/10.1111/gean.12315 (2022).
Freund, Y. & Schapire, R. E. A decision-theoretic generalization of on-line learning and an application to boosting. Computational Learning Theory. Second European Conference, EuroCOLT ‘95. Proceedings, pp. 23–37 (1995).
Avidan, S. Ensemble tracking. IEEE Trans. Pattern Anal. Mach. Intell. 29, 261–271. https://doi.org/10.1109/tpami.2007.35 (2007).
Ke, G. L. et al. in 31st Annual Conference on Neural Information Processing Systems (NIPS). (2017).
Friedman, J. H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 29, 1189–1232. https://doi.org/10.1214/aos/1013203451 (2001).
Chang, D. et al. Machine learning models are superior to noninvasive tests in identifying clinically significant stages of NAFLD and NAFLD-related cirrhosis. Hepatology 77, 546–557. https://doi.org/10.1002/hep.32655 (2023).
Zhang, K. et al. Machine learning-reinforced noninvasive biosensors for healthcare. Adv. Healthc. Mater. 10, e2100734. https://doi.org/10.1002/adhm.202100734 (2021).
Jiang, X., Zhang, Y., Li, Y. & Zhang, B. Forecast and analysis of aircraft passenger satisfaction based on RF-RFE-LR model. Sci. Rep. 12, 11174. https://doi.org/10.1038/s41598-022-14566-3 (2022).
D’Amato, M. et al. A machine learning approach to characterize patients with asthma exacerbation attending an acute care setting. Eur. J. Intern. Med. 104, 66–72. https://doi.org/10.1016/j.ejim.2022.07.019 (2022).
Wang, K. et al. A decision tree model to help treatment decision-making for severe spontaneous intracerebral hemorrhage. Int. J. Surg. 110, 788–798. https://doi.org/10.1097/js9.0000000000000852 (2024).
Salari, M., Nikoo, M. R., Al-Mamun, A., Rakhshandehroo, G. R. & Mooselu, M. G. Optimizing Fenton-like process, homogeneous at neutral pH for ciprofloxacin degradation: Comparing RSM-CCD and ANN-GA. J. Environ. Manag. 317, 115469. https://doi.org/10.1016/j.jenvman.2022.115469 (2022).
Geurts, P., Ernst, D. & Wehenkel, L. Extremely randomized trees. Mach. Learn. 63, 3–42. https://doi.org/10.1007/s10994-006-6226-1 (2006).
Chen, Y. et al. Privacy-preserving multi-class support vector machine model on medical diagnosis. IEEE J. Biomed. Health Inform. 26, 3342–3353. https://doi.org/10.1109/jbhi.2022.3157592 (2022).
Lundberg, S. M., Erion, G. G. & Su-In, L. Consistent Individualized Feature Attribution for Tree Ensembles. arXiv. arXiv (USA), p. 9. (2018).

Acknowledgements

We thank the colleagues in our department for their help in our study.

Funding

This work was supported by the National Natural Science Foundation of China (NSFC) (grant number: 82070035, 82370043). The sponsor had no role in the design and conduct of the study.

Author information

Authors and Affiliations

Department of Respiratory and Critical Care Medicine, The Second Affiliated Hospital of Xi’an Jiaotong University, Xi’an, Shaanxi, China
Yan Zhang, Shuping Zheng, Dan Wang, Yu Ma, Zhihui Qiang, Tianyi Zhang, Haicheng Zhong, Miaomiao Zhou, Zhuoyang Li & Yun Liu
State Key Laboratory of Respiratory Disease, Guangdong Key Laboratory of Vascular Disease, National Clinical Research Center for Respiratory Disease, Guangzhou Institute of Respiratory Health, The First Affiliated Hospital of Guangzhou Medical University, Guangzhou, Guangdong, China
Fanjie Lin & Wenju Lu
Department of Information, The Second Affiliated Hospital of Xi’an Jiaotong University, Xi’an , Shaanxi, China
Penggang Chen
Guangzhou Tianpeng Technology Co., Ltd., Guangzhou, Guangdong, China
Jieyu Feng

Contributions

All authors had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. YM, ZQ, TZ, HZ, MZ, ZL, PC and JF participated in data collection. YZ, SZ, DW, FL, WL and YL participated in the design of the study and in the analysis and interpretation of data. YZ drafted the manuscript. YZ, WL and YL have accessed and verified the data. All authors participated in the critical revision of the manuscript and approved the final manuscript.

Corresponding authors

Correspondence to Wenju Lu or Yun Liu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Zhang, Y., Zheng, S., Wang, D. et al. Development and validation of a machine learning-based model to predict the risk of hospitalization death in hospitalized patients with AECOPD. Sci Rep 15, 35918 (2025). https://doi.org/10.1038/s41598-025-19810-0

Received: 28 October 2024
Accepted: 10 September 2025
Published: 14 October 2025
Version of record: 14 October 2025
DOI: https://doi.org/10.1038/s41598-025-19810-0
