Abstract
Acute exacerbation of chronic obstructive pulmonary disease (AECOPD) is a leading cause of hospitalization and death in COPD patients. Machine learning (ML) approaches are powerful but suffer from a “black box” problem: their predictions cannot be interpreted directly. Herein, we conducted a multicentre, retrospective cohort study in two tertiary hospitals in China, primarily utilizing echocardiographic variables to build and validate an explainable ML-based model to predict the hospitalization death of AECOPD. For model explainability, we utilized a model-agnostic SHapley Additive exPlanations (SHAP) explainer to interpret the output of our final model. Our results showed that the light gradient boosting machine (LightGBM) model achieved the best performance among the 11 ML models. After reducing features according to the feature importance rank, an explainable final LightGBM model was established with 9 features (AUC = 0.956, accuracy = 92.1%, sensitivity = 0.891, specificity = 0.933, PPV = 0.852, NPV = 0.952, F1 score = 0.871). The SHAP method mitigated the “black-box” concern by providing both a global and a local explanation of the LightGBM model. To facilitate its use by clinicians, this final explainable model was translated into a convenient, publicly accessible web tool. These findings hold promise for guiding clinical management and improving patient outcomes.
Introduction
Acute exacerbation of chronic obstructive pulmonary disease (AECOPD) is characterized by rapid worsening of respiratory symptoms, accelerated decline in airway function, and reduced quality of life1,2. Notably, AECOPD represents the leading cause of hospitalization and death in chronic obstructive pulmonary disease (COPD) patients, resulting in more than three million deaths annually3,4, and is a significant financial burden to health care systems5. Although mortality rates following hospitalization for acute exacerbation of COPD are declining, reported rates still vary from 23 to 80%. Progressive respiratory failure, cardiovascular disease (CVD), malignancies and other diseases are the primary causes of death in people with COPD hospitalized for an exacerbation6,7,8. COPD poses a significant but heterogeneous burden to individuals and healthcare systems. Policymakers can minimize this burden by developing policies targeted at the subpopulations most likely to benefit, which also offers hospitals administrative benefits in overall patient care, resource allocation, and operational efficiency9. Therefore, timely identification of patients at high risk for death after hospitalization for acute exacerbation of COPD, which may reduce the associated mortality and financial burden, is highly important.
At present, clinical scoring systems such as the modified Medical Research Council (mMRC) score and COPD Assessment Test (CAT) questionnaire are used to assess AECOPD. Many components of these scoring systems rely on subjective clinical assessments, focus primarily on clinical signs and symptoms, and often do not incorporate a comprehensive set of biomarkers that can provide more objective insights into the underlying pathophysiology of AECOPD10,11. Thus, these scoring systems have several limitations that make objective measurement of AECOPD challenging. An increasing number of studies have confirmed that blood cell counts, e.g., eosinophil, platelet, and lymphocyte counts, are associated with AECOPD12,13,14,15,16. Other studies have reported that circulating biomarkers of inflammation, e.g., C-reactive protein (CRP)17 and fibrinogen18,19, can be used to predict the risk of AECOPD. In addition, previous studies have suggested that a low blood eosinophil count significantly increases the risk of in-hospital mortality in hospitalized patients with AECOPD20,21. Although numerous clinical studies have searched for novel, easy-to-measure prognostic biomarkers to develop more effective predictive models and enhance risk prediction over clinical assessment, the search for a single biomarker seems to have been unsuccessful owing to the complex pathophysiology of COPD. CVD is the most common comorbidity in patients with COPD and is a common underlying cause of COPD exacerbation22. Therefore, cardiovascular risk factors can predict adverse outcomes of COPD patients. However, few studies have focused on the role of baseline echocardiographic abnormalities in patients with AECOPD.
Electronic medical records (EMRs) have been gaining widespread use in hospitals for many years, which has made it possible for clinicians and researchers to collect the clinical data of patients more accurately and conveniently. To date, numerous studies have applied machine learning (ML) approaches to facilitate disease prediction, and most have shown a good predictive value, making ML-based models valuable tools for implementation in clinical practice23. The ML technique is a powerful computational method for handling highly variable datasets and learning the complex relationships between variables from training data. In recent years, an increasing number of studies have concentrated on AECOPD prediction via the ML approach to identify which features are most important for case identification and predicting exacerbations24,25,26,27. ML methods have the potential to improve predictive modeling of health outcomes28, and were used to improve prediction of 5-year all-cause mortality in subjects undergoing CT coronary angiography29 and cardiac motion MRI30. These examples highlight the growing body of research leveraging ML for clinical predictions and underscore the importance of our work in this context. Although the ML technique is powerful, the complexity of its models makes their output difficult to interpret directly, the so-called “black box” problem31. To overcome the “black-box” issue, in which little explanation is given about how predictions are derived, the SHapley Additive exPlanation (SHAP) method, which can rank the importance of input features, has been utilized to explain ML models and visualize individual variable predictions32. By decomposing the model’s predictions into contributions from individual features, SHAP enhances the interpretability of ML models. This transparency is crucial for clinical applications, where understanding the reasoning behind predictions can inform decision-making. SHAP helps identify key drivers of the outcome, which can lead to actionable insights23,32. The SHAP method is a unified approach that has been used to explain the outputs of ML models in earlier studies, including a study on predicting the first exacerbation of COPD27. However, there are no studies on hospitalization death in which the SHAP method has been used to explain prediction models for hospitalized patients with AECOPD.
In the present study, we aimed to develop and validate an explainable ML-based model for hospitalization death in hospitalized patients with AECOPD by analysing and mining medical big data, elucidating feature importance and explaining the model via the SHAP method. Furthermore, this method enables early and accurate identification of AECOPD patients at high risk of hospitalization death so that prompt therapeutic measures can be initiated and prognoses are improved in clinical settings.
Results
Patient characteristics
A total of 2924 patients with AECOPD who underwent transthoracic echocardiography were identified. Among them, 275 patients who failed to meet the inclusion criteria were excluded. Finally, 2649 patients were included for analysis. During the multiple model comparison, we employed stratified sampling to randomly split our data set into 70% and 30% partitions for the training set (1854 patients) and the validation set (795 patients), respectively. Details of the study design are displayed in Fig. 1.
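The stratified 70/30 split described above can be sketched as follows. This is a minimal illustration in plain Python, not the authors' code: the function name `stratified_split`, the seed, and the synthetic label vector are assumptions; only the outcome counts (796 deaths, 1853 survivors) are taken from the text.

```python
import random
from collections import defaultdict

def stratified_split(labels, train_fraction=0.7, seed=0):
    """Return (train_idx, valid_idx) so both parts keep the class ratio."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train_idx, valid_idx = [], []
    for members in by_class.values():
        rng.shuffle(members)                      # random order within each class
        cut = round(len(members) * train_fraction)
        train_idx.extend(members[:cut])           # 70% of this class
        valid_idx.extend(members[cut:])           # remaining 30% of this class
    return sorted(train_idx), sorted(valid_idx)

# Synthetic labels with the reported outcome counts (1 = died, 0 = survived).
labels = [1] * 796 + [0] * 1853
train_idx, valid_idx = stratified_split(labels)
```

With these counts, splitting each outcome class separately yields 1854 training and 795 validation patients, matching the reported partition sizes while keeping the death rate nearly identical in both sets.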
The baseline characteristics of the derivation cohort (n = 2649), training cohort (n = 1854), and validation cohort (n = 795) are described in Table 1. Among the 2649 participants in the derivation cohort, 79.92% were male and 53.11% had a history of smoking. Consistent with previous studies, most of the patients were elderly, and their median age was 75.0 years. In the training cohort, the median age mirrored that of the derivation cohort at 75.0 years; 79.50% of them were male and 52.86% had a history of smoking. For the validation cohort, the median age was also 75.0 years, 80.88% were male, and 53.71% had a history of smoking.
In the population, 796 (30.05%) patients died during hospitalization, and 1853 (69.95%) patients survived hospitalization. A comparison of the demographic and clinical characteristics between the survival group and the death group is presented in Supplementary Table S1.
Model development and performance comparison
The top 20 variables from the training cohort were used to generate 11 ML models to predict the risk of hospitalization death in hospitalized patients with AECOPD. Among the 11 models, the LightGBM model (AUC = 0.962) had the best predictive effect, followed by the GBM model (AUC = 0.951) and XGboost model (AUC = 0.945). The discriminative performances of these 11 models are listed in Table 2. The sensitivity, specificity, PPV, NPV, accuracy, and F1 score were calculated at the optimal cut-off value that maximized the Youden index. The ROC curves and the SHAP summary plots of the top 20 features for the top five best-performing ML models are presented in Fig. 2A and Fig. 3A–E. As shown in Fig. 2B, the LightGBM model was found to obtain nearly the optimal AUC and the best predictive ability among these five models during the process of reducing features based on the feature importance rank.
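To make the cut-off selection concrete, the sketch below picks the threshold that maximizes the Youden index, J = sensitivity + specificity − 1. It is an illustration under assumptions: the function name `youden_cutoff` and the ten toy scores are hypothetical, and a patient is treated as positive when the score is at or above the threshold.

```python
def youden_cutoff(y_true, y_score):
    """Return (threshold, J) maximizing J = sensitivity + specificity - 1.

    A case is classified as positive when its score >= threshold.
    """
    positives = sum(y_true)
    negatives = len(y_true) - positives
    best_threshold, best_j = None, -1.0
    for threshold in sorted(set(y_score)):        # every observed score is a candidate
        tp = sum(1 for y, s in zip(y_true, y_score) if y == 1 and s >= threshold)
        tn = sum(1 for y, s in zip(y_true, y_score) if y == 0 and s < threshold)
        j = tp / positives + tn / negatives - 1.0
        if j > best_j:                            # keep the first maximum on ties
            best_threshold, best_j = threshold, j
    return best_threshold, best_j

# Toy data (hypothetical): 1 = died, 0 = survived, with predicted risks.
y_true = [0, 0, 0, 1, 0, 0, 1, 1, 1, 1]
y_score = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.60, 0.70, 0.80, 0.90]
threshold, j = youden_cutoff(y_true, y_score)
```

On this toy data the selected threshold is 0.60, where all five survivors and four of the five deaths are classified correctly.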
Performance of ML models to predict the risk of hospitalization death in hospitalized patients with AECOPD. (A) ROC curves of the top five best-performing ML models. (B) AUCs of the top five best-performing ML models with varied numbers of features. ML, machine learning; AECOPD, acute exacerbation of chronic obstructive pulmonary disease; ROC: receiver operating characteristic; AUC: area under the ROC curve; LightGBM, light gradient boosting machine; GBM, gradient boosting machine; XGboost, eXtreme gradient boosting; AdaBoost: adaptive boosting; RF, random forest.
SHAP summary dot plots of the top 20 features of the top five best-performing ML models. These plots represent the feature importance rank of the ML models in the validation cohort. A dot is drawn for each feature attribution value in the model for each single patient, so each patient has one dot on the line for each feature. The actual values of the features for each patient are shown by the colors of the dots: red means a higher actual value, and blue means a lower actual value. The dots are stacked vertically to show density. (A) LightGBM, (B) GBM, (C) XGboost, (D) AdaBoost, (E) RF. SHAP: SHapley Additive exPlanations; ML, machine learning; LightGBM, light gradient boosting machine; GBM, gradient boosting machine; XGboost, eXtreme gradient boosting; AdaBoost: adaptive boosting; RF, random forest; PAD, pulmonary artery diameter; PV, pulmonary valve flow velocity; cTnT, cardiac troponin T; DD, D-dimer; Vmax, maximum tricuspid regurgitation velocity; NEUT%, neutrophil percentage; NT-proBNP, N-terminal prohormone of brain natriuretic peptide; EF, ejection fraction; RAD, right atrial diameter; LYMP, lymphocytes; ALB, albumin; PASP, pulmonary arterial systolic pressure; MONO%, monocyte percentage; PLT, platelets; RDW, red cell distribution width; HCT, hematocrit; PTA, prothrombin time activity; SBP, systolic blood pressure; RVD, right ventricle diameter; MPV, mean platelet volume; DBP, diastolic blood pressure; EOS%, eosinophil percentage; TT, thrombin time; EO, eosinophils.
Identification of the final model
Through multi-model comparison, it was found that the LightGBM model performed best, and the final model was identified during feature reduction of the LightGBM model. The 28-feature model was significantly better than the 3-feature model (ΔAUC = 0.020, P = 0.023) and the 6-feature model (ΔAUC = 0.011, P = 0.027); however, it was not significantly better than the 9-feature model (ΔAUC = 0.007, P = 0.086) or the 12-feature model (ΔAUC = 0.003, P = 0.351). This comparison of various features in the LightGBM model is displayed in Supplementary Table S2 and Supplementary Fig. S1A-B. To identify the appropriate number of features for this model, an 8-feature model (ΔAUC = 0.020, P = 0.023) and a 10-feature model (ΔAUC = 0.011, P = 0.027) were further analysed and were not significantly different from the 28-feature model (Supplementary Fig. S1C). Hence, we focused on the 9-feature LightGBM model, and it was selected as the final model for further analysis according to the significance of the following variables: PAD, PV, cTnT, DD, Vmax, NEUT%, NT-proBNP, EF, and RAD.
Diagnostic performance of the final LightGBM model
The final LightGBM model achieved an AUC of 0.956 with a sensitivity of 0.891, a specificity of 0.933, a PPV of 0.852, an NPV of 0.952, an accuracy of 0.921, and an F1 score of 0.871 for predicting hospitalization death in hospitalized patients with AECOPD. We also performed a decision curve analysis (DCA) to further assess the clinical utility of our model, as shown in Fig. 4A. The DCA revealed that our 9-feature final model achieved a higher net benefit across a broad range of threshold probabilities. Specifically, the optimal decision threshold for our model was determined to be 0.229 based on the Youden index. At this threshold, the model demonstrated a sensitivity of 89.1% and a specificity of 93.3%. In clinical practice, this cut-off value would prioritize early intervention for patients with a predicted risk of death ≥ 22.9%. The net benefit corresponding to the cut-off value of 0.229 was calculated to be 0.253. Moreover, the area under the precision‒recall (P–R) curve of the 9-feature model was only marginally lower than that of the 28-feature model (Fig. 4B–F), indicating that the 9-feature model has high clinical utility. We then used decile binning (10 bins of equal sample size) to create the calibration curve. As can be seen from Supplementary Fig. S2, the calibration curve of the LightGBM model is close to the ideal diagonal line. The Brier score of the 9-feature model is 0.06; a value below 0.1 indicates that the average squared difference between the predicted probabilities and the observed outcomes is relatively small. Overall, the calibration curve demonstrates a strong consistency between the predicted probabilities and the observed outcomes. These results indicate that our model can provide accurate and clinically relevant predictions, thereby assisting clinicians in making better-informed decisions regarding the management of patients with AECOPD.
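The reported metrics and the net benefit can be related to a single confusion matrix, as the following sketch shows. The counts used here (213 true positives, 26 false negatives, 519 true negatives, 37 false positives) are not reported in the text; they are a hypothetical reconstruction chosen to be consistent with a 795-patient validation set and the stated sensitivity and specificity. The net benefit uses the standard decision-curve definition, TP/N − FP/N × pt/(1 − pt), where pt is the threshold probability.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard metrics derived from the four confusion-matrix counts."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "f1": 2 * tp / (2 * tp + fp + fn),
    }

def net_benefit(tp, fp, n, threshold):
    """Decision-curve net benefit at threshold probability `threshold`."""
    return tp / n - fp / n * threshold / (1.0 - threshold)

# Hypothetical counts (assumption, not reported by the study).
tp, fn, tn, fp = 213, 26, 519, 37
metrics = classification_metrics(tp, fp, tn, fn)
nb = net_benefit(tp, fp, n=tp + fn + tn + fp, threshold=0.229)
```

Under these assumed counts, every metric rounds to the value reported for the final model, and the net benefit evaluates to about 0.254, close to the reported 0.253; the small gap is expected because the counts are rounded reconstructions.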
Net benefits of the LightGBM model by decision curve analysis and evaluation of the LightGBM model’s performance by P-R curve. (A) DCA curves of the LightGBM model with different features. (B-F) P-R curves of the LightGBM model with 28 features (B), 10 features (C), 9 features (D), 8 features (E), and 3 features (F). These plots represent the predictive performance in the validation cohort. LightGBM, light gradient boosting machine; DCA: decision curve analysis; P-R: precision-recall; AP: average precision.
In addition, to validate the appropriate sample size for this study and the robustness of this model to site variation, we further performed five-fold and ten-fold cross validations. As presented in Supplementary Fig. S3, the final model displayed mean AUCs of 0.960 ± 0.009 and 0.962 ± 0.011 in the five-fold and ten-fold cross validations, respectively. The predictive values of PAD, PV, cTnT, DD, Vmax, NEUT%, NT-proBNP, EF, and RAD were further investigated and compared with the 9-feature final model (Supplementary Fig. S4A), and they all performed worse than the final model. The DCA curves also revealed that the final model had greater clinical utility than each variable (Supplementary Fig. S4B).
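The fold construction behind a k-fold cross-validation of this kind can be sketched as below. This is an assumption-laden illustration: the text does not say whether the folds were stratified or whether “±” denotes a standard deviation, so the stratification, the function name `stratified_kfold`, and the per-fold AUC values are all hypothetical.

```python
import random
from statistics import mean, stdev

def stratified_kfold(labels, k, seed=0):
    """Return k (train_idx, test_idx) pairs with classes spread evenly over folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for position, idx in enumerate(members):
            folds[position % k].append(idx)       # deal each class round-robin
    splits = []
    for test_fold in folds:
        held_out = set(test_fold)
        train = [i for i in range(len(labels)) if i not in held_out]
        splits.append((train, sorted(test_fold)))
    return splits

# Synthetic labels with the reported outcome counts (1 = died, 0 = survived).
labels = [1] * 796 + [0] * 1853
splits = stratified_kfold(labels, k=5)

# Hypothetical per-fold AUCs, used only to show a "mean ± SD" summary.
fold_aucs = [0.94, 0.95, 0.96, 0.97, 0.98]
auc_mean, auc_sd = mean(fold_aucs), stdev(fold_aucs)
```

In a real run, a model would be refitted on each training part and scored on the matching held-out fold, and the resulting per-fold AUCs would replace the placeholder list.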
Model explanation
The SHAP method provides two types of explanations: a global explanation of the model at the feature level and a local explanation at the individual level. A global explanation could provide consistent and accurate attribution values for each feature and describe the overall functionality of the model. The local explanation details how a certain prediction is made for a patient by inputting individualized data. The SHAP approach can interpret the output of the final model by calculating the contribution of each variable to the prediction. As illustrated in Fig. 5A, B, the SHAP summary plots using the average SHAP values were used to evaluate the contribution of each feature to the model and are presented in descending order. Moreover, the SHAP dependence plot can be used to understand how a single feature affects the output of the prediction model. The real values versus the SHAP values of these 9 features are shown in Fig. 5C. SHAP values that are higher than zero correspond to a positive class prediction in the model, i.e., a higher risk of hospitalization death in hospitalized patients with AECOPD.
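The idea of attributing one prediction to individual features can be illustrated with an exact Shapley value computation on a toy function. This sketch is not the SHAP library's algorithm and not the study's model: it enumerates every feature coalition by brute force, which is feasible only for a handful of features, and it assumes that “absent” features are replaced by a fixed baseline value. The toy function and inputs are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley attribution of predict(x) - predict(baseline).

    Features outside a coalition take their baseline value.
    """
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

def toy_model(features):
    """Hypothetical model: one additive term plus one interaction term."""
    a, b, c = features
    return 2.0 * a + b * c

x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
```

Here the additive feature receives its full contribution (2), the two interacting features split their joint contribution equally (3 each), and the attributions sum to the difference between the prediction and the baseline prediction (8). That additivity is what allows a waterfall plot to start from a base value and arrive at an individual patient's prediction.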
Global model explanation by the SHAP method. (A) SHAP summary bar plot. (B) SHAP summary dot plot. The probability of hospitalization death increases with the SHAP value of a feature. A dot is drawn for each patient's SHAP value in the model, so each patient has one dot on the line for each feature. The colors of the dots indicate the actual values of the features for each patient: red means a higher feature value and blue means a lower feature value. The dots are stacked vertically to show density. (C) SHAP dependence plot. Each dependence plot shows how a single feature affects the output of the prediction model, and each dot represents a single patient. The SHAP values for specific features exceeding zero push the decision towards the “death” class. SHAP: SHapley Additive exPlanations; PAD, pulmonary artery diameter; cTnT, cardiac troponin T; NEUT%, neutrophil percentage; PV, pulmonary valve flow velocity; DD, D-dimer; Vmax, maximum tricuspid regurgitation velocity; NT-proBNP, N-terminal prohormone of brain natriuretic peptide; EF, ejection fraction; RAD, right atrial diameter.
The waterfall plot (Fig. 6A, B) shows a patient who survived hospitalization and displays the actual measured values of the features. According to the prediction model, the decision for this case leaned towards “non-death” with a probability of 90.3% and “death” with a probability of 9.7%. We observed a similar pattern for a patient who died during hospitalization (Fig. 6C, D), which also revealed the features pushing or pulling the decision towards the “death” class and their actual measured values. Figure 6C represents this patient in the “non-death” class with a probability of 2.4%, and Fig. 6D represents this patient in the “death” class with a probability of 97.6%. Additionally, in the force plot (Fig. 6E), the x-axis represents each patient and the y-axis represents the contributions of the features; an increased red part for an individual patient represents a greater probability of “death.”
Local model explanation by the SHAP method. (A–D) Waterfall plot and evolution of risks contributed by each feature for individual patient at low (A–B) or high (C–D) risk of developing hospitalization death: A and C represented the individual patient towards the “non-death” class, and B and D represented the individual patient towards the “death” class. (E) Force plot for the validation set. Each patient was represented by the x-axis, while the features’ contributions were represented by the y-axis: an increased red part for each individual patient represented a greater probability towards the decision of “death”. SHAP: SHapley Additive explanation; PAD, pulmonary artery diameter; DD, D-dimer; Vmax, maximum tricuspid regurgitation velocity; PV, pulmonary valve flow velocity; NT-proBNP, N-terminal prohormone of brain natriuretic peptide; EF, ejection fraction; cTnT, cardiac troponin T; RAD, right atrial diameter; NEUT%, neutrophil percentage.
Discussion
To our knowledge, this is the first retrospective, multicentre cohort study to develop and validate a clinical prediction model for hospitalization death in hospitalized patients with AECOPD, primarily utilizing readily available clinical and laboratory data, with a focus on echocardiographic variables, via ML algorithms. We compared the baseline characteristics of hospitalized patients with AECOPD and identified a set of predictive variables to establish a prediction model. The established prediction model was accurate in both the training and validation cohorts and could be used to assist clinicians in early, accurate identification and personalized treatment of AECOPD.
Currently, there is a lack of consensus regarding the definition of AECOPD; in most international guidelines, the diagnosis of an exacerbation relies exclusively on the clinical presentation of a patient complaining of an acute change in symptoms33. Clinical scoring systems for COPD, such as the mMRC score and CAT questionnaire, have been widely used to assess the severity and predict outcomes in patients with AECOPD10,11. Additionally, various scoring systems have been proposed to classify the risk of clinical deterioration or mortality in hospitalized patients with AECOPD3,34,35. However, current scoring systems lack objective measurements. With the advancement of ML, clinicians and researchers are now able to address complex and extensive data, turning high volumes of data into feasible models to improve their ability to diagnose diseases. Moreover, sophisticated ML algorithms combined with EMR data can facilitate the development of clinical prediction models36. Among the 11 ML models, the LightGBM model had the best AUC value, retained a good net benefit across a broad range of threshold probabilities after feature reduction, and has the potential to identify the risk of hospitalization death in hospitalized patients with AECOPD. Advanced algorithms such as light gradient boosting trees analyse and interpret massive amounts of patient data to improve forecasts, early detection, and personalized treatment strategies. The algorithm learns patterns and relationships in data by training an ML model on a dataset with known outcomes. LightGBM is a gradient boosting implementation that improves training efficiency and prediction accuracy on large datasets37. Several studies have shown that the LightGBM method has excellent predictive value in the field of medicine37,38,39. In this study, we employed the LightGBM algorithm to develop a final model with 9 features. These features, which are easily obtained or evaluated, make this model promising as an early discriminative tool for assessing disease severity, guiding further examinations and aiding in treatment decisions.
More features may provide more information for the prediction model; however, noncausal features may reduce the accuracy of the prediction, and a large number of features may limit the clinical use of the model. In addition, there is a lack of guidelines or consensus for selecting features for the prediction model40. Although there is no evidence demonstrating how many features should be included in the model, the SHAP method was employed to assist feature selection. Our final model was established through a comparison of 11 ML models and feature reduction. In addition, it performed well in predicting mortality in COPD patients.
Compared with traditional single markers, the final model we developed had superior predictive ability. Blood eosinophil counts are widely studied, but few studies have examined the prognostic value of blood neutrophil counts in patients with AECOPD. In fact, only a minority of COPD patients exhibit eosinophilia; COPD is predominantly characterized as a neutrophilic inflammatory disorder41,42. Increased neutrophil counts reflect systemic inflammation, which can exacerbate respiratory symptoms and lead to multi-organ dysfunction, contributing to higher mortality risk in AECOPD. A recent study demonstrated that high blood neutrophil counts may be useful indicators of the risk of exacerbations and mortality in COPD patients43, which is consistent with our findings. Furthermore, activation of the blood coagulation system is a common observation in inflammatory diseases and is increased during COPD exacerbations. Researchers have confirmed that coagulation markers are potential predictors of later COPD exacerbation and mortality, and that higher DD levels in stable COPD patients predict higher mortality44. Elevated DD levels indicate a hypercoagulable state, common in AECOPD patients due to inflammation and hypoxia, that can lead to pulmonary emboli formation, worsening respiratory function, and increased death risk. Thus, the correlation of DD with the poor prognosis of COPD patients is well recognized, and DD was accordingly used in our model development. Although the mechanism linking AECOPD and CVD has not been elucidated, some studies have shown that AECOPD increases the risk for subsequent CVD45,46,47, which is one of the causes of death in COPD patients hospitalized for an exacerbation. The elevation of cTnT and NT-proBNP levels suggests myocardial injury and cardiac stress or dysfunction in AECOPD patients, increasing death risk via impaired cardiac function and predicting poor outcomes. Including the features of cTnT and NT-proBNP was beneficial for strengthening the predictive ability of the final model due to their critical significance in CVD diagnoses. Interestingly, the final model also included PAD, PV, Vmax, EF, and RAD. Although there is no evidence demonstrating their ability to independently predict AECOPD, these echocardiographic variables may be associated with an increased risk of AECOPD. Indeed, increased PAD, PV, and Vmax indicate right heart problems such as pulmonary hypertension and right ventricular dysfunction in AECOPD patients, leading to strain, failure, and increased mortality risk48. Reduced EF and enlarged RAD signal left ventricular dysfunction and right atrial dilation in AECOPD patients, worsening respiratory failure and contributing to higher mortality risk. Many studies have shown that the presence of echocardiographic abnormalities in the general population as well as in those with cardiovascular risk factors has greater prognostic value than clinical factors alone in predicting the risk of stroke, sudden cardiac death, cardiovascular morbidity and mortality49,50. Similarly, our study revealed that echocardiographic abnormalities may be able to identify a greater risk of hospitalization death in hospitalized patients with AECOPD. Therefore, these clinical variables could contribute to the final model, and their combination may be superior to a single marker in predicting the risk of hospitalization death in hospitalized patients with AECOPD.
Owing to the lack of explanation of how predictions are derived, the ML technique has been described as a “black box”. To the best of our knowledge, clinicians are hesitant to use ML to make medical decisions because they believe that it is based on opaque information. This highlights another advantage of this study: we utilized the SHAP approach to explain the “black-box” of ML models. The SHAP method could explain this model via a global explanation that describes the overall functionality of a model and a local explanation that details how a certain prediction is made for a hospitalized patient with AECOPD by inputting the individualized data. Moreover, with a convenient tool based on the Streamlit framework, this prediction model can be used on a webpage and shared with more clinicians. Consequently, the model with the SHAP method can be useful for assessing high-risk patients with AECOPD and may assist clinicians in directing personalized AECOPD strategies in an understandable manner.
While our study showed promising results, it is important to acknowledge its limitations. First, this final explainable model was not validated in external validation cohorts. In the future, we intend to include more patients in prospective multicentre studies to validate the model’s performance. Second, our model was developed using data mainly from Chinese patients, and whether the model performs well in various global populations remains unclear, necessitating further validation in diverse racial groups to ensure its generalizability across various clinical settings. Third, because we included only patients with complete echocardiographic data, 30.05% of patients died during hospitalization, which may bias this study towards positive results. We will continue to refine our model and consider incorporating additional data sources in future studies to further improve its generalizability and applicability. Fourth, as is common in hospitalized patients with AECOPD, pulmonary function tests were unavailable for most patients in this study because of their critical condition. This made it difficult to incorporate pulmonary function data into the prediction model. Nevertheless, the cost of prediction and medical expenses can be reduced to a large extent for patients. Fifth, our current dataset does not include information on comorbidity indices or frailty scores; we suggest that future studies consider these variables to further refine and validate predictive models. Sixth, we encountered data sparsity in several variables; future work should focus on improving data collection processes and ensuring more complete datasets to enhance model robustness. In addition, the real-time application of our ML model in clinical settings presents several challenges, which we plan to explore in future work in collaboration with more clinicians to develop practical solutions for real-time deployment. Moreover, our study has a class imbalance issue, where the number of patients who died during hospitalization is significantly lower than that of those who survived; we will continue to explore advanced techniques for handling class imbalance, such as synthetic data generation and ensemble methods, to improve the model’s performance and ensure balanced prediction accuracy across classes. Seventh, data drift and changes in clinical practice are potential issues; future efforts will focus on data collection and monitoring, regular model retraining, incorporation of additional variables and adaptation to clinical practice. Finally, hospitalized patients with AECOPD frequently have more than one disease, and because there are many disease categories in clinical settings, incorporating the disease category into the prediction model is difficult. Thus, we did not include the disease category in the model.
In conclusion, we successfully developed an ML model that can predict the risk of hospitalization death in hospitalized patients with AECOPD early, exclusively utilizing clinically available data, especially echocardiographic variables. With high accuracy and reliability, the final LightGBM model can guide clinicians to take appropriate preventive measures to achieve personalized treatment and improve the clinical prognosis of high-risk patients with AECOPD.
Methods
Study population
This multicentre, retrospective study included AECOPD patients admitted to the hospital for the derivation and validation of the prediction model. The cohort consisted of hospitalized patients with AECOPD from the Department of Respiratory and Critical Care Medicine of two separate tertiary hospitals, the Second Affiliated Hospital of Xi’an Jiaotong University (between April 2017 and August 2023), and the First Affiliated Hospital of Guangzhou Medical University (between February 2012 and July 2023). The diagnosis of COPD was based on the criteria established by the Global Initiative for Chronic Obstructive Lung Disease (GOLD): forced spirometry showing the presence of a post-bronchodilator FEV1/FVC < 0.7 in any patient who has dyspnea, chronic cough or sputum production, a history of recurrent lower respiratory tract infections and/or a history of exposure to risk factors for the disease. According to the GOLD guidelines, AECOPD is defined on the basis of patient symptoms: acute deterioration of respiratory symptoms necessitating additional treatment11. Medical records were reviewed by respiratory physicians to confirm AECOPD and exclude alternative diagnoses. Adult patients who underwent transthoracic echocardiography were included. The exclusion criteria were as follows: (1) complicated with various diseases such as cardiovascular and cerebrovascular diseases, kidney diseases, autoimmune diseases and tumors; (2) insufficient clinical and laboratory data.
Ethics statement
This study was performed in accordance with the Declaration of Helsinki and was approved by the Institutional Ethics Review Board of the Second Affiliated Hospital of Xi’an Jiaotong University (Ethics Approval NO.186 in 2024). Owing to the retrospective nature of the study, the Institutional Ethics Review Board of the Second Affiliated Hospital of Xi’an Jiaotong University waived the need for informed consent.
Data collection and processing
We utilized the Big Data Platform for Respiratory Medicine, an integrated database of the EMR system, to construct prediction models. All the staff involved participated in training sessions on data extraction. We extracted clinical and laboratory data, including demographic characteristics, vital sign measurements, blood test parameters and echocardiographic variables. All data of eligible patients obtained from the database were systematically extracted into standardized forms and meticulously reviewed.
In the following analyses, parameters with more than 30% missing values were excluded to minimize the bias resulting from missing data, and the remaining missing values were imputed via the K-nearest neighbour (KNN) algorithm. Given that multicollinearity among parameters may affect prediction accuracy, when two parameters were highly correlated in Spearman’s correlation analyses (correlation coefficient > 0.6), we eliminated the one that was less correlated with the outcome. Finally, 28 variables, including sex, age, smoking status, systolic blood pressure (SBP), diastolic blood pressure (DBP), neutrophil percentage (NEUT%), eosinophil percentage (EOS%), monocyte percentage (MONO%), eosinophil (EO), monocyte (MONO), lymphocyte (LYMP), platelet (PLT), haematocrit (HCT), mean platelet volume (MPV), red cell distribution width (RDW), albumin (ALB), N-terminal prohormone of brain natriuretic peptide (NT-proBNP), cardiac troponin T (cTnT), prothrombin time activity (PTA), thrombin time (TT), D-dimer (DD), right ventricle diameter (RVD), right atrial diameter (RAD), pulmonary artery diameter (PAD), pulmonary valve (PV) flow velocity, maximum tricuspid regurgitation velocity (Vmax), pulmonary arterial systolic pressure (PASP), and ejection fraction (EF), were utilized in machine learning algorithms to generate the prediction models.
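As a rough illustration of the variable screening described above, the following Python sketch drops columns with more than 30% missing values and then, for each pair of columns whose absolute Spearman correlation exceeds 0.6, discards the member that is less correlated with the outcome. The column names, toy values and helper functions (`drop_sparse`, `prune_correlated`) are hypothetical and are not taken from the study. KNN imputation is omitted, so the sketch assumes that the retained columns are already complete and non-constant.

```python
import math

def missing_fraction(column):
    """Fraction of entries in a column that are missing (None)."""
    return sum(v is None for v in column) / len(column)

def drop_sparse(table, max_missing=0.30):
    """Keep only columns whose missing fraction does not exceed max_missing."""
    return {name: col for name, col in table.items()
            if missing_fraction(col) <= max_missing}

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

def prune_correlated(table, outcome, threshold=0.6):
    """For each highly correlated pair, drop the column less correlated with outcome."""
    names = list(table)
    kept = set(names)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if a in kept and b in kept and abs(spearman(table[a], table[b])) > threshold:
                weaker = min((a, b), key=lambda n: abs(spearman(table[n], outcome)))
                kept.discard(weaker)
    return [n for n in names if n in kept]

# Hypothetical toy data: "ddim" is 50% missing; "rvd" and "rad" are highly correlated.
raw = {
    "rvd":  [30, 34, 41, 45, 38, 50],
    "rad":  [31, 36, 47, 40, 39, 52],
    "age":  [70, 62, 81, 66, 75, 68],
    "ddim": [None, None, 1.2, None, 0.4, 0.9],
}
outcome = [0, 0, 1, 0, 0, 1]

dense = drop_sparse(raw)                     # removes "ddim"
selected = prune_correlated(dense, outcome)  # keeps "rad" over "rvd"
print(selected)
```

In this toy table, `rvd` and `rad` are strongly rank-correlated, and `rad` tracks the outcome slightly better, so `rvd` is the one removed. In a real pipeline the order of imputation and correlation filtering, as well as the handling of ties, would need to follow the study protocol.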
Model development and comparison
To guard against overfitting, the pooled cohort from the two independent tertiary hospitals was divided, with 70% of patients used as the training cohort and 30% as the validation cohort.
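A 70/30 division of this kind can be sketched as follows. The text does not state how patients were assigned to the two cohorts, so the random, patient-level shuffle with a fixed seed used here is an assumption, and `split_cohort` is a hypothetical helper name.

```python
import random

def split_cohort(patient_ids, train_fraction=0.70, seed=0):
    """Shuffle patient identifiers reproducibly and split them into two cohorts."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)          # seeded for reproducibility
    n_train = round(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]

# Hypothetical identifiers 0..99 standing in for patients.
train_ids, valid_ids = split_cohort(range(100))
print(len(train_ids), len(valid_ids))
```

Splitting at the patient level, rather than at the level of individual admissions or measurements, keeps all records of one patient in the same cohort.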
A total of 28 variables mentioned above were used to develop the prediction models. Eleven ML models, namely, eXtreme gradient boosting (XGBoost), adaptive boosting (AdaBoost), light gradient boosting machine (LightGBM), gradient boosting machine (GBM), random forest (RF), logistic regression (LR), K-nearest neighbour (KNN), decision tree (DT), artificial neural network (ANN), extra tree (ET), and support vector machine (SVM), were used to predict the risk of hospitalization death in hospitalized patients with AECOPD. XGBoost is an ensemble learning model based on the gradient boosting algorithm; it performs well in terms of efficiency and accuracy when processing structured data and large-scale datasets51. Based on the classification error rate, AdaBoost adjusts the sample weights so that samples with higher classification errors receive more attention, and the final classification result is determined through weighted voting52,53. LightGBM is an ensemble model that uses decision trees as weak learners. It uses a technique called histogram-based binning and learns more efficiently than other algorithms do54. The GBM is a gradient descent-based formulation of boosting methods. It can be considered an optimization model that trains a series of weak-learner models, which sequentially minimize a predefined loss function55. RF is an ensemble classifier that aggregates the results of multiple single decision trees into one result; it is widely used as a classification model that compensates for the tendency of a single decision tree to overfit and reduces variance56,57. LR is a learning algorithm with a logistic function at its core. By assessing the relationship between a dependent variable and one or more independent variables, it derives classification probabilities via the logistic function58. 
KNN is an instance-based learning algorithm that is suitable for problems with multicategory classification and nonlinear decision boundaries. It measures the distance between samples and classifies a new sample into the most common category among its K nearest neighbours59. A DT model, with its clear cut-off points, is a user-friendly tool that can significantly aid treatment decision-making in clinical settings60. ANNs are effective tools for nonlinear multivariate modelling and are capable of learning trends in historical data61. ET is a variant of RF that increases randomness by drawing a random split point for each candidate feature at a node and then choosing the best of these random splits62. SVM is a trainable learning method based on statistical learning theory. It is designed to generalize well from limited training samples and classifies samples by finding the separating boundary with the largest margin between classes63.
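To make one of these algorithms concrete, the KNN rule described above (measure distances, then take the most common category among the K nearest neighbours) can be sketched in a few lines of Python. This is a minimal illustration on invented two-feature data, not the implementation used in the study; the labels, coordinates and the function name `knn_predict` are hypothetical.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_X, train_y, query, k=3):
    """Return the majority label among the k training samples nearest to query."""
    neighbours = sorted(zip(train_X, train_y),
                        key=lambda pair: euclidean(pair[0], query))
    votes = Counter(label for _, label in neighbours[:k])
    return votes.most_common(1)[0][0]

# Hypothetical, already-scaled feature vectors with their outcome labels.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (3.0, 3.2), (3.1, 2.9)]
train_y = ["survived", "survived", "survived", "died", "died"]

print(knn_predict(train_X, train_y, (2.9, 3.0)))  # two "died" neighbours outvote one "survived"
print(knn_predict(train_X, train_y, (1.0, 0.9)))
```

Because the rule depends directly on distances, features measured on very different scales would normally be standardized before KNN is applied.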
The performance of the ML models was assessed by the area under the receiver operating characteristic (ROC) curve (AUC), sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), accuracy, and F1 score. Furthermore, decision curve analysis (DCA), precision–recall (P–R) curve analysis and calibration curve analysis were used to assess the diagnostic performance of the model. In addition, to avoid overfitting, five-fold and ten-fold cross-validation were used in the validation of the prediction model.
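The threshold-based metrics listed above all follow from the four cells of a confusion matrix, and the AUC can be computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The Python sketch below illustrates these definitions on invented labels and scores (1 = died, 0 = survived); the data, the 0.5 threshold and the function names are assumptions made for illustration only.

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, false positives, true negatives and false negatives."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, fp, tn, fn

def classification_metrics(tp, fp, tn, fn):
    """Threshold-based metrics; assumes every denominator is non-zero."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "ppv": ppv, "npv": npv, "accuracy": accuracy, "f1": f1}

def auc(y_true, scores):
    """AUC as the fraction of positive-negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.35, 0.7, 0.3, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(*confusion_counts(y_true, y_pred))
print(metrics, auc(y_true, scores))
```

Note that the AUC is independent of the chosen threshold, whereas sensitivity, specificity, PPV, NPV, accuracy and F1 score all change when the threshold moves.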
Model explanation
The SHAP score64 provides a quantitative description of the overall relationship between hospitalization death and each feature in the initial model (premodel). We calculated SHAP scores to assess the importance of each feature and then reduced the features of the prediction model in order of importance. Starting with the most important feature, we added features one at a time in accordance with the feature importance rank. We compared the resulting AUCs via R software version 4.0.3 and ultimately selected the number of features that generated the highest AUC across the feature inclusion iterations. The final model with the best predictive ability was chosen for further analysis. In addition, the SHAP method offered both global and local explanations of the model.
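The idea behind SHAP can be illustrated with an exact, brute-force Shapley computation on a tiny model. The Python sketch below is not the tree-specific algorithm of the SHAP library cited above; it enumerates all feature subsets, replaces "absent" features by a single baseline value (an assumption made for simplicity), and then ranks features by their mean absolute Shapley value. The toy model, the instances and the function names are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, instance, baseline):
    """Exact Shapley value of each feature for one instance (local explanation)."""
    n = len(instance)

    def value(subset):
        # Features in `subset` take the instance's value; the rest take the baseline.
        mixed = [instance[i] if i in subset else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def global_importance(predict, instances, baseline):
    """Mean absolute Shapley value per feature and the resulting importance rank."""
    n = len(baseline)
    totals = [0.0] * n
    for instance in instances:
        for i, phi in enumerate(shapley_values(predict, instance, baseline)):
            totals[i] += abs(phi)
    mean_abs = [t / len(instances) for t in totals]
    ranking = sorted(range(n), key=lambda i: mean_abs[i], reverse=True)
    return mean_abs, ranking

def toy_model(x):
    """Hypothetical risk score with a linear part and one interaction term."""
    return 0.5 * x[0] + 0.2 * x[1] * x[2] + 0.1 * x[2]

baseline = [0, 0, 0]
instances = [[2, 1, 3], [1, 0, 0], [0, 2, 2]]

local = shapley_values(toy_model, instances[0], baseline)
mean_abs, ranking = global_importance(toy_model, instances, baseline)
print(local, ranking)
```

The local values for one instance sum to the difference between the model output for that instance and the output at the baseline, which is what allows an individual prediction to be decomposed feature by feature. The global rank obtained from the mean absolute values is the kind of ordering that a feature-reduction step can then follow. The brute-force enumeration grows exponentially with the number of features and is practical only for very small examples.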
Statistical analysis
R software version 4.0.3 and Python version 3.6.5 were used to conduct the data analyses. Patient data are presented as continuous or categorical variables. The Shapiro‒Wilk test was used to assess whether the data followed a normal distribution. Continuous predictors with skewed distributions are described as medians with interquartile ranges and were compared via the Mann–Whitney U test or Kruskal–Wallis H test, whereas categorical predictors are described as counts with percentages and were compared via the chi-square test or Fisher’s exact test. The AUC was used to evaluate the predictive power, and the optimal cut-off was estimated via the Youden index (sensitivity + specificity − 1). DCA, P–R and calibration curve analyses were conducted with R software version 4.0.3. All the statistical tests were two-sided, with a significance level of P < 0.05.
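The cut-off selection via the Youden index can be sketched as follows: each observed score is tried as a candidate cut-off, and the one that maximizes sensitivity + specificity − 1 is kept. The labels and scores are invented (1 = died, 0 = survived), the rule "predict positive when the score is at or above the cut-off" is an assumption, and `youden_cutoff` is a hypothetical helper name.

```python
def youden_cutoff(y_true, scores):
    """Return the cut-off maximizing J = sensitivity + specificity - 1, and that J."""
    n_pos = sum(t == 1 for t in y_true)
    n_neg = sum(t == 0 for t in y_true)
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(scores)):
        # A case is predicted positive when its score is at or above the cut-off.
        tp = sum(t == 1 and s >= cutoff for t, s in zip(y_true, scores))
        tn = sum(t == 0 and s < cutoff for t, s in zip(y_true, scores))
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:  # strict inequality keeps the lowest cut-off among ties
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j

# Hypothetical labels and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.35, 0.7, 0.3, 0.2, 0.1, 0.05]

cutoff, j_stat = youden_cutoff(y_true, scores)
print(cutoff, j_stat)
```

In this toy example the best cut-off is 0.35, where all three positive cases are detected and four of the five negative cases are correctly excluded.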
Finally, a web-based tool was developed to facilitate the use of the model in clinical scenarios. The web application is accessible online at https://winxwqz6savjnnzmfdkwgl.streamlit.app/.
Data availability
The datasets used and analyzed during the present study are available from the corresponding author on reasonable request.
References
Agustí, A. et al. Global initiative for chronic obstructive lung disease 2023 report: GOLD executive summary. Am. J. Respir. Crit. Care Med. 207, 819–837. https://doi.org/10.1164/rccm.202301-0106PP (2023).
Hurst, J. R. et al. Understanding the impact of chronic obstructive pulmonary disease exacerbations on patient health and quality of life. Eur. J. Intern. Med. 73, 1–6. https://doi.org/10.1016/j.ejim.2019.12.014 (2020).
Celli, B. R. et al. An updated definition and severity classification of chronic obstructive pulmonary disease exacerbations: The Rome proposal. Am. J. Respir. Crit. Care Med. 204, 1251–1258. https://doi.org/10.1164/rccm.202108-1819PP (2021).
Ko, F. W. et al. Acute exacerbation of COPD. Respirology Carlton Vic. 21, 1152–1165. https://doi.org/10.1111/resp.12780 (2016).
Foo, J. et al. Continuing to confront COPD international patient survey: Economic impact of COPD in 12 countries. PLoS ONE 11, e0152618. https://doi.org/10.1371/journal.pone.0152618 (2016).
Gudmundsson, G. et al. Long-term survival in patients hospitalized for chronic obstructive pulmonary disease: A prospective observational study in the Nordic countries. Int. J. Chron. Obstruct. Pulmon. Dis. 7, 571–576. https://doi.org/10.2147/copd.S34466 (2012).
Eriksen, N. & Vestbo, J. Management and survival of patients admitted with an exacerbation of COPD: Comparison of two Danish patient cohorts. Clin. Respir. J. 4, 208–214. https://doi.org/10.1111/j.1752-699X.2009.00177.x (2010).
Groenewegen, K. H., Schols, A. M. & Wouters, E. F. Mortality and mortality-related factors after hospitalization for acute exacerbation of COPD. Chest 124, 459–467. https://doi.org/10.1378/chest.124.2.459 (2003).
Bond, E. G. et al. Understanding resource utilization and mortality in COPD to support policy making: A microsimulation study. PLoS ONE 15, e0236559. https://doi.org/10.1371/journal.pone.0236559 (2020).
Agustí, A. et al. Global initiative for chronic obstructive lung disease 2023 report: GOLD executive summary. Eur. Respir. J. https://doi.org/10.1183/13993003.00239-2023 (2023).
Singh, D. et al. Global strategy for the diagnosis, management, and prevention of chronic obstructive lung disease: The GOLD science committee report 2019. Eur. Respir. J. https://doi.org/10.1183/13993003.00164-2019 (2019).
Ruiying, W., Zhaoyun, & Jianying, X. Clinical features and three-year prognosis of AECOPD patients with different levels of blood eosinophils. Heart Lung J. Crit. Care 56, 29–39. https://doi.org/10.1016/j.hrtlng.2022.05.012 (2022).
Yu, S., Zhang, J., Fang, Q. & Tong, Z. Blood eosinophil levels and prognosis of hospitalized patients with acute exacerbation of chronic obstructive pulmonary disease. Am. J. Med. Sci. 362, 56–62. https://doi.org/10.1016/j.amjms.2021.02.013 (2021).
Fawzy, A. et al. Association of platelet count with all-cause mortality and risk of cardiovascular and respiratory morbidity in stable COPD. Respir. Res. 20, 86. https://doi.org/10.1186/s12931-019-1059-1 (2019).
Harrison, M. T. et al. Thrombocytosis is associated with increased short and long term mortality after exacerbation of chronic obstructive pulmonary disease: A role for antiplatelet therapy?. Thorax 69, 609–615. https://doi.org/10.1136/thoraxjnl-2013-203996 (2014).
Hu, Y., Long, H., Cao, Y. & Guo, Y. Prognostic value of lymphocyte count for in-hospital mortality in patients with severe AECOPD. BMC Pulm. Med. 22, 376. https://doi.org/10.1186/s12890-022-02137-1 (2022).
Li, H. et al. C-reactive protein to serum albumin ratio as a novel biomarker to predict prognosis in patients with chronic obstructive pulmonary disease. Clin. Labor. https://doi.org/10.7754/Clin.Lab.2020.200630 (2021).
Mohan, M., Parthasarathi, A., Siddaiah, J. B., Mahesh, P. A. & SK, C. Fibrinogen: A feasible biomarker in identifying the severity and acute exacerbation of chronic obstructive pulmonary disease. Cureus 13, e16864. https://doi.org/10.7759/cureus.16864 (2021).
Sun, W. et al. Fibrinogen, a promising marker to evaluate severity and prognosis of acute exacerbation of chronic obstructive pulmonary disease: A retrospective observational study. Int. J. Chron. Obstruct. Pulmon. Dis. 17, 1299–1310. https://doi.org/10.2147/copd.S361929 (2022).
Holland, M., Alkhalil, M., Chandromouli, S., Janjua, A. & Babores, M. Eosinopenia as a marker of mortality and length of stay in patients admitted with exacerbations of chronic obstructive pulmonary disease. Respirology Carlton Vic. 15, 165–167. https://doi.org/10.1111/j.1440-1843.2009.01651.x (2010).
MacDonald, M. I. et al. Low and high blood eosinophil counts as biomarkers in hospitalized acute exacerbations of COPD. Chest 156, 92–100. https://doi.org/10.1016/j.chest.2019.02.406 (2019).
Andrijevic, I. et al. N-terminal prohormone of brain natriuretic peptide (NT-proBNP) as a diagnostic biomarker of left ventricular systolic dysfunction in patients with acute exacerbation of chronic obstructive pulmonary disease (AECOPD). Lung 196, 583–590. https://doi.org/10.1007/s00408-018-0137-3 (2018).
Rui, F. et al. Development of a machine learning-based model to predict hepatic inflammation in chronic hepatitis B patients with concurrent hepatic steatosis: A cohort study. EClinicalMedicine 68, 102419. https://doi.org/10.1016/j.eclinm.2023.102419 (2024).
Wu, C. T. et al. Acute Exacerbation of a chronic obstructive pulmonary disease prediction system using wearable device data, machine learning, and deep learning: Development and cohort study. JMIR Mhealth Uhealth 9, e22591. https://doi.org/10.2196/22591 (2021).
Yin, H. et al. A machine learning model for predicting acute exacerbation of in-home chronic obstructive pulmonary disease patients. Comput. Methods Programs Biomed. 246, 108005. https://doi.org/10.1016/j.cmpb.2023.108005 (2024).
Wang, C. et al. Comparison of machine learning algorithms for the identification of acute exacerbations in chronic obstructive pulmonary disease. Comput. Methods Programs Biomed. 188, 105267. https://doi.org/10.1016/j.cmpb.2019.105267 (2020).
Kor, C. T. et al. Explainable machine learning model for predicting first-time acute exacerbation in patients with chronic obstructive pulmonary disease. J. Pers. Med. https://doi.org/10.3390/jpm12020228 (2022).
Saria, S., Butte, A. & Sheikh, A. Better medicine through machine learning: What’s real, and what’s artificial?. PLoS Med 15, e1002721. https://doi.org/10.1371/journal.pmed.1002721 (2018).
Motwani, M. et al. Machine learning for prediction of all-cause mortality in patients with suspected coronary artery disease: A 5-year multicentre prospective registry analysis. Eur. Heart J. 38, 500–507. https://doi.org/10.1093/eurheartj/ehw188 (2017).
Dawes, T. J. W. et al. Machine learning of three-dimensional right ventricular motion enables outcome prediction in pulmonary hypertension: A cardiac MR imaging study. Radiology 283, 381–390. https://doi.org/10.1148/radiol.2016161315 (2017).
Azodi, C. B., Tang, J. & Shiu, S. H. Opening the black box: Interpretable machine learning for geneticists. Trends Genet. 36, 442–455. https://doi.org/10.1016/j.tig.2020.03.005 (2020).
Hu, J. et al. Identification and validation of an explainable prediction model of acute kidney injury with prognostic implications in critically ill children: A prospective multicenter cohort study. EClinicalMedicine 68, 102409. https://doi.org/10.1016/j.eclinm.2023.102409 (2024).
Montes de Oca, M. & Laucho-Contreras, M. E. Is it time to change the definition of acute exacerbation of chronic obstructive pulmonary disease? What do we need to add?. Med. Sci. https://doi.org/10.3390/medsci6020050 (2018).
Zhang, J. et al. A simple clinical risk score (ABCDMP) for predicting mortality in patients with AECOPD and cardiovascular diseases. Respir. Res. 25, 89. https://doi.org/10.1186/s12931-024-02704-6 (2024).
Gomes, L., Pereira, S., Sousa-Pinto, B. & Rodrigues, C. Performance of risk scores in patients with acute exacerbations of COPD. Jornal brasileiro de pneumologia : publicacao oficial da Sociedade Brasileira de Pneumologia e Tisilogia 49, e20230032. https://doi.org/10.36416/1806-3756/e20230032 (2023).
Goecks, J., Jalili, V., Heiser, L. M. & Gray, J. W. How machine learning will transform biomedicine. Cell 181, 92–101. https://doi.org/10.1016/j.cell.2020.03.022 (2020).
Ramalingam, K. et al. Light gradient boosting-based prediction of quality of life among oral cancer-treated patients. BMC Oral Health 24, 349. https://doi.org/10.1186/s12903-024-04050-x (2024).
Wang, X. et al. Unraveling variations and enhancing prediction of successful sphincter-preserving resection for low rectal cancer: A post hoc analysis of the multicentre LASRE randomized clinical trial. Int. J. Surg. 110, 4031–4042. https://doi.org/10.1097/js9.0000000000001014 (2024).
Park, Y. W. et al. A fully automatic multiparametric radiomics model for differentiation of adult pilocytic astrocytomas from high-grade gliomas. Eur. Radiol. 32, 4500–4509. https://doi.org/10.1007/s00330-022-08575-z (2022).
Li, Y., Sperrin, M., Ashcroft, D. M. & van Staa, T. P. Consistency of variety of machine learning and statistical models in predicting clinical risks of individual patients: Longitudinal cohort study using cardiovascular disease as exemplar. BMJ 371, m3919. https://doi.org/10.1136/bmj.m3919 (2020).
Cowburn, A. S., Condliffe, A. M., Farahi, N., Summers, C. & Chilvers, E. R. Advances in neutrophil biology: Clinical implications. Chest 134, 606–612. https://doi.org/10.1378/chest.08-0422 (2008).
Stockley, J. A., Walton, G. M., Lord, J. M. & Sapey, E. Aberrant neutrophil functions in stable chronic obstructive pulmonary disease: The neutrophil as an immunotherapeutic target. Int. Immunopharmacol. 17, 1211–1217. https://doi.org/10.1016/j.intimp.2013.05.035 (2013).
Lonergan, M. et al. Blood neutrophil counts are associated with exacerbation frequency and mortality in COPD. Respir. Res. 21, 166. https://doi.org/10.1186/s12931-020-01436-7 (2020).
Husebø, G. R. et al. Coagulation markers as predictors for clinical events in COPD. Respirology Carlton, Vic. 26, 342–351 (2021).
Donaldson, G. C., Hurst, J. R., Smith, C. J., Hubbard, R. B. & Wedzicha, J. A. Increased risk of myocardial infarction and stroke following exacerbation of COPD. Chest 137, 1091–1097. https://doi.org/10.1378/chest.09-2029 (2010).
Halpin, D. M. et al. Risk of nonlower respiratory serious adverse events following COPD exacerbations in the 4-year UPLIFT® trial. Lung 189, 261–268. https://doi.org/10.1007/s00408-011-9301-8 (2011).
Kunisaki, K. M. et al. Exacerbations of chronic obstructive pulmonary disease and cardiac events. A post hoc cohort analysis from the SUMMIT randomized clinical trial. Am. J. Respir. Crit. Care Med. 198, 51–57. https://doi.org/10.1164/rccm.201711-2239OC (2018).
Kovacs, G. et al. Severe pulmonary hypertension in COPD: Impact on survival and diagnostic approach. Chest 162, 202–212. https://doi.org/10.1016/j.chest.2022.01.031 (2022).
Konety, S. H. et al. Echocardiographic predictors of sudden cardiac death: The atherosclerosis risk in communities study and cardiovascular health study. Circ. Cardiovasc. Imaging https://doi.org/10.1161/circimaging.115.004431 (2016).
Lundorff, I. et al. Echocardiographic predictors of cardiovascular morbidity and mortality in women from the general population. Eur. Heart J. Cardiovasc. Imaging 22, 1026–1034. https://doi.org/10.1093/ehjci/jeaa167 (2021).
Ester, M., Kriegel, H. P. & Xu, X. XGBoost: A scalable tree boosting system. In: Proceedings of the 22Nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (vol, pp 785, 2016). Geographical Analysis, https://doi.org/10.1111/gean.12315 (2022).
Freund, Y. & Schapire, R. E. A decision-theoretic generalization of on-line learning and an application to boosting. Computational Learning Theory. Second European Conference, EuroCOLT ‘95. Proceedings, pp. 23–37 (1995).
Avidan, S. Ensemble tracking. IEEE Trans. Pattern Anal. Mach. Intell. 29, 261–271. https://doi.org/10.1109/tpami.2007.35 (2007).
Ke, G. L. et al. in 31st Annual Conference on Neural Information Processing Systems (NIPS). (2017).
Friedman, J. H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 29, 1189–1232. https://doi.org/10.1214/aos/1013203451 (2001).
Chang, D. et al. Machine learning models are superior to noninvasive tests in identifying clinically significant stages of NAFLD and NAFLD-related cirrhosis. Hepatology 77, 546–557. https://doi.org/10.1002/hep.32655 (2023).
Zhang, K. et al. Machine learning-reinforced noninvasive biosensors for healthcare. Adv. Healthc. Mater. 10, e2100734. https://doi.org/10.1002/adhm.202100734 (2021).
Jiang, X., Zhang, Y., Li, Y. & Zhang, B. Forecast and analysis of aircraft passenger satisfaction based on RF-RFE-LR model. Sci. Rep. 12, 11174. https://doi.org/10.1038/s41598-022-14566-3 (2022).
D’Amato, M. et al. A machine learning approach to characterize patients with asthma exacerbation attending an acute care setting. Eur. J. Intern. Med. 104, 66–72. https://doi.org/10.1016/j.ejim.2022.07.019 (2022).
Wang, K. et al. A decision tree model to help treatment decision-making for severe spontaneous intracerebral hemorrhage. Int. J. Surg. 110, 788–798. https://doi.org/10.1097/js9.0000000000000852 (2024).
Salari, M., Nikoo, M. R., Al-Mamun, A., Rakhshandehroo, G. R. & Mooselu, M. G. Optimizing Fenton-like process, homogeneous at neutral pH for ciprofloxacin degradation: Comparing RSM-CCD and ANN-GA. J. Environ. Manag. 317, 115469. https://doi.org/10.1016/j.jenvman.2022.115469 (2022).
Geurts, P., Ernst, D. & Wehenkel, L. Extremely randomized trees. Mach. Learn. 63, 3–42. https://doi.org/10.1007/s10994-006-6226-1 (2006).
Chen, Y. et al. Privacy-preserving multi-class support vector machine model on medical diagnosis. IEEE J. Biomed. Health Inform. 26, 3342–3353. https://doi.org/10.1109/jbhi.2022.3157592 (2022).
Lundberg, S. M., Erion, G. G. & Su-In, L. Consistent Individualized Feature Attribution for Tree Ensembles. arXiv. arXiv (USA), p. 9. (2018).
Acknowledgements
We thank the colleagues in our department for their help in our study.
Funding
This work was supported by the National Natural Science Foundation of China (NSFC) (grant numbers: 82070035 and 82370043). The sponsor had no role in the design and conduct of the study.
Author information
Contributions
All authors had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. YM, ZQ, TZ, HZ, MZ, ZL, PC and JF participated in data collection. YZ, SZ, DW, FL, WL and YL participated in the design of the study and in the analysis and interpretation of the data. YZ drafted the manuscript. YZ, WL and YL have accessed and verified the data. All authors participated in the critical revision of the manuscript and approved the final manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Zhang, Y., Zheng, S., Wang, D. et al. Development and validation of a machine learning-based model to predict the risk of hospitalization death in hospitalized patients with AECOPD. Sci Rep 15, 35918 (2025). https://doi.org/10.1038/s41598-025-19810-0
DOI: https://doi.org/10.1038/s41598-025-19810-0