Introduction

Chronic obstructive pulmonary disease (COPD) is a heterogeneous, chronic respiratory disease characterized by poorly reversible airflow obstruction1. Early identification of those at high risk of mortality may facilitate earlier intervention. The BODE index, which combines Body mass index (BMI), airflow Obstruction, Dyspnea scale, and Exercise capacity, was first developed as a mortality prediction model in this population2. Recent modeling studies for all-cause mortality incorporate clinical, spirometric, and CT imaging data, and have shown improved predictive ability over a 10-year period3,4,5. The features used in these studies were selected based on clinician input and on prior studies that demonstrated their potential as mortality predictors. However, recent advances in image analysis techniques allow novel lung functional information to be inferred from CT, and this information has not yet been used to improve mortality prediction.

Quantitative CT imaging variables have fundamentally changed the landscape with their ability to detect structural changes in lung parenchyma and airways that precede changes in spirometry6,7,8,9. CT-functional imaging (CTFI), a robust image processing-based modality derived from non-contrast inspiratory and expiratory (IE) CT, offers a comprehensive spatial and functional assessment of lung parenchyma. CTFI has yielded accurate and reproducible estimates of CT pulmonary ventilation (CT-V) and pulmonary blood mass changes (PBM), a surrogate for pulmonary perfusion10,11,12. Still, CT functional data are not routinely incorporated into COPD diagnostics or risk assessment8.

In this study, we determine the ability of CTFI combined with baseline forced expiratory volume in 1 s (FEV1) to predict mortality in COPD patients with airflow obstruction. We hypothesize that combining functional imaging markers with FEV1 will significantly improve predictive accuracy.

Results

Baseline characteristics of the study population and CTFI global and regional values are shown in Tables 1 and 2. We included 3550 patients with definite spirometric obstruction in the study. The average age of study subjects was 63 years. Most of the population were Non-Hispanic White and smokers. With increasing GOLD stage, there was a significant trend toward higher supplemental oxygen requirements, worse dyspnea scores, lower BMI, shorter walk distance, and higher St. George’s Respiratory Questionnaire (SGRQ) scores. All-cause mortality was 34.7% and depended on GOLD stage (Fig. 1).

Table 1 Baseline Demographics of Patients Included in the Study
Fig. 1

Survival probability of the study population from baseline, stratified by GOLD stage. At six years, the survival probabilities for GOLD I to IV are 0.92 (0.90, 0.94), 0.86 (0.84, 0.88), 0.75 (0.72, 0.78), and 0.49 (0.44, 0.54), respectively.

Correlation between CTFI and FEV1 to other predictors of COPD mortality

We noted moderate positive correlation between FEV1 and the CTFI parameters CT-V and PBM. The Spearman correlation coefficient (denoted by r) was 0.60 between FEV1 and global CT-V and 0.41 between FEV1 and PBM.

FEV1 correlated more strongly than the CTFI parameters with other COPD mortality predictors. Distance walked in 6 minutes was positively correlated with FEV1, CT-V, and PBM (r = 0.53, 0.45, and 0.34, respectively). The BODE index showed a high negative correlation with FEV1 but only moderate to low correlation with CT-V and PBM (r = −0.76, −0.46, and −0.34, respectively). Similarly, SGRQ and exacerbation frequency were negatively correlated with FEV1, CT-V, and PBM (SGRQ: r = −0.49, −0.33, and −0.23; exacerbation frequency: r = −0.29, −0.16, and −0.10, respectively).

Relationship of the CTFI scores with GOLD stage

Global CT-V values in liters (L) were lower with increasing GOLD stage [GOLD I: 2.73 (1.99, 3.46), GOLD-II 2.13 (1.65, 2.71), GOLD-III 1.76 (1.32, 2.31), GOLD-IV 1.52 (1.11, 1.86)]. Similarly, PBM values in gm/mm3 were lower with advancing GOLD stage [GOLD I: 95.60 (61.11, 130.73), GOLD-II 73.80 (45.90, 110.54), GOLD-III 57.55 (32.78, 90.10), GOLD-IV 46.89 (28.70, 69.54)]. There were significant differences in regional ventilation (p < 0.001) and blood mass changes (p < 0.001) across the GOLD stages, with the right middle lobe having the least ventilation and PBM across all stages (Table 2), which is consistent with known physiology13,14.

Table 2 Spirometry and CTFI Variables

Model for mortality prediction on longitudinal follow-up

To test the predictive ability of baseline functional information derived from CTFI in addition to lung function, we built a random survival forest (RSF) model as detailed in the Methods. Using 5-fold cross validation, the RSF model showed significant improvement in AUC for FEV1 + CTFI compared to a model with FEV1 alone (Fig. 2). At year 2, the AUC for FEV1 was 0.678 compared to 0.704 for FEV1 + CTFI. This trend was similar for all subsequent years. The AUC for FEV1 alone peaked at 0.692 at years 5 and 7, whereas the model with FEV1 + CTFI showed an AUC over 0.73 from year 6 onwards (Fig. 2).

Fig. 2

AUC of the Random Survival Forest model with CTFI + FEV1, before and after including age, BMI, and scanner type.

We noted that age, BMI, and scanner type (Siemens, GE, or Philips) were other significant features in the variable importance ranking. We developed a second RSF prediction model including age, BMI, and scanner type. The AUC for FEV1 + CTFI was higher than for the model with FEV1 in all years. The RSF model with functional imaging variables and FEV1 obtained at baseline showed significant discriminative capacity for mortality prediction from year 2 onwards, and the trend continued for subsequent years of follow-up. The AUC was highest at years 9 and 10 (0.757 and 0.755, respectively). This trend held true when restricting the model to a specific scanner type. For the GE scanner, the RSF model with CTFI and FEV1 showed AUC values that remained stable across time points. Similarly, the RSF model using CTFI and FEV1 from Siemens scanner data exhibited comparable AUC values, with minor fluctuations but overall consistent performance over time (Fig. 3).

Fig. 3: Violin plots of regional CT-V (top) and PBM (bottom) changes by lobe with increasing GOLD stage.

The violin plot displays a rotated kernel density plot on each side and a box plot in the middle, which visualizes the distribution and summary statistics of the data.

The Net Reclassification Improvement (NRI) quantifies the extent to which a model with imaging variables improves the correct reclassification of individuals as deceased or surviving compared to the model without imaging variables. The NRI for the prediction model with CTFI, FEV1, age, BMI, and scanner type showed significant improvement in mortality prediction over the model without CTFI (Fig. 4).

Fig. 4

Net Reclassification Index for Random Survival Forest prediction model CTFI + FEV1 with age, BMI, and scanner type showed significant improvement in the mortality prediction over the model without CTFI.

Survival probability of the cohort on longitudinal follow-up

The survival probability decreases with increasing GOLD stages. Year 6 probabilities are as follows: GOLD I: 0.92 (0.90, 0.94), GOLD-II 0.86 (0.84, 0.88), GOLD-III 0.75 (0.72, 0.78), GOLD-IV 0.49 (0.44, 0.54). Similarly, the survival probability decreases with increasing BODE index quantiles. In Year 6, the probabilities are as follows: Quantile 1: 0.92 (0.90, 0.93), Quantile 2: 0.86 (0.84, 0.88), Quantile 3: 0.78 (0.74, 0.81), Quantile 4: 0.53 (0.49, 0.57). The survival probability decreases with increasing quantiles of the RSF model, which includes age, BMI, scanner type, and FEV1 + CTFI. In year 6, the probabilities are as follows: Quantile 1: 0.92 (0.91, 0.94), Quantile 2: 0.89 (0.87, 0.91), Quantile 3: 0.79 (0.77, 0.82), Quantile 4: 0.56 (0.53, 0.60).

Thus, the BODE index and the RSF model with CTFI and FEV1 both show survival probabilities across quantiles similar to those across advancing GOLD stages. The RSF model including imaging and lung function obtained at baseline has good discriminative capacity for mortality prediction from year 2 onwards, with an increasing trend seen up to 10 years of follow-up.

Discussion

The COPDGene® study is a multicenter cohort study of current and former smokers with at least a 10–pack-year smoking history enrolled at 21 centers across the United States15,16. We added CT-derived lung function parameters to FEV1 to develop a robust RSF model for all-cause mortality in this population. The main findings of our study include the following: 1) CTFI scores (both global and regional) are worse at more advanced GOLD stages, 2) FEV1 is moderately correlated with CT-V and PBM, and 3) a model combining FEV1 and CTFI obtained at baseline provides important additional information on mortality in this cohort.

In our study, we examined the information gained from baseline CT functional imaging, in addition to baseline FEV1, on long-term mortality in patients with COPD and spirometric obstruction. Prior research has focused on utilizing known parameters such as degree of obstruction, walk distance, exacerbation frequency, and dyspnea score to help with COPD mortality prediction. Multidimensional methods such as ADO (age, dyspnea, airflow obstruction), COTE (COPD-specific comorbidity test), DOSE (dyspnea, airflow obstruction, smoking, exacerbations), and CODEX (comorbidity, dyspnea, airflow obstruction, exacerbations) provide good discrimination of short-term mortality in COPD patients17,18. When these predictive models were combined, discriminative ability at 1 year improved (c-statistic 0.780 ADO + COTE; 0.727 DOSE + COTE)18. In the PROSPERO study, a large meta-analysis including 42 studies showed that multicomponent prognostic models had only moderate discriminative ability and that the factors related to mortality were previous hospitalization for acute exacerbation, readmission within 30 days, cardiovascular comorbidity, age, male sex, and long-term oxygen therapy19. Our approach of using baseline FEV1 and CTFI showed both short-term and long-term mortality discrimination without the need for other historical measures, in contrast to the ADO or DOSE algorithms.

Long-term COPD mortality is worse in patients over age 70 and in those with cardiovascular comorbidities, a history of diabetes, worse dyspnea scores, or FEV1 < 50% predicted20. More recent studies incorporating machine learning (ML) algorithms for all-cause mortality included spirometric and CT imaging data3,4,5. The features with the highest impact on COPD mortality risk were FEV1, 6-minute walk distance, and age4. The CT imaging data used in those studies were from radiographic inputs. In contrast, we first used a physical model to derive lung function from imaging, so our approach does not rely on radiographic imaging features. Furthermore, we prioritized RSF due to its strength in handling censored survival data and its interpretability when integrating both imaging and non-imaging variables. While RSF is simpler than deep learning models, it has proven effective for survival prediction in COPD. We used a minimalistic model with good discriminative capacity starting from year 2 that is comparable to other models for short-term COPD mortality. We adjusted for age, BMI, and scanner type based on the variable importance ranking. The RSF model is well suited to this task in that it can handle complex non-linear relationships and multicollinearity. Our results are robust and, as shown by the NRI, CTFI contributes significantly to predicting death in the COPD cohort.

Detailed spatial information and function using CT scan have been well studied in patients with COPD8. These studies were plagued by reproducibility and standardization issues due to differences in patient effort, lack of spirometric gating, CT manufacturers, lung segmentation algorithms, and reconstruction kernels8. CTFI uses the Integrated Jacobian Formulation (IJF) method for CT ventilation, which has previously been shown to have inherent stability and high reproducibility21. As such, lack of patient effort, which seems to be a major issue with other methods of CT-derived lung function, would have minimal impact. Others have used densitometry [Hounsfield units (HU)] to determine lung function22,23. Parametric response mapping (PRM) uses coregistered inhalation and exhalation images to determine emphysema (<−950 HU on inhalation and <−856 HU on exhalation) and small airway disease (<−856 HU on exhalation and >−950 HU on inhalation CT) and has shown good correlation with lung function9. These methods are also subject to the issues addressed above, whereas CTFI provides lung function measures (volume and blood mass changes) that quantify disease globally and regionally with good reproducibility10,11,24. In comparison with the BODE index divided into quantiles, our model shows similar survival probabilities in advanced COPD. Despite similar survival probabilities in the studied population, the model with CTFI and FEV1 has good mortality discrimination from year 2 onwards.
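The PRM thresholds quoted above can be illustrated with a minimal Python sketch. It is not the PRM software used in the cited work: the function names and the flat list of co-registered (inspiratory HU, expiratory HU) pairs are hypothetical, and because the text does not say how a voxel at exactly −950 HU on inhalation is labeled, the sketch leaves such voxels in the "other" class.

```python
from collections import Counter

# Thresholds as quoted in the text (Hounsfield units).
INSPIRATORY_HU = -950
EXPIRATORY_HU = -856


def classify_voxel(insp_hu, exp_hu):
    """Label one co-registered voxel pair (hypothetical helper)."""
    if exp_hu < EXPIRATORY_HU:
        if insp_hu < INSPIRATORY_HU:
            return "emphysema"
        if insp_hu > INSPIRATORY_HU:
            return "small_airway_disease"
    # Everything else, including insp_hu == -950, which the text leaves open.
    return "other"


def prm_fractions(voxel_pairs):
    """Return the fraction of voxels in each class."""
    counts = Counter(classify_voxel(i, e) for i, e in voxel_pairs)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no voxels supplied")
    labels = ("emphysema", "small_airway_disease", "other")
    return {label: counts[label] / total for label in labels}


if __name__ == "__main__":
    pairs = [(-960, -900), (-900, -870), (-900, -800), (-960, -800)]
    print(prm_fractions(pairs))
```

A density-threshold map of this kind depends directly on the HU values, which is why it inherits the acquisition and reconstruction sensitivities discussed above.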

We noted there was a significant difference in regional ventilation and PBM changes with advancing GOLD stages. Interestingly, PBM, a surrogate for pulmonary perfusion, is only moderately correlated with FEV1 but shows strong regional differences with advancing COPD stages. Physiologically, this could be related to worsening emphysema or pulmonary hypertension seen in patients with advanced GOLD stages.

Our study has a few limitations. First, although CTFI is robust, it may encounter problems similar to those affecting other quantitative CT imaging methods, such as lung segmentation accuracy, CT acquisition, and standardization. The objective of this study was to determine whether additional information obtained from CTFI combined with FEV1 could strengthen mortality prediction. The study results indicate that it does, but we plan to conduct future studies measuring ventilation-perfusion mismatch scores obtained from CTFI, which could potentially be better at distinguishing normally functioning from diseased lung. Second, we only included patients with obstructive spirometry from the COPDGene® cohort. An immediate area of future research will be to see whether the results are applicable to those with preserved ratio impaired spirometry. Third, we did not include radiographic features obtained from CT images, such as pulmonary artery size (an indicator of pulmonary hypertension) or coronary artery calcium scoring, nor did we include small airway thickness, vessel segmentation, or the presence of mucus plugs. As the purpose of this study was to gain insight into how much CT-derived lung function contributes to COPD mortality prediction, our model avoided radiologist readings. Future development could include a model combining the physical model obtained from CTFI with Artificial Intelligence-derived parameters from IE-CT scans. Similarly, for this study we did not include dyspnea score, need for oxygen, or walk distance, as they ranked lower in the variable selection ranking. Moreover, the results of this study should be further validated in other COPD cohorts, such as SPIROMICS. Lastly, the CTFI measurements used in this study were average values taken over whole lung and lobe volumes. Another area of future work will be to develop modeling methods capable of utilizing the full 3D spatial distribution of CTFI values when making mortality predictions.

In summary, to investigate mortality prediction for COPD patients, baseline clinical metrics, such as FEV1, were combined with average lobe and whole lung CTFI values. Correlations between FEV1 and CT-V, as well as between FEV1 and PBM, are higher than the correlations between FEV1 and other COPD mortality predictors. Further, there were statistically significant differences in regional ventilation and blood mass changes across the GOLD stages, and we demonstrated that incorporating values derived from CTFI into predictive modeling improves mortality prediction for patients with COPD. Average lobe and whole lung CTFI values inherently provide less detail to a predictive model than the CTFI they are derived from. Thus, these promising findings suggest that future predictive models based on FEV1 and full resolution CTFI data (Fig. 5) could provide insights into disease progression and inform treatment decisions.

Fig. 5: PBM imaging in patients with increasing GOLD stages. Red areas show higher CT perfusion and blue areas show lower perfusion.

The global PBM decreases with advancing GOLD stages, but the regional distribution of PBM is characteristically different.

Methods

We performed a retrospective, longitudinal analysis of 8583 patients from the COPDGene® Project (www.clinicaltrials.gov [NCT00608764]) stratified according to severity of obstruction. The study was approved as ancillary study ANC 475 by COPDGene®.

All participants were coached to full inspiration and end expiration in order to obtain volumetric computed tomographic scans without spirometric gating25,26. In this study, we only included patients with spirometric obstruction (FEV1/FVC < 0.70) based on GOLD stage (Fig. 6) to avoid potential bias in interpretation from other parenchymal lung diseases.
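As an illustration of the inclusion rule, the following minimal Python sketch assigns a GOLD spirometric stage from the FEV1/FVC ratio and FEV1 percent predicted. The function name is hypothetical, and the FEV1 cutoffs (80%, 50%, 30% predicted) are the standard GOLD spirometric grades rather than values stated in this text.

```python
from typing import Optional


def gold_stage(fev1_fvc_ratio: float, fev1_pct_predicted: float) -> Optional[int]:
    """Return GOLD stage 1-4, or None when there is no spirometric obstruction.

    Assumes the standard GOLD grading cutoffs; these are not given in the text.
    """
    if fev1_fvc_ratio >= 0.70:
        return None  # not obstructed, so excluded from this analysis
    if fev1_pct_predicted >= 80:
        return 1
    if fev1_pct_predicted >= 50:
        return 2
    if fev1_pct_predicted >= 30:
        return 3
    return 4


if __name__ == "__main__":
    for ratio, pct in [(0.65, 85), (0.60, 55), (0.50, 35), (0.40, 25), (0.75, 90)]:
        print(ratio, pct, gold_stage(ratio, pct))
```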

Fig. 6

CONSORT diagram showing inclusion and exclusion.

CT-V and PBM are image processing-based modalities that recover changes in local tissue volumes (ventilation surrogate) and the magnitude of pulmonary blood mass change (perfusion surrogate), induced by respiratory motion, from an IE-CT scan. CT-V uses the Integrated Jacobian Formulation (IJF) method, which calculates volume changes with Monte Carlo techniques with quantifiable and controllable levels of uncertainty in the image processing pipeline. This allows for robust, reproducible ventilation calculations that correlate well with lung function parameters10,21,27,28. PBM leverages HU estimates of lung density and the robust CT-V measured volume change to compute the magnitude of mass changes between the IE-CT scans as a surrogate for perfusion10,11. For each patient, we estimated CT-V and PBM at the voxel level and averaged them over each lobe and over the whole lung volume. For each lobe, we measured average CT-V in mm3 (liters) and average PBM values in gm/mm3. This results in a total of 10 lobar values for each patient plus average global (i.e., average over the whole lung volume) CT-V and PBM values. Software used for the study was written in MATLAB (release R2024b, The MathWorks Inc., Natick, Massachusetts, United States). Average computation runtime for one patient was 10 minutes using a Dell Precision Laptop with an Intel Core i7-6920HQ CPU and an Nvidia Quadro M5000 graphics processing unit (GPU).
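The reduction from voxel-level maps to the 12 per-patient features (10 lobar values plus global CT-V and PBM) could be sketched as follows. This is a minimal Python illustration, not the MATLAB software used in the study: the lobe labels, the function name, and the flat per-voxel lists are hypothetical, and a plain unweighted mean is assumed because the text does not specify any weighting.

```python
from statistics import fmean

# Hypothetical labels for the five lobes; any other label is treated as background.
LOBES = ("RUL", "RML", "RLL", "LUL", "LLL")


def summarize_ctfi(lobe_labels, ctv_values, pbm_values):
    """Average voxel-level CT-V and PBM over each lobe and over the whole lung."""
    ctv_by_lobe = {lobe: [] for lobe in LOBES}
    pbm_by_lobe = {lobe: [] for lobe in LOBES}
    for lobe, ctv, pbm in zip(lobe_labels, ctv_values, pbm_values):
        if lobe in ctv_by_lobe:  # skip voxels outside the lung
            ctv_by_lobe[lobe].append(ctv)
            pbm_by_lobe[lobe].append(pbm)

    features = {}
    for lobe in LOBES:
        # fmean raises StatisticsError if a lobe has no voxels.
        features[f"ctv_{lobe}"] = fmean(ctv_by_lobe[lobe])
        features[f"pbm_{lobe}"] = fmean(pbm_by_lobe[lobe])

    # Global values: mean over every lung voxel, not the mean of lobar means.
    features["ctv_global"] = fmean(v for lobe in LOBES for v in ctv_by_lobe[lobe])
    features["pbm_global"] = fmean(v for lobe in LOBES for v in pbm_by_lobe[lobe])
    return features


if __name__ == "__main__":
    labels = ["RUL", "RUL", "RML", "RLL", "LUL", "LLL", "BG"]
    ctv = [1.0, 3.0, 2.0, 4.0, 5.0, 6.0, 100.0]
    pbm = [10.0, 30.0, 20.0, 40.0, 50.0, 60.0, 999.0]
    print(summarize_ctfi(labels, ctv, pbm))
```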

Study Analysis

Descriptive analysis was used to summarize the patient characteristics. Continuous variables are presented as means with standard deviations and medians with interquartile ranges, while categorical variables are summarized using frequencies and percentages. To compare continuous variables among groups, we employed the ANOVA test while the Chi-square test was used for categorical variables. To assess the relationships between imaging and FEV1 variables, we calculated the Spearman correlation coefficient. To visualize the long-term survival probability of subjects over a span of 10 years, Kaplan-Meier plots were generated, stratified by four Global Initiative for Chronic Obstructive Lung Disease (GOLD) stages. The primary outcome was time-to-death from any cause.
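For readers unfamiliar with the Kaplan-Meier estimator behind these plots, the following is a minimal Python sketch of the product-limit calculation. The study's analysis was done in R; the function names and toy data here are hypothetical, and confidence intervals are omitted.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with at least one death.

    times: follow-up time per subject; events: 1 if died, 0 if censored.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Group all subjects whose follow-up ends at the same time t.
        while i < len(data) and data[i][0] == t:
            deaths += int(data[i][1])
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving  # deaths and censored subjects leave the risk set
    return curve


def survival_at(curve, t):
    """Step-function lookup of the estimated survival probability at time t."""
    s = 1.0
    for time, value in curve:
        if time > t:
            break
        s = value
    return s


if __name__ == "__main__":
    curve = kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 0, 1])
    print(curve, survival_at(curve, 3))
```

Stratified curves, as used here for GOLD stages, follow from calling the estimator once per group.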

We used Random Survival Forests (RSF) with default parameters to compare the predictive ability of models with and without the imaging variables29. RSF can model complex non-linear relationships in the data, which traditional linear regression models struggle to capture. Additionally, RSF can handle data with a large number of predictors and automatically model complex interactions between these predictors. RSF also provides a ranking of variable importance, helping to identify which variables contribute most to the prediction. The standard Random Forest is designed for classification and regression and does not account for censoring, which is why RSF was preferred. Cox proportional hazards models were initially considered, but RSF was preferred due to its ability to automatically capture nonlinear interactions between variables, which Cox models may fail to capture.

To build a robust prediction model, we first selected important clinical, spirometric, and CTFI features based on univariate Cox regression analysis and RSF-derived variable importance (Supplemental Table 1) and then used these selected features as inputs for an RSF to predict mortality probability. Comorbidities closely related to smoking habits, such as high blood pressure, congestive heart failure, and coronary artery disease, were found not to greatly impact the model output. Time-dependent receiver operating characteristic (ROC) curves and Area Under the Curve (AUC) values were used to assess model performance based on five-fold cross validation30. Specifically, we compared the models utilizing FEV1 alone with those including additional imaging variables. To visualize the prediction performance of the RSF model with imaging variables, we used the model predictions to generate a Kaplan-Meier plot stratified by four quantiles, and used the log-rank test to assess whether there was a statistically significant difference in survival among the four quantile groups.
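To make the idea of a time-dependent AUC concrete, here is a deliberately simplified Python sketch. It is an assumption-laden illustration and not the estimator used in the study: subjects censored before the horizon are simply dropped rather than handled with censoring weights, and the function name and toy data are hypothetical.

```python
def naive_auc_at(times, events, risks, horizon):
    """Naive cumulative/dynamic AUC at one time horizon.

    Cases: died at or before the horizon. Controls: still followed after it.
    Subjects censored before the horizon are ignored (a simplification).
    """
    cases = [r for t, e, r in zip(times, events, risks) if e and t <= horizon]
    controls = [r for t, e, r in zip(times, events, risks) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")

    concordant = 0.0
    for case_risk in cases:
        for control_risk in controls:
            if case_risk > control_risk:
                concordant += 1.0  # case correctly ranked as higher risk
            elif case_risk == control_risk:
                concordant += 0.5  # ties count as half
    return concordant / (len(cases) * len(controls))


if __name__ == "__main__":
    times = [1, 2, 3, 5, 6, 7]
    events = [1, 1, 0, 1, 0, 0]
    risks = [0.9, 0.4, 0.8, 0.7, 0.5, 0.2]
    print(naive_auc_at(times, events, risks, horizon=4))
```

Evaluating such a function at each follow-up year, on the held-out fold of a five-fold split, gives an AUC-versus-time curve of the kind compared between models.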

To quantify the improvement in reclassification of individuals into deceased or surviving categories by the inclusion of imaging variables, we calculated the Net Reclassification Improvement (NRI) metric. Positive NRI values indicate enhanced reclassification, suggesting that the model including imaging variables is more accurate in identifying deceased and surviving individuals. We report the NRI value, 95% confidence interval, and p-value at each time point.
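A minimal Python sketch of the NRI idea follows. The text does not state which NRI variant was used, so this assumes the category-free form at a single time point with known vital status: the net proportion of deceased subjects whose predicted risk moved up, plus the net proportion of survivors whose predicted risk moved down. The function name and toy data are hypothetical, and censoring, confidence intervals, and p-values are not handled.

```python
def category_free_nri(died, risk_without, risk_with):
    """Category-free NRI comparing a model with imaging to one without.

    died: 1 if the subject died by the time point, else 0.
    risk_without / risk_with: predicted mortality risk from each model.
    """
    up_dead = down_dead = n_dead = 0
    up_alive = down_alive = n_alive = 0
    for d, old, new in zip(died, risk_without, risk_with):
        if d:
            n_dead += 1
            up_dead += new > old
            down_dead += new < old
        else:
            n_alive += 1
            up_alive += new > old
            down_alive += new < old
    if n_dead == 0 or n_alive == 0:
        raise ValueError("need both deceased and surviving subjects")
    nri_events = (up_dead - down_dead) / n_dead
    nri_nonevents = (down_alive - up_alive) / n_alive
    return nri_events + nri_nonevents


if __name__ == "__main__":
    died = [1, 1, 1, 0, 0, 0, 0]
    old = [0.5] * 7
    new = [0.7, 0.6, 0.4, 0.3, 0.2, 0.6, 0.5]
    print(category_free_nri(died, old, new))
```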

All statistical tests were two-sided, and statistical significance was determined with a threshold of p-value < 0.05. The entire analysis was conducted using R-4.2.1, provided by the R Foundation for Statistical Computing.