Abstract
Background
Bronchiolitis Obliterans Syndrome (BOS), a fibrotic airway disease that may develop after lung transplantation, is conventionally diagnosed with pulmonary function tests (PFTs) because of the limitations of CT imaging. Deep neural networks (DNNs) have not previously been used for BOS detection. This study aims to train a DNN to detect BOS in CT scans using an approach tailored for low-data scenarios.
Methods
We trained a DNN to detect BOS in CT scans using a co-training method designed to enhance performance in low-data environments. Our method employs an auxiliary task that makes the DNN more sensitive to disease manifestations and less sensitive to the patient’s anatomical features: the DNN was tasked with predicting the temporal order of two CT scans taken from the same BOS patient at least six months apart. We evaluated this approach on CT scans from 75 post-transplant patients, including 26 with BOS, and used the ROC-AUC metric to assess performance.
Results
We show that our DNN method achieves a ROC-AUC of 0.90 (95% CI: 0.840–0.953) in distinguishing BOS from non-BOS in CT scans. Performance correlates with BOS progression, with ROC-AUC values of 0.88 for stage I, 0.91 for stage II, and 0.94 for stage III BOS. Notably, the DNN shows comparable performance on standard- and high-resolution CT scans. It also demonstrates the ability to predict BOS in at-risk patients (FEV1 between 80% and 90% of best FEV1) with a ROC-AUC of 0.87 (95% CI: 0.735–0.974). Using visual interpretation techniques for DNNs, we reveal sensitivity to hyperlucent/hypoattenuated areas indicative of air-trapping or bronchiectasis.
Conclusions
Our approach shows potential for improving BOS diagnosis by enabling early detection and management. The ability to detect BOS from standard-resolution scans at any stage of respiration makes this method more accessible than previous approaches. Additionally, our findings highlight that techniques to limit overfitting are crucial for unlocking the potential of DNNs in low-data settings, which could assist clinicians in BOS studies with limited patient data.
Plain language summary
Bronchiolitis Obliterans Syndrome (BOS) is a progressive disease that can occur after lung transplantation, leading to airway scarring. It is typically diagnosed through tests of lung function. In this study, we developed a computational approach to improve BOS detection from lung scans. We tested this approach on scans from 75 lung transplant patients, including 26 with BOS, and found that our approach could accurately distinguish BOS from non-BOS cases, even at early stages of disease. Our approach could improve early diagnosis and management of BOS, making detection more accessible using standard lung scans, and hence improving patient outcomes.
Introduction
Bronchiolitis obliterans syndrome (BOS) is a progressive fibrotic lung disease affecting lung transplantation recipients and patients with hematopoietic transplantation1,2. In lung transplant recipients, BOS is defined as graft deterioration secondary to progressive airway disease for which there is no other cause. The severity of BOS is defined by the decline of forced expiratory volume in the first second (FEV1) from the patient’s best value3. BOS is associated with poor overall survival (median of 2.5 years after lung transplantation). Unfortunately, the alterations in spirometry are only apparent after the establishment of airway fibrosis. At this point, the disease is irreversible with no curative treatment4. Consequently, early diagnosis is pivotal for optimal patient management and follow-up5.
The diagnosis of BOS presents multiple challenges. Surgical lung biopsies are not routinely performed due to the high risk of complications in lung transplant recipients6,7. Less invasive transbronchial biopsies have a low diagnostic sensitivity. The clinical presentation of BOS is similar to that of other obstructive lung diseases, including asthma, COPD, small airway disease and bronchiectasis, which may also present with an irreversible obstructive pattern on spirometry6. Consequently, current diagnostic methods centered on pulmonary function tests (PFTs), e.g., decline in FEV1, lack specificity, and the role of chest imaging in diagnosing BOS is not fully established. Typical findings, such as bronchial wall thickening and a mosaic attenuation/perfusion pattern with air-trapping on expiration, can only be seen with advanced disease, and some BOS cases remain difficult to diagnose even in advanced forms. Moreover, CT scans are only optimal in this context if reconstructed with thin slices and the lung kernel and complemented with an additional scan on expiration. Even with low-dose CT, this approach inevitably results in higher radiation exposure and requires maneuvers that may not be feasible for patients with poor lung capacity8,9.
Ongoing research aims to address the current limitations of CT imaging in BOS diagnosis by employing visual recognition algorithms, that is, algorithms that analyze a CT scan and estimate the variable of interest, in this case, the likelihood that the patient suffers from BOS. Algorithms previously used for BOS detection10,11,12,13,14,15,16,17,18 follow a two-step approach. First, they use heavily engineered procedures to extract information pertinent to the disease from the scan, for example, the frequency of specific voxel intensities10,11,12,14,15,16. Next, from the extracted information, they estimate the likelihood of the disease using statistical or machine learning methods. A major drawback of this approach is that it is infeasible to design the information extraction procedure in a way that will maximize the accuracy of the subsequent likelihood estimation. As a consequence, in virtually all visual recognition tasks, the two-step approach is outperformed by deep neural networks (DNNs)19,20, a class of algorithms that learn to estimate the variables of interest directly from images. This end-to-end learning, from raw data to disease likelihood, holds the promise of capturing subtle manifestations of diseases in CT scans, including ones too difficult to quantify with manually designed procedures. However, DNNs require large volumes of training data to reach their full performance, which has so far hindered their application to studies with only a small number of patients. Thus far, DNNs have not been used for BOS detection.
In this study, we developed a DNN technique for BOS detection. To account for the limited number of cases in the study, we combined several established methods to improve the efficiency of the training data. Additionally, we developed a strategy that enables the DNN to simultaneously detect BOS and determine the temporal sequence of scan pairs from the same patient. As far as we are aware, our DNN-based BOS detector is the first to detect BOS from standard-resolution CT scans taken at inspiration and expiration. Moreover, it shows promise in predicting future BOS onset in patients at risk. Finally, our contribution goes beyond detecting BOS: Thanks to its data-efficiency, our method enables using DNNs in studies where the number of available patients is limited due to the low incidence of the targeted disease or by other constraints.
Methods
Study participants
Approval for this study was obtained from the relevant ethics committee (Vaud Cantonal Research Ethics Committee CER-VD reference number 2020-02455). All patients included in the study provided informed consent by signing the institutional general consent form allowing for the re-use of clinical data. We initially included 130 lung transplant recipients presenting to the University Hospital of Lausanne, Switzerland, between 1990 and 2020. Among those, 45 patients were diagnosed with BOS based on current diagnostic guidelines1,3. Among the 26 patients with BOS, 12 patients had BOS reported in the radiology report from the CT examination following the diagnosis.
For each selected patient, we extracted and anonymized thoracic CT scans from the hospital’s picture archiving and communication system (PACS). We incorporated CT scans from the time of transplantation to either the time of data extraction, the date of the patient’s death or their second transplantation. Within the inclusion period, patients presented for a CT scan between one and eighteen times, and the majority of patients (80%) had between three and ten visits. We did not limit the included CT scans by the reconstruction algorithm or dose parameters. Whenever PACS contained more than one CT reconstruction of a scan, we retained all reconstructions, including, for example, one reconstruction with the lung/sharp kernel and one reconstruction with the soft/standard kernel for the same acquisition. As shown in Table 1, the majority of the included scans were reconstructed once in high resolution (slice thickness ≤ 1.25 mm, lung kernel) and once with thin slices but the standard kernel, or with thick slices (slice thickness > 1.25 mm) and the standard kernel. Scans with a pixel size larger than 0.75 mm in the transverse plane and scans containing fewer than fifty transverse slices were excluded. Finally, the entire set of scans was inspected by a pulmonologist (NM), who removed 7 scans of 3 patients with lung infiltrates in more than one lobe. Since extensive lung infiltrates are easily identified in the scans, they do not constitute an important confounding disease. At the same time, they could confuse the DNN due to their distinct appearance and the small number of such scans in the training set. For patients not diagnosed with BOS, this approach resulted in 228 scans. Patients eventually diagnosed with BOS contributed a total of 163 scans, with 105 scans acquired after the onset of the disease.
The onset was defined as the time when the FEV1 consistently fell below 80% of the patient’s best value, computed as the average of the two best measurements taken at least three weeks apart, in alignment with clinical guidelines21. The scan selection protocol is shown in Fig. 1. The summary of the included scans can be found in Table 1.
CT acquisition protocol
The CT scans were performed with multidetector equipment from different vendors. The scans were taken in the context of clinical follow-up of patients post lung transplantation. Tube voltages varied between 80 and 120 kV, tube currents were between 80 and 400 mA, and automodulation was used for some of the scans. These scans were reconstructed using the lung kernel to optimize the spatial resolution and with slice thickness of 1.25 mm. Additionally, most CTs were reconstructed with the standard kernel for high contrast resolution, with the same slice thickness. Some of the scans were reconstructed with the standard kernel and slices thicker than 1.25 mm. In addition to the above, some scans were performed with tube voltage between 80 and 120 kV and tube currents between 20 and 100 mA, often incorporating automated exposure control, and computed using the standard kernel and slices thicker than 1.25 mm. The number of scans reconstructed using different algorithms can be found in Table 1.
Pulmonary function testing
For PFT measurement, participants were seated and used a nose clip. Each subject performed a series of forced expiratory maneuvers, ensuring maximal effort. The spirometer (Geratherm Medical AG Germany) was calibrated daily, and quality control measures were implemented. Key parameters, including Forced Vital Capacity (FVC) and Forced Expiratory Volume in one second (FEV1), were recorded. The protocol required the participant to perform at least three acceptable and reproducible maneuvers, and the highest values were reported.
Evaluation methodology
We evaluated four major aspects of our approach: the performance of the DNN in differentiating scans of patients with BOS from those of patients without BOS, the change in BOS detection accuracy as a function of BOS stage, the ability of the DNN to identify patients at risk of developing BOS, and its performance in standard- and high-resolution scans. We describe the methodology of each of these experiments below.
Differentiating patients with BOS from patients without BOS
Our main objective was to evaluate the performance of the DNN in differentiating patients with BOS from patients without BOS. To that end, we analyzed the CT scans of BOS patients taken after their FEV1 fell below 80% of the best FEV1, which corresponds to the clinical criterion of BOS stage I21, and all scans in patients never diagnosed with BOS (non-BOS). To enhance the versatility of our approach, we made the DNN analyze a single CT scan acquired at any stage of inspiration, diverging from the conventional practice of complementing the inspiratory scan with an expiratory scan for BOS diagnosis.
Changes in detection accuracy with disease progression
To understand how the BOS detection accuracy changes as the disease progresses, we conducted evaluation on CT scans of BOS patients in stages I, II, and III of the disease, defined in accordance with the current diagnostic guidelines as a fall of FEV1 below 80%, 65%, and 50% of the best FEV1 on at least two consecutive tests taken no less than three weeks apart21. We evaluated the network’s ability to distinguish between these CT scans and CT scans of non-BOS patients.
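The staging rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: relative FEV1 is given as a fraction of the patient's best value, the names `bos_stage` and `sustained_stage` are hypothetical, and the "two consecutive tests at least three weeks apart" criterion is simplified to taking the milder of the two stages observed in any consecutive pair of tests separated by 21 days or more.

```python
from datetime import date

# Upper bounds on relative FEV1 (fraction of best FEV1) for stages III, II, I,
# following the thresholds quoted in the text (50%, 65%, 80%).
STAGE_THRESHOLDS = [(0.50, 3), (0.65, 2), (0.80, 1)]

def bos_stage(relative_fev1):
    """Stage implied by a single relative FEV1 value (0 means no BOS)."""
    for threshold, stage in STAGE_THRESHOLDS:
        if relative_fev1 < threshold:
            return stage
    return 0

def sustained_stage(tests, min_gap_days=21):
    """tests: list of (date, relative_fev1) sorted by date.

    Simplifying assumption: a stage counts as confirmed when two consecutive
    tests at least `min_gap_days` apart both reach it.
    """
    confirmed = 0
    for (d1, v1), (d2, v2) in zip(tests, tests[1:]):
        if (d2 - d1).days >= min_gap_days:
            confirmed = max(confirmed, min(bos_stage(v1), bos_stage(v2)))
    return confirmed

history = [(date(2020, 1, 1), 0.78), (date(2020, 2, 1), 0.62)]
print(sustained_stage(history))  # both tests are below 80%, only one below 65%
```

The same helper could be used to select the at-risk group, for instance by keeping scans whose relative FEV1 lies between 0.80 and 0.90.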
Identifying patients at risk
From the clinical perspective, the added value of a method to automatically detect BOS from CT scans largely depends on its capacity to detect patients at risk of developing the disease before the lung function declines. To verify this capacity, we designed a comparison that closely mimics a realistic clinical scenario in which a patient whose FEV1 has decreased but remains above the threshold of BOS stage I, needs to be screened for the risk of developing BOS in the future. We evaluated the performance of the network in distinguishing the scans of BOS patients taken before the diagnosis with FEV1 between 80% and 90% of best FEV1 from the scans of non-BOS patients with relative FEV1 in the same range.
Performance in standard- and high-resolution scans
Our dataset comprises three main types of CT scans: high-resolution scans (slice thickness ≤ 1.25 mm, lung kernel), scans reconstructed with thin slices (≤ 1.25 mm) but the standard kernel, and standard-resolution scans (slice thickness > 1.25 mm, standard kernel). Since scans of each type might contain more or less information pertaining to the presence or absence of the disease, we also investigated whether one of the types is better suited for BOS detection than the others. We verified this by re-training the DNN on data sets restricted to scans of each type and by limiting the test sets accordingly.
Performance evaluation criteria and statistical analysis
We used the Chi-squared test for contingency tables to evaluate the match in sex between the BOS and non-BOS groups. The Student’s t-test for independent samples was used to compare patient age at transplantation and FEV1 measured 3 months after transplantation. A comparison of FEV1 between the two groups at 30 months after transplantation was performed with Welch’s t-test due to the difference in the variance of FEV1 between the groups.
To evaluate our DNN in BOS detection, we performed five-fold cross-validation. We randomly divided the cohort of 26 patients with BOS and 49 non-BOS patients into four splits of 5 patients with BOS and 10 patients without BOS and one split of 6 patients with BOS and 9 patients without BOS. We trained our DNN five times according to the procedure described below, each time leaving scans of one split of patients out of the training set. We tested the resulting classifier on the scans that were not used for training.
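A patient-level split of this kind can be sketched as follows. This is a minimal illustration rather than the authors' procedure: the function name `patient_level_splits` and the patient identifiers are hypothetical, and dealing the two groups in opposite fold order is just one simple way to obtain equally sized folds. The key property is that all scans of a patient end up in the same fold, so no patient contributes to both training and testing.

```python
import random

def patient_level_splits(bos_ids, non_bos_ids, n_splits=5, seed=0):
    """Shuffle each group and deal patients into folds.

    BOS patients are dealt in forward fold order and non-BOS patients in
    reverse order, so the fold receiving an extra BOS patient receives one
    fewer non-BOS patient.
    """
    rng = random.Random(seed)
    folds = [{"bos": [], "non_bos": []} for _ in range(n_splits)]
    bos, non_bos = list(bos_ids), list(non_bos_ids)
    rng.shuffle(bos)
    rng.shuffle(non_bos)
    for i, pid in enumerate(bos):
        folds[i % n_splits]["bos"].append(pid)
    for i, pid in enumerate(non_bos):
        folds[n_splits - 1 - (i % n_splits)]["non_bos"].append(pid)
    return folds

# Hypothetical identifiers for 26 BOS and 49 non-BOS patients.
bos_patients = [f"B{i:02d}" for i in range(26)]
non_bos_patients = [f"N{i:02d}" for i in range(49)]
folds = patient_level_splits(bos_patients, non_bos_patients)
print([(len(f["bos"]), len(f["non_bos"])) for f in folds])
# [(6, 9), (5, 10), (5, 10), (5, 10), (5, 10)]
```

Each training run would then use the scans of four folds for training and those of the remaining fold for testing.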
We present the results using the receiver operating characteristic area under curve (ROC-AUC). When computing the curves, we weighted the results for individual scans to prevent biasing the results to patients who had scans performed more often. The details of this procedure are provided in the Supplementary Note.
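One plausible weighting scheme is sketched below. The exact procedure is given in the Supplementary Note and is not reproduced here, so the choice of weighting each scan by the inverse of the number of scans of its patient is an assumption, as are the function name `weighted_auc` and the toy scores.

```python
from collections import Counter

def weighted_auc(scores, labels, patient_ids):
    """ROC-AUC in which each scan has weight 1 / (number of scans of its patient).

    Computed as the weighted fraction of (positive, negative) scan pairs that
    are ranked correctly, counting ties as half.
    """
    counts = Counter(patient_ids)
    weights = [1.0 / counts[p] for p in patient_ids]
    pos = [(s, w) for s, y, w in zip(scores, labels, weights) if y == 1]
    neg = [(s, w) for s, y, w in zip(scores, labels, weights) if y == 0]
    correct = total = 0.0
    for sp, wp in pos:
        for sn, wn in neg:
            total += wp * wn
            if sp > sn:
                correct += wp * wn
            elif sp == sn:
                correct += 0.5 * wp * wn
    return correct / total

# Toy example: patient "a" has two scans, all other patients have one.
scores = [0.9, 0.8, 0.3, 0.4, 0.2]
labels = [1, 1, 1, 0, 0]
patients = ["a", "a", "b", "c", "d"]
print(weighted_auc(scores, labels, patients))  # 0.75
```

Without weights the same toy data gives 5/6, because the frequently scanned patient "a" dominates the positive class; with weights, every patient contributes equally.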
To summarize the performance, we computed a single ROC curve for test predictions computed for CT scans of patients from all five splits and evaluated the corresponding AUC. We call this metric “aggregated AUC.” Unlike the average of AUC computed for individual splits, the aggregated AUC provides a conservative performance estimate, because each point in the aggregated ROC curve results from applying the same threshold to predictions produced by five independently trained DNNs with different optimal thresholds.
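The difference between the two summaries can be seen in a small sketch. The scores below are invented for illustration and, for brevity, the per-scan weighting is omitted. Two hypothetical splits each separate their own test scans perfectly, but at different score levels, so pooling the predictions under a shared threshold lowers the AUC.

```python
def auc(scores, labels):
    """Unweighted ROC-AUC as the fraction of correctly ranked (pos, neg) pairs."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical test predictions of two independently trained networks.
splits = [
    ([0.9, 0.8, 0.7, 0.6], [1, 1, 0, 0]),  # split 1: high score range
    ([0.5, 0.4, 0.3, 0.2], [1, 1, 0, 0]),  # split 2: low score range
]

mean_auc = sum(auc(s, y) for s, y in splits) / len(splits)

pooled_scores = [s for scores, _ in splits for s in scores]
pooled_labels = [y for _, labels in splits for y in labels]
aggregated_auc = auc(pooled_scores, pooled_labels)

print(mean_auc, aggregated_auc)  # 1.0 0.75
```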
We could not apply the standard approach to computing confidence intervals for the aggregated ROC-AUC values since predictions produced by the DNN for two scans of the same patient are not statistically independent. We therefore resorted to bootstrapping by hierarchical case resampling22. We used ten thousand simulations to estimate the 95% confidence intervals. See Supplementary Note for details. To additionally illustrate the variability of the performance of our method, we report the performance attained on individual splits in the form of box plots containing five ROC-AUC values, one for each training run.
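Hierarchical case resampling can be sketched as below. This is a simplified illustration: the details of the authors' procedure are in the Supplementary Note, so the two-level scheme shown here (resample patients with replacement, then resample scans within each drawn patient), the percentile interval, the unweighted AUC, and all names and toy scores are assumptions.

```python
import random

def auc(scores, labels):
    """Unweighted ROC-AUC as the fraction of correctly ranked (pos, neg) pairs."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def hierarchical_bootstrap_ci(data, n_boot=1000, alpha=0.05, seed=0):
    """data: dict patient_id -> (label, list of scan scores).

    Returns a percentile confidence interval for the AUC.
    """
    rng = random.Random(seed)
    patients = list(data)
    stats = []
    while len(stats) < n_boot:
        scores, labels = [], []
        # Level 1: resample patients with replacement.
        for pid in rng.choices(patients, k=len(patients)):
            label, scan_scores = data[pid]
            # Level 2: resample the scans of the drawn patient.
            resampled = rng.choices(scan_scores, k=len(scan_scores))
            scores.extend(resampled)
            labels.extend([label] * len(resampled))
        if 0 < sum(labels) < len(labels):  # skip draws lacking one class
            stats.append(auc(scores, labels))
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Invented scores for six hypothetical patients (label 1 = BOS).
toy = {
    "p1": (1, [0.9, 0.8]), "p2": (1, [0.7]), "p3": (1, [0.4, 0.6]),
    "p4": (0, [0.3, 0.2]), "p5": (0, [0.5]), "p6": (0, [0.1, 0.35, 0.45]),
}
print(hierarchical_bootstrap_ci(toy))
```

Resampling whole patients first preserves the dependence between scans of the same patient, which is the reason the standard scan-level interval does not apply.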
Deep learning approach
We framed detecting BOS from CT scans as a classification task. We trained a DNN to take a CT scan as input and return an estimate of the log-likelihood that the patient suffers from BOS. DNNs are a well-studied technique for image classification22 but need to be trained on large volumes of annotated data to attain high performance. The modest number of patients in our dataset makes training prone to overfitting—a phenomenon where the DNN learns to classify training scans with high accuracy but fails to classify examples not seen during training. We employed several techniques to mitigate this adverse effect. First, we used the late-fusion architecture23, which relies on a two-dimensional DNN to process scan slices independently from one another and then fuses the results to produce the disease likelihood. Compared to architectures that rely on three-dimensional representations of the scan, late-fusion DNNs have a significantly lower number of parameters, which is crucial for reducing overfitting. Second, we randomly discarded scan slices during training, retaining only 8 randomly selected slices of each scan. This forces the DNN to learn to recognize BOS from incomplete scans and discourages the DNN from classifying training scans using anatomic details or pathologies specific to individual patients since large portions of the scans are removed at random. We used complete scans at test time. Finally, we co-trained the DNN in BOS detection along with an auxiliary task: predicting the temporal order of CT scans. Given a random pair of scans of a patient with BOS taken at least six months apart, the auxiliary task is to predict which of the scans was acquired earlier. The rationale of this approach is to train the DNN to be more sensitive to the manifestations of BOS, which develop over time, and less sensitive to the anatomic features of individual patients. It also allowed us to include more CT scans of BOS patients in the training of the DNN. 
Training the classification task is limited to the scans of BOS patients taken after the patients met BOS clinical criteria (105 scans of 26 patients), and all scans of patients without BOS. The remaining scans of patients with BOS, ones taken before the patients met the clinical criteria, are not used for training the classification task, because it is not clear which of them represent signs of the disease and which do not. By contrast, training the task of temporal precedence prediction lets us use all the scans of BOS patients. We provide more details of the architecture of our DNN, the training procedure, and the experiments we performed to validate the design in the Supplementary Note and in Supplementary Fig. 1.
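Two of the data-side ingredients described above, random slice dropping and the sampling of scan pairs for the temporal-order task, can be sketched without any deep learning framework. The network itself and the fusion step are omitted. The function names, the 182-day approximation of "six months", and the example dates are assumptions made for illustration.

```python
import random
from datetime import date

def sample_slices(n_slices, k=8, rng=random):
    """Indices of k randomly retained slices (all slices if the scan has fewer)."""
    k = min(k, n_slices)
    return sorted(rng.sample(range(n_slices), k))

def sample_temporal_pair(scan_dates, min_gap_days=182, rng=random):
    """Pick two scans of one patient taken at least `min_gap_days` apart.

    Returns (i, j, label), where label is 1 if scan i was acquired before
    scan j and 0 otherwise, or None if no eligible pair exists. Both orderings
    of every eligible pair are candidates, so the presentation order is random.
    """
    n = len(scan_dates)
    candidates = [
        (i, j) for i in range(n) for j in range(n)
        if i != j and abs((scan_dates[i] - scan_dates[j]).days) >= min_gap_days
    ]
    if not candidates:
        return None
    i, j = rng.choice(candidates)
    return i, j, int(scan_dates[i] < scan_dates[j])

rng = random.Random(0)
print(sample_slices(120, rng=rng))  # 8 slice indices out of 120
dates = [date(2015, 1, 1), date(2015, 3, 1), date(2016, 1, 1)]
print(sample_temporal_pair(dates, rng=rng))
```

At test time the slice sampling step would be skipped, since the complete scan is used.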
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Results
Study participants
There was no significant difference between the groups of patients with and without BOS in terms of age or sex (p value 0.78 and 0.54, respectively). To compare the lung function between patients of the two groups, we evaluated their relative FEV1, computed as the ratio of the current FEV1 to the best post-transplant FEV1, defined as the average of the two best measurements taken at least 3 weeks apart, as recommended by the current clinical practice guidelines. While the difference between the results obtained 3 months after the transplantation was not significant, 30 months after transplantation several BOS patients showed a marked decrease in FEV1 (p value 0.02) (Table 2). The median time to BOS onset was 3.8 years. One thousand days after transplantation, 42% of BOS patients with documented PFT results maintained relative FEV1 above 90% of the best level. This fraction diminished to 19% at two thousand days post lung transplant (Fig. 2). The total number of patients decreased with time as patients were transferred to other medical centers, underwent a second transplantation, or passed away.
The height of the bar of each color represents the number of patients in each range of relative FEV1. The total height of the colored area corresponds to the total number of patients for whom PFT results were available at the given time. FEV1 forced expiratory volume in the first second, BOS bronchiolitis obliterans syndrome.
Detection of BOS
We assessed the performance of our method in distinguishing between CT scans of BOS patients meeting the clinical criteria of BOS stage I or higher and CT scans of patients who were never diagnosed with BOS. The method was able to distinguish patients with BOS from non-BOS patients with a median AUC of 0.92, varying between 0.87 and 0.95 on individual splits. The aggregated AUC, less affected by the number of patients in individual splits, attained 0.90 (95% CI: 0.840–0.953) (Fig. 3).
Left: weighted ROC curves for individual splits (dotted lines), and the ROC curve resulting from aggregating all splits (continuous blue line); Right: ROC-AUC values of the dotted curves in the left summarized in a box plot. The box extends from the first quartile (Q1) to the third quartile (Q3) of the data, with a line at the median. The whiskers extend from the box to the farthest data point lying within 1.5x the interquartile range (IQR) from the box. Flier points are those past the end of the whiskers. The number of data points for each box plot is 5 (n = 5). BOS bronchiolitis obliterans syndrome.
To understand how the BOS detection accuracy changes as the disease progresses, we tested our method’s ability to distinguish between CT scans of patients within BOS stages I, II, and III, and CT scans of patients who were never diagnosed with BOS. The performance increased with the advancement of BOS: At BOS stage I, the ROC-AUC attained 0.88 (CI: 0.798–0.945) and increased to 0.91 (CI: 0.829–0.964) and 0.94 (CI: 0.869–0.989) for stages II and III, respectively. Performance on individual data splits revealed a similar pattern, with a median ROC-AUC of 0.89 for stage I, 0.93 for stage II, and 0.98 for BOS stage III (Fig. 4).
Left: aggregated weighted ROC curves representing performance of the deep neural network in distinguishing thoracic CT scans of BOS patients at different stages of the disease from scans of patients that were not diagnosed with BOS. Right: ROC-AUC values for individual splits. The box extends from the first quartile (Q1) to the third quartile (Q3) of the data, with a line at the median. The whiskers extend from the box to the farthest data point lying within 1.5x the interquartile range (IQR) from the box. Flier points are those past the end of the whiskers. The number of data points for each box plot is 5 (n = 5). BOS bronchiolitis obliterans syndrome.
To put these results in context, we analyzed the radiological reports following the diagnosis of BOS (any stage). These reports were written by either a thoracic radiologist with over 10 years of expertise or a thoracic radiologist with over 25 years of expertise. Only 12 out of 26 patients had BOS reported by the radiologist. The numbers of false negatives (a CT scan of a BOS patient read as non-BOS or normal) and of true positives were similar in the two radiologists’ reports: 8 and 6 false negatives and 7 and 5 true positives, respectively, yielding a combined true positive rate (TPR) of 46% (12 of 26). For our DNN, the corresponding numbers can be read from the blue plot in Fig. 4: When distinguishing scans of patients with BOS stage I from scans of patients without BOS, at the break-even point between sensitivity (TPR) and specificity (1-FPR), the DNN attains a true positive rate of 77% for a false positive rate of 23%.
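The arithmetic behind the combined radiologist TPR, and one way of locating a break-even point on a ROC curve, can be sketched as follows. The radiologist counts are those quoted in the text; the function names and the toy scores used for the break-even search are hypothetical and unrelated to the study data.

```python
def true_positive_rate(tp, fn):
    return tp / (tp + fn)

# Counts quoted in the text: 7 and 5 true positives, 8 and 6 false negatives.
radiologist_tpr = true_positive_rate(tp=7 + 5, fn=8 + 6)
print(f"{radiologist_tpr:.0%}")  # 12 / 26, prints 46%

def break_even_point(scores, labels):
    """Threshold at which sensitivity (TPR) and specificity (1 - FPR) are closest.

    A scan is called positive when its score is >= the threshold.
    Returns (threshold, tpr, fpr).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= t)
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= t)
        tpr, fpr = tp / n_pos, fp / n_neg
        gap = abs(tpr - (1 - fpr))
        if best is None or gap < best[0]:
            best = (gap, t, tpr, fpr)
    return best[1:]

# Toy scores, invented for illustration only.
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1]
toy_labels = [1, 1, 1, 0, 0, 0, 0]
print(break_even_point(toy_scores, toy_labels))
```

At an exact break-even point the TPR equals 1 minus the FPR, which is why the reported DNN operating point pairs a TPR of 77% with an FPR of 23%.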
Risk prediction for future development of BOS
To verify the capacity of our method to detect patients at risk of developing BOS before they meet the clinical criteria, we evaluated the performance of the network in distinguishing CT scans of BOS patients before the diagnosis with FEV1 between 80% and 90% of best FEV1 from CT scans of non-BOS patients with FEV1 in the same range. The network attained an aggregated ROC-AUC of 0.87 (CI: 0.735–0.974), and between 0.75 and 1.0 on individual splits. The variance of this result can be attributed to the low number of CT scans in the considered FEV1 range (11 BOS and 20 non-BOS patients). To contrast these results with a yet more challenging scenario, we also evaluated our method in distinguishing scans of BOS and non-BOS patients with FEV1 in the range of 90–100% of the baseline, where we expect to see much less difference between the patients with and without BOS. At this range of FEV1, the aggregated AUC decreased to 0.61 (CI: 0.412–0.792) (Fig. 5). This points to either the absence of disease at this stage or to the abnormalities being too subtle for our method to detect.
Deep network prediction of future onset BOS in patients with relative FEV1 in the range of 90–80% (orange) and 100–90% (blue). Left: aggregated ROC curves. Right: ROC-AUC computed on individual splits. The box extends from the first quartile (Q1) to the third quartile (Q3) of the data, with a line at the median. The whiskers extend from the box to the farthest data point lying within 1.5x the interquartile range (IQR) from the box. Flier points are those past the end of the whiskers. The number of data points for each box plot is 5 (n = 5). FEV1 forced expiratory volume in the first second, BOS bronchiolitis obliterans syndrome.
Detection based on high- and standard-resolution scans
Radiological changes associated with advanced BOS have traditionally been identified using a pair of CT scans, one of which was taken on inspiration and the other on expiration. This results in a higher radiation exposure than a standard chest CT taken only on inspiration, even when a low-dose protocol is used. In addition to using CT scans at any stage of respiration as input, we probed the impact of image resolution on our method’s performance. We investigated performance in the three types of scans most common in our dataset: high-resolution scans (slice thickness ≤ 1.25 mm, lung kernel), thin-slice scans (slice thickness ≤ 1.25 mm, standard kernel), and standard-resolution scans (slice thickness > 1.25 mm, standard kernel). For BOS patients, only scans taken post-disease onset were included. The dataset comprised 97 high-resolution scans of 25 patients with BOS and 214 such scans from 49 patients without BOS, 55 thin-slice scans of 18 patients with BOS and 116 such scans of 42 patients without BOS, and 75 standard-resolution scans of 24 patients with BOS and 147 such scans of 47 patients without BOS. Multiple scans were reconstructed with more than one algorithm. First, we evaluated the DNN trained on all types of scans in each type of scan separately. Remarkably, the method exhibited comparable performance in all three scan types, achieving an aggregated ROC-AUC of 0.90 for high-resolution scans (the blue-dashed plots in Fig. 6), 0.91 for scans with thin slices (the blue dash-dotted plots in Fig. 6), and 0.90 for standard-resolution scans (the blue-dotted plots in Fig. 6). This suggests that all three scan types are well suited for BOS detection with our DNN.
The performance of the network trained with high-resolution scans (slice thickness ≤ 1.25 mm, lung kernel) (a), thin-slice scans (slice thickness ≤ 1.25 mm, standard kernel) (b) and standard-resolution scans (slice thickness > 1.25 mm, standard kernel) (c), and tested on scans of each type separately, is shown in respective panels. For reference, performance of the DNN trained with all scans is shown in each panel in blue. In each subfigure, the Left plot shows the aggregated ROC curves. The Right plot shows the AUC attained in individual splits. The box extends from the first quartile (Q1) to the third quartile (Q3) of the data, with a line at the median. The whiskers extend from the box to the farthest data point lying within 1.5x the interquartile range (IQR) from the box. Flier points are those past the end of the whiskers. The number of data points for each box plot is 5 (n = 5). BOS bronchiolitis obliterans syndrome.
The uniform performance of the DNN across different scan types raises an additional question: Does training the DNN on a mixture of scans of different types compromise its performance in any specific scan type? More specifically, scans of one type might contain additional evidence of the disease that the DNN fails to capture when trained on a mixture of scans, due to the absence of this evidence in the training scans of the other types. To verify this, we re-trained the DNN exclusively on scans of each of the three types. The re-training did not yield increased performance in any of the scan types. As shown in Fig. 6, when trained on scans of a specific type, the DNN detected BOS in scans of this type almost as well as when trained on a mixture of scans. However, performance in other types of scans suffered significantly. A slight edge of the DNN trained on a mixture of scans over the DNNs specialized to one type of scans suggests that mixing the scans of different types during training helps the DNN home in on the manifestations of the disease in each scan type.
Verification of our deep learning approach
To demonstrate the contribution of each of the deep learning techniques to performance, we re-trained the DNN with each of the techniques switched off and evaluated it in distinguishing BOS from non-BOS, following the method described above. We present the results in the Supplementary Note: the importance of each of the techniques is shown in Supplementary Fig. 2 and Supplementary Table 1, the effect of co-training in Supplementary Fig. 3, and the effect of dropping scan slices in Supplementary Fig. 4. The results confirm that each of the techniques contributes to the result. Notably, disabling all the techniques, which reduced our method to a standard deep learning approach, dropped the performance to a ROC-AUC of 0.74 (CI: 0.629–0.831). This illustrates the importance of mitigating overfitting when deploying DNNs in small-scale studies.
Generating explanations for the deep neural network’s decisions
To delineate the lung regions instrumental in the DNN’s diagnostic process, we employed Guided Grad-CAM24, a technique to produce masks that highlight the image areas important to the network’s decision-making. In essence, Guided Grad-CAM utilizes the DNN’s gradient to accentuate regions in the image whose alteration would bolster the decision. Examples of these explanation maps are provided in Fig. 7a–c. In patients with established BOS, the marking consistently appears in hyperlucent/hypoattenuated areas, compatible with either air-trapping or bronchiectasis. The same focal lesions are marked in scans reconstructed with thin slices and in scans reconstructed with thicker slices. Prior to the establishment of the BOS diagnosis, however, the DNN marked areas with seemingly normal parenchyma.
a Left: a CT slice; Right: the same slice with the attention map obtained using Grad-CAM algorithm. b DNN interpretation on CTs reconstructed with thin and thicker slices obtained at the same radiology visit. c DNN interpretation prior BOS diagnosis. Scale bar, 1 cm. DNN deep neural networks, BOS bronchiolitis obliterans syndrome.
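The computation behind such explanation maps can be sketched in a few lines. The example below is a simplified illustration and not the implementation used in this study: it assumes that the activations of one convolutional layer and the gradients of the BOS score with respect to them are already available as nested lists, it uses nearest-neighbour upsampling where the original Grad-CAM formulation uses bilinear interpolation, and all function names and toy values are hypothetical.

```python
def grad_cam(activations, gradients):
    """Coarse Grad-CAM map for one convolutional layer.

    activations[k][i][j]: value of feature map k at position (i, j)
    gradients[k][i][j]:   derivative of the class score w.r.t. that value
    """
    n_rows, n_cols = len(activations[0]), len(activations[0][0])
    # Channel weight = spatial average of the gradient (global average pooling).
    weights = [sum(map(sum, g)) / (n_rows * n_cols) for g in gradients]
    cam = [[0.0] * n_cols for _ in range(n_rows)]
    for w, fmap in zip(weights, activations):
        for i in range(n_rows):
            for j in range(n_cols):
                cam[i][j] += w * fmap[i][j]
    # ReLU keeps only the locations that support the class of interest.
    return [[max(v, 0.0) for v in row] for row in cam]


def upsample_nearest(cam, factor):
    """Nearest-neighbour upsampling (a simplification of bilinear upsampling)."""
    return [[cam[i // factor][j // factor] for j in range(len(cam[0]) * factor)]
            for i in range(len(cam) * factor)]


def guided_grad_cam(cam, guided_backprop, factor):
    """Element-wise product of the upsampled Grad-CAM map and a guided-backprop map."""
    up = upsample_nearest(cam, factor)
    return [[u * g for u, g in zip(u_row, g_row)]
            for u_row, g_row in zip(up, guided_backprop)]


# Hypothetical toy input: two 2x2 feature maps and their gradients.
acts = [[[1, 0], [0, 0]], [[0, 0], [0, 1]]]
grads = [[[1, 1], [1, 1]], [[-1, -1], [-1, -1]]]
cam = grad_cam(acts, grads)                   # only the top-left cell stays positive
fine = [[0.5] * 4 for _ in range(4)]          # stand-in for a guided-backprop map
mask = guided_grad_cam(cam, fine, factor=2)   # 4x4 map, nonzero in the top-left block
```

In this toy case the second feature map receives a negative weight, so the location it activates is suppressed by the ReLU, while the location activated by the first feature map is retained and then modulated by the fine-grained map.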
Discussion
While DNNs have shown remarkable success in detecting lung diseases such as COVID-19, their application to BOS detection has, to date, remained unexplored. With this study, we bridge this gap. Importantly, our method shows promising performance even in the absence of CT scans dedicated to BOS diagnosis, obviating the need for special CT protocols involving exhalation. Notably, expiratory CT may be challenging in this setting, especially in the case of severe BOS, in which diffuse involvement of small airways may preclude the accurate assessment of air-trapping.
Previous methods of CT scan analysis for BOS detection rely on pre-defined procedures to identify patterns in the lung and cannot benefit from signs of BOS not captured by these procedures. However, designing procedures to extract all pertinent information from the scans may not be feasible. Prior research suggests that BOS can manifest itself in non-obvious ways, including alterations to the volume and surface of the airways, modifications of the volume of pulmonary vasculature17,25, and changes in the density of microvasculature26,27, all of which may affect the appearance of the scans. This prompted us to take a different approach and use a DNN that does not rely on fixed pattern identification procedures but instead learns to extract pertinent information directly from the scans. When benchmarked in distinguishing scans of patients with and without BOS, our DNN attains an area under the receiver operating characteristic curve (ROC-AUC) of 0.90 (CI: 0.840–0.953).
We did not observe significant differences between the performance of our DNN when trained or tested exclusively with scans reconstructed with the lung or the standard kernel, provided the reconstruction was performed with thin slices. This is consistent with previous work28, which reported that, for interstitial lung diseases, the difference between the recognition rates attained by radiologists when reading scans reconstructed with thin slices and the standard kernel or the lung kernel is not significant. Interestingly, we could also train the DNN to detect BOS in non-dedicated CT scans, taken at any stage of inspiration and reconstructed with thick slices and the standard kernel. The high performance of the resulting BOS detector suggests that current diagnostic procedures that rely on high-resolution scans and benefit from the additional expiratory scan might not fully exploit the information contained in CT scans and that DNNs can help unlock currently unused information.
The ability to identify BOS in routinely conducted CT scans, without the necessity for high-resolution or paired inspiratory-expiratory scans, implies that our deep-learning-based BOS marker can be computed at reduced expense and without exposing patients to additional procedural risks. This stands in contrast to invasive procedures like surgical lung biopsies, which, although informative, are not routinely used due to the risks of adverse events. The integration of data from various non-invasive modalities, such as CT scans and PFTs, has the potential to create a more comprehensive diagnostic profile of BOS, improving diagnostic accuracy and contributing to a more nuanced understanding of disease progression. In all cases, future studies should employ reduced-dose CT to evaluate the performance of this algorithm, aiming to further limit radiation exposure for patients who typically require multiple CT scans during their follow-up.
Moreover, our method shows promise for early BOS detection, reaching a ROC-AUC of 0.87 (CI: 0.735–0.974) in patients whose lung function has decreased but remains above the clinical criteria for BOS. These results make DNNs a promising avenue for improving BOS diagnosis and enabling early detection, which is crucial for improving patient outcomes. This result is well aligned with previous reports of predicting BOS onset based on airflow simulations and computations of airway volume in lung models reconstructed from CT scans25. In contrast to this approach, our DNN can be applied directly to CT scans, without the need to construct airway models. Further research is needed to establish a standard of early management, such as azithromycin or extracorporeal photopheresis29, that could be administered to patients at elevated risk of developing BOS, as identified by a prognostic method such as our DNN.
The effectiveness of the DNN in our study depends strongly on the strategies employed to counter overfitting, notably our innovative co-training configuration: a DNN trained without these techniques attains a ROC-AUC of 0.74 (CI: 0.629–0.831), a large drop from the 0.90 (CI: 0.840–0.953) attained by our method. We anticipate that these techniques will diminish in importance once a sufficiently large training dataset becomes available. However, financial constraints often restrict the size of patient cohorts accessible for the initial assessment of novel technologies in diagnosing diseases with low incidence. In this context, our method could offer valuable benefits to future studies facing similar limitations.
The main limitations of our study are its retrospective, single-center design and the modest size of our patient cohort. A large-scale study would offer the opportunity to enlarge the training set, which we expect to result in higher performance. Additionally, even though we did not exclude patients with confounding diseases, such as restrictive allograft syndrome (RAS), viral infections, and anastomotic problems, from the group of patients without BOS, we did not evaluate the performance of our DNN in distinguishing specifically between BOS and each of these diseases. A study including separate groups of patients with confounding diseases would fill this gap.
To conclude, we have described a method that is, to the best of our knowledge, the first to detect BOS through deep learning applied to CT scans. Our approach holds clinical relevance in lung transplantation thanks to its capacity to detect BOS early, which is pivotal in enabling timely interventions and improving patient care.
Data availability
For legal reasons, access to the CT scans used in this study is regulated by the Vaud Cantonal Research Ethics Committee, Switzerland (CER-VD). Handling of the data is only allowed for the time and purpose specified in the authorization given by CER-VD. To be granted access to the data, readers must request authorization from CER-VD. The supplementary material includes the Supplementary Note detailing our deep learning methodology and contains Supplementary Figs. 1–4 and Supplementary Table 1. Additionally, we attach supplementary files containing the data plotted in Figs. 2–6 and Supplementary Figs. 2–4: the supplementary data file 1 (figure_2_numbers.tar) contains the numbers plotted in Fig. 2; the supplementary data file 2 (figure_3_numbers.tar) contains the numbers plotted in Fig. 3; the supplementary data file 3 (figure_4_numbers.tar) contains the numbers plotted in Fig. 4; the supplementary data file 4 (figure_5_numbers.tar) contains the numbers plotted in Fig. 5; the supplementary data file 5 (figure_6_numbers.tar) contains the numbers plotted in Fig. 6; the supplementary data file 6 (supplementary_figure_2_numbers.tar) holds the numbers plotted in Supplementary Fig. 2; the supplementary data file 7 (supplementary_figure_3_numbers.tar) holds the numbers plotted in Supplementary Fig. 3; the supplementary data file 8 (supplementary_fig. _4_numbers.tar) holds the numbers plotted in Supplementary Fig. 4.
Code availability
The code required to reproduce the results is publicly available at https://github.com/mkozinski/BOSDetection (ref. 30).
References
1. Al-Githmi, I. et al. Bronchiolitis obliterans following lung transplantation. Eur. J. Cardiothorac. Surg. 30, 846–851 (2006).
2. Grønningsæter, I. S. et al. Bronchiolitis obliterans syndrome in adults after allogeneic stem cell transplantation—pathophysiology, diagnostics and treatment. Expert Rev. Clin. Immunol. 13, 553–569 (2017).
3. Hayes, D. A review of bronchiolitis obliterans syndrome and therapeutic strategies. J. Cardiothorac. Surg. 6, 1–9 (2011).
4. Hakim, A. et al. Diagnosis and treatment of bronchiolitis obliterans syndrome accessible universally. Bone Marrow Transplant. 54, 383–392 (2019).
5. Sengsayadeth, S. M., Srivastava, S., Jagasia, M. & Savani, B. N. Time to explore preventive and novel therapies for bronchiolitis obliterans syndrome after allogeneic hematopoietic stem cell transplantation. Biol. Blood Marrow Transplant. 18, 1479–1487 (2012).
6. Visscher, D. W. & Myers, J. L. Bronchiolitis: the pathologist's perspective. Proc. Am. Thorac. Soc. 3, 41–47 (2006).
7. Paradela, M. et al. Surgical risk factors associated with lung transplantation. Transplant. Proc. 41, 2218–2220 (2009).
8. Gunn, M. L. D., Godwin, J. D., Kanne, J. P., Flowers, M. E. & Chien, J. W. High-resolution CT findings of bronchiolitis obliterans syndrome after hematopoietic stem cell transplantation. J. Thorac. Imaging 23, 244 (2008).
9. Winningham, P. J. et al. Bronchiolitis: a practical approach for the general radiologist. Radiographics 37, 777–794 (2017).
10. Galbán, C. J. et al. Parametric response mapping as an indicator of bronchiolitis obliterans syndrome after hematopoietic stem cell transplantation. Biol. Blood Marrow Transplant. 20, 1592–1598 (2014).
11. Verleden, S. E. et al. Parametric response mapping of bronchiolitis obliterans syndrome progression after lung transplantation. Am. J. Transplant. 16, 3262–3269 (2016).
12. Barbosa, E. M. et al. Multivariate modeling using quantitative CT metrics may improve accuracy of diagnosis of bronchiolitis obliterans syndrome after lung transplantation. Comput. Biol. Med. 89, 275–281 (2017).
13. Horie, M. et al. Quantitative chest CT for subtyping chronic lung allograft dysfunction and its association with survival. Clin. Transplant. 32, e13233 (2018).
14. Sharifi, H. et al. Head-to-head comparison of qualitative radiologist assessment with automated quantitative computed tomography analysis for bronchiolitis obliterans syndrome after hematopoietic cell transplantation. J. Thorac. Imaging 37, 109–116 (2022).
15. Cheng, G. S. et al. Multicenter evaluation of parametric response mapping as an indicator of bronchiolitis obliterans syndrome after hematopoietic stem cell transplantation. Am. J. Transplant. 20, 2198–2205 (2020).
16. Belloli, E. A. et al. Radiographic graft surveillance in lung transplantation: prognostic role of parametric response mapping. Am. J. Respir. Crit. Care Med. 204, 967–976 (2021).
17. McInnis, M. C. et al. Chronic lung allograft dysfunction phenotype and prognosis by machine learning CT analysis. Eur. Respir. J. 60, 2101652 (2022).
18. Weigt, S. S. et al. Quantitative image analysis at chronic lung allograft dysfunction onset predicts mortality. Transplantation 106, 1253–1261 (2022).
19. Krizhevsky, A., Sutskever, I. & Hinton, G. E. ImageNet classification with deep convolutional neural networks. Commun. ACM 60, 84–90 (2017).
20. Lecun, Y., Bottou, L., Bengio, Y. & Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 86, 2278–2324 (1998).
21. Meyer, K. C. et al. An international ISHLT/ATS/ERS clinical practice guideline: diagnosis and management of bronchiolitis obliterans syndrome. Eur. Respir. J. 44, 1479–1503 (2014).
22. Saravanan, V., Berman, G. J. & Sober, S. J. Application of the hierarchical bootstrap to multi-level data in neuroscience. Neuron Behav. Data Anal. Theory 3, 1–25 (2020).
23. Argus, M., Schaefer-Prokop, C., Lynch, D. & van Ginneken, D. Function Follows Form: Regression from Complete Thoracic Computed Tomography Scans. CoRR abs/1909.12047 (2019).
24. Selvaraju, R. R. et al. Grad-CAM: visual explanations from deep networks via gradient-based localization. Int. J. Comput. Vis. 128, 336–359 (2020).
25. Barbosa, E. J. M. et al. Machine learning algorithms utilizing quantitative CT features may predict eventual onset of bronchiolitis obliterans syndrome after lung transplantation. Acad. Radiol. 25, 1201–1212 (2018).
26. Arjuna, A. et al. An update on current treatment strategies for managing bronchiolitis obliterans syndrome after lung transplantation. Expert Rev. Respir. Med. 15, 339–350 (2021).
27. von der Thüsen, J. H. et al. The histomorphological spectrum of restrictive chronic lung allograft dysfunction and implications for prognosis. Mod. Pathol. 31, 780–790 (2018).
28. Klaus, J. et al. Influence of lung reconstruction algorithms on interstitial lung pattern recognition on CT. RoFo 195, 47–54 (2022).
29. Karnes, H. E. et al. Factors associated with mortality and response to extracorporeal photopheresis in lung allograft recipients with bronchiolitis obliterans syndrome. Transplantation 103, 1036–1042 (2019).
30. Koziński, M., Oner, D. & Gwizdała, J. BOS detection with a deep neural network—code repository. GitHub http://github.com/mkozinski/BOSDetection; https://doi.org/10.5281/zenodo.10980623 (2024).
Acknowledgements
This work was supported by the FWF Austrian Science Fund Lise Meitner (M3374 to M.K.), Swiss Cancer League (KFS-5433-08-2021 to N.M. and C.V.G.) and SNSF grants (CRSK-3_190526 and 310030_204938 to S.J.R.). The funders of the study had no role in study design, data analysis, data interpretation, or writing the manuscript.
Author information
Contributions
M.K. led the project, designed and implemented the DNN, performed data preprocessing, model training, and evaluation of imaging-based predictions. M.K. also contributed to the writing and revision of the manuscript. D.O., J.G., P.F., and H.B. assisted with the development of image processing algorithms and supported the optimization of CT scan data analysis and appropriate statistical tests. C.B.A. ensured the accurate use of radiological terminology throughout the study, edited the manuscript and assisted with ethical request. A.K., A.C., and M.D.P. gave intellectual insights and assisted with manuscript writing. A.V. helped with clinical data extraction. J.D.A. and C.V.G. provided oversight for the project and assisted with data extraction and manuscript writing. S.J.R. assisted with data extraction, the implementation of the DNN and manuscript writing. M.U. helped with design and implementation of the DNN, data preprocessing and appropriate statistical testing. N.M. provided oversight throughout the project, guided the experimental design, and assisted with clinical and CT extraction. She designed research and interpreted results, supervised and coordinated the study and wrote the paper along with M.K. with input from all authors.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Communications Medicine thanks Emma Chen and the other anonymous reviewer(s) for their contribution to the peer review of this work. Peer review reports are available.
Additional information
Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Koziński, M., Oner, D., Gwizdała, J. et al. Harnessing deep learning to detect bronchiolitis obliterans syndrome from chest CT. Commun Med 5, 18 (2025). https://doi.org/10.1038/s43856-025-00732-x
DOI: https://doi.org/10.1038/s43856-025-00732-x