Introduction

Non-small cell lung cancer (NSCLC) is one of the most common causes of cancer-related death worldwide and accounts for about 75–80% of all primary lung cancers1,2. Current treatment options for NSCLC include surgical resection, radiotherapy (RT), chemotherapy, targeted therapy, immunotherapy, and microwave ablation. RT plays a crucial role in the treatment of NSCLC, especially for locally advanced and unresectable disease3,4,5. However, some patients develop radiation-related adverse reactions such as radiation-induced lung injury, which includes radiation pneumonitis (RP) and pulmonary fibrosis (PF). RP is an acute inflammation of lung tissue caused by radiation exposure of normal tissue and is the most common side effect of thoracic RT. It usually occurs within 6 months after RT and may cause respiratory insufficiency, impair quality of life, and even lead to death6,7,8.

The pathogenesis of radiation-induced lung toxicity involves multiple interacting cellular processes such as hypoxia, fibrogenesis, inflammation, and angiogenesis9. Patients with stage III/IV NSCLC have a 30–40% risk of developing RP8. Numerous studies have shown that clinical risk factors, such as pulmonary function, smoking history, tumor location, interstitial lung disease, pulmonary emphysema, and immunotherapy, are closely related to the occurrence of RP8,10,11,12. Recent advances in the quantitative analysis of medical images with artificial intelligence (AI) tools, such as machine learning, have opened new frontiers in oncologic imaging. Radiomics, which extracts quantitative, largely textural features from medical images, has been used to predict treatment response in different types of cancer, including lung cancer treated with chemotherapy and/or RT13,14. Several studies have demonstrated the potential of radiomics to predict RP15,16,17. In addition, the occurrence of RP is directly related to radiation dose. Some studies have shown that RP is associated with dosimetric parameters derived from dose-volume histograms (DVH), such as the volume of lung receiving at least 5 Gy (V5), 10 Gy (V10), 20 Gy (V20), or 30 Gy (V30), and the mean lung dose (MLD)8,15. However, DVH parameters compress the three-dimensional (3D) dose distribution into a dose-volume summary and carry no information about where in the lung the dose is deposited18. Dosiomics, derived from radiomics, can describe the spatial heterogeneity of the dose distribution that the DVH discards. Zhang et al.15 and Huang et al.18 extracted dosiomic features from 3D dose distributions and found them valuable for RP prediction.
In addition, deep learning (DL), a branch of AI, can automatically learn representative high-dimensional features from raw medical images, including decoding radiomic representations of tumors, and shows great potential in oncology19,20. Radiomics and DL features represent different modes of image analysis that are complementary rather than redundant21.

Previous studies have shown that radiomics and DL can be used to predict RP, but most relied on a single model, and few performed combined analyses. In this study, we aimed to develop and validate a deep learning radiomics and dosiomics nomogram (DLRDN) based on simulation positioning CT and dosimetry images to predict grade ≥ 2 RP (RP2) in NSCLC.

Materials and methods

Patient characteristics

This retrospective multicenter study was approved by the Ethics Committee of the Affiliated Hospital of Jining Medical University (Jining Medical University Ethics Committee for Human Research), China, which waived the requirement for informed consent. The study was conducted in accordance with the Declaration of Helsinki.

In this study, 162 patients with NSCLC who received intensity-modulated radiation therapy (IMRT) or volumetric modulated arc therapy (VMAT) at Hospital I between June 2016 and December 2022 were retrospectively enrolled and randomly divided at a ratio of 7:3 into a training cohort (n = 113; 74 without RP2 and 39 with RP2) and an internal validation cohort (n = 49; 32 without RP2 and 17 with RP2). An additional 83 patients (59 without RP2 and 24 with RP2) treated at two other hospitals between January 2019 and December 2022 formed the external validation cohort. All patients were treated with IMRT or VMAT at 1.8–3 Gy per daily fraction, 5 days a week, in 20–30 fractions to a total dose of 40–60 Gy. The inclusion criteria were as follows: (i) definite pathological diagnosis of NSCLC; (ii) completion of radiotherapy, with or without chemotherapy, and complete clinical data; (iii) follow-up for at least 6 months after RT or until an endpoint event. The endpoint was defined as the occurrence of RP2 within 6 months after the end of RT. The exclusion criteria were as follows: (i) prior surgical resection of lung cancer; (ii) repeated chest RT or stereotactic body radiation therapy (SBRT); (iii) other malignant tumors or mediastinal lung cancer.
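The 7:3 random split of the Hospital I cohort can be sketched as below. The text only states that the split was random; stratifying by RP2 status, the helper name `stratified_split`, and the fixed seed are assumptions made for illustration. With the class counts implied by the cohort description (106 patients without RP2 and 56 with RP2), sending 30% of each class to validation reproduces the reported cohort sizes and RP2 counts.

```python
import random


def stratified_split(labels, test_fraction=0.3, seed=42):
    """Randomly split patient indices into training and validation sets,
    keeping the RP2 proportion similar in both (hypothetical helper;
    stratification is an assumption, not stated in the text)."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)  # 30% of each class to validation
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


# Hospital I: 106 patients without RP2 (label 0) and 56 with RP2 (label 1)
labels = [0] * 106 + [1] * 56
train_idx, val_idx = stratified_split(labels)
print(len(train_idx), len(val_idx))  # 113 49
```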

The severity of RP was graded according to the Radiation Therapy Oncology Group (RTOG) criteria for acute radiation pneumonitis. Grade ≥ 2 was classified as symptomatic RP, defined as requiring steroids or limiting instrumental activities of daily living8. Accordingly, patients were divided into two risk levels: grade < 2 (without RP2) as the low-risk group and grade ≥ 2 (RP2) as the high-risk group. RP was graded independently by a radiologist and an oncologist, and disagreements were resolved by a third physician. The overall workflow of RP2 prediction model development and validation is shown in Fig. 1.
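The grading and labeling rule can be written as two small functions. The function names are hypothetical, and the sketch assumes that when the two primary readers disagree, the third physician's grade is taken as final.

```python
def consensus_grade(radiologist_grade, oncologist_grade, third_reader_grade=None):
    """RTOG RP grade for one patient: the shared grade when the two primary
    readers agree, otherwise the third physician's grade (assumed final)."""
    if radiologist_grade == oncologist_grade:
        return radiologist_grade
    if third_reader_grade is None:
        raise ValueError("readers disagree; a third physician's grade is required")
    return third_reader_grade


def rp2_label(grade):
    """Binary endpoint: 1 = grade >= 2 (RP2, high risk), 0 = grade < 2 (low risk)."""
    return int(grade >= 2)


print(rp2_label(consensus_grade(1, 1)))     # 0
print(rp2_label(consensus_grade(1, 2, 2)))  # 1
```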

Fig. 1

The overall workflow of the RP2 predictive model development and validation.

Establishment of the clinical model (CM)

Clinical data included age, sex, smoking status, emphysema, pathological type, T stage, peripheral blood markers, immunotherapy status, and DVH parameters. Peripheral blood markers included lymphocyte, neutrophil, monocyte, and platelet counts, the neutrophil-to-lymphocyte ratio (NLR), the platelet-to-lymphocyte ratio (PLR), and the lymphocyte-to-monocyte ratio (LMR). T stage was assigned according to the American Joint Committee on Cancer (AJCC) Cancer Staging Manual, 8th edition. DVH parameters included V5, V10, V20, V30, and MLD. Multivariate analysis was used to identify independent clinical predictors and establish the clinical model (CM).
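The DVH parameters and blood ratios listed above can be computed directly from the planning dose grid and laboratory counts. This minimal sketch assumes an aligned dose array (in Gy) and a binary lung mask on a grid of equal-volume voxels. The text does not state which lung structure defines the mask (for example, total lung or lung minus GTV), and the function names are hypothetical.

```python
import numpy as np


def dvh_metrics(dose_gy, lung_mask, thresholds=(5, 10, 20, 30)):
    """Vx = % of lung volume receiving >= x Gy; MLD = mean dose over the lung (Gy).
    Assumes dose_gy and lung_mask share the same grid and equal voxel volumes."""
    lung_dose = dose_gy[lung_mask.astype(bool)]
    metrics = {f"V{t}": float(100.0 * np.mean(lung_dose >= t)) for t in thresholds}
    metrics["MLD"] = float(lung_dose.mean())
    return metrics


def blood_ratios(neutrophils, lymphocytes, monocytes, platelets):
    """Peripheral blood ratios from absolute counts given in the same units."""
    return {
        "NLR": neutrophils / lymphocytes,
        "PLR": platelets / lymphocytes,
        "LMR": lymphocytes / monocytes,
    }


# Toy example: five lung voxels with hypothetical doses (Gy)
dose = np.array([0.0, 6.0, 12.0, 25.0, 40.0]).reshape(1, 1, 5)
mask = np.ones_like(dose, dtype=bool)
print(dvh_metrics(dose, mask))
print(blood_ratios(neutrophils=4.0, lymphocytes=2.0, monocytes=0.5, platelets=200.0))
```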

CT acquisition and image segmentation

All patients underwent CT simulation for tumor localization on a large-bore CT simulator (Philips Brilliance CT Big Bore or Canon Aquilion Prime TSX-303A). Scans were acquired at 120 kV and 250 mAs with a helical pitch of 1 and a slice thickness of 5 mm. Experienced radiation oncologists delineated the gross tumor volume (GTV) of the primary lung tumor and malignant lymph nodes in the Eclipse, Monaco, or RayStation treatment planning system (TPS). The clinical target volume (CTV), covering areas of suspected microscopic tumor invasion and possible microscopic spread, was generated by isotropic 5 mm expansion of the GTV, and the planning target volume (PTV) was generated by a further isotropic 5 mm expansion of the CTV. The prescription dose covered at least 95% of the PTV. Because RP occurs primarily in normal lung tissue rather than in the high-dose PTV region, selecting an appropriate region of interest (ROI) is critical for accurate prediction. Prior studies by Jiang et al.22 and Meng et al.23 showed that models based on the total lung minus PTV (TL-PTV) region predicted symptomatic RP better than models based on other ROIs such as TL-GTV, PTV, or GTV. Based on this evidence and the biological relevance of TL-PTV to RP development, TL-PTV was chosen as the ROI in this study. The planning CT, ROI, and 3D dose grids of the radiotherapy plans were exported from the TPS in Digital Imaging and Communications in Medicine (DICOM) format. Before feature extraction, CT and dosimetry images were resampled to an isotropic voxel spacing of 1 mm to ensure comparability. Three-dimensional (3D) tumor segmentation was performed using 3D Slicer (version 5.1.0, https://www.slicer.org). A sensitivity analysis was performed to evaluate the impact of inter-institutional variations in radiotherapy equipment and dose fractionation on the results.
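The margin expansion and the TL-PTV ROI can be illustrated with plain NumPy masks. In the study the contours were generated in the treatment planning systems; this sketch only illustrates the geometry, assumes masks already resampled to an isotropic 1 mm grid, omits the resampling step itself, and uses hypothetical function names.

```python
import numpy as np


def expand_isotropic(mask, margin_mm, voxel_mm=1.0):
    """Grow a binary mask by a spherical margin (e.g. GTV -> CTV -> PTV, 5 mm each).
    Assumes an isotropic grid; expansion beyond the image border is discarded."""
    r = int(round(margin_mm / voxel_mm))
    if r == 0:
        return mask.astype(bool)
    padded = np.pad(mask.astype(bool), r)  # pad so np.roll never wraps voxels around
    out = np.zeros_like(padded)
    for dz in range(-r, r + 1):
        for dy in range(-r, r + 1):
            for dx in range(-r, r + 1):
                if (dz * dz + dy * dy + dx * dx) * voxel_mm ** 2 <= margin_mm ** 2:
                    out |= np.roll(padded, (dz, dy, dx), axis=(0, 1, 2))
    return out[r:-r, r:-r, r:-r]


def tl_minus_ptv(total_lung, ptv):
    """ROI used for feature extraction: total lung with the PTV removed."""
    return total_lung.astype(bool) & ~ptv.astype(bool)


# Toy geometry: a one-voxel "GTV" inside a 31 mm cube of "lung"
gtv = np.zeros((31, 31, 31), dtype=bool)
gtv[15, 15, 15] = True
ctv = expand_isotropic(gtv, margin_mm=5)  # GTV + 5 mm
ptv = expand_isotropic(ctv, margin_mm=5)  # CTV + 5 mm
lung = np.ones_like(gtv)
roi = tl_minus_ptv(lung, ptv)
print(ctv.sum(), ptv.sum(), roi.sum())
```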

Extraction and selection of RD and DL features

A total of 214 radiomics and dosiomics (RD) features, including first-order, texture, and shape features, were extracted from the ROI (TL-PTV) using PyRadiomics. In addition, 4096 DL features were extracted with a 3D ResNet50 architecture, using input images cropped and resized to 96 × 96 × 96 voxels. To compensate for the limited data and improve generalizability, data augmentation (random cropping, rotation, and flipping) and transfer learning were applied during training. All features were standardized using Z-score normalization. Feature selection was performed using least absolute shrinkage and selection operator (LASSO) regression with 10-fold cross-validation to enhance model stability. For each patient, an RD-score and a DL-score were calculated as the sum of the selected features weighted by their multivariate logistic regression coefficients.
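Z-score normalization and LASSO selection can be sketched in NumPy as below. The study ran LASSO in R and tuned the penalty by 10-fold cross-validation; here the penalty `alpha` is fixed by hand, the squared-error LASSO on a 0/1 label is a common simplification rather than the authors' confirmed loss, and the data are simulated. In this sketch the standardization parameters come from the training cohort only, so they can be reapplied unchanged to validation data.

```python
import numpy as np


def zscore_fit(X_train):
    """Mean and SD of each feature, estimated on the training cohort only."""
    mu = X_train.mean(axis=0)
    sd = X_train.std(axis=0)
    sd[sd == 0] = 1.0  # guard against constant features
    return mu, sd


def lasso_cd(X, y, alpha, n_iter=500):
    """Coordinate-descent LASSO minimising (1/2n)||y - Xb||^2 + alpha * ||b||_1.
    X should be z-scored; y is centred internally, so no intercept is fitted."""
    n, p = X.shape
    b = np.zeros(p)
    r = y - y.mean()                    # residual for b = 0
    col_sq = (X ** 2).sum(axis=0) / n   # ~1 for z-scored columns
    for _ in range(n_iter):
        for j in range(p):
            rho = X[:, j] @ (r + X[:, j] * b[j]) / n
            new = np.sign(rho) * max(abs(rho) - alpha, 0.0) / col_sq[j]  # soft-threshold
            r += X[:, j] * (b[j] - new)  # keep the residual in sync with b
            b[j] = new
    return b


# Simulated cohort: 200 patients, 20 features, only features 0 and 1 carry signal
rng = np.random.default_rng(0)
X_raw = rng.normal(size=(200, 20))
latent = X_raw[:, 0] + 0.8 * X_raw[:, 1] + 0.3 * rng.normal(size=200)
y = (latent > 0).astype(float)
mu, sd = zscore_fit(X_raw)
X = (X_raw - mu) / sd
coef = lasso_cd(X, y, alpha=0.12)
selected = np.flatnonzero(coef)
print("selected features:", selected)
```

The nonzero columns would then go into a logistic regression whose weighted sum gives the RD-score or DL-score.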

Development of the RD model, DL model and DLRDN

The selected RD features were input into supervised classifiers, including logistic regression (LR), support vector machine (SVM), and random forest (RF), to construct RD models, and the best-performing algorithm was chosen for the final RD model. The DL model was developed in the same way from the selected DL features. The combined DLRDN was developed by combining the independent clinical features, RD-score, and DL-score using multivariate logistic regression analysis.
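Fitting the multivariate logistic regression that yields the RD-score, the DL-score, and the combined DLRDN can be sketched with Newton-Raphson (IRLS). The function names, the tiny ridge term added only to stabilise the linear solve, and the simulated predictors standing in for the RD-score, DL-score, and one clinical variable are all assumptions; the study's own fits were done in statistical software.

```python
import numpy as np


def fit_logistic(X, y, n_iter=50, ridge=1e-6):
    """Maximum-likelihood logistic regression by Newton-Raphson (IRLS).
    Returns [intercept, coef_1, ..., coef_p]; `ridge` only stabilises the solve."""
    A = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-A @ beta))
        W = p * (1.0 - p)  # IRLS weights
        grad = A.T @ (y - p)
        H = A.T @ (A * W[:, None]) + ridge * np.eye(A.shape[1])
        step = np.linalg.solve(H, grad)
        beta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta


def linear_predictor(beta, X):
    """Log-odds: the weighted sum that serves as a score (e.g. RD-score, DL-score)."""
    return beta[0] + X @ beta[1:]


def predict_proba(beta, X):
    return 1.0 / (1.0 + np.exp(-linear_predictor(beta, X)))


# Simulated training cohort with hypothetical predictors
rng = np.random.default_rng(1)
n = 300
rd_score, dl_score, mld = rng.normal(size=(3, n))
true_logit = -0.7 + 1.0 * rd_score + 1.5 * dl_score + 0.5 * mld
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-true_logit))).astype(float)

X = np.column_stack([mld, rd_score, dl_score])
beta = fit_logistic(X, y)
risk = predict_proba(beta, X)  # per-patient predicted RP2 probability
print(np.round(beta, 2))
```

At the maximum-likelihood solution the mean predicted risk equals the observed RP2 rate, which is a quick sanity check on convergence.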

Performance assessment of different models

The performance of each prediction model (CM, RD, DL, and DLRDN) was assessed using the area under the receiver operating characteristic (ROC) curve (AUC). The optimal cut-off value was determined by the Youden index, and sensitivity, specificity, and accuracy were calculated in the training, internal validation, and external validation cohorts. Decision curve analysis (DCA) was used to assess the clinical benefit of each model. Calibration curves were drawn to evaluate model calibration in the training, internal validation, and external validation cohorts, and agreement was tested with the Hosmer-Lemeshow goodness-of-fit test.
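AUC and the Youden-index cut-off can be computed as follows. This is a plain NumPy sketch with hypothetical function names; it assumes the cut-off is chosen in the training cohort and then reused unchanged for the validation cohorts (`metrics_at`), which the text does not state explicitly.

```python
import numpy as np


def roc_auc(y, score):
    """AUC as the probability that a randomly chosen RP2 patient scores higher
    than a randomly chosen patient without RP2 (ties count 1/2)."""
    y = np.asarray(y).astype(bool)
    score = np.asarray(score, dtype=float)
    diff = score[y][:, None] - score[~y][None, :]
    return float((diff > 0).mean() + 0.5 * (diff == 0).mean())


def youden_cutoff(y, score):
    """Cut-off maximising J = sensitivity + specificity - 1 (predict RP2 if score >= cut-off)."""
    y = np.asarray(y).astype(bool)
    score = np.asarray(score, dtype=float)
    best = None
    for c in np.unique(score):
        pred = score >= c
        sens = np.sum(pred & y) / y.sum()
        spec = np.sum(~pred & ~y) / (~y).sum()
        if best is None or sens + spec - 1 > best["J"]:
            best = {"cutoff": float(c), "J": float(sens + spec - 1),
                    "sensitivity": float(sens), "specificity": float(spec),
                    "accuracy": float(np.mean(pred == y))}
    return best


def metrics_at(y, score, cutoff):
    """Apply a fixed (e.g. training-derived) cut-off to another cohort."""
    y = np.asarray(y).astype(bool)
    pred = np.asarray(score, dtype=float) >= cutoff
    return {"sensitivity": float(np.sum(pred & y) / y.sum()),
            "specificity": float(np.sum(~pred & ~y) / (~y).sum()),
            "accuracy": float(np.mean(pred == y))}


y_demo = [0, 0, 1, 1]
s_demo = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y_demo, s_demo))        # 0.75
print(youden_cutoff(y_demo, s_demo))
```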

Statistics

Differences in clinical characteristics between groups or cohorts were compared using the independent t test or Mann-Whitney U test for continuous variables and the chi-square test for categorical variables. Univariate analysis (chi-square test, t test, or Mann-Whitney U test) and multivariate analysis were performed in SPSS software (version 20.0, IBM). LASSO regression, ROC, and DCA analyses were performed in R. P < 0.05 was considered statistically significant.
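The DCA net benefit and the Hosmer-Lemeshow statistic behind the calibration test can be sketched as below. The study computed these in R; the function names are hypothetical, the Hosmer-Lemeshow version uses equal-size groups of sorted predicted risk (one common variant), and the chi-square tail uses a closed form that is valid only for even degrees of freedom (10 groups give df = 8).

```python
import math
import numpy as np


def net_benefit(y, prob, thresholds):
    """Decision-curve net benefit of treating patients with predicted risk >= pt:
    NB = TP/n - FP/n * pt / (1 - pt)."""
    y = np.asarray(y).astype(bool)
    prob = np.asarray(prob, dtype=float)
    n = len(y)
    nb = []
    for pt in thresholds:
        treat = prob >= pt
        tp, fp = np.sum(treat & y), np.sum(treat & ~y)
        nb.append(tp / n - fp / n * pt / (1 - pt))
    return np.array(nb)


def treat_all_net_benefit(y, thresholds):
    """Reference curve for treating every patient as high risk."""
    prev = float(np.mean(y))
    return np.array([prev - (1 - prev) * pt / (1 - pt) for pt in thresholds])


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form holds only for even df."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    k = df // 2
    return math.exp(-x / 2) * sum((x / 2) ** i / math.factorial(i) for i in range(k))


def hosmer_lemeshow(y, prob, groups=10):
    """Hosmer-Lemeshow statistic over groups of sorted predicted risk; df = groups - 2."""
    y = np.asarray(y, dtype=float)
    prob = np.asarray(prob, dtype=float)
    order = np.argsort(prob, kind="stable")
    h = 0.0
    for yg, pg in zip(np.array_split(y[order], groups), np.array_split(prob[order], groups)):
        observed, expected, n_g = yg.sum(), pg.sum(), len(pg)
        h += (observed - expected) ** 2 / (expected * (1 - expected / n_g))
    return h, chi2_sf_even_df(h, groups - 2)


print(net_benefit([1, 0, 1, 0], [0.9, 0.8, 0.2, 0.1], [0.25, 0.5]))
print(treat_all_net_benefit([1, 0, 1, 0], [0.25, 0.5]))
```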

Results

Clinical characteristics of the patients

Clinical characteristics of patients in the three cohorts are shown in Table 1. RP2 patients accounted for 34.5% (39/113) of the training cohort, 34.7% (17/49) of the internal validation cohort, and 28.9% (24/83) of the external validation cohort. Sensitivity analysis revealed no significant impact of inter-institutional variations in radiotherapy equipment or dose fractionation schedules on treatment outcomes (p > 0.05). No significant differences in age, sex, smoking status, emphysema, pathology, neutrophils, monocytes, NLR, LMR, V5, or V10 were detected between patients with and without RP2 in any of the three cohorts (p > 0.05). Multivariate analysis identified V20, V30, and MLD as independent predictors of RP2 (p < 0.05), and these were used to establish the CM.

Table 1 Baseline clinical characteristics of patients in the training cohort, internal and external validation cohorts.

Feature selection and model construction of RD, DL, and DLRDN

Seven RD features closely associated with RP2 were selected to build the RD model: 4 radiomics features (1 first-order and 3 texture features) and 3 dosiomics features (all texture features) (Suppl. Figure 1). We further compared three machine learning algorithms (LR, SVM, and RF) for predicting RP. All three showed favorable predictive performance, with LR slightly outperforming SVM and RF (Suppl. Figure 2). We therefore selected LR as the primary modeling approach because of its stable performance, simplicity, and interpretability. Ten DL features were selected to build the DL model, 3 from the CT images and 7 from the dosimetry images. The RD-score and DL-score were calculated as the sums of the selected features weighted by their LR coefficients. Multivariate logistic regression analysis of the independent clinical features, RD-score, and DL-score was used to construct the DLRDN (Fig. 2).

Performance comparison of different models

The 10-fold cross-validation results of the RD and DL models are presented in Suppl. Figure 3: the RD model achieved a mean AUC of 0.775 ± 0.077, and the DL model a mean AUC of 0.888 ± 0.053. The AUC, sensitivity, specificity, and accuracy of the CM, RD, DL, and DLRDN models in the training, internal validation, and external validation cohorts are shown in Table 2. DLRDN performed well for RP2 prediction in the training cohort, with an AUC of 0.891 (95% CI 0.826–0.957), which was confirmed in the internal and external validation cohorts with AUCs of 0.825 (95% CI 0.693–0.957) and 0.801 (95% CI 0.698–0.904), respectively (Fig. 3). In both the training and external validation cohorts, DLRDN had a significantly higher AUC than the CM (p < 0.05). DCA showed that DLRDN provided a higher overall net benefit than the other models across most of the range of reasonable threshold probabilities, suggesting that DLRDN could be used to predict RP2 in NSCLC patients (Fig. 4A). The calibration curves showed good agreement between DLRDN-predicted and observed RP2 in all cohorts (p > 0.05) (Fig. 4B).
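The text does not name the test used to compare the DLRDN and CM AUCs. DeLong's test for two correlated ROC curves measured on the same patients is one common choice and is sketched below as an assumption, on simulated scores with hypothetical names.

```python
import math
import numpy as np


def _placements(score, y):
    """Per-patient structural components: V10 over RP2 patients, V01 over the rest."""
    pos, neg = score[y], score[~y]
    psi = (pos[:, None] > neg[None, :]) + 0.5 * (pos[:, None] == neg[None, :])
    return psi.mean(axis=1), psi.mean(axis=0)


def delong_test(y, score_a, score_b):
    """Two-sided DeLong test for the difference between two correlated AUCs
    computed on the same patients. Returns (aucs, z, p)."""
    y = np.asarray(y).astype(bool)
    comps = [_placements(np.asarray(s, dtype=float), y) for s in (score_a, score_b)]
    v10 = np.array([c[0] for c in comps])  # shape (2, n_pos)
    v01 = np.array([c[1] for c in comps])  # shape (2, n_neg)
    aucs = v10.mean(axis=1)
    cov = np.cov(v10) / v10.shape[1] + np.cov(v01) / v01.shape[1]
    var_diff = cov[0, 0] + cov[1, 1] - 2 * cov[0, 1]
    z = (aucs[0] - aucs[1]) / math.sqrt(var_diff)
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return aucs, z, p


# Simulated cohort: a hypothetical informative model versus an uninformative one
rng = np.random.default_rng(7)
y = np.r_[np.ones(40), np.zeros(60)].astype(bool)
model_a = 1.5 * y + rng.normal(size=100)
model_b = rng.normal(size=100)
aucs, z, p = delong_test(y, model_a, model_b)
print(np.round(aucs, 3), round(z, 2), p)
```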

Fig. 2

The DLRDN, developed in the training cohort by combining the RD-score, DL-score, and clinical features.

Fig. 3

Receiver operating characteristic (ROC) curves of the predictive models. ROC curves of DLRDN, DL, RD and CM for predicting RP2 in (A) training cohort, (B) internal validation cohort, and (C) external validation cohort.

Fig. 4

(A) Decision curve analysis of the four models. (B) Calibration curves of the DLRDN in all three cohorts.

Table 2 Performance of the different models in the training, internal validation, and external validation cohorts.

Discussion

In this study, we developed a comprehensive model (DLRDN) that combines radiomics, dosiomics, and DL features with independent clinical factors to stratify RP risk in NSCLC patients, and validated its ability to predict RP2 in a multicenter external validation cohort. The model showed robust predictive performance in both the internal and external validation cohorts (AUC 0.825 and 0.801, respectively). DLRDN outperformed the clinical model, demonstrating the added value of radiomics/dosiomics and DL features over traditional clinical and dosimetric parameters for predicting RP2 in NSCLC.

RP is an important adverse event in NSCLC patients receiving thoracic radiotherapy. It is mainly managed with corticosteroids, and treatment failure can be fatal, offsetting the survival benefit of radiotherapy and impairing quality of life8,24,25. Patients with grade ≥ 2 RP form a highly heterogeneous group. Identifying patients at high risk of symptomatic RP before treatment is therefore crucial for effective prevention.

The incidence and severity of RP are directly related to the dose distribution in the lung. Dosimetric factors such as MLD, V5, V10, V20, and V30 have been widely used to predict RP8,11,15. In our study, MLD, V20, and V30 were identified as independent predictors of RP2. However, the clinical model built on them had limited predictive performance, with AUCs of 0.736, 0.733, and 0.631 in the training, internal validation, and external validation cohorts, respectively. These results suggest that traditional DVH parameters may not fully capture the complexity of RP risk. Liang et al.26,27 reported that greater local dose variation in the ipsilateral lung and larger low-dose regions in the total lung were significantly associated with a higher incidence of RP. Unlike spatial metrics, DVH parameters represent only the cumulative dose to specific lung volumes and do not account for spatial heterogeneity, local dose gradients, or anatomical context. This highlights the need for more refined metrics that incorporate the spatial dose distribution and organ-specific characteristics to improve RP risk prediction.
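The blindness of DVH parameters to spatial arrangement can be shown with a toy example. The dose grid below is hypothetical: a smooth linear fall-off and a voxel-shuffled copy of it contain exactly the same dose values, so V5 to V30 and the mean dose are identical, yet their local dose gradients differ sharply. The mean gradient magnitude is used here only as a simple stand-in for the spatial information that dosiomic texture features capture.

```python
import numpy as np


def vx(dose, x):
    """Percentage of voxels receiving at least x Gy (a DVH point)."""
    return float(100.0 * np.mean(dose >= x))


def mean_gradient(dose):
    """Mean magnitude of the voxel-to-voxel dose gradient (Gy per voxel)."""
    gz, gy, gx = np.gradient(dose)
    return float(np.mean(np.sqrt(gz ** 2 + gy ** 2 + gx ** 2)))


# Hypothetical smooth dose falling off linearly from one corner of a 20^3 grid
z, y, x = np.indices((20, 20, 20))
smooth = 60.0 * (1 - (x + y + z) / 57.0)
# Same voxel doses, spatially shuffled: identical DVH, very different layout
rng = np.random.default_rng(0)
shuffled = rng.permutation(smooth.ravel()).reshape(smooth.shape)

for d in (5, 10, 20, 30):
    print(f"V{d}: {vx(smooth, d):.1f}% vs {vx(shuffled, d):.1f}%")
print(f"MLD: {smooth.mean():.2f} vs {shuffled.mean():.2f} Gy")
print(f"mean gradient: {mean_gradient(smooth):.2f} vs {mean_gradient(shuffled):.2f} Gy/voxel")
```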

Radiomics is the high-throughput extraction of quantitative features from medical images and offers a non-invasive means of assessing pulmonary heterogeneity before and during radiotherapy. Prior studies have demonstrated its potential for predicting symptomatic RP. Krafft et al.17 reported an AUC of 0.68 using whole-lung CT radiomics, outperforming clinical and dosimetric parameters (AUC: 0.51). Nie et al.28 also showed that radiomics performed well in predicting symptomatic RP, with AUCs of 0.740–0.802. Dosiomics provides quantitative measurements of intensity, shape, or heterogeneity within a defined volume of interest, capturing the inhomogeneity of the dose distribution18,29. Liang et al.26 showed that dosiomics outperformed traditional dosimetric factors in RP prediction (AUCs: 0.709 and 0.782 vs. 0.665 and 0.676). Zhang et al.15 showed that a combined model of radiomics features, dosiomics features, and clinical parameters yielded the highest AUC (0.793 in the training set and 0.855 in the testing set), exceeding radiomics alone (AUC: 0.676 and 0.671 in the training and testing sets) and dosiomics alone (AUC: 0.728 and 0.684 in the training and testing sets). In our study, the AUCs of the RD model for RP2 prediction were 0.775, 0.783, and 0.637 in the training, internal validation, and external validation cohorts, respectively, all higher than those achieved with clinical features alone. These findings are consistent with previous studies. We also evaluated SVM and RF for predicting RP. Although both algorithms performed well, LR achieved slightly better results. Compared with the more complex and parameter-sensitive SVM and RF, LR offers greater interpretability, lower computational complexity, and better clinical applicability.
This study primarily aimed to assess the incremental value of integrating multi-dimensional data, and LR was used as an illustrative modeling tool to demonstrate this benefit. The selected radiomics and dosiomics features were predominantly texture-based, in agreement with earlier findings30. Among them, the Gray Level Dependence Matrix (GLDM)-derived feature Dependence Non-Uniformity Normalized (DNNUN) quantifies the heterogeneity of gray-level dependencies within the ROI31. In our study, higher DNNUN values were associated with an increased risk of RP, potentially reflecting greater structural inhomogeneity or underlying tissue vulnerability. These findings suggest that lung regions with uneven textural dependency might be more susceptible to radiation-induced damage, highlighting the potential of DNNUN as a predictive biomarker for RP. Conversely, Short Run Low Gray Level Emphasis (SRLGLE), derived from the Gray Level Run Length Matrix (GLRLM), was negatively correlated with RP risk. Higher SRLGLE values, indicating more uniform and aerated lung tissue, were linked to a lower likelihood of RP, whereas lower SRLGLE values, characteristic of denser and more heterogeneous textures, were more common in RP-prone regions. These results suggest a potential protective role of preserved lung structure against radiation injury.
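The two texture features discussed above follow the GLRLM and GLDM definitions documented by PyRadiomics. The sketch below builds a single-direction (horizontal) run-length matrix for a tiny 2D image, evaluates SRLGLE from it, and evaluates DNNUN (named DependenceNonUniformityNormalized in PyRadiomics) from a given dependence matrix. PyRadiomics additionally discretizes intensities and averages GLRLM features over several directions, so these toy values are not comparable to the study's features; the function names are hypothetical.

```python
import numpy as np


def glrlm_horizontal(img, n_levels):
    """Gray Level Run Length Matrix for horizontal runs of a 2D image whose
    integer gray levels are 1..n_levels. P[i-1, j-1] = number of runs of
    gray level i with length j."""
    P = np.zeros((n_levels, img.shape[1]))
    for row in img:
        level, length = row[0], 1
        for v in row[1:]:
            if v == level:
                length += 1
            else:
                P[level - 1, length - 1] += 1
                level, length = v, 1
        P[level - 1, length - 1] += 1  # close the last run in the row
    return P


def srlgle(P):
    """Short Run Low Gray Level Emphasis = sum_ij P(i,j) / (i^2 j^2) / N_r."""
    i = np.arange(1, P.shape[0] + 1)[:, None]
    j = np.arange(1, P.shape[1] + 1)[None, :]
    return float((P / (i ** 2 * j ** 2)).sum() / P.sum())


def dependence_nonuniformity_normalized(P):
    """GLDM Dependence Non-Uniformity Normalized = sum_j (sum_i P(i,j))^2 / N_z^2,
    where P[i-1, j-1] counts voxels of gray level i with dependence size j."""
    return float((P.sum(axis=0) ** 2).sum() / P.sum() ** 2)


img = np.array([[1, 1, 2],
                [2, 2, 2]])
P_rl = glrlm_horizontal(img, n_levels=2)
print(srlgle(P_rl))  # 19/108 ≈ 0.176
P_dep = np.array([[2, 0],
                  [0, 1]])
print(dependence_nonuniformity_normalized(P_dep))  # 5/9 ≈ 0.556
```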

DL algorithms, which take raw images as input and compute outputs through multilayer transformations, are powerful and promising tools for studying complex patterns in radiation therapy32. Huang et al., Kong et al., and Zhang et al.18,33,34 showed that DL or DL-radiomics/dosiomics could improve the prediction of RP occurrence or risk stratification. Although radiomics and DL features represent different modalities of image analysis, they are complementary rather than redundant, and both can reflect the spatial heterogeneity and microenvironment within the tumor21. However, most of these studies used single-omics DL methods and were single-center. Our study was multicenter and combined multi-omics (radiomics and dosiomics) features with DL to predict RP2. In the training, internal validation, and external validation cohorts, the combined model (DLRDN) consistently performed well, with AUCs above 0.8 that were higher than those of the other models. Our results also show that DL and multi-omics features complement each other and that integrating them provides additional useful information. In addition, the cross-validation results (RD: mean AUC = 0.775 ± 0.077; DL: mean AUC = 0.888 ± 0.053) showed consistent performance across folds, suggesting that the models' predictive capacity was not driven by overfitting to a specific subset of the data.
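A stratified 10-fold cross-validation that reports a mean ± SD AUC, like the figures quoted above, can be sketched as follows. The round-robin fold assignment, the use of the sample SD, and the stand-in classifier `centroid_direction` are assumptions for illustration; in practice the study's RD or DL pipeline would be passed in as `fit_predict`.

```python
import numpy as np


def stratified_kfold(y, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs with each class spread evenly over k folds."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y)
    fold_of = np.empty(len(y), dtype=int)
    for cls in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == cls))
        fold_of[idx] = np.arange(len(idx)) % k  # deal class members round-robin
    for f in range(k):
        yield np.flatnonzero(fold_of != f), np.flatnonzero(fold_of == f)


def auc(y, s):
    """Pairwise (Mann-Whitney) AUC; ties count 1/2."""
    d = s[y == 1][:, None] - s[y == 0][None, :]
    return float((d > 0).mean() + 0.5 * (d == 0).mean())


def cross_validated_auc(X, y, fit_predict, k=10):
    """Mean and sample SD of the per-fold AUC; fit_predict(X_tr, y_tr, X_te) -> scores."""
    aucs = [auc(y[te], fit_predict(X[tr], y[tr], X[te]))
            for tr, te in stratified_kfold(y, k)]
    return float(np.mean(aucs)), float(np.std(aucs, ddof=1)), aucs


def centroid_direction(X_tr, y_tr, X_te):
    """Hypothetical stand-in classifier: project onto the difference of class means."""
    w = X_tr[y_tr == 1].mean(axis=0) - X_tr[y_tr == 0].mean(axis=0)
    return X_te @ w


# Simulated training cohort: 74 without RP2, 39 with RP2, one informative feature
rng = np.random.default_rng(3)
y = np.r_[np.zeros(74, dtype=int), np.ones(39, dtype=int)]
X = rng.normal(size=(113, 5))
X[:, 0] += 1.5 * y
mean_auc, sd_auc, fold_aucs = cross_validated_auc(X, y, centroid_direction)
print(f"mean AUC = {mean_auc:.3f} ± {sd_auc:.3f}")
```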

However, this study has several limitations. First, although it was a multicenter study, the sample size was still small. DL models with complex neural networks require large datasets to avoid overfitting. Only 245 patients were included; although 10-fold cross-validation indicated stable performance of the RD and DL models, a larger sample is essential to improve the model's generalizability and ensure reliable predictions in broader clinical applications. Second, although multicenter data were included, all participating centers were located in the same geographic region, which may limit the generalizability of our findings to broader populations. In addition, the external validation cohort consisted of only 83 patients from two independent centers; while this provides a preliminary assessment of the model's generalizability, it remains relatively small. Larger external validation in geographically diverse cohorts is warranted to further confirm the robustness and applicability of our model, and collaborations with institutions in other regions are underway to enable prospective validation studies. Third, although all PTV delineations in this study followed standardized guidelines and were reviewed by experienced radiation oncologists, inter-observer and inter-institutional variability remains a challenge. Future studies should explore automated or consensus-based contouring methods to improve the reproducibility and generalizability of ROI-based modeling. Finally, this retrospective study lacked prospective data, which might introduce bias. Some patients were lost to follow-up, potentially affecting the results. Immunotherapy and chemotherapy, which have been shown to affect RP15,35, were not included because of treatment heterogeneity and inconsistent recording across centers.
Future research will involve prospective studies with standardized protocols and a broader range of clinical variables to enhance model accuracy.

Conclusion

In summary, our study showed that radiomics, dosiomics, and 3D DL-derived features based on simulation positioning CT and dosimetry images were effective for RP risk stratification. The combined model (DLRDN) established in this study also performed well in the external validation cohort. In future clinical practice, this model could serve as a decision-support tool to stratify patients by their predicted risk of RP. For patients identified as high risk, clinicians may consider individualized radiotherapy dose adjustment, more frequent imaging surveillance, early preventive interventions (such as glucocorticoid administration), or alternative treatment strategies to mitigate toxicity and adverse effects.