Introduction

Lung cancer is the leading cause of cancer-related deaths worldwide, with an estimated 2.21 million new cases and 1.8 million deaths occurring annually1. Non-small cell lung cancer (NSCLC) is the most common type, accounting for approximately 85% of all lung cancer cases2. The standard treatment for locally advanced lung cancer (unresectable stage III) involves a combination of chemotherapy and radiation therapy, administered either concurrently or sequentially3. Platinum-based doublet chemotherapy is the first-line treatment for advanced/metastatic NSCLC patients without specific gene mutations such as epidermal growth factor receptor (EGFR) and anaplastic lymphoma kinase (ALK) mutations4. However, the 5-year survival rate for metastatic disease remains low with traditional chemotherapy regimens5.

Immune checkpoint inhibitors (ICIs) have revolutionized the treatment of advanced-stage lung cancer, representing a significant breakthrough. These medications specifically target immune checkpoint proteins and have demonstrated enhanced survival outcomes in patients with NSCLC6. For example, patients with high programmed death-ligand 1 (PD-L1) expression (≥ 50%) who receive pembrolizumab as first-line therapy have demonstrated a 5-year overall survival rate of 31.9%, compared to a rate of 16.3% for those receiving chemotherapy alone7. As a result, ICI monotherapy or combination therapy is now recommended as the first-line treatment for locally advanced or metastatic NSCLC patients with positive PD-L1 expression8,9,10.

PD-L1 expression has been extensively studied as a biomarker to predict the effectiveness of ICIs and is currently the only validated marker for selecting appropriate immunotherapy treatment11,12. The association between immunotherapy response and PD-L1 expression status has been demonstrated in various clinical trials, including KEYNOTE-024, KEYNOTE-042, and IMpower110 (refs. 9, 13, 14). In a retrospective study by Aguilar et al.15, NSCLC patients with very high PD-L1 expression (≥ 90%) who received first-line pembrolizumab achieved a better overall response rate (ORR) (60.0% vs. 32.7%), median progression-free survival (mPFS) (14.5 vs. 4.1 months), and median overall survival (mOS) (not reached vs. 15.9 months) compared to those with PD-L1 expression levels of 50–89%. Similarly, Carbognin et al. reported a higher ORR of 34.1% in PD-L1-positive patients compared to those with PD-L1-negative tumors (19.9%)16. Hence, accurately identifying the PD-L1 status of NSCLC patients is of paramount importance in selecting the most suitable treatment strategy. This crucial information helps identify individuals who are more likely to benefit from immunotherapy, guiding treatment decisions and ultimately improving clinical outcomes.

Currently, the assessment of PD-L1 expression through immunohistochemistry (IHC) analysis of tumor biopsies is the only clinically approved molecular biomarker for the administration of ICIs17. However, there are certain drawbacks associated with tissue sampling. Firstly, obtaining a tissue sample can pose challenges in certain patients. Factors such as tumor location or patient comorbidities can make the biopsy procedure difficult or risky. Secondly, biopsy tissue may not fully capture the heterogeneity of tumors due to inherent genetic variations within the tumor itself18. Thirdly, although the probability is exceedingly low, there is a risk of cancer metastasis associated with biopsy testing19. Additionally, the long turnaround time and relatively high costs may restrict the applicability of this testing. Thus, a non-invasive and easy-to-use method for predicting the status of PD-L1 expression is needed.

Deep learning radiomics (DLR) refers to the application of deep learning (DL) techniques in radiomics, a field of medical imaging that focuses on extracting quantitative features from medical images to aid in diagnosis, prognosis, and treatment planning20,21. Convolutional neural networks (CNNs) are a powerful class of DL models that can learn and extract features directly from raw image data, enabling them to achieve human-like performance in certain tasks22. Residual Network (ResNet) 50 is a deep CNN architecture that consists of 50 layers, including residual blocks that help address the vanishing gradient problem and enable the training of deeper networks23. By removing the last fully connected layer and using the output from the last convolutional layer, ResNet-50 can be used to extract high-level features from images. While ResNet has been used to predict PD-L1 expression and survival in NSCLC24, relatively few studies have specifically employed ResNet-50 for feature extraction from CT images to predict PD-L1 expression in NSCLC. Therefore, the objective of this study is to develop a DLR-based model that uses ResNet-50 to extract features from CT images in order to predict PD-L1 expression in patients with NSCLC.

Materials and methods

Patients

This retrospective analysis collected patient data from the First Affiliated Hospital of Shandong First Medical University (Jinan, China) from June 2019 to December 2022. The study was approved by the institutional review board of the First Affiliated Hospital of Shandong First Medical University, with a waiver of the informed consent requirement. We confirm that all methods were performed in accordance with the relevant guidelines. A total of 631 patients with NSCLC were initially selected. The inclusion criteria were as follows: (1) histological confirmation of unresectable stage III or stage IV NSCLC, (2) treatment-naïve patients, (3) PD-L1 expression level detected through IHC, and (4) a computed tomography (CT) scan performed before biopsy. Patients with poor-quality CT images, difficulties in drawing regions of interest (ROIs), insufficient demographic and clinical data, or indeterminate molecular testing results were excluded. Ultimately, 352 patients were included and randomly divided into a training cohort (n = 247) and a validation cohort (n = 105) at a 7:3 ratio. The detailed process of screening and grouping of NSCLC cases is illustrated in Fig. 1.

Fig. 1. Flowchart of NSCLC patient selection.

Analysis of PD-L1 expression

PD-L1 IHC was performed using the SP142 primary antibody on the Ventana Benchmark platform. Two pathologists independently scrutinized and assessed the slides to ascertain the expression of PD-L1 protein in both tumor cells and tumor-infiltrating immune cells. In cases where there was a variance in their evaluations, the pathologists collaborated to review and deliberate on the findings until reaching a consensus. The level of PD-L1 expression was quantified by evaluating the proportion of cells exhibiting PD-L1 protein staining25. Based on the level of PD-L1 expression, patients were divided into two groups: a negative group (PD-L1 < 1%, n = 182) and a positive group (PD-L1 ≥ 1%, n = 170).

CT data acquisition

Chest CT scans were performed using two different CT scanners: GE Healthcare (Milwaukee, WI, USA) and United Imaging (Shanghai, China). In order to maintain consistency, standardized acquisition parameters were employed, which included a tube voltage of 120 kVp, tube current ranging from 160 to 300 mA, detector collimation of either 64 or 128 × 0.625 mm, a field of view measuring 350 × 350 mm, a pitch ratio of 0.992:1, and a matrix size of 512 × 512. All reconstructed images with a section thickness of 1.25 mm were saved in the Digital Imaging and Communications in Medicine (DICOM) format. These images were then stored within the hospital’s Picture Archiving and Communication Systems (PACS) using mediastinal (width, 360 HU; level, 50 HU) and lung (width, 1500 HU; level, -650 HU) window settings.

Image preprocessing

Because different CT scanners were used in this study, image preprocessing was performed prior to segmentation and feature extraction, with the aim of improving the robustness and applicability of the radiomic features for subsequent analysis26. A resampling method was implemented, following a modified protocol as previously reported27. Briefly, the CT image pixel values were first converted from raw stored values to Hounsfield units (HU) using the metadata attributes of the scans. Subsequently, the entire dataset, including the tumor masks, was resampled to standardize image representations. The slice spacing and pixel spacing were adjusted to 1 mm and [1.0, 1.0] mm, respectively. Each slice dimension was modified accordingly to match the new spacing, and the resampled image was generated through interpolation28. Finally, all CT images were standardized to the range of -1000 to 400 HU, and min-max normalization was applied so that intermediate HU values were mapped linearly onto the [0, 1] interval29.
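The intensity steps described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes that the "metadata attributes" are the DICOM rescale slope and intercept, and that values outside the -1000 to 400 HU window are clipped before scaling. The function names and sample values are hypothetical.

```python
def to_hounsfield(stored_value, rescale_slope, rescale_intercept):
    """Convert a stored DICOM pixel value to Hounsfield units (HU)."""
    return stored_value * rescale_slope + rescale_intercept


def normalize_hu(hu, lower=-1000.0, upper=400.0):
    """Clip an HU value to [lower, upper] (assumed) and map it linearly onto [0, 1]."""
    clipped = min(max(hu, lower), upper)
    return (clipped - lower) / (upper - lower)


# Illustrative stored values, not patient data.
stored = [0, 24, 1024, 1424, 3000]
hu_values = [to_hounsfield(v, 1.0, -1024.0) for v in stored]
normalized = [normalize_hu(h) for h in hu_values]
```

With a slope of 1 and an intercept of -1024, a stored value of 1024 corresponds to 0 HU (water), which lands at 1000/1400 of the normalized range.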

Tumor segmentation

The ROIs were manually delineated using ITK-SNAP (version 3.6, http://www.itksnap.org) by a radiologist, and then verified by a chief radiologist30. In cases where patients had multiple lesions, the radiologist marked the lesion from which the biopsy was taken. To minimize bias, the PD-L1 status was concealed from all radiologists involved in the study. To assess the consistency of the manual segmentation process, the intraclass correlation coefficient (ICC) was calculated for each feature by comparing the results of two independent radiologists31. Features with an ICC greater than 0.80 were considered highly stable and were selected for subsequent analysis.

Feature extraction

CNNs are a class of DL models widely used in image analysis and recognition tasks32. They are specifically designed to automatically learn and extract meaningful features from input images. ResNet50 is a 50-layer variant of the ResNet architecture that is known for its ability to extract high-dimensional features from input data, particularly in image analysis tasks33. In this study, each image was resized to 224 × 224 pixels and then fed into the ResNet50 network for feature extraction in both the training and validation cohorts.

Selection and dimension reduction of DLR features

Feature selection was performed to identify the most relevant features from the initial pool of features. To reduce the dimensionality of these features, we employed the least absolute shrinkage and selection operator (LASSO) method, which is widely recognized for its precision, stability, and reproducibility in feature selection tasks34. The LASSO method applies a penalty to the coefficients during training, encouraging sparsity in the feature space and selecting the most informative features35. As the tuning parameter (λ) increases, progressively more coefficients are set to zero, resulting in the selection of fewer variables, and a greater degree of shrinkage is applied to the remaining non-zero coefficients. Five-fold cross-validation was conducted to determine the optimal value of log(λ), and the area under the curve (AUC) of the receiver operating characteristic (ROC) curve was plotted against log(λ). The optimal log(λ) was identified based on the minimum criterion and the value within one standard error of the minimum criterion36.
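The effect of λ on the coefficients can be illustrated with the soft-thresholding operator. This is a sketch under a simplifying assumption: soft-thresholding is the exact LASSO solution only for squared-error loss with orthonormal predictors, whereas the study fitted a LASSO logistic regression. The coefficient values below are hypothetical.

```python
def soft_threshold(z, lam):
    """Shrink z toward zero by lam; values within [-lam, lam] become exactly zero."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def count_nonzero(coefs):
    """Number of coefficients that survive the penalty."""
    return sum(1 for c in coefs if c != 0.0)


# Hypothetical unpenalized coefficients, for illustration only.
unpenalized = [1.2, -0.8, 0.3, -0.1, 0.05]

# Larger penalties leave fewer non-zero coefficients.
counts = []
for lam in (0.0, 0.2, 0.5, 1.0):
    shrunk = [soft_threshold(z, lam) for z in unpenalized]
    counts.append(count_nonzero(shrunk))
    print(f"lambda={lam}: {counts[-1]} non-zero coefficients")
```

The count of surviving coefficients falls as λ grows, which is the behaviour that cross-validation exploits when choosing how many features to keep.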

Establishment and evaluation of DLR model

After feature selection, seven machine learning (ML) algorithms were imported from the scikit-learn library in Python to build models37. These algorithms were decision tree (DT), k-nearest neighbors (KNN), logistic regression (LR), naive Bayes (NB), random forest (RF), support vector machine (SVM), and extreme gradient boosting (XGBoost or XGB). The performance of the models was evaluated in the validation cohort using the AUC of the ROC curve, sensitivity, and specificity. To ensure robustness and reliability, fivefold cross-validation was applied to evaluate all the results38. The model with the highest cross-validation accuracy was chosen as the final model for further analysis.
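The cross-validation and model-selection logic can be sketched without any particular ML library. The example assumes that the five folds are drawn from 247 cases (the size of the training cohort); the text does not state which cohort was partitioned, and the helper names are hypothetical.

```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread any remainder over the first folds so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def select_best(cv_accuracies):
    """Return the model name with the highest mean cross-validated accuracy."""
    return max(
        cv_accuracies,
        key=lambda name: sum(cv_accuracies[name]) / len(cv_accuracies[name]),
    )


folds = list(k_fold_indices(247, k=5))
```

Each case appears in exactly one test fold, so every model is scored on data it did not see during that fold's training.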

Development and validation of the combined model

Several clinical features are related to PD-L1 status and the long-term response to ICIs in NSCLC patients39,40,41. To enhance the predictive power of identifying PD-L1 status, additional clinical characteristics were incorporated into the existing model. These clinical factors comprised age, gender, smoking status, stage of disease, serum levels of specific tumor markers, and the advanced lung cancer inflammation index (ALI). The tumor markers included in this study were carcinoembryonic antigen (CEA), neuron-specific enolase (NSE), cytokeratin 19 fragment (CYFRA 21-1), squamous cell carcinoma antigen (SCC), and pro-gastrin-releasing peptide (Pro-GRP). The ALI is defined as body mass index (BMI) × albumin (Alb, in g/L) / neutrophil-to-lymphocyte ratio (NLR). The NLR is defined as the ratio of the absolute neutrophil count to the absolute lymphocyte count in the blood (both in 10⁹/L)42,43.
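As a worked example of the ALI definition given above, the following sketch computes the NLR and the ALI for a hypothetical patient. The function names and input values are illustrative only; albumin is taken in g/L, as stated in the text.

```python
def neutrophil_lymphocyte_ratio(neutrophils, lymphocytes):
    """NLR from absolute counts expressed in the same unit (10^9/L)."""
    if lymphocytes <= 0:
        raise ValueError("lymphocyte count must be positive")
    return neutrophils / lymphocytes


def advanced_lung_cancer_inflammation_index(bmi, albumin_g_per_l, nlr):
    """ALI = BMI * albumin / NLR, with albumin in g/L as defined in the text."""
    if nlr <= 0:
        raise ValueError("NLR must be positive")
    return bmi * albumin_g_per_l / nlr


# Made-up values for a hypothetical patient.
nlr = neutrophil_lymphocyte_ratio(neutrophils=4.0, lymphocytes=2.0)
ali = advanced_lung_cancer_inflammation_index(bmi=22.0, albumin_g_per_l=40.0, nlr=nlr)
print(nlr, ali)
```

Because the NLR sits in the denominator, a higher neutrophil-to-lymphocyte ratio lowers the ALI, while higher BMI or albumin raises it.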

The Chi-square and Student’s t-tests were initially employed to examine the clinical factors associated with the PD-L1 status. Only those factors with a P-value below 0.05 were included for subsequent analysis. Then, the optimal classifier and significant clinical features associated with the PD-L1 expression were integrated to build combined models. The predictive performances of these models were also evaluated based on the AUC of ROC curve analysis. The workflow of the DLR analysis is shown in Fig. 2.
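The screening of a categorical clinical factor against PD-L1 status can be sketched with a 2 × 2 chi-square test. This is a minimal illustration assuming Pearson's statistic without continuity correction; the text does not say whether a correction was applied, and the counts used below are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must contain at least one observation")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical counts: rows are factor present/absent, columns are PD-L1 positive/negative.
statistic, p_value = chi_square_2x2(30, 20, 20, 30)
keep_factor = p_value < 0.05
```

A factor would be carried forward to the combined model only when `keep_factor` is true, mirroring the P < 0.05 rule described above.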

Fig. 2. Workflow of the DLR analysis.

Statistical analysis

Statistical analysis was conducted using GraphPad Prism version 6 (GraphPad Software, La Jolla, CA, USA). For quantitative data, Student’s t-test was used to compare group means, while the χ² test was used to compare categorical variables and identify any baseline differences between groups. To evaluate the discrimination performance of the models, ROC curve analysis was conducted and the AUC, sensitivity, and specificity were calculated. All statistical tests were two-tailed, and P < 0.05 was considered statistically significant.
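The three discrimination metrics can be computed directly from labels and predicted scores. The sketch below uses the rank-based interpretation of the AUC (the probability that a randomly chosen positive case scores higher than a randomly chosen negative one) and assumes a decision threshold of 0.5, which the text does not specify. The labels and scores are toy values.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of positive/negative pairs ranked correctly (ties count 0.5)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def sensitivity_specificity(labels, scores, threshold=0.5):
    """Sensitivity and specificity when scores >= threshold are called positive."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 0)
    tn = sum(1 for y, p in zip(labels, predicted) if y == 0 and p == 0)
    fp = sum(1 for y, p in zip(labels, predicted) if y == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


# Toy labels (1 = PD-L1 positive) and model scores, for illustration only.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
auc = roc_auc(labels, scores)
sensitivity, specificity = sensitivity_specificity(labels, scores)
```

The AUC is threshold-free, whereas sensitivity and specificity change with the chosen cut-off, which is why all three are reported together.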

Results

Clinical characteristics

Table 1 outlines the baseline clinical characteristics of the patients involved in the study. Of the 352 patients, 170 (48.29%) tested positive for PD-L1 expression. No significant differences were observed in age, sex, smoking history, clinical stage, pathological type, serum levels of CEA, NSE, CYFRA 21-1, SCC, and Pro-GRP, NLR, or BMI between the PD-L1 positive and negative groups (P > 0.05). However, serum albumin and ALI were significantly higher in the PD-L1 positive group than in the PD-L1 negative group in both the training and validation cohorts (P < 0.05). These findings suggest that serum albumin and ALI may be associated with PD-L1 expression in this specific patient population. Therefore, these variables were retained for further analysis.

Table 1 Clinical characteristics of NSCLC patients in the training and validation cohorts.

Feature extraction and selection

Using the pooling technique, a total of 2048 features were extracted from the last convolutional layer of ResNet50. Dimensionality reduction was then performed, and Fig. 3 illustrates the coefficients assigned to each selected feature. As shown in Fig. 3A and B, a lower classification error was attained when the number of retained variables was 12. Consequently, these 12 features were deemed the key predictors and selected to construct the LASSO logistic regression model. The features extracted using ResNet-50 are generally high-dimensional and represent abstract feature vectors38. We therefore labeled these features “DLx,” and the distinctions between the PD-L1 positive and PD-L1 negative groups are visually depicted in Supplementary Fig. 1.
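The figure of 2048 features follows from the shape of the network output. For a 224 × 224 input, the last convolutional stage of ResNet50 produces 2048 channels, each a 7 × 7 grid, and pooling each channel to a single value yields a 2048-dimensional vector. The sketch below assumes global average pooling (the standard ResNet choice; the text says only "the pooling technique") and uses a synthetic feature map rather than real network output.

```python
def global_average_pool(feature_map):
    """Average each channel's H x W activation grid to a single number."""
    pooled = []
    for channel in feature_map:
        values = [v for row in channel for v in row]
        pooled.append(sum(values) / len(values))
    return pooled


# Synthetic stand-in for the last convolutional output: 2048 channels of 7 x 7.
# Every cell of channel c holds the value c, so the channel mean is easy to check.
channels, height, width = 2048, 7, 7
feature_map = [[[float(c)] * width for _ in range(height)] for c in range(channels)]
features = global_average_pool(feature_map)
print(len(features))
```

Each of the resulting 2048 values corresponds to one "DLx" feature before LASSO selection.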

Performance assessment of DLR and combined models

The predictive performance of the seven radiomics-based models is depicted in Fig. 4. The AUC values obtained from the SVM, RF, LR, DT, XGB, KNN, and NB models were 0.85 (95% confidence interval (CI): 0.82–0.88), 0.76 (0.73–0.78), 0.71 (0.69–0.74), 0.59 (0.57–0.60), 0.71 (0.68–0.73), 0.72 (0.70–0.74), and 0.70 (0.67–0.71), respectively. The SVM model outperformed the others in the validation cohort (Fig. 4A).

The combined model, integrating DLR signatures derived from the SVM model and clinical factors such as serum albumin and ALI, exhibited an AUC of 0.91 (95% CI, 0.87–0.95), as depicted in Fig. 4B. The higher value of AUC indicated increased predictive accuracy, suggesting that the integration of radiomics and clinical factors enhances the overall performance of the model. The fivefold cross-validated ROC curves for model SVM and the combined model are depicted in Supplementary Fig. 2.

Fig. 3. Selection of radiomics features. The LASSO was used to filter the most relevant features. (A) Selection of tuning parameter (λ). (B) Coefficient of each selected feature. LASSO: least absolute shrinkage and selection operator, MSE: mean square error.

Fig. 4. ROC curves for the DLR models and the combined model in predicting the expression statuses of PD-L1. (A) ROC curves for the DLR models. (B) ROC curve for the combined model.

Interpretability of DLR model

To investigate the interpretability of the DLR, we employed the Gradient-weighted Class Activation Mapping (Grad-CAM) technique to visualize the network. Grad-CAM generates visualization maps that highlight the pivotal regions of an input image, which contribute to the model’s predictions44. The final convolutional layer of the last res-block was configured to illustrate the prediction of PD-L1 status, as depicted in Fig. 5.
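The Grad-CAM computation itself is compact: each channel of the chosen convolutional layer is weighted by the spatial mean of the class-score gradient for that channel, the weighted channels are summed, and a ReLU keeps only the regions that increase the score. The sketch below takes activations and gradients as plain nested lists; in practice both come from a forward and backward pass through the network, which is omitted here. All values are toy numbers.

```python
def grad_cam(activations, gradients):
    """Grad-CAM map from activations and gradients shaped [channel][row][col]."""
    n_rows, n_cols = len(activations[0]), len(activations[0][0])
    # Channel weight = spatial mean of the class-score gradient for that channel.
    weights = [sum(sum(row) for row in grad) / (n_rows * n_cols) for grad in gradients]
    heat_map = [[0.0] * n_cols for _ in range(n_rows)]
    for weight, act in zip(weights, activations):
        for i in range(n_rows):
            for j in range(n_cols):
                heat_map[i][j] += weight * act[i][j]
    # ReLU keeps only the regions that push the class score up.
    return [[max(v, 0.0) for v in row] for row in heat_map]


# Toy example: two channels on a 2 x 2 grid.
activations = [[[1.0, 0.0], [0.0, 2.0]], [[0.0, 3.0], [1.0, 0.0]]]
gradients = [[[1.0, 1.0], [1.0, 1.0]], [[-1.0, -1.0], [-1.0, -1.0]]]
heat_map = grad_cam(activations, gradients)
```

The resulting low-resolution map is typically upsampled to the input size and overlaid on the CT slice to produce heat maps of the kind shown in the figure.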

Fig. 5. Visualization of 2 patient examples. Each example displays a gray-scale CT image alongside its corresponding heat map. Within these heat maps, the red region signifies a higher weight, which can be interpreted using the color bar located on the right-hand side.

Discussion

In this study, we developed a DLR model that used CT images to predict PD-L1 status in patients with NSCLC. The model was trained on a cohort of 247 patients and validated using an independent set of 105 patients. The DLR model achieved an AUC of 0.85 (0.83–0.87), with a sensitivity of 0.80 (0.74–0.85) and a specificity of 0.73 (0.70–0.77). Moreover, we proposed a comprehensive prediction model that integrated both DLR and clinical characteristics. The combined model performed better, with an AUC of 0.91 (0.88–0.95), a sensitivity of 0.85 (0.82–0.89), and a specificity of 0.75 (0.72–0.80). This study demonstrated a clear association between DLR features and PD-L1 status, highlighting the potential of combining DLR models with clinical variables for accurately identifying PD-L1 expression. To the best of our knowledge, this is the only study to have used the ResNet50 model to extract features from CT images for predicting PD-L1 expression in patients with NSCLC. The findings presented in this study offer valuable insights and novel contributions to the existing research in this field.

Radiomics is a rapidly developing field that aims to extract quantitative features from medical images, such as CT and MRI scans, in order to improve diagnosis, treatment planning, and patient prognosis45,46. DL is a subfield of artificial intelligence that focuses on training artificial neural networks to learn and make predictions from large amounts of data47. Radiomics and DL can complement each other, as radiomics can benefit from the advanced feature extraction capabilities of DL, while DL relies on radiomics for the necessary metrics and validation. This collaboration between the two fields has the potential to drive significant progress in medical imaging analysis, leading to more precise and personalized healthcare48. Wang et al. explored the application of DLR analysis in predicting the expression of PD-L1 in NSCLC using CT images. They reported AUC values of 0.950, 0.934, and 0.946 for predicting PD-L1 levels of < 1%, 1–49%, and ≥ 50%, respectively, in the validation cohort24. Other studies have also suggested the use of DL models based on radiomic features from ¹⁸F-FDG PET/CT scans to predict PD-L1 expression in NSCLC, indicating their potential as substitutes for PD-L1 assessment49,50,51. Our finding that a DL model based on CT images can achieve an AUC of 0.85 for predicting PD-L1 status is consistent with this previous research.

CNNs have emerged as a prominent method in computer vision tasks and image analysis, aiming to mimic the visual processing mechanism of the human brain by utilizing multiple layers of neuron-like computational connections52. By utilizing a series of convolutional and pooling layers, CNNs can extract increasingly complex and abstract features from images. This hierarchical feature extraction empowers CNNs to capture and recognize patterns, shapes, and structures in images, rendering them well-suited for tasks such as image classification, object detection, and segmentation53,54,55. Studies have demonstrated the effectiveness and applicability of CNNs for the detection of lung nodules and the diagnosis of COVID-19 (refs. 56, 57), showcasing the potential of CNN technology in aiding the early detection of lung abnormalities. In a study by Jahagirdar et al., machine learning algorithms based on CNNs exhibited high diagnostic accuracy when assessing the severity of ulcerative colitis during endoscopic examination58. ResNet50, a specific CNN architecture, utilizes residual blocks to address the challenges of training very deep networks23. Utilizing a 3D-ResNet50 CNN model, He and colleagues extracted deep features from nephrographic phase CT images, and their findings indicated that a fusion feature-based machine learning algorithm demonstrated high accuracy in distinguishing between malignant renal neoplasms and cystic renal lesions59. In the present study, ResNet50 was employed to extract relevant features from CT images. In line with our findings, Li et al. utilized ResNet50 to extract CT image features and developed machine learning models that demonstrated exceptional performance in identifying benign and malignant small pulmonary nodules (< 20 mm) across various sites60. Wang et al. extracted DL features with a 3D ResNet and subsequently constructed a specialized classifier designed to predict the PD-L1 status in NSCLC. The model performed robustly, achieving AUC values of 0.950 (0.938–0.960), 0.934 (0.906–0.964), and 0.946 (0.933–0.958) for predicting PD-L1 expression levels of < 1%, 1–49%, and ≥ 50% in the validation cohort, respectively24. Therefore, ResNet50 has shown strong capability in feature extraction and classification tasks within the domain of medical imaging61.

As a supervised machine learning algorithm, SVM is commonly used for classification and regression tasks62. By learning from labeled examples in the training set, SVM identifies an optimal hyperplane that maximizes the margin between different classes63. In a study by Uddin et al., various supervised ML algorithms were compared for predicting disease risk, revealing SVM as the most commonly utilized method62. Yang and colleagues developed DL models for the pathological diagnosis of cervical lymph nodes using PET/CT images64. By combining DL-based feature extractors with an SVM classifier, their DL-SVM model demonstrated impressive performance, with an AUC of 0.901, accuracy of 86.96%, sensitivity of 76.09%, and specificity of 94.20%. Similarly, in the present study, the SVM model achieved an AUC of 0.85, highlighting its capability to predict PD-L1 expression status.

In this study, we also developed and assessed a combined model that integrated clinical characteristics with DLR. The results indicated that the combined model achieved significant improvements in predictive performance compared to using DLR alone (Fig. 4). In line with our results, Hashimoto et al. developed a combined model incorporating Random Forest and clinical variables. Their research revealed an AUC of 0.83 for PD-L1 ≥ 1% in the radiomics validation set among NSCLC patients65. Consequently, PD-L1 expression can be effectively predicted through machine learning utilizing clinical and imaging features. Our findings suggest that age, sex, smoking status, histological type, pathological stage, or tumor markers may not be associated with PD-L1 status in NSCLC. These findings are consistent with prior studies that have reported a lack of significant correlation between changes in PD-L1 expression in lung tumor tissue and clinical variables in both NSCLC and small cell lung cancer (SCLC)66,67. However, Zhu et al. identified a positive association between the expression of PD-L1 and certain clinicopathologic features such as gender, smoking history, histological types, and TNM stage39, which contradicted previous studies, including our own. These discrepancies may stem from variations in sample sizes, patient populations, study methodologies, and the specific assays used to evaluate PD-L1 expression.

Inflammation is widely recognized as a hallmark of cancer and plays a significant role in the development and progression of malignancies68. Both local immune response and systemic inflammation have been identified as factors that contribute to tumor progression and influence the survival of patients with cancer69. The ALI is a tool specifically developed to assess the level of systemic inflammation in patients diagnosed with metastatic NSCLC70. Several studies have suggested that the ALI serves as a valuable prognostic and predictive biomarker in advanced NSCLC patients treated solely with PD-L1 inhibitors71,72,73. Furthermore, the ALI has also been investigated as a predictor of disease control in melanoma74. Our findings of a correlation between PD-L1 expression and the ALI align with a study by Wang et al., which observed a higher abundance of PD-L1-positive tumor cells in a tumor microenvironment with high inflammation75. In advanced cancer patients, inflammation and malnutrition often occur due to the energy expenditure caused by the tumor76. Tumor-associated systemic inflammation is characterized by elevated levels of tumor necrosis factor (TNF) and interleukin-6 (IL-6), which are released as part of the inflammatory response occurring in the presence of tumors77. TNF-α can increase capillary permeability, leading to fluid and protein leakage, including albumin78. Serum albumin levels can be used as an indicator of nutritional status and inflammation in these patients. In our study, we observed a correlation between PD-L1 expression and serum ALB levels. This finding is consistent with a previous study, which demonstrated a negative correlation between PD-L1 expression and peripheral blood ALB levels39.

Our study has several limitations that should be acknowledged. Firstly, the retrospective design of the analysis may introduce potential bias in patient selection. Secondly, the relatively small patient sample size may potentially impact the statistical power of the conducted analyses. Nevertheless, it is important to note that radiomics analyses have shown success even with as few as 100 patients79. Thirdly, since all participants in this study were Chinese, caution should be taken when attempting to generalize these findings to other populations. It is necessary to conduct additional studies to validate these results within different racial and ethnic groups. Lastly, it should be noted that manual segmentation of the ROI can be laborious and time-consuming, and may potentially introduce variability among different observers. However, despite these drawbacks, manual segmentation remains a straightforward and reliable method, and its reproducibility can be assessed through interobserver reproducibility analysis80.

Conclusion

In conclusion, our study has shown that the DLR signatures extracted by ResNet50, along with clinical characteristics, can effectively identify the PD-L1 expression status in NSCLC. Although these findings require validation with a larger sample size, the use of DLR provides a noninvasive and cost-effective approach to predict PD-L1 expression. This method has the potential to assist in pre-screening patients before invasive sampling and could contribute to the development of personalized treatment strategies aimed at optimizing outcomes for NSCLC patients.