Sub-regional radiomics combining multichannel 2-dimensional or 3-dimensional deep learning for predicting neoadjuvant chemo-immunotherapy response in esophageal squamous cell carcinoma: a multicenter study

Zhu, Jiahao; Xu, Benjie; Fan, Tiantian; Ji, Shengjun; Gu, Ke; Ding, Jiaxuan; Lu, Haibo; Ma, Jianqun; Zhou, Yang

doi:10.1038/s41698-025-01047-9


Article
Open access
Published: 21 July 2025


Jiahao Zhu¹^na1,
Benjie Xu¹^na1,
Tiantian Fan²^na1,
Shengjun Ji³,
Ke Gu⁴,
Jiaxuan Ding²,
Haibo Lu¹,
Jianqun Ma⁵ &
…
Yang Zhou²

npj Precision Oncology volume 9, Article number: 248 (2025)


Abstract

This study aimed to develop and compare fusion models combining sub-regional radiomics with multichannel 2D and 3D DL to predict pCR in patients with LA-ESCC undergoing NACI. A total of 271 patients from three hospitals were divided into training, internal validation, and external validation cohorts. Tumor sub-regions were identified using K-means clustering based on radiomic features, and predictive features were extracted using PyRadiomics. Among all models, the DLRad1 model (radiomics + 2D DL) demonstrated the highest performance, with an AUC ranging from 0.793 to 0.910 across cohorts. Sub-region 1 features alone achieved an AUC of 0.823, while DLRad2 (radiomics + 3D DL) and other single-modality models showed lower AUCs (0.701–0.906). Spearman correlation analysis confirmed low redundancy among selected features. These findings support DLRad1 as a promising non-invasive tool to identify LA-ESCC patients most likely to benefit from NACI, potentially aiding personalized treatment decisions.


Introduction

Esophageal cancer ranks as the seventh most prevalent malignancy and the sixth leading cause of cancer-related mortality worldwide, representing a major health challenge¹. Around half of the cases are found in China, predominantly as esophageal squamous cell carcinoma (ESCC)². Due to its aggressive nature and lack of early symptoms, ESCC is often diagnosed at a locally advanced stage.

Recent clinical trials indicated that combining neoadjuvant immunotherapy with chemotherapy is effective and safe for patients with locally advanced esophageal squamous cell carcinoma (LA-ESCC), showing promising outcomes^3,4,5,6. However, the benefits of the neoadjuvant chemo-immunotherapy (NACI) regimen are limited to a subset of patients^7,8. Various biomarkers, such as tumor mutation burden (TMB), programmed death receptor-1 (PD-1), programmed death ligand-1 (PD-L1) expression, circulating tumor DNA, microsatellite instability (MSI), and gut microbiome composition have been investigated to predict responses to NACI⁹. Unfortunately, these biomarkers have shown inconsistent reliability for ESCC. Consequently, there is a critical demand for reliable, non-invasive biomarkers to predict treatment outcomes.

Radiomics is an emerging non-invasive technique that extracts quantitative features from medical images, offering valuable insights into tumor biology¹⁰. Often termed a digital biopsy, radiomics assesses the tumor microenvironment by providing data on tumor heterogeneity and molecular and genetic variations^11,12. Previous studies have validated the predictive value of imaging biomarkers across various cancers, particularly for chemo-immunotherapy responses^13,14,15. Radiomics has the potential to surpass traditional markers such as TMB, MSI, and PD-1/PD-L1 expression due to its ability to characterize the tumor microenvironment non-invasively. Recent research indicates that tumors comprise diverse sub-regions or habitats, each with distinct structural, metabolic, and functional properties^16,17. Advanced computed tomography (CT) technology allows for the visualization and quantification of these sub-regions, enhancing tumor characterization¹⁸. The robustness, convenience, and reliability of radiomics make it a promising tool for predicting NACI outcomes in LA-ESCC.

CT imaging is inherently three-dimensional (3D), and analyzing these images using two-dimensional (2D) methods can lead to information loss. Although 3D deep convolutional neural networks (DCNNs) are more appropriate for this type of data, they demand larger datasets for training compared to 2D networks, posing a challenge in medical research due to the limited available data¹⁹. Moreover, deep learning (DL) features can be sensitive to transformations like translation, rotation, and scaling, whereas radiomics features exhibit greater robustness against such variations²⁰. Studies have shown that multi-domain models integrating radiomics with DL outperform single-method approaches^21,22,23.

This study aims to construct and validate a non-invasive imaging biomarker for predicting NACI response in LA-ESCC using fusion models combining sub-regional radiomics with multichannel 2D or 3D DL. Additionally, the performance of the two fusion models was comprehensively compared to identify the most effective approach.

Results

Patient characteristics

The study included 271 patients with LA-ESCC, divided into a training cohort of 161 patients, an internal validation cohort of 69 patients, and an external validation cohort of 41 patients. Table 1 provides a summary of the baseline clinical characteristics for these cohorts. There were no significant differences among the cohorts in key variables, including age, gender, body mass index, smoking and drinking habits, clinical tumor, lymph node and metastasis stages, tumor location, tumor length, cycles of chemo-immunotherapy, immune checkpoint inhibitors, and squamous cell carcinoma antigen levels (all P > 0.05). The pathological complete response (pCR) rates were 28.6% in the training set, 28.9% in the internal validation set, and 29.3% in the external validation set, with no significant differences (P > 0.05).

Table 1 Baseline clinical characteristics of the training cohort and the validation cohorts


Diagnostic performance of different tumor regions in CT

The process of selecting the optimal number of clusters is detailed in Table S1. In the training cohort, the Calinski-Harabasz index was highest when four clusters were used, dividing the tumor region into four distinct sub-regions. Sub-region 1 exhibited the highest diagnostic performance in the ROC curve analysis, achieving an AUC of 0.823 (95% CI: 0.753–0.881), with sensitivity at 78.2%, specificity at 89.1%, positive predictive value (PPV) at 66.9%, negative predictive value (NPV) at 91.5%, and an overall accuracy of 82.7%. Sub-region 2 had a lower AUC of 0.757, with sensitivity and specificity values of 73.8% and 69.1%, respectively. Sub-regions 3 and 4 were significantly less effective in distinguishing pCR from non-pCR than sub-region 1. The combination of sub-region 1 and sub-region 2 did not improve diagnostic performance compared to sub-region 1 alone (P = 0.671). Therefore, sub-region 1 was selected for further analysis. Table 2 summarizes the predictive performances of radiomics features derived from the different sub-regions for pCR prediction in LA-ESCC. Key features selected from these sub-regions are detailed in Table S2.

Table 2 Predictive performances of radiomics features from different sub-regions for predicting pCR in LA-ESCC


Comprehensive analysis of radiomics and DL features

A total of 1834 radiomics features were initially extracted from sub-region 1. After applying Spearman correlation coefficient matrix filtering to remove highly correlated features (|Spearman correlation coefficient| ≥ 0.9), 745 features remained. Subsequent intraclass correlation coefficient (ICC) analysis excluded features with poor reproducibility (ICC < 0.75), resulting in 312 robust features. From these, 12 key radiomics features were ultimately selected via least absolute shrinkage and selection operator (LASSO) regression (Fig. 1A, left panel). For deep learning–based features, 2048 multichannel 2D features and 2048 3D features were initially extracted. After Spearman correlation analysis, 1124 2D features and 1010 3D features were retained. Subsequent ICC filtering further reduced the feature set to 856 2D features and 693 3D features, respectively. Finally, LASSO regression identified 15 multichannel 2D features (Fig. 1A, middle panel) and 6 3D features (Fig. 1A, right panel) as the most informative predictors. ICCs of the selected features in the three models are summarized in Table S3. Shapley additive explanations (SHAP) analysis provided quantitative insights into the importance of individual features. Among radiomics features, Rad_1_H1 was the most critical for pCR classification (Fig. 1B, C, left panel), while 2D_DL_1749 (Fig. 1B, C, middle panel) and 3D_DL_909 (Fig. 1B, C, right panel) were the most important features in their respective categories. Despite multichannel 2D DL features attaining the highest AUC of 0.906, their discriminative ability for predicting pCR was not significantly different from that of 3D DL and radiomics (Fig. 1D, Table 3). Spearman correlation analysis revealed minimal redundancy among the selected features, as illustrated in Fig. S1.

**Fig. 1: Features selection, SHAP analysis and models assessment.**

Table 3 Performances of the predictive models in the training and validation cohorts


Performance analysis of radiomics, 2D/3D DL models, and fusion models

In the DLRad1 model, 14 key features were selected, comprising 5 radiomics features and 9 multichannel 2D DL features (Fig. S2A–C, left panels). The DLRad2 model incorporated 10 key features, including 5 radiomics features and 5 3D DL features (Fig. S2A–C, right panels). Diagnostic performance metrics of the predictive models across study sets are summarized in Table 3.

The DLRad1 model demonstrated the highest AUC (0.793–0.910) across all study sets. The AUCs for the DLRad2, multichannel 2D DL, 3D DL, and sub-region 1 radiomics models, compared to DLRad1, were 0.718–0.834 (P = 0.043–0.494), 0.769–0.906 (P = 0.810–0.894), 0.701–0.831 (P = 0.042–0.471), and 0.721–0.823 (P = 0.036–0.501), respectively. Calibration curves for the DLRad1 model exhibited strong consistency across probabilities (Fig. S3A–C), and decision curve analysis (DCA) indicated a higher net benefit for the DLRad1 model (Fig. S3D–F) in all study sets. The predictive models’ optimal cutoff values, determined by the maximum Youden index in the training set, were 0.411 for DLRad1, 0.431 for DLRad2, 0.401 for multichannel 2D DL, 0.409 for 3D DL, and 0.406 for radiomics. Based on these cutoffs, the DLRad1 model achieved the best sensitivity (80.0–83.3%) and specificity (79.3–81.6%) across test sets (Table 3). Multivariate logistic regression analyses confirmed that DLRad1 was an independent predictor of pCR in both training and validation sets (Table S4). SHAP waterfall diagrams further visualized the contributions of individual features in predicting pCR (Fig. 2A, B) and non-pCR (Fig. 2C, D).
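The cutoff selection described above can be illustrated with a minimal sketch. It assumes the usual definition of the Youden index, J = sensitivity + specificity − 1, and that a case is called pCR when its predicted probability is at or above the cutoff; the function name and the toy data are hypothetical and are not taken from the study.

```python
def youden_cutoff(probs, labels):
    """Return (cutoff, J) maximizing J = sensitivity + specificity - 1.

    probs  : predicted pCR probabilities
    labels : 1 for pCR, 0 for non-pCR
    A case is called positive when its probability is >= cutoff.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_cutoff, best_j = None, -1.0
    # Every observed probability is a candidate cutoff.
    for cutoff in sorted(set(probs)):
        tp = sum(1 for p, y in zip(probs, labels) if p >= cutoff and y == 1)
        tn = sum(1 for p, y in zip(probs, labels) if p < cutoff and y == 0)
        j = tp / n_pos + tn / n_neg - 1.0
        if j > best_j:  # ties keep the lowest cutoff
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j


# Toy data (hypothetical, not from the study).
toy_probs = [0.10, 0.20, 0.35, 0.40, 0.45, 0.60, 0.80, 0.90]
toy_labels = [0, 0, 0, 1, 0, 1, 1, 1]
cutoff, j = youden_cutoff(toy_probs, toy_labels)
print(cutoff, j)
```

In the study's design the cutoff is fixed on the training set and then applied unchanged to the validation sets, which is why sensitivity and specificity can differ between cohorts.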

**Fig. 2: CT images changes and SHAP waterfall plot for DLRad1 model in two patients.**

Clinical model development and models comparison

To investigate whether the integration of radiomics features enhances predictive performance beyond conventional clinical parameters, we first conducted univariate and multivariate logistic regression analyses to identify clinical factors independently associated with pCR (Table S4). In the training cohort, tumor length emerged as the only significant clinical predictor (odds ratio [OR] = 0.549, 95% confidence interval [CI]: 0.180–0.915, P = 0.045). Based on this, a Clinical model was constructed using tumor length as the sole input variable. We then developed a Combined model incorporating both the Clinical model and the DLRad1 signature to evaluate the added predictive value of radiomics features. The discriminative performance of the Clinical model, the DLRad1 model, and the Combined model was assessed across the training, internal validation, and external validation cohorts using AUCs, as well as Integrated Discrimination Improvement (IDI) and Net Reclassification Improvement (NRI) metrics (Table S5). In the training cohort, the Combined model achieved the highest AUC (0.919, 95% CI: 0.885–0.962), outperforming the Clinical model (AUC = 0.617, 95% CI: 0.536–0.698) and the DLRad1 model (AUC = 0.910, 95% CI: 0.867–0.954). Both IDI and NRI analyses demonstrated statistically significant improvements in discrimination and classification when adding DLRad1 to the Clinical model (IDI = 0.095 [0.038–0.151], P < 0.001; NRI = 0.317 [0.156–0.472], P < 0.001). Similar trends were observed in the internal validation (AUC = 0.841 for Combined vs. 0.669 for Clinical) and external validation cohorts (AUC = 0.773 for Combined vs. 0.561 for Clinical), with consistent IDI/NRI improvements.
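For readers unfamiliar with IDI, the following minimal sketch shows the standard definition: the gain in mean predicted probability among events minus the gain among non-events when moving from the old model to the new one. The article does not state which software or NRI variant was used, so the function and the toy probabilities below are illustrative assumptions only.

```python
from statistics import mean


def idi(p_old, p_new, labels):
    """Integrated Discrimination Improvement of p_new over p_old.

    labels: 1 for an event (here pCR), 0 for a non-event.
    """
    def gain(target):
        old = mean([p for p, y in zip(p_old, labels) if y == target])
        new = mean([p for p, y in zip(p_new, labels) if y == target])
        return new - old

    # A better model raises probabilities for events and lowers them for non-events.
    return gain(1) - gain(0)


# Toy probabilities (hypothetical, not from the study).
labels = [1, 1, 0, 0]
p_clinical = [0.5, 0.4, 0.4, 0.3]
p_combined = [0.7, 0.6, 0.3, 0.2]
idi_value = idi(p_clinical, p_combined, labels)
print(round(idi_value, 3))
```

A positive IDI means the new model separates events from non-events more widely on the probability scale, which complements the rank-based comparison given by the AUC.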

Discussion

NACI is widely recognized as a safe and effective first-line treatment for LA-ESCC, significantly enhancing pCR rates. Despite its efficacy, no optimal method currently exists for predicting NACI response in LA-ESCC. In this study, we developed sub-region-based radiomics with multichannel 2D and 3D DL models using primary tumor CT images to predict pCR outcomes. First, habitat analysis was performed to extract and evaluate the predictive capabilities of imaging features across different primary tumor regions. Tumor regions were clustered into four distinct groups based on the Calinski-Harabasz Index, with sub-region 1 demonstrating the strongest predictive performance. Subsequently, two multidomain fusion models were constructed: DLRad1, integrating sub-region-based radiomics with multichannel 2D DL, and DLRad2, combining sub-region-based radiomics with 3D DL. The DLRad1 model exhibited superior performance, achieving an AUC value of 0.910 in the training set, 0.829 in the internal validation set, and 0.793 in the external validation set. Feature analysis revealed multichannel 2D DL features had greater discriminative ability than sub-region-based radiomics and 3D DL features. While partial correlations were observed among radiomics features from sub-region 1, multichannel 2D, and 3D DL, there was no significant redundancy among the selected features.

Accurately predicting NACI response prior to treatment is critical for devising personalized therapeutic strategies, minimizing unnecessary harm, and reducing side effects associated with preoperative treatments. Radiomics, owing to its high availability and non-invasive nature, is increasingly gaining attention. Several studies have investigated 2D radiomic models for predicting pCR in ESCC patients undergoing neoadjuvant immunotherapy. Wang et al. developed a traditional 2D model incorporating 10 radiomic features before treatment, achieving AUCs of 0.87 in the training cohort and 0.75 in the validation cohort²⁴. Shi et al. proposed a single-slice CT-based 2D model with LASSO feature selection combined with eXtreme Gradient Boosting (XGBoost), yielding AUCs of 0.89 and 0.80 in the training and validation cohorts, respectively²⁵. Zhu et al. constructed a similar model using ComBat harmonization, reporting AUCs of 0.818 and 0.787 in the respective datasets²⁶. These studies demonstrated favorable performance in identifying pCR to NACI in resectable ESCC patients. However, these studies were limited by small sample sizes or lacked external validation, which restricts the generalizability of their findings. A recent study by Zhang et al. highlighted the predictive value of post-treatment CT-based radiomic modeling in assessing pCR after neoadjuvant chemoradiotherapy in ESCC. The post-treatment model, built using XGBoost, demonstrated superior predictive performance compared to the pre-treatment model, with AUCs of 0.875 in the training set, 0.835 in the internal validation set, and 0.786 in the external validation set²⁷. This study underscores the predictive value of post-treatment imaging and the advantage of multimodal feature integration in enhancing individualized response assessment. 
Given this, two recent studies have demonstrated that incorporating both pre- and post-treatment imaging data into radiomics models could improve the prediction of pCR in patients with ESCC undergoing NACI. Zheng et al. developed a CT-based delta-radiomics model by quantifying changes in radiomic features before and after treatment, which outperformed the pre-treatment model, achieving AUCs of 0.827 in the training set and 0.790 in the testing set²⁸. Ruan et al. proposed an ESCC preoperative imaging score by integrating radiological features from pre- and intra-treatment phases, achieving an impressive AUC of 0.918²⁹. These findings highlight the added value of dynamic radiomic changes in capturing therapy-induced tumor alterations and enhancing pCR prediction in ESCC.

Although previous studies have provided valuable insights, many rely on single-slice 2D radiomics features or incorporate post-treatment imaging data, limiting both the comprehensiveness and clinical utility of their models in the pre-treatment setting. Single-slice 2D radiomics overlooks the multidimensional nature of tumor morphology, while post-treatment-based models fail to support pre-therapeutic decision-making. In contrast, our study offers a distinct contribution by focusing exclusively on pretreatment CT images while still achieving robust predictive performance. We utilized multichannel 2D radiomics features to overcome the limitations of single-slice approaches, and, considering the heterogeneity of esophageal tumors and segmentation challenges, we divided the tumor into imaging-defined subregions. Radiomic features were extracted from each subregion and evaluated for their predictive power. The most informative features from the top-performing subregion were then fused with either multichannel 2D or 3D deep learning radiomics features to build two distinct fusion models. Among them, the DLRad1 model, which integrates subregional radiomics and 2D deep learning features, demonstrated strong predictive performance across training, internal validation, and external validation cohorts, comparable to or exceeding the performance of previously reported models including those based on post-treatment or delta-radiomics strategies. Importantly, our approach enriches the feature space and captures intratumoral heterogeneity without requiring additional imaging time points.

Recent studies have compared the performance of single-slice 2D and whole-volume 3D radiomic models in predicting pCR following neoadjuvant immunotherapy-based strategies in ESCC. Shi et al. found that a 2D model using XGBoost achieved superior predictive performance (AUC = 0.89 in the training and 0.80 in the test cohort) compared to 3D and clinical models²⁵. Similarly, Zhu et al. reported that a ComBat-harmonized 2D model outperformed its 3D counterparts, with AUCs of 0.818 and 0.787 in the training and validation cohorts, respectively, and this superiority of 2D over 3D models was statistically significant across all cohorts (P < 0.001)²⁶. These findings underscore the practical advantages of 2D radiomics, including reduced annotation workload, improved reproducibility, and lower sensitivity to inter-slice variability. While single-slice 2D radiomics is computationally efficient, it may overlook critical spatial information necessary for fully characterizing tumor heterogeneity. In contrast, 3D radiomics accounts for full volumetric data but often introduces greater noise and requires more computational resources. Multichannel 2D radiomics offers a balanced alternative, improving spatial characterization by analyzing multiple slices while maintaining computational efficiency. This method typically includes the largest cross-sectional slice and its adjacent multi-layer slices, effectively capturing key tumor characteristics such as size, shape, and intensity. By integrating information from multiple slices, multichannel 2D radiomics approximates overall 3D imaging features to some extent, offering a richer spatial context without the computational complexity of 3D analysis. In our study, the multichannel 2D radiomics model outperformed the 3D radiomics model, consistent with previous studies that also found 2D radiomic features to be more effective than 3D features in certain predictive tasks for ESCC^25,26. 
These results suggest that while 3D modeling offers theoretical comprehensiveness, 2D or multichannel 2D radiomics may capture essential tumor features more effectively in practice, especially when coupled with appropriate feature extraction and learning frameworks.

Habitat analysis, an advanced imaging strategy, enables the identification of spatially distinct tumor regions by capturing subtle differences in radiomic phenotypes, thereby illustrating the intratumoral heterogeneity that underlies treatment resistance and response variability³⁰. Gatenby et al. conceptualized tumors as a mosaic of evolutionary habitats rather than a single self-organizing entity, with each sub-region shaped by unique microenvironmental pressures and selective forces³¹. Building upon this theory, Xie et al. clustered esophageal tumors into four spatial subregions and demonstrated that sub-regional radiomics could effectively predict survival outcomes following chemoradiotherapy³². Similarly, a separate study confirmed that radiomics features exhibit significant heterogeneity across different tumor habitats, reinforcing the biological relevance of spatial dissection³³. Consistent with these prior observations, our subregional analysis revealed that features extracted from sub-region 1 yielded the highest predictive performance for identifying patients likely to achieve pCR to NACI. We further explored the biological interpretability of these computed CT-defined habitats. Despite the limited sample size, the distinct radiomic patterns observed across subregions were consistent with tumor heterogeneity. These findings suggest that CT-derived habitats may reflect underlying pathologic tissue compartments and could serve as noninvasive surrogates of tumor biology and aggressiveness. Our results underscore the potential of habitat-guided radiomics as a clinically valuable approach for stratifying patients based on their biological responsiveness to therapy.

This study proposed a non-invasive preoperative approach for predicting pCR in LA-ESCC patients undergoing NACI, thereby supporting informed treatment decisions and personalized therapeutic planning. In the external validation cohort, the DLRad1 model achieved a sensitivity of 83.3% and specificity of 79.3% at the optimal threshold, correctly identifying 10 of 12 pCR cases and 23 of 29 non-pCR cases. This corresponded to an under-treatment rate of 16.7% and an apparent overtreatment rate of 20.7%. While the under-treatment rate reflects a potential risk of withholding de-escalation in true pCR responders, the interpretation of “overtreatment” requires caution. In clinical practice, patients who do not achieve pCR may still experience meaningful therapeutic benefit from NACI, such as tumor downstaging or partial response, which are associated with improved R0 resection rates and long-term survival. Thus, failure to achieve pCR does not necessarily equate to overtreatment. As NACI continues to emerge as a promising standard of care for LA-ESCC, the proposed model is intended to complement, rather than replace, guideline-based recommendations by enhancing risk stratification and supporting individualized decision-making. Active surveillance has been proposed as a feasible strategy to improve survival in patients who achieve pCR following NACI. However, for those who are unresponsive to NACI or exhibit insufficient tumor regression, additional therapeutic interventions, such as radiotherapy, may be required to achieve adequate local control and optimize clinical outcomes^7,34. Given the high cost of NACI, radical chemoradiotherapy represents a viable alternative for these patients. This approach may help reduce financial burden and minimize treatment-related toxicities, particularly in resource-limited settings.
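The rates quoted for the external validation cohort follow directly from the reported counts (10 of 12 pCR cases and 23 of 29 non-pCR cases correctly identified). The short check below reproduces that arithmetic; the variable names are ours.

```python
# Counts reported for the external validation cohort.
tp, fn = 10, 2   # pCR cases: correctly identified / missed
tn, fp = 23, 6   # non-pCR cases: correctly identified / called pCR

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)
under_treatment_rate = fn / (tp + fn)   # 1 - sensitivity
over_treatment_rate = fp / (tn + fp)    # 1 - specificity

for name, value in [
    ("sensitivity", sensitivity),
    ("specificity", specificity),
    ("under-treatment rate", under_treatment_rate),
    ("apparent overtreatment rate", over_treatment_rate),
]:
    print(f"{name}: {100 * value:.1f}%")
```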

Nonetheless, this study has several limitations. First, it is a retrospective analysis with relatively small validation cohorts, which may affect the generalizability of the findings. Prospective, multicenter studies with larger and more diverse patient populations are warranted to confirm the results and further validate the proposed radiomics model. Second, most patients enrolled were male, potentially limiting the applicability of the model across genders. Third, the limited spatial resolution of CT imaging complicates the precise delineation of tumor boundaries from adjacent normal esophageal tissue during regions of interest (ROI) segmentation, which may introduce bias. Fourth, due to the retrospective and multicenter nature of this study, PD-L1 expression data were not consistently available and therefore could not be incorporated into the modeling process. Lastly, because of the limited follow-up duration in a significant portion of patients, survival outcomes such as disease-free survival or progression-free survival could not be reliably assessed.

Although the proposed DLRad1 model demonstrated robust performance across training, internal, and external validation cohorts, further refinement is needed to enable clinical translation. One promising direction is the application of delta radiomics, in which features extracted from multi-timepoint CT scans (pre-, mid-, and post-treatment) can dynamically reflect treatment-induced changes and improve predictive accuracy. Additionally, integrating radiomic features with clinical and hematological biomarkers, such as neutrophil-to-lymphocyte ratio, platelet-to-lymphocyte ratio, and lactate dehydrogenase, may provide complementary biological information and enhance model robustness. Future research may also explore diverse machine learning algorithms (LightGBM, support vector machines, ensemble models), as well as hybrid architectures combining handcrafted and deep learning features. Improving probability calibration and incorporating uncertainty estimation may further support clinical decision-making. To improve generalizability, large-scale, prospective multicenter studies are essential. The application of harmonization techniques (ComBat or domain adaptation) could reduce variability introduced by different imaging protocols. Furthermore, integrating radiomic features with tumor immune phenotypes or multi-omics data (genomic, transcriptomic) may enhance the biological interpretability and immunotherapy relevance of the model. Ultimately, the development of interpretable, clinically applicable decision support tools is critical for promoting individualized treatment planning and broader clinical adoption.

In conclusion, based on pretreatment CT images, the proposed multidomain fusion model, DLRad1, effectively predicts pCR in patients with LA-ESCC. While additional studies are needed to validate its clinical applicability, our findings provide a foundation for informed treatment decisions and the development of personalized therapeutic strategies for LA-ESCC.

Methods

Study participants

A retrospective analysis was conducted on 271 patients with LA-ESCC who received neoadjuvant chemotherapy and immune checkpoint inhibitor treatment across three hospitals in China (Fig. 3). Among them, 230 patients were consecutively treated at Harbin Medical University Cancer Hospital between December 1, 2019, and October 31, 2024. Patients were randomly divided into a training cohort (n = 161) and an internal validation cohort (n = 69) with a 7:3 ratio. An external validation cohort comprised 41 patients treated consecutively at the Affiliated Suzhou Hospital of Nanjing Medical University and the Affiliated Hospital of Jiangnan University between September 1, 2022, and August 31, 2024. Inclusion criteria: (1) pathohistologically confirmed ESCC treated with neoadjuvant chemotherapy and immune checkpoint inhibitors; (2) availability of contrast-enhanced CT images acquired within one week prior to the initiation of neoadjuvant treatment and before any antitumor therapy, representing pretreatment baseline imaging; (3) diagnosis of locally advanced ESCC (cT1b-2 N+ M0 or cT3-4a any N M0) according to the American Joint Committee on Cancer TNM Staging Manual (8th Edition); and (4) completion of radical surgery post-neoadjuvant treatment with comprehensive postoperative pathological data. Exclusion criteria: (1) incomplete clinical records or missing CT descriptor information; (2) CT images with significant artifacts; (3) any prior anticancer therapy administered before the baseline CT imaging; and (4) concurrent diagnosis of synchronous multiple primary ESCC or other primary malignant tumors. The overall workflow of the study is illustrated in Fig. 4.
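As a minimal illustration of the 7:3 random split, the sketch below shuffles 230 patient identifiers and cuts the list at 70%, which yields the reported 161 and 69 patients. The seed, the identifiers, and the absence of stratification by pCR status are assumptions; the article does not specify how the randomization was implemented.

```python
import random


def split_7_3(patient_ids, seed=0):
    """Shuffle the identifiers and return (training, internal validation) lists."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # seeded for reproducibility (assumed)
    n_train = round(len(ids) * 0.7)
    return ids[:n_train], ids[n_train:]


# Hypothetical identifiers 0..229 standing in for the 230 single-center patients.
train_ids, val_ids = split_7_3(range(230))
print(len(train_ids), len(val_ids))
```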

**Fig. 3: Flowchart diagram shows the patient selection process from three medical centers.**

**Fig. 4: Workflow diagram for the construction of the predictive models.**

Ethics statement

The study was conducted in accordance with the Declaration of Helsinki. Ethical approval was obtained from the respective ethics committees of Harbin Medical University Cancer Hospital, the Affiliated Suzhou Hospital of Nanjing Medical University, and the Affiliated Hospital of Jiangnan University. The requirement for informed consent was waived by the institutional review boards due to the retrospective nature of the study and the use of de-identified data.

NACI regimen and clinicopathological data collection

All patients received 2–4 cycles of NACI, followed by radical esophagectomy with lymphadenectomy upon completion of the neoadjuvant therapy. Tumor response was assessed after the second cycle of treatment and again prior to surgery, according to the response evaluation criteria in solid tumors version 1.1. Further details on the treatment regimen can be found in a previous study³. Baseline clinical characteristics were documented before the initiation of treatment.

CT examination and image preprocessing

Preoperative CT scans were performed for all patients, with imaging parameters standardized to a window width of 300 HU and a window level of 40 HU to enhance visualization of esophageal tissues. Details of the CT scan parameters are summarized in Table S6. Images were resampled to a 1 × 1 × 1 mm voxel size using bi-cubic spline interpolation. Two experienced radiologists independently and blindly delineated tumor regions using ITK-SNAP software (version 3.6.0). ROIs were reassessed 2 months later to evaluate the consistency of annotations. ICCs were calculated to ensure reliability.
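A window width of 300 and a window level of 40 correspond to an intensity range of −110 to 190 when the values are read in Hounsfield units. The sketch below applies such a window by clipping and then rescaling to [0, 1]; the rescaling step and the function name are illustrative assumptions, not a description of the authors' preprocessing code.

```python
def apply_window(hu_values, level=40.0, width=300.0):
    """Clip intensities to [level - width/2, level + width/2] and rescale to [0, 1]."""
    lo = level - width / 2.0
    hi = level + width / 2.0
    return [(min(max(v, lo), hi) - lo) / (hi - lo) for v in hu_values]


# Air, lower bound, window center, upper bound, dense bone.
windowed = apply_window([-1000, -110, 40, 190, 1200])
print(windowed)
```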

Sub-region clustering and radiomics feature extraction

This study excluded air regions within the ROI using a zero Hounsfield unit threshold. Local features for each voxel were calculated using a moving window of size 3 × 3 × 3, generating a 19-dimensional feature vector per voxel. The features were classified into five categories: first-order features, Gray Level Co-occurrence Matrix (GLCM), Gray Level Run Length Matrix (GLRLM), Gray Level Size Zone Matrix (GLSZM), and Neighborhood Gray Tone Difference Matrix (NGTDM) features (Fig. 5). The first-order features, including entropy, mean absolute deviation, and median, describe the overall intensity distribution. GLCM features, including difference average, difference entropy, difference variance, joint energy, joint entropy, inverse variance, sum entropy, Imc1, and Imc2, capture pixel pair spatial relationships to elucidate texture patterns. GLRLM features, including run entropy, run variance, and long run emphasis, focus on the distribution of run lengths of pixels with similar intensities. The GLSZM features, Size Zone Non-Uniformity Normalized and Small Area High Gray Level Emphasis, were utilized to characterize the size and distribution of homogeneous intensity zones. Lastly, the NGTDM features, contrast and strength, were evaluated to assess local intensity variations. The K-means algorithm was utilized for sub-region clustering, exploring cluster numbers from 2 to 9 based on these features. The optimal number of clusters was determined using the Calinski–Harabasz score.
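The Calinski–Harabasz score used to choose the number of clusters is the ratio of between-cluster to within-cluster dispersion, each divided by its degrees of freedom. The sketch below computes it in plain Python for a given labeling and picks the candidate k with the highest score. It assumes cluster labels have already been produced (for example by K-means for each k from 2 to 9); the helper names and the toy points are hypothetical.

```python
def calinski_harabasz(points, labels):
    """CH = [B / (k - 1)] / [W / (n - k)].

    B: between-cluster sum of squares, W: within-cluster sum of squares.
    Assumes at least two clusters and W > 0.
    """
    n, dim = len(points), len(points[0])
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    k = len(clusters)
    overall = [sum(p[d] for p in points) / n for d in range(dim)]
    between = within = 0.0
    for members in clusters.values():
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        between += len(members) * sum((c - o) ** 2 for c, o in zip(centroid, overall))
        within += sum(sum((x - c) ** 2 for x, c in zip(p, centroid)) for p in members)
    return (between / (k - 1)) / (within / (n - k))


def best_k(points, labelings):
    """labelings maps each candidate k to its list of cluster labels."""
    scores = {k: calinski_harabasz(points, labs) for k, labs in labelings.items()}
    return max(scores, key=scores.get), scores


# Toy 2-D "voxel features" forming three tight groups (hypothetical data).
pts = [(0, 0), (0, 1), (10, 0), (10, 1), (20, 0), (20, 1)]
candidate_labels = {2: [0, 0, 0, 0, 1, 1], 3: [0, 0, 1, 1, 2, 2]}
k_opt, ch_scores = best_k(pts, candidate_labels)
print(k_opt, ch_scores)
```

In the study the same comparison runs over 19-dimensional voxel feature vectors, and the score peaked at four clusters in the training cohort.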

**Fig. 5: Radiomic feature maps and sub-region generation.**

Radiomics features for each sub-region were extracted using PyRadiomics³⁵. ROIs were discretized to 64 gray levels to account for variability among CT scanners. Radiomics features quantifying volume, shape, intensity, and texture were calculated for each sub-region, yielding 1834 features derived from eight filters: Laplacian of Gaussian, Wavelet, 3D Local Binary Patterns, Exponential, Square, SquareRoot, Logarithm, and Gradient. Comprehensive parameters for radiomics features can be found on the PyRadiomics website (https://pyradiomics.readthedocs.io/en/latest/).

Radiomics feature selection and model construction

Radiomics features were standardized using Z-scores to place them on a common scale (zero mean and unit variance). A greedy recursive feature elimination approach was employed to remove one feature from each pair with a Spearman correlation coefficient of 0.9 or higher. Features with strong stability (intraobserver and interobserver ICC ≥ 0.75) were retained.
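
The standardization and redundancy filter can be illustrated with a short sketch. It assumes a single greedy pass in column order and features without tied values; the exact elimination order used in the study is not specified, and the function names are hypothetical.

```python
import numpy as np

def zscore(X):
    """Column-wise standardization to zero mean and unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def spearman_matrix(X):
    """Spearman correlation as the Pearson correlation of column-wise ranks.

    Assumes no tied values within a column (reasonable for continuous features).
    """
    ranks = X.argsort(axis=0).argsort(axis=0).astype(float)
    return np.corrcoef(ranks, rowvar=False)

def drop_correlated(X, threshold=0.9):
    """Keep a feature only if |rho| < threshold against every feature kept so far."""
    rho = np.abs(spearman_matrix(X))
    kept = []
    for j in range(X.shape[1]):
        if all(rho[j, i] < threshold for i in kept):
            kept.append(j)
    return kept

rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = a ** 3                    # monotone transform of a: rank correlation of 1
c = rng.normal(size=200)      # independent of a
features = zscore(np.column_stack([a, b, c]))
kept_columns = drop_correlated(features)
```

Because Spearman correlation depends only on ranks, the Z-score step does not change which features are flagged as redundant; here the second column is dropped as a monotone copy of the first.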

Feature selection was subsequently conducted using LASSO regression. Predictive models were developed using the XGBoost algorithm, optimized through five-fold cross-validation. Each model was trained on feature sets labeled for pCR prediction. Probabilities for pCR were then generated for patients in the validation cohorts.

Multichannel 2D DL model development and feature extraction

In the multichannel 2D DL framework, a bounding box was defined as the smallest rectangle enclosing the largest cross-section of the tumor. This bounding box was used to extract the primary tumor ROI along with five adjacent slices, both above and below the tumor, resulting in a dataset for each patient that included the main tumor ROI and up to 5 neighboring patches. Subsequently, the images were cropped and resized to a uniform dimension of 256 × 256 pixels through linear interpolation. Data augmentation methods, such as random horizontal and vertical flips and random cropping to 224 × 224 pixels, were employed to enhance the dataset and bolster model robustness. The final input image size was standardized to 224 × 224 pixels. A ResNet101 network pre-trained on the ImageNet dataset, which contains millions of labeled images across thousands of categories, was employed. Transfer learning was applied to the training set to leverage the feature extraction capabilities learned from ImageNet, an approach commonly used in medical imaging research²¹,³⁶,³⁷. A global fine-tuning approach was employed to adjust network parameters, customizing ResNet101 for pCR prediction.
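
The patch extraction step can be sketched as below. This is an illustration under stated assumptions: the volume is indexed as (slice, row, column), neighboring slices are kept only if they contain tumor, and the number of neighbors per side (`n_neighbors`) is a hypothetical parameter, since the exact slice-selection rule is not fully specified here. Resizing and augmentation are omitted.

```python
import numpy as np

def largest_slice_patches(volume, mask, n_neighbors=5):
    """Crop the bounding box of the largest tumor cross-section from that
    slice and from nearby slices that also contain tumor."""
    areas = mask.reshape(mask.shape[0], -1).sum(axis=1)   # tumor area per slice
    z = int(areas.argmax())                               # slice with the largest cross-section
    rows, cols = np.nonzero(mask[z])
    r0, r1 = rows.min(), rows.max() + 1                   # smallest enclosing rectangle
    c0, c1 = cols.min(), cols.max() + 1
    patches = []
    for dz in range(-n_neighbors, n_neighbors + 1):
        zz = z + dz
        if 0 <= zz < volume.shape[0] and mask[zz].any():
            patches.append(volume[zz, r0:r1, c0:c1])
    return z, patches

# Toy volume with tumor on slices 3, 4, and 5; slice 4 has the largest area.
volume = np.arange(10 * 8 * 8, dtype=float).reshape(10, 8, 8)
mask = np.zeros((10, 8, 8), dtype=bool)
mask[3, 2:4, 2:4] = True
mask[4, 1:6, 2:7] = True
mask[5, 3:5, 3:5] = True
center_slice, patches = largest_slice_patches(volume, mask, n_neighbors=2)
```

All patches share the same bounding box, so they can be stacked or resized consistently before being passed to the network.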

Multiple patches for each patient were consistently labeled to reflect the presence or absence of pCR. These labeled tumor images served as input to the 2D DCNN. The network parameters were iteratively updated using backpropagation, guided by the cross-entropy loss function during training. The model used the Adam optimizer with a learning rate of 1 × 10⁻⁴ and a batch size of 64, and incorporated L2 regularization and early stopping to prevent overfitting. The 2D DCNN model produced a pCR probability score for each CT patch. Patient-level probabilities were calculated by averaging the probabilities across all patches for a given patient. After training, the penultimate average pooling layer of ResNet101 was also used to extract 2,048 DL features from each patch.
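
The patch-to-patient aggregation is a simple mean over each patient's patches. A minimal sketch, with hypothetical patient identifiers and probabilities:

```python
from collections import defaultdict

def patient_level_probability(patient_ids, patch_probs):
    """Average patch-level pCR probabilities within each patient."""
    grouped = defaultdict(list)
    for pid, prob in zip(patient_ids, patch_probs):
        grouped[pid].append(float(prob))
    return {pid: sum(values) / len(values) for pid, values in grouped.items()}

# Hypothetical example: three patches for patient P1, two for patient P2.
ids = ["P1", "P1", "P1", "P2", "P2"]
probs = [0.9, 0.7, 0.8, 0.2, 0.4]
patient_probs = patient_level_probability(ids, probs)
```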

3D DL model development and feature extraction

We employed a 3D ResNet architecture to address the limitations of 2D DCNNs in comprehensively capturing 3D anatomical structures. This model enhances traditional ResNet’s capabilities by preserving stereoscopic information and leveraging automatic parameter learning for contextual feature extraction²¹. The Med3D dataset, a comprehensive resource for multi-modal and multi-organ medical imaging, was employed due to the limited availability of pre-trained models for medical 3D imaging³⁸. The 3D ResNet101 model, pre-trained on Med3D, was the backbone architecture. Decoder layers were replaced with fully connected layers to customize the network for the specific task of pCR prediction³⁹. The 3D ResNet101 backbone was initialized with pre-trained weights and fine-tuned on our training dataset.

For 3D DL processing, the tumor ROI was enclosed within a bounding cube. The ROI cubes were linearly interpolated to a size of 96 × 96 × 96 voxels. Data augmentation methods, such as inversion across the X, Y, and Z axes, were used to enhance dataset diversity. The 3D DCNN was provided with the generated 3D images and their corresponding labels. The model was trained for 200 epochs using the Adam optimizer with a learning rate of 0.03. After training, the 3D DCNN predicted pCR for each patient. For each patient, 2048 DL features were extracted from the penultimate average pooling layer of the 3D ResNet101 model.
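
The axis-inversion augmentation can be sketched as follows, assuming that "inversion" means flipping the cube along an axis and that each axis is flipped independently with probability `p` (both are assumptions; the text does not give these details).

```python
import numpy as np

def random_flip_3d(cube, rng, p=0.5):
    """Independently flip a 3D volume along each of its three axes with probability p."""
    for axis in range(3):
        if rng.random() < p:
            cube = np.flip(cube, axis=axis)
    return np.ascontiguousarray(cube)   # materialize the flipped view as a regular array

rng = np.random.default_rng(0)
cube = np.arange(24, dtype=float).reshape(2, 3, 4)
augmented = random_flip_3d(cube, rng)
flipped_all = random_flip_3d(cube, rng, p=1.0)   # every axis flipped
unchanged = random_flip_3d(cube, rng, p=0.0)     # no axis flipped
```

Flipping rearranges voxels without interpolation, so the intensity distribution of the cube is preserved exactly.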

Construction of the fusion models

Feature-level fusion was achieved by concatenating features from multiple modalities into a unified feature vector. Radiomics features of the primary tumor were extracted using PyRadiomics, while 2D and 3D DL features were derived from the respective DCNNs, as described earlier. Z-score normalization was applied to standardize all features. Feature relevance was assessed using Spearman correlation, ICCs, and LASSO analyses. The XGBoost algorithm was subsequently used to construct feature-based fusion models. Two fusion models were developed: DLRad1, which combined radiomics features based on sub-regions with multichannel 2D DL features, and DLRad2, which integrated radiomics features based on sub-regions with 3D DL features.
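
Feature-level fusion by concatenation can be sketched in a few lines. Estimating the Z-score statistics from the training rows only is an assumption made here to avoid information leakage; the array sizes and names are illustrative.

```python
import numpy as np

def fuse_features(radiomics, deep, train_rows):
    """Concatenate two per-patient feature blocks and Z-score them using
    statistics estimated from the training rows only (an assumption)."""
    fused = np.concatenate([radiomics, deep], axis=1)   # shape: (patients, n_rad + n_deep)
    mean = fused[train_rows].mean(axis=0)
    std = fused[train_rows].std(axis=0)
    std[std == 0] = 1.0                                 # guard against constant columns
    return (fused - mean) / std

rng = np.random.default_rng(0)
radiomics = rng.normal(5.0, 2.0, size=(10, 4))   # e.g. sub-region radiomics features
deep = rng.normal(0.0, 0.1, size=(10, 6))        # e.g. DL features from a pooling layer
train_rows = np.arange(7)
fused = fuse_features(radiomics, deep, train_rows)
```

Standardizing after concatenation puts radiomics and DL features on the same scale before feature selection.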

Clinical model development and models comparison

Univariate and multivariate logistic regression analyses were conducted to identify clinical and radiomics indicators independently associated with pCR. Based on the multivariate regression results, we constructed a Clinical model by including significant clinical predictors (P < 0.05) identified in the training cohort. Subsequently, we developed a combined multivariable model incorporating both DLRad1 and the selected clinical variables in order to evaluate the added value of radiomics beyond clinical factors. To compare the discriminative performance of the Clinical model, DLRad1 model, and Combined model (Clinical + DLRad1), we calculated the area under the ROC curve (AUC) for each. Furthermore, we assessed the incremental predictive value of the radiomics-based model using the IDI and NRI metrics.
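
For reference, the two reclassification metrics can be computed from the predicted probabilities of a baseline and a new model. The sketch below uses the standard IDI definition and the category-free (continuous) form of NRI; whether the study used the continuous or a categorical NRI is not stated, so that choice is an assumption, and the example data are hypothetical.

```python
import numpy as np

def idi(y, p_old, p_new):
    """Integrated discrimination improvement: mean probability gain in
    events minus mean probability gain in non-events."""
    y = np.asarray(y, dtype=bool)
    delta = np.asarray(p_new, dtype=float) - np.asarray(p_old, dtype=float)
    return float(delta[y].mean() - delta[~y].mean())

def continuous_nri(y, p_old, p_new):
    """Category-free net reclassification improvement."""
    y = np.asarray(y, dtype=bool)
    delta = np.asarray(p_new, dtype=float) - np.asarray(p_old, dtype=float)
    up, down = delta > 0, delta < 0
    nri_events = up[y].mean() - down[y].mean()          # events should move up
    nri_nonevents = down[~y].mean() - up[~y].mean()     # non-events should move down
    return float(nri_events + nri_nonevents)

# Hypothetical example: two responders (1) and two non-responders (0).
y = [1, 1, 0, 0]
p_old = [0.6, 0.5, 0.4, 0.5]
p_new = [0.8, 0.4, 0.2, 0.3]
idi_value = idi(y, p_old, p_new)
nri_value = continuous_nri(y, p_old, p_new)
```

Positive values of either metric indicate that the new model shifts predicted probabilities in the correct direction more often, or by a larger amount, than the baseline.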

Model evaluation

The models’ predictive performance was evaluated using metrics such as ROC curves, AUC, sensitivity, specificity, and overall accuracy. The DeLong test was utilized to assess differences in AUC values between models. DCA was used to evaluate the clinical applicability of the models. Calibration curves were plotted to evaluate the concordance between predicted and observed outcomes. Model interpretability was assessed using the SHAP method to clarify the impact of individual features on predictions⁴⁰.
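
As an illustration of the discrimination metrics, AUC can be computed as the fraction of responder/non-responder pairs that the model ranks correctly (ties count one half), which is equivalent to the area under the ROC curve. The 0.5 decision threshold and the example scores below are assumptions for demonstration only.

```python
import numpy as np

def auc_score(y, scores):
    """AUC as the probability that a positive case outranks a negative case."""
    y = np.asarray(y, dtype=bool)
    s = np.asarray(scores, dtype=float)
    diff = s[y][:, None] - s[~y][None, :]            # all positive-negative pairs
    return float(((diff > 0) + 0.5 * (diff == 0)).mean())

def sensitivity_specificity(y, scores, threshold=0.5):
    """Sensitivity and specificity at a fixed probability threshold."""
    y = np.asarray(y, dtype=bool)
    pred = np.asarray(scores, dtype=float) >= threshold
    sensitivity = (pred & y).sum() / y.sum()
    specificity = (~pred & ~y).sum() / (~y).sum()
    return float(sensitivity), float(specificity)

# Hypothetical predictions for three pCR and three non-pCR patients.
y_true = [1, 1, 1, 0, 0, 0]
y_score = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
auc = auc_score(y_true, y_score)
sens, spec = sensitivity_specificity(y_true, y_score)
```

In this example, 8 of the 9 positive-negative pairs are ordered correctly, and two of three cases in each class fall on the correct side of the threshold.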

Statistical analysis

Categorical variables were assessed with the Chi-square or Fisher’s exact tests, and continuous variables were analyzed using the Mann–Whitney U test or independent t-test. A two-sided P-value of less than 0.05 was considered statistically significant. Statistical analyses were conducted using R (v4.2.1) and Python (v3.7.0). Table S7 provides a list of the packages utilized in this study.

Data availability

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Code availability

The code for feature selection and machine learning modeling can be accessed on the Onekey AI platform (https://github.com/OnekeyAI-Platform). Other codes used in this study are available from the corresponding author upon reasonable request.

References

1. Sung, H. et al. Global Cancer Statistics 2020: GLOBOCAN estimates of incidence and mortality Worldwide for 36 Cancers in 185 Countries. CA Cancer J. Clin. 71, 209–249 (2021).
2. Chen, W. et al. Cancer statistics in China, 2015. CA Cancer J. Clin. 66, 115–132 (2016).
3. Yang, Y. et al. Neoadjuvant camrelizumab combined with paclitaxel and nedaplatin for locally advanced esophageal squamous cell carcinoma: a single-arm phase 2 study (cohort study). Int. J. Surg. 110, 1430–1440 (2024).
4. Yu, Y. K. et al. Neoadjuvant chemotherapy combined with immunotherapy versus neoadjuvant chemoradiotherapy in patients with locally advanced esophageal squamous cell carcinoma. J. Thorac. Cardiovasc. Surg. 168, 417–428.e3 (2024).
5. Doki, Y. et al. Nivolumab combination therapy in advanced esophageal squamous-cell carcinoma. N. Engl. J. Med. 386, 449–462 (2022).
6. Luo, H. et al. Effect of Camrelizumab vs Placebo added to chemotherapy on survival and progression-free survival in patients with advanced or metastatic esophageal squamous cell carcinoma: the ESCORT-1st randomized clinical trial. JAMA 326, 916–925 (2021).
7. Wang, Z. et al. Efficacy and safety of neoadjuvant immunotherapy in surgically resectable esophageal cancer: a systematic review and meta-analysis. Int. J. Surg. 104, 106767 (2022).
8. Ge, F. et al. Evaluation of clinical and safety outcomes of neoadjuvant immunotherapy combined with chemotherapy for patients with resectable esophageal cancer: a systematic review and meta-analysis. JAMA Netw. Open 5, e2239778 (2022).
9. Wang, W. et al. A narrative review on advances in neoadjuvant immunotherapy for esophageal cancer: molecular biomarkers and future directions. Int. J. Cancer 156, 20–33 (2025).
10. Gillies, R. J., Kinahan, P. E. & Hricak, H. Radiomics: images are more than pictures, they are data. Radiology 278, 563–577 (2016).
11. Lambin, P. et al. Radiomics: the bridge between medical imaging and personalized medicine. Nat. Rev. Clin. Oncol. 14, 749–762 (2017).
12. Limkin, E. J. et al. Promises and challenges for the implementation of computational medical imaging (radiomics) in oncology. Ann. Oncol. 28, 1191–1206 (2017).
13. Huang, W. et al. Non-invasive CT imaging biomarker to predict immunotherapy response in gastric cancer: a multicenter study. J. Immunother. Cancer 11, e007807 (2023).
14. Zhao, J. et al. Radiomic and clinical data integration using machine learning predict the efficacy of anti-PD-1 antibodies-based combinational treatment in advanced breast cancer: a multicentered study. J. Immunother. Cancer 11, e006514 (2023).
15. Hua, Y. et al. Pretreatment CT-based machine learning radiomics model predicts response in unresectable hepatocellular carcinoma treated with lenvatinib plus PD-1 inhibitors and interventional therapy. J. Immunother. Cancer 12, e008953 (2024).
16. Juan-Albarracín, J. et al. Glioblastoma: vascular habitats detected at preoperative dynamic susceptibility-weighted contrast-enhanced perfusion MR imaging predict survival. Radiology 287, 944–954 (2018).
17. Gillies, R. J. & Balagurunathan, Y. Perfusion MR imaging of breast cancer: insights using “habitat imaging”. Radiology 288, 36–37 (2018).
18. Napel, S. et al. Quantitative imaging of cancer in the postgenomic era: Radio(geno)mics, deep learning, and habitats. Cancer 124, 4633–4649 (2018).
19. Singh, S. P. et al. 3D deep learning on medical images: a review. Sensors 20, 5097 (2020).
20. Varga, D. No-reference image quality assessment with multi-scale orderless pooling of deep features. J. Imaging 7, 112 (2021).
21. Wang, W. et al. Comparing three-dimensional and two-dimensional deep-learning, radiomics, and fusion models for predicting occult lymph node metastasis in laryngeal squamous cell carcinoma based on CT imaging: a multicentre, retrospective, diagnostic study. EClinicalMedicine 67, 102385 (2024).
22. Zhang, Y. et al. Improving prognostic performance in resectable pancreatic ductal adenocarcinoma using radiomics and deep learning features fusion in CT images. Sci. Rep. 11, 1378 (2021).
23. Nie, D. et al. Multi-channel 3D deep feature learning for survival time prediction of brain tumor patients using multi-modal neuroimages. Sci. Rep. 9, 1103 (2019).
24. Wang, J. L. et al. A machine learning radiomics based on enhanced computed tomography to predict neoadjuvant immunotherapy for resectable esophageal squamous cell carcinoma. Front. Immunol. 15, 1405146 (2024).
25. Shi, L. et al. CT radiomics to predict pathologic complete response after neoadjuvant immunotherapy plus chemoradiotherapy in locally advanced esophageal squamous cell carcinoma. Eur. Radiol. 33, 1594–1604 (2024).
26. Zhu, Y. et al. Predicting response to immunotherapy plus chemotherapy in patients with esophageal squamous cell carcinoma using non-invasive Radiomic biomarkers. BMC Cancer 21, 1167 (2021).
27. Zhang, M. et al. CT-based deep learning radiomics and hematological biomarkers in the assessment of pathological complete response to neoadjuvant chemoradiotherapy in patients with esophageal squamous cell carcinoma: A two-center study. Transl. Oncol. 39, 101804 (2024).
28. Zheng, Y. et al. CT-based delta-radiomics for predicting pathological response to neoadjuvant immunochemotherapy in esophageal squamous cell carcinoma: a multicenter study. BMC Med. Imaging 24, 329 (2024).
29. Ruan, Y. et al. Dynamic radiological features predict pathological response after neoadjuvant immunochemotherapy in esophageal squamous cell carcinoma. J. Transl. Med. 22, 471 (2024).
30. Sala, E. et al. Unravelling tumour heterogeneity using next-generation imaging: radiomics, radiogenomics, and habitat imaging. Clin. Radiol. 72, 3–10 (2017).
31. Gatenby, R. A., Grove, O. & Gillies, R. J. Quantitative imaging in cancer evolution and ecology. Radiology 269, 8–15 (2013).
32. Xie, C. et al. Sub-region based radiomics analysis for survival prediction in oesophageal tumours treated by definitive concurrent chemoradiotherapy. EBioMedicine 44, 289–297 (2019).
33. Xu, R. et al. Do habitat MRI and fractal analysis help distinguish triple-negative breast cancer from non-triple-negative breast carcinoma. Can. Assoc. Radiol. J. 75, 584–592 (2024).
34. Galluzzi, L., Aryankalayil, M. J., Coleman, C. N. & Formenti, S. C. Emerging evidence for adapting radiotherapy to immunotherapy. Nat. Rev. Clin. Oncol. 20, 543–557 (2023).
35. van Griethuysen, J. J. M. et al. Computational Radiomics System to Decode the Radiographic Phenotype. Cancer Res. 77, e104–e107 (2017).
36. Mukherjee, S. et al. Radiomics-based machine-learning models can detect pancreatic cancer on prediagnostic computed tomography scans at a substantial lead time before clinical diagnosis. Gastroenterology 163, 1435–1446.e3 (2022).
37. Lang, D. M., Peeken, J. C., Combs, S. E., Wilkens, J. J. & Bartzsch, S. Deep learning based HPV status prediction for oropharyngeal cancer patients. Cancers (Basel) 13, 786 (2021).
38. Chen, S., Ma, K. & Zheng, Y. Med3D: transfer learning for 3D medical image analysis. https://doi.org/10.48550/arXiv.1904.00625 (2019).
39. He, K., Zhang, X., Ren, S. & Sun, J. Spatial pyramid pooling in deep convolutional networks for visual recognition. IEEE Trans. Pattern Anal. Mach. Intell. 37, 1904–1916 (2015).
40. Sylvester, S. et al. SHAP value-based ERP analysis (SHERPA): Increasing the sensitivity of EEG signals with explainable AI methods. Behav. Res. Methods 56, 6067–6081 (2024).

Acknowledgements

This work was supported by the National Natural Science Foundation of China (82303742), the Harbin Medical University Cancer Hospital Climbing program (PDYS2024-10) and China Postdoctoral Science Foundation (2024MD763971).

Author information

These authors contributed equally: Jiahao Zhu, Benjie Xu, Tiantian Fan.

Authors and Affiliations

Department of Outpatient Chemotherapy, Harbin Medical University Cancer Hospital, Harbin, Heilongjiang, PR China
Jiahao Zhu, Benjie Xu & Haibo Lu
Department of Radiology, Harbin Medical University Cancer Hospital, Harbin, Heilongjiang, PR China
Tiantian Fan, Jiaxuan Ding & Yang Zhou
Department of Radiotherapy and Oncology, Suzhou Municipal Hospital, The Affiliated Suzhou Hospital of Nanjing Medical University, Gusu School, Nanjing Medical University, Suzhou, Jiangsu, PR China
Shengjun Ji
Department of Radiotherapy and Oncology, The Affiliated Hospital of Jiangnan University, Wuxi, Jiangsu, PR China
Ke Gu
Department of Thoracic Surgery, Harbin Medical University Cancer Hospital, Harbin, Heilongjiang, PR China
Jianqun Ma

Contributions

Jiahao Zhu: Conceptualization, Formal analysis, Investigation, Methodology, Writing-original draft, Writing-review & editing, Project administration. Benjie Xu: Investigation, Visualization, Writing-review & editing. Tiantian Fan: Formal analysis. Ke Gu: Resources. Shengjun Ji: Resources. Jiaxuan Ding: Methodology. Haibo Lu: Formal analysis. Jianqun Ma: Resources. Yang Zhou: Conceptualization, Funding acquisition, Supervision. All authors had full access to the data in the study and take responsibility for the integrity of the data, the accuracy of the analysis, and the decision to submit the manuscript for publication.

Corresponding authors

Correspondence to Haibo Lu, Jianqun Ma or Yang Zhou.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary material

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Zhu, J., Xu, B., Fan, T. et al. Sub-regional radiomics combining multichannel 2-dimensional or 3-dimensional deep learning for predicting neoadjuvant chemo-immunotherapy response in esophageal squamous cell carcinoma: a multicenter study. npj Precis. Onc. 9, 248 (2025). https://doi.org/10.1038/s41698-025-01047-9

Received: 24 February 2025
Accepted: 09 July 2025
Published: 21 July 2025
Version of record: 21 July 2025
DOI: https://doi.org/10.1038/s41698-025-01047-9
