Introduction

Esophageal cancer (ESCA) is a common malignant tumor of the digestive tract, with incidence and mortality rates ranking seventh and sixth, respectively, among all cancers globally1. Upper ESCA, which includes cervical and upper-third ESCA, accounts for only approximately 5–10% of all ESCA cases and is thus much less prevalent than ESCA at other segments2,3,4. Moreover, disparities in the anatomical location of ESCAs give rise to distinct biological traits and treatment patterns. Several studies have revealed that the biological behaviors of upper ESCAs are more similar to those of head and neck tumors and that the prognosis of upper ESCA differs from that of ESCA at other segments4,5,6. Owing to the complex anatomical structure of upper ESCA and its close proximity to surrounding organs, the proportion of patients with upper ESCA that undergo surgical treatment is significantly lower than that of patients with ESCA at other segments4. However, to date, substantial clinical evidence confirming the specific clinical characteristics of upper ESCA is lacking.

The tumor–node–metastasis (TNM) staging system has traditionally been used as the first choice for evaluating treatment options and the prognosis of different types of cancers, including ESCA7,8,9,10. However, cancer prognosis is influenced by a complex interplay of various factors, and overlooking these factors may diminish the accuracy of prognostic assessments. Several research groups have developed robust prognostic models by incorporating multiple prognostic factors11,12. Sun W et al. successfully constructed a prognostic model for primary nonmetastatic esophageal cancer11. Qin L et al. developed a prognostic nomogram for predicting cancer-specific survival in elderly patients with lymph node-negative esophageal cancer12. Nevertheless, given the unique clinical characteristics of upper ESCA, there is still a lack of effective prognostic models involving factors specific to this tumor type rather than ESCA at other segments.

We collected data for 16,805 ESCA patients from the National Cancer Institute’s Surveillance, Epidemiology, and End Results (SEER) database and compared the clinical characteristics and outcomes of patients with upper ESCA and those with ESCA at other segments. In addition, we not only constructed an upper ESCA model but also constructed a TNM model and total ESCA model and compared their performance in predicting the prognosis of upper ESCA. Finally, we confirmed that the upper ESCA model performed better than the TNM model and total ESCA model did. This study revealed the unique clinical characteristics of upper ESCA and provides a model that can predict the prognosis of upper ESCA more accurately than existing models.

Methods

Database and patients

Data for patients diagnosed with malignant esophageal tumors between 2004 and 2015 were collected from the SEER Research Data, 17 Registries, Nov 2023 Sub (2000–2021) using SEER*Stat software (version 8.4.3). All patients included in the study had a primary site-labeled recode diagnosis of ‘C15.0-Cervical esophagus, C15.3-Upper third of esophagus, C15.4-Middle third of esophagus, C15.5-Lower third of esophagus, C15.2-Abdominal esophagus’. The exclusion criteria were age < 18 years or > 80 years, an unknown survival time (in months), unclear follow-up information, and survival time ≤ 1 month (Fig. 1). We extracted the following data from the SEER database: age, sex, marital status, primary site of the tumor, histologic type, grade, T stage, N stage, M stage, survival months and vital status. Overall survival (OS) was the endpoint of the survival analysis in this study and was defined as the duration from the day of diagnosis to the day of death. Institutional review board approval was not necessary because the SEER database is publicly available.
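The selection rules above can be expressed as a simple filter. The following is a minimal sketch in Python rather than the authors' actual workflow (the study used SEER*Stat and R); the `Record` layout and the function names are hypothetical and do not correspond to the SEER export format.

```python
from typing import NamedTuple, Optional

# Primary-site codes listed in the text; upper ESCA = cervical + upper third.
UPPER_SITES = {"C15.0", "C15.3"}
OTHER_SITES = {"C15.4", "C15.5", "C15.2"}


class Record(NamedTuple):
    """Hypothetical patient record, not the SEER*Stat export layout."""
    age: int
    site: str
    survival_months: Optional[int]  # None when survival time is unknown
    followup_known: bool


def is_eligible(r: Record) -> bool:
    """Apply the exclusion criteria stated in the text."""
    if r.site not in UPPER_SITES | OTHER_SITES:
        return False
    if r.age < 18 or r.age > 80:
        return False
    if r.survival_months is None or not r.followup_known:
        return False
    return r.survival_months > 1  # survival time <= 1 month is excluded


def segment(r: Record) -> str:
    """Label a record as upper ESCA or ESCA at other segments."""
    return "upper" if r.site in UPPER_SITES else "other"
```

Applying `is_eligible` and then `segment` to each record would yield the two comparison groups described in the flowchart, under the assumption that the exported fields map onto this simplified layout.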

Fig. 1

Patient screening flowchart. This figure not only shows how we screened total ESCA cases (n = 16,805), cases with upper ESCA (n = 1371) and cases with ESCA at other segments (n = 15,434), but also shows the upper ESCA model and TNM model developed from upper ESCA cases and the total ESCA model developed from total ESCA cases. ESCA esophageal cancer.

Statistical analysis

The trend in the constituent ratio of ESCA was analyzed via the Joinpoint Regression Program, and the average annual percent change (AAPC) was calculated. Continuous variables in this study were nonnormally distributed and compared with the Mann‒Whitney test. Categorical variables were compared with the chi-square test. The log-rank test was used to compare survival time among different groups. Cox regression analysis was used to analyze factors affecting OS. A nomogram was formulated on the basis of the results of the Cox regression analysis. All P values were two-sided, and P < 0.05 was considered to indicate statistical significance. Data processing and analysis were performed via R version 4.4.0.
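As an illustration of the categorical comparison, the sketch below computes a Pearson chi-square test for a 2 × 2 table using only the Python standard library. It is not the authors' R code, and it assumes no continuity correction, which the text does not specify. The example counts are the numbers of male patients reported in the Results (912 of 1371 with upper ESCA and 12,731 of 15,434 with ESCA at other segments).

```python
import math
from typing import Tuple


def chi_square_2x2(a: int, b: int, c: int, d: int) -> Tuple[float, float]:
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (statistic, two-sided P value) with 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, P(X > x) equals erfc(sqrt(x / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Male vs. non-male counts in upper ESCA and in ESCA at other segments.
chi2, p = chi_square_2x2(912, 1371 - 912, 12731, 15434 - 12731)
```

The closed form for the P value holds only for 1 degree of freedom; tables with more categories (for example, T stage) would need a general chi-square distribution function.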

Results

Comparison of clinical characteristics and prognoses between upper ESCA and ESCA at other segments

As shown in Table 1, a total of 16,805 ESCA cases assessed in this study were divided into 1371 cases of upper ESCA and 15,434 cases of ESCA at other segments. Notably, the number of cases in the former group was less than one-tenth of that in the latter group. The median age of patients with upper ESCA was 65 [interquartile range (IQR) 58–72] years, which was significantly greater than that of patients with ESCA at other segments [65 (IQR 58–71) years of age]; 912 out of 1371 patients with upper ESCA were male, and the proportion was significantly less than that among patients with ESCA at other segments (12,731 out of 15,434); 690 out of 1371 patients with upper ESCA were in normal marital status, and the proportion was significantly lower than that among patients with ESCA at other segments (9809 out of 15,434); 1199 out of 1371 patients with upper ESCA had squamous cell carcinoma, and the proportion was significantly greater than that among patients with ESCA at other segments (3866 out of 15,434); 127 out of 1371 patients with upper ESCA had adenocarcinoma, and the proportion was significantly less than that among patients with ESCA at other segments (11,048 out of 15,434); 740 out of 1371 upper ESCA cases were differentiation grade II, and the proportion was significantly greater than that among cases of ESCA at other segments (6431 out of 15,434); 524 out of 1371 upper ESCA cases were differentiation grade III, and the proportion was significantly lower than that among cases of ESCA at other segments (7755 out of 15,434). 
Of the 1371 upper ESCA cases, 138 were at the T2 stage, 442 were at the T3 stage, and 356 were at the T4 stage; among patients with ESCA at other segments, the proportions of T2 stage cases (2127 out of 15,434) and T3 stage cases (6566 out of 15,434) were higher, and the proportion of T4 stage cases (1886 out of 15,434) was lower. In addition, 640 out of 1371 patients with upper ESCA were lymph node positive, and the proportion was less than that for ESCA patients at other segments (8433 out of 15,434); 221 out of 1371 upper ESCA cases were at the M1 stage, and the proportion was less than that for cases of ESCA at other segments (4169 out of 15,434). Survival analysis showed that the OS of patients with upper ESCA was significantly poorer than that of patients with ESCA at other segments, with a hazard ratio (HR) of 1.414 (95% CI 1.072–1.216, P < 0.0001) (Fig. 2a,b). However, there was no significant difference in OS between patients with cervical ESCA and those with upper-third ESCA, with an HR of 0.8924 (95% CI 0.788–1.011, P = 0.0729) (Fig. 2c,d).

Table 1 Comparison of clinical characteristics between upper ESCA and ESCA at other segments. Z: Mann–Whitney test; χ²: chi-square test; M: median; Q1: 1st quartile; Q2: 3rd quartile; ESCA: esophageal cancer. P < 0.05 was considered statistically significant.
Fig. 2

Survival analysis of ESCA patients at different segments. (a) Comparison of overall survival (OS) between patients with upper ESCA and those with ESCA at other segments. (b) Survival rates of 1, 3, 5, and 8 years between patients with upper ESCA and those with ESCA at other segments. (c) Comparison of OS between cervical ESCA patients and upper-third ESCA patients. (d) Survival rates of 1, 3, 5, and 8 years between cervical ESCA patients and upper-third ESCA patients. ESCA esophageal cancer, OS overall survival; P < 0.05 was considered to indicate statistical significance.

Trends in the constituent ratios of ESCA patients stratified by clinical characteristics

From 2004 to 2015, among all ESCA cases, the constituent ratio of upper ESCA declined slightly, with an AAPC of −0.98% (95% CI −7.07% to 5.5%, P = 0.76), and that of ESCA at other segments increased slightly, with an AAPC of 0.09% (95% CI −0.47% to 0.65%, P = 0.76) (Fig. 3a). In patients with upper ESCA, the constituent ratios of both cervical ESCA and upper-third ESCA fluctuated considerably but showed no significant change, with AAPCs of −2.69% (95% CI −12.01% to 7.62%, P = 0.059) and 1.09% (95% CI −2.75% to 5.08%, P = 0.58), respectively (Fig. 3b). During this period, there was no significant change in the age of onset for upper ESCA (Fig. 3c). For total ESCA patients, the constituent ratio of squamous cell carcinoma patients decreased significantly, with an AAPC of −2.53% (95% CI −4.07% to −0.26%, P = 0.02), and that of adenocarcinoma patients increased significantly, with an AAPC of 1.27% (95% CI 0.25% to 2.29%, P = 0.01) (Fig. 3d); for patients with ESCA at other segments, the constituent ratio of squamous cell carcinoma patients also decreased, although not significantly, with an AAPC of −3.03% (95% CI −6.34% to 0.39%, P = 0.08), and that of adenocarcinoma patients again increased significantly, with an AAPC of 1.18% (95% CI 0.06% to 2.31%, P = 0.03) (Fig. 3e). However, for upper ESCA, the constituent ratios of squamous cell carcinoma and adenocarcinoma showed no significant changes, with AAPCs of −0.14% (95% CI −2.25% to 2.01%, P = 0.89) and 3.01% (95% CI −8.26% to 15.63%, P = 0.62), respectively (Fig. 3f). When patients were stratified by other clinical variables, including sex, marital status, grade, T stage, N stage and M stage, the constituent ratios of patients with upper ESCA also fluctuated considerably, but none of the AAPCs was significant (Supplementary Fig. 1).
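For readers unfamiliar with the AAPC, the sketch below shows how an annual percent change follows from a log-linear trend: the natural logarithm of the ratio is regressed on calendar year, and the slope b is converted to (e^b − 1) × 100. This is a simplified single-segment illustration in Python, not the Joinpoint Regression Program itself, which additionally searches for change points and averages segment slopes weighted by segment length. The series used is synthetic and is not study data.

```python
import math


def annual_percent_change(years, ratios):
    """APC from an ordinary least-squares fit of ln(ratio) on calendar year."""
    n = len(years)
    logs = [math.log(r) for r in ratios]
    mean_x = sum(years) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
    slope = sxy / sxx
    return (math.exp(slope) - 1) * 100


# Synthetic series (not study data): a ratio shrinking by exactly 2% per year.
years = list(range(2004, 2016))
ratios = [10.0 * 0.98 ** (y - 2004) for y in years]
apc = annual_percent_change(years, ratios)  # approximately -2.0
```

When no joinpoint is detected over the whole period, the AAPC coincides with this single-segment APC.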

Fig. 3

Trends in constituent ratios of ESCA stratified by clinical characteristics. (a) Constituent ratios of upper ESCA and ESCA at other segments from 2004–2015. (b) Constituent ratios of cervical ESCA and upper-third ESCA from 2004–2015. (c) Changes in the ages of patients with upper ESCA from 2004–2015. (d) Constituent ratios of total ESCA patients stratified by histologic type from 2004–2015. (e) Constituent ratios of patients with ESCA at other segments stratified by histologic type from 2004–2015. (f) Constituent ratios of patients with upper ESCA stratified by histologic type from 2004–2015. AAPC average annual percent change; P < 0.05 was considered statistically significant.

Independent prognostic factors in the training cohort

A total of 1371 patients with upper ESCA were randomly divided into a training cohort (n = 963) and a validation cohort (n = 408) at a ratio of 7:3. The demographic and clinicopathologic characteristics of the patients with upper ESCA are shown in Table 2. The median age of the overall population was 65 (IQR 58–72) years. Male patients (66.5%), patients with upper-third ESCA (71.5%), patients with squamous cell carcinoma (87.5%), patients with differentiation grade II (54%) or III (38.2%) and patients without metastasis (83.9%) constituted the majority of the overall population. In addition, the proportions of patients with normal marital status (50.3%) and patients with abnormal marital status (49.7%) were similar, and the proportions of patients with (46.7%) or without (53.3%) lymph node metastasis were also similar. The proportions of patients at the T1, T3 and T4 stages were 31.7%, 32.2% and 26.0%, respectively, which were much greater than the 10.1% at the T2 stage. The training and validation cohorts were comparable in terms of demographic and clinical characteristics (P > 0.05). After univariable Cox regression analysis, significant variables were subjected to multivariable Cox regression analysis. Finally, age, sex, marital status, histologic type, T stage and M stage were determined to be independent prognostic factors (Table 3). Considering the impact of tumor differentiation on malignant behavior, as well as the P value for Grade IV tumors being close to 0.05, we decided to include grade in the model construction. In addition, because N stage is an important component of the TNM staging system, it was also included in the construction of the nomogram. Notably, the N1 stage did not significantly correlate with the prognosis of upper ESCA according to univariable Cox regression analysis and even showed a negative correlation with poor prognosis according to multivariable Cox regression analysis, although the correlation was not statistically significant. 
We explored this issue by examining the association between lymph node metastasis and age and their combined influence on prognosis. Patients with lymph node metastasis were younger than patients without lymph node metastasis (Supplementary Fig. 2a). We subsequently constructed a receiver operating characteristic (ROC) curve for age in predicting OS and obtained a cutoff value of 70 years (Supplementary Fig. 2b). Survival analysis confirmed that the OS of patients > 70 years was significantly poorer than that of patients ≤ 70 years (Supplementary Fig. 2c). Further survival analysis revealed that the prognosis of patients in the ≤ 70/N1 group was significantly better than that of patients in the > 70/N0 group, with an HR of 0.808 (95% CI 0.665–0.982, P = 0.023) (Supplementary Fig. 2d). These results indicate that although older age and lymph node metastasis are both adverse prognostic factors, they are negatively correlated with each other. When the two factors pointed in opposite directions, older age was more strongly associated with a poor prognosis than was lymph node metastasis.
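One common way to derive a cutoff from an ROC curve is to maximize the Youden index (sensitivity + specificity − 1). The text does not state which criterion was used, so the Python sketch below is an assumption for illustration only; the function name and the toy data are hypothetical and are not study data.

```python
def youden_cutoff(values, events):
    """Return (threshold, J) maximising sensitivity + specificity - 1.

    A patient is classified as a predicted event when value > threshold.
    """
    pos = sum(events)
    neg = len(events) - pos
    best_t, best_j = None, -1.0
    for t in sorted(set(values)):
        tp = sum(1 for v, e in zip(values, events) if v > t and e)
        fp = sum(1 for v, e in zip(values, events) if v > t and not e)
        j = tp / pos - fp / neg  # sensitivity - (1 - specificity)
        if j > best_j:
            best_t, best_j = t, j
    return best_t, best_j


# Synthetic ages and death indicators (1 = died), not study data.
ages = [55, 60, 62, 68, 70, 71, 74, 76, 78, 80]
died = [0, 0, 1, 0, 0, 1, 1, 1, 1, 1]
cutoff, j = youden_cutoff(ages, died)  # cutoff is 70 for this toy series
```

This treats vital status as a binary outcome and ignores censoring, which a time-dependent ROC analysis would handle explicitly.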

Table 2 Demographics and clinicopathologic characteristics of upper ESCA.
Table 3 Cox regression analysis on variables for the prediction of overall survival of upper ESCA.

Construction and validation of the upper ESCA model

On the basis of the prognostic factors screened by Cox regression analysis in the training cohort of patients with upper ESCA, an upper ESCA model for predicting prognosis was established (Fig. 4). The C-indexes in the training cohort and validation cohort were 0.64 (95% CI 0.62–0.66) and 0.62 (95% CI 0.58–0.64), respectively (Table 4). The area under the curve (AUC) values for 3- and 5-year OS were 0.705 (95% CI 0.67–0.74) and 0.707 (95% CI 0.669–0.745), respectively, in the training cohort and 0.67 (95% CI 0.612–0.727) and 0.711 (95% CI 0.653–0.77), respectively, in the validation cohort (Fig. 5a–d). Decision curve analysis (DCA) demonstrated good practical value in both the training cohort and the validation cohort (Fig. 5e–h). The calibration curves showed strong agreement between the observed and predicted values in both the training cohort and the validation cohort (Fig. 5i–l). To further verify the ability of the upper ESCA model to predict OS, all patients were divided into a low-risk group (n = 741) and a high-risk group (n = 630) according to the cutoff value (222.3907), which was obtained from the nomogram scores in the training cohort. Kaplan‒Meier (KM) analysis revealed that patients in the high-risk group had significantly poorer OS than did those in the low-risk group in the overall population, training cohort and validation cohort (Fig. 6a–c). These results support the predictive ability of the model.
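The C-index reported above measures how often the model ranks pairs of patients correctly. A minimal Python sketch of Harrell's concordance index is given below, assuming that a higher nomogram score means higher predicted risk; it is an illustration of the definition rather than the authors' R implementation, and it does not apply any special handling to tied survival times.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's C: share of usable pairs in which the patient with the
    shorter observed survival also has the higher predicted risk."""
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Usable pair: patient i died, and did so before j's last follow-up.
            if events[i] and times[i] < times[j]:
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # ties in predicted risk count as half
    return concordant / usable


# Toy data (not study data): survival months, death indicator, risk score.
c = harrell_c_index([5, 10, 15, 20], [1, 1, 0, 1], [0.9, 0.4, 0.5, 0.1])  # 0.8
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which places the reported values of 0.62 to 0.64 in context.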

Fig. 4

Nomogram for predicting the 3- and 5-year overall survival (OS) of patients with upper ESCA.

Table 4 C-index, NRI, and IDI of the upper ESCA model and the TNM model in survival prediction for upper ESCA.
Fig. 5

Receiver operating characteristic (ROC) curves, decision curve analysis (DCA) curves and calibration curves for upper ESCA. ROC curves for 3-year OS of patients with upper ESCA in the training cohort (a) and validation cohort (b). ROC curves for 5-year OS of patients with upper ESCA in the training cohort (c) and validation cohort (d). Decision curves for 3-year OS of patients with upper ESCA in the training cohort (e) and validation cohort (f). Decision curves for 5-year OS of patients with upper ESCA in the training cohort (g) and validation cohort (h). Calibration curves for 3-year OS of patients with upper ESCA in the training cohort (i) and validation cohort (j). Calibration curves for 5-year OS of patients with upper ESCA in the training cohort (k) and validation cohort (l).

Fig. 6

Survival analysis of the upper ESCA model. Comparison of overall survival (OS) between the high-risk group and low-risk group in the overall population (a), training cohort (b) and validation cohort (c) of the upper ESCA model. P < 0.05 was considered to indicate statistical significance. ESCA esophageal cancer.

Comparison of the performance of the upper ESCA model and the TNM model in predicting the prognosis of upper ESCA

To further assess the predictive power of the upper ESCA model, a TNM model based on the TNM staging system was constructed, and the two models were compared in terms of the C-index, integrated discrimination improvement (IDI), net reclassification improvement (NRI), time-dependent AUC and decision curves. As shown in Table 4, the C-indexes in the training cohort and validation cohort for the upper ESCA model were 0.64 (95% CI 0.62–0.66) and 0.62 (95% CI 0.58–0.64), respectively, higher than the 0.61 (95% CI 0.59–0.63) and 0.59 (95% CI 0.55–0.62) for the TNM model. The NRI and IDI were further used to compare the accuracy of the upper ESCA model with that of the TNM model. In the training cohort, the NRIs for 3- and 5-year OS were 0.146 (95% CI 0.032–0.233) and 0.168 (95% CI 0.034–0.253), respectively, and the IDIs for 3- and 5-year OS were 0.04 (95% CI 0.016–0.059, P < 0.01) and 0.052 (95% CI 0.024–0.073, P < 0.01), respectively. These results were also confirmed in the validation cohort (Table 4). Moreover, the time-dependent AUC values from the upper ESCA model were consistently greater than those from the TNM model in both the training cohort and the validation cohort (Fig. 7a,b). Decision curves demonstrated that the upper ESCA model could better predict 3- and 5-year OS, as it provided a greater net clinical benefit than the TNM model in the training cohort and the validation cohort (Fig. 7c–f). Finally, the KM curves were significantly different between the low-risk group and high-risk group in both models, but the HR values of the overall population, training cohort and validation cohort with the upper ESCA model were 2.098 (95% CI 1.861–2.366), 2.151 (95% CI 1.861–2.486) and 1.971 (95% CI 1.591–2.442), respectively, which were higher than the values of 1.860 (95% CI 1.600–2.162), 1.980 (95% CI 1.652–2.379) and 1.589 (95% CI 1.218–2.073), respectively, with the TNM model (Fig. 7g‒i). These results suggest that the upper ESCA model can predict OS with greater accuracy and clinical applicability than the TNM model can.
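To make the two reclassification measures concrete, the Python sketch below computes the IDI and the continuous (category-free) NRI for a binary outcome at a fixed time horizon. It is a simplified illustration that assumes every patient's status at the horizon is known; the survival-based estimators used for censored data, as in this study, are more involved. The function names and toy probabilities are hypothetical.

```python
def _mean(xs):
    return sum(xs) / len(xs)


def idi(p_old, p_new, events):
    """Integrated discrimination improvement: gain in the difference between
    mean predicted risk in events and in non-events (new minus old model)."""
    ev = [i for i, e in enumerate(events) if e]
    ne = [i for i, e in enumerate(events) if not e]
    slope_new = _mean([p_new[i] for i in ev]) - _mean([p_new[i] for i in ne])
    slope_old = _mean([p_old[i] for i in ev]) - _mean([p_old[i] for i in ne])
    return slope_new - slope_old


def continuous_nri(p_old, p_new, events):
    """Category-free NRI: net upward movement in events plus net downward
    movement in non-events."""
    def net_up(idx):
        up = sum(1 for i in idx if p_new[i] > p_old[i])
        down = sum(1 for i in idx if p_new[i] < p_old[i])
        return (up - down) / len(idx)

    ev = [i for i, e in enumerate(events) if e]
    ne = [i for i, e in enumerate(events) if not e]
    return net_up(ev) - net_up(ne)


# Toy predicted risks (not study data) from an old and a new model.
events = [1, 1, 0, 0]
p_old = [0.6, 0.5, 0.5, 0.4]
p_new = [0.8, 0.6, 0.3, 0.4]
idi_value = idi(p_old, p_new, events)             # approximately 0.25
nri_value = continuous_nri(p_old, p_new, events)  # 1.5
```

Positive values of either measure indicate that the new model assigns higher risk to patients who experience the event and lower risk to those who do not, which is the direction reported for the upper ESCA model relative to the TNM model.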

Fig. 7

Time-dependent area under the curve (AUC) and decision curves of the upper ESCA model and TNM model and survival analysis of patients grouped according to these models. Time-dependent AUCs of the training cohort in the upper ESCA model and the TNM model (a); time-dependent AUCs of the validation cohort in the upper ESCA model and the TNM model (b). Decision curves for 3-year OS of the training cohort in the upper ESCA model and the TNM model (c). Decision curves for 3-year OS of the validation cohort in the upper ESCA model and the TNM model (d). Decision curves for 5-year OS of the training cohort in the upper ESCA model and the TNM model (e). Decision curves for 5-year OS of the validation cohort in the upper ESCA model and the TNM model (f). Comparison of OS between the high-risk group and the low-risk group in the overall population of the upper ESCA model and the TNM model (g). Comparison of OS between the high-risk group and the low-risk group in the training cohort of the upper ESCA model and the TNM model (h). Comparison of OS between the high-risk group and the low-risk group in the validation cohort of the upper ESCA model and the TNM model (i). High-risk1: high-risk group in the upper ESCA model; low risk1: low-risk group in the upper ESCA model; high risk2: high-risk group in the TNM model; low risk2: low-risk group in the TNM model; P < 0.05 was considered to indicate statistical significance.

Comparison of the performance of the upper ESCA model and total ESCA model in predicting the prognosis of upper ESCA

To assess the necessity of developing a prognostic model specifically for upper ESCA, we examined whether the upper ESCA model developed here was more advantageous in predicting the OS of patients with upper ESCA than a nomogram developed on the basis of total ESCA cases. We developed the total ESCA model by combining 15,434 cases of ESCA at other segments and 1371 upper ESCA cases. On the basis of the nomogram-derived risk score of every patient in the training cohort, we obtained the cutoff value (173.6845) of the total ESCA model. The 1371 patients with upper ESCA were subsequently divided into a high-risk group (n = 909) and a low-risk group (n = 462) (Fig. 8a,b). Survival analysis of patients with upper ESCA revealed that the OS of the high-risk group was significantly poorer than that of the low-risk group in the overall population, training cohort and validation cohort, confirming the risk stratification ability of the total ESCA model for upper ESCA (Fig. 8c‒e). The risk stratification ability for upper ESCA was then compared between the total ESCA model and the upper ESCA model. As shown in Fig. 8c‒e, the HR values of the overall population, training cohort and validation cohort derived from the total ESCA model were 1.808 (95% CI 1.614‒2.024), 1.885 (95% CI 1.646‒2.159) and 1.64 (95% CI 1.338‒2.021), respectively, which were lower than the 2.098 (95% CI 1.861‒2.366), 2.151 (95% CI 1.861‒2.486) and 1.971 (95% CI 1.591‒2.442) derived from the upper ESCA model. The gap in the KM curves between the low-risk group and the high-risk group with the upper ESCA model was wider than that with the total ESCA model in the validation cohort (Fig. 8e). These results indicate that the total ESCA model can also stratify patients with upper ESCA according to prognosis but with lower accuracy than the specific upper ESCA model.
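The risk stratification step can be sketched as follows: patients are split at a score cutoff, and a Kaplan‒Meier curve is estimated for each group. This Python sketch is illustrative only. Whether a score exactly equal to the cutoff falls in the low-risk or high-risk group is an assumption here, and the toy data are not study data.

```python
def split_by_cutoff(scores, cutoff):
    """Indices of low-risk (score <= cutoff) and high-risk (score > cutoff)
    patients; the handling of scores equal to the cutoff is an assumption."""
    low = [i for i, s in enumerate(scores) if s <= cutoff]
    high = [i for i, s in enumerate(scores) if s > cutoff]
    return low, high


def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct time with at least one death."""
    pairs = sorted(zip(times, events))
    at_risk = len(pairs)
    surv = 1.0
    curve = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = 0
        removed = 0
        # Collect every patient whose follow-up ends at time t.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed
    return curve


# Toy follow-up data (not study data): months and death indicator (1 = died).
curve = kaplan_meier([2, 3, 3, 5, 8], [1, 1, 0, 1, 0])
# Survival steps down to about 0.8, 0.6 and 0.3 at months 2, 3 and 5.
low, high = split_by_cutoff([100.0, 250.0, 222.3907, 300.0], 222.3907)
```

Running `kaplan_meier` separately on the low-risk and high-risk indices gives the two curves whose separation is summarized by the HR values above.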

Fig. 8

Survival analysis of patients grouped according to the upper ESCA model and total ESCA model. (a) Risk score distribution of patients with upper ESCA and those with ESCA at other segments in the training cohort for the total ESCA model. (b) Risk score distribution of patients with upper ESCA and those with ESCA at other segments in the validation cohort for the total ESCA model. (c) Comparison of OS between the high-risk group and low-risk group for overall patients with upper ESCA according to the upper ESCA model and total ESCA model. (d) Comparison of OS between the high-risk group and low-risk group in the training cohort of patients with upper ESCA according to the upper ESCA model and total ESCA model. (e) Comparison of OS between the high-risk group and low-risk group in the validation cohort of patients with upper ESCA according to the upper ESCA model and total ESCA model. high risk1: high-risk group in the upper ESCA model; low risk1: low-risk group in the upper ESCA model; high risk2: high-risk group in the total ESCA model; low risk2: low-risk group in the total ESCA model; P < 0.05 was considered to indicate statistical significance.

Discussion

Tumor heterogeneity is an important characteristic of malignant tumors and usually plays important roles in cancer treatment and prognosis. It may exist among different types of tumors, different patients with the same type of tumor or even different regions of a single tumor, leading to significant variations in the response of tumors to treatment and resulting in the development of resistance, recurrence and poor prognosis13,14,15,16,17. Therefore, in-depth analysis of different subgroups of cancers will help mitigate the impact of heterogeneity, facilitate the development of more effective personalized treatment strategies and increase the accuracy of prognosis prediction. Such analysis is of considerable clinical significance, especially in the current era of precision medicine and personalized treatment.

The esophagus is a long tubular organ that is divided into the cervical esophagus, upper third of the esophagus, middle third of the esophagus, lower third of the esophagus and abdominal esophagus18. Cervical ESCA and upper-third ESCA constitute upper ESCA. Owing to its unique anatomical location and low incidence, our understanding of upper ESCA is currently very limited. In this study, through the analysis of data from 16,805 ESCA patients from the SEER database, including 1371 patients with upper ESCA and 15,434 patients with ESCA at other segments, we found that patients with upper ESCA differed significantly from patients with ESCA at other segments in clinical characteristics, including age, sex, marital status, histologic type, differentiation grade, T stage, N stage and M stage. Unlike ESCA at other segments, which is predominantly adenocarcinoma due to factors such as obesity, gastroesophageal reflux disease (GERD), and Barrett’s esophagus (BE), upper ESCA is influenced primarily by factors such as smoking and alcohol consumption, resulting in a predominance of squamous cell carcinoma19. We also found a greater proportion of female patients among patients with upper ESCA. This may be because the endocrine environment of females can prevent the occurrence of esophageal adenocarcinoma, which is the main pathological type of ESCA at other segments, contributing to the relatively low incidence of ESCA at other segments in female patients20. Additionally, the proportion of patients with abnormal marital status among patients with upper ESCA was greater than that among patients with ESCA at other segments. This may be because these individuals are more likely to engage in harmful habits such as smoking and excessive drinking, which can contribute to the development of esophageal squamous cell carcinoma21. 
The smaller proportion of tumors with grade III differentiation among upper ESCA cases than among ESCA cases at other segments also reflects their different biological behaviors. The TNM staging system, which details the local infiltration and dissemination of cancer, serves as a crucial foundation for developing treatment strategies and predicting patient prognosis. In this system, T4 indicates that the tumor has invaded the surrounding tissues of the esophagus10. Owing to the absence of serosa in the esophageal wall, upper ESCA can easily invade adjacent structures in the neck and chest, such as the thyroid gland, trachea, larynx, lungs, and aorta. In addition, single-cell sequencing of mouse esophageal tissues revealed that the distributions and gene expression profiles of immune cells and fibroblasts in the proximal and distal esophagus are significantly different. For example, proximal fibroblasts highly express genes related to extracellular matrix remodeling, such as collagen genes, which are associated with cancer invasion22. These factors may explain why a greater proportion of patients with upper ESCA than of patients with ESCA at other segments are classified into the T4 stage. However, a smaller proportion of upper ESCA cases are classified as N1 stage and M1 stage, which may be attributable to the fact that upper ESCA is predominantly composed of relatively indolent squamous cell carcinomas23. The significant differences in the above clinical characteristics confirm the substantial disparities between upper ESCA and ESCA at other segments, indicating the necessity of studying them as distinct subgroups in more in-depth research. The subsequent survival analysis further demonstrated poorer OS in upper ESCA than in ESCA at other segments. Owing to anatomical difficulties and technical challenges, surgery is rarely performed, and treatments are mainly based on radiotherapy and chemotherapy for patients with upper ESCA24. 
Furthermore, upper ESCA is predominantly squamous cell carcinoma, which has a poorer 5-year survival rate after surgery than adenocarcinoma does23. In addition, it has been reported that high expression of C20orf54 is positively correlated with a favorable prognosis, and its expression is higher in the distal esophagus than in the proximal esophagus25. These factors may be the main reasons for the poorer prognosis of upper ESCA. However, this disparity was not observed between cervical ESCA and upper-third ESCA, which are the only two types of upper ESCA, implying that there was no significant prognostic difference within upper ESCA. Taken together, these results support the necessity and feasibility of developing a prognostic model specifically for upper ESCA.

For the patients with upper ESCA in this study, Cox regression analysis revealed that male sex, older age, high differentiation grade, advanced T stage and M1 stage were positively correlated with poor prognosis, which is in line with previous studies on total ESCA26,27,28. In addition, histologic type is a very important prognostic factor in ESCA. A study of total ESCA by Siewert reported that, among 1059 patients who underwent resection, the 5-year survival rate was 47% in the adenocarcinoma group compared with 37% in the squamous cell carcinoma group29. For patients with upper ESCA in this study, we also found that adenocarcinoma was associated with a better prognosis than squamous cell carcinoma. Lymph node metastasis has been reported as a promising index for predicting the prognosis of different cancers, including ESCA, and the number of metastatic lymph nodes is also correlated with prognosis30. However, in this study, we did not find a positive correlation between N1 stage and poor prognosis; multivariable Cox regression analysis even detected a negative correlation, although it was not statistically significant. On the one hand, this might be attributed to the nature of the 6th edition of the AJCC TNM system in the SEER database, according to which lymph node involvement is classified as either present or absent. On the other hand, compared with those of young patients, the tumors of older patients tend to present more inert phenotypes and usually have a decreased rate of lymph node metastasis31. We previously reported that age was negatively correlated with the number of metastatic lymph nodes in esophageal squamous cell carcinoma patients32. In this study, patients with metastatic lymph nodes were significantly younger than patients without metastatic lymph nodes. To explore the combined effects of age and lymph node metastasis on OS, we then used the ROC curve for age in predicting OS to obtain a cutoff value of 70 years. 
Notably, multiple studies have also adopted 70 years as the threshold to categorize patients into the elderly group and the non-elderly group33,34. Therefore, we finally divided all cases into patients ≤ 70 years and patients > 70 years. Survival analysis confirmed the significant difference in OS between the two groups. Moreover, younger ESCA patients had better outcomes than older ESCA patients did. When age and lymph node metastasis status were considered together, patients ≤ 70 years with lymph node metastasis had a better prognosis than patients > 70 years without lymph node metastasis. These findings suggest that the positive correlation between lymph node metastasis and poor prognosis is offset by the age factor of ≤ 70 years. This can partially explain why lymph node metastasis was not positively associated with poor prognosis in this study. In addition to age, many other factors may influence the correlation between the number of lymph nodes and poor prognosis. Finally, N1 was also included in our upper ESCA model for predicting the prognosis of upper ESCA. The validation of the upper ESCA model revealed that it had good discriminative and calibration capabilities. In clinical practice, the TNM staging system has been widely used for predicting the prognosis of ESCA patients. In this study, we combined the clinical factors associated with prognosis and TNM stage. The positive NRIs and IDIs for the upper ESCA model when compared to the TNM model indicated the better predictive capability of the former model. The time-dependent AUC, DCA and survival analysis indicated stronger discrimination ability and better clinical benefit and utility of the upper ESCA model than the TNM model.

Recently, several groups have developed prognostic models for ESCA. In 2023, Wen Li et al. established a nomogram for predicting the prognosis of older ESCA patients with distant metastases on the basis of 7027 patients from the SEER database35. In 2021, Guoxin Zhang et al. developed a prognostic nomogram on the basis of 20,027 ESCA cases from the SEER database, comprising adenocarcinoma, squamous cell carcinoma and adenosquamous carcinoma cases36. In 2023, Chang Chen et al. developed a nomogram for predicting survival after lymphatic metastasis in ESCA on the basis of 11,695 cases from the database37. However, given the significant differences in clinical characteristics and outcomes between patients with upper ESCA and those with ESCA at other segments, as well as the very low proportion of patients with upper ESCA (< 10%) among all ESCA patients, a prognostic model developed from the entire ESCA cohort is very likely to have lower predictive efficacy for upper ESCA because it is dominated by ESCA at other segments. To test this hypothesis, we developed not only an upper ESCA model on the basis of 1371 upper ESCA cases but also a total ESCA model on the basis of the same 1371 upper ESCA cases plus 15,434 cases of ESCA at other segments. In both models, patients with upper ESCA were divided into low-risk and high-risk groups according to the nomogram score, and the OS comparison revealed a greater difference between the risk groups in the upper ESCA model than in the total ESCA model. Although the number of cases used to develop the upper ESCA model was less than one-tenth that used to develop the total ESCA model, the upper ESCA model had a stronger predictive ability for patients with upper ESCA. As more cases accumulate, the predictive ability of the upper ESCA model is expected to improve further. 
This demonstrates the advantages and potential of developing dedicated models for different subgroups of ESCA patients.
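The comparison of risk-group separation between two models can be sketched as follows. This is a minimal illustration and not the procedure used in the study: it assumes that the groups are split at the median nomogram score and that OS is compared with a two-group log-rank test, neither of which is specified in the text. The function names, the two hypothetical models A and B, and all values are synthetic.

```python
from statistics import median
from typing import List, Sequence


def split_by_score(scores: Sequence[float], cutoff: float) -> List[int]:
    """Label each patient 1 (high risk) if score > cutoff, else 0 (low risk)."""
    return [1 if s > cutoff else 0 for s in scores]


def logrank_chi2(times: Sequence[float], events: Sequence[int],
                 groups: Sequence[int]) -> float:
    """Two-group log-rank chi-square statistic (1 degree of freedom)."""
    observed = expected = variance = 0.0
    # Loop over distinct times at which at least one death occurred.
    for t in sorted({ti for ti, e in zip(times, events) if e == 1}):
        at_risk = [g for ti, g in zip(times, groups) if ti >= t]
        n = len(at_risk)
        n1 = sum(at_risk)  # high-risk patients still at risk at time t
        d = sum(1 for ti, e in zip(times, events) if e == 1 and ti == t)
        d1 = sum(1 for ti, e, g in zip(times, events, groups)
                 if e == 1 and ti == t and g == 1)
        observed += d1
        expected += d * n1 / n
        if n > 1:  # hypergeometric variance of d1 at this time point
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank statistic undefined: no variance")
    return (observed - expected) ** 2 / variance


# Synthetic follow-up (months) and death flags for 8 patients; NOT study data.
times = [5, 8, 12, 20, 30, 36, 40, 48]
events = [1, 1, 1, 1, 0, 1, 0, 0]
# Hypothetical nomogram scores from two models for the same patients.
scores_a = [180, 170, 160, 150, 90, 80, 70, 60]  # tracks survival closely
scores_b = [180, 60, 160, 70, 150, 80, 170, 90]  # tracks survival poorly

groups_a = split_by_score(scores_a, median(scores_a))
groups_b = split_by_score(scores_b, median(scores_b))
chi2_a = logrank_chi2(times, events, groups_a)
chi2_b = logrank_chi2(times, events, groups_b)
print(f"model A chi2 = {chi2_a:.2f}, model B chi2 = {chi2_b:.2f}")
```

A larger log-rank statistic for the same patients indicates that a model's risk groups differ more in OS, which is the sense in which one model separates low-risk and high-risk patients better than another.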

There are several limitations to this study. First, the data from the SEER database did not include detailed information about treatments, such as the dose fractionation regimen of radiotherapy, specific drugs and treatment courses of chemotherapy regimens, targeted therapy and immunotherapy. This makes it impossible for us to accurately evaluate the impact of different treatment details on patient outcomes. Second, lymph node involvement in the SEER database was only reported as present or absent according to the 6th edition of the AJCC TNM system. The lack of specific staging data regarding the number of metastatic lymph nodes may impact the assessment of the effect of lymph node metastasis on prognosis, potentially diminishing the predictive power of the model. Third, the SEER database does not cover all regions in the United States. The regions not included may have unique cancer characteristics, which could lead to certain biases in the research results based on the SEER database. Finally, as this is a retrospective study, potential selection bias is inevitable, and further external multicenter prospective validation is needed.

In conclusion, upper ESCA is a rare subtype of ESCA. Compared with ESCA at other segments, it has significantly different clinical and prognostic characteristics. We developed a prognostic model specifically for upper ESCA, which demonstrated superior performance compared with the TNM model and the total ESCA model. This model can assist clinicians in more accurately assessing the prognosis of upper ESCA, highlighting the necessity and advantages of creating dedicated prognostic models for different subgroups of ESCA patients. In the future, research in several directions can further improve the predictive performance of the model: (1) Incorporating more detailed clinical information into the model will improve its predictive power. For example, unhealthy lifestyle factors related to ESCA, such as smoking and the volume and frequency of alcohol consumption, can be included. (2) With in-depth research on tumors, more specific molecules associated with the prognosis of upper ESCA will be discovered. For example, the expression of the C20orf54 gene, which differs significantly between the proximal and distal esophagus, is related to the prognosis of ESCA25. Integrating potential specific molecular markers into the model will further enhance its predictive performance. (3) In recent years, the rapid development of artificial intelligence and machine learning technologies has brought profound changes to many fields38,39. For example, Shayan Sharifi addressed critical challenges in maximizing transplant success rates while maintaining high biological compatibility by constructing a series of models, which is of great significance for addressing the global shortage of kidneys for transplantation. Model 1 focuses on compatibility based on blood type and panel reactive antibody (PRA). Model 2 incorporates a minimum threshold for HLA compatibility, improving the quality of transplantation. 
Model 3 integrates incompatible donor-recipient pairs across multiple agents, achieving a higher number of successful transplants39. This provides a good reference for further enhancing the predictive performance of our model: developing multi-modal models that combine clinical data, imaging features, and molecular information will further optimize our current model. Strengthening cooperation among different regions and hospitals and including data from diverse patients with upper ESCA will improve the clinical applicability of our current model.