Introduction

Colorectal cancer (CRC) is the third most common malignant tumor globally and the second leading cause of cancer-related deaths1. Although the incidence of colorectal cancer in the older population (≥ 50 years old) has declined due to widespread screening and advances in treatment, the incidence of early-onset colorectal cancer (EOCRC, diagnosed before 50 years of age) has increased rapidly worldwide2, showing a clear shift toward younger age at onset. Multiple studies have shown that patients with EOCRC are more likely to be diagnosed at advanced stages (III/IV) and have a higher incidence of distant metastasis3,4. EOCRC patients tend to have more aggressive tumors (e.g., higher proportions of mucinous or signet-ring cell carcinoma), which may increase the risk of liver metastasis5. Moreover, EOCRC exhibits higher genetic heterogeneity (with a higher proportion of MSI-H and enrichment of specific gene mutations), stronger immunogenicity (elevated TMB and PD-L1 expression), and unique activation of signaling pathways related to the immune microenvironment6. In contrast, late-onset colorectal cancer (LOCRC) relies more on traditional driver gene mutations and chronic inflammation-related pathways. LOCRC predominantly exhibits microsatellite stability (MSS), with driver mutations such as KRAS and BRAF being more common7. These differences suggest that early-onset liver metastasis is characterized by greater genetic heterogeneity and immune evasion, while late-onset disease is driven more by traditional carcinogenic pathways.

The treatment of early-onset colorectal cancer liver metastasis (EOCRC-LM) is characterized by multidisciplinary collaboration, with the combined use of surgery and chemotherapy being the current core strategy. In terms of surgery, radical liver resection remains the primary means of achieving long-term survival. However, approximately 80% of patients are unable to undergo direct surgery at diagnosis due to tumor burden or insufficient liver function reserve8,9. For initially unresectable cases, conversion therapy using chemotherapy regimens (e.g., FOLFOX/FOLFIRI combined with targeted drugs) can provide 15–40% of patients with the opportunity for a second-stage surgery10. EOCRC exhibits more aggressive tumor biology, often presenting with high lymph node positivity and synchronous liver metastasis11. However, retrospective studies show that after neoadjuvant chemotherapy and downstaging, the 5-year survival rate of EOCRC-LM patients undergoing R0 resection can reach 35–50%, with no significant difference compared to late-onset patients12. Furthermore, the application of chemotherapy must balance efficacy with toxicity. Long-term chemotherapy may lead to chemotherapy-associated liver injury (CALI), such as sinusoidal obstruction and steatosis, which increases the risk of posthepatectomy liver failure (PHLF)13. For younger patients, clinicians often tend to use more intensive chemotherapy regimens, but existing evidence suggests that this age-related bias does not lead to clear survival benefits14. Neoadjuvant chemotherapy (NAC) can reduce early recurrence rates in specific high-risk patients, while adjuvant chemotherapy can prolong disease-free survival, especially in patients with synchronous liver metastasis15. 
Advances in minimally invasive surgery (laparoscopic/robotic) have reduced perioperative risks during simultaneous resection of primary and liver metastatic lesions16, while the combination of molecular targeted drugs and chemotherapy has further optimized systemic treatment outcomes17. However, whether resecting the primary tumor confers a survival benefit in patients with unresectable liver metastasis remains controversial18. A comprehensive evaluation, including tumor burden and genetic status (e.g., RAS mutations), is therefore necessary.

This study uses the SEER database to analyze the impact of primary tumor resection combined with chemotherapy on survival in early-onset colorectal cancer patients with liver metastases. We develop a survival prediction model to assess which patients may benefit from this treatment strategy, with the aim of providing clinicians with individualized, data-driven guidance for treatment decisions.

Materials and methods

Patient selection

The data for this study were obtained from the Surveillance, Epidemiology, and End Results (SEER) database (www.SEER.cancer.gov), which includes 17 registries, as of November 2023 (covering data from 2000 to 2021). Patient data were downloaded using SEER*Stat version 8.4.4. Since SEER does not include personal identifying information, this study did not require approval from an ethics committee or informed consent. Records for a total of 23,798 patients under the age of 50 were downloaded, and patients whose information was invalid or incomplete were excluded. Ultimately, 1049 patients were included and randomly divided into a training group and a validation group in a 7:3 ratio. The data screening process is shown in Fig. 1.
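The 7:3 random split described above can be illustrated with a short sketch. This is not the authors' code (the analysis was done in R). The function name, the seed, and the use of integer patient IDs are assumptions made for illustration only.

```python
import random

def split_train_validation(patient_ids, train_fraction=0.7, seed=2024):
    """Randomly partition patient IDs into training and validation groups.

    The seed is an arbitrary, hypothetical value; it only makes the
    illustration reproducible.
    """
    shuffled = list(patient_ids)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

# 1049 included patients, represented here by placeholder integer IDs.
train_ids, validation_ids = split_train_validation(range(1049))
print(len(train_ids), len(validation_ids))
```

With 1049 patients and a 0.7 training fraction, rounding gives 734 training and 315 validation cases, which matches the group sizes reported in the Results.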

Fig. 1

Flowchart of the selection of EOCRC-LM patients included in this study.

Statistical analysis

All statistical analyses were performed using R software (version 3.6.1), with a two-tailed P value < 0.05 considered statistically significant. All patients were randomly divided into a training group and a validation group using R software. The distribution of variables between the two groups was compared using the Chi-square test or Fisher’s exact test. In the training group, least absolute shrinkage and selection operator (LASSO) regression analysis was used to identify relevant prognostic risk factors from the clinical variables. Multivariable Cox regression was then applied to the variables with non-zero coefficients in order to determine independent prognostic factors for EOCRC-LM patients who underwent primary tumor resection combined with chemotherapy. A prognostic nomogram was then constructed from the independent risk factors using the “rms” package. The nomogram was evaluated in the validation group by generating receiver operating characteristic (ROC) curves and calculating the corresponding area under the curve (AUC) to assess its discriminatory ability. In addition, calibration curves and decision curve analysis (DCA) were used to evaluate the performance of the nomogram. Based on the nomogram score, patients were classified into low- and high-risk groups, and the risk model was visualized using “ggrisk.”
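For readers less familiar with the variable selection step, the LASSO-penalized Cox model is conventionally written as the optimization problem below (up to a scaling of the likelihood term). The notation is standard textbook notation and an assumption of this sketch, not taken from the authors' analysis: x_i is the covariate vector of patient i, δ_i the event indicator, R(t_i) the set of patients still at risk at time t_i, and λ the penalty strength.

```latex
\hat{\beta} = \arg\max_{\beta}
\left\{
  \sum_{i:\,\delta_i = 1}
  \left[
    x_i^{\top}\beta
    - \log \sum_{j \in R(t_i)} \exp\!\left(x_j^{\top}\beta\right)
  \right]
  - \lambda \sum_{k=1}^{p} \lvert \beta_k \rvert
\right\}
```

Larger values of λ shrink more coefficients exactly to zero, which is why only the variables with non-zero coefficients are carried forward to the multivariable Cox model.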

Results

Characteristics of included cases

Based on the inclusion criteria, a total of 1049 cases were included in this study, with 734 cases randomly assigned to the training group and 315 cases to the validation group. Among all patients, 45.1% were male and 54.9% were female. Of the patients, 33.6% were unmarried, 54.5% were married, and 11.9% were widowed or divorced. Rectal cancer accounted for 18.5% and colon cancer for 81.5%, with the highest proportion being left-sided colon cancer, which accounted for 51.7%. Bone metastasis was present in 2.9% and lung metastasis in 15.6%. Overall, 4.2% of patients were aged ≤ 29 years, while 95.8% were aged between 30 and 49 years. Tumor stages T1-T4 accounted for 3.3%, 3.6%, 59.3%, and 33.7%, respectively. A total of 143 patients (13.6%) had a history of radiotherapy, and 622 patients (59.3%) were CEA-positive. No significant differences were found between the training group and the validation group for any included variable (Table 1).

Table 1 Baseline demographic and clinical characteristics of EOCRC-LM patients.

Nomogram establishment

The LASSO regression model initially screened 14 candidate variables as potential prognostic risk factors for EOCRC-LM patients who underwent primary tumor resection combined with chemotherapy (Fig. 2). To enhance the interpretability of the model, only variables with non-zero coefficients at the penalty chosen by the one-standard-error criterion were retained. Ultimately, nine risk factors (marital status, primary tumor location, T stage, N stage, CEA, number of lymph nodes dissected during surgery, history of radiotherapy, bone metastasis, and lung metastasis) were entered into multivariate Cox regression analysis for overall survival (OS) (Table 2) to determine independent prognostic factors. In the multivariate analysis, eight of these (marital status, primary tumor location, T stage, N stage, CEA, number of lymph nodes dissected during surgery, bone metastasis, and lung metastasis) were significantly associated with OS. Therefore, based on these independent prognostic factors, we constructed an OS nomogram for 2-year, 3-year, and 5-year survival (Fig. 3).
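To make the step from Cox coefficients to a nomogram more concrete, the sketch below shows the usual point-scaling idea: the variable whose coefficients span the widest range is assigned 0 to 100 points, and every other variable is scaled relative to it. The function name and the coefficients are hypothetical (they are not the values in Table 2), and the sketch only approximates what nomogram software such as the "rms" package does.

```python
def nomogram_points(log_hazard_ratios):
    """Convert per-level Cox coefficients into 0-100 nomogram points.

    log_hazard_ratios maps variable -> {level: coefficient}, where the
    reference level of each variable has coefficient 0.0.
    """
    # Range of the linear predictor spanned by each variable.
    ranges = {
        var: max(levels.values()) - min(levels.values())
        for var, levels in log_hazard_ratios.items()
    }
    widest = max(ranges.values())  # this variable spans the full 100 points
    points = {}
    for var, levels in log_hazard_ratios.items():
        lowest = min(levels.values())
        points[var] = {
            level: round(100 * (beta - lowest) / widest, 1)
            for level, beta in levels.items()
        }
    return points

# Hypothetical coefficients for illustration only (NOT the fitted model).
example_coefficients = {
    "CEA": {"negative": 0.0, "positive": 0.4},
    "bone_metastasis": {"no": 0.0, "yes": 0.8},
}
pts = nomogram_points(example_coefficients)

# Total score for a hypothetical CEA-positive patient with bone metastasis.
total = pts["CEA"]["positive"] + pts["bone_metastasis"]["yes"]
```

A patient's total score is the sum of the points for each of their characteristics, and the nomogram then maps that total to predicted 2-, 3-, and 5-year survival.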

Fig. 2

LASSO coefficient profiles of the 14 candidate predictors for OS (a), and plots of partial likelihood deviance for OS (b).

Table 2 Multivariate Cox regression model analysis of OS for nomogram.
Fig. 3

Nomograms for predicting the OS of EOCRC-LM patients who underwent primary tumor resection and adjuvant chemotherapy.

Nomogram validation

The calibration curves of the OS prediction model showed good agreement between predicted and observed risks, closely following the diagonal (Fig. 4). Time-dependent ROC curves were used to assess the model’s discrimination. In the training set, the AUC for the OS model at 2, 3, and 5 years was 0.70, 0.72, and 0.77, respectively; in the validation set, the AUCs were 0.68, 0.71, and 0.80, respectively (Fig. 5). Additionally, decision curve analysis indicated that, compared to traditional TNM staging, the nomogram offers better clinical applicability (Fig. 6).
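As a rough illustration of what an AUC at a fixed time point measures, the sketch below computes a simplified time-horizon AUC: the fraction of case/control pairs in which the patient who died before the horizon has the higher risk score. It is a simplification and not the estimator used in the study. Patients censored before the horizon are simply dropped here, whereas dedicated time-dependent ROC methods reweight them. The function name and the toy data are hypothetical.

```python
def auc_at_horizon(times, events, risk_scores, horizon):
    """Simplified AUC for predicting death by `horizon`.

    Cases: patients who died at or before the horizon.
    Controls: patients still under follow-up after the horizon.
    Patients censored before the horizon are ignored (a simplification).
    """
    cases = [r for t, e, r in zip(times, events, risk_scores)
             if e == 1 and t <= horizon]
    controls = [r for t, r in zip(times, risk_scores) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    # Fraction of case/control pairs ranked correctly; ties count one half.
    concordant = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return concordant / (len(cases) * len(controls))

# Toy data: follow-up in months, event indicator (1 = death), risk score.
times = [6, 14, 20, 30, 40, 60]
events = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.7, 0.6, 0.1]
auc_24 = auc_at_horizon(times, events, scores, horizon=24)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for the reported values of 0.68 to 0.80.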

Fig. 4

Calibration curves for Nomograms. (a–c) Calibration curves for 2-, 3-, and 5-year OS in the Nomogram prediction training group, (d–f) Calibration curves for 2-, 3-, and 5-year OS in the Nomogram prediction validation group.

Fig. 5

The receiver operating characteristic (ROC) curves of Nomograms. (a–c) ROC curves of the training group for 2, 3, and 5 years. (d–f) ROC curves of the validation group for 2, 3, and 5 years.

Fig. 6

Decision curve analysis (DCA) for the Nomogram. (a–c) DCA curves for 2, 3, and 5 years for the training group. (d–f) DCA curves for 2, 3, and 5 years for the validation group.
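The quantity plotted on a decision curve is the net benefit at each threshold probability. The sketch below computes it for a binary outcome at a fixed time point, using the standard definition (true positives per patient minus false positives per patient weighted by the threshold odds). It ignores censoring and compares the model with a treat-all strategy rather than with TNM staging, so it illustrates the concept and does not reproduce Fig. 6. The function names and data are hypothetical.

```python
def net_benefit(outcomes, predicted_risks, threshold):
    """Net benefit of intervening when predicted risk >= threshold."""
    n = len(outcomes)
    true_pos = sum(1 for y, p in zip(outcomes, predicted_risks)
                   if p >= threshold and y == 1)
    false_pos = sum(1 for y, p in zip(outcomes, predicted_risks)
                    if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return true_pos / n - (false_pos / n) * threshold / (1 - threshold)

def net_benefit_treat_all(outcomes, threshold):
    """Net benefit of intervening in every patient regardless of risk."""
    prevalence = sum(outcomes) / len(outcomes)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)

# Toy data: 1 = event occurred by the time point, with predicted risks.
outcomes = [1, 1, 0, 1, 0, 0, 0, 1]
risks = [0.9, 0.7, 0.6, 0.55, 0.3, 0.2, 0.1, 0.4]
nb_model = net_benefit(outcomes, risks, threshold=0.5)
nb_all = net_benefit_treat_all(outcomes, threshold=0.5)
```

A model is considered clinically useful over the range of thresholds where its net benefit exceeds that of the alternative strategies.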

Risk stratification and visualization of predictive models

This study constructed a predictive model for EOCRC-LM patients who underwent primary tumor resection combined with chemotherapy based on LASSO regression and Cox proportional hazards models, and visualized the model’s risk stratification and variable contributions using the “ggrisk” package (Fig. 7). Survival in the high-risk group (red) was significantly lower than in the low-risk group (log-rank p < 0.001), indicating that the model separates patients with clearly different prognoses and has clinical applicability.
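The stratification and log-rank comparison can be sketched as follows. The text does not state the cutoff used to define the high- and low-risk groups, so the median risk score is assumed here. The function names and the toy data are hypothetical, and the authors' actual analysis was performed in R.

```python
import math
import statistics

def stratify_by_median(risk_scores):
    """Label patients 1 (high risk) if above the median score, else 0.

    The median cutoff is an assumption; the text does not state the cutoff.
    """
    cutoff = statistics.median(risk_scores)
    return [1 if s > cutoff else 0 for s in risk_scores]

def log_rank_test(times, events, groups):
    """Two-group log-rank test; returns (chi-square statistic, p value)."""
    observed_minus_expected = 0.0
    variance = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        at_risk = [(g, ti, ei) for ti, ei, g in zip(times, events, groups)
                   if ti >= t]
        n = len(at_risk)
        n1 = sum(1 for g, _, _ in at_risk if g == 1)
        d = sum(1 for _, ti, ei in at_risk if ti == t and ei == 1)
        d1 = sum(1 for g, ti, ei in at_risk
                 if g == 1 and ti == t and ei == 1)
        # Observed minus expected deaths in group 1 at this event time.
        observed_minus_expected += d1 - d * n1 / n
        if n > 1:
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("groups cannot be compared (zero variance)")
    chi2 = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Toy data: risk scores, follow-up in months, event indicator (1 = death).
scores = [0.2, 0.3, 0.4, 0.5, 1.1, 1.3, 1.6, 2.0]
times = [50, 60, 45, 58, 10, 14, 20, 25]
events = [0, 0, 1, 0, 1, 1, 1, 1]
groups = stratify_by_median(scores)
chi2, p_value = log_rank_test(times, events, groups)
```

In this toy example the high-risk group dies much earlier, so the test yields a small p value; in the study the corresponding comparison gave log-rank p < 0.001.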

Fig. 7

Risk score association plot for the training group (a); risk score association plot for the validation group (b).

Discussion

In this study, 8 independent prognostic factors were selected from 14 candidate factors to construct a prognostic nomogram for EOCRC-LM patients who underwent primary tumor resection combined with chemotherapy. This suggests that the prognosis of these patients depends on multiple variables rather than on any single factor. The 2-, 3-, and 5-year AUC values and the calibration curves indicate that the model has good discriminatory ability and calibration, and that it can support individualized treatment of EOCRC-LM patients by combining multiple risk factors to identify those likely to benefit from primary tumor resection combined with chemotherapy. Age was not included as an independent prognostic risk factor during model construction. First, multiple studies have shown that after adjusting for molecular markers (such as RAS/BRAF mutation status), the survival difference between early-onset (< 50 years) and late-onset patients is no longer significant19,20. This suggests that the prognostic value of age may be masked by the tumor’s biological characteristics. Second, immune microenvironment analysis shows no significant difference in immune cell infiltration in liver metastases between younger and older patients21. Furthermore, molecular-level heterogeneity (such as Claudin-2 expression) may drive metastasis more than the effect of age itself22. Additionally, treatment-related factors such as chemotherapy responsiveness and timing of primary tumor resection have a more direct impact on prognosis23. These data collectively suggest that age may affect prognosis indirectly, through other biological or treatment-related variables, rather than acting as an independent risk factor. Multiple studies have confirmed that being married is an independent protective prognostic factor for colorectal cancer patients, significantly associated with earlier diagnosis and longer survival24. 
Marital support may improve prognosis by enhancing treatment adherence and providing psychological support, and our analysis also supports this finding. In T1-stage early-onset colorectal cancer liver metastasis patients, a paradoxical phenomenon was observed: despite the primary tumor being early-stage, their prognosis was worse than that of T3/T4 stage metastatic patients. This may be related to T1-stage tumors being more prone to occult metastasis or having distinct biological behaviors25. These patients often present with characteristics such as rectal origin, high differentiation, few lymph node metastases, and elevated CEA levels26, indicating the need for a targeted prognostic model to optimize treatment strategies.

The study found significant differences in the impact of the tumor’s primary site on prognosis. First, right-sided colon cancer (R-CC) is more commonly associated with RAS/BRAF mutations and mismatch repair deficiency (dMMR), leading to increased invasiveness and poor prognosis27,28. Left-sided colon cancer (L-CC) and rectal cancer (ReC) are predominantly associated with TP53 mutations, but rectal cancer has a higher risk of recurrence29,30. Additionally, left-sided tumors have a higher rate of liver metastasis, whereas right-sided tumors are more likely to have multifocal metastases and mucinous adenocarcinoma, leading to worse prognosis. Rectal cancer, however, has a higher risk of local recurrence, and its recurrence-free survival rate is lower than that of left-sided colon cancer31. Thus, the primary tumor site affects prognosis through molecular heterogeneity and metastatic patterns. The number of lymph nodes dissected is one of the key factors influencing prognosis. For patients undergoing primary tumor resection combined with chemotherapy, adequate lymph node dissection not only helps with accurate staging but also significantly improves survival outcomes32. Lymph node metastasis status (e.g., lymph node ratio, LNR) is closely related to liver metastasis burden. A high LNR (e.g., ≥ 0.25) indicates a poorer overall survival rate33. Inadequate lymph node dissection may result in undetected occult metastasis, thereby increasing the risk of postoperative recurrence. It is noteworthy that neoadjuvant chemotherapy may reduce the number of detectable lymph nodes, but standardized dissection can still maintain the accuracy of prognostic assessment. Additionally, CEA positivity significantly impacts patient prognosis. In I-II stage patients who did not receive adjuvant chemotherapy, increasing CEA levels can independently predict the risk of recurrence34. 
For patients undergoing liver resection combined with chemotherapy, abnormal CEA levels not only suggest a higher tendency for liver metastasis but are also associated with a decrease in 3-year disease-free survival rate35. It is worth noting that dynamic changes in CEA levels (e.g., reduction after chemotherapy) can reflect the tumor’s sensitivity to systemic chemotherapy (e.g., FOLFIRI combined with targeted therapy) and predict survival benefits36. Therefore, the prognostic model includes CEA as a stratification variable, combined with the primary tumor site, allowing for more accurate long-term prognosis assessment of EOCRC-LM patients. The presence of bone and lung metastasis greatly influences the survival prediction for EOCRC-LM patients, as metastatic disease is a key prognostic factor37. Patients with bone or lung metastasis have significantly lower survival rates, further highlighting the aggressiveness of metastatic EOCRC, which requires accurate prediction to optimize systemic treatment and assess the potential for palliative therapy38.
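The lymph node ratio (LNR) mentioned in this discussion is the number of positive lymph nodes divided by the number of lymph nodes examined. A minimal sketch, using the example cutoff of 0.25 quoted above, is shown here. The function name and the patient values are hypothetical, and LNR was not one of the variables in the nomogram.

```python
HIGH_LNR_CUTOFF = 0.25  # example threshold quoted in the discussion

def lymph_node_ratio(positive_nodes, examined_nodes):
    """Return positive nodes divided by examined nodes."""
    if examined_nodes <= 0:
        raise ValueError("at least one lymph node must be examined")
    if not 0 <= positive_nodes <= examined_nodes:
        raise ValueError("positive nodes must be between 0 and examined nodes")
    return positive_nodes / examined_nodes

# Hypothetical patient: 5 positive nodes among 16 examined.
lnr = lymph_node_ratio(5, 16)
is_high_lnr = lnr >= HIGH_LNR_CUTOFF
```

Because the denominator is the number of nodes examined, an inadequate dissection makes the ratio less reliable, which is consistent with the emphasis on adequate lymph node dissection above.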

Among current treatment strategies for EOCRC-LM, primary tumor resection (PTR) combined with chemotherapy has become an important area of research. Existing evidence suggests that PTR can improve survival in patients with synchronous metastasis, particularly those whose liver metastases remain unresectable after chemotherapy. PTR may prolong overall survival (OS) by reducing tumor burden and metastatic potential39. However, the extent of its benefit is significantly influenced by molecular characteristics (such as RAS/BRAF mutation status)40. Regarding chemotherapy, neoadjuvant treatment (neoCTx) can increase the rate of liver resection, but its risk of liver injury must be balanced against survival benefits41. The lack of early efficacy evaluation tools (such as continuous variable models based on tumor diameter and lesion count) may delay treatment plan adjustments for non-responders42. It is important to note that the synergistic mechanisms between PTR and chemotherapy still need further exploration, including biological processes such as suppression of dormant tumor cell reactivation (e.g., the FBX8 pathway) and liver microenvironment remodeling (e.g., Claudin-2-mediated metastatic colonization). Future work should focus on multicenter validation of dynamic predictive models and explore the potential for combining immunotherapy with existing treatment regimens22,43.

This study integrated multiple variables, enabling the model to provide survival predictions at 2, 3, and 5 years that are suitable for follow-up management at different time points. Additionally, the intuitive scoring system is easy to implement in clinical practice. Validation with the calibration curves showed that the model exhibits good predictive performance and stability. High-risk patients (e.g., CEA-positive) can be identified through the model so that adjuvant treatment plans can be optimized, while low-risk patients can avoid overtreatment.

Limitations

Although this study model relies on the extensive case resources of the SEER database, its clinical information dimensions still have significant limitations, providing clear directions for the improvement of future research. Specifically, the database lacks key parameters related to the diagnostic and treatment process, such as the number and extent of liver metastases, the choice of chemotherapy regimen for liver metastasis, the number of treatment cycles, adjustments to drug doses, and the use of targeted therapies. This missing core treatment data may limit the reliability of the model’s evaluation results. It is noteworthy that the database has not yet integrated molecular pathological features with important prognostic value in colorectal cancer diagnosis and treatment (such as KRAS/BRAF mutation status and microsatellite instability). These biological markers have been shown to be closely related to tumor invasiveness and treatment sensitivity. Incorporating these into the multivariate analysis model should be a key breakthrough in future research.

Conclusion

In this study, we identified independent risk factors for EOCRC-LM patients who underwent primary tumor resection combined with chemotherapy based on the SEER database, and constructed and validated a survival prediction model for these patients. The model demonstrates good accuracy and potential clinical applicability.