Fig. 4: Constructing machine learning prediction models based on hub genes.

a Performance of 101 machine learning models: The table reports the concordance index (C-index) of 101 machine learning models, spanning regression and classification algorithms, across multiple datasets. The StepCox[both] + Lasso model (red box) performs best, with the highest C-index in both the training and validation cohorts. Other models shown for comparison include random survival forests (RSF) and support vector machines (SVM). b Kaplan–Meier survival analysis for StepCox[both] + Lasso (Dataset1): Kaplan–Meier curves show survival probabilities for patients stratified by the risk predicted by the StepCox[both] + Lasso model in Dataset1. High-risk patients (red) have significantly lower survival than low-risk patients (gray) (p < 0.001; hazard ratio [HR] = 2.23, 95% CI: 1.74–2.65). c Kaplan–Meier survival analysis for StepCox[both] + Lasso (Dataset2): Dataset2 shows the same trend, with high-risk patients (red) having worse survival than low-risk patients (gray) (p < 0.001; HR = 1.82, 95% CI: 1.26–2.62). d Kaplan–Meier survival analysis for StepCox[both] + Lasso (Dataset3): In Dataset3, survival differs significantly between the high-risk and low-risk groups (p = 0.007; HR = 2.11, 95% CI: 1.18–3.77), supporting the robustness of the StepCox[both] + Lasso model across datasets. e Meta-analysis of univariate Cox regression: The table summarizes the HRs and p-values from univariate Cox regression in the three datasets. The StepCox[both] + Lasso model gives consistent results across all cohorts, with a pooled HR of 2.25 (95% CI: 1.68–2.73), indicating prognostic value across different populations.
f 1-, 2-, and 3-year survival prediction AUC: Bar plots show the area under the curve (AUC) for survival prediction at 1, 2, and 3 years using the StepCox[both] + Lasso model in the three datasets. In Dataset1, the AUC is 0.74 at 1 year, 0.73 at 2 years, and 0.68 at 3 years, indicating that the model retains discriminative ability over the 3-year horizon.
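
The C-index used to rank the models in panel a can be illustrated with a minimal sketch of Harrell's concordance index for right-censored data. This is not the authors' pipeline: the function name `harrell_c_index` and the five-patient toy data are hypothetical, chosen only to show how comparable pairs are counted. A pair is treated as comparable when the patient with the shorter follow-up time had an event, and pairs with tied times are skipped for simplicity.

```python
from itertools import combinations


def harrell_c_index(times, events, risk_scores):
    """Harrell's C-index for right-censored survival data (illustrative sketch).

    times       -- follow-up time for each patient
    events      -- 1 if the event was observed, 0 if censored
    risk_scores -- model output; higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification: pairs with tied times are skipped
        # a is the patient with the shorter follow-up, b the longer
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter follow-up is censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher risk assigned to the earlier event
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data, not taken from the study.
toy_times = [5, 8, 12, 20, 30]
toy_events = [1, 1, 0, 1, 0]
toy_risk = [0.9, 0.7, 0.8, 0.3, 0.1]

c_index = harrell_c_index(toy_times, toy_events, toy_risk)
print(f"C-index: {c_index:.3f}")
```

In the toy data, 7 of the 8 comparable pairs are ranked correctly. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which the 101 models in panel a are compared.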