Fig. 4

A total of 101 machine learning models were constructed and validated using single-cell transcriptomic data. A. Features were selected based on the Maximal Information Coefficient (MIC), and predictive models were constructed using various machine learning algorithms. Model performance was evaluated using the concordance index (C-index). B. A survival model was built using the Random Survival Forest (RSF) algorithm. The variable importance plot highlights genes with both positive and negative contributions to the model. C. LASSO coefficient path plot showing the trajectories of the regression coefficients as the regularization parameter λ varies. D. LASSO cross-validation curve displaying model error across log(λ) values. The optimal λ, selected by 10-fold cross-validation, minimized the error and identified six key genes. E. Multivariate stepwise Cox regression was then performed for further feature refinement, yielding five independent prognostic genes. F–G. Kaplan-Meier survival curves and ROC curves for the five selected genes in the training set. H–I. Corresponding Kaplan-Meier survival curves and ROC curves in the test set, validating prognostic performance. J. Cell clustering results from single-cell transcriptomic data, illustrating the distribution of distinct cell subpopulations. K. Comparison of cell subpopulation proportions between the high- and low-risk groups. L. Metabolic pathway enrichment analysis comparing the high- and low-risk groups, revealing potential metabolic differences.
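
For readers unfamiliar with the concordance index (C-index) used to rank the models in panel A, the following is a minimal sketch of Harrell's C-index in plain Python. It is an illustration only and not the implementation used in this study; the function name `concordance_index`, its argument names, and the choice to skip pairs with tied survival times are assumptions made for this example.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell's C-index (illustrative sketch).

    times       : follow-up time for each sample
    events      : 1/True if the event was observed, 0/False if censored
    risk_scores : model output, higher score = higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # simplification (assumption): skip pairs with tied times
        # Let `a` be the sample with the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher risk had the earlier event
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking of survival times by predicted risk, which is why a higher C-index indicates a better-performing model.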