Fig. 4: SIDISH identifies poor-survival-associated biological processes and prognostic markers in lung adenocarcinoma.

a UMAP clustering reveals cellular diversity within the tumor microenvironment, grouping cells into distinct clusters (0–9). b UMAP visualization of 4102 cells from the LUAD scRNA-seq dataset. High-risk cells identified by SIDISH are shown in red (168 cells), and background cells in gray (3934 cells). c Bar plot of the distribution of high-risk cells across the identified clusters. Most high-risk cells are concentrated in clusters 3 (52.4%), 2 (22.6%), and 7 (22.6%). d Volcano plot of the differential gene expression analysis between high-risk and background cells. Upregulated genes are marked in pink and downregulated genes in blue. Adjusted P values were calculated using a two-tailed Wilcoxon rank-sum test. e Violin plots comparing the expression levels of key upregulated genes (LDHA, ENO1, BNIP3, NDUFA4L2, VEGFA, KIT, CA9, and WFDC2) between high-risk and background cells; the middle bar indicates the median expression for each subpopulation. P values were calculated using a one-sided Wilcoxon test. f Functional enrichment analysis, including GO terms, pathways, and disease terms, highlights terms relevant to poor survival in LUAD. g, h Kaplan–Meier survival curves for the independent validation datasets GSE157009 (g) and GSE37745 (h) show significantly worse survival for patients with higher expression of the signature genes identified by SIDISH (P = 5.40 × 10⁻¹⁹ and P = 1.89 × 10⁻²¹, respectively). High-risk patients (pink) are clearly separated from background patients (gray). P values were calculated using a two-tailed log-rank test comparing the survival curves of the high-risk and background patient groups in each cohort.
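The counts and percentages quoted for panels b and c can be cross-checked with a little arithmetic. The sketch below is illustrative only: the variable names are hypothetical, and the per-cluster cell counts are approximations back-calculated from the rounded percentages in the caption, not values reported by the authors.

```python
# Figures quoted in the caption (panels b and c).
total_cells = 4102
high_risk_cells = 168
background_cells = 3934
cluster_share_pct = {3: 52.4, 2: 22.6, 7: 22.6}  # cluster -> % of high-risk cells

# Panel b: the two groups should partition the dataset.
assert high_risk_cells + background_cells == total_cells

# Panel c: approximate cell counts implied by the rounded percentages
# (an assumption; the caption reports percentages only).
approx_counts = {
    cluster: round(high_risk_cells * pct / 100)
    for cluster, pct in cluster_share_pct.items()
}

# High-risk cells not accounted for by clusters 3, 2, and 7.
remaining = high_risk_cells - sum(approx_counts.values())

# Share of the whole dataset flagged as high-risk, in percent.
high_risk_fraction_pct = round(100 * high_risk_cells / total_cells, 1)

print(approx_counts, remaining, high_risk_fraction_pct)
```

Under these assumptions, the three named clusters account for nearly all high-risk cells, and the high-risk group is a small fraction of the dataset.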