Fig. 7: Ablation study highlights the critical role of data integration, clinical guidance, and iterative learning in SIDISH. | Nature Communications

From: SIDISH integrates single-cell and bulk transcriptomics to identify high-risk cells and guide precision therapeutics through in silico perturbation

This analysis evaluates the contribution of the core architectural components of SIDISH: single-cell and bulk data integration, clinical signal injection, and iterative optimization. a–c Kaplan–Meier survival curves in the TCGA-LUAD (a), TCGA-BRCA (b), and TCGA-PDAC (c) cohorts, comparing stratification based on biomarkers identified by SIDISH (top row) with stratification from a bulk-only analysis (bottom row). SIDISH consistently yields more significant survival separation between high-risk (pink) and background (gray) patients. P values were calculated using the two-tailed log-rank-sum test. d, e Predictive accuracy of biomarkers, assessed by the concordance index (C-Index) in TCGA-BRCA (d) and TCGA-PDAC (e). SIDISH outperforms bulk-only derived biomarkers. Each score is based on N = 50 independent runs using different random seeds (technical replicates). Each box indicates the interquartile range (IQR, 25th–75th percentile), with the line inside each box representing the median. Whiskers extend to the 5th–95th percentiles. P values were calculated using a one-sided Mann–Whitney U-test. f–j Impact of clinical guidance on single-cell embeddings, comparing SIDISH (green) against a simple VAE without survival guidance (brown). Clustering accuracy was evaluated using the adjusted Rand index (ARI; f), normalized mutual information (NMI; g), completeness score (h), Fowlkes–Mallows index (FMI; i), and Silhouette score (j). Each box plot is based on N = 100 independent runs with different random seeds (technical replicates). Each box indicates the interquartile range (IQR, 25th–75th percentile), with the line inside each box representing the median. Whiskers extend to the 5th–95th percentiles. P values were calculated using a one-sided Mann–Whitney U-test. k Iterative optimization improves prognostic power in BRCA scRNA-seq data, with higher negative log-transformed P values (−log(P)) over successive training cycles. l, m Iterative refinement of embeddings further enhances clustering, with progressive improvements in NMI (l) and ARI (m) over four iterations. Each box plot is based on N = 100 independent runs across iterations. Each box indicates the interquartile range (IQR, 25th–75th percentile), with the line inside each box representing the median. Whiskers extend to the 5th–95th percentiles. P values were calculated using a one-sided Mann–Whitney U-test. Significance levels for the one-sided Mann–Whitney U-tests: *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001; n.s., not significant. Exact P values are provided in the Source Data File.
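For readers unfamiliar with the concordance index reported in panels d and e, the following is a minimal sketch of how such a score can be computed. It assumes Harrell's pairwise estimator, which is a common choice; the caption does not state which estimator the authors used. The function name `concordance_index` and its arguments are hypothetical and are not taken from the SIDISH code.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Harrell-style C-index (assumed estimator, not confirmed by the source).

    times       -- observed follow-up time per patient
    events      -- 1/True if the event (e.g. death) was observed, 0/False if censored
    risk_scores -- predicted risk per patient (higher = expected shorter survival)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that patient `a` has the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b]:
            continue  # tied times are skipped in this simplified sketch
        if not events[a]:
            continue  # shorter time is censored, so the true ordering is unknown
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher risk for the patient who failed first
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (made up for illustration, not from the paper).
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
perfect = concordance_index(times, events, [0.9, 0.7, 0.4, 0.2])
reversed_ = concordance_index(times, events, [0.2, 0.4, 0.7, 0.9])
uninformative = concordance_index(times, events, [0.5, 0.5, 0.5, 0.5])
```

Under this estimator a value of 1.0 means every comparable pair is ranked correctly, 0.5 corresponds to uninformative predictions, and 0.0 to a fully reversed ranking. A box plot such as those in d and e would then summarize one such score per random seed.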
