Extended Data Fig. 9: Extended analysis of potency programs and genes. | Nature Methods

From: Improved reconstruction of single-cell developmental potential with CytoTRACE 2

a, Same as Fig. 2b but separated by cohort, species, cellular system (three general categories shown for clarity), and scRNA-seq platform. The embedding in Fig. 2b is shown as a reference in the upper left. In both the reference and the cohort-stratified embeddings, colors denote potency scores (same color bar as in Fig. 2b, top). b, Heat map depicting pairwise similarity of gene sets learned by CytoTRACE 2 across all 19 ensemble models from leave-one-out cross-validation on the 19-dataset training cohort. Overlap was quantified by the Jaccard index and stratified into gene sets with positive (left, n = 1,490) and negative (right, n = 1,246) weights; gene set polarity was determined as described in “Interpretability,” Methods. c, Same as Fig. 2e but showing the consistency between CytoTRACE 2 multipotency markers and hematopoietic stem cell (HSC) knockout (KO) phenotypes across a range of top \(k\) markers, whether positively or negatively associated with multipotency (\(k\) = 50, 100, 200, and 500). GSEA statistics are expressed as directed –log10 Q values. Statistical significance between groups was determined using a two-sided unpaired Wilcoxon test. Box center lines, bounds of the box, and whiskers denote medians, 1st and 3rd quartiles, and minimum and maximum values within 1.5 × IQR (interquartile range) of the box limits, respectively. d, Same as c, but showing the median directed –log10 Q value across all top \(k\) markers shown in c, stratified by positive and negative markers, and extended to all potency categories in the CytoTRACE 2 feature importance matrix (Supplementary Table 15). e, Enrichments of selected gene sets from MSigDB in the CytoTRACE 2 feature importance matrix (Fig. 2a, right; Supplementary Table 15). Bubbles are colored by signed –log10 adjusted p-values (adjusted for multiple comparisons) calculated by GSEA, where the sign is determined by the direction of association between the genes and the potency category. All –log10 adjusted p-values, including those exceeding the color bar range, are provided in Supplementary Table 17. Bubble sizes are proportional to unsigned –log10 adjusted p-values within the color bar.
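
For readers unfamiliar with the two summary quantities used in panels b to e, the following is a minimal Python sketch, not the authors' code. It illustrates the Jaccard index between two gene sets (panel b) and a signed –log10 transform of a Q value or adjusted p-value (panels c to e). The function names, the toy gene sets, and the convention of passing the direction as +1 or -1 are assumptions made for illustration only.

```python
import math


def jaccard(set_a, set_b):
    """Jaccard index: size of the intersection divided by size of the union."""
    a, b = set(set_a), set(set_b)
    union = a | b
    if not union:
        # Convention assumed here: two empty sets have similarity 0.
        return 0.0
    return len(a & b) / len(union)


def pairwise_jaccard(gene_sets):
    """Return a nested dict of Jaccard indices for every pair of named gene sets."""
    names = list(gene_sets)
    return {
        row: {col: jaccard(gene_sets[row], gene_sets[col]) for col in names}
        for row in names
    }


def signed_neg_log10(q_value, direction):
    """Signed -log10 of a Q value; `direction` is +1 or -1 (hypothetical convention)."""
    if not 0.0 < q_value <= 1.0:
        raise ValueError("q_value must lie in (0, 1]")
    if direction not in (1, -1):
        raise ValueError("direction must be +1 or -1")
    return direction * -math.log10(q_value)


# Toy example with made-up gene sets (not taken from the paper).
toy_sets = {
    "model_1": {"GENE_A", "GENE_B", "GENE_C"},
    "model_2": {"GENE_B", "GENE_C", "GENE_D"},
}
similarity = pairwise_jaccard(toy_sets)
```

In this sketch a larger absolute signed value indicates a smaller Q value, and the sign carries only the direction of association; how that direction is determined in the paper is described in its Methods and is not reproduced here.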