Fig. 2: SCOOP can pinpoint cell subsets that likely give rise to different tumors.
From: Learning the cellular origins across cancers using single-cell chromatin landscapes

a Left: UMAP of normal colon single-cell assay for transposase-accessible chromatin using sequencing (scATAC-seq) data (ref. 31) colored by cell annotations (n = 43,626 cells; 27 samples). Middle: Intestinal epithelial cell regenerative hierarchy. Regenerative hierarchy created in BioRender. Tsankov, A. (2025) https://BioRender.com/axbxmm9. Right: Visualization of the Pearson correlation coefficient (r) between the aggregated microsatellite stable (MSS) colorectal cancer (CRC, n = 51) single-nucleotide variant (SNV) profile and colon scATAC-seq meta-cells (Methods). The strongest anti-correlations (r ≈ −0.79, red/bottom end of the scale) are concentrated in stem cells, while the weakest anti-correlations (r ≈ −0.675, blue/top end of the scale) occur in enterocytes.

b Box plots of the feature importance distribution (100 SCOOP runs) of the top 5 cell-of-origin (COO) predictions amongst normal colon cell subsets (refs. 28, 29, 31) for MSS CRC (n = 51; predicted COO in red). Each cell subset is followed by an indicator of its source dataset: D1 for ref. 28, D2 for ref. 29, D3 for ref. 31. Also displayed is the number of times a cell subset appeared in the top 5 features across 100 runs (n). The one-sided Mann-Whitney test p-value is displayed.

c Top Left: UMAP of all peripheral blood mononuclear cell (PBMC) and bone marrow scATAC-seq cell subsets from ref. 32 (n = 33,513 cells; 10 samples). Bottom Left: Hematopoietic regenerative hierarchy. Regenerative hierarchy created in BioRender. Tsankov, A. (2025) https://BioRender.com/24n2gzz. Right: Visualization of the Pearson correlation coefficient (r) between aggregated chronic lymphocytic leukemia (CLL, n = 90) and acute myeloid leukemia (AML, n = 13) SNV profiles and scATAC-seq meta-cells. For CLL, the strongest anti-correlations (r ≈ −0.63, red/bottom end of the scale) are observed in B cells, while the weakest anti-correlations (r ≈ −0.51, blue/top end of the scale) occur in PBMC T cells. For AML, the strongest anti-correlations (r ≈ −0.38) are enriched in myeloid (e.g., granulocyte-monocyte progenitors [GMP], GMP/neutrophils [GMP.Neut], and common myeloid progenitors and lymphoid-primed multipotent progenitors [CMP.LMPP]) and erythroid progenitor populations, whereas lymphoid lineages (B, T, and NK cells) tend to have the weakest anti-correlations (r ≈ −0.30).

d Box plots of the feature importance distribution (100 SCOOP runs) of the top 5 COO predictions amongst blood and bone marrow cell subsets from dataset D4 (ref. 32) for CLL (n = 90) and AML (n = 13) aggregated SNV profiles (predicted COO in red, similar cell subsets in pink). Also displayed is the number of times a cell subset appeared in the top 5 features across 100 runs (n). One-sided Mann-Whitney test p-values are displayed; Bonferroni correction for multiple hypothesis testing was applied when more than one comparison was made.

Cell type abbreviations are listed in Supplementary Data 3. Box plot vertical lines show the 25th, 50th (median), and 75th percentiles, with horizontal whiskers extending to a maximum distance of 1.5 × interquartile range from the hinge. Data beyond the whisker ends are plotted individually.
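
The correlation panels and the multiple-testing note can be illustrated with a minimal sketch. This is not the SCOOP implementation: it assumes, for illustration only, that the SNV profile and each meta-cell's accessibility are summarized over the same ordered set of genomic bins, and the function names (`pearson_r`, `rank_by_anticorrelation`, `bonferroni`) and the toy numbers are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant input")
    sxy = sum(a * b for a, b in zip(dx, dy))
    return sxy / math.sqrt(sxx * syy)


def rank_by_anticorrelation(snv_profile, metacell_profiles):
    """Return (name, r) pairs sorted from most negative r to least negative r."""
    scores = {
        name: pearson_r(snv_profile, accessibility)
        for name, accessibility in metacell_profiles.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])


def bonferroni(p_values):
    """Bonferroni adjustment: multiply each p-value by the number of tests, cap at 1."""
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


# Toy data (made up): mutation counts per bin and accessibility per bin.
snv_profile = [9.0, 7.0, 5.0, 3.0, 1.0]
metacells = {
    "stem_like": [1.0, 2.0, 3.0, 4.0, 5.0],  # exactly opposite trend, r = -1
    "other": [2.0, 1.0, 4.0, 3.0, 5.0],      # weaker anti-correlation, r = -0.8
}
ranking = rank_by_anticorrelation(snv_profile, metacells)
adjusted = bonferroni([0.01, 0.04, 0.5])
```

In this toy example the first entry of `ranking` is the meta-cell whose accessibility is most strongly anti-correlated with the mutation profile, which is the sense in which the caption uses "strongest anti-correlations".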