Fig. 2: Assessing performance and robustness of SCAVENGE using simulated and real datasets.
From: Variant to function mapping at single-cell resolution through network propagation

Two single-cell datasets were simulated from the same hematopoietic bulk ATAC-seq dataset: one with two cell types, monocytes (Mono) and natural killer (NK) cells, with n = 500 cells each (a); and one with nine cell types, granulocyte-macrophage progenitors (GMPs), Mono, myeloid dendritic cells (mDCs), common lymphoid progenitors (CLPs), B cells, plasmacytoid dendritic cells (pDCs), CD8 T cells, CD4 T cells and NK cells, with n = 200 cells each (b) (Methods). The blood cell trait of monocyte count is investigated in both the simulated and real datasets. a, The cells are ranked according to the original bias-corrected Z-score (left) and the SCAVENGE network propagation score (right). The percentage of monocytes in each quarter is shown. b, The cells are ranked in the same way, and the box plots depict the trait relevance scores (TRS) of cells from the second-quarter subsets before and after SCAVENGE. The box plot center line, limits and whiskers represent the median, quartiles and 1.5× interquartile range, respectively. c–h, Illustration of SCAVENGE analysis with a real hematopoietic scATAC-seq dataset. The UMAP embedding plots show Z-score (c), seed cells (d), SCAVENGE TRS obtained using the seed cells in d (e), gene accessibility score of a canonical monocyte marker (f), randomly selected seed cells matched in number to the real ones (g) and SCAVENGE TRS obtained using the seed cells in g (h). c, e and h use the same color scheme, so that they can be compared across conditions.
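
The quarter-based summary in panel a can be illustrated with a minimal sketch. It is not the authors' code. It assumes that cells are ranked from highest to lowest score and split into four equal-sized groups; the function name `percent_label_per_quarter` and the toy scores and labels are hypothetical and chosen only for illustration.

```python
def percent_label_per_quarter(scores, labels, target="Mono"):
    """Rank cells by descending score, split the ranking into four
    near-equal quarters, and return the percentage of `target` cells
    in each quarter (first quarter = highest scores)."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    n = len(scores)
    if n < 4:
        raise ValueError("need at least four cells to form quarters")

    # Indices of cells sorted from highest to lowest score (assumed direction).
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    ranked_labels = [labels[i] for i in order]

    # Quarter boundaries; quarters differ by at most one cell if n % 4 != 0.
    bounds = [round(q * n / 4) for q in range(5)]

    percentages = []
    for start, stop in zip(bounds, bounds[1:]):
        quarter = ranked_labels[start:stop]
        hits = sum(1 for label in quarter if label == target)
        percentages.append(100.0 * hits / len(quarter))
    return percentages


# Toy example: made-up scores and labels, not data from the paper.
scores = [2.1, 1.8, 1.5, 0.9, 0.4, 0.1, -0.3, -1.2]
labels = ["Mono", "Mono", "Mono", "NK", "Mono", "NK", "NK", "NK"]
quarters = percent_label_per_quarter(scores, labels)
print(quarters)  # [100.0, 50.0, 50.0, 0.0]
```

Under these assumptions, a score that separates the two cell types well concentrates monocytes in the top quarters and leaves few in the bottom ones. This is the pattern the left and right plots of panel a compare for the Z-score and the SCAVENGE score.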