Extended Data Fig. 2: Benchmarking of SCAVENGE performance using simulations with different proportions of cell-type composition, different parameter choices, and data sparsity and noise. | Nature Biotechnology


From: Variant to function mapping at single-cell resolution through network propagation


a, Enrichment of monocyte count-associated genetic variants in bulk hematopoietic ATAC-seq data. b, Sparsity of the simulated scATAC-seq data. As in Extended Data Fig. 1a, the kernel density plots show the sparsity of peaks (left) and the sparsity of cells (right) across the simulated scATAC-seq datasets. C2 represents the simulated dataset with two cell types and C9 represents that with. c-e, Benchmarking of SCAVENGE performance using simulations with different cell-type proportions. The simulated datasets are generated as in Fig. 2a, but across a range of cell-type proportions in which the relevant cells (monocytes) make up between 10% and 90% of the population, in 10% increments. The area under the receiver operating characteristic curve (auROC) (c), true positive rate (TPR) (d) and false positive rate (FPR) (e) are calculated across the simulations. f,g, As in Fig. 2a, detailed comparisons are shown for two representative unbalanced cell-type compositions, with 20% monocytes (f) and 80% monocytes (g). h,i, Effects on SCAVENGE performance of the fraction of cells selected as seed cells (h) and the value of k used for network construction (i). The auROC (top) and TPR (middle) are calculated for SCAVENGE analyses in which the top-ranked 1% to 20% of cells are selected as seed cells; the corresponding cell compositions are indicated (bottom). The red dashed lines indicate the auROC and TPR without SCAVENGE. j,k, Simulated datasets created to test whether SCAVENGE is robust to data sparsity and noise. j, Bar plots depict auROC (top) and TPR (bottom) for simulated data created from different numbers of fragments subsampled from the bulk-level data. k, Bar plots depict auROC (top) and TPR (bottom) for simulated data created with different levels of noise.
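
The metrics named in this legend (auROC, TPR, FPR) and the selection of a top-ranked fraction of cells as seeds can be illustrated with a minimal Python sketch. This is not the SCAVENGE package's code or API: the function names (`tpr_fpr`, `auroc`, `top_fraction_seeds`), the 0.5 score cutoff and the ten-cell toy dataset are all assumptions made up for illustration, with `labels` marking which simulated cells are truly trait-relevant (monocytes) and `scores` standing in for a per-cell relevance score.

```python
from typing import List, Sequence, Tuple


def tpr_fpr(labels: Sequence[bool], predicted: Sequence[bool]) -> Tuple[float, float]:
    """Return (true positive rate, false positive rate) for binary calls."""
    tp = sum(1 for y, p in zip(labels, predicted) if y and p)
    fp = sum(1 for y, p in zip(labels, predicted) if not y and p)
    n_pos = sum(1 for y in labels if y)
    n_neg = len(labels) - n_pos
    return tp / n_pos, fp / n_neg


def auroc(labels: Sequence[bool], scores: Sequence[float]) -> float:
    """auROC as the probability that a random positive cell outscores a
    random negative cell (ties count as one half)."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def top_fraction_seeds(scores: Sequence[float], fraction: float) -> List[int]:
    """Indices of the top-ranked `fraction` of cells (at least one cell)."""
    n_seed = max(1, round(fraction * len(scores)))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(order[:n_seed])


# Hypothetical toy data: 10 simulated cells, the first 3 are monocytes.
labels = [True, True, True, False, False, False, False, False, False, False]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.05, 0.6, 0.15]

# Assumed cutoff of 0.5 for calling a cell trait-relevant.
predicted = [s >= 0.5 for s in scores]

tpr, fpr = tpr_fpr(labels, predicted)
area = auroc(labels, scores)
seeds = top_fraction_seeds(scores, 0.20)  # top 20% of cells as seeds
```

The pairwise definition of auROC used here is quadratic in the number of cells and is chosen only for readability; for datasets of realistic size a rank-based implementation would be the more practical choice.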
