Fig. 1: Evaluation of scCLEAN performance on human tissues and cell types. | Nature Communications

From: A CRISPR/Cas9-based enhancement of high-throughput single-cell transcriptomics

a Schematic representing the scCLEAN-mediated removal of abundant sequences from scRNAseq libraries. A single sgRNA pool was constructed from four unique sgRNA pools: (1) genomic intervals (teal), (2) non-polyadenylated rRNA (seafoam), (3) 90 nuclear-encoded ribosomal protein-coding genes and 10 mitochondrial genes (Ribo/Mito; purple), and (4) 155 non-variable genes (NVG; light purple). b Distribution of the percentage of reads aligning to targeted regions after iteratively filtering reads corresponding to each of the four sgRNA guide sets across 14 datasets. The median percentages are: rRNA = 10%, Ribo/Mito = 34%, Genomic Intervals = 9%, and NVG = 5%, for a cumulative sum of 58%. c–e Analysis of the 255-gene panel using the Tabula Sapiens Consortium dataset, corresponding to 161 unique cell types across 24 organs. c Proportion of the 255 targeted genes (red) from the total transcriptome across seven bins ranked according to normalized variance, ranging from “Not Variable” to “Cell-Type Specific”. Each bin spans an equal interval between the minimum and maximum variances. The genes in each bin were tallied, and the proportion of the 255 genes in each bin was evaluated. d Same as in c, except genes were ranked by mean gene expression (log(x + 1) normalized) and binned according to normalized mean gene expression, ranging from “Lowest Expression” to “Highest Expression”. e Gene set enrichment analysis (GSEA) of the 255-gene panel within the Tabula Sapiens dataset across all 161 cell types. Genes were ranked by normalized variance, with the highest-variance genes at the top of the list. The normalized enrichment score (NES) is shown along with the p value and false discovery rate. The locations of the 255 targeted genes in the ranked list are indicated as blue dashes. f Gene scatter plot of biological variability (normalized variance) versus mean expression (log(x + 1) normalized). Dotted black line indicates zero variance. The top 8 genes within the 255-gene targeted panel with variance > 1 are annotated. g Table summarizing the counts of tissue-specific genes from the 255-gene panel obtained from GTEx. Tissue-specific genes were quantified using the extended tau score metric, and their intersection with the 255-gene panel was counted per tissue. Box plots depict the minimum, first quartile (Q1), median, third quartile (Q3), and maximum values for each group. Whiskers extend to the minimum and maximum values within 1.5×IQR from the quartiles. Detailed statistical values are available in Supplementary Data 3. Created in BioRender. Pandey, A. (2025) https://BioRender.com/g46i293.
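
The equal-interval binning described for panels c and d can be illustrated with a minimal sketch. It is not the authors' code: the function name `targeted_proportion_per_bin`, its arguments, and the use of NumPy are assumptions made for illustration only. The sketch splits a per-gene statistic (for example, normalized variance) into seven equal-width bins between its minimum and maximum, then reports the fraction of genes in each bin that belong to a targeted panel.

```python
import numpy as np


def targeted_proportion_per_bin(values, is_targeted, n_bins=7):
    """Bin genes into equal-width intervals and compute the targeted fraction.

    values      : per-gene statistic (e.g. normalized variance), one entry per gene
    is_targeted : boolean mask, True for genes in the targeted panel
    n_bins      : number of equal-width bins spanning [min(values), max(values)]

    Hypothetical helper for illustration; not taken from the paper.
    """
    values = np.asarray(values, dtype=float)
    is_targeted = np.asarray(is_targeted, dtype=bool)

    # Equal-width bin edges between the minimum and maximum of the statistic.
    edges = np.linspace(values.min(), values.max(), n_bins + 1)

    # Digitizing against the interior edges gives bin indices 0..n_bins-1,
    # so the maximum value lands in the last bin rather than overflowing.
    bin_index = np.digitize(values, edges[1:-1])

    # Tally all genes and targeted genes per bin.
    total = np.bincount(bin_index, minlength=n_bins)
    targeted = np.bincount(bin_index[is_targeted], minlength=n_bins)

    # Proportion of targeted genes per bin; empty bins are reported as 0.
    proportion = np.divide(
        targeted, total, out=np.zeros(n_bins), where=total > 0
    )
    return total, targeted, proportion
```

Calling the function with the normalized variances of all genes and a mask marking the 255 targeted genes would return, per bin, the gene count, the targeted-gene count, and their ratio. Passing mean expression instead of variance gives the analogous tally for panel d.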
