Fig. 2: scCLEAN increases cell genetic complexity enhancing PBMC cell type characterization.
From: A CRISPR/Cas9-based enhancement of high-throughput single-cell transcriptomics

a Sankey diagram illustrating the redistribution of the proportion of aligned reads from the standard 10x-V3 workflow (left) to the workflow incorporating scCLEAN (right) separated into three buckets. The “Genomic” bucket (purple) represents non-targeted intergenic reads; the “Targeted Intervals” bucket (green) represents reads from scCLEAN-targeted molecules; the “Informative Transcriptome” bucket (red) represents reads aligned to the transcriptome excluding targeted molecules. Percentages of each bucket are shown. b Box plots displaying the distribution of UMIs (proportion of total) corresponding to the top 50, 100, 200, 500 expressed genes. Comparisons are between a PBMC sample sequenced to ~80,000 reads per cell (>3-fold deeper), an experimental 10x-V3 sample from the same batch, a PBMC reference atlas compiled from 3 publications using scArches, an in silico modeled scCLEAN condition assuming 100% read removal, and 3 technical experimental replicates of scCLEAN. c Ridge plots comparing the library complexity measured as the ratio of median genes to median UMIs per cell. d Comparative bar plots displaying the optimized number of principal components calculated via random matrix theory to represent the biological signal identified. One scArches dataset was selected to ensure an accurate comparison between samples representing a single batch. e UMAP plots illustrating the cell types detected from query-reference mapping using the Azimuth PBMC dataset; 10x-V3 (left) with scCLEAN (right). 18 cell types with 1 uniquely identified within 10x-V3. 23 cell types with 6 uniquely identified within scCLEAN. f t-SNE clustering output derived from an unsupervised deep learning algorithm (DESC) iteratively spanning 11 different Louvain resolutions starting with 0.1 and ending with 2 using 0.1 intervals. Representative t-SNE (top) clustering plots from a chosen resolution (0.8) showing 11 clusters in 10x-V3 and 17 clusters in scCLEAN with assignment probabilities shown below.
g, h Metrics for integration accuracy from query-reference latent space projection using scArches. g UMAP plot for 10x-V3 (left) integration. Table for 10x-V3 (right) displaying integration metrics (graph connectivity, kBET, ASW). h Same as in (g), except showing metrics for scCLEAN.
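
The two per-cell quantities named in panels b and c can be illustrated with a minimal sketch. It is not the authors' pipeline. It assumes a small dense cells-by-genes UMI count matrix given as nested lists, and it assumes that the "top N expressed genes" in panel b are ranked within each cell. The function names `library_complexity` and `top_n_umi_fraction` and the toy matrix are hypothetical.

```python
from statistics import median

def library_complexity(counts):
    """Ratio of median detected genes per cell to median UMIs per cell (panel c)."""
    # A gene counts as detected in a cell when it has at least one UMI.
    genes_per_cell = [sum(1 for c in cell if c > 0) for cell in counts]
    umis_per_cell = [sum(cell) for cell in counts]
    return median(genes_per_cell) / median(umis_per_cell)

def top_n_umi_fraction(counts, n):
    """Per-cell fraction of UMIs that fall in that cell's n most expressed genes (panel b).

    Assumption: genes are ranked within each cell, not across the whole sample.
    """
    fractions = []
    for cell in counts:
        total = sum(cell)
        top = sum(sorted(cell, reverse=True)[:n])
        fractions.append(top / total if total else 0.0)
    return fractions

# Hypothetical toy matrix: 3 cells (rows) x 4 genes (columns) of UMI counts.
toy_counts = [
    [5, 0, 3, 2],
    [1, 1, 0, 0],
    [10, 0, 0, 10],
]

complexity = library_complexity(toy_counts)
top1 = top_n_umi_fraction(toy_counts, 1)
top2 = top_n_umi_fraction(toy_counts, 2)
```

Under these assumptions, a higher complexity ratio means more distinct genes detected per UMI, and a lower top-N fraction means fewer UMIs are concentrated in the most abundant genes.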