Fig. 2: Summary of a non-obese diabetic mouse FastQ reference dataset with two strains of 6 mice, with AI (1927 cells) and BM (2249 cells). | Communications Biology


From: A realistic FastQ-based framework FastQDesign for ScRNA-seq study design issues


a–c Counts of FastQ reads, cells, and UMIs in each strain and overall. Not every FastQ read carries a real cell barcode, so such reads are removed first (denoised FastQ). Of the denoised reads, only those that map confidently to the transcriptome count as valid FastQ reads. Some cell barcodes are then filtered out during quality control, leaving the cells and UMI counts actually used. d UMAP of the reference dataset, which partitions into 4 clusters. e Distribution of conditions on the UMAP. f Dot plot of the canonical markers for each cell subtype; dots are colored by average expression and sized by the percentage of cells in the cluster that express the marker. g Distribution of the number of duplicates per UMI. h UMI recovery rate as read depth changes, measured for both the FastQ reference and the UMI matrix. i Distribution of the number of read variations per cell barcode. j Benchmark for generating one pseudo-design dataset with 50% of the cells and 50% of the FastQ read depth. The traditional approach achieves this with a combination of three tools, whereas the proposed fastF does it in a single step. Results are reported as CPU time, memory usage, and cache usage.
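To make the pseudo-design idea in panel j and the UMI recovery rate in panel h concrete, the following is a minimal Python sketch, not the fastF implementation. It assumes each valid read has already been reduced to a (cell barcode, UMI) pair; the function names `pseudo_design` and `umi_recovery_rate`, the cell-then-read sampling order, and the uniform sampling without replacement are all illustrative assumptions.

```python
import random
from collections import defaultdict


def pseudo_design(reads, cell_frac=0.5, read_frac=0.5, seed=0):
    """Subsample cells, then reads (hypothetical helper, not fastF).

    reads: list of (cell_barcode, umi) pairs, one entry per valid read.
    Returns the kept cells, the sampled reads, and unique-UMI counts per cell.
    """
    rng = random.Random(seed)

    # Step 1: keep a random fraction of the cell barcodes.
    cells = sorted({cb for cb, _ in reads})
    n_cells = max(1, round(len(cells) * cell_frac))
    kept_cells = set(rng.sample(cells, n_cells))

    # Step 2: among reads from kept cells, keep a random fraction of reads.
    pool = [r for r in reads if r[0] in kept_cells]
    sampled = rng.sample(pool, round(len(pool) * read_frac))

    # Step 3: collapse duplicate reads into unique UMIs per cell.
    umis = defaultdict(set)
    for cb, umi in sampled:
        umis[cb].add(umi)
    umi_counts = {cb: len(u) for cb, u in umis.items()}
    return kept_cells, sampled, umi_counts


def umi_recovery_rate(reads, sampled, kept_cells):
    """Fraction of the kept cells' unique UMIs still seen after read subsampling."""
    full = {(cb, u) for cb, u in reads if cb in kept_cells}
    return len(set(sampled)) / len(full) if full else 0.0


if __name__ == "__main__":
    # Toy data: 4 cells, each with 5 UMIs sequenced twice (10 reads per cell).
    toy = [(f"cell{i}", f"umi{j % 5}") for i in range(4) for j in range(10)]
    kept, sub, counts = pseudo_design(toy, cell_frac=0.5, read_frac=0.5)
    print(len(kept), len(sub), umi_recovery_rate(toy, sub, kept))
```

Because each UMI is typically sequenced more than once (panel g), halving the read depth usually loses fewer than half of the UMIs, which is the kind of relationship panel h tracks.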
