Extended Data Fig. 5: Strategy for sub-sampling molecular spikes to assess counting reliability across expression levels.
From: Molecular spikes: a gold standard for single-cell RNA counting

(a) Strategy for computational analysis of 10x Genomics spUMI data. Molecular spike-ins observed in only one cell barcode and covered by 10–20 sequencing reads were selected together with their associated 10x UMI sequences. spUMIs were sampled at 60 expression levels, ranging from 1 to 1,000 molecules, for 100 in silico cells. For each ‘cell’ at each expression level, molecules were analyzed at depths of 1 to 10 reads, and UMI error correction was applied. Created with BioRender.com. (b,c) We quantified the spUMIs and 10x UMIs and display the mean counting difference over the 100 replicates as contour plots, as a function of expression level and read coverage, both in absolute numbers and normalized to the mean copy number: (b) uncorrected 10x UMI counts; (c) UMI counts after error correction at Hamming distance 1. In each contour plot, the left panel is colored by the deviation from ground-truth counts on a log10 scale (pseudocount of 1 added), and the right panel shows the deviation from ground truth relative to the mean expression level.
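The sub-sampling in panel (a) and the counting difference in panels (b,c) can be sketched as a small simulation. This is a minimal sketch under stated assumptions, not the study's pipeline: the molecule pool is synthetic (random 12-nt 10x UMIs with per-base substitution errors), each molecule stands for one spUMI so the ground-truth count equals the number of sampled molecules, the Hamming-distance-1 correction is a simplified greedy merge by read count that may differ from the tool used in the study, and all function names (`simulate_pool`, `correct_hd1`, `count_cell`, `mean_difference`) are hypothetical.

```python
import random
from collections import Counter

BASES = "ACGT"


def mutate(umi: str, err: float, rng: random.Random) -> str:
    """Copy `umi` with independent per-base substitution errors at rate `err`."""
    return "".join(
        rng.choice([x for x in BASES if x != b]) if rng.random() < err else b
        for b in umi
    )


def simulate_pool(n_molecules: int, rng: random.Random, umi_len: int = 12, err: float = 0.01):
    """Synthetic (assumed) stand-in for the selected molecules: each molecule is a
    list of per-read 10x UMI observations, covered by 10-20 reads."""
    pool = []
    for _ in range(n_molecules):
        true_umi = "".join(rng.choice(BASES) for _ in range(umi_len))
        pool.append([mutate(true_umi, err, rng) for _ in range(rng.randint(10, 20))])
    return pool


def hamming1(a: str, b: str) -> bool:
    """True if equal-length strings differ at exactly one position (early exit)."""
    if len(a) != len(b):
        return False
    diffs = 0
    for x, y in zip(a, b):
        if x != y:
            diffs += 1
            if diffs > 1:
                return False
    return diffs == 1


def correct_hd1(umi_counts: Counter) -> int:
    """Simplified correction (assumption): visit UMIs by decreasing read count and
    drop any UMI within Hamming distance 1 of one already kept; return the number kept."""
    kept = []
    for umi, _ in sorted(umi_counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if not any(hamming1(umi, k) for k in kept):
            kept.append(umi)
    return len(kept)


def count_cell(pool, n: int, depth: int, rng: random.Random):
    """One in silico cell: n molecules (ground truth = n), each sampled at `depth` reads.
    Returns (uncorrected 10x UMI count, corrected 10x UMI count)."""
    umis = Counter()
    for reads in rng.sample(pool, n):
        umis.update(rng.sample(reads, depth))  # depth <= 10 <= reads per molecule
    return len(umis), correct_hd1(umis)


def mean_difference(pool, n: int, depth: int, replicates: int = 100, seed: int = 0):
    """Mean (10x UMI count - ground truth) over replicate cells, absolute and relative to n."""
    rng = random.Random(seed)
    raw = corr = 0
    for _ in range(replicates):
        r, c = count_cell(pool, n, depth, rng)
        raw += r - n
        corr += c - n
    raw /= replicates
    corr /= replicates
    return {"raw": raw, "corrected": corr, "raw_rel": raw / n, "corrected_rel": corr / n}
```

For example, `pool = simulate_pool(1000, random.Random(1))` followed by `mean_difference(pool, n, depth)` over a grid of expression levels `n` and depths 1 to 10 yields the values one would contour: the absolute differences for the left panels (shown in the figure on a log10 scale with a pseudocount of 1) and the relative differences for the right panels. Because the merge only removes UMIs, the corrected count can never exceed the uncorrected one in this sketch.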