Fig. 1: Direct assessment of single-cell RNA counting using molecular spikes.
From: Molecular spikes: a gold standard for single-cell RNA counting

a, Schematic of the cloning strategy for molecular spikes: an oligonucleotide library is inserted into a molecular spike entry vector, and the vector pool is linearized and in vitro transcribed to generate a pool of molecular RNA spike-ins. b, Coordinates of molecular spikes in base pairs (bp), with the built-in UMI at the 5′ or 3′ end. c, 5′ molecular spike complexity, estimated by fitting a nonlinear asymptotic model (dotted line) to the number of unique spUMI sequences observed as a function of the number of spUMIs sequenced across cells (blue line). d, Scatter plot showing error-corrected (Hamming distance (HD) 1) Smart-seq3 RNA counts (y axis) against the number of spiked molecules (x axis), ranging from 1 to 100 spiked molecules per cell. Data from HEK293FT cells (n = 48 cells). e, Scatter plot showing the number of spiked molecules (x axis) against error-corrected RNA counts (HD 1) for data generated with variations of the Smart-seq3 protocol: either with cDNA cleanup before amplification (0.1 µM FWD), or without cleanup, so that TSO remains, at different concentrations of FWD primer. Data from 39 cells or more are shown per condition. f, Scatter plot showing the number of spiked molecules (x axis) against error-corrected RNA counts (HD 1) for 10x Genomics (v.2) data (n = 955 cells). g, Scatter plot showing the number of spiked molecules (x axis) against error-corrected RNA counts (HD 1) for data generated with variations of the SCRB-seq and tSCRB-seq protocols: standard SCRB-seq (green, 53 cells), SCRB-seq without exonuclease I treatment (red, 77 cells) and direct PCR (tSCRB-seq) (blue, 90 cells). h, Percent counting error (observed/true) in RNA counts generated with variations of the SCRB-seq and tSCRB-seq protocols. Solid lines denote the mean over cells per condition, and shaded areas represent the standard deviation, colored by experimental condition: direct PCR (tSCRB-seq) (90 cells), no exonuclease I (77 cells) and standard protocol (53 cells). The dotted line represents the expected overcounting if every sequenced read corresponded to a new UMI observation.
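
The two quantities the caption relies on, error-corrected counts at Hamming distance 1 and percent counting error, can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The greedy collapsing rule, the function names `collapse_hd1` and `percent_counting_error`, the toy 4-nt UMIs, and the definition of percent error as the signed relative difference between observed and true counts are all assumptions made for illustration.

```python
from collections import Counter


def hamming(a, b):
    """Number of mismatching positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def collapse_hd1(umis):
    """Greedy Hamming-distance-1 collapse (illustrative assumption).

    UMIs are visited by decreasing read count (ties broken alphabetically);
    a UMI is kept only if it is more than 1 mismatch away from every UMI
    already kept, otherwise it is treated as an error of that UMI.
    """
    counts = Counter(umis)
    kept = []
    for umi, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if not any(hamming(umi, k) <= 1 for k in kept):
            kept.append(umi)
    return kept


def percent_counting_error(observed, true):
    """Signed percent error of an observed count relative to the true count
    (assumed definition: 100 * (observed - true) / true)."""
    return 100.0 * (observed - true) / true


# Toy example: 3 true molecules, 8 reads, two of them carrying a 1-nt error.
reads = ["AAAA", "AAAA", "AAAA", "AAAT", "CCCC", "CCCC", "CCCG", "GGGG"]
true_molecules = 3

raw_count = len(set(reads))                # unique UMIs without correction
corrected_count = len(collapse_hd1(reads)) # unique UMIs after HD 1 collapse
read_count = len(reads)                    # every read counted as a new UMI

raw_error = percent_counting_error(raw_count, true_molecules)
corrected_error = percent_counting_error(corrected_count, true_molecules)
worst_case_error = percent_counting_error(read_count, true_molecules)
```

In this toy case the uncorrected count overestimates the three true molecules, the HD 1 collapse recovers them, and `worst_case_error` plays the role of the dotted line in panel h, where every sequenced read is counted as a new UMI observation.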