Extended Data Fig. 5: Transcriptomic typing using BARseq2.

a,b, Slc30a3 expression in excitatory neurons with or without Cdh24 expression in single-cell RNAseq from Tasic et al.3 (a) or in BARseq2 (b). A cell is considered to express Cdh24 if its expression exceeds 10 RPKM in RNAseq or 1 count in BARseq2. Red crosses indicate means and green squares indicate medians. c, Expression density (means and individual data points) across laminar positions for the indicated genes. n = 3 slices for the three-gene panel and n = 5 slices for the 65-gene panel. d, Precision and recall of cell typing using the marker gene panel across nine single-cell datasets. n = 9 independent datasets, shown in (e). In each box, the center shows the median, the bounds of the box show the first and third quartiles, the whiskers show the range of the data, and points farther than 1.5 IQR (interquartile range) from the box are shown as outliers. e, Breakdown of average performance for each cell type in each dataset. scSSALM and scSSV1 are single-cell SmartSeq datasets from ALM and V1, respectively3. All other datasets are BICCN M1 datasets23; each name indicates the technology used (sc = single cell, sn = single nucleus, Cv2/3 = Chromium v2/3, SS = SmartSeq). f, Average cell typing performance for six normalization strategies. n = 9 independent datasets, shown in (e). The box plots are generated in the same way as in (d). g, Confusion matrix showing the overlap between predictions and annotations, normalized by predictions. This plot emphasizes precision: it indicates the probability that a given prediction was correct. h, Confusion matrix showing the overlap between predictions and annotations, normalized by annotations. This plot emphasizes recall: it indicates the probability that a given annotation was recovered.
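
The two normalizations described for panels g and h can be illustrated with a minimal sketch. It is not the analysis code used for the figure: the labels "A", "B" and "C", the toy annotation and prediction lists, and the helper names `confusion_counts` and `normalize` are all hypothetical, and the matrix orientation (rows = annotations, columns = predictions) is an assumption made for the example.

```python
import numpy as np

def confusion_counts(annotations, predictions, labels):
    """Count matrix with rows = annotated type, columns = predicted type (assumed layout)."""
    index = {label: i for i, label in enumerate(labels)}
    counts = np.zeros((len(labels), len(labels)), dtype=float)
    for annotated, predicted in zip(annotations, predictions):
        counts[index[annotated], index[predicted]] += 1
    return counts

def normalize(counts, by):
    """Normalize so that each prediction column or each annotation row sums to 1."""
    if by == "predictions":
        axis = 0  # divide each column by the number of cells given that prediction
    elif by == "annotations":
        axis = 1  # divide each row by the number of cells carrying that annotation
    else:
        raise ValueError("by must be 'predictions' or 'annotations'")
    totals = counts.sum(axis=axis, keepdims=True)
    # Leave all-zero rows or columns at zero instead of dividing by zero.
    return np.divide(counts, totals, out=np.zeros_like(counts), where=totals > 0)

# Toy data (hypothetical): one cell annotated "A" is predicted as "B".
labels = ["A", "B", "C"]
annotations = ["A", "A", "A", "B", "B", "C"]
predictions = ["A", "A", "B", "B", "B", "C"]

counts = confusion_counts(annotations, predictions, labels)
by_pred = normalize(counts, by="predictions")    # diagonal = precision per type
by_annot = normalize(counts, by="annotations")   # diagonal = recall per type

print(np.diag(by_pred))
print(np.diag(by_annot))
```

With this layout, the diagonal of the prediction-normalized matrix gives the per-type precision and the diagonal of the annotation-normalized matrix gives the per-type recall, which is why the same counts yield two different plots.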