Supplementary Figure 1: Comparison of annotations of PBMC data with reference-based methods for cell type annotation.

This figure accompanies Fig. 1c. t-SNE plots of the PBMC-4K scRNA-seq data [21], colored by the top annotation from the methods of Kang et al. [22] (top left), Li et al. [23] (RCA; top right) and SingleR (bottom left). Kang et al. correlated the expression of 173 differentially expressed genes, identified by comparing scRNA-seq clusters of PBMCs. Li et al. (RCA) correlated cells with a bulk microarray reference dataset. SingleR introduced a fine-tuning step to refine the correlation with bulk datasets. All methods agreed on the general annotations of monocytes, T cells, B cells and NK cells, but the annotations differed for cellular subtypes. RCA annotated the left-most cells in cluster 4 as NK cells, whereas the method of Kang et al. and SingleR annotated them as CD8+ T cells; the latter annotation was supported by expression of the individual genes CD3E (a general T cell marker) and CD8A. Of note, NKG7 and GNLY are commonly used as NK cell markers but are also expressed in activated CD8+ T cells. The method of Kang et al. and RCA annotated the cells in cluster 3 as CD4+ T cells, whereas SingleR annotated 47.7% of this cluster as naïve CD8+ T cells and 30.4% as central memory CD8+ T cells (TCM). Expression of CD8A, along with CCR7 and SELL (markers of naïve cells), supported the SingleR annotation. Expression of IL7R (a marker of memory T cells) in some of these cells suggested that the cluster also contained memory cells, as annotated by SingleR. FOXP3 (a marker of regulatory T cells) was detected in only a small proportion of Treg cells (among the sorted Treg cells, only 5.6% expressed FOXP3); nevertheless, SingleR suggested that many of the cells in cluster 1 were in fact Treg cells. In this comparison, only SingleR distinguished highly similar cell states, and its differential annotations were supported by marker genes examined individually.
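The correlation-based labeling common to the three methods, and the per-marker check used to support an annotation, can be illustrated with a minimal sketch. It is not the implementation of Kang et al., RCA or SingleR: the choice of Spearman correlation, the function names and all expression values below are assumptions made for illustration, and no fine-tuning step is included.

```python
from math import sqrt

def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out

def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)

def spearman(x, y):
    """Spearman correlation computed as Pearson correlation of ranks."""
    return pearson(ranks(x), ranks(y))

def annotate(cell, reference):
    """Label a cell with the reference profile it correlates with best."""
    scores = {label: spearman(cell, profile)
              for label, profile in reference.items()}
    best = max(scores, key=scores.get)
    return best, scores

def fraction_expressing(counts):
    """Fraction of cells with a non-zero count for one marker gene."""
    return sum(1 for c in counts if c > 0) / len(counts)

# Hypothetical values, for illustration only (not taken from the data).
genes = ["CD3E", "CD8A", "NKG7", "GNLY", "CCR7"]
reference = {
    "CD8+ T cell": [8, 9, 6, 5, 2],
    "NK cell":     [1, 0, 9, 9, 0],
    "CD4+ T cell": [8, 0, 1, 0, 7],
}
cell = [5, 4, 3, 2, 0]

label, scores = annotate(cell, reference)
print(label, scores)

# Share of cells in a hypothetical cluster with detectable marker counts.
print(fraction_expressing([0, 0, 3, 0, 1]))
```

In this toy example the cell expresses NKG7 and GNLY yet still correlates best with the CD8+ T cell profile because CD3E and CD8A are also expressed, which mirrors the ambiguity described for cluster 4.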