Fig. 1: scTab enables organism-wide, scalable, and robust cell type classification on single-cell RNA-seq data.
From: scTab: Scaling cross-tissue single-cell annotation models

a Treemap plot showing the dataset composition across cell types and tissues. The outer rectangles correspond to the number of donors per tissue, the inner boxes correspond to the number of donors for each cell type and the color scale highlights the number of unique cell types per tissue (Supp. Fig. 5b shows the number of unique cell types grouped by Human Cell Atlas bionetworks). The dataset spans 22.2 million cells, 5,052 donors, 249 datasets, 164 cell types, and 56 different tissues. See Supp. Fig. 5 for more detailed summary statistics of the scTab data corpus. b scTab architecture (Methods): after input feature normalization, scTab encodes data via a feature transformer and selects relevant input features through feature attention. (FC: fully connected layer, BN: batch-norm layer, GLU: gated-linear-unit, ReLU: rectified-linear-unit). c Comparison of classification performance (macro F1-score) of linear reference models (CellTypist (subsampled to 1.5 million cells), optimized linear) and nonlinear models (scTab, XGBoost, MLP (multi-layer perceptron)). Data are presented as mean values ± SD. Source data are provided as a Source Data file. d Classification performance (macro F1-score) grouped by organ system of scTab and the optimized linear model. Data are presented as mean values ± SD. Source data are provided as a Source Data file. e Cross-entropy loss and macro F1-score on the validation set plotted after each epoch for scTab and the optimized linear model. Data are presented as mean values ± 95% CI. f tSNE plots of raw features and the learned features of scTab with the top 70 most frequent cell types superimposed on the holdout test set. g F1-score per cell type plotted against the number of unique cells observed per cell type for scTab. The histogram on the y-axis shows the distribution of F1-scores and the histogram on the x-axis shows the distribution of unique cells per cell type.
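
Panels c, d, and e report the macro F1-score, and panel g reports the F1-score per cell type. As a reminder of how these quantities relate, the following is a minimal sketch in plain Python, not the authors' evaluation code. The function names `per_class_f1` and `macro_f1` and the toy labels are hypothetical, and the sketch assumes that only cell types present in the true labels are averaged.

```python
from collections import Counter


def per_class_f1(y_true, y_pred):
    """Return {label: F1} for every label that occurs in y_true.

    Assumption for this sketch: labels that are predicted but never
    appear in y_true are not scored.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1      # correct prediction for class t
        else:
            fp[p] += 1      # p was predicted but is wrong
            fn[t] += 1      # t was missed
    scores = {}
    for label in sorted(set(y_true)):
        # F1 = 2*TP / (2*TP + FP + FN); the denominator is at least 1
        # because every true occurrence is either a TP or an FN.
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1-scores."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)


# Toy labels, invented purely for illustration.
y_true = ["T cell", "T cell", "T cell", "B cell", "B cell", "neuron"]
y_pred = ["T cell", "T cell", "B cell", "B cell", "B cell", "T cell"]

scores = per_class_f1(y_true, y_pred)
overall = macro_f1(y_true, y_pred)
```

Because the macro average weights every cell type equally, a rare cell type that is always misclassified (here `neuron`, with an F1 of 0) lowers the overall score as much as a frequent one would. This is why the per-cell-type view in panel g is a useful complement to the single number in panels c to e.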