Extended Data Fig. 2: Robust identification of cell states with two-step clustering.

(a-b) Identification of immune cell types based on marker genes of low-resolution clusters. Color scale in (b) corresponds to z-scored, log2-transformed mean gene expression counts across all cells (n = 126,351 cells total from 65 individuals). (c-d) Assessment of cluster robustness for T cells (T) (c) and monocytes (Mono) (d) (n = 32,341 and 58,557 cells for T and Mono, respectively). Boxplots show distributions of Rand indices when comparing clustering solutions with subsampled data (20 iterations). Boxes show the median and IQR for each resolution, with whiskers extending to 1.5 × IQR in either direction from the upper or lower quartile. t-SNE plots show final assigned states for each cell type. (e-f) Barplots showing the fraction of cells from each patient (e) and batch (f) in each of the 16 cell states (the number of patients or batches in which each state is found is indicated).
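
The robustness assessment in (c-d) can be illustrated with a minimal sketch, which computes Rand indices between a clustering of the full data and clusterings of random subsamples. The legend does not state the clustering algorithm, the subsampling fraction, or whether the plain or adjusted Rand index was used, so the names `rand_index`, `subsample_rand_indices`, and `toy_cluster`, the 80% fraction, and the plain Rand index are all assumptions made for illustration only.

```python
import random
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Plain Rand index: fraction of item pairs on which two labelings agree
    (both place the pair together, or both place it apart)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    if len(labels_a) < 2:
        raise ValueError("need at least two items to form a pair")
    agree = 0
    total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        agree += int(same_a == same_b)
        total += 1
    return agree / total


def subsample_rand_indices(data, cluster_fn, n_iter=20, frac=0.8, seed=0):
    """Re-cluster random subsamples and compare each result with the
    full-data clustering restricted to the same items.
    frac=0.8 is an assumed value, not taken from the legend."""
    rng = random.Random(seed)
    full_labels = cluster_fn(data)
    n = len(data)
    k = max(2, int(frac * n))
    scores = []
    for _ in range(n_iter):
        idx = sorted(rng.sample(range(n), k))
        sub_labels = cluster_fn([data[i] for i in idx])
        scores.append(rand_index([full_labels[i] for i in idx], sub_labels))
    return scores


def toy_cluster(values):
    """Hypothetical stand-in for a real clustering step:
    split 1-D values at their mean."""
    mean = sum(values) / len(values)
    return [int(v > mean) for v in values]


# Two well-separated groups, so every subsample reproduces the same split.
data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
scores = subsample_rand_indices(data, toy_cluster, n_iter=20)
```

In a sketch like this, one list of 20 scores would be produced per clustering resolution, and each list would correspond to one box in the boxplots.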