Extended Data Fig. 4: Comparing TCAT with COMBAT dataset clustering.

a, Fraction of proliferating cells (cell cycle usage > 0.1) assigned to each subset based on the most highly used subset-associated GEPs, for cells from Covid-19 or healthy donors in the two Covid-19 datasets. Bars represent the proportion of cells and error bars represent 95% bootstrap confidence intervals around this proportion. b, Usage of selected cGEPs (columns) in cells (rows) grouped by maximum subset cGEP. Cells are drawn from subclusters with high usage of the ISG cGEP, indicated in the colorbar. c, Same as b, but showing only cells from subclusters with high cytotoxicity cGEP usage. d, Heatmap of pseudobulk expression of marker genes in cytotoxic-high and -low cells and in subset-cGEP-high and -low cells, per sample. Expression is normalized by library size and z-scored across rows. e, Average fraction of polarized cells (usage > 0.1) per gated subset, across samples, within the COMBAT and Flu-Vaccine datasets. f, Pseudobulk expression profiles of selected marker genes in polarization-high and -low cells, shown separately for gated CD4 and CD8 T cells, per sample. Sample expression is normalized by library size and z-scored across rows, for each polarization.
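The two computations named in this legend, a bootstrap confidence interval around a proportion (panel a) and library-size normalization followed by row-wise z-scoring (panels d and f), can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the legend does not say which bootstrap variant was used, how many resamples were drawn, whether cells or donors were resampled, or which scaling factor and standard-deviation convention were applied. The sketch assumes a percentile bootstrap over cells, a per-million scale, and the population standard deviation; the function names `bootstrap_proportion_ci` and `normalize_and_zscore` are hypothetical.

```python
import random
import statistics


def bootstrap_proportion_ci(flags, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for the fraction of True values in `flags`.

    Assumption: cells are the resampling unit; the legend does not specify.
    Returns (point_estimate, lower, upper).
    """
    rng = random.Random(seed)
    n = len(flags)
    point = sum(flags) / n
    # Resample cells with replacement and recompute the proportion each time.
    props = sorted(sum(rng.choices(flags, k=n)) / n for _ in range(n_boot))
    lower = props[int((alpha / 2) * n_boot)]
    upper = props[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return point, lower, upper


def normalize_and_zscore(counts, library_sizes, scale=1e6):
    """Library-size normalize pseudobulk counts, then z-score each gene (row).

    counts: dict mapping gene -> list of per-sample pseudobulk counts.
    library_sizes: per-sample total counts, in the same sample order.
    Assumptions: per-million scaling and population standard deviation.
    """
    out = {}
    for gene, row in counts.items():
        norm = [scale * c / lib for c, lib in zip(row, library_sizes)]
        mu = statistics.fmean(norm)
        sd = statistics.pstdev(norm)
        # A gene with no variation across samples gets a flat row of zeros.
        out[gene] = [(x - mu) / sd if sd > 0 else 0.0 for x in norm]
    return out


# Toy inputs (made up for illustration only).
usages = [0.35] * 30 + [0.02] * 70
flags = [u > 0.1 for u in usages]  # the 0.1 usage threshold comes from the legend
point, lower, upper = bootstrap_proportion_ci(flags)

z = normalize_and_zscore({"GZMB": [10, 20, 30]}, library_sizes=[100, 100, 100])
```

Here `flags` marks cells whose usage exceeds the 0.1 threshold, and each row of `z` has mean 0 and unit variance across samples, which is what makes genes with very different expression levels comparable within one heatmap.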