Fig. 3: Case study I: dsb improves interpretation of cell clusters derived from protein-based and joint mRNA–protein clustering.
From: Normalizing and denoising protein expression data from droplet-based single cell profiling

a UMAP plot of single cells labeled by cluster number (clustering was performed using dsb normalized protein values). b The distribution of protein expression in cluster 4 (highlighted with a gray box in (a)) using CLR (across cells) or dsb for normalization. c Median log + 1 protein levels (left) and CLR transformed across cells (as in (b), right) in cells from cluster 4 versus the level in empty droplets; proteins highlighted in red are comparable in expression to “positive” proteins after log transformation (left) and CLR transformation across cells (right) but are similar to background levels in empty droplets (identity line y = x shown in black). All proteins with median log10 expression >1 but <3.5 after dsb normalization are labeled with the protein name. d Similar to (c), but the y-axis shows the median dsb normalized values; proteins in red (those near the diagonal in (c)) now fall below our uniformly applied dsb positivity threshold of 3.5, reflecting their proximity to mean counts in empty droplets; proteins above the red line have median dsb normalized expression within the highlighted cluster 4 (see (a) and (b)) above 3.5, i.e., 3.5 standard deviations above ambient noise, ± adjustment for the cell-intrinsic technical component. e The dsb normalized value vs. the median value in empty droplets of proteins within a subset of protein-defined clusters. A subset of proteins informative for identifying B cell and dendritic cell subsets are annotated with the protein name within each panel when their dsb value is above 3.5 (red line) and are labeled in red when below 3.5 within each subset. Proteins labeled for B cell subsets (C13: Unswitched B cells, C5: Transitional B cells, C12: Switched B cells) include the B cell proteins CD20, CD19, IgD, and IgM; proteins labeled for the dendritic cell subsets (C16: pDC, C14: mDC) include the innate cell markers CD1c, CD1d, CD34, CD14, CD16, and CD303.
f UMAP plot of the same cells shown in (a), but the UMAP embeddings and clusters here were derived using Seurat’s weighted nearest neighbor (WNN) mRNA-protein multimodal algorithm applied to dsb normalized values. g Similar to (d) but derived using cells from WNN cluster 3; the Pearson correlation coefficient and two-sided p value are shown between median dsb normalized values and the 98th percentile expression value (log10) of the same protein in empty droplets. h Similar to (g) but for CLR normalized values. i Differentially expressed genes (ROC test; see the “Methods” section) for cells in cluster 3 vs. other clusters.
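
The positivity threshold in (d) and (e) can be read as a z-score against empty droplets. The sketch below illustrates only that ambient-correction idea: each protein's log counts are centered and scaled by the mean and standard deviation of the same protein in empty droplets, and the cluster median is compared with 3.5. It is not the dsb package itself, and it omits the cell-intrinsic technical adjustment mentioned in (d). The function names, the natural log with a pseudocount of 10, and the count values are illustrative assumptions.

```python
import math
from statistics import mean, median, stdev

POSITIVITY_THRESHOLD = 3.5  # threshold quoted in the caption


def ambient_zscores(cell_counts, empty_counts, pseudocount=10):
    """Rescale one protein's counts in cells against empty droplets.

    Hypothetical helper: the log transform and pseudocount are assumptions,
    and the cell-intrinsic technical correction is deliberately left out.
    """
    log_empty = [math.log(c + pseudocount) for c in empty_counts]
    mu = mean(log_empty)   # ambient mean for this protein
    sd = stdev(log_empty)  # ambient standard deviation for this protein
    return [(math.log(c + pseudocount) - mu) / sd for c in cell_counts]


def is_positive_in_cluster(cell_counts, empty_counts):
    """True when the cluster median sits above the positivity threshold."""
    return median(ambient_zscores(cell_counts, empty_counts)) > POSITIVITY_THRESHOLD


# Made-up raw counts for one protein in empty droplets and in two clusters.
empty = [3, 5, 4, 6, 2, 5, 4, 3]
expressing_cluster = [250, 300, 180, 220]  # far above ambient levels
background_cluster = [5, 7, 4, 6]          # close to ambient levels

print(is_positive_in_cluster(expressing_cluster, empty))
print(is_positive_in_cluster(background_cluster, empty))
```

With these made-up counts, the first cluster's median lies many ambient standard deviations above the empty-droplet mean, while the second stays near it. This mirrors the proteins shown in red, which look expressed after log or CLR transformation but track the background.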