Figure 2

CA adaptations to address overdispersion in count data. (A) Table summarizing the standard CA procedure and five adaptations to address overdispersion. The first set (rows 1 and 2) includes methods that involve no transformations apart from computing chi-squared residuals. The second set (rows 3–5) features variance-stabilizing transformations performed on counts prior to standard CA. The third approach (row 6) smooths the chi-squared residual matrix with a minor “power deflation” prior to decomposition with SVD. (B) Table of NNGraph cluster recovery performance achieved by each method (rows) in nine datasets (columns), reporting the maximum ARI across a range of PCs (full results of ARI by PC are shown in Fig. 2C). For glmPCA, ARI values from ten runs were averaged prior to selecting the maximum. The highest ARI (to two decimal places) in each dataset is circled, and the cell clusters in the original datasets are used as the reference groupings. Freeman–Tukey residuals exhibit the best overall performance, with the highest ARI in 6 of the 9 datasets. (C) Plot of ARI by number of components in each of the nine datasets (same as B), colored by method. Results for glmPCA (gray) include ten seeds.
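
As an illustration of the row 1 and row 6 procedures in panel A, the following is a minimal NumPy sketch, not the authors' implementation. It computes chi-squared residuals from a count matrix, optionally applies a signed power transform as one plausible reading of “power deflation”, and decomposes the result with SVD. The function names, the toy Poisson data, and the exponent `alpha=0.9` are all assumptions made for illustration; the caption does not state the exponent used.

```python
import numpy as np

def chi2_residuals(counts):
    """Standardized (chi-squared) residuals of a count matrix, as in standard CA."""
    counts = np.asarray(counts, dtype=float)
    P = counts / counts.sum()              # observed proportions
    r = P.sum(axis=1, keepdims=True)       # row masses
    c = P.sum(axis=0, keepdims=True)       # column masses
    E = r @ c                              # expected proportions under independence
    return (P - E) / np.sqrt(E)

def power_deflate(resid, alpha=0.9):
    """Signed power transform of the residuals.
    alpha is a hypothetical value chosen for illustration only."""
    return np.sign(resid) * np.abs(resid) ** alpha

def ca_embed(resid, n_components=2):
    """SVD of a residual matrix; returns row coordinates and singular values."""
    U, s, Vt = np.linalg.svd(resid, full_matrices=False)
    return U[:, :n_components] * s[:n_components], s[:n_components]

# Toy data (assumption): 20 features x 30 cells, shifted by 1 so that no
# row or column is empty and every expected proportion is positive.
rng = np.random.default_rng(0)
counts = rng.poisson(5.0, size=(20, 30)) + 1

resid = chi2_residuals(counts)
coords_standard, sv_standard = ca_embed(resid)                  # standard CA
coords_deflated, sv_deflated = ca_embed(power_deflate(resid))   # with power deflation
```

In this sketch the sum of squared residuals equals the total inertia (the Pearson chi-squared statistic divided by the grand total), which gives a quick sanity check on the residual computation.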