Extended Data Fig. 8: Illustrations of percentage variance analysis, comparison to LASSO, and batch effect analysis.
From: Spatiotemporal dissection of the cell cycle with single-cell proteogenomics

a, Percentage variance explained for RNA plotted against the Gini index of each gene (blue, CCD; red, non-CCD). b, For each gene, randomization analysis of the protein IF data (left) and scRNA-seq data (right) was used to determine whether the protein or RNA was CCD (blue) or non-CCD (red). The significance scores on the vertical axis, adjusted for multiple testing, show that nearly all proteins and RNAs differ significantly from random. The predominant cutoff was therefore the requirement of 8% additional percentage variance explained by the cell cycle over random. c, Examples of NFAT5 protein IF data (blue points and trace) and randomizations of cell order in pseudotime (red points and trace). From left to right, these examples produced the minimum, first-quartile, median, third-quartile and maximum percentage variance explained by random fluctuations. d, LASSO analysis of marker genes was overly conservative compared with the pseudotime analysis in this work. Because of its higher false-negative rate for calling CCD genes (top) and proteins (bottom), the cyclic pattern in the UMAP dimensionality reduction that is expected of CCD genes and proteins (left) remains visible among the genes and proteins that LASSO calls non-CCD (right). e, Principal component analysis showed no discernible batch effect between the three plates of 384 cells each; instead, the cell cycle phases roughly assigned by FACS sorting provide clear separation along the first two principal components (PCs). Results are shown before and after filtering out non-cycling cells. f, Comparing the individual batches with the combined RNA-seq data again confirms that no batch effects were present in the RNA-seq data. Each plot shows relative RNA expression (0 to 1, y axes) versus cell division time (0 to 25.3 h, x axes). Line, moving mean; darker shade, 25th to 75th percentile range; lighter shade, 10th to 90th percentile range; points, individual cell data.
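The randomization test behind panels b and c can be illustrated with a minimal sketch. The legend does not define percentage variance explained, so the code assumes it is the fraction of total variance removed by averaging cells within equal-width pseudotime bins. It also assumes Benjamini-Hochberg as the multiple-testing adjustment and a significance level of 0.01. The function names, the bin count and `alpha` are hypothetical; only the 8% gain over random comes from the legend.

```python
import numpy as np


def percent_variance_explained(values, pseudotime, n_bins=100):
    """Fraction of total variance removed by averaging within pseudotime bins.

    Assumption (not from the legend): equal-width bins on pseudotime in [0, 1].
    """
    values = np.asarray(values, dtype=float)
    total_ss = np.sum((values - values.mean()) ** 2)
    if total_ss == 0:
        return 0.0
    bins = np.clip((np.asarray(pseudotime) * n_bins).astype(int), 0, n_bins - 1)
    counts = np.bincount(bins, minlength=n_bins)
    sums = np.bincount(bins, weights=values, minlength=n_bins)
    sumsq = np.bincount(bins, weights=values ** 2, minlength=n_bins)
    filled = counts > 0
    # Sum of squared deviations of each cell from its own bin's mean.
    within_ss = np.sum(sumsq[filled] - sums[filled] ** 2 / counts[filled])
    return float((total_ss - within_ss) / total_ss)


def randomization_test(values, pseudotime, n_perm=500, seed=0):
    """Compare the observed value with values after shuffling cell order."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    observed = percent_variance_explained(values, pseudotime)
    random_pve = np.array([
        percent_variance_explained(rng.permutation(values), pseudotime)
        for _ in range(n_perm)
    ])
    # One-sided empirical p-value; the +1 keeps it strictly above zero.
    p_value = (np.sum(random_pve >= observed) + 1) / (n_perm + 1)
    return observed, random_pve, p_value


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values (assumed correction method)."""
    p = np.asarray(p_values, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity, working down from the largest p-value.
    scaled = np.minimum.accumulate(scaled[::-1])[::-1]
    adjusted = np.empty(n)
    adjusted[order] = np.minimum(scaled, 1.0)
    return adjusted


def call_ccd(observed, random_pve, adj_p, alpha=0.01, min_gain=0.08):
    """CCD if significant and the gain over the random mean exceeds 8 points."""
    return bool(adj_p < alpha and observed - random_pve.mean() > min_gain)
```

With about five cells per bin, even pure noise reaches roughly 20% "variance explained" by chance. This is why the cutoff is stated relative to the randomizations rather than as an absolute value.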
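Panel a uses the Gini index of each gene as its horizontal axis. The legend does not say which values the index is computed over. This sketch assumes it is computed over non-negative expression values across single cells and uses the standard sorted-values closed form. The function name `gini_index` is hypothetical.

```python
import numpy as np


def gini_index(expression):
    """Gini index of non-negative expression values across cells.

    0 means every cell has the same value; values near 1 mean the signal
    is concentrated in a few cells.
    """
    x = np.sort(np.asarray(expression, dtype=float))
    n = x.size
    if n == 0 or x.sum() == 0:
        return 0.0
    ranks = np.arange(1, n + 1)
    # Closed form over sorted values; equivalent to the mean absolute
    # difference between all pairs divided by twice the mean.
    return float(np.sum((2 * ranks - n - 1) * x) / (n * x.sum()))
```

For example, `gini_index([0, 0, 0, 1])` is 0.75, the maximum for four cells, because all expression is concentrated in a single cell.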
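The traces in panel f combine a moving mean with two percentile bands. A minimal sketch of that summary follows. It assumes pseudotime is scaled to [0, 1] and mapped linearly onto the 0 to 25.3 h cell division time, and it uses a fixed-size window of cells sorted by pseudotime. The window size and the function name are hypothetical.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

CELL_DIVISION_HOURS = 25.3  # x-axis range stated in panel f


def moving_summary(pseudotime, values, window=100):
    """Moving mean and 25-75 / 10-90 percentile bands along pseudotime."""
    order = np.argsort(pseudotime)
    # Assumption: pseudotime in [0, 1] maps linearly onto hours.
    t = np.asarray(pseudotime, dtype=float)[order] * CELL_DIVISION_HOURS
    v = np.asarray(values, dtype=float)[order]
    t_win = sliding_window_view(t, window)  # shape (n_cells - window + 1, window)
    v_win = sliding_window_view(v, window)
    p10, p25, p75, p90 = np.percentile(v_win, [10, 25, 75, 90], axis=1)
    return {
        "hours": t_win.mean(axis=1),   # x position of each window
        "mean": v_win.mean(axis=1),    # the plotted line
        "inner": (p25, p75),           # darker shade
        "outer": (p10, p90),           # lighter shade
    }
```

A window defined by cell count, rather than by hours, keeps every point on the trace based on the same number of cells even where cells are sparse in pseudotime.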
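Panel e judges the batch effect visually in the first two PCs. A minimal sketch of the projection is shown below, together with one way to put a number on that impression. `separation_ratio` is a hypothetical statistic, not from the paper: it gives the share of variance in PC space that is explained by group centroids. It is computed once with plate labels and once with phase labels, on synthetic data that only mimic the three plates of 384 cells.

```python
import numpy as np


def pca_scores(X, n_components=2):
    """Project cells (rows) onto the top principal components via SVD."""
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


def separation_ratio(scores, labels):
    """Share of PC-space variance captured by group means (0 = no separation)."""
    labels = np.asarray(labels)
    grand = scores.mean(axis=0)
    total = np.sum((scores - grand) ** 2)
    between = 0.0
    for g in np.unique(labels):
        members = scores[labels == g]
        between += len(members) * np.sum((members.mean(axis=0) - grand) ** 2)
    return float(between / total)
```

Under the absence of a batch effect described in panel e, the ratio for plate labels should be near 0 and the ratio for FACS-assigned phases should be clearly larger.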