Fig. 3: Detection of simulated OOR cell states. | Nature Genetics

From: Precise identification of cell states altered in disease using healthy single-cell references

a, Illustration of the removed OOR state perturbation. The dashed outlines denote the position of the OOR cell state. b, Performance comparison of reference designs in the detection of OOR cell states. To compare performance taking into account both the log fold change and the confidence (10% spatial FDR), we measured the false discovery rate (FDR), false positive rate (FPR) and true positive rate (TPR). To compare performance using only the log fold change as a prioritization metric, we measured the area under the precision–recall curve (AUPRC). Each point represents a simulation with a different OOR state (eight states, excluding simulations in which fewer than 250 OOR cells were present after splitting the pseudo-disease and control datasets). Tests on the same simulated data are connected. c, Box plots of the AUPRC for detecting OOR cell states with embedding models trained on different sets of 5,000 highly variable genes (HVGs), selected in the atlas dataset, in the control dataset or in the concatenated control and pseudo-disease datasets (control + disease). Colors represent the different reference designs. Tests on the same simulated data are connected. The gray box denotes the type of data used to train the model for each design. d, Illustration of the mixed OOR state perturbation: all simulations have a fixed cell state (classical monocytes) removed from the control and atlas datasets, plus a varying shifted OOR cell state, whose cells are split into two groups based on principal component analysis (PCA); only one of these groups is removed from the atlas and control datasets (shifted OOR state). e, As in b, but with mixed OOR states. f, Bar plots of the AUPRC for OOR state detection with different types of perturbation on the same OOR cell state, colored by reference design. The rightmost plot shows the AUPRC for detecting the shifted OOR cell state, excluding the fixed removed state. Bar heights denote the AUPRC computed on the real data. The error bars indicate the 95% confidence interval (CI) calculated by bootstrapping with 1,000 resampling iterations. Cases in which the CR design outperformed the ACR design when only the OOR state is removed are highlighted by red dashed rectangles. In all box plots, the center line denotes the median; the box limits denote the first and third quartiles; and the whiskers denote 1.5× the interquartile range (IQR).
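
The metrics named in panels b, e and f can be illustrated with a minimal Python sketch. It assumes that each tested unit (for example, a cell neighborhood) carries a ground-truth OOR label, a log fold change and a spatial FDR value. The calling rule (spatial FDR below 10% and positive log fold change), the function names and the simple percentile bootstrap are illustrative assumptions, not the authors' implementation; the AUPRC is computed here as step-wise average precision, one common estimator.

```python
import random
from typing import Dict, List, Sequence, Tuple


def classification_rates(is_oor: Sequence[bool], spatial_fdr: Sequence[float],
                         log_fc: Sequence[float], alpha: float = 0.1) -> Dict[str, float]:
    """TPR, FPR and FDR of OOR calls (hypothetical calling rule, see comment below)."""
    tp = fp = tn = fn = 0
    for truth, q, lfc in zip(is_oor, spatial_fdr, log_fc):
        # Assumed rule: significant at the 10% spatial FDR and enriched in disease.
        called = q < alpha and lfc > 0
        if called and truth:
            tp += 1
        elif called:
            fp += 1
        elif truth:
            fn += 1
        else:
            tn += 1

    def ratio(num: int, den: int) -> float:
        # Report 0.0 when the denominator is empty (e.g. FDR with no calls).
        return num / den if den else 0.0

    return {"TPR": ratio(tp, tp + fn), "FPR": ratio(fp, fp + tn), "FDR": ratio(fp, fp + tp)}


def average_precision(is_oor: Sequence[bool], scores: Sequence[float]) -> float:
    """Step-wise AUPRC: sum over thresholds of (R_n - R_{n-1}) * P_n."""
    n_pos = sum(1 for y in is_oor if y)
    if n_pos == 0:
        raise ValueError("AUPRC is undefined without any OOR units")
    # Rank units by decreasing score (here, the log fold change).
    pairs = sorted(zip(scores, is_oor), key=lambda p: p[0], reverse=True)
    ap, tp, fp, prev_recall, i = 0.0, 0, 0, 0.0, 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Units tied at the same score enter the ranking together.
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1]:
                tp += 1
            else:
                fp += 1
            i += 1
        recall = tp / n_pos
        precision = tp / (tp + fp)
        ap += (recall - prev_recall) * precision
        prev_recall = recall
    return ap


def bootstrap_ci(is_oor: Sequence[bool], scores: Sequence[float], n_boot: int = 1000,
                 level: float = 0.95, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI for the AUPRC, resampling units with replacement."""
    rng = random.Random(seed)
    n = len(is_oor)
    stats: List[float] = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        y = [is_oor[i] for i in idx]
        if not any(y):
            continue  # skip resamples without OOR units (AUPRC undefined)
        stats.append(average_precision(y, [scores[i] for i in idx]))
    if not stats:
        raise ValueError("no valid bootstrap resamples")
    stats.sort()
    tail = (1 - level) / 2
    last = len(stats) - 1
    # Lower and upper percentiles of the bootstrap distribution.
    return stats[int(tail * last)], stats[int((1 - tail) * last)]


if __name__ == "__main__":
    # Toy data (made up for illustration only).
    is_oor = [True, True, False, True, False, False]
    log_fc = [3.0, 2.5, 2.0, 1.0, 0.5, -1.0]
    spatial_fdr = [0.01, 0.05, 0.2, 0.08, 0.03, 0.5]
    print(classification_rates(is_oor, spatial_fdr, log_fc))  # TPR 1.0, FPR 1/3, FDR 0.25
    print(average_precision(is_oor, log_fc))  # approximately 0.917
    print(bootstrap_ci(is_oor, log_fc))
```

Ranking by log fold change alone mirrors the AUPRC comparison in the caption, while the thresholded rates use both the log fold change and the spatial FDR. Resampling whole units with a fixed seed keeps the CI reproducible; a real analysis might instead use a library estimator such as scikit-learn's `average_precision_score`.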
