Fig. 2: Compressed screening identifies compounds with largest effects in a GT setting.
From: Scalable, compressed phenotypic screening using pooled perturbations

a, Overview of CS benchmarking experiments used to assess the morphological impacts of 316 FDA-approved compounds on U2OS cells. b, Overview of the conventional GT screen designed to address a (Methods). c, The effects of the 316 compounds were calculated relative to DMSO controls using the MD, and the drug × feature matrix was clustered to identify GT drug-associated phenotypes. d, One-sided Fisher’s exact enrichments (−log₁₀(P value)) of the features differentially enriched in each GT phenotype (log₂ fold change > 3) from the seven classes of Cell Painting features (five cellular components plus area or shape and neighborhood features). Each of the seven is further broken down on the basis of whether the observation pertains to the entire cell, the cytoplasm only or the nucleus only. The right-hand bar visualizes the mean number of cells per well across all samples in each GT phenotype. e, Heat map (−log₁₀(P value), one-sided permutation test with no correction) showing the top five drugs associated with each of the eight GT clusters. f, Overview of our compressed screening benchmarking experiment, which tested a range of compressions (\(S_{\mathrm{compressed}} = \frac{N \times R}{P}\)) for the same 316 drugs (N) by examining several pool sizes (P) and replications (R) to identify the capabilities and limits of the approach. g, CS experimental overview. h, Analytical approach for inferring the effects of each perturbation using regularized linear regression and solving for the coefficient matrix (β). i, Inferred perturbation effects in a CS (scaled L1 norm, y axis) versus those from the GT screen (MD, x axis) for two replicate runs (r, Pearson correlation; two-sided correlation test: CS run 1, P value < 2.2 × 10⁻¹⁶; CS run 2, P value < 2.2 × 10⁻¹⁶).
j, Correlation of the effect sizes between GT and CS runs across all pool sizes for the perturbations that were significantly associated with any of the phenotypic clusters in the GT screen (one-sided permutation test without multiple hypothesis correction, P value < 0.01). k, Receiver operating characteristic (ROC) curves (false positive rate versus true positive rate) showing the performance of the CS screens in correctly identifying GT hits for each pool size as the deconvolution permutation testing threshold is varied (from 0 to 1 in steps of 0.01). l, Mean scaled L1 norm (y axis) of the perturbations called as hits (scaled L1 norm > 0) in both replicate CSs at each pool size. The drugs plotted are those that resulted in a statistically significant effect in any pool size (one-sided permutation test without multiple hypothesis correction, P value < 0.01). m, False negative rate as a function of pool size in the CS, calculated as the fraction of perturbations with significant MDs in the GT screen that were not recovered in the CS.
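
As a rough illustration of two quantities defined in this caption, the sample count in f and the false negative rate in m, the following is a minimal Python sketch. The function names, the example replication and pool-size values, and the placeholder drug labels are assumptions made for illustration only; they are not taken from the study's code or data.

```python
def compressed_sample_count(n_perturbations, replicates, pool_size):
    """Sample count per panel f: S_compressed = N * R / P."""
    if pool_size <= 0:
        raise ValueError("pool_size must be positive")
    return n_perturbations * replicates / pool_size


def false_negative_rate(gt_hits, cs_hits):
    """Fraction of significant GT perturbations not recovered in the CS (panel m)."""
    gt_hits = set(gt_hits)
    if not gt_hits:
        raise ValueError("at least one GT hit is required")
    missed = gt_hits - set(cs_hits)
    return len(missed) / len(gt_hits)


if __name__ == "__main__":
    # N = 316 drugs comes from the caption; R = 3 and P = 4 are hypothetical values.
    print(compressed_sample_count(316, 3, 4))

    # Placeholder drug labels, not real screen results.
    gt = {"drug_A", "drug_B", "drug_C", "drug_D"}
    cs = {"drug_A", "drug_C", "drug_E"}
    print(false_negative_rate(gt, cs))
```

Note that a CS hit absent from the GT set (here `drug_E`) does not affect the false negative rate; it would count toward the false positive rate plotted in k instead.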