Fig. 5: Replicate aggregation and EMD profiling using global controls. | Communications Biology

From: A statistical framework for high-content phenotypic profiling using cellular feature distributions

a Replicates of treatment samples with sufficient reproducibility are merged to form larger populations (orange curve) for subsequent EMD profiling relative to the global control (black curve). Shown is an example using the nuclear area feature, which responds strongly to treatment with 20 µM vincristine (sample sizes: n_r1 = 348, n_r2 = 386, n_r3 = 366, n_control = 265,638 cells). The empirical CDF (ECDF) curve is shifted to the right in the treatment condition, indicating an overall increase in nuclear area. Inset: area between distributions measured by EMD (gray). b, c PDFs and (insets) CDFs for normalized per-well control cell populations (b) and replicate-merged treatment populations (c). Differences between individual distributions and the global control are measured using the EMD metric; the sample sizes corresponding to each statistical difference are listed in Supplementary Data 3 for 330 control samples (n_c-min = 419, n_c-max = 990 cells) and 455 treatment samples (n_t-min = 52, n_t-max = 2763 cells). d Radial plot of scaled EMD scores for 69 measured features among individual controls (gray lines). The EMD profile is log-transformed and min-max scaled to [0, 1]. The median score of all controls fluctuates between 0.16 and 0.46 (black dashed line). e Radial plot of residual EMD scores of individual controls (gray lines) relative to the median score (black dashed line, zero). The residual score is defined as the difference between the score of an individual control and the median of all controls. The residuals fluctuate around a median of zero, with values between −0.29 and 0.45. The values have been offset by 0.5 to expand the plot for better visualization. f Radial plot of residual scores for vincristine treatment at multiple concentrations, relative to the control median. d–f Feature labels are color-coded by their corresponding cellular components.
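
The quantities named in the caption can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the EMD for a single feature is the area between the two empirical CDFs (as drawn in the inset of panel a), that log-transformation and min-max scaling are applied across the scores of one profile, and that the residual is the scaled score minus the per-feature median over controls. The function names `emd_1d`, `scale_profile`, and `residual_profile`, and the small `eps` guard against log(0), are hypothetical.

```python
import math
from bisect import bisect_right
from statistics import median


def emd_1d(sample_a, sample_b):
    """Area between the empirical CDFs of two 1D samples (1D EMD).

    The samples may have different sizes, e.g. a merged treatment
    population versus a much larger global control.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    grid = sorted(set(a) | set(b))
    area = 0.0
    for left, right in zip(grid, grid[1:]):
        # Both ECDFs are constant on [left, right), so evaluate them at left.
        f_a = bisect_right(a, left) / len(a)
        f_b = bisect_right(b, left) / len(b)
        area += abs(f_a - f_b) * (right - left)
    return area


def scale_profile(emd_scores, eps=1e-12):
    """Log-transform EMD scores, then min-max scale them to [0, 1].

    eps is an assumption of this sketch; it only guards against log(0).
    """
    logged = [math.log(score + eps) for score in emd_scores]
    lo, hi = min(logged), max(logged)
    if hi == lo:
        return [0.0 for _ in logged]
    return [(value - lo) / (hi - lo) for value in logged]


def residual_profile(control_profiles, sample_profile):
    """Subtract the per-feature median of the control profiles from a profile."""
    control_median = [median(column) for column in zip(*control_profiles)]
    return [score - med for score, med in zip(sample_profile, control_median)]
```

In this sketch, a treatment distribution that is shifted to the right of the control by a constant amount yields an EMD equal to that shift, and a control whose scaled profile coincides with the control median has a residual of zero for every feature.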
