Fig. 4: Statistical metric comparison and feature reproducibility.
From: A statistical framework for high-content phenotypic profiling using cellular feature distributions

a Feature reproducibility is assessed by estimating the statistical distance between all pairs of replicates, in both control samples (left) and treatment samples (right). b Hypothetical probability density function (PDF) and cumulative distribution function (CDF) curves for two random samples of the same feature illustrate how the Kolmogorov–Smirnov (KS) distance and the Wasserstein metric (earth mover's distance, EMD) are estimated. c Distributions of the statistical scores computed for all pairwise replicate comparisons are consistent between treatments and controls, with the EMD score showing higher sensitivity in detecting discrepancies. A full summary of replicate pairwise differences is provided in Supplementary Data 2, which lists feature, treatment (compound_concentration), plate, well id, sample size (as cell count), KS score, EMD score, and Z-score. d Features are sorted by their average EMD score across all replicate pairs as an indicator of reproducibility. A high average EMD score indicates higher variation of a feature among replicates (low reproducibility). Features with poor reproducibility (outliers), whose average EMD score exceeds the upper interquartile range (IQR) threshold of 0.65 (upper threshold = upper quartile + 1.5 × IQR), are highlighted in red.
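
The quantities named in panels b and d can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`ks_and_emd`, `mean_pairwise_emd`, `upper_fence`) are hypothetical, the quartiles use linear interpolation, and any feature scaling applied before scoring is not described in this caption and is therefore omitted. The KS distance is taken as the largest vertical gap between the two empirical CDFs, and the one-dimensional EMD as the area between them.

```python
from bisect import bisect_right
from itertools import combinations
from statistics import quantiles


def ecdf_values(sample, grid):
    """Evaluate the empirical CDF of `sample` at every point in `grid`."""
    ordered = sorted(sample)
    n = len(ordered)
    return [bisect_right(ordered, x) / n for x in grid]


def ks_and_emd(sample_a, sample_b):
    """Return (KS distance, 1-D EMD) for two samples of the same feature."""
    # Both empirical CDFs only change at observed values, so the pooled
    # set of observations is a sufficient evaluation grid.
    grid = sorted(set(sample_a) | set(sample_b))
    cdf_a = ecdf_values(sample_a, grid)
    cdf_b = ecdf_values(sample_b, grid)
    gaps = [abs(fa - fb) for fa, fb in zip(cdf_a, cdf_b)]
    # KS distance: largest vertical gap between the two CDFs.
    ks = max(gaps)
    # EMD in one dimension: area between the two step-function CDFs.
    emd = sum(g * (right - left) for g, left, right in zip(gaps, grid, grid[1:]))
    return ks, emd


def mean_pairwise_emd(replicates):
    """Average EMD over all pairs of replicate samples (needs >= 2 replicates)."""
    pairs = list(combinations(replicates, 2))
    return sum(ks_and_emd(a, b)[1] for a, b in pairs) / len(pairs)


def upper_fence(scores):
    """Upper outlier threshold: upper quartile + 1.5 x IQR."""
    # Assumption: quartiles by linear interpolation ("inclusive" method).
    q1, _, q3 = quantiles(scores, n=4, method="inclusive")
    return q3 + 1.5 * (q3 - q1)
```

Under these assumptions, a feature would be flagged as poorly reproducible when its `mean_pairwise_emd` value exceeds `upper_fence` computed over the average EMD scores of all features.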