Fig. 3: Metric learning operates actionably on a fully randomized screen.
From: Deep phenotypic profiling of neuroactive drugs in larval zebrafish

We investigate models trained on the second, fully randomized screen. a Separation of positive and negative motion index (MI) trace pairs from the fully randomized screen with Twin-NN (left), Twin-DN (2nd column), correlation (3rd column), Euclidean (4th column), and Fast-DTW (right) distances. Assessed as in Fig. 2a, the revised deep learning models significantly outperform correlation, Euclidean, and Fast-DTW distances. b Twin-NN and Twin-DN receiver operating characteristic (ROC) performance is similar (AUC = 0.84 and 0.79, respectively) and significantly exceeds that of correlation, Euclidean, and Fast-DTW distances (AUC = 0.66, 0.62, and 0.64, respectively). Notably, models trained with and without Hanning smoothing no longer differ significantly. c Precision-recall curves are consistent with (b). d The Twin-NN model identifies matched drug replicates more effectively than correlation distance, which typically starts to fail beyond one replicate. We compute an all-by-all distance matrix across NT-650 compounds at the individual replicate level and, for each well, determine how many replicate wells of the same compound appear among its 50 most similar wells. We plot the cumulative total of unique drugs (y-axis) against the increasing count of identified replicates (x-axis). The y-axis maximum does not reach the total number of NT-650 compounds because neither method ranks all replicates of every NT-650 compound within the 50 most phenotypically similar wells. Indeed, some compounds are inactive, with replicates indistinguishable from DMSO. Source data are provided in the Source Data File.
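
The replicate-retrieval analysis in panel d can be illustrated with a minimal sketch. It is not the authors' code: the function names `replicates_in_top_k` and `cumulative_drug_counts`, the toy one-dimensional "profiles", and the choice to score each compound by its best-performing well are all assumptions made for illustration. The caption specifies only the all-by-all distance matrix, the top-50 cutoff, and the cumulative plot.

```python
import numpy as np

def replicates_in_top_k(dist, labels, k=50):
    """For each well, count same-compound wells among its k nearest wells."""
    dist = np.asarray(dist, dtype=float).copy()
    labels = np.asarray(labels)
    np.fill_diagonal(dist, np.inf)            # a well never matches itself
    k = min(k, dist.shape[0] - 1)
    nearest = np.argsort(dist, axis=1, kind="stable")[:, :k]
    return (labels[nearest] == labels[:, None]).sum(axis=1)

def cumulative_drug_counts(hits, labels, max_replicates):
    """Unique compounds with at least r replicates recovered, r = 1..max.

    Assumption: a compound is scored by its best well (hypothetical choice).
    """
    best = {}
    for lab, h in zip(np.asarray(labels), hits):
        best[lab] = max(best.get(lab, 0), int(h))
    best_hits = np.array(list(best.values()))
    return [int((best_hits >= r).sum()) for r in range(1, max_replicates + 1)]

# Toy data (hypothetical): three compounds, three wells each.
# "A" replicates cluster tightly, "B" has one outlying well,
# and "C" is scattered, mimicking an inactive compound.
labels = np.array(["A", "A", "A", "B", "B", "B", "C", "C", "C"])
profile = np.array([0.0, 0.1, 0.2, 5.0, 5.1, 30.0, -10.0, 2.5, 15.0])

# All-by-all distance matrix; absolute difference stands in for a
# learned (Twin-NN) or correlation distance.
dist = np.abs(profile[:, None] - profile[None, :])

hits = replicates_in_top_k(dist, labels, k=2)   # k=50 in the caption
curve = cumulative_drug_counts(hits, labels, max_replicates=2)
print(hits.tolist())  # [2, 2, 2, 1, 1, 1, 0, 0, 0]
print(curve)          # [2, 1]
```

In this toy case the curve never reaches three compounds because the scattered compound "C" has no replicate among its nearest wells, which mirrors why the y-axis maximum in panel d falls short of the total number of NT-650 compounds.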