Fig. 5: Learned distance is a good proxy for the target bioactivity profile of DIVERSet compounds.
From: Deep phenotypic profiling of neuroactive drugs in larval zebrafish

In a scaffold-agnostic screening paradigm, we compare motion index (MI) traces of NT-650 query compounds against a screened library of diverse compounds (ChemBridge DIVERSet) using the Twin-NN learned distance and correlation distance, versus a random baseline in which the matched traces are randomly selected. a As an example, the fluoxetine MI trace (purple) from the NT-650 screen agrees well with the traces of the top 5 matched library compounds (gold), ranked by Twin-NN distance. All time series in this plot are scaled to the minimum and maximum of the dataset (0 and 6750 MI units, respectively), and the y-axis shows this normalized 0 to 1 scale. b We use a separate chemical informatics method, the Similarity Ensemble Approach (SEA; refs. 25, 26), to assess the library compound hits. When library compounds are ranked by the similarity of their phenotypes to drugs from the NT-650 screen, we expect the likelihood that SEA target profiles match between a query (NT-650) compound and its closest-match library (DIVERSet) compounds to increase with the quality of the phenotypic distance metric. “Hits” (y-axis) is the number of DIVERSet compounds in a given sample that match their separate SEA profiles. “Sample” (x-axis) is the percentage of the DIVERSet library examined, where the analysis is limited to the top 500 matches from the library. The learned distance metric enriches for SEA hits better than correlation distance and the random baseline across the entire range of the screen. The plot shows 95% confidence intervals computed with the seaborn lineplot function. c Similar to (b), but for specific NT-650 compounds selected by phenotypic strength (see “Methods”). Learned distance outperforms correlation distance and the random baseline for compounds such as pindolol, imetit, and chlorpromazine. Correlation distance shows significantly better enrichment for only one NT-650 compound, MDL 72832 (4th row, 4th column of the grid). All subplots show 95% confidence intervals computed with the seaborn lineplot function.
Source data are provided in the Source Data File.
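The scaling in panel a and the enrichment curve in panel b can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `scale_trace` and `cumulative_hits`, the toy distances, and the boolean hit flags are all hypothetical. It assumes that whether a library compound counts as an SEA hit has already been decided elsewhere. Only the bounds (0 and 6750 MI units) and the top-500 cutoff come from the caption.

```python
# Dataset-wide bounds stated in the caption (MI units).
MI_MIN = 0.0
MI_MAX = 6750.0


def scale_trace(trace, lo=MI_MIN, hi=MI_MAX):
    """Min-max scale an MI trace onto [0, 1] using dataset-wide bounds."""
    span = hi - lo
    return [(value - lo) / span for value in trace]


def cumulative_hits(distances, is_hit, top_n=500):
    """Rank library compounds by ascending distance to the query and
    return the running count of hits among the top_n closest matches.

    `is_hit` is an assumed, precomputed flag per library compound.
    """
    order = sorted(range(len(distances)), key=lambda i: distances[i])[:top_n]
    curve, total = [], 0
    for i in order:
        total += int(is_hit[i])
        curve.append(total)
    return curve


# Toy example (made-up numbers, for illustration only).
distances = [0.9, 0.1, 0.5, 0.3]
is_hit = [False, True, False, True]

print(scale_trace([0, 3375, 6750]))        # [0.0, 0.5, 1.0]
print(cumulative_hits(distances, is_hit))  # [1, 2, 2, 2]
```

A distance metric that places hits near the top of the ranking gives a curve that rises early, which is the sense in which one metric "enriches" better than another. The confidence-interval plotting that the caption attributes to seaborn is omitted here.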