Fig. 1: Schematic of the DEL + ML workflow for hit identification. | npj Drug Discovery

From: Evaluation of DNA encoded library and machine learning model combinations for hit discovery

Three DNA-encoded libraries (DELs), MS10M (MilliporeSigma DEL, 10M compounds), HG1B (HitGen OpenDEL®, 1B compounds), and DD11M (DOS-DEL, 11M compounds), were screened against two proteins, CK1α and CK1δ. Each protein was screened in the presence and absence of a potent inhibitor, resulting in five selection conditions: a beads-only no-target control, CK1α, CK1α+inh, CK1δ, and CK1δ+inh (Methods: DEL screening). DEL screening results were informatically processed to stratify positives (orthosteric binders to CK1α/δ) and negatives (molecules that do not bind CK1α/δ) for training five machine learning (ML) models (Methods: Stratifying enriched DEL molecules and binder types). These models are: Multi-layer Perceptron (MLP), Support Vector Machine (SVM), Random Forest (RF), Extreme Gradient Boosting (XGB), and a graph neural network (ChemProp). All ML models were tested on an independent validation set of known binders to CK1α/δ and then applied to a blind assessment set, a collection of 140K compounds, to predict binders and not-binders (Supplementary Fig. 4; Methods: Validation and blind assessment datasets). A selected set of predicted binders and not-binders was finally tested in a biophysical SPR assay to identify confirmed binders and not-binders (Methods: Protein Production and Assay Methods). This figure was created by Behnoush Hajian and Mirabella Vulikh and has been used with written permission.