Table 1 Validation of ML models on independent sets of known CK1α and CK1δ binders, curated from the literature (“literature hits”) or available in house (“internal hits”)

From: Evaluation of DNA encoded library and machine learning model combinations for hit discovery

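| ML model | DEL | CK1α literature hits (n = 15) | CK1α internal hits (n = 206) | CK1δ literature hits (n = 245) | CK1δ internal hits (n = 231) |
|---|---|---|---|---|---|
| Multi-Layer Perceptron (MLP) | MS10M DEL | 0 | 0 | 0 | 0 |
| | HG1B DEL | 1 | 12 | 25 | 80 |
| | DD11M DEL | 2* | 22 | 55 | 27 |
| Support Vector Machine (SVM) | MS10M DEL | 0 | 0 | 0 | 0 |
| | HG1B DEL | 0 | 0 | 5 | 0 |
| | DD11M DEL | 2* | 7 | 9 | 6 |
| Random Forest (RF) | MS10M DEL | 0 | 0 | 0 | 0 |
| | HG1B DEL | 0 | 0 | 1 | 0 |
| | DD11M DEL | 0 | 0 | 0 | 0 |
| Extreme Gradient Boosting (XGB) | MS10M DEL | 0 | 0 | 0 | 0 |
| | HG1B DEL | 1 | 27 | 8 | 40 |
| | DD11M DEL | 2* | 11 | 0 | 0 |
| Graph Neural Network (ChemProp) | MS10M DEL | 0 | 0 | 1 | 3 |
| | HG1B DEL | 2* | 105* | 88 | 124* |
| | DD11M DEL | 0 | 3 | 122* | 39 |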

  1. Each value is the number of known binders for the respective target protein that were correctly predicted as binders by the ML model trained on the corresponding DEL data; the total number of known binders in each set is given in parentheses in the column header. Molecules were represented as 2048-bit Morgan fingerprints for the MLP, SVM, RF, and XGB models and as graph neural network-based features for the ChemProp model (Methods: Feature representation).
  2. Bold indicates the best performance of an ML model across the three DELs.
  3. * indicates the best overall performance by a DEL + ML combination for a dataset.
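As a rough cross-check of footnotes 1 and 3, the counts can be converted into recovery fractions (correctly predicted binders divided by the number of known binders in the set) and the top DEL + ML combination per dataset can be identified, keeping ties. The following is a minimal Python sketch that assumes only the counts transcribed from Table 1; the names `DATASETS`, `COUNTS`, and `best_combinations`, and the ASCII keys `CK1a`/`CK1d` (for CK1α/CK1δ), are hypothetical and not part of the original study.

```python
# Hypothetical transcription of Table 1 (ASCII "CK1a"/"CK1d" stand for CK1α/CK1δ).
# Values: number of known binders in each validation set (column header counts).
DATASETS = {
    "CK1a_literature": 15,
    "CK1a_internal": 206,
    "CK1d_literature": 245,
    "CK1d_internal": 231,
}

# Correctly predicted binders per (model, DEL); tuple order matches DATASETS.
COUNTS = {
    ("MLP", "MS10M"): (0, 0, 0, 0),
    ("MLP", "HG1B"): (1, 12, 25, 80),
    ("MLP", "DD11M"): (2, 22, 55, 27),
    ("SVM", "MS10M"): (0, 0, 0, 0),
    ("SVM", "HG1B"): (0, 0, 5, 0),
    ("SVM", "DD11M"): (2, 7, 9, 6),
    ("RF", "MS10M"): (0, 0, 0, 0),
    ("RF", "HG1B"): (0, 0, 1, 0),
    ("RF", "DD11M"): (0, 0, 0, 0),
    ("XGB", "MS10M"): (0, 0, 0, 0),
    ("XGB", "HG1B"): (1, 27, 8, 40),
    ("XGB", "DD11M"): (2, 11, 0, 0),
    ("ChemProp", "MS10M"): (0, 0, 1, 3),
    ("ChemProp", "HG1B"): (2, 105, 88, 124),
    ("ChemProp", "DD11M"): (0, 3, 122, 39),
}


def best_combinations(counts, datasets):
    """For each dataset, return (top count, recovery fraction, tied (model, DEL) pairs)."""
    result = {}
    for col, (name, total) in enumerate(datasets.items()):
        top = max(row[col] for row in counts.values())
        # Keep every combination that reaches the maximum, mirroring the tied "*" marks.
        winners = sorted(key for key, row in counts.items() if row[col] == top)
        result[name] = (top, top / total, winners)
    return result


if __name__ == "__main__":
    for name, (top, frac, winners) in best_combinations(COUNTS, DATASETS).items():
        print(f"{name}: {top} binders recovered ({frac:.1%}) by {winners}")
```

Keeping all tied combinations reflects the four asterisked entries in the CK1α literature-hits column; the fractions make the columns comparable despite their very different set sizes (15 versus 206 to 245 known binders).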