Table 2 Correlation of reliability metrics to model performance

From: Molecular deep learning at the edge of chemical space

 

                            Rank-based correlation to bin order
Binning metric              Balanced accuracy    Hit rate       Precision
Scaffold sim                0.42 ± 0.06          0.51 ± 0.04    0.50 ± 0.06
Molecular core overlap      0.28 ± 0.07          0.22 ± 0.09    0.25 ± 0.07
Pharmacophore similarity    0.19 ± 0.07          0.37 ± 0.09    0.43 ± 0.08
Embedding distance          0.36 ± 0.06          0.24 ± 0.09    0.29 ± 0.08
Uncertainty                 0.51 ± 0.08          0.62 ± 0.06    0.72 ± 0.04
Unfamiliarity               0.58 ± 0.04          0.52 ± 0.07    0.52 ± 0.05

Correlation (Kendall’s τ) between bin-wise performance metrics and bin order. For each dataset, molecules are grouped into eight bins by one of six metrics: mean pharmacophore similarity to the training set (cosine distance computed on CATS descriptors), mean scaffold similarity to the training set (Tanimoto on ECFPs), mean molecular core overlap with the training set (MCS fraction), Mahalanobis distance of embeddings (z vectors) to the training set, prediction uncertainty, and unfamiliarity. For every metric, bins are ordered from low to high confidence. Values are the mean ± standard error of the mean across all datasets. A correlation of 1.0 indicates perfect model calibration, i.e. performance rises monotonically from the lowest- to the highest-confidence bin. Highest correlations are reported in bold.
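The binning-and-correlation procedure described in this note can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes near-equal-size bins, binary labels, a confidence score where higher means more confident, the tie-corrected tau-b variant of Kendall’s τ, and the sample standard deviation for the standard error. All function names (`split_into_bins`, `bin_order_correlation`, `mean_and_sem`, and so on) are hypothetical.

```python
from itertools import combinations
from math import sqrt
from statistics import mean, stdev


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary (0/1) labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(y_true)
    neg = len(y_true) - pos
    # Assumption: a bin lacking one class scores 0 for that class's recall.
    sens = tp / pos if pos else 0.0
    spec = tn / neg if neg else 0.0
    return (sens + spec) / 2


def split_into_bins(scores, n_bins=8):
    """Index lists for n_bins near-equal-size bins, lowest score first."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    n = len(order)
    return [order[k * n // n_bins:(k + 1) * n // n_bins] for k in range(n_bins)]


def kendall_tau(x, y):
    """Kendall's tau-b (corrects for tied pairs in either sequence)."""
    concordant = discordant = ties_x = ties_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        ties_x += dx == 0
        ties_y += dy == 0
        if dx * dy > 0:
            concordant += 1
        elif dx * dy < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    denom = sqrt((n_pairs - ties_x) * (n_pairs - ties_y))
    # tau is undefined when one sequence is constant.
    return (concordant - discordant) / denom if denom else float("nan")


def bin_order_correlation(confidence, y_true, y_pred, n_bins=8):
    """Kendall's tau between bin order and bin-wise balanced accuracy."""
    bins = split_into_bins(confidence, n_bins)
    per_bin = [
        balanced_accuracy([y_true[i] for i in b], [y_pred[i] for i in b])
        for b in bins
    ]
    return kendall_tau(list(range(n_bins)), per_bin)


def mean_and_sem(values):
    """Mean and standard error of the mean across datasets."""
    return mean(values), stdev(values) / sqrt(len(values))


# Toy data: 16 molecules; predictions are only correct in the confident half.
confidence = list(range(16))
y_true = [1, 0] * 8
y_pred = [0, 0] * 4 + [1, 0] * 4
tau = bin_order_correlation(confidence, y_true, y_pred)
```

Under these assumptions, one τ value is computed per dataset and the per-dataset values are then summarised with `mean_and_sem`. Hit rate or precision could be substituted for `balanced_accuracy` in the same way.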