Table 2 Performance of each descriptor on molecular property prediction (Summary)

From: Difficulty in chirality recognition for Transformer architectures learning chemical structures from string representations

| Descriptor | Steps | ESOL (RMSE) | FreeSolv (RMSE) | Lipophilicity (RMSE) | BACE (AUROC) | BBBP (AUROC) | ClinTox CT_TOX (AUROC) | ClinTox FDA_APPROVED (AUROC) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| random |  | 1.060 ± 0.186 | 1.023 ± 0.070 | 1.002 ± 0.026 | 0.497 ± 0.025 | 0.482 ± 0.040 | 0.475 ± 0.038 | 0.467 ± 0.098 |
| ECFP(R = 2) |  | 0.947 ± 0.199 | 0.463 ± 0.065 | 0.749 ± 0.053 | 0.856 ± 0.035 | 0.852 ± 0.035 | 0.875 ± 0.041 | 0.834 ± 0.103 |
| CDDD |  | 0.715 ± 0.272 | 0.320 ± 0.032 | 0.677 ± 0.038 | 0.826 ± 0.032 | 0.874 ± 0.054 | 0.895 ± 0.016 | 0.882 ± 0.041 |
| Uni-Mol |  | 0.456 ± 0.082 | 0.295 ± 0.032 | 0.505 ± 0.053 | 0.847 ± 0.029 | 0.861 ± 0.042 | 0.874 ± 0.048 | 0.875 ± 0.053 |
| Transformer | 0 | 0.548 ± 0.065 | 0.485 ± 0.053 | 0.897 ± 0.022 | 0.776 ± 0.037 | 0.845 ± 0.067 | 0.859 ± 0.045 | 0.780 ± 0.047 |
| Transformer | 4000 | 0.571 ± 0.151 | 0.398 ± 0.029 | 0.821 ± 0.041 | 0.791 ± 0.029 | 0.862 ± 0.057 | 0.899 ± 0.032 | 0.883 ± 0.051 |
| Transformer | 6000 | 0.566 ± 0.111 | 0.424 ± 0.051 | 0.775 ± 0.030 | 0.815 ± 0.006 | 0.881 ± 0.032 | 0.869 ± 0.035 | 0.862 ± 0.033 |
| Transformer | 8000 | 0.579 ± 0.074 | 0.464 ± 0.068 | 0.774 ± 0.034 | 0.821 ± 0.019 | 0.875 ± 0.045 | 0.871 ± 0.042 | 0.868 ± 0.066 |
| Transformer | 10000 | 0.570 ± 0.063 | 0.459 ± 0.065 | 0.782 ± 0.034 | 0.829 ± 0.025 | 0.859 ± 0.064 | 0.837 ± 0.062 | 0.819 ± 0.065 |
| Transformer | 16000 | 0.567 ± 0.091 | 0.486 ± 0.064 | 0.804 ± 0.033 | 0.823 ± 0.020 | 0.876 ± 0.047 | 0.876 ± 0.023 | 0.804 ± 0.088 |
| Transformer | 30000 | 0.581 ± 0.091 | 0.497 ± 0.104 | 0.793 ± 0.024 | 0.825 ± 0.012 | 0.863 ± 0.082 | 0.845 ± 0.032 | 0.799 ± 0.049 |
| Transformer | 48000 | 0.594 ± 0.086 | 0.496 ± 0.047 | 0.801 ± 0.033 | 0.824 ± 0.019 | 0.875 ± 0.043 | 0.888 ± 0.025 | 0.850 ± 0.049 |
| Transformer | 80000 | 0.585 ± 0.059 | 0.461 ± 0.058 | 0.771 ± 0.024 | 0.835 ± 0.024 | 0.861 ± 0.062 | 0.904 ± 0.024 | 0.894 ± 0.046 |

  1. Bold figures are the best score for each dataset among the models.