Table 1 Reported performances of the thermodynamic solubility models published from 1997 to 2007.

From: Will we ever be able to accurately predict solubility?

| Year | Reference | Descriptors | Size | Dataset | Method | RMSE | R² |
|---|---|---|---|---|---|---|---|
| 1997 | Huuskonen et al. [87] | Electrotopological / Topological | 83 | Literature | ANN | — | 0.84 |
| 2000 | Huuskonen et al. [16] | Structural | 694 | Kühne et al. | MLR | — | 0.67 |
|  |  |  |  |  |  |  | 0.87 |
|  |  |  |  |  | ANN | 0.85 | 0.84 |
| 2001 | Tetko et al. [23] | Molconn-Z | 1,291 | Huuskonen et al. | MLR | 0.81 | 0.85 |
|  |  |  |  |  | ANN | 0.66 | 0.9 |
| 2001 | Ran et al. [42] | Melting Point / cLogP | 380 | AquaSol | GSE | 0.76 | — |
| 2001 | Bruneau [88] | 2D / 3D / Charge / Katritzky | 2,233 | Huuskonen et al. | ANN | 0.82 | — |
| 2001 | Liu et al. [89] | 2D Topological | 1,312 | Huuskonen et al. | ANN | 0.71 | — |
| 2002 | Klamt et al. [90] | QM | 257 | QikProp dataset | MLR | 0.61 | — |
| 2002 | Engkvist et al. [91] | 1D/2D Descriptors | 1,290 | Huuskonen et al. | ANN | — | 0.95 |
| 2002 | Chen et al. [92] | Dipole, PSA, Vol, MW, Rot. & H-acc/don and D | 321 | Literature | MLR | 0.86 | 0.71 |
| 2003 | Wegner & Zell [93] | 2D Topological | 1,290 | Huuskonen et al. | ANN | 0.54 | — |
| 2003 | Cheng & Merz [94] | Cerius | 2,440 | AquaSol, PhysProp, Merck Index, PDR, CMC | MLR-GA | 1.01 | — |
| 2003 | Yan & Gasteiger [95] | PETRA | 1,293 | Huuskonen et al. | MLR | — | 0.89 |
|  |  |  |  |  | ANN |  | 0.94 |
| 2003 | Lind & Maltseva [96] | Electrostatic, QM & topological | 1,296 | Huuskonen et al. | SVM | 0.68 | 0.89 |
| 2004 | Yan et al. [97] | PETRA | 2,084 | Huuskonen et al. | ANN | — | 0.94 |
| 2004 | Hou et al. [98] | 2D Topological | 1,299 | Huuskonen et al. | MLR | — | 0.9 |
| 2004 | Fröhlich et al. [99] | MOE & JOElib | 1,297 | Huuskonen et al. | SVM | — | 0.9 |
| 2004 | Votano et al. [100] | Fragments & Counts | 4,115 | AquaSol, PhysProp, PDR, Taskinen, Tetko, Lobell | MLR & PLS | — | 0.84 |
|  |  |  |  |  | ANN |  | 0.84 |
|  |  |  | 1,840 |  | ANN |  | 0.86 |
| 2004 | John S. Delaney [19] | cLogP, MW & Count | 2,874 | Abraham, Pesticide Manual, Syngenta | ESOL | — | 0.55 |
| 2005 | Matthew Clark [101] | 2D descriptors | 3,724 | PhysProp | PLS | — | 0.84 |
| 2005 | Catana et al. [102] | MOE, E-state & ISIS key | 1,107 | Pfizer proprietary & Public | PLS | 0.48 | 0.94 |
|  |  |  |  |  | Non-Linear PLS |  |  |
|  |  |  |  |  | NN |  |  |
| 2006 | Hansen et al. [43] | MOE 2D/3D | 4,569 | PhysProp | ANN | 0.97 | 0.94 |
| 2006 | Wassvik et al. [103] | Tm, LogP, Sm, Hm & Molconn-Z | 428 | AstraZeneca | GSE | 0.92 | 0.73 |
|  |  |  |  |  | Mod. GSE | 0.73 | 0.78 |
| 2007 | Wang et al. [10] | 3D Topological, cLogP, MW & Count | 1,878 | Delaney et al., Huuskonen et al., Hou et al. | MLR | 0.74 | 0.9 |
| 2007 | Johnson et al. [45] | VOLSURF | 362 | Literature | MLR & HH | 0.61 | 0.88 |
| 2007 | Schwaighofer et al. [104] | Dragon | 1,290 | Huuskonen et al. | GP | 0.55 | 0.93 |
|  |  |  | 4,597 | Huuskonen et al. & Others |  | 0.55 | 0.91 |

  1. ANN: Artificial Neural Network
  2. ASE: Abraham Solvation Equation
  3. CNN: Convolutional Neural Network
  4. CPANN: Counter-Propagation Artificial Neural Network
  5. DNN: Deep Neural Network
  6. D-GIN: Directed GIN
  7. D-MPNN: Directed-MPNN
  8. GIN: Graph Isomorphism Network
  9. GP: Gaussian Process
  10. GNN: Graph Neural Network
  11. GSE: General Solubility Equation
  12. HH: Henderson-Hasselbalch equation
  13. KNN: Kernel Neural Network
  14. LS-SVM: Least-Squares Support Vector Machine
  15. MAT: Molecule Attention Transformer
  16. MK: Multi Kernel
  17. MLR: Multiple Linear Regression
  18. MLR-GA: Multiple Linear Regression Genetic Algorithm
  19. MPNN: Message Passing Neural Network
  20. NFP: Neural FingerPrint
  21. NL-PLS: Non-Linear Partial Least Squares
  22. PLS: Partial Least Squares
  23. RF: Random Forest
  24. RM: Replacement Method
  25. SMILES: Simplified Molecular Input Line Entry System
  26. SNN: Shallow Neural Network
  27. SR: Stepwise Regression
  28. SVM: Support Vector Machine
  29. SVR: Support Vector Regression
  30. TE: Theoretical Equation
  31. UG-RNN: Undirected Graph Recurrent Neural Network
  32. CR: Contracted Ring
  33. LMO: Leave-Many-Out
  34. LOO: Leave-One-Out
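
The RMSE and R² columns of Table 1 can be read against the usual definitions of these two metrics. The sketch below is only an illustration under stated assumptions: the function names `rmse` and `r2` and the four logS values are hypothetical, and R² is taken here as the coefficient of determination, 1 - SS_res/SS_tot, although some of the cited studies may instead report a squared correlation coefficient.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between observed and predicted values."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical logS values (log units); not taken from any study in Table 1.
observed = [-1.0, -2.0, -3.0, -4.0]
predicted = [-1.5, -2.0, -2.5, -4.0]

print(f"RMSE = {rmse(observed, predicted):.2f}")
print(f"R2   = {r2(observed, predicted):.2f}")
```

With these made-up values, two of the four predictions are off by 0.5 log units, which gives an RMSE of about 0.35 and an R² of 0.9. Both metrics depend on the spread of the data, so values computed on datasets of different size and diversity, as in Table 1, are not directly comparable.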