Table 3 Performance of QSAR classification models for P4up activity in H295R assay.

From: Cheminformatics analysis of chemicals that increase estrogen and progesterone synthesis for a breast cancer hazard assessment

QSAR-P4up model (n structures = 1700)

Model         Acc    bAcc   Se     Sp     MCC    AUC

Tenfold cross-validation (training set, n active = 155, n inactive = 1290)
CART          0.87   0.61   0.29   0.94   0.25   0.69
SVM-rbf       0.89   0.52   0.04   1      0.11   0.52
NN            0.89   0.5    0      1      0      0.71
LDA           0.88   0.62   0.3    0.95   0.28   0.76
DNN           0.83   0.62   0.36   0.89   0.24   0.62
RF            0.99   0.99   0.98   0.99   0.95   0.99
RF balanced   0.76   0.86   1      0.73   0.47   0.86

Fitting (training set, n active = 155, n inactive = 1290)
CART          0.97   0.87   0.75   1      0.83   0.92
SVM-rbf       0.99   0.97   0.95   1      0.97   0.97
NN            1      0.99   0.99   1      0.99   0.99
LDA           0.91   0.71   0.44   0.97   0.49   0.92
DNN           0.52   0.71   0.97   0.46   0.27   0.71
RF            0.99   0.99   0.98   0.99   0.94   0.99
RF balanced   0.76   0.86   1      0.73   0.47   0.86

External validation (test set, n active = 26, n inactive = 229)
CART          0.89   0.66   0.37   0.95   0.35   0.78
SVM-rbf       0.9    0.52   0.04   1      0.18   0.52
NN            0.91   0.56   0.11   1      0.32   0.56
LDA           0.87   0.58   0.22   0.94   0.19   0.76
DNN           0.5    0.67   0.45   0.89   0.21   0.67
RF            0.89   0.66   0.37   0.96   0.37   0.66
RF balanced   0.73   0.8    0.89   0.71   0.39   0.8

Acc: accuracy; bAcc: balanced accuracy; Se: sensitivity; Sp: specificity; MCC: Matthews correlation coefficient; AUC: area under the receiver operating characteristic curve; see “Material and methods”.
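
As a reading aid, the threshold-based metrics in the table can be recomputed from a confusion matrix. The sketch below uses the standard textbook definitions, which are assumed (not confirmed here) to match those in “Material and methods”. The function name `classification_metrics` is illustrative, and AUC is omitted because it needs ranked prediction scores rather than counts. The example counts are an assumption: they treat the rounded Se = 0 and Sp = 1 of the NN model in tenfold cross-validation as exact, which corresponds to a model that predicts every training structure as inactive.

```python
from math import sqrt


def classification_metrics(tp, fn, tn, fp):
    """Compute Acc, bAcc, Se, Sp and MCC from confusion-matrix counts.

    tp/fn: actives predicted active/inactive.
    tn/fp: inactives predicted inactive/active.
    """
    se = tp / (tp + fn)                    # sensitivity (recall on actives)
    sp = tn / (tn + fp)                    # specificity (recall on inactives)
    acc = (tp + tn) / (tp + fn + tn + fp)  # overall accuracy
    bacc = (se + sp) / 2                   # balanced accuracy
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: MCC is set to 0 when the denominator vanishes,
    # e.g. when one class is never predicted.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Acc": acc, "bAcc": bacc, "Se": se, "Sp": sp, "MCC": mcc}


# Assumed counts: all 155 actives and all 1290 inactives predicted inactive.
metrics = classification_metrics(tp=0, fn=155, tn=1290, fp=0)
print({name: round(value, 2) for name, value in metrics.items()})
# {'Acc': 0.89, 'bAcc': 0.5, 'Se': 0.0, 'Sp': 1.0, 'MCC': 0.0}
```

Under that assumption the output reproduces the NN cross-validation row (Acc 0.89, bAcc 0.5, Se 0, Sp 1, MCC 0). It also shows why accuracy alone can mislead on this data set: with roughly 89% inactive structures, a model that never predicts an active still reaches an accuracy of 0.89, whereas bAcc and MCC expose the lack of discrimination.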