Table 2 Statistics on the representative validation set.

From: Predicting the inhibition efficiencies of magnesium dissolution modulators using sparse machine learning models

Model type         Tiny model          Small model         Medium model        Large model
No. of features    3                   5                   63                  1260
Selection method   a     b     c       a     b     c       a     b     c       -
RMSE/pp            50    24    66      51    26    60      49    23    40      38
R²                 0.56  0.94  0.53    0.54  0.94  0.95    0.58  0.94  0.94    0.95
Pearson’s r        0.75  0.97  0.73    0.73  0.97  0.97    0.76  0.97  0.76    0.97
p-value            0.14  0.01  0.16    0.16  0.01  0.01    0.14  0.01  0.08    0.01

1. Root mean squared errors (RMSE, in percentage points, pp), coefficients of determination (R²), Pearson’s correlation coefficients (r) and p-values of the representative validation set predictions for all trained models, by model type and feature selection method (a: ANOVA, b: RFE (recursive feature elimination), c: random selection). 3,5-Dinitrobenzoic acid (compound No. 13) was omitted from the calculation of the statistical values because some of its molecular features lie substantially outside the domain covered by the training data.
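The footnote names four statistics but does not give their formulas. The Python sketch below shows one common way to compute them for a set of validation predictions. It assumes the following conventions:

- RMSE = sqrt(mean squared residual).
- R² = 1 - SS_res/SS_tot.
- The p-value for Pearson’s r is two-sided, taken from Student’s t distribution with n - 2 degrees of freedom.

The paper may use other conventions (for example, R² taken as the squared correlation of a linear fit), so this is an illustration, not a reproduction of the authors' calculation. The function names, the dictionary layout (compound number mapped to inhibition efficiency in %) and the `exclude` parameter are hypothetical. Passing `exclude={13}` mirrors the omission of 3,5-dinitrobenzoic acid.

```python
import math


def t_two_sided_p(t: float, df: int, steps: int = 2000) -> float:
    """Two-sided p-value of a Student's t statistic with df degrees of freedom.

    Substituting u = sqrt(df) * tan(theta) maps the t density on [0, |t|] to
    c * sqrt(df) * cos(theta) ** (df - 1) on [0, atan(|t| / sqrt(df))], a finite
    interval that Simpson's rule integrates accurately even for very large |t|.
    """
    # Normalising constant c of the t density, computed in log space.
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    scale = math.exp(log_c) * math.sqrt(df)
    theta_max = math.atan(abs(t) / math.sqrt(df))  # atan(inf) == pi / 2

    def f(theta: float) -> float:
        return math.cos(theta) ** (df - 1)

    h = theta_max / steps  # steps must be even for Simpson's rule
    total = f(0.0) + f(theta_max)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(i * h)
    central = scale * total * h / 3  # P(0 <= T <= |t|)
    return min(1.0, max(0.0, 1.0 - 2.0 * central))


def validation_stats(measured: dict, predicted: dict, exclude=()) -> dict:
    """RMSE, R^2, Pearson's r and its two-sided p-value (hypothetical helper).

    measured / predicted map a compound number to an inhibition efficiency
    in %, so the RMSE comes out in percentage points. Compounds listed in
    `exclude` (an assumed parameter, e.g. {13}) are left out.
    """
    ids = [k for k in measured if k not in exclude]
    n = len(ids)
    if n < 3:
        raise ValueError("need at least 3 compounds for a p-value (df = n - 2)")
    y = [measured[k] for k in ids]
    y_hat = [predicted[k] for k in ids]
    mean_y, mean_hat = sum(y) / n, sum(y_hat) / n

    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    rmse = math.sqrt(ss_res / n)
    r2 = 1.0 - ss_res / ss_tot  # assumed definition; can be negative

    cov = sum((a - mean_y) * (b - mean_hat) for a, b in zip(y, y_hat))
    var_hat = sum((b - mean_hat) ** 2 for b in y_hat)
    r = cov / math.sqrt(ss_tot * var_hat)

    # t statistic for H0: rho = 0, with n - 2 degrees of freedom.
    df = n - 2
    t = math.inf if abs(r) >= 1.0 else r * math.sqrt(df / (1.0 - r * r))
    return {"rmse": rmse, "r2": r2, "pearson_r": r,
            "p_value": t_two_sided_p(t, df)}
```

A call such as `validation_stats(measured, predicted, exclude={13})` returns all four values for one model. The sketch sticks to the standard library to stay dependency-free. With SciPy available, `scipy.stats.pearsonr` gives the same two-sided p-value for r. Under the assumed definition, R² can differ from r² and can even be negative: perfectly correlated but biased predictions give r = 1 and a low R².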