Table 2 Statistical results and fold error of the best regression models for CLr prediction with or without fu,p.

From: Development of an in silico prediction system of human renal excretion and clearance from chemical structure information incorporating fraction unbound in plasma as a descriptor

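| CR type | Descriptor set | Training or Test | Best model: Method^a | Best model: r² | Best model: RMSE | Best model: Within 2-fold error (%) | Best model: Within 3-fold error (%) | Average: r² |
|---|---|---|---|---|---|---|---|---|
| Reabsorption Type (R) | Without fu,p | Training | RF | 0.48 | 0.56 | - | - | 0.50 |
| Reabsorption Type (R) | Without fu,p | Test | RF | 0.38 | 0.61 | 37.5 (33.3) | 43.8 (33.3) | 0.23 |
| Reabsorption Type (R) | With observed fu,p | Training | RF | 0.71 | 0.44 | - | - | 0.62* |
| Reabsorption Type (R) | With observed fu,p | Test | RF | 0.66 | 0.46 | 56.3 (33.3) | 62.5 (33.3) | 0.53* |
| Reabsorption Type (R) | With predicted fu,p (Model_CLr_R) | Training | PLS | 0.57 | 0.51 | - | - | 0.52* |
| Reabsorption Type (R) | With predicted fu,p (Model_CLr_R) | Test | PLS | 0.52 | 0.54 | 43.8 (16.7) | 50.0 (33.3) | 0.47* |
| Intermediate Type (IM) | Without fu,p | Training | SVM | 0.65 | 0.38 | - | - | 0.65 |
| Intermediate Type (IM) | Without fu,p | Test | SVM | 0.56 | 0.28 | 68.8 (60.0) | 93.8 (90.0) | 0.43 |
| Intermediate Type (IM) | With observed fu,p | Training | RF | 0.95 | 0.17 | - | - | 0.94* |
| Intermediate Type (IM) | With observed fu,p | Test | RF | 0.92 | 0.12 | 100 (100) | 100 (100) | 0.88* |
| Intermediate Type (IM) | With predicted fu,p (Model_CLr_IM) | Training | RF | 0.77 | 0.29 | - | - | 0.82* |
| Intermediate Type (IM) | With predicted fu,p (Model_CLr_IM) | Test | RF | 0.74 | 0.21 | 87.5 (83.3) | 100 (100) | 0.68* |
| Secretion Type (S) | Without fu,p | Training | RF | 0.43 | 0.51 | - | - | 0.46 |
| Secretion Type (S) | Without fu,p | Test | RF | 0.41 | 0.46 | 48.6 (35.0) | 68.6 (60.0) | 0.36 |
| Secretion Type (S) | With observed fu,p | Training | RF | 0.64 | 0.39 | - | - | 0.65* |
| Secretion Type (S) | With observed fu,p | Test | RF | 0.62 | 0.37 | 62.9 (55.0) | 80.0 (75.0) | 0.57* |
| Secretion Type (S) | With predicted fu,p (Model_CLr_S) | Training | RF | 0.60 | 0.42 | - | - | 0.58* |
| Secretion Type (S) | With predicted fu,p (Model_CLr_S) | Test | RF | 0.58 | 0.40 | 57.1 (50.0) | 80.0 (65.0) | 0.46* |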

a: RF, random forest; SVM, support vector machine with radial functions; PLS, partial least squares; RMSE, root mean squared error. *p < 0.05; p-value calculated using the paired t-test with Kappa against the model without fu,p in each CR type.
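
The metrics reported in the table (r², RMSE, and the percentage of compounds within 2-fold or 3-fold error) can be illustrated with a minimal Python sketch. The table does not state the exact definitions used, so the following are assumptions: r² is taken as the squared Pearson correlation, RMSE is computed on whatever scale the inputs are given in, and fold error is taken as the larger of predicted/observed and observed/predicted on the linear scale. The function names and the example values are hypothetical and are not taken from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error on the scale of the inputs (assumed definition)."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Squared Pearson correlation coefficient (assumed definition of r2)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov ** 2 / (var_o * var_p)


def pct_within_fold(observed, predicted, fold):
    """Percentage of compounds whose fold error is at most `fold`.

    Fold error is assumed to be max(pred/obs, obs/pred) for positive,
    linear-scale values.
    """
    hits = sum(1 for o, p in zip(observed, predicted) if max(p / o, o / p) <= fold)
    return 100.0 * hits / len(observed)


# Hypothetical clearance values, for illustration only.
obs = [1.0, 2.0, 4.0, 8.0]
pred = [1.5, 5.0, 3.0, 30.0]

print(pct_within_fold(obs, pred, 2.0))  # fold errors 1.5, 2.5, 1.33, 3.75 -> 50.0
print(pct_within_fold(obs, pred, 3.0))  # -> 75.0
print(r_squared(obs, pred), rmse(obs, pred))
```

If the models were fitted to log-transformed clearance, the inputs to `rmse` and `r_squared` would need to be log-transformed first, while `pct_within_fold` would still take linear-scale values.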