Table 3 Cross-validated (fivefold) linear regression and super learner models for non-regulated disinfection by-products (DBPs) based on routinely monitored parameters as explanatory variables.

From: Insights to estimate exposure to regulated and non-regulated disinfection by-products in drinking water

 

| Analyte | Linear regression: Independent variable(s) | Linear regression: Transformation^a | Linear regression: R² (95% CI) | Linear regression: RMSE (SD) | Super learner: Model^b | Super learner: R² (95% CI) | Super learner: RMSE (SD) |
|---|---|---|---|---|---|---|---|
| Haloacetic acids (HAAs) | | | | | | | |
| DCAA | TCM | | 0.89 (0.79, 0.95) | 2.17 (1.08) | 1 | 0.82 (0.59, 0.92) | 2.02 (1.28) |
| TCAA | TCM | | 0.97 (0.94, 0.98) | 2.12 (0.61) | 3 | 0.97 (0.96, 0.98) | 1.41 (0.36) |
| MBAA | BDCM, DBCM | | 0.50 (0.14, 0.70) | 0.37 (0.06) | 2 | 0.43 (0.11, 0.63) | 0.32 (0.07) |
| DBAA | TCM, BDCM, DBCM | log | 0.65 (0.33, 0.81) | 2.45 (0.78) | 3 | 0.64 (0.43, 0.77) | 2.36 (1.31) |
| TBAA | TCM, BDCM, DBCM | log | 0.43 (0.03, 0.66) | 1.35 (0.26) | 1 | 0.19 (−0.12, 0.41) | 1.48 (0.40) |
| BCAA | BDCM | | 0.35 (0.03, 0.57) | 0.92 (0.25) | 3 | 0.19 (0.01, 0.34) | 0.89 (0.29) |
| BDCAA | TCM | – | 0.82 (0.32, 0.95) | 0.38 (0.27) | 2 | 0.64 (0.32, 0.80) | 0.50 (0.32) |
| DBCAA | DBCM, TBM | sqrt | 0.44 (−0.09, 0.71) | 0.72 (0.21) | 3 | 0.09 (−0.31, 0.38) | 0.78 (0.22) |
| Brominated HAAs^c | TCM, BDCM, DBCM | log | 0.72 (0.50, 0.85) | 3.41 (1.20) | 1 | 0.58 (0.37, 0.72) | 3.67 (0.43) |
| Chlorinated HAAs^d | TCM | | 0.96 (0.92, 0.97) | 3.20 (0.65) | 3 | 0.94 (0.90, 0.97) | 2.94 (1.76) |
| Total HAAs | TCM, BDCM, DBCM, TBM | log | 0.85 (0.77, 0.90) | 5.87 (0.95) | 1 | 0.77 (0.65, 0.85) | 5.83 (0.99) |
| Haloacetonitriles (HANs) | | | | | | | |
| DCAN | BDCM, DBCM | sqrt | 0.97 (0.95, 0.98) | 0.20 (0.07) | 1 | 0.95 (0.92, 0.97) | 0.20 (0.08) |
| BCAN | BDCM, DBCM | sqrt | 0.76 (0.60, 0.86) | 0.09 (0.01) | 3 | 0.63 (0.40, 0.78) | 0.10 (0.03) |
| DBAN | TCM, BDCM, DBCM, TBM | log | 0.89 (0.83, 0.93) | 0.53 (0.18) | 3 | 0.92 (0.86, 0.95) | 0.41 (0.14) |
| Total HANs | BDCM, DBCM | sqrt | 0.65 (0.44, 0.78) | 0.61 (0.12) | 3 | 0.58 (0.38, 0.71) | 0.57 (0.15) |
| Haloketones (HKs) | | | | | | | |
| TCP | TCM, BDCM, DBCM, TBM | log | 0.82 (0.70, 0.89) | 0.29 (0.13) | 1 | 0.81 (0.48, 0.93) | 0.26 (0.15) |
| Other DBPs | | | | | | | |
| Chlorate | DBCM, TBM | sqrt | 0.40 (−0.081, 0.74) | 71.25 (19.96) | 2 | 0.34 (−0.05, 0.58) | 65.95 (21.11) |
| Chlorite | BDCM, TBM | sqrt | 0.77 (0.60, 0.86) | 25.49 (12.1) | 2 | 0.72 (0.45, 0.86) | 23.63 (12.15) |

  1. Models were built using the total number of samples (N = 42) after replacing values below the limit of quantification (<LOQ) with LOQ/2. Bold indicates models with R² > 0.7 and a lower 95% confidence limit > 0.5. All parameters in this table are fivefold cross-validated.
  2. RMSE root mean squared error, DCAA dichloroacetic acid, TCM trichloromethane, TCAA trichloroacetic acid, MBAA monobromoacetic acid, BDCM bromodichloromethane, DBCM dibromochloromethane, DBAA dibromoacetic acid, TBAA tribromoacetic acid, BCAA bromochloroacetic acid, BDCAA bromodichloroacetic acid, DBCAA dibromochloroacetic acid, DCAN dichloroacetonitrile, BCAN bromochloroacetonitrile, DBAN dibromoacetonitrile, TBM bromoform, TCP 1,1,1-trichloropropanone.
  3. ^a Transformation of independent variables.
  4. ^b Model 1 = algorithm library including generalized linear model, Bayesian GLM, random forest, multivariate adaptive regression splines, local polynomial regression, neural network, adaptive polynomial splines; Model 2 = same as Model 1 plus Random Forest algorithm modification; Model 3 = same as Model 2 plus additional screening algorithms for the input variables.
  5. ^c Brominated HAAs include MBAA, DBAA, and TBAA.
  6. ^d Chlorinated HAAs include DCAA, TCAA, BCAA, BDCAA, and DBCAA.
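
The table notes describe two preprocessing and evaluation steps for the linear regression columns: substituting LOQ/2 for values below the LOQ, and fivefold cross-validation of R² and RMSE. The following is a minimal Python sketch of those steps, not the authors' implementation. The function names, the random fold assignment, the use of pooled out-of-fold predictions for R², and the synthetic data are all assumptions made for illustration; the confidence intervals and the super learner models are not reproduced here.

```python
import numpy as np


def substitute_loq(values, loq):
    """Replace values below the limit of quantification (LOQ) with LOQ/2."""
    values = np.asarray(values, dtype=float)
    return np.where(values < loq, loq / 2.0, values)


def fivefold_cv_linear(X, y, transform=None, n_folds=5, seed=0):
    """Cross-validated ordinary least squares (hypothetical helper).

    Returns the R2 of the pooled out-of-fold predictions, plus the mean
    and sample SD of the per-fold RMSE. How the source table pools its
    folds is not stated, so this pooling is an assumption.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    if transform is not None:
        # The transformation applies to the independent variables only.
        X = transform(X)
    design = np.column_stack([np.ones(len(y)), X])  # intercept + predictors

    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)

    preds = np.empty_like(y)
    fold_rmse = []
    for test_idx in folds:
        train = np.ones(len(y), dtype=bool)
        train[test_idx] = False
        coef, *_ = np.linalg.lstsq(design[train], y[train], rcond=None)
        preds[test_idx] = design[test_idx] @ coef
        fold_rmse.append(np.sqrt(np.mean((y[test_idx] - preds[test_idx]) ** 2)))

    ss_res = np.sum((y - preds) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r2 = 1.0 - ss_res / ss_tot
    return r2, float(np.mean(fold_rmse)), float(np.std(fold_rmse, ddof=1))


# Synthetic stand-in data (NOT the study's measurements): 42 samples,
# one predictor playing the role of TCM and one response.
rng = np.random.default_rng(1)
tcm = substitute_loq(rng.uniform(0.0, 40.0, size=42), loq=0.5)
response = 0.4 * tcm + rng.normal(0.0, 1.0, size=42)

r2, rmse_mean, rmse_sd = fivefold_cv_linear(tcm, response)
r2_log, rmse_log, sd_log = fivefold_cv_linear(tcm, response, transform=np.log)
print(f"untransformed: R2={r2:.2f}, RMSE={rmse_mean:.2f} ({rmse_sd:.2f})")
print(f"log predictor: R2={r2_log:.2f}, RMSE={rmse_log:.2f} ({sd_log:.2f})")
```

Because LOQ/2 is strictly positive, the substituted predictors can be passed through `np.log` or `np.sqrt` without producing undefined values, which is one practical reason to substitute before transforming.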