Table 4. Performance measures of automated classification models for the reservoir question item

From: FEW questions, many answers: using machine learning to assess how students connect food–energy–water (FEW) concepts

| Measures | A1a (N = 345) | A2b (N = 417) | B1a (N = 345) | B2a (N = 345) | B3a (N = 345) | B4c (N = 208) |
| --- | --- | --- | --- | --- | --- | --- |
| n (0, 1) | 133, 212 | 259, 158 | 225, 120 | 270, 75 | 321, 24 | 132, 76 |
| Accuracy [95% CI] | 0.919 [0.885, 0.945] | 0.940 [0.913, 0.961] | 0.975 [0.951, 0.988] | 0.986 [0.967, 0.995] | 0.965 [0.940, 0.982] | 0.947 [0.907, 0.973] |
| Cohen’s kappa | 0.829 | 0.838 | 0.943 | 0.957 | 0.687 | 0.825 |
| Specificity | 0.895 | 0.981 | 0.978 | 0.996 | 0.997 | 0.977 |
| Sensitivity | 0.934 | 0.873 | 0.967 | 0.947 | 0.542 | 0.894 |
| F1 score | 0.895 | 0.953 | 0.980 | 0.991 | 0.982 | 0.959 |

| Measures | C1b (N = 433) | C2b (N = 363) | C3a (N = 345) | D1b (N = 365) | D2a (N = 345) |
| --- | --- | --- | --- | --- | --- |
| n (0, 1) | 230, 203 | 241, 122 | 232, 113 | 256, 109 | 342, 3 |
| Accuracy [95% CI] | 0.917 [0.887, 0.941] | 0.857 [0.816, 0.891] | 0.959 [0.933, 0.978] | 0.829 [0.785, 0.867] | 0.992 [0.975, 0.998] |
| Cohen’s kappa | 0.833 | 0.652 | 0.906 | 0.678 | 0 |
| Specificity | 0.935 | 0.971 | 0.987 | 0.953 | 1 |
| Sensitivity | 0.897 | 0.631 | 0.903 | 0.472 | 0 |
| F1 score | 0.923 | 0.900 | 0.970 | 0.892 | 0.996 |

1. N (sample size) and n (0, 1) (the number of responses in class 0 and class 1) refer to the training data used to train each model. The original dataset comprises 345 responses. Sample sizes that differ from 345 reflect the data-manipulation strategies applied to particular rubric bins to improve model performance:
   - Rubric bins denoted by “a” use the basic feature-engineering settings with no extended strategy.
   - Rubric bins denoted by “b” add the extended strategy of dummy responses.
   - Rubric bins denoted by “c” apply data rebalancing before dummy responses are introduced.
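The table does not state how its measures or confidence intervals were computed. The Python below is a hedged sketch that derives each row from a 2 × 2 confusion matrix. It rests on three assumptions. First, specificity and sensitivity are taken to be the recall of class 0 and class 1 respectively. Second, the 95% CI is assumed to be an exact Clopper-Pearson binomial interval; the source does not name the method. Third, the F1 score is assumed to be computed for class 0. The third assumption is an inference from the D2a column, which reports a sensitivity of 0 but an F1 score of 0.996. The function names `binom_pmf`, `clopper_pearson` and `binary_measures` are hypothetical.

```python
from math import exp, lgamma, log, log1p


def binom_pmf(i: int, n: int, p: float) -> float:
    """P(X = i) for X ~ Binomial(n, p), evaluated in log space to avoid overflow."""
    if p <= 0.0:
        return 1.0 if i == 0 else 0.0
    if p >= 1.0:
        return 1.0 if i == n else 0.0
    log_coef = lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
    return exp(log_coef + i * log(p) + (n - i) * log1p(-p))


def clopper_pearson(k: int, n: int, alpha: float = 0.05) -> tuple[float, float]:
    """Exact two-sided CI for a proportion k/n, with each bound found by bisection."""
    target = alpha / 2

    def solve(tail, increasing: bool) -> float:
        lo, hi = 0.0, 1.0
        for _ in range(60):
            mid = (lo + hi) / 2
            # If the tail probability is below target on an increasing curve,
            # the root lies to the right of mid (and vice versa for a decreasing one).
            if (tail(mid) < target) == increasing:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    # Lower bound solves P(X >= k) = alpha/2; this tail grows with p.
    lower = 0.0 if k == 0 else solve(
        lambda p: sum(binom_pmf(i, n, p) for i in range(k, n + 1)), increasing=True)
    # Upper bound solves P(X <= k) = alpha/2; this tail shrinks as p grows.
    upper = 1.0 if k == n else solve(
        lambda p: sum(binom_pmf(i, n, p) for i in range(k + 1)), increasing=False)
    return lower, upper


def binary_measures(tn: int, fp: int, fn: int, tp: int) -> dict[str, float]:
    """Table-4-style measures from a 2x2 confusion matrix (class 1 = 'positive')."""
    n = tn + fp + fn + tp
    actual0, actual1 = tn + fp, fn + tp  # true class sizes, i.e. n (0, 1)
    pred0, pred1 = tn + fn, fp + tp      # predicted class sizes
    accuracy = (tn + tp) / n
    # Cohen's kappa: observed agreement corrected for the chance agreement
    # implied by the marginal class proportions.
    p_chance = (actual0 * pred0 + actual1 * pred1) / n**2
    kappa = (accuracy - p_chance) / (1 - p_chance)
    ci_lower, ci_upper = clopper_pearson(tn + tp, n)
    return {
        "accuracy": accuracy,
        "ci_lower": ci_lower,
        "ci_upper": ci_upper,
        "kappa": kappa,
        "specificity": tn / actual0,  # recall of class 0
        "sensitivity": tp / actual1,  # recall of class 1
        # F1 with class 0 as the reference class (an inference from the D2a column).
        "f1_class0": 2 * tn / (2 * tn + fn + fp),
    }


# D2a: n (0, 1) = 342, 3; specificity 1 and sensitivity 0 imply every
# response was predicted as class 0.
for name, value in binary_measures(tn=342, fp=0, fn=3, tp=0).items():
    print(f"{name}: {value:.3f}")
```

The D2a counts follow directly from the table: 342 responses in class 0 and 3 in class 1, all predicted as class 0. Applied to these counts, the sketch reproduces the column's κ of 0, specificity of 1, sensitivity of 0 and F1 of 0.996. It also gives an interval of roughly [0.975, 0.998]. This shows why Cohen's kappa and sensitivity are reported next to accuracy for heavily imbalanced bins: a model that never predicts class 1 can still score near-perfect accuracy.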