Table 4. Performance measures of the automated classification models for the reservoir question item.

Measures	A1^a (N = 345)	A2^b (N = 417)	B1^a (N = 345)	B2^a (N = 345)	B3^a (N = 345)	B4^c (N = 208)
Class counts, n (0, 1)	133, 212	259, 158	225, 120	270, 75	321, 24	132, 76
Accuracy [95% CI]	0.919 [0.885, 0.945]	0.940 [0.913, 0.961]	0.975 [0.951, 0.988]	0.986 [0.967, 0.995]	0.965 [0.940, 0.982]	0.947 [0.907, 0.973]
Cohen’s Kappa	0.829	0.838	0.943	0.957	0.687	0.825
Specificity	0.895	0.981	0.978	0.996	0.997	0.977
Sensitivity	0.934	0.873	0.967	0.947	0.542	0.894
F1 score	0.895	0.953	0.980	0.991	0.982	0.959

Measures	C1^b (N = 433)	C2^b (N = 363)	C3^a (N = 345)	D1^b (N = 365)	D2^a (N = 345)
Class counts, n (0, 1)	230, 203	241, 122	232, 113	256, 109	342, 3
Accuracy [95% CI]	0.917 [0.887, 0.941]	0.857 [0.816, 0.891]	0.959 [0.933, 0.978]	0.829 [0.785, 0.867]	0.992 [0.975, 0.998]
Cohen’s Kappa	0.833	0.652	0.906	0.678	0
Specificity	0.935	0.971	0.987	0.953	1
Sensitivity	0.897	0.631	0.903	0.472	0
F1 score	0.923	0.900	0.970	0.892	0.996

Note. The sample sizes (N) and subsample sizes, n (0, 1), i.e., the numbers of responses coded 0 and 1, refer to the training data used for machine learning training. The original dataset comprises 345 responses. N and n differ across rubric bins because different data manipulation strategies were used to improve model performance, as marked by the superscripts:
- Rubric bins marked “a” used basic feature engineering settings with no extended strategies.
- Rubric bins marked “b” used the extended strategy of adding dummy responses.
- Rubric bins marked “c” used the extended strategy of rebalancing the data before adding dummy responses.
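
The D2 column shows how accuracy and F1 can stay high while Cohen's kappa and sensitivity fall to 0 on a heavily imbalanced bin. The sketch below works through that arithmetic under two assumptions that the table does not state: the confusion matrix is a hypothetical one in which the model predicts 0 for all 345 responses (inferred from specificity 1 and sensitivity 0), and the F1 score is computed with class 0 as the positive class (inferred because that reading gives a high F1 when sensitivity is 0). The function name `binary_metrics` and its return keys are illustrative only.

```python
def binary_metrics(tn, fp, fn, tp):
    """Compute summary metrics from a 2x2 confusion matrix.

    Sensitivity and specificity treat class 1 as positive.
    F1 is computed for class 0 (an assumption, see the text above).
    """
    n = tn + fp + fn + tp
    accuracy = (tn + tp) / n

    # Recall on the actual 1s and on the actual 0s.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")

    # Cohen's kappa: observed agreement corrected for chance agreement,
    # where chance agreement comes from the row and column marginals.
    p_e = ((tn + fp) * (tn + fn) + (fn + tp) * (fp + tp)) / n**2
    kappa = (accuracy - p_e) / (1 - p_e) if p_e < 1 else float("nan")

    # F1 with class 0 as positive: "hits" are tn, misses are fp,
    # false alarms are fn (actual 1 predicted as 0).
    denom = 2 * tn + fn + fp
    f1_class0 = 2 * tn / denom if denom else float("nan")

    return {
        "accuracy": accuracy,
        "kappa": kappa,
        "specificity": specificity,
        "sensitivity": sensitivity,
        "f1_class0": f1_class0,
    }


# Hypothetical D2 confusion matrix: 342 zeros and 3 ones, all predicted 0.
d2 = binary_metrics(tn=342, fp=0, fn=3, tp=0)
for name, value in d2.items():
    print(f"{name}: {value:.3f}")
```

Under these assumptions the sketch gives sensitivity 0, specificity 1, kappa 0, and a class-0 F1 that rounds to 0.996, in line with the D2 column. Accuracy comes out at 342/345 (about 0.991), close to the reported 0.992. With only 3 positive responses, agreement with the majority class is exactly what chance would produce, which is why kappa is 0 despite the high accuracy.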
