Table 2 Example of the compression of results if there were only two datasets in the methodology base.
From: The quest for the reliability of machine learning models in binary classification on tabular data
Input 2 | Env | Mean-diff | Median-diff | Mean-guess | Median-guess | Mean-disc | Median-disc |
|---|---|---|---|---|---|---|---|
dataset_1_test | Normal | − 1.12 | − 0.20 | 1.62 | 1.42 | − 1.32 | − 0.58 |
dataset_1_prod | Normal | − 2.27 | − 2.23 | 2.02 | 1.58 | − 0.98 | − 0.87 |
dataset_1_test | Unreliable | 1.44 | 1.36 | 1.84 | 1.77 | 1.53 | 1.32 |
dataset_1_prod | Unreliable | 2.18 | 2.12 | 1.12 | 0.12 | 3.02 | 2.70 |
dataset_2_test | Normal | − 1.14 | − 0.71 | 2.32 | 2.12 | 1.12 | 0.58 |
dataset_2_prod | Normal | − 1.42 | − 1.20 | 0.92 | 0.72 | − 2.82 | − 2.58 |
dataset_2_test | Unreliable | 1.84 | 1.21 | 1.22 | 1.16 | 1.72 | 1.48 |
dataset_2_prod | Unreliable | 1.67 | 1.43 | 2.02 | 1.58 | 1.80 | 1.47 |