Table 2 Example of the compression of results if there were only two datasets in the methodology base.

From: The quest for the reliability of machine learning models in binary classification on tabular data

Input 2

Env

Mean-diff

Median-diff

Mean-guess

Median-guess

Mean-disc

Median-disc

dataset_1_test

Normal

− 1.12

− 0.20

1.62

1.42

− 1.32

− 0.58

dataset_1_prod

Normal

− 2.27

− 2.23

2.02

1.58

− 0.98

− 0.87

dataset_1_test

Unreliable

1.44

1.36

1.84

1.77

1.53

1.32

dataset_1_prod

Unreliable

2.18

2.12

1.12

0.12

3.02

2.70

dataset_2_test

Normal

− 1.14

− 0.71

2.32

2.12

1.12

0.58

dataset_2_prod

Normal

− 1.42

− 1.20

0.92

0.72

− 2.82

− 2.58

dataset_2_test

Unreliable

1.84

1.21

1.22

1.16

1.72

1.48

dataset_2_prod

Unreliable

1.67

1.43

2.02

1.58

1.80

1.47

  1. Abbreviations present in the table: prod for production; env to environment; diff for difficult; guess for guessing; disc for discrimination; under for unreliable.