Table 2 Performance evaluation of age-based counterfactual explanations obtained using different approaches. In each case, we report results averaged across 500 test samples.

| Method | Validity ↓ | Sparsity ↓ | Proximity ↓ | Realism ↑ |
| --- | --- | --- | --- | --- |
| Vanilla | 2.49 | 0.06±0.08 | 4.08±0.48 | 1.26±0.10 |
| Mixup | 0.83 | 0.05±0.07 | 3.79±0.52 | 1.28±0.07 |
| UWCC | 0.74 | 0.09±0.03 | 3.81±0.42 | 1.33±0.05 |
| MC dropout | 1.44 | 0.07±0.08 | 4.13±0.29 | 1.26±0.06 |
| Deep ensembles (5 models) | 0.45 | 0.05±0.09 | 3.89±0.32 | 1.32±0.06 |
| TraCE | **0.16** | **0.05±0.03** | **3.66±0.35** | **1.38±0.06** |

Significant values are in bold.
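
The table reports per-sample scores aggregated as mean±standard deviation over the 500 test cases. As a rough illustrative sketch only (not the paper's exact metric definitions), the snippet below shows one common way such counterfactual metrics are computed and aggregated: validity as the gap between the predicted and requested age, sparsity as the fraction of pixels that change, and proximity as the L2 distance to the original image. The `predict_age` model, the change threshold `eps`, and the image shapes are placeholder assumptions, and the realism score is omitted because it typically requires a separate density or reconstruction model.

```python
# Illustrative sketch: batch evaluation of age counterfactuals.
# All definitions here are assumptions, not the paper's exact formulas.
import numpy as np


def predict_age(images):
    """Placeholder age regressor: replace with the trained model."""
    return images.reshape(len(images), -1).mean(axis=1) * 100.0


def evaluate_counterfactuals(x, x_cf, target_age, eps=1e-2):
    """Return per-sample validity, sparsity, and proximity scores.

    validity  : |predicted age of counterfactual - requested age|  (lower is better)
    sparsity  : fraction of pixels changed by more than `eps`      (lower is better)
    proximity : L2 distance between x and x_cf                     (lower is better)
    """
    diff = x_cf - x
    validity = np.abs(predict_age(x_cf) - target_age)
    sparsity = (np.abs(diff) > eps).reshape(len(x), -1).mean(axis=1)
    proximity = np.linalg.norm(diff.reshape(len(x), -1), axis=1)
    return validity, sparsity, proximity


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.random((500, 64, 64))                      # stand-in for 500 test images
    x_cf = x + 0.05 * rng.standard_normal(x.shape)     # stand-in counterfactuals
    target_age = rng.uniform(20, 80, size=500)         # requested ages

    v, s, p = evaluate_counterfactuals(x, x_cf, target_age)
    # Report mean (and std) across the test set, as in the table above.
    print(f"Validity:  {v.mean():.2f}")
    print(f"Sparsity:  {s.mean():.2f}±{s.std():.2f}")
    print(f"Proximity: {p.mean():.2f}±{p.std():.2f}")
```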