Table 1. Performance evaluation of diagnosis-based counterfactual explanations obtained using different approaches. In each case, we report results averaged across 500 test samples; \(\uparrow\) indicates that higher is better and \(\downarrow\) that lower is better.
Method | Validity \(\uparrow\) | Confidence \(\uparrow\) | Sparsity \(\downarrow\) | Proximity \(\downarrow\) | Realism \(\uparrow\) |
---|---|---|---|---|---|
Vanilla | 0.68 | 0.63 ± 0.11 | 0.3 ± 0.17 | 4.59 ± 0.68 | 1.16 ± 0.09 |
Mixup | 0.78 | 0.69 ± 0.17 | 0.27 ± 0.16 | 4.09 ± 0.52 | 1.19 ± 0.13 |
UWCC | 0.79 | 0.75 ± 0.13 | 0.25 ± 0.17 | 4.26 ± 0.63 | 1.16 ± 0.2 |
MC dropout | 0.73 | 0.66 ± 0.16 | 0.34 ± 0.19 | 4.57 ± 0.53 | 1.18 ± 0.16 |
Deep ensembles (5 models) | 0.8 | 0.72 ± 0.09 | 0.29 ± 0.11 | 3.68 ± 0.57 | 1.21 ± 0.12 |
TraCE | 0.87 | 0.81 ± 0.12 | 0.23 ± 0.14 | 3.73 ± 0.51 | 1.33 ± 0.13 |