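Table 1. Performance evaluation of diagnosis-based counterfactual explanations obtained using different approaches. In each case, we report results averaged across 500 test samples; \(\uparrow\) marks metrics where higher is better and \(\downarrow\) marks metrics where lower is better.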

Method	Validity \(\uparrow\)	Confidence \(\uparrow\)	Sparsity \(\downarrow\)	Proximity \(\downarrow\)	Realism \(\uparrow\)
Vanilla	0.68	0.63 ± 0.11	0.3 ± 0.17	4.59 ± 0.68	1.16 ± 0.09
Mixup	0.78	0.69 ± 0.17	0.27 ± 0.16	4.09 ± 0.52	1.19 ± 0.13
UWCC	0.79	0.75 ± 0.13	0.25 ± 0.17	4.26 ± 0.63	1.16 ± 0.2
MC dropout	0.73	0.66 ± 0.16	0.34 ± 0.19	4.57 ± 0.53	1.18 ± 0.16
Deep ensembles (5 models)	0.8	0.72 ± 0.09	0.29 ± 0.11	3.68 ± 0.57	1.21 ± 0.12
TraCE	0.87	0.81 ± 0.12	0.23 ± 0.14	3.73 ± 0.51	1.33 ± 0.13
