Table 3. Accuracy, confidence, and perceived utility scores, stratified by explanation type, for correct and incorrect recommendations.

Recommendation correctness	Explanation type	Accuracy	p-value	Confidence	p-value	Perceived utility	p-value
Correct recommendations	Baseline	0.357 (0.333–0.381)	p = 0.239	3.67 (3.63–3.72)	p = 0.239	N/A	p = 0.343
	No explanation	0.394 (0.355–0.433)		3.64 (3.57–3.72)		3.45 (3.35–3.55)
	Placebo	0.357 (0.318–0.397)		3.66 (3.59–3.74)		3.53 (3.43–3.63)
	Feature based	0.397 (0.356–0.437)		3.62 (3.54–3.69)		3.54 (3.44–3.64)
	Heuristic based	0.390 (0.352–0.428)		3.70 (3.62–3.77)		3.54 (3.44–3.64)
Incorrect recommendations	Baseline	0.357 (0.333–0.381)	p = 0.004*	3.67 (3.63–3.72)	p = 0.155	N/A	p = 0.573
	No explanation	0.298 (0.250–0.345)		3.60 (3.50–3.70)		3.38 (3.23–3.53)
	Placebo	0.311 (0.262–0.359)		3.65 (3.54–3.76)		3.36 (3.21–3.51)
	Feature based	0.262 (0.214–0.310)		3.67 (3.56–3.77)		3.48 (3.34–3.63)
	Heuristic based	0.327 (0.277–0.376)		3.57 (3.46–3.68)		3.36 (3.20–3.52)

p-values were computed using repeated-measures ANOVA; * denotes significance at the 0.05 level.
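The footnote names repeated-measures ANOVA but does not show the computation. As a hedged illustration only, the Python sketch below computes the F statistic for a one-way repeated-measures design. It assumes a complete participants × conditions matrix (for example, one score per participant per explanation type), a single within-subject factor, and no sphericity correction. The function name `rm_anova_oneway` and the toy data are hypothetical, and the authors' actual software, factor structure, and corrections are not specified in the table.

```python
def rm_anova_oneway(data):
    """One-way repeated-measures ANOVA (hypothetical helper, illustration only).

    data: list of rows, one per participant; each row holds that participant's
    score under every condition, in the same condition order for every row.
    Returns (F, df_conditions, df_error). No sphericity correction is applied.
    """
    n = len(data)        # number of participants
    k = len(data[0])     # number of conditions (e.g., explanation types)
    grand = sum(sum(row) for row in data) / (n * k)

    cond_means = [sum(row[j] for row in data) / n for j in range(k)]
    subj_means = [sum(row) / k for row in data]

    # Partition total variability into condition, subject, and residual parts.
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_error = ss_total - ss_cond - ss_subj  # subject-by-condition residual

    df_cond = k - 1
    df_error = (n - 1) * (k - 1)
    f_stat = (ss_cond / df_cond) / (ss_error / df_error)
    return f_stat, df_cond, df_error


# Toy example: 3 hypothetical participants, 3 conditions.
f_stat, df1, df2 = rm_anova_oneway([[1, 2, 3], [3, 2, 4], [2, 5, 5]])
print(f_stat, df1, df2)  # prints 3.0 2 4
```

Removing between-subject variability (`ss_subj`) from the error term is what distinguishes this from an ordinary one-way ANOVA. The p-value would then come from the upper tail of the F distribution with (`df1`, `df2`) degrees of freedom, for example via `scipy.stats.f.sf(f_stat, df1, df2)` if SciPy is available.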
