Table 3 Accuracy, confidence, and utility scores stratified by explanation type for correct and incorrect recommendations.
| Recommendation correctness | Explanation type | Accuracy | p-value | Confidence | p-value | Perceived utility | p-value |
|---|---|---|---|---|---|---|---|
| Correct recommendations | Baseline | 0.357 (0.333–0.381) | 0.239 | 3.67 (3.63–3.72) | 0.239 | N/A | 0.343 |
| | No explanation | 0.394 (0.355–0.433) | | 3.64 (3.57–3.72) | | 3.45 (3.35–3.55) | |
| | Placebo | 0.357 (0.318–0.397) | | 3.66 (3.59–3.74) | | 3.53 (3.43–3.63) | |
| | Feature based | 0.397 (0.356–0.437) | | 3.62 (3.54–3.69) | | 3.54 (3.44–3.64) | |
| | Heuristic based | 0.390 (0.352–0.428) | | 3.70 (3.62–3.77) | | 3.54 (3.44–3.64) | |
| Incorrect recommendations | Baseline | 0.357 (0.333–0.381) | 0.004* | 3.67 (3.63–3.72) | 0.155 | N/A | 0.573 |
| | No explanation | 0.298 (0.250–0.345) | | 3.60 (3.50–3.70) | | 3.38 (3.23–3.53) | |
| | Placebo | 0.311 (0.262–0.359) | | 3.65 (3.54–3.76) | | 3.36 (3.21–3.51) | |
| | Feature based | 0.262 (0.214–0.310) | | 3.67 (3.56–3.77) | | 3.48 (3.34–3.63) | |
| | Heuristic based | 0.327 (0.277–0.376) | | 3.57 (3.46–3.68) | | 3.36 (3.20–3.52) | |