Table 23. Impact of synthetic data proportion on real-world generalization.
| Synthetic ratio | Real test accuracy (95% CI) | Synthetic test accuracy (95% CI) | Generalization gap (95% CI) | Rare pathology sensitivity (95% CI) | False positive rate (95% CI) | Training time (hours) | Model robustness score (mean ± SD, out of 10) | ANOVA F-statistic | p-value |
|---|---|---|---|---|---|---|---|---|---|
| 0% (real only) | 81.4% (79.8–83.0) | N/A | N/A | 67.3% (64.1–70.5) | 4.7% (3.9–5.5) | 8.2 | 6.8 ± 0.4 | – | – |
| 25% synthetic | 89.7% (88.4–91.0) | 91.2% (90.1–92.3) | 1.5% (0.8–2.2) | 78.4% (76.1–80.7) | 3.2% (2.6–3.8) | 10.1 | 7.9 ± 0.3 | 23.47 | < 0.001 |
| 50% synthetic | 94.2% (93.1–95.3) | 95.8% (94.9–96.7) | 1.6% (0.9–2.3) | 84.7% (82.8–86.6) | 2.8% (2.3–3.3) | 12.3 | 8.7 ± 0.2 | 45.82 | < 0.001 |
| 75% synthetic | 96.1% (95.2–97.0) | 97.3% (96.6–98.0) | 1.2% (0.6–1.8) | 89.2% (87.6–90.8) | 2.4% (1.9–2.9) | 14.2 | 9.2 ± 0.2 | 67.94 | < 0.001 |
| 100% synthetic | 92.3% (91.1–93.5) | 96.7% (95.9–97.5) | 4.4% (3.5–5.3) | 85.6% (83.5–87.7) | 3.1% (2.5–3.7) | 15.8 | 8.4 ± 0.3 | 38.12 | < 0.001 |
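
The table does not define the generalization gap, but the reported values are consistent with synthetic test accuracy minus real test accuracy, in percentage points. The following is a minimal sketch that checks this assumed definition against the point estimates in the table; the function name `generalization_gap` and the `rows` structure are illustrative, not taken from the source.

```python
# Point estimates (%) copied from Table 23: (real test acc, synthetic test acc, reported gap).
# The 0% (real only) row is omitted because it has no synthetic test accuracy.
rows = {
    "25% synthetic": (89.7, 91.2, 1.5),
    "50% synthetic": (94.2, 95.8, 1.6),
    "75% synthetic": (96.1, 97.3, 1.2),
    "100% synthetic": (92.3, 96.7, 4.4),
}


def generalization_gap(real_acc: float, synthetic_acc: float) -> float:
    """Assumed definition: synthetic minus real test accuracy, in percentage points."""
    return round(synthetic_acc - real_acc, 1)


# Recompute the gap for every row and compare it with the reported value.
computed = {
    name: generalization_gap(real, synthetic)
    for name, (real, synthetic, _reported) in rows.items()
}

for name, (_real, _synthetic, reported) in rows.items():
    status = "matches" if computed[name] == reported else "differs from"
    print(f"{name}: computed gap {computed[name]} {status} reported {reported}")
```

Under this assumed definition, the gap stays below 2 percentage points up to 75% synthetic data and rises to 4.4 points when training on synthetic data alone.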