Table 3 Synthetic data evaluation taxonomy
From: A scoping review of privacy and utility metrics in medical synthetic data
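| Evaluation Dimensions | Family of Methods | General Evaluation Method |
|---|---|---|
| Broad Utility (Fidelity) | Univariate Similarity | Element-wise error |
| Broad Utility (Fidelity) | Univariate Similarity | 1-way marginals distributional similarity |
| Broad Utility (Fidelity) | Bivariate Similarity | Correlation-based similarity |
| Broad Utility (Fidelity) | Bivariate Similarity | Association-based similarity |
| Broad Utility (Fidelity) | Bivariate Similarity | 2-way marginals distributional similarity |
| Broad Utility (Fidelity) | Multivariate Similarity | Dimensionality reduction comparison |
| Broad Utility (Fidelity) | Multivariate Similarity | Clustering similarity |
| Broad Utility (Fidelity) | Multivariate Similarity | Distinguishability |
| Broad Utility (Fidelity) | Multivariate Similarity | Multivariate distributional similarity |
| Broad Utility (Fidelity) | Longitudinal Similarity | Correlation-based similarity |
| Broad Utility (Fidelity) | Longitudinal Similarity | Structural comparison |
| Narrow Utility | Replication of Predictive Models Performance | ML performance comparison |
| Narrow Utility | Replication of Predictive Models Performance | ML explainability comparison |
| Narrow Utility | Replication of Descriptive Statistics | Confidence interval overlap |
| Narrow Utility | Expert Assessment | Qualitative expert assessment |
| Fairness | Statistical Parity of Generated Data | Difference in descriptive statistics between subgroups |
| Fairness | Disparate Impact | Difference in performance for a task between subgroups |
| Privacy | Membership Inference | Record matching |
| Privacy | Membership Inference | Hold-out set distinguishing |
| Privacy | Membership Inference | Distance to real data |
| Privacy | Attribute Inference | Inference based on record matching |
| Privacy | Attribute Inference | Inference based on classification/regression models |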
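
To make two of the taxonomy entries concrete, the sketch below shows one possible reading of "Correlation-based similarity" (Bivariate Similarity) and "Distance to real data" (Membership Inference). It is illustrative only: the function names `correlation_difference` and `distance_to_closest_record`, the choice of Pearson correlation and Euclidean distance, and the toy data are all assumptions made here, not definitions taken from the review.

```python
import numpy as np


def correlation_difference(real, synth):
    """Mean absolute gap between the pairwise Pearson correlations
    of the real and synthetic data (0 means identical correlations)."""
    corr_real = np.corrcoef(real, rowvar=False)
    corr_synth = np.corrcoef(synth, rowvar=False)
    # Compare each variable pair once: upper triangle, diagonal excluded.
    upper = np.triu_indices_from(corr_real, k=1)
    return float(np.mean(np.abs(corr_real[upper] - corr_synth[upper])))


def distance_to_closest_record(real, synth):
    """Euclidean distance from each synthetic row to its nearest real row.
    Very small distances may indicate near-copies of real records."""
    diffs = synth[:, None, :] - real[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    return dists.min(axis=1)


# Toy numeric data (hypothetical): "synthetic" rows are noisy copies of real ones.
rng = np.random.default_rng(0)
real = rng.normal(size=(200, 3))
synth = real + rng.normal(scale=0.1, size=real.shape)

corr_gap = correlation_difference(real, synth)
dcr = distance_to_closest_record(real, synth)
print(f"correlation gap: {corr_gap:.3f}, median distance to real data: {np.median(dcr):.3f}")
```

Both functions assume fully numeric columns on comparable scales; categorical or mixed-type medical data would need an association measure and a distance suited to those types.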