Synthetic data generated by generative artificial intelligence models can serve as a substitute for real patient data. In this Review, Eckardt et al. discuss how synthetic data sets can overcome barriers to data access and sharing, democratize scientific discovery in cancer research, and reduce the costs and failure rates of cancer clinical trials. They also argue that this will only become possible if we overcome the challenges arising from a lack of standardization in training data selection, model evaluation, bias mitigation, privacy preservation and quality assurance.
- Jan-Niklas Eckardt
- Waldemar Hahn
- Jakob Nikolas Kather