Table 2 The reviewed publications on generating synthetic medical records of longitudinal data

From: A review on generative AI models for synthetic medical text, time series, and longitudinal data

| Paper & Year | Study Objective | Case Study | Method | Key Takeaways |
| --- | --- | --- | --- | --- |
| 60, 2023 | Privacy, data scarcity | Hospital visits from the MIMIC-III database | Hierarchical auto-regressive language model | (+) Fidelity of SHR is improved by using a probabilistic and an autoregressive model to estimate longitudinal data at the visit and code levels |
| 61, 2023 | Data scarcity, privacy | Multi-dimensional cancer and type-2 diabetes data | GAN-boosted semi-supervised learning | (+) Utilizes the underlying graphical structure of EHRs |
| 6, 2023 | Privacy, data scarcity | EHR time series for ICU patients | Mixed-type longitudinal GAN | (+) Generates mixed-type time series by effectively capturing the temporal characteristics of the original data |
| 9, 2023 | Privacy | Data on critical care patients admitted to the ICU (e.g., number of visits, diagnoses) from the MIMIC-IV dataset | Variational graph auto-encoder | (+) Generates synthetic patient trajectories from EHRs with graph learning |
| 7, 2023 | Privacy | Longitudinal health records (e.g., age, vital statistics) | RNN | (-) Limited ability to generate long sequences |
| 35, 2022 | Data scarcity | Type-2 diabetes data | Generative Markov-Bayesian-based model | (-) Limited to a single chronic disease and to ICD-10 codes only |
| 62, 2022 | Privacy | Health records of patients with hypertension | GAN | (-) The data inclusion and exclusion criteria could result in selection bias |
| 63, 2022 | Privacy, data imputation | Parkinson’s disease and Alzheimer’s disease | Multi-modal Neural Ordinary Differential Equations | (+) Handles multi-modal data and learns continuous-time trajectories of real data; (-) Limited to static categorical variables |
| 64, 2022 | Privacy | Hospital visits from the MIMIC-III database | GPT-2 | (+) Formulates the generation of heterogeneous EHRs as a text-to-text translation task using LLMs |
| 65, 2022 | Privacy, data imputation | Hospital visits from the MIMIC-III database | DataSifter-II (rule-based method) | (+) Improved privacy of time-varying correlated data by using a generalized linear mixed model and a random effects-expectation maximization tree |
| 8, 2021 | Privacy | Hospital visits from the MIMIC-III database | Bayesian network | (-) Struggles to preserve multivariate relationships in the datasets |
| 66, 2021 | Privacy | Acute kidney injury | GAN | (-) Insufficient evaluation of fidelity and utility |
| 67, 2021 | Privacy | EHRs of patients with type-2 diabetes, heart failure, and hypertension | GAN | (+) Mitigates GAN issues by using a two-step learning method: dependency learning and conditional simulation |
| 36, 2020 | Privacy | Hospital visits from the MIMIC-III database | Adversarial auto-encoder | (+) Adversarially learns both the continuous latent distribution and the discrete data distribution |
| 68, 2020 | Privacy | Chronic heart failure, organ transplantation | cGAN | (+) Improved privacy; the identifiability of the SHR is quantified and used to optimize a cGAN |
| 69, 2019 | Privacy | Hearing loss patients | Bayesian network | (-) Insufficient evaluation of fidelity and utility |
| 70, 2019 | Privacy | Hospital visits from the MIMIC-III database | GAN | (-) Limited to generating discrete synthetic EHRs |