Table 3 The characteristics of the benchmarking datasets
From: A Multifaceted benchmarking of synthetic electronic health record generation models
 | UW Dataset | | VUMC Dataset | |
---|---|---|---|---|
Age | – | | 26.0, 40.3, 55.8 | 41.0 ± 18.7 |
Race | ||||
White | 69.9% | 131,830 | 65.2% | 13,366 |
Black | 7.9% | 14,956 | 8.8% | 1794 |
Asian | 9.4% | 17,646 | 1.9% | 384 |
American Indian or Alaska Native | 1.5% | 2836 | 0.0% | 42 |
Pacific Islander | 0.8% | 1563 | 0.0% | 0 |
Unknown | 10.5% | 19,912 | 24.0% | 4913 |
Gender | ||||
Male | 45.3% | 85,490 | 43.9% | 8990 |
Female | 54.7% | 103,253 | 56.1% | 11,509 |
Medical features for generation | ||||
Binary features | ||||
 # of unique codes | 2662 | | 2581 | |
 Diagnosis (Phecode) | 1736 | | 1269 | |
 Procedure (Category) | 66 | | 67 | |
 Medication (RxNorm Ingredient) | 860 | | 1245 | |
 # of unique codes per patient | 13.0, 30.0, 51.0 | 36.8 ± 31.3 | 6.0, 21.0, 59.0 | 45.3 ± 63.6 |
Continuous features | ||||
 Diastolic pressure | – | | 68.0, 75.0, 82.0 | 75.0 ± 10.7 |
 Systolic pressure | – | | 114.0, 124.0, 136.0 | 125.3 ± 15.9 |
 Pulse | – | | 77.3, 90.0, 104.3 | 91.4 ± 18.6 |
 Temperature | – | | 36.8, 37.1, 37.7 | 37.3 ± 0.6 |
 Pulse Oximetry | – | | 95.1, 97.1, 99.0 | 97.1 ± 2.1 |
 Respirations | – | | 16.0, 18.0, 23.9 | 19.6 ± 4.4 |
 Body Mass Index | – | | 24.4, 30.3, 38.1 | 31.3 ± 8.7 |
Data split for prediction | ||||
Training data | ||||
 Positive label | 3.8% | 4966 | 3.8% | 541 |
 Negative label | 96.2% | 127,158 | 96.2% | 13,808 |
Evaluation data | ||||
 Positive label | 3.8% | 2129 | 4.2% | 260 |
 Negative label | 96.2% | 54,490 | 95.8% | 5609 |
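
The gender percentages and the total number of unique codes in the table can be cross-checked against the raw counts. The following is a minimal sketch in Python, assuming the gender counts and the per-category code counts are transcribed exactly as printed in the table; the dictionary and function names are illustrative and do not come from the paper.

```python
# Counts transcribed from the table; all names here are illustrative.
gender_counts = {
    "UW": {"Male": 85_490, "Female": 103_253},
    "VUMC": {"Male": 8_990, "Female": 11_509},
}
code_counts = {
    "UW": {"Diagnosis": 1736, "Procedure": 66, "Medication": 860},
    "VUMC": {"Diagnosis": 1269, "Procedure": 67, "Medication": 1245},
}


def percentages(counts):
    """Return each category's share of the total, in percent, to one decimal."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 1) for name, n in counts.items()}


# Recompute the gender shares and the total number of unique codes per site.
gender_pct = {site: percentages(c) for site, c in gender_counts.items()}
code_totals = {site: sum(c.values()) for site, c in code_counts.items()}

for site in gender_counts:
    print(site, gender_pct[site], "unique codes:", code_totals[site])
```

Under these assumptions the recomputed values agree with the printed ones: 45.3% and 54.7% with 2662 unique codes for UW, and 43.9% and 56.1% with 2581 unique codes for VUMC.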