Table 3 The characteristics of the benchmarking datasets

	UW Dataset		VUMC Dataset
Age	–		26.0, 40.3, 55.8	41.0 ± 18.7
Race
White	69.9%	131,830	65.2%	13,366
Black	7.9%	14,956	8.8%	1794
Asian	9.4%	17,646	1.9%	384
American Indian or Alaska Native	1.5%	2836	0.2%	42
Pacific Islander	0.8%	1563	0.0%	0
Unknown	10.5%	19,912	24.0%	4913
Gender
Male	45.3%	85,490	43.9%	8990
Female	54.7%	103,253	56.1%	11,509
Medical features for generation
Binary features
# of unique codes	2662		2581
Diagnosis (Phecode)	1736		1269
Procedure (Category)	66		67
Medication (RxNorm Ingredient)	860		1245
# of unique codes per patient	13.0, 30.0, 51.0	36.8 ± 31.3	6.0, 21.0, 59.0	45.3 ± 63.6
Continuous features
Diastolic pressure	–		68.0, 75.0, 82.0	75.0 ± 10.7
Systolic pressure	–		114.0, 124.0, 136.0	125.3 ± 15.9
Pulse	–		77.3, 90.0, 104.3	91.4 ± 18.6
Temperature	–		36.8, 37.1, 37.7	37.3 ± 0.6
Pulse Oximetry	–		95.1, 97.1, 99.0	97.1 ± 2.1
Respirations	–		16.0, 18.0, 23.9	19.6 ± 4.4
Body Mass Index	–		24.4, 30.3, 38.1	31.3 ± 8.7
Data split for prediction
Training data
Positive label	3.8%	4966	3.8%	541
Negative label	96.2%	127,158	96.2%	13,808
Evaluation data
Positive label	3.8%	2129	4.2%	260
Negative label	96.2%	54,490	95.8%	5609

x, y, z denotes the first quartile, median, and third quartile. x ± y denotes the mean and one standard deviation. Where a percentage x% is followed by a count y, y patients account for x% of all patients.
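As an illustration of the notation in the footnote, the following minimal sketch shows one way such summary statistics could be computed. It is not the authors' code: the helper names `summarize` and `percent`, the toy values, the "inclusive" quartile method, and the use of the sample standard deviation are all assumptions made for the example. Only the gender counts are taken from the table.

```python
import statistics


def summarize(values):
    """Return ((q1, median, q3), (mean, sd)) for a list of numbers.

    Assumption: quartiles use the "inclusive" method and sd is the
    sample standard deviation; the table does not state which
    conventions were used.
    """
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (q1, med, q3), (statistics.mean(values), statistics.stdev(values))


def percent(count, total):
    """Share of `count` in `total`, as a percentage rounded to one decimal."""
    return round(100 * count / total, 1)


# Hypothetical toy values, not drawn from either dataset.
toy = [1, 2, 3, 4, 5]
quartiles, (mean, sd) = summarize(toy)
print("x, y, z:", quartiles)
print(f"x ± y: {mean:.1f} ± {sd:.1f}")

# Gender counts for the UW dataset, as reported in the table.
male, female = 85_490, 103_253
total = male + female
print("Male:", percent(male, total), "% Female:", percent(female, total), "%")
```

With the UW gender counts, the computed percentages match the 45.3% and 54.7% reported in the table.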
