Table 1 Summary statistics of phenotypes used in the training dataset.

From: Non-linear machine learning models incorporating SNPs and PRS improve polygenic prediction in diverse human populations

| Characteristic | Black (N = 7601) | Hispanic/Latino (N = 7320) | White (N = 14142) | Overall (N = 29063) |
|---|---|---|---|---|
| Sex | | | | |
| Male | 3066 (40.3%) | 3088 (42.2%) | 6432 (45.5%) | 12586 (43.3%) |
| Female | 4535 (59.7%) | 4232 (57.8%) | 7710 (54.5%) | 16477 (56.7%) |
| Age | | | | |
| Mean (SD) | 50.6 (16.9) | 48.2 (14.3) | 50.2 (16.4) | 49.8 (16.1) |
| Median [Min, Max] | 52.0 [2.00, 93.0] | 49.0 [5.00, 86.0] | 51.0 [3.00, 98.0] | 51.0 [2.00, 98.0] |
| Triglycerides | | | | |
| Mean (SD) | 106 (69.1) | 135 (96.0) | 125 (82.2) | 124 (84.2) |
| Median [Min, Max] | 90.0 [16.0, 1930] | 113 [20.0, 1670] | 106 [17.0, 1600] | 103 [16.0, 1930] |
| Missing | 2598 (34.2%) | 1316 (18.0%) | 3073 (21.7%) | 6987 (24.0%) |
| Total cholesterol | | | | |
| Mean (SD) | 198 (41.8) | 200 (43.2) | 205 (39.2) | 202 (41.0) |
| Median [Min, Max] | 196 [74.0, 450] | 197 [62.0, 526] | 202 [77.8, 594] | 199 [62.0, 594] |
| Missing | 2598 (34.2%) | 1316 (18.0%) | 3073 (21.7%) | 6987 (24.0%) |
| Systolic blood pressure | | | | |
| Mean (SD) | 127 (20.9) | 121 (17.2) | 118 (17.1) | 121 (18.5) |
| Median [Min, Max] | 123 [73.0, 246] | 119 [77.0, 218] | 116 [67.0, 227] | 118 [67.0, 246] |
| Missing | 1944 (25.6%) | 1589 (21.7%) | 2972 (21.0%) | 6505 (22.4%) |
| Sleep duration | | | | |
| Mean (SD) | 6.50 (1.51) | 7.73 (1.52) | 7.09 (1.16) | 7.15 (1.44) |
| Median [Min, Max] | 6.00 [1.00, 16.5] | 7.79 [2.00, 13.4] | 7.00 [1.00, 16.0] | 7.00 [1.00, 16.5] |
| Missing | 2352 (30.9%) | 411 (5.6%) | 4468 (31.6%) | 7231 (24.9%) |
| Height | | | | |
| Mean (SD) | 168 (10.4) | 163 (9.24) | 168 (10.3) | 167 (10.3) |
| Median [Min, Max] | 168 [85.7, 207] | 162 [116, 194] | 168 [94.0, 203] | 166 [85.7, 207] |
| Diastolic blood pressure | | | | |
| Mean (SD) | 109 (44.2) | 90.5 (36.7) | 88.4 (36.2) | 94.3 (39.5) |
| Median [Min, Max] | 85.5 [18.0, 267] | 76.0 [40.0, 256] | 74.7 [18.0, 246] | 77.0 [18.0, 267] |
| Missing | 236 (3.1%) | 9 (0.1%) | 308 (2.2%) | 553 (1.9%) |
| HDL cholesterol | | | | |
| Mean (SD) | 52.4 (14.9) | 49.1 (13.3) | 52.1 (16.0) | 51.4 (15.1) |
| Median [Min, Max] | 50.0 [15.4, 162] | 47.0 [13.0, 141] | 50.0 [9.63, 143] | 49.0 [9.63, 162] |
| Missing | 328 (4.3%) | 7 (0.1%) | 710 (5.0%) | 1045 (3.6%) |
| LDL cholesterol | | | | |
| Mean (SD) | 123 (38.1) | 122 (36.7) | 125 (36.1) | 124 (36.8) |
| Median [Min, Max] | 120 [11.6, 435] | 120 [23.8, 417] | 123 [13.8, 505] | 121 [11.6, 505] |
| Missing | 376 (4.9%) | 143 (2.0%) | 877 (6.2%) | 1396 (4.8%) |
| BMI | | | | |
| Mean (SD) | 30.0 (7.19) | 30.1 (6.29) | 26.3 (4.99) | 28.2 (6.25) |
| Median [Min, Max] | 28.9 [12.7, 91.8] | 29.1 [14.9, 70.3] | 25.6 [11.6, 66.6] | 27.2 [11.6, 91.8] |
| Missing | 6 (0.1%) | 9 (0.1%) | 8 (0.1%) | 23 (0.1%) |

  1. Mean, median, and percentage of missing values for the phenotypes and covariates (sex and age) used in this study. Most missing values for systolic blood pressure, total cholesterol, and triglycerides are due to medication use. All phenotypes are presented for the whole dataset and stratified by race/ethnicity (Black, Hispanic/Latino, and White). Summary statistics for the test dataset are provided in Supplementary Table 3.
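
As an illustration of how per-group statistics of this kind (mean, SD, median, range, and count and percentage of missing values) can be computed, the following is a minimal Python sketch. It is not the authors' code: the function names `summarize` and `summarize_by_group`, the record layout, and the toy values are all hypothetical, and the use of the sample standard deviation is an assumption, since the table does not state which SD estimator was used.

```python
from statistics import mean, median, stdev


def summarize(values):
    """Summarize one phenotype; None marks a missing value (assumed encoding)."""
    observed = [v for v in values if v is not None]
    n_missing = len(values) - len(observed)
    return {
        "mean": mean(observed),
        "sd": stdev(observed),  # sample SD; an assumption, not stated in the table
        "median": median(observed),
        "min": min(observed),
        "max": max(observed),
        "n_missing": n_missing,
        # Percentage of missing values relative to all records in the group
        "pct_missing": round(100 * n_missing / len(values), 1),
    }


def summarize_by_group(records, group_key, phenotype):
    """Return per-group summaries plus an 'Overall' summary across all records."""
    groups = {}
    for rec in records:
        groups.setdefault(rec[group_key], []).append(rec[phenotype])
    table = {name: summarize(vals) for name, vals in groups.items()}
    table["Overall"] = summarize([rec[phenotype] for rec in records])
    return table


# Toy records invented for illustration only; they are not study data.
records = [
    {"group": "A", "bmi": 30.0},
    {"group": "A", "bmi": 28.0},
    {"group": "A", "bmi": None},
    {"group": "B", "bmi": 25.0},
    {"group": "B", "bmi": 27.0},
]
table = summarize_by_group(records, "group", "bmi")
```

Here the percentage of missing values is taken over all records in a group, which matches the table's entries (for example, 2598 of 7601 is 34.2%).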