Table 1. Baseline characteristics of the protected health information-containing dataset from the primary patient cohort at DFCI

From: Empirical evaluation of artificial intelligence distillation techniques for ascertaining cancer outcomes from electronic health records

| Baseline characteristics | N (%) |
| --- | --- |
| Median age in years (IQR) | |
| Age at diagnosis | 56 (48–64) |
| Age at protocol enrollment | 60 (52–67) |
| Age at sequencing | 60.5 (52–68) |
| Gender | |
| Male | 2020 (39.2) |
| Female | 3133 (60.8) |
| Race | |
| White/Caucasian | 4653 (90.2) |
| Asian/Pacific Islander | 149 (2.9) |
| Black/African American | 145 (2.8) |
| Native American/Alaskan Native | 4 (0.07) |
| Others/Multiple race | 76 (1.5) |
| Unknown | 126 (2.4) |
| Ethnicity | |
| Non-Hispanic/Non-Latino | 5024 (97.4) |
| Hispanic/Latino | 129 (2.5) |
| Cancer type | |
| Breast | 1006 (19.5) |
| Lung | 573 (11.1) |
| Ovarian | 539 (10.5) |
| Colon/Rectum | 258 (5.0) |
| Kidney | 193 (3.7) |
| Prostate | 189 (3.6) |
| Pancreas | 161 (3.1) |
| Others | 2234 (43.4) |
| Data subsets | |
| Training | 4121 (79.9) |
| Validation | 518 (10.1) |
| Held-out test | 514 (9.9) |

N, number of participants; IQR, interquartile range.