Table 1 Basic characteristics of LC cases and controls in the discovery and validation sets.

From: Rare deleterious germline variants and risk of lung cancer

 

| Characteristics | TRICL | TRICL | GELCC | COPDGene | TCGA | gnomAD | OncoArray | OncoArray | Affymetrix | Affymetrix | UKB | UKB |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Set | Discovery | Discovery | Validation# | Validation# | Validation# | Validation# | Validation# | Validation# | Validation# | Validation# | Validation# | Validation# |
| Platform | WES | WES | WES | WES | WES | WES + WGS | Genotyping | Genotyping | Exome array | Exome array | Genotyping | Genotyping |
| N (%)& | LC cases, n = 1045 | Controls, n = 885 | LC cases, n = 380 | Controls, n = 318 | LC cases, n = 1015 | Controls, n = 134,187 | LC cases, n = 17,878 | Controls, n = 13,425 | LC cases, n = 5364 | Controls, n = 5724 | LC cases, n = 2166 | Controls, n = 401,453 |
| Ethnicity | | P < 0.0001 | | | | P < 0.0001 | | | | | | |
| White | 909 (87%) | 830 (94%) | 372 (98%) | 318 (100%) | 742 (73%) | 94,134 (70%) | 13,876 (78%) | 11,011 (82%) | 3086 (58%) | 3550 (62%) | 2094 (97%) | 375,894 (94%) |
| Other† | 136 (13%) | 55 (6%) | 6 (2%) | 0 | 273 (27%) | 40,053 (30%) | 210 (1%) | 128 (1%) | 625 (12%) | 652 (11%) | 65 (3%) | 24,055 (6%) |
| Age, yr. | | P = 0.006 | | | | | | | | | | P < 0.0001 |
| Mean (range) | 63 (24–91) | 61 (20–90) | 64 (30–87) | 63 (55–80) | 65 (30–90) | 54 (18–90) | 64 (19–95) | 62 (18–97) | 61 (30–95) | 59 (31–91) | 62 (40–70) | 56 (37–73) |
| <60 yr. | 418 (40%) | 356 (40%) | 102 (27%) | 88 (28%) | 214 (21%) | | 6036 (43%) | 5303 (40%) | 2335 (43%) | 3063 (53%) | 624 (29%) | 242,687 (60%) |
| Sex | | | | | | | | | | | | P < 0.0001 |
| Male | 614 (59%) | 515 (58%) | 232 (61%) | 172 (54%) | 563 (59%) | 73,370 (55%) | 11,147 (62%) | 8274 (62%) | 2930 (55%) | 3125 (55%) | 1182 (55%) | 186,083 (46%) |
| Female | 431 (41%) | 370 (42%) | 171 (45%) | 146 (46%) | 452 (41%) | 60,817 (45%) | 6731 (38%) | 5151 (38%) | 2434 (45%) | 2599 (45%) | 984 (45%) | 215,370 (54%) |
| Smoking | | P < 0.0001 | | | | | | P < 0.0001 | | P < 0.0001 | | P < 0.0001 |
| Never | 125 (12%) | 308 (35%) | 31 (8%) | 0 | 173 (17%) | | 1720 (10%) | 4152 (31%) | 572 (11%) | 1726 (30%) | 203 (10%) | 236,246 (59%) |
| Ever | 918 (88%) | 576 (65%) | 346 (91%) | 318 (100%) | 742 (73%) | | 15,889 (89%) | 8998 (67%) | 4675 (87%) | 3972 (69%) | 1945 (90%) | 163,226 (41%) |
| Mean PY (range) | 42 (0–196) | 23 (0–133) | 46 (0–165) | 54 (10–97) | 42 (0–154) | | 46 (0–315) | 33 (0–260) | 45 (0–231) | 34 (0–218) | 40 (0–220) | 23 (0–301) |
| FHLC | | P < 0.0001 | | | | | | | | | | P < 0.0001 |
| Yes | 506 (48%) | 72 (8%) | 122 (33%) | | | | | | | | 457 (21%) | 49,104 (12%) |
| No | 359 (34%) | 306 (35%) | 258 (67%) | | | | | | | | 1709 (79%) | 352,349 (88%) |
| Histology | | | | | | | | | | | | |
| AD | 459 (44%) | | 182 (48%) | | 577 (57%) | | 6568 (37%) | | 2106 (39%) | | 781 (36%) | |
| SCC | 342 (33%) | | 118 (31%) | | 438 (43%) | | 4284 (24%) | | 1131 (21%) | | 461 (21%) | |
| Other | 244 (23%) | | 80 (21%) | | 0 | | 7026 (39%) | | 2127 (40%) | | 924 (43%) | |

  1. TRICL Transdisciplinary Research in Cancer of the Lung, WES whole-exome sequencing, WGS whole-genome sequencing, LC lung cancer, PY pack-year, FHLC family history of LC (first degree), AD adenocarcinoma, SCC squamous cell carcinoma.
  2. &Numbers do not add up due to missing data.
  3. †Other ethnicities in TRICL (one African control subject and 190 unknown), TCGA (8% African American, 2% East Asian, and 17% unknown), and gnomAD (8.8% African, 7.2% East Asian, 11.4% South Asian, and 2.5% other). Genetic ancestry analysis of TRICL subjects shows that most subjects of “unknown” race were located between the European- and Asian-ancestry clusters (Supplemental Fig. 1). Genetic ancestry analysis of TCGA patients shows that the vast majority of subjects of “unknown” race were primarily of genetic European ancestry (i.e., 90% of TCGA LCs were genetically European) (ref. 82).
  4. #The validation sets include 26,803 LCs and 555,107 controls: (1) Genetic Epidemiology of LC (GELCC) WES data for 380 LCs (258 sporadic LCs and 122 familial LCs (FLC), the latter selected from high-risk LC families with at least two first-degree relatives affected with LC); (2) COPDGene WES data for 318 controls with normal lung function; (3) TCGA (The Cancer Genome Atlas) germline WES data for 1015 LCs; (4) gnomAD (Genome Aggregation Database, v2.1) WES and WGS data for 134,187 non-cancer controls (excluding individuals from cancer cohort studies, such as the TCGA cohort); (5) OncoArray genotyping data for 17,878 LCs vs. 13,425 controls; (6) Affymetrix exome array data for 5364 LCs vs. 5724 controls; (7) UK Biobank (UKB) genotyping data for 2166 LCs vs. 401,453 controls.
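
The validation-set totals quoted in footnote # can be checked by summing the per-cohort counts. The following is a minimal sketch of that arithmetic; the dictionary layout and variable names are illustrative assumptions, and only the counts come from the table.

```python
# Per-cohort counts copied from footnote # of Table 1.
# The data structure and names below are illustrative, not from the paper.
validation_sets = {
    "GELCC": {"cases": 380, "controls": 0},
    "COPDGene": {"cases": 0, "controls": 318},
    "TCGA": {"cases": 1015, "controls": 0},
    "gnomAD": {"cases": 0, "controls": 134_187},
    "OncoArray": {"cases": 17_878, "controls": 13_425},
    "Affymetrix": {"cases": 5364, "controls": 5724},
    "UKB": {"cases": 2166, "controls": 401_453},
}

# Sum LC cases and controls across all seven validation cohorts.
total_cases = sum(c["cases"] for c in validation_sets.values())
total_controls = sum(c["controls"] for c in validation_sets.values())

print(f"Validation LC cases: {total_cases:,}")
print(f"Validation controls: {total_controls:,}")
```

Both sums agree with the totals stated in the footnote (26,803 LCs and 555,107 controls).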