Table 1 Summary information for the datasets from TCGA and the 1000 Genomes Project used in this study.

From: Identification of 12 cancer types through genome deep learning

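| Cancer type | Samples (N) | SNVs files (N) | Age (mean ± s.d.) | Gender: Male (%) | Gender: Female (%) | Race: White (N) | Race: American (N) | Race: Asian (N) | Race: NA (N) | Tumor stage: I (N) | Tumor stage: II (N) | Tumor stage: III (N) | Tumor stage: IV (N) | Tumor stage: NA (N) | Vital status: Alive (N) | Vital status: Deceased (N) | Vital status: NA (N) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| BLCA | 412 | 425 | 73.1 ± 10.5 | 73.79 | 26.21 | 327 | 23 | 44 | 18 | 2 | 131 | 141 | 136 | 2 | 230 | 182 | 0 |
| BRCA | 1044 | 1080 | 67.0 ± 13.1 | 1.05 | 98.95 | 719 | 180 | 59 | 86 | 173 | 588 | 241 | 20 | 22 | 898 | 146 | 0 |
| COAD | 433 | 493 | 74.5 ± 13.6 | 51.97 | 48.03 | 212 | 59 | 11 | 151 | 90 | 166 | 118 | 46 | 13 | 332 | 99 | 2 |
| GBM | 396 | 498 | 63.1 ± 13.2 | 63.36 | 36.64 | 337 | 41 | 6 | 12 | 0 | 0 | 0 | 0 | 396 | 88 | 303 | 5 |
| KIRC | 339 | 376 | 69.1 ± 12.0 | 64.60 | 35.40 | 275 | 52 | 6 | 6 | 193 | 33 | 2 | 69 | 42 | 258 | 81 | 0 |
| LGG | 513 | 530 | 49.6 ± 12.8 | 55.27 | 44.73 | 472 | 22 | 8 | 11 | 0 | 0 | 0 | 0 | 513 | 386 | 126 | 1 |
| LUSC | 497 | 561 | 73.4 ± 9.1 | 73.84 | 26.16 | 348 | 30 | 9 | 110 | 242 | 160 | 84 | 7 | 4 | 279 | 218 | 0 |
| OV | 443 | 610 | 66.6 ± 11.7 | 0 | 100 | 376 | 31 | 14 | 22 | 0 | 0 | 0 | 0 | 443 | 188 | 253 | 2 |
| PRAD | 498 | 503 | 69.0 ± 7.1 | 100 | 0 | 147 | 7 | 2 | 342 | 0 | 0 | 0 | 0 | 498 | 488 | 10 | 0 |
| SKCM | 470 | 472 | 66.1 ± 14.9 | 61.70 | 38.30 | 447 | 1 | 12 | 10 | 77 | 140 | 185 | 23 | 45 | 249 | 221 | 0 |
| THCA | 496 | 504 | 55.5 ± 15.5 | 26.41 | 73.59 | 325 | 27 | 51 | 93 | 331 | 51 | 110 | 2 | 2 | 482 | 14 | 0 |
| UCEC | 542 | 561 | 72.2 ± 11.2 | 0 | 100 | 371 | 119 | 20 | 32 | 0 | 0 | 0 | 0 | 542 | 451 | 91 | 0 |
| IGSR | 1991 | 1991 | | | | | | | | | | | | | | | |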