Table 1 Baseline characteristics of the women diagnosed with breast cancer in 2004 who were included in this study.

From: Performance and efficiency of machine learning algorithms for analyzing rectangular biomedical data

 

|  | Alive, n (%) | Breast cancer, n (%) | CVS, n (%) | Non-breast cancer, n (%) | Other cause, n (%) |
|---|---|---|---|---|---|
| Total | 28,323 (62.82) | 7752 (17.19) | 3120 (6.92) | 1874 (4.16) | 4016 (8.91) |
| Age 65+ years |  |  |  |  |  |
| No | 21,382 (75) | 4727 (61) | 417 (13) | 695 (37) | 849 (21) |
| Yes | 6941 (25) | 3025 (39) | 2703 (87) | 1179 (63) | 3167 (79) |
| Grade (2-tier) |  |  |  |  |  |
| Low | 16,939 (65) | 2686 (41) | 1933 (69) | 1057 (64) | 2489 (69) |
| High | 9265 (35) | 3801 (59) | 849 (31) | 586 (36) | 1103 (31) |
| T category |  |  |  |  |  |
| T1–2 | 25,813 (91) | 4530 (58) | 2645 (85) | 1556 (83) | 3443 (86) |
| T3–4 | 1450 (5) | 2181 (28) | 283 (9) | 156 (8) | 324 (8) |
| Unknown | 1060 (4) | 1041 (13) | 192 (6) | 162 (9) | 249 (6) |
| N category |  |  |  |  |  |
| 0 | 19,410 (69) | 2399 (31) | 2128 (68) | 1221 (65) | 2744 (68) |
| 1 | 6398 (23) | 2253 (29) | 544 (17) | 338 (18) | 764 (19) |
| 2 | 1445 (5) | 1123 (14) | 174 (6) | 119 (6) | 184 (5) |
| 3 | 518 (2) | 962 (12) | 74 (2) | 60 (3) | 94 (2) |
| Unknown | 552 (2) | 1015 (13) | 200 (6) | 136 (7) | 230 (6) |
| M category |  |  |  |  |  |
| 0 | 27,595 (97) | 5551 (72) | 2876 (92) | 1684 (90) | 3718 (93) |
| 1 | 174 (1) | 1636 (21) | 79 (3) | 105 (6) | 115 (3) |
| Unknown | 554 (2) | 565 (7) | 165 (5) | 85 (5) | 183 (5) |
| Summary stage 2000 (1998+) |  |  |  |  |  |
| Blank(s) | 14,498 (51) | 947 (12) | 1434 (46) | 881 (47) | 1940 (48) |
| Distant | 9929 (35) | 2158 (28) | 984 (32) | 546 (29) | 1246 (31) |
| Localized | 2453 (9) | 2170 (28) | 341 (11) | 213 (11) | 400 (10) |
| Regional | 178 (1) | 1637 (21) | 80 (3) | 111 (6) | 118 (3) |
| Unknown/unstaged | 1265 (4) | 840 (11) | 281 (9) | 123 (7) | 312 (8) |
| Diagnosis confirmation |  |  |  |  |  |
| Microscopic diagnosis | 28,283 (100) | 7539 (97) | 3084 (99) | 1852 (99) | 3980 (99) |
| Radiologic and clinical diagnosis | 33 (0) | 134 (2) | 30 (1) | 16 (1) | 33 (1) |
| Other | <10 | 79 (1) | <10 | <10 | <10 |
| Histology type |  |  |  |  |  |
| IDC | 20,014 (71) | 5113 (66) | 2064 (66) | 1220 (65) | 2585 (64) |
| ILC | 1930 (7) | 590 (8) | 266 (9) | 136 (7) | 383 (10) |
| MDLC | 2472 (9) | 592 (8) | 218 (7) | 135 (7) | 317 (8) |
| IDC with mixed feature | 157 (1) | 38 (0) | 17 (1) | 6 (0) | 19 (0) |
| ILC with mixed feature | 860 (3) | 153 (2) | 111 (4) | 58 (3) | 142 (4) |
| Others | 2890 (10) | 1266 (16) | 444 (14) | 319 (17) | 570 (14) |
| ER/PR receptor status |  |  |  |  |  |
| ER− PR− | 5234 (18) | 2180 (28) | 431 (14) | 332 (18) | 590 (15) |
| ER+ PR− | 3094 (11) | 1004 (13) | 370 (12) | 187 (10) | 486 (12) |
| ER− PR+ | 393 (1) | 114 (1) | 20 (1) | 23 (1) | 26 (1) |
| ER+ PR+ | 16,630 (59) | 3118 (40) | 1816 (58) | 1053 (56) | 2316 (58) |
| Other/unknown | 2972 (10) | 1336 (17) | 483 (15) | 279 (15) | 598 (15) |
| Laterality |  |  |  |  |  |
| Missing | 14 (0) | 45 (1) | <10 | <10 | <10 |
| Left | 14,162 (50) | 3871 (50) | 1617 (52) | 929 (50) | 2036 (51) |
| Unknown site | 23 (0) | 165 (2) | 13 (0) | 30 (2) | 10 (0) |
| Right | 14,124 (50) | 3671 (47) | 1487 (48) | 910 (49) | 1967 (49) |
| Surgery |  |  |  |  |  |
| Lumpectomy | 17,179 (61) | 2349 (30) | 1599 (51) | 1000 (53) | 2083 (52) |
| Mastectomy | 10,593 (37) | 3650 (47) | 1297 (42) | 732 (39) | 1630 (41) |
| Other/unknown | 551 (2) | 1753 (23) | 224 (7) | 142 (8) | 303 (8) |
| Radiotherapy |  |  |  |  |  |
| No | 12,538 (44) | 4556 (59) | 2001 (64) | 1005 (54) | 2475 (62) |
| Yes | 15,785 (56) | 3196 (41) | 1119 (36) | 869 (46) | 1541 (38) |
| Chemotherapy |  |  |  |  |  |
| No | 15,240 (54) | 3451 (45) | 2641 (85) | 1297 (69) | 3238 (81) |
| Yes | 13,083 (46) | 4301 (55) | 479 (15) | 577 (31) | 778 (19) |
| Percent of high school education attainment, quartile^a |  |  |  |  |  |
| Q1 | 7412 (26) | 1694 (22) | 717 (23) | 442 (24) | 931 (23) |
| Q2 | 7266 (26) | 1800 (23) | 773 (25) | 492 (26) | 1049 (26) |
| Q3 | 6719 (24) | 2077 (27) | 808 (26) | 481 (26) | 1023 (25) |
| Q4 | 6926 (24) | 2181 (28) | 822 (26) | 459 (24) | 1013 (25) |
| Percent of persons in poverty, quartile^a |  |  |  |  |  |
| Q1 | 7541 (27) | 1706 (22) | 720 (23) | 454 (24) | 903 (22) |
| Q2 | 7179 (25) | 1854 (24) | 708 (23) | 487 (26) | 1004 (25) |
| Q3 | 6687 (24) | 2005 (26) | 845 (27) | 478 (26) | 1089 (27) |
| Q4 | 6916 (24) | 2187 (28) | 847 (27) | 455 (24) | 1020 (25) |
| Percent of foreign-born residents, quartile^a |  |  |  |  |  |
| Q1 | 6695 (24) | 2066 (27) | 893 (29) | 516 (28) | 1177 (29) |
| Q2 | 7006 (25) | 1950 (25) | 790 (25) | 458 (24) | 1042 (26) |
| Q3 | 7264 (26) | 1814 (23) | 742 (24) | 485 (26) | 968 (24) |
| Q4 | 7358 (26) | 1922 (25) | 695 (22) | 415 (22) | 829 (21) |
| Rural urban continuum 2003 |  |  |  |  |  |
| Metro | 25,411 (90) | 6831 (88) | 2705 (87) | 1642 (88) | 3452 (86) |
| Non-metro | 2912 (10) | 921 (12) | 415 (13) | 232 (12) | 564 (14) |

  1. Cells with fewer than ten cases were statistically suppressed to protect patient privacy. T, N and M categories were classified according to the AJCC 6 TNM staging manual.
  2. IDC invasive ductal carcinoma, ILC invasive lobular carcinoma, MDLC mixed invasive ductal and lobular carcinoma, ER estrogen receptor, PR progesterone receptor.
  3. ^a County attributes for the year 2000 (from the U.S. Census Bureau); education attainment is defined as the percent of county residents with less than a high-school education; persons in poverty is defined as the percent of county residents with income below 200% of the poverty level.
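
As a reading aid, the arithmetic behind the table can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function names `row_percentages` and `format_cell` are hypothetical, and the choice of rounding is an assumption that happens to reproduce the printed values checked here. The percentages in the Total row are shares of the whole cohort across the five outcome columns, whereas the percentages in the category rows are shares within each outcome column. Cells with fewer than ten cases are shown as "<10", following the suppression rule stated in the footnote.

```python
# Counts from the "Total" row of Table 1 (outcome -> number of women).
total_counts = {
    "Alive": 28323,
    "Breast cancer": 7752,
    "CVS": 3120,
    "Non-breast cancer": 1874,
    "Other cause": 4016,
}

SUPPRESSION_THRESHOLD = 10  # per the footnote: fewer than ten cases are masked


def row_percentages(counts, digits=2):
    """Return each count as a percentage of the sum of all counts."""
    total = sum(counts.values())
    return {key: round(100 * value / total, digits) for key, value in counts.items()}


def format_cell(count, percent):
    """Hypothetical helper: render a cell as 'n (%)', or '<10' if suppressed."""
    if count < SUPPRESSION_THRESHOLD:
        return "<10"
    return f"{count} ({percent:.0f})"


# Shares of the whole cohort (the Total row).
cohort_share = row_percentages(total_counts)

# Within-column share: women under 65 among those alive (the Age rows).
alive_under_65 = 100 * 21382 / total_counts["Alive"]

print(cohort_share)
print(format_cell(21382, alive_under_65))
print(format_cell(6, 0.0))
```

Summing the five outcome counts gives the cohort size of 45,085 women used as the denominator for the Total row.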