Table 1 Dataset characteristics for the development, standalone testing, clinical testing, and subset of NIH ChestX-ray8 datasets.

From: Deep learning improves physician accuracy in the comprehensive detection of abnormalities on chest X-rays

| | Development Dataset | Standalone Testing Dataset | Clinical Testing Dataset | Subset of NIH ChestX-ray8 Dataset |
|---|---|---|---|---|
| Radiographs | | | | |
| No. of U.S. hospitals, outpatient centers, and specialty centers | 15 | 12 | 9 | n/a |
| No. of chest cases | 341,355 | 20,000 | 238 | 922 |
| No. of radiographs | 492,996 | 28,928 | 345 | 922 |
| No. of X-ray device manufacturers | 11 | 9 | 8 | n/a |
| Patients | | | | |
| No. of patients | 185,114 | 15,631 | 236 | 847 |
| Case Demographics | | | | |
| No. of female cases (%) | 187,488 (54.92) | 10,379 (51.90) | 132 (55.46) | 401 (43.49) |
| No. 22–44 years old cases (%) | 60,180* (17.63) | 3,481 (17.41) | 29 (12.18) | 334 (36.23) |
| No. 45–64 years old cases (%) | 122,307* (35.83) | 7,430 (37.15) | 92 (38.66) | 447 (48.48) |
| No. 65–74 years old cases (%) | 64,365* (18.86) | 4,368 (21.84) | 53 (22.27) | 112 (12.15) |
| No. 75+ years old cases (%) | 90,799* (26.60) | 4,721 (23.61) | 64 (26.89) | 29 (3.15) |
| Labelers | | | | |
| No. of U.S. board certified radiologists | 17 | 17 | 17 | 13 |
| Median experience (years) | 14 | 14 | 14 | 14 |
| No. of total labels | 6,202,776 | 480,000 | 5,712 | 22,128 |
| No. of total labels per category | 775,347 | 60,000 | 714 | 2,766 |
| No. of positive labels (%) | 1,428,303 (23.03) | 131,526 (27.40) | 1,647 (28.83) | 6,617 (29.9) |
| No. of positive cardiac labels (%) | 228,659 (29.49) | 18,942 (31.57) | 228 (31.93) | 462 (16.70) |
| No. of positive Mediastinum/Hila labels (%) | 130,699 (16.86) | 12,139 (20.23) | 162 (22.69) | 664 (24.01) |
| No. of positive lung labels (%) | 394,842 (50.92) | 36,639 (61.07) | 452 (63.31) | 1,979 (71.55) |
| No. of positive pleura labels (%) | 202,697 (26.14) | 19,521 (32.54) | 230 (32.21) | 1,026 (37.09) |
| No. of positive bone labels (%) | 92,514 (11.93) | 7,401 (12.34) | 103 (14.43) | 308 (11.14) |
| No. of positive soft tissue labels (%) | 26,592 (3.43) | 2,142 (3.57) | 34 (4.76) | 204 (7.38) |
| No. of positive hardware labels (%) | 305,516 (39.40) | 29,744 (49.57) | 360 (50.42) | 1,609 (58.17) |
| No. of positive other labels (%) | 46,784 (6.03) | 4,998 (8.33) | 78 (10.92) | 365 (13.20) |

* Patient age was missing for 1% of the development dataset because of de-identification procedures, so the percentages reported by age group add up to 99%.
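
The label counts in the table can be cross-checked with a little arithmetic. The sketch below is not part of the original article; it copies the label rows from Table 1 and assumes that the total label count is the per-category count multiplied by the eight categories listed (cardiac, mediastinum/hila, lung, pleura, bone, soft tissue, hardware, other). The dictionary keys and the `check_labels` helper are names introduced here for illustration only.

```python
# Values copied from the "Labelers" rows of Table 1.
# Assumption: total labels = labels per category x 8 listed categories.
CATEGORIES = 8

# Per-category positives are listed in table order: cardiac, mediastinum/hila,
# lung, pleura, bone, soft tissue, hardware, other.
table = {
    "development": {
        "total": 6_202_776, "per_category": 775_347,
        "positive": 1_428_303, "reported_pct": 23.03,
        "by_category": [228_659, 130_699, 394_842, 202_697,
                        92_514, 26_592, 305_516, 46_784],
    },
    "standalone": {
        "total": 480_000, "per_category": 60_000,
        "positive": 131_526, "reported_pct": 27.40,
        "by_category": [18_942, 12_139, 36_639, 19_521,
                        7_401, 2_142, 29_744, 4_998],
    },
    "clinical": {
        "total": 5_712, "per_category": 714,
        "positive": 1_647, "reported_pct": 28.83,
        "by_category": [228, 162, 452, 230, 103, 34, 360, 78],
    },
    "nih_subset": {
        "total": 22_128, "per_category": 2_766,
        "positive": 6_617, "reported_pct": 29.9,
        "by_category": [462, 664, 1_979, 1_026, 308, 204, 1_609, 365],
    },
}


def check_labels(row, tol=0.01):
    """Return three booleans: totals agree, positives agree, percentage agrees."""
    total_ok = row["per_category"] * CATEGORIES == row["total"]
    positives_ok = sum(row["by_category"]) == row["positive"]
    pct = 100 * row["positive"] / row["total"]
    pct_ok = abs(pct - row["reported_pct"]) <= tol
    return total_ok, positives_ok, pct_ok


results = {name: check_labels(row) for name, row in table.items()}

# Age-group percentages for the development dataset (footnote: sums to ~99%).
dev_age_pcts = [17.63, 35.83, 18.86, 26.60]
dev_age_total = round(sum(dev_age_pcts), 2)

if __name__ == "__main__":
    for name, flags in results.items():
        print(name, flags)
    print("development age-group percentages sum to", dev_age_total)
```

Under that assumption, all four columns are internally consistent, and the development age-group percentages sum to just under 99%, in line with the footnote.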