Table 1 The CXR datasets for the training, validation and testing of the deep-learning system

From: A deep-learning pipeline for the diagnosis and discrimination of viral, non-viral and COVID-19 pneumonia from chest X-ray images

| Subset | Developmental dataset: training dataset | Developmental dataset: tuning dataset | Developmental dataset: testing dataset | External validation (SYSU-PE) |
| --- | --- | --- | --- | --- |
| Number of images | 96,543 | 12,035 | 12,124 | 24,500 |
| Number of individuals | 73,917 | 9,160 | 9,250 | 23,585 |
| Inpatients | 38,438 (52.0%) | 4,761 (52.0%) | 4,871 (52.7%) | |
| Outpatients | 35,479 (48.0%) | 4,377 (47.8%) | 4,354 (47.1%) | |
| Patients for physical examination | | 22 (0.2%) | 25 (0.2%) | 23,585 (100.0%) |
| Male | 31,019 (42.0%) | 3,840 (41.9%) | 3,850 (41.6%) | 11,868 (50.3%) |
| Mean age (years) (IQR) | 44.9 (32–59) | 45.1 (32–60) | 44.9 (32–59) | 37.8 (28–46) |
| Atelectasis | 167 (0.23%) | 26 (0.28%) | 22 (0.24%) | 4 (0.02%) |
| Cardiomegaly | 1,828 (2.47%) | 242 (2.64%) | 239 (2.58%) | 46 (0.20%) |
| Fibrosis | 4,405 (5.96%) | 523 (5.71%) | 560 (6.05%) | 431 (1.83%) |
| Infiltration | 7,085 (9.59%) | 914 (9.98%) | 886 (9.58%) | 88 (0.37%) |
| Mass | 708 (0.96%) | 86 (0.94%) | 82 (0.89%) | 17 (0.07%) |
| Nodule | 4,187 (5.66%) | 550 (6.00%) | 554 (5.99%) | 463 (1.96%) |
| Pleural thickening | 4,192 (5.67%) | 545 (5.95%) | 544 (5.88%) | 412 (1.75%) |
| Pneumonia | 8,099 (10.96%) | 1,015 (11.08%) | 1,042 (11.26%) | 164 (0.70%) |
| Pneumothorax | 552 (0.75%) | 67 (0.73%) | 61 (0.66%) | 0 (0.00%) |
| Consolidation | 118 (0.16%) | 12 (0.13%) | 12 (0.13%) | 0 (0.00%) |
| Oedema | 133 (0.18%) | 12 (0.13%) | 21 (0.23%) | 0 (0.00%) |
| Effusion | 3,903 (5.28%) | 485 (5.29%) | 462 (4.99%) | 43 (0.18%) |
| Hernia | 23 (0.03%) | 3 (0.03%) | 1 (0.01%) | 1 (0.01%) |
| Emphysema | 715 (0.97%) | 84 (0.92%) | 84 (0.91%) | 29 (0.12%) |
| No finding | 55,320 (74.84%) | 6,823 (74.49%) | 6,882 (74.40%) | 22,319 (94.63%) |
  1. IQR, interquartile range. For all the subsets (except ‘Number of images’), the numbers given are of individuals.
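
As a reading aid, the percentages in Table 1 appear to be computed against the "Number of individuals" row of each column, which is consistent with the footnote. The following is a minimal sketch that recomputes three of the reported percentages from counts copied from the table. The helper name `pct` and the dictionary `individuals` are hypothetical names introduced for illustration, not part of the original study.

```python
# Denominators: the "Number of individuals" row of Table 1.
# (Names here are illustrative assumptions, not from the paper.)
individuals = {
    "training": 73_917,
    "tuning": 9_160,
    "testing": 9_250,
    "external": 23_585,
}


def pct(count: int, total: int, digits: int) -> float:
    """Return `count` as a percentage of `total`, rounded to `digits` decimals."""
    return round(100 * count / total, digits)


# Counts copied from the table rows "Inpatients", "Pneumonia" and "No finding".
inpatients_training = pct(38_438, individuals["training"], 1)
pneumonia_training = pct(8_099, individuals["training"], 2)
no_finding_external = pct(22_319, individuals["external"], 2)

print(inpatients_training, pneumonia_training, no_finding_external)
```

The recomputed values can be compared against the corresponding table cells (52.0%, 10.96% and 94.63%) as a quick consistency check on the reconstructed columns.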