Table 1. The CXR datasets for the training, validation and testing of the deep-learning system
Subset | Developmental dataset: training | Developmental dataset: tuning | Developmental dataset: testing | External validation (SYSU-PE) |
|---|---|---|---|---|
Number of images | 96,543 | 12,035 | 12,124 | 24,500 |
Number of individuals | 73,917 | 9,160 | 9,250 | 23,585 |
Inpatients | 38,438 (52.0%) | 4,761 (52.0%) | 4,871 (52.7%) | − |
Outpatients | 35,479 (48.0%) | 4,377 (47.8%) | 4,354 (47.1%) | − |
Patients for physical examination | − | 22 (0.2%) | 25 (0.2%) | 23,585 (100.0%) |
Male | 31,019 (42.0%) | 3,840 (41.9%) | 3,850 (41.6%) | 11,868 (50.3%) |
Mean age (years) (IQR) | 44.9 (32–59) | 45.1 (32–60) | 44.9 (32–59) | 37.8 (28–46) |
Atelectasis | 167 (0.23%) | 26 (0.28%) | 22 (0.24%) | 4 (0.02%) |
Cardiomegaly | 1,828 (2.47%) | 242 (2.64%) | 239 (2.58%) | 46 (0.20%) |
Fibrosis | 4,405 (5.96%) | 523 (5.71%) | 560 (6.05%) | 431 (1.83%) |
Infiltration | 7,085 (9.59%) | 914 (9.98%) | 886 (9.58%) | 88 (0.37%) |
Mass | 708 (0.96%) | 86 (0.94%) | 82 (0.89%) | 17 (0.07%) |
Nodule | 4,187 (5.66%) | 550 (6.00%) | 554 (5.99%) | 463 (1.96%) |
Pleural thickening | 4,192 (5.67%) | 545 (5.95%) | 544 (5.88%) | 412 (1.75%) |
Pneumonia | 8,099 (10.96%) | 1,015 (11.08%) | 1,042 (11.26%) | 164 (0.70%) |
Pneumothorax | 552 (0.75%) | 67 (0.73%) | 61 (0.66%) | 0 (0.00%) |
Consolidation | 118 (0.16%) | 12 (0.13%) | 12 (0.13%) | 0 (0.00%) |
Oedema | 133 (0.18%) | 12 (0.13%) | 21 (0.23%) | 0 (0.00%) |
Effusion | 3,903 (5.28%) | 485 (5.29%) | 462 (4.99%) | 43 (0.18%) |
Hernia | 23 (0.03%) | 3 (0.03%) | 1 (0.01%) | 1 (0.01%) |
Emphysema | 715 (0.97%) | 84 (0.92%) | 84 (0.91%) | 29 (0.12%) |
No finding | 55,320 (74.84%) | 6,823 (74.49%) | 6,882 (74.40%) | 22,319 (94.63%) |
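
The table does not state which denominator the percentages use. The counts are consistent with the number of individuals in each subset, not the number of images. A minimal sketch of that check for the "No finding" row follows; the dictionary names and the helper `prevalence_pct` are hypothetical and used only for illustration, and the counts are copied from the table.

```python
# Counts copied from Table 1. Using individuals (not images) as the
# denominator is an assumption that this check is meant to test.
individuals = {"training": 73_917, "tuning": 9_160, "testing": 9_250, "external": 23_585}
no_finding = {"training": 55_320, "tuning": 6_823, "testing": 6_882, "external": 22_319}
reported_pct = {"training": 74.84, "tuning": 74.49, "testing": 74.40, "external": 94.63}


def prevalence_pct(count: int, total: int) -> float:
    """Return count / total as a percentage rounded to two decimals."""
    return round(100.0 * count / total, 2)


# Recompute each percentage with individuals as the denominator.
computed_pct = {
    subset: prevalence_pct(no_finding[subset], individuals[subset])
    for subset in individuals
}

for subset, value in computed_pct.items():
    print(f"{subset}: computed {value:.2f}%, reported {reported_pct[subset]:.2f}%")
```

Dividing the training count by the number of images (96,543) instead gives roughly 57%, which does not match the reported 74.84%, so the per-individual reading appears to be the intended one.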