Table 1 Summary statistics for all datasets

From: Underdiagnosis bias of artificial intelligence algorithms applied to chest radiographs in under-served patient populations

| Subgroup | Attribute | CXR | CXP | NIH | ALL |
|---|---|---|---|---|---|
| | No. of images | 371,858 | 223,648 | 112,120 | 707,626 |
| Sex (%) | Male | 52.17 | 59.36 | 56.49 | 55.13 |
| | Female | 47.83 | 40.64 | 43.51 | 44.87 |
| Age (%) | 0–20 years | 2.20 | 0.87 | 6.09 | 2.40 |
| | 20–40 years | 19.51 | 13.18 | 25.96 | 18.53 |
| | 40–60 years | 37.20 | 31.00 | 43.83 | 36.29 |
| | 60–80 years | 34.12 | 38.94 | 23.11 | 33.90 |
| | >80 years | 6.96 | 16.01 | 1.01 | 8.88 |
| Race/Ethnicity (%) | Asian | 3.24 | – | – | – |
| | Black | 18.59 | – | – | – |
| | Hispanic | 6.41 | – | – | – |
| | Native | 0.29 | – | – | – |
| | White | 67.64 | – | – | – |
| | Other | 3.83 | – | – | – |
| Insurance (%) | Medicare | 46.07 | – | – | – |
| | Medicaid | 8.98 | – | – | – |
| | Other | 44.95 | – | – | – |
| | AUC ± 95% CI | 0.834 ± 0.001 | 0.805 ± 0.001 | 0.835 ± 0.002 | 0.859 ± 0.001 |

  1. The datasets studied are MIMIC-CXR (CXR) [26], CheXpert (CXP) [27], ChestX-ray14 (NIH) [28] and a multi-source dataset (ALL) that aggregates the CXR, CXP and NIH data using the labels shared by all three datasets (the disease labels and the no finding label). The deep learning model is trained separately on each of the CXR, CXP, NIH and ALL datasets. For each dataset, the model's AUC is estimated for every label (14 labels for CXR, 14 for CXP, 15 for NIH and 8 for ALL) and averaged over those labels. The reported AUC ± 95% confidence interval (CI) for each dataset is the average of these label-averaged AUCs over the five models trained with different random seeds on the same train–validation–test split.
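The aggregation described in this note (per-label AUC, averaged over labels, then averaged over five seeds) can be illustrated with a minimal Python sketch. It is not the authors' code. The data are synthetic, and the function names (`roc_auc`, `dataset_auc`, `mean_with_ci`) and label names are hypothetical. The note does not say how the 95% CI is computed, so the normal-approximation half-width used here (1.96 times the standard error across seeds) is an assumption; a t-based interval would be wider for five seeds.

```python
import math
import random
from statistics import mean, stdev


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def dataset_auc(y_true, y_score):
    """Average the per-label AUCs of one trained model.

    Both arguments map a label name to a list with one entry per test image.
    """
    return mean(roc_auc(y_true[name], y_score[name]) for name in y_true)


def mean_with_ci(per_seed_aucs, z=1.96):
    """Mean across seeds plus an ASSUMED normal-approximation 95% half-width."""
    half_width = z * stdev(per_seed_aucs) / math.sqrt(len(per_seed_aucs))
    return mean(per_seed_aucs), half_width


# Synthetic demo: 3 hypothetical labels, 200 test images, 5 random seeds.
label_names = ["label_a", "label_b", "label_c"]
data_rng = random.Random(0)
y_true = {name: [data_rng.randint(0, 1) for _ in range(200)] for name in label_names}

per_seed = []
for seed in range(5):
    noise = random.Random(seed)  # stands in for a model trained with this seed
    y_score = {
        name: [y + noise.gauss(0, 1) for y in y_true[name]] for name in label_names
    }
    per_seed.append(dataset_auc(y_true, y_score))

auc_mean, auc_half_width = mean_with_ci(per_seed)
print(f"AUC = {auc_mean:.3f} ± {auc_half_width:.3f}")
```

Because every seed is evaluated on the same test split, the interval in this sketch reflects only seed-to-seed variation in training, not sampling variation in the test set.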