Table 2 Summary of the data extracted for each paper included in our systematic review

From: Common pitfalls and recommendations for using machine learning to detect and prognosticate for COVID-19 using chest radiographs and CT scans

Each column records the following information:

- Reference: the reviewed paper.
- Diagnosis/prognosis: whether the paper describes a COVID-19 diagnosis model, a prognosis model or both.
- Data used in model: whether the model uses CXR, CT or both.
- Predictors: the inputs to the model; for purely deep learning models this is denoted DL.
- Sample size development: total sample size used for development (that is, training and validation, not the test set), with the number of positive outcomes.
- Sample size test: total sample size used for testing the algorithm, with the number of positive outcomes.
- Type of validation: k-fold cross-validation, external validation in k centres, no validation and so on.
- Evaluation: reported model performance (AUC, sensitivity, specificity and so on), with 95% CIs where available.
- Public code: whether code is available and, in parentheses, whether the trained model is available.
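As a quick illustration of how the headline metrics in the Evaluation column relate to a model's outputs, the sketch below computes sensitivity, specificity and AUC from a handful of made-up predictions. This is not drawn from any of the reviewed papers; the labels, scores and 0.5 threshold are purely for demonstration.

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive outscores a
    randomly chosen negative (ties count 0.5), equal to the area under
    the ROC curve."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 1, 1, 0, 0, 0]              # 1 = COVID-19 positive (made up)
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]  # model probabilities (made up)
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold at 0.5
sens, spec = sensitivity_specificity(y_true, y_pred)
print(round(sens, 2), round(spec, 2), round(auc(y_true, scores), 2))
# → 0.67 0.67 0.89
```

Note that sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes performance across all thresholds, which is why papers reporting only one of these are hard to compare.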

| Reference | Diagnosis/prognosis | Data used in model | Predictors | Sample size development | Sample size test | Type of validation | Evaluation | Public code |
|---|---|---|---|---|---|---|---|---|
| Ghoshal and Tucker^17 | Diagnosis | CXR | DL | 4,752 images, 54 COVID-19 | 1,189 images, 14 COVID-19 | Unclear validation procedure | Unclear in the paper | No |
| Li et al.^34 | Diagnosis | CXR | DL | 429 images, 143 COVID-19 | 108 images, 36 COVID-19 | Internal holdout validation | Accuracy, 0.88; AUC, 0.97 | Yes (Yes) |
| Ezzat et al.^28 | Diagnosis | CXR | DL | Unclear in the paper | Unclear in the paper | Internal holdout validation | Precision (w), 0.98; recall (w), 0.98; F1 score (w), 0.98 | No |
| Tartaglione et al.^16 | Diagnosis | CXR | DL | 231 images, 126 COVID-19 | 135 images, 90 COVID-19 | Internal holdout validation | Unclear in the paper | No |
| Luz et al.^30 | Diagnosis | CXR | DL | 13,569 images, 152 COVID-19 | 231 images, 31 COVID-19 | Internal holdout validation | Accuracy, 0.94; sensitivity, 0.97; PPV, 1.00 | Yes (Yes) |
| Bassi and Attux^31 | Diagnosis | CXR | DL | 2,724 images, 159 COVID-19 | 180 images, 60 COVID-19 | Internal holdout validation | Recall, 0.98; precision, 1.00 | No |
| Gueguim Kana et al.^32 | Diagnosis | CXR | DL | Unclear in the paper | Unclear in the paper | External validation | Accuracy, 0.99; recall, 1.00; precision, 0.99; F1 score, 1.00 | No |
| Heidari et al.^33 | Diagnosis | CXR | DL | 8,474 images, 415 COVID-19 | 848 images, 42 COVID-19 | Internal holdout validation | Precision (w), 0.95; recall (w), 0.94; F1 score (w), 0.94 | No |
| Farooq and Hafeez^29 | Diagnosis | CXR | DL | Unclear in the paper | 637 images, 8 COVID-19 | Internal holdout validation | Accuracy, 0.96; sensitivity, 0.97; PPV, 0.99; F1 score, 0.98 | No |
| Zhang et al.^27 | Diagnosis | CXR | DL | 5,236 images, 2,582 COVID-19 | 5,869 images, 3,223 COVID-19 | Internal holdout validation | AUC, 0.92; sensitivity, 0.88; specificity, 0.79 | Yes (No) |
| Zhang et al.^37 | Diagnosis | CXR | DL | 386 images, 150 COVID-19 | 101 images, 39 COVID-19 | Internal holdout validation | Accuracy, 0.91 | No |
| Wang et al.^26 | Diagnosis | CXR | DL | 3,522 images, 204 COVID-19 | 61 images, 20 COVID-19 | Internal holdout validation | AUC, 1.00; accuracy, 0.99 | No |
| Bararia et al.^25 | Diagnosis | CXR | DL | Unclear in the paper | 1,000 images, 341 COVID-19 | Internal holdout validation | Accuracy, 0.81; sensitivity, 0.81; specificity, 0.90; precision, 0.74; recall, 0.77; F1 score, 0.75 | No |
| Tsiknakis et al.^21 | Diagnosis | CXR | DL | 458 (CV) images, 98 COVID-19 | 114 (CV) images, 24 COVID-19 | Fivefold internal cross-validation | AUC, 1.00; accuracy, 1.00; sensitivity, 0.99; specificity, 1.00 | Yes (No) |
| Malhotra et al.^18 | Diagnosis | CXR | DL | 26,464 images, 1,740 COVID-19^a | 6,299 images, 125 COVID-19^a | Internal holdout validation | Sensitivity, 0.87; specificity, 0.97 | No |
| Sayyed et al.^36 | Diagnosis | CXR | DL | 5,018 (CV) images, 334 COVID-19 | 1,255 (CV) images, 83 COVID-19 | Fivefold internal cross-validation | Accuracy, 0.99 ± 0.05 | Yes (No) |
| Rahaman et al.^19 | Diagnosis | CXR | DL | 720 images, 220 COVID-19 | 140 images, 40 COVID-19 | Internal holdout validation | Accuracy, 0.89; precision, 0.90; recall, 0.89; F1 score, 0.90 | No |
| Amer et al.^20 | Diagnosis | CXR | DL | Unclear in the paper | Unclear in the paper | Internal holdout validation | AUC, 0.98; accuracy, 0.94; sensitivity, 0.92; specificity, 0.97; PPV, 0.98 | No |
| Elaziz et al.^22 | Diagnosis | CXR | Hand-engineered radiomic features | Unclear in the paper | Unclear in the paper | Internal holdout validation and external validation | Internal validation: accuracy, 0.96; recall, 0.99; precision, 0.96. External validation: accuracy, 0.98; recall, 0.99; precision, 0.99 | No |
| Tamal et al.^24 | Diagnosis | CXR | Hand-engineered radiomic features | 378 images, 226 COVID-19 | 165 images, 115 COVID-19 | Internal holdout validation | Sensitivity, 1.00; specificity, 0.85 | No^b |
| Gil et al.^23 | Diagnosis | CXR | Hand-engineered radiomic features | Unclear in the paper | Unclear in the paper | Internal holdout validation | Accuracy, 0.96; sensitivity, 0.98; specificity, 0.93; precision, 0.96 | Yes (Yes) |
| Zokaeinikoo et al.^35 | Diagnosis | CXR and CT | DL | Unclear in the paper | Unclear in the paper | Tenfold internal cross-validation | Accuracy, 0.99; sensitivity, 0.99; specificity, 1.00; PPV, 1.00 | No |
| Amyar et al.^44 | Diagnosis | CT | DL | 944 patients, 399 COVID-19 | 100 patients, 50 COVID-19 | Internal holdout validation | Accuracy, 0.95; sensitivity, 0.96; specificity, 0.92; AUC, 0.97 | No |
| Ardakani et al.^45 | Diagnosis | CT | DL | Unclear, as splits do not total correctly | Unclear, as splits do not total correctly | Internal holdout validation | AUC, 0.99; sensitivity, 1.00; specificity, 0.99; accuracy, 1.00; PPV, 0.99; NPV, 1.00 | No |
| Bai et al.^81 | Diagnosis | CT | DL | 118,401 images, 60,776 COVID-19 | 14,182 images, 5,040 COVID-19 | Internal holdout validation | AUC, 0.95; accuracy, 0.96; sensitivity, 0.95; specificity, 0.96 | Yes (Yes) |
| Jin et al.^50 | Diagnosis | CT | DL | 1,136 images, 723 COVID-19 | 282 images, 154 COVID-19 | Internal holdout validation | Sensitivity, 0.97; specificity, 0.92; AUC, 0.99 | No |
| Wang et al.^42 | Diagnosis | CT | DL | 320 images, 160 COVID-19 | Internal validation: 455 images, 95 COVID-19. External validation: 290 images, 70 COVID-19 | Internal holdout validation and external validation | Internal validation: AUC, 0.93 [0.90, 0.96]. External validation: AUC, 0.81 [0.71, 0.84] | No |
| Ko et al.^41 | Diagnosis | CT | DL | 3,194 (CV) images, 955 COVID-19 | Internal cross-validation: 799 (CV) images, 239 COVID-19. External validation: 264 images, all COVID-19 | Fivefold internal cross-validation and external validation | Internal validation: AUC, 1.00; accuracy, 1.00; sensitivity, 1.00; specificity, 1.00. External validation: accuracy, 0.97 | No |
| Acar et al.^48 | Diagnosis | CT | DL | 2,552 images, 1,085 COVID-19 | 580 images, 246 COVID-19 | Internal holdout validation | AUC, 1.00; accuracy, 1.00; error, 0.01; precision, 1.00; recall, 1.00; F1 score, 1.00 | No |
| Pu et al.^43 | Diagnosis | CT | DL | Unclear in the paper | Unclear in the paper | Internal holdout validation | AUC, 0.70 [0.56, 0.85]; sensitivity, 0.98; specificity, 0.28 | No |
| Chen et al.^49 | Diagnosis | CT | DL | 770 (CV) images, 413 COVID-19 | Internal cross-validation: 86 (CV) images, 46 COVID-19 | Tenfold internal cross-validation | AUC, 0.94 ± 0.01; accuracy, 0.88 ± 0.01; precision, 0.90 ± 0.01; recall, 0.88 ± 0.01 | No |
| Shah et al.^52 | Diagnosis | CT | DL | 664 images, 314 COVID-19 | 74 images, 35 COVID-19 | Internal holdout validation | Accuracy, 0.95 | No |
| Han et al.^47 | Diagnosis | CT | DL | 368 (CV) images, 184 COVID-19 | 92 (CV) images, 46 COVID-19 | Fivefold internal cross-validation | AUC, 0.99; accuracy, 0.98 | No^b |
| Wang et al.^53 | Diagnosis | CT | DL | 3,997 images, 1,095 COVID-19 | 600 images, 200 COVID-19 | Internal holdout validation | AUC, 0.97; accuracy, 0.93; specificity, 0.96; precision, 0.88; recall, 0.88 | No |
| Wang et al.^54 | Diagnosis | CT | DL | 2,447 images, 1,647 COVID-19 | Internal validation: 639 images, 439 COVID-19. External validation: 2,120 images, 217 COVID-19 | Internal holdout validation and external validation | Internal validation: AUC, 0.99; sensitivity, 0.97; specificity, 0.85. External validation: AUC, 0.95; sensitivity, 0.92; specificity, 0.85 | No |
| Goncharov et al.^71 | Diagnosis and severity prognosis | CT | DL | Unclear in the paper | Diagnosis: 101 images, 33 COVID-19. Severity: 38 images of differing severity | Internal holdout validation | Diagnosis model: AUC, 0.95. Severity model: correlation, 0.98 | No^c |
| Xie et al.^61 | Diagnosis | CT | Hand-engineered radiomic features | 225 images, 27 COVID-19 | 76 images, 6 COVID-19 | Internal holdout validation | AUC, 0.91; accuracy, 0.90; sensitivity, 0.83; specificity, 0.90 | No |
| Xu et al.^62 | Diagnosis | CT | DL and hand-engineered radiomic features | 551 images, 289 COVID-19 | 138 images, 73 COVID-19 | Internal holdout validation | Accuracy, 0.98; F1 score, 0.99 | No^d |
| Qin et al.^60 | Diagnosis | CT | Hand-engineered radiomic features | 118 patients, 62 COVID-19 | 50 patients, 26 COVID-19 | Internal holdout validation | AUC, 0.85 [0.74, 0.96]; sensitivity, 0.89; specificity, 0.92 | No |
| Georgescu et al.^40 | Diagnosis | CT | DL and hand-engineered radiomic features | 1,902 patients, 1,050 COVID-19 | 194 patients, 100 COVID-19 | Internal holdout validation | AUC, 0.90; sensitivity, 0.86; specificity, 0.81 | No |
| Guiot et al.^58 | Diagnosis | CT | Hand-engineered radiomic features | Unclear in the paper | Unclear in the paper | Internal holdout validation | AUC, 0.94 [0.88, 1.00]; accuracy, 0.90 [0.84, 0.94]; sensitivity, 0.79; specificity, 0.91 | No |
| Shi et al.^57 | Diagnosis | CT | Hand-engineered radiomic features | 2,148 (CV) images, 1,326 COVID-19 | Internal cross-validation: 537 (CV) images, 332 COVID-19 | Fivefold internal cross-validation | AUC, 0.94; accuracy, 0.88; sensitivity, 0.91; specificity, 0.83 | No |
| Mei et al.^46 | Diagnosis | CT | DL, CNN-extracted features and clinical data | 626 images, 285 COVID-19 | 279 images, 134 COVID-19 | Internal holdout validation | AUC, 0.92 [0.89, 0.95]; sensitivity, 0.843 [0.77, 0.90]; specificity, 0.83 [0.76, 0.89] | Yes (Yes) |
| Chen et al.^59 | Diagnosis | CT | Clinical features, qualitative imaging features and hand-engineered radiomic imaging features | 98 patients, 51 COVID-19 | 38 images, 19 COVID-19 | Internal holdout validation | AUC, 0.94 [0.87, 1.00]; accuracy, 0.76; sensitivity, 0.74; specificity, 0.79 | No |
| Wang et al.^51 | Diagnosis and prognosis for length of hospital stay | CT | Diagnosis model: DL. Prognosis model: 64 CNN features and clinical factors | 709 images, 560 COVID-19 | Validation 1: 226 images, 102 COVID-19. Validation 2: 161 images, 92 COVID-19. Validation 3: 53 images, all with length of hospital stay. Validation 4: 117 images, all with length of hospital stay | External validation | Validation 1 (diagnosis): AUC, 0.87. Validation 2 (diagnosis): AUC, 0.88. Validation 3 (prognosis): KM separation, P = 0.01. Validation 4 (prognosis): KM separation, P = 0.01 | Yes (Yes) |
| Li et al.^66 | Prognosis for severity | CXR | DL | 354 images of differing severities | Internal validation: 108 images. External validation: 111 images | Internal holdout validation and external validation | Internal validation: correlation, 0.88. External validation: correlation, 0.90 | Yes (No) |
| Li et al.^67 | Prognosis for severity | CXR | DL | 314 images of differing severities | Internal validation: 154 images. External validation: 113 images | Internal holdout validation and external validation | Internal validation: correlation, 0.86. External validation: correlation, 0.86 | Yes (No) |
| Schalekamp et al.^68 | Prognosis for severity | CXR | Hand-engineered radiomic features and clinical factors | Unclear in the paper | Unclear in the paper | Internal holdout validation | AUC, 0.77 | No |
| Cohen et al.^76 | Prognosis of lung opacity and extent of lung involvement with GGOs for patients with COVID-19 | CXR | Features from a trained CNN extracted at various layers | 47 patients of varying severity | 47 patients of varying severity | Internal holdout validation | Opacity correlation, 0.80; extent correlation, 0.78 | Yes (Yes) |
| Yue et al.^74 | Prognosing short- and long-term (>10 days) hospital stay for patients with COVID-19 | CT | Hand-engineered radiomic features | 26 patients, 16 long term | Internal validation: 5 patients, 3 long term. Temporal-split internal validation: 6 patients, all long term | Internal holdout and temporal-split validation | AUC, 0.97 [0.83, 1.00]; sensitivity, 1.00; specificity, 0.89; NPV, 1.00; PPV, 0.80 | Yes^d |
| Zhu et al.^75 | Prognosis for whether patients will convert to severe COVID-19, with regression to predict the time to conversion | CT | Hand-engineered radiomic features | Unclear in the paper | Unclear in the paper | Fivefold internal cross-validation run 20 times, average reported | AUC, 0.86 ± 0.02; accuracy, 0.86 ± 0.02; sensitivity, 0.77 ± 0.03; specificity, 0.88 ± 0.015 | No |
| Lassau et al.^73 | Prognosis for risk of death, need for ventilation or requirement for over 15 l min⁻¹ of oxygen | CT | CNN-extracted features and clinical data | 646 patients, all with COVID-19; 243 with severe outcomes | Internal validation: 150 images, all COVID-19, 48 with severe outcomes. External validation: 135 patients, all with COVID-19, unclear number of severe patients | Internal holdout validation and external validation | Internal validation: AUC, 0.76. External validation: AUC, 0.75 | No^c |
| Chassagnon et al.^63 | Short-term prognosis: intubation and death within four days. Long-term prognosis: death within one month after CT | CT | Hand-engineered radiomic features and clinical data | 536 patients with COVID-19, 108 severe short-term outcomes, unclear for long term | 157 patients with COVID-19, 31 severe short-term outcomes, unclear for long term | External validation | Short-term prognosis: precision (w), 0.94; sensitivity (w), 0.94; specificity (w), 0.81; balanced accuracy, 0.88. Long-term prognosis: precision (w), 0.77; sensitivity (w), 0.94; specificity (w), 0.82; balanced accuracy, 0.71 | No^b |
| Chao et al.^77 | Prognosing for ICU admission | CT | Hand-engineered radiomic features and clinical data | 236 (CV) images, 125 admitted to ICU | 59 (CV) images, 31 admitted to ICU | Fivefold internal cross-validation | Unclear in the paper | No |
| Wu et al.^78 | Prognosing for death, ventilation and ICU admission in early- and late-stage COVID-19 | CT | Hand-engineered radiomic features | 351 images, 25 severe outcomes | 141 images, 26 severe outcomes | External validation | Early-stage COVID-19: AUC, 0.86; sensitivity, 0.80; specificity, 0.86. Late-stage COVID-19: AUC, 0.98; sensitivity, 1.00; specificity, 0.94 | No |
| Zheng et al.^79 | Prognosing for admission to an ICU, use of mechanical ventilation or death | CT | Hand-engineered radiomic features and clinical data | 166 images, 35 severe outcomes | 72 images, 10 severe outcomes | External validation | C index, 0.89 | No |
| Chen et al.^80 | Prognosis for acute respiratory distress syndrome | CT | Hand-engineered radiomic features and clinical data | 247 images, 36 severe cases | 105 images, 15 severe cases | Internal holdout validation | Accuracy, 0.88; sensitivity, 0.55; specificity, 0.95 | No |
| Ghosh et al.^64 | Prognosing COVID-19 severity | CT | Hand-engineered radiomic features | 36 images, unclear number of severe cases | 24 images, unclear number of severe cases | Internal holdout validation | Accuracy, 0.88 | No |
| Ramtohul et al.^72 | Prognosing mortality for patients with COVID-19 in a cancer population | CT | Hand-engineered radiomic features and clinical data | 35 (CV) patients, unclear number of deaths | 70 patients, unclear number of deaths | Twofold internal cross-validation | C index, 0.83 [0.73, 0.93] | No |
| Wei et al.^65 | Prognosing COVID-19 severity | CT | Hand-engineered radiomic features | Unclear in the paper | Unclear in the paper | One-hundred-fold leave-group-out cross-validation | AUC, 0.93; accuracy, 0.91; sensitivity, 0.81; specificity, 0.95 | No |
| Wang et al.^69 | Prognosis for survival | CT | Hand-engineered radiomic features | 161 patients, 15 non-survivors | 135 patients, unclear number of non-survivors | External validation | C index, [0.92, 0.95]; accuracy, [0.85, 0.87]; sensitivity, [0.71, 0.76]; specificity, [0.91, 0.92] | No |
| Yip et al.^70 | Prognosing COVID-19 severity | CT | Hand-engineered radiomic features | 657 images of various severities | 441 images of various severities | Internal holdout validation | AUC, 0.85 | No |

^a Number of samples after augmentation; the original number of COVID-19 images is unclear.
^b The authors state that the algorithm will be made publicly available.
^c The paper states that code 'is available on a public GitHub repository', but no link is provided and the authors could not locate it.
^d The authors state that 'imaging or algorithm data used in this study are available upon request'.

Abbreviations: w, weighted average; CV, cross-validation; CI, 95% confidence interval; PPV, positive predictive value; NPV, negative predictive value; KM, Kaplan–Meier; GGOs, ground-glass opacities.