Table 2 Summary of the data extracted for each paper included in our systematic review

From: Common pitfalls and recommendations for using machine learning to detect and prognosticate for COVID-19 using chest radiographs and CT scans

Each column records the following information:

- Reference: the reviewed paper.
- Diagnosis/prognosis: whether the paper describes a COVID-19 diagnosis model, a prognosis model or both.
- Data used in model: whether the model uses CXR, CT or both.
- Predictors: the inputs to the model; for purely deep learning models this is denoted DL.
- Sample size development: total sample size used for development (that is, training and validation, not the test set), with the number of positive outcomes.
- Sample size test: total sample size used for testing the algorithm, with the number of positive outcomes.
- Type of validation: k-fold cross-validation, external validation in k centres, no validation and so on.
- Evaluation: reported model performance (AUC, sensitivity, specificity and so on), with 95% CIs where available.
- Public code: whether code is available and, in parentheses, whether the trained model is available.
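As a quick illustration of how the headline metrics in the Evaluation column relate to a model's outputs, the sketch below computes sensitivity, specificity and AUC from a handful of made-up predictions. This is not drawn from any of the reviewed papers; the labels, scores and 0.5 threshold are purely for demonstration.

```python
def sensitivity_specificity(y_true, y_pred):
    """Sensitivity = TP/(TP+FN); specificity = TN/(TN+FP)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)

def auc(y_true, scores):
    """AUC as the probability that a randomly chosen positive outscores a
    randomly chosen negative (ties count 0.5), equal to the area under
    the ROC curve."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [1, 1, 1, 0, 0, 0]              # 1 = COVID-19 positive (made up)
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]  # model probabilities (made up)
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # threshold at 0.5
sens, spec = sensitivity_specificity(y_true, y_pred)
print(round(sens, 2), round(spec, 2), round(auc(y_true, scores), 2))
# → 0.67 0.67 0.89
```

Note that sensitivity and specificity depend on the chosen threshold, whereas AUC summarizes performance across all thresholds, which is why papers reporting only one of these are hard to compare.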

| Reference | Diagnosis/prognosis | Data used in model | Predictors | Sample size development | Sample size test | Type of validation | Evaluation | Public code |
|---|---|---|---|---|---|---|---|---|
| Ghoshal and Tucker^17 | Diagnosis | CXR | DL | 4,752 images, 54 COVID-19 | 1,189 images, 14 COVID-19 | Unclear validation procedure | Unclear in the paper | No |
| Li et al.^34 | Diagnosis | CXR | DL | 429 images, 143 COVID-19 | 108 images, 36 COVID-19 | Internal holdout validation | Accuracy, 0.88; AUC, 0.97 | Yes (Yes) |
| Ezzat et al.^28 | Diagnosis | CXR | DL | Unclear in the paper | Unclear in the paper | Internal holdout validation | Precision (w), 0.98; recall (w), 0.98; F1 score (w), 0.98 | No |
| Tartaglione et al.^16 | Diagnosis | CXR | DL | 231 images, 126 COVID-19 | 135 images, 90 COVID-19 | Internal holdout validation | Unclear in the paper | No |
| Luz et al.^30 | Diagnosis | CXR | DL | 13,569 images, 152 COVID-19 | 231 images, 31 COVID-19 | Internal holdout validation | Accuracy, 0.94; sensitivity, 0.97; PPV, 1.00 | Yes (Yes) |
| Bassi and Attux^31 | Diagnosis | CXR | DL | 2,724 images, 159 COVID-19 | 180 images, 60 COVID-19 | Internal holdout validation | Recall, 0.98; precision, 1.00 | No |
| Gueguim Kana et al.^32 | Diagnosis | CXR | DL | Unclear in the paper | Unclear in the paper | External validation | Accuracy, 0.99; recall, 1.00; precision, 0.99; F1 score, 1.00 | No |
| Heidari et al.^33 | Diagnosis | CXR | DL | 8,474 images, 415 COVID-19 | 848 images, 42 COVID-19 | Internal holdout validation | Precision (w), 0.95; recall (w), 0.94; F1 score (w), 0.94 | No |
| Farooq and Hafeez^29 | Diagnosis | CXR | DL | Unclear in the paper | 637 images, 8 COVID-19 | Internal holdout validation | Accuracy, 0.96; sensitivity, 0.97; PPV, 0.99; F1 score, 0.98 | No |
| Zhang et al.^27 | Diagnosis | CXR | DL | 5,236 images, 2,582 COVID-19 | 5,869 images, 3,223 COVID-19 | Internal holdout validation | AUC, 0.92; sensitivity, 0.88; specificity, 0.79 | Yes (No) |
| Zhang et al.^37 | Diagnosis | CXR | DL | 386 images, 150 COVID-19 | 101 images, 39 COVID-19 | Internal holdout validation | Accuracy, 0.91 | No |
| Wang et al.^26 | Diagnosis | CXR | DL | 3,522 images, 204 COVID-19 | 61 images, 20 COVID-19 | Internal holdout validation | AUC, 1.00; accuracy, 0.99 | No |
| Bararia et al.^25 | Diagnosis | CXR | DL | Unclear in the paper | 1,000 images, 341 COVID-19 | Internal holdout validation | Accuracy, 0.81; sensitivity, 0.81; specificity, 0.90; precision, 0.74; recall, 0.77; F1 score, 0.75 | No |
| Tsiknakis et al.^21 | Diagnosis | CXR | DL | 458 (CV) images, 98 COVID-19 | 114 (CV) images, 24 COVID-19 | Fivefold internal cross-validation | AUC, 1.00; accuracy, 1.00; sensitivity, 0.99; specificity, 1.00 | Yes (No) |
| Malhotra et al.^18 | Diagnosis | CXR | DL | 26,464 images, 1,740 COVID-19^a | 6,299 images, 125 COVID-19^a | Internal holdout validation | Sensitivity, 0.87; specificity, 0.97 | No |
| Sayyed et al.^36 | Diagnosis | CXR | DL | 5,018 (CV) images, 334 COVID-19 | 1,255 (CV) images, 83 COVID-19 | Fivefold internal cross-validation | Accuracy, 0.99 ± 0.05 | Yes (No) |
| Rahaman et al.^19 | Diagnosis | CXR | DL | 720 images, 220 COVID-19 | 140 images, 40 COVID-19 | Internal holdout validation | Accuracy, 0.89; precision, 0.90; recall, 0.89; F1 score, 0.90 | No |
| Amer et al.^20 | Diagnosis | CXR | DL | Unclear in the paper | Unclear in the paper | Internal holdout validation | AUC, 0.98; accuracy, 0.94; sensitivity, 0.92; specificity, 0.97; PPV, 0.98 | No |
| Elaziz et al.^22 | Diagnosis | CXR | Hand-engineered radiomic features | Unclear in the paper | Unclear in the paper | Internal holdout validation and external validation | Internal validation: accuracy, 0.96; recall, 0.99; precision, 0.96. External validation: accuracy, 0.98; recall, 0.99; precision, 0.99 | No |
| Tamal et al.^24 | Diagnosis | CXR | Hand-engineered radiomic features | 378 images, 226 COVID-19 | 165 images, 115 COVID-19 | Internal holdout validation | Sensitivity, 1.00; specificity, 0.85 | No^b |
| Gil et al.^23 | Diagnosis | CXR | Hand-engineered radiomic features | Unclear in the paper | Unclear in the paper | Internal holdout validation | Accuracy, 0.96; sensitivity, 0.98; specificity, 0.93; precision, 0.96 | Yes (Yes) |
| Zokaeinikoo et al.^35 | Diagnosis | CXR and CT | DL | Unclear in the paper | Unclear in the paper | Tenfold internal cross-validation | Accuracy, 0.99; sensitivity, 0.99; specificity, 1.00; PPV, 1.00 | No |
| Amyar et al.^44 | Diagnosis | CT | DL | 944 patients, 399 COVID-19 | 100 patients, 50 COVID-19 | Internal holdout validation | Accuracy, 0.95; sensitivity, 0.96; specificity, 0.92; AUC, 0.97 | No |
| Ardakani et al.^45 | Diagnosis | CT | DL | Unclear, as splits do not total correctly | Unclear, as splits do not total correctly | Internal holdout validation | AUC, 0.99; sensitivity, 1.00; specificity, 0.99; accuracy, 1.00; PPV, 0.99; NPV, 1.00 | No |
| Bai et al.^81 | Diagnosis | CT | DL | 118,401 images, 60,776 COVID-19 | 14,182 images, 5,040 COVID-19 | Internal holdout validation | AUC, 0.95; accuracy, 0.96; sensitivity, 0.95; specificity, 0.96 | Yes (Yes) |
| Jin et al.^50 | Diagnosis | CT | DL | 1,136 images, 723 COVID-19 | 282 images, 154 COVID-19 | Internal holdout validation | Sensitivity, 0.97; specificity, 0.92; AUC, 0.99 | No |
| Wang et al.^42 | Diagnosis | CT | DL | 320 images, 160 COVID-19 | Internal validation: 455 images, 95 COVID-19. External validation: 290 images, 70 COVID-19 | Internal holdout validation and external validation | Internal validation: AUC, 0.93 [0.90, 0.96]. External validation: AUC, 0.81 [0.71, 0.84] | No |
| Ko et al.^41 | Diagnosis | CT | DL | 3,194 (CV) images, 955 COVID-19 | Internal cross-validation: 799 (CV) images, 239 COVID-19. External validation: 264 images, all COVID-19 | Fivefold internal cross-validation and external validation | Internal validation: AUC, 1.00; accuracy, 1.00; sensitivity, 1.00; specificity, 1.00. External validation: accuracy, 0.97 | No |
| Acar et al.^48 | Diagnosis | CT | DL | 2,552 images, 1,085 COVID-19 | 580 images, 246 COVID-19 | Internal holdout validation | AUC, 1.00; accuracy, 1.00; error, 0.01; precision, 1.00; recall, 1.00; F1 score, 1.00 | No |
| Pu et al.^43 | Diagnosis | CT | DL | Unclear in the paper | Unclear in the paper | Internal holdout validation | AUC, 0.70 [0.56, 0.85]; sensitivity, 0.98; specificity, 0.28 | No |
| Chen et al.^49 | Diagnosis | CT | DL | 770 (CV) images, 413 COVID-19 | Internal cross-validation: 86 (CV) images, 46 COVID-19 | Tenfold internal cross-validation | AUC, 0.94 ± 0.01; accuracy, 0.88 ± 0.01; precision, 0.90 ± 0.01; recall, 0.88 ± 0.01 | No |
| Shah et al.^52 | Diagnosis | CT | DL | 664 images, 314 COVID-19 | 74 images, 35 COVID-19 | Internal holdout validation | Accuracy, 0.95 | No |
| Han et al.^47 | Diagnosis | CT | DL | 368 (CV) images, 184 COVID-19 | 92 (CV) images, 46 COVID-19 | Fivefold internal cross-validation | AUC, 0.99; accuracy, 0.98 | No^b |
| Wang et al.^53 | Diagnosis | CT | DL | 3,997 images, 1,095 COVID-19 | 600 images, 200 COVID-19 | Internal holdout validation | AUC, 0.97; accuracy, 0.93; specificity, 0.96; precision, 0.88; recall, 0.88 | No |
| Wang et al.^54 | Diagnosis | CT | DL | 2,447 images, 1,647 COVID-19 | Internal validation: 639 images, 439 COVID-19. External validation: 2,120 images, 217 COVID-19 | Internal holdout validation and external validation | Internal validation: AUC, 0.99; sensitivity, 0.97; specificity, 0.85. External validation: AUC, 0.95; sensitivity, 0.92; specificity, 0.85 | No |
| Goncharov et al.^71 | Diagnosis and severity prognosis | CT | DL | Unclear in the paper | Diagnosis: 101 images, 33 COVID-19. Severity: 38 images of differing severity | Internal holdout validation | Diagnosis model: AUC, 0.95. Severity model: correlation, 0.98 | No^c |
| Xie et al.^61 | Diagnosis | CT | Hand-engineered radiomic features | 225 images, 27 COVID-19 | 76 images, 6 COVID-19 | Internal holdout validation | AUC, 0.91; accuracy, 0.90; sensitivity, 0.83; specificity, 0.90 | No |
| Xu et al.^62 | Diagnosis | CT | DL and hand-engineered radiomic features | 551 images, 289 COVID-19 | 138 images, 73 COVID-19 | Internal holdout validation | Accuracy, 0.98; F1 score, 0.99 | No^d |
| Qin et al.^60 | Diagnosis | CT | Hand-engineered radiomic features | 118 patients, 62 COVID-19 | 50 patients, 26 COVID-19 | Internal holdout validation | AUC, 0.85 [0.74, 0.96]; sensitivity, 0.89; specificity, 0.92 | No |
| Georgescu et al.^40 | Diagnosis | CT | DL and hand-engineered radiomic features | 1,902 patients, 1,050 COVID-19 | 194 patients, 100 COVID-19 | Internal holdout validation | AUC, 0.90; sensitivity, 0.86; specificity, 0.81 | No |
| Guiot et al.^58 | Diagnosis | CT | Hand-engineered radiomic features | Unclear in the paper | Unclear in the paper | Internal holdout validation | AUC, 0.94 [0.88, 1.00]; accuracy, 0.90 [0.84, 0.94]; sensitivity, 0.79; specificity, 0.91 | No |
| Shi et al.^57 | Diagnosis | CT | Hand-engineered radiomic features | 2,148 (CV) images, 1,326 COVID-19 | Internal cross-validation: 537 (CV) images, 332 COVID-19 | Fivefold internal cross-validation | AUC, 0.94; accuracy, 0.88; sensitivity, 0.91; specificity, 0.83 | No |
| Mei et al.^46 | Diagnosis | CT | DL, CNN-extracted features and clinical data | 626 images, 285 COVID-19 | 279 images, 134 COVID-19 | Internal holdout validation | AUC, 0.92 [0.89, 0.95]; sensitivity, 0.843 [0.77, 0.90]; specificity, 0.83 [0.76, 0.89] | Yes (Yes) |
| Chen et al.^59 | Diagnosis | CT | Clinical features, qualitative imaging features and hand-engineered radiomic imaging features | 98 patients, 51 COVID-19 | 38 images, 19 COVID-19 | Internal holdout validation | AUC, 0.94 [0.87, 1.00]; accuracy, 0.76; sensitivity, 0.74; specificity, 0.79 | No |
| Wang et al.^51 | Diagnosis and prognosis for length of hospital stay | CT | Diagnosis model: DL. Prognosis model: 64 CNN features and clinical factors | 709 images, 560 COVID-19 | Validation 1: 226 images, 102 COVID-19. Validation 2: 161 images, 92 COVID-19. Validation 3: 53 images, all with length of hospital stay. Validation 4: 117 images, all with length of hospital stay | External validation | Validation 1 (diagnosis): AUC, 0.87. Validation 2 (diagnosis): AUC, 0.88. Validation 3 (prognosis): KM separation, P = 0.01. Validation 4 (prognosis): KM separation, P = 0.01 | Yes (Yes) |
| Li et al.^66 | Prognosis for severity | CXR | DL | 354 images of differing severities | Internal validation: 108 images. External validation: 111 images | Internal holdout validation and external validation | Internal validation: correlation, 0.88. External validation: correlation, 0.90 | Yes (No) |
| Li et al.^67 | Prognosis for severity | CXR | DL | 314 images of differing severities | Internal validation: 154 images. External validation: 113 images | Internal holdout validation and external validation | Internal validation: correlation, 0.86. External validation: correlation, 0.86 | Yes (No) |
| Schalekamp et al.^68 | Prognosis for severity | CXR | Hand-engineered radiomic features and clinical factors | Unclear in the paper | Unclear in the paper | Internal holdout validation | AUC, 0.77 | No |
| Cohen et al.^76 | Prognosis of lung opacity and extent of lung involvement with GGOs for patients with COVID-19 | CXR | Features from a trained CNN extracted at various layers | 47 patients of varying severity | 47 patients of varying severity | Internal holdout validation | Opacity correlation, 0.80; extent correlation, 0.78 | Yes (Yes) |
| Yue et al.^74 | Prognosing short- and long-term (>10 days) hospital stay for patients with COVID-19 | CT | Hand-engineered radiomic features | 26 patients, 16 long term | Internal validation: 5 patients, 3 long term. Temporal-split internal validation: 6 patients, all long term | Internal holdout and temporal-split validation | AUC, 0.97 [0.83, 1.00]; sensitivity, 1.00; specificity, 0.89; NPV, 1.00; PPV, 0.80 | Yes^d |
| Zhu et al.^75 | Prognosis for whether patients will convert to severe COVID-19, with regression to predict the time to conversion | CT | Hand-engineered radiomic features | Unclear in the paper | Unclear in the paper | Fivefold internal cross-validation run 20 times, average reported | AUC, 0.86 ± 0.02; accuracy, 0.86 ± 0.02; sensitivity, 0.77 ± 0.03; specificity, 0.88 ± 0.015 | No |
| Lassau et al.^73 | Prognosis for risk of death, need for ventilation or requirement for over 15 l min⁻¹ of oxygen | CT | CNN-extracted features and clinical data | 646 patients, all with COVID-19; 243 with severe outcomes | Internal validation: 150 images, all COVID-19, 48 with severe outcomes. External validation: 135 patients, all with COVID-19, unclear number of severe patients | Internal holdout validation and external validation | Internal validation: AUC, 0.76. External validation: AUC, 0.75 | No^c |
| Chassagnon et al.^63 | Short-term prognosis: intubation and death within four days. Long-term prognosis: death within one month after CT | CT | Hand-engineered radiomic features and clinical data | 536 patients with COVID-19, 108 severe short-term outcomes, unclear for long term | 157 patients with COVID-19, 31 severe short-term outcomes, unclear for long term | External validation | Short-term prognosis: precision (w), 0.94; sensitivity (w), 0.94; specificity (w), 0.81; balanced accuracy, 0.88. Long-term prognosis: precision (w), 0.77; sensitivity (w), 0.94; specificity (w), 0.82; balanced accuracy, 0.71 | No^b |
| Chao et al.^77 | Prognosing for ICU admission | CT | Hand-engineered radiomic features and clinical data | 236 (CV) images, 125 admitted to ICU | 59 (CV) images, 31 admitted to ICU | Fivefold internal cross-validation | Unclear in the paper | No |
| Wu et al.^78 | Prognosing for death, ventilation and ICU admission in early- and late-stage COVID-19 | CT | Hand-engineered radiomic features | 351 images, 25 severe outcomes | 141 images, 26 severe outcomes | External validation | Early-stage COVID-19: AUC, 0.86; sensitivity, 0.80; specificity, 0.86. Late-stage COVID-19: AUC, 0.98; sensitivity, 1.00; specificity, 0.94 | No |
| Zheng et al.^79 | Prognosing for admission to an ICU, use of mechanical ventilation or death | CT | Hand-engineered radiomic features and clinical data | 166 images, 35 severe outcomes | 72 images, 10 severe outcomes | External validation | C index, 0.89 | No |
| Chen et al.^80 | Prognosis for acute respiratory distress syndrome | CT | Hand-engineered radiomic features and clinical data | 247 images, 36 severe cases | 105 images, 15 severe cases | Internal holdout validation | Accuracy, 0.88; sensitivity, 0.55; specificity, 0.95 | No |
| Ghosh et al.^64 | Prognosing COVID-19 severity | CT | Hand-engineered radiomic features | 36 images, unclear number of severe cases | 24 images, unclear number of severe cases | Internal holdout validation | Accuracy, 0.88 | No |
| Ramtohul et al.^72 | Prognosing mortality for patients with COVID-19 in a cancer population | CT | Hand-engineered radiomic features and clinical data | 35 (CV) patients, unclear number of deaths | 70 patients, unclear number of deaths | Twofold internal cross-validation | C index, 0.83 [0.73, 0.93] | No |
| Wei et al.^65 | Prognosing COVID-19 severity | CT | Hand-engineered radiomic features | Unclear in the paper | Unclear in the paper | One-hundred-fold leave-group-out cross-validation | AUC, 0.93; accuracy, 0.91; sensitivity, 0.81; specificity, 0.95 | No |
| Wang et al.^69 | Prognosis for survival | CT | Hand-engineered radiomic features | 161 patients, 15 non-survivors | 135 patients, unclear number of non-survivors | External validation | C index, [0.92, 0.95]; accuracy, [0.85, 0.87]; sensitivity, [0.71, 0.76]; specificity, [0.91, 0.92] | No |
| Yip et al.^70 | Prognosing COVID-19 severity | CT | Hand-engineered radiomic features | 657 images of various severities | 441 images of various severities | Internal holdout validation | AUC, 0.85 | No |

^a Number of samples after augmentation; the original number of COVID-19 images is unclear.
^b The authors state that the algorithm will be made publicly available.
^c The paper states that code 'is available on a public GitHub repository', but no link is provided and the authors could not locate it.
^d The authors state that 'imaging or algorithm data used in this study are available upon request'.

Abbreviations: w, weighted average; CV, cross-validation; CI, 95% confidence interval; PPV, positive predictive value; NPV, negative predictive value; KM, Kaplan–Meier; GGOs, ground-glass opacities.