Table 3 Performance of CAD4TB (v6), Lunit (v4.7.2), and qXR (v2) at Selected Thresholds.

From: Using artificial intelligence to read chest radiographs for tuberculosis detection: A multi-site evaluation of the diagnostic accuracy of three deep learning systems

| Thresholds | CAD4TB (v6): Threshold | CAD4TB (v6): Accuracy | CAD4TB (v6): Sensitivity (95% CI) | CAD4TB (v6): Specificity (95% CI) | Lunit (v4.7.2): Threshold | Lunit (v4.7.2): Accuracy | Lunit (v4.7.2): Sensitivity (95% CI) | Lunit (v4.7.2): Specificity (95% CI) | qXR (v2): Threshold | qXR (v2): Accuracy | qXR (v2): Sensitivity (95% CI) | qXR (v2): Specificity (95% CI) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ROC01# | 63 | 0.85 | 0.91 (0.84–0.96) | 0.84 (0.82–0.86) | 0.92 | 0.89 | 0.87 (0.79–0.93) | 0.89 (0.87–0.91) | 0.67 | 0.86 | 0.88 (0.8–0.93) | 0.89 (0.87–0.91) |
| Sensitivity ≥95% | 57 | 0.81 | 0.95 (0.9–0.98) | 0.8 (0.77–0.82) | 0.55 | 0.77 | 0.95 (0.9–0.98) | 0.76 (0.73–0.78) | 0.49 | 0.83 | 0.95 (0.90–0.98) | 0.82 (0.79–0.84) |
| Reduce Xpert tests by 1/2 | 47 | 0.61 | 0.97 (0.92–0.99) | 0.57 (0.54–0.6) | 0.11 | 0.6 | 0.99 (0.95–1) | 0.56 (0.53–0.59) | 0.18 | 0.57 | 0.97 (0.92–0.99) | 0.53 (0.5–0.56) |
| Reduce Xpert tests by 2/3 | 53 | 0.75 | 0.96 (0.91–0.99) | 0.73 (0.7–0.76) | 0.39 | 0.74 | 0.95 (0.9–0.98) | 0.72 (0.69–0.74) | 0.34 | 0.75 | 0.96 (0.91–0.99) | 0.73 (0.7–0.76) |
| Reduce Xpert tests by 3/4 | 59 | 0.82 | 0.94 (0.87–0.97) | 0.82 (0.79–0.84) | 0.79 | 0.82 | 0.93 (0.86–0.97) | 0.81 (0.79–0.84) | 0.5 | 0.83 | 0.93 (0.86–0.97) | 0.82 (0.8–0.84) |
| Max Accuracy | 88 | 0.92 | 0.47 (0.37–0.57) | 0.96 (0.95–0.97) | 0.98 | 0.94 | 0.58 (0.48–0.67) | 0.97 (0.96–0.98) | 0.84 | 0.94 | 0.71 (0.61–0.79) | 0.96 (0.94–0.97) |

  1. #ROC01: the point on the ROC curve closest to the coordinates (0, 1), which correspond to perfect classification (false-positive rate of 0, sensitivity of 1).
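
The ROC01 criterion in the footnote can be illustrated with a short Python sketch. It is not the authors' code. It assumes that "closest" means Euclidean distance in ROC space, that higher scores indicate greater suspicion of tuberculosis, and that a radiograph is called positive when its score is at or above the threshold. The function name `roc01_threshold` and the toy data are hypothetical and are not taken from the study.

```python
import math


def roc01_threshold(scores, labels):
    """Return (threshold, sensitivity, specificity) for the ROC point nearest (0, 1).

    scores: abnormality scores; higher is assumed to mean more suspicious.
    labels: 1 for reference-standard positive, 0 for negative.
    A case is called positive when score >= threshold (an assumption).
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative case")

    best = None
    # Every distinct score is a candidate threshold.
    for threshold in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
        sensitivity = tp / positives
        specificity = tn / negatives
        # ROC coordinates are (1 - specificity, sensitivity); the target is (0, 1).
        distance = math.hypot(1 - specificity, 1 - sensitivity)
        # On ties, the lowest threshold is kept.
        if best is None or distance < best[0]:
            best = (distance, threshold, sensitivity, specificity)
    return best[1:]


# Toy data for illustration only (not from the study).
toy_scores = [0.1, 0.2, 0.3, 0.5, 0.6, 0.4, 0.7, 0.8, 0.9]
toy_labels = [0, 0, 0, 0, 0, 1, 1, 1, 1]
print(roc01_threshold(toy_scores, toy_labels))  # (0.7, 0.75, 1.0)
```

On the toy data, the threshold of 0.7 misses one positive case but excludes every negative case, which places it nearer to (0, 1) than any other candidate threshold.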