Table 2 Outcomes of deep learning algorithms for the diagnosis of the six most studied diseases.

From: Systematic review of deep learning image analyses for the diagnosis and monitoring of skin disease

 

| Disease | Outcome | Accuracy (%): all studies | Accuracy (%): externally validated/tested studies | AUC: all studies | AUC: externally validated/tested studies | Sensitivity (%): all studies | Sensitivity (%): externally validated/tested studies | Specificity (%): all studies | Specificity (%): externally validated/tested studies | PPV (%): all studies | PPV (%): externally validated/tested studies | NPV (%): all studies | NPV (%): externally validated/tested studies |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Acne | Median (IQR) | 93.5 (85.7–97.5) | 91.9 (n/a) | 0.98 (0.93–0.99) | 0.98 (n/a) | 89.9 (82.2–96.3) | 87.0 (n/a) | 95.2 (92.9–97.6) | 98.2 (n/a) | 86.5 (81.3–87.5) | 87.2 (n/a) | 96.0 (n/a) | 98.6 (n/a) |
| Acne | Range | 79.0–99.7 | 84.0–99.7 | 0.89–0.99 | 0.98 | 67.0–100.0 | 84.0–89.9 | 92.1–100.0 | 98.2 | 78.6–100.0 | 86.9–87.5 | 93.4–98.6 | 98.6 |
| Acne | Number of studies | 11 | 2 | 4 | 1 | 11 | 2 | 8 | 1 | 10 | 2 | 2 | 1 |
| Psoriasis | Median (IQR) | 89.1 (78.1–92.0) | 73.7 (n/a) | 0.93 (0.84–0.98) | 0.93 (n/a) | 90.0 (73.7–92.0) | 75.7 (n/a) | 95.4 (90.1–97.1) | 96.1 (n/a) | 82.4 (60.6–88.6) | 72.6 (n/a) | 94.8 (n/a) | 98.1 (n/a) |
| Psoriasis | Range | 69.4–98.5 | 73.7 | 0.81–0.99 | 0.93 | 60.0–95.6 | 73.7–77.7 | 88.2–98.8 | 96.1 | 60–95.5 | 62.8–82.4 | 91.5–98.1 | 98.1 |
| Psoriasis | Number of studies | 8 | 1 | 5 | 1 | 10 | 2 | 8 | 1 | 7 | 2 | 2 | 1 |
| Eczema | Median (IQR) | 92.6 (89.7–99.4) | 95.8 | 0.93 (0.87–0.99) | 0.87 (n/a) | 83.8 (70.2–94.6) | 73.0 (n/a) | 95.8 (92.6–98.8) | 97.6 (n/a) | 77.1 (61.9–89.7) | 81.1 (n/a) | 93.2 (n/a) | 95.8 (n/a) |
| Eczema | Range | 83.9–99.9 | 91.7–99.9 | 0.79–0.99 | 0.87 | 54.3–99.6 | 54.3–91.7 | 86.6–99.6 | 97.6 | 43.0–98.9 | 67.8–94.3 | 90.5–95.8 | 95.8 |
| Eczema | Number of studies | 9 | 2 | 6 | 1 | 13 | 2 | 10 | 1 | 8 | 2 | 2 | 1 |
| Rosacea | Median (IQR) | 93.7 (89.6–96.9) | n/a | 0.90 (0.87–0.94) | 0.91 (n/a) | 63.4 (41.7–92.0) | 41.7 (n/a) | 97.0 (93.9–99.3) | 99.8 (n/a) | 89.8 (35.7–94.5) | 35.7 (n/a) | 95.1 (n/a) | 99.9 (n/a) |
| Rosacea | Range | 87.8–97.9 | n/a | 0.85–0.97 | 0.91 | 0.0–100.0 | 41.7 | 91.7–99.8 | 99.8 | 0.0–95.0 | 35.7 | 90.2–99.9 | 99.9 |
| Rosacea | Number of studies | 4 | 0 | 4 | 1 | 6 | 1 | 5 | 1 | 7 | 1 | 2 | 1 |
| Vitiligo | Median (IQR) | 87.8 (n/a) | 100.0 (n/a) | 0.98 (n/a) | 0.98 (n/a) | 90.9 (80.4–95.1) | 82.7 (n/a) | 88.3 (79.8–97.6) | 98.8 (n/a) | 90.9 (n/a) | 80.1 (n/a) | 99.6 (n/a) | 99.6 (n/a) |
| Vitiligo | Range | 85.7–100.0 | 100.0 | 0.94–1.00 | 0.98 | 72.4–97.2 | 72.4–92.9 | 79.4–98.8 | 98.8 | 80.1–91.9 | 80.1 | 99.6 | 99.6 |
| Vitiligo | Number of studies | 3 | 1 | 3 | 1 | 5 | 2 | 4 | 1 | 3 | 1 | 1 | 1 |
| Urticaria | Median (IQR) | 80.6 (n/a) | n/a | 0.91 (n/a) | 0.91 (n/a) | 65.8 (n/a) | 55.7 (n/a) | 99.8 (n/a) | 99.8 (n/a) | 76.9 (n/a) | 75.6 (n/a) | 99.5 (n/a) | 99.5 (n/a) |
| Urticaria | Range | 68.3–92.8 | n/a | 0.91 | 0.91 | 55.7–75.9 | 55.7 | 99.7–99.8 | 99.8 | 75.6–78.2 | 75.6 | 99.5 | 99.5 |
| Urticaria | Number of studies | 2 | 0 | 1 | 1 | 2 | 1 | 2 | 1 | 2 | 1 | 1 | 1 |

  1. The six most studied diseases are acne, psoriasis, eczema, rosacea, vitiligo and urticaria. Studies assessing multiple diseases are reported under each relevant disease. Where studies report multiple outcomes from variations of DL algorithms or datasets, the best-performing results are presented. Outcomes for ‘externally validated/tested studies’ (i.e. studies in which datasets independent of the training dataset are used to validate and/or test DL algorithms) are presented separately from ‘all studies’, as these studies are presumed to be at lower risk of overfitting. Interquartile ranges (IQR) are not presented where fewer than four studies are available.
  2. Abbreviations: DL, deep learning; AUC, area under the receiver operating characteristic curve; PPV, positive predictive value; NPV, negative predictive value; IQR, interquartile range.
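
The summary rules in the footnotes (one best-performing result per study, then median, range and number of studies, with an IQR only when at least four studies are available) can be sketched in code. This is a minimal illustration and not the review authors' method: the function names `best_per_study` and `summarise` are hypothetical, "best performing" is assumed to mean the highest value, and the quartile method (inclusive) is an assumption, since the source does not state which one was used.

```python
from statistics import median, quantiles

# From the table footnote: no IQR is reported for fewer than four studies.
MIN_STUDIES_FOR_IQR = 4


def best_per_study(results_by_study):
    """Keep one value per study: its best-performing result.

    Assumption: "best" means the highest value, which suits metrics such as
    accuracy, AUC, sensitivity, specificity, PPV and NPV.
    """
    return [max(values) for values in results_by_study.values() if values]


def summarise(values):
    """Return number of studies, median, IQR and range for one table column."""
    n = len(values)
    if n == 0:
        return {"n": 0, "median": None, "iqr": None, "range": None}
    iqr = None
    if n >= MIN_STUDIES_FOR_IQR:
        # Assumption: inclusive quartiles; the source does not name a method.
        q1, _, q3 = quantiles(values, n=4, method="inclusive")
        iqr = (q1, q3)
    return {
        "n": n,
        "median": median(values),
        "iqr": iqr,
        "range": (min(values), max(values)),
    }


if __name__ == "__main__":
    # Hypothetical per-study results, not taken from the review.
    demo = {"study_a": [88.0, 91.5], "study_b": [79.0], "study_c": [95.2, 93.0]}
    print(summarise(best_per_study(demo)))
```

As a consistency check against the table: the externally validated acne accuracy column has two studies with a range of 84.0–99.7, so the two values must be 84.0 and 99.7. Passing them to `summarise` gives a median of about 91.85 and no IQR, in line with the reported 91.9 (n/a).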