Table 4 Performance metrics of deep learning algorithms (CNN-Base, VGG16, ResNet101, DenseNet121, InceptionV3, and ResNet50) on two test sets (T1, T2) for classifying four severity levels of corneal pathology: healthy (H), mild (M), moderate (MO), and severe (S). The metrics are accuracy, sensitivity (Sn), specificity (Sp), F1 score, Hamming distance (HD), area under the receiver operating characteristic curve (ROC-AUC), and area under the precision-recall curve (PR-AUC). Micro- and macro-averages of the ROC-AUC and PR-AUC are provided for overall model evaluation.

From: Artificial intelligence derived grading of mustard gas induced corneal injury and opacity

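| Model | Accuracy (Train) | Accuracy (T1) | Accuracy (T2) | Class | Sn (T1) | Sn (T2) | Sp (T1) | Sp (T2) | F1 (T1) | F1 (T2) | HD (Avg.) | ROC-AUC (Micro) | ROC-AUC (Macro) | PR-AUC (Micro) | PR-AUC (Macro) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Baseline-CNN | 0.64 | 0.70 | 0.63 | H | 0.73 | 0.87 | 0.93 | 0.82 | 0.76 | 0.67 | 0.40 | 0.88 | 0.90 | 0.65 | 0.72 |
| | | | | M | 0.75 | 0.43 | 0.97 | 0.89 | 0.80 | 0.52 | | | | | |
| | | | | MO | 0.50 | 0.40 | 0.79 | 0.84 | 0.50 | 0.42 | | | | | |
| | | | | S | 0.60 | 0.67 | 0.81 | 0.88 | 0.55 | 0.63 | | | | | |
| VGG16 | 0.75 | 0.80 | 0.74 | H | 0.80 | 0.83 | 0.85 | 0.91 | 0.76 | 0.81 | 0.29 | 0.93 | 0.93 | 0.82 | 0.80 |
| | | | | M | 0.60 | 0.57 | 0.87 | 0.90 | 0.60 | 0.59 | | | | | |
| | | | | MO | 0.67 | 0.50 | 0.96 | 0.95 | 0.74 | 0.62 | | | | | |
| | | | | S | 0.91 | 0.92 | 0.96 | 0.84 | 0.87 | 0.71 | | | | | |
| ResNet101 | 0.75 | 0.77 | 0.80 | H | 0.84 | 0.86 | 0.93 | 0.90 | 0.84 | 0.77 | 0.29 | 0.92 | 0.93 | 0.79 | 0.80 |
| | | | | M | 0.45 | 0.67 | 0.90 | 0.88 | 0.48 | 0.70 | | | | | |
| | | | | MO | 0.56 | 0.69 | 0.84 | 0.92 | 0.57 | 0.69 | | | | | |
| | | | | S | 0.77 | 0.79 | 0.90 | 0.96 | 0.71 | 0.81 | | | | | |
| DenseNet121 | 0.80 | 0.81 | 0.85 | H | 0.89 | 0.92 | 0.95 | 0.91 | 0.89 | 0.84 | 0.20 | 0.94 | 0.97 | 0.84 | 0.89 |
| | | | | M | 0.55 | 0.62 | 0.90 | 0.95 | 0.55 | 0.72 | | | | | |
| | | | | MO | 0.72 | 1.00 | 0.91 | 0.88 | 0.74 | 0.81 | | | | | |
| | | | | S | 0.92 | 0.79 | 0.96 | 1.00 | 0.89 | 0.88 | | | | | |
| InceptionV3 | 0.77 | 0.73 | 0.77 | H | 0.79 | 0.89 | 0.88 | 0.88 | 0.77 | 0.83 | 0.28 | 0.90 | 0.91 | 0.76 | 0.78 |
| | | | | M | 0.39 | 0.40 | 0.86 | 0.94 | 0.45 | 0.50 | | | | | |
| | | | | MO | 0.69 | 0.67 | 0.92 | 0.89 | 0.69 | 0.67 | | | | | |
| | | | | S | 1.00 | 0.92 | 0.92 | 0.92 | 0.85 | 0.83 | | | | | |
| ResNet50 | 0.87 | 0.85 | 0.83 | H | 0.90 | 0.78 | 1.00 | 0.98 | 0.95 | 0.80 | 0.17 | 0.94 | 0.95 | 0.80 | 0.84 |
| | | | | M | 0.88 | 0.79 | 0.96 | 0.87 | 0.88 | 0.79 | | | | | |
| | | | | MO | 0.80 | 0.88 | 0.91 | 0.91 | 0.77 | 0.83 | | | | | |
| | | | | S | 0.78 | 0.92 | 0.94 | 0.98 | 0.74 | 0.92 | | | | | |
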
T1 = Test set 1; T2 = Test set 2; H = Healthy; M = Mild; MO = Moderate; S = Severe; Sn = Sensitivity; Sp = Specificity; F1 = F1 score; HD = Hamming distance (average of T1 and T2); ROC-AUC = Receiver operating characteristic area under the curve; PR-AUC = Precision-recall area under the curve.
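
As a rough illustration of how the per-class columns (Sn, Sp, F1) and a Hamming-style error rate can be derived from predicted and true labels, the sketch below treats each class one-vs-rest. It is not the authors' evaluation code. The label lists are made up for the example, the function names `per_class_metrics`, `hamming_distance`, and `macro_average` are hypothetical, and the Hamming distance is assumed here to be the fraction of mismatched labels; the table does not state which definition was used, so the value computed this way may not match the HD column.

```python
from collections import Counter

CLASSES = ["H", "M", "MO", "S"]  # healthy, mild, moderate, severe


def per_class_metrics(y_true, y_pred, classes=CLASSES):
    """One-vs-rest sensitivity, specificity, and F1 for each class."""
    pairs = Counter(zip(y_true, y_pred))  # (true, predicted) -> count
    total = len(y_true)
    out = {}
    for c in classes:
        tp = pairs[(c, c)]
        fn = sum(n for (t, p), n in pairs.items() if t == c and p != c)
        fp = sum(n for (t, p), n in pairs.items() if t != c and p == c)
        tn = total - tp - fn - fp
        out[c] = {
            "Sn": tp / (tp + fn) if tp + fn else 0.0,  # recall for class c
            "Sp": tn / (tn + fp) if tn + fp else 0.0,
            "F1": 2 * tp / (2 * tp + fp + fn) if 2 * tp + fp + fn else 0.0,
        }
    return out


def hamming_distance(y_true, y_pred):
    """Fraction of samples whose predicted label differs (assumed definition)."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_average(metrics, key):
    """Unweighted mean of one metric over all classes."""
    return sum(m[key] for m in metrics.values()) / len(metrics)


# Hypothetical labels, for illustration only (not data from the study).
y_true = ["H", "H", "H", "M", "M", "MO", "MO", "S", "S", "S"]
y_pred = ["H", "H", "M", "M", "MO", "MO", "MO", "S", "S", "MO"]

metrics = per_class_metrics(y_true, y_pred)
for cls in CLASSES:
    row = metrics[cls]
    print(f"{cls:>2}  Sn={row['Sn']:.2f}  Sp={row['Sp']:.2f}  F1={row['F1']:.2f}")
print(f"Hamming distance = {hamming_distance(y_true, y_pred):.2f}")
print(f"Macro-averaged Sn = {macro_average(metrics, 'Sn'):.2f}")
```

A macro-average weights every class equally, whereas a micro-average pools all samples before computing the metric, so the two diverge when class sizes are unbalanced. That is one reason the table reports both for ROC-AUC and PR-AUC.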