Table 2. Performance metrics assessed on the test datasets for DL-based classification models with different backbone networks.

From: Deep learning-based optic disc classification is affected by optic-disc tilt

| Development dataset | Test dataset | Backbone network | Accuracy | Precision | F1 score | AUC |
|---|---|---|---|---|---|---|
| All | All | VGG19 | 0.945 ± 0.004 | 0.946 ± 0.003 | 0.945 ± 0.003 | 0.983 ± 0.002 |
| All | All | VGG16 | 0.946 ± 0.007 | 0.947 ± 0.006 | 0.946 ± 0.006 | 0.984 ± 0.006 |
| All | All | Dense121 | 0.949 ± 0.004 | 0.951 ± 0.003 | 0.949 ± 0.004 | 0.982 ± 0.003 |
| All | Non-tilted disc | VGG19 | 0.951 ± 0.003 | 0.952 ± 0.002 | 0.951 ± 0.003 | 0.989 ± 0.002 |
| All | Non-tilted disc | VGG16 | 0.957 ± 0.008 | 0.958 ± 0.008 | 0.957 ± 0.008 | 0.991 ± 0.005 |
| All | Non-tilted disc | Dense121 | 0.956 ± 0.006 | 0.959 ± 0.006 | 0.956 ± 0.006 | 0.985 ± 0.003 |
| All | Tilted disc | VGG19 | 0.936 ± 0.006 | 0.940 ± 0.006 | 0.935 ± 0.006 | 0.969 ± 0.005 |
| All | Tilted disc | VGG16 | 0.930 ± 0.010 | 0.933 ± 0.010 | 0.930 ± 0.010 | 0.969 ± 0.009 |
| All | Tilted disc | Dense121 | 0.938 ± 0.004 | 0.942 ± 0.004 | 0.937 ± 0.005 | 0.977 ± 0.011 |
| Non-tilted disc | Non-tilted disc | VGG19 | 0.945 ± 0.006 | 0.946 ± 0.006 | 0.945 ± 0.007 | 0.988 ± 0.002 |
| Non-tilted disc | Non-tilted disc | VGG16 | 0.944 ± 0.009 | 0.945 ± 0.009 | 0.943 ± 0.009 | 0.991 ± 0.003 |
| Non-tilted disc | Non-tilted disc | Dense121 | 0.944 ± 0.007 | 0.945 ± 0.007 | 0.944 ± 0.007 | 0.986 ± 0.003 |
| Non-tilted disc | Tilted disc | VGG19 | 0.891 ± 0.028 | 0.903 ± 0.009 | 0.894 ± 0.020 | 0.927 ± 0.019 |
| Non-tilted disc | Tilted disc | VGG16 | 0.886 ± 0.019 | 0.902 ± 0.009 | 0.890 ± 0.011 | 0.922 ± 0.020 |
| Non-tilted disc | Tilted disc | Dense121 | 0.915 ± 0.007 | 0.926 ± 0.006 | 0.918 ± 0.006 | 0.944 ± 0.008 |
| Tilted disc | Non-tilted disc | VGG19 | 0.878 ± 0.022 | 0.874 ± 0.040 | 0.869 ± 0.031 | 0.950 ± 0.013 |
| Tilted disc | Non-tilted disc | VGG16 | 0.886 ± 0.010 | 0.888 ± 0.012 | 0.878 ± 0.010 | 0.951 ± 0.015 |
| Tilted disc | Non-tilted disc | Dense121 | 0.891 ± 0.011 | 0.904 ± 0.009 | 0.884 ± 0.013 | 0.957 ± 0.008 |
| Tilted disc | Tilted disc | VGG19 | 0.923 ± 0.009 | 0.922 ± 0.012 | 0.921 ± 0.009 | 0.924 ± 0.046 |
| Tilted disc | Tilted disc | VGG16 | 0.914 ± 0.009 | 0.913 ± 0.014 | 0.912 ± 0.011 | 0.928 ± 0.017 |
| Tilted disc | Tilted disc | Dense121 | 0.918 ± 0.007 | 0.917 ± 0.007 | 0.915 ± 0.008 | 0.935 ± 0.008 |

  1. AUC = area under the curve.
  2. The table displays the means ± standard errors of accuracy, precision, F1 score and the AUC values for classification models developed with VGG19, VGG16, and Dense121 using different pairs of development and test datasets.
  3. The precision, F1 score, and AUC were computed using a weighted average to address class imbalance issues.
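
The weighted averaging and the mean ± standard error reporting described in the notes above can be illustrated with a minimal Python sketch. It is not the authors' code. The function names, the toy labels, and the three example accuracy values are hypothetical, and the standard error is assumed to be the sample standard deviation (n − 1 denominator) divided by the square root of the number of runs, which the table notes do not specify. The weighted AUC is omitted for brevity.

```python
from collections import Counter
from math import sqrt


def weighted_precision_f1(y_true, y_pred):
    """Return support-weighted precision and F1 over the classes in y_true.

    Each class's score is weighted by its share of the true labels, so a
    frequent class contributes more to the average than a rare one.
    """
    support = Counter(y_true)          # number of true samples per class
    n = len(y_true)
    precision_w = 0.0
    f1_w = 0.0
    for cls, count in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        predicted = sum(1 for p in y_pred if p == cls)
        precision = tp / predicted if predicted else 0.0
        recall = tp / count
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        weight = count / n             # class weight = support / total
        precision_w += weight * precision
        f1_w += weight * f1
    return precision_w, f1_w


def mean_and_standard_error(values):
    """Return (mean, standard error) for a list of per-run scores.

    Assumption: standard error = sample standard deviation / sqrt(n).
    """
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, sqrt(variance / n)


# Toy, made-up labels: four samples of class 0, two of class 1.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1]
precision_w, f1_w = weighted_precision_f1(y_true, y_pred)

# Hypothetical accuracies from three repeated runs.
mean_acc, se_acc = mean_and_standard_error([0.94, 0.95, 0.96])
print(f"{mean_acc:.3f} ± {se_acc:.3f}")
```

With weighting, the majority class dominates the reported score. In the toy example the class-0 precision of 1.0 counts twice as much as the class-1 precision of 2/3.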