Table 3. Performance of deep learning models in the overall quality classification.

From: Development and validation of a deep learning image quality feedback system for infant fundus photography

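| CNN | Augment strategy | Classification | Internal validation: Precision | Internal validation: Recall | Internal validation: F1 score | Internal validation: AUC | Internal test: Precision | Internal test: Recall | Internal test: F1 score | Internal test: AUC | External validation: Precision | External validation: Recall | External validation: F1 score | External validation: AUC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Inception-V3 | None | Poor | 0.699 | 0.796 | 0.745 | 0.949 | 0.803 | 0.455 | 0.580 | 0.875 | 0.718 | 0.185 | 0.188 | 0.779 |
| Inception-V3 | None | Adequate | 0.728 | 0.702 | 0.715 | 0.902 | 0.621 | 0.789 | 0.695 | 0.844 | 0.418 | 0.575 | 0.568 | 0.695 |
| Inception-V3 | None | Excellent | 0.896 | 0.897 | 0.896 | 0.944 | 0.888 | 0.809 | 0.847 | 0.896 | 0.780 | 0.944 | 0.935 | 0.944 |
| Inception-V3 | Balance sampling | Poor | 0.760 | 0.770 | 0.765 | 0.931 | 0.746 | 0.573 | 0.648 | 0.873 | 0.754 | 0.405 | 0.527 | 0.802 |
| Inception-V3 | Balance sampling | Adequate | 0.710 | 0.798 | 0.752 | 0.910 | 0.661 | 0.772 | 0.712 | 0.854 | 0.463 | 0.466 | 0.465 | 0.691 |
| Inception-V3 | Balance sampling | Excellent | 0.926 | 0.870 | 0.897 | 0.948 | 0.893 | 0.841 | 0.866 | 0.904 | 0.758 | 0.963 | 0.848 | 0.926 |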

  1. Performance is evaluated by precision, recall, F1 score, and AUC. To ensure data representativeness, the results for all three datasets were obtained using the same epoch. CNN, convolutional neural network; AUC, area under the curve.
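
For readers who want to see how per-class figures of this kind are typically derived, the following is a minimal sketch rather than the authors' evaluation code. It assumes each class (Poor, Adequate, Excellent) is scored one-vs-rest, which the table layout suggests but the source does not state. The function names `per_class_metrics` and `one_vs_rest_auc` are hypothetical, and the AUC is computed as the probability that a randomly chosen positive image receives a higher score than a randomly chosen negative one.

```python
from typing import Dict, Sequence

# Class names taken from Table 3.
CLASSES = ["Poor", "Adequate", "Excellent"]


def per_class_metrics(
    y_true: Sequence[str], y_pred: Sequence[str]
) -> Dict[str, Dict[str, float]]:
    """One-vs-rest precision, recall, and F1 score for each quality class."""
    results = {}
    for c in CLASSES:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)  # true positives
        fp = sum(t != c and p == c for t, p in pairs)  # false positives
        fn = sum(t == c and p != c for t, p in pairs)  # false negatives
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results[c] = {"precision": precision, "recall": recall, "f1": f1}
    return results


def one_vs_rest_auc(
    y_true: Sequence[str], scores: Sequence[float], positive: str
) -> float:
    """AUC for one class against the rest (assumed scheme, see lead-in).

    `scores` holds the model's score for the `positive` class on each image.
    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

F1 is the harmonic mean of precision and recall, so any F1 value in a table like this one can be cross-checked from the two neighbouring columns with `2 * precision * recall / (precision + recall)`.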