Table 3 The performance of deep learning models in the overall quality classification.
CNN | Augment strategy | Classification | Internal validation: Precision | Internal validation: Recall | Internal validation: F1 score | Internal validation: AUC | Internal test: Precision | Internal test: Recall | Internal test: F1 score | Internal test: AUC | External validation: Precision | External validation: Recall | External validation: F1 score | External validation: AUC
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
Inception-V3 | None | Poor | 0.699 | 0.796 | 0.745 | 0.949 | 0.803 | 0.455 | 0.580 | 0.875 | 0.718 | 0.185 | 0.188 | 0.779
Inception-V3 | None | Adequate | 0.728 | 0.702 | 0.715 | 0.902 | 0.621 | 0.789 | 0.695 | 0.844 | 0.418 | 0.575 | 0.568 | 0.695
Inception-V3 | None | Excellent | 0.896 | 0.897 | 0.896 | 0.944 | 0.888 | 0.809 | 0.847 | 0.896 | 0.780 | 0.944 | 0.935 | 0.944
Inception-V3 | Balance sampling | Poor | 0.760 | 0.770 | 0.765 | 0.931 | 0.746 | 0.573 | 0.648 | 0.873 | 0.754 | 0.405 | 0.527 | 0.802
Inception-V3 | Balance sampling | Adequate | 0.710 | 0.798 | 0.752 | 0.910 | 0.661 | 0.772 | 0.712 | 0.854 | 0.463 | 0.466 | 0.465 | 0.691
Inception-V3 | Balance sampling | Excellent | 0.926 | 0.870 | 0.897 | 0.948 | 0.893 | 0.841 | 0.866 | 0.904 | 0.758 | 0.963 | 0.848 | 0.926
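
For readers who want to relate the F1 score column to the precision and recall columns, the sketch below assumes the standard per-class definition of F1 as the harmonic mean of precision and recall. The table does not state which formula or averaging scheme was used, so this is an assumption, and the function name `f1_score` is purely illustrative. The inputs are the rounded values from one row of the table, so recomputed scores need not match the reported ones exactly.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed definition).

    Returns 0.0 when both inputs are zero to avoid division by zero.
    """
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Inception-V3 with balance sampling, "Poor" class, internal validation dataset
f1 = f1_score(0.760, 0.770)
print(round(f1, 3))  # 0.765
```

The harmonic mean is pulled toward the lower of the two inputs, so a class with high precision and low recall will have an F1 score much closer to its recall.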