Table 3. Performance of the deep learning models in overall quality classification.

CNN	Augment strategy	Classification	Internal validation dataset				Internal test dataset				External validation dataset
CNN	Augment strategy	Classification	Precision	Recall	F1 score	AUC	Precision	Recall	F1 score	AUC	Precision	Recall	F1 score	AUC
Inception-V3	None	Poor	0.699	0.796	0.745	0.949	0.803	0.455	0.580	0.875	0.718	0.185	0.188	0.779
		Adequate	0.728	0.702	0.715	0.902	0.621	0.789	0.695	0.844	0.418	0.575	0.568	0.695
		Excellent	0.896	0.897	0.896	0.944	0.888	0.809	0.847	0.896	0.780	0.944	0.935	0.944
Inception-V3	Balance sampling	Poor	0.760	0.770	0.765	0.931	0.746	0.573	0.648	0.873	0.754	0.405	0.527	0.802
		Adequate	0.710	0.798	0.752	0.910	0.661	0.772	0.712	0.854	0.463	0.466	0.465	0.691
		Excellent	0.926	0.870	0.897	0.948	0.893	0.841	0.866	0.904	0.758	0.963	0.848	0.926

Performance is evaluated by precision, recall, F1 score, and AUC. To ensure data representativeness, the results for all three datasets were obtained from the same training epoch. CNN, convolutional neural network; AUC, area under the curve.
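
As an illustration of how the per-class precision, recall, and F1 score reported in the table are defined, the following is a minimal sketch using the standard one-vs-rest definitions. It is not the authors' evaluation code: the function names and the toy labels are assumptions made for illustration. AUC is omitted because it requires the predicted class probabilities, which are not given here.

```python
from collections import Counter

# Class names as used in the table.
CLASSES = ("Poor", "Adequate", "Excellent")


def f1_from_precision_recall(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def per_class_metrics(y_true, y_pred, classes=CLASSES):
    """Compute one-vs-rest precision, recall, and F1 for each class.

    y_true and y_pred are equal-length sequences of class names
    (hypothetical inputs; the paper's data are not reproduced here).
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class receives a false positive
            fn[truth] += 1   # true class receives a false negative

    metrics = {}
    for c in classes:
        predicted = tp[c] + fp[c]   # all images predicted as c
        actual = tp[c] + fn[c]      # all images truly of class c
        precision = tp[c] / predicted if predicted else 0.0
        recall = tp[c] / actual if actual else 0.0
        metrics[c] = {
            "precision": precision,
            "recall": recall,
            "f1": f1_from_precision_recall(precision, recall),
        }
    return metrics


if __name__ == "__main__":
    # Toy labels, invented purely to exercise the functions.
    y_true = ["Poor", "Poor", "Adequate", "Excellent"]
    y_pred = ["Poor", "Adequate", "Adequate", "Excellent"]
    for name, m in per_class_metrics(y_true, y_pred).items():
        print(name, m)
```

For example, the balance-sampling "Poor" row of the internal validation dataset reports precision 0.760 and recall 0.770; `f1_from_precision_recall(0.760, 0.770)` rounds to 0.765, matching the tabulated F1 score.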
