Table 2 Accuracy of deep learning models in vocal technique assessment on the MVSet dataset (validation and 10-fold cross-validation). All models are trained and tested under the same conditions (hardware) without using pre-trained models.
From: Dense dynamic convolutional network for Bel canto vocal technique assessment
Models | MVSet Top-1 Acc (%), validation | MVSet Top-1 Acc (%), 10-fold CV | Parameters (M) | FLOPs (G)
---|---|---|---|---
CRNN8 | 71.18 | 67.78 | 4.91 | 1.10
MobileNet V236 | 86.53 | 85.88 | 4.10 | 2.87
CAM++12 | 83.39 | 83.51 | 7.13 | 1.72
AST16 | 84.59 | 84.06 | 86.86 | 48.61
PETL-AST15 | 85.11 | 83.39 | 87.32 | 49.73
ResNet8 | 77.78 | 73.25 | 11.30 | 18.20
GhostNet35 | 82.73 | 81.73 | 5.18 | 5.24
Ours-M | 85.16 | 85.33 | 6.89 | 2.80
Ours-L | 86.44(+1.28) | 86.09(+0.76) | 6.15 | 4.61
Ours-XL | 90.11(+4.95) | 87.20(+1.87) | 12.74 | 12.33
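
The caption does not say what the parenthesized gains in the Ours-L and Ours-XL rows are measured against. The arithmetic is consistent with the Ours-M row being the baseline, but that reading is an assumption. A minimal Python sketch of the check follows; the accuracies are copied from the table, and the helper name `gains_over` is hypothetical.

```python
# Top-1 accuracy (%) copied from Table 2: (validation, 10-fold CV).
ours = {
    "Ours-M": (85.16, 85.33),
    "Ours-L": (86.44, 86.09),
    "Ours-XL": (90.11, 87.20),
}


def gains_over(baseline, table):
    """Return per-model accuracy differences relative to `baseline`.

    Assumption: the parenthesized values in the table are differences
    from the Ours-M row; the caption does not state this explicitly.
    """
    base_val, base_cv = table[baseline]
    return {
        name: (round(val - base_val, 2), round(cv - base_cv, 2))
        for name, (val, cv) in table.items()
        if name != baseline
    }


gains = gains_over("Ours-M", ours)
for name, (d_val, d_cv) in gains.items():
    print(f"{name}: validation {d_val:+.2f}, 10-fold CV {d_cv:+.2f}")
```

Under that assumption, the computed differences match the values printed in parentheses in the table.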