Table 2 Accuracy of deep learning models for vocal technique assessment on the MVSet dataset (validation and 10-fold cross-validation). All models are trained and tested under the same conditions (hardware) without using pretrained models.

From: Dense dynamic convolutional network for Bel canto vocal technique assessment

| Models | MVSet Top-1 Acc (%), validation | MVSet Top-1 Acc (%), 10-fold CV | Parameters (M) | FLOPs (G) |
|---|---|---|---|---|
| CRNN [8] | 71.18 | 67.78 | 4.91 | 1.10 |
| MobileNet V2 [36] | 86.53 | 85.88 | 4.10 | 2.87 |
| CAM++ [12] | 83.39 | 83.51 | 7.13 | 1.72 |
| AST [16] | 84.59 | 84.06 | 86.86 | 48.61 |
| PETL-AST [15] | 85.11 | 83.39 | 87.32 | 49.73 |
| ResNet [8] | 77.78 | 73.25 | 11.30 | 18.20 |
| GhostNet [35] | 82.73 | 81.73 | 5.18 | 5.24 |
| Ours-M | 85.16 | 85.33 | 6.89 | 2.80 |
| Ours-L | 86.44 (+1.28) | 86.09 (+0.76) | 6.15 | 4.61 |
| Ours-XL | 90.11 (+4.95) | 87.20 (+1.87) | 12.74 | 12.33 |
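The caption does not say what the parenthesized values refer to. They appear to be accuracy gains over Ours-M, and the short check below is consistent with that reading. It is only a sketch: the accuracies are copied from the table, while the dictionary layout and the helper name `gains_over` are illustrative assumptions, not part of the original work.

```python
# Accuracies copied from Table 2 as (validation, 10-fold CV), in percent.
ours = {
    "Ours-M": (85.16, 85.33),
    "Ours-L": (86.44, 86.09),
    "Ours-XL": (90.11, 87.20),
}


def gains_over(baseline, table):
    """Return each model's accuracy gain over `baseline`, in percentage points.

    `gains_over` is a hypothetical helper written for this check only.
    """
    base_val, base_cv = table[baseline]
    return {
        name: (round(val - base_val, 2), round(cv - base_cv, 2))
        for name, (val, cv) in table.items()
        if name != baseline
    }


# Assumption under test: the parenthesized gains are relative to Ours-M.
gains = gains_over("Ours-M", ours)
for name, (d_val, d_cv) in gains.items():
    print(f"{name}: {d_val:+.2f} (validation), {d_cv:+.2f} (10-fold CV)")
```

The recomputed differences match the parenthesized values for Ours-L and Ours-XL, which supports reading them as improvements over the Ours-M variant.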