Table 5 Results of ablation experiments on the Aishell-1 dataset.

From: An Mcformer encoder integrating Mamba and Cgmlp for improved acoustic feature extraction

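| Model               | Size  | Attention (Dev/Test) | Attention rescoring (Dev/Test) | CTC greedy search (Dev/Test) | CTC prefix beam search (Dev/Test) |
|---------------------|-------|----------------------|--------------------------------|------------------------------|-----------------------------------|
| Wenet (MHA)         | 43.5M | 4.54/5.01            | 4.32/4.74                      | 4.62/5.08                    | 4.62/5.08                         |
| Wenet (Mamba)       | 47.5M | 5.19/6.02            | 5.03/5.68                      | 5.60/6.33                    | 5.60/6.33                         |
| +Cgmlp              | 52.3M | 4.53/4.95            | 4.27/4.73                      | 4.54/4.95                    | 4.54/4.95                         |
| +Mamba              | 52.2M | 4.50/4.87            | 4.21/4.62                      | 4.46/4.96                    | 4.46/4.96                         |
| +Mamba+Cgmlp (ours) | 62.9M | 4.34/4.70            | 4.15/4.48                      | 4.32/4.69                    | 4.33/4.69                         |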
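The ablation is easier to read as relative error-rate reductions of the full model over the Wenet (MHA) baseline. Below is a minimal Python sketch that computes them. The values are transcribed by hand from Table 5. The relative-reduction formula, (baseline - candidate) / baseline, is a common reporting convention assumed here; the table itself does not state it. The names `RESULTS`, `SIZES_M`, `relative_reduction` and `compare` are hypothetical helpers introduced only for this illustration.

```python
# Dev/Test values transcribed by hand from Table 5 (Aishell-1); lower is better.
# Structure (hypothetical, for illustration): model -> decoding mode -> (dev, test).
RESULTS = {
    "Wenet (MHA)": {
        "attention": (4.54, 5.01),
        "attention_rescoring": (4.32, 4.74),
        "ctc_greedy_search": (4.62, 5.08),
        "ctc_prefix_beam_search": (4.62, 5.08),
    },
    "+Mamba+Cgmlp": {
        "attention": (4.34, 4.70),
        "attention_rescoring": (4.15, 4.48),
        "ctc_greedy_search": (4.32, 4.69),
        "ctc_prefix_beam_search": (4.33, 4.69),
    },
}

# Model sizes in millions of parameters, also taken from Table 5.
SIZES_M = {"Wenet (MHA)": 43.5, "+Mamba+Cgmlp": 62.9}


def relative_reduction(baseline: float, candidate: float) -> float:
    """Relative error-rate reduction in percent (assumed convention).

    A positive value means the candidate has the lower (better) error rate.
    """
    return (baseline - candidate) / baseline * 100.0


def compare(baseline: str, candidate: str) -> dict:
    """For each decoding mode, return (dev, test) reductions rounded to 2 decimals."""
    table = {}
    for mode, (b_dev, b_test) in RESULTS[baseline].items():
        c_dev, c_test = RESULTS[candidate][mode]
        table[mode] = (
            round(relative_reduction(b_dev, c_dev), 2),
            round(relative_reduction(b_test, c_test), 2),
        )
    return table


if __name__ == "__main__":
    for mode, (dev, test) in compare("Wenet (MHA)", "+Mamba+Cgmlp").items():
        print(f"{mode:24s} dev {dev:5.2f}%  test {test:5.2f}%")
    # Extra parameters of the full model relative to the baseline, in percent.
    growth = (SIZES_M["+Mamba+Cgmlp"] / SIZES_M["Wenet (MHA)"] - 1) * 100
    print(f"parameter growth: {growth:.1f}%")
```

Under this convention, attention rescoring on the test set improves by about 5.5% relative (4.74 to 4.48). That gain comes with roughly 45% more parameters (43.5M to 62.9M). Reporting both numbers together keeps the accuracy/size trade-off visible.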