Table 5 Results of ablation experiments on the Aishell-1 dataset.
From: An Mcformer encoder integrating Mamba and Cgmlp for improved acoustic feature extraction
| Model | Size | Attention (Dev/Test) | Attention rescoring (Dev/Test) | CTC greedy search (Dev/Test) | CTC prefix beam search (Dev/Test) |
|---|---|---|---|---|---|
| Wenet (MHA) | 43.5M | 4.54/5.01 | 4.32/4.74 | 4.62/5.08 | 4.62/5.08 |
| Wenet (Mamba) | 47.5M | 5.19/6.02 | 5.03/5.68 | 5.60/6.33 | 5.60/6.33 |
| +Cgmlp | 52.3M | 4.53/4.95 | 4.27/4.73 | 4.54/4.95 | 4.54/4.95 |
| +Mamba | 52.2M | 4.50/4.87 | 4.21/4.62 | 4.46/4.96 | 4.46/4.96 |
| +Mamba+Cgmlp (our) | 62.9M | 4.34/4.70 | 4.15/4.48 | 4.32/4.69 | 4.33/4.69 |