Table 1 Duration of the three open-source datasets.

From: An Mcformer encoder integrating Mamba and Cgmlp for improved acoustic feature extraction

| Dataset | Language | Train (h) | Dev (h) | Test (h) |
|---|---|---|---|---|
| Aishell-1 | Mandarin | 150.85 | 18.09 | 10.03 |
| CommonVoice zh 14 | Mandarin | 214.47 | 15.92 | 17.45 |
| TED-LIUM 3 | English | 452.06 | 1.60 | 2.62 |
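
The table reports per-split durations only. As a rough sketch, the total hours per dataset and the share taken by the training split can be derived from those figures. The names `DURATIONS_H` and `summarize` are illustrative and do not come from the paper; the numbers are copied from Table 1.

```python
# Durations in hours, copied from Table 1 as (train, dev, test).
# DURATIONS_H and summarize are hypothetical names used for illustration only.
DURATIONS_H = {
    "Aishell-1": (150.85, 18.09, 10.03),
    "CommonVoice zh 14": (214.47, 15.92, 17.45),
    "TED-LIUM 3": (452.06, 1.60, 2.62),
}


def summarize(durations):
    """Return {dataset: (total_hours, train_fraction)} for each dataset."""
    summary = {}
    for name, (train, dev, test) in durations.items():
        total = train + dev + test
        summary[name] = (round(total, 2), round(train / total, 3))
    return summary


if __name__ == "__main__":
    for name, (total, train_frac) in summarize(DURATIONS_H).items():
        print(f"{name}: {total:.2f} h total, {train_frac:.1%} in train")
```

The derived totals make it easier to compare corpus sizes at a glance; they are arithmetic on the reported values, not additional results from the paper.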