Table 1. Word classification accuracy on both Speech Commands (chance = 1/35 = 0.0286) and Wordsworth (chance = 1/84 = 0.0119) for each of the four NN models.

From: Wordsworth: A generative word dataset for comparison of speech representations in humans and neural networks

| Model | Dataset: Speech Commands [51] | Dataset: Wordsworth |
| --- | --- | --- |
| Recurrent modified 2D cochleagram | 0.93 (32.55x above chance) | 0.86 (72.24x above chance) |
| Modified 2D cochleagram | 0.90 (31.50x above chance) | 0.78 (65.52x above chance) |
| 2D cochleagram [48] | 0.88 (30.80x above chance) | 0.70 (58.80x above chance) |
| 1D waveform [42] | 0.91 (31.85x above chance) | 0.84 (70.56x above chance) |

  1. All models performed well above chance. Absolute accuracy was higher on Speech Commands, whereas accuracy relative to chance was higher on Wordsworth, likely because Wordsworth is strongly curated and its tokens contain no noise.
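
The multipliers reported in the table appear to be each accuracy divided by the corresponding chance level (1/35 for Speech Commands, 1/84 for Wordsworth), which is the same as accuracy multiplied by the number of word classes. The following is a minimal sketch of that arithmetic, not the authors' code; the names `NUM_CLASSES`, `ACCURACY`, and `times_chance` are hypothetical and introduced only for illustration.

```python
# Number of word classes per dataset, taken from the chance levels in the caption.
NUM_CLASSES = {"Speech Commands": 35, "Wordsworth": 84}

# Accuracies copied from Table 1 (hypothetical dictionary layout).
ACCURACY = {
    "Recurrent modified 2D cochleagram": {"Speech Commands": 0.93, "Wordsworth": 0.86},
    "Modified 2D cochleagram": {"Speech Commands": 0.90, "Wordsworth": 0.78},
    "2D cochleagram": {"Speech Commands": 0.88, "Wordsworth": 0.70},
    "1D waveform": {"Speech Commands": 0.91, "Wordsworth": 0.84},
}


def times_chance(accuracy: float, num_classes: int) -> float:
    """Return accuracy / chance, where chance = 1 / num_classes."""
    return accuracy * num_classes


if __name__ == "__main__":
    for model, scores in ACCURACY.items():
        for dataset, acc in scores.items():
            ratio = times_chance(acc, NUM_CLASSES[dataset])
            print(f"{model} | {dataset}: {acc:.2f} ({ratio:.2f}x chance)")
```

Under this reading, a model can score lower in absolute terms on Wordsworth and still score higher relative to chance, because chance on an 84-class task is less than half of chance on a 35-class task.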