Table 1 Hyperparameter settings.
Parameter | Value |
---|---|
The hidden size of the first linear layer | 40 |
The hidden size of the attention layer | \(40 \times 8\) |
The hidden size of the res-linear layer | 32 |
The hidden size of the final linear layer | 8 |
The number of attention heads | 8 |
The size of the memory module | 4000 |
The sparsity threshold \(\lambda\) | 0.0004 |
Training epochs | 10 |
Batch size | 128 |
Learning rate | 1e-4 |
Optimizer | Adam |
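For readers reproducing this setup, the table can be captured as a small configuration dictionary. This is only an illustrative sketch: all key names below are assumptions, since the paper does not specify code-level identifiers.

```python
# Hyperparameters from Table 1, collected into a plain config dict.
# All key names are illustrative assumptions, not identifiers from the paper.
CONFIG = {
    "first_linear_hidden": 40,     # hidden size of the first linear layer
    "attention_hidden": 40 * 8,    # hidden size of the attention layer (40 per head x 8 heads)
    "res_linear_hidden": 32,       # hidden size of the res-linear layer
    "final_linear_hidden": 8,      # hidden size of the final linear layer
    "num_attention_heads": 8,
    "memory_size": 4000,           # size of the memory module
    "sparsity_threshold": 0.0004,  # the sparsity threshold lambda
    "epochs": 10,
    "batch_size": 128,
    "learning_rate": 1e-4,
    "optimizer": "Adam",
}

# A consistency check implied by the table: the attention hidden size
# (40 x 8) splits evenly across the 8 attention heads.
assert CONFIG["attention_hidden"] % CONFIG["num_attention_heads"] == 0
```

Reading the attention hidden size as \(40 \times 8\) suggests a per-head dimension of 40, which the assertion above makes explicit.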