Table 1 Hyperparameter settings.
Parameter | Value |
---|---|
The hidden size of the first linear layer | 40 |
The hidden size of the attention layer | \(40 \times 8\) |
The hidden size of the res-linear layer | 32 |
The hidden size of the final linear layer | 8 |
The number of attention heads | 8 |
The size of the memory module | 4000 |
The sparsity threshold \(\lambda\) | 0.0004 |
Training epochs | 10 |
Batch size | 128 |
Learning rate | 1e-4 |
Optimizer | Adam |
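For readers reproducing this setup, the table can be captured as a small configuration dictionary. This is only an illustrative sketch: all key names below are assumptions, since the paper does not specify code-level identifiers.

```python
# Hyperparameters from Table 1, collected into a plain config dict.
# All key names are illustrative assumptions, not identifiers from the paper.
CONFIG = {
    "first_linear_hidden": 40,     # hidden size of the first linear layer
    "attention_hidden": 40 * 8,    # hidden size of the attention layer (40 per head x 8 heads)
    "res_linear_hidden": 32,       # hidden size of the res-linear layer
    "final_linear_hidden": 8,      # hidden size of the final linear layer
    "num_attention_heads": 8,
    "memory_size": 4000,           # size of the memory module
    "sparsity_threshold": 0.0004,  # the sparsity threshold lambda
    "epochs": 10,
    "batch_size": 128,
    "learning_rate": 1e-4,
    "optimizer": "Adam",
}

# A consistency check implied by the table: the attention hidden size
# (40 x 8) splits evenly across the 8 attention heads.
assert CONFIG["attention_hidden"] % CONFIG["num_attention_heads"] == 0
```

Reading the attention hidden size as \(40 \times 8\) suggests a per-head dimension of 40, which the assertion above makes explicit.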