Table 4 Details of the hyperparameters used in each method.
| Algorithm | Common hyperparameters | Specific hyperparameters | Number of models | Best hyperparameters |
|---|---|---|---|---|
| BERT | Sequence length: [128, 256, 512], Learning rate: [1e-5, 2e-5, 3e-5], Warm-up proportion: [0, 0.05, 0.1], Batch size: [16, 32], Epochs: [3, 4, 5] | - | 162 | Sequence length: 512, Learning rate: 2e-5, Warm-up proportion: 0.05, Batch size: 32, Epochs: 4 |
| RoBERTa | Same as BERT | - | 162 | Sequence length: 512, Learning rate: 1e-5, Warm-up proportion: 0.1, Batch size: 32, Epochs: 5 |
| ALBERT | Same as BERT | Hidden size: [128, 256, 512] | 486 | Sequence length: 512, Learning rate: 2e-5, Warm-up proportion: 0.05, Batch size: 16, Epochs: 5, Hidden size: 512 |
| DistilBERT | Same as BERT | - | 162 | Sequence length: 256, Learning rate: 5e-5, Warm-up proportion: 0.05, Batch size: 32, Epochs: 4 |
| XLNet | Same as BERT | Memory length: [128, 256, 512] | 486 | Sequence length: 256, Learning rate: 3e-5, Warm-up proportion: 0, Batch size: 16, Epochs: 5, Memory length: 256 |
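The model counts in the table follow directly from the grid sizes: the common grid has 3 × 3 × 3 × 2 × 3 = 162 combinations, and ALBERT and XLNet each add one three-valued dimension, giving 486. A minimal sketch of this counting (dictionary keys and function names are illustrative, not the authors' code):

```python
from itertools import product

# Hyperparameter grid shared by all five models (values from Table 4).
common_grid = {
    "sequence_length": [128, 256, 512],
    "learning_rate": [1e-5, 2e-5, 3e-5],
    "warmup_proportion": [0, 0.05, 0.1],
    "batch_size": [16, 32],
    "epochs": [3, 4, 5],
}

# Model-specific extra dimensions (ALBERT and XLNet only).
specific_grid = {
    "ALBERT": {"hidden_size": [128, 256, 512]},
    "XLNet": {"memory_length": [128, 256, 512]},
}

def grid_size(model: str) -> int:
    """Number of hyperparameter combinations searched for a model."""
    grid = {**common_grid, **specific_grid.get(model, {})}
    # Count the Cartesian product of all value lists.
    return sum(1 for _ in product(*grid.values()))

print(grid_size("BERT"))    # 3*3*3*2*3 = 162
print(grid_size("ALBERT"))  # 162 * 3 = 486
```

Note that DistilBERT's best learning rate (5e-5) lies outside the listed common range [1e-5, 2e-5, 3e-5], which may indicate an additional value searched for that model or a typo in the original table.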