Table 3 Suggested hyperparameters of the pre-trained models.
From: Information extraction from green channel textual records on expressways using hybrid deep learning
| Parameter | Value |
|---|---|
| Vector dimension of BERT | 768 |
| Maximum sentence length | 172 |
| Hidden layer dimension of GRU or LSTM | 768 |
| Optimizer | Adam |
| Learning rate | 5 × 10⁻⁵ |
| Batch size | 16 |
| Maximum iteration rounds | 3 |
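
The values in Table 3 can be collected into a single configuration object so that every model variant is trained with the same settings. The following is a minimal sketch, not the authors' code: the class name `TrainingConfig`, its field names, and the helper `pad_or_truncate` are hypothetical, and the padding id of 0 is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hypothetical container for the values listed in Table 3."""
    bert_vector_dim: int = 768       # vector dimension of BERT
    max_sentence_length: int = 172   # maximum sentence length
    rnn_hidden_dim: int = 768        # hidden layer dimension of GRU or LSTM
    optimizer: str = "Adam"          # optimizer name
    learning_rate: float = 5e-5      # 5 x 10^-5
    batch_size: int = 16
    max_iteration_rounds: int = 3    # maximum iteration rounds


def pad_or_truncate(token_ids, max_len, pad_id=0):
    """Force a token id sequence to exactly max_len entries.

    pad_id=0 is an assumption; use the padding id of the actual tokenizer.
    """
    clipped = list(token_ids)[:max_len]
    return clipped + [pad_id] * (max_len - len(clipped))


cfg = TrainingConfig()
example = pad_or_truncate([101, 2769, 102], cfg.max_sentence_length)
print(cfg.optimizer, cfg.learning_rate, len(example))
```

Freezing the dataclass keeps the settings from being changed accidentally during a run, which makes it easier to compare results across models.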