Table 3 Hyperparameter optimization results of the transformer–BiLSTM model.
From: Intelligent prediction of air quality index based on the transformer-BiLSTM model
Parameter | RMSE | MAE | MAPE (%) | \(\mathbf{R^2}\) |
|---|---|---|---|---|
Learning rate | ||||
0.1 | 40.5567 | 35.4188 | 8.4397 | −4.7100 |
0.01 | 34.9032 | 29.6495 | 7.1623 | −3.2300 |
0.001 | 5.3475 | 3.6833 | 7.7227 | 0.9007 |
0.0001 | 84.6039 | 81.3622 | 18.5172 | −23.8570 |
Batch size | ||||
8 | 3.4152 | 2.5205 | 5.0662 | 0.9595 |
16 | 3.1122 | 1.8026 | 3.3865 | 0.9664 |
32 | 5.3475 | 3.6833 | 7.7227 | 0.9007 |
64 | 8.5698 | 6.2148 | 10.6895 | 0.7450 |
Dropout rate | ||||
0.1 | 3.7674 | 2.4779 | 4.6489 | 0.9507 |
0.2 | 3.6751 | 2.3684 | 4.4654 | 0.9531 |
0.3 | 4.4978 | 3.0577 | 5.7145 | 0.9298 |
0.5 | 6.0808 | 3.8354 | 6.8147 | 0.8716 |
Input window size | ||||
1 | 7.8250 | 5.1986 | 9.3191 | 0.7874 |
2 | 3.1456 | 1.8657 | 3.3789 | 0.9656 |
3 | 3.6751 | 2.3684 | 4.4654 | 0.9531 |
5 | 5.3369 | 3.8175 | 7.2976 | 0.9011 |
Training epochs | ||||
50 | 6.0394 | 4.0313 | 8.9337 | 0.8733 |
100 | 3.0012 | 1.7928 | 3.3646 | 0.9687 |
150 | 5.5775 | 4.0507 | 8.4381 | 0.8920 |
200 | 6.5489 | 5.1587 | 9.6348 | 0.8511 |
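
The table does not restate how its four metrics are defined. The sketch below assumes the standard textbook definitions of RMSE, MAE, MAPE, and \(R^2\), which may differ in detail from the authors' implementation. The function name `regression_metrics` and the sample values are hypothetical and are not taken from the paper. Under these definitions \(R^2\) becomes negative whenever the squared error of the predictions exceeds that of a constant prediction at the mean of the observed values, which is how negative entries such as those in the learning-rate rows can arise.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return RMSE, MAE, MAPE (in percent) and R^2 using the standard definitions.

    Assumes both sequences have the same length, y_true contains no zeros
    (required for MAPE), and y_true is not constant (required for R^2).
    """
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]

    ss_res = sum(e * e for e in errors)             # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares

    return {
        "rmse": math.sqrt(ss_res / n),
        "mae": sum(abs(e) for e in errors) / n,
        "mape": 100.0 * sum(abs(e / t) for e, t in zip(errors, y_true)) / n,
        "r2": 1.0 - ss_res / ss_tot,
    }


# Hypothetical AQI-like values, for illustration only.
observed = [100.0, 110.0, 120.0]
poor_forecast = [150.0, 150.0, 150.0]   # far from the observations
metrics = regression_metrics(observed, poor_forecast)
print(metrics)
```

In this toy case the forecast misses by more than the observations vary around their own mean, so `metrics["r2"]` is negative even though the errors are finite.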