Table 5 Effects of the number of attention heads \(\text{N}\) on MSE and MAE.
Attention heads | 2 | | 4 | | 6 | | 8 | |
|---|---|---|---|---|---|---|---|---|
Metric | MSE | MAE | MSE | MAE | MSE | MAE | MSE | MAE |
24 | 0.3995 | 0.4126 | 0.3918 | 0.4038 | 0.3994 | 0.4130 | 0.3985 | 0.4079 |
48 | 0.4079 | 0.4192 | 0.4013 | 0.4079 | 0.4021 | 0.4085 | 0.4031 | 0.4094 |
96 | 0.4412 | 0.4274 | 0.4383 | 0.4170 | 0.4398 | 0.4282 | 0.4363 | 0.4236 |
192 | 0.4605 | 0.4369 | 0.4522 | 0.4249 | 0.4623 | 0.4372 | 0.4650 | 0.4380 |
384 | 0.4702 | 0.4418 | 0.4662 | 0.4366 | 0.4777 | 0.4431 | 0.4796 | 0.4490 |