Table 9 Performance comparison of TL-LSTM variants to evaluate the effects of Multi-Head Attention (MHA) and feature selection.

From: Deep transfer learning and attention based PM2.5 forecasting in Delhi using a decade of winter season data

| Model | Architecture | Feature selection | MAE | RMSE | R² | Purpose |
|-------|--------------|-------------------|------|-------|--------|---------|
| A | TL-LSTM-MHA | Yes | 4.38 | 5.80 | 0.9974 | Full proposed model |
| B | TL-LSTM (No-MHA) | Yes | 12.77 | 18.23 | 0.9742 | Ablation: no attention |
| C | TL-LSTM-MHA | No | 5.16 | 6.94 | 0.9963 | Ablation: no feature selection |
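For orientation, the sketch below shows one way variant A's architecture (an LSTM encoder followed by multi-head self-attention over the time steps) could be wired in PyTorch. This is a minimal illustration, not the paper's implementation: the layer sizes, head count, and single-step output head are assumptions. Variant B corresponds to dropping the attention layer and predicting from the LSTM output directly.

```python
import torch
import torch.nn as nn

class TLLSTMMHA(nn.Module):
    """Illustrative TL-LSTM-MHA sketch (variant A). All hyperparameters
    here (hidden size, number of heads) are assumed, not from the paper."""

    def __init__(self, n_features: int, hidden: int = 64, heads: int = 4):
        super().__init__()
        # LSTM encoder over the input window of meteorological/pollutant features
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True)
        # Multi-head self-attention over the LSTM hidden states
        self.mha = nn.MultiheadAttention(hidden, heads, batch_first=True)
        # Regression head producing one PM2.5 value
        self.head = nn.Linear(hidden, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, n_features)
        h, _ = self.lstm(x)          # (batch, time, hidden)
        a, _ = self.mha(h, h, h)     # self-attention across time steps
        return self.head(a[:, -1, :])  # forecast from the last attended step

# Example usage with dummy data: batch of 8 windows, 24 time steps, 10 features
model = TLLSTMMHA(n_features=10)
y_hat = model(torch.randn(8, 24, 10))  # -> shape (8, 1)
```

In a transfer-learning setting, the encoder weights would typically be pretrained on the source data and then fine-tuned (or partially frozen) on the target winter-season data; the details of that procedure are specific to the paper and not reproduced here.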