Table 5 Hyperparameter details.

From: Advanced air quality prediction using multimodal data and dynamic modeling techniques

| Hyperparameter | Description | Values |
|---|---|---|
| Learning rate | Controls weight update size during training | 0.001 |
| Batch size | Number of samples processed before weight update | 32 |
| Epochs | Number of complete passes through the dataset | 50, 100, 150 |
| Optimizer | Algorithm for updating model weights | Adam, SGD, RMSprop |
| Dropout Rate | The fraction of neurons randomly dropped during training | 0.2 |
| Number of Layers | Number of layers in the model | 3 |
| Neurons per Layer | Number of units in each layer | 128 |
| Activation Function | Non-linear function for the model | ReLU, Tanh, Sigmoid |
| Learning Rate Decay | Reduction of learning rate during training | 0.9 |
| Momentum | Used to accelerate convergence in SGD | 0.8 |
| Attention Heads | Number of attention heads in attention mechanisms | 2 |
| Hidden State Size (BiLSTM) | Number of units in the BiLSTM hidden layer | 128 |
| Kernel Size (CNN) | Size of convolution filters | (3 × 3) |
| Stride (CNN) | Step size for convolution operations | 1, 2, 3 |
| Loss Function | Function to compute prediction error | MSE, MAE |
| Scheduler | Adjusts the learning rate during training | Exponential Step decay |
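
The settings in Table 5 can be collected into a small configuration sketch, shown here in plain Python. This is an illustration only, not the authors' code: the dictionary keys, the helper names `iter_configs` and `exponential_lr`, the choice to enumerate every combination of the multi-valued rows, and the per-epoch application of the 0.9 decay factor are all assumptions that the table itself does not confirm. The Scheduler row is left out of the grid because the table does not make clear whether it names one schedule or two.

```python
from itertools import product

# Rows of Table 5 that report a single value.
FIXED = {
    "learning_rate": 0.001,
    "batch_size": 32,
    "dropout_rate": 0.2,
    "num_layers": 3,
    "neurons_per_layer": 128,
    "lr_decay": 0.9,
    "momentum": 0.8,            # only relevant when the optimizer is SGD
    "attention_heads": 2,
    "bilstm_hidden_size": 128,
    "cnn_kernel_size": (3, 3),
}

# Rows of Table 5 that list several candidate values.
CANDIDATES = {
    "epochs": [50, 100, 150],
    "optimizer": ["Adam", "SGD", "RMSprop"],
    "activation": ["ReLU", "Tanh", "Sigmoid"],
    "cnn_stride": [1, 2, 3],
    "loss": ["MSE", "MAE"],
}


def iter_configs(fixed=FIXED, candidates=CANDIDATES):
    """Yield one config dict per combination of candidate values.

    Assumption: a full grid over the multi-valued rows; the table does
    not state which combinations were actually evaluated.
    """
    keys = list(candidates)
    for combo in product(*(candidates[k] for k in keys)):
        config = dict(fixed)             # copy the single-valued settings
        config.update(zip(keys, combo))  # overlay one candidate combination
        yield config


def exponential_lr(epoch, base_lr=FIXED["learning_rate"], decay=FIXED["lr_decay"]):
    """Learning rate after `epoch` epochs under exponential decay.

    Assumption: the decay factor is applied once per epoch.
    """
    return base_lr * decay ** epoch


configs = list(iter_configs())
```

Under these assumptions the grid contains 3 × 3 × 3 × 3 × 2 = 162 configurations, and the learning rate after two epochs would be 0.001 × 0.9² = 0.00081.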