Table 2 Hyperparameter settings.
| Parameter | Description | Value |
|---|---|---|
| Number of Transformer Layers | Total layers in the transformer encoder stack | 4 |
| Hidden Size (Embedding Dim.) | Dimensionality of token embeddings and hidden representations | 256 |
| Number of Attention Heads | Number of parallel attention mechanisms in each layer | 8 |
| Feed-Forward Network Size | Size of the intermediate layer in the feed-forward block | 1024 |
| Dropout Rate | Probability of dropout applied to layers to prevent overfitting | 0.1 |
| Learning Rate | Initial step size for optimizer updates | 1e-4 |
| Batch Size | Number of training samples per batch | 128 |
| Sequence Length | Maximum length of user preference sequences | 50 |
| Optimizer | Optimization algorithm used | Adam |
| Activation Function | Non-linear function in feed-forward layers | GELU |
| Training Epochs | Number of complete passes through the training dataset | 100 |
| Warm-up Steps | Number of steps over which the learning rate is gradually increased | 1000 |
| Weight Decay | L2 regularization coefficient to prevent overfitting | 0.01 |
| Early Stopping | Halts training when validation loss stagnates and restores the best weights | Enabled (monitor = val_loss, patience = 10) |
| Masked Item Prediction | Training via item masking (BERT-style objective) | 15% of items masked |
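To make the settings in Table 2 concrete, the following is a minimal sketch, assuming a PyTorch implementation, of how the encoder, optimizer, and warm-up schedule could be wired together. All module and variable names are illustrative rather than taken from the paper, and the linear warm-up shape is an assumption (the table specifies only the number of warm-up steps); early stopping and the masking pipeline are omitted.

```python
# Illustrative configuration sketch based on Table 2 (assumes PyTorch; names are hypothetical).
import torch
import torch.nn as nn

# Hyperparameters from Table 2
NUM_LAYERS = 4
HIDDEN_SIZE = 256
NUM_HEADS = 8
FFN_SIZE = 1024
DROPOUT = 0.1
LEARNING_RATE = 1e-4
BATCH_SIZE = 128
MAX_SEQ_LEN = 50
WARMUP_STEPS = 1000
WEIGHT_DECAY = 0.01
MASK_PROB = 0.15  # fraction of items masked for the BERT-style objective

# Transformer encoder stack matching the layer/head/FFN/activation settings
encoder_layer = nn.TransformerEncoderLayer(
    d_model=HIDDEN_SIZE,
    nhead=NUM_HEADS,
    dim_feedforward=FFN_SIZE,
    dropout=DROPOUT,
    activation="gelu",
    batch_first=True,
)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=NUM_LAYERS)

# Adam with L2 regularization applied via the weight_decay argument
optimizer = torch.optim.Adam(
    encoder.parameters(), lr=LEARNING_RATE, weight_decay=WEIGHT_DECAY
)

# Assumed linear warm-up over the first 1000 steps, constant learning rate afterwards
def warmup_lambda(step: int) -> float:
    return min(1.0, (step + 1) / WARMUP_STEPS)

scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=warmup_lambda)
```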