Fig. 6: Hyperparameter tuning of the BERT model to find the best hyperparameters for training.

Each hyperparameter set (batch size, number of epochs, and learning rate) was trained on a development set (10% of the whole dataset). Lower perplexity is shown as a darker purple colour, indicating better performance of the language model for that combination of hyperparameters.
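The search the figure summarises can be sketched as a simple grid search: every (batch size, epochs, learning rate) triple is scored by the model's perplexity on the development split, and the lowest-perplexity triple wins. The grid values and the `dev_perplexity` function below are illustrative placeholders, not the paper's actual settings; in practice that function would fine-tune BERT with the given hyperparameters and return its dev-set perplexity.

```python
from itertools import product

def dev_perplexity(batch_size, epochs, learning_rate):
    # Placeholder score standing in for a real training + evaluation run.
    # Shaped so that smaller batches, more epochs, and a learning rate
    # near 2e-5 score best, purely for demonstration.
    return (batch_size / 32) + (1.0 / epochs) + abs(learning_rate - 2e-5) * 1e5

# Hypothetical hyperparameter grid (not the values used in the paper).
grid = {
    "batch_size": [16, 32],
    "epochs": [2, 3, 4],
    "learning_rate": [2e-5, 3e-5, 5e-5],
}

# Score every combination on the development set.
results = {
    combo: dev_perplexity(*combo)
    for combo in product(grid["batch_size"], grid["epochs"], grid["learning_rate"])
}

# Lowest perplexity (the darkest cell in the heatmap) is the best setting.
best = min(results, key=results.get)
```

Plotting `results` as a heatmap, with darker cells for lower perplexity, reproduces the kind of visualisation shown in the figure.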