Fig. 9: Comparison of performance for four different CAMeLBERT models.
From: Building sustainable information systems and transformer models on demand

The learning rate was fixed at 1.0000 × 10⁻⁵ and the number of epochs at 5 for all four models. A user would pick the model with the best performance for further fine-tuning (in this case, CAMeLBERT-CA).
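The selection step described above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the names `TrialConfig` and `pick_best_model` are hypothetical, the metric is assumed to be "higher is better", and the scores in the example are placeholders rather than the values plotted in Fig. 9.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrialConfig:
    """Hyperparameters held constant across all candidate models."""
    learning_rate: float = 1e-5  # fixed value stated in the caption
    num_epochs: int = 5          # fixed value stated in the caption


def pick_best_model(scores: dict[str, float]) -> str:
    """Return the name of the candidate with the highest score.

    Assumes a higher score means better performance (e.g. accuracy or F1).
    """
    if not scores:
        raise ValueError("no candidate scores supplied")
    return max(scores, key=scores.get)


# Placeholder scores for illustration only; NOT the results from Fig. 9.
# The names other than CAMeLBERT-CA are hypothetical stand-ins.
example_scores = {
    "CAMeLBERT-CA": 0.80,
    "candidate-2": 0.70,
    "candidate-3": 0.60,
    "candidate-4": 0.50,
}

config = TrialConfig()
best = pick_best_model(example_scores)
print(f"lr={config.learning_rate}, epochs={config.num_epochs}, best={best}")
```

Holding the learning rate and epoch count constant means that differences in score reflect the pretrained checkpoint rather than the training setup, which is what makes the comparison usable for choosing a model to fine-tune further.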