Table 4. Details of the hyperparameters used in each method.

From: Development of an automated transformer-based text analysis framework for monitoring fire door defects in buildings

Common hyperparameters (all algorithms): Sequence length: [128, 256, 512]; Learning rate: [1e-5, 2e-5, 3e-5]; Warm-up proportion: [0, 0.05, 0.1]; Batch size: [16, 32]; Epochs: [3, 4, 5]

| Algorithm | Specific hyperparameters | Number of models | Best hyperparameters |
| --- | --- | --- | --- |
| BERT | - | 162 | Sequence length: 512, Learning rate: 2e-5, Warm-up proportion: 0.05, Batch size: 32, Epochs: 4 |
| RoBERTa | - | 162 | Sequence length: 512, Learning rate: 1e-5, Warm-up proportion: 0.1, Batch size: 32, Epochs: 5 |
| ALBERT | Hidden size: [128, 256, 512] | 486 | Sequence length: 512, Learning rate: 2e-5, Warm-up proportion: 0.05, Batch size: 16, Epochs: 5, Hidden size: 512 |
| DistilBERT | - | 162 | Sequence length: 256, Learning rate: 5e-5, Warm-up proportion: 0.05, Batch size: 32, Epochs: 4 |
| XLNet | Memory length: [128, 256, 512] | 486 | Sequence length: 256, Learning rate: 3e-5, Warm-up proportion: 0, Batch size: 16, Epochs: 5, Memory length: 256 |
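
The "Number of models" column is simply the size of each algorithm's search grid: the common grid yields 3 × 3 × 3 × 2 × 3 = 162 configurations, and an extra specific hyperparameter with three values (ALBERT's hidden size, XLNet's memory length) triples it to 486. A minimal sketch reproducing these counts, assuming a full Cartesian grid search (the variable names below are illustrative, not the authors' code):

```python
from itertools import product

# Shared search grid from Table 4.
common = {
    "sequence_length": [128, 256, 512],
    "learning_rate": [1e-5, 2e-5, 3e-5],
    "warmup_proportion": [0, 0.05, 0.1],
    "batch_size": [16, 32],
    "epochs": [3, 4, 5],
}

# BERT, RoBERTa, DistilBERT: no model-specific hyperparameter.
base_grid = list(product(*common.values()))
print(len(base_grid))  # 162

# ALBERT (hidden size) and XLNet (memory length) each add one
# three-valued axis, tripling the grid.
extended_grid = list(product(*common.values(), [128, 256, 512]))
print(len(extended_grid))  # 486
```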