Table 1 Pretraining hyperparameters of each MLM model.

From: Reweaving the Threads of Korean History: AI-Driven Restoration of the Daegu-bu Household Registers (1681–1876)

Tokenization
  BERT: WordPiece
  DistilBERT: WordPiece
  ELECTRA: WordPiece
  RoBERTa: Byte-pair encoding (BPE)

Model parameters (common to all models)
  Hidden activation: GeLU
  Vocab size: 52,000
  Batch size: 32
  Sequence length: 512
  Dropout rate: 0.1
  Attention dropout: 0.1
  Learning rate: 3e-5
  Epochs: 5
  Hidden size: 768
  Attention heads: 12
  Hidden layers: 6
  Train steps: 5,000
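
For readers who want to see how these settings map onto a concrete configuration, the sketch below expresses the BERT column of Table 1 with the Hugging Face transformers API, assuming that library (or an equivalent) was used for pretraining. The tokenizer path, output directory, and training dataset are hypothetical placeholders, and the 15% masking rate is the library default rather than a value from the table.

```python
# Minimal sketch (not the authors' released code): Table 1 hyperparameters
# mapped onto Hugging Face transformers objects for BERT-style MLM pretraining.
from transformers import (
    BertConfig,
    BertForMaskedLM,
    BertTokenizerFast,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

config = BertConfig(
    vocab_size=52_000,                 # Vocab size: 52,000
    hidden_size=768,                   # Hidden size: 768
    num_hidden_layers=6,               # Hidden layers: 6
    num_attention_heads=12,            # Attention heads: 12
    hidden_act="gelu",                 # Hidden activation: GeLU
    hidden_dropout_prob=0.1,           # Dropout rate: 0.1
    attention_probs_dropout_prob=0.1,  # Attention dropout: 0.1
    max_position_embeddings=512,       # Sequence length: 512
)
model = BertForMaskedLM(config)        # randomly initialized model for pretraining

training_args = TrainingArguments(
    output_dir="mlm-pretraining",      # placeholder output directory
    per_device_train_batch_size=32,    # Batch size: 32
    learning_rate=3e-5,                # Learning rate: 3e-5
    num_train_epochs=5,                # Epochs: 5
    max_steps=5_000,                   # Train steps: 5,000 (takes precedence over epochs when set)
)

# The remaining pieces depend on external artifacts and are shown only as placeholders:
# a WordPiece tokenizer trained on the register corpus, and the tokenized corpus itself.
# tokenizer = BertTokenizerFast.from_pretrained("path/to/wordpiece-tokenizer")
# collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True,
#                                            mlm_probability=0.15)  # library default
# trainer = Trainer(model=model, args=training_args,
#                   data_collator=collator, train_dataset=train_dataset)
# trainer.train()
```

The DistilBERT, ELECTRA, and RoBERTa variants would swap in the corresponding config, model class, and (for RoBERTa) a BPE tokenizer, while keeping the shared hyperparameters above.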