Table 3. Summary of the distinct characteristics of each method.
| Aspect | BERT | RoBERTa | ALBERT | DistilBERT | XLNet |
|---|---|---|---|---|---|
| Training Objective | Masked language modeling | Masked language modeling | Masked language modeling | Masked language modeling | Permutation language modeling |
| Pre-training Method | Autoencoding | Autoencoding | Autoencoding | Distillation | Autoregressive |
| Autoregressive | No | No | No | No | Yes |
| Factorized Embedding | No | No | Yes | No | No |
| Computational Resources | More demanding | More demanding | Less demanding | Less demanding | More demanding |
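
The "Factorized Embedding" row can be made concrete with a small parameter count. The sketch below compares a single V x H embedding matrix with a factorized pair of matrices (V x E followed by E x H). The function name `embedding_params` is hypothetical, and the sizes used (V = 30,000, H = 768, E = 128) are illustrative assumptions rather than values taken from the table.

```python
from typing import Optional


def embedding_params(vocab_size: int, hidden_size: int,
                     embedding_size: Optional[int] = None) -> int:
    """Count token-embedding parameters (hypothetical helper).

    Without factorization, the embedding is one matrix of shape
    (vocab_size, hidden_size). With factorization, it is a
    (vocab_size, embedding_size) lookup followed by an
    (embedding_size, hidden_size) projection.
    """
    if embedding_size is None:
        return vocab_size * hidden_size
    return vocab_size * embedding_size + embedding_size * hidden_size


# Illustrative sizes, assumed for this example only.
V, H, E = 30_000, 768, 128

full = embedding_params(V, H)           # unfactorized: V * H
factorized = embedding_params(V, H, E)  # factorized: V * E + E * H

print(f"unfactorized: {full:,}")
print(f"factorized:   {factorized:,}")
print(f"reduction:    {full / factorized:.2f}x")
```

Under these assumed sizes, factorization shrinks the embedding table by roughly a factor of six. This count covers only the embedding table, so it illustrates one contributor to the "Less demanding" entry rather than a full accounting of compute cost.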