Table 2 Hyperparameters for BERT-based assertion detection model training
| Model | Hyperparameters |
|---|---|
| BioBERT | Hyperparameters used during training (as listed on the HuggingFace model card; a fine-tuning sketch using these values follows the table): • learning_rate: 2e-05 • train_batch_size: 8 • eval_batch_size: 8 • seed: 42 • optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08 • lr_scheduler_type: linear • num_epochs: 10 |
| ClinicalBERT | Hyperparameters used during training (as listed on the HuggingFace model card): • batch size: 32 • maximum sequence length: 256 • learning rate: 5e-5 |
| BiomedBERT | Hyperparameters used during training (as reported in the original paper): • optimizer: Adam • learning-rate schedule: slanted triangular, with warm-up over the first 10% of steps and cool-down over the remaining 90% (a schedule sketch follows below) • peak learning rate: 6 × 10⁻⁴ • training steps: 62,500 • batch size: 8,192 • whole-word masking (WWM) rate: 15% |
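For concreteness, the BioBERT row maps directly onto a HuggingFace `TrainingArguments` configuration. The sketch below is illustrative rather than the authors' code: the checkpoint name (`dmis-lab/biobert-v1.1`), the output directory, and the three-way assertion label set are assumptions, while every hyperparameter value is taken from the table.

```python
# A minimal sketch (not the authors' code) mapping the BioBERT row of
# Table 2 onto HuggingFace TrainingArguments.
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    TrainingArguments,
)

model_name = "dmis-lab/biobert-v1.1"  # assumed BioBERT checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    num_labels=3,  # placeholder label set, e.g. present / absent / possible
)

training_args = TrainingArguments(
    output_dir="biobert-assertion",  # assumed output directory
    learning_rate=2e-5,              # learning_rate: 2e-05
    per_device_train_batch_size=8,   # train_batch_size: 8
    per_device_eval_batch_size=8,    # eval_batch_size: 8
    seed=42,                         # seed: 42
    adam_beta1=0.9,                  # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,               # epsilon: 1e-08
    lr_scheduler_type="linear",      # linear LR schedule
    num_train_epochs=10,             # num_epochs: 10
)

# A Trainer would then be built from these arguments plus the
# (elided) assertion train/eval datasets:
# trainer = Trainer(model=model, args=training_args,
#                   train_dataset=..., eval_dataset=...)
# trainer.train()
```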
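The BiomedBERT schedule can likewise be sketched in a few lines. The snippet below is a simplified linear warm-up/decay triangle over 62,500 steps with a 6e-4 peak, built on `torch.optim.lr_scheduler.LambdaLR`; it is not the authors' implementation, and the original slanted triangular schedule of Howard and Ruder additionally includes a floor ratio, omitted here.

```python
# A simplified sketch (assumed, not the authors' code) of the slanted
# triangular schedule in Table 2: linear warm-up over the first 10% of
# steps to the peak LR, then linear cool-down over the remaining 90%.
import torch
from torch.optim.lr_scheduler import LambdaLR

def slanted_triangular(optimizer, total_steps, warmup_frac=0.10):
    warmup_steps = max(1, int(total_steps * warmup_frac))

    def lr_lambda(step):
        if step < warmup_steps:
            # warm-up: scale the LR linearly from 0 up to the peak
            return step / warmup_steps
        # cool-down: scale the LR linearly from the peak back to 0
        return max(0.0, (total_steps - step) / (total_steps - warmup_steps))

    return LambdaLR(optimizer, lr_lambda)

# Values from the BiomedBERT row; the model here is a placeholder.
model = torch.nn.Linear(768, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=6e-4)  # peak LR 6 x 10^-4
scheduler = slanted_triangular(optimizer, total_steps=62_500)

for _ in range(3):        # each training step (loss/backward elided):
    optimizer.step()
    scheduler.step()
```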