Fig. 3: Experimental setup to compare pretraining strategies. | npj Digital Medicine

From: The DRAGON benchmark for clinical NLP

Several LLM architectures are pretrained using general-domain, domain-specific, or mixed-domain pretraining (general-domain followed by domain-specific pretraining). Each resulting pretrained foundation model is evaluated on the DRAGON benchmark through task-specific fine-tuning followed by performance evaluation on the test set. To assess fine-tuning stability, the training and validation splits are rotated using five-fold cross-validation, yielding five performance assessments for each of the 28 tasks per pretrained model.
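The rotation described above can be sketched as follows. This is a minimal illustration with hypothetical sample counts; the actual DRAGON tasks, models, and evaluation metrics are not reproduced here.

```python
# Sketch of the five-fold rotation: each fold serves once as the
# validation set, producing five (train, validation) splits per task.

def five_fold_splits(n_samples, n_folds=5):
    """Yield (train_idx, val_idx) pairs, rotating the held-out fold."""
    indices = list(range(n_samples))
    fold_size = n_samples // n_folds
    for k in range(n_folds):
        val = indices[k * fold_size:(k + 1) * fold_size]
        val_set = set(val)
        train = [i for i in indices if i not in val_set]
        yield train, val

# Five rotations -> five performance assessments per task.
splits = list(five_fold_splits(10))
print(len(splits))   # number of rotations
```

In the benchmark setting, each of the five splits would be used to fine-tune and evaluate the pretrained model once, and the five resulting scores characterize fine-tuning stability.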
