Fig. 3: Performance of fine-tuned models versus out-of-box models.
From: Synoptic reporting by summarizing cancer pathology reports using large language models

a Performance comparison of out-of-the-box (OOB) and fine-tuned language models on the synoptic reporting task. A BERT F1 score of 1 indicates a perfect match between the model’s output and the reference answer. The median and the interquartile range (IQR; ref. 37) are reported in the plots. b Performance of the out-of-the-box and fine-tuned language models, segregated by data element. The numbers inset in the heat map show the mean BERT F1 score. The color scale of the heat map is shown on the right and represents the BERT F1 score.