Fig. 4
From: Human level information extraction from clinical reports with finetuned language models

Summarized dataset-level results. The bar graph shows exact match accuracy for each dataset. The results shown are for the second human annotator, zero-shot GPT-4, and fine-tuned and zero-shot Llama 3.1 8B. The human, snowflake, and flame icons denote the human annotator, zero-shot models, and fine-tuned models, respectively.