Table 2. Results on human-annotated small-scale test datasets: internal hold-out (Mayo Clinic) and external (MIMIC-III)
Dataset | Models | Prompting setup | Precision | Recall | F1-score |
---|---|---|---|---|---|
Internal hold-out test | Mistral-PT | Zero-Shot | 0.38 | 0.33 | 0.35 |
Internal hold-out test | Mistral-PT | Few-Shot | 0.41 | 0.56 | 0.47 |
Internal hold-out test | Mistral-WFT | Zero-Shot | 0.57 | 0.42 | 0.48 |
Internal hold-out test | Mistral-WFT | Few-Shot | 0.63 | 0.41 | 0.49 |
Internal hold-out test | BioMistral-PT | Zero-Shot | 0.47 | 0.23 | 0.31 |
Internal hold-out test | BioMistral-PT | Few-Shot | 0.53 | 0.31 | 0.39 |
Internal hold-out test | BioMistral-WFT | Zero-Shot | 0.63 | 0.51 | 0.56 |
Internal hold-out test | BioMistral-WFT | Few-Shot | 0.68 | 0.49 | 0.57 |
External test | Mistral-PT | Zero-Shot | 0.41 | 0.37 | 0.39 |
External test | Mistral-PT | Few-Shot | 0.53 | 0.41 | 0.46 |
External test | Mistral-WFT | Zero-Shot | 0.42 | 0.47 | 0.44 |
External test | Mistral-WFT | Few-Shot | 0.45 | 0.51 | 0.48 |
External test | BioMistral-PT | Zero-Shot | 0.38 | 0.29 | 0.33 |
External test | BioMistral-PT | Few-Shot | 0.41 | 0.37 | 0.39 |
External test | BioMistral-WFT | Zero-Shot | 0.57 | 0.45 | 0.50 |
External test | BioMistral-WFT | Few-Shot | 0.65 | 0.51 | 0.57 |
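
The F1-score column can be sanity-checked against the precision and recall columns. The sketch below assumes F1 is the usual harmonic mean of precision and recall; the table does not state how the scores were aggregated, so this is an assumption. The helper name `f1_score` and the `rows` dictionary are illustrative, with values copied from two rows of Table 2.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from two rows of Table 2.
rows = {
    "Internal / BioMistral-WFT / Few-Shot": (0.68, 0.49, 0.57),
    "External / BioMistral-WFT / Few-Shot": (0.65, 0.51, 0.57),
}

for name, (p, r, reported) in rows.items():
    # Recompute from the rounded table values and show both side by side.
    print(f"{name}: recomputed {f1_score(p, r):.2f}, reported {reported:.2f}")
```

Because the table reports values rounded to two decimals, a recomputed score can differ from the reported one by about 0.01. For example, the internal Mistral-WFT few-shot row recomputes to 0.50 against a reported 0.49.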