Table 2. Results on the small-scale, human-annotated test datasets: internal hold-out (Mayo Clinic) and external (MIMIC-III)

Dataset	Model	Prompting setup	Precision	Recall	F1-score
Internal hold-out test	Mistral-PT	Zero-Shot	0.38	0.33	0.35
	Mistral-PT	Few-Shot	0.41	0.56	0.47
	Mistral-WFT	Zero-Shot	0.57	0.42	0.48
	Mistral-WFT	Few-Shot	0.63	0.41	0.49
	BioMistral-PT	Zero-Shot	0.47	0.23	0.31
	BioMistral-PT	Few-Shot	0.53	0.31	0.39
	BioMistral-WFT	Zero-Shot	0.63	0.51	0.56
	BioMistral-WFT	Few-Shot	0.68	0.49	0.57
External test	Mistral-PT	Zero-Shot	0.41	0.37	0.39
	Mistral-PT	Few-Shot	0.53	0.41	0.46
	Mistral-WFT	Zero-Shot	0.42	0.47	0.44
	Mistral-WFT	Few-Shot	0.45	0.51	0.48
	BioMistral-PT	Zero-Shot	0.38	0.29	0.33
	BioMistral-PT	Few-Shot	0.41	0.37	0.39
	BioMistral-WFT	Zero-Shot	0.57	0.45	0.50
	BioMistral-WFT	Few-Shot	0.65	0.51	0.57
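One way to sanity-check Table 2 is to recompute each F1-score as the harmonic mean of the reported precision and recall, F1 = 2PR/(P + R). The sketch below does this. It assumes F1 was computed from the same aggregate precision and recall shown in the table. The table does not state the averaging scheme, and if scores were macro-averaged per class, the two values need not agree. Because every value is rounded to two decimals, the sketch uses a loose tolerance of 0.01. The names `ROWS`, `f1`, `check` and `best_by_dataset` are illustrative helpers, not code from the paper.

```python
"""Recompute F1 from the precision/recall values reported in Table 2.

Assumption: F1 = 2PR / (P + R) applied to the same aggregate P and R shown
in the table; the table itself does not state how scores were averaged.
"""

from typing import Dict, List, Tuple

# (dataset, model, prompting setup, precision, recall, reported F1), copied from Table 2
Row = Tuple[str, str, str, float, float, float]

ROWS: List[Row] = [
    ("Internal hold-out", "Mistral-PT", "Zero-Shot", 0.38, 0.33, 0.35),
    ("Internal hold-out", "Mistral-PT", "Few-Shot", 0.41, 0.56, 0.47),
    ("Internal hold-out", "Mistral-WFT", "Zero-Shot", 0.57, 0.42, 0.48),
    ("Internal hold-out", "Mistral-WFT", "Few-Shot", 0.63, 0.41, 0.49),
    ("Internal hold-out", "BioMistral-PT", "Zero-Shot", 0.47, 0.23, 0.31),
    ("Internal hold-out", "BioMistral-PT", "Few-Shot", 0.53, 0.31, 0.39),
    ("Internal hold-out", "BioMistral-WFT", "Zero-Shot", 0.63, 0.51, 0.56),
    ("Internal hold-out", "BioMistral-WFT", "Few-Shot", 0.68, 0.49, 0.57),
    ("External", "Mistral-PT", "Zero-Shot", 0.41, 0.37, 0.39),
    ("External", "Mistral-PT", "Few-Shot", 0.53, 0.41, 0.46),
    ("External", "Mistral-WFT", "Zero-Shot", 0.42, 0.47, 0.44),
    ("External", "Mistral-WFT", "Few-Shot", 0.45, 0.51, 0.48),
    ("External", "BioMistral-PT", "Zero-Shot", 0.38, 0.29, 0.33),
    ("External", "BioMistral-PT", "Few-Shot", 0.41, 0.37, 0.39),
    ("External", "BioMistral-WFT", "Zero-Shot", 0.57, 0.45, 0.50),
    ("External", "BioMistral-WFT", "Few-Shot", 0.65, 0.51, 0.57),
]


def f1(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def check(rows: List[Row], tol: float = 0.01) -> List[Tuple[str, str, str, float, float]]:
    """Return rows whose recomputed F1 differs from the reported F1 by more than tol.

    tol = 0.01 is a loose allowance for P, R and F1 each being rounded to two decimals.
    """
    mismatches = []
    for dataset, model, setup, p, r, reported in rows:
        recomputed = f1(p, r)
        if abs(recomputed - reported) > tol:
            mismatches.append((dataset, model, setup, reported, recomputed))
    return mismatches


def best_by_dataset(rows: List[Row]) -> Dict[str, Tuple[str, str, float]]:
    """Return (model, setup, reported F1) with the highest reported F1 for each dataset."""
    best: Dict[str, Tuple[str, str, float]] = {}
    for dataset, model, setup, _p, _r, reported in rows:
        if dataset not in best or reported > best[dataset][2]:
            best[dataset] = (model, setup, reported)
    return best


if __name__ == "__main__":
    print("F1 mismatches:", check(ROWS))
    for dataset, (model, setup, score) in best_by_dataset(ROWS).items():
        print(f"{dataset}: {model} ({setup}) F1 = {score:.2f}")
```

With the values in the table, every recomputed F1 falls within 0.01 of the reported one. The largest gap, about 0.007, is for Mistral-WFT with few-shot prompting on the internal set. BioMistral-WFT with few-shot prompting has the highest F1 on both test sets (0.57).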
