Table 5. Information extraction (IE) results from the three tests for all models.

	Metric	Mixtral	Mixtral-Instruct	Llama-3.1	Llama3-ChatQA	GPT-4o	CDE2	QA-MatSciBERT
T1	P	23.4	67.1	23.8	44.0	81.7 (± 0.2)	—	—
	R	41.0	43.2	54.6	67.8	81.1 (± 0.4)	—	—
	F1	29.8	52.6	33.2	53.4	81.4 (± 0.3)	—	—
T2	P	71.2	79.9	77.8	77.7	81.7 (± 0.2)	87.0	87.5
	R	59.9	71.8	60.3	69.2	81.1 (± 0.4)	29.5	61.7
	F1	65.1	75.6	68.0	73.2	81.4 (± 0.3)	44.1	72.3
T3	P	23.0	32.8	25.6	41.0	65.6 (± 0.6)	81.6	65.0
	R	63.9	75.3	61.2	71.4	83.1 (± 1.3)	31.3	63.0
	F1	33.8	45.7	36.0	52.1	73.3 (± 0.9)	45.2	64.0

T1 stands for Test 1 (preselection with the same model), T2 for Test 2 (preselection with GPT-4o), and T3 for Test 3 (IE without preselection). P, R, and F1 denote precision, recall, and F1-score. The GPT-4o results are presented as average ± standard deviation.
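
The F1 values in the table appear consistent with the usual definition of F1 as the harmonic mean of precision and recall. The source does not state the formula, so this is an assumption. The following minimal sketch recomputes F1 for a few rows copied from the table; the function name `f1_score` and the `rows` dictionary are illustrative and not part of the original work.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (assumed definition of F1).

    Both inputs must be on the same scale (here: percentages).
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both metrics are zero
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) copied from Table 5
rows = {
    "T1 Mixtral": (23.4, 41.0, 29.8),
    "T2 CDE2": (87.0, 29.5, 44.1),
    "T3 QA-MatSciBERT": (65.0, 63.0, 64.0),
}

for label, (p, r, reported) in rows.items():
    computed = f1_score(p, r)
    print(f"{label}: computed {computed:.1f}, reported {reported}")
```

The GPT-4o columns are left out of this check because they report averages over repeated runs, and the F1 of averaged precision and recall need not equal the average of per-run F1 values.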
