Table 3 Performance comparison of models on the GVC dataset.

Model	MUC			B\(^3\)			CEAF\(_e\)			LEA			CoNLL
Model	P	R	F1	P	R	F1	P	R	F1	P	R	F1	F1
CDLM-based
Caciularu2021	90.4	85.0	87.6	83.8	80.8	82.3	63.5	74.7	68.6	-	-	-	79.5
LLM-based
Llama	84.3	93.9	88.8	38.1	89.5	53.4	54.9	28.9	37.9	-	-	-	60.0
GPT-3.5-Turbo	81.9	88.6	85.1	35.4	82.6	49.6	41.1	27.1	32.7	-	-	-	55.8
Nath2024	94.2	91.6	92.9	82.1	86.7	84.3	68.1	75.8	71.7	-	-	-	83.0
Graph-based
Chen2025	91.4	92.6	92.0	81.4	89.4	85.2	81.9	78.7	80.3	-	-	-	85.8
Pairwise-based
Barhom2019	-	-	-	66.0	81.0	72.7	-	-	-	-	-	-	-
Held2021	91.2	91.8	91.5	83.8	82.2	83.0	77.9	75.5	76.7	82.3	79.0	80.6	83.7
Yu2022	88.5	92.9	90.6	80.3	82.1	81.2	71.8	79.5	75.5	-	-	-	82.4
Ahmed2023	91.1	84.0	87.4	76.4	79.0	77.7	52.5	69.6	59.9	63.9	74.1	68.6	75.0
Ding2024	92.1	90.4	91.3	86.8	84.8	85.8	73.2	78.9	76.0	-	-	-	84.4
ACCI	92.8	92.4	92.6	85.0	88.3	86.6	75.6	77.2	76.4	84.0	79.6	81.7	85.2

A dash (-) indicates that the score was not reported for the corresponding evaluation metric.
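The CoNLL F1 column can be cross-checked against the other columns. By the usual convention it is the unweighted mean of the MUC, B\(^3\), and CEAF\(_e\) F1 scores, and the sketch below assumes the table follows that convention. The helper name `conll_f1` is hypothetical and used only for illustration; the input values are copied from three rows of the table.

```python
# F1 scores (MUC, B^3, CEAF_e) copied from three rows of Table 3.
rows = {
    "Caciularu2021": (87.6, 82.3, 68.6),
    "Chen2025": (92.0, 85.2, 80.3),
    "ACCI": (92.6, 86.6, 76.4),
}


def conll_f1(muc_f1, b3_f1, ceafe_f1):
    """Unweighted mean of the three F1 scores, rounded to one decimal.

    Assumes the conventional CoNLL definition; the helper name is illustrative.
    """
    return round((muc_f1 + b3_f1 + ceafe_f1) / 3, 1)


for model, f1_scores in rows.items():
    print(model, conll_f1(*f1_scores))
```

Under this assumption the three rows give 79.5, 85.8, and 85.2, which match the CoNLL F1 values reported in the table.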
