Table 3 Performance comparison of models on the GVC dataset.
From: Argument centric causal intervention for cross document event coreference resolution
| Model | MUC P | MUC R | MUC F1 | B\(^3\) P | B\(^3\) R | B\(^3\) F1 | CEAF\(_e\) P | CEAF\(_e\) R | CEAF\(_e\) F1 | LEA P | LEA R | LEA F1 | CoNLL F1 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| CDLM-based | | | | | | | | | | | | | |
| Caciularu2021 | 90.4 | 85.0 | 87.6 | 83.8 | 80.8 | 82.3 | 63.5 | 74.7 | 68.6 | - | - | - | 79.5 |
| LLM-based | | | | | | | | | | | | | |
| Llama | 84.3 | 93.9 | 88.8 | 38.1 | 89.5 | 53.4 | 54.9 | 28.9 | 37.9 | - | - | - | 60.0 |
| GPT-3.5-Turbo | 81.9 | 88.6 | 85.1 | 35.4 | 82.6 | 49.6 | 41.1 | 27.1 | 32.7 | - | - | - | 55.8 |
| Nath2024 | 94.2 | 91.6 | 92.9 | 82.1 | 86.7 | 84.3 | 68.1 | 75.8 | 71.7 | - | - | - | 83.0 |
| Graph-based | | | | | | | | | | | | | |
| Chen2025 | 91.4 | 92.6 | 92.0 | 81.4 | 89.4 | 85.2 | 81.9 | 78.7 | 80.3 | - | - | - | 85.8 |
| Pairwise-based | | | | | | | | | | | | | |
| Barhom2019 | - | - | - | 66.0 | 81.0 | 72.7 | - | - | - | - | - | - | - |
| Held2021 | 91.2 | 91.8 | 91.5 | 83.8 | 82.2 | 83.0 | 77.9 | 75.5 | 76.7 | 82.3 | 79.0 | 80.6 | 83.7 |
| Yu2022 | 88.5 | 92.9 | 90.6 | 80.3 | 82.1 | 81.2 | 71.8 | 79.5 | 75.5 | - | - | - | 82.4 |
| Ahmed2023 | 91.1 | 84.0 | 87.4 | 76.4 | 79.0 | 77.7 | 52.5 | 69.6 | 59.9 | 63.9 | 74.1 | 68.6 | 75.0 |
| Ding2024 | 92.1 | 90.4 | 91.3 | 86.8 | 84.8 | 85.8 | 73.2 | 78.9 | 76.0 | - | - | - | 84.4 |
| ACCI | 92.8 | 92.4 | 92.6 | 85.0 | 88.3 | 86.6 | 75.6 | 77.2 | 76.4 | 84.0 | 79.6 | 81.7 | 85.2 |
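
The CoNLL F1 column is conventionally the unweighted mean of the MUC, B\(^3\), and CEAF\(_e\) F1 scores, which is why rows lacking any of those three (such as Barhom2019) report no CoNLL value. The sketch below recomputes it for a few rows of the table as a sanity check. The function name `conll_f1` and the one-decimal rounding are assumptions made for illustration, not something the source specifies.

```python
def conll_f1(muc_f1: float, b3_f1: float, ceafe_f1: float) -> float:
    """Unweighted mean of the three F1 scores, rounded to one decimal.

    Rounding to one decimal is an assumption chosen to match the
    precision shown in the table.
    """
    return round((muc_f1 + b3_f1 + ceafe_f1) / 3, 1)


# (MUC F1, B^3 F1, CEAF_e F1, reported CoNLL F1), copied from the table.
rows = {
    "ACCI": (92.6, 86.6, 76.4, 85.2),
    "Held2021": (91.5, 83.0, 76.7, 83.7),
    "Chen2025": (92.0, 85.2, 80.3, 85.8),
}

for name, (muc, b3, ceafe, reported) in rows.items():
    # Print the recomputed value next to the reported one for comparison.
    print(f"{name}: recomputed={conll_f1(muc, b3, ceafe)} reported={reported}")
```

For these three rows the recomputed values agree with the reported CoNLL F1 at one-decimal precision.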