Table 10: Critical test accuracy of binary classification of different relations in Gemini-Flash and GPT-4o.
| Sentence pair | Accuracy (Gemini-Flash) | Accuracy (GPT-4o) |
|---|---|---|
| Plain, NNA | 0.983 | 0.316 |
| Plain, Inference(a) | 0.00 | 0.338 |
| Plain, Inference(b) | 0.00 | 0.863 |
| NNA, Inference(a) | 0.311 | 0.139 |
| NNA, Inference(b) | 0.398 | 0.182 |