Table 10. Critical test accuracy of binary classification of different relations in Gemini-Flash and GPT-4o.

From: Manner implicatures in large language models

| Sentence pair | Accuracy (Gemini-Flash) | Accuracy (GPT-4o) |
| --- | --- | --- |
| Plain, NNA | 0.983 | 0.316 |
| Plain, Inference(a) | 0.00 | 0.338 |
| Plain, Inference(b) | 0.00 | 0.863 |
| NNA, Inference(a) | 0.311 | 0.139 |
| NNA, Inference(b) | 0.398 | 0.182 |