Table 2 Fallacy classification results for Google's Gemini and OpenAI's GPT-4 models. For each class, we report precision (P), recall (R), and \(F_1\) score.
| Fallacy class | Gemini P | Gemini R | Gemini \(F_1\) | GPT-4 P | GPT-4 R | GPT-4 \(F_1\) |
|---|---|---|---|---|---|---|
| Ad hominem | 0.00 | 0.00 | 0.00 | 0.86 | 0.32 | 0.47 |
| Anecdote | 0.00 | 0.00 | 0.00 | 0.46 | 0.50 | 0.48 |
| Cherry picking | 0.45 | 0.29 | 0.35 | 0.20 | 0.10 | 0.13 |
| Conspiracy theory | 0.42 | 0.86 | 0.57 | 0.53 | 0.91 | 0.67 |
| Fake experts | 0.00 | 0.00 | 0.00 | 0.75 | 0.86 | 0.80 |
| False choice | 0.50 | 0.14 | 0.22 | 1.00 | 0.14 | 0.25 |
| False equivalence | 0.00 | 0.00 | 0.00 | 0.20 | 0.12 | 0.15 |
| Impossible expectations | 0.00 | 0.00 | 0.00 | 0.17 | 0.05 | 0.07 |
| Misrepresentation | 0.14 | 0.09 | 0.11 | 0.31 | 0.23 | 0.26 |
| Oversimplification | 0.13 | 1.00 | 0.22 | 0.14 | 0.60 | 0.23 |
| Single cause | 0.00 | 0.00 | 0.00 | 0.36 | 0.25 | 0.30 |
| Slothful induction | 0.00 | 0.00 | 0.00 | 0.12 | 0.08 | 0.10 |
| Accuracy | | | 0.20 | | | 0.32 |
| Macro avg | 0.13 | 0.18 | 0.11 | 0.39 | 0.32 | 0.30 |
| Weighted avg | 0.13 | 0.20 | 0.12 | 0.40 | 0.32 | 0.31 |
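
The per-class scores and the accuracy, macro-average, and weighted-average rows follow the layout produced by scikit-learn's `classification_report`, where \(F_1 = 2PR/(P+R)\). The sketch below is an illustration of how such a table can be generated, not the authors' evaluation code; the `y_true` and `y_pred` lists are hypothetical stand-ins for the gold and model-assigned fallacy labels.

```python
# Minimal sketch (assumed setup, not the paper's code): per-class precision,
# recall, and F1 for a fallacy classifier, plus accuracy and macro/weighted
# averages, using scikit-learn's classification_report.
from sklearn.metrics import classification_report

fallacy_classes = [
    "Ad hominem", "Anecdote", "Cherry picking", "Conspiracy theory",
    "Fake experts", "False choice", "False equivalence",
    "Impossible expectations", "Misrepresentation", "Oversimplification",
    "Single cause", "Slothful induction",
]

# Hypothetical toy labels for illustration only.
y_true = ["Anecdote", "Cherry picking", "Conspiracy theory", "Fake experts"]
y_pred = ["Anecdote", "Conspiracy theory", "Conspiracy theory", "Fake experts"]

# zero_division=0 reports 0.00 instead of warning for classes the model
# never predicts, mirroring the 0.00 rows for several Gemini classes.
print(classification_report(y_true, y_pred,
                            labels=fallacy_classes, zero_division=0))
```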