Table 3 Performance of different models on datasets with various labeled imbalance ratios γ
From: Unveiling the power of language models in chemical research question answering
Model | Setting 1 (500/40k, γ = 5) Accuracy | Setting 1 (500/40k, γ = 5) F1 | Setting 2 (2k/20k, γ = 23) Accuracy | Setting 2 (2k/20k, γ = 23) F1 | Setting 3 (2k/40k, γ = 23) Accuracy | Setting 3 (2k/40k, γ = 23) F1 | Setting 4 (4k/40k, γ = 48) Accuracy | Setting 4 (4k/40k, γ = 48) F1 |
---|---|---|---|---|---|---|---|---|
Supervised | 66.84 | 66.71 | 69.80 | 68.57 | 69.80 | 68.57 | 70.62 | 68.59 |
PubMedQA | 67.56 | 67.30 | 71.20 | 69.37 | 72.12 | 69.45 | 72.30 | 67.72 |
FixMatch | 67.64 | 64.74 | 71.40 | 69.46 | 72.34 | 69.14 | 72.98 | 68.96 |
SoftMatch | 70.16 | 67.38 | 71.53 | 69.71 | 72.24 | 69.75 | 73.54 | 68.99 |
FreeMatch | 69.56 | 66.42 | 72.14 | 70.23 | 72.60 | 69.72 | 72.68 | 68.13 |
ChemMatch | 71.36 | 68.55 | 73.12 | 70.84 | 73.84 | 70.93 | 74.28 | 71.06 |
Improvement (%) | +2.59% | +3.20% | +1.36% | +0.87% | +1.71% | +1.74% | +2.20% | +4.30% |
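
The Improvement row does not name its reference model. The reported figures appear consistent with the relative gain of ChemMatch over FreeMatch, i.e. (ChemMatch − FreeMatch) / FreeMatch × 100; this is an inference from the numbers in the table, not something the table states. The sketch below recomputes the row under that assumption, using only scores copied from the table. The variable and function names are illustrative.

```python
# Scores copied from Table 3: (accuracy, F1) for Settings 1 to 4.
freematch = [(69.56, 66.42), (72.14, 70.23), (72.60, 69.72), (72.68, 68.13)]
chemmatch = [(71.36, 68.55), (73.12, 70.84), (73.84, 70.93), (74.28, 71.06)]

# Improvement row as reported in Table 3, in percent.
reported = [(2.59, 3.20), (1.36, 0.87), (1.71, 1.74), (2.20, 4.30)]


def relative_gain(new, base):
    """Relative improvement of `new` over `base`, in percent."""
    return (new - base) / base * 100.0


# ASSUMPTION: FreeMatch is the reference model for the Improvement row.
computed = [
    tuple(relative_gain(c, f) for c, f in zip(chem, free))
    for chem, free in zip(chemmatch, freematch)
]

for i, ((acc, f1), (r_acc, r_f1)) in enumerate(zip(computed, reported), start=1):
    print(
        f"Setting {i}: accuracy +{acc:.2f}% (reported +{r_acc:.2f}%), "
        f"F1 +{f1:.2f}% (reported +{r_f1:.2f}%)"
    )
```

Under this assumption the recomputed values agree with the reported row to within about 0.01 percentage points; the small residual is expected because the table scores are themselves rounded. Measuring against the strongest baseline in each column instead would give different figures. For example, Setting 1 accuracy against SoftMatch (70.16) yields roughly +1.71% rather than +2.59%.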