Table 3 Performance of different models on datasets with various labeled imbalance ratios γ

From: Unveiling the power of language models in chemical research question answering

Settings (labeled/unlabeled training cases, labeled imbalance ratio γ):

- Setting 1: 500/40k, γ = 5
- Setting 2: 2k/20k, γ = 23
- Setting 3: 2k/40k, γ = 23
- Setting 4: 4k/40k, γ = 48

| Model           | Setting 1 Accuracy | Setting 1 F1 | Setting 2 Accuracy | Setting 2 F1 | Setting 3 Accuracy | Setting 3 F1 | Setting 4 Accuracy | Setting 4 F1 |
|-----------------|--------------------|--------------|--------------------|--------------|--------------------|--------------|--------------------|--------------|
| Supervised      | 66.84              | 66.71        | 69.80              | 68.57        | 69.80              | 68.57        | 70.62              | 68.59        |
| PubMedQA        | 67.56              | 67.30        | 71.20              | 69.37        | 72.12              | 69.45        | 72.30              | 67.72        |
| FixMatch        | 67.64              | 64.74        | 71.40              | 69.46        | 72.34              | 69.14        | 72.98              | 68.96        |
| SoftMatch       | 70.16              | 67.38        | 71.53              | 69.71        | 72.24              | 69.75        | 73.54              | 68.99        |
| FreeMatch       | 69.56              | 66.42        | 72.14              | 70.23        | 72.60              | 69.72        | 72.68              | 68.13        |
| ChemMatch       | 71.36              | 68.55        | 73.12              | 70.84        | 73.84              | 70.93        | 74.28              | 71.06        |
| Improvement (%) | +2.59%             | +3.20%       | +1.36%             | +0.87%       | +1.71%             | +1.74%       | +2.20%             | +4.30%       |

  1. The numbers in brackets are the numbers of supervised and unsupervised cases in the training set, respectively. Numbers in bold denote significant improvements over the FreeMatch baseline, as determined by a two-tailed paired t-test with a p-value < 0.05; this notation is used consistently throughout the tables. The improvement percentage is computed relative to the overall best baseline, FreeMatch.
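
The two quantities in the footnote, the relative improvement over FreeMatch and the two-tailed paired t-test, are straightforward to reproduce. Below is a minimal Python sketch: the per-setting accuracy means are taken directly from Table 3, while the per-run scores fed to the t-test (`freematch_runs`, `chemmatch_runs`) are hypothetical placeholders, since the individual run results behind the significance test are not reported in this section.

```python
# Sketch: improvement percentages and the paired t-test from Table 3's footnote.
from scipy import stats

# Mean accuracies from Table 3 (Settings 1-4).
freematch_acc = [69.56, 72.14, 72.60, 72.68]
chemmatch_acc = [71.36, 73.12, 73.84, 74.28]

# Relative improvement over the best baseline, FreeMatch:
# (ChemMatch - FreeMatch) / FreeMatch * 100.
for i, (base, ours) in enumerate(zip(freematch_acc, chemmatch_acc), start=1):
    print(f"Setting {i}: +{(ours - base) / base * 100:.2f}%")
# -> +2.59%, +1.36%, +1.71%, +2.20%, matching the table's improvement row.

# Two-tailed paired t-test, as referenced in the footnote.
# The per-run scores below are hypothetical placeholders (e.g., repeated
# training runs on one setting); real values would come from the experiments.
freematch_runs = [69.2, 69.7, 69.5, 69.8, 69.6]
chemmatch_runs = [71.1, 71.5, 71.3, 71.6, 71.3]
t_stat, p_value = stats.ttest_rel(chemmatch_runs, freematch_runs)
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")  # "significant" here means p < 0.05
```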