Table 2 Results of combining different delta-tuning methods

From: Parameter-efficient fine-tuning of large-scale pre-trained language models

| Prompt | ✗ | ✗ | ✗ | ✗ | ✓ | ✓ | ✓ | ✓ |
|---|---|---|---|---|---|---|---|---|
| BitFit | ✗ | ✗ | ✓ | ✓ | ✗ | ✗ | ✓ | ✓ |
| Adapter | ✗ | ✓ | ✗ | ✓ | ✗ | ✓ | ✗ | ✓ |
| Tunable parameters | 0% | 1.75% | 0.09% | 1.84% | 0.003% | 1.76% | 0.09% | 1.85% |
| *RoBERTa-large, full data, without manual templates* |  |  |  |  |  |  |  |  |
| CoLA (Matt.) | 4.6 | **66.6±1.6** | 63.5±0.6 | 65.9±0.5 | 42.7±2.3 | 63.1±1.5 | 63.7±0.9 | 64.4±0.9 |
| SST-2 (acc) | 50.9 | **95.8±0.1** | 95.6±0.1 | 95.7±0.2 | 95.3±0.2 | 95.7±0.1 | 95.3±0.2 | 95.5±0.1 |
| MRPC (F1) | 1.4 | 92.7±0.2 | 91.9±0.4 | **93.0±0.4** | 85.4±0.5 | 92.0±0.5 | 92.2±0.5 | 92.9±0.3 |
| STS-B (Pear.) | -6.2 | **91.4±0.1** | 90.7±0.2 | 90.5±0.1 | 83.0±2.8 | 90.5±0.4 | 90.3±0.7 | 90.9±0.1 |
| QQP (F1) | 6.4 | 83.5±0.1 | 83.5±0.0 | **84.4±0.0** | 77.2±0.4 | 84.3±0.0 | 83.6±0.1 | **84.4±0.0** |
| MNLI (acc) | 34.2 | 88.6±0.2 | 88.0±0.2 | **89.0±0.1** | 77.9±2.5 | 88.9±0.1 | 88.0±0.2 | 88.9±0.1 |
| QNLI (acc) | 50.6 | 93.7±0.3 | 93.4±0.3 | 94.2±0.1 | 86.2±0.5 | 94.2±0.1 | 93.2±0.3 | **94.4±0.1** |
| RTE (acc) | 47.7 | **86.8±0.5** | 86.2±1.0 | 84.5±0.5 | 74.4±0.5 | 84.1±0.8 | 85.7±1.5 | 84.7±1.1 |
| Average | 23.7 | **87.4±0.4** | 86.6±0.4 | 87.1±0.2 | 77.7±1.2 | 86.6±0.4 | 86.5±0.6 | 87.0±0.3 |
| *RoBERTa-large, full data, with manual templates* |  |  |  |  |  |  |  |  |
| CoLA (Matt.) | 2.2 | **66.9±1.1** | 64.2±0.5 | 65.5±1.0 | 37.8±20.8 | 64.7±1.3 | 64.8±0.7 | 64.9±1.0 |
| SST-2 (acc) | 83.6 | **96.3±0.2** | 96.1±0.1 | 96.2±0.2 | 95.7±0.2 | 95.8±0.1 | 95.9±0.1 | 95.8±0.2 |
| MRPC (F1) | 61.9 | 92.2±0.4 | **92.7±0.6** | **92.7±0.2** | 84.2±0.5 | 91.8±0.2 | 92.2±0.4 | 92.0±0.4 |
| STS-B (Pear.) | -3.3 | 91.3±0.5 | 90.9±0.1 | 90.7±0.2 | 79.6±1.3 | **91.9±0.3** | 90.8±0.4 | 90.1±0.6 |
| QQP (F1) | 49.7 | 83.6±0.1 | 83.6±0.0 | **84.6±0.1** | 77.0±0.7 | 84.3±0.0 | 83.7±0.0 | 84.4±0.2 |
| MNLI (acc) | 50.9 | 88.6±0.1 | 87.7±0.1 | 88.7±0.1 | 80.2±0.2 | 88.7±0.1 | 88.0±0.1 | **88.9±0.1** |
| QNLI (acc) | 50.8 | 93.6±0.1 | 93.1±0.2 | **93.8±0.1** | 86.6±0.4 | **93.8±0.1** | 93.0±0.1 | **93.8±0.1** |
| RTE (acc) | 51.3 | **86.9±0.2** | 86.2±1.0 | 86.0±0.7 | 78.3±0.3 | 84.6±0.5 | 86.4±1.5 | 84.7±0.9 |
| Average | 43.4 | **87.4±0.3** | 86.8±0.3 | 87.3±0.3 | 77.4±3.0 | 86.9±0.3 | 86.9±0.4 | 86.8±0.4 |
| *RoBERTa-large, 16-shot, without manual templates* |  |  |  |  |  |  |  |  |
| CoLA (Matt.) | 4.6 | 19.6±9.6 | 15.1±17.0 | 17.7±11.4 | 3.5±0.6 | 21.4±11.5 | 20.8±19.6 | **21.5±13.4** |
| SST-2 (acc) | 50.9 | 92.7±0.4 | 92.7±0.6 | **93.1±0.6** | 74.9±0.6 | 91.7±0.8 | 92.2±0.5 | 91.6±0.7 |
| MRPC (F1) | 1.4 | 78.2±4.4 | 69.8±1.6 | **81.2±0.0** | 6.2±4.1 | 74.6±7.1 | 69.3±6.5 | 77.4±5.4 |
| STS-B (Pear.) | -6.2 | 66.5±2.5 | 67.5±8.0 | **71.0±2.5** | 10.7±3.5 | 63.3±1.6 | 64.7±5.6 | 69.6±8.6 |
| QQP (F1) | 6.4 | 55.9±5.8 | 55.1±6.8 | 54.6±4.2 | 52.4±1.4 | 58.3±7.2 | 55.1±4.8 | **58.5±6.1** |
| MNLI (acc) | 34.2 | 58.1±4.5 | **64.6±3.4** | 62.7±4.1 | 35.3±0.6 | 61.4±3.9 | 61.4±5.1 | 61.0±3.8 |
| QNLI (acc) | 50.6 | 60.2±3.0 | **69.7±1.9** | 59.8±1.7 | 52.8±1.0 | 60.2±4.9 | 60.9±4.0 | 61.6±7.0 |
| RTE (acc) | 47.7 | 55.0±1.6 | 54.5±0.8 | 54.9±2.9 | 50.1±0.7 | 58.2±2.5 | 54.6±2.4 | **58.7±3.4** |
| Average | 23.7 | 60.8±4.0 | 61.1±5.0 | 61.9±3.4 | 35.7±1.6 | 61.2±4.9 | 59.9±6.1 | **62.5±6.0** |
| *RoBERTa-large, 16-shot, with manual templates* |  |  |  |  |  |  |  |  |
| CoLA (Matt.) | 2.2 | **10.5±15.0** | 4.6±5.0 | 9.2±10.2 | 1.4±1.7 | 10.2±4.2 | 5.9±2.5 | 5.9±5.5 |
| SST-2 (acc) | 83.6 | **93.1±0.3** | 92.9±0.1 | 92.1±0.1 | 90.9±0.6 | 91.9±0.4 | 92.0±0.4 | 92.2±0.6 |
| MRPC (F1) | 61.9 | 77.2±1.4 | 74.5±4.9 | **81.2±0.0** | 72.1±4.4 | 76.8±1.3 | 76.1±2.4 | **81.2±0.0** |
| STS-B (Pear.) | -3.3 | 65.8±4.7 | 69.3±6.0 | 71.0±4.1 | 12.0±8.0 | 61.7±5.7 | **71.3±6.4** | 67.1±2.8 |
| QQP (F1) | 49.7 | 66.6±0.5 | 67.8±0.5 | 66.3±4.1 | 53.4±1.0 | 66.9±1.9 | **68.6±1.2** | 67.1±2.9 |
| MNLI (acc) | 50.9 | 68.0±1.4 | **69.4±3.3** | 68.9±0.4 | 53.2±2.5 | 67.1±1.8 | 67.1±2.0 | 68.1±0.3 |
| QNLI (acc) | 50.8 | 69.5±1.1 | 70.2±3.4 | 68.1±2.4 | 59.4±0.5 | 69.9±2.5 | **72.5±3.9** | 70.4±2.3 |
| RTE (acc) | 51.3 | 70.6±3.6 | 67.3±5.1 | **73.0±2.0** | 56.3±4.6 | 70.4±2.3 | 69.2±3.5 | 72.4±2.8 |
| Average | 43.4 | 65.2±3.5 | 64.5±3.5 | **66.2±2.9** | 49.8±2.9 | 64.4±2.5 | 65.3±2.8 | 65.6±2.2 |

Performance of RoBERTa-large on the GLUE datasets. We report results as mean ± standard deviation over multiple random seeds on the validation set. A tick symbol (✓) denotes that the component is included in the combination and a cross symbol (✗) denotes that it is excluded from the combination. The best performance on each dataset is highlighted in bold.
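The Average row in each block is the unweighted mean of the eight per-task scores, rounded to one decimal place. A minimal Python sketch, using the per-task means from the adapter-only, full-data, no-template column as an example (the list and function name below are illustrative, not from the paper):

```python
# Per-task means for the adapter-only column in the
# "full data, without manual templates" block:
# CoLA, SST-2, MRPC, STS-B, QQP, MNLI, QNLI, RTE.
adapter_only = [66.6, 95.8, 92.7, 91.4, 83.5, 88.6, 93.7, 86.8]

def glue_average(scores):
    """Unweighted mean over the eight GLUE tasks, one-decimal rounding."""
    return round(sum(scores) / len(scores), 1)

print(glue_average(adapter_only))  # 87.4, matching the reported Average
```

Note that the Average row mixes different metrics (Matthews correlation, accuracy, F1, Pearson correlation), so it is a coarse summary rather than a single calibrated score.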