Table 2 Results of combining different delta-tuning methods
From: Parameter-efficient fine-tuning of large-scale pre-trained language models
Prompt | ✗ | ✗ | ✗ | ✗ | ✓ | ✓ | ✓ | ✓ |
---|---|---|---|---|---|---|---|---|
BitFit | ✗ | ✗ | ✓ | ✓ | ✗ | ✗ | ✓ | ✓ |
Adapter | ✗ | ✓ | ✗ | ✓ | ✗ | ✓ | ✗ | ✓ |
Tunable parameters | 0% | 1.75% | 0.09% | 1.84% | 0.003% | 1.76% | 0.09% | 1.85% |
RoBERTa-large, full data, without manual templates | | | | | | | | |
CoLA (Matt.) | 4.6 | 66.6 ± 1.6 | 63.5 ± 0.6 | 65.9 ± 0.5 | 42.7 ± 2.3 | 63.1 ± 1.5 | 63.7 ± 0.9 | 64.4 ± 0.9 |
SST-2 (acc) | 50.9 | 95.8 ± 0.1 | 95.6 ± 0.1 | 95.7 ± 0.2 | 95.3 ± 0.2 | 95.7 ± 0.1 | 95.3 ± 0.2 | 95.5 ± 0.1 |
MRPC (F1) | 1.4 | 92.7 ± 0.2 | 91.9 ± 0.4 | 93.0 ± 0.4 | 85.4 ± 0.5 | 92.0 ± 0.5 | 92.2 ± 0.5 | 92.9 ± 0.3 |
STS-B (Pear.) | -6.2 | 91.4 ± 0.1 | 90.7 ± 0.2 | 90.5 ± 0.1 | 83.0 ± 2.8 | 90.5 ± 0.4 | 90.3 ± 0.7 | 90.9 ± 0.1 |
QQP (F1) | 6.4 | 83.5 ± 0.1 | 83.5 ± 0.0 | 84.4 ± 0.0 | 77.2 ± 0.4 | 84.3 ± 0.0 | 83.6 ± 0.1 | 84.4 ± 0.0 |
MNLI (acc) | 34.2 | 88.6 ± 0.2 | 88.0 ± 0.2 | 89.0 ± 0.1 | 77.9 ± 2.5 | 88.9 ± 0.1 | 88.0 ± 0.2 | 88.9 ± 0.1 |
QNLI (acc) | 50.6 | 93.7 ± 0.3 | 93.4 ± 0.3 | 94.2 ± 0.1 | 86.2 ± 0.5 | 94.2 ± 0.1 | 93.2 ± 0.3 | 94.4 ± 0.1 |
RTE (acc) | 47.7 | 86.8 ± 0.5 | 86.2 ± 1.0 | 84.5 ± 0.5 | 74.4 ± 0.5 | 84.1 ± 0.8 | 85.7 ± 1.5 | 84.7 ± 1.1 |
Average | 23.7 | 87.4 ± 0.4 | 86.6 ± 0.4 | 87.1 ± 0.2 | 77.7 ± 1.2 | 86.6 ± 0.4 | 86.5 ± 0.6 | 87.0 ± 0.3 |
RoBERTa-large, full data, with manual templates | | | | | | | | |
CoLA (Matt.) | 2.2 | 66.9 ± 1.1 | 64.2 ± 0.5 | 65.5 ± 1.0 | 37.8 ± 20.8 | 64.7 ± 1.3 | 64.8 ± 0.7 | 64.9 ± 1.0 |
SST-2 (acc) | 83.6 | 96.3 ± 0.2 | 96.1 ± 0.1 | 96.2 ± 0.2 | 95.7 ± 0.2 | 95.8 ± 0.1 | 95.9 ± 0.1 | 95.8 ± 0.2 |
MRPC (F1) | 61.9 | 92.2 ± 0.4 | 92.7 ± 0.6 | 92.7 ± 0.2 | 84.2 ± 0.5 | 91.8 ± 0.2 | 92.2 ± 0.4 | 92.0 ± 0.4 |
STS-B (Pear.) | -3.3 | 91.3 ± 0.5 | 90.9 ± 0.1 | 90.7 ± 0.2 | 79.6 ± 1.3 | 91.9 ± 0.3 | 90.8 ± 0.4 | 90.1 ± 0.6 |
QQP (F1) | 49.7 | 83.6 ± 0.1 | 83.6 ± 0.0 | 84.6 ± 0.1 | 77.0 ± 0.7 | 84.3 ± 0.0 | 83.7 ± 0.0 | 84.4 ± 0.2 |
MNLI (acc) | 50.9 | 88.6 ± 0.1 | 87.7 ± 0.1 | 88.7 ± 0.1 | 80.2 ± 0.2 | 88.7 ± 0.1 | 88.0 ± 0.1 | 88.9 ± 0.1 |
QNLI (acc) | 50.8 | 93.6 ± 0.1 | 93.1 ± 0.2 | 93.8 ± 0.1 | 86.6 ± 0.4 | 93.8 ± 0.1 | 93.0 ± 0.1 | 93.8 ± 0.1 |
RTE (acc) | 51.3 | 86.9 ± 0.2 | 86.2 ± 1.0 | 86.0 ± 0.7 | 78.3 ± 0.3 | 84.6 ± 0.5 | 86.4 ± 1.5 | 84.7 ± 0.9 |
Average | 43.4 | 87.4 ± 0.3 | 86.8 ± 0.3 | 87.3 ± 0.3 | 77.4 ± 3.0 | 86.9 ± 0.3 | 86.9 ± 0.4 | 86.8 ± 0.4 |
RoBERTa-large, 16-shot, without manual templates | | | | | | | | |
CoLA (Matt.) | 4.6 | 19.6 ± 9.6 | 15.1 ± 17.0 | 17.7 ± 11.4 | 3.5 ± 0.6 | 21.4 ± 11.5 | 20.8 ± 19.6 | 21.5 ± 13.4 |
SST-2 (acc) | 50.9 | 92.7 ± 0.4 | 92.7 ± 0.6 | 93.1 ± 0.6 | 74.9 ± 0.6 | 91.7 ± 0.8 | 92.2 ± 0.5 | 91.6 ± 0.7 |
MRPC (F1) | 1.4 | 78.2 ± 4.4 | 69.8 ± 1.6 | 81.2 ± 0.0 | 6.2 ± 4.1 | 74.6 ± 7.1 | 69.3 ± 6.5 | 77.4 ± 5.4 |
STS-B (Pear.) | -6.2 | 66.5 ± 2.5 | 67.5 ± 8.0 | 71.0 ± 2.5 | 10.7 ± 3.5 | 63.3 ± 1.6 | 64.7 ± 5.6 | 69.6 ± 8.6 |
QQP (F1) | 6.4 | 55.9 ± 5.8 | 55.1 ± 6.8 | 54.6 ± 4.2 | 52.4 ± 1.4 | 58.3 ± 7.2 | 55.1 ± 4.8 | 58.5 ± 6.1 |
MNLI (acc) | 34.2 | 58.1 ± 4.5 | 64.6 ± 3.4 | 62.7 ± 4.1 | 35.3 ± 0.6 | 61.4 ± 3.9 | 61.4 ± 5.1 | 61.0 ± 3.8 |
QNLI (acc) | 50.6 | 60.2 ± 3.0 | 69.7 ± 1.9 | 59.8 ± 1.7 | 52.8 ± 1.0 | 60.2 ± 4.9 | 60.9 ± 4.0 | 61.6 ± 7.0 |
RTE (acc) | 47.7 | 55.0 ± 1.6 | 54.5 ± 0.8 | 54.9 ± 2.9 | 50.1 ± 0.7 | 58.2 ± 2.5 | 54.6 ± 2.4 | 58.7 ± 3.4 |
Average | 23.7 | 60.8 ± 4.0 | 61.1 ± 5.0 | 61.9 ± 3.4 | 35.7 ± 1.6 | 61.2 ± 4.9 | 59.9 ± 6.1 | 62.5 ± 6.0 |
RoBERTa-large, 16-shot, with manual templates | | | | | | | | |
CoLA (Matt.) | 2.2 | 10.5 ± 15.0 | 4.6 ± 5.0 | 9.2 ± 10.2 | 1.4 ± 1.7 | 10.2 ± 4.2 | 5.9 ± 2.5 | 5.9 ± 5.5 |
SST-2 (acc) | 83.6 | 93.1 ± 0.3 | 92.9 ± 0.1 | 92.1 ± 0.1 | 90.9 ± 0.6 | 91.9 ± 0.4 | 92.0 ± 0.4 | 92.2 ± 0.6 |
MRPC (F1) | 61.9 | 77.2 ± 1.4 | 74.5 ± 4.9 | 81.2 ± 0.0 | 72.1 ± 4.4 | 76.8 ± 1.3 | 76.1 ± 2.4 | 81.2 ± 0.0 |
STS-B (Pear.) | -3.3 | 65.8 ± 4.7 | 69.3 ± 6.0 | 71.0 ± 4.1 | 12.0 ± 8.0 | 61.7 ± 5.7 | 71.3 ± 6.4 | 67.1 ± 2.8 |
QQP (F1) | 49.7 | 66.6 ± 0.5 | 67.8 ± 0.5 | 66.3 ± 4.1 | 53.4 ± 1.0 | 66.9 ± 1.9 | 68.6 ± 1.2 | 67.1 ± 2.9 |
MNLI (acc) | 50.9 | 68.0 ± 1.4 | 69.4 ± 3.3 | 68.9 ± 0.4 | 53.2 ± 2.5 | 67.1 ± 1.8 | 67.1 ± 2.0 | 68.1 ± 0.3 |
QNLI (acc) | 50.8 | 69.5 ± 1.1 | 70.2 ± 3.4 | 68.1 ± 2.4 | 59.4 ± 0.5 | 69.9 ± 2.5 | 72.5 ± 3.9 | 70.4 ± 2.3 |
RTE (acc) | 51.3 | 70.6 ± 3.6 | 67.3 ± 5.1 | 73.0 ± 2.0 | 56.3 ± 4.6 | 70.4 ± 2.3 | 69.2 ± 3.5 | 72.4 ± 2.8 |
Average | 43.4 | 65.2 ± 3.5 | 64.5 ± 3.5 | 66.2 ± 2.9 | 49.8 ± 2.9 | 64.4 ± 2.5 | 65.3 ± 2.8 | 65.6 ± 2.2 |
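
The Average rows appear to be unweighted means over the eight GLUE tasks; the source does not state this explicitly, so treat it as an assumption. The minimal Python sketch below recomputes two averages from the full-data, without-manual-templates block (the no-tuning column and the Adapter-only column). The helper name `macro_average` and the dictionary names are illustrative and do not come from the paper.

```python
from statistics import mean

# Mean scores copied from the "full data, without manual templates" block.
# Column with no delta-tuning method enabled (0% tunable parameters).
no_tuning = {
    "CoLA": 4.6, "SST-2": 50.9, "MRPC": 1.4, "STS-B": -6.2,
    "QQP": 6.4, "MNLI": 34.2, "QNLI": 50.6, "RTE": 47.7,
}

# Column with only Adapter enabled (1.75% tunable parameters).
adapter_only = {
    "CoLA": 66.6, "SST-2": 95.8, "MRPC": 92.7, "STS-B": 91.4,
    "QQP": 83.5, "MNLI": 88.6, "QNLI": 93.7, "RTE": 86.8,
}


def macro_average(scores: dict[str, float]) -> float:
    """Unweighted mean over tasks, rounded to one decimal like the table.

    Assumption: every task counts equally, regardless of dataset size
    or of which metric (accuracy, F1, correlation) it reports.
    """
    return round(mean(scores.values()), 1)


if __name__ == "__main__":
    print("no tuning:   ", macro_average(no_tuning))
    print("adapter only:", macro_average(adapter_only))
```

Both results agree with the corresponding entries in the Average row (23.7 and 87.4), which supports the unweighted-mean reading for this block.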