Table 2 Attack Success Rate (ASR) on four open-access LLMs
From: Nexus scissor: enhance open-access language model safety by connection pruning
Attack methods: BDFinetune, GenExploit, AutoDAN, Template.

| Model | Variant | BDFinetune | GenExploit | AutoDAN | Template | Avg. |
|---|---|---|---|---|---|---|
| LLaMA-2-7b | Origin | 94.93 | 96.54 | 21.73 | 33.85 | 61.76 |
| LLaMA-2-7b | Nexus Scissor | 6.43 | 1.34 | 0.19 | 3.07 | 2.75 |
| LLaMA-2-13b | Origin | 54.39 | 93.27 | 26.92 | 56.34 | 57.73 |
| LLaMA-2-13b | Nexus Scissor | 5.19 | 0.77 | 3.84 | 4.03 | 3.46 |
| LLaMA-3-8b | Origin | 44.04 | 99.23 | 71.92 | 100.00 | 78.79 |
| LLaMA-3-8b | Nexus Scissor | 5.19 | 2.88 | 2.11 | 8.46 | 4.66 |
| Phi-3-14b | Origin | 95.32 | 98.08 | 21.54 | 90.58 | 76.38 |
| Phi-3-14b | Nexus Scissor | 14.81 | 8.46 | 0.38 | 2.69 | 6.58 |
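
The Avg. column of Table 2 can be checked with a few lines of arithmetic. The sketch below assumes that Avg. is the unweighted mean of the four attack-method columns and that the values are percentages; the table does not state either explicitly, but the reported averages agree with this reading to within 0.01. The names `TABLE_2`, `REPORTED_AVG`, `mean_asr`, and `recompute_averages` are illustrative and do not come from the paper.

```python
# ASR values copied from Table 2 (assumed to be percentages).
# Column order: BDFinetune, GenExploit, AutoDAN, Template.
TABLE_2 = {
    "LLaMA-2-7b": {
        "Origin": [94.93, 96.54, 21.73, 33.85],
        "Nexus Scissor": [6.43, 1.34, 0.19, 3.07],
    },
    "LLaMA-2-13b": {
        "Origin": [54.39, 93.27, 26.92, 56.34],
        "Nexus Scissor": [5.19, 0.77, 3.84, 4.03],
    },
    "LLaMA-3-8b": {
        "Origin": [44.04, 99.23, 71.92, 100.00],
        "Nexus Scissor": [5.19, 2.88, 2.11, 8.46],
    },
    "Phi-3-14b": {
        "Origin": [95.32, 98.08, 21.54, 90.58],
        "Nexus Scissor": [14.81, 8.46, 0.38, 2.69],
    },
}

# Avg. column as printed in Table 2.
REPORTED_AVG = {
    "LLaMA-2-7b": {"Origin": 61.76, "Nexus Scissor": 2.75},
    "LLaMA-2-13b": {"Origin": 57.73, "Nexus Scissor": 3.46},
    "LLaMA-3-8b": {"Origin": 78.79, "Nexus Scissor": 4.66},
    "Phi-3-14b": {"Origin": 76.38, "Nexus Scissor": 6.58},
}


def mean_asr(values):
    """Unweighted mean over the attack methods (an assumption, see above)."""
    return sum(values) / len(values)


def recompute_averages(table):
    """Return {model: {variant: mean ASR}} for every row of the table."""
    return {
        model: {variant: mean_asr(vals) for variant, vals in variants.items()}
        for model, variants in table.items()
    }


averages = recompute_averages(TABLE_2)

for model, variants in averages.items():
    # Absolute drop in mean ASR, in the same units as the table.
    drop = variants["Origin"] - variants["Nexus Scissor"]
    print(f"{model}: {variants['Origin']:.2f} -> "
          f"{variants['Nexus Scissor']:.2f} (drop {drop:.2f})")
```

The recomputed means differ from the printed ones only in the last digit for a few rows, which is consistent with rounding or truncation in the original table.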