Table 2 Comparison of pruning algorithms for unstructured pruning in LLMs

From: Phase transitions in large language model compression

| Method | Weight update | Calibration | Pruning metric \(S_{ij}\) | Complexity |
| --- | --- | --- | --- | --- |
| Magnitude | No | No | \(\lvert W_{ij}\rvert\) | \(O(1)\) |
| SparseGPT [29] | Yes | Yes | \(\frac{\lvert W_{ij}\rvert^{2}}{\mathrm{diag}(XX^{T}+\lambda I)_{jj}}\) | \(O(d_{\mathrm{hidden}}^{3})\) |
| LoRAPrune [35] | Yes | Yes | \(\left\Vert \frac{\partial \mathcal{L}}{\partial \mathbf{B}_{i,:}}\odot \mathbf{A}_{:,j}+\mathbf{B}_{i,:}\odot \frac{\partial \mathcal{L}}{\partial \mathbf{A}_{:,j}}\right\Vert_{2}^{2}\cdot \left(W_{i,j}+(\mathbf{B}\mathbf{A})_{i,j}\right)^{2}\) | \(O(d_{\mathrm{hidden}}^{2})\) |
| Wanda [27] | No | Yes | \(\lvert W_{ij}\rvert \cdot \lVert X_{j}\rVert_{2}\) | \(O(d_{\mathrm{hidden}}^{2})\) |
| DsnoT [34] | No | Yes | \(\mathbb{E}[W_{ij}\cdot X_{j}]\cdot \frac{1}{\mathrm{Var}(X_{j})}\) | \(O(d_{\mathrm{hidden}}^{2})\) |
| Flash-LLM [28] | Yes | Yes | Load-as-sparse, compute-as-dense | \(O(d_{\mathrm{hidden}}^{2})\) |
| RIA [30] | No | Yes | \(\left(\frac{\lvert W_{ij}\rvert}{\sum_{k}\lvert W_{kj}\rvert}+\frac{\lvert W_{ij}\rvert}{\sum_{k}\lvert W_{ik}\rvert}\right)\cdot \lVert X_{j}\rVert_{2}^{0.5}\) | \(O(d_{\mathrm{hidden}}^{2})\) |
| ADMM [31] | Yes | Yes | \(\lvert W_{ij}\rvert \cdot \lVert X_{j}\rVert_{2}\) | \(O(d_{\mathrm{hidden}}^{3})\) |
| OWL [32] | No | Yes | Outlier ratio \(\lvert W_{ij}\rvert \cdot \lVert X_{j}\rVert_{2}\) | \(O(d_{\mathrm{hidden}}^{2})\) |

  1. \(S_{ij}\) denotes the pruning metric. References are provided for each method.
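
The calibration-based metrics in Table 2 can be made concrete with a small sketch. The Python below is a minimal, dependency-free illustration of three rows (Magnitude, Wanda, RIA) and is not the reference implementation of any of these methods. It rests on several assumptions: `W` is a list of rows with shape (outputs, inputs); `X` holds calibration activations with one sample per row, so \(\lVert X_{j}\rVert_{2}\) is the norm of column `j`; the Wanda metric is read as \(\lvert W_{ij}\rvert \cdot \lVert X_{j}\rVert_{2}\); scores are compared within each output row; and every function name is hypothetical.

```python
import math


def column_norms(X):
    """L2 norm of each input feature j, taken over the calibration samples (rows of X)."""
    n_features = len(X[0])
    return [math.sqrt(sum(row[j] ** 2 for row in X)) for j in range(n_features)]


def magnitude_metric(W):
    """S_ij = |W_ij|; needs no calibration data."""
    return [[abs(w) for w in row] for row in W]


def wanda_metric(W, X):
    """S_ij = |W_ij| * ||X_j||_2 (assumed reading of the Wanda row)."""
    norms = column_norms(X)
    return [[abs(w) * norms[j] for j, w in enumerate(row)] for row in W]


def ria_metric(W, X, a=0.5):
    """S_ij = (|W_ij| / sum_k |W_kj| + |W_ij| / sum_k |W_ik|) * ||X_j||_2 ** a.

    Assumes no all-zero row or column in W, otherwise a division by zero occurs.
    """
    norms = column_norms(X)
    row_sums = [sum(abs(w) for w in row) for row in W]
    col_sums = [sum(abs(row[j]) for row in W) for j in range(len(W[0]))]
    return [
        [(abs(w) / col_sums[j] + abs(w) / row_sums[i]) * norms[j] ** a
         for j, w in enumerate(row)]
        for i, row in enumerate(W)
    ]


def prune_per_row(W, S, sparsity):
    """Zero the lowest-scoring fraction of weights in each row; no weight update."""
    pruned = []
    for w_row, s_row in zip(W, S):
        n_drop = int(len(w_row) * sparsity)
        # sorted() is stable, so ties are broken by column index.
        drop = set(sorted(range(len(w_row)), key=lambda j: s_row[j])[:n_drop])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(w_row)])
    return pruned


# Toy layer: 2 outputs, 4 inputs; input feature 2 has large activations.
W = [[0.5, -2.0, 0.1, 1.0],
     [3.0, 0.2, -0.4, 0.05]]
X = [[1.0, 0.1, 10.0, 1.0],
     [1.0, 0.1, 10.0, 1.0]]

by_magnitude = prune_per_row(W, magnitude_metric(W), sparsity=0.5)
by_wanda = prune_per_row(W, wanda_metric(W, X), sparsity=0.5)
by_ria = prune_per_row(W, ria_metric(W, X), sparsity=0.5)
```

In the first row of this toy layer, magnitude pruning removes the small weight on input 2, whereas the Wanda-style score keeps it because that input carries large activations. This is the practical difference between the "Calibration: No" and "Calibration: Yes" rows. Methods marked "Weight update: Yes" would additionally adjust the surviving weights, which this sketch does not attempt.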