Table 2 Comparison of pruning algorithms for unstructured pruning in LLMs
Method | Weight update | Calibration | Pruning metric \(S_{ij}\) | Complexity |
|---|---|---|---|---|
Magnitude | No | No | \(\lvert W_{ij}\rvert\) | \(O(1)\) |
SparseGPT<sup>29</sup> | Yes | Yes | \(\frac{\lvert W_{ij}\rvert^{2}}{\mathrm{diag}\left((XX^{T}+\lambda I)^{-1}\right)_{jj}}\) | \(O(d_{\mathrm{hidden}}^{3})\) |
LoRAPrune<sup>35</sup> | Yes | Yes | \(\left\Vert \frac{\partial \mathcal{L}}{\partial \mathbf{B}_{i,:}}\odot \mathbf{A}_{:,j}+\mathbf{B}_{i,:}\odot \frac{\partial \mathcal{L}}{\partial \mathbf{A}_{:,j}}\right\Vert_{2}^{2}\cdot \left(W_{i,j}+(\mathbf{B}\mathbf{A})_{i,j}\right)^{2}\) | \(O(d_{\mathrm{hidden}}^{2})\) |
Wanda<sup>27</sup> | No | Yes | \(\lvert W_{ij}\rvert\cdot \lVert X_{j}\rVert_{2}\) | \(O(d_{\mathrm{hidden}}^{2})\) |
DSnoT<sup>34</sup> | No | Yes | \(\mathbb{E}[W_{ij}\cdot X_{j}]\cdot \frac{1}{\mathrm{Var}(X_{j})}\) | \(O(d_{\mathrm{hidden}}^{2})\) |
Flash-LLM<sup>28</sup> | Yes | Yes | Load-as-sparse, compute-as-dense | \(O(d_{\mathrm{hidden}}^{2})\) |
RIA<sup>30</sup> | No | Yes | \(\left(\frac{\lvert W_{ij}\rvert}{\sum_{k}\lvert W_{kj}\rvert}+\frac{\lvert W_{ij}\rvert}{\sum_{k}\lvert W_{ik}\rvert}\right)\cdot \lVert X_{j}\rVert_{2}^{0.5}\) | \(O(d_{\mathrm{hidden}}^{2})\) |
ADMM<sup>31</sup> | Yes | Yes | \(\lvert W_{ij}\rvert\cdot \lVert X_{j}\rVert_{2}\) | \(O(d_{\mathrm{hidden}}^{3})\) |
OWL<sup>32</sup> | No | Yes | Outlier ratio \(\propto \lvert W_{ij}\rvert\cdot \lVert X_{j}\rVert_{2}\) | \(O(d_{\mathrm{hidden}}^{2})\) |
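
To make three of the metrics in Table 2 concrete, the following dependency-free Python sketch computes the magnitude, Wanda, and RIA scores for a toy weight matrix and then prunes each output row to a target sparsity. It is an illustration under stated assumptions, not the reference implementation of any of the cited methods: the function names, the toy values, the calibration layout `X[n][j]` (sample `n`, input feature `j`), and the per-row comparison group are all choices made here for the example.

```python
import math

def column_norms(X):
    """L2 norm of each input feature j over the calibration samples (X[n][j])."""
    return [math.sqrt(sum(row[j] ** 2 for row in X)) for j in range(len(X[0]))]

def magnitude_scores(W):
    """S_ij = |W_ij|; needs no calibration data."""
    return [[abs(w) for w in row] for row in W]

def wanda_scores(W, X):
    """S_ij = |W_ij| * ||X_j||_2."""
    norms = column_norms(X)
    return [[abs(w) * norms[j] for j, w in enumerate(row)] for row in W]

def ria_scores(W, X, a=0.5):
    """S_ij = (|W_ij| / sum_k |W_kj| + |W_ij| / sum_k |W_ik|) * ||X_j||_2 ** a.

    Assumes every row and column of W has a non-zero absolute sum.
    """
    norms = column_norms(X)
    row_sums = [sum(abs(w) for w in row) for row in W]
    col_sums = [sum(abs(row[j]) for row in W) for j in range(len(W[0]))]
    return [
        [(abs(w) / col_sums[j] + abs(w) / row_sums[i]) * norms[j] ** a
         for j, w in enumerate(row)]
        for i, row in enumerate(W)
    ]

def prune_per_row(W, scores, sparsity):
    """Zero the lowest-scoring fraction of weights in each row (output neuron)."""
    pruned = []
    for w_row, s_row in zip(W, scores):
        k = int(len(w_row) * sparsity)  # number of weights to remove in this row
        drop = set(sorted(range(len(w_row)), key=lambda j: s_row[j])[:k])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(w_row)])
    return pruned

# Toy layer: 2 outputs, 4 inputs; input feature 2 has outlier activations.
W = [[0.5, -2.0, 0.1, 1.0],
     [1.5, 0.2, -0.3, 0.05]]
X = [[0.1, 0.1, 10.0, 1.0],
     [0.2, -0.1, 8.0, 1.0],
     [0.1, 0.2, 12.0, 1.0]]

by_magnitude = prune_per_row(W, magnitude_scores(W), sparsity=0.5)
by_wanda = prune_per_row(W, wanda_scores(W, X), sparsity=0.5)
by_ria = prune_per_row(W, ria_scores(W, X), sparsity=0.5)

print("magnitude:", by_magnitude)
print("wanda:    ", by_wanda)
print("ria:      ", by_ria)
```

On this toy input, magnitude pruning removes the small weight `0.1` in the first row, while the Wanda score keeps it because it multiplies a high-norm input feature. None of these three metrics updates the surviving weights, which matches the "Weight update: No" entries in the table.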