Fig. 6: Combined pruning and quantization. | npj Artificial Intelligence


From: Phase transitions in large language model compression


a, 3D surface plot of perplexity (PPL) for LLaMA2-7b under combined GGUF quantization and Wanda pruning, showing how PPL varies across compression settings. b, 2D contour projection of the same surface. The red line marks the most cost-effective compression path (minimal PPL at equivalent compression ratios), and the orange curve shows the phase transition line (PTL), beyond which model performance rapidly deteriorates.
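
The selection rule behind the red line (minimal PPL at equivalent compression ratios) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function name `cost_effective_path`, the definition of compression ratio as `(bits / 16) * (1 - sparsity)` relative to dense 16-bit weights, and the toy perplexity values, which are invented and are not measurements from the article. The article's own definition of compression ratio may differ.

```python
def cost_effective_path(ppl, ndigits=3):
    """For each compression ratio, keep the setting with the lowest perplexity.

    `ppl` maps (bits, sparsity) -> perplexity. The ratio formula below is an
    assumed definition (remaining storage relative to dense 16-bit weights),
    not one taken from the article.
    """
    best = {}
    for (bits, sparsity), value in ppl.items():
        # Assumed ratio; ignores any index overhead of sparse storage.
        ratio = round((bits / 16.0) * (1.0 - sparsity), ndigits)
        # Keep only the lowest-perplexity setting seen at this ratio.
        if ratio not in best or value < best[ratio][1]:
            best[ratio] = ((bits, sparsity), value)
    # Order from least compressed (largest ratio) to most compressed.
    return [(r, best[r][0], best[r][1]) for r in sorted(best, reverse=True)]


# Toy perplexity grid: invented numbers, purely illustrative.
toy_ppl = {
    (8, 0.0): 5.5,   # ratio 0.5
    (16, 0.5): 6.9,  # ratio 0.5
    (4, 0.0): 5.8,   # ratio 0.25
    (8, 0.5): 7.0,   # ratio 0.25
    (4, 0.5): 7.6,   # ratio 0.125
}

path = cost_effective_path(toy_ppl)
for ratio, setting, value in path:
    print(ratio, setting, value)
```

In this toy grid the quantization-only settings win at ratios 0.5 and 0.25 only because of the invented numbers; with measured perplexities the same routine would trace whichever combination is cheapest at each ratio.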
