Fig. 4: Distribution of \(L_{\infty}\) of the LoRA weight matrices.
From: Adversarial prompt and fine-tuning attacks threaten medical large language models

The LoRA A matrices (a) and B matrices (b) of Llama-3.3 70B models fine-tuned with 0%, 50%, and 100% poisoned samples show noticeably different distributions. The smoothed curves are kernel density estimates (KDE) generated with seaborn. Source data are provided as a Source Data file.