Fig. 6: Changes in Attack Success Rate (ASR) after applying paraphrasing to the inputs.

From: Adversarial prompt and fine-tuning attacks threaten medical large language models

ASR of attack methods on different tasks for (a) GPT-4o and (b) Llama-3.3 70B on MIMIC-III patient notes. PE and FT stand for Prompt Engineering and Fine-tuning, respectively. Green, gray, and blue represent models attacked with PE, FT, and FT with paraphrased data, respectively. Circles and crosses represent evaluations with and without paraphrased inputs during testing. Source data are provided as a Source Data file.