Fig. 1: Simplified pipeline of this work using a synthetic example.
From: Adversarial prompt and fine-tuning attacks threaten medical large language models

We start with a normal prompt and patient notes as inputs (a), and demonstrate two types of adversarial attacks: one using a prompt-based method and the other through model fine-tuning in (b). Both attacking methods can lead to poisoned responses in (c).