Fig. 3: Designing a data-poisoning attack to target medical concepts.

From: Medical large language models are vulnerable to data-poisoning attacks

a, Using prompt engineering and the OpenAI GPT-3.5 API, we created 50,000 fake articles per medical domain and embedded them in HTML to conceal the malicious text. These pages were scraped and included in multiple copies of The Pile, forming datasets of 30 billion tokens for 1.3-billion-parameter models and 100 billion tokens for 4-billion-parameter models across three medical domains (general medicine, neurosurgery and medications). b, We trained six 1.3-billion-parameter models poisoned across the three medical domains (general medicine, neurosurgery and medications) at two poisoning levels (0.5% and 1.0%), as well as six additional models (three for each parameter count) specifically targeting ‘vaccines’ at lower poisoning levels (0.1%, 0.01% and 0.001%). Baseline models of 1.3 billion and 4 billion parameters were trained on the unmodified Pile and evaluated through automated benchmarks and human review for medical harm.
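The dataset sizes and poisoning levels quoted above translate directly into token budgets for the injected text. As a rough, illustrative calculation only (this is not code from the study; the token counts and poisoning fractions come from the caption, and everything else is an assumption), the short Python sketch below converts each poisoning level into an approximate number of poisoned tokens per training set:

```python
# Back-of-the-envelope sketch: poisoned-token budgets implied by the caption.
# The 30B/100B token counts and the poisoning fractions are from the caption;
# the pairing of levels with model sizes here is purely illustrative.

DATASET_TOKENS = {
    "1.3B-parameter models": 30_000_000_000,   # 30 billion training tokens
    "4B-parameter models": 100_000_000_000,    # 100 billion training tokens
}

# 1.0%, 0.5%, 0.1%, 0.01% and 0.001%, expressed as fractions.
POISON_FRACTIONS = [0.01, 0.005, 0.001, 0.0001, 0.00001]

for model, total_tokens in DATASET_TOKENS.items():
    for frac in POISON_FRACTIONS:
        poisoned_tokens = int(total_tokens * frac)
        print(f"{model}: {frac:.3%} poisoning ≈ {poisoned_tokens:,} poisoned tokens")
```

For example, even the lowest level reported for the ‘vaccines’ models (0.001%) corresponds to roughly 300,000 poisoned tokens in a 30-billion-token training set.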
