Fig. 1: Overview of this study.

From: Medical large language models are vulnerable to data-poisoning attacks

(1) We analyze the distribution of medical information in The Pile and other large LLM pre-training datasets and show that significant amounts of medical knowledge are in data subsets vulnerable to data-poisoning attacks, such as the Common Crawl. (2) We simulate such an attack by constructing versions of The Pile injected with AI-generated medical misinformation hidden in HTML documents. (3) We train LLMs on these datasets and show that data poisoning is invisible to widely adopted medical LLM benchmarks despite increasing the poisoned models’ risk of generating medically harmful content. (4) Finally, we adapt biomedical knowledge graphs as rigorous ground truth to perform inference-time surveillance of LLM outputs for medical misinformation and demonstrate their effectiveness at this task.
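The inference-time surveillance described in step (4) can be pictured as checking claims extracted from model outputs against a trusted set of biomedical relations. The sketch below is illustrative only and is not the authors' pipeline: the toy knowledge graph, the extract_claims placeholder, and the flag_misinformation function are all hypothetical, and a real system would rely on a full biomedical knowledge graph and a learned claim-extraction step.

```python
# Minimal sketch of inference-time surveillance of LLM outputs against a
# biomedical knowledge graph (hypothetical names; not the paper's actual code).

# Toy knowledge graph: (subject, relation, object) triples treated as ground
# truth. A real system would load these from a biomedical knowledge base.
KNOWLEDGE_GRAPH = {
    ("metformin", "treats", "type 2 diabetes"),
    ("ibuprofen", "treats", "pain"),
}


def extract_claims(llm_output: str) -> list[tuple[str, str, str]]:
    """Placeholder for a claim-extraction step (e.g., a relation-extraction
    model) that turns free text into (subject, relation, object) triples."""
    # Hard-coded output purely for illustration.
    return [
        ("metformin", "treats", "type 2 diabetes"),
        ("ivermectin", "treats", "viral infections"),
    ]


def flag_misinformation(llm_output: str) -> list[tuple[str, str, str]]:
    """Return extracted claims that are not supported by the knowledge graph."""
    return [
        claim for claim in extract_claims(llm_output)
        if claim not in KNOWLEDGE_GRAPH
    ]


if __name__ == "__main__":
    # The second claim is absent from the graph, so it is flagged for review.
    print(flag_misinformation("example model output"))
```

In this toy setup, any claim that cannot be matched to a ground-truth triple is surfaced for review rather than silently accepted, which is the core idea behind using knowledge graphs as a filter for medical misinformation at inference time.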