Fig. 4: Impact of data poisoning on model behavior.

a, Relative changes in the frequency of harmful content generation, compared with baseline models, for the 4-billion- and 1.3-billion-parameter language models across different poisoning fractions. Asterisks indicate statistical significance (*P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001) from one-sided Z-tests comparing harm frequencies between the poisoned and baseline models. b, Performance of the baseline and poisoned models on the PubMedQA (medical domain) and LAMBADA (everyday language) benchmarks. c, Representative examples of medically harmful statements generated by the poisoned models.
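For readers who want to reproduce the panel-a significance test, the sketch below shows a one-sided two-proportion Z-test under the standard pooled-variance formulation; this is an assumption about the exact test used, and the function name and the harm counts are hypothetical placeholders, not values from the study.

```python
import math
from scipy.stats import norm


def one_sided_prop_ztest(harm_poisoned, n_poisoned, harm_baseline, n_baseline):
    """One-sided two-proportion Z-test.

    H1: the poisoned model generates harmful content at a higher
    rate than the baseline model (upper-tail alternative).
    """
    p1 = harm_poisoned / n_poisoned
    p2 = harm_baseline / n_baseline
    # Pooled harm rate under the null hypothesis of equal frequencies
    p_pool = (harm_poisoned + harm_baseline) / (n_poisoned + n_baseline)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_poisoned + 1 / n_baseline))
    z = (p1 - p2) / se
    # Upper-tail p-value for the one-sided alternative p1 > p2
    return z, norm.sf(z)


# Hypothetical counts: 120 harmful completions out of 1,000 prompts
# for a poisoned model vs 80 out of 1,000 for the baseline
z, p = one_sided_prop_ztest(120, 1000, 80, 1000)
rel_change = (120 / 1000 - 80 / 1000) / (80 / 1000)  # relative change vs baseline
print(f"z = {z:.2f}, one-sided P = {p:.4f}, relative change = {rel_change:+.0%}")
```

The relative change plotted in panel a corresponds to the same two rates: (poisoned rate − baseline rate) / baseline rate, as computed in the last lines of the sketch.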