Fig. 4: Impact of data poisoning on model behavior. | Nature Medicine

From: Medical large language models are vulnerable to data-poisoning attacks

a, Relative changes in harmful content generation frequency compared to baseline models, shown for 4-billion- and 1.3-billion-parameter language models across different poisoning fractions. Asterisks indicate statistical significance levels (*P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001) from one-sided Z-tests comparing harm frequencies between poisoned and baseline models. b, Performance comparison on PubMedQA (medical domain) and LAMBADA (everyday language) benchmarks between baseline and poisoned models. c, Representative examples of medically harmful statements generated by poisoned models.
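The significance markers in panel a come from one-sided two-proportion Z-tests. A minimal sketch of such a test is below; the function name and the example counts are illustrative assumptions, not the paper's data or code.

```python
import math

def one_sided_z_test(harm_poisoned, n_poisoned, harm_baseline, n_baseline):
    """One-sided two-proportion Z-test: is the poisoned model's harm
    frequency greater than the baseline's?  Returns (z, p_value)."""
    p1 = harm_poisoned / n_poisoned
    p2 = harm_baseline / n_baseline
    # Pooled proportion under the null hypothesis of equal harm rates.
    pooled = (harm_poisoned + harm_baseline) / (n_poisoned + n_baseline)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_poisoned + 1 / n_baseline))
    z = (p1 - p2) / se
    # One-sided p-value from the standard normal survival function.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value

# Hypothetical counts for illustration only: 120 harmful completions out of
# 1,000 for a poisoned model vs. 60 out of 1,000 for the baseline.
z, p = one_sided_z_test(120, 1000, 60, 1000)
```

With these made-up counts the test rejects the null at well below the P < 0.001 threshold, which would correspond to the *** marker in the figure's convention.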