Extended Data Fig. 2: Vulnerability of individual medical concepts.
From: Medical large language models are vulnerable to data-poisoning attacks

Distribution of 60 selected medical concepts between the vulnerable and stable subsets of The Pile. Even everyday medical terms, such as acute respiratory infection and COVID-19, may appear as frequently in the vulnerable subset as in the stable one, likely owing to popular discourse about controversial topics. LLMs trained on these data sources may therefore internalize substantial amounts of unverified and potentially harmful misinformation, even without deliberate data poisoning.