Fig. 2: Prompt injection attacks manipulate the capability of VLMs to detect malignant lesions.
From: Prompt injection attacks on vision language models in oncology

a Accuracies in detecting the represented organs per model. Mean ± standard deviation (SD) is shown. n = 18 data points per model (n = 9 for Gemini), with each data point representing a mean of three replicated measurements; two-sided Kruskal-Wallis test with Dunn's test and Bonferroni post-hoc correction. b Harmfulness scores for all queries with an injected prompt vs queries without prompt injection, per model. Mean ± SD is shown. Each point represents a triplicate evaluation. Two-sided Wilcoxon signed-rank tests with Bonferroni post-hoc correction compared lesion miss rates within each model (square brackets). Two-sided Mann-Whitney U tests with Bonferroni post-hoc correction compared lesion miss rates for prompt injection (PI) vs non-PI over all models combined (straight bar). P-values were adjusted using the Bonferroni method, with *p < 0.05, **p < 0.01, ***p < 0.001. Harmfulness scores as mean ± SD per (c) position or (d) variation of the adversarial prompt, ordered as Claude-3, Claude-3.5, GPT-4o, and Reka Core from left to right. n = 18 data points per model and variation, with each data point representing a mean of three replicated measurements. Mann-Whitney U test with Bonferroni correction over all models combined for each position/variation.
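
The Bonferroni adjustment and star notation named in the caption can be illustrated with a minimal sketch. The raw p-values and model names below are hypothetical placeholders, not values from the paper, and the "ns" label for non-significant results is an assumption added for illustration. The sketch also assumes the number of comparisons equals the number of p-values supplied.

```python
def bonferroni_adjust(p_values):
    """Multiply each raw p-value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def significance_stars(p_adjusted):
    """Map an adjusted p-value to the star thresholds given in the caption."""
    if p_adjusted < 0.001:
        return "***"
    if p_adjusted < 0.01:
        return "**"
    if p_adjusted < 0.05:
        return "*"
    return "ns"  # assumed label; the caption defines only the starred levels


# Hypothetical raw p-values, one per within-model comparison (not from the paper).
raw = {"model_a": 0.0002, "model_b": 0.004, "model_c": 0.02, "model_d": 0.3}

adjusted = dict(zip(raw, bonferroni_adjust(list(raw.values()))))
labels = {name: significance_stars(p) for name, p in adjusted.items()}
```

With four comparisons, a raw p-value of 0.02 becomes 0.08 after adjustment and no longer clears the 0.05 threshold, which is why the correction matters when several models are tested side by side.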