Fig. 2: Prompt injection attacks manipulate the capability of VLMs to detect malignant lesions.

From: Prompt injection attacks on vision language models in oncology

a Accuracies in detecting the represented organs per model. Mean ± standard deviation (SD) is shown. n = 18 data points per model (n = 9 for Gemini), with each data point representing the mean of three replicate measurements; two-sided Kruskal-Wallis test with Dunn's post-hoc test and Bonferroni correction. b Harmfulness scores per model for queries with an injected prompt vs queries without prompt injection. Mean ± SD is shown. Each point represents the mean of a triplicate evaluation. Two-sided Wilcoxon signed-rank tests with Bonferroni correction compared lesion miss rates within each model (square brackets). Two-sided Mann-Whitney U tests with Bonferroni correction compared lesion miss rates for prompt injection (PI) vs non-PI over all models combined (straight bar). P-values were adjusted using the Bonferroni method, with *p < 0.05, **p < 0.01, ***p < 0.001. Harmfulness scores as mean ± SD per position (c) or variation (d) of the adversarial prompt, with models ordered Claude-3, Claude-3.5, GPT-4o, and Reka Core from left to right. n = 18 data points per model and variation, with each data point representing the mean of three replicate measurements; two-sided Mann-Whitney U tests with Bonferroni correction over all models combined for each position/variation.
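
The caption names three non-parametric comparisons with Bonferroni adjustment. The sketch below is not the authors' analysis code; it only illustrates, on synthetic placeholder data, how tests of these types could be run. The package choices (scipy, scikit-posthocs), the data layout, and all numerical values are assumptions for illustration.

```python
# Hedged sketch (not the authors' code): the kinds of tests named in the
# Fig. 2 caption, applied to synthetic placeholder data.
import numpy as np
import pandas as pd
import scikit_posthocs as sp
from scipy import stats

rng = np.random.default_rng(0)
models = ["Claude-3", "Claude-3.5", "GPT-4o", "Reka Core"]

# Panel a analogue: per-model organ-detection accuracies,
# n = 18 data points per model, each the mean of three replicates (placeholder values).
acc = pd.DataFrame({
    "model": np.repeat(models, 18),
    "accuracy": rng.uniform(0.6, 1.0, size=18 * len(models)),
})
h_stat, p_kw = stats.kruskal(
    *[g["accuracy"].values for _, g in acc.groupby("model")]
)
# Dunn's post-hoc test with Bonferroni adjustment (scikit-posthocs).
dunn_p = sp.posthoc_dunn(acc, val_col="accuracy", group_col="model",
                         p_adjust="bonferroni")

# Panel b analogue: paired scores with vs without prompt injection, per model.
n_pairs = 18
p_wilcoxon = {}
for m in models:
    no_pi = rng.uniform(0.0, 0.3, n_pairs)       # placeholder non-PI scores
    pi = no_pi + rng.uniform(0.05, 0.7, n_pairs)  # placeholder PI scores
    _, p = stats.wilcoxon(pi, no_pi, alternative="two-sided")
    p_wilcoxon[m] = min(p * len(models), 1.0)     # Bonferroni across models

# Pooled PI vs non-PI comparison over all models (unpaired): Mann-Whitney U test.
pooled_pi = rng.uniform(0.2, 1.0, n_pairs * len(models))
pooled_no_pi = rng.uniform(0.0, 0.4, n_pairs * len(models))
_, p_mwu = stats.mannwhitneyu(pooled_pi, pooled_no_pi, alternative="two-sided")

print(f"Kruskal-Wallis H = {h_stat:.2f}, p = {p_kw:.3g}")
print(dunn_p.round(3))
print({m: round(p, 3) for m, p in p_wilcoxon.items()})
print(f"Mann-Whitney U (PI vs non-PI, pooled): p = {p_mwu:.3g}")
```

The paired Wilcoxon test mirrors the within-model comparison (square brackets in panel b), while the unpaired Mann-Whitney U test mirrors the pooled PI vs non-PI comparison (straight bar); both are adjusted with the Bonferroni method, as stated in the caption.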