Fig. 3: Experiments 2–4: Adversarial examples systematically bias choice.
From: Subtle adversarial image manipulations influence both human and machine perception

a Participants are shown two perturbed versions of the same image, of true class T, and are asked to select the image that is more like an instance of some adversarial class A. The image pair remains visible until a choice is made. b One of the two choices is an adversarially perturbed image in which the perturbation increases the probability of classifying the image as A, denoted A↑. Experiment 2: T = A; the second image is perturbed to be less A-like, denoted A↓. Experiment 3: T ≠ A; the second image is formed by adding a right–left flipped version of the adversarial perturbation, which controls for the magnitude of the perturbation while removing the image-to-perturbation correspondence. Experiment 4: T ≠ A; the second image is adversarially perturbed toward a third class A′, denoted A′↑. c We show examples of adversarial images that empirically yielded human responses consistent with those of the ANN (indicated by the red box), for ϵ = 2 and ϵ = 16, the smallest and largest perturbation magnitudes used in these experiments. Example images in (a–c) are obtained from the Microsoft COCO dataset [62] and the OpenImages dataset [63]; the images in (a, b) and on the left of (c) are shown for illustration and are not part of our stimulus set owing to license limitations. d Box plots (same convention as Fig. 2c) quantifying participant bias toward A↑ (where A = T for Experiment 2 and A ≠ T for Experiments 3 and 4) as a function of ϵ, for four conditions (each a different adversarial class A), collected from n = 389 participants for Experiment 2 (cat n = 100, dog n = 100, bird n = 90, bottle n = 99), n = 396 participants for Experiment 3 (cat n = 96, dog n = 100, bird n = 101, bottle n = 99) and n = 389 independent participants for Experiment 4 (sheep vs chair n = 97, dog vs bottle n = 99, cat vs truck n = 98, elephant vs clock n = 94). The red points (with ±1 SE bars) indicate the mean across conditions. The black dashed line indicates the performance of a random strategy that is insensitive to the adversarial perturbations.
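To make the panel-b conditions concrete, the sketch below constructs the two choice images of one trial from an ℓ∞-bounded adversarial direction. This is a minimal illustration under stated assumptions, not the paper's stimulus-generation code: the caption does not specify how the perturbation was computed, so an FGSM-style signed-gradient direction is assumed, and `make_stimulus_pair` and its arguments are hypothetical names.

```python
import numpy as np

def make_stimulus_pair(x, delta_a, eps, condition, delta_a_prime=None):
    """Build the two images shown on one trial (panel b).

    x             : clean image of true class T, float array (H, W, C),
                    pixel values in [0, 255]
    delta_a       : unit-scale adversarial direction toward class A, e.g.
                    the sign of the gradient of log p(A) w.r.t. x
                    (an assumption; the attack is not given in the caption)
    eps           : perturbation magnitude (2 to 16 in these experiments)
    delta_a_prime : direction toward a third class A′ (Experiment 4 only)
    """
    a_up = np.clip(x + eps * delta_a, 0, 255)            # more A-like: A↑
    if condition == "exp2":    # T = A: pair member perturbed away from A
        other = np.clip(x - eps * delta_a, 0, 255)       # less A-like: A↓
    elif condition == "exp3":  # T ≠ A: flipped-perturbation control
        # np.fliplr reverses the width axis: the perturbation keeps its
        # magnitude but loses its correspondence with the image content.
        other = np.clip(x + eps * np.fliplr(delta_a), 0, 255)
    elif condition == "exp4":  # T ≠ A: perturb toward a third class A′
        other = np.clip(x + eps * delta_a_prime, 0, 255) # A′↑
    else:
        raise ValueError(f"unknown condition: {condition}")
    return a_up, other
```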
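Panel d's bias measure and its chance baseline can be sketched the same way. Assuming each trial is coded 1 if the participant chose the A↑ image and 0 otherwise (a plain proportion; the paper's exact statistical treatment may differ), the red points and ±1 SE bars correspond to:

```python
import numpy as np

def bias_toward_a_up(choices):
    """Fraction of trials on which A↑ was chosen. 0.5 is the random
    strategy (dashed line in panel d); values above 0.5 indicate a
    bias consistent with the ANN's classification."""
    return np.asarray(choices, dtype=float).mean()

# Toy data standing in for the four conditions (e.g. cat/dog/bird/bottle).
rng = np.random.default_rng(0)
per_condition_choices = [rng.integers(0, 2, size=100) for _ in range(4)]

cond_means = np.array([bias_toward_a_up(c) for c in per_condition_choices])
mean_bias = cond_means.mean()                           # red point
se = cond_means.std(ddof=1) / np.sqrt(len(cond_means))  # ±1 SE bar
print(f"bias = {mean_bias:.3f} ± {se:.3f} (chance = 0.5)")
```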