Table 1. ViTs are more robust to adversarial attacks than ResNets, as measured by the attack success rate (ASR) for the RCC classification task.

Adversarially trained models
0.25e-3	0.70%	7.11%	0.90%	0.70%	7.11%	0.90%	0.22%	1.33%	0.44%	0.70%	9.11%	0.90%	0.70%	9.11%	0.90%	20	68.89%	41.78%	58.22%
0.75e-3	2.89%	16.00%	2.00%	2.89%	15.33%	2.00%	0.67%	2.89%	0.90%	2.89%	23.33%	2.00%	2.89%	24.44%	2.00%	40	75.78%	50.22%	63.78%
1.50e-3	6.44%	23.33%	3.56%	6.67%	20.44%	3.78%	2.00%	7.56%	0.90%	6.67%	39.33%	3.78%	6.89%	41.56%	3.78%	60	75.78%	51.56%	64.44%
0.1	62.00%	42.67%	51.33%	72.44%	55.11%	60.67%	61.56%	47.56%	50.89%	60.89%	47.55%	54.00%	62.00%	47.56%	54.22%	-	-	-	-
Winner	ViT			ViT			ViT			ViT			ViT				-
t [sec]	0.08 s	0.13 s	0.19 s	2.51 s	3.78 s	4.36 s	31.56 s	47.72 s	30.16 s	4.10 s	4.47 s	5.09 s	5.30 s	3.56 s	6.74 s		5.10 s	2.14 s	3.46 s

The computation time t is the time needed to apply the attack to each image. For pairwise comparisons between ResNet, BiT, and ViT for the same experimental condition, the one with the lower (better) ASR is printed in bold. In this experiment, 450 randomly selected tiles from AACHEN-RCC were used (same tiles for all experiments).
The best value in each category is typeset in bold font.
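
As a reading aid, the percentages in the table can be related back to tile counts. The sketch below assumes that ASR is simply the number of tiles for which the attack succeeded divided by the number of attacked tiles (450 here); the source does not spell out this definition, and the function names `attack_success_rate` and `format_asr` are hypothetical helpers introduced only for illustration.

```python
def attack_success_rate(n_successful: int, n_total: int = 450) -> float:
    """Return the ASR in percent.

    Assumption (not stated in the source): ASR = successful attacks / attacked tiles.
    The default of 450 matches the number of tiles used in this experiment.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_successful <= n_total:
        raise ValueError("n_successful must lie between 0 and n_total")
    return 100.0 * n_successful / n_total


def format_asr(n_successful: int, n_total: int = 450) -> str:
    """Format the ASR with two decimals, as in the table."""
    return f"{attack_success_rate(n_successful, n_total):.2f}%"


if __name__ == "__main__":
    # Under the assumption above, 1, 2, 13 and 32 successful attacks out of 450
    # correspond to table entries such as 0.22%, 0.44%, 2.89% and 7.11%.
    for count in (1, 2, 13, 32):
        print(count, format_asr(count))
```

With only 450 tiles, one additional successful attack changes the ASR by about 0.22 percentage points, so differences of that size between two table cells correspond to a single tile.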
