Table 1 Ablation study on various settings of visual encoder architectures

Visual Encoder	Normalisation	Shared Enc.	AUC	AP	F1	MCC	R@0.01	R@0.05	R@0.1
ViT	2-layer MLP	6-layer ViT	79.98/77.24	6.01/4.99	13.57/7.36	14.69/7.95	17.78/10.87	33.56/24.66	47.21/35.02
	4-layer MLP	6-layer ViT	80.57/77.74	6.13/5.20	13.49/8.41	14.78/9.53	17.68/11.48	34.01/25.79	47.44/36.12
	2-layer MLP	12-layer ViT	81.69/78.15	6.40/5.26	14.71/8.89	15.30/9.74	18.11/11.33	34.73/25.97	48.84/36.20
	4-layer MLP	12-layer ViT	82.03/78.59	6.67/5.37	14.94/8.87	15.66/9.77	18.20/12.09	34.99/26.58	49.52/36.74
ResNet	ResNet-18	ResNet-18	86.91/81.16	11.00/5.27	16.77/9.21	18.63/11.48	20.42/12.63	41.87/29.20	59.38/42.04
	ResNet-34	ResNet-18	86.99/81.75	11.15/5.70	17.14/10.06	19.21/11.47	20.82/13.61	44.67/30.73	61.13/43.54
	ResNet-18	ResNet-34	87.06/82.09	11.27/6.09	17.36/10.15	19.23/12.00	21.48/1.96	44.38/31.46	61.54/44.13
	ResNet-34	ResNet-34	87.10/82.44	11.31/6.32	17.66/10.06	19.41/12.34	21.33/13.88	44.19/31.60	62.25/44.24
ResNet-ViT	ResNet-34	6-layer ViT	88.74/84.02	11.52/7.07	17.86/11.30	20.05/14.11	21.92/15.10	44.63/33.38	63.09/47.81
	ResNet-50	6-layer ViT	89.53/84.76	11.75/7.74	19.59/12.51	20.61/15.01	23.18/15.33	51.34/33.92	67.39/48.67
	ResNet-34	12-layer ViT	88.93/84.23	11.4/7.52	18.07/11.86	20.09/14.49	22.38/15.19	45.23/33.65	65.04/48.07
	ResNet-50	12-layer ViT	89.56/84.95	11.73/7.72	19.73/12.36	21.16/14.97	22.58/15.81	51.64/34.35	67.92/48.98

“Normalisation” denotes the separate part of the visual encoder that performs 2D and 3D normalisation, and “Shared Enc.” denotes the encoder part shared by 2D and 3D scans. In each cell, the value before ‘/’ is the result on the subset of the top 200 classes, and the value after ‘/’ is the result on the subset of 200 random classes.
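
For readers who want to process the table programmatically, the ‘/’ convention can be unpacked with a short script. The following is a minimal sketch, assuming the rows are available as tab-separated text with the seven metric columns last. The names `Cell`, `parse_cell`, `parse_row` and `METRICS` are hypothetical helpers introduced for illustration and are not part of the original work.

```python
from typing import Dict, NamedTuple


class Cell(NamedTuple):
    """One table cell, split according to the '/' convention (hypothetical helper)."""
    top200: float     # result on the subset with the top 200 classes
    random200: float  # result on the subset with 200 random classes


# Metric columns in the order they appear in Table 1.
METRICS = ["AUC", "AP", "F1", "MCC", "R@0.01", "R@0.05", "R@0.1"]


def parse_cell(text: str) -> Cell:
    """Split a 'top/random' cell into its two numeric values."""
    top, rand = text.strip().split("/")
    return Cell(float(top), float(rand))


def parse_row(line: str) -> Dict[str, Cell]:
    """Parse one tab-separated data row; the last seven fields are the metrics."""
    fields = line.rstrip("\n").split("\t")
    cells = fields[-len(METRICS):]
    return {name: parse_cell(cell) for name, cell in zip(METRICS, cells)}


# Last row of Table 1 (ResNet-50 normalisation, 12-layer ViT shared encoder).
row = (
    "\tResNet-50\t12-layer ViT\t89.56/84.95\t11.73/7.72\t19.73/12.36"
    "\t21.16/14.97\t22.58/15.81\t51.64/34.35\t67.92/48.98"
)
parsed = parse_row(row)

# Difference between the top-200 and random-200 subsets for AUC.
auc_gap = parsed["AUC"].top200 - parsed["AUC"].random200
print(f"AUC gap (top 200 vs. 200 random): {auc_gap:.2f}")
```

Keeping the two subsets as named fields rather than a bare tuple makes it harder to confuse the top-200 and random-200 values when comparing rows.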
