Fig. 5: 3D processing in deep networks. | Nature Communications

From: Qualitative similarities and differences in visual object representations between brains and deep networks

a Perceptual representation of sensitivity to 3D shape27. Three equivalent image pairs are shown in perceptual space. The first pair (with distance marked d1) consists of two cuboids at different orientations and represents an easy visual search, i.e. the two objects are perceived as dissimilar. The second pair (marked with distance d2) contains the same feature difference as the first pair but represents a hard search, i.e. the objects are perceived as more similar. Likewise, the third pair (also marked with distance d2), again with the same feature difference as the first pair, is a hard search, i.e. its objects are perceived as similar. b 3D processing index across layers of the VGG-16 network, for condition 1 (blue) and condition 2 (red), and for VGG-16 with random weights (dark brown and light brown). Dashed lines represent the estimated human effect measured using visual search on the same stimuli. c Perceptual representation of occluded shapes. Top: a square alongside a disk is perceptually similar to a display with the same objects occluded, but dissimilar to a 2D control image with an equivalent feature difference. Bottom: a square occluding a disk and a disk occluding a square are perceptually similar, but dissimilar to an equivalent 2D control with the same set of feature differences. d Occlusion index for the occlusion (blue) and depth-ordering (red) effects for each layer of the VGG-16 network, and for the randomly initialized VGG-16 (dark brown and light brown). Dashed lines represent the effect size in humans measured using visual search on the same stimuli.
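The per-layer indices contrast how strongly a layer separates an easy pair (distance d1) versus a hard pair (distance d2) in its activation space. A minimal sketch of such a contrast index is below; the normalized-difference formula and the random stand-in activations are assumptions for illustration, not the paper's exact computation, which should be taken from the article's Methods.

```python
import numpy as np

def pair_distance(a, b):
    """Euclidean distance between two flattened activation vectors."""
    return float(np.linalg.norm(np.ravel(a) - np.ravel(b)))

def processing_index(d_easy, d_hard):
    """Normalized contrast between easy- and hard-pair distances.

    Positive values mean the layer, like human perception, separates
    the easy pair more than the hard pair; the value lies in [-1, 1].
    (Assumed form; the article's Methods define the actual index.)
    """
    return (d_easy - d_hard) / (d_easy + d_hard)

# Random vectors stand in for real VGG-16 layer activations of the
# four stimuli (easy pair and hard pair from the figure).
rng = np.random.default_rng(0)
acts = {name: rng.standard_normal(4096)
        for name in ("easy_a", "easy_b", "hard_a", "hard_b")}

d1 = pair_distance(acts["easy_a"], acts["easy_b"])  # easy search pair
d2 = pair_distance(acts["hard_a"], acts["hard_b"])  # hard search pair
print(round(processing_index(d1, d2), 3))
```

Computing this index at every layer of a trained network and of a randomly initialized one, then plotting both against the human effect size, reproduces the structure of panels b and d.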
