Fig. 3 | Scientific Reports

Fig. 3

From: FDSNet: dynamic multimodal fusion stage selection for autonomous driving via feature disagreement scoring

Bird's-eye-view (BEV) feature generation from multi-view camera inputs. Six surround-view images are first processed by a ResNet+FPN backbone to extract 2D image features. A depth network then lifts each pixel into a 3D frustum using the camera intrinsics and extrinsics. The lifted points are aggregated into a voxel grid and pooled along the vertical axis to produce the final BEV representation.
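The lift, aggregate, and pool steps in the caption can be illustrated with a minimal NumPy sketch. It rests on assumptions the caption does not state: the depth network is taken to output a discrete per-pixel probability over depth bins, aggregation and vertical pooling are both plain sums, and the intrinsics are already scaled to the feature-map resolution. The backbone is omitted, so feature maps are passed in directly. All function and parameter names (`lift_to_frustum`, `splat_to_bev`, `cameras_to_bev`, `voxel_size`, and so on) are hypothetical and not taken from the paper.

```python
import numpy as np

def lift_to_frustum(feats, depth_probs, depth_bins, K, cam_to_ego):
    """Lift one camera's features into 3D (assumed discrete-depth formulation).

    feats: (C, H, W) image features; depth_probs: (D, H, W) per-pixel depth
    distribution; depth_bins: (D,) depths in metres; K: (3, 3) intrinsics at
    feature-map resolution; cam_to_ego: (4, 4) extrinsics.
    Returns points (D*H*W, 3) in the ego frame and features (D*H*W, C).
    """
    C, H, W = feats.shape
    us, vs = np.meshgrid(np.arange(W, dtype=float), np.arange(H, dtype=float))
    pix = np.stack([us, vs, np.ones_like(us)]).reshape(3, -1)   # (3, H*W)
    rays = np.linalg.inv(K) @ pix                               # unit-depth rays
    pts_cam = depth_bins[:, None, None] * rays[None]            # (D, 3, H*W)
    pts_cam = pts_cam.transpose(0, 2, 1).reshape(-1, 3)         # (D*H*W, 3)
    R, t = cam_to_ego[:3, :3], cam_to_ego[:3, 3]
    pts_ego = pts_cam @ R.T + t
    # Weight each pixel's feature by its probability at every depth bin.
    weighted = depth_probs[:, None] * feats[None]               # (D, C, H, W)
    weighted = weighted.transpose(0, 2, 3, 1).reshape(-1, C)    # (D*H*W, C)
    return pts_ego, weighted

def splat_to_bev(points, feats, x_range, y_range, z_range, voxel_size):
    """Sum point features into a voxel grid, then sum over height."""
    lo = np.array([x_range[0], y_range[0], z_range[0]], dtype=float)
    hi = np.array([x_range[1], y_range[1], z_range[1]], dtype=float)
    dims = np.round((hi - lo) / voxel_size).astype(int)         # (nx, ny, nz)
    idx = np.floor((points - lo) / voxel_size).astype(int)
    keep = np.all((idx >= 0) & (idx < dims), axis=1)            # drop out-of-grid points
    idx, feats = idx[keep], feats[keep]
    voxels = np.zeros((*dims, feats.shape[1]))
    np.add.at(voxels, (idx[:, 0], idx[:, 1], idx[:, 2]), feats)
    bev = voxels.sum(axis=2)                                    # vertical pooling (sum, assumed)
    return bev.transpose(2, 0, 1)                               # (C, nx, ny)

def cameras_to_bev(cams, depth_bins, x_range, y_range, z_range, voxel_size):
    """cams: iterable of (feats, depth_probs, K, cam_to_ego), one per view."""
    pts, fts = zip(*(lift_to_frustum(f, p, depth_bins, K, T) for f, p, K, T in cams))
    return splat_to_bev(np.concatenate(pts), np.concatenate(fts),
                        x_range, y_range, z_range, voxel_size)
```

Calling `cameras_to_bev` with six `(feats, depth_probs, K, cam_to_ego)` tuples yields a single `(C, nx, ny)` BEV map. Sum pooling is used here only because it keeps the sketch short; mean or max pooling along the height axis would slot into the same place.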
