Extended Data Fig. 1: Overview of the network architecture of DAGr.

(a) General architecture overview showing the CNN-based ResNet-18 (ref. 30) branch and the GNN branch. Each sensor modality is processed separately, while sharing features between branches and adding objectness, classification and regression scores at the output. (b) Directed feature sampling layer. Graph nodes sample features at the corresponding pixel locations and concatenate them with their own features. (c) Residual blocks, with arguments n and m denoting the input and output channel dimensions. The '+2' denotes concatenation with the 2D node position. (d) Max pooling layer with arguments gx, gy and gt denoting the number of grid cells in each dimension. (e) Multiscale YOLOX-inspired detection head, outputting bounding boxes (regression), class scores and object confidence. (f) Look-up-Table Spline Convolution (LUT-SC), which uses the discrete-valued relative distance between neighboring nodes to look up weight matrices.
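
The directed feature sampling of panel (b) can be illustrated with a minimal sketch, not the authors' implementation: it assumes a PyTorch feature map `cnn_feat` of shape (C_img, H, W), node features `node_feat` of shape (N, C_node) and integer node pixel coordinates `node_xy` of shape (N, 2), already scaled to the feature-map resolution; all names are hypothetical.

```python
import torch

def directed_feature_sampling(cnn_feat: torch.Tensor,
                              node_feat: torch.Tensor,
                              node_xy: torch.Tensor) -> torch.Tensor:
    """Sample CNN features at each graph node's pixel location and
    concatenate them with the node's own feature vector (sketch)."""
    x, y = node_xy[:, 0].long(), node_xy[:, 1].long()
    sampled = cnn_feat[:, y, x].t()            # (N, C_img): image features at node pixels
    return torch.cat([node_feat, sampled], 1)  # (N, C_node + C_img)
```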
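
Similarly, the LUT-SC of panel (f) can be sketched under simplifying assumptions (again not the released code): each edge's relative offset is assumed to be discretized to an index into a table of K*K weight matrices, and messages are summed over incoming edges; tensor shapes and names are illustrative only.

```python
import torch

def lut_spline_conv(node_feat, edge_index, rel_offset_idx, weight):
    """node_feat: (N, C_in); edge_index: (2, E) as (source, target) rows;
    rel_offset_idx: (E,) discrete LUT index per edge; weight: (K*K, C_in, C_out)."""
    src, dst = edge_index
    W = weight[rel_offset_idx]                       # (E, C_in, C_out): per-edge weight from the LUT
    msg = torch.bmm(node_feat[src].unsqueeze(1), W)  # (E, 1, C_out): message along each edge
    out = torch.zeros(node_feat.size(0), weight.size(-1),
                      dtype=node_feat.dtype, device=node_feat.device)
    out.index_add_(0, dst, msg.squeeze(1))           # sum messages at each target node
    return out
```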