Fig. 11: Architecture backbone, building blocks, and elements.

From: Modeling attention and binding in the brain through bidirectional recurrent gating

The terminology used here follows PyTorch layer conventions. Sequential operations are denoted in order; for example, [Conv2d, LayerNorm, GELU] represents the operation Y = GELU(LayerNorm(Conv2d(X))). Asterisks (*) indicate optional layers.

Layer definitions: Conv2d: 2D convolution; ConvT2d: transposed 2D convolution (equivalent to ConvTranspose2d in PyTorch); LayerNorm: layer normalization; GELU: Gaussian Error Linear Unit; ReLU: Rectified Linear Unit; Tanh: hyperbolic tangent; MaxPool2d: 2D max pooling; AdaptiveAvgPool2d: adaptive average pooling; Concat: channel-wise tensor concatenation; UpSample: 2D upsampling; Flatten / Unflatten: reshape between 4D (batch, channel, height, width) and 2D (batch, channel × height × width) tensor shapes; (RNN, GELU): vanilla recurrent layer with GELU activation; Linear: fully connected layer; Embedding: embedding layer.

a The modular backbone of our architecture; the number of BRG blocks is chosen based on input size and task complexity. b A BRG block variant with an optional MaxPool2d layer, suited for the first BRG block in the network. c A lightweight BRG block using strided convolutions, preferred for simpler stimuli. d A deeper BRG block, also using strided convolutions, used for more complex stimuli such as natural images. e The bottleneck module (optionally with an RNN), where all signals converge and interact. A logic switch denotes conditional feedback from the output logits to the attention pathway when no external prompt is provided.
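As a minimal sketch of the bracketed notation only, the example below implements [Conv2d, LayerNorm, GELU], i.e. Y = GELU(LayerNorm(Conv2d(X))), in PyTorch. The module name, channel counts, and kernel size are illustrative assumptions and are not taken from the figure or the paper.

```python
import torch
import torch.nn as nn


class ConvLayerNormGELU(nn.Module):
    """Illustrative sketch of the caption's [Conv2d, LayerNorm, GELU] notation.

    Channel counts and kernel size are assumptions chosen for the example.
    """

    def __init__(self, in_channels=3, out_channels=32, kernel_size=3):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size,
                              padding=kernel_size // 2)
        self.norm = nn.LayerNorm(out_channels)  # normalizes over the channel dimension
        self.act = nn.GELU()

    def forward(self, x):
        x = self.conv(x)            # (batch, channel, height, width)
        x = x.permute(0, 2, 3, 1)   # move channels last so LayerNorm sees them
        x = self.norm(x)
        x = x.permute(0, 3, 1, 2)   # restore (batch, channel, height, width)
        return self.act(x)


# Example: a batch of four 64x64 RGB inputs
y = ConvLayerNormGELU()(torch.randn(4, 3, 64, 64))
print(y.shape)  # torch.Size([4, 32, 64, 64])
```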
