Fig. 11: Architecture backbone, building blocks, and elements.
From: Modeling attention and binding in the brain through bidirectional recurrent gating

The terminology used here follows PyTorch layer conventions. Sequential operations are denoted in order; for example, [Conv2d, LayerNorm, GELU] represents the operation Y = GELU(LayerNorm(Conv2d(X))). Asterisks (*) indicate optional layers.

Layer definitions: Conv2d: 2D convolution; ConvT2d: transposed 2D convolution (equivalent to ConvTranspose2d in PyTorch); LayerNorm: layer normalization; GELU: Gaussian Error Linear Unit; ReLU: Rectified Linear Unit; Tanh: hyperbolic tangent; MaxPool2d: 2D max pooling; AdaptiveAvgPool2d: adaptive average pooling; Concat: channel-wise tensor concatenation; UpSample: 2D upsampling; Flatten / Unflatten: reshape between 4D (batch, channel, height, width) and 2D (batch, channel × height × width) tensor shapes; (RNN, GELU): vanilla recurrent layer with GELU activation; Linear: fully connected layer; Embedding: embedding layer.

a The modular backbone of our architecture. The number of BRG blocks is chosen based on input size and task complexity.
b A BRG block variant with an optional MaxPool2d layer, suited for the first BRG block in the network.
c A lightweight BRG block using strided convolutions, preferred for simpler stimuli.
d A deeper BRG block, also using strided convolutions, used for more complex stimuli such as natural images.
e The bottleneck module (optionally with an RNN), where all signals converge and interact. A logic-switch denotes conditional feedback from the output logits to the attention pathway when no external prompt is provided.
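
To make the bracket notation concrete, the sketch below instantiates the [Conv2d, LayerNorm, GELU] sequence in PyTorch. The channel counts, kernel size, stride, and the channels-last placement of LayerNorm are illustrative assumptions for this sketch, not values taken from the architecture in the figure.

```python
import torch
import torch.nn as nn


class ConvLNGELU(nn.Module):
    """Minimal sketch of a [Conv2d, LayerNorm, GELU] sequence.

    Hyperparameters (channels, kernel size, stride) are placeholders.
    LayerNorm is applied over the channel dimension in channels-last
    layout, one common way to layer-normalize 4D conv feature maps.
    """

    def __init__(self, in_ch: int, out_ch: int, kernel_size: int = 3, stride: int = 1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size,
                              stride=stride, padding=kernel_size // 2)
        self.norm = nn.LayerNorm(out_ch)
        self.act = nn.GELU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Y = GELU(LayerNorm(Conv2d(X))), as defined in the caption.
        y = self.conv(x)
        y = y.permute(0, 2, 3, 1)   # (B, C, H, W) -> (B, H, W, C)
        y = self.norm(y)
        y = y.permute(0, 3, 1, 2)   # back to (B, C, H, W)
        return self.act(y)


if __name__ == "__main__":
    block = ConvLNGELU(3, 16, stride=2)  # strided variant, cf. panels c/d
    out = block(torch.randn(1, 3, 64, 64))
    print(out.shape)  # torch.Size([1, 16, 32, 32])
```

A strided convolution (as in panels c and d) or a MaxPool2d layer (as in panel b) can serve the same downsampling role; which one a given BRG block uses is specified by the panel descriptions above.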