
Fig. 1: Model architecture and multitask learning.

From: Modeling attention and binding in the brain through bidirectional recurrent gating


a Bidirectional recurrent gating is the core block of our model. The network comprises two main pathways: the bottom-up feature pathway, which hierarchically extracts learned feature representations (shown in red); and the top-down attention pathway, which combines top-down information, the task signal, and feature maps to generate attention maps (shown in blue). The attention maps in turn multiplicatively modulate the feature maps of the next iteration. The two pathways meet at the bottleneck, which incorporates the dense recurrent and linear layers and outputs predicted labels (logits). In addition to the bottleneck, the two pathways communicate feature maps and attention maps through lateral connections. The box shows the signal flow of the bidirectional recurrent gating mechanism. Subscripts indicate the iteration and superscripts denote the layer. The feature map \({\bf{X}}_{t}^{\ell}\) is multiplicatively modulated by the affine-scaled attention map \({\bf{Z}}_{t-1}^{\ell}\) from the previous iteration before going through a convolutional layer. The output of layer \(\ell\), \({\bf{X}}_{t}^{\ell+1}\), is then passed to the corresponding layer in the attention path and concatenated with the attention map \({\bf{Z}}_{t}^{\ell+1}\). The concatenated signals are modulated by the task embedding, when applicable (e.g., in multitask settings). The attention block then creates the attention map \({\bf{Z}}_{t}^{\ell}\) for the next iteration.

b Our architecture enables effective multitask learning on both simple (i.e., digits from the MNIST dataset) and complex (i.e., animals from the COCO dataset) stimuli. Here we show the classification and attention accuracy for the two models trained on COCO and MNIST compositions. The red dotted line marks the chance level for classification accuracy (10%); the red dashed line marks the chance level for attention accuracy (50%). For some tasks, we use partially supervised training, meaning that we provide only one supervision signal during training: either target attention maps (i.e., segmentation) or target labels (i.e., classification).

c Pre-attentive and attentive features. The model processes the input images over multiple iterations. In the first iteration, it receives flat, task- and input-agnostic attention maps, resulting in feature representations and class predictions that are considered “pre-attentive”. In subsequent iterations, the model incorporates context and task information into the attention maps, producing “attentive” features and predictions.
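For readers who want to see the gating mechanism of panel a in code, the following is a minimal PyTorch sketch of one layer's bottom-up and top-down steps. The class and parameter names (GatingBlock, task_dim, and so on), the choice of a sigmoid on the attention output, and the channel bookkeeping are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of bidirectional recurrent gating for a single layer l.
# Names and design details (GatingBlock, task_dim, sigmoid output) are assumptions.
import torch
import torch.nn as nn

class GatingBlock(nn.Module):
    """One bottom-up feature layer and its top-down attention counterpart."""
    def __init__(self, in_ch, out_ch, task_dim=0):
        super().__init__()
        # Bottom-up convolution applied to the gated feature map X_t^l
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        # Affine scaling of the attention map Z_{t-1}^l before multiplicative gating
        self.gamma = nn.Parameter(torch.ones(1, in_ch, 1, 1))
        self.beta = nn.Parameter(torch.zeros(1, in_ch, 1, 1))
        # Top-down attention block: mixes the lateral feature map X_t^{l+1}
        # with the higher-level attention map Z_t^{l+1} (channel concatenation)
        self.attn = nn.Conv2d(out_ch + out_ch, in_ch, kernel_size=3, padding=1)
        # Optional task-embedding modulation (multitask settings)
        self.task_proj = nn.Linear(task_dim, out_ch + out_ch) if task_dim else None

    def bottom_up(self, x_t, z_prev):
        # X_t^l is multiplicatively modulated by the affine-scaled Z_{t-1}^l,
        # then passed through the convolution to produce X_t^{l+1}.
        gated = x_t * (self.gamma * z_prev + self.beta)
        return self.conv(gated)

    def top_down(self, x_next, z_next, task_emb=None):
        # Concatenate the lateral feature map with the attention map from the
        # layer above, optionally modulate with the task embedding, and
        # produce Z_t^l for gating the same layer at the next iteration.
        h = torch.cat([x_next, z_next], dim=1)
        if self.task_proj is not None and task_emb is not None:
            h = h * self.task_proj(task_emb)[:, :, None, None]
        return torch.sigmoid(self.attn(h))
```

At the first iteration the attention maps would be flat (e.g., all ones), reproducing the “pre-attentive” pass described in panel c; later iterations feed the top-down output back as z_prev.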
