Fig. 4

From: Temporal convolutional transformer for EEG based motor imagery decoding

Comparison of attention mechanisms. (Left) Multi-head attention (MHA): each query head has its own key- and value-projection matrices. (Middle) Multi-query attention (MQA): all query heads share a single key/value head, minimizing memory cost. (Right) Grouped-query attention (GQA): query heads are divided into groups, with each group sharing one key/value head, providing a trade-off between the expressiveness of MHA and the efficiency of MQA. GQA reduces to MQA when \(G=1\) and becomes equivalent to MHA when \(G=H\), where \(H\) is the number of query heads.
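To make the grouping concrete, the sketch below is a minimal PyTorch implementation of grouped-query attention under the assumptions stated in the caption: `num_heads` query heads share keys and values across `num_groups` groups, so `num_groups = 1` behaves like MQA and `num_groups = num_heads` like standard MHA. The class and argument names are illustrative and do not come from the paper's code.

```python
import torch
import torch.nn as nn


class GroupedQueryAttention(nn.Module):
    """Grouped-query attention (illustrative sketch).

    H query heads are split into G groups; each group shares one key/value
    head. G = 1 recovers MQA, G = H recovers standard MHA.
    """

    def __init__(self, d_model: int, num_heads: int, num_groups: int):
        super().__init__()
        assert num_heads % num_groups == 0 and d_model % num_heads == 0
        self.h, self.g = num_heads, num_groups
        self.d_head = d_model // num_heads
        self.q_proj = nn.Linear(d_model, num_heads * self.d_head)
        # Keys/values are projected once per group rather than once per head.
        self.k_proj = nn.Linear(d_model, num_groups * self.d_head)
        self.v_proj = nn.Linear(d_model, num_groups * self.d_head)
        self.out_proj = nn.Linear(num_heads * self.d_head, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.h, self.d_head).transpose(1, 2)  # (b, H, t, d)
        k = self.k_proj(x).view(b, t, self.g, self.d_head).transpose(1, 2)  # (b, G, t, d)
        v = self.v_proj(x).view(b, t, self.g, self.d_head).transpose(1, 2)  # (b, G, t, d)
        # Broadcast each group's key/value head to the query heads it serves.
        heads_per_group = self.h // self.g
        k = k.repeat_interleave(heads_per_group, dim=1)                     # (b, H, t, d)
        v = v.repeat_interleave(heads_per_group, dim=1)                     # (b, H, t, d)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.d_head ** 0.5, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, t, self.h * self.d_head)
        return self.out_proj(out)


# Example usage: 8 query heads in 4 groups, i.e. 2 query heads per key/value head.
gqa = GroupedQueryAttention(d_model=64, num_heads=8, num_groups=4)
y = gqa(torch.randn(2, 100, 64))  # -> (2, 100, 64)
```

Note that only the key/value projections shrink with the number of groups; the number of query heads, and hence the attention patterns per head, is unchanged, which is what gives GQA its memory saving relative to MHA.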
