Fig. 1

Self-attention vs. sparse attention patterns (Dark blue squares represent tokens attending to themselves, light blue squares indicate attention computations between the corresponding dark blue square token and other tokens).

Self-attention vs. sparse attention patterns (Dark blue squares represent tokens attending to themselves, light blue squares indicate attention computations between the corresponding dark blue square token and other tokens).