Fig. 3: Visualization of average attention for one dataset.
From: Compositionally restricted attention-based network for materials property predictions

The average attention from each of the four attention heads (a–d) in the first layer of a CrabNet model trained on the aflow__Egap data is shown for systems containing Si. Each heatmap shows the average amount of attention that Si dedicates to the other elements in Si-containing compounds. The darker the coloring, the more strongly Si attends to that element. Each attention head exhibits its own behavior and attends to a different group of elements. Interestingly, head a attends to common n-type dopants and head c attends to many transition metals, whereas heads b and d have unfamiliar element groupings.
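
The per-head averaging described in this caption can be illustrated with a minimal sketch. It is not CrabNet's own code: the function name `average_attention_from`, the array layout (`attn[h, i, j]` is the attention that element `i` pays to element `j` in head `h`), and the toy numbers are all assumptions made for illustration, and the figure itself may use a different averaging scheme.

```python
from collections import defaultdict

import numpy as np


def average_attention_from(target, compounds, n_heads):
    """Average, per head, the attention `target` pays to each other element.

    `compounds` is an iterable of (elements, attn) pairs, where `elements` is a
    list of element symbols and `attn` has shape (n_heads, n, n). The layout
    attn[h, i, j] = attention from element i to element j is an assumption.
    """
    sums = defaultdict(lambda: np.zeros(n_heads))
    counts = defaultdict(int)
    for elements, attn in compounds:
        if target not in elements:
            continue  # only compounds containing the target element count
        i = elements.index(target)
        for j, other in enumerate(elements):
            if other == target:
                continue  # skip the target's attention to itself
            sums[other] += attn[:, i, j]
            counts[other] += 1
    # Mean over all target-containing compounds in which each element appears
    return {el: sums[el] / counts[el] for el in sums}


# Hypothetical toy data with two heads; each attention row sums to 1.
compounds = [
    (["Si", "O"], np.array([[[0.4, 0.6], [0.5, 0.5]],
                            [[0.9, 0.1], [0.5, 0.5]]])),
    (["Si", "O", "Fe"], np.array([[[0.2, 0.2, 0.6], [0.3, 0.3, 0.4], [0.1, 0.1, 0.8]],
                                  [[0.1, 0.3, 0.6], [0.3, 0.3, 0.4], [0.1, 0.1, 0.8]]])),
    (["Na", "Cl"], np.array([[[0.5, 0.5], [0.5, 0.5]],
                             [[0.5, 0.5], [0.5, 0.5]]])),
]

avg = average_attention_from("Si", compounds, n_heads=2)
```

Each value in `avg` is a length-`n_heads` vector, so one heatmap per head could be drawn by placing the entry for that head on the corresponding element of a periodic-table layout.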