Fig. 1: Overall schematics and architecture of AdsMT.

a Schematic overview of this study. We present three datasets containing diverse combinations of catalysts and adsorbates for predicting the global minimum adsorption energy (GMAE). The upper-right plot illustrates the difference between global minima (GM) and local minima (LM). AdsMT is a multi-modal model that processes separate surface and adsorbate inputs to predict the GMAE. b The architecture of AdsMT, which consists of three blocks: a graph encoder that encodes the catalyst surface, a vector encoder that encodes the adsorbate, and a cross-modal encoder that predicts the GMAE from the surface and adsorbate embeddings. c Illustration of the cross-attention and self-attention layers in the cross-modal encoder. In the first cross-attention layer, the concatenated adsorbate vector embeddings and surface graph embeddings form the query matrix (Q), while the concatenated atomic embeddings and depth embeddings serve as the key (K) and value (V) matrices. Each atomic depth vector encodes the relative position of an atom within the surface (e.g., top layer or bottom layer). In the self-attention layer, the stacked atom embeddings, surface graph embeddings, and adsorbate vector embeddings serve as the input Q, K, and V matrices.
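
The token layout of the cross-modal encoder described in panel c can be illustrated with a minimal sketch. This is not the authors' implementation: the embedding dimension, number of attention heads, the summation of atom and depth embeddings, the mean pooling, the readout head, and the class/argument names are all assumptions for illustration, and standard `torch.nn.MultiheadAttention` layers stand in for the layers shown in the figure.

```python
# Minimal sketch (assumed shapes and hyperparameters) of the cross-modal
# encoder layout in Fig. 1c: cross-attention with adsorbate/surface queries
# over per-atom keys/values, followed by self-attention over all tokens.
import torch
import torch.nn as nn


class CrossModalEncoderSketch(nn.Module):
    def __init__(self, dim: int = 128, num_heads: int = 4):
        super().__init__()
        # Cross-attention: Q from adsorbate + surface embeddings,
        # K and V from atomic embeddings combined with depth embeddings.
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Self-attention over the stacked atom, surface, and adsorbate tokens.
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Hypothetical readout head mapping pooled tokens to a scalar GMAE.
        self.head = nn.Linear(dim, 1)

    def forward(self, atom_emb, depth_emb, surf_emb, ads_emb):
        # atom_emb, depth_emb: (B, N_atoms, dim); surf_emb, ads_emb: (B, dim)
        # Combine atomic and depth embeddings (assumed here to be summed).
        kv = atom_emb + depth_emb                            # (B, N, dim)
        # Stack adsorbate and surface embeddings as two query tokens.
        q = torch.stack([ads_emb, surf_emb], dim=1)          # (B, 2, dim)
        q, _ = self.cross_attn(q, kv, kv)                    # (B, 2, dim)
        # Self-attention where Q = K = V = the stacked token sequence of
        # atom tokens plus the cross-attended adsorbate/surface tokens.
        tokens = torch.cat([atom_emb, q], dim=1)             # (B, N+2, dim)
        tokens, _ = self.self_attn(tokens, tokens, tokens)
        # Pool and predict one GMAE value per surface-adsorbate pair.
        return self.head(tokens.mean(dim=1)).squeeze(-1)     # (B,)


if __name__ == "__main__":
    B, N, D = 2, 30, 128
    model = CrossModalEncoderSketch(dim=D)
    gmae = model(torch.randn(B, N, D), torch.randn(B, N, D),
                 torch.randn(B, D), torch.randn(B, D))
    print(gmae.shape)  # torch.Size([2])
```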