Fig. 1: Catalyst screening workflow and overview of the ACE-GCN algorithm.
From: Adsorbate chemical environment-based machine learning framework for heterogeneous catalysis

a Screening workflow for identifying stable surface adsorbate configurations. The workflow uses an incremental training approach to predict thermodynamically stable catalytic configurations. Each cycle comprises the following steps: (1) Systematic enumeration: all possible unique high-coverage surface adsorbate representations are generated with the SurfGraph algorithm; (2) Model training: the ACE-GCN model is (re)trained on selected structures, using the relevant surface representations identified in the previous steps; (3) Accelerated screening: the unrelaxed surface configurations generated in step 1 are ranked by the ACE-GCN model, which has been trained on a smaller subset of relevant DFT-relaxed cases; and (4) Electronic structure optimization: selected unrelaxed configurations ranked by ACE-GCN are optimized with an electronic structure code of choice and then used either for subsequent analysis or to retrain and improve the ACE-GCN model.
b ACE-GCN algorithm for encoding and training on high-coverage adsorbate configurations. (1) Generate subgraphs: each configuration is split into multiple subgraphs, as identified by the SurfGraph algorithm; a distinct ego-graph is generated for each adsorbate to encode the local geometric and chemical environment around it. (2) Subgraph featurization: each atom and each bond in the subgraph is expressed as a vector based on its chemical identity (elemental properties) and bond distance, termed node and edge features, respectively. (3) Subgraph convolutions: every node vector in the subgraph is iteratively updated through multiple rounds of graph convolution, which account for the atom's geometric and chemical neighborhood using the node and edge vectors of neighboring atoms. (4) Fingerprints: a hierarchical pooling operation condenses all subgraphs of each adsorbate configuration into a single fingerprint vector. (5) NN layer: the fingerprint vector is passed to a feed-forward neural network (NN), which maps it to the target property of choice, such as the average adsorption energy.
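The encoding steps in panel b can be illustrated with a minimal, plain-Python sketch. This is not the ACE-GCN implementation: the one-hot element node features, the exp(-d) edge weighting, the fixed (untrained) tanh update, mean pooling at both levels of the hierarchy, and the single linear output layer are all simplifying assumptions, and the names `build_graph`, `ego_graph`, `convolve` and `predict` are hypothetical. An ego-graph is taken here as all atoms within `radius` bonds of an adsorbate binding atom.

```python
import math
from collections import deque

# Hypothetical element vocabulary; one-hot identity stands in for elemental properties.
ELEMENTS = ["Pt", "C", "O", "H"]

def build_graph(symbols, edges):
    """Adjacency sets plus a bond-distance lookup keyed by unordered atom pair."""
    adj = {n: set() for n in symbols}
    dist = {}
    for i, j, d in edges:
        adj[i].add(j)
        adj[j].add(i)
        dist[frozenset((i, j))] = d
    return adj, dist

def ego_graph(adj, center, radius):
    """(1) Atoms within `radius` hops of `center`, found by breadth-first search."""
    depth = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if depth[node] == radius:
            continue  # do not expand beyond the ego-graph radius
        for nbr in adj[node]:
            if nbr not in depth:
                depth[nbr] = depth[node] + 1
                queue.append(nbr)
    return set(depth)

def one_hot(symbol):
    """(2) Node feature: one-hot element identity (assumed, not the paper's features)."""
    return [1.0 if symbol == e else 0.0 for e in ELEMENTS]

def convolve(nodes, feats, adj, dist, rounds=2):
    """(3) Simplified message passing with fixed weights: neighbours weighted by exp(-d)."""
    h = {n: feats[n][:] for n in nodes}
    for _ in range(rounds):
        new_h = {}
        for i in nodes:
            msg = [0.0] * len(h[i])
            for j in adj[i]:
                if j in nodes:  # messages stay inside the subgraph
                    w = math.exp(-dist[frozenset((i, j))])  # edge feature -> weight
                    msg = [m + w * x for m, x in zip(msg, h[j])]
            new_h[i] = [math.tanh(a + b) for a, b in zip(h[i], msg)]
        h = new_h  # synchronous update of all nodes
    return h

def mean(vectors):
    vectors = list(vectors)
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def predict(symbols, edges, adsorbate_centers, weights, bias, radius=2):
    adj, dist = build_graph(symbols, edges)
    feats = {n: one_hot(s) for n, s in symbols.items()}
    subgraph_vecs = []
    for c in adsorbate_centers:
        nodes = ego_graph(adj, c, radius)                # one ego-graph per adsorbate
        h = convolve(nodes, feats, adj, dist)
        subgraph_vecs.append(mean(h[n] for n in nodes))  # pool atoms within a subgraph
    fingerprint = mean(subgraph_vecs)                    # (4) pool across subgraphs
    # (5) a single linear layer stands in for the feed-forward NN
    energy = sum(w * x for w, x in zip(weights, fingerprint)) + bias
    return energy, fingerprint
```

For example, a toy Pt row carrying CO (C is atom 3) and OH (O is atom 5) would call `predict(symbols, edges, [3, 5], weights, bias)`; with untrained weights the returned energy is meaningless, and only the data flow from ego-graphs to a single fingerprint is illustrated.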
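The cyclic workflow in panel a behaves like an active-learning loop. The sketch below is a hedged illustration under stated assumptions: `dft_energy` is a hypothetical callable standing in for the electronic structure relaxation of step (4), `train` is a hypothetical callable returning a surrogate model for steps (2) and (3), lower predicted energy is assumed to mean more stable, and `n_initial`, `batch` and `cycles` are illustrative parameters, not values from the paper.

```python
import random

def screening_loop(candidates, dft_energy, train, n_initial=5, batch=3, cycles=3, seed=0):
    """Hypothetical enumerate -> train -> rank -> relax loop.

    candidates: unrelaxed configurations from step (1), e.g. an enumeration.
    dft_energy: stand-in for step (4), returns the relaxed energy of a configuration.
    train: takes [(config, energy)] and returns model(config) -> predicted energy.
    """
    rng = random.Random(seed)
    labelled = {}
    # Small initial DFT-relaxed subset used to pre-train the surrogate.
    for c in rng.sample(candidates, n_initial):
        labelled[c] = dft_energy(c)
    for _ in range(cycles):
        model = train(list(labelled.items()))          # (2) (re)train
        pool = [c for c in candidates if c not in labelled]
        if not pool:
            break
        ranked = sorted(pool, key=model)               # (3) rank, lowest energy first
        for c in ranked[:batch]:                       # (4) relax selected configurations
            labelled[c] = dft_energy(c)                # and add them to the training data
    best = min(labelled, key=labelled.get)
    return best, labelled

def train_nearest_neighbour(data):
    """Toy surrogate: predicts the energy of the closest labelled integer configuration."""
    def model(c):
        return min(data, key=lambda pair: abs(pair[0] - c))[1]
    return model
```

Here configurations are plain integers and the surrogate is a nearest-neighbour lookup, purely to exercise the loop, e.g. `screening_loop(list(range(20)), lambda c: (c - 13) ** 2 / 10, train_nearest_neighbour)`. In the workflow described above, the candidates would come from SurfGraph enumeration and the surrogate would be ACE-GCN.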