Fig. 4: Graph-based modelling of protein–ligand interactions and GEMS model architecture. | Nature Machine Intelligence

From: Resolving data bias improves generalization in binding affinity prediction

a, Schematic overview of the graph construction process used to model protein–ligand complexes as a sparse, rotation- and translation-invariant graph representation enriched with language-model embeddings. The core of these graph representations consists of an atom-level molecular graph of the ligand molecule (magenta) combined with an amino-acid-level graph of the protein pocket (green). When the ligand and protein graphs are merged, additional edges are introduced that connect ligand graph nodes (atoms) to protein graph nodes (amino acids) based on spatial proximity between ligand atoms and the atoms of the amino acids (interaction distance of 5 Å), computed with a k-nearest-neighbours algorithm. The amino acid nodes are featurized with their type and with embeddings derived from the protein language models ESM2 (ref. 50) and Ankh (ref. 51). The ligand graph is featurized with atomic properties. The global features are initialized with a ligand embedding from the language model ChemBERTa-2 (ref. 52). b, GEMS model architecture for processing interaction graphs composed of node features, edge features and global context features. After an initial node feature dimensionality reduction (Dim. red.), node and edge features are transformed through an alternating sequence of node convolutions (GATConv) and edge convolutions (EdgeConv). The global graph features are updated dynamically throughout this process, incorporating pooled node representations after each node convolution. A final pK value prediction is made from the updated global features by a fully connected neural network.
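The proximity-based merging of ligand and protein graphs described in a can be sketched as follows. This is a minimal illustration, not the authors' implementation: the choice of k and the use of per-residue reference coordinates are assumptions made here for brevity; the paper computes distances between ligand atoms and the individual atoms of each amino acid, with a 5 Å interaction cutoff.

```python
import numpy as np

def knn_cross_edges(ligand_xyz, residue_xyz, k=3, cutoff=5.0):
    """Connect each ligand atom (row of ligand_xyz) to its k nearest
    pocket residues (rows of residue_xyz), keeping only pairs within
    the interaction cutoff (in angstroms)."""
    # Pairwise Euclidean distances, shape (n_ligand_atoms, n_residues)
    d = np.linalg.norm(
        ligand_xyz[:, None, :] - residue_xyz[None, :, :], axis=-1
    )
    edges = []
    for i, row in enumerate(d):
        for j in np.argsort(row)[:k]:   # k nearest residues of atom i
            if row[j] <= cutoff:        # spatial proximity filter
                edges.append((i, int(j)))
    return edges
```

The returned (atom index, residue index) pairs would become the additional inter-graph edges introduced during merging.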
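The alternating update scheme in b can be illustrated with a toy forward pass. This is a sketch only: the dense random projections below are hypothetical stand-ins for the trained GATConv and EdgeConv layers, and the dimensionalities are arbitrary; it shows the data flow (node update, global pooling after each node step, edge update, final prediction head), not the actual GEMS model.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def gems_sketch(x, edges, e, u, rng, n_blocks=2):
    """Toy forward pass: alternate node and edge updates, refresh the
    global feature vector u after each node step, then predict a
    scalar pK from the final global features."""
    d = x.shape[1]
    for _ in range(n_blocks):
        # Node convolution stand-in (GATConv in the figure):
        # mean-aggregate neighbour features, then transform.
        agg = np.zeros_like(x)
        deg = np.zeros(len(x))
        for i, j in edges:
            agg[i] += x[j]
            deg[i] += 1
        x = relu((x + agg / np.maximum(deg, 1)[:, None])
                 @ rng.standard_normal((d, d)))
        # Global update: fold pooled node representations into u.
        u = relu(np.concatenate([u, x.mean(axis=0)])
                 @ rng.standard_normal((2 * d, d)))
        # Edge convolution stand-in (EdgeConv): refresh each edge
        # feature from its endpoint node features.
        e = np.stack([relu(x[i] + x[j] + e[k])
                      for k, (i, j) in enumerate(edges)])
    # Fully connected head: scalar pK prediction from global features.
    return float(u @ rng.standard_normal(d))
```

Note how the global vector u accumulates pooled node information after every node convolution, matching the dynamic global-feature updates described in the caption.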
