Fig. 1: The overall architecture of PGMG.
From: A pharmacophore-guided deep learning approach for bioactive molecular generation

a Construction of the pharmacophore graphs. We use the shortest paths on the molecular graph to determine the distances between pairs of pharmacophore features, based on which a fully connected graph is constructed to represent a pharmacophore hypothesis. Different colours represent different types of pharmacophore features. b Preprocessing of SMILES. We randomise a given canonical SMILES and corrupt it using the infilling scheme. c Pipelines for model training and inference. \(c\) represents the embedding vector sequence for the given pharmacophore hypothesis; \(x\) represents the embedding sequence of the input SMILES; and \(z\) represents the latent variables for a molecule. During inference, \(z\) is drawn from a predefined normal distribution \(N(0,I)\), while during training it is sampled from a learned distribution \(N(\mu,\Sigma)\). The transformer encoder and decoder are each a stack of \(N\) layers. \(\oplus\) denotes the concatenation of two vectors and \(\otimes\) denotes matrix multiplication. The overlap between the training and inference processes is highlighted in the right panel. GatedGCN stands for Gated Graph Convolutional Network, and MLP stands for Multi-Layer Perceptron.
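
For concreteness, the following Python sketch shows one way to build the fully connected pharmacophore graph of panel a, with edge weights given by shortest-path (topological) distances on the molecular graph. It is a minimal sketch assuming RDKit's generic feature definitions (BaseFeatures.fdef); PGMG's own feature typing and any discretisation of the distances are not specified in this caption.

```python
import os
from itertools import combinations

import networkx as nx
from rdkit import Chem, RDConfig
from rdkit.Chem import ChemicalFeatures


def pharmacophore_graph(smiles: str) -> nx.Graph:
    """Fully connected graph over perceived pharmacophore features,
    with edges weighted by topological shortest-path distances."""
    mol = Chem.MolFromSmiles(smiles)
    # Generic feature perception (Donor, Acceptor, Aromatic, ...);
    # PGMG's exact feature definitions may differ.
    factory = ChemicalFeatures.BuildFeatureFactory(
        os.path.join(RDConfig.RDDataDir, "BaseFeatures.fdef"))
    feats = factory.GetFeaturesForMol(mol)
    dmat = Chem.GetDistanceMatrix(mol)  # pairwise shortest-path lengths

    g = nx.Graph()
    for i, f in enumerate(feats):
        g.add_node(i, family=f.GetFamily(), atoms=f.GetAtomIds())
    # Fully connect the features; take the feature-feature distance as the
    # minimum shortest path between their underlying atom sets.
    for i, j in combinations(range(len(feats)), 2):
        dist = min(dmat[a][b]
                   for a in feats[i].GetAtomIds()
                   for b in feats[j].GetAtomIds())
        g.add_edge(i, j, distance=dist)
    return g
```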
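
Panel b's preprocessing can be sketched similarly. The randomisation uses RDKit's doRandom SMILES writer; the infilling corruption below, which replaces a contiguous token span with a single mask token (in the style of BART's text infilling), is an illustrative assumption, as PGMG's tokeniser, mask rate, and span sampling are not given here.

```python
import random
import re

from rdkit import Chem

# Crude regex tokeniser: bracketed atoms, two-letter halogens, then single
# characters. Real SMILES tokenisers are more careful than this.
TOKEN_RE = re.compile(r"\[[^\]]+\]|Br|Cl|.")


def randomise_smiles(canonical: str) -> str:
    """Return a randomised, non-canonical SMILES for the same molecule."""
    mol = Chem.MolFromSmiles(canonical)
    return Chem.MolToSmiles(mol, canonical=False, doRandom=True)


def corrupt_infilling(smiles: str, mask_frac: float = 0.3,
                      mask_token: str = "[MASK]") -> str:
    """Replace one contiguous token span with a single mask token."""
    tokens = TOKEN_RE.findall(smiles)
    n_mask = max(1, int(mask_frac * len(tokens)))
    start = random.randrange(len(tokens) - n_mask + 1)
    return "".join(tokens[:start] + [mask_token] + tokens[start + n_mask:])


# Example: corrupt_infilling(randomise_smiles("c1ccccc1O"))
# might yield something like "Oc1cc[MASK]1".
```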
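
Finally, the two sampling regimes for \(z\) in panel c correspond to the standard conditional-VAE reparameterisation, assuming a diagonal covariance as is usual. The sketch below draws \(z = \mu + \Sigma^{1/2}\epsilon\) with \(\epsilon \sim N(0,I)\) during training and \(z \sim N(0,I)\) directly during inference; the LatentSampler module and its layer sizes are hypothetical placeholders, not PGMG's published configuration.

```python
import torch
import torch.nn as nn


class LatentSampler(nn.Module):
    """Hypothetical module illustrating the two sampling regimes for z."""

    def __init__(self, d_model: int = 256, d_latent: int = 64):
        super().__init__()
        self.to_mu = nn.Linear(d_model, d_latent)      # predicts mu
        self.to_logvar = nn.Linear(d_model, d_latent)  # predicts log(sigma^2)

    def forward(self, h: torch.Tensor, training: bool = True) -> torch.Tensor:
        if training:
            # Reparameterisation trick: z ~ N(mu, Sigma), differentiable in h.
            mu, logvar = self.to_mu(h), self.to_logvar(h)
            eps = torch.randn_like(mu)
            return mu + torch.exp(0.5 * logvar) * eps
        # Inference: draw z from the predefined prior N(0, I).
        return torch.randn(h.size(0), self.to_mu.out_features, device=h.device)
```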