Fig. 3: AI-based methods for antibiotic discovery. | npj Antimicrobials and Resistance

From: How AI can help us beat AMR

a Molecules can be represented using fixed or learned representations. Fixed representations typically denote known structural features or calculated physicochemical features (e.g., RDKit features) of a compound. Learned representations encode features that are not specified in advance, which suits more complex tasks. An important learned representation method is the graph neural network, which learns encodings of molecules by aggregating atom features that are iteratively updated through message passing.

b SyntheMol is a generative model for antibiotic discovery. SyntheMol employs a Monte Carlo tree search (MCTS) algorithm to select molecular building blocks from the Enamine REAL Space based on predicted antibacterial activity and frequency of selection. Selected building blocks are pieced together to generate a molecule. The predicted antibacterial activity of the final molecule is backpropagated through the model to improve subsequent generations.

c DeepBGC uses a bidirectional long short-term memory (BiLSTM) model to predict novel biosynthetic gene clusters (BGCs) in microbial genomes. Pfam domains are assigned to open reading frames (ORFs), then converted to Pfam word embeddings using Pfam2vec. The word embeddings are the input to the BiLSTM, which predicts whether each Pfam domain is part of a BGC. Runs of consecutive domains with high prediction scores are considered BGCs. Predicted BGCs are then passed to a random forest model to predict the bioactivity of the BGC products.

d BOLTZ-1 is a diffusion-based model for complex biomolecular structure prediction that models the three-dimensional structures of proteins, nucleic acids, and small molecules in complex. Proteins are input as amino acid sequences, nucleic acids as nucleotide sequences, and small molecules as SMILES strings. BOLTZ-1 forms a multiple sequence alignment (MSA) and a pairwise residue matrix, which are updated simultaneously. These updated matrices are passed to a diffusion model that predicts the three-dimensional structure of the input sequences, starting from a random configuration of the raw atom coordinates. The confidence model receives information from the trunk model and from each reverse diffusion step of the denoising model, and outputs the resolved atom representation.
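
The message passing described in panel a can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy molecule (the heavy atoms of ethanol, C-C-O), the two-element one-hot atom features, the function names, and the fixed averaging update, which stands in for the learned update function a real graph neural network would use.

```python
# Toy message passing over a molecular graph (illustrative assumptions only).

def message_passing(features, bonds, steps=2):
    """Return atom features after `steps` rounds of neighbor aggregation."""
    n = len(features)
    neighbors = {i: [] for i in range(n)}
    for a, b in bonds:
        # Bonds are undirected, so each atom sees the other as a neighbor.
        neighbors[a].append(b)
        neighbors[b].append(a)

    h = [list(f) for f in features]
    for _ in range(steps):
        new_h = []
        for i in range(n):
            # Message: element-wise sum of the neighbors' current features.
            msg = [sum(h[j][k] for j in neighbors[i]) for k in range(len(h[i]))]
            # Update: average the atom's own state with the message.
            # A trained network would apply learned weights here instead.
            new_h.append([0.5 * (own + m) for own, m in zip(h[i], msg)])
        h = new_h
    return h


def readout(h):
    """Sum atom features into a single molecule-level vector."""
    return [sum(column) for column in zip(*h)]


# One-hot atom features: [is_carbon, is_oxygen]
atom_features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
bonds = [(0, 1), (1, 2)]

encoding = readout(message_passing(atom_features, bonds, steps=2))
print(encoding)
```

After two rounds, each atom's vector reflects atoms up to two bonds away, which is why stacking more rounds lets the encoding capture larger substructures.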
