Fig. 1: Overview of AMPGen for de novo AMP sequence design. | Communications Biology

From: AMPGen: an evolutionary information-reserved and diffusion-driven generative model for de novo design of antimicrobial peptides

a Candidate peptide sequences are initially generated using a diffusion model pre-trained on the OpenFold database, with the AMP multiple sequence alignment (MSA) dataset as a condition; this is referred to as conditional generation (MSA-conditional, indicated by red arrows). The model adopts a 100M-parameter MSA Transformer architecture. Baseline comparisons were made with sequences generated without the condition, referred to as unconditional generation (MSA-based, indicated by green arrows), using the same architecture, as well as with a separate model trained on UniRef50 that adopts a ByteNet-style CNN architecture (Seq-based, indicated by blue arrows). The generated sequences are constrained to 15–35 amino acids in length to ensure an appropriate AMP size and to manage synthesis costs.

b After the initial generated sequences are cleaned and filtered, a binary XGBoost-based discriminator determines whether they qualify as AMPs. This discriminator is trained on an AMP dataset and a negative dataset of non-AMP sequences, using a combination of two feature extraction methods (PseKRAAC and QSOrder) as the embedding.

c Sequences identified as AMPs are then subjected to target-specific scoring using a long short-term memory (LSTM) network. This scorer uses ESM-2 embeddings and is trained on an AMP dataset with minimum inhibitory concentration (MIC) values.
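The cleaning and length-filtering step that sits between generation and the discriminator can be illustrated with a minimal Python sketch. Only the 15–35 residue window comes from the caption. The function name `clean_candidates`, the restriction to the 20 canonical amino acids, and the removal of duplicates are assumptions made for illustration, since the caption does not say what the cleaning involves.

```python
# Minimal sketch of a post-generation filter (not the authors' code).
# Assumption: "cleaning" means keeping unique sequences made only of the
# 20 canonical amino acids; the caption states only the length window.
CANONICAL = set("ACDEFGHIKLMNPQRSTVWY")
MIN_LEN, MAX_LEN = 15, 35  # length window given in the caption


def clean_candidates(sequences, min_len=MIN_LEN, max_len=MAX_LEN):
    """Return unique, canonical-only sequences within the length window."""
    seen = set()
    kept = []
    for raw in sequences:
        seq = raw.strip().upper()
        if not (min_len <= len(seq) <= max_len):
            continue  # outside the 15-35 residue window
        if not set(seq) <= CANONICAL:
            continue  # contains a non-canonical symbol such as X
        if seq in seen:
            continue  # duplicate of an earlier candidate
        seen.add(seq)
        kept.append(seq)
    return kept


# Illustrative inputs only; these are not sequences from the paper.
candidates = [
    "GIGKFLHSAKKFGKAFVGEIMNS",  # 23 residues, kept
    "gigkflhsakkfgkafvgeimns",  # duplicate after upper-casing, dropped
    "KKLLKK",                   # too short, dropped
    "GIGKFLHSAKKFGKAFVGEIMNX",  # contains X, dropped
    "A" * 40,                   # too long, dropped
]
print(clean_candidates(candidates))
```

In the workflow described above, the sequences that survive such a filter would then be passed to the XGBoost discriminator and, if classified as AMPs, to the LSTM scorer.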
