Introduction

The past few decades have seen a rapid surge in antimicrobial resistance (AMR), which has become one of the most pressing global health concerns and demands an urgent search for alternatives to antibiotics1. Several alternatives, such as flavonoids, nanoparticles, and antimicrobial peptides (AMPs), are therefore being tested against different bacteria. AMPs, which are short peptides, have recently attracted considerable attention because of their broad-spectrum antimicrobial activity and their multiple modes of action2. They are naturally occurring peptides released by almost all organisms, including humans, and play a crucial role in the innate immune response3. However, despite their strong potential as novel antimicrobial agents and as alternatives to antibiotics, their real-world application is hindered by several challenges, one of the major ones being the accurate prediction of peptide structure4.

Obtaining a stable structure of these peptides is difficult because they are intrinsically unstable and can adopt numerous conformations. We therefore hypothesized that different peptides can be modeled better (that is, a more stable conformation can be obtained) using different algorithms5,6. Obtaining a stable structure in silico is essential for understanding their mode of action and optimizing them for therapeutic use7,8,9. The primary objective of our study is thus to compare the efficacy of different modeling algorithms in predicting the structure of short peptides.

Computational biology techniques, such as structure modeling and peptide folding algorithms and molecular dynamics (MD) simulations, offer valuable support for studying AMP structure and dynamics10. These tools allow researchers to predict peptide folding and study peptide interactions in silico, reducing the need for time-consuming and expensive experimental methods. Different peptide modeling algorithms may differ in how they approach peptide folding, and their performance can be influenced by factors such as peptide length, sequence, physicochemical properties and the complexity of the target environment11. A comparative study of these algorithms is therefore critical for determining which methods are most suitable for modeling peptides with different properties. While several studies have demonstrated the utility of computational techniques in AMP design, few have comprehensively compared modeling algorithms, namely Homology Modeling, Threading, PEP-FOLD3 and AlphaFold, particularly for short peptides.

We worked on a set of 10 peptides selected from the pool of predicted AMPs derived from the human gut14. Taking into consideration the physicochemical properties as well as the disordered nature of the peptide sequences, we first estimated the expected suitability of Homology Modeling, Threading, PEP-FOLD3 and AlphaFold for each peptide. After performing Ramachandran plot analysis and VADAR analysis, we compared the observed findings with these expectations. We then used molecular dynamics (MD) simulations to further validate our findings. MD simulations were performed on all four structures (one from each modeling algorithm) of each of the 10 peptides, giving a total of 40 simulations of 100 ns each. The trajectories were analyzed to determine the stability of the peptide structures predicted by each modeling algorithm, so that the folding accuracy of the different algorithms could be determined and compared. The MD results also helped us explore how these gut-derived AMPs fold and stabilize over time, giving insights into their intramolecular interactions12.

A similar study by Ochoa and Fox in 2023 highlighted the notable accuracy of AlphaFold27. The study also suggested that, for peptides with non-natural modifications, a suitable strategy is to model the peptide first and then introduce the modifications and simulate it. However, it remains unclear which types of peptides are best modeled by which structural modeling algorithms, depending primarily on the nature of the peptide, that is, its sequence and physicochemical properties. To our knowledge, our study is the first to address this question.

Through extensive in silico analysis, we found that AlphaFold and Threading complement each other for more hydrophobic AMPs, whereas PEP-FOLD and Homology Modeling complement each other for more hydrophilic peptides. We also found that PEP-FOLD gives both compact structures and stable dynamics for most of the peptides, whereas AlphaFold gives compact structures for most of the peptides.

Experimental structures would provide a robust benchmark for assessing model accuracy; however, very few experimentally resolved AMP structures are currently available, and our study was motivated by this real-world challenge. In such contexts, computational prediction becomes the primary avenue for structural insight, which makes computational modeling a vital tool for structural characterization13.

Finally, our research has implications beyond AMPs, because the methodologies and findings of this study can be valuable for researchers working on other types of short peptides. By focusing on the general properties of short peptides, our research aims to contribute to the understanding of peptide structure, dynamics, and computational modeling approaches that can be applied in different areas of peptide research. We also note that the peptides used in this study were putatively identified, that is, predicted as AMPs using computational models, and experimental validation may reveal that some of them do not exhibit antimicrobial activity. The scope of this study therefore extends to short peptides in general, regardless of their eventual classification as AMPs.

Methodology

Identification of AMPs from the human gut metagenome

The metagenome assembly (BioSample: SAMD00036536) was downloaded from the Sequence Read Archive (SRA) database (https://www.ncbi.nlm.nih.gov/sra)14. The dataset originates from the large-scale comparative metagenomic analysis conducted by Kurokawa et al., 2007 (https://www.ncbi.nlm.nih.gov/biosample/?term=SAMD00036536), in which fecal samples from 13 healthy individuals across various age groups, including unweaned infants, were examined. For our study, we selected the sample from a young adult, a 24-year-old female (In-R), to avoid biases that could arise from samples at the extremes of age: infancy or old age could introduce developmental or senescence-associated biases in AMP expression. The In-R sample was also chosen as representative of all 13 samples studied by Kurokawa et al., 2007, because its abundance of Bacteroides and Eubacterium, the genera predominant in all 13 samples, was moderate (neither too high nor too low) compared with the other adult and child samples14.

Coding regions were then identified in the selected metagenome using MetaGeneMark15,16 and translated into amino acid sequences using EMBOSS Transeq (https://www.ebi.ac.uk/jdispatcher/st/emboss_transeq).
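As an illustration of the translation step, the sketch below translates a coding sequence with the standard genetic code (NCBI translation table 1), reading frame 1 only. It is a simplified stand-in for EMBOSS Transeq, not the tool itself; the function name `translate` and the choice to stop at the first stop codon are assumptions made for illustration.

```python
from itertools import product

# Standard genetic code (NCBI table 1). Codons are enumerated in TCAG order,
# first base varying slowest; '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(cds: str, to_stop: bool = True) -> str:
    """Translate a nucleotide coding sequence (frame 1) into one-letter amino acids.

    Codons containing ambiguous bases (e.g. N) become 'X'. If to_stop is True,
    translation ends at the first stop codon; otherwise stops are kept as '*'.
    """
    seq = cds.upper().replace("U", "T")
    protein = []
    for i in range(0, len(seq) - 2, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        if aa == "*" and to_stop:
            break
        protein.append(aa)
    return "".join(protein)
```

For example, `translate("ATGAAATAA")` returns `"MK"`: ATG and AAA encode Met and Lys, and TAA is a stop codon.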

Next, sequences of 50 amino acids or fewer were selected, keeping in mind that AMPs typically range from 12 to 50 amino acids in length. We then used AmPEPpy to identify peptides whose physicochemical properties underlie the antimicrobial nature of AMPs (https://doi.org/10.1093/bioinformatics/btaa917)17. The antibiofilm potential of these peptides was predicted using the model built by Bose et al.11. Standalone BLASTP was then employed to identify the probable source organism of the AMPs (https://www.ncbi.nlm.nih.gov/books/NBK52640/).
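A minimal sketch of the length filter, assuming the translated peptides are stored in FASTA format. The parser and the helper names (`read_fasta`, `short_peptides`) are hypothetical and do not reproduce the authors' pipeline; the threshold keeps sequences of at most 50 residues, matching the ≤ 50 criterion reported in the Results, and a trailing stop symbol (`*`) is ignored.

```python
from typing import Dict, Iterable, List, Optional

MAX_LENGTH = 50  # upper length bound for candidate AMPs


def read_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {identifier: sequence}.

    The identifier is the first whitespace-delimited word after '>'.
    """
    records: Dict[str, str] = {}
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records[header] = "".join(chunks)
            header, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if header is not None:
        records[header] = "".join(chunks)
    return records


def short_peptides(records: Dict[str, str], max_len: int = MAX_LENGTH) -> Dict[str, str]:
    """Keep non-empty sequences of at most max_len residues (trailing '*' removed)."""
    kept = {}
    for name, seq in records.items():
        seq = seq.rstrip("*")
        if 0 < len(seq) <= max_len:
            kept[name] = seq
    return kept
```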

Determination of charge and physicochemical properties

The charge of the antimicrobial peptides (AMPs) was determined using the Prot-pi software (https://www.protpi.ch/), which estimates the charge and isoelectric point of a peptide from its amino acid sequence18. Physicochemical characteristics were analyzed using the ExPASy-ProtParam tool (https://web.expasy.org/protparam/)19, which computes a wide range of physical and chemical properties of peptides. The main parameters assessed were the isoelectric point (pI), aromaticity, secondary structure fraction, grand average of hydropathicity (GRAVY), and the instability index.
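To show how such properties follow from sequence alone, the sketch below computes GRAVY (mean Kyte-Doolittle hydropathy), aromaticity (fraction of F, W and Y), a Henderson-Hasselbalch estimate of net charge, and a pI found by bisection. The pKa set is one common choice assumed here for illustration; Prot-pi and ProtParam use their own values and corrections, so their outputs will differ somewhat.

```python
# Kyte-Doolittle hydropathy values used for GRAVY.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5, "E": -3.5,
    "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8,
    "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
# Assumed pKa values (one common set); other tools use different tables.
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def gravy(seq: str) -> float:
    """Grand average of hydropathicity: mean Kyte-Doolittle value per residue."""
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


def aromaticity(seq: str) -> float:
    """Fraction of aromatic residues (F, W, Y)."""
    return sum(seq.count(aa) for aa in "FWY") / len(seq)


def net_charge(seq: str, ph: float = 7.0) -> float:
    """Henderson-Hasselbalch estimate of net charge, including both termini."""
    counts = {aa: seq.count(aa) for aa in set(seq)}
    counts["Nterm"] = counts["Cterm"] = 1
    positive = sum(counts.get(g, 0) / (1 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    negative = sum(counts.get(g, 0) / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisection for the pH at which the estimated net charge is zero."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(seq, mid) > 0:
            low = mid  # still positively charged: the pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2
```

Bisection works here because the estimated net charge decreases monotonically as pH rises.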

DISO study using RaptorX

For predicting the secondary structure (SS), solvent accessibility (ACC), and disordered regions (DISO) of the peptides, we used the well-known RaptorX server (http://raptorx2.uchicago.edu/StructurePropertyPred/predict/), a web-based tool that predicts the structural properties of a protein sequence without relying on templates20. It is particularly effective for proteins that lack close homologs in the PDB or have limited sequence profiles with little evolutionary information. The server uses an in-house deep learning model, DeepCNF (Deep Convolutional Neural Fields), to predict the structurally disordered regions of the peptides. Because the server works on sequences longer than 26 amino acids, we were able to characterize the disordered regions of 4 of our 10 peptides. The remaining peptides were assumed to be highly disordered because of their very short sequences.

Selection of tools for structure prediction

To ensure a robust and insightful comparative study, we selected a set of widely recognized and representative structure prediction algorithms, each known for its distinct modeling approach, proven utility, and relevance to short peptide modeling. This selection allowed us to fairly assess the strengths and limitations of template-based, de novo, and deep learning methods. First, MODELLER is a gold standard for comparative modeling and one of the earliest modeling techniques21. Because it is entirely template-based, it gives nearly realistic structures for uncharacterized proteins or peptides when a template is available, so including it in our comparative analysis was necessary to revalidate its strengths and highlight its weaknesses. Second, Threading using I-TASSER was selected as another template-based algorithm, to study how its fold-recognition approach, which performs fragment-based modeling using large template libraries, handles short peptides22. Third, AlphaFold was selected as one of the latest protein modeling algorithms, built on end-to-end neural networks. Although AlphaFold3 is the latest version, AlphaFold2 was used because it has previously been benchmarked on short sequences, whereas AlphaFold3 has not23. Moreover, AlphaFold2 has a transparent architecture, whereas the model weights and architecture of AlphaFold3 have not been clearly disclosed, and AlphaFold2 has been widely validated, whereas AlphaFold3 has so far received only a few independent validations. Finally, PEP-FOLD was selected because, as its name suggests, it is designed mainly for folding peptides, that is, short amino acid sequences.
Although PEP-FOLD4 is the latest version, it was mainly designed to model the structures of cyclic peptides, whereas PEP-FOLD3.5 is standardized for both linear and cyclic peptides. We therefore chose PEP-FOLD3.5, because our peptides are linear24.

Structure prediction by homology modeling: MODELLER

Homology Modeling of all the peptides was carried out using MODELLER, a software tool for generating models of protein tertiary structures and, less commonly, quaternary structures through Homology Modeling21. It predicts the three-dimensional structure of a protein by comparing it with known protein structures, on the principle that proteins with similar sequences tend to adopt similar structures and perform related functions. For each peptide, templates with the best resolution and the highest similarity percentage were chosen for modeling with MODELLER. The five top-ranked models for each peptide were compared against each other, and the final structure was chosen based on the molpdf, DOPE and GA341 scores: the model with the highest GA341 score and the lowest molpdf and DOPE scores was selected from among the five. Details of the templates chosen for Homology Modeling are given in Table S1 of Supplementary File 1.
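One way to express the model-selection rule in code is sketched below, assuming the molpdf, DOPE and GA341 values have already been collected from MODELLER's output for the five candidate models. The text does not say how conflicts between the three scores were resolved, so the lexicographic ordering used here (highest GA341, then lowest DOPE, then lowest molpdf) is an assumption, and the `ModelScore` class and the score values in the usage lines are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ModelScore:
    """Scores for one candidate model (values taken from MODELLER's output)."""
    name: str
    molpdf: float
    dope: float
    ga341: float


def pick_best_model(models: List[ModelScore]) -> ModelScore:
    """Prefer the highest GA341, then the lowest DOPE, then the lowest molpdf."""
    return min(models, key=lambda m: (-m.ga341, m.dope, m.molpdf))


# Hypothetical usage with made-up scores for three candidates.
candidates = [
    ModelScore("model_1", molpdf=120.0, dope=-1500.0, ga341=1.0),
    ModelScore("model_2", molpdf=130.0, dope=-1600.0, ga341=1.0),
    ModelScore("model_3", molpdf=100.0, dope=-1800.0, ga341=0.9),
]
best = pick_best_model(candidates)  # model_2: ties on GA341, lower DOPE
```

GA341 often saturates at 1.0 for short, well-templated targets, which is why a tie-breaker on DOPE and molpdf is needed in practice.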

Structure prediction by threading: I-TASSER

Threading of all the peptides was performed using the Iterative Threading ASSEmbly Refinement (I-TASSER) algorithm, an automated method for predicting protein structure and function22. I-TASSER generates three-dimensional models of proteins from their amino acid sequences using the Threading approach: it identifies structural templates from the PDB by fold recognition, and full-length structures are then built by reassembling fragments from these templates through replica-exchange Monte Carlo simulations. Finally, it ranks the models by C-score (confidence score), which is derived from the significance of the threading alignments and the convergence of the assembly simulations and correlates with the TM-score and RMSD of the model. The final Threading structures were therefore selected based on the highest C-score. The C-score ranges from −5 to 2, with higher scores indicating greater confidence in the predicted structure: a C-score above −1.5 indicates high confidence, between −1.5 and −3 medium confidence, and below −3 low confidence.
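The confidence bands described above can be written as a small helper, together with selection of the highest-scoring model. How the boundary values (exactly −1.5 or −3) are assigned is not specified in the text and is an assumption here, as are the function names.

```python
from typing import Dict


def cscore_confidence(c_score: float) -> str:
    """Map an I-TASSER C-score (range -5 to 2) to the confidence bands in the text.

    Assumption: -1.5 and -3 themselves are treated as 'medium'.
    """
    if not -5.0 <= c_score <= 2.0:
        raise ValueError("I-TASSER C-scores lie between -5 and 2")
    if c_score > -1.5:
        return "high"
    if c_score >= -3.0:
        return "medium"
    return "low"


def best_model(c_scores: Dict[str, float]) -> str:
    """Name of the model with the highest C-score."""
    return max(c_scores, key=c_scores.get)
```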

Structure prediction using AlphaFold

We then used AlphaFold2 to model the structures of all the peptides. AlphaFold is a recent, well-known and advanced machine learning method that predicts protein structure directly from sequence23. It uses multiple sequence alignments to inform its deep learning model, allowing it to predict protein structures with high accuracy. The AlphaFold structures in our study were chosen based on their pLDDT score, a per-residue confidence measure ranging from 0 to 100.
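AlphaFold2 writes the per-residue pLDDT into the B-factor column of its PDB output, so the average pLDDT can be recovered by reading that column for Cα atoms. The sketch below assumes a standard fixed-column PDB file; the confidence bands follow those used in AlphaFold's documentation, while the function names (and the example file name `ranked_0.pdb` in the note) are illustrative assumptions.

```python
from statistics import mean
from typing import Iterable, List


def ca_plddt(pdb_lines: Iterable[str]) -> List[float]:
    """Per-residue pLDDT read from the B-factor field (columns 61-66) of CA atoms."""
    values = []
    for line in pdb_lines:
        # Atom name occupies columns 13-16 of an ATOM record.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            values.append(float(line[60:66]))
    return values


def plddt_band(score: float) -> str:
    """Confidence band for a pLDDT value, as used by AlphaFold."""
    if score > 90:
        return "very high"
    if score > 70:
        return "confident"
    if score > 50:
        return "low"
    return "very low"
```

Applied to the lines of a model file (for example an AlphaFold2 `ranked_0.pdb`), `mean(ca_plddt(lines))` gives the average per-residue pLDDT of that model.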

Structure prediction using PEP-FOLD3

PEP-FOLD3.5 is another tool for peptide folding and structure modeling, particularly of linear peptides24. It uses a structural alphabet (SA) derived from a hidden Markov model, consisting of 27 letters, each describing a four-residue fragment24. PEP-FOLD first predicts SA letter profiles from the amino acid sequence and then assembles the fragments using a greedy approach guided by a modified OPEP coarse-grained force field. In this study, 200 simulations were run for each peptide, and the resulting models were sorted by sOPEP energy and RMSD.

Superimposition of the peptides modelled using different algorithms and structural deviation

For each peptide, all four models (one from each algorithm) were superimposed and their RMSD was measured using PyMOL (http://www.pymol.org/)25, to capture any significant structural deviations between models of the same peptide built using different approaches.
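The RMSD reported after superimposition can be sketched with the Kabsch algorithm, which finds the rotation that minimizes the RMSD between two matched coordinate sets. This NumPy sketch assumes the Cα coordinates of two models have already been extracted and paired residue by residue; PyMOL's `align` command additionally performs a sequence alignment and iterative outlier rejection, so its reported values can differ from this calculation over all atom pairs.

```python
import numpy as np


def kabsch_rmsd(P: np.ndarray, Q: np.ndarray) -> float:
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("expected two arrays of identical shape (N, 3)")
    # Remove translation by centring both sets on their centroids.
    P_c = P - P.mean(axis=0)
    Q_c = Q - Q.mean(axis=0)
    # SVD of the 3x3 covariance matrix gives the optimal rotation.
    H = P_c.T @ Q_c
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (no reflection).
    d = np.sign(np.linalg.det(Vt.T @ U.T))
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    diff = (R @ P_c.T).T - Q_c
    return float(np.sqrt((diff ** 2).sum() / len(P)))
```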

Ramachandran plot analysis

Each peptide model was validated using SAVES v6.1 (https://saves.mbi.ucla.edu/). The PROCHECK tool available on SAVES v6.1 was used to obtain the distribution of amino acid residues over the phi (φ) and psi (ψ) backbone dihedral angles in the Ramachandran plot26,27.
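The φ and ψ angles plotted by PROCHECK are backbone dihedral angles: φ is defined by C(i−1), N(i), Cα(i), C(i), and ψ by N(i), Cα(i), C(i), N(i+1). A dependency-free sketch of that calculation is shown below, assuming the backbone coordinates are supplied per residue as (N, Cα, C) triples; the helper names are hypothetical, and the sketch does not reproduce PROCHECK's region classification.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float, float]
Residue = Tuple[Point, Point, Point]  # (N, CA, C) backbone coordinates


def _sub(a: Point, b: Point) -> Point:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Point, b: Point) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Point, b: Point) -> Point:
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])


def _scale(a: Point, s: float) -> Point:
    return (a[0] * s, a[1] * s, a[2] * s)


def dihedral(p0: Point, p1: Point, p2: Point, p3: Point) -> float:
    """Signed dihedral angle in degrees (IUPAC sign convention) defined by four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    b1 = _scale(b1, 1.0 / math.sqrt(_dot(b1, b1)))  # unit vector along the central bond
    v = _sub(b0, _scale(b1, _dot(b0, b1)))  # b0 projected onto the plane normal to b1
    w = _sub(b2, _scale(b1, _dot(b2, b1)))  # b2 projected onto the same plane
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi_psi(backbone: Sequence[Residue]) -> List[Tuple[Optional[float], Optional[float]]]:
    """(phi, psi) per residue; undefined terminal angles are returned as None."""
    angles = []
    for i, (n, ca, c) in enumerate(backbone):
        phi = dihedral(backbone[i - 1][2], n, ca, c) if i > 0 else None
        psi = dihedral(n, ca, c, backbone[i + 1][0]) if i + 1 < len(backbone) else None
        angles.append((phi, psi))
    return angles
```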

VADAR analysis

VADAR (Volume Area Dihedral Angle Reporter) was used to extract parameters relevant to the stability of the modeled structures28, including Mean H-bond Distance, Mean H-bond Energy, Residues with H-bonds, Free Energy of Folding and FASA (Fractional Accessible Surface Area). FASA quantifies the solvent exposure of individual amino acid residues in a protein structure. It is calculated by dividing the accessible surface area (ASA) of a residue by the ASA of the same residue in a reference extended conformation (such as a Gly-Xaa-Gly tripeptide). The result is a normalized value, ranging from 0 to 1, that reflects how buried or exposed a residue is: a higher value indicates greater solvent exposure, suggesting that the residue lies on the surface rather than being buried inside.
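The FASA definition above is a simple ratio; the sketch below computes it per residue and averages it over a peptide. The reference ASA values are placeholders that a user would need to supply (VADAR's own reference values are not given in the text), and the helper names are hypothetical.

```python
from typing import Dict, List, Tuple


def fractional_asa(asa: float, reference_asa: float) -> float:
    """FASA = residue ASA / ASA of the same residue type in an extended Gly-X-Gly reference."""
    if reference_asa <= 0:
        raise ValueError("reference ASA must be positive")
    return asa / reference_asa


def peptide_fasa(residues: List[Tuple[str, float]], reference: Dict[str, float]) -> float:
    """Mean FASA over residues given as (one-letter code, ASA in square angstroms) pairs."""
    values = [fractional_asa(asa, reference[aa]) for aa, asa in residues]
    return sum(values) / len(values)
```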

Molecular dynamics (MD) simulation

All the peptide structures were subjected to MD simulation: in total, 40 structures were simulated (4 structures for each of the 10 peptides, one from each algorithm). GROMACS version 2019.4 (https://www.GROMACS.org/) was used for the MD simulations29, and the CHARMM36 force field (charmm36-feb2021.ff) was used to generate the topology. Each peptide was placed in a cubic box with a distance of 1 nm between the peptide and the box edges, and solvated with the TIP3P water model. The system was energy-minimized with the conjugate gradient method in GROMACS to remove any initial stress30. This was followed by equilibration for 100 ps at 300 K in the NVT ensemble and then a further 100 ps equilibration in the NPT ensemble. The temperature was controlled with the modified Berendsen (V-rescale) thermostat at a coupling constant of 0.1 ps31, and the pressure with the Parrinello-Rahman barostat at an isotropic coupling constant of 2 ps32. Finally, a 100 ns production trajectory in the NVT ensemble was generated for each system, with frames saved every 10 ps for further analysis. Periodic boundary conditions (PBC) were applied to minimize boundary effects, long-range electrostatic interactions were calculated using the Particle Mesh Ewald (PME) method33, and a cut-off radius of 14 Å was used for the neighbour list and van der Waals interactions.
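For readers reproducing the protocol, the sketch below collects the stated settings as GROMACS `.mdp` key-value pairs and derives the step counts. The 2 fs time step, the single temperature-coupling group, the Coulomb cut-off, the reference pressure and the compressibility are not given in the text and are assumptions; the option names are standard `.mdp` keys, but this is not the authors' parameter file.

```python
DT_PS = 0.002  # assumed 2 fs integration time step (not stated in the text)


def steps(duration_ps: float, dt_ps: float = DT_PS) -> int:
    """Number of integration steps needed to cover duration_ps."""
    return round(duration_ps / dt_ps)


production = {
    "integrator": "md",
    "dt": DT_PS,
    "nsteps": steps(100_000),         # 100 ns = 100,000 ps
    "nstxout-compressed": steps(10),  # write a frame every 10 ps
    "tcoupl": "V-rescale",            # modified Berendsen thermostat
    "tc-grps": "System",              # assumption: one coupling group
    "tau_t": 0.1,                     # ps
    "ref_t": 300,                     # K
    "pcoupl": "no",                   # production run described as NVT
    "coulombtype": "PME",
    "rcoulomb": 1.4,                  # nm; assumed equal to the 14 A vdW cut-off
    "rvdw": 1.4,                      # nm
    "pbc": "xyz",
}

npt_equilibration = {
    **production,
    "nsteps": steps(100),             # 100 ps
    "pcoupl": "Parrinello-Rahman",
    "pcoupltype": "isotropic",
    "tau_p": 2.0,                     # ps
    "ref_p": 1.0,                     # bar; assumed
    "compressibility": 4.5e-5,        # 1/bar; typical value for water, assumed
}


def to_mdp(params: dict) -> str:
    """Render parameters as 'key = value' lines in .mdp syntax."""
    return "\n".join(f"{key:<20} = {value}" for key, value in params.items())
```

Keeping the parameters in one dictionary and deriving the NPT stage from it makes the shared settings (thermostat, cut-offs, PME) identical across stages by construction.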

MD simulation analysis

For the MD simulation analysis, GROMACS utilities were used to analyze the trajectories29. The radius of gyration (Rg), root mean square deviation (RMSD) and root mean square fluctuation (RMSF) were measured, and the solvent accessible surface area (SASA) was calculated at different time points over the trajectory. Projections onto the first two principal components (PC1 and PC2) were used for principal component analysis (PCA). Finally, gmx sham was used to generate free energy landscape (FEL) plots34.
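The quantities listed above have simple definitions; the NumPy sketch below computes a mass-weighted radius of gyration for one frame, per-atom RMSF for a trajectory that has already been fitted to a reference, and a free energy landscape G = -kT ln(P/P_max) from PC1/PC2 projections. This mirrors the idea behind `gmx sham` but is not its implementation; the bin count, the kJ/mol units and the function names are assumptions.

```python
import numpy as np

KB_KJ_PER_MOL_K = 0.0083144626  # Boltzmann constant in kJ/(mol K)


def radius_of_gyration(coords, masses) -> float:
    """Mass-weighted Rg of one frame; coords (N, 3), masses (N,)."""
    coords = np.asarray(coords, dtype=float)
    masses = np.asarray(masses, dtype=float)
    com = (masses[:, None] * coords).sum(axis=0) / masses.sum()
    sq_dist = ((coords - com) ** 2).sum(axis=1)
    return float(np.sqrt((masses * sq_dist).sum() / masses.sum()))


def rmsf(trajectory) -> np.ndarray:
    """Per-atom RMSF for a pre-fitted trajectory of shape (frames, N, 3)."""
    traj = np.asarray(trajectory, dtype=float)
    deviations = traj - traj.mean(axis=0)
    return np.sqrt((deviations ** 2).sum(axis=2).mean(axis=0))


def free_energy_landscape(pc1, pc2, temperature=300.0, bins=30):
    """G = -kT ln(P / P_max) on a 2D histogram of PC1/PC2 projections (kJ/mol).

    The most populated bin has G = 0; empty bins are returned as inf.
    """
    hist, xedges, yedges = np.histogram2d(pc1, pc2, bins=bins)
    prob = hist / hist.sum()
    with np.errstate(divide="ignore"):
        g = -KB_KJ_PER_MOL_K * temperature * np.log(prob / prob.max())
    return g, xedges, yedges
```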

Peptide profiling

For each peptide, an overall profile was generated by summarizing all the findings from the VADAR, Ramachandran and MD simulation analyses, together with the key inferences drawn from them.

Obtaining control peptides with pre-defined structure

To assess whether our protocol for studying the peptides, and the findings derived from it, are reliable, we included a set of 4 stable, experimentally validated peptides with pre-defined structures as controls. Testing the entire methodology on these controls was intended to confirm the real-world applicability of our findings. The controls were subjected to the same analyses as the test peptides: Ramachandran analysis using PROCHECK, VADAR analysis for structural stability, and the MD simulation protocol for structural stability and dynamics. The following controls were included: (i) Human α-defensin-1 (PDB ID: 1IJV), a short antimicrobial peptide with disulfide-rich β-sheet structure, (ii) Trp-cage mini protein (PDB ID: 1L2Y), a 20-residue designed mini-protein with a stable α-helical fold, (iii) CAP18 (PDB ID: 1LYP), a well-characterized antimicrobial peptide used here as a biological control relevant to our AMP family, and (iv) Villin Headpiece HP35 (PDB ID: 1VII), a 35-residue fast-folding domain often used as a benchmark in MD studies. The lengths of these controls lie in the range of 20–36 amino acids, similar to the range of the test peptides (up to 50 amino acids). These control peptides are well established in the literature for their pre-defined structures, which were downloaded from the PDB. Each of the 4 control peptides was subjected to the same VADAR and Ramachandran analysis as our gut-derived peptides and was simulated using the same MD protocol as our AMPs, but for 250 ns. This analysis was performed to determine whether the controls retain their native fold or deviate from it over the trajectory, thereby helping us validate the suitability of our protocols.

Results and discussion

Identification and selection of peptides with antimicrobial properties

The human gut metagenome assembly downloaded from the SRA database consisted of a total of 34,797 metasequences, from which we identified 61,181 genes or coding regions. The coding regions were translated to obtain amino acid sequences, and an automated Python-based pipeline was used to select sequences of 50 amino acids or fewer; 6699 peptides met this criterion. Sequence-based prediction of AMPs was then performed with the AmPEPpy model, which uses distribution patterns of amino acid properties with a random forest classifier. Amino acid residue properties, namely hydrophobicity, normalized van der Waals volume, polarity, polarizability, charge, secondary structure and solvent accessibility, were taken into account. In this way, 2619 AMPs were predicted from our dataset. Of these 2619, 195 were predicted to be antibiofilm by the model devised by Bose et al., further supporting an antibacterial role in addition to their prediction as AMPs11. Of these 195, 24 had bacteria as their source organism. Our main purpose was to extract AMPs produced by bacteria as part of another project, from which this project has branched out (details of that project are confidential as it is ongoing; data unpublished). Of the 24 bacterial AMPs, PDB templates were available for only 10, and we therefore took these 10 for our study, because comparing template-based methods such as MODELLER requires peptides for which templates are already available.

Table 1 shows the sequence, length and source of the AMPs included in our study.

Tracing the origin of AMPs

BLASTP was performed using the NCBI BLASTP facility to identify the organism showing the highest homology to each peptide. The probable source organisms of the peptides are listed in Table 1.

Table 1 Putatively identified AMPs and their characteristics.

Structures obtained via modeling algorithms

Four different structures were obtained for each peptide, one from each of four structural modeling algorithms: Homology Modeling using MODELLER, Threading using I-TASSER, folding using PEP-FOLD and folding using AlphaFold. Superimposition of these structures revealed substantial differences in their conformations.

The average per-residue pLDDT score of all the peptide structures modeled using AlphaFold2 was above 72 (Table S2 of Supplementary File 1). The templates used for homology modeling had a similarity of more than 70% in most cases, except for three peptides, namely 218, 2410 and 175 (Table S1 of Supplementary File 1). The I-TASSER (Threading) structures were selected with a C-score above −1.5, indicating high confidence. For PEP-FOLD, the structure with the lowest RMSD was taken.

The differences between the structures of each peptide obtained via different modeling algorithms can be seen in Fig. 1, and the RMSD values for each of the structures are given in Table S3 of Supplementary File 1. The RMSD (root mean square deviation) of superimposed structures is the root-mean-square distance between corresponding atoms (typically backbone atoms such as Cα) in the aligned structures, and quantifies their structural similarity or deviation. A low RMSD (e.g., < 2 Å) indicates that the peptide structures from different modeling algorithms are very similar, suggesting a high level of agreement in the predicted conformations, whereas a high RMSD (e.g., > 3 Å) suggests significant differences, reflecting variation in how the algorithms model the peptide's conformation. Most of the peptides show high RMSD values, suggesting significant differences between the structures obtained via different modeling algorithms. We also note that most AMPs adopted an α-helical structure with all four algorithms. This agrees with the well-documented observation that short AMPs, particularly those in the 19–30 residue range, often adopt α-helical conformations, especially in membrane-mimicking or hydrophobic environments. Many AMPs are amphipathic helices, and even in aqueous simulations their helical propensity remains prominent because their sequence composition favors helix-forming residues (e.g., Ala, Leu, Lys, Arg). De novo modeling algorithms (e.g., PEP-FOLD and AlphaFold2) are trained or tuned to recognize such intrinsic sequence propensities and default to helical conformations in the absence of strong constraints suggesting otherwise. This explains the convergence of the different prediction methods toward α-helical structures, particularly for AMPs whose composition is biased toward helix formation. However, as Fig. 1 clearly shows, the α-helical content of each AMP differs across the four algorithms; this is discussed in detail in the VADAR analysis.

Fig. 1

Superimposed Peptide Structures Modeled Using Different Algorithms. Each color indicates structure obtained via different algorithms, namely, AlphaFold (Green), Homology Modeling (Cyan Blue), PEP-FOLD (Pink) and Threading (Yellow). PyMOL(TM) 2.3.2 was used for visualization and obtaining the image of the superimposed peptide structure (https://www.pymol.org/)25. The final image containing the superimposed structures of all the peptides was designed using Canva (https://www.canva.com/).

Identification of charge on AMPs

Nine of the 10 AMPs were found to be positively charged, suggesting that cationic peptides may predominate among gut-derived AMPs. The charge of each peptide is given in Table 2.

Physicochemical properties: functional implications and algorithmic suitability

The physicochemical properties of the peptides are summarized in Table 2. We also examined the potential functional implications of these properties, which may arise from the interplay between key features such as isoelectric point (pI), aromaticity, GRAVY, and structural stability.

Table 2 Physicochemical properties of gut-derived peptides. AMPs with higher GRAVY value as compared to others have been highlighted in bold.

High pI values, such as those of Amp21 and Amp2410, suggest a cationic nature, which facilitates electrostatic interactions with negatively charged microbial membranes. In contrast, peptides with a lower pI, such as Amp43, may have reduced membrane-targeting ability but could excel in alternative roles, such as enzymatic inhibition or receptor binding. Aromaticity contributes to hydrophobic interactions with membranes; in Amp2410, for example, high aromaticity might enhance binding and structural rigidity. Similarly, peptides with balanced secondary structure fractions, such as Amp164, may exhibit flexibility that allows them to adapt to various targets, which is critical for diverse antimicrobial functions. The physicochemical properties of each peptide thus point to specific functional implications, which are detailed in Table 3.

Based on the physicochemical nature of the peptides, we also predicted the algorithmic suitability for each of them, considering the stability, flexibility, and structural properties of each peptide. Peptides with a stable profile, like Amp32, are well suited to AlphaFold because of their structured nature. Disordered peptides, such as Amp218, require algorithms like PEP-FOLD, which specialize in modeling flexibility. Highly structured peptides like Amp2410, dominated by rigid helices, align with AlphaFold's strength in helix modeling and MODELLER's capability for template-based prediction. In contrast, peptides with mixed features, such as Amp175, are best handled by I-TASSER or PEP-FOLD, which can address both structured and disordered regions. The expected algorithmic suitability of each peptide based on its physicochemical nature is summarized in Table 3.

Table 3 Functional implications and expected algorithmic suitability of gut-derived peptides based on their physicochemical properties and template alignment during homology modeling.

Disordered (DISO) regions: functional implications and algorithmic suitability

We also performed the DISO study to uncover the disorder present in the longer peptides (more than 25 amino acids). For the shorter peptides, major structural disorder was assumed, based on the supporting literature46. Notably, it is this disorder that allows most AMPs (which are short, usually 12–50 amino acids) to adopt several different conformations.

Tables 4 and 5 present the DISO regions and their functional implications, respectively. Peptides such as Amp218 and Amp2410, with 100% disorder and full solvent exposure, might exhibit extreme flexibility, which is advantageous for rapid and transient actions such as disrupting microbial membranes or binding to diverse, loosely defined targets. Amp186, while also 100% disordered, has a higher proportion of regions with medium and buried solvent accessibility, suggesting structural features that contribute to stability and sustained functionality. In contrast, Amp164 combines high disorder with a completely exposed, coil-dominated structure, enabling broad-spectrum target interactions but limiting specificity. Based on the DISO study, we further predicted the algorithmic suitability of each peptide (Table 5), reflecting the need to accommodate structural variability or rigidity. For peptides like Amp2410, dominated by disordered yet helical structures, AlphaFold might excel owing to its ability to predict structured conformations despite flexibility. Highly disordered peptides like Amp218, lacking consistent structural motifs, align better with PEP-FOLD, which models dynamic regions effectively. Amp186, with its mix of exposed and medium-accessibility regions, may benefit from the capability of AlphaFold or MODELLER to predict stable, structured configurations, while the fully exposed profile of Amp164 makes it a prime candidate for PEP-FOLD, which performs well on flexible, disordered systems.
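Percentages such as "100% disorder" or "fully exposed" can be derived from per-residue predictions. The sketch below assumes RaptorX-style strings in which `*` marks a disordered residue and solvent accessibility is given in three states, B (buried), M (medium) and E (exposed); these encodings and the function names are assumptions for illustration rather than a description of the server's exact output files.

```python
from collections import Counter
from typing import Dict


def percent_disorder(diso: str) -> float:
    """Percentage of residues flagged as disordered ('*') in a per-residue string."""
    return 100.0 * diso.count("*") / len(diso)


def accessibility_profile(acc: str) -> Dict[str, float]:
    """Percentage of residues in each three-state solvent accessibility class."""
    counts = Counter(acc)
    return {state: 100.0 * counts.get(state, 0) / len(acc) for state in "BME"}
```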

Table 4 Disordered regions predicted in gut-derived peptides.
Table 5 Functional implications and expected algorithmic suitability of gut-derived peptides based on their DISO study.

Ramachandran plot analysis of structures determined using homology modeling, threading, PEP-FOLD3 and AlphaFold

The Ramachandran plot analysis results, including the number of residues in the allowed regions and other details, are given in Table S4 of Supplementary File 1. The Ramachandran plots for all AMPs are provided in Supplementary File 2.
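A Ramachandran plot places each residue by its backbone dihedral angles: φ, the dihedral about the N-Cα bond defined by C(i−1), N(i), Cα(i) and C(i), and ψ, the dihedral about the Cα-C bond defined by N(i), Cα(i), C(i) and N(i+1). The minimal sketch below computes these angles from backbone coordinates. It assumes the N, Cα and C coordinates have already been extracted from a model file as (x, y, z) tuples; the function names and example coordinates are illustrative. Assigning residues to the most favored, additional allowed, generously allowed and disallowed regions additionally requires the reference region maps of a validation tool such as PROCHECK, which are not reproduced here.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees, -180..180) defined by four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    n1 = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / n1, b1[1] / n1, b1[2] / n1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi_psi(backbone):
    """backbone: list of (N, CA, C) coordinate triples, one per residue.

    Returns a (phi, psi) pair per residue; phi is None for the first residue
    and psi is None for the last, where the angle is undefined.
    """
    angles = []
    for i, (n, ca, c) in enumerate(backbone):
        phi = dihedral(backbone[i - 1][2], n, ca, c) if i > 0 else None
        psi = dihedral(n, ca, c, backbone[i + 1][0]) if i < len(backbone) - 1 else None
        angles.append((phi, psi))
    return angles


# Two hypothetical residues (coordinates in angstroms, illustrative only).
backbone = [
    ((0.0, 0.0, 0.0), (1.46, 0.0, 0.0), (2.0, 1.4, 0.0)),
    ((3.3, 1.4, 0.5), (4.0, 2.5, 0.5), (5.4, 2.5, 0.9)),
]
print(phi_psi(backbone))
```

Using atan2 on the projected vectors gives a signed angle over the full −180° to 180° range, which is the axis range of a Ramachandran plot.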

The AlphaFold-predicted structure of AMP21 has 21 residues in the most favored regions, indicating a stable conformation. No residues lie in the additional allowed, generously allowed or disallowed regions, which suggests that the structure avoids energetically unfavorable conformations. In comparison, the Homology Modeling and PEP-FOLD predictions have 18 and 20 residues in the most favored regions, respectively, with 3 and 1 residues in the additional allowed regions. Neither model has residues in disallowed regions, indicating stable conformations. The Threading-predicted structure, however, has only 6 residues in the most favored regions and 10 in the additional allowed regions, as well as 1 residue in the generously allowed region and 4 in the disallowed region, resulting in a less stable conformation.

Thus, based on the Ramachandran plot analysis, the peptide modeled using AlphaFold has the highest proportion of residues in the most favored regions, indicating strong structural reliability. Homology Modeling and PEP-FOLD also show favorable distributions, with only slight variations, whereas Threading deviates significantly, with a notable number of residues in disallowed regions. Overall, AlphaFold and PEP-FOLD provide the most accurate conformations for this peptide, while Threading appears less reliable.

For AMP32, the AlphaFold structure shows 13 residues in the most favored regions and 1 residue in the additional allowed regions, indicating strong structural stability. The homology-modeled structure also has 13 residues in the most favored regions, with 1 residue in the generously allowed region and 0 in the additional allowed regions, which suggests good stability. The PEP-Fold structure has 12 residues in the most favored regions and 2 in the additional allowed regions, showing similar stability. In contrast, the Threading-predicted structure has only 7 residues in the most favored regions, 6 residues in the additional allowed regions, and 1 residue in the generously allowed region, indicating lower stability.

Thus, AlphaFold and Homology Modeling show a similar number of residues in the most favored regions, indicating good structural quality. PEP-FOLD also performs well, with slightly fewer residues in these regions. Threading, while placing residues in the allowed regions, has comparatively fewer residues in the most favored regions. Importantly, none of the methods place residues in disallowed regions, reflecting acceptable structural predictions overall.

For AMP43, the Ramachandran plot analysis shows that AlphaFold and PEP-FOLD have the highest number of residues in the most favored regions, reflecting strong structural accuracy. Homology Modeling and Threading also perform reasonably well, with a good distribution across the favored and additional allowed regions. Importantly, none of the methods place residues in disallowed regions, indicating reliable predictions overall. AlphaFold and PEP-FOLD appear to provide the most precise structural conformations for this peptide.

For AMP164, Homology Modeling places the highest number of residues in the allowed regions, followed by AlphaFold.

For AMP175, the Ramachandran plot analysis shows that all four methods provide reliable predictions, with a high number of residues in the most favored regions. PEP-FOLD stands out slightly with the highest count in these regions, while AlphaFold, Homology Modeling and Threading also perform well. Only a few residues appear in the additional or generously allowed regions, and none are found in disallowed regions. This suggests that all methods generate structurally acceptable models for the peptide.

In AMP186, the AlphaFold structure has 25 residues in the most favored region, which suggests a stable conformation. In contrast, the Homology Modeling and PEP-FOLD structures have 17 and 15 residues in the most favored regions and 6 and 8 residues in the additional allowed regions, respectively, with 2 residues each in the generously allowed regions, indicating lower stability. The Threading structure has 12 residues in the most favored region, 9 in the additional allowed regions, and 1 and 3 residues in the generously allowed and disallowed regions, respectively, indicating the weakest conformation. Overall, AlphaFold provides the most structurally consistent model for this peptide, followed by Homology Modeling.

For AMP207, the AlphaFold structure shows 16 residues in the most favored region and 2 in the additional allowed regions, demonstrating a strong conformation. The homology-modeled structure has 16 residues in the most favored region, 1 in the additional allowed region, and 1 in the generously allowed region, which also indicates strong stability. The PEP-Fold structure, with 18 residues in the most favored region, also shows strong conformation. However, the Threading-predicted structure has only 6 residues in the most favored regions, 9 in the additional allowed regions, and 3 in the generously allowed regions, indicating a lower stability. Importantly, no residues were found in disallowed regions for any of the methods, indicating that all predictions are acceptable, but PEP-FOLD stands out for its higher accuracy.

The AlphaFold structure of AMP218 has 27 residues in the most favored region, indicating excellent performance. The homology-modeled structure has 25 residues in the most favored regions, 1 in the additional allowed region and 1 in the generously allowed region, which also indicates a strong conformation. The PEP-FOLD and Threading structures have 26 residues each in the most favored regions, with 0 and 1 residues in the additional allowed regions and 1 and 0 residues in the generously allowed regions, respectively. These two structures therefore score slightly below the AlphaFold prediction but slightly above the homology-modeled structure.

In AMP239, the AlphaFold structure has 16 residues in the most favored region. The homology-modeled structure has 14 residues in the most favored regions, with 2 in the additional allowed regions, indicating a stable conformation. The PEP-FOLD and Threading structures have 12 and 7 residues in the most favored regions, 3 and 7 residues in the additional allowed regions, and 1 and 2 residues in the generously allowed regions, respectively, suggesting lower stability. However, none of the methods place residues in disallowed regions, indicating that all four approaches generate structurally acceptable models, with AlphaFold showing the best overall performance.

For AMP2410, the AlphaFold and PEP-Fold structures show 21 and 22 residues in the most favored regions, with 5 and 3 residues in the additional allowed regions, respectively. The Threading and Homology Modeling structures have 23 and 24 residues in the most favored regions, with 3 and 2 residues in the additional allowed regions, respectively. Overall, AMP2410 shows good conformation across all prediction methods. Thus, the Ramachandran plot analysis shows that Homology Modeling has the highest number of residues in the most favored regions, followed closely by Threading. All methods show some residues in additional allowed regions, but none place residues in disallowed regions, indicating that all predictions are structurally acceptable. While Homology Modeling and Threading perform slightly better, the other methods also provide reliable structural predictions.

Overall, the Ramachandran plot analysis shows that the performance of the modeling methods varies with the peptide. For most of the peptides, AlphaFold, followed by PEP-FOLD, consistently delivers the best results, placing a large number of residues in the most favored regions. For AMP175 and AMP207, PEP-FOLD provides the best results, with the highest number of residues in the most favored regions. In contrast, for AMP2410, the template-based methods, that is, Homology Modeling and Threading, outperform the other methods, with the highest number of residues in the most favored regions.
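The per-peptide verdicts above can be cross-checked by tallying the most-favored-region counts reported in this section. The sketch below transcribes those counts from the text (AMP43, AMP164 and AMP175 are omitted because their counts are not stated here) and returns every method tied for the highest count. Comparing raw counts assumes that the four models of a peptide are scored over the same residues; the dictionary and function names are illustrative.

```python
# Most-favored-region residue counts, transcribed from the text above.
MOST_FAVORED = {
    "AMP21": {"AlphaFold": 21, "Homology Modeling": 18, "PEP-FOLD": 20, "Threading": 6},
    "AMP32": {"AlphaFold": 13, "Homology Modeling": 13, "PEP-FOLD": 12, "Threading": 7},
    "AMP186": {"AlphaFold": 25, "Homology Modeling": 17, "PEP-FOLD": 15, "Threading": 12},
    "AMP207": {"AlphaFold": 16, "Homology Modeling": 16, "PEP-FOLD": 18, "Threading": 6},
    "AMP218": {"AlphaFold": 27, "Homology Modeling": 25, "PEP-FOLD": 26, "Threading": 26},
    "AMP239": {"AlphaFold": 16, "Homology Modeling": 14, "PEP-FOLD": 12, "Threading": 7},
    "AMP2410": {"AlphaFold": 21, "Homology Modeling": 24, "PEP-FOLD": 22, "Threading": 23},
}


def best_methods(counts):
    """Return every method tied for the highest count, sorted by name."""
    top = max(counts.values())
    return sorted(method for method, n in counts.items() if n == top)


for peptide, counts in MOST_FAVORED.items():
    print(peptide, best_methods(counts))
```

Returning all tied methods, rather than a single argmax, keeps cases such as AMP32 (AlphaFold and Homology Modeling both at 13) visible instead of silently picking one.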

Inferences from the Ramachandran plots can be better understood in light of the peptide profiling results in Supplementary File 4.

VADAR analysis of structures determined using homology modeling, PEP-FOLD, threading, and AlphaFold

The VADAR results for the structures obtained via Homology Modeling, Threading, PEP-FOLD and AlphaFold are given in Table S5 of Supplementary File 1.

For the peptide AMP21, AlphaFold stands out as the most effective algorithm. It predicts a structure with a dominant helical content and minimal coil regions, reflecting a stable and organized conformation. After AlphaFold, PEP-FOLD shows the highest helical content. In the AlphaFold structure, the hydrogen bond interactions are both extensive and favorable, with the highest number of residues participating in these bonds, followed by PEP-FOLD. Moreover, AlphaFold achieves the most favorable free energy of folding, indicating superior stability. Threading shows some strengths in beta content and hydrogen bond tightness, and Homology Modeling emphasizes coil regions. Overall, AlphaFold clearly provides the most reliable and stable structural prediction for AMP21.
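Helical, beta and coil content of the kind compared above can be illustrated with a per-residue secondary-structure string. This sketch is not VADAR's own procedure: it assumes a DSSP-style assignment and the common reduction of H, G and I to helix and of E and B to beta, with every other state (including turns, which are discussed separately above) counted as coil. The example string is hypothetical.

```python
from collections import Counter

# Common reduction of DSSP states to three classes (an assumption; VADAR's
# own assignment is not reproduced here). Unlisted states count as coil.
THREE_STATE = {"H": "helix", "G": "helix", "I": "helix",
               "E": "beta", "B": "beta"}


def ss_composition(dssp_string):
    """Percent helix, beta and coil from a per-residue DSSP-style string."""
    classes = Counter(THREE_STATE.get(s, "coil") for s in dssp_string)
    n = len(dssp_string)
    return {k: 100.0 * classes[k] / n for k in ("helix", "beta", "coil")}


# Hypothetical 10-residue assignment, not taken from any peptide in this study.
print(ss_composition("-HHHHHHT-E"))
```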

For AMP32, AlphaFold outperforms the other algorithms by predicting a predominantly helical structure with a high number of residues forming hydrogen bonds, although its free energy of folding is not the lowest among the algorithms. Homology Modeling performs quite well here, showing the most favorable free energy and hydrogen bond energy, although it predicts a structure with more beta-sheet and coil regions than helical content. Threading provides a moderate model with some beta content, but its stability in terms of H-bond energy and free energy of folding is lower than that of AlphaFold, PEP-FOLD and Homology Modeling. Overall, AlphaFold is the most consistent in predicting the structure accurately.

For AMP43, AlphaFold delivers the most structured and stable model, marked by substantial helical content, involvement of all residues in hydrogen bonding and the lowest free energy. Threading and PEP-FOLD also perform well, predicting highly organized structures with good hydrogen bonding, although their free energies are less favorable than that of AlphaFold. Homology Modeling, on the other hand, falls short by predicting only coil regions with no structured elements or hydrogen bonding; although its free energy is comparable, it lacks the accuracy and stability offered by the other algorithms. Notably, despite a template similarity of 90%, AMP43 was not modeled efficiently by Homology Modeling, which underlines that the template is a segment of a larger protein and therefore may not suffice for modeling a similar standalone peptide.

For AMP164, PEP-FOLD outperformed the other algorithms overall, showing the most negative free energy of folding.

For AMP175, PEP-FOLD, followed by Homology Modeling, yields the most stable structure, with both high helical content and favorable free energy. AlphaFold predicts a well-balanced structure with a good mix of helices and coils, along with the most favorable hydrogen bond energy, equal to that of PEP-FOLD. Threading provides a slightly less stable model with moderate helical content. Although Threading and PEP-FOLD both give the highest percentage of residues involved in hydrogen bonding, the free energy of folding of the Threading model is less favorable than those of the other methods, making Threading a less optimal option overall.

For AMP186, PEP-FOLD delivers a highly stable and well-structured peptide, as seen from the most negative free energy of folding. AlphaFold predicts a strong helical conformation with excellent hydrogen bonding and the most negative H-bond energy. Threading comes next, performing better than Homology Modeling, especially in secondary structure prediction, with a good balance of helix and coil content and stronger hydrogen bonds. Homology Modeling offers the least stable model, with a high coil percentage and weaker hydrogen bonding interactions. Overall, PEP-FOLD and AlphaFold prove to be the most effective algorithms for predicting the structure of AMP186.

For AMP207, AlphaFold provides the most reliable model, predicting a predominantly helical structure with minimal coil and the most negative free energy of folding. Homology Modeling also yields a good structure, presenting a beta-sheet structure with some coil content and strong hydrogen bonds. Threading, on the other hand, predicts a fully coiled structure with weaker hydrogen bonds and unfavorable free energy, making it the least suitable model. PEP-FOLD shows good helical content, but its H-bond energy and free energy of folding are more positive than those of AlphaFold and Threading. Overall, AlphaFold remains the most structured and stable option for AMP207.

For AMP218, all the algorithms predict a structure that is mostly helical, with some coil and turn content. Hydrogen bonding is strong across all models, with a high percentage of residues forming hydrogen bonds and favorable bonding energies. In terms of stability, Homology Modeling predicts the most stable structure with the lowest free energy.

For AMP239, AlphaFold provides the most structured and stable model, with a dominant helical structure, strong hydrogen bonding and favorable folding energy. Threading predicts a less stable model with minimal helical content and fewer hydrogen bonds. PEP-FOLD and Homology Modeling, on the other hand, predict an entirely unstructured peptide, lacking helical content and hydrogen bonds, with the least favorable stability.

For AMP2410, Threading and PEP-FOLD predict the most helical structures, with the highest helical content and the greatest number of residues involved in hydrogen bonding, suggesting highly stable conformations. AlphaFold offers a balanced model with a mix of helix and coil content, strong hydrogen bonding and, together with PEP-FOLD, the most negative free energy of folding, indicating good stability. Homology Modeling predicts a mostly coiled structure with some turn regions, weaker hydrogen bond interactions and less favorable stability.

After analyzing all peptides, PEP-FOLD and AlphaFold emerge as the best-performing algorithms for modeling stable structures for most of the peptides. Homology Modeling stands out for AMP218, for which it predicts the most stable structure with reliable hydrogen bonding. Threading performs exceptionally well for AMP2410, matching PEP-FOLD and AlphaFold in stability and hydrogen bonding, but it generally falls short in other cases. Inferences from the VADAR results can be better understood in light of the peptide profiling results in Supplementary File 4.

Overall algorithmic suitability for gut-derived peptides (PART-I): as per Ramachandran plot and VADAR

The expected and the observed algorithmic suitability after the Ramachandran and VADAR analyses are compared in Table 6.

Table 6 Expected vs. Observed algorithmic suitability of Gut-Derived AMPs (Part-I) as per Ramachandran plot and VADAR.

Thus, AlphaFold is best suited for peptides with defined helical or structured regions, whereas PEP-FOLD is most effective for highly disordered or flexible peptides, which previously showed 100% DISO regions (AMP164, AMP186, AMP218 and AMP2410). Homology Modeling outperformed all other algorithms only for AMP218. Notably, Threading shows average performance overall but excels for AMP2410, indicating the presence of conserved structural motifs. The highly disordered regions observed in peptides such as AMP218 and AMP2410 correlated with their functional flexibility and transient interactions, and algorithms such as PEP-FOLD and AlphaFold performed well in predicting structures for these peptides.

Results from MD simulation analysis

MD simulations highlighted variations in peptide stability and dynamic behavior across algorithms. Stable RMSD (root mean square deviation) values indicate reliable conformations, while RMSF (root mean square fluctuation) captures residue flexibility. Compactness is further reflected by the radius of gyration (Rg) and the solvent accessible surface area (SASA). Together with the free energy landscape (FEL), these parameters shed light on the structural adaptability of the peptides. These results provide a detailed evaluation of algorithmic performance under dynamic conditions and help validate the findings of our study. Table S6 of Supplementary File 1 lists the RMSD, RMSF, Rg and SASA values for all 40 structures that were simulated, and Table S7 of Supplementary File 1 lists the PC variation, reflecting the extent of conformational change for each peptide.
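The quantities in Table S6 follow standard definitions: for N atoms, RMSD = sqrt((1/N) Σ|r_i − r_i,ref|²); the RMSF of an atom is the root-mean-square deviation of its position from its time-averaged position; and Rg = sqrt(Σ m_i|r_i − r_com|² / Σ m_i). The pure-Python sketch below implements these definitions. GROMACS tools can fit each frame to a reference before computing these values, but this sketch performs no fitting, so it assumes the frames have already been superimposed. Function names and example coordinates are illustrative.

```python
import math


def rmsd(a, b):
    """RMSD between two equal-length coordinate lists.

    No least-squares fitting is applied; frames are assumed pre-superimposed.
    """
    if len(a) != len(b):
        raise ValueError("coordinate sets differ in length")
    sq = sum(sum((p[k] - q[k]) ** 2 for k in range(3)) for p, q in zip(a, b))
    return math.sqrt(sq / len(a))


def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration; unit masses if none are given."""
    masses = masses or [1.0] * len(coords)
    total = sum(masses)
    com = [sum(m * c[k] for m, c in zip(masses, coords)) / total for k in range(3)]
    sq = sum(m * sum((c[k] - com[k]) ** 2 for k in range(3))
             for m, c in zip(masses, coords))
    return math.sqrt(sq / total)


def rmsf(frames):
    """Per-atom RMSF over a list of frames, each a list of (x, y, z)."""
    n_frames, n_atoms = len(frames), len(frames[0])
    out = []
    for i in range(n_atoms):
        mean = [sum(f[i][k] for f in frames) / n_frames for k in range(3)]
        msd = sum(sum((f[i][k] - mean[k]) ** 2 for k in range(3))
                  for f in frames) / n_frames
        out.append(math.sqrt(msd))
    return out


# Hypothetical two-atom reference and a rigidly shifted frame (nm).
ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
moved = [(0.0, 0.1, 0.0), (1.0, 0.1, 0.0)]
print(rmsd(ref, moved), radius_of_gyration(ref), rmsf([ref, moved]))
```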

AMP-wise MD simulation results are plotted in Supplementary File 3. The overall peptide profile for each AMP, summarizing the MD simulation findings, is given in Supplementary File 4.

Excellent overall performance of PEP-FOLD in modeling most of the peptides

For most of the peptides, the simulation results (which capture the dynamic and detailed behavior of the peptides) agree with the structural modeling analysis (which is essentially static). This is indicated by the low mean RMSD, RMSF and Rg values in Table S6. The SASA values further confirmed the compactness of the structures. Remaining discrepancies were resolved by plotting the FEL of each peptide, which shows the free energy distribution over the conformations sampled by the peptide.
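An FEL of the kind referred to here is commonly obtained by projecting the trajectory onto PC1 and PC2, histogramming the projections and converting probabilities to free energies with G = −k_B T ln(P/P_max), so that the most populated state has G = 0. The sketch below follows that recipe in the spirit of gmx sham rather than reproducing it; the bin count and the 300 K temperature are assumptions, since neither is stated in this section, and the function name is illustrative.

```python
import math
from collections import Counter

K_B = 0.0083144626  # Boltzmann constant in kJ mol^-1 K^-1


def free_energy_landscape(pc1, pc2, bins=20, temperature=300.0):
    """G(bin) = -kT ln(P/Pmax) over a 2D histogram of PC1/PC2 projections.

    Only populated bins are returned; empty bins have undefined (infinite) G.
    """
    lo1, hi1, lo2, hi2 = min(pc1), max(pc1), min(pc2), max(pc2)
    w1 = (hi1 - lo1) / bins or 1.0  # guard against zero-width ranges
    w2 = (hi2 - lo2) / bins or 1.0
    counts = Counter()
    for x, y in zip(pc1, pc2):
        i = min(int((x - lo1) / w1), bins - 1)  # put the maximum in the last bin
        j = min(int((y - lo2) / w2), bins - 1)
        counts[(i, j)] += 1
    p_max = max(counts.values())
    kt = K_B * temperature
    return {b: -kt * math.log(n / p_max) for b, n in counts.items()}


# Hypothetical projections: two frames share one state, one frame is elsewhere.
print(free_energy_landscape([0.0, 0.0, 1.0], [0.0, 0.0, 1.0], bins=2))
```

Normalizing by P_max rather than by the total count only shifts G by a constant, which is why the deepest basin sits at zero in such plots.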

Peptides such as AMP43, AMP164, AMP175, AMP186 and AMP207 were best modeled using PEP-FOLD and exhibited highly dynamic and flexible behavior, as reflected in the MD simulation findings. These peptides consistently showed low RMSD values, indicating stable conformations during the simulation period, coupled with compact structures reflected by reduced Rg.

PEP-FOLD demonstrates exceptional modeling capabilities across AMP43, AMP164, AMP175, AMP186 and AMP207, as evidenced by the dynamic flexibility and stability of these peptides during MD simulations. Among these, AMP175 stands out, showing the lowest RMSD (0.35 ± 0.05 nm), a compact Rg (0.72 ± 0.07 nm) and stable SASA values (26.92 ± 1.40 nm²), highlighting its balanced conformational dynamics. Gibbs free energy further supports PEP-FOLD’s effectiveness, with AMP175 achieving favorable energy values indicative of strong peptide stability. PCA results reinforce these findings, as PEP-FOLD consistently minimizes variation along PC1 and PC2, exemplified by the reduced variability of AMP175 (6.23 and 3.77, respectively), suggesting compact and consistent structural behavior. Similarly, for AMP186, PEP-FOLD maintained a delicate balance between flexibility and stability, with an RMSD of 0.68 ± 0.11 nm and consistent RMSF values, supporting its capability to model transient interactions effectively. Rg also remained constant over time for the PEP-FOLD-based structure. SASA was lowest for the Threading and PEP-FOLD structures, and PEP-FOLD also showed the minimum PC variation. FEL plots of the four structures modeled using the different algorithms indicate that the maximum number of residues lie in the lower energy state for both the PEP-FOLD and Homology Modeling-based structures. Therefore, overall, PEP-FOLD emerges as the best choice, reflecting both stable dynamics of the peptide (RMSD and Rg) and a compact, stable structure (SASA and FEL, respectively).
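PCA of an MD trajectory diagonalizes the covariance matrix of atomic coordinates; the eigenvalues for PC1 and PC2 measure how much of the total motion each component captures, which appears to be what the PC variation in Table S7 summarizes. The sketch below computes the covariance of flattened, already-fitted coordinate frames and extracts the top eigenvalues by power iteration with deflation. It is a toy substitute for gmx covar, which fits the frames and diagonalizes the full matrix, and is adequate only for small examples; function names and example data are hypothetical.

```python
import math


def covariance(frames):
    """Population covariance matrix of flattened coordinate vectors.

    frames: equal-length lists [x1, y1, z1, x2, ...], assumed already fitted.
    """
    n, d = len(frames), len(frames[0])
    mean = [sum(f[k] for f in frames) / n for k in range(d)]
    return [[sum((f[a] - mean[a]) * (f[b] - mean[b]) for f in frames) / n
             for b in range(d)] for a in range(d)]


def top_eigenvalues(cov, k=2, iters=500):
    """Largest k eigenvalues of a symmetric positive semi-definite matrix,
    found by power iteration with deflation (toy-sized matrices only)."""
    d = len(cov)
    m = [row[:] for row in cov]
    values = []
    for _ in range(k):
        v = [1.0 + 0.1 * i for i in range(d)]  # arbitrary non-zero start vector
        lam = 0.0
        for _ in range(iters):
            w = [sum(m[r][c] * v[c] for c in range(d)) for r in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # nothing left to extract
                lam = 0.0
                break
            v = [x / norm for x in w]
            lam = norm  # |M v| converges to the dominant eigenvalue
        values.append(lam)
        # Deflate: remove the found component before the next pass.
        m = [[m[r][c] - lam * v[r] * v[c] for c in range(d)] for r in range(d)]
    return values


# Hypothetical 2D "trajectory": large spread along x, small along y.
frames = [[2.0, 0.0], [-2.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
cov = covariance(frames)
evals = top_eigenvalues(cov)
trace = sum(cov[i][i] for i in range(len(cov)))
print([round(e, 3) for e in evals], [round(e / trace, 3) for e in evals])
```

Dividing each eigenvalue by the trace of the covariance matrix gives the fraction of total positional variance captured by that component.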

For AMP164, each of the four algorithms gives reasonable results with its own strength. Threading gives the most stable dynamics, whereas Homology Modeling gives the most compact structure. PEP-FOLD and AlphaFold strike a balance, giving low values for essential dynamics and a more compact structure than Threading. Thus, one may use Threading for simulations focused on dynamic stability and Homology Modeling if structural compactness is critical, whereas PEP-FOLD or AlphaFold offer a balanced approach for exploring both dynamics and compactness.

It is important to note that of the 4 peptides studied for their disorder in the DISO analysis, two (AMP164 and AMP186) are best modeled by PEP-FOLD, while the remaining two (AMP218 and AMP2410) are not. The difference between these two sets of AMPs is that AMP164 and AMP186 have very little helical secondary structure (0% and 28%, respectively), and their percentage solvent accessibility in the exposed region is 83% and 75%, respectively. In contrast, AMP218 and AMP2410 have more than 80% helical content and 100% solvent accessibility in the exposed region. Therefore, the MD simulation findings underscore PEP-FOLD’s specialization in handling disordered and flexible peptides (as seen from the physicochemical properties and DISO results), aligning dynamic adaptability with functional stability.

The MD simulation results for the structural and essential dynamics of AMP186, which indicate the strength of PEP-FOLD in modeling peptides with intrinsic disorder, are shown in Fig. 2.

Fig. 2

Plots depicting the structural and essential dynamics of backbone atoms over time for AMP186. The panels represent: (i) RMSD, (ii) RMSF, (iii) Rg, (iv) SASA and (v) PCA for structures modeled using the 4 different algorithms, and (vi) FEL for the best modeling algorithm. The gmx rms, gmx rmsf, gmx gyrate and gmx sasa tools were used to obtain the RMSD, RMSF, Rg and SASA values, respectively; gmx covar and gmx anaeig were used to obtain the values for PC1 and PC2; and the Gibbs free energy values were extracted using gmx sham. The values were plotted using OriginPro 2016 (64-bit).
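Summary values such as 0.35 ± 0.05 nm are presumably time averages of the per-frame output of the gmx tools named in the caption, which write .xvg files in which lines starting with # are comments and lines starting with @ are plotting directives. The sketch below parses such text and reports the mean and sample standard deviation of one data column. Whether the study used the sample or the population standard deviation is not stated, and the example data and function names are hypothetical.

```python
import statistics


def read_xvg(text):
    """Parse .xvg text, skipping '#' comments and '@' plotting directives."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line[0] in "#@":
            continue
        rows.append([float(x) for x in line.split()])
    return rows


def mean_sd(rows, column=1):
    """Mean and sample standard deviation of one data column."""
    values = [row[column] for row in rows]
    return statistics.mean(values), statistics.stdev(values)


# Hypothetical three-frame excerpt laid out like a time-series .xvg file.
example = """# hypothetical header comment
@    title "RMSD"
@    xaxis  label "Time"
0.0   0.30
10.0  0.40
20.0  0.35
"""
mean, sd = mean_sd(read_xvg(example))
print(f"{mean:.2f} ± {sd:.2f} nm")
```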

FEL plots of AMP186 modeled using the different modeling algorithms are compared in Figures S1 and S2 of Supplementary File 3.

The plots showing the MD analysis for AMP43, AMP164, AMP175, AMP186 and AMP207 can be seen in Supplementary File 3. The overall MD Simulation results for these AMPs can be found in the form of peptide profiling in Supplementary File 4.

Peptide structures modeled well by AlphaFold

Peptides modeled using AlphaFold, such as AMP21 and AMP239, displayed exceptional structural stability and compactness during MD simulations, reflecting AlphaFold’s strength in predicting structured regions. For AMP21, AlphaFold achieved the lowest RMSD (0.44 ± 0.10 nm) among all algorithms, with minimal fluctuation in Rg (0.95 ± 0.17 nm), indicating a stable and compact conformation over time. The SASA values for AMP21 were significantly low and remained constant over time, further supporting the compactness of the structure. RMSF was also lowest for the AlphaFold-modeled structure. A comparison of 3D FEL plots shows that AlphaFold, followed by PEP-FOLD, models the most compact structure for AMP21, with the maximum number of residues in the lower energy state.

Similarly, AMP239 exhibited consistent stability, with a low RMSD (0.42 ± 0.08 nm) and a tightly packed structure (Rg: 0.73 ± 0.11 nm). The RMSF and SASA values of the AlphaFold-modeled structure were also among the lowest.

For AMP21 and AMP239, we observed a similar scenario in which the expected results matched the observed results; that is, AlphaFold performed excellently in modeling both peptides. Correlating these results with the physicochemical properties of the peptides suggested that AlphaFold can accurately model stable peptides (with a low instability index) that have a higher percentage of helix, as in AMP21 and AMP239.

AlphaFold also showed good MD results for AMP32, which has physicochemical properties similar to those of AMP21 and AMP239.

The MD simulation results for the structural and essential dynamics of AMP21, which indicate the strength of AlphaFold in modeling predominantly helical structures, are shown in Fig. 3.

Fig. 3

Plots depicting the structural and essential dynamics of backbone atoms over time for AMP21. The panels represent: (i) RMSD, (ii) RMSF, (iii) Rg, (iv) SASA and (v) PCA for structures modeled using the 4 different algorithms, and (vi) FEL for the best modeling algorithm. The gmx rms, gmx rmsf, gmx gyrate and gmx sasa tools were used to obtain the RMSD, RMSF, Rg and SASA values, respectively; gmx covar and gmx anaeig were used to obtain the values for PC1 and PC2; and the Gibbs free energy values were extracted using gmx sham. The values were plotted using OriginPro 2016 (64-bit).

FEL plots of AMP21 modeled using the different modeling algorithms are compared in Figures S3 and S4 of Supplementary File 3.

For most other AMPs, AlphaFold yielded minimal PC variation and a compact free energy landscape, even where its RMSD, Rg and RMSF values were not the lowest. This indicates that AlphaFold tends to give more compact structures for short-length peptides, even if not the most stable dynamics over time.

For AMP21 and AMP239, the expected results match the observed results: AlphaFold performs excellently for both owing to their high stability (low instability index) and high helical content. Similar results were observed for AMP32, for which AlphaFold and PEP-FOLD yield compact structures. However, the Threading model of AMP32 showed the most stable dynamics, possibly because the comparatively lower helical content of AMP32 makes the helix less dominant.

Peptide structures best modeled by homology modeling

During MD simulations, the Homology Modeling structure of AMP218 exhibited the lowest RMSD (0.30 ± 0.02 nm) and a well-maintained compact structure (Rg 0.79 ± 0.06 nm). More importantly, Homology Modeling outperformed the other algorithms in almost all parameters, giving low and constant values not only for RMSD and Rg but also for SASA. It also displayed the minimum PC variation among the structures derived using the different algorithms. Both 2D and 3D FEL plots show that Homology Modeling gives the most compact structure for AMP218.

For our AMPs, the sequence identity of the templates found for Homology Modeling was high (60–100%), except for AMP218, whose template identity was 40.54%. Surprisingly, AMP218 was the only peptide for which Homology Modeling outperformed the other algorithms, even though its template identity was the lowest. This runs contrary to the usual trend for proteins, which are modeled effectively by Homology Modeling only when a good template is available. That Homology Modeling performed best for AMP218 despite a weak template suggests that factors other than the template, such as the physicochemical properties of the peptide, matter in Homology Modeling, especially for short-length peptides.
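Template identities such as the 40.54% quoted for AMP218 depend on how percent identity is defined. The sketch below computes identity over alignment columns in which neither sequence has a gap. This denominator is an assumption, since tools variously divide by the alignment length, the length of the shorter sequence or the number of gap-free columns; the example alignment and function name are hypothetical.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


# Hypothetical pairwise alignment: 3 identical of 4 gap-free columns.
print(percent_identity("AC-DE", "ACGDF"))  # 75.0
```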

It is important to note that highly disordered peptides like AMP218 and AMP2410 (as known from the DISO study) are modelled well by template-based modeling algorithms, namely, Homology Modeling and Threading respectively. The details of AMP2410 have been discussed later in the manuscript. The results representing the structural and essential dynamics for AMP218 can be seen in Fig. 4.

Fig. 4

Plots depicting the structural and essential dynamics of backbone atoms over time for AMP218. The panels represent: (i) RMSD, (ii) RMSF, (iii) Rg, (iv) SASA and (v) PCA for structures modeled using the 4 different algorithms, and (vi) FEL for the best modeling algorithm. The gmx rms, gmx rmsf, gmx gyrate and gmx sasa tools were used to obtain the RMSD, RMSF, Rg and SASA values, respectively; gmx covar and gmx anaeig were used to obtain the values for PC1 and PC2; and the Gibbs free energy values were extracted using gmx sham. The values were plotted using OriginPro 2016 (64-bit).

FEL plots of AMP218 modeled using the different modeling algorithms are compared in Figures S5 and S6 of Supplementary File 3.

To reiterate, of the 4 peptides studied for their disorder in the DISO analysis, two (AMP164 and AMP186) were best modeled using PEP-FOLD, while the remaining two (AMP218 and AMP2410) were more accurately modeled using template-based approaches. The two sets differ in that AMP164 and AMP186 (better modeled by PEP-FOLD) have very little helical secondary structure (0% and 28%, respectively) and 83% and 75% solvent accessibility in the exposed region, respectively, whereas AMP218 and AMP2410 (better modeled using template-based approaches) have more than 80% helical content and 100% solvent accessibility in the exposed region. Therefore, the MD simulation findings underscore PEP-FOLD’s specialization in handling disordered and flexible peptides (as seen from the physicochemical properties and DISO results), aligning dynamic adaptability with functional stability.

Interestingly, for AMP186, Homology Modeling gave a more compact structure, as seen from the minimum PC variation, low ΔG and the best 2D and 3D FEL plots, whereas PEP-FOLD gave more stable dynamics, performing best in all other parameters. Homology Modeling may give a more compact structure because of its 100% template identity, whereas PEP-FOLD was able to model the disordered regions well, giving more stable dynamics.

Similarly, for AMP207, Homology Modeling again gave a more compact structure, as seen from the minimum PC variation, low ΔG and the best 2D and 3D FEL plots, whereas PEP-FOLD gave more stable dynamics, performing best in all other parameters. This trend is similar to that of AMP186.

Correlating the physicochemical properties of AMP218, AMP207 and AMP186 showed that these peptides were more hydrophilic (more negative GRAVY values) than all the other peptides. Thus, Homology Modeling modeled the structures of the more hydrophilic peptides better than the other algorithms in the absence of a highly dominant secondary structure. Notably, for AMP218, Homology Modeling performed well despite the high instability index of the peptide, which is completely disordered and has the highest intrinsic disorder among all the peptides.
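GRAVY (grand average of hydropathy) is the mean Kyte-Doolittle hydropathy value over all residues of a sequence, so more negative values indicate more hydrophilic peptides. A minimal sketch follows; the example sequence is hypothetical and is not one of the peptides studied here.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence):
    """Grand average of hydropathy: mean Kyte-Doolittle value per residue."""
    seq = sequence.upper()
    unknown = set(seq) - set(KYTE_DOOLITTLE)
    if unknown:
        raise ValueError(f"non-standard residues: {sorted(unknown)}")
    return sum(KYTE_DOOLITTLE[aa] for aa in seq) / len(seq)


# Hypothetical lysine-rich sequence, used only to illustrate the calculation.
print(round(gravy("GIGKFLKK"), 3))
```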

The dynamic behaviour of gut-derived peptides: Threading-based models excel in dynamic stability

For AMP32, Rg does not differ considerably across the different structures. The Threading-based structure shows the most constant RMSD over time, and its RMSD is also among the lowest. Although the AlphaFold structure shows low RMSD values, these fluctuate strongly over time, indicating a less stable structure than the Threading model. RMSF does not differ considerably across the different structures. The Threading-based structure shows the lowest SASA values of all the algorithms. However, the 3D FEL is best for Homology Modeling, followed by AlphaFold. The 2D FEL shows a similar pattern, with the largest number of residues in the lower energy state for the AlphaFold-based structure. These results suggest that while Homology Modeling provides the most thermodynamically favorable state in terms of free energy, the Threading-based structure shows superior stability and compactness during the dynamic simulations.
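
The comparison above weighs two properties of each RMSD trace: how low it is and how constant it is. A minimal Python sketch of that bookkeeping follows, assuming the traces come from `gmx rms` as plain `.xvg` text (time, value). The ranking rule (standard deviation first, mean second), the optional equilibration cut-off `skip_fraction`, the function names, and the example numbers are illustrative assumptions, not the procedure or data used in this study.

```python
import statistics


def parse_xvg(lines):
    """Parse GROMACS .xvg text into (times, values).

    Lines starting with '#' (comments) or '@' (plot directives) are skipped.
    Only the first data column after time is kept (e.g. total Rg from gmx gyrate).
    """
    times, values = [], []
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped[0] in "#@":
            continue
        fields = stripped.split()
        times.append(float(fields[0]))
        values.append(float(fields[1]))
    return times, values


def summarize(values, skip_fraction=0.0):
    """Mean and population standard deviation, optionally dropping an initial window."""
    start = int(len(values) * skip_fraction)
    window = values[start:]
    return statistics.mean(window), statistics.pstdev(window)


def rank_by_stability(series_by_model, skip_fraction=0.0):
    """Rank models by fluctuation (SD) first and by mean value second; lower is better."""
    stats = {name: summarize(vals, skip_fraction) for name, vals in series_by_model.items()}
    return sorted(stats, key=lambda name: (stats[name][1], stats[name][0]))


# Made-up RMSD traces (nm); real ones could be read with a hypothetical file:
# times, values = parse_xvg(open("rmsd.xvg"))
rmsd = {
    "Threading": [0.30, 0.31, 0.30, 0.31, 0.30],
    "AlphaFold": [0.10, 0.40, 0.15, 0.45, 0.12],
}
print(rank_by_stability(rmsd))  # ['Threading', 'AlphaFold']
```

Ranking on fluctuation before magnitude mirrors the reading of AMP32 above, where a low but strongly fluctuating AlphaFold trace is judged less stable than a slightly higher, steady Threading trace.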

For AMP43, the lowest Rg values over time are shown by PEP-FOLD, followed by Threading. Threading shows the least fluctuation in Rg over time, whereas AlphaFold and Homology Modeling show large variations in Rg. Threading gives the most constant RMSD over time, although its RMSD values are higher than those of the PEP-FOLD and AlphaFold structures. Homology Modeling gives higher RMSF values, whereas the other three algorithms give comparatively lower values with no considerable differences in the magnitude of their RMSF fluctuations. Threading gives the lowest and most constant SASA over time. Interestingly, as for AMP32, the FEL analysis gives a contrasting result: the FEL plots of the other algorithms indicate more compact structures, and Threading shows the fewest residues in the lower energy state. Taken together, these insights emphasize the importance of integrating structural modeling and dynamics to fully understand the behavior and functional potential of AMPs, particularly gut-derived peptides.

For AMP2410, Rg does not differ considerably across the different structures. Threading gives the lowest and least fluctuating RMSD values. Threading also shows the lowest RMSF values, considerably lower than those of the other algorithms, whereas AlphaFold shows high fluctuation in RMSF. Both Threading and AlphaFold show low and constant SASA over the entire simulation, and the two perform equally well in the FEL analysis, showing the largest number of residues in the lower energy state. Thus, Threading can be prioritized here, as it performs best on most parameters.
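
Rg and RMSF, which the comparisons above rely on, follow simple definitions. The NumPy sketch below computes a mass-weighted Rg for one frame and per-atom RMSF over a trajectory. It assumes coordinates have already been extracted and, for RMSF, already superimposed on a common reference (a step `gmx rmsf` normally handles itself); the function names are hypothetical and this is not the GROMACS implementation.

```python
import numpy as np


def radius_of_gyration(coords, masses):
    """Mass-weighted radius of gyration of one frame.

    coords: (n_atoms, 3) positions; masses: (n_atoms,) atomic masses.
    """
    coords = np.asarray(coords, dtype=float)
    masses = np.asarray(masses, dtype=float)
    center = (masses[:, None] * coords).sum(axis=0) / masses.sum()  # center of mass
    sq_dist = ((coords - center) ** 2).sum(axis=1)                  # squared distance to COM
    return float(np.sqrt((masses * sq_dist).sum() / masses.sum()))


def rmsf(trajectory):
    """Per-atom RMSF of a trajectory of shape (n_frames, n_atoms, 3).

    Assumes frames are already superimposed on a common reference.
    """
    traj = np.asarray(trajectory, dtype=float)
    mean_pos = traj.mean(axis=0)                                     # time-averaged positions
    return np.sqrt(((traj - mean_pos) ** 2).sum(axis=2).mean(axis=0))


# Two equal-mass atoms 2 units apart: Rg = 1
print(radius_of_gyration([[0, 0, 0], [2, 0, 0]], [12.0, 12.0]))  # 1.0
# Atom 0 fixed, atom 1 alternating between x = 0 and x = 2
print(rmsf([[[0, 0, 0], [0, 0, 0]], [[0, 0, 0], [2, 0, 0]]]))    # [0. 1.]
```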

A novel aspect of this study is the unexpected performance of Threading for AMP2410, a peptide dominated by rigid helical structure, with high aromaticity and a positive charge. Because Threading approaches rely on identifying templates with similar structural folds, they are generally less effective for unique or specialized helical peptides such as AMP2410, which was therefore expected to be modeled better by algorithms such as AlphaFold or Modeller. The success of Threading in this case suggests that AMP2410 contains conserved motifs or structural features that closely resemble known templates.

Notably, AMP32, AMP43 and AMP2410 have among the most moderate GRAVY values of all the peptides, indicating a balanced hydrophobic-hydrophilic profile. The ProtParam results also indicated a dominant helical secondary structure for these AMPs. Thus, AMPs with a balanced hydrophobic-hydrophilic profile (indicated by a slightly positive GRAVY value) attained more stable dynamics over time when modeled using Threading than with other approaches. However, AlphaFold built more compact structures for these peptides. This underlines the distinct strengths of the two algorithms.

For AMP164, both Threading and PEP-FOLD give the most constant and lowest RMSD, Rg and SASA values over time. Both algorithms also perform best in RMSF, showing the smallest fluctuations at each residue position. Unlike AMP32, AMP2410 and AMP43, AMP164 lacks a dominant secondary structure. However, owing to its balanced hydrophobic-hydrophilic profile (indicated by a slightly positive GRAVY value of 0.04), the Threading algorithm was able to model its structure quite well.

Thus, peptides predicted to have a balanced hydrophobic-hydrophilic profile are modeled best by the Threading algorithm, and a dominant secondary structure gives such peptides an added advantage for being modeled appropriately by Threading.

The results representing the structural and essential dynamics for AMP2410 can be seen in Fig. 5.

Fig. 5

Plots depicting the structural and essential dynamics of backbone atoms over time for AMP2410. The panels show (i) RMSD, (ii) RMSF, (iii) Rg, (iv) SASA and (v) PCA for structures modeled using the four different algorithms, and (vi) the FEL for the best-performing modeling algorithm. The gmx rms, gmx rmsf, gmx gyrate and gmx sasa tools were used to obtain the RMSD, RMSF, Rg and SASA values, respectively. gmx covar and gmx anaeig were used to obtain the PC1 and PC2 values. The Gibbs free energy values were extracted using gmx sham. The values were plotted using OriginPro 2016 (64-bit).
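
The PC1 and PC2 values in panel (v) come from diagonalizing the covariance matrix of atomic fluctuations (`gmx covar`) and projecting each frame onto its leading eigenvectors (`gmx anaeig`). The NumPy sketch below illustrates that idea on a pre-fitted coordinate array. It is a simplified stand-in (no mass weighting, no fitting step, hypothetical function name), not a reimplementation of the GROMACS tools.

```python
import numpy as np


def pca_projection(trajectory, n_components=2):
    """Project trajectory frames onto the leading principal components.

    trajectory: (n_frames, n_atoms, 3) array, assumed already fitted to a
    reference so that overall rotation and translation are removed.
    Returns (projections of shape (n_frames, n_components), eigenvalues sorted descending).
    """
    traj = np.asarray(trajectory, dtype=float)
    n_frames = traj.shape[0]
    flat = traj.reshape(n_frames, -1)          # one 3N-dimensional vector per frame
    centered = flat - flat.mean(axis=0)        # fluctuations about the average structure
    cov = centered.T @ centered / n_frames     # 3N x 3N covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)     # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]          # reorder so PC1 has the largest variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    projections = centered @ eigvecs[:, :n_components]
    return projections, eigvals


# Toy trajectory: only the x coordinate of atom 0 moves
frames = np.zeros((3, 2, 3))
frames[:, 0, 0] = [-1.0, 0.0, 1.0]
proj, eigvals = pca_projection(frames)
print(proj[:, 0], eigvals[0])
```

The sign of each eigenvector is arbitrary, so projections onto a PC may appear mirrored between runs or tools without any change in the underlying motion.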

Comparisons of the FEL plots of AMP2410 modeled using the different algorithms are shown in Figures S7 and S8 of Supplementary File 3. The MD simulation results for AMP32, AMP43, AMP164, AMP175, AMP207 and AMP239 are shown in Figures S9-S26 of Supplementary File 3.

Overall algorithmic suitability for gut-derived peptides (PART-II): as per MD simulation results, Ramachandran plot, and VADAR analysis

The overall algorithmic suitability for all the peptides studied is summarized in Table 7. The table highlights the algorithmic strengths for each peptide and explains deviations from the previous analyses.

Table 7 Comparative summary of algorithmic performance for all AMPs as validated by MD simulations (expected vs. observed part-II).

Peptide profiling

The overall static (VADAR and Ramachandran) and dynamic (MD simulation) analyses for each peptide are summarized as a peptide profile in Supplementary File 4.

Identifying the strengths of different modeling algorithms

A closer look at the results shows that Threading and AlphaFold each have their own strengths and weaknesses. Most importantly, these two algorithms complement each other for more hydrophobic peptides (less negative GRAVY values), and combining them may yield better models of short-length peptides. This is because Threading shows stable dynamics, with the most constant RMSD, Rg and SASA values over time for several peptides, whereas AlphaFold gives compact structures, as seen in the FEL plots, where the low-energy residues lie close to each other and many of them fall in the blue region. Thus, a combination of the two may prove highly beneficial for obtaining compact peptide models with stable dynamics for further in silico studies.

On the other hand, for more hydrophilic peptides (more negative GRAVY values), Homology Modeling and PEP-FOLD were seen to complement each other.
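
The two complementary pairings described above can be written as a small lookup keyed on GRAVY. The sketch below is purely illustrative: the study separates "more hydrophobic" from "more hydrophilic" peptides qualitatively, so the numeric cut-off `GRAVY_CUTOFF` and the names `Suggestion` and `suggest_combination` are hypothetical.

```python
from typing import NamedTuple


class Suggestion(NamedTuple):
    dynamics: str     # model expected to give the most stable MD behaviour
    compactness: str  # model expected to give the most compact structure


# Hypothetical cut-off for illustration only: the study describes peptides as
# "more" or "less" hydrophilic without stating a numeric GRAVY boundary.
GRAVY_CUTOFF = -0.5


def suggest_combination(gravy: float, cutoff: float = GRAVY_CUTOFF) -> Suggestion:
    """Map a GRAVY value to the complementary algorithm pair described in the text."""
    if gravy < cutoff:
        # More hydrophilic: PEP-FOLD for dynamics, Homology Modeling for compactness
        return Suggestion(dynamics="PEP-FOLD", compactness="Homology Modeling")
    # Less hydrophilic (more hydrophobic or balanced): Threading + AlphaFold
    return Suggestion(dynamics="Threading", compactness="AlphaFold")


print(suggest_combination(-1.2))
# Suggestion(dynamics='PEP-FOLD', compactness='Homology Modeling')
```

GRAVY alone is a simplification; the suitability analysis in this study also weighs helical content and disorder.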

Fig. 6

FEL and RMSD analysis of AMP43, AMP164, AMP2410 and AMP32. (a) FEL analysis of AlphaFold and Threading for AMP43. (b) FEL analysis of AlphaFold and Threading for AMP164. (c) FEL analysis of AlphaFold and Threading for AMP2410. (d) FEL analysis of AlphaFold and Threading for AMP32. (e) RMSD analysis of different modeling algorithms for AMP43. (f) RMSD analysis of different modeling algorithms for AMP164. (g) RMSD analysis of different modeling algorithms for AMP2410. (h) RMSD analysis of different modeling algorithms for AMP32. The gmx rms, gmx rmsf, gmx gyrate and gmx sasa tools were used to obtain the RMSD, RMSF, Rg and SASA values, respectively. gmx covar and gmx anaeig were used to obtain the PC1 and PC2 values. The Gibbs free energy values were extracted using gmx sham. The values were plotted using OriginPro 2016 (64-bit). The final image inclusive of all the figures was designed using Canva (https://www.canva.com/).

Figures 6, 7 and 11 summarize our results, indicating the benefits of using integrated approaches for peptide modeling.

Fig. 7

Strengths of different algorithms for (a) more hydrophobic peptides and (b) more hydrophilic peptides. The gmx rms, gmx rmsf, gmx gyrate and gmx sasa tools were used to obtain the RMSD, RMSF, Rg and SASA values, respectively. gmx covar and gmx anaeig were used to obtain the PC1 and PC2 values. The Gibbs free energy values were extracted using gmx sham. The values were plotted using OriginPro 2016 (64-bit). The final image inclusive of all the figures alongside the contextual information was designed using Canva (https://www.canva.com/).

Validation using control peptides with pre-defined experimental structures

To ensure the reliability of our protocol and findings, we included a set of four experimentally validated, stable peptides as controls. These control peptides, whose structures have been shown experimentally to be stable, retained similar stability in the Ramachandran plot, VADAR and MD simulation analyses. The X-ray diffraction or NMR validation reports of these peptides (taken from the PDB) were compared with our results for these structures. Note that validation reports for 1LYP and 1VII were not available in the PDB.

For the control peptides, the FASA (fractional accessible surface area) was below 1, indicating that their residues are buried and not highly exposed to the solvent, thereby forming a stable and compact structure. In contrast, for many of the modeled AMPs, FASA exceeded 1 for several residues. Figure 8 shows the per-residue FASA for the experimentally validated peptides used in this study. The FASA plots for the test peptides (predicted AMPs) are provided in Supplementary File 5.
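
FASA relates each residue's solvent-accessible surface area to a reference value for the same residue type, so values above 1 flag residues that are more exposed than the reference. A minimal sketch of that ratio follows. It assumes per-residue ASA values have already been computed (for example by VADAR); the reference values in the usage example are placeholders, not the reference set VADAR uses, and the function names are hypothetical.

```python
def fractional_asa(residue_asa, reference_asa):
    """Per-residue fractional ASA = observed ASA / reference ASA of that residue type.

    residue_asa: list of (residue_name, asa) pairs in sequence order (Å^2).
    reference_asa: dict mapping residue_name -> reference ASA (Å^2).
    """
    return [(name, asa / reference_asa[name]) for name, asa in residue_asa]


def flag_exposed(fasa, threshold=1.0):
    """Indices of residues whose fractional ASA exceeds the threshold."""
    return [i for i, (_, value) in enumerate(fasa) if value > threshold]


# Placeholder reference values for illustration only (not VADAR's reference set)
reference = {"ALA": 110.0, "GLY": 85.0, "LYS": 200.0}
observed = [("GLY", 42.5), ("LYS", 210.0), ("ALA", 55.0)]
fasa = fractional_asa(observed, reference)
print(flag_exposed(fasa))  # [1]
```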

The Ramachandran plots of the controls showed few or no residues in the disallowed region (Fig. 9a-b), consistent with the X-ray diffraction or NMR validation reports of 1IJV and 1L2Y. In contrast, as shown above, several of the modeled AMPs had a higher number of residues in the disallowed region of the Ramachandran plot.
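
A Ramachandran plot places each residue by its backbone torsion angles, φ (C(i-1)-N-CA-C) and ψ (N-CA-C-N(i+1)). The sketch below computes these signed dihedrals from atomic coordinates using the standard atan2 formulation. Classifying the resulting (φ, ψ) pairs into allowed and disallowed regions, as PROCHECK does, relies on empirically derived region maps that are not reproduced here; the function names are hypothetical.

```python
import math


def _sub(a, b):
    return [a[i] - b[i] for i in range(3)]


def _dot(a, b):
    return sum(a[i] * b[i] for i in range(3))


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees, in [-180, 180]) defined by four points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = [c / norm for c in b1]                      # unit vector along the central bond
    # Components of b0 and b2 perpendicular to the central bond
    v = _sub(b0, [_dot(b0, b1) * c for c in b1])
    w = _sub(b2, [_dot(b2, b1) * c for c in b1])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def phi_psi(c_prev, n, ca, c, n_next):
    """Backbone phi (C(i-1)-N-CA-C) and psi (N-CA-C-N(i+1)) of residue i."""
    return dihedral(c_prev, n, ca, c), dihedral(n, ca, c, n_next)


# Toy geometry: the last point is rotated a quarter turn about the central bond
print(round(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)), 3))  # 90.0
```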

We further tested, by performing MD simulations, whether a good Ramachandran plot with zero outliers necessarily guarantees structural stability.

Finally, all the controls (except CAP18) showed extremely low and consistent RMSD and Rg values. The free energy landscapes of the controls further support our study: many residues leave the red region (high-energy state) and enter the lower energy state (attaining stability), and adjacent residues consistently leave the red region together, indicating a compact structure (Fig. 10). In contrast, when the AMP models were tested using MD simulations, they showed high RMSD values and structural fluctuations in Rg, consistent with their intrinsically disordered or flexible nature, as discussed in previous sections. This reinforces our earlier assertion that the modeled structures of many short-length AMPs obtained with the available algorithms are not well-defined 3D structures under aqueous conditions, and that prediction algorithms must be carefully evaluated for such cases.
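
The free energy landscapes discussed above turn the joint distribution of PC1 and PC2 into a Gibbs free energy, G = -k_B T ln(P/P_max), which is the idea behind `gmx sham`. The NumPy sketch below applies that relation to a 2D histogram. The bin count and the 300 K temperature are illustrative defaults, not the simulation settings of this study, and empty bins are left undefined (NaN) rather than capped as plotting tools often do; the function name is hypothetical.

```python
import numpy as np

K_B = 0.0083144626  # Boltzmann constant in kJ/(mol*K), i.e. the molar gas constant


def free_energy_landscape(pc1, pc2, bins=30, temperature=300.0):
    """Gibbs free energy over a 2D histogram: G = -k_B T ln(P / P_max), in kJ/mol.

    The most populated bin is set to 0; empty bins are returned as NaN.
    """
    hist, xedges, yedges = np.histogram2d(pc1, pc2, bins=bins)
    prob = hist / hist.sum()
    with np.errstate(divide="ignore"):               # log(0) for empty bins
        g = -K_B * temperature * np.log(prob / prob.max())
    g[hist == 0] = np.nan
    return g, xedges, yedges


# Random projections standing in for real PC1/PC2 data
rng = np.random.default_rng(0)
g, xedges, yedges = free_energy_landscape(rng.normal(size=5000), rng.normal(size=5000))
print(np.nanmin(g), np.nanmax(g))
```

Low-G bins correspond to the blue, low-energy basins described in the text; a single deep, contiguous basin is what the compact control structures show.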

For CAP18, whose Ramachandran plot showed zero outliers, the MD simulation did not show very stable dynamics. This indicates that MD simulation is necessary to establish whether a structure is stable, and that Ramachandran or VADAR analysis alone is not sufficient. For this reason, the findings of this study are based not only on the Ramachandran and VADAR analyses but also on MD simulation, which carries the greatest weight.

The fact that the experimentally validated structures 1IJV and 1L2Y (well established as stable peptides in previous studies) showed the same characteristics in our in silico protocol as in their X-ray diffraction or NMR validation reports supports the reliability of our protocol.

Fig. 8

FASA (fractional accessible surface area) per residue for the experimentally stable peptides (a) 1IJV, (b) 1L2Y, (c) 1LYP and (d) 1VII. Except for the N-terminal and C-terminal residues, all residues were buried deep in the structure and thus under-exposed to the solvent. The plots were generated using VADAR version 1.8.

Fig. 9

Ramachandran plot analysis for experimentally stable peptides. (a) PROCHECK analysis of Ramachandran outliers (residues in disallowed regions) shows results similar to the PDB structure validation reports for 1IJV and 1L2Y. Negligible outliers were present in both 1IJV and 1L2Y, indicating stable structures. No PDB structure validation report was available for 1LYP and 1VII; PROCHECK found no outliers in either, further indicating structural stability. (b) Comparison of the PROCHECK Ramachandran plots and the PDB structure validation reports for 1IJV and 1L2Y, reflecting the validity of the PROCHECK results. The Ramachandran plots were generated using PROCHECK v.3.5.4. The final image inclusive of all the figures was designed using Canva (https://www.canva.com/).

Fig. 10

MD simulation results for the experimentally determined peptides. (a) Free energy landscape (FEL) of (i) 1IJV, (ii) 1L2Y, (iii) 1LYP and (iv) 1VII. (b) Root mean square deviation (RMSD) of (i) 1IJV, (ii) 1L2Y, (iii) 1LYP and (iv) 1VII. (c) Radius of gyration (Rg) of (i) 1IJV, (ii) 1L2Y, (iii) 1LYP and (iv) 1VII. The gmx rms and gmx gyrate tools were used to obtain the RMSD and Rg values, respectively. gmx covar and gmx anaeig were used to obtain the PC1 and PC2 values. The Gibbs free energy values were extracted using gmx sham. The values were plotted using OriginPro 2016 (64-bit). The final image inclusive of all the figures alongside the contextual information was designed using Canva (https://www.canva.com/).

Major findings

In our study, most AMPs identified from the human gut were found to be not only cationic but also hydrophilic (9 of 10 had a negative GRAVY value). This contrasts with previous reports stating that most naturally produced AMPs are hydrophobic2. To verify this, we examined the physicochemical properties of all 2619 initially identified AMPs and found that approximately 69.71% of the peptides in our dataset were hydrophilic (GRAVY < 0). Peptides containing unnatural amino acids were excluded from these calculations. Details of the physicochemical properties are provided in Supplementary File 6.
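
GRAVY, as reported by ProtParam, is the mean Kyte-Doolittle hydropathy of a sequence. The sketch below computes it and the fraction of peptides with GRAVY < 0, skipping any sequence that contains a residue code outside the 20 standard amino acids as a simple stand-in for the exclusion of unnatural amino acids. It is not the exact pipeline behind the 69.71% figure, and the function names are hypothetical.

```python
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
STANDARD = frozenset(KYTE_DOOLITTLE)  # the 20 standard one-letter codes


def gravy(sequence):
    """Grand average of hydropathy: mean Kyte-Doolittle value over all residues."""
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence) / len(sequence)


def hydrophilic_fraction(sequences):
    """Return (fraction with GRAVY < 0, number of sequences kept).

    Sequences containing any code outside the 20 standard residues are skipped.
    """
    kept = [s.upper() for s in sequences if s and set(s.upper()) <= STANDARD]
    if not kept:
        return 0.0, 0
    n_hydrophilic = sum(1 for s in kept if gravy(s) < 0)
    return n_hydrophilic / len(kept), len(kept)


# Toy sequences; "KXK" is skipped because X is not a standard residue code
print(hydrophilic_fraction(["KKK", "AAA", "KXK", "RE"]))  # (0.6666666666666666, 3)
```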

Superimposing the structures of each peptide modeled using the different algorithms revealed large deviations, indicated by extremely high RMSD values. This means that the algorithms produce substantially different structures for the same peptide sequence.
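
Comparing models of the same sequence requires superimposing them before computing RMSD. The superposition tool is not named in this section, so the sketch below uses the generic Kabsch algorithm (SVD of the cross-covariance matrix) in NumPy, assuming a one-to-one atom correspondence such as matched CA atoms, which holds for models that share a sequence. The function name is hypothetical.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate sets after optimal rigid superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    P = P - P.mean(axis=0)                      # remove translation
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                                 # 3x3 cross-covariance matrix
    U, S, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))      # correct for a possible reflection
    D = np.diag([1.0, 1.0, d])
    R = Vt.T @ D @ U.T                          # optimal rotation taking P onto Q
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))


# A small point set and a copy rotated 90 degrees about z and shifted
P = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
Q = np.array([[0, 0, 0], [0, 1, 0], [-1, 0, 0], [0, 0, 1]], dtype=float) + 5.0
print(round(kabsch_rmsd(P, Q), 6))  # 0.0
```

Because rotation and translation are removed first, any remaining RMSD between two models reflects genuine differences in fold rather than placement in space.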

Based on correlations with the physicochemical properties of the peptides, our study shows that these properties have a major impact on predicting a peptide's structure from its sequence. Further, different modeling algorithms are better suited to different types of peptide sequences. Additionally, some peptides may be modeled appropriately by more than one algorithm, each with its own strengths and weaknesses, which points toward an integrated approach.

Analysis of the Ramachandran plots reveals that AlphaFold-modeled peptide structures have the most residues in the allowed region compared with those obtained by the other algorithms. In contrast, Threading-based structures perform worst in the Ramachandran analysis, with the fewest residues in the allowed region for most peptides. The VADAR results show the weakness of template-based methods in producing stable peptide models, as they give a higher percentage of coil (indicative of lower stability). However, Threading-based structures formed a considerable number of H-bonds, whereas Homology Modeling formed negligible H-bonds. This considerable number of H-bonds might be one reason why Threading-based structures, although less compact, show stable dynamics over time in MD simulation. AlphaFold leads to the largest number of H-bonds, in addition to the lowest H-bond energy. The MD simulation results reveal that PEP-FOLD performs best for most structures, considering the disorder and instability associated with short-length peptides. AlphaFold performs best for AMPs with higher stability (low instability index) and higher helical content; even so, it generated highly compact structures for the largest number of peptides, highlighting its major advantage. Threading provides stable dynamics for four AMPs with higher GRAVY values, showcasing its biggest strength: keeping the modeled structures of such peptides stable over time. For such AMPs (those with higher GRAVY values), an integrated approach combining the stable dynamics of Threading with the compact structure of the AlphaFold-derived model might work best.

While PEP-FOLD performs best for most peptides, two peptides longer than 25 amino acids are modeled better by template-based approaches, namely Threading and Homology Modeling. A potential reason is the arrangement of their disordered regions: the AMPs modeled well by PEP-FOLD have all their DISO regions exposed, whereas those modeled better by template-based approaches contain not only exposed regions but also regions that are partially or completely buried. That Homology Modeling performs best for AMP218 even though its template was not ideal indicates that, apart from the template, other factors matter during Homology Modeling, such as the physicochemical properties of the peptide, especially for short-length peptides. In addition, Homology Modeling gives the most compact structures for peptides with more negative GRAVY values, whereas PEP-FOLD gives the most stable dynamics for them. This further points to the need for combinatorial approaches.

Overall, our findings suggest that a combination of modeling algorithms is necessary to study both the stability and the dynamic behaviour of short-length peptides. For most short-length peptides, PEP-FOLD appears to produce the most dynamically stable structures. While AlphaFold performs best for peptides with defined helical or structured regions, PEP-FOLD is the most effective for highly disordered or flexible peptides, even those with little or no helical structure. Homology Modeling appears suitable for moderately stable peptides with available templates. Most interestingly, Threading, which otherwise shows average performance, excels in modeling peptides with a balanced hydrophilic-hydrophobic profile. The study also suggests that Threading might perform well for peptides with conserved structural motifs (as inferred for AMP2410), even when the peptide has defined helical or structured regions that would be expected to favor algorithms such as AlphaFold. Figure 11(a) shows the algorithmic suitability of the short-length peptides in our study based on their GRAVY values. Finally, in Fig. 11(b), we propose combinations of modeling algorithms for obtaining better structures of short-length peptides.

Further, it is important to remember that the template-based algorithms (Modeller, I-TASSER) sometimes yield non-helical or mixed secondary structure models, particularly when the peptide aligns to fragments of larger template proteins. This arises from an inherent limitation of homology modeling and threading methods: the resulting 3D model strongly reflects the structural context of the template, which may be a small segment of a larger folded protein with complex topology rather than a standalone peptide.

In such cases, the peptide’s biologically relevant fold may not be accurately recapitulated unless a highly similar peptide template exists in the database. Consequently, template-derived models may exhibit higher RMSD and Rg values during MD simulations, reflecting structural instability or unfolding, especially in the absence of the full template context.

This phenomenon highlights a key limitation of template-based prediction for short, flexible peptides and supports our overall conclusion that template-free (de novo) approaches may be more reliable for modeling short AMPs with weak homology to structured domains. For the same reason, not all peptides with available templates in this study are best modelled by template-based algorithms like Homology Modeling and Threading.

Fig. 11

(a) Algorithmic suitability of short-length peptides based on their GRAVY value. (b) Proposed combinations of algorithms for better modeling of short-length peptides. The image was designed using the Canva software (https://www.canva.com/).

Novel insights on gut AMPs

Our study yielded potentially novel insights on gut AMPs that, to our knowledge, have not been explicitly reported before (Table 8).

Table 8 Insights on gut-derived AMPs.

Conclusion and future perspectives

This study highlights the importance of computational modeling for better understanding short-length peptides. We show that different modeling algorithms excel for peptides with different properties: AlphaFold more reliably predicts structured peptides, PEP-FOLD handles disordered regions well, Homology Modeling works well for moderately stable peptides, and Threading excels for peptides with balanced hydrophilic-hydrophobic profiles or conserved motifs. Most interestingly, we identified synergistic strengths in combining algorithms according to peptide characteristics. For more hydrophobic peptides, integrating Threading and AlphaFold offers the best results by leveraging Threading’s dynamic stability and AlphaFold’s ability to generate compact structures. Conversely, for more hydrophilic peptides, combining PEP-FOLD and Homology Modeling balances the dynamic flexibility and structural compactness that each provides, respectively, thereby better addressing the intrinsic properties and disorder of the peptides. Our research also reveals another novel finding: most gut-derived AMPs in our dataset were hydrophilic, in contrast to previous reports that most natural AMPs are hydrophobic. We suppose that the gut microbiome might drive the evolution of hydrophilic AMPs to prevent aggregation, enhance solubility and ensure effective antimicrobial activity in the aqueous environment of the gut48.

Future work should focus on developing hybrid algorithms that integrate the strengths of multiple approaches, tailored to peptide-specific properties. Incorporating experimental validation of computational predictions will strengthen the utility of these models, bridging in silico and real-world applications. Furthermore, expanding datasets to include diverse peptide sources and properties will enhance model robustness and accuracy. These advancements in AMP modeling could pave the way for rational design of novel antimicrobial therapeutics, providing innovative solutions to combat antimicrobial resistance and establish effective alternatives to traditional antibiotics. The integration of computational precision with biological insights offers a promising avenue for future research and therapeutic development. Lastly, our study also indicates the need to study the evolution of AMPs in the human gut to examine their changing behaviour.