Extended Data Fig. 1: Length distribution of datasets and model convergence during training.
From: Identification of antimicrobial peptides from the human gut microbiome using deep learning

a, Length distributions of sequences in the three training sets (Train AMPs: training set of AMP sequences; Train Non-AMPs: training set of non-AMP sequences with a number of sequences similar to that of Train AMPs; Train 10Non-AMPs: training set of non-AMP sequences with 10 times the number of sequences of Train AMPs). The length distributions of the training data are matched across the three sets. The colored squares indicate the different length distributions. The plot was generated with http://www.bioinformatics.com.cn. b, Training loss of the different models. The Attention and LSTM models converged within 100–200 training epochs, whereas the BERT model required a larger number of epochs to converge. c, Length distribution of the 2,349 candidate AMPs identified from the metagenomic cohorts in our study.
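For readers who wish to compare sequence-length distributions of their own training sets locally (the plot in panel a was produced with the web tool cited above, not with this code), a minimal sketch in Python follows. The FASTA file names and the 5-residue histogram binning are hypothetical assumptions, not taken from the paper.

```python
# Minimal sketch: overlay sequence-length distributions of three
# training sets read from FASTA files. File names below are
# hypothetical placeholders.
import matplotlib.pyplot as plt


def read_fasta_lengths(path):
    """Return the length of every sequence in a FASTA file."""
    lengths, seq = [], []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if seq:
                    lengths.append(len("".join(seq)))
                    seq = []
            elif line:
                seq.append(line)
    if seq:
        lengths.append(len("".join(seq)))
    return lengths


# Hypothetical input files for the three training sets.
datasets = {
    "Train AMPs": "train_amps.fasta",
    "Train Non-AMPs": "train_non_amps.fasta",
    "Train 10Non-AMPs": "train_10non_amps.fasta",
}

for label, path in datasets.items():
    lengths = read_fasta_lengths(path)
    # density=True normalizes each histogram, so sets of very
    # different sizes (e.g. 1x vs 10x non-AMPs) remain comparable.
    plt.hist(lengths, bins=range(0, 101, 5), alpha=0.5,
             density=True, label=label)

plt.xlabel("Sequence length (aa)")
plt.ylabel("Fraction of sequences")
plt.legend()
plt.savefig("length_distribution.png", dpi=300)
```

Normalized (density) histograms are used here because the 10Non-AMPs set is ten times larger than the other two; raw counts would obscure whether the length distributions themselves are matched.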