
Extended Data Fig. 1: Length distribution of datasets and model convergence during the training stage.

From: Identification of antimicrobial peptides from the human gut microbiome using deep learning


a, The length distributions of sequences in the three training sets (Train AMPs: training set of AMP sequences; Train Non-AMPs: training set of non-AMP sequences with a similar number of sequences to Train AMPs; Train 10Non-AMPs: training set of non-AMP sequences with 10 times the number of sequences in Train AMPs) are matched. The colored squares indicate the different length distributions. The plot was generated with http://www.bioinformatics.com.cn. b, Training loss of the different models. The attention and LSTM models converged within 100-200 epochs, whereas BERT required a higher number of epochs to converge. c, Length distribution of the 2,349 candidate AMPs from the metagenomic cohorts in our study.
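As a minimal sketch (not the authors' code) of how the per-dataset sequence length distributions summarized in panel a could be computed, the snippet below tallies sequence lengths from FASTA files; the file names are hypothetical placeholders for the Train AMPs, Train Non-AMPs and Train 10Non-AMPs sets.

```python
# Sketch: count sequence lengths per training set (file paths are hypothetical).
from collections import Counter

def fasta_lengths(path):
    """Return a list of sequence lengths parsed from a FASTA file."""
    lengths, current = [], 0
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if current:
                    lengths.append(current)
                current = 0
            else:
                current += len(line)
        if current:
            lengths.append(current)
    return lengths

datasets = {
    "Train AMPs": "train_amps.fasta",              # hypothetical path
    "Train Non-AMPs": "train_non_amps.fasta",      # hypothetical path
    "Train 10Non-AMPs": "train_10non_amps.fasta",  # hypothetical path
}

for name, path in datasets.items():
    dist = Counter(fasta_lengths(path))
    # Print the counts for the ten shortest lengths as a quick summary.
    print(name, sorted(dist.items())[:10])
```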
