Extended Data Fig. 1: The performance and evaluation of transfer models on the training, validation, and test set. | Nature Microbiology


From: A generative artificial intelligence approach for the discovery of antimicrobial peptides against multidrug-resistant bacteria


a, Performance of AMPSorter on the training and validation datasets across epochs. b, c, AUC (b) and AUPRC (c) of AMPSorter on the test set. d, e, UMAP visualization of the training, validation and test sets (d) and the benchmarking set (e) using k-mer encoding (k = 3). Each point represents a sequence, with its position determined by reducing the high-dimensional k-mer feature space to two dimensions. tr-AMP: AMPs in the training set; tr-Non-AMP: Non-AMPs in the training set; v-AMP: AMPs in the validation set; v-Non-AMP: Non-AMPs in the validation set; te-AMP: AMPs in the test set; te-Non-AMP: Non-AMPs in the test set; b-AMP: AMPs in the benchmarking set; b-Non-AMP: Non-AMPs in the benchmarking set. f, Comparison of FCD values between the AMP test set and the benchmarking set. The FCD values were calculated by comparing the test set and the benchmarking set against known AMPs using k-mer encoding (k = 3). g, Performance of BioToxiPept on the training and validation datasets across epochs. h, Training process of AMPGenix. The blue solid line indicates the raw loss values, and the black solid line depicts the smoothed loss values. i, The ratio of sequences containing UAAs generated by AMPGenix at different temperature parameters, predicted as AMP and Non-AMP by AMPSorter.
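The k-mer encoding (k = 3) used for panels d to f can be illustrated with a minimal sketch. The caption does not specify how the k-mer features were computed, so everything here is an assumption made for illustration: the 20-letter standard amino acid alphabet, the normalization of counts to frequencies, and the function names `kmer_vocabulary` and `kmer_encode`. Each peptide becomes a vector with one entry per possible 3-mer (20^3 = 8,000 dimensions), which is the kind of high-dimensional feature space a method such as UMAP would then reduce to two dimensions.

```python
from collections import Counter
from itertools import product

# Assumption: only the 20 standard amino acids are part of the vocabulary.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_vocabulary(k=3, alphabet=AMINO_ACIDS):
    """Return all possible k-mers over the alphabet in a fixed order."""
    return ["".join(p) for p in product(alphabet, repeat=k)]


def kmer_encode(sequence, k=3, alphabet=AMINO_ACIDS):
    """Encode a peptide as a vector of k-mer frequencies.

    Hypothetical helper: overlapping windows of length k are counted,
    windows containing characters outside the alphabet are skipped,
    and counts are divided by the number of counted windows.
    """
    index = {kmer: i for i, kmer in enumerate(kmer_vocabulary(k, alphabet))}
    vector = [0.0] * len(index)
    windows = [sequence[i:i + k] for i in range(len(sequence) - k + 1)]
    counts = Counter(w for w in windows if w in index)
    total = sum(counts.values())
    for kmer, count in counts.items():
        vector[index[kmer]] = count / total
    return vector


# Usage: one feature vector per sequence, stacked row-wise, would form
# the input matrix for a dimensionality reduction step.
features = [kmer_encode(seq) for seq in ["AAAC", "KKLLKK"]]
```

Sequences shorter than k, or containing only characters outside the assumed alphabet, yield an all-zero vector in this sketch; a real pipeline would need an explicit policy for such cases.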

