Table 1 Description of the datasets used for analysis and DNN training.
From: An in silico deep learning approach to multi-epitope vaccine design: a SARS-CoV-2 case study
Dataset | Number of peptides | Description |
|---|---|---|
T | 2000 | Known T-cell epitopes with both MHC-1 and MHC-2 binders collected from the IEDB database. Used for creating the vaccine datasets |
B | 5000 | Known B-cell epitopes collected from the IEDB database. Used for creating the vaccine datasets |
Protective antigens | 300 | Known viral protective antigens collected from both the IEDB database and previous work. Used for training a DNN to identify protective antigens, which is then used to sieve the positive vaccine dataset out of the Cartesian products |
Cartesian products | 2000 × 5000 × 2 | The Cartesian products T × B and B × T. These comprise all peptides generated from the T and B datasets, so that each peptide contains at least one T-cell epitope and one B-cell epitope |
NT | 2000 | 2000 peptides which are not T-cell epitopes |
NB | 5000 | 5000 peptides which are not B-cell epitopes |
N protective antigens | 300 | 300 peptides which are not viral protective antigens |
Positive vaccine dataset | 706,970 | Sieved out of the Cartesian products using the DNN trained on the protective antigen datasets. Each peptide in this dataset contains at least one T-cell epitope and one B-cell epitope, and the whole sequence is predicted to be a protective antigen. Used for training the DNN to predict vaccine subunits |
Negative vaccine dataset | 706,970 | The negative dataset for training the DNN to predict vaccine subunits. Each peptide in this dataset either does not contain both a T-cell epitope and a B-cell epitope, or is predicted to be a non-protective antigen |
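
The size given in the Cartesian products row (2000 × 5000 × 2) can be illustrated with a minimal Python sketch. It assumes each candidate peptide is formed by directly joining one T-cell epitope and one B-cell epitope in both orders; the table does not state how the epitopes are joined, so the function name `cartesian_constructs`, the optional `linker` parameter, and the toy sequences are hypothetical and used only for illustration.

```python
from itertools import product

def cartesian_constructs(t_epitopes, b_epitopes, linker=""):
    """Yield every T-B and B-T pairing as one joined sequence.

    Assumption: epitopes are joined directly (empty linker); the
    source table does not specify the joining scheme.
    """
    for t, b in product(t_epitopes, b_epitopes):
        yield t + linker + b  # T x B ordering
        yield b + linker + t  # B x T ordering

# Toy sequences invented for illustration, not real epitopes
toy_t = ["AAAK", "CCDE"]
toy_b = ["GGHI", "LLMN", "PPQR"]

constructs = list(cartesian_constructs(toy_t, toy_b))
print(len(constructs))  # 2 * 3 * 2 = 12

# Same arithmetic at the dataset sizes listed in the table
full_size = 2000 * 5000 * 2
print(full_size)  # 20000000
```

At the listed dataset sizes this gives 20,000,000 candidate peptides, so the 706,970 peptides in the positive vaccine dataset correspond to roughly 3.5% of the candidates.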