Table 1 Description of the datasets used for analysis and DNN training.

From: An in silico deep learning approach to multi-epitope vaccine design: a SARS-CoV-2 case study

| Dataset | Number of peptides | Description |
|---|---|---|
| T | 2000 | Known T-cell epitopes with both MHC-1 and MHC-2 binders, collected from the IEDB database. Used for creating the vaccine datasets |
| B | 5000 | Known B-cell epitopes collected from the IEDB database. Used for creating the vaccine datasets |
| Protective antigens | 300 | Known viral protective antigens collected from both the IEDB database and previous work. Used for training a DNN to identify protective antigens, so that the positive vaccine dataset can be sieved out of the Cartesian products |
| Cartesian products | 2000 × 5000 × 2 | The Cartesian products T × B and B × T. They include every peptide that can be generated from the T and B datasets, so each peptide contains at least one T-cell epitope and one B-cell epitope |
| NT | 2000 | 2000 peptides which are not T-cell epitopes |
| NB | 5000 | 5000 peptides which are not B-cell epitopes |
| N protective antigens | 300 | 300 peptides which are not viral protective antigens |
| Positive vaccine dataset | 706,970 | Sieved out from the Cartesian products using the DNN trained on the protective antigen datasets. Each peptide in this dataset contains at least one T-cell epitope and one B-cell epitope, and the whole sequence is predicted to be a protective antigen. Used for training the DNN to predict vaccine subunits |
| Negative vaccine dataset | 706,970 | The negative dataset for training the DNN to predict vaccine subunits. Each peptide in this dataset either does not contain both a T-cell epitope and a B-cell epitope, or is predicted to be a non-protective antigen |
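
The "Cartesian products" row and the sieving step that yields the positive vaccine dataset can be illustrated with a minimal Python sketch. It assumes that a candidate peptide is formed by directly concatenating one T-cell epitope and one B-cell epitope in either order, with no linker; the table does not specify how the sequences are joined. The function names, the toy sequences, and the `is_protective` predicate are hypothetical stand-ins, and the predicate is only a placeholder for the trained protective-antigen DNN.

```python
from itertools import product


def cartesian_peptides(t_epitopes, b_epitopes):
    """Join every T-cell epitope with every B-cell epitope in both orders.

    Assumption: plain concatenation without a linker sequence.
    """
    tb = [t + b for t, b in product(t_epitopes, b_epitopes)]  # T x B
    bt = [b + t for b, t in product(b_epitopes, t_epitopes)]  # B x T
    return tb + bt


def sieve(peptides, is_protective):
    """Keep only the peptides that the supplied predicate accepts."""
    return [p for p in peptides if is_protective(p)]


# Toy sequences invented for illustration; they are not real epitopes.
t_toy = ["AAA", "CCC"]
b_toy = ["DDD", "EEE", "FFF"]

candidates = cartesian_peptides(t_toy, b_toy)
print(len(candidates))  # 2 * 3 * 2 = 12

# Placeholder predicate standing in for the protective-antigen DNN.
positives = sieve(candidates, lambda p: p.startswith("A"))
print(positives)
```

With the full T and B datasets the same construction gives 2000 × 5000 × 2 = 20,000,000 candidate peptides, which is the size listed in the table before sieving.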