Introduction

Lassa virus (LASV), a member of the Arenaviridae1, is an ambisense RNA virus that causes Lassa fever, a severe hemorrhagic disease in humans. LASV is endemic in West Africa, particularly in Sierra Leone, the Republic of Guinea, Nigeria, and Liberia2,3. LASV is transmitted to humans through the urine or feces of infected Mastomys rats, and the virus spreads from human to human through direct contact with the blood, urine, feces, or other bodily secretions of an infected person. LASV infection can be fatal, and no approved effective therapeutics are currently available. The development of therapeutics such as antibodies and vaccines against LASV is therefore of significant urgency4,5,6.

Of the four proteins encoded by the two RNA segments of the LASV genome, the glycoprotein (GP) is the only protein on the viral surface. GP results from the cleavage of a 75 kDa precursor polypeptide, GPC, by signal peptidase; the product is then further glycosylated and processed into GP1 and GP27. GP1 is the receptor-binding subunit, and GP2 is the membrane-spanning fusion subunit8,9,10. Each virion envelope spike is composed of three heterotrimers, each containing the signal peptide, GP1, and GP211,12 (Fig. 1). The chalice-like GP trimer interacts with receptors on the cell surface, for example matriglycan, and this interaction mediates the entry of the virus into the host cell. In addition, GP interacts with ERGIC-53 in the exocytic pathway, which helps to form infectious virions13. GP is considered to be a key factor for LASV growth, cell tropism, host range and pathogenicity, and because it is the only protein situated on the LASV virion surface, GP is a primary target for vaccine design4.

Figure 1

3D structure of the LASV GP trimer consisting of the three GPs (GP-A, GP-B, GP-C). Each GP has a GP1 subunit and a GP2 subunit (zoomed view). Each monomer is colored differently in the GP trimer. In the zoomed view, the GP2 subunit is lightly shaded to differentiate from the GP1 subunit, and some of the antibody binding sites (Site A, Site B) are highlighted (figure generated from the crystal structure of the LASV GP in the Protein Data Bank21, PDB ID: 5VK24).

The crystal structure of the trimeric LASV GP in complex with the neutralizing antibody 37.7H from a human survivor (PDB ID: 5VK2, Fig. 1) has been determined, thereby providing insight into the structural basis for antibody design. Analysis of the GP-37.7H antibody complex shows that the antibody simultaneously binds to two GP monomers at the base of the GP trimer. The binding involves four discontinuous regions of LASV GP: two in site A and two in site B. Site A contains residues 62 and 63 of the N-terminal loop of GP1 and residues 387 to 408 in the T-loop (residues 365–384) and HR2 (residues 400–412) regions of GP2. Site B contains residues 269 to 275 of the fusion peptide and residues 324 to 325 of HR1 (residues 311–355) of GP24,14. Although the antibody predominantly binds to GP2, GP1 is required to maintain the proper prefusion conformation of GP2 for antibody binding4.

Identification of epitopes is an essential step for understanding disease etiology and for immunotherapy, immunodiagnostics, and the discovery and development of epitope-based vaccines. An epitope-based vaccine can have fewer side effects than conventional vaccines. Experimental identification of a promiscuous epitope involves many expensive and time-consuming steps, including the production of antibodies to map antigenic regions on a target protein, animal models, and determination of the crystal structure of antigen-antibody complexes using X-ray crystallography. Computational identification of epitopes is often employed as a powerful and fast approach to identify potential epitope candidates, which can decrease the number of validation experiments and the time they require15,16. Multi-epitope-based vaccine development has already proven effective against several viral infections and cancer17,18. In this study, we identified and characterized T- and B-cell epitopes for the LASV GP using different sequence- and structure-based computational epitope prediction methods. We then selected potential B- and T-cell epitopes for the LASV GP based on a consensus approach, and the novelty of the epitopes was examined with the Immune Epitope Database (IEDB) tools. Subsequently, we identified the alleles that bind strongly to the MHC-I T-cell epitopes, modeled the allele structures, and performed docking to understand the interaction between alleles and epitopes. We further investigated the stability and dynamics of the epitope-allele complexes using molecular dynamics simulations. Analyses of root-mean-square deviations, hydrogen bonds, interaction energies, and solvent accessibility showed that the epitope-allele complexes are stable, indicating that the epitopes bind strongly to the alleles. The B- and T-cell epitopes of LASV GP identified in this study can be useful for the development of effective vaccines against Lassa hemorrhagic fever.

Materials and methods

Selection of LASV GP sequence and 3D structure

The GP sequences of different LASV strains were obtained from the NIAID Virus Pathogen Database and Analysis Resource (ViPR)19. Subsequently, a multiple sequence alignment of these sequences was performed using Clustal Omega20 to select a conserved LASV GP for sequence-based epitope prediction. The corresponding X-ray crystal structure of the Mouse/Sierra Leone/Josiah/1976 LASV GP was obtained from the Protein Data Bank (PDB ID: 5VK2)4,21 for structure-based B-cell epitope prediction. The missing residues were modeled using CHARMM-GUI22,23,24.
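As a rough illustration of how a conserved representative might be picked from a multiple sequence alignment, the sketch below scores each aligned sequence by its mean pairwise identity to all the others and returns the highest-scoring one. The toy alignment, the function names, and the mean-identity criterion are assumptions made for illustration; the exact criterion applied after the Clustal Omega alignment is not stated here.

```python
from itertools import combinations


def pairwise_identity(a: str, b: str) -> float:
    """Fraction of identical residues over columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        return 0.0
    return sum(x == y for x, y in compared) / len(compared)


def most_representative(alignment: dict[str, str]) -> tuple[str, float]:
    """Return the name and score of the sequence with the highest mean identity to the rest."""
    if len(alignment) < 2:
        raise ValueError("need at least two aligned sequences")
    totals = {name: 0.0 for name in alignment}
    for (n1, s1), (n2, s2) in combinations(alignment.items(), 2):
        identity = pairwise_identity(s1, s2)
        totals[n1] += identity
        totals[n2] += identity
    others = len(alignment) - 1
    means = {name: total / others for name, total in totals.items()}
    best = max(means, key=means.get)
    return best, means[best]


# Toy aligned sequences (hypothetical placeholders, not real LASV GP data).
toy_alignment = {
    "strain_A": "MKTAYIAKQR",
    "strain_B": "MKTAYIAKQK",
    "strain_C": "MKTVYIAKQR",
    "strain_D": "MKTAY-AKQR",
}
best_name, best_score = most_representative(toy_alignment)
print(best_name, round(best_score, 3))
```

Gapped columns are skipped rather than counted as mismatches; counting them as mismatches would be an equally defensible choice and could change the ranking.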

Prediction of B-cell epitopes

Sequence-based B-cell epitope prediction was performed separately with the BepiPred2.025, BCPREDS26 and BcePred27 servers. These servers take the primary sequence of LASV GP as input and predict epitopes based on the physicochemical properties of amino acids.

Structure-based B-cell epitope prediction for the LASV GP (PDB ID: 5VK2) was carried out separately with three different programs: ElliPro28, Epitopia29 and DiscoTope30. These servers take the 3D structure of a protein as input and predict epitope regions based on the geometry and solvent accessibility of the protein surface. The consensus epitopes from both sequence- and structure-based predictions were selected as potential epitopes for further analysis.

Prediction of T-cell epitopes

Sequence-based MHC-I T-cell epitope predictions for LASV GP were carried out by using three different servers, ProPred-I31, CTLPred32 and NetCTL1.233. To predict their alleles, the consensus epitopes among these three prediction methods were analyzed using IEDB34. The epitopes that strongly bind to the alleles (lowest IC50) were selected for further analysis.

Sequence-based MHC-II T-cell epitope predictions for LASV GP were performed with the use of three different servers: ProPred35, NetMHCII2.336 and EpiTOP3.037. The antigenicity score of the selected epitopes was predicted by VaxiJen 2.038.

Homology modeling and epitope-allele docking

The structures of HLA-A*02:06 (A1) [PDB ID: 3OXR39], HLA-A*02:03 (A2) [PDB ID: 3OX839], and HLA-B*35:01 (A3) [PDB ID: 2CIK40] were obtained from the PDB. An experimental structure of the HLA-A*32:01 (A4) allele is not available; the sequence of this allele was therefore obtained from the UniProt database (UniProtKB ID: P01892), and its structure was modeled using Swiss-Model41,42,43. The selected consensus MHC-I epitopes were extracted from the crystal structure of LASV GP (PDB ID: 5VK2). The epitopes and the alleles were prepared for docking using AutoDockTools version 1.5.644. AutoDock Vina 1.1.245 was used for peptide docking with a grid box that covered the entire allele. The best peptide-allele complexes were selected for further investigation based on visual inspection of the peptide-allele interactions and the AutoDock Vina criteria. The stability and dynamics of the selected peptide-allele complexes were further studied using molecular dynamics simulations.
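To make the idea of a grid that covers the entire allele concrete, the following sketch computes a search-box center and size from the atomic coordinates of a receptor in PDB format. The 5 Å padding, the helper names, and the three toy atoms are assumptions for illustration only; the actual box dimensions used in the docking runs are not reported here. The printed keys (center_x, size_x, and so on) follow the naming used in AutoDock Vina configuration files.

```python
def make_atom_line(serial: int, name: str, x: float, y: float, z: float) -> str:
    """Build a minimal fixed-column PDB ATOM record (toy helper for this example)."""
    return f"ATOM  {serial:5d} {name:<4s} ALA A{1:4d}    {x:8.3f}{y:8.3f}{z:8.3f}"


def parse_atom_coords(pdb_text: str) -> list[tuple[float, float, float]]:
    """Read x, y, z from ATOM/HETATM records using the standard PDB columns 31-54."""
    coords = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
    return coords


def bounding_box(coords, padding: float = 5.0):
    """Return (center, size) of an axis-aligned box enclosing all atoms plus padding."""
    if not coords:
        raise ValueError("no atoms found")
    axes = list(zip(*coords))  # [(all x), (all y), (all z)]
    center = tuple((max(v) + min(v)) / 2 for v in axes)
    size = tuple((max(v) - min(v)) + 2 * padding for v in axes)
    return center, size


# Three toy atoms with made-up coordinates stand in for a full allele structure.
toy_pdb = "\n".join([
    make_atom_line(1, "N", 0.0, 0.0, 0.0),
    make_atom_line(2, "CA", 10.0, 20.0, 30.0),
    make_atom_line(3, "C", -4.0, 6.0, 12.0),
])
center, size = bounding_box(parse_atom_coords(toy_pdb))
for axis, c, s in zip("xyz", center, size):
    print(f"center_{axis} = {c:.3f}")
    print(f"size_{axis} = {s:.3f}")
```

A whole-receptor box amounts to blind docking; a box restricted to the peptide-binding groove would be a narrower alternative when the binding site is known.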

Molecular dynamics simulations

All-atom, explicit-solvent molecular dynamics (MD) simulations were performed to investigate the stability and dynamics of the MHC-I T-cell epitope-allele complexes using the CHARMM36m force field46 with the NAMD 2.12 software package47. The systems were minimized for 10,000 steps, followed by 200 ps of equilibration and then by MD production runs of 200 ns at a temperature of 300 K using a 2 fs time-step. The long-range electrostatic interactions were calculated using the particle mesh Ewald (PME) method48, while covalent bonds involving hydrogen atoms were constrained using the SHAKE algorithm49. The temperature was controlled using Langevin temperature coupling with a friction coefficient of 1 ps−1, and the pressure was controlled using the Nosé-Hoover Langevin-piston method50. Visualization and rendering of trajectories and pictures were performed using VMD51.
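The simulation lengths quoted above translate directly into integration step counts. The short calculation below is a sketch that assumes the 2 fs time-step also applies to the 200 ps equilibration, which the text states explicitly only for the production runs; the function name is hypothetical.

```python
TIMESTEP_FS = 2.0  # production time-step from the text; assumed for equilibration too


def steps_for(duration_ps: float, timestep_fs: float = TIMESTEP_FS) -> int:
    """Number of integration steps needed to cover duration_ps at the given time-step."""
    return round(duration_ps * 1000.0 / timestep_fs)  # 1 ps = 1000 fs


equil_steps = steps_for(200.0)          # 200 ps of equilibration
prod_steps = steps_for(200.0 * 1000.0)  # 200 ns = 200,000 ps of production

print(f"equilibration: {equil_steps:,} steps")
print(f"production:    {prod_steps:,} steps")
```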

Results and Discussion

The multiple sequence alignment of the 84 LASV GP sequences identified the GP of the Mouse/Sierra Leone/Josiah/1976 strain (UniProtKB ID: P08669) as highly conserved, and we thus selected this strain for the sequence-based MHC-I and MHC-II T-cell epitope predictions and for both the structure- and sequence-based B-cell epitope predictions. In addition, a search of the PDB for experimentally determined structures of this strain returned the 3.2 Å resolution crystal structure of the prefusion GP trimer of LASV in complex with the human neutralizing antibody 37.7H (PDB ID: 5VK2), as shown in Fig. 1. This structure was used for the structure-based B-cell epitope prediction. A schematic representation of the epitope prediction cascade is shown in Fig. 2. We adopted multiple methods to predict and rank the epitopes because they use different criteria for their predictions. Some approaches incorporate similar properties, such as solvent-accessible surface area, yet predict different epitopes. Previous studies52,53 have suggested that a consensus approach improves the specificity and accuracy of epitope prediction because it can reduce false positives. We therefore employed a consensus approach: an epitope is considered if the predictions of at least two methods overlap by at least one residue. Our consensus approach selected several nonamer epitopes for MHC-I (Table S1). Although the predicted MHC-II T-cell epitopes vary in length, the consensus core region shared by the predicted MHC-II epitopes is a nonamer (Table S2), which is considered54 an optimal length for the HLA.
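The consensus rule described above (support from at least two methods, with an overlap of at least one residue) can be sketched as follows. Predictions are represented as inclusive residue ranges; the method names and ranges are hypothetical placeholders rather than output from the actual servers.

```python
def overlaps(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """True if two inclusive residue ranges share at least one residue."""
    return a[0] <= b[1] and b[0] <= a[1]


def consensus_regions(predictions: dict[str, list[tuple[int, int]]], min_methods: int = 2):
    """Keep each predicted region that is supported by at least min_methods methods.

    A method supports a region if it predicted that region itself or predicted
    any region overlapping it by one residue or more.
    """
    results = []
    for method, regions in predictions.items():
        for region in regions:
            supporters = {method}
            for other, other_regions in predictions.items():
                if other != method and any(overlaps(region, r) for r in other_regions):
                    supporters.add(other)
            if len(supporters) >= min_methods:
                results.append((region, sorted(supporters)))
    return results


# Hypothetical predictions from three unnamed methods (residue numbering is made up).
toy_predictions = {
    "method_1": [(10, 18), (100, 108)],
    "method_2": [(18, 26)],
    "method_3": [(200, 208)],
}
consensus = consensus_regions(toy_predictions)
for region, methods in consensus:
    print(region, methods)
```

Here the ranges 10-18 and 18-26 share only residue 18 yet both pass, which shows how permissive a one-residue threshold is; raising the required overlap would tighten the selection.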

Figure 2

The workflow cascade for the identification of (a) T-cell and (b) B-cell epitopes.

Prediction of T-cell epitopes

MHC-I T-cell epitope prediction with the LASV GP sequence was performed separately using three different methods (ProPred-I, CTLPred, and NetCTL1.2), and the results are shown in Supplementary Table S1. The epitopes predicted by at least two of the methods are listed in Table 1 along with their binding affinity (IC50), antigenicity, and allele. Among these four consensus epitopes, the nonamer E1 epitope FATCGLVGL shows the lowest average IC50 value, 34 nM against the A1 allele as predicted by the IEDB, and it also has a reasonable antigenicity score of 1.65. It is followed by the E3 epitope FSRPSPIGY, which has an average IC50 value of 88 nM against the A3 allele and a higher antigenicity score (2.50) than the FATCGLVGL epitope. Interestingly, the E4 epitope RRGTFTWTL is predicted by all three methods, although its IC50 and antigenicity scores are not as good as those of the other epitopes (Table 1). All four consensus epitopes were docked to the alleles, and MD simulations were performed to investigate the stability and dynamics of the allele-epitope complexes, as discussed later.

Table 1 Consensus prediction of the MHC-I T-cell epitopes.

MHC-II T-cell epitope prediction with the LASV GP sequence was performed separately using three different methods (ProPred, NetMHCII 2.3, and EpiTOP 3.0), and the results are shown in Supplementary Table S2. ProPred uses a quantitative matrix35 approach and NetMHCII 2.3 uses artificial neural networks (ANNs)36, while EpiTOP 3.0 uses quantitative structure–activity relationship (QSAR) models37 to predict the MHC-II T-cell epitopes. The epitopes that were predicted by at least two methods are listed in Table 2. Among these consensus MHC-II T-cell epitope predictions, the E9 and E13 epitopes were predicted by all three methods and have a reasonable antigenicity score of 0.7, indicating that these two epitopes are potential candidates for the design of MHC-II T-cell-based vaccines. ProPred and EpiTOP 3.0 predict most epitopes as nonamers, whereas NetMHCII 2.3 predicts epitopes of varying lengths (Table 2). Interestingly, the 15-mer epitopes predicted by NetMHCII contain the consensus core nonamer epitopes, suggesting that the core region is responsible for the strong binding of the epitope in the MHC-II binding site55,56,57.
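The relationship between a 15-mer prediction and its nonamer core can be illustrated with a simple sliding window: a 15-residue peptide contains seven overlapping 9-residue windows, and a consensus core is one of them. The peptide and core strings below are arbitrary placeholders, not predicted LASV epitopes, and the function names are hypothetical.

```python
from typing import Optional


def nonamer_windows(peptide: str, core_len: int = 9) -> list[str]:
    """Return every contiguous window of core_len residues in the peptide."""
    return [peptide[i:i + core_len] for i in range(len(peptide) - core_len + 1)]


def find_core(peptide: str, core: str) -> Optional[int]:
    """Return the 1-based start position of core within peptide, or None if absent."""
    idx = peptide.find(core)
    return idx + 1 if idx >= 0 else None


toy_15mer = "ACDEFGHIKLMNPQR"  # placeholder 15-residue peptide
toy_core = "EFGHIKLMN"         # placeholder 9-residue core

windows = nonamer_windows(toy_15mer)
core_start = find_core(toy_15mer, toy_core)
print(len(windows), core_start)
```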

Table 2 Prediction of the MHC-II T-cell epitopes.

Prediction of B-cell epitopes

In addition to the T-cell epitope predictions, we also predicted the linear B-cell epitopes for the LASV GP using the sequence-based methods BepiPred 2.025, BCPREDS26, and BcePred27. BepiPred predicts epitopes based on a random forest algorithm trained on epitopes annotated from antibody-antigen structures. BCPREDS predicts epitopes using support vector machines (SVMs) combined with different kernel methods, including string kernels, radial basis kernels, and subsequence kernels. BcePred locates B-cell epitopes using four physicochemical properties (hydrophilicity, polarity, exposed surface and beta-turns)27. The epitope E30, which contains 10 residues, was predicted by all three of these sequence-based methods (Table 3) but has a negative antigenicity score.

Table 3 Prediction of the B-cell epitopes.

We also performed structure-based B-cell epitope prediction using three representative methods based on structural and geometrical properties: ElliPro, Epitopia and DiscoTope. For this, the experimental 3D structure of LASV GP (PDB ID: 5VK2) with the modeled missing residues was used. ElliPro predicts linear and conformational epitopes by incorporating the antigenicity, solvent accessibility, and flexibility of protein structures28. Epitopia uses a machine learning algorithm to analyze the antigenic features of a protein structure and predicts the probable conformational epitope regions29. DiscoTope uses amino acid statistics, spatial information, and surface accessibility of the protein 3D structure to predict residue-by-residue conformational epitopes30. The E24, E29, E32 and E33 structure-based epitopes in Table 3 are especially interesting as potential candidates because they were predicted by all three methods. In Table 3, we also ranked each epitope by the number of sequence- and structure-based methods that predicted it; this ranking does not always correlate with antigenicity, for which E24, E26, E28, E29 and E31 have the highest scores.

Robinson et al.14 have recently reported the cloning of many human monoclonal antibodies derived from memory B cells of Lassa fever survivors in West Africa. These antibodies specifically bind to both GP1 and GP2 epitopes of LASV. A comparison of our predicted B-cell epitopes with those epitopes shows that five consensus epitopes (Table 3) share similarity with those of Robinson et al. (Table S3), whereas another five epitopes do not, indicating that our consensus epitope prediction strategy has identified new epitopes.

Epitope surface mapping

For a vaccine to be effective, the epitopes should be located on an accessible region of the protein so that they are able to bind to antibodies53. This is especially important for the six epitopes listed in the tables above that do not share any part of their sequence with known epitopes: E1, E4, E18, E22, E27 and E29. In Fig. 3, we highlight the positions of these epitopes on LASV GP. We also highlight the positions of E2 and E3 because the four MHC-I T-cell epitopes have IC50 information readily available. Figure 3 shows that the E1, E2, E3, E4, E18, E22 and E27 epitopes are located on exposed regions and thus can interact well with the alleles.

Figure 3

Representative epitopes mapped onto the LASV GP, shown with (a) secondary structural elements and (b) surface accessibility. The location of the epitopes on the GP suggests that they are on the solvent-exposed region, indicating promiscuity as they have easy access to alleles.

MHC-I T-cell allele and epitope modeling and docking

Swiss-Model identified the 1.61 Å resolution crystal structure of the HLA class I antigen (PDB ID: 6EI2) as the best template for modeling the A4 allele. The sequence identity between A4 and the template was 92%. The best model was then selected based on multiple validation methods, including GMQE (Global Model Quality Estimation) and QMEAN. The GMQE and QMEAN values41,58 of the model are 0.75 and 0.6, respectively. In addition to these analyses, Ramachandran plots and ERRAT were also used for model validation. Analysis of the Ramachandran plot59 of the model shows that 99.6% of residues are in either favored or allowed regions (Supplementary Fig. S1), indicating that the backbone torsion angles of the model are acceptable. The ERRAT overall quality factor60 was computed as 99, which exceeds the score of 50 normally accepted for a high-quality model. These analyses show that the model is of high quality and can be used for further analysis.

Docking of the four consensus MHC-I epitopes (Table 1) was performed using AutoDock Vina, which enabled the docking of epitopes obtained from the sequence-based MHC-I T-cell prediction into the promising allele structures. The AutoDock Vina docking protocol has been previously demonstrated to successfully dock epitopes into allele structures45. Nevertheless, before docking the epitopes, we validated the protocol by redocking the bound peptide into the allele crystal structure (PDB ID: 3OX8) to test whether the crystal-bound conformation of the peptide could be reproduced. The docked allele-epitope complex showed the same residue-epitope interactions observed in the epitope-bound crystal structure, indicating that the AutoDock Vina docking protocol was capable of reproducing the experimentally observed binding mode of the epitope. We then applied AutoDock Vina to each of the four MHC-I allele-epitope complexes. The highest-ranked docked structures had the following binding affinities: −5.5 kcal/mol for A1::E1, −5.0 kcal/mol for A2::E2, −6.8 kcal/mol for A3::E3, and −6.0 kcal/mol for A4::E4. These epitope-allele docking complexes are shown in Fig. 4.

Figure 4

Snapshots of allele-epitope complexes. (a) A1::E1, (b) A2::E2, (c) A3::E3, and (d) A4::E4 at the beginning and end of the MD simulations: t = 0 (minimized structure), t = 200 ns. Allele is gold and epitope is green.

Dynamics of the allele-epitope complex

To investigate the dynamics and stability of the four MHC-I allele-epitope complexes, we performed 200 ns all-atom, explicit-solvent MD simulations. To quantify the stability of the allele-epitope complexes, we calculated the root-mean-square deviations (RMSD) of the backbone atoms of the complexes as a function of simulation time, as shown in Fig. 5. Figure 5 also includes curves of the RMSD of the backbone atoms of the allele alone and, separately, of the epitope alone. All alleles have an RMSD of approximately 2 Å relative to their initial structures, whereas the allele-epitope complexes have a slightly higher RMSD of approximately 2.5 Å, indicating that the epitopes make the complexes more flexible. Interestingly, in the case of A3::E3, the allele and the complex show almost the same RMSD, suggesting that the complex is especially stable. To pinpoint why the complexes show a higher RMSD, we further computed the RMSD of only the backbone atoms of the epitope in each complex. Figure 4 shows that the initial configuration of epitopes E1 and E4 is compact, and that both of these epitopes rearrange their configuration in the binding site and elongate during the 200 ns MD simulation. This elongated configuration is consistent with the investigations of Antunes et al.61 on MHC-I epitopes.
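For reference, the RMSD between a trajectory frame and the initial structure is the square root of the mean squared displacement over the selected atoms. The sketch below assumes that each frame has already been superimposed on the reference (trajectory analysis tools normally perform this alignment first) and uses three toy atoms with made-up coordinates; it is not the analysis script used in the study.

```python
import math

Coord = tuple[float, float, float]


def rmsd(reference: list[Coord], frame: list[Coord]) -> float:
    """RMSD in the input units between two equally ordered, pre-aligned atom sets."""
    if len(reference) != len(frame) or not reference:
        raise ValueError("coordinate sets must be non-empty and of equal length")
    squared = sum(
        (a - b) ** 2
        for ref_atom, frame_atom in zip(reference, frame)
        for a, b in zip(ref_atom, frame_atom)
    )
    return math.sqrt(squared / len(reference))


# Toy backbone coordinates in angstroms (hypothetical values).
initial = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
later = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 3.0, 0.0)]  # last atom moved 3 A

value = rmsd(initial, later)
print(f"{value:.3f}")
```

Computing the same quantity separately over allele atoms, epitope atoms, and all atoms of the complex yields the three kinds of curves described for Fig. 5.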

Figure 5

Root-mean-squared deviations (RMSD) calculated for the backbone atoms of allele (A), epitope (E) and complex (A + E) from MD simulations of MHC-I allele-epitope complexes.

Since the interactions between protein and epitope peptide are mostly governed by non-covalent interactions, we computed the number of hydrogen bonds and the interaction energy between the allele and epitope as a function of the MD simulation time. Hydrogen bonds were calculated between the interface atoms using a distance cut-off of 3.5 Å between the donor and acceptor heavy atoms and an angle cut-off of 30°. As shown in Fig. 6, the number of H-bonds fluctuates during the MD simulations for all the complexes. The A3 complex has the largest number of H-bonds. Table 4 shows that during the last 50 ns of the MD simulation trajectory, the A3 complex averages 2.5 H-bonds. Additional analysis of the hydrogen bonding between allele and epitope is listed in Supplementary Table S4.
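A geometric hydrogen-bond test with these cut-offs can be sketched as below. The text does not spell out how the angle is measured, so this example assumes one common convention: the donor-hydrogen-acceptor angle may deviate from linearity (180°) by at most the cut-off. The function names and coordinates are hypothetical.

```python
import math

Vec = tuple[float, float, float]


def _sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _norm(v: Vec) -> float:
    return math.sqrt(v[0] ** 2 + v[1] ** 2 + v[2] ** 2)


def angle_deg(u: Vec, v: Vec) -> float:
    """Angle between two vectors in degrees."""
    cos_theta = (u[0] * v[0] + u[1] * v[1] + u[2] * v[2]) / (_norm(u) * _norm(v))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


def is_hbond(donor: Vec, hydrogen: Vec, acceptor: Vec,
             max_dist: float = 3.5, max_angle: float = 30.0) -> bool:
    """Distance test on the heavy atoms, angle test on the D-H...A geometry."""
    if _norm(_sub(acceptor, donor)) > max_dist:
        return False
    # Angle at the hydrogen; 180 degrees corresponds to a perfectly linear bond.
    dha = angle_deg(_sub(donor, hydrogen), _sub(acceptor, hydrogen))
    return (180.0 - dha) <= max_angle  # assumed convention, see lead-in


donor, hydrogen = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0)
linear_close = is_hbond(donor, hydrogen, (2.9, 0.0, 0.0))  # linear, 2.9 A apart
linear_far = is_hbond(donor, hydrogen, (4.0, 0.0, 0.0))    # linear, 4.0 A apart
bent_close = is_hbond(donor, hydrogen, (1.0, 2.0, 0.0))    # 90 degree D-H...A angle
print(linear_close, linear_far, bent_close)
```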

Figure 6

(a) The number of allele-epitope intermolecular hydrogen bonds as a function of MD simulation time. (b) Interaction energy calculated between allele and epitopes as a function of simulation time.

Table 4 Allele–epitope interaction parameters calculated by averaging over the last 50 ns of the MD simulation trajectory.

Figure 6b shows the interaction energy (electrostatic interaction + van der Waals contacts) throughout the entire MD simulation, and Table 4 lists the average over the last 50 ns. The A3::E3 and A4::E4 complexes display relatively stronger interaction energies than the A1::E1 and A2::E2 complexes. The comparison of RMSD, hydrogen bond, and interaction energy information indicates that the E3 epitope is an especially promising epitope candidate.
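Averages over the final 50 ns, as reported in Table 4, can be obtained from any per-frame time series by keeping only the frames inside the trailing window. The following sketch uses a made-up series of six samples; the function name and the values are illustrative and do not reproduce the data behind Table 4.

```python
def tail_mean(times_ns: list[float], values: list[float], window_ns: float = 50.0) -> float:
    """Mean of the values whose time stamps fall within the last window_ns of the series."""
    if len(times_ns) != len(values) or not times_ns:
        raise ValueError("times and values must be non-empty and of equal length")
    start = max(times_ns) - window_ns
    selected = [v for t, v in zip(times_ns, values) if t >= start]
    return sum(selected) / len(selected)


# Hypothetical hydrogen-bond counts sampled along a 200 ns trajectory.
toy_times = [0.0, 50.0, 100.0, 150.0, 175.0, 200.0]
toy_counts = [1.0, 2.0, 3.0, 2.0, 3.0, 4.0]

last_50ns_average = tail_mean(toy_times, toy_counts)
print(last_50ns_average)
```

The same helper applies unchanged to an interaction-energy series, since it only depends on the time stamps.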

Novelty analysis

The novelty of the four MHC-I T-cell epitopes in Table 1, the nineteen MHC-II T-cell epitopes in Table 2, and the ten B-cell epitopes in Table 3 identified in this study was analyzed using IEDB34. The IEDB database contains epitopes that are annotated based on the scientific literature. The IEDB showed that the E1, E4, E18, E22, E27 and E29 epitopes, which lie on solvent-exposed regions of the protein (Fig. 3), have not been previously reported as LASV epitopes or vaccine candidates. In addition, this analysis indicates that 24 other epitopes (E2, E3, E5, E6, E7, E8, E10, E11, E12, E14, E15, E16, E17, E19, E20, E23, E24, E25, E26, E28, E30, E31, E32, E33) have partial segments of their sequence reported as subsets of other epitopes, whereas E9, E13 and E21 are exact matches to previously reported sequences. For these epitopes, a comparison showing the overlap between the epitopes predicted in this study and previously known epitopes documented in IEDB is given in Table S5. In addition to the epitopes in the IEDB, we compared our consensus predicted epitopes with previously reported predictions62,63,64,65,66,67 in Table S6. This comparison shows a varying degree of overlap in the predicted sequences. The novelty results indicate that thirty epitopes have not been previously tested experimentally as LASV epitopes, suggesting that their therapeutic potential in designing vaccines against LASV can be further explored.
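The three novelty categories used above (exact match, partial overlap, no overlap) can be mimicked with plain string comparison. This is only a sketch: the minimum shared-segment length of four residues is an assumed threshold, the peptide strings are arbitrary placeholders, and the real analysis relied on the IEDB tools rather than on code like this.

```python
def shares_segment(a: str, b: str, min_len: int) -> bool:
    """True if a and b share a contiguous segment of at least min_len residues."""
    if len(a) < min_len:
        return False
    return any(a[i:i + min_len] in b for i in range(len(a) - min_len + 1))


def classify(candidate: str, known: set[str], min_len: int = 4) -> str:
    """Label a candidate epitope as 'exact', 'partial', or 'novel' against known epitopes."""
    if candidate in known:
        return "exact"
    if any(shares_segment(candidate, k, min_len) for k in known):
        return "partial"
    return "novel"


# Placeholder sequences only; none of these are real database entries.
toy_known = {"ACDEFGHIK", "LMNPQRSTV"}
toy_candidates = ["ACDEFGHIK", "WWACDEWWW", "YYYYYYYYY"]

labels = {pep: classify(pep, toy_known) for pep in toy_candidates}
for pep, label in labels.items():
    print(pep, label)
```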

Conclusion

LASV hemorrhagic fever is endemic in West Africa, and no approved effective therapeutics are currently available. Therefore, there is an urgent need for the discovery and development of potential antiviral therapeutics. The LASV GP spike has emerged as a promising selective target for the development of novel vaccines because it plays an essential role in the virus-host interaction. Several previous in silico studies62,63,64,65,66,67 predicted LASV GP epitopes using a single prediction tool for each type of epitope. We have identified new T- and B-cell epitopes using a variety of computational approaches, including twelve epitope prediction methods, protein-peptide docking, and MD simulations. The MHC-I and MHC-II T-cell epitopes were separately predicted from the LASV GP sequence using well-known prediction methods. The predicted MHC-I T-cell epitopes were then prioritized based on the consensus score, binding affinity, and antigenicity, while the MHC-II T-cell and B-cell epitopes were prioritized based on the consensus score. Novelty analysis of the 33 consensus-selected epitopes showed that thirty of these predicted epitopes have either no overlap or only a partial overlap with previously reported sequences. Within this list of new epitopes, six sequences have no overlap with any known experimentally tested epitopes in the IEDB. In addition, docking and MD simulations were performed to further validate the MHC-I T-cell epitopes. The simulation results show that the MHC-I allele-epitope complexes are stable, with favorable hydrogen bonding and interaction energies. Of these, the E3 epitope (233FSRPSPIGY241) was found to be especially stable. This study demonstrates that the adopted consensus epitope prediction strategy is valuable for in silico investigations of known epitopes and the identification of new epitopes. Experimental validation of these epitopes may lead to the design and development of effective LASV vaccines.