Deep learning for discriminating non-trivial conformational changes in molecular dynamics simulations of SARS-CoV-2 spike-ACE2

Moraes dos Santos, Lucas; Gutembergue de Mendonça, José; Jerônimo Gomes Lobo, Yan; Henrique Franca de Lima, Leonardo; Bruno Rocha, Gerd; C. de Melo-Minardi, Raquel

doi:10.1038/s41598-024-72842-w

Article
Open access
Published: 30 September 2024

Lucas Moraes dos Santos¹,
José Gutembergue de Mendonça²,
Yan Jerônimo Gomes Lobo³,
Leonardo Henrique Franca de Lima³,
Gerd Bruno Rocha² &
Raquel C. de Melo-Minardi¹

Scientific Reports volume 14, Article number: 22639 (2024)

Abstract

Molecular dynamics (MD) simulations produce a substantial volume of high-dimensional data, and traditional methods for analyzing these data pose significant computational demands. Advances in MD simulation analysis combined with deep learning-based approaches have improved the understanding of specific structural changes observed in MD trajectories, including those induced by mutations. In this study, we model the trajectories resulting from MD simulations of the SARS-CoV-2 spike protein-ACE2 complex, specifically the receptor-binding domain (RBD), as interresidue distance maps, and use deep convolutional neural networks to predict the functional impact of point mutations related to the virus’s infectivity and immunogenicity. Our model was successful in predicting mutant types that increase the affinity of the S protein for human receptors and reduce its immunogenicity, both based on MD trajectories (precision = 0.718; recall = 0.800; $\hbox {F}_1$ = 0.757; MCC = 0.488; AUC = 0.800) and their centroids. In an additional analysis, we also obtained a strong positive Pearson’s correlation coefficient of 0.776, indicating a significant relationship between the average sigmoid probability for the MD trajectories and binding free energy (BFE) changes. Furthermore, we obtained a coefficient of determination of 0.602. Our 2D-RMSD analysis also corroborated predictions for more infectious and immune-evading mutants and revealed fluctuating regions within the receptor-binding motif (RBM), especially in the $\beta _{1}^{\prime }/\beta _{2}^{\prime }-C$ loop. This region presented a significant standard deviation for mutations that enable SARS-CoV-2 to evade the immune response, with RMSD values of 5 Å in the simulation. This methodology offers an efficient alternative for identifying SARS-CoV-2 strains that may be linked to more infectious and immune-evading mutations.
Using clustering and deep learning techniques, our approach leverages information from the ensemble of MD trajectories to recognize a broad spectrum of conformational patterns characteristic of mutant types. This represents a strategic advantage in identifying emerging variants, bypassing the need for long MD simulations. Furthermore, the present work is expected to contribute to the fields of computational biology and virology, particularly by accelerating the design and optimization of new therapeutic agents and vaccines, offering a proactive stance against the constantly evolving threat of COVID-19 and potential future pandemics.

Introduction

In bioinformatics, emerging solutions based on machine learning have transformed the growing volume of biological omics data into knowledge¹. Deep learning (DL), a subfield of machine learning, has been successful in extracting meaningful patterns from data, as it can handle high-dimensional representations of data by modeling high-level abstractions². This is achieved through multiple nonlinear transformations in neural networks, where the output of one node is connected to the input of each succeeding node. This arrangement of neuronal units forms dense layers, constituting a fully connected network³.

In general, deep learning-based pipelines for molecular dynamics simulations are composed of a feature engineering step that reduces the high-dimensional structure space to a low-dimensional feature space; pre-processing that adapts the simulation data to a representation appropriate for algorithm input; a deep neural network composed of feature extraction and classification modules; and a stage of evaluation and interpretability of the predictions made by the model².

In structural bioinformatics, convolutional neural networks (CNNs) have been widely used to predict protein structure or function¹, which can be attributed to two primary aspects: (i) CNNs benefit from the symmetries of structural representations (e.g., distance maps) through translation invariance, effectively learning patterns regardless of their spatial location in the input; (ii) convolutional layers primarily capture local dependencies within their receptive fields through learned filters, focusing on small spatial regions of the input rather than the entire global structure at once^2,4,5. These architectures have represented the state-of-the-art in structure prediction^1,6 and in the identification of functional states from MD trajectories^7,8,9.

Molecular dynamics (MD) is another relevant computational method to investigate the dynamic behavior of biomolecular systems at the atomic level¹⁰. MD simulations combined with experimental data have allowed the investigation of biological processes that would be difficult or even impossible to observe experimentally¹¹.

Given the voluminous and high-dimensional nature of MD trajectories, advances in MD simulation analysis, such as cluster analysis, have enabled more efficient exploration of the conformational states in protein complexes¹². Integrating this method with DL-based approaches improves the prediction of intrinsic protein movements and reduces the risk, inherent in manual analysis, of overlooking subtle but significant conformations⁹.

Moreover, remarkable progress has been made in the use of methodologies for in-depth analysis of MD trajectories. For example, two studies^7,9 combined pixel representations and CNNs to identify functional states in MD trajectories of G protein-coupled receptors (GPCRs), as well as the residues underlying the active states, elucidating the activation mechanisms of these receptors. Both studies transform data extracted from trajectories, including atom coordinates (x, y, and z), into pixel-based representations using RGB components, generating a representation known as pixel maps. Although the models were successful at classifying the ligands, the pixel representation has the disadvantage of being sensitive to translational and rotational movements of proteins, requiring additional preprocessing of the trajectory to remove bias.

Distance maps (DMs) offer a solution to this problem, as they are invariant to protein rotation and translation. By using distance maps to reduce high-dimensional MD data while preserving crucial spatial information, we gain an advanced approach for analyzing the dynamic behavior of proteins over time. The inherent invariance of distance maps enables the model to generalize more effectively to unseen structures, regardless of their orientation or position, compared to the purely geometric¹³ or pixel-based representations^7,9 used in earlier studies. This approach also minimizes the need for extensive preprocessing to remove bias or additional data augmentation steps to ensure robustness against transformations. Furthermore, their low dimensionality is desirable in AI applications⁵.

Recently, 3D ResNets have been used to identify conformational changes associated with spatiotemporal patterns of ligands bound to the $\beta$2-adrenergic receptor ($\beta$2AR), a specific type of GPCR⁸. For this purpose, the model used time series of protein distance maps (PDMs) derived from the trajectories of $\beta$2AR-ligand complexes, as input. The study revealed that ligands of the same type tend to share the same dynamics, characterized by conformational changes throughout the GPCR system induced by residues at the binding site⁸.

Despite the relevant contributions of these works, it is possible to observe that the analyzed conformations involve significant rearrangements within the protein structure, often referred to as large-scale conformational changes^8,9. However, small localized changes, such as loop movements within protein motifs, are crucial for protein stability, functionality, and ligand binding and merit significant attention.

In this study, we focus on the Spike (S) protein of SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) as a case study. We model the trajectories from MD simulations of the SARS-CoV-2 S protein interacting with the human receptor, specifically targeting the receptor binding domain (RBD), and represent them as interresidue distance maps. Subsequently, we developed a deep learning-based system to predict the functional impact of point mutations that could potentially increase the protein’s affinity for the human receptor while reducing its immunogenicity.

Our hypothesis is that specific point mutations in the RBD region should promote non-trivial conformational changes that are indicative of certain variants or even entire lineages of the virus. To support this hypothesis, a literature review was performed. The results highlight the importance of these mutations in the conformation of the virus and its pathogenicity^14,15,16,17. In vitro data show that mutations in the SARS-CoV-2 and SARS-CoV-1 RBDs are capable of evading neutralizing antibodies^18,19. In addition to structural changes, we discuss how the proposed conformational changes can influence transmissibility and immune response, highlighting the importance of these mechanisms for effective control of viral spread.

Some SARS-CoV-2 strains can increase infectivity and transmissibility, posing a challenge for antiviral drug and vaccine projects against COVID-19 in a short period of time. Because it is a hypermutable virus, its transmissibility tends to favor the emergence of variants with critical mutations and shared zoonotic biological characteristics. Therefore, identifying the patterns that characterize the conformations exhibited by these variants could help to accurately predict potential strains linked to the virus. These findings could be valuable in the development of new drugs and vaccines against COVID-19 or even for future epidemics.

Previous research has successfully employed graph neural networks¹³ or self-attention mechanisms²⁰ to predict changes in binding affinity between RBD and ACE2, without requiring MD simulations beyond training data. However, these architectures require structural information from both viral and human proteins, making them dependent on the complex. Furthermore, the effects of some mutations (e.g., S:V367F, S:S477N) may be time-dependent and emerge only after prolonged molecular interactions, requiring simulations of the order of nanoseconds or longer to fully capture their functional impact^21,22,23,24.

The inherent flexibility of the SARS-CoV-2 spike protein (S) is pivotal in modulating its binding affinity to the ACE2 receptor, directly influencing viral virulence²². Several studies underscore the importance of this dynamic adaptability, particularly within the receptor-binding motif (RBM), for optimizing interaction with the human receptor²⁵. Molecular dynamics (MD) simulations have revealed a heightened flexibility in the receptor binding domain (RBD) of SARS-CoV-2 compared to its counterpart in SARS-CoV, notably in key regions of ACE2 recognition comprising residues Gln474–Gly485, Cys488–Phe490 and Ser494–Tyr505^24,26. These simulations provide a nuanced understanding of the dynamic ACE2-RBD interaction network, revealing subtle conformational changes (e.g., flexible loops) that remain elusive to static structural analyses^22,27,28.

In this sense, the integration of MD simulations offers a comprehensive understanding of the dynamic interaction landscape between the SARS-CoV-2 spike protein’s receptor-binding domain (RBD) and the ACE2 receptor²⁹. This approach reveals a broader spectrum of conformational patterns across various mutant types, making it crucial for unraveling the time-dependent structural evolution of RBD. Such insights are crucial to understanding the mechanisms of viral adaptation and immune evasion in the context of emerging variants³⁰.

Furthermore, we verified whether it is possible to anticipate the effects of mutations by analyzing the most probable conformation along the MD trajectory. This approach avoids the need for a large number of simulation frames as input to the AI model, even though the model has been trained with MD trajectories.

Methods

System preparation and molecular dynamics protocol

We obtained the crystal structure of the SARS-CoV-2 spike receptor binding domain (RBD) bound to angiotensin-converting enzyme 2 (ACE2) from PDB: 6M0J and selected amino acid residues 333 to 527, corresponding to the RBD, using PyMOL (PyMOL Molecular Graphics System, Version 2.4 Schrödinger LLC)³¹. In total, we prepared 37 systems based on the RBD FASTA sequence and used homology modeling in SWISS-MODEL³² to determine their 3D structures.

We estimated protonation states for all residues in both systems at a standard physiological blood pH of 7.4, a salinity of 0.15 M, and internal and external relative permittivities of 10 and 80, respectively, using the H++ web server³³. We parameterized the systems using the tleap tool from AmberTools21³⁴ and solvated them in a cubic box with a minimum edge distance of 12 Å. We described the water, protein, and glycidic fractions of the systems with the AMBER force fields TIP3P, ff14SB, and GLYCAM-06, respectively.

We also added $Na^{+}$ and $Cl^{-}$ ions to reach a physiological salinity of 0.15 M. After solvation, the minimization, equilibration, and production MD simulation steps were performed using NAMD 2.13 CUDA verbs³⁵ with the AMBER force field. We performed all simulations with a time step of 1 fs using an NPT ensemble, with the temperature and pressure kept constant at 310 K and 1 atm, respectively, by means of a Langevin thermostat and a Langevin piston. We used periodic boundary conditions and a 10 Å cutoff for non-bonded interactions and calculated the long-range electrostatic interactions using Particle Mesh Ewald.

A multistep relaxation protocol was applied under NPT conditions to all systems before the production MD simulation. This protocol consisted of (1) 500 minimization steps with harmonic restraints on protein atoms; (2) 500 steps of unrestrained minimization; (3) 300 ps of equilibration with harmonic restraints on protein atoms; (4) 300 ps of equilibration with harmonic restraints on the backbone atoms; (5) 300 ps of unrestrained equilibration; and (6) a 1 ns pre-production MD simulation with reinitialized velocities and no restraints. Finally, starting from the final relaxation coordinates, we performed three independent 100 ns MD simulations for each S RBD protein system with the corresponding mutations on the SDumont supercomputer at the Brazilian National Laboratory for Scientific Computing (LNCC).

Trajectory analysis

We performed a root mean square fluctuation analysis per residue (RMSF) against the average frame for the $C\alpha$ carbons, aligning the entire RBD with the least mobile residues (333–437, 454–456, 492–494 and 509–526). We estimated RMSFs for each of the three replicates individually, as well as for their combined data. Additionally, to evaluate the convergence between the different simulations of each system, two-dimensional (2D) root mean square deviation (2D-RMSD) plots relative to the protein backbone were generated using the cpptraj plug-in³⁶ of AmberTools 21³⁴.
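The per-residue RMSF described above can be illustrated with a minimal NumPy sketch. It assumes the trajectory has already been aligned (the study does this in cpptraj on the least mobile residues) and is stored as an array of shape (frames, residues, 3); the function name `rmsf_per_residue` and the toy coordinates are hypothetical and serve only to show the formula.

```python
import numpy as np

def rmsf_per_residue(coords: np.ndarray) -> np.ndarray:
    """RMSF of each atom relative to the average frame.

    coords: array of shape (n_frames, n_atoms, 3), assumed to be
    already superimposed on a common reference.
    """
    mean_structure = coords.mean(axis=0)            # average frame
    displacements = coords - mean_structure         # per-frame deviation
    sq_dist = (displacements ** 2).sum(axis=2)      # squared displacement
    return np.sqrt(sq_dist.mean(axis=0))            # time average, then root

# Toy trajectory (hypothetical): atom 0 is static, atom 1 oscillates
# by +/- 1 Angstrom along x around its mean position.
coords = np.zeros((4, 2, 3))
coords[:, 1, 0] = [1.0, -1.0, 1.0, -1.0]
rmsf = rmsf_per_residue(coords)
print(rmsf)
```

Applied to the $C\alpha$ coordinates of each replicate, or to the concatenated replicates, a function of this kind would yield curves comparable in form to the RMSF plots, although the published values come from cpptraj rather than from this sketch.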

Database

Our database consists of 38 (thirty-eight) distinct systems, each derived from MD simulations, with most systems containing between 1 (one) and 3 (three) point mutations per mutant (a comprehensive table listing all systems utilized in this study, including their specific RBD mutations, is presented in Supplementary Table 1). Among these, 17 (seventeen) were selected for machine learning-based analyses, which involved approximately 85K frames. These included the original 2019-nCoV strain, also known as the “wild type” (WT)³⁷; neutral mutants; and variants currently monitored by the WHO (VBM)³⁸. These variants, previously classified as variants of concern (VOCs) or variants of interest (VOIs), are known for their increased infectivity and transmissibility. We assign the label “$++$” to the corresponding systems (Table 1).

Table 1 Database composition.

Furthermore, systems that represent neutral mutants, that is, those that neither increase receptor affinity for ACE2 nor enhance resistance to antibodies, are labeled “${-}{-}$”. The list of single-type mutations, presented in the S:mutation (label) format^55,56, is as follows: S:V445L (${-}{-}$), S:K417Y (${-}{-}$), S:N439R (${-}{-}$), S:Q498I (${-}{-}$), S:S494K (${-}{-}$), S:Y489F (${-}{-}$), and S:Y505F (${-}{-}$)^57,58. We labeled the systems according to the published literature. The wild-type strain was labeled “${-}{-}$”.

Complementarily, we conducted a comprehensive analysis of all molecular dynamics simulations to characterize the systems based on their mutant types, with particular attention to mobility changes in key regions such as the RBD, RBM, and loop regions. In this context, systems with mutant types that exhibit decreased affinity for ACE2 and increased antibody resistance, labeled as “$-+$”, as well as those in which the mutation demonstrates increased affinity for ACE2 and decreased antibody resistance, labeled as “$+-$”, were also analyzed. Table 2 provides a summary of the mutant types present in the database, along with their associated effects on viral binding affinity and immune evasion.

Table 2 Impact of mutations on viral binding affinity and immune evasion.

The systems correspond to triplicates of 25,000-frame trajectories (dcd files) obtained through 100 ns simulations, totaling 75,000 frames (300 ns). We sampled 5000 frames using a skip of 15 frames. Next, we separated each of these 5000 frames into .pdb files and generated 5 (five) clusters with the hieragglo average linkage algorithm⁵⁹, that is, using the average distance between members of two clusters, calculated in cpptraj³⁶ with a cutoff of 2 Å for each system. Thus, although subsampling reduces the sample space, the most probable conformations are still likely to be retained.
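The frame counts above can be checked with a few lines of arithmetic. The sketch below assumes that frames are saved at uniform intervals and that subsampling starts at the first frame of the concatenated trajectory; both are assumptions, and the variable names are illustrative.

```python
# Values stated in the text
frames_per_replica = 25_000
n_replicas = 3
simulation_ps = 100_000      # 100 ns per replica, expressed in ps
stride = 15                  # "skip of 15 frames"

total_frames = frames_per_replica * n_replicas            # 75,000 frames

# Assumption: sampling starts at frame 0 of the concatenated replicas
sampled_indices = list(range(0, total_frames, stride))

# Assumption: frames are written at uniform intervals
ps_per_frame = simulation_ps / frames_per_replica
sampled_spacing_ps = ps_per_frame * stride

print(len(sampled_indices), "frames kept")
print(ps_per_frame, "ps between saved frames")
print(sampled_spacing_ps, "ps between sampled frames")
```

Under these assumptions, a stride of 15 over 75,000 frames yields exactly the 5000 frames reported, with consecutive sampled frames separated by 60 ps of simulated time.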

Feature transformation

Our approach restricts the data representation to structural information, specifically pairwise distances within the receptor binding domain (RBD) of the SARS-CoV-2 S protein (Fig. 1a). From the MD trajectories, we generate 2D distance maps that serve as the main geometric descriptors of the structure throughout the simulation. Distance maps are graphical representations that detail the spatial arrangements between all pairs of amino acid residues within a molecule, provided as a 2D matrix of the real-valued distances between residues (distance matrices)^60,61,62. Consider a set of $N$ points in a three-dimensional space, where each point is defined by its coordinates $(x_i, y_i, z_i)$ and is represented by the position vector $\vec {r}_i$. The Euclidean distance $\delta _{ij}$ between the $i$-th and $j$-th points is defined by the following equation:

$$\begin{aligned} \delta _{ij} = {\left\{ \begin{array}{ll} 0, & \text {if } i = j \\ \Vert \vec {r}_j - \vec {r}_i\Vert , & \text {otherwise}, \end{array}\right. } \end{aligned} \tag{1}$$

In the context of structural biology, $\delta _{ij}$ denotes the Euclidean distance between the $i$-th and $j$-th amino acid residues. The pairwise distance matrix $[\delta _{ij}]_{N \times N}$ is then constructed, where each element $\delta _{ij}$ is the Euclidean distance between residues $i$ and $j$. The matrix is symmetric, with $\delta _{ij} = \delta _{ji}$ for all $i, j$, and its diagonal elements are zero, indicating that the distance from any residue to itself is zero^60,63. The complete distance matrix can be represented as

$$\begin{aligned} [\delta _{ij}]_{N \times N} = \begin{bmatrix} 0 & \delta _{12} & \delta _{13} & \dots & \delta _{1N} \\ \delta _{21} & 0 & \delta _{23} & \dots & \delta _{2N} \\ \delta _{31} & \delta _{32} & 0 & \dots & \delta _{3N} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \delta _{N1} & \delta _{N2} & \delta _{N3} & \dots & 0 \end{bmatrix} \end{aligned} \tag{2}$$

We selected the coordinates (x, y, z) of 194 $\alpha$ carbon atoms^5,60 that constitute the RBD, resulting in distance matrices with dimensions of $194\times 194$ (Fig. 1b,c). The asymptotic computational complexity of this method can be expressed as $O(n^2)$, where n indicates the total number of residues in the protein. We converted these matrices into a 2D image (.png format) using the Matplotlib library in Python. To ensure compatibility between map dimensions and model input, we resized the dimensions of the maps to $224\times 224$ pixels⁶⁴ by adding padding zeros. We implemented the algorithms for generating the DMs in Python (version 3.9.13).
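As an illustration of Eqs. (1) and (2), the following NumPy sketch builds a $194\times 194$ distance matrix, zero-pads it to $224\times 224$, and checks that the matrix is unchanged by a rigid rotation and translation of the coordinates. It is a sketch under stated assumptions, not the authors' code: the coordinates are synthetic random points standing in for $C\alpha$ positions, the helper names `distance_matrix` and `pad_to` are hypothetical, and placing the matrix in the top-left corner of the padded array is an assumption (the text states only that padding zeros were added).

```python
import numpy as np

def distance_matrix(ca_coords: np.ndarray) -> np.ndarray:
    """Pairwise Euclidean distances, Eq. (1); input shape (N, 3)."""
    diff = ca_coords[:, None, :] - ca_coords[None, :, :]   # (N, N, 3)
    return np.sqrt((diff ** 2).sum(axis=-1))               # (N, N)

def pad_to(matrix: np.ndarray, size: int = 224) -> np.ndarray:
    """Zero-pad a square matrix to size x size (top-left placement assumed)."""
    n = matrix.shape[0]
    padded = np.zeros((size, size), dtype=matrix.dtype)
    padded[:n, :n] = matrix
    return padded

# Synthetic stand-in for the 194 C-alpha coordinates of one frame
rng = np.random.default_rng(0)
ca_coords = rng.normal(scale=10.0, size=(194, 3))

dm = distance_matrix(ca_coords)
dm_padded = pad_to(dm)

# Rigid-body motion: rotation about the z axis plus a translation
theta = 0.7
rotation = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                     [np.sin(theta),  np.cos(theta), 0.0],
                     [0.0,            0.0,           1.0]])
moved = ca_coords @ rotation.T + np.array([5.0, -3.0, 2.0])
dm_moved = distance_matrix(moved)

print(dm.shape, dm_padded.shape, np.allclose(dm, dm_moved))
```

The broadcasting step makes the $O(n^2)$ cost explicit, since one distance is computed for every pair of residues.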

CNN-based model development

Since the problem of identifying subtle conformational changes in MD trajectories falls under the AI-complete category, and considering that DMs have a grid-like topology², we use the DL technique known as convolutional neural networks (CNNs)⁶⁵. We chose the VGG architecture, named after the Visual Geometry Group⁶⁴. Specifically, we implemented the VGGNet-B variant, which has 13 weight layers.

We represent DMs as 4D tensors, structured based on image dimensions, number of image channels (RGB), and batch size (BS)⁶⁶. In this context, we employ a batch size of 64 samples, as previous studies have shown that batch sizes ranging from 32 to 256 yield improved results⁶⁷, as also observed in related applications⁹. Moreover, we preprocess the DMs by normalizing the pixel values to the range $[0, 1]$, which is more suitable for efficient processing by neural networks².

Our implementation follows the VGG configuration⁶⁴ (Fig. 1d), except for a slight variation, in which we use a single neuron in the output with a sigmoid activation function, thus transforming the CNN into a binary classifier, resulting in 129 M parameters. After each conv. and pooling layer⁶⁴, we incorporated batch normalization (BN) and rectified linear unit (ReLU) activation. To avoid the problem of internal covariate shift and regularize the model, we added BN before the activation layer⁶⁸.
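The reported parameter count can be sanity-checked by hand. The sketch below assumes the standard VGG configuration B (ten $3\times 3$ convolutions in five blocks, each followed by $2\times 2$ max pooling, then two 4096-unit FC layers), a $224\times 224\times 3$ input, and a single sigmoid output unit. The additional batch normalization parameters are deliberately left out, so the total is approximate with respect to the actual model.

```python
# Output channels of the ten 3x3 conv layers in VGG configuration B
conv_channels = [64, 64, 128, 128, 256, 256, 512, 512, 512, 512]
n_pools = 5                       # one 2x2 max pool after each block
input_size, in_channels = 224, 3  # RGB distance-map image

def conv3x3_params(c_in: int, c_out: int) -> int:
    """Weights plus biases of a 3x3 convolution."""
    return (3 * 3 * c_in + 1) * c_out

def dense_params(n_in: int, n_out: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return (n_in + 1) * n_out

conv_total = 0
c_in = in_channels
for c_out in conv_channels:
    conv_total += conv3x3_params(c_in, c_out)
    c_in = c_out

feature_side = input_size // 2 ** n_pools      # 224 / 32 = 7
flat = feature_side * feature_side * c_in      # 7 * 7 * 512 = 25088

# Two 4096-unit FC layers, then one sigmoid output neuron
fc_total = (dense_params(flat, 4096)
            + dense_params(4096, 4096)
            + dense_params(4096, 1))

total = conv_total + fc_total
print(f"{total:,} parameters (~{total / 1e6:.0f} M)")
# prints: 128,954,945 parameters (~129 M)
```

Most of the parameters sit in the first FC layer, which is why replacing the original 1000-way output with a single neuron changes the total only slightly.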

Additionally, we use dropout after FC layers to mitigate potential overfitting and improve the generalizability of the model². We choose a dropout rate of 0.5, as values within the range $[0.3, 0.6]$ tend to reduce the error rate during training⁶⁹. Dropout is particularly beneficial for datasets exceeding 5K samples, such as the data set of the present problem⁶⁹. We developed the model using well-established ML and neural network libraries such as TensorFlow (version 2.15.0)⁷⁰ and Keras⁶⁶.

Model training

We partitioned the database into four main subsets of trajectories: WT, VOC, VOI, and neutral. For training the model, we used DMs obtained from MD trajectories related to wild-type (WT) and Beta and Delta variants of concern (VOCs). We selected these variants because they share most of the mutations observed in VOI and are commonly associated with an increased affinity for ACE2 and resistance to natural antibodies or vaccines⁷¹.

During training, we use cross-validation (CV), a heuristic to minimize the model’s generalization error and optimize hyperparameters⁷². We reserve a fraction $\gamma < 0.5$ of the training data for validation⁷² and use k-fold CV.

We randomly partition the training set into k mutually exclusive subsets of the same size (n/k), where n is the total number of instances. Thus, validation takes place for one of the subsets, while the remaining $k-1$ serve to train the model. This process is repeated k times, and we estimate the average cross-validation error rate (binary cross-entropy loss) as a reference to optimize the model hyperparameters⁷². We set $k = 5$, which corresponds to $\gamma = 1/k = 0.2$, since $\gamma \ge 0.1$ is commonly recommended and proves effective in various applications⁷². We select the optimal parameter configuration based on the error rate estimates⁷³. After adjusting the parameters, we utilize the entire training data set to make predictions on the independent test set.
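The partitioning scheme can be sketched in plain Python as follows. This is an illustrative implementation of generic k-fold splitting, not the authors' training code; the function name `k_fold_indices`, the fixed seed, and the toy size of 100 instances are assumptions.

```python
import random

def k_fold_indices(n: int, k: int, seed: int = 0):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)          # random partition
    fold_size = n // k
    folds = [indices[i * fold_size:(i + 1) * fold_size] for i in range(k)]
    # Spread any leftover instances when n is not divisible by k
    for j, idx in enumerate(indices[k * fold_size:]):
        folds[j].append(idx)
    for i in range(k):
        validation = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation

# Toy example: 100 instances, k = 5, so each validation fold holds
# a fraction gamma = 1/k = 0.2 of the training data.
splits = list(k_fold_indices(n=100, k=5))
for fold, (train, validation) in enumerate(splits):
    print(fold, len(train), len(validation))
```

In a full pipeline, the model would be trained on each `train` list and the binary cross-entropy loss on the matching `validation` list averaged over the five folds.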

We conducted the training in Google’s virtual environment, Colab, which provided access to a Jupyter Notebook. The computational resources included an NVIDIA A100 GPU with 40 GB of VRAM and an additional 89.6 GB of RAM.

Model evaluation

The test set comprises 14 (fourteen) systems, each distributed equally across classes, totaling 70,014 DMs. We evaluated mutations associated with VOIs such as Iota, Kappa, Lambda, Mu, and Theta, as well as VOCs such as Gamma and Omicron. Notably, key mutations including S:L452R, S:T478K, S:E484K, and S:N501Y were commonly observed in most of these systems, leading us to assign them the label “$++$”^41,48,71,74. Furthermore, we analyzed point mutations S:V445L, S:K417Y, S:N439R, S:Q498I, S:S494K, S:Y489F, and S:Y505F, which are considered neutral⁵⁷, and assigned them the label “${-}{-}$”, thus completing the evaluation of the test set.

During validation, our optimized model achieved an average error rate of 1% with the following specific parameters: learning rate of 1E-3, dropout of 0.5, BS of 64 and 100 training epochs (more details are provided in the Supplementary Material). We used the receiver operating characteristic (ROC) curve (see Fig. 1a in the Supplementary Material) to determine the optimal operating threshold, 0.5, which resulted in a better balance between the recall and false positive rate (1-specificity)⁷⁵.

In the testing, we derived performance metrics from the confusion matrix, including precision (prec), true positive rate (TPR)/recall (rec), false positive rate (FPR), $\hbox {F}_\beta$ score and Matthews’ correlation coefficient (MCC)⁷⁵. In relation to the $\hbox {F}_\beta$ score, we employ the $\hbox {F}_1$ score, which balances precision and recall by setting $\beta =1$. Furthermore, we calculate the area under the ROC curve (AUC). We calculated complementary metrics such as precision and recall for the test set, since relying solely on the error rate can be limiting as a measure of discriminability. Furthermore, we sought to determine the success rate of the model for each class of problem⁷⁵.
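For reference, the metrics listed above follow directly from the four confusion-matrix counts. The sketch below uses hypothetical counts (1,000 instances per class) chosen to mimic a recall of 0.800 and an FPR of 0.314 on balanced classes; they are not the authors' confusion matrix, and the function name `binary_metrics` is illustrative.

```python
import math

def binary_metrics(tp: int, fp: int, tn: int, fn: int, beta: float = 1.0) -> dict:
    """Precision, recall (TPR), FPR, F-beta and MCC from confusion counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    fpr = fp / (fp + tn)
    f_beta = ((1 + beta ** 2) * precision * recall
              / (beta ** 2 * precision + recall))
    mcc = ((tp * tn - fp * fn)
           / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return {"precision": precision, "recall": recall, "fpr": fpr,
            "f_beta": f_beta, "mcc": mcc}

# Hypothetical, balanced counts (not taken from the study)
m = binary_metrics(tp=800, fp=314, tn=686, fn=200)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With these illustrative counts, precision is about 0.718, $\hbox {F}_1$ about 0.757, and MCC about 0.489, which shows how a given recall and FPR on balanced classes determine the remaining metrics.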

Results and discussions

2D-RMSD analysis of the impact of mutations in the RBD

We developed an analysis that focused on structural variations between the original S protein and its strains, which exhibit different levels of affinity for interaction with the cellular receptor ACE2 and antibodies. In this sense, we generated 2D-RMSD plots to identify regions of high mobility, highlighting differences in conformational sampling between different mutant types in the RBD. Since RBD is the main target of vaccine-induced neutralizing antibodies (nAb), our objective was to understand how these mutations impact interactions between ACE2 and other antibodies⁷⁶. It is relevant to determine whether these mutations compromise the effectiveness of mRNA vaccines and natural immunity, given the ongoing emergence of new variants^40,76. The following table shows the labels adopted for each mutation phenotype.

Our MD analysis of all systems in the database, categorized by their mutant types, revealed that the most significant residue fluctuations and standard deviations were associated with the region spanning residues 360 to 374, as well as along the RBM, especially the $\beta _{1}^{\prime }/\beta _{2}^{\prime }-C$ loop (residues 470 to 490), as illustrated in the RMSF plots (Fig. 2a,c).

In the RBM region, we observed a higher standard deviation for mutations characterized by a lower affinity for interaction with the ACE2 cell receptor and increased resistance to antibodies, labeled “$-+$”. Notably, fluctuations were pronounced in the RBM loops at positions 484 and 501, as indicated in Fig. 2a. These fluctuations were even more marked in the C-terminal extension region of the loop $\beta _{1}^{\prime }/\beta _{2}^{\prime }$, as shown in Fig. 2c. The 2D-RMSD graph presents an all-against-all comparison of molecular dynamics poses, with warmer colors denoting increased mobility. In particular, this loop includes position 484, where the critical mutation S:E484K, common to most VOCs identified by the WHO, occurs. We observed a greater fluctuation in the mobility of the $\beta _{1}^{\prime }/\beta _{2}^{\prime }-C$ loop, particularly associated with the “$-+$” mutations, as shown in Fig. 2b,d. Analyzing these movements relative to the RBD and their internal mobility within the RBM provides significant information.

The magnitude of fluctuation across simulations may be related to mutations that exhibit greater antibody resistance, leading to greater diversity of RBM conformations. This increased structural variability may be a strategy by which mutants ’evade’ the host’s immune response. The same pattern is observed for the key mutation S:N501Y, which exhibits high fluctuation. In particular, the Lambda variant (lineage C.37, mutations S:L452Q and S:F490S) and the single mutations S:F490S, S:F490L and S:G446V consistently showed RMSD values of 5 Å throughout the simulation. These findings suggest that these mutations may help the virus ’escape’ from the immune response and reduce the effectiveness of vaccines, potentially through a gain of RBM mobility^77,78.

In mutations that increase affinity for the ACE2 interaction and confer antibody resistance, our analyses revealed a reduced occurrence of conformations compared to the WT strain and neutral mutations denoted “${-}{-}$”. This pattern was clearly visible in the RMSF plots (Fig. 2a,c), particularly within the $\beta _{1}^{\prime }/\beta _{2}^{\prime }-C$ loop (Fig. 2c). The 2D-RMSD plot comparing the “$++$” and “${-}{-}$” systems shows that the “${-}{-}$” mutants exhibit a slight increase in mobility within the RBM region compared to the “$++$” mutations. Notably, in the blue square representing the “${-}{-}$” mutations, there is a discernible increase in warmer colors, indicating enhanced mobility close to 5 Å in the RBM. Likewise, the $\beta _{1}^{\prime }/\beta _{2}^{\prime }-C$ loop shows increased mobility for the “${-}{-}$” mutations compared to the “$++$” mutations, demonstrated by the warmer colors in the 2D-RMSD graph. For the orange square representing the “$++$” mutations, an increase in cooler colors is observed, suggesting reduced mobility near 0 Å. These observations may provide insight into why mutations with increased ACE2-binding affinity in S RBD could influence viral infectivity and immunogenicity. These findings align with those presented in Fig. 2c and in Fig. 2 of the Supplementary Material.

Molecular dynamics trajectory classification

As observed in related works^8,9, we evaluated the generalizability of our predictor to unseen conformations in MD trajectories from an independent test set. These trajectories refer to strains of identical lineage to those included in the training set, in this case belonging to the B.1 lineage⁵⁵. We represent these trajectories using DMs, allowing us to evaluate the predictor’s performance under novel conditions.

To evaluate the potential of a new MD simulation product to match a more infectious and immune-evading variant, it is essential to determine the proportion of correctly predicted instances belonging to the positive class. In this scenario, we use the precision of the positive class, recall, and $\hbox {F}_1$ score to identify trajectories that exhibit distance patterns similar to those observed in the variants present in the training set⁷⁵. These patterns may suggest subtle conformational changes in RBD associated with a gain in affinity for ACE2^15,79,80,81, as well as transmissibility^30,82.

Considering the independent test set, we estimate the recall and FPR for the systems assigned the labels $++$ and ${-}{-}$, respectively. Additionally, we calculated the mean ($\mu$) and standard deviation ($\sigma$) of the metrics derived by the model for both classes. Figure 3 summarizes the results, demonstrating the effectiveness of the model in predicting instances with the label $++$. Given $\mu$ and $\sigma$, the recall is considerably higher ($0.800 \pm 0.132$) than the FPR ($0.314 \pm 0.148$), highlighting the need to also reduce the FPR. The model achieved a precision of 0.718, resulting in an $\hbox {F}_1$ score of 0.757, indicating a balanced performance between precision and recall. From the confusion matrix, the MCC was calculated to be 0.488. Furthermore, the estimated AUC was 0.800. These results provide robust evidence of the predictive capacity of the model, particularly for more infectious and less immunogenic mutant types, underscoring its utility to predict the impact of point mutations on the function of the S protein⁷⁵.
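
The relationship between these metrics can be checked with a short sketch. The confusion-matrix counts below are hypothetical: they assume 1000 frames per class and were chosen only so that recall and FPR equal the reported means (0.800 and 0.314). They are not the paper's actual confusion matrix.

```python
import math

def binary_metrics(tp, fn, fp, tn):
    """Precision, recall, FPR, F1 and MCC from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)   # true positive rate
    fpr = fp / (fp + tn)      # false positive rate
    f1 = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {"precision": precision, "recall": recall,
            "fpr": fpr, "f1": f1, "mcc": mcc}

# Hypothetical counts (balanced classes, 1000 frames each).
m = binary_metrics(tp=800, fn=200, fp=314, tn=686)
```

With these illustrative counts the sketch gives a precision of about 0.718 and an $\hbox {F}_1$ of about 0.757, matching the reported values, and an MCC of about 0.489, close to the reported 0.488. This suggests that the reported metrics are mutually consistent with roughly balanced classes.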

Comparison with state-of-the-art methods

The mutation-induced change in binding free energy (BFE), represented as $\Delta \Delta G_{\text {Bind}}$ or simply $\Delta \Delta G$, is defined as the difference between the BFE of the wild type (WT) and that of the mutant type (MT). Specifically, $\Delta \Delta G_{\text {Bind}} = \Delta G_{\text {Bind}}^{\text {WT}} - \Delta G_{\text {Bind}}^{\text {MT}}$, where $\Delta G_{\text {Bind}}^{\text {WT}}$ is the BFE of the WT and $\Delta G_{\text {Bind}}^{\text {MT}}$ is the BFE of the mutant. This definition reflects the energetic impact of mutation on binding affinity^58,83,84,85. Recent studies have established that the transmissibility and infectivity of viral variants in host cells are proportional to changes in the BFE between the S RBD and ACE2⁸⁵. A positive change in BFE ($\Delta \Delta G_{\text {Bind}} > 0$) reveals the ability of the mutation to increase binding between S RBD and ACE2, while a negative change ($\Delta \Delta G_{\text {Bind}} < 0$) or a value close to zero ($\Delta G_{\text {Bind}}^{\text {MT}} \approx \Delta G_{\text {Bind}}^{\text {WT}}$) suggests reduced or no impact on protein function^57,83,85,86.
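
To make the sign convention concrete, consider a worked example with hypothetical values (chosen for illustration only, not taken from the cited data): a WT complex with $\Delta G_{\text {Bind}}^{\text {WT}} = -10.0$ kcal mol⁻¹ and a mutant with $\Delta G_{\text {Bind}}^{\text {MT}} = -11.5$ kcal mol⁻¹.

$$\begin{aligned} \Delta \Delta G_{\text {Bind}} = \Delta G_{\text {Bind}}^{\text {WT}} - \Delta G_{\text {Bind}}^{\text {MT}} = -10.0 - (-11.5) = +1.5 \ \text {kcal mol}^{-1}. \end{aligned}$$

The mutant binds more strongly (its BFE is more negative), so $\Delta \Delta G_{\text {Bind}} > 0$, in agreement with the convention that positive values indicate increased binding between S RBD and ACE2.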

In this context, our objective was to determine the biological significance of our model’s predictions by comparing them with those produced by the TopNetTree model^84,85, a topology-based network tree methodology specifically developed to forecast changes in the binding free energy of protein-protein interactions (PPI) resulting from mutations. In that study, the authors consolidated more than 1.4 million SARS-CoV-2 genomic sequences from patients and identified 683 point mutations specifically within the receptor-binding domain (RBD). They also used a comprehensive library of 130 SARS-CoV-2 antibody structures. Using viral genotyping, algebraic topology algorithms, and deep learning, they showed that RBD co-mutations influence both binding free energy and antibody interactions⁸⁵. Their BFE predictions demonstrated a Pearson correlation coefficient of 0.78 when compared with experimental data^83,86,87. This level of agreement underscores the potential of these analyses to reliably predict the impacts of viral mutations on vaccine efficacy and antibody neutralization, thereby informing future therapeutic strategies.

Table 3 Correlation for model predictions and BFE changes.


We based our analysis on changes in BFE (kcal mol⁻¹) in the RBD-ACE2 complex, taken from published sources^57,85,86,88. Additionally, for point mutations, we integrated BFE change data from the online tool “SARS-CoV-2 Mutation Analyzer” (https://weilab.math.msu.edu/MutationAnalyzer/)⁸⁶. Both studies used the TopNetTree model⁸⁴. We calculate the average probability, ${\overline{p}}$, obtained from the output layer’s sigmoid function over the frames of each trajectory, which gives the likelihood that the instances belong to the positive class. To place the BFE changes on the same 0 to 1 scale as the probability values, we normalize them using the Min-Max technique, denoted $\Delta \Delta G_{norm}$ (Table 3). Using these values, we built a scatter plot in which each data point represents a database-specific mutation or variant, with coordinates given by the tuple (${\overline{p}}$, $\Delta \Delta G_{norm}$) (Fig. 4).
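
The two quantities plotted against each other can be computed as in the sketch below. The input values are hypothetical placeholders, and the function names are illustrative rather than taken from the authors' repository.

```python
def mean_probability(frame_probs):
    """Average sigmoid output over the frames of one trajectory."""
    return sum(frame_probs) / len(frame_probs)

def min_max(values):
    """Rescale values linearly to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max scaling is undefined for constant input")
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical per-frame probabilities for one trajectory.
p_bar = mean_probability([0.9, 0.7, 0.8])

# Hypothetical BFE changes (kcal/mol) for four systems.
ddg_norm = min_max([-0.5, 0.0, 0.5, 1.5])
```

Note that Min-Max scaling depends on the smallest and largest BFE change in the data set, so the normalized values are only comparable within the same set of systems.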

Our analysis identified two distinct clusters. The first contains data points associated with neutral mutations, for which the mutant BFE is close to the WT BFE ($\Delta G^{MT}_{Bind} \approx \Delta G^{WT}_{Bind}$)⁵⁷. The second includes variants associated with increased viral infectivity and decreased immunogenicity, which exhibit significant increases in BFE changes⁸⁵. To quantify the relationship between the two quantities, we calculated the product-moment correlation coefficient (also known as Pearson’s correlation coefficient, $\rho$) between the average probability, ${\overline{p}}$, and the normalized change in BFE, $\Delta \Delta G_{norm}$. This produced a strong positive correlation, with $\rho = 0.776$ and a coefficient of determination ($R^2$) of 0.602 (Fig. 4).
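
For reference, the correlation can be computed as in the following sketch. The data pairs are hypothetical; the actual values are those listed in Table 3.

```python
import math

def pearson(x, y):
    """Product-moment (Pearson) correlation coefficient of two sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical (p_bar, ddG_norm) pairs, one per mutation or variant.
p_bar_values = [0.1, 0.2, 0.8, 0.9]
ddg_norm_values = [0.0, 0.3, 0.6, 1.0]

rho = pearson(p_bar_values, ddg_norm_values)
r_squared = rho ** 2  # for a simple linear fit, R^2 equals rho^2
```

For a simple linear fit the coefficient of determination is the square of the Pearson coefficient, and indeed $0.776^2 \approx 0.602$, consistent with the two reported values.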

Prediction from centroids

As mentioned previously, the centroids derived from hierarchical clustering offer the most representative conformations of the structure. Thus, we aimed to estimate the impact of the most infective and least immunogenic types of mutants on the mobility of the RBD on the basis of the most probable conformation represented by the centroid. Considering the centroid during the model prediction, rather than analyzing all frames of the complete MD trajectory, we can reduce the time required for the MD simulation.

Table 4 presents the distribution of frames per cluster, $c_i$, revealing that the first cluster contains the highest number of frames in the MD trajectories. In most systems, the number of frames corresponds to a percentage greater than 60.0%. To determine whether predicting the centroid for the first cluster allows us to predict the class, we calculate the probability (p) for each of the n centroids. We estimate the average of these probabilities, weighting them by the ratio between the frame frequency of the ith cluster and the number of frames in the trajectory (w), mathematically described as

$$\begin{aligned} {\overline{p}}_w = \frac{\sum \nolimits _{i=1}^n p_i \cdot w_i}{\sum \nolimits _{i=1}^n w_i}. \end{aligned}$$

(3)

The results indicate that the prediction for the centroid of cluster 0 already corresponds to the mutant type class, a finding supported by the weighted average, ${\overline{p}}_w$. This behavior agrees with the predictions made for the MD trajectories, as the centroid is the most representative conformation of the cluster.
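
Equation (3) can be sketched in a few lines. The cluster populations and centroid probabilities below are hypothetical, and `weighted_mean_probability` is an illustrative name.

```python
def weighted_mean_probability(probs, frame_counts):
    """Eq. (3): centroid probabilities weighted by cluster population.

    probs[i] is the predicted probability for the centroid of cluster i;
    frame_counts[i] is the number of frames assigned to that cluster.
    """
    if len(probs) != len(frame_counts) or not probs:
        raise ValueError("need one probability per cluster")
    total = sum(frame_counts)
    weights = [c / total for c in frame_counts]  # w_i, fraction of frames
    return sum(p * w for p, w in zip(probs, weights)) / sum(weights)

# Hypothetical three-cluster trajectory of 1000 frames.
p_w = weighted_mean_probability([0.9, 0.4, 0.2], [700, 200, 100])
```

In this toy case the weighted average is 0.73: the dominant cluster (70% of the frames) largely determines ${\overline{p}}_w$, which is why the prediction for the centroid of the most populated cluster tends to agree with the weighted result.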

Table 4 Centroid prediction.


Feature visualization

We use a technique known as feature visualization to make the learned features explicit by maximizing activation⁸⁹. The aim is to find the input that maximally activates a specific unit. In CNNs, these units correspond to feature maps. From these maps, we can extract the learned patterns from the MD trajectories, allowing for a better understanding of the conformations.

Initially, we considered several approaches developed in related studies, which involved selecting a percentage of frames from the DM trajectory to identify the key residues that deep CNNs use to correctly predict test frames^7,8. In our context, clusters provide a simplified analysis of the conformations sampled for mutant types, and each cluster has a centroid that serves as its most representative conformation.

Due to their representativeness, we focused on the centroids of cluster 0 of the MD trajectories used in model training, that is, the WT and Beta variants, in addition to the trajectory referring to the Gamma variant. In the WT trajectory, 60.4% of the frames are in cluster 0, with an average interpoint distance of 1.238 Å ($\pm \, 0.15$ Å). The Beta variant has 87.3% of frames in cluster 0, with an average distance of 1.223 Å ($\pm \, 0.193$ Å). In turn, the Gamma variant has 91.8% of frames in cluster 0, with an average distance of 1.405 Å ($\pm \, 0.319$ Å). We used a 2 Å cutoff distance.

We generated the corresponding DM for the centroids and used it as input to the model. Initially, we extracted an instance of a feature map from the layers of the initial convolutional block (Conv1_2) for the centroid of cluster 0 of the Gamma variant. We observed that the highlighted region in the feature maps corresponds to the C-terminal extension of the loop $\beta _{1}^{\prime }/\beta _{2}^{\prime }-C$ (comprising residues 470–490 of the RBD) (Fig. 1e). Next, we projected the high-intensity pixels of the map onto the 3D structure of the RBD (Fig. 1f). We observe that this segment presents greater flexibility than the rest of the structure (see Fig. 2 in the Supplementary Material), which is consistent with the results of the 2D-RMSD analyses and corroborated by previous studies.
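
One simple way to relate high-intensity pixels of a feature map to residues is sketched below. This is an assumption-laden illustration rather than the authors' procedure: it assumes the feature map has the same shape as the input distance map, so that pixel (i, j) corresponds to the residue pair (i, j), and it selects a fixed fraction of the brightest pixels. The function name, the `fraction` parameter and the toy values are hypothetical.

```python
def top_residues(feature_map, first_residue, fraction=0.05):
    """Residue numbers involved in the highest-intensity pixels.

    Assumes the feature map has the same shape as the input distance map,
    so that pixel (i, j) corresponds to the residue pair (i, j).
    """
    pixels = [
        (value, i, j)
        for i, row in enumerate(feature_map)
        for j, value in enumerate(row)
    ]
    pixels.sort(reverse=True)  # brightest pixels first
    keep = max(1, int(len(pixels) * fraction))
    residues = set()
    for _, i, j in pixels[:keep]:
        residues.update((first_residue + i, first_residue + j))
    return sorted(residues)

# Hypothetical 4x4 activation map whose first row/column is residue 470.
fmap = [
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 0.0, 0.9, 0.0],
    [0.0, 0.9, 0.0, 0.2],
    [0.0, 0.0, 0.2, 0.0],
]
hot = top_residues(fmap, first_residue=470, fraction=0.125)
```

The resulting residue numbers can then be selected in a molecular viewer to highlight the corresponding segment on the 3D structure.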

Conclusions and perspectives

The application of MD simulations to the analysis of conformational states and functional mechanisms of macromolecules is a well-established approach. However, due to their high dimensionality and large scale, analyses involving MD data typically pose substantial computational demand. In this study, we represent the trajectories of MD simulations of the SARS-CoV-2 spike protein-ACE2 as 2D distance maps, and combine cluster analysis and convolutional neural networks for identifying discriminative and nontrivial conformational changes in these trajectories.

Our model was successful in predicting the functional impact related to mutant types that increase the affinity of the S protein for ACE2 and reduce its immunogenicity, considering both MD trajectories ($prec = 0.718$; $rec = 0.800$; $\hbox {F}_1 = 0.757$; MCC = 0.488; AUC = 0.800) and their centroids. We also obtained a strong positive Pearson correlation ($\rho = 0.776$) between the average sigmoid probability for MD trajectories and the $\Delta \Delta G_{\text {Bind}}$ corresponding to neutral mutants and variants, with a coefficient of determination ($R^2$) of 0.602.

Our 2D-RMSD analysis aligned with the visualization of features for more infectious and immune-evading mutant types and revealed fluctuating regions within the RBM, especially in the $\beta _{1}^{\prime }/\beta _{2}^{\prime }-C$ loop. This region presented a significant standard deviation for mutations that allow SARS-CoV-2 to evade the immune response, with RMSD values of 5 Å.

The proposed method represents an efficient alternative for identifying SARS-CoV-2 strains that may be linked to more infectious and transmissible mutations. Our approach utilizes clustering and deep learning techniques to extract valuable insights from ensembles of MD trajectories, enabling recognition of a wide range of conformational patterns characteristic of mutant types. This provides a significant advantage in the early identification of emerging variants. Although MD simulations continue to supply the detailed structural data required by the model, our deep learning framework significantly reduces the need for time-consuming post-simulation analyses. Furthermore, given that the most frequently observed conformation in an MD trajectory often resembles the average conformation in X-ray crystallography or even in NMR experiments, this approach could identify variants from a single structure obtained by these methods without the need for a simulation.

Furthermore, our approach provides a novel perspective by demonstrating that the increase in infectivity and immune evasion associated with spike protein mutations is not solely attributable to direct protein-receptor contacts but is also significantly influenced by intrinsic changes in protein mobility. By considering the dynamic flexibility of the protein, our study offers a relevant understanding of the underlying mechanisms driving viral evolution.

A further contribution is the ability to accurately predict the increased binding affinity of the SARS-CoV-2 spike protein’s RBD to the human ACE2 receptor without any structural data from the ACE2 protein being provided to the learning algorithm. This is achieved by focusing solely on changes in spike protein mobility, a methodological advance not explored in previous studies. This capability highlights the model’s effectiveness in predicting functional impacts based solely on dynamic protein behavior, offering a novel perspective on protein interaction analysis.

The present work is expected to contribute substantially to the field of computational biology and virology, particularly by helping to accelerate the design and optimization of new therapeutic agents and vaccines, offering a proactive stance against the constantly evolving threat of COVID-19 or future pandemics. Our findings indicate that the deep learning model can serve as a viable alternative to expensive energy calculations, providing a level of accuracy comparable to that of MD simulation analyses.

Data availability

The results relating to the analyses conducted as well as the source code are available in the public repository: https://github.com/LBS-UFMG/MD-ML-Project. The data used during this research are available from authors G.B.R (gbr@quimica.ufpb.br) and R.C.M. (raquelcm@dcc.ufmg.br) on request.

Abbreviations

2019-nCoV:: Novel coronavirus
$\beta$2AR:: $\beta$2-Adrenergic receptor
ACE2:: Angiotensin-converting enzyme 2
AI:: Artificial intelligence
BFE:: Binding free energy
BN:: Batch normalization
BS:: Batch size
CNN:: Convolutional neural networks
COVID-19:: Coronavirus disease 2019
CV:: Cross-validation
DM:: Distance maps
DL:: Deep learning
FPR:: False positive rate
GPCR:: G protein-coupled receptors
LNCC:: Brazilian National Laboratory for Scientific Computing
MD:: Molecular dynamics
ML:: Machine learning
mRNA:: Messenger ribonucleic acid
nAbs:: neutralizing antibodies
NMR-spectroscopy:: Nuclear magnetic resonance spectroscopy
NPT:: Constant-temperature, constant-pressure
PPI:: Protein–protein interactions
RBD:: Receptor-binding domain
RBM:: Receptor-binding motif
3D-ResNet:: 3D residual networks
ReLU:: Rectified linear unit
RMSD:: Root-mean-square deviation
RMSF:: Root-mean-square-fluctuation
ROC:: Receiver operating characteristic curve
S protein:: Spike protein
SARS-CoV-2:: Severe Acute Respiratory Syndrome Coronavirus 2
VBM:: Variants being monitored
VGGNet:: Visual Geometry Group Network
VOC:: Variants of concern
VOI:: Variants of interest
TopNetTree:: Topology-based network tree
TPR:: True positive rate
WHO:: World Health Organization
WT:: Wild-type

References

Min, S., Lee, B. & Yoon, S. Deep learning in bioinformatics. Brief. Bioinform. 18, 851–869. https://doi.org/10.1093/bib/bbw068 (2016).
Goodfellow, I. J., Bengio, Y. & Courville, A. Deep Learning (MIT Press, 2016).
Zhang, A., Lipton, Z. C., Li, M. & Smola, A. J. Dive into Deep Learning (Cambridge University Press, 2023).
Gao, W. et al. Deep learning in protein structural modeling and design. Patterns 1, 100142. https://doi.org/10.1016/j.patter.2020.100142 (2020).
Defresne, M., Barbe, S. & Schiex, T. Protein design with deep learning. Int. J. Mol. Sci. 22, 11741. https://doi.org/10.3390/ijms222111741 (2021).
Senior, A. W. et al. Improved protein structure prediction using potentials from deep learning. Nature 577, 706–710. https://doi.org/10.1038/s41586-019-1923-7 (2020).
Plante, A. et al. A machine learning approach for the discovery of ligand-specific functional mechanisms of GPCRs. Molecules 24, 2097. https://doi.org/10.3390/molecules24112097 (2019).
Han, M. et al. Recognition of the ligand-induced spatiotemporal residue pair pattern of $\beta$2-adrenergic receptors using 3-D residual networks trained by the time series of protein distance maps. Comput. Struct. Biotechnol. J. 20, 6360–6374. https://doi.org/10.1016/j.csbj.2022.10.036 (2022).
Li, C. et al. An interpretable convolutional neural network framework for analyzing molecular dynamics trajectories: A case study on functional states for G-protein-coupled receptors. J. Chem. Inf. Model. 62, 1399–1410. https://doi.org/10.1021/acs.jcim.2c00085 (2022).
Karplus, M. & McCammon, J. A. Molecular dynamics simulations of biomolecules. Nat. Struct. Biol. 9, 646–652. https://doi.org/10.1038/nsb0902-646 (2002).
Filipe, H. A. L. & Loura, L. M. S. Molecular dynamics simulations: Advances and applications. Molecules 27, 2105. https://doi.org/10.3390/molecules27072105 (2022).
Hollingsworth, S. A. & Dror, R. O. Molecular dynamics simulation for all. Neuron 99, 1129–1143. https://doi.org/10.1016/j.neuron.2018.08.011 (2018).
Liu, X. et al. Deep geometric representations for modeling effects of mutations on protein–protein binding affinity. PLoS Comput. Biol. 17, e1009284. https://doi.org/10.1371/journal.pcbi.1009284 (2021).
Mannar, D. et al. Structural analysis of receptor binding domain mutations in SARS-CoV-2 variants of concern that modulate ACE2 and antibody binding. Cell Rep. 37, 110156. https://doi.org/10.1016/j.celrep.2021.110156 (2021).
Chen, J. et al. Mutations strengthened SARS-CoV-2 infectivity. J. Mol. Biol. 432, 5212–5226. https://doi.org/10.1016/j.jmb.2020.07.009 (2020).
Yang, W.-T. et al. SARS-CoV-2 E484K mutation narrative review: Epidemiology, immune escape, clinical implications, and future considerations. Infect. Drug Resist. 15, 373–385. https://doi.org/10.2147/idr.s344099 (2022).
Sergeeva, A. P. et al. Free energy perturbation calculations of mutation effects on SARS-CoV-2 RBD:ACE2 binding affinity. J. Mol. Biol. 435, 168187. https://doi.org/10.1016/j.jmb.2023.168187 (2023).
Baum, A. et al. Antibody cocktail to SARS-CoV-2 spike protein prevents rapid mutational escape seen with individual antibodies. Science 369, 1014–1018. https://doi.org/10.1126/science.abd0831 (2020).
Rockx, B. et al. Escape from human monoclonal antibody neutralization affects in vitro and in vivo fitness of severe acute respiratory syndrome coronavirus. J. Infect. Dis. 201, 946–955. https://doi.org/10.1086/651022 (2010).
Wang, G. et al. Deep-learning-enabled protein–protein interaction analysis for prediction of SARS-CoV-2 infectivity and variant evolution. Nat. Med. 29, 2007–2018. https://doi.org/10.1038/s41591-023-02483-5 (2023).
Ou, J. et al. V367F mutation in SARS-CoV-2 spike RBD emerging during the early transmission phase enhances viral infectivity through increased human ACE2 receptor binding affinity. J. Virol. https://doi.org/10.1128/jvi.00617-21 (2021).
Pipitò, L., Rujan, R., Reynolds, A. C. & Deganutti, G. Molecular dynamics studies reveal structural and functional features of the SARS-CoV-2 spike protein. BioEssays https://doi.org/10.1002/bies.202200060 (2022).
Rath, S., Padhi, K. & Mandal, N. Scanning the RBD-ACE2 molecular interactions in omicron variant. Biochem. Biophys. Res. Commun. 592, 18–23. https://doi.org/10.1016/j.bbrc.2022.01.006 (2022).
Abduljalil, J. et al. How helpful were molecular dynamics simulations in shaping our understanding of SARS-CoV-2 spike protein dynamics?. Int. J. Biol. Macromol. 242, 125153. https://doi.org/10.1016/j.ijbiomac.2023.125153 (2023).
Ahamad, S., Hema, K. & Gupta, D. Structural stability predictions and molecular dynamics simulations of RBD and HR1 mutations associated with SARS-CoV-2 spike glycoprotein. J. Biomol. Struct. Dyn. 40, 6697–6709. https://doi.org/10.1080/07391102.2021.1889671 (2021).
Pavlova, A. et al. Machine learning reveals the critical interactions for SARS-CoV-2 spike protein binding to ACE2. J. Phys. Chem. Lett. 12, 5494–5502. https://doi.org/10.1021/acs.jpclett.1c01494 (2021).
Liu, J. et al. Characterization of SARS-CoV-2 worldwide transmission based on evolutionary dynamics and specific viral mutations in the spike protein. Infect. Dis. Poverty https://doi.org/10.1186/s40249-021-00895-4 (2021).
Williams, J. et al. Molecular dynamics analysis of a flexible loop at the binding interface of the SARS-CoV-2 spike protein receptor-binding domain. Proteins: Struct. Funct. Bioinform. 90, 1044–1053. https://doi.org/10.1002/prot.26208 (2021).
Antony, P. & Vijayan, R. Molecular dynamics simulation study of the interaction between human angiotensin converting enzyme 2 and spike protein receptor binding domain of the SARS-CoV-2 B.1.617 variant. Biomolecules 11, 1244. https://doi.org/10.3390/biom11081244 (2021).
Harvey, W. T. et al. SARS-CoV-2 variants, spike mutations and immune escape. Nat. Rev. Microbiol. 19, 409–424. https://doi.org/10.1038/s41579-021-00573-0 (2021).
Schrödinger, LLC. The PyMOL molecular graphics system (2020).
Waterhouse, A. et al. SWISS-MODEL: Homology modelling of protein structures and complexes. Nucleic Acids Res. 46, W296–W303. https://doi.org/10.1093/nar/gky427 (2018).
Anandakrishnan, R., Aguilar, B. & Onufriev, A. V. H++ 3.0: Automating pK prediction and the preparation of biomolecular structures for atomistic molecular modeling and simulations. Nucleic Acids Res. 40, W537–W541. https://doi.org/10.1093/nar/gks375 (2012).
Case, D. A. et al. The Amber biomolecular simulation programs. J. Comput. Chem. 26, 1668–1688. https://doi.org/10.1002/jcc.20290 (2005).
Phillips, J. C. et al. Scalable molecular dynamics with NAMD. J. Comput. Chem. 26, 1781–1802. https://doi.org/10.1002/jcc.20289 (2005).
Roe, D. R. & Cheatham, T. E. PTRAJ and CPPTRAJ: Software for processing and analysis of molecular dynamics trajectory data. J. Chem. Theory Comput. 9, 3084–3095. https://doi.org/10.1021/ct400341p (2013).
Wu, F. et al. A new coronavirus associated with human respiratory disease in China. Nature 579, 265–269. https://doi.org/10.1038/s41586-020-2008-3 (2020).
CDC. SARS-CoV-2 Variant Classifications and Definitions. https://www.cdc.gov/coronavirus/2019-ncov/variants/variant-classifications.html (2023).
Radvak, P. et al. SARS-CoV-2 B.1.1.7 (alpha) and B.1.351 (beta) variants induce pathogenic patterns in K18-hACE2 transgenic mice distinct from early strains. Nat. Commun. 12, 6559. https://doi.org/10.1038/s41467-021-26803-w (2021).
Wang, P. et al. Antibody resistance of SARS-CoV-2 variants B.1.351 and B.1.1.7. Nature 593, 130–135. https://doi.org/10.1038/s41586-021-03398-2 (2021).
Faria, N. R. et al. Genomics and epidemiology of the P.1 SARS-CoV-2 lineage in Manaus, Brazil. Science 372, 815–821. https://doi.org/10.1126/science.abh264 (2021).
Naveca, F. G. et al. COVID-19 in Amazonas, Brazil, was driven by the persistence of endemic lineages and P.1 emergence. Nat. Med. 27, 1230–1238. https://doi.org/10.1038/s41591-021-01378-7 (2021).
Mlcochova, P. et al. SARS-CoV-2 B.1.617.2 delta variant replication and immune evasion. Nature 599, 114–119. https://doi.org/10.1038/s41586-021-03944-y (2021).
Pouwels, K. B. et al. Effect of delta variant on viral burden and vaccine effectiveness against new SARS-CoV-2 infections in the UK. Nat. Med. 27, 2127–2135. https://doi.org/10.1038/s41591-021-01548-7 (2021).
Zhang, W. et al. Emergence of a novel SARS-CoV-2 variant in southern California. JAMA 325, 1324. https://doi.org/10.1001/jama.2021.1612 (2021).
Annavajhala, M. K. et al. Emergence and expansion of SARS-CoV-2 B.1.526 after identification in New York. Nature 597, 703–708. https://doi.org/10.1038/s41586-021-03908-2 (2021).
Zhou, H. et al. B.1.526 SARS-CoV-2 variants identified in New York City are neutralized by vaccine-elicited and therapeutic monoclonal antibodies. mBio 12, e0138621. https://doi.org/10.1128/mbio.01386-21 (2021).
McCallum, M. et al. Molecular basis of immune evasion by the delta and kappa SARS-CoV-2 variants. Science 374, 1621–1626. https://doi.org/10.1126/science.abl8506 (2021).
Wilhelm, A. et al. Antibody-mediated neutralization of authentic SARS-CoV-2 B.1.617 variants harboring L452R and T478K/E484Q. Viruses 13, 1693. https://doi.org/10.3390/v13091693 (2021).
Halfmann, P. J. et al. Characterization of the SARS-CoV-2 B.1.621 (mu) variant. Sci. Transl. Med. 14, eabm4908. https://doi.org/10.1126/scitranslmed.abm4908 (2022).
Laiton-Donato, K. et al. Characterization of the emerging B.1.621 variant of interest of SARS-CoV-2. Infect. Genet. Evolut. 95, 105038. https://doi.org/10.1016/j.meegid.2021.105038 (2021).
Chen, J. et al. Omicron variant (B.1.1.529): Infectivity, vaccine breakthrough, and antibody resistance. J. Chem. Inf. Model. 62, 412–422. https://doi.org/10.1021/acs.jcim.1c01451 (2022).
Haw, N. J. et al. Epidemiological characteristics of the SARS-CoV-2 Theta variant (P.3) in the Central Visayas region, Philippines, 30 October 2020–16 February 2021. West. Pac. Surveill. Resp. J. 13, 60–62. https://doi.org/10.5365/wpsar.2022.13.1.883 (2022).
WHO. Tracking SARS-CoV-2 variants. https://www.who.int/en/activities/tracking-SARS-CoV-2-variants/ (2020).
Aksamentov, I. et al. Nextclade: Clade assignment, mutation calling and quality control for viral genomes. J. Open Source Soft. 6, 3773. https://doi.org/10.21105/joss.03773 (2021).
Hodcroft, E. B. Covariants: SARS-CoV-2 Mutations and Variants of Interest. https://covariants.org/ (2021).
Teng, S. et al. Systemic effects of missense mutations on SARS-CoV-2 spike glycoprotein stability and receptor-binding affinity. Brief. Bioinform. 22, 1239–1253. https://doi.org/10.1093/bib/bbaa233 (2020).
Laurini, E. et al. Molecular rationale for SARS-CoV-2 spike circulating mutations able to escape bamlanivimab and etesevimab monoclonal antibodies. Sci. Rep. 11, 20274. https://doi.org/10.1038/s41598-021-99827-3 (2021).
Murtagh, F. A survey of recent advances in hierarchical clustering algorithms. Comput. J. 26, 354–359. https://doi.org/10.1093/comjnl/26.4.354 (1983).
Kloczkowski, A. et al. Distance matrix-based approach to protein structure prediction. J. Struct. Funct. Genomics 10, 67–81. https://doi.org/10.1007/s10969-009-9062-2 (2009).
Ding, W. & Gong, H. Predicting the real-valued inter-residue distances for proteins. Adv. Sci. https://doi.org/10.1002/advs.202001314 (2020).
Du, Y., Kabir, A., Zhao, L. & Shehu, A. From interatomic distances to protein tertiary structures with a deep convolutional neural network. in Proceedings of the 11th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics 1–8 (2020). https://doi.org/10.1145/3388440.3414699.
Leach, A. Molecular Modeling: Principles and Applications 2nd edn. (Prentice Hall, 2001).
Simonyan, K. & Zisserman, A. Very deep convolutional networks for large-scale image recognition. in 3rd International Conference on Learning Representations (ICLR 2015) 1–14 (2015). ArXiv:1409.1556v6.
LeCun, Y. Backpropagation applied to handwritten zip code recognition. Neural Comput.1, 541–551. https://doi.org/10.1162/neco.1989.1.4.541 (1989).
Article Google Scholar
Chollet, F. Deep Learning with Python 4th edn. (Manning, 2017).
Google Scholar
Mishkin, D., Sergievskiy, N. & Matas, J. Systematic evaluation of convolution neural network advances on the Imagenet. Comput. Vis. Image Understand.[SPACE]https://doi.org/10.1016/j.cviu.2017.05.007 (2017).
Article Google Scholar
Ioffe, S. & Szegedy, C. Batch normalization: Accelerating deep network training by reducing internal covariate shift. in Proceedings of the 32nd International Conference on International Conference on Machine Learning, vol. 37, 448—456 (2015).
Srivastava, N. et al. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res.15, 1929–1958 (2014).
MathSciNet Google Scholar
Abadi, M. et al.TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems[SPACE]arXiv:1603.04467 (2016)
Sanches, P. R. S. et al. Recent advances in SARS-CoV-2 Spike protein and RBD mutations comparison between new variants Alpha (B.1.1.7, United Kingdom), Beta (B.1.351, South Africa), Gamma (P.1, Brazil) and Delta (B.1.617.2, India). J. Virus Erad.7, 100054. https://doi.org/10.1016/j.jve.2021.100054 (2021).
Article CAS PubMed PubMed Central Google Scholar
Duda, R. O., Hart, P. E. & Stork, D. G. Pattern Classification (Wiley, 2001).
Google Scholar
Haykin, S. Neural Networks (Prentice Hall, 1999).
Google Scholar
Cherian, S. et al. SARS-CoV-2 spike mutations, L452R, T478K, E484Q and P681R, in the second Wave of COVID-19 in Maharashtra. India. Microorgan.9, 1542–1553. https://doi.org/10.3390/microorganisms9071542 (2021).
Article CAS Google Scholar
Webb, A. R. & Copsey, K. D. Statistical Pattern Recognition (Wiley, 2011).
Book Google Scholar
Dai, L. & Gao, G. F. Viral targets for vaccines against COVID-19. Nat. Rev. Immunol. 21, 73–82. https://doi.org/10.1038/s41577-020-00480-0 (2020).
Tada, T. et al. SARS-CoV-2 lambda variant remains susceptible to neutralization by mRNA vaccine-elicited antibodies and convalescent serum. https://doi.org/10.1101/2021.07.02.450959 (2021).
Wang, M. et al. Reduced sensitivity of the SARS-CoV-2 lambda variant to monoclonal antibodies and neutralizing antibodies induced by infection and vaccination. Emerg. Microb. Infect. 11, 18–29. https://doi.org/10.1080/22221751.2021.2008775 (2021).
Tai, W. et al. Characterization of the receptor-binding domain (RBD) of 2019 novel coronavirus: Implication for development of RBD protein as a viral attachment inhibitor and vaccine. Cell. Mol. Immunol. 17, 613–620. https://doi.org/10.1038/s41423-020-0400-4 (2020).
Alaofi, A. L. & Shahid, M. Mutations of SARS-CoV-2 RBD may alter its molecular structure to improve its infection efficiency. Biomolecules 11, 1273. https://doi.org/10.3390/biom11091273 (2021).
Paul, D., Pyne, N. & Paul, S. Mutation profile of SARS-CoV-2 spike protein and identification of potential multiple epitopes within spike protein for vaccine development against SARS-CoV-2. VirusDisease 32, 703–726. https://doi.org/10.1007/s13337-021-00747-7 (2021).
Tao, K. et al. The biological and clinical significance of emerging SARS-CoV-2 variants. Nat. Rev. Genet. 22, 757–773. https://doi.org/10.1038/s41576-021-00408-x (2021).
Chen, J., Gao, K., Wang, R. & Wei, G.-W. Prediction and mitigation of mutation threats to COVID-19 vaccines and antibody therapies. Chem. Sci. 12, 6929–6948. https://doi.org/10.1039/d1sc01203g (2021).
Wang, M., Cang, Z. & Wei, G.-W. A topology-based network tree for the prediction of protein–protein binding affinity changes following mutation. Nat. Mach. Intell. 2, 116–123. https://doi.org/10.1038/s42256-020-0149-6 (2020).
Wang, R. et al. Emerging vaccine-breakthrough SARS-CoV-2 variants. ACS Infect. Dis. 8, 546–556. https://doi.org/10.1021/acsinfecdis.1c00557 (2022).
Chen, J. et al. Revealing the threat of emerging SARS-CoV-2 mutations to antibody therapies. J. Mol. Biol. 433, 167155. https://doi.org/10.1016/j.jmb.2021.167155 (2021).
Chen, J., Gao, K., Wang, R. & Wei, G. Revealing the threat of emerging SARS-CoV-2 mutations to antibody therapies. J. Mol. Biol. 433, 167155. https://doi.org/10.1016/j.jmb.2021.167155 (2021).
Mishra, P. M., Anjum, F., Uversky, V. N. & Nandi, C. K. SARS-CoV-2 spike mutations modify the interaction between virus spike and human ACE2 receptors. Biochem. Biophys. Res. Commun. 620, 8–14. https://doi.org/10.1016/j.bbrc.2022.06.064 (2022).
Molnar, C. Interpretable Machine Learning: A Guide for Making Black Box Models Explainable (2022).

Acknowledgements

The authors gratefully acknowledge the financial support from the Brazilian agencies, institutes, and networks: Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq), Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - Brasil (CAPES), Fundação de Amparo à Pesquisa do Estado de Minas Gerais (FAPEMIG-MG), Fundação de Apoio à Pesquisa do Estado da Paraíba (FAPESQ-PB), Programa de Apoio a Núcleos de Excelência (PRONEX-FACEPE), Fundação de Apoio ao Desenvolvimento da Universidade Federal de Pernambuco (FADE-UFPE) and Financiadora de Estudos e Projetos (FINEP). The authors also acknowledge the physical structure and computational support provided by Universidade Federal da Paraíba (UFPB), the computer resources of Centro Nacional de Processamento de Alto Desempenho em São Paulo (CENAPAD-SP), Núcleo de Processamento de Alto Desempenho da Universidade Federal do Rio Grande do Norte (NPAD/UFRN), and Supercomputer Santos Dumont (https://sdumont.lncc.br/) at the Brazilian National Laboratory for Scientific Computing (LNCC).

Funding

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq) [Grant Nos. 312143/2020-6, 307340/2021-0, 405745/2021-4, 440363/2022-5 and 440307/2022-8]; Fundação de Amparo à Pesquisa do Estado de Minas Gerais (FAPEMIG) [Grant Nos. APQ-01834-21, APQ-02690-22, APQ-01838-24]; Fundação de Apoio à Pesquisa do Estado da Paraíba (FAPESQ) [Grant No. 030/2023].

Author information

Authors and Affiliations

Department of Computer Science, Federal University of Minas Gerais, Belo Horizonte, Minas Gerais, Brazil
Lucas Moraes dos Santos & Raquel C. de Melo-Minardi
Department of Chemistry, Federal University of Paraíba, João Pessoa, Paraíba, Brazil
José Gutembergue de Mendonça & Gerd Bruno Rocha
Department of Exact and Biological Sciences, Federal University of São João Del Rei, São João del Rei, Minas Gerais, Brazil
Yan Jerônimo Gomes Lobo & Leonardo Henrique Franca de Lima

Contributions

L.M.S. designed the machine learning model, analyzed the results, contributed to the drafting and revising of the manuscript, and approved the final draft. J.G.M. performed the molecular dynamics simulations, analyzed the data, contributed to the drafting and revising of the manuscript, and approved the final draft. Y.J.L. performed the molecular dynamics simulations, analyzed the data, wrote and reviewed drafts of the manuscript, and approved the final draft. L.H.L. conceived and designed the molecular dynamics experiments, analyzed the data, contributed to the drafting and revising of the manuscript, and approved the final draft. G.B.R. conceived and designed the molecular dynamics experiments, performed the experiments, analyzed the data, contributed to the drafting and revising of the manuscript, and approved the final draft. R.C.M. advised on the design of the machine learning models, analyzed the results, reviewed the drafts of the article, and approved the final draft.

Corresponding authors

Correspondence to Lucas Moraes dos Santos or Raquel C. de Melo-Minardi.

Ethics declarations

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Moraes dos Santos, L., Gutembergue de Mendonça, J., Jerônimo Gomes Lobo, Y. et al. Deep learning for discriminating non-trivial conformational changes in molecular dynamics simulations of SARS-CoV-2 spike-ACE2. Sci Rep 14, 22639 (2024). https://doi.org/10.1038/s41598-024-72842-w

Received: 21 April 2024
Accepted: 11 September 2024
Published: 30 September 2024
DOI: https://doi.org/10.1038/s41598-024-72842-w