Computational analysis of pathogen-host interactome for fast and low-risk in-silico drug repurposing in emerging viral threats like Mpox

Paul, Debarati; Saha, Sovan; Basu, Subhadip; Chakraborti, Tapabrata

doi:10.1038/s41598-024-69617-8

Download PDF

Article
Open access
Published: 12 August 2024

Computational analysis of pathogen-host interactome for fast and low-risk in-silico drug repurposing in emerging viral threats like Mpox

Debarati Paul^1,5^na1,
Sovan Saha²^na1,
Subhadip Basu¹^na2 &
…
Tapabrata Chakraborti^3,4^na2

Scientific Reports volume 14, Article number: 18736 (2024) Cite this article

4062 Accesses
4 Citations
Metrics details

Subjects

Abstract

Monkeypox (Mpox), a zoonotic illness triggered by the monkeypox virus (MPXV), poses a significant threat since it may be transmitted and has no cure. This work introduces a computational method to predict Protein–Protein Interactions (PPIs) during MPXV infection. The objective is to discover prospective drug targets and repurpose current potential Food and Drug Administration (FDA) drugs for therapeutic purposes. In this work, ensemble features, comprising 2–5 node graphlet attributes and protein composition-based features are utilized for Deep Learning (DL) models to predict PPIs. The technique that is used here demonstrated an excellent prediction performance for PPI on both the Human Integrated Protein–Protein Interaction Reference (HIPPIE) and MPXV-Human PPI datasets. In addition, the human protein targets for MPXV have been identified accurately along with the detection of possible therapeutic targets. Furthermore, the validation process included conducting docking research studies on potential FDA drugs like Nicotinamide Adenine Dinucleotide and Hydrogen (NADH), Fostamatinib, Glutamic acid, Cannabidiol, Copper, and Zinc in DrugBank identified via research on drug repurposing and the Drug Consensus Score (DCS) for MPXV. This has been achieved by employing the primary crystal structures of MPXV, which are now accessible. The docking study is also supported by Molecular Dynamics (MD) simulation. The results of our study emphasize the effectiveness of using ensemble feature-based PPI prediction to understand the molecular processes involved in viral infection and to aid in the development of repurposed drugs for emerging infectious diseases such as, but not limited to, Mpox. The source code and link to data used in this work is available at: https://github.com/CMATERJU-BIOINFO/In-Silico-Drug-Repurposing-Methodology-To-Suggest-Therapies-For-Emerging-Threats-like-Mpox.

A multi-targeted computational drug discovery approach for repurposing tetracyclines against monkeypox virus

Article Open access 04 September 2023

Exploration of drug repurposing for Mpox outbreaks targeting gene signatures and host-pathogen interactions

Article Open access 27 November 2024

Multi-omics characterization of the monkeypox virus infection

Article Open access 08 August 2024

Introduction

Mpox¹ is an infectious viral disease caused by the monkeypox virus, which is classified under the Orthopoxvirus genus under the Poxviridae family^2,3. It manifests with symptoms like skin rash, fever, muscle pains, and swollen lymph nodes⁴. This zoonotic illness has lately reemerged in multiple countries worldwide⁵. In April 2024, the World Health Organisation (WHO) received reports from 27 countries indicating a total of 528 newly confirmed instances of Mpox infection, along with one recorded death. These numbers suggest that Mpox is still being transmitted globally, albeit at a relatively low level. Given the absence of any viable treatment to completely eliminate the virus, it is crucial to identify drugs that can effectively stop the transmission of this global epidemic. Several therapies and vaccinations are accessible for Mpox. An important focus in therapeutic development is the OPV-specific envelope protein F13L or VP37⁶, which plays a significant role in Mpox and other illnesses linked to OPV, such as smallpox. Mehmood et al. employed Tecovirimat as a benchmark molecule in a machine learning (ML) methodology to examine DrugBank for analogous bioactive chemicals capable of obstructing the MPXV E8L surface binding protein⁷. Altayb et al. found that Fludarabine, an FDA-approved medicine, exhibits the strongest in-silico action against DNA-Dependent RNA Polymerase (A6R), a vital protein for viral replication⁸. Additionally, it exhibited encouraging efficacy against D8L, a crucial component for viral cell entrance, and F13L, which is required for the formation of intracellular mature virus particles. A study conducted by Alandijany et al. found that Tigecycline and Evaracycline demonstrated inhibitory effects when virtually tested against MPXV proteins⁹. Ten MPXV proteins have been targeted in our investigation in a feature-based deep learning algorithm based on their known interaction with human proteins.

Proteins, which are essential for the organization and operation of living organisms, are complex macromolecules that regulate a wide range of biological activities. Their connections, namely through PPIs, play a crucial role in several cellular processes such as enzymatic activity, signaling cascades, and the control of biological pathways^10,11. Decoding PPI Network (PPIN) is essential for comprehending the complexities of cellular machinery and gaining insights into the molecular foundation of health and illness. Within the realm of infectious diseases, the interaction of proteins between the pathogen and the host organism becomes a central focus. Pathogens, including viruses, utilize their proteins to specifically target human proteins to disrupt or manipulate biological circuits¹². The interactions cover a wide range, from attaching to host membrane receptors to enlisting host factors through viral proteins¹³. Pathogen-host interactions (PHIs) determine the result of an infection, affecting the host’s reaction, the ability of the pathogen to avoid the immune system, and the course of the disease¹⁴. Therefore, studying the mechanisms of PHIs reveals the molecular foundations of infectious illnesses, offering potential opportunities for treatments of emerging viral threats like Mpox.

Accurate projections of PPIs are achieved by identifying relevant features and using suitable ML or DL techniques. The advancements in DL technologies have made significant contributions in this particular situation. The feature set can be created by applying appropriate transformations to features obtained from the structures and sequencing of proteins. Given the scarcity of structures, there has been an increased focus on utilizing sequences alone to forecast PPIs^15,16,17. The study conducted by Sun et al.¹⁸ utilized the DL algorithm Stacked Auto-encoder (SAE) to predict PPIs. This prediction was based on sequence-based feature extraction techniques such as Auto-covariance (AC) and Conjoint Triad (CT). Li et al. have developed a deep neural network framework called DNN-PPI¹⁹ for predicting PPIs. This framework utilizes characteristics learned automatically from protein primary sequences and incorporates convolutional neural network (CNN) and long short-term memory (LSTM) neural network layers. In their study, Singh et al.²⁰ proposed a technique called Topsy-Turvy. This methodology integrates graph-theoretic (top-down) and sequence-based (bottom-up) methods in the training phase of their sequence-based predictor for predicting PPI. Graph Neural Networks (GNNs) have made significant progress in recent years and are now considered crucial tools in applications that rely on graph topologies. Jha et al.²¹ have introduced a system that integrates graph-based methods, namely graph convolutional network (GCN) and graph attention network (GAT), with language models (SeqVec²² and ProtBert²³) to predict PPI. Wu et al. have introduced DLPPI, a method that incorporates protein sequence data for predicting PPIs using the Feature-Relational Reasoning Network (FRN) and GNNs²⁴. Koca et al.²⁵ have introduced a model based on Graph Sample and Aggregate (GraphSAGE)²⁶ for predicting PHI from sequences using the Doc2Vec method. Thus sequence-based feature extraction approaches have already proven their effectiveness in identifying PPIs or PHIs^27,28,29,30.

Systematic evaluations of the structure, also known as topology, in large networks, have been introduced in the works of²⁷. These assessments rely on graphlets, which are small induced subgraphs of the larger networks²⁸. Subsequently, in a study conducted by²⁹, a new graph-theoretic approach was introduced to analyze the relationship between local topology and function in real-world networks. The study provided evidence that the graphlet representation of a PPI network is highly relevant for predicting protein function. The study³⁰ has revealed the identification of similarities in the local topological structures of networks by analyzing repeating sub-patterns within them. This analysis utilizes a similarity measure generated from the distribution of graphlet frequencies. Appreciating the significance of graphlets in topological representation, we have suggested a novel methodology, integrating graphlets as well as sequence-based features for the targeted objective of predicting PPIs through the application of the DL-based GraphSAGE model. The ensemble feature set has successfully captured the fundamental organizational principles and intrinsic information stored in protein sequences, resulting in a more adaptable technique for predicting PPIs.

Due to the absence of clinically approved drugs for efficiently treating Mpox, we are driven to develop a new computational model. This model will use an ensemble feature-based deep learning method on the MPXV-Human PPIN to predict potential PHIs. We have subsequently identified the human proteins associated with the PHIs and the prospective FDA drugs in DrugBank¹⁸ for those MPXV targeted human proteins efficient in the treatment of Mpox, followed by the drug repurposing study and conducting docking analysis. The following key components of this research endeavor are emphasized:

1.
A novel graph-based DL model is first deployed to develop a robust classifier for predicting PPIs utilizing ensemble features on the HIPPIE dataset³¹. The ensemble feature consists of graphlet property and a variety of composition/sequence-based attributes (see Fig. 1). 976 sequence-based features are taken into consideration out of which the most essential ones (196) are selected through the proper selection of threshold variance which makes it unique from the rest of the sequence-based methodologies.
2.
A demonstrative case study is conducted to predict the potential PHIs on the MPXV-Human PPIN using the same graph-based DL approach (see Fig. 1). Studying PHIs in the MPXV-Human network is crucial as it enables us to comprehend the specific proteins that facilitate infection¹². It has the ability to reveal potential treatment targets. To expedite the identification of treatments for Mpox, scientists might use existing medications via the process of drug repurposing³². Examining the proteins involved in PHIs provides valuable insights into the dynamics of interactions between hosts and pathogens, which is essential for the development of effective vaccinations.
3.
The model we have presented, accurately predicts 3348 positive PHIs involving the MPXV-Human virus. The drug repurposing study has also evaluated existing treatments to identify new possibilities that may disrupt the links between the host and the infection. After conducting a comprehensive investigation, it has been shown that several protein nodes in the MPXV-Human PPIN function as protein targets for various drugs that have been approved by the FDA in DrugBank³³. The drugs included in this research have been determined using the DCS²¹ (see Algorithm 1 in the supplementary document S1).
4.
The found repurposed drugs have been classified into three domains, namely Level I, Level II, and Level III, according to their derived DCS score. The FDA drugs that have achieved the highest ratings in computational efficiency for treating Mpox have been categorized as Level I. Repurposed drugs, which have poorer effectiveness (inferior scores) than Level I drugs, are then classified as Level II and so forth (see Table S1 in the supplementary).
5.
The repurposed medications revealed in this study have also been substantiated by other pieces of literary evidence (see Sects. S1, S2, and S3 in the supplementary document).
6.
A concurrent docking investigation is conducted using AutoDock Vina³⁴ version 1.5.6, which provides further insights into the effectiveness of these newly found repurposed drugs.
7.
The docking investigation of the three identified repurposed drugs like Fostamatinib, Cannabidiol and Glutamic acid is further confirmed by post-stimulation using MD, which is performed using GROMACS, a freely available software suite.

Results

The graph-based DL computational model we have created for predicting potential PHIs has 3348 positive PHIs in the MPXV-Human PPIN. Potential candidates for treating Mpox may be found by analyzing the proteins implicated in these PHIs via a drug repurposing study followed by docking. The sources of input and the resulting outputs are always of great importance in any computational model, and this holds for our suggested model as well.

Overview of the data sets

HIPPIE database

We apply our method by employing a PPIN obtained from the HIPPIE database³¹. The network comprises 5497 proteins linked by 13,163 interactions. The PPIN is perceived as a graph, where proteins function as nodes and the connections between proteins constitute the edges. This database is used for both positive and negative data choices that are utilized as input for our built graph-based DL computational model. All datasets are available at: https://github.com/CMATERJU-BIOINFO/In-Silico-Drug-Repurposing-Methodology-To-Suggest-Therapies-For-Emerging-Threats-like-Mpox.

MPXV-Human PPIN

The MPXV-Human PPIN is generated by combining the Human–human interactions acquired from the HIPPIE database with the anticipated interaction network of MPXV-Human from³². The interaction network may be shown as a graph consisting of 7576 nodes and 17,647 edges. The collection has a total of 7576 nodes, with 10 nodes representing the MPXV protein³² and 7566 nodes representing human proteins. All datasets are available at: https://github.com/CMATERJU-BIOINFO/In-Silico-Drug-Repurposing-Methodology-To-Suggest-Therapies-For-Emerging-Threats-like-Mpox.

Potential Mpox FDA drugs

NADH²³, Fostamatinib³⁵, Glutamic acid³⁶, Cannabidiol³⁷, Copper³⁸, and Zinc³⁹ have been classified as Level I drugs according to the DrugBank database. In addition, there are further drugs classified as Level II and Level III (see Table S1 in the supplementary). In this study, the most superior drugs are then used for validation via literary evidence and molecular docking.

Prediction of PPI using ensemble feature on HIPPIE dataset

This research work centers on a novel graph-based DL approach that leverages ensemble features to build a robust classifier for predicting PPIs. The ensemble feature comprises graphlet property (up to 5 nodes) and a range of amino acid composition-based features. The 13,163 interactions observed among the protein nodes in the PPIN of the HIPPIE database are regarded as positive data. The negative edges are discovered by choosing all potential pairs of proteins that do not have an interaction edge between them. By utilizing the negative sampling technique, an equivalent number of negative edges have been created to form the negative dataset for training the model. A few evaluation criteria considered in this work for assessing PPI classification models include precision, recall (also referred to as sensitivity), Matthew’s correlation coefficient (MCC) score, F1-score, Area Under the Precision-Recall Curve (AUPRC), and “Area Under the Curve” of the “Receiver Operating Characteristic” curve (AUC-ROC) score²⁸. A thorough examination of three widely used feature selection techniques: (1) Fast Independent Component Analysis (ICA), (2) Principal Component Analysis (PCA), and (3) Variance Threshold with a threshold parameter of 0.13, is performed. The goal is to identify the most effective strategy in terms of feature selection performance. Since the Variance threshold feature selection technique with a threshold value of 0.13 and 269 features has shown better performance compared to other threshold values, the same number of features is picked for comparison in all other feature selection methods. Although all approaches have used the same set of 269 features, Variance Threshold with a value of 0.13 showed higher discriminatory ability compared to FastICA and PCA in terms of selecting effective features, as seen in Table 1.

Table 1 Performance of our proposed model with FastICA, PCA, and variance threshold feature selection method on HIPPIE dataset³¹.

Full size table

To do a comparative analysis of our model, it is trained and evaluated using three distinct feature sets. Initially, just graphlet characteristics are used, and subsequently, only sequence-based features are employed. Ultimately, the ensemble mode incorporates both graphlet and sequence properties. Table 2a demonstrates the performance of the proposed model with different node attributes. As reflected, the performance of the proposed model using the ensemble feature is superior to the performances achieved by any individual type of feature, such as a graphlet or sequence-based feature. The primary objective of our experimental investigation is to compare and evaluate our graphSAGE²⁶-based model with the current state-of-the-art GNNs²⁴ variants, specifically GCN and GAT^40,41. We have conducted a comparative analysis using ensemble features to evaluate their respective performances, ensuring an identical experimental setup to that of our proposed model. Figure 2 shows the analysis of different performance metrics for the three models GCN, GAT, and GraphSAGE²⁶. The results indicate that our proposed model, which utilizes graphSAGE, has shown significantly better performance compared to other variants of GNN. This is evident from the results of the test set, where the precision, recall, F1-score, AUC-ROC Score, and MCC Score are 0.8647, 0.7622, 0.8103, 0.8215, and 0.6476, respectively.

Table 2 (a) Performance scores of our proposed model with graphlet feature, sequence-based feature, and ensemble feature on HIPPIE dataset³¹; (b) performance scores of our proposed model with graphlet feature, sequence-based feature, and ensemble feature on MPXV-Human dataset³².

Full size table

Prediction of PHIs using ensemble feature on MPXV-Human PPIN

A comprehensive experiment is implemented utilizing MPXV-Human PPIN data to evaluate the efficacy of several feature extraction techniques, such as graphlet-based features, sequence-based features, and ensemble approaches. The findings from Table 2b demonstrate a significant advantage of the ensemble technique compared to both graphlet-based and sequence-based features in terms of all assessment measures. This comprehensive performance advantage underscores the effectiveness of the ensemble approach in capturing and leveraging essential patterns within the Mpox data. It can be seen from Fig. 3a that the model has performed well in correctly identifying positive instances (True Positive (TP): 3348) and negative instances (True Negative (TN): 3094). However, it has made some minor errors, incorrectly classifying some instances as positive when they are negative (False Positive (FP): 436) and vice versa (False Negative (FN): 182). Moreover, AUC-ROC comparison plots are plotted for our proposed model using three distinct features: ensemble feature, sequence-based feature, and graphlet feature along with their scores, which have been reported in Fig. 3b. In the figure, the colour blue represents the ensemble feature, the colour green represents the sequence-based feature, and the colour red represents the graphlet feature. The results revealed that the ensemble feature outperformed the other features, achieving the highest Area Under the Receiver Operating Characteristic (AUC-ROC) score of 0.91. In contrast, the sequence-based feature exhibited moderate performance with an AUC-ROC score of 0.82, while the graphlet feature demonstrated lower performance with a score of 0.53. The predicted positive MPXV-Human PPIN interactions from the ensemble approach have a total of 2069 edges, with 8 nodes representing the MPXV protein and 2069 nodes representing target human proteins. All datasets are available at: https://github.com/CMATERJU-BIOINFO/In-Silico-Drug-Repurposing-Methodology-To-Suggest-Therapies-For-Emerging-Threats-like-Mpox.

Drug repurposing study on MPXV-Human PPIN

The drug components identified through the ProcessDrugData algorithm (see Algorithm S1 in the supplementary document) have been listed in Table S1 in the supplementary document along with their DrugBank Accession Number, Groups in DrugBank, and computed DCS. The drugs are ranked in descending order based on their DCS, with NADH being the most computationally efficient treatment for MPXV, with a score of 35. NADH is used as a nutrient in some supplement products. We have categorized the repurposed drugs into three domains, namely Level I, Level II, and Level III, based on their DCS. The medications that get the best scores in computational efficiency for treating Mpox have been classified as Level I. Subsequently, repurposed drugs, which exhibit lower efficacy (worse scores) compared to Level I treatments, are categorized as Level II. Ultimately, drugs that have attained worse scores (in comparison to Level II drugs) in computational efficiency for treating Mpox have been categorized under Level III. The pharmacological results have been supported by many pieces of evidence documented in the subsequent portion of the available literature in Sects. S1, S2, and S3 in the supplementary document. The drugs that have been found are compared with the results produced by SAveRUNNER⁴². SAveRUNNER is an R-based program utilised for predicting medication disease associations. There is a substantial overlap of 1, 8, and 56 medicines that are projected to be repurposed for MPXV at Level I, Level II, and Level III. Please refer to Fig. S1 in the supplementary paper for further details. The comprehensive drug names and their overlap can be accessed online here. SAveRUNNER is a tool executed only on network/topological-based approach whereas this work focuses not only on the network but also the sequence and ensemble methodology of both network and sequence as well.

Computational docking of potential drugs for MPXV protein structures

A detailed analysis of molecular docking between NADH having the highest DCS (35) and MPXV methyltransferase VP39 in complex with inhibitor TO427 (PDB ID: 8CEQ)⁴³ has been done using AutoDock Vina³⁴ version 1.5.6. MPXV protein crystal structure is obtained from Protein Data Bank (PDB) in .pdb format. Figure 4a presents the crystal structures of 8CEQ. The molecular structure of NADH obtained from DrugBank is shown in Fig. 4b. After successful docking, nine different modes of drug-protein interactions are produced along with specific docking scores, which represent the binding energy. The binding mode with the lowest binding energy is considered the optimal binding mode, as it signifies the most stable interaction for the ligand. The best affinity score obtained is − 9.7 (kcal/mol). Root Mean Square Deviation (RMSD) values are computed relative to the best mode and exclusively involve movable heavy atoms. Two forms of RMSD metrics are offered: rmsd/lb (RMSD lower bound) and rmsd/ub (RMSD upper bound), which differ in the method of matching atoms during distance calculation³⁴. The summarized results of the binding energy observations of the nine modes along with their distances from the best orientation can be found in Table S2 in the supplementary document. The identification of amino acids in the protein’s active site has been conducted using Biovia Discovery Studio 4.5⁴⁴. The optimization of the MPXV protein has been performed by the removal of water and other atoms, followed by the addition of a polar hydrogen group. The interaction of NADH with the MPXV protein shows a high-affinity interaction in Fig. 4c as the ligand fits inside the core pocket region of the protein. This is further supported by the hydrogen bonding observed between the oxygen atom of NADH with TYR 189 and SER 141. The details of the various interactions along with the active sites such as ASP 187, SER 141, LEU 154, LEU 221, SER 165, HIS 98, ARG 97, ALA 158, TYR 189, GLY 96, PHE 188 have been shown in Fig. 4d. Thus, the docking results indicate a favorable binding interaction between NADH and the MPXV protein. This further highlights that this widely available natural compound holds promise as a potential anti-Mpox medication, warranting further investigation, which can benefit both patients and public health initiatives. The virtual interaction of 8CEQ with the other Level I repurposed drugs i.e. Fostamatinib, Glutamic acid, and Cannabidiol has been measured using the AutoDock Vina suite. A similar docking study as NADH has been performed on these remaining top three levels I drugs. The binding energy results obtained from the docking of 8CEQ with these ligands are shown in Table S3 in the supplementary document. It is observed from the result that while NADH registers the highest DCS of 35 in the drug repurposing study, it still attains the highest position in comparison to the other repurposed drugs in the molecular docking study with an Affinity score of − 9.7 kcal/mol. The docking scores of Fostamatinib, Glutamic acid, and Cannabidiol are found to be − 8.1, − 4.9, and − 8.7 kcal/mol respectively. The docking result suggests that these drug molecules have a greater capability to inhibit MPXV since they have demonstrated high-affinity interactions with 8CEQ. Consequently, this creates opportunities for pharmaceutical businesses to capitalize on existing drug libraries thereby speeding up the introduction of groundbreaking Mpox treatments.

Molecular dynamic simulation of three potential identified drugs for MPXV protein structures

The study employs GROMACS⁴⁵ for conducting molecular dynamics (MD) simulations⁴⁶. The process consists of seven consecutive steps: (1) Create network topology files for both the protein and ligand molecules. (2) Specify the dimensions of the simulation box and add solvent molecules to it. (3) Introduce ions into the system. (4) Perform energy minimization to optimise the molecular structure. (5) Conduct equilibration to stabilise the system. (6) Run production molecular dynamics simulations. (7) Analyse the obtained data. In chemistry, the phrase "boxing" refers to the act of enclosing a molecule or collection of molecules within a specific region of space. Solvate, in contrast, denotes the process of encircling a solute molecule with solvent molecules, resulting in the formation of a solvation shell. The simulation is conducted on 8CEQ, as well as three other repurposed drugs: Fostamatinib, Cannabidiol, and Glutamic acid. At first, distinct network topology files are created for the 8CEQ and best-posed ligand structure of each of the three drugs, which are found through docking using AutoDock Vina. Next, the ligands are individually coupled with 8CEQ to create three merged structures of MPXV-ligand bound form. Then, "box" and "solvate" are defined for the form in which MPXV is linked to a ligand. This is followed by the addition of ions. During the energy minimization step, the energy of the MPXV-ligand bound state is reduced, and this is followed by minimising the temperature and pressure during the equilibration phase. Subsequently, a molecular dynamics simulation is conducted and the obtained data is evaluated. All three medications exhibit significant stability in terms of RMSD. The findings, together with other graphical displays, of this whole MD simulation is shown in the supplementary document S1.

Discussion

The suggested ensemble approach yields good performance metrics results on the HIPPIE dataset. Existing research demonstrates that peptide sequence-based characteristics are valuable for predicting PPIs. In our GraphSAGE model training, graphlet features (up to 5 nodes) are used in addition to sequence-based features. The utilization of graphlet features enables the identification of complex connections among adjacent nodes, facilitating the detection of specific structural patterns. On the other hand, sequence-based features enhance the model’s understanding of the biological properties of proteins, assisting in the differentiation between interacting and non-interacting pairs based on their biochemical characteristics. Table 2a’s ablation reveals that the ensemble feature set, when combined with the GraphSAGE model, outperforms the sequence-based and graphlet features individually on the HIPPIE dataset. In the subsequent stage, this deduction obtained from the HIPPIE dataset has been extended to validate it on the MPXV-Human PPIN.

While smallpox has been eradicated worldwide via effective vaccination efforts, Mpox continues to exist as an intermittent and localized illness for which there is currently no particular antiviral medicine available. Understanding the connections between MPXV and human host proteins in this environment might provide insights into the processes via which the virus evades the host’s immune response. Anticipating PPIs can uncover promising therapeutic targets in the network of interactions between the host and virus. This can assist in identifying antigens that can be specifically targeted for the creation of vaccines. Motivated by this fact, our investigation is redirected to the MPXV-Human PPIN.

Table 2b demonstrates that the GraphSAGE model while using 73-dimensional graphlet features, does not get appropriate performance metrics results for the MPXV-Human dataset. However, it performs better when utilizing 196-dimensional sequence-based features. The optimal performance was obtained by combining 73-dimensional 2–5 nodes graphlet features with 196-dimensional sequence-based features, resulting in an ensemble 269-dimensional feature vector for each protein node. The feature matrix, together with the positive and negative edge set, has been inputted into the GraphSAGE model for training in all instances. GraphSAGE, a variation of the GNN model, functions by selectively extracting and combining data from a node’s immediate surroundings. This makes it particularly suitable for graphs with diverse sizes and architectures. The model’s ability to generalize on unseen nodes is enhanced by its inductive learning technique. After completing the training process, the sigmoid activation function is utilized to make predictions about the interactions. The test data has been classified into its final class based on the model’s output, using a threshold value of 0.5. If the probability output is greater than 0.5, then it is considered a positive one; otherwise, negative.

During the subsequent stage, a drug repurposing investigation is undertaken on the accurately detected interactions using our suggested model. The drugs and their corresponding protein targets mapping are downloaded from DrugBank. Using the ProcessDrugData algorithm, the potential drugs linked to the human proteins of MPXV targets and involved in direct positive interactions are retrieved from the DrugBank mapping. The DCS is calculated for each such drug. These scores are organized in descending order based on their value. Subsequently, the most promising drugs that might potentially alleviate the consequences and symptoms of Mpox have been grouped into three levels I, II, and III based on their DCS as shown in Table S1 in the supplementary document. In addition, many existing pieces of supporting data are highlighted, from successful studies conducted so far on these drugs, in the treatment of Mpox. To further strengthen our findings, a molecular docking study has been performed on the MPXV methyltransferase VP39 in complex with inhibitor TO427 (PDB ID: 8CEQ) (target protein receptor) with NADH, Fostamatinib, Glutamic acid, and Cannabidiol (ligands). The docking affinity score (in kcal/mol) of these four drug molecules as reported in Table S3 in the supplementary document implies a strong binding affinity between the ligands and the target receptor. The docking results of the best-positioned ligands are further confirmed using MD modelling, which demonstrates the remarkable stability of these docked structures. It suggests that the identified ligands exhibit a strong potential and have a greater likelihood of combating Mpox, which can be further reinforced through clinical trials in the future.

In a nutshell, this study, through fast and efficient computational drug repurposing, can reduce the risks and cost of developing new compounds and offers a viable answer to the time constraints of traditional drug development, especially for new emerging diseases with potential pandemic threats for which all the proteomic information might not be fully available, like Mpox as a case study. The suggested technique also highlights the importance of combining the network’s topological characteristic, namely the graphlet feature, with the intrinsic biochemical and physicochemical sequence-based properties of proteins to predict PPIs. The DL model based on GraphSAGE has been trained using this ensemble feature set in both the HIPPIE and MPXV-Human datasets. The performance metrics scores demonstrate this proposed model’s efficacy in predicting PPI in the HIPPIE dataset and PHIs in the MPXV-Human dataset. Subsequently, a drug repurposing strategy is executed using DrugBank data and a consensus strategy utilizing DCS on the MPXV-Human PPIN. This approach is inherently aligned with biological knowledge, making it more trustworthy and explainable compared to the DL-based drug repurposing study approach. This led to the identification of prospective medicines categorized into three levels that might be used for the treatment of MPXV. Furthermore, a molecular docking study is conducted on the top four level I drugs. The results of this study offer significant perspectives on the various techniques used to extract features, providing a basis for making well-informed decisions in future implementations. Furthermore, conducting drug repurposing, molecular docking and MD simulation analysis presents the opportunity to identify low-risk treatment alternatives efficiently, economically, and promptly for multiple viral threats like Mpox, thereby helping both patients and public health initiatives and boosting the pharmaceutical businesses. In the time ahead, DL and mechanistic artificial intelligence (AI) can be integrated with our consensus scoring strategy to identify prospective drugs as anti-viral medications. Also, for emerging viral diseases like COVID-19 and Mpox where PHI network information is not known or fully established initially, this computational study can bridge the gap In-Silico and still predict some potential repurposed drugs through high interactivity scores in the future.

Methods

Our developed new graph-based DL computational model for predicting PPIs or PHIs consists of three important methodologies (1) Feature Extraction, (2) Deep Neural Network Model, and (3) Classification Strategy.

Feature extraction

Our research presents a new method for combining graphlet features with sequence-based features to extract features. Our objective is to improve the model’s ability to distinguish between different options (interacting and non-interacting) by combining local structure information using graphlet features and the underlying biochemical and physicochemical properties of proteins using amino acid composition features. Therefore, the process of generating features may be stated in three steps:

a. Graphlet Feature extraction: The concept of a network’s graphlet was originally proposed by Prˇzulj²⁷. Before the discussion of our technique, it is important to emphasize the idea of graphlet. These graphlets are small, highly linked induced subgraphs in a graph network. Each graphlet exhibits the automorphism property of a graph, indicating that the graph possesses an isomorphic structure inside itself. The orbit of an automorphism in a graph refers to the partitioning of the vertices that represent the symmetrical configuration of the graph. The frequency of a certain node’s occurrence in each orbit in the PPIN is represented by a 73-dimensional vector called orbit-counts, considering up to 5-node graphlets. This 73-dimensional feature vector captures a protein node’s local topological property within a PPIN. In Fig. 1, a few graphlets from the whole set of 30 graphlets (G₀–G₂₉)^27,28 of 2–5 nodes have been represented. It can be seen that G₀ comprises only one orbit, as does G₂ since all the node positions are equivalent. But G₁ exhibits two orbits since the node positions of the end nodes and middle node are not equivalent. In this fashion, overall, 73 orbital positions are present. To extract node characteristics, the 73-dimensional orbit counts have been calculated for each node, taking into account 2–5 node graphlets using the ORbit Counting Algorithm (ORCA)⁴⁷ which is an efficient algorithm for computing orbit counts of 5-node graphlets for each node in the PPIN. It is designed as a command-line tool, accepting arguments such as orbit type and graphlet size (set to 5 in our scenario), along with an input file specifying the network in a basic text format. As we had 5497 protein nodes in our HIPPIE PPIN, the resulting orbit-counts feature vector was sized at 5497 × 73. The number of times each of these 5497 proteins, acting as a node in the PPIN, appears in every orbital position (0–72) over 5-node graphlets, has been represented in the generated feature matrix.

b. Sequence-based feature extraction: The amino acid composition features provide a thorough overview of the protein content by calculating the percentage of each residue type from the amino acid sequence. The Pfeature tool⁴⁸, a web server capable of calculating a broad spectrum of protein/peptide features based on their amino acid sequences, is utilized to produce 976 distinct features based on composition. The utilization of this numerical encoding of a protein allows us to represent proteins in sequences of different lengths using a standardized collection of features. The amino acid sequences for the 5497 proteins of the HIPPIE network are acquired from the UniProt⁴⁹ database. These sequences were then inputted into the Pfeature server to construct the following composition-based features:

a)
Simple compositions: Amino acid Composition (AAC), Dipeptide Composition (DPC), Atomic Composition (ATC), and Bond Type Composition (BTC).
b)
Physico-chemical compositions: Physico-chemical properties (PCP).
c)
Compositions involving repeats and distribution of amino acid: Repetitive Residue Information (RRI) and Distance Distribution of Residues (DDR).
d)
Shanon Entropy-based Compositions: Shannon Entropy at Protein Level (SEP) and Shannon Entropy at Residue Level (SER).
e)
Miscellaneous Compositions: Conjoint Triad Calculation (CTC), Pseudo Amino Acid Composition (PAAC), Amphiphilic Pseudo Amino Acid Composition (APAAC), Quasi-Sequence Order (QSO) and Sequence Order Coupling Number (SOC).

The Variance Threshold feature selection approach is applied to exclude features (variables) in the dataset that exhibit low variance. The reason for using this strategy is that features with low variance typically exhibit minimal fluctuation throughout the dataset and may not make a substantial contribution to the predictive capability of a model. The variance threshold parameter has been determined using Eq. (1) shown below.

$$ {\text{Variance threshold}}\,\,(i) = \frac{max}{{i \in \left[ {0.05,0.20} \right]}}\,\,AUC - ROC(i) $$

(1)

In essence, we have selected the variance threshold ${\prime}i{\prime}$ from a range of 0.05 to 0.20, with the criterion being that it produces the maximum Area Under the Curve of the Receiver Operating Characteristic (AUC-ROC) score⁵⁰ on the test set. To determine the most effective threshold variance, an experiment is conducted to assess the performance metrics of the model. The threshold is varied from 0.05 to 0.20, as seen in Fig. 5a. A line graph was generated to display the test AUC-ROC score plotted against each variance threshold. Furthermore, a bar graph has been used to display the number of features picked for each variance threshold. From the image, it is evident that a threshold of 0.13 yields the maximum AUC-ROC score of 79.89% with a selection of 196 features. Therefore, the features that have a variance below the stated threshold are excluded, as they are considered to have limited informative value or contribute less to the overall variability in the dataset. The features with variance over 0.13 are retained for further analysis or model training. Finally, since we have 5497 protein nodes in our HIPPIE PPIN, the resulting amino acid composition feature matrix is sized at 5497 × 196.

c. Ensemble feature extraction: In this context, an ensemble methodology is implemented to merge and utilize both feature sets. The hybrid representation combines the topological context recorded by graphlet features with the biochemical insights provided by amino acid composition features. The goal is to effectively utilize the distinct information from each feature set, leading to a stronger and more distinguishable representation of features. In the ensemble feature extraction procedure, the 73-dimensional 2–5 nodes graphlet features are merged, as previously stated, with the 976-dimensional composition-based features. This combination results in a 1049-dimensional feature vector for each protein node. Afterward, the Variance threshold feature selection approach is employed to choose relevant features from the pool of 1049 features. The initial range is set from 0.05 to 0.20 and the AUC-ROC score attained for each threshold value is plotted. The variance threshold has been chosen based on the same technique as the prior feature selection. Choose $i$ such that $i$ ∈ [0.05, 0.20] and $AUC-ROC(i)$ in Eq. (1) is maximized. The plot of the AUC-ROC score (in %) and the number of features selected against each variance threshold within the range of 0.05–0.20 have been represented in Fig. 5b. At last, a feature vector is obtained that consists of 269 dimensions. Thus, for 5497 protein nodes in our PPIN, a feature matrix of size 5497 × 269 is generated to train our model.

Deep neural network model

DL solutions, particularly GNNs tailored for graph domains, have recently become popular for efficient graph analysis. GraphSAGE, a variation of the GNN architecture, is specifically built for large-scale inductive representation learning. It has shown promising results in many graph-based applications, particularly in the field of link prediction⁵¹. It functions as a powerful instrument for creating low-dimensional vector representations of nodes, notably excelling in situations involving graphs with a large number of node properties.

The SageConv architecture, a variant of the GraphSAGE architecture is employed, in our proposed work. The model has a convolutional operator that is more expressive, allowing it to capture more complex characteristics. The SageConv aggregate function that we have considered is the mean aggregator. It utilizes the average of neighbors’ representations, normalized by each neighbor’s degree, to form the aggregate representation. Our model incorporates three SageConv layers, each consisting of 256 units. Each layer is equipped with a Leaky Rectified Linear Unit (Leaky ReLU) activation function and utilizes the dropout technique. The ’ADAM’ optimizer is utilized for our GraphSAGE model, which is a well-acknowledged stochastic gradient descent approach. This optimizer efficiently adjusts the model’s parameters based on the gradient of the loss function, which is computed during the training phase. This contributes to enhancing the model’s AUC-ROC score and reducing overall loss. Figure 6 illustrates the suggested model and its corresponding classifier.

Classification strategy

The challenge of determining whether there is a connection between two protein nodes may be seen as a binary classification issue. A classifier has been developed employing two linear layers, each incorporating a dropout mechanism, to address the issue of overfitting. In the last layer, the sigmoid function is employed as the activation function to forecast the test edges. The predictor’s output has been categorized based on the confidence values for each test edge, which range from 0 to 1, as either 0 or 1 as per Eq. (2). A threshold of 0.5 was selected to assign the final class label to these edges.

$$ FC_{i} \left( t \right) = \left\{ {\begin{array}{ll} {1,} & if\,\,{ Pred_{i} \left( t \right) > 0.5 } \\ {0,} & {otherwise} \\ \end{array} } \right. $$

(2)

In the Eq. (2) provided, ${FC}_{i}\left(t\right)$ represents the final class assignment for the $i$th test edge, and the ${Pred}_{i}\left(t\right)$ represents the probability output score of the sigmoid activation function of the classifier.

Data availability

Data used in this publication are all public datasets and the links to access the same are provided within the manuscript or supplementary information files.

Code availability

The source code as well as the link to the data used in this work is available as open source on GitHub (https://github.com/CMATERJU-BIOINFO/In-Silico-Drug-Repurposing-Methodology-To-Suggest-Therapies-For-Emerging-Threats-like-Mpox).

References

WHO recommends new name for monkeypox disease, https://www.who.int/news/item/28-11-2022-who-recommends-new-name-for-monkeypox-disease (2022).
Xiang, Y. & White, A. Monkeypox virus emerges from the shadow of its more infamous cousin: family biology matters. Emerg. Microbes Infect. 11, 1768–1777. https://doi.org/10.1080/22221751.2022.2095309 (2022).
Article PubMed PubMed Central Google Scholar
Liu, L. Fields Virology, 6th Edition. Clin. Infect. Dis. 59, 613. https://doi.org/10.1093/cid/ciu346 (2014).
Chen, N. et al. Virulence differences between monkeypox virus isolates from West Africa and the Congo basin. Virology 340, 46–63. https://doi.org/10.1016/j.virol.2005.05.030 (2005).
Article CAS PubMed Google Scholar
Multi-country outbreak of mpox, https://www.who.int/publications/m/item/multi-country-outbreak-of-mpox--external-situation-report-33--31-may-2024 (2024).
Tiwari, H. et al. Unlocking the Secret Vault of Promising Drug Targets from Mpox Proteome- A Computational Approach. https://doi.org/10.21203/rs.3.rs-4342258/v1 (2024).
Article Google Scholar
Mehmood, A., Nawab, S., Jia, G., Kaushik, A. C. & Wei, D.-Q. Supervised screening of Tecovirimat-like compounds as potential inhibitors for the monkeypox virus E8L protein. J. Biomol. Struct. Dyn. 1–14. https://doi.org/10.1080/07391102.2023.2245042.
Altayb, H. N. Fludarabine, a potential DNA-dependent RNA polymerase inhibitor, as a prospective drug against Monkeypox virus: A computational approach. Pharmaceuticals 15 (2022).
Alandijany, T. A. et al. A multi-targeted computational drug discovery approach for repurposing tetracyclines against monkeypox virus. Sci. Rep. 13, 14570. https://doi.org/10.1038/s41598-023-41820-z (2023).
Article ADS CAS PubMed PubMed Central Google Scholar
Alberts, B. The cell as a collection of protein machines: Preparing the next generation of molecular biologists. Cell 92, 291–294. https://doi.org/10.1016/S0092-8674(00)80922-8 (1998).
Article CAS PubMed Google Scholar
Braun, P. & Gingras, A.-C. History of protein–protein interactions: From egg-white to complex networks. PROTEOMICS 12, 1478–1498. https://doi.org/10.1002/pmic.201100563 (2012).
Article CAS PubMed Google Scholar
Saha, S., Chatterjee, P., Nasipuri, M. & Basu, S. Detection of spreader nodes in human-SARS-CoV protein-protein interaction network. PeerJ 9, e12117. https://doi.org/10.7717/peerj.12117 (2021).
Article PubMed PubMed Central Google Scholar
Zhou, X., Park, B., Choi, D. & Han, K. A generalized approach to predicting protein-protein interactions between virus and host. BMC Genomics 19, 568. https://doi.org/10.1186/s12864-018-4924-2 (2018).
Article CAS PubMed PubMed Central Google Scholar
Ryan, D. P. & Matthews, J. M. Protein–protein interactions in human disease. Curr. Opin. Struct. Biol. 15, 441–446. https://doi.org/10.1016/j.sbi.2005.06.001 (2005).
Article CAS PubMed Google Scholar
Shen, J. et al. Predicting protein–protein interactions based only on sequences information. Proc. Natl. Acad. Sci. 104, 4337–4341. https://doi.org/10.1073/pnas.0607879104 (2007).
Article ADS CAS PubMed PubMed Central Google Scholar
Guo, Y., Yu, L., Wen, Z. & Li, M. Using support vector machine combined with auto covariance to predict protein–protein interactions from protein sequences. Nucleic Acids Res. 36, 3025–3030. https://doi.org/10.1093/nar/gkn159 (2008).
Article CAS PubMed PubMed Central Google Scholar
Zhou, Y. Z., Gao, Y. & Zheng, Y. Y. in Advances in Computer Science and Education Applications. (eds Mark Zhou & Honghua Tan) 254–262 (Springer Berlin, Heidelberg).
Sun, T., Zhou, B., Lai, L. & Pei, J. Sequence-based prediction of protein protein interaction using a deep-learning algorithm. BMC Bioinformatics 18, 277. https://doi.org/10.1186/s12859-017-1700-2 (2017).
Article CAS PubMed PubMed Central Google Scholar
Li, H., Gong, X.-J., Yu, H. & Zhou, C. Deep neural network based predictions of protein interactions using primary sequences. Molecules 23 (2018).
Singh, R., Devkota, K., Sledzieski, S., Berger, B. & Cowen, L. Topsy-Turvy: Integrating a global view into sequence-based PPI prediction. Bioinformatics 38, i264–i272. https://doi.org/10.1093/bioinformatics/btac258 (2022).
Article PubMed PubMed Central Google Scholar
Jha, K., Saha, S. & Singh, H. Prediction of protein–protein interaction using graph neural networks. Sci. Rep. 12, 8360. https://doi.org/10.1038/s41598-022-12201-9 (2022).
Article ADS CAS PubMed PubMed Central Google Scholar
Heinzinger, M. et al. Modeling aspects of the language of life through transfer-learning protein sequences. BMC Bioinformatics 20, 723. https://doi.org/10.1186/s12859-019-3220-8 (2019).
Article CAS PubMed PubMed Central Google Scholar
Elnaggar, A. et al. ProtTrans: Toward understanding the language of life through self-supervised learning. IEEE Trans. Pattern Anal. Mach. Intell. 44, 7112–7127. https://doi.org/10.1109/TPAMI.2021.3095381 (2022).
Article PubMed Google Scholar
Wu, J., Liu, B., Zhang, J., Wang, Z. & Li, J. DL-PPI: a method on prediction of sequenced protein–protein interaction based on deep learning. BMC Bioinformatics 24, 473. https://doi.org/10.1186/s12859-023-05594-5 (2023).
Article PubMed PubMed Central Google Scholar
Koca, M. B., Nourani, E., Abbasoğlu, F., Karadeniz, İ & Sevilgen, F. E. Graph convolutional network based virus-human protein-protein interaction prediction for novel viruses. Comput. Biol. Chem. 101, 107755. https://doi.org/10.1016/j.compbiolchem.2022.107755 (2022).
Article CAS PubMed Google Scholar
Hamilton, W. L., Ying, R. & Leskovec, J. In Proceedings of the 31st International Conference on Neural Information Processing Systems 1025–1035 (Curran Associates Inc., Long Beach, California, USA, 2017).
Pržulj, N., Corneil, D. G. & Jurisica, I. Modeling interactome: Scale-free or geometric?. Bioinformatics 20, 3508–3515. https://doi.org/10.1093/bioinformatics/bth436 (2004).
Article CAS PubMed Google Scholar
Pržulj, N. Biological network comparison using graphlet degree distribution. Bioinformatics 23, e177–e183. https://doi.org/10.1093/bioinformatics/btl301 (2007).
Article CAS PubMed Google Scholar
Milenković, T. & Pržulj, N. Uncovering biological network function via graphlet degree signatures. Cancer Inf. 6, CIN.S680. https://doi.org/10.4137/CIN.S680 (2008).
Bhattacharjee, D., Hossain, S. M. M., Sultana, R. & Ray, S. in Pattern Recognition and Machine Intelligence. (eds B. Uma Shankar et al.) 431–437 (Springer International Publishing).
Alanis-Lobato, G., Andrade-Navarro, M. A. & Schaefer, M. H. HIPPIE v2.0: Enhancing meaningfulness and reliability of protein–protein interaction networks. Nucleic Acids Res. 45, D408–D414. https://doi.org/10.1093/nar/gkw985 (2017).
Saha, S., Chatterjee, P., Nasipuri, M., Basu, S. & Chakraborti, T. Computational drug repurposing for viral infectious diseases: a case study on monkeypox. Brief. Funct. Genom. elad058. https://doi.org/10.1093/bfgp/elad058 (2024).
Wishart, D. S. et al. DrugBank: A knowledgebase for drugs, drug actions and drug targets. Nucleic Acids Res. 36, D901–D906. https://doi.org/10.1093/nar/gkm958 (2008).
Article CAS PubMed Google Scholar
Trott, O. & Olson, A. J. AutoDock Vina: Improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading. J. Comput. Chem. 31, 455–461. https://doi.org/10.1002/jcc.21334 (2010).
Article CAS PubMed PubMed Central Google Scholar
Fostamatinib-DrugBank, <https://go.drugbank.com/drugs/DB12010> (2023).
Glutamic Acid-DrugBank, 2023).
Cannabidiol-DrugBank, https://go.drugbank.com/drugs/DB09061 (2023).
Copper-DrugBank, https://go.drugbank.com/drugs/DB09130 (2023).
Zinc-DrugBank, https://go.drugbank.com/drugs/DB01593 (2023).
Kipf, T. N. & Welling, M. (2017).
Velickovic, P. et al. in ICLR 2018 (2018).
Fiscon, G. & Paci, P. SAveRUNNER: An R-based tool for drug repurposing. BMC Bioinf. 22, 150. https://doi.org/10.1186/s12859-021-04076-w (2021).
Article CAS Google Scholar
Silhan, J. et al. Discovery and structural characterization of monkeypox virus methyltransferase VP39 inhibitors reveal similarities to SARS-CoV-2 nsp14 methyltransferase. Nat. Commun. 14, 2259. https://doi.org/10.1038/s41467-023-38019-1 (2023).
Article ADS CAS PubMed PubMed Central Google Scholar
Biovia Discovery Studio https://www.3ds.com/products/biovia/discovery-studio (2023).
Van Der Spoel, D. et al. GROMACS: Fast, flexible, and free. J. Comput. Chem. 26, 1701–1718. https://doi.org/10.1002/jcc.20291 (2005).
Article CAS PubMed Google Scholar
Hollingsworth, S. A. & Dror, R. O. Molecular dynamics simulation for all. Neuron 99, 1129–1143. https://doi.org/10.1016/j.neuron.2018.08.011 (2018).
Article CAS PubMed PubMed Central Google Scholar
Hočevar, T. & Demšar, J. Combinatorial algorithm for counting small induced graphs and orbits. PLOS ONE 12, e0171428. https://doi.org/10.1371/journal.pone.0171428 (2017).
Article CAS PubMed PubMed Central Google Scholar
Pande, A. et al. Pfeature: A tool for computing wide range of protein features and building prediction models. J. Comput. Biol. 30, 204–222. https://doi.org/10.1089/cmb.2022.0241 (2022).
Article CAS PubMed Google Scholar
The UniProt, C. UniProt: A worldwide hub of protein knowledge. Nucleic Acids Res. 47, D506–D515. https://doi.org/10.1093/nar/gky1049 (2019).
Article CAS Google Scholar
Bowers, A. J. & Zhou, X. Receiver operating characteristic (ROC) area under the curve (AUC): A diagnostic measure for evaluating the accuracy of predictors of education outcomes. J. Educ. Stud. Placed Risk (JESPAR) 24, 20–46. https://doi.org/10.1080/10824669.2018.1523734 (2019).
Article Google Scholar
Bhatkar, S., Gosavi, P., Shelke, V. & Kenny, J. In 2023 International Conference on Advanced Computing Technologies and Applications (ICACTA), pp. 1–5.

Download references

Acknowledgements

This work is partially supported by the CMATER research laboratory of the Computer Science and Engineering Department, Jadavpur University, India. Subhadip Basu acknowledges Department of Biotechnology grant (BT/PR16356/BID/7/596/2016) along with Science and Engineering Research Board, grant (SUR/2022/002903), Government of India. Tapabrata Chakraborti is also supported by the Turing-Roche Strategic Partnership.

Author information

These authors contributed equally: Debarati Paul and Sovan Saha.
These authors jointly supervised this work: Subhadip Basu and Tapabrata Chakraborti.

Authors and Affiliations

Department of Computer Science and Engineering, Jadavpur University, Kolkata, India
Debarati Paul & Subhadip Basu
Computer Science and Engineering (Artificial Intelligence and Machine Learning), Techno Main Salt Lake, Kolkata, India
Sovan Saha
Health Sciences Programme, The Alan Turing Institute, London, UK
Tapabrata Chakraborti
Department of Medical Physics and Biomedical Engineering and UCL Cancer Institute, University College London, London, UK
Tapabrata Chakraborti
Embedded Devices & Intelligent Systems, TCS Research & Innovation, Kolkata, India
Debarati Paul

Authors

Debarati Paul
View author publications
Search author on:PubMed Google Scholar
Sovan Saha
View author publications
Search author on:PubMed Google Scholar
Subhadip Basu
View author publications
Search author on:PubMed Google Scholar
Tapabrata Chakraborti
View author publications
Search author on:PubMed Google Scholar

Contributions

S.B., T.C., S.S., and D.P. conceived the idea of the research and wrote the manuscript. S.S. and D.P. conducted the experiment(s). S.B., T.C., S.S. and D.P. analyzed the results. T.C. and S.B. reviewed the manuscript. All authors contributed with idea and method development, scientific question formulation and experimental design, writing and reviweing of the paper. DP and SS did the code development, data handling, running of the experiments and analysis of results.

Corresponding authors

Correspondence to Subhadip Basu or Tapabrata Chakraborti.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Reprints and permissions

About this article

Cite this article

Paul, D., Saha, S., Basu, S. et al. Computational analysis of pathogen-host interactome for fast and low-risk in-silico drug repurposing in emerging viral threats like Mpox. Sci Rep 14, 18736 (2024). https://doi.org/10.1038/s41598-024-69617-8

Download citation

Received: 30 April 2024
Accepted: 07 August 2024
Published: 12 August 2024
DOI: https://doi.org/10.1038/s41598-024-69617-8

This article is cited by

Exploration of drug repurposing for Mpox outbreaks targeting gene signatures and host-pathogen interactions
- Saber Imani
- Sargol Aminnezhad
- Zahra Taheri
Scientific Reports (2024)
Structure-based drug designing for potential antiviral activity of selected natural product against Monkeypox (Mpox) virus and its host targets
- Vimal K. Maurya
- Swatantra Kumar
- Shailendra K. Saxena
VirusDisease (2024)

Subjects

Abstract

Similar content being viewed by others

A multi-targeted computational drug discovery approach for repurposing tetracyclines against monkeypox virus

Exploration of drug repurposing for Mpox outbreaks targeting gene signatures and host-pathogen interactions

Multi-omics characterization of the monkeypox virus infection

Introduction

Results

Overview of the data sets

HIPPIE database

MPXV-Human PPIN

Potential Mpox FDA drugs

Prediction of PPI using ensemble feature on HIPPIE dataset

Prediction of PHIs using ensemble feature on MPXV-Human PPIN

Drug repurposing study on MPXV-Human PPIN

Computational docking of potential drugs for MPXV protein structures

Molecular dynamic simulation of three potential identified drugs for MPXV protein structures

Discussion

Methods

Feature extraction

Deep neural network model

Classification strategy

Data availability

Code availability

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding authors

Ethics declarations

Competing interests

Additional information

Publisher's note

Supplementary Information

Supplementary Information.

Rights and permissions

About this article

Cite this article

Share this article

This article is cited by

Exploration of drug repurposing for Mpox outbreaks targeting gene signatures and host-pathogen interactions

Structure-based drug designing for potential antiviral activity of selected natural product against Monkeypox (Mpox) virus and its host targets

Search

Quick links