Abstract
Computational drug repositioning seeks to identify new therapeutic uses for existing or experimental drugs. Network-based methods are effective because they integrate relationships among drugs, diseases, and target proteins/genes into prediction models. However, traditional approaches often rely on a single phenotype-based disease similarity network, limiting the diversity of disease information. In this study, we constructed three disease similarity networks (phenotypic, ontological, and molecular) using data from OMIM, Human Phenotype Ontology annotations, and a gene interaction network, respectively. These were integrated into disease multiplex networks and multiplex-heterogeneous networks. We applied a tailored Random Walk with Restart (RWR) algorithm to predict novel drug-disease associations. Experimental results show that both disease multiplex and multiplex-heterogeneous networks outperform their single-layer counterparts in leave-one-out cross-validation. Using 10-fold cross-validation, our method, MHDR, outperformed the state-of-the-art methods TP-NRWRH, DDAGDL, and RGLDR, demonstrating the advantage of integrating multiple disease similarity networks. We predicted novel drug-disease associations by ranking candidates, identifying 68 associations supported by shared proteins/genes, 1,064 by shared pathways, and 84 by shared protein complexes, many of which are supported by clinical trials, underscoring the practical impact of our approach.
Introduction
Drug repositioning, which identifies new clinical indications for existing or experimental drugs, offers a cost-effective alternative to traditional drug discovery, known for being time-consuming and expensive1. Historically, clinical observations have driven successful repositioning cases. For instance, bevacizumab, initially developed for metastatic colon cancer and non-small cell lung cancer by targeting angiogenesis, has been repurposed to treat exudative macular degeneration by slowing abnormal retinal vascularization2. Similarly, sildenafil now addresses erectile dysfunction and pulmonary hypertension, thalidomide treats severe erythema nodosum leprosum, and retinoids are used for acute promyelocytic leukemia3,4. Studies by Ashburn et al. and Sardana et al. further highlight successful examples of drug repositioning5,6. To scale this process, computational methods—including machine learning, network-based, and matrix-based approaches—have emerged for in silico prediction of drug-disease associations7,8.
Network-based methods are particularly effective, as they integrate known relationships among drugs, diseases, and target proteins/genes into prediction models9. These methods often rely on the principle that similar drugs can treat diseases with shared pathogenesis or symptoms, constructing heterogeneous networks where drug and disease similarity networks are linked via drug-disease associations. For example, Wu et al. used network clustering to identify drug-disease modules in a heterogeneous network10, while Luo et al. applied a Bi-Random Walk with Restart (MBiRWR) algorithm on a similar network11. RLSDR employed semi-supervised learning to rank candidate diseases12, and TP-NRWRH introduced a two-pass random walk with restart approach to predict new drug indications13. More recent methods, such as DDAGDL14 and RGLDR15, leverage advanced techniques like graph attention networks and regulation-aware learning within complex heterogeneous information networks (HINs), incorporating drug-protein and protein-disease associations to enhance prediction accuracy.
Despite these advancements, a key limitation persists: most methods use a single phenotype-based disease similarity network, typically derived from MimMiner16, which computes similarity using MeSH terms from OMIM disease descriptions17. This approach overlooks diverse disease relationships, such as molecular and ontological similarities, limiting prediction accuracy18. Early methods relied on MeSH vocabularies16 or shared genes19, but recent developments, including the Human Phenotype Ontology (HPO), enable more comprehensive semantic similarity measures20,21,22. Recognizing that diseases are characterized by multiple dimensions—phenotypic, molecular, and ontological—integrating these perspectives can improve disease relationship modeling.
In this study, we constructed three disease similarity networks: DiSimNetO (phenotypic, based on MimMiner/OMIM), DiSimNetH (ontological, using HPO annotations), and DiSimNetG (molecular, derived from HumanNet gene interactions). These were integrated into disease multiplex networks (e.g., DiSimNetOHG) and multiplex-heterogeneous networks (e.g., DrSimNetP-DiSimNetOHG) using known drug-disease associations. We adapted a Random Walk with Restart (RWR) algorithm to rank candidate diseases, predicting novel drug-disease associations. Our approach outperformed single-layer networks in leave-one-out cross-validation (LOOCV) and surpassed state-of-the-art methods like TP-NRWRH, DDAGDL, and RGLDR in 10-fold cross-validation, demonstrating the value of multi-source disease similarity integration. We further validated predictions with clinical evidence, identifying numerous drug-disease associations supported by shared genes, pathways, and protein complexes.
Materials and methods
Here, we first introduce how networks of drugs and diseases (including single/monoplex drug/disease similarity networks, disease multiplex networks, heterogeneous networks of drugs and diseases, and multiplex-heterogeneous networks of drugs and diseases) were constructed. Then, we describe how the random walk with restart (RWR) algorithm was adapted to rank candidate diseases on these networks in order to predict novel drug-disease associations.
Network construction
In this section, we describe the construction of drug and disease networks. First, we constructed single/monoplex drug/disease similarity networks. Then, we combined them to form multiplex, heterogeneous and multiplex-heterogeneous networks using known drug-disease associations (Fig. 1).
Network construction. First, single/monoplex drug/disease similarity networks are constructed, including: (a) DrSimNetC, a drug similarity network constructed by computing the chemical structure-based similarity between two drugs; (b) DiSimNetO, a disease similarity network constructed based on a disease vocabulary database, MeSH, which is used to annotate diseases in OMIM medical description records; (c) DiSimNetH, a disease similarity network constructed using a disease annotation database, HPO, by a semantic similarity measure; and (d) DiSimNetG, a disease similarity network built based on known associated genes and interactions in a gene network, HumanNet. Second, (e) a disease multiplex network is constructed using two or three monoplex disease similarity networks (e.g., DiSimNetOHG). Finally, (f) a heterogeneous network of drugs and diseases is constructed by connecting a drug similarity network and a disease similarity network through known drug-disease associations (e.g., DrSimNetC-DiSimNetO), and (g) a multiplex-heterogeneous network is formed by connecting a drug similarity network with a disease multiplex network using known drug-disease associations (e.g., DrSimNetC-DiSimNetOHG).
Construction of monoplex drug/disease similarity networks
Drug similarity networks
First, we collected a drug similarity network from a previous study, PREDICT23. This network includes 593 drugs and 175,528 associations between them (hereafter called DrSimNetP). Because this network covers only a small set of the drugs available in public databases, we additionally constructed a larger one under the hypothesis that drugs with similar chemical structures have similar therapeutic functions and can be used to treat similar diseases. Specifically, we used the SIMCOMP tool24 to compute the chemical structure-based similarity between each pair of 7,838 drugs collected from the KEGG database25. Retaining the 887,883 drug pairs with positive similarity, we constructed a second drug similarity network (hereafter called DrSimNetC) (Fig. 1(a)).
Disease similarity networks
First, we constructed a disease similarity network using a disease phenotype similarity matrix collected from MimMiner16. This matrix was constructed based on the similarity between disease phenotypes represented by OMIM records17, which describe diseases as genetic disorders using natural language. Each element of the matrix represents the degree of phenotypic similarity between two diseases, normalized to [0,1]. To ensure a reliable and sparse network, we selected the five nearest neighbors (kLN = 5) with the highest similarity scores for each disease, resulting in a phenotypic disease similarity network with 19,791 interactions among 5,080 phenotypes (hereafter called DiSimNetO) (Fig. 1(b)). The choice of kLN = 5 balances network connectivity with specificity, prioritizing the most robust associations. To explore the impact of this parameter, we constructed alternative DiSimNetO networks with kLN = 10, kLN = 15, and a similarity threshold (sim ≥ 0.3), and evaluated their performance in Supplementary Figures S1–S4, as discussed in the Results section.
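The kLN rule and the threshold alternative can be sketched with NumPy as follows. This is a minimal illustration assuming a dense symmetric MimMiner-style matrix; the function names, the toy matrix, and the choice to keep an edge when either endpoint lists the other among its top-k neighbors (a union rather than a mutual-kNN rule) are assumptions, not details taken from the study.

```python
import numpy as np


def knn_sparsify(sim: np.ndarray, k: int = 5) -> np.ndarray:
    """Keep each disease's k most similar neighbors (the kLN rule).

    Assumption: edge (i, j) is kept if j is in i's top-k OR i is in j's
    top-k, so the result stays symmetric; ties follow np.argsort order.
    """
    s = np.array(sim, dtype=float)
    np.fill_diagonal(s, 0.0)                      # ignore self-similarity
    n = s.shape[0]
    k = min(k, n - 1)
    top = np.argsort(-s, axis=1)[:, :k]           # indices of the k largest scores per row
    keep = np.zeros((n, n), dtype=bool)
    keep[np.repeat(np.arange(n), k), top.ravel()] = True
    keep = keep | keep.T                          # symmetrize the neighbor relation
    keep &= s > 0                                 # never create zero-weight edges
    return np.where(keep, s, 0.0)


def threshold_sparsify(sim: np.ndarray, cutoff: float = 0.3) -> np.ndarray:
    """Alternative rule: keep every pair whose similarity is >= cutoff."""
    s = np.array(sim, dtype=float)
    np.fill_diagonal(s, 0.0)
    return np.where(s >= cutoff, s, 0.0)


# Hypothetical 4-disease similarity matrix
sim = np.array([[1.0, 0.9, 0.1, 0.2],
                [0.9, 1.0, 0.3, 0.0],
                [0.1, 0.3, 1.0, 0.8],
                [0.2, 0.0, 0.8, 1.0]])
a_knn = knn_sparsify(sim, k=1)        # keeps only the 0-1 and 2-3 edges
a_thr = threshold_sparsify(sim, 0.3)  # keeps the 0-1, 1-2 and 2-3 edges
```

With a fixed k, every disease keeps at least one neighbor whenever it has any positive similarity, whereas a global threshold can leave weakly annotated diseases isolated, which is one plausible reading of why the threshold variant loses connectivity.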
Second, we constructed another disease similarity network based on the Human Phenotype Ontology (HPO). To this end, we first mapped each disease to one OMIM record and then annotated each OMIM record with HPO terms using the HPO annotation database26. The similarity between two HPO terms was calculated based on the information content (IC) of each term, defined as follows:
\[IC(t) = -\log p(t)\]
where \(p(t)\) is the probability of term t occurring in the HPO annotation database, computed as \(p(t) = f(t)/f(root)\), with \(f(t) = Annot(t) + \sum_{c \in Children(t)} f(c)\). Here, Annot(t) is the number of phenotypes annotated with term t, and Children(t) is the set of child terms of t in the HPO graph. The semantic similarity between two HPO terms \(t_i\) and \(t_j\) is calculated using the most informative common ancestor approach27:
\[sim(t_i, t_j) = \max_{t \in P(t_i, t_j)} IC(t)\]
where \(\:P\left({t}_{i},{t}_{j}\right)\) is the set of shared ancestors. The similarity between diseases di and dj is:
This value is normalized to [0,1]:
By selecting the five nearest neighbors for each node, we constructed an HPO-based disease similarity network with 34,476 interactions among 6,521 disease phenotypes (DiSimNetH) (Fig. 1(c)).
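The term-level quantities (f(t), p(t), IC(t), and the most informative common ancestor similarity) can be illustrated on a toy ontology. The sketch below is hypothetical: the five-term DAG and its annotation counts are made up, the ancestor set is assumed to include the term itself, and the disease-level aggregation and [0,1] normalization steps are left out because their exact formulas are not stated here.

```python
import math
from functools import lru_cache

# Hypothetical toy ontology: term -> parent terms ("root" is the top term)
PARENTS = {"root": [], "A": ["root"], "B": ["root"], "A1": ["A"], "A2": ["A"]}
# Hypothetical direct annotation counts Annot(t)
ANNOT = {"root": 0, "A": 1, "B": 2, "A1": 3, "A2": 2}
CHILDREN = {t: [c for c, ps in PARENTS.items() if t in ps] for t in PARENTS}


@lru_cache(maxsize=None)
def f(term: str) -> int:
    """f(t) = Annot(t) + sum of f(c) over the children c of t (recursion as stated)."""
    return ANNOT[term] + sum(f(c) for c in CHILDREN[term])


def ic(term: str) -> float:
    """IC(t) = -log p(t), with p(t) = f(t) / f(root)."""
    return -math.log(f(term) / f("root"))


def ancestors(term: str) -> set:
    """The term itself plus all of its ancestors in the DAG."""
    result = {term}
    for parent in PARENTS[term]:
        result |= ancestors(parent)
    return result


def resnik(ti: str, tj: str) -> float:
    """IC of the most informative common ancestor of ti and tj."""
    shared = ancestors(ti) & ancestors(tj)
    return max(ic(t) for t in shared)


print(round(resnik("A1", "A2"), 4))  # prints 0.2877, i.e. IC(A) = -log(6/8)
```

Terms that share only the root get similarity 0, because p(root) = 1; deeper shared ancestors with fewer annotations contribute larger scores.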
Finally, we constructed a disease similarity network based on known disease-associated genes and a gene network28. Using 3,229 diseases from OMIM and the HumanNet gene-gene similarity network29, we defined the similarity between diseases di and dj as the similarity between their associated gene sets G1 and G2:
This resulted in a disease similarity network (DiSimNetG) with 82,241 interactions among 3,229 diseases (Fig. 1(d)).
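The gene-set similarity formula is not spelled out here, so the following sketch swaps in one common choice, the best-match average (BMA): each gene is matched to its most similar gene in the other set, and the best-match scores are averaged over both sets. The BMA formula, the toy gene network, the function names, and the assumption that HumanNet scores have been rescaled to [0, 1] are all assumptions for illustration only.

```python
# Hypothetical gene-gene scores, assumed already rescaled to [0, 1]
GENE_NET = {frozenset({"g1", "g3"}): 0.5}


def gene_sim(a: str, b: str) -> float:
    """Identical genes score 1; unlisted pairs score 0."""
    if a == b:
        return 1.0
    return GENE_NET.get(frozenset({a, b}), 0.0)


def best_match(gene: str, others: set) -> float:
    """Score of the most similar gene in the other set."""
    return max(gene_sim(gene, o) for o in others)


def gene_set_similarity(g1: set, g2: set) -> float:
    """Best-match average of two gene sets (an assumed formula)."""
    total = sum(best_match(g, g2) for g in g1) + sum(best_match(g, g1) for g in g2)
    return total / (len(g1) + len(g2))


print(gene_set_similarity({"g1", "g2"}, {"g2", "g3"}))  # 0.75
```

A symmetric average like this gives identical gene sets a similarity of 1 and lets diseases with no shared genes still be linked through network neighbors.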
Construction of multiplex disease similarity networks
To enhance the prediction of drug-disease associations, we constructed a disease multiplex network by integrating three distinct disease similarity networks—phenotypic (DiSimNetO), molecular (DiSimNetG), and HPO-based (DiSimNetH)—as layers, allowing Random Walk with Restart (RWR) to capture complementary disease relationships across multiple biological perspectives. Each layer shares the same set of disease nodes, but edges are weighted by different similarity measures, reflecting phenotypic, molecular, and ontological relationships, respectively. Two or three disease similarity networks can be connected to form a multiplex disease similarity network with two or three layers, respectively (Table 1).
Construction of heterogeneous and multiplex-heterogeneous networks
Heterogeneous and multiplex-heterogeneous networks were constructed by connecting a drug similarity network with a monoplex or a multiplex disease similarity network, respectively, using known drug-disease associations. To this end, we collected the known associations from PREDICT23, comprising 1,933 associations between 593 drugs and 313 diseases. Each of the two drug similarity networks can be connected with each of the three monoplex disease similarity networks (Fig. 1(f)), yielding six heterogeneous networks. Similarly, each drug similarity network can be connected with each of the four disease multiplex networks to form a multiplex-heterogeneous network, yielding eight multiplex-heterogeneous networks (Fig. 1(g); Table 2).
Random walk with restart algorithm on networks of drugs and diseases
In this section, we describe the Random Walk with Restart (RWR) algorithm adapted to rank candidate diseases for a given drug, predicting novel drug-disease associations. The RWR algorithm simulates a walker traversing a network, starting from source nodes (e.g., a drug and its known associated diseases), moving to neighboring nodes with probabilities defined by edge weights, and occasionally restarting at the source nodes. This process ranks nodes (diseases) based on their steady-state probabilities, reflecting their relevance to the source nodes. We apply RWR to monoplex and multiplex disease similarity networks, as well as heterogeneous and multiplex-heterogeneous networks, to leverage diverse disease and drug similarity information.
RWR algorithm on monoplex and multiplex networks of diseases
For a monoplex disease similarity network (Fig. 1(b-d)), the RWR algorithm ranks diseases based on their proximity to a set of source nodes (S), which includes all diseases known to be associated with a drug of interest, dr. The algorithm balances exploration (moving to adjacent nodes) and exploitation (returning to source nodes) to identify diseases likely to be associated with dr.
Parameters and variables
-
\(\:{A}_{D}\): Adjacency matrix of the monoplex disease similarity network (n ⋅ n), where \(\:{{(A}_{D})}_{i,j}\) represents the similarity between diseases \(\:{d}_{i}\) and\(\:\:{d}_{j}\).
-
\(M^{D}\): Transition matrix, obtained by column-normalizing \(A_{D}\), where \((M^{D})_{i,j} = (A_{D})_{i,j} / \sum_{k}(A_{D})_{k,j}\) denotes the probability of moving from disease \(d_{j}\) to disease \(d_{i}\).
-
\(\:{P}_{t}^{D}\): Probability vector (n ⋅ 1) at step t, where \(\:{\left({P}_{t}^{D}\right)}_{i}\) is the probability of the walker being at disease \(\:{d}_{i}\).
-
\(P_{0}^{D}\): Initial probability vector (n ⋅ 1), defined as:
\[(P_{0}^{D})_{i} = \begin{cases} 1/|S| & \text{if } d_{i} \in S \\ 0 & \text{otherwise} \end{cases}\]
where |S| is the number of source nodes.
-
\(\:\gamma\:\): Restart probability (γ ∈ (0, 1)), controlling the likelihood of the walker returning to the source nodes at each step. A higher γ emphasizes the influence of source nodes, while a lower γ allows more exploration.
-
\(\:{P}_{\infty\:}^{D}\): Steady-state probability vector, where \(\:{\left({P}_{\infty\:}^{D}\right)}_{i}\) represents the relative importance of disease \(\:{d}_{i}\) to the source nodes.
RWR equation and derivation
The RWR algorithm updates the probability vector iteratively until convergence. At each step, the walker either moves to a neighboring node with probability \(\:1-\gamma\:\) or restarts at the source nodes with probability \(\:\gamma\:\). The update rule is:
-
Step 1: Initialize \(\:{P}_{0}^{D}\) based on the source nodes (S).
-
Step 2: Compute the transition matrix \(\:{M}^{D}\) by normalizing \(\:{A}_{D}\).
-
Step 3: At each iteration, calculate \(P_{t+1}^{D}\):
\[P_{t+1}^{D} = (1-\gamma) M^{D} P_{t}^{D} + \gamma P_{0}^{D}\]
-
The term \(\:\left(1-\gamma\:\right){M}^{D}{P}_{t}^{D}\) represents the probability of moving to neighboring nodes, weighted by the transition probabilities in \(\:{M}^{D}\).
-
The term \(\:{\gamma\:P}_{0}^{D}\) represents the probability of restarting at the source nodes.
-
Step 4: Repeat until \(P_{t}^{D}\) converges to \(P_{\infty}^{D}\), i.e., until \(\|P_{t+1}^{D} - P_{t}^{D}\| < \epsilon\) (e.g., \(\epsilon = 10^{-6}\)).
-
Step 5: Rank diseases based on \(\:{\left({P}_{\infty\:}^{D}\right)}_{i}\), with higher values indicating stronger associations with dr.
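Steps 1 to 5 can be sketched in a few lines of NumPy. This is a minimal sketch assuming a dense adjacency matrix and the L1 norm for the convergence test (the norm is not specified above); the function names and the toy path graph are hypothetical.

```python
import numpy as np


def column_normalize(a: np.ndarray) -> np.ndarray:
    """M_ij = A_ij / sum_k A_kj; all-zero columns (isolated nodes) stay zero."""
    col_sums = a.sum(axis=0)
    m = np.zeros(a.shape)
    nz = col_sums > 0
    m[:, nz] = a[:, nz] / col_sums[nz]
    return m


def rwr(a, seeds, gamma=0.5, eps=1e-6, max_iter=1000):
    """Iterate P_{t+1} = (1 - gamma) M P_t + gamma P_0 until the L1 change < eps."""
    m = column_normalize(np.asarray(a, dtype=float))
    p0 = np.zeros(m.shape[0])
    p0[list(seeds)] = 1.0 / len(seeds)   # uniform mass on the source diseases S
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - gamma) * m @ p + gamma * p0
        if np.abs(p_next - p).sum() < eps:
            return p_next
        p = p_next
    return p


# Toy path graph d0 - d1 - d2 - d3, with d0 as the only source disease
path = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]])
scores = rwr(path, seeds=[0])
print(np.argsort(-scores))  # [0 1 2 3]
```

When no node is isolated, \(M^{D}\) is column-stochastic, so the total probability stays at 1 at every step and the scores decay with distance from the seeds, as in the toy example.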
For a multiplex disease similarity network (Fig. 1(e)), multiple disease similarity networks (layers) are integrated, sharing the same set of diseases but with different similarity measures (e.g., DiSimNetO, DiSimNetH, DiSimNetG). The walker can move within a layer or jump between layers.
Additional parameters for multiplex networks
-
L: Number of layers in the multiplex network.
-
\(\:{A}_{D}^{\left[i\right]}\): Adjacency matrix of the disease similarity network at layer i (i = 1,…, L).
-
\(\:\delta\:\): Between-disease-disease-network jumping probability (\(\:\delta\:\)∈ [0,1]), controlling the likelihood of the walker jumping between layers.
-
\(\:\tau\:\): Weight vector (\(\:\left[{\tau\:}_{1},\dots\:,{\tau\:}_{L}\right]\)), where \(\:{\tau\:}_{i}\) represents the importance of layer i. We set \(\:{\tau\:}_{i}\) = 1/L for equal contribution.
-
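\(A_{D}^{M}\): Adjacency matrix of the multiplex network, defined as:
\[A_{D}^{M} = \begin{bmatrix} (1-\delta)A_{D}^{[1]} & \frac{\delta}{L-1}I & \cdots & \frac{\delta}{L-1}I \\ \frac{\delta}{L-1}I & (1-\delta)A_{D}^{[2]} & \cdots & \frac{\delta}{L-1}I \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\delta}{L-1}I & \frac{\delta}{L-1}I & \cdots & (1-\delta)A_{D}^{[L]} \end{bmatrix}\]
where I is the (n ⋅ n) identity matrix, and the off-diagonal blocks allow inter-layer transitions between copies of the same disease.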
-
\(\:{M}^{M}\): Transition matrix of the multiplex network, derived by column-normalizing \(\:{A}_{D}^{M}\).
-
\(\:{P}_{t}^{M}\): Probability matrix (n ⋅ L) at step t, where \(\:{P}_{t}^{i}\) is the probability vector for layer i.
-
\(\:{P}_{0}^{M}\): Initial probability matrix, set as \(\:{P}_{0}^{M}=\tau\:{P}_{0}^{D}\).
RWR equation for multiplex networks
The update rule for the multiplex network is:
\[P_{t+1}^{M} = (1-\gamma) M^{M} P_{t}^{M} + \gamma P_{0}^{M}\]
-
Step 1: Initialize \(\:{P}_{0}^{M}\) using \(\:\tau\:\) and \(\:{P}_{0}^{D}\).
-
Step 2: Construct \(\:{A}_{D}^{M}\) and compute \(\:{M}^{M}\).
-
Step 3: Update \(\:{P}_{t}^{M}\) iteratively, where the walker moves within or between layers based on \(\:{M}^{M}\).
-
Step 4: Compute the steady-state probability \(\:{P}_{\infty\:}^{M}\). The final score for each disease is the geometric mean of steady-state probabilities across layers.
-
Step 5: Rank diseases based on their scores.
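A minimal sketch of the multiplex walk follows. It assumes the supra-adjacency layout of RWR-M (Valdeolivas et al.), with (1 - δ)-weighted layer blocks on the diagonal and δ/(L - 1)-weighted identity blocks off the diagonal, a stacked probability vector of length n⋅L, and the layer-wise geometric mean from Step 4; the function names and the two-layer toy network are hypothetical.

```python
import numpy as np


def column_normalize(a):
    """Column-normalize; all-zero columns stay zero."""
    col_sums = a.sum(axis=0)
    m = np.zeros(a.shape)
    nz = col_sums > 0
    m[:, nz] = a[:, nz] / col_sums[nz]
    return m


def multiplex_adjacency(layers, delta=0.5):
    """Supra-adjacency: (1 - delta) * A^[i] on the diagonal, delta/(L-1) * I elsewhere."""
    L, n = len(layers), layers[0].shape[0]
    eye = np.eye(n)
    return np.block([[(1 - delta) * layers[i] if i == j else (delta / (L - 1)) * eye
                      for j in range(L)] for i in range(L)])


def rwr_multiplex(layers, seeds, gamma=0.5, delta=0.5, tau=None, eps=1e-6, max_iter=1000):
    L, n = len(layers), layers[0].shape[0]
    tau = np.full(L, 1.0 / L) if tau is None else np.asarray(tau, dtype=float)
    m = column_normalize(multiplex_adjacency(layers, delta))
    p0_d = np.zeros(n)
    p0_d[list(seeds)] = 1.0 / len(seeds)
    p0 = np.concatenate([t * p0_d for t in tau])    # P_0^M: layer i gets tau_i * P_0^D
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - gamma) * m @ p + gamma * p0
        if np.abs(p_next - p).sum() < eps:
            break
        p = p_next
    per_layer = p_next.reshape(L, n)                # row i = steady state in layer i
    return np.prod(per_layer, axis=0) ** (1.0 / L)  # geometric mean across layers


# Toy layers over 3 diseases: layer 1 links d0-d1, layer 2 links d0-d2
l1 = np.array([[0., 1., 0.], [1., 0., 0.], [0., 0., 0.]])
l2 = np.array([[0., 0., 1.], [0., 0., 0.], [1., 0., 0.]])
s = rwr_multiplex([l1, l2], seeds=[0])
```

In the toy example, d1 is reachable only in layer 1 and d2 only in layer 2, yet both receive nonzero scores because the walker can jump to the other layer's copy of each disease; by symmetry, they end up with equal scores.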
RWR algorithm on heterogeneous and multiplex-heterogeneous networks
In a heterogeneous network (Fig. 1(f)), a drug similarity network and a disease similarity network are connected via known drug-disease associations (bipartite network). The RWR algorithm ranks both drugs and diseases simultaneously, using the drug of interest dr and its known associated diseases as source nodes. The multiplex-heterogeneous network (Fig. 1(g)) extends this by integrating a drug similarity network with a multiplex disease similarity network.
Parameters and variables for heterogeneous networks
-
\(\:{A}_{Dr}\): Adjacency matrix of the drug similarity network (m ⋅ m), where \(\:{\left({A}_{Dr}\right)}_{i,j}\) is the similarity between drugs \(\:d{r}_{i}\) and\(\:\:d{r}_{j}\).
-
\(\:{A}_{D}\): Adjacency matrix of the disease similarity network (n ⋅ n).
-
B: Adjacency matrix of the bipartite drug-disease network, where \(\:{\left(B\right)}_{i,j}=1\) if drug \(\:d{r}_{i}\) is associated with disease dj, and 0 otherwise.
-
\(A^{H}\): Adjacency matrix of the heterogeneous network:
\[A^{H} = \begin{bmatrix} A_{Dr} & B \\ B^{T} & A_{D} \end{bmatrix}\]
-
\(M^{H}\): Transition matrix of the heterogeneous network, defined as:
\[M^{H} = \begin{bmatrix} M_{Dr}^{H} & M_{DrD}^{H} \\ M_{DDr}^{H} & M_{D}^{H} \end{bmatrix}\]
-
where:
-
\(\:{\:M}_{Dr}^{H}\): Intra-subnetwork transition matrix for the drug similarity network.
-
\(\:{M}_{D}^{H}\): Intra-subnetwork transition matrix for the disease similarity network.
-
\(\:{M}_{DrD}^{H}\), \(\:{M}_{DDr}^{H}\): Inter-subnetwork transition matrices for drug-to-disease and disease-to-drug transitions.
-
\(\:\lambda\:\): Between-drug-disease-network jumping probability (\(\:\lambda\:\)∈[0,1]), controlling the likelihood of the walker jumping between drug and disease networks.
-
\(\eta\): Importance weight (\(\eta\) ∈ [0,1]), balancing the contribution of the drug and disease networks in the initial probability vector.
-
\(\:{P}_{t}^{H}\): Probability vector ((m + n) ⋅ 1) at step t.
-
\(\:{P}_{0}^{H}\): Initial probability vector, defined as:
where
Transition matrix derivation
The transition matrix \(\:{M}^{H}\) is computed as follows:
-
For \(\:{M}_{DrD}^{H}\):
-
For \(\:{M}_{DDr}^{H}\):
-
For \(\:{\:M}_{Dr}^{H}\):
-
For \(\:{M}_{D}^{H}\):
RWR equation for heterogeneous networks
The update rule is:
\[P_{t+1}^{H} = (1-\gamma) M^{H} P_{t}^{H} + \gamma P_{0}^{H}\]
-
Step 1: Initialize \(\:{P}_{0}^{H}\) using \(\:{P}_{0}^{D}\), \(\:{P}_{0}^{Dr}\), and \(\eta\).
-
Step 2: Compute \(\:{M}^{H}\) using the above transition matrices.
-
Step 3: Update \(\:{P}_{t}^{H}\) iteratively, allowing the walker to move within or between subnetworks.
-
Step 4: Rank diseases based on the disease-related portion of \(\:{P}_{\infty\:}^{H}\).
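The heterogeneous walk can be sketched as follows. Because the four transition blocks are not written out explicitly, this sketch assumes the normalization of Li and Patra's RWR-H: a node with links into the other subnetwork jumps there with probability λ and otherwise stays in its own subnetwork, and the row-stochastic result is transposed so that the update keeps the column-vector form \((1-\gamma)MP_t + \gamma P_0\). The drug-first ordering, the η split of the restart vector, the fallback when no disease seed remains, and all names and toy data are assumptions.

```python
import numpy as np


def row_normalize(a: np.ndarray) -> np.ndarray:
    """Divide each row by its sum; all-zero rows stay zero."""
    s = a.sum(axis=1, keepdims=True)
    return np.divide(a, s, out=np.zeros(a.shape), where=s > 0)


def hetero_transition(a_dr, a_d, b, lam=0.5):
    """Row-stochastic transition matrix over [drugs; diseases] (RWR-H style, assumed)."""
    drug_linked = b.sum(axis=1) > 0          # drug has at least one known disease
    disease_linked = b.sum(axis=0) > 0       # disease has at least one known drug
    m_drd = lam * row_normalize(b)           # drug -> disease jumps
    m_ddr = lam * row_normalize(b.T)         # disease -> drug jumps
    # stay inside a subnetwork with prob 1 - lam, or 1 if there is nowhere to jump
    m_dr = row_normalize(a_dr) * np.where(drug_linked, 1 - lam, 1.0)[:, None]
    m_d = row_normalize(a_d) * np.where(disease_linked, 1 - lam, 1.0)[:, None]
    return np.block([[m_dr, m_drd], [m_ddr, m_d]])


def rwr_hetero(a_dr, a_d, b, drug, disease_seeds, gamma=0.5, lam=0.5, eta=0.5,
               eps=1e-6, max_iter=1000):
    n_dr, n_d = a_dr.shape[0], a_d.shape[0]
    mt = hetero_transition(a_dr, a_d, b, lam).T   # transpose -> column-stochastic
    p0_dr = np.zeros(n_dr)
    p0_dr[drug] = 1.0
    p0_d = np.zeros(n_d)
    if disease_seeds:
        p0_d[list(disease_seeds)] = 1.0 / len(disease_seeds)
        p0 = np.concatenate([eta * p0_dr, (1 - eta) * p0_d])
    else:                                         # no known disease left: restart at the drug
        p0 = np.concatenate([p0_dr, p0_d])
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - gamma) * mt @ p + gamma * p0
        if np.abs(p_next - p).sum() < eps:
            break
        p = p_next
    return p_next[n_dr:]                          # disease part of the steady state


# Toy data: 2 drugs, 3 diseases in a path d0 - d1 - d2; drug0-d0 and drug1-d2 known
a_dr = np.array([[0., 1.], [1., 0.]])
a_d = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
b = np.array([[1., 0., 0.], [0., 0., 1.]])
scores = rwr_hetero(a_dr, a_d, b, drug=0, disease_seeds=[])
```

Restarting only at the drug (no disease seeds) mirrors the LOOCV case of a drug with a single known disease.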
Multiplex-Heterogeneous networks
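For a multiplex-heterogeneous network (Fig. 1(g)), the drug similarity network is connected to each layer of a multiplex disease similarity network via L identical bipartite networks. The bipartite adjacency matrix is:
\[B^{MH} = \begin{bmatrix} B & B & \cdots & B \end{bmatrix}\]
i.e., L copies of B, giving an (m ⋅ nL) matrix.
The adjacency matrix is:
\[A^{MH} = \begin{bmatrix} A_{D}^{M} & (B^{MH})^{T} \\ B^{MH} & A_{Dr} \end{bmatrix}\]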
The transition matrix \(M^{MH}\) and RWR equation follow the same structure as the heterogeneous case:
\[P_{t+1}^{MH} = (1-\gamma) M^{MH} P_{t}^{MH} + \gamma P_{0}^{MH}\]
where \(P_{t+1}^{MH}\), \(P_{t}^{MH}\), and \(P_{0}^{MH}\) are of dimension (n⋅L + m), and \(P_{0}^{MH}=\begin{bmatrix}(1-\eta)P_{0}^{M}\\ \eta P_{0}^{Dr}\end{bmatrix}\).
The final disease scores are computed as the geometric mean across layers, similar to the multiplex case.
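Assembling the multiplex-heterogeneous adjacency matrix is mostly block bookkeeping. The sketch below assumes the disease-first ordering implied by the (n⋅L + m) dimension, reuses the RWR-M layer layout as an assumption, and copies B once per layer; the transition matrix and walk would then proceed as in the heterogeneous case. Function names and toy inputs are hypothetical.

```python
import numpy as np


def multiplex_hetero_adjacency(disease_layers, a_dr, b, delta=0.5):
    """Disease copies (n*L rows) first, then drugs (m rows)."""
    L, n = len(disease_layers), disease_layers[0].shape[0]
    eye = np.eye(n)
    a_dm = np.block([[(1 - delta) * disease_layers[i] if i == j else (delta / (L - 1)) * eye
                      for j in range(L)] for i in range(L)])
    b_mh = np.hstack([b] * L)            # m x (n*L): same drug-disease links in every layer
    return np.block([[a_dm, b_mh.T], [b_mh, a_dr]])


def layer_geometric_mean(p_disease, L):
    """Combine the n*L disease part of a steady state into n scores."""
    n = p_disease.size // L
    return np.prod(p_disease.reshape(L, n), axis=0) ** (1.0 / L)


# Toy inputs: 2 disease layers over 3 diseases, 2 drugs
l1 = np.array([[0., 1., 0.], [1., 0., 0.], [0., 0., 0.]])
l2 = np.array([[0., 0., 1.], [0., 0., 0.], [1., 0., 0.]])
a_dr = np.array([[0., 1.], [1., 0.]])
b = np.array([[1., 0., 0.], [0., 0., 1.]])
A = multiplex_hetero_adjacency([l1, l2], a_dr, b)
```

Because every layer receives the same copy of B, a drug's known diseases act as entry points into all disease layers at once.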
Performance evaluation
The prediction performance was assessed using a leave-one-out cross-validation (LOOCV) scheme for each drug, structured as a binary classification task where positive instances are known drug-disease associations, and negative instances are all other diseases not known to be associated with the drug. Given a drug dr, let D be the set of its known associated diseases (positive samples) and C be the set of candidate diseases (negative samples, i.e., all diseases not in D). For each disease s∈D, we perform the following steps:
-
1.
Hold-Out: Remove the association between dr and s, treating s as a test sample.
-
2.
Seed Nodes: Set the remaining known associated diseases D\{s} as seed nodes (S).
-
3.
Ranking: Apply the Random Walk with Restart (RWR)-based ranking method to score all diseases in C∪{s}.
-
4.
Evaluation: Repeat this process for each s∈D, evaluating the ranking of s relative to C.
LOOCV was chosen over k-fold cross-validation to maximize the number of drugs included in the evaluation. In the PREDICT dataset (1,933 associations between 593 drugs and 313 diseases), 171 drugs have only one known associated disease. In k-fold cross-validation, each drug requires at least k associated diseases to be split into k folds, excluding these 171 drugs for k > 1. LOOCV, requiring only one associated disease, ensures all 593 drugs are evaluated, providing a more comprehensive assessment.
Performance is evaluated using two metrics: the Area Under the Receiver Operating Characteristic Curve (AUROC) and the Area Under the Precision-Recall Curve (AUPRC). For a given threshold τ, we compute true positives (TP), false negatives (FN), false positives (FP), and true negatives (TN) as follows:
\[TP = I(rank(s) \le \tau), \quad FN = I(rank(s) > \tau),\]
\[FP = \sum_{c \in C} I(rank(c) \le \tau), \quad TN = \sum_{c \in C} I(rank(c) > \tau),\]
where rank(s) and rank(c) denote the ranks of the held-out disease s and a candidate disease c in C∪{s}, and I(⋅) is the indicator function. The True Positive Rate (TPR) and False Positive Rate (FPR) are defined as:
\[TPR = \frac{TP}{TP + FN}, \quad FPR = \frac{FP}{FP + TN}\]
By varying τ from 1 to the size of C∪{s}, we plot the ROC curve (TPR vs. FPR) and compute AUROC.
Additionally, we compute precision, \(TP/(TP + FP)\), and recall, which equals TPR, to plot the precision-recall curve and compute AUPRC.
AUROC measures overall discriminative ability, while AUPRC is particularly informative for our highly imbalanced dataset, where positive associations (~ 0.07% of all drug-disease pairs, or ~ 1:1,428 positive-to-negative ratio) are sparse. AUPRC emphasizes the model’s ability to correctly identify true positives among the minority class, complementing AUROC.
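The evaluation can be sketched from the rank of the held-out disease in each LOOCV trial. This sketch assumes that, at each threshold τ, the TP, FN, FP, and TN counts are pooled over all trials before TPR, FPR, and precision are computed, that ranks contain no ties, and that AUPRC is taken as a step-wise sum of precision times recall increments (average precision); these choices and the function names are assumptions.

```python
import numpy as np


def loocv_curves(ranks, sizes):
    """ranks[k]: 1-based rank of the held-out disease s in trial k; sizes[k] = |C ∪ {s}|."""
    ranks = np.asarray(ranks)
    sizes = np.asarray(sizes)
    fpr, tpr, prec = [], [], []
    for tau in range(1, int(sizes.max()) + 1):
        top = np.minimum(tau, sizes)           # number of diseases ranked within the top tau
        tp = (ranks <= tau).astype(float)      # I(rank(s) <= tau)
        fp = top - tp                          # candidates c with rank(c) <= tau
        fn = 1.0 - tp
        tn = (sizes - 1) - fp
        TP, FP, FN, TN = tp.sum(), fp.sum(), fn.sum(), tn.sum()
        tpr.append(TP / (TP + FN))
        fpr.append(FP / (FP + TN))
        prec.append(TP / (TP + FP))
    return np.array(fpr), np.array(tpr), np.array(prec)


def auroc(fpr, tpr):
    """Trapezoidal area under the ROC curve, starting from (0, 0)."""
    x = np.concatenate([[0.0], fpr])
    y = np.concatenate([[0.0], tpr])
    return float(np.sum(np.diff(x) * (y[1:] + y[:-1]) / 2))


def auprc(recall, precision):
    """Step-wise area: sum of precision times each increase in recall."""
    r = np.concatenate([[0.0], recall])
    return float(np.sum(np.diff(r) * precision))


# Two hypothetical trials: s ranked 1st of 5 and 3rd of 5
fpr, tpr, prec = loocv_curves([1, 3], [5, 5])
print(auroc(fpr, tpr), auprc(tpr, prec))
```

With one positive per trial, a random ranking yields an expected precision of roughly 1/|C∪{s}|, which is why the AUPRC baseline is on the order of the positive-to-negative ratio.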
Results
Parameter settings
The prediction performance of RWR-based methods on different types of drug and disease networks depends on their parameters. RWR-based methods have been studied extensively for predicting disease-associated biomarkers, such as disease-associated genes30,31,32. Consequently, their parameters, including the restart probability (\(\gamma\)), the between-drug-disease-network jumping probability (\(\lambda\)), and the importance weight (\(\eta\)) of the subnetworks in a heterogeneous network, have been well studied, and prediction performance has been shown to be stable to changes in these parameters.
The goal of this study is to show that integrating multiple disease similarity networks can improve the prediction of drug-disease associations. Therefore, we investigated only the restart probability (\(\gamma\)), the core parameter of RWR-based methods, and the between-disease-disease-network jumping probability (\(\delta\)). Two networks were considered: a disease multiplex network (DiSimNetOHG) and a multiplex-heterogeneous network (DrSimNetP-DiSimNetOHG). All monoplex disease networks were assumed to contribute equally to prediction performance; thus, we set \(\tau_{1}=\tau_{2}=\dots=\tau_{L}=1/L\).
First, to investigate \(\gamma\), we varied it over [0.1, 0.3, 0.5, 0.7, 0.9] while keeping the other parameters constant (\(\delta = \lambda = \eta = 0.5\)) and assessed the prediction performance of the RWR-based algorithm on the two networks. Figure 2(a) shows that the performance of the multiplex-heterogeneous network is nearly unchanged, whereas that of the multiplex network decreases slightly as \(\gamma\) increases.
Second, we varied \(\delta\) over [0.1, 0.3, 0.5, 0.7, 0.9] while keeping the other parameters constant (\(\gamma = \lambda = \eta = 0.5\)) to investigate its effect on prediction performance for the two networks. We observed a similar trend: as \(\delta\) increases, the prediction performance of the multiplex-heterogeneous network remains stable, whereas that of the multiplex network decreases slightly (Fig. 2(b)).
Taken together, the prediction performance of the RWR-based methods is relatively stable with respect to these parameters; we therefore set \(\gamma = \delta = \lambda = \eta = 0.5\) for all other experiments.
Integration of multiple disease similarity networks improves the prediction performance
Impact of neighbor selection in disease similarity network construction
To assess the impact of the neighbor selection parameter in constructing the DiSimNetO network, we evaluated four variants: the original DiSimNetO with kLN = 5 (19,791 interactions among 5,080 phenotypes) and three alternatives with kLN = 10, kLN = 15, and a similarity threshold (sim ≥ 0.3). These variants were tested across monoplex, multiplex, heterogeneous (with DrSimNetP), and multiplex-heterogeneous (with DrSimNetP) networks using the AUROC and AUPRC metrics (Supplementary Figures S1–S4). For monoplex networks (Figure S1), DiSimNetO (kLN = 10) achieved the highest performance (AUROC = 0.835, AUPRC = 0.016), while DiSimNetO (kLN = 15) had the lowest AUROC (0.659) and DiSimNetO (sim ≥ 0.3) had the lowest AUPRC (0.006). In multiplex networks (Figure S2), DiSimNetOHG (kLN = 15) yielded the highest AUROC (0.871), and DiSimNetOHG with kLN = 5 and kLN = 10 tied for the highest AUPRC (0.019). For heterogeneous networks (Figure S3), DrSimNetP-DiSimNetO (kLN = 10) matched the original kLN = 5 for the highest AUROC (0.979), and kLN = 15 achieved the highest AUPRC (0.116). In multiplex-heterogeneous networks (Figure S4), all DrSimNetP-DiSimNetOHG variants had similar AUROC (0.987–0.988), with kLN = 10 achieving the highest AUPRC (0.271). Across all scenarios, multiplex and multiplex-heterogeneous networks outperformed their monoplex and heterogeneous counterparts, consistent with our main findings. The original choice of kLN = 5 performed competitively (e.g., AUROC = 0.987, AUPRC = 0.269 for DrSimNetP-DiSimNetOHG), supporting its use, although kLN = 10 or 15 slightly improved performance in some cases. The similarity threshold (sim ≥ 0.3) generally underperformed, likely due to reduced network connectivity.
Disease multiplex and single/monoplex disease similarity networks
We compared the prediction performance of RWR-based methods on three monoplex disease similarity networks (DiSimNetO, DiSimNetH, DiSimNetG) and four multiplex networks (DiSimNetOH, DiSimNetOG, DiSimNetHG, DiSimNetOHG). Figure 3(a) shows the AUROC results: among monoplex networks, DiSimNetO performed best (AUROC = 0.815), followed by DiSimNetH (AUROC = 0.762) and DiSimNetG (AUROC = 0.743). DiSimNetOH (AUROC = 0.866) and DiSimNetOHG (AUROC = 0.841) outperformed every monoplex network, whereas DiSimNetOG (AUROC = 0.777) and DiSimNetHG (AUROC = 0.748) did not exceed the AUROC of their best constituent layer (DiSimNetO and DiSimNetH, respectively). Figure 3(b) presents the AUPRC results, reflecting the imbalanced nature of the dataset (baseline AUPRC ~ 0.0007). DiSimNetO achieved the highest monoplex AUPRC of 0.011 (~ 16 times the baseline), followed by DiSimNetH (AUPRC = 0.003, ~ 4 times) and DiSimNetG (AUPRC = 0.002, ~ 3 times). Every multiplex network matched or exceeded its best constituent layer in AUPRC: DiSimNetOH (AUPRC = 0.020, ~ 29 times), DiSimNetOG (AUPRC = 0.012, ~ 17 times), DiSimNetHG (AUPRC = 0.003, ~ 4 times), and DiSimNetOHG (AUPRC = 0.019, ~ 27 times). These results indicate that integrating multiple disease similarity networks, particularly combinations that include the phenotypic layer DiSimNetO, enhances the ability to identify true positives in this sparse setting.
Performance comparison between single/monoplex disease similarity networks and multiplex disease networks. (a) AUROC curves for monoplex (DiSimNetO, DiSimNetH, DiSimNetG) and multiplex (DiSimNetOH, DiSimNetOG, DiSimNetHG, DiSimNetOHG) networks. (b) AUPRC curves, highlighting performance on the imbalanced dataset (baseline AUPRC ~ 0.0007).
Multiplex-heterogeneous and heterogeneous networks
We evaluated heterogeneous and multiplex-heterogeneous networks constructed with two drug similarity networks (DrSimNetP and DrSimNetC). For DrSimNetP, Fig. 4(a) shows that all multiplex-heterogeneous networks achieved AUROC ≥ 0.98, outperforming heterogeneous networks (e.g., DrSimNetP-DiSimNetO, AUROC = 0.979). Figure 4(c) presents AUPRC results, with heterogeneous networks ranging from AUPRC = 0.030 (~ 43 times the baseline) for DrSimNetP-DiSimNetH to AUPRC = 0.066 (~ 94 times) for DrSimNetP-DiSimNetG. Multiplex-heterogeneous networks performed substantially better: DrSimNetP-DiSimNetOH (AUPRC = 0.169, ~ 241 times), DrSimNetP-DiSimNetOG (AUPRC = 0.224, ~ 320 times), DrSimNetP-DiSimNetHG (AUPRC = 0.219, ~ 313 times), and DrSimNetP-DiSimNetOHG (AUPRC = 0.269, ~ 384 times). For DrSimNetC, Fig. 4(b) shows multiplex-heterogeneous networks with AUROC ≥ 0.97, surpassing heterogeneous networks (e.g., DrSimNetC-DiSimNetG, AUROC = 0.953). Figure 4(d) reports AUPRC values, with heterogeneous networks ranging from AUPRC = 0.027 (~ 39 times) for DrSimNetC-DiSimNetH to AUPRC = 0.062 (~ 89 times) for DrSimNetC-DiSimNetG, and multiplex-heterogeneous networks ranging from AUPRC = 0.158 (~ 226 times) for DrSimNetC-DiSimNetOH to AUPRC = 0.321 (~ 459 times) for DrSimNetC-DiSimNetOHG. These AUPRC results highlight the superior performance of multiplex-heterogeneous networks in distinguishing true positives, complementing the AUROC findings and confirming the benefit of integrating multiple disease similarity networks.
Impact of the integration of disease similarity networks on prediction performance
The disease multiplex network integrates DiSimNetO, DiSimNetG, and DiSimNetH as layers, where each layer is a weighted graph with diseases as nodes and edges weighted by similarity scores. DiSimNetO uses MimMiner phenotypic similarities based on OMIM records. DiSimNetG measures molecular similarity using gene sets from OMIM and interactions from HumanNet. DiSimNetH computes semantic similarity from HPO annotations using Resnik’s most informative common ancestor measure, with disease scores normalized to [0,1].
In the multiplex network, all layers share the same set of disease nodes, corresponding to the union of nodes across DiSimNetO, DiSimNetG, and DiSimNetH. This results in 7,496 diseases for DiSimNetOHG (Table 1), combining all unique diseases from the individual networks. A disease that is absent from a layer has no edges in that layer (i.e., all of its edge weights there are zero). For instance, a disease present in DiSimNetH but not in DiSimNetG has no neighbors in the DiSimNetG layer. Each disease node exists in all layers, connected by intra-layer edges (weighted by the respective similarity measure) and inter-layer edges (connecting the same disease across layers). We constructed four multiplex networks (Table 1), with DiSimNetOHG combining all three layers. Inter-layer connections are weighted by the jumping probability (\(\delta\)), set to 0.5 to balance intra-layer and cross-layer propagation in RWR (see “Materials and Methods” section).
The multiplex framework enables RWR to traverse both within and between layers, capturing diverse disease relationships that a single layer might miss. Intra-layer edges reflect similarity within a specific perspective (e.g., phenotypic similarity in DiSimNetO), while inter-layer edges (weighted by \(\delta\)) allow the walker to jump between layers, integrating complementary signals. For example, two diseases with low phenotypic similarity in DiSimNetO might be strongly connected in DiSimNetG due to shared genes, and inter-layer jumps enable RWR to combine these signals, improving the ranking of related diseases. This cross-layer interaction is crucial for identifying indirect relationships, such as when a third disease bridges two diseases phenotypically in DiSimNetO and molecularly in DiSimNetG, enhancing the overall connectivity and robustness of disease similarity estimates.
Mathematically (see “Materials and Methods” section), the multiplex adjacency matrix \(A_{D}^{M}\) combines the intra-layer adjacency matrices (e.g., \(A_{D}^{[1]}\) for DiSimNetO) as diagonal blocks with inter-layer transitions as off-diagonal blocks weighted by \(\frac{\delta}{L-1}\), where \(L\) is the number of layers. The transition matrix \(M^{M}\) normalizes these weights, RWR propagates probabilities within and across layers, and the final score of each disease is the geometric mean of its steady-state probabilities across layers. Diseases are thus ranked on a holistic view of their similarities that draws on all available data sources.
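A minimal pure-Python sketch of this procedure follows. It assumes the RWR-M formulation of Valdeolivas et al.32 as we understand it: each intra-layer block is the column-normalized layer scaled by \(1-\delta\), each off-diagonal block is \(\frac{\delta}{L-1}\) times the identity, and the update is \(p_{t+1} = (1-r)\,M^{M} p_t + r\,p_0\). Renormalizing columns so that a disease isolated in a layer sends all of its probability to its copies in other layers, the restart probability \(r = 0.7\), seeding every layer copy of each seed disease equally, and all function names are illustrative assumptions, not MHDR's published code.

```python
from math import prod


def column_normalize(matrix):
    """Scale each column to sum to 1; all-zero columns stay zero."""
    n = len(matrix)
    sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [[matrix[i][j] / sums[j] if sums[j] > 0 else 0.0 for j in range(n)]
            for i in range(n)]


def supra_transition(layers, delta):
    """Column-stochastic supra-transition matrix for L aligned N x N layers."""
    n_layers, n = len(layers), len(layers[0])
    supra = [[0.0] * (n_layers * n) for _ in range(n_layers * n)]
    for l, adj in enumerate(layers):
        norm = column_normalize(adj)
        for i in range(n):
            for j in range(n):
                # intra-layer block: similarity-weighted step inside layer l
                supra[l * n + i][l * n + j] = (1 - delta) * norm[i][j]
            for m in range(n_layers):
                if m != l:
                    # inter-layer block: jump from disease i in layer l to layer m
                    supra[m * n + i][l * n + i] = delta / (n_layers - 1)
    # assumption: renormalize so diseases isolated in a layer jump to other layers
    return column_normalize(supra)


def rwr_multiplex(layers, seeds, delta=0.5, restart=0.7, tol=1e-10, max_iter=1000):
    """One score per disease: geometric mean of its per-layer RWR probabilities."""
    n_layers, n = len(layers), len(layers[0])
    size = n_layers * n
    trans = supra_transition(layers, delta)
    p0 = [0.0] * size
    for l in range(n_layers):
        for s in seeds:
            p0[l * n + s] = 1.0 / (n_layers * len(seeds))
    p = p0[:]
    for _ in range(max_iter):
        p_next = [(1 - restart) * sum(trans[i][j] * p[j] for j in range(size))
                  + restart * p0[i] for i in range(size)]
        converged = sum(abs(a - b) for a, b in zip(p_next, p)) < tol
        p = p_next
        if converged:
            break
    return [prod(p[l * n + i] for l in range(n_layers)) ** (1.0 / n_layers)
            for i in range(n)]


# Toy example: layer 1 links diseases 0-1, layer 2 links diseases 1-2.
phenotypic = [[0, 0.9, 0], [0.9, 0, 0], [0, 0, 0]]
molecular = [[0, 0, 0], [0, 0, 0.8], [0, 0.8, 0]]
scores = rwr_multiplex([phenotypic, molecular], seeds=[0])
```

In the toy example, disease 2 is reachable from the seed only through the molecular layer, yet it still receives a positive score: inter-layer jumps carry probability back into the phenotypic layer, where it is otherwise isolated. The geometric mean favors diseases that receive probability in every layer, whereas an arithmetic mean would let a single layer dominate.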
The multiplex network’s ability to integrate diverse similarity measures substantially improves prediction performance, as shown in Figs. 3 and 4. For example, DiSimNetOHG achieved an AUROC of 0.841 and an AUPRC of 0.019 (27 times the baseline of 0.0007), outperforming the monoplex networks DiSimNetO (AUROC 0.815, AUPRC 0.011), DiSimNetG (AUROC 0.743, AUPRC 0.002), and DiSimNetH (AUROC 0.762, AUPRC 0.003) (Fig. 3). In addition, the prediction of the drug valsartan (KEGG ID: D00400) for hypertension (MIMID: 145500) benefited from the molecular similarity in DiSimNetG (shared gene AGTR1) (Table 3). These results indicate that the multiplex framework, by leveraging interactions across layers, enhances the identification of novel drug-disease associations that align with clinical evidence.
Comparison with other methods
Recently, a state-of-the-art network-based method, TP-NRWRH33, was proposed that uses a heterogeneous network of drugs and diseases built from the PREDICT dataset. TP-NRWRH runs a two-pass random walk with restart on this network and predicts potential drug-disease associations from the mean probability of the two passes. In their study, TP-NRWRH outperformed existing methods for drug-disease association prediction (e.g., MBiRWR11 and DrugNet34) and drug-target interaction prediction (e.g., NBI35, HGBI36, KBMF2K37 and DT-Hybrid38) in 10-fold cross-validation on PREDICT. For a fair comparison, we evaluated our best method, MHDR (the RWR-based approach on the multiplex-heterogeneous network DrSimNetP-DiSimNetOHG), with the same 10-fold cross-validation scheme on PREDICT. MHDR achieved an AUROC of 0.965 and an AUPRC of 0.158, surpassing the AUROC of 0.9394 reported for TP-NRWRH. We attribute this improvement to the two additional disease similarity networks (DiSimNetH and DiSimNetG) integrated in DrSimNetP-DiSimNetOHG, which suggests that incorporating more disease similarity networks improves prediction accuracy.
To further validate MHDR’s performance, we compared it with the recent state-of-the-art methods DDAGDL14 and RGLDR15. DDAGDL constructs a heterogeneous information network (HIN) of drugs, diseases, and proteins from known drug-disease, drug-protein, and protein-disease associations. It employs an autoencoder to generate initial representations from SMILES for drugs, MeSH structures for diseases, and sequences for proteins, followed by a graph attention network (GAT)39 that learns high-quality representations of drugs and diseases, and XGBoost for prediction. In contrast, MHDR relies solely on drug-disease associations, a drug similarity network (based on chemical structure similarity), and disease similarity networks (based on phenotypic similarities from OMIM records and HPO terms, and on molecular data), and uses a tailored Random Walk with Restart (RWR) algorithm. For a fair comparison, we adapted MHDR to use GAT to learn representations of drugs and diseases, followed by XGBoost for prediction. Although MHDR does not use an autoencoder to derive initial representations from SMILES, MeSH structures, or protein sequences, its drug and disease similarity networks already encode related information (chemical structure for drugs; phenotypic, ontological, and molecular information for diseases) in the similarities between drugs and between diseases. Using the same negative sample selection and 10-fold cross-validation, DDAGDL achieved an AUROC of 0.9593 and an AUPRC of 0.0396, whereas MHDR achieved a higher AUROC of 0.965 and AUPRC of 0.158.
Similarly, RGLDR constructs a heterogeneous biological network of drugs, diseases, and proteins, uses random walks guided by metapaths (e.g., drug → protein → disease) to build regulation graphs, and applies regulation-aware graph representation learning to learn representations of drugs and diseases, followed by XGBoost for prediction. To enable a fair comparison, we adapted MHDR by defining metapaths within the drug and disease similarity networks (e.g., drug → drug → drug and disease → disease → disease, because our similarity networks contain no protein data), learning representations of drugs and diseases from these metapaths, and then applying XGBoost for prediction. Using the same negative sample selection and evaluation scheme, RGLDR achieved an AUROC of 0.9633 and an AUPRC of 0.044: its AUROC is comparable to MHDR’s (0.965), but its AUPRC is lower (0.044 vs. 0.158). These results suggest that MHDR’s integration of multiple disease similarity networks enhances prediction performance.
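The 10-fold cross-validation protocol used in these comparisons can be sketched as below. This is a minimal illustration under stated assumptions, not the exact procedure of any compared study: known drug-disease pairs are shuffled into k disjoint folds; in each fold the training pairs serve as RWR seeds, the held-out pairs are positives, and every other unobserved pair is treated as a negative. The compared studies' fold assignment and negative sample selection may differ, and all names are hypothetical.

```python
import random


def kfold_splits(pairs, k=10, seed=0):
    """Yield (train, test) splits of known drug-disease pairs for k-fold CV."""
    shuffled = list(pairs)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[f::k] for f in range(k)]
    for test in folds:
        held_out = set(test)
        # training associations seed the RWR; held-out ones are hidden from it
        train = [p for p in shuffled if p not in held_out]
        yield train, test


def fold_candidates(train, test, drugs, diseases):
    """Label every drug-disease pair not used for training: 1 if held out, else 0."""
    train_set, test_set = set(train), set(test)
    return {(dr, di): int((dr, di) in test_set)
            for dr in drugs for di in diseases if (dr, di) not in train_set}
```

The labels from `fold_candidates`, paired with the RWR scores of each candidate, are what AUROC and AUPRC are computed on in each fold.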
Prediction of novel drug-disease associations
In the previous section, we demonstrated that our Random Walk with Restart (RWR)-based methods, particularly on the multiplex-heterogeneous network DrSimNetP-DiSimNetOHG, achieved superior prediction performance, with high AUROC and AUPRC scores that remained stable across parameter variations. Here, we use this network to predict novel drug-disease associations and assess their practical relevance through clinical evidence. For each drug, we used its known associated diseases as seed nodes, ranked all other diseases in the disease multiplex network, and selected the 10 highest-ranked diseases as promising candidates for evidence search (Table S1 in Supporting Information). To demonstrate the model’s real-world utility beyond cross-validation metrics, we present case studies of associations supported by shared genes/proteins and by shared pathways/protein complexes, together with supporting clinical trials retrieved from ClinicalTrials.gov using the rclinicaltrials R package.
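The per-drug candidate selection described above reduces to a top-k query over disease scores. The sketch below assumes a dictionary of per-disease scores for each drug (for example, the aggregated geometric-mean RWR scores) and excludes diseases already known for the drug; the function names are hypothetical.

```python
import heapq


def top_candidates(scores, known, k=10):
    """Return the k highest-scoring diseases not already known for the drug."""
    unknown = ((score, disease) for disease, score in scores.items() if disease not in known)
    return [disease for _, disease in heapq.nlargest(k, unknown)]


def rank_all_drugs(drug_scores, known_assoc, k=10):
    """drug_scores: {drug: {disease: score}}; known_assoc: {drug: set of diseases}."""
    return {drug: top_candidates(scores, known_assoc.get(drug, set()), k)
            for drug, scores in drug_scores.items()}
```

`heapq.nlargest` avoids sorting the full candidate list, which matters when each drug is scored against thousands of diseases (7,496 in DiSimNetOHG).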
Case studies: associations supported by shared genes/proteins
We identified 68 drug-disease associations with shared genes/proteins, where disease-associated genes from OMIM17 and drug targets from KEGG25 overlap, indicating potential drug repositioning opportunities40 (Table S2). Of these, 16 were supported by clinical trials. Below, we highlight three representative case studies from Table 3, showcasing their biological grounding and clinical trial support.
- Testosterone (KEGG ID: D00075) and Prostate Cancer (MIMID: 176807): Testosterone activates the androgen receptor (AR), driving prostate cancer proliferation. However, bipolar androgen therapy uses high-dose testosterone to disrupt AR signaling, offering a novel treatment for castration-resistant cases. Clinical trials, such as NCT04558866, confirm its therapeutic potential by demonstrating efficacy in advanced prostate cancer. The shared gene AR provides a strong biological basis for this association.
- Argatroban (KEGG ID: D00181) and Ischemic Stroke (MIMID: 601367): Argatroban inhibits thrombin (F2), preventing clot propagation in ischemic stroke caused by thromboembolism. Trials like NCT03552354 validate its efficacy in reducing stroke progression, aligning with F2’s role in coagulation pathways. This association highlights the model’s ability to identify clinically relevant treatments.
- Risperidone (KEGG ID: D00426) and Obsessive-Compulsive Disorder (OCD) (MIMID: 164230): Risperidone’s antagonism of the serotonin 5-HT2A receptor (HTR2A) enhances SSRI efficacy in treatment-resistant OCD. NCT00389493 demonstrates significant symptom improvement, supporting its off-label use. The shared gene HTR2A underpins the biological rationale for this prediction.
These cases, detailed in Table 3 and Table S2, illustrate how our model’s predictions are supported by clinical trial evidence, demonstrating practical utility.
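The shared-gene criterion used for these case studies amounts to a set intersection between a drug's targets and a disease's genes. In the minimal sketch below, the annotation sets are toy data containing only the shared genes named in the text (AGTR1, HTR2A) plus a placeholder ("GENE_X"); they are not the full KEGG or OMIM annotations, and the function name is hypothetical.

```python
def shared_gene_support(predictions, drug_targets, disease_genes):
    """Map each predicted (drug, disease) pair to the genes it shares, if any."""
    support = {}
    for drug, disease in predictions:
        shared = drug_targets.get(drug, set()) & disease_genes.get(disease, set())
        if shared:
            support[(drug, disease)] = sorted(shared)
    return support


# Toy annotations: only the shared genes named in the text plus a placeholder.
drug_targets = {"D00400": {"AGTR1"}, "D00426": {"HTR2A"}}
disease_genes = {"145500": {"AGTR1"}, "164230": {"HTR2A"}, "181500": {"GENE_X"}}
predictions = [("D00400", "145500"), ("D00426", "164230"), ("D00426", "181500")]
support = shared_gene_support(predictions, drug_targets, disease_genes)
```

Pairs with no overlap, such as the placeholder third prediction, are simply omitted from `support`.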
Case studies: associations supported by shared pathways and protein complexes
Another indirect line of evidence for drug repositioning comes from protein complexes41,42 or pathways43,44,45,46 shared between a drug of interest and its candidate diseases. It has been proposed that the protein complexes targeted by drugs and shared with diseases can be used to estimate the likelihood of a drug-disease association47. Furthermore, the pathways in which a drug’s targets participate may also be implicated in diseases caused by dysfunction of those targets. Hence, if the targets of a drug and the genes underlying a disease not currently treated by that drug are involved in the same pathway, this may indicate a promising association between the drug and the disease. Protein complexes or pathways shared by a drug and its candidate diseases can therefore serve as evidence supporting drug repositioning. Here, we identified 1,064 and 84 drug-disease associations supported by shared KEGG pathways (Table S3) and CORUM48 protein complexes (Table S4), respectively, with 73 supported by both (Table S5). Of these, 30 were validated by clinical trials. Below, we present case studies of nine of these associations from Table 4, each backed by shared molecular mechanisms and clinical evidence.
- Estrone (KEGG ID: D00067) and Breast Cancer (MIMID: 114480): Estrone promotes breast cancer growth through the estrogen signaling pathway (map04915) and receptor complexes (ER-alpha-p53-HDM2 complex and ESR1-MAGEA2-TP53 complex). Trials such as NCT01089049 support its relevance to hormone therapy. Figure 5(a) visualizes the connection between estrone and breast cancer via shared pathways and protein complexes.
- Testosterone (KEGG ID: D00075) and Prostate Cancer (MIMID: 176807): Testosterone drives prostate cancer through androgen signaling, captured in the KEGG prostate cancer pathway (map05215), and through receptor complexes, but bipolar androgen therapy exploits it to disrupt tumor growth. NCT04558866 confirms efficacy in advanced cases, consistent with the shared gene (AR) evidence (Table 3).
- Argatroban (KEGG ID: D00181) and Ischemic Stroke (MIMID: 601367): Argatroban inhibits thrombin in the complement and coagulation cascades (map04610), preventing stroke-related clots. NCT03552354 validates its efficacy in acute stroke, consistent with the shared gene (F2) findings.
- Dexamethasone (KEGG ID: D00292) and Schizophrenia (MIMID: 181500): Dexamethasone modulates neuroactive ligand-receptor interactions (map04080), potentially reducing schizophrenia’s stress-related symptoms. NCT01310140 supports its therapeutic potential in early psychosis.
- Valsartan (KEGG ID: D00400) and Hypertension (MIMID: 145500): Valsartan blocks renin-angiotensin signaling (map04614), effectively lowering blood pressure. NCT01878201 confirms its efficacy in hypertension, consistent with the shared gene (AGTR1) evidence. Figure 5(b) visualizes the association between valsartan and hypertension via shared pathways and protein complexes.
- Risperidone (KEGG ID: D00426) and Obsessive-Compulsive Disorder (MIMID: 164230): Risperidone modulates serotonergic pathways (map04726), enhancing OCD treatment. NCT00389493 confirms symptom improvement, consistent with the shared gene (HTR2A) findings.
- Olanzapine (KEGG ID: D00454) and Attention Deficit-Hyperactivity Disorder (MIMID: 143465): Olanzapine targets dopaminergic pathways (map04728), reducing ADHD impulsivity. NCT00205699 validates its efficacy in severe cases, consistent with the shared gene (DRD4) evidence.
- Prednisolone (KEGG ID: D00472) and Schizophrenia (MIMID: 181500): Prednisolone modulates neuroactive ligand-receptor interactions (map04080), potentially alleviating stress-related schizophrenia symptoms. NCT03340909 supports its exploration as a novel treatment.
- Prochlorperazine (KEGG ID: D00493) and Schizophrenia (MIMID: 181500): Prochlorperazine blocks dopaminergic pathways (map04728), reducing psychotic symptoms. NCT02600741 confirms its efficacy in acute schizophrenia.
These nine associations, supported by shared pathways, protein complexes, and clinical trials, highlight the practical value of the DrSimNetP-DiSimNetOHG network. Estrone drives breast cancer growth via estrogen signaling (map04915), with trials such as NCT01089049 supporting its use in hormone therapy. The testosterone-prostate cancer and argatroban-ischemic stroke predictions are supported by NCT04558866 and NCT03552354, respectively, mirroring the shared-gene findings. Dexamethasone and prednisolone are candidate schizophrenia treatments acting on neuroactive ligand-receptor interaction (map04080), supported by trials such as NCT01310140 and NCT03340909. Valsartan treats hypertension through renin-angiotensin signaling (map04614), with NCT01878201 supporting its effectiveness. Risperidone and olanzapine may enhance OCD and ADHD treatment via serotonergic (map04726) and dopaminergic (map04728) pathways, respectively, backed by NCT00389493 and NCT00205699. Prochlorperazine controls schizophrenia symptoms, with NCT02600741 supporting its role. These cases, detailed in Table 4 and Table S5, indicate that the model’s predictions can translate into clinical applications.
To assess practical relevance, we checked that clinical trial outcomes align with our predictions. For risperidone in OCD, NCT00389493 demonstrates significant symptom reduction, consistent with the predicted HTR2A-mediated effect. In prostate cancer, the therapeutic benefits reported for NCT04558866 support the predicted disruption of AR signaling by testosterone. The predicted benefit of argatroban in stroke (pathway map04610, gene F2) is supported by NCT03552354’s efficacy in reducing progression. These alignments, cross-referenced with the shared-gene associations, indicate that our model identifies clinically actionable drug-disease links, enhancing its real-world utility.
This section, supported by case studies and clinical trial validations, demonstrates that our multiplex-heterogeneous network not only achieves high predictive performance (AUROC/AUPRC) but also identifies drug-disease associations with tangible clinical impact, paving the way for innovative drug repositioning.
Fig. 5. Visualization of drug-disease associations via shared KEGG pathways and CORUM protein complexes, predicted by the DrSimNetP-DiSimNetOHG network. (a) The drug Estrone (KEGG ID: D00067) and breast cancer (MIMID: 114480) are linked by six shared pathways (e.g., estrogen signaling pathway, pathways in cancer) and two protein complexes (e.g., ER-alpha-p53-HDM2 complex, ESR1-MAGEA2-TP53 complex), supported by clinical trials like NCT01089049. (b) The drug Valsartan (KEGG ID: D00400) and Hypertension, Essential (MIMID: 145500) are connected by 14 shared pathways (e.g., renin-angiotensin system, neuroactive ligand-receptor interaction) and two protein complexes (e.g., AGTR1-AGTR2 complex, AGTR1-MAS1 complex), validated by trials such as NCT01878201. Diseases, drugs, pathways, and protein complexes are represented as nodes in blue squares, red circles, pink triangles, and purple diamonds, respectively. Green solid lines indicate associations, with edge weights reflecting the strength of connectivity between nodes.
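The shared-pathway and shared-complex criteria used in this section can be sketched as follows. We assume, for illustration, that a pathway or complex counts as "shared" when it contains at least one target of the drug and at least one gene of the disease; the exact matching rule behind Tables S3 to S5 may differ. All drug, disease, gene, pathway, and complex identifiers below are hypothetical placeholders, not KEGG or CORUM entries.

```python
def shared_groups(drug_targets, disease_genes, groups):
    """IDs of gene groups (pathways or complexes) containing a drug target and a disease gene."""
    return sorted(gid for gid, members in groups.items()
                  if members & drug_targets and members & disease_genes)


def mechanism_support(predictions, targets, genes, pathways, complexes):
    """Classify predicted pairs by shared-pathway and shared-complex evidence."""
    by_pathway, by_complex = {}, {}
    for drug, disease in predictions:
        t, g = targets.get(drug, set()), genes.get(disease, set())
        shared_p = shared_groups(t, g, pathways)
        if shared_p:
            by_pathway[(drug, disease)] = shared_p
        shared_c = shared_groups(t, g, complexes)
        if shared_c:
            by_complex[(drug, disease)] = shared_c
    # pairs supported by both kinds of evidence (cf. Table S5)
    both = set(by_pathway) & set(by_complex)
    return by_pathway, by_complex, both


# Hypothetical toy data
targets = {"drug1": {"g1"}, "drug2": {"g3"}}
genes = {"dis1": {"g2"}, "dis2": {"g4"}}
pathways = {"P1": {"g1", "g2"}, "P2": {"g3", "g4"}}
complexes = {"C1": {"g1", "g2"}}
predictions = [("drug1", "dis1"), ("drug2", "dis2"), ("drug1", "dis2")]
by_pathway, by_complex, both = mechanism_support(predictions, targets, genes, pathways, complexes)
```

Unlike the shared-gene criterion, this test can link a drug to a disease even when its targets and the disease genes are disjoint, as long as both fall in the same pathway or complex.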
Conclusions and discussion
Drug repositioning seeks to identify new indications for existing drugs, a process often limited by the scarcity of off-label uses discovered in clinical practice. Computational methods, particularly network-based approaches, offer a scalable solution by leveraging relationships among drugs, diseases, and target proteins/genes. These methods typically assume that similar drugs can treat similar diseases, using drug and disease similarity networks to predict associations. However, reliance on single disease similarity networks restricts the diversity of disease information, limiting prediction accuracy. As diseases often share molecular causes and phenotypic markers, integrating multiple disease similarity perspectives can enhance prediction performance.
In this study, we constructed three disease similarity networks, DiSimNetO (phenotypic, OMIM), DiSimNetH (ontological, HPO), and DiSimNetG (molecular, HumanNet), and integrated them into multiplex disease similarity networks (e.g., DiSimNetOHG) and multiplex-heterogeneous networks (e.g., DrSimNetP-DiSimNetOHG). Using a tailored Random Walk with Restart (RWR) algorithm, we predicted novel drug-disease associations. Experimental results demonstrate that these integrated networks outperform single-layer networks in leave-one-out cross-validation. In 10-fold cross-validation, our method, MHDR, surpassed the state-of-the-art methods TP-NRWRH, DDAGDL, and RGLDR, highlighting the benefit of multi-source disease similarity integration. The practical utility of our approach is evidenced by the prediction of 68 drug-disease associations supported by shared proteins/genes, 1,064 by shared pathways, and 84 by shared protein complexes, with many associations validated by clinical trials (e.g., testosterone for prostate cancer, NCT04558866; valsartan for hypertension, NCT01878201).
Recent methods like DDAGDL and RGLDR leverage advanced representation learning within complex heterogeneous information networks. DDAGDL uses graph attention networks to learn high-quality drug and disease representations, while RGLDR employs regulation-aware graph learning with metapaths; both integrate drug-protein and protein-disease associations. Their approaches have been extended to predict lncRNA-miRNA interactions, expanding their utility in bioinformatics49. However, their dependence on protein data can limit their applicability to datasets that lack such associations. In contrast, MHDR’s integration of diverse disease similarity networks, which embed phenotypic, ontological, and molecular information, offers greater adaptability while achieving superior performance.
Other recent methods also advance drug discovery and treatment innovation. 3DSMILES-GPT50 facilitates drug discovery by generating three-dimensional molecular structures based on target profiles, aiding the design of novel therapeutics. AMP-Designer51 uses large language models to design antimicrobial peptides, providing new avenues for addressing infectious diseases. These innovations complement network-based methods like ours, offering diverse strategies for drug repositioning and development.
Our predictions, supported by molecular evidence (shared genes, pathways, protein complexes) and clinical trials, underscore the potential of our method for practical drug repositioning52. Looking ahead, drug similarities—crucial for understanding therapeutic mechanisms—can be further explored by constructing diverse drug similarity networks and integrating them into a multiplex-heterogeneous framework, potentially enhancing prediction accuracy53.
Data availability
The data and source code are hosted on GitHub (https://github.com/hauldhut/MHDR).
References
Nosengo, N. Can you teach old drugs new tricks? Nat. News. 534(7607), 314 (2016).
Rich, R. M. et al. Short-Term Safety and Efficacy of Intravitreal Bevacizumab (Avastin) for Neovascular Age-Related Macular Degeneration. Retina 26(5), 495–511 (2006).
Aronson, J. K. Old drugs – new uses. Br. J. Clin. Pharmacol. 64(5), 563–565 (2007).
Sirota, M. et al. Discovery and preclinical validation of drug indications using compendia of public gene expression data. Sci. Transl. Med. 3 (2011).
Ashburn, T. T. & Thor, K. B. Drug repositioning: identifying and developing new uses for existing drugs. Nat. Rev. Drug Discov. 3(8), 673–683 (2004).
Sardana, D. et al. Drug repositioning for orphan diseases. Brief. Bioinform. 12 (2011).
Luo, H. et al. Biomedical data and computational models for drug repositioning: a comprehensive review. Brief. Bioinform. 22(2), 1604–1619 (2020).
Jarada, T. N., Rokne, J. G. & Alhajj, R. A review of computational drug repositioning: strategies, approaches, opportunities, challenges, and directions. J. Cheminform. 12(1), 46 (2020).
Lotfi Shahreza, M. et al. A review of network-based approaches to drug repositioning. Brief. Bioinform. 19(5), 878–892 (2017).
Wu, C. et al. Computational drug repositioning through heterogeneous network clustering. BMC Syst. Biol. 7(Suppl 5), S6–S6 (2013).
Luo, H. et al. Drug repositioning based on comprehensive similarity measures and Bi-Random walk algorithm. Bioinformatics 32(17), 2664–2671 (2016).
Le, D. H. & Nguyen-Ngoc, D. Drug repositioning by integrating known Disease-Gene and Drug-Target associations in a Semi-supervised learning model. Acta Biotheor. (2018).
Liu, H. et al. Inferring new indications for approved drugs via random walk on drug-disease heterogenous networks. BMC Bioinform. 17(Suppl 17), 539 (2016).
Zhao, B. W. et al. A geometric deep learning framework for drug repositioning over heterogeneous information networks. Brief. Bioinform. 23(6) (2022).
Zhao, B. W. et al. Regulation-aware graph learning for drug repositioning over heterogeneous biological network. Inf. Sci. 686, 121360 (2025).
van Driel, M. A. et al. A text-mining analysis of the human phenome. Eur. J. Hum. Genet. 14(5), 535–542 (2006).
Amberger, J. S. et al. OMIM.org: online Mendelian inheritance in man (OMIM®), an online catalog of human genes and genetic disorders. Nucleic Acids Res. 43(D1), D789–D798 (2014).
Cheng, L. et al. Computational methods for identifying similar diseases. Mol. Therapy - Nucleic Acids. 18, 590–604 (2019).
Goh, K. I. et al. The human disease network. Proc. Natl. Acad. Sci. 104(21), 8685–8690 (2007).
Köhler, S. et al. The human phenotype ontology project: linking molecular biology and disease through phenotype data. Nucleic Acids Res. 42(D1), D966–D974 (2014).
Groza, T. et al. The human phenotype ontology: semantic unification of common and rare disease. Am. J. Hum. Genet. 97(1), 111–124 (2015).
Le, D. H., Pham, B. S. & Dao, A. M. Assessing human disease phenotype similarity based on ontology, in RIVF 2016. IEEE: Hanoi. pp. 211–216 (2016).
Gottlieb, A. et al. PREDICT: a method for inferring novel drug indications with application to personalized medicine. Mol. Syst. Biol. 7(1) (2011).
Hattori, M. et al. SIMCOMP/SUBCOMP: chemical structure search servers for network analyses. Nucleic Acids Res. 38(suppl 2), W652–W656 (2010).
Kanehisa, M. et al. KEGG for representation and analysis of molecular networks involving diseases and drugs. Nucleic Acids Res. 38(suppl 1), D355–D360 (2009).
Köhler, S. et al. The human phenotype ontology in 2017. Nucleic Acids Res. 45(D1), D865–D876 (2016).
Resnik, P. Using information content to evaluate semantic similarity in a taxonomy, in Proceedings of the 14th international joint conference on Artificial intelligence - Volume 1. Morgan Kaufmann Publishers Inc.: Montreal, Quebec, Canada. (1995).
Cheng, L. et al. SemFunSim: A new method for measuring disease similarity by integrating semantic and gene functional association. PLOS ONE. 9(6), e99415 (2014).
Hwang, S. et al. HumanNet v2: human gene networks for disease research. Nucleic Acids Res. 47(D1), D573–D580 (2018).
Kohler, S. et al. Walking the interactome for prioritization of candidate disease genes. Am. J. Hum. Genet. 82(4), 949–958 (2008).
Li, Y. & Patra, J. C. Genome-wide inferring gene-phenotype relationship by walking on the heterogeneous network. Bioinformatics 26(9), 1219–1224 (2010).
Valdeolivas, A. et al. Random walk with restart on multiplex and heterogeneous biological networks. Bioinformatics 35(3), 497–505 (2018).
Liu, H. et al. Inferring new indications for approved drugs via random walk on drug-disease heterogenous networks. BMC Bioinform. 17(17), 539 (2016).
Martínez, V. et al. DrugNet: Network-based drug–disease prioritization by integrating heterogeneous data. Artif. Intell. Med. 63(1), 41–49 (2015).
Cheng, F. et al. Prediction of drug-Target interactions and drug repositioning via Network-Based inference. PLoS Comput. Biol. 8(5), e1002503 (2012).
Wang, W., Yang, S. & Li, J. Drug target predictions based on heterogeneous graph inference, in Biocomputing 2013. World Scientific. pp. 53–64 (2013).
Gönen, M. Predicting drug–target interactions from chemical and genomic kernels using bayesian matrix factorization. Bioinformatics 28(18), 2304–2310 (2012).
Alaimo, S. et al. Drug–target interaction prediction through domain-tuned network-based inference. Bioinformatics 29(16), 2004–2008 (2013).
Veličković, P. et al. Graph attention networks. arXiv preprint arXiv:1710.10903 (2017).
Le, D. H. & Nguyen-Ngoc, D. Drug repositioning by integrating known Disease-Gene and Drug-Target associations in a Semi-supervised learning model. Acta. Biotheor. 66(4), 315–331 (2018).
Wang, F. et al. Human Protein Complex Signatures for Drug Repositioning, in Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics. Association for Computing Machinery: Niagara Falls, NY, USA. pp. 42–50. (2019).
Yu, L. et al. Inferring drug-disease associations based on known protein complexes. BMC Med. Genom. 8(2), S2 (2015).
Mejía-Pedroza, R. A., Espinal-Enríquez, J. & Hernández-Lemus, E. Pathway-Based drug repositioning for breast Cancer molecular subtypes. Front. Pharmacol. 9, 905 (2018).
Pan, Y. et al. Pathway analysis for drug repositioning based on public database mining. J. Chem. Inf. Model. 54(2), 407–418 (2014).
Pham, M. et al. Discovery of disease- and drug-specific pathways through community structures of a literature network. Bioinformatics 36(6), 1881–1888 (2019).
Pratanwanich, N. & Lió, P. Pathway-based bayesian inference of drug–disease interactions. Mol. Biosyst. 10(6), 1538–1548 (2014).
Xuan, P. et al. Prediction of potential drug–Disease associations through deep integration of diversity and projections of various drug features. Int. J. Mol. Sci. 20(17), 4102 (2019).
Ruepp, A. et al. CORUM: the comprehensive resource of mammalian protein complexes. Nucleic Acids Res. 36(suppl 1), D646–D650 (2008).
Zhao, B. W. et al. A heterogeneous information network learning model with neighborhood-level structural representation for predicting lncRNA-miRNA interactions. Comput. Struct. Biotechnol. J. 23, 2924–2933 (2024).
Wang, J. et al. 3DSMILES-GPT: 3D molecular pocket-based generation with token-only large Language model. Chem. Sci. 16(2), 637–648 (2025).
Wang, J. et al. Discovery of antimicrobial peptides with notable antibacterial potency by an LLM-based foundation model. Sci. Adv. 11(10), eads8932 (2025).
Brown, A. S. & Patel, C. J. A standard database for drug repositioning. Sci. Data. 4, 170029 (2017).
Huang, L. et al. Drug–drug similarity measure and its applications. Brief. Bioinform. 22(4) (2020).
Author information
Contributions
The author contributed to all aspects of this study.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Le, DH. Improving computational drug repositioning through multi-source disease similarity networks. Sci Rep 15, 30773 (2025). https://doi.org/10.1038/s41598-025-04772-0
DOI: https://doi.org/10.1038/s41598-025-04772-0