Table 1 Computational Methods tested in the INMC PPI prediction project
From: Assessment of community efforts to advance network-based prediction of protein–protein interactions
| ID and synopsis | U/S | Ref |
|---|---|---|
| I. Similarity-based: the existence probability of a link is measured as the prior-knowledge-based similarity between the two nodes \((i,j)\). | | |
| 1. Common Neighbor (CN): the similarity of a link is computed as the number of common neighbors of the two nodes. | U | |
| 2. Resource Allocation (RA): the similarity of a link is computed as the resource allocation index of the node pair. | U | |
| 3. Preferential Attachment (PA): the similarity of a link is computed as the degree product of the node pair. | U | |
| 4. Jaccard Index (JC): the similarity of a link is computed as the Jaccard index of the node pair. | U | |
| 5. Adamic–Adar (AA): the similarity of a link is defined as the Adamic–Adar index. | U | |
| 6. Katz: the similarity of a link is defined as the Katz index. | U | |
| 7. Similarity (SIM): a similarity score integrating the L3 score and the Jaccard index. | U | |
| 8. Ensemble: integrates several similarity scores. | U | |
| 9. Maximum similarity, Preferential attachment Score (MPS(T)): integrates two scores based on topological features. | U | |
| 10. Maximum similarity, Preferential attachment and sequence Score (MPS(B&T))*: integrates two scores based on topological features and one score derived from the protein sequence. | U | |
| 11. Root Noise Model (RNM): an ensemble method that integrates a diagonal noise model, a spectral noise model, and the L3 model. | U | |
| 12. L3: paths of length three capture similarity to a node's existing partners. | U | |
| 13. CRA: the similarity of a link is computed as the CAR-based resource allocation index. | U | |
| II. Probabilistic methods: assume that real networks have some structure, e.g., community structure; the goal of these algorithms is to select model parameters that maximize the likelihood of the observed structure. | | |
| 1. Stochastic Block Model (SBM): assumes nodes are partitioned into blocks and that the probability of a link between two nodes depends on the blocks they belong to. | U | |
| 2. Repulsive Graph Signal Processing (RepGSP): learns PPIs via graph signal processing; RepGSP rewards links between “repulsive nodes” (i.e., nodes belonging to different communities). | U | |
| III. Factorization-based methods: factorize the network adjacency matrix to embed the high-dimensional nodes of the graph into a lower-dimensional representation space while preserving node neighborhood structure. | | |
| 1. Non-Negative Matrix Factorization (NNMF): dimension reduction via non-negative matrix factorization; the existence probability is defined as the cosine similarity of the latent features. | U | |
| 2. Geometric Laplacian Eigenmap Embedding (GLEE): dimension reduction via the geometric Laplacian eigenmap; the existence probability is defined as the cosine similarity of the latent features. | U | |
| 3. Spectral Clustering (SPC): dimension reduction via the symmetric normalized Laplacian matrix; the existence probability is defined as the cosine similarity of the latent features. | U | |
| IV. Machine learning: methods based on machine-learning techniques. | S | |
| 1. Conditional Generative Adversarial Network (cGAN): a generative adversarial network performing image-to-image translation, conditioned on either an embedding (cGAN1) or raw information (cGAN2) of the network topology. | U | |
| 2. Skip similarity Graph Neural Network (SkipGNN): receives neural messages from two-hop and immediate neighbors in the interaction network and transforms the messages non-linearly. | S | |
| 3. Subgraphs, Embedding and Attributes for Link prediction (SEAL)*: learns general graph-structure features from local enclosing subgraphs. | S | |
| V. Diffusion-based methods: methods based on the analysis of information diffusion over the network, e.g., random walks; this category includes methods that integrate techniques from the other categories. | | |
| 1. Average Commute Time (ACT): similarity is defined as the average number of steps a random walker needs to reach the destination node and return to the starting node. | U | |
| 2. Random Walks with Restart (RWR): similarity is defined as the probability that a restarting random walker starting from a source node reaches the target node. | U | |
| 3. Structural-Context Similarity (SimRank): measures structural-context similarity, capturing object-to-object relationships. | U | |
| 4. Deep Neural Network and Feature Representations for Nodes (DNN + node2vec): computes node and edge embeddings with node2vec, then feeds them into a deep neural network. | S | |
| 5. Random Watcher-Walker (RW2)*: integrates network construction, network representation learning, and classification. | U | |
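The local similarity indices in category I (CN, RA, PA, JC, AA) all score a candidate link from the neighborhoods of its two endpoints. A minimal sketch of their standard definitions, on a small hypothetical toy network (not the paper's benchmark data or code):

```python
import math

# Hypothetical toy undirected network as an adjacency dict of neighbor sets.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d"), ("d", "e")]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

def common_neighbors(i, j):
    # CN: number of neighbors shared by i and j.
    return len(adj[i] & adj[j])

def jaccard(i, j):
    # JC: shared neighbors divided by the union of both neighborhoods.
    union = adj[i] | adj[j]
    return len(adj[i] & adj[j]) / len(union) if union else 0.0

def resource_allocation(i, j):
    # RA: each common neighbor contributes 1 / its degree.
    return sum(1.0 / len(adj[z]) for z in adj[i] & adj[j])

def adamic_adar(i, j):
    # AA: each common neighbor contributes 1 / log(its degree).
    # Safe here: a common neighbor of i and j always has degree >= 2.
    return sum(1.0 / math.log(len(adj[z])) for z in adj[i] & adj[j])

def preferential_attachment(i, j):
    # PA: product of the two endpoint degrees.
    return len(adj[i]) * len(adj[j])
```

For the non-adjacent pair ("a", "d"), the common neighbors are {"b", "c"}, so CN = 2, RA = 1/3 + 1/3, and PA = 2 × 3; ranking all non-adjacent pairs by such a score is the basic unsupervised link-prediction recipe these rows describe.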
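The path-based scores in category I go beyond immediate neighborhoods: the Katz index (row I.6) sums over paths of every length with geometric damping, while L3 (row I.12) counts degree-normalized paths of length three. A sketch under the commonly used formulations, Katz as \((I - \beta A)^{-1} - I\) and L3 as \(\sum_{u,v} A_{iu} A_{uv} A_{vj} / \sqrt{k_u k_v}\), on a hypothetical toy adjacency matrix:

```python
import numpy as np

# Hypothetical symmetric adjacency matrix of a 5-node toy network
# (nodes 0..4; not the paper's benchmark data).
A = np.array([[0., 1., 1., 0., 0.],
              [1., 0., 1., 1., 0.],
              [1., 1., 0., 1., 0.],
              [0., 1., 1., 0., 1.],
              [0., 0., 0., 1., 0.]])

def katz_index(A, beta=0.05):
    # Katz: sum over l >= 1 of beta^l * (A^l)_ij = ((I - beta*A)^{-1} - I)_ij.
    # Converges only if beta < 1 / spectral radius of A (true here).
    n = A.shape[0]
    return np.linalg.inv(np.eye(n) - beta * A) - np.eye(n)

def l3_score(A):
    # L3: count of length-3 paths i-u-v-j, each weighted by 1/sqrt(k_u * k_v),
    # so high-degree intermediate nodes contribute less. Assumes no isolated nodes.
    d = np.diag(1.0 / np.sqrt(A.sum(axis=1)))
    return A @ d @ A @ d @ A

katz = katz_index(A)
l3 = l3_score(A)
```

On this graph the pair (0, 4) has no common neighbor, so CN-style indices score it zero, yet two length-3 paths (0-1-3-4 and 0-2-3-4) give it a positive L3 score — the motivation the L3 row alludes to for PPI networks, where interacting proteins often have complementary rather than overlapping neighborhoods.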
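The diffusion-based scores in category V rank candidate partners by random-walk statistics. A sketch of Random Walks with Restart (row V.2) under its usual power-iteration form, \(p \leftarrow (1-r)\,W p + r\,e_{\text{seed}}\) with a column-stochastic transition matrix \(W\), again on a hypothetical toy network:

```python
import numpy as np

# Hypothetical symmetric adjacency matrix of a 5-node toy network (nodes 0..4).
A = np.array([[0., 1., 1., 0., 0.],
              [1., 0., 1., 1., 0.],
              [1., 1., 0., 1., 0.],
              [0., 1., 1., 0., 1.],
              [0., 0., 0., 1., 0.]])

def rwr(A, seed, restart=0.3, tol=1e-10, max_iter=1000):
    # Iterate p <- (1 - r) * W @ p + r * e_seed until convergence.
    # W is the column-normalized adjacency matrix (assumes no isolated nodes);
    # the stationary p[j] is the probability the restarting walker sits at j.
    n = A.shape[0]
    W = A / A.sum(axis=0, keepdims=True)
    p = np.full(n, 1.0 / n)
    e = np.zeros(n)
    e[seed] = 1.0
    for _ in range(max_iter):
        p_next = (1.0 - restart) * (W @ p) + restart * e
        if np.abs(p_next - p).sum() < tol:
            break
        p = p_next
    return p

scores = rwr(A, seed=0)
```

Because mass is re-injected at the seed on every step, the stationary distribution concentrates around it and decays with network distance; nodes are then ranked by `scores` as candidate interaction partners of the seed protein.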