Table 1 Computational Methods tested in the INMC PPI prediction project

From: Assessment of community efforts to advance network-based prediction of protein–protein interactions

| ID | Synopsis | U/S | Ref |
|----|----------|-----|-----|
| I | Similarity-based: the existence probability of a link is measured as the prior knowledge-based similarity between the two nodes (i, j). |  |  |
| 1 | Common Neighbor (CN): the similarity of a link is computed as the number of common neighbors of the node pair. | U | 61 |
| 2 | Resource Allocation (RA): the similarity of a link is computed as the resource allocation index of the node pair. | U | 62 |
| 3 | Preferential Attachment (PA): the similarity of a link is computed as the degree product of the node pair. | U | 63 |
| 4 | Jaccard Index (JC): the similarity of a link is computed as the Jaccard index of the node pair. | U | 64 |
| 5 | Adamic-Adar (AA): the similarity of a link is defined as the Adamic-Adar index of the node pair. | U | 65 |
| 6 | Katz: the similarity of a link is defined as the Katz index. | U | 66 |
| 7 | Similarity (SIM): a similarity score integrating the L3 score and the Jaccard index. | U | 67 |
| 8 | Ensemble: integrates several similarity scores. | U |  |
| 9 | Maximum similarity, Preferential attachment Score (MPS(T)): integrates two scores of topological features. | U | 68 |
| 10 | Maximum similarity, Preferential attachment and sequence Score (MPS(B&T))*: integrates two scores of topological features and one score from the protein sequence. | U | 68 |
| 11 | Root Noise Model (RNM): an ensemble method that integrates a diagonal noise model, a spectral noise model and the L3 model. | U | 38 |
| 12 | L3: paths of length three capture similarity to existing interaction partners. | U | 38 |
| 13 | CRA: the similarity of a link is computed as the CAR-based resource allocation index. | U | 69 |
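To make the similarity-based indices in category I concrete, the snippet below is a minimal sketch (plain Python, standard library only) of how several of them can be computed directly from an adjacency list; the toy network stored in `adj` is an illustrative assumption, not part of the challenge data, and the L3 score is computed here as a degree-normalized count of length-3 paths.

```python
import math

# Toy undirected PPI network as an adjacency dict (illustrative only).
adj = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B", "E"},
    "D": {"A", "E"},
    "E": {"C", "D"},
}

def cn(i, j):
    """Common Neighbors: number of shared interaction partners."""
    return len(adj[i] & adj[j])

def jc(i, j):
    """Jaccard index: shared partners over the union of all partners."""
    union = adj[i] | adj[j]
    return len(adj[i] & adj[j]) / len(union) if union else 0.0

def ra(i, j):
    """Resource Allocation: shared partners weighted by 1/degree."""
    return sum(1.0 / len(adj[z]) for z in adj[i] & adj[j])

def aa(i, j):
    """Adamic-Adar: shared partners weighted by 1/log(degree)."""
    return sum(1.0 / math.log(len(adj[z])) for z in adj[i] & adj[j] if len(adj[z]) > 1)

def pa(i, j):
    """Preferential Attachment: product of the two node degrees."""
    return len(adj[i]) * len(adj[j])

def l3(i, j):
    """Degree-normalized count of length-3 paths i - u - v - j."""
    score = 0.0
    for u in adj[i]:
        for v in adj[u]:
            if v != i and v != j and u != j and j in adj[v]:
                score += 1.0 / math.sqrt(len(adj[u]) * len(adj[v]))
    return score

# Score every unobserved (non-adjacent) pair with each index.
pairs = [(i, j) for i in adj for j in adj if i < j and j not in adj[i]]
for i, j in pairs:
    print(i, j, cn(i, j), jc(i, j), ra(i, j), aa(i, j), pa(i, j), round(l3(i, j), 3))
```

In the unsupervised methods of this category, such scores are computed for every unobserved protein pair and the pairs are then ranked by score.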

| ID | Synopsis | U/S | Ref |
|----|----------|-----|-----|
| II | Probabilistic methods: assume that real networks have some structure, e.g., community structure; the goal of these algorithms is to select the model parameters that maximize the likelihood of the observed structure. |  |  |
| 1 | Stochastic Block Model (SBM): assumes that nodes are partitioned into blocks and that the probability of a link between two nodes depends on the blocks they belong to. | U | 70 |
| 2 | Repulsive Graph Signal Processing (RepGSP): learns PPIs via graph signal processing; RepGSP rewards links between “repulsive nodes” (i.e., nodes belonging to different communities). | U | 71,72,73 |
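To illustrate the probabilistic idea behind the SBM, the sketch below estimates block-to-block connection probabilities from an observed toy network and uses them to score unobserved pairs. For brevity the block assignment is hard-coded, whereas an actual SBM fit would infer it together with the connection probabilities by maximizing the likelihood; the matrix `A` and the vector `blocks` are illustrative assumptions.

```python
import itertools
import numpy as np

# Observed adjacency matrix of a toy undirected network (illustrative only).
A = np.array([
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
])
# Hard-coded block assignment; a real SBM would infer this from the data.
blocks = np.array([0, 0, 0, 1, 1, 1])
n_blocks = int(blocks.max()) + 1

# Maximum-likelihood estimate of the block-to-block connection
# probabilities: observed edges divided by possible node pairs.
edges = np.zeros((n_blocks, n_blocks))
pairs = np.zeros((n_blocks, n_blocks))
for i, j in itertools.combinations(range(len(A)), 2):
    bi, bj = blocks[i], blocks[j]
    edges[bi, bj] += A[i, j]
    edges[bj, bi] += A[i, j]
    pairs[bi, bj] += 1
    pairs[bj, bi] += 1
theta = edges / pairs  # theta[r, s]: P(link | one node in block r, one in s)

# Score every unobserved pair by the probability implied by its blocks.
for i, j in itertools.combinations(range(len(A)), 2):
    if A[i, j] == 0:
        print(i, j, theta[blocks[i], blocks[j]])
```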

| ID | Synopsis | U/S | Ref |
|----|----------|-----|-----|
| III | Factorization-based methods: factorize the network adjacency matrix to embed the high-dimensional nodes of the graph into a lower-dimensional representation space while preserving node neighborhood structure. |  |  |
| 1 | Non-Negative Matrix Factorization (NNMF): dimension reduction using non-negative matrix factorization; the existence probability is defined as the cosine similarity of the latent features. | U | 74 |
| 2 | Geometric Laplacian Eigenmap Embedding (GLEE): dimension reduction using the geometric Laplacian eigenmap; the existence probability is defined as the cosine similarity of the latent features. | U | 75 |
| 3 | Spectral Clustering (SPC): dimension reduction using the symmetric normalized Laplacian matrix; the existence probability is defined as the cosine similarity of the latent features. | U | 76 |
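As a sketch of the factorization-based recipe (closest to the SPC entry above), the snippet below embeds the nodes with the eigenvectors of the smallest eigenvalues of the symmetric normalized Laplacian and scores candidate links by the cosine similarity of the latent features; the toy matrix `A` and the embedding dimension `k = 2` are illustrative assumptions.

```python
import numpy as np

# Toy symmetric adjacency matrix (illustrative only).
A = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)

# Symmetric normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}.
deg = A.sum(axis=1)
d_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
L = np.eye(len(A)) - d_inv_sqrt @ A @ d_inv_sqrt

# Embed each node using the eigenvectors of the k smallest eigenvalues.
k = 2
eigvals, eigvecs = np.linalg.eigh(L)   # eigenvalues in ascending order
X = eigvecs[:, :k]                     # rows are latent node features

def cosine(u, v):
    """Cosine similarity of two latent feature vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Existence probability of a candidate link ~ cosine similarity of features.
candidates = [(i, j) for i in range(len(A)) for j in range(i + 1, len(A)) if A[i, j] == 0]
print(sorted(candidates, key=lambda p: cosine(X[p[0]], X[p[1]]), reverse=True))
```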

| ID | Synopsis | U/S | Ref |
|----|----------|-----|-----|
| IV | Machine Learning: methods based on machine-learning techniques. | S |  |
| 1 | Conditional Generative Adversarial Network (cGAN): a generative adversarial network performing image-to-image translation conditioned on either an embedding (cGAN1) or raw information (cGAN2) of the network topology. | U | 57 |
| 2 | Skip similarity Graph Neural Network (SkipGNN): receives neural messages from two-hop and immediate neighbors in the interaction network and non-linearly transforms the messages. | S | 17 |
| 3 | Subgraphs, Embedding and Attributes for Link prediction (SEAL)*: learns general graph-structure features from local enclosing subgraphs. | S | 16 |
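The supervised entries in this category (SkipGNN, SEAL) rely on neural architectures that cannot be reproduced in a few lines, so the sketch below only illustrates the general supervised framing they share: features of labeled node pairs are fed to a classifier that learns to separate interacting from non-interacting pairs. The toy network, the hand-crafted topological feature set, and the logistic-regression classifier are illustrative assumptions, not the competitors' actual models.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy undirected network as an adjacency dict (illustrative only).
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1, 4}, 3: {0, 4}, 4: {2, 3, 5}, 5: {4}}

def pair_features(i, j):
    """Simple topological features for a node pair: common neighbors,
    Jaccard index and degree product (preferential attachment)."""
    common = len(adj[i] & adj[j])
    union = len(adj[i] | adj[j])
    return [common, common / union if union else 0.0, len(adj[i]) * len(adj[j])]

# Labeled training pairs: observed interactions (1) and non-interactions (0).
positives = [(0, 1), (0, 2), (0, 3), (1, 2), (2, 4), (3, 4), (4, 5)]
negatives = [(0, 4), (0, 5), (1, 3), (1, 5), (2, 3), (2, 5), (3, 5)]
X = np.array([pair_features(i, j) for i, j in positives + negatives])
y = np.array([1] * len(positives) + [0] * len(negatives))

# Fit the classifier on the labeled pairs and score a candidate pair
# that was left out of the training set.
clf = LogisticRegression().fit(X, y)
print(clf.predict_proba(np.array([pair_features(1, 4)]))[:, 1])
```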

| ID | Synopsis | U/S | Ref |
|----|----------|-----|-----|
| V | Diffusion-based methods: methods using techniques based on the analysis of information diffusion over the network, e.g., random walks; this category also includes methods that integrate techniques from the other categories. |  |  |
| 1 | Average Commute Time (ACT): similarity is defined as the average number of steps required by a random walker to reach the destination node and come back to the starting node. | U | 32 |
| 2 | Random Walks with Restart (RWR): similarity is defined as the probability that a random walker starting from one node reaches the target node. | U | 77 |
| 3 | Structural-Context Similarity (SimRank): measures structural-context similarity based on object-to-object relationships. | U | 78 |
| 4 | Deep Neural Network and Feature Representations for Nodes (DNN + node2vec): computes node and edge embeddings with node2vec, then feeds them into a deep neural network. | S | 58,79,80 |
| 5 | Random Watcher-Walker (RW2)*: integrates network construction, network representation learning and classification. | U | 81 |

1. The U/S column takes the value U for unsupervised methods and S for supervised or semi-supervised methods. The * symbol indicates level-2 methods, which make use of node or node-pair attributes.
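Finally, to illustrate the diffusion-based category, here is a minimal sketch of the random-walk-with-restart score (entry 2 of category V): the stationary visiting probabilities of a walker that at each step either follows a random link or restarts at the seed node are used to score the candidate partners of that seed. The restart probability of 0.3 and the toy matrix `A` are illustrative assumptions.

```python
import numpy as np

# Toy symmetric adjacency matrix (illustrative only).
A = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)
W = A / A.sum(axis=0)  # column-normalized transition matrix

def rwr(seed, restart=0.3, tol=1e-10):
    """Random walk with restart: iterate p <- (1 - c) W p + c e
    until convergence, where e is the indicator vector of the seed node."""
    e = np.zeros(len(A))
    e[seed] = 1.0
    p = e.copy()
    while True:
        p_next = (1 - restart) * (W @ p) + restart * e
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next

# Score candidate partners of node 0 by their RWR visiting probability.
p = rwr(seed=0)
for j in range(len(A)):
    if j != 0 and A[0, j] == 0:
        print(j, p[j])
```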