Table 1 Topology- and statistics-based methods for module validation.
From: Quantitative assessment of gene expression network module-validation methods
| No. | Type | Index | Equation | Criteria | Application | Test data | Ref. |
|---|---|---|---|---|---|---|---|
| Topological validation | | | | | | | |
| 1 | Integrated index | Zsummary | | ≥10, strongly preserved; 2–10, moderately preserved; ≤2, no preservation | Composite preservation statistic used to validate whether a module is significantly preserved in another network. Applies to correlation networks (e.g., co-expression networks). | yes | |
| 2 | | ZsummaryADJ | | ≥10, strongly preserved; 2–10, moderately preserved; ≤2, no preservation | Same as above. Applies to general networks (e.g., adjacency matrix networks). | yes | |
| 3 | | medianRank | | The lower the better | Same as above. | yes | |
| 4 | Single index | Entropy | | The smaller the better | Assesses the quality of identified modules. A good-quality module is expected to have low entropy. | no | |
| 5 | | Mpres | Mpres = cor(kl, km) | The closer to 1, the better | Describes the preservation of intra-modular connectivity across two networks. A p-value can be assigned to evaluate the reproducibility of modules. | yes | |
| 6 | | NB value | | NB ≥ 0.5 | The ratio of edges within a module to the total number of edges between modules, used to select modules with high intra-modular connectivity. | no | |
| 7 | | CS (S) | | CS (S) > 0, the higher the better | Describes the compactness and neighboring conditions of a cluster. Used to select good clusters from integrated clustering results. | no | |
| 8 | | LS (S) | | The higher the better | Judges the quality of a cluster S in a graph G and helps to select good clusters from integrated clustering results. | no | |
| 9 | | Modularity | | 0.3 ≤ Q ≤ 0.7 | Evaluates the level of modular structure and the best split of a network into modules. | no | |
| Statistical validation | | | | | | | |
| 1 | Integer linear programming | C · (X1, X2, …, Xk) | | C ≤ 0, the smaller the better | A classifier and integer linear programming model that selects modules based on the activity of the module in case and control samples. | yes | |
| 2 | Bootstrap resampling | P-value | NULL | P ≤ 0.05 | The p-value is derived from multiscale bootstrap resampling to assess the uncertainty of clustering analysis and to search for significant modules. | no | |
| 3 | | Consensus score | | ≥ρ, the higher the better | A jackknife resampling procedure is used to assess the accuracy and robustness of functional modules, resulting in an ensemble of optimal modules. | no | |
| 4 | Permutation test | Combinatorial p-value | NULL | Combinatorial criteria: (1) P(Zm) < 0.05; (2) PGL, PnSNPs, Ptopo < 0.05; (3) Pemp < 0.05. Additional criteria: P(Zm(eval)) and/or Pemp(eval) < 0.05 | Significance and permutation tests are used to calculate the p-value of module scores. Appropriate for GWAS data; multiple GWAS datasets are needed when using the additional criteria. | yes | |
| 5 | | coClustering (q) | | ≥95% | A cross-tabulation-based statistic for determining whether modules in the reference dataset are preserved in a test dataset; a permutation test determines the p-value. | yes | |
| 6 | Modular compatibility | Compatibility Score (Cp) | | The closer to 1, the better | Indicates the agreement or overlap between two sets of modules, measuring the modular compatibility between two networks. | yes | |
| 7 | | Matching p-value | NULL | P < 0.05 | Modified hypergeometric test-derived p-values with Bonferroni correction, used to measure the conservation of modules between any two species or networks. | yes | |
| 8 | | IGP | | The closer to 1, the better | Defined to validate an individual cluster's reproducibility and prediction accuracy. | yes | |
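
The Zsummary and ZsummaryADJ criteria listed in the table amount to a three-way threshold rule, which could be sketched as follows. The function name `classify_zsummary` is hypothetical, and because the table assigns the value 2 to both the "moderately preserved" and "no preservation" ranges, this sketch assumes that a score of exactly 2 counts as no preservation.

```python
def classify_zsummary(z):
    """Map a Zsummary (or ZsummaryADJ) score to the table's preservation label.

    Assumption: a score of exactly 2 is treated as "no preservation",
    since the table lists 2 in both the moderate and the "no" range.
    """
    if z >= 10:
        return "strongly preserved"
    if z > 2:
        return "moderately preserved"
    return "no preservation"


# Example: label a few illustrative (made-up) module scores.
for score in (14.2, 6.0, 1.3):
    print(score, classify_zsummary(score))
```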

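The Mpres entry, Mpres = cor(kl, km), correlates the intra-modular connectivity of the same module genes in two networks. A minimal sketch is given below; it assumes that connectivity is the sum of a gene's edge weights to the other genes in the module and that "cor" is the Pearson correlation, neither of which the table states explicitly. The helper names are hypothetical.

```python
from math import sqrt


def intramodular_connectivity(adj):
    """Sum each node's edge weights to the other nodes of the module.

    `adj` is a square adjacency matrix (list of lists) restricted to the
    module's genes; the diagonal is ignored.
    """
    n = len(adj)
    return [sum(adj[i][j] for j in range(n) if j != i) for i in range(n)]


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def mpres(adj_ref, adj_test):
    """Correlate intra-modular connectivity between two networks (assumed Pearson)."""
    return pearson(intramodular_connectivity(adj_ref),
                   intramodular_connectivity(adj_test))
```

Both matrices must list the module's genes in the same order; values close to 1 indicate that the hub structure of the module is similar in the two networks.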

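The Modularity row gives only the criterion 0.3 ≤ Q ≤ 0.7. As an illustration, the sketch below computes Q using the standard Newman-Girvan definition for an undirected network, Q = (1/2m) Σ_ij [A_ij − k_i k_j / 2m] δ(c_i, c_j); that this is the exact form intended by the table is an assumption, and the function name is hypothetical.

```python
def modularity(adj, labels):
    """Newman-Girvan modularity Q of a partition of an undirected network.

    `adj` is a symmetric adjacency matrix (list of lists) and `labels[i]`
    is the module assigned to node i. Assumes this standard definition is
    the one the table's criterion refers to.
    """
    n = len(adj)
    degree = [sum(row) for row in adj]
    two_m = sum(degree)  # twice the total edge weight
    q = 0.0
    for i in range(n):
        for j in range(n):
            if labels[i] == labels[j]:
                # observed weight minus the weight expected by chance
                q += adj[i][j] - degree[i] * degree[j] / two_m
    return q / two_m
```

For two disconnected triangles assigned to separate modules this returns 0.5, which falls inside the 0.3 to 0.7 range given in the table.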

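The matching p-value entry describes hypergeometric-test p-values with Bonferroni correction for module overlap between two networks. The sketch below uses the plain hypergeometric upper tail rather than the modified test the table refers to, whose details are not given here; the function and parameter names are hypothetical.

```python
from math import comb


def overlap_pvalue(universe, size_a, size_b, overlap):
    """Upper-tail hypergeometric p-value for the overlap of two modules.

    Probability of seeing at least `overlap` shared genes when a module of
    `size_b` genes is drawn at random from `universe` genes, of which
    `size_a` belong to the reference module. This is the standard test,
    not the modified variant mentioned in the table.
    """
    total = comb(universe, size_b)
    tail = sum(
        comb(size_a, x) * comb(universe - size_a, size_b - x)
        for x in range(overlap, min(size_a, size_b) + 1)
    )
    return tail / total


def bonferroni(pvalues):
    """Bonferroni-adjust p-values by the number of tests, capped at 1."""
    n = len(pvalues)
    return [min(1.0, p * n) for p in pvalues]
```

A module pair would then be called conserved when its adjusted p-value falls below 0.05, the criterion listed in the table.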



