Table 1 Module detection methods evaluated in this study
From: A comprehensive evaluation of module detection methods for gene expression data
Clustering: grouping genes based on a global similarity in gene expression profiles (a minimal k-means sketch follows the table)
A | FLAME: fuzzy clustering by selecting cluster supporting objects based on the K-nearest neighbor density estimation |
B | K-medoids: iteratively refines the centers (which are individual genes) and the average dissimilarity within the cluster |
C | K-medoids (see B) but with automatic module number estimation |
D | Fuzzy c-means: similar to K-means (see F), but using fuzzy instead of crisp cluster memberships
E | Self-organizing maps: maps each gene on a node embedded in a two-dimensional graph structure |
F | K-means: iteratively refines the mean expression within a cluster and the within-cluster sum of squares
G | MCL: simulates random walks within the co-expression graph by alternating steps of expansion and inflation
H | Spectral clustering: applies K-means in the subspace defined by the eigenvectors of the Pearson’s correlation affinity matrix |
I | Affinity propagation: clustering by exchange of messages between genes |
J | Spectral clustering: applies K-means in the subspace defined by the eigenvectors of the K-nearest-neighbor graph |
K | Transitivity clustering: tries to find the transitive co-expression graph in which the total cost of added and removed edges is minimized |
L | WGCNA: agglomerative hierarchical clustering (see M), but using the topological overlap measure and a dynamic tree cutting algorithm to implicitly determine the number of modules |
M | Agglomerative hierarchical clustering: generates a hierarchical structure by progressively grouping genes and clusters based on their similarity |
N | Hybrid hierarchical clustering: combination of agglomerative and divisive hierarchical clustering |
O | Divisive hierarchical clustering: generates a hierarchical structure by progressively splitting the genes into clusters |
P | Agglomerative hierarchical clustering (see M), but with automatic module number estimation |
Q | SOTA: combination of self-organizing maps and divisive hierarchical clustering |
R | First finds cluster centers by searching for high-density regions; each gene is then assigned to the cluster of its nearest neighbor of higher density
S | CLICK: uses density estimation to find tight groups of similar genes, after which these are expanded into modules |
T | DBSCAN: divides genes into core, non-core and outlier genes based on the number of neighbors
U | Clues: first applies a shrinking procedure which moves each gene towards nearby high-density regions, after which the genes are partitioned into an automatically determined number of clusters using the silhouette width |
V | Mean shift: moves each gene towards nearby high density regions until convergence |
Decomposition: extracting the components corresponding to co-expression modules by decomposing the expression matrix into a product of smaller matrices (an ICA-based sketch follows the table)
A | Independent component analysis: decomposes the expression matrix into a set of independent components using the FastICA algorithm, detects potentially overlapping modules within each source signal using false-discovery rate (FDR) estimation |
B | Similar to A, but detects two modules per independent component depending on whether genes have positive or negative weights |
C | Similar to A, but detects modules within each source signal using z-scores |
D | Combination of principal component analysis and independent component analysis, uses FDR estimation to find modules |
E | Principal component analysis: decomposes the expression matrix into a set of linearly uncorrelated components, detects potentially overlapping modules within each component using FDR estimation |
Biclustering: simultaneous grouping of genes and samples into biclusters based on similar local behavior in expression (a spectral biclustering sketch follows the table)
A | Spectral biclustering: detecting checkerboard patterns within the gene expression matrix |
B | ISA: iteratively refines a set of genes and samples based on high or low expression in both the gene and sample dimension |
C | QUBIC: finds biclusters in which the genes have similar high or low expression levels in a discretized expression matrix |
D | Bi-Force: finds biclusters with over- or under-expression by solving the bicluster editing problem |
E | FABIA: builds a multiplicative model of the expression matrix layer by layer. Every layer represents a bicluster |
F | Plaid: builds an additive model of the expression matrix layer by layer. Every layer represents a bicluster |
G | MSBE: finds additive biclusters starting from randomly sampled reference genes and conditions |
H | Cheng & church: minimizes the mean squared residue within every bicluster |
I | OPSM: searches for biclusters where the expression changes in the same direction between genes and samples |
Iterative network inference: iterative optimization of an inferred network and a set of clusters
A | MERLIN: iteratively refines a direct regulatory network and modules within a probabilistic graphical network framework |
B | Genomica: starts from an initial hierarchical clustering and iteratively refines this clustering and an inferred module network using a model based on Bayesian regression trees |
Direct network inference: inference of a regulatory network based on gene expression similarity between regulators and target genes (a GENIE3-style sketch follows the table)
A | GENIE3: predicts the expression of each target gene based on random forest regression |
B | CLR: calculates the likelihood of mutual information estimations based on the network neighborhood |
C | Pearson’s correlation between regulator and target gene |
D | TIGRESS: network inference using a combination of Lasso sparse regression and stability selection |
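
To make the clustering family concrete, here is a minimal sketch of the K-means approach (method F of that group) applied to a genes × samples expression matrix. The toy data, the choice of ten modules, and the per-gene standardisation are illustrative assumptions, not settings from the study.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
expression = rng.normal(size=(500, 40))        # toy matrix: 500 genes x 40 samples

# Standardise each gene so clustering reflects the shape of its expression profile.
profiles = StandardScaler().fit_transform(expression.T).T

# Iteratively refine the cluster means, minimising the within-cluster sum of squares.
kmeans = KMeans(n_clusters=10, n_init=10, random_state=0).fit(profiles)
modules = {k: np.flatnonzero(kmeans.labels_ == k) for k in range(10)}
print({k: len(genes) for k, genes in modules.items()})   # module sizes
```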
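
For the decomposition family, a hedged sketch of ICA-based module detection in the spirit of methods A–C of that group: run FastICA and assign to a module every gene whose loading on a component has an absolute z-score above a cutoff. The component count (20) and the cutoff (3) are illustrative assumptions rather than the study's parameters.

```python
import numpy as np
from scipy.stats import zscore
from sklearn.decomposition import FastICA

rng = np.random.default_rng(1)
expression = rng.normal(size=(500, 40))        # toy matrix: 500 genes x 40 samples

# Decompose the expression matrix; rows of `sources` are per-gene component loadings.
ica = FastICA(n_components=20, max_iter=1000, random_state=0)
sources = ica.fit_transform(expression)        # shape: 500 genes x 20 components

# One (potentially overlapping) module per component: genes with extreme loadings.
modules = []
for comp in range(sources.shape[1]):
    z = zscore(sources[:, comp])
    members = np.flatnonzero(np.abs(z) > 3)
    if members.size:
        modules.append(members)
print(len(modules), "candidate modules")
```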
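
For the biclustering family, a minimal sketch of spectral biclustering (method A of that group), which looks for a checkerboard structure in the expression matrix. The synthetic checkerboard data and the 4 × 3 cluster grid are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import SpectralBiclustering
from sklearn.datasets import make_checkerboard

# Synthetic expression-like matrix with a planted checkerboard structure.
data, rows, cols = make_checkerboard(shape=(300, 40), n_clusters=(4, 3),
                                     noise=5, random_state=0)

model = SpectralBiclustering(n_clusters=(4, 3), method="log", random_state=0)
model.fit(data)

# Rows (genes) and columns (samples) belonging to the first detected bicluster.
gene_idx, sample_idx = model.get_indices(0)
print(f"bicluster 0: {len(gene_idx)} genes x {len(sample_idx)} samples")
```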
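
Finally, for direct network inference, a sketch of the GENIE3-style idea (method A of that group): predict each target gene from candidate regulators with a random forest and rank regulator–target edges by feature importance. The toy data, the assumption that the first 20 genes are regulators, and the forest size are illustrative choices, not the original implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
expression = rng.normal(size=(200, 60))        # toy matrix: 200 genes x 60 samples
regulators = np.arange(20)                     # assume the first 20 genes are regulators

edges = []
for target in range(expression.shape[0]):
    inputs = [r for r in regulators if r != target]
    X = expression[inputs, :].T                # samples x candidate regulators
    y = expression[target, :]
    forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)
    edges += [(reg, target, imp) for reg, imp in zip(inputs, forest.feature_importances_)]

# Rank putative regulator -> target edges by importance, as GENIE3 does.
edges.sort(key=lambda e: e[2], reverse=True)
print(edges[:5])
```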