Table 1 Module detection methods evaluated in this study

From: A comprehensive evaluation of module detection methods for gene expression data

 

Clustering: grouping genes based on a global similarity in gene expression profiles

A. FLAME: fuzzy clustering by selecting cluster-supporting objects based on K-nearest-neighbor density estimation
B. K-medoids: iteratively refines the centers (which are individual genes) and the average dissimilarity within the cluster
C. K-medoids (see B), but with automatic module number estimation
D. Fuzzy c-means: similar to K-means (see F), but using fuzzy instead of crisp cluster memberships
E. Self-organizing maps: maps each gene onto a node embedded in a two-dimensional graph structure
F. K-means: iteratively refines the mean expression within a cluster and the within-cluster sum of squares
G. MCL: simulates random walks within the co-expression graph by alternating steps of expansion and inflation
H. Spectral clustering: applies K-means in the subspace defined by the eigenvectors of the Pearson’s correlation affinity matrix
I. Affinity propagation: clustering by exchange of messages between genes
J. Spectral clustering: applies K-means in the subspace defined by the eigenvectors of the K-nearest-neighbor graph
K. Transitivity clustering: tries to find the transitive co-expression graph in which the total cost of added and removed edges is minimized
L. WGCNA: agglomerative hierarchical clustering (see M), but using the topological overlap measure and a dynamic tree cutting algorithm to implicitly determine the number of modules
M. Agglomerative hierarchical clustering: generates a hierarchical structure by progressively grouping genes and clusters based on their similarity
N. Hybrid hierarchical clustering: combination of agglomerative and divisive hierarchical clustering
O. Divisive hierarchical clustering: generates a hierarchical structure by progressively splitting the genes into clusters
P. Agglomerative hierarchical clustering (see M), but with automatic module number estimation
Q. SOTA: combination of self-organizing maps and divisive hierarchical clustering
R. First finds cluster centers by searching for high-density regions; each gene is then assigned to the cluster of its nearest neighbor of higher density
S. CLICK: uses density estimation to find tight groups of similar genes, after which these are expanded into modules
T. DBSCAN: groups genes into core, non-core and outlier genes based on the number of neighbors
U. Clues: first applies a shrinking procedure that moves each gene towards nearby high-density regions, after which the genes are partitioned into an automatically determined number of clusters using the silhouette width
V. Mean shift: moves each gene towards nearby high-density regions until convergence

Decomposition: extracting the components corresponding to co-expression modules by decomposing the expression matrix into a product of smaller matrices

A. Independent component analysis: decomposes the expression matrix into a set of independent components using the FastICA algorithm; detects potentially overlapping modules within each source signal using false-discovery rate (FDR) estimation
B. Similar to A, but detects two modules per independent component depending on whether genes have positive or negative weights
C. Similar to A, but detects modules within each source signal using z-scores
D. Combination of principal component analysis and independent component analysis; uses FDR estimation to find modules
E. Principal component analysis: decomposes the expression matrix into a set of linearly uncorrelated components; detects potentially overlapping modules within each component using FDR estimation

Biclustering: simultaneous grouping of genes and samples into biclusters based on similar local behavior in expression

A. Spectral biclustering: detects checkerboard patterns within the gene expression matrix
B. ISA: iteratively refines a set of genes and samples based on high or low expression in both the gene and sample dimension
C. QUBIC: finds biclusters in which the genes have similar high or low expression levels in a discretized expression matrix
D. Bi-Force: finds biclusters with over- or under-expression by solving the bicluster editing problem
E. FABIA: builds a multiplicative model of the expression matrix layer by layer; every layer represents a bicluster
F. Plaid: builds an additive model of the expression matrix layer by layer; every layer represents a bicluster
G. MSBE: finds additive biclusters starting from randomly sampled reference genes and conditions
H. Cheng & Church: minimizes the mean squared residue within every bicluster
I. OPSM: searches for biclusters in which the expression changes in the same direction between genes and samples

Iterative network inference: iterative optimization of an inferred network and a set of clusters

A. MERLIN: iteratively refines a direct regulatory network and modules within a probabilistic graphical model framework
B. Genomica: starts from an initial hierarchical clustering and iteratively refines this clustering and an inferred module network using a model based on Bayesian regression trees

Direct network inference: inference of a regulatory network based on gene expression similarity between regulators and target genes

A. GENIE3: predicts the expression of each target gene based on random forest regression
B. CLR: calculates the likelihood of mutual information estimations based on the network neighborhood
C. Pearson’s correlation between regulator and target gene
D. TIGRESS: network inference using a combination of Lasso sparse regression and stability selection

1. Within each category, methods are ranked according to their average test score (Fig. 2). We refer the reader to Supplementary Note 2 for details regarding the implementation and parameters.
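
The one-line descriptions above are necessarily terse, so the sketches below illustrate the general flavor of each category on toy data. They are minimal Python illustrations under clearly stated assumptions, not the implementations benchmarked in this study (those are detailed in Supplementary Note 2). For the clustering category, the simplest representative is K-means (method F): alternate between assigning each gene to its nearest cluster mean and recomputing those means. The matrix dimensions and module number below are arbitrary.

```python
# Toy K-means clustering of a genes-by-samples matrix (clustering, method F).
# The matrix dimensions and the module number are illustrative assumptions.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
expression = rng.normal(size=(200, 30))            # 200 genes x 30 samples (toy)

kmeans = KMeans(n_clusters=8, n_init=10, random_state=0)
labels = kmeans.fit_predict(expression)            # one module label per gene

modules = {m: np.flatnonzero(labels == m) for m in range(8)}
print({m: len(genes) for m, genes in modules.items()})
```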
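
For the decomposition category, the shared pattern is to factorize the expression matrix and call a module from the genes with extreme weights on each component. The sketch below uses PCA via SVD with a plain z-score cutoff on gene weights; the benchmarked variants use ICA and/or FDR estimation (methods A, C, D, E), so this is only a simplified stand-in, and both the number of components and the cutoff are arbitrary choices.

```python
# Toy PCA-based module detection (decomposition category). A z-score cutoff on
# gene weights replaces the FDR estimation used by the benchmarked methods.
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 30))                     # genes x samples (toy)
Xc = X - X.mean(axis=1, keepdims=True)             # center each gene

# SVD-based PCA: columns of U hold the gene weights for each component
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

modules = []
for comp in range(5):                              # inspect the first 5 components
    w = U[:, comp]
    z = (w - w.mean()) / w.std()
    modules.append(np.flatnonzero(np.abs(z) > 2.5))  # genes with extreme weights

print([len(m) for m in modules])
```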
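
For the biclustering category, the score minimized by Cheng & Church (method H) is compact enough to write out: the mean squared residue of a submatrix, which is zero for a perfectly additive genes-by-samples pattern. Only the scoring step is sketched here; the full algorithm greedily removes and adds rows and columns to push this score below a threshold.

```python
# Mean squared residue of a candidate bicluster (biclustering, method H).
# The random submatrix below is only a placeholder for a candidate bicluster.
import numpy as np

def mean_squared_residue(sub):
    """MSR of a genes-by-samples submatrix; 0 means a perfectly additive pattern."""
    row_means = sub.mean(axis=1, keepdims=True)
    col_means = sub.mean(axis=0, keepdims=True)
    residue = sub - row_means - col_means + sub.mean()
    return float((residue ** 2).mean())

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 30))                     # toy expression matrix
genes = rng.choice(200, size=20, replace=False)
samples = rng.choice(30, size=8, replace=False)
print(mean_squared_residue(X[np.ix_(genes, samples)]))
```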
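
For the iterative network inference category, the defining idea is the alternation itself: refine modules given regulatory programs, then refine the programs given the modules. The sketch below alternates module assignment with a per-module ridge regression on a fixed set of candidate regulators. It mimics MERLIN and Genomica only in spirit; both work within probabilistic model frameworks, and the regulator set and ridge penalty here are arbitrary assumptions.

```python
# Loose sketch of alternating module assignment and per-module regulatory
# programs (iterative network inference category). Not MERLIN or Genomica;
# a ridge regression stands in for their probabilistic models.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Ridge

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 30))                     # genes x samples (toy)
regulators, targets = X[:10], X[10:]               # pretend the first 10 genes are TFs
n_modules = 5

labels = KMeans(n_clusters=n_modules, n_init=10, random_state=0).fit_predict(targets)
for _ in range(5):                                 # a few alternation rounds
    programs = []
    for m in range(n_modules):                     # one regulatory program per module
        members = targets[labels == m]
        programs.append(None if len(members) == 0
                        else Ridge(alpha=1.0).fit(regulators.T, members.mean(axis=0)))
    # re-assign each gene to the module whose program predicts it best
    errors = np.stack([np.full(len(targets), np.inf) if p is None
                       else ((targets - p.predict(regulators.T)) ** 2).mean(axis=1)
                       for p in programs])
    labels = errors.argmin(axis=0)

print(np.bincount(labels, minlength=n_modules))    # final module sizes
```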
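
For the direct network inference category, the simplest entry is method C: score every regulator–target pair by the Pearson correlation of their expression profiles and rank candidate edges by that score. Which genes count as regulators is an assumption of the sketch.

```python
# Toy correlation-based network inference (direct network inference, method C).
# The first 10 genes are arbitrarily treated as regulators.
import numpy as np

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 30))                     # genes x samples (toy)
regulators = np.arange(10)

corr = np.corrcoef(X)                              # gene-by-gene Pearson correlation
scores = np.abs(corr[regulators])                  # regulators x all genes
scores[np.arange(len(regulators)), regulators] = 0.0   # drop self-edges

# report the five highest-scoring regulator -> target edges
flat = np.argsort(scores, axis=None)[::-1][:5]
for r, t in zip(*np.unravel_index(flat, scores.shape)):
    print(f"gene {regulators[r]} -> gene {t}: {scores[r, t]:.2f}")
```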