Table 1 Overview of datasets used in SpeakEasy community detection.
Dataset title | Network size (#nodes) | Biological scale | Data type | Cluster validation | Output | Conclusion |
---|---|---|---|---|---|---|
LFR benchmarks | 1000–5000 | NA | unweighted symmetric networks | known/synthetic clusters | benchmark clusters - comparable to other methods | Top recorded performance on LFR benchmarks to date |
Various real networks | 34–320000 | NA | unweighted symmetric networks | modularity measures | cluster separation statistics - comparable to other methods | Predicted communities are well-separated |
Human Brain Atlas (HBA); Cancer Cell Line Encyclopedia (CCLE) | 8000–18000 | gene | gene expression | Gene Ontology (GO) | co-regulated gene sets | Possible to robustly detect overlapping gene clusters |
Gavin et al.; Collins et al. | 700–1100 | protein | AP-MS protein interactions | small-scale experiments | protein complexes and multi-community proteins | Most accurate recovery of true protein complexes to date |
Immunological Genome Project (Immgen) | 212 | cell-type | cell type-specific gene expression | cell-surface markers | families of cell-types, at multiple resolutions | Canonical cell type classification is mirrored in cluster results |
Spike-sorting | 9900 | cell activity | extracellular neuron recordings | known/synthetic clusters | spikes associated with specific neurons | SpeakEasy accurately associates spike waveforms with specific neurons |
Parkinson disease rs-fMRI | 264 | tissue | brain resting state fMRI | permutation testing | groups of synchronized brain regions | SpeakEasy identifies disease-related changes to co-active brain regions |