Table 1 Overview of datasets used in SpeakEasy community detection.

Dataset title	Network size (#nodes)	Biological scale	Data type	Cluster validation	Output	Conclusion
LFR benchmarks	1000–5000	NA	unweighted symmetric networks	known/synthetic clusters	benchmark clusters - comparable to other methods	Top recorded performance on LFR benchmarks to date
Various real networks	34–320000	NA	unweighted symmetric networks	modularity measures	cluster separation statistics - comparable to other methods	Predicted communities are well-separated
Human Brain Atlas (HBA); Cancer Cell Line Encyclopedia (CCLE)	8000–18000	gene	gene expression	Gene Ontology (GO)	co-regulated gene sets	Possible to robustly detect overlapping gene clusters
Gavin et al.; Collins et al.	700–1100	protein	AP-MS protein interactions	small-scale experiments	protein complexes and multi-community proteins	Most accurate recovery of true protein complexes to date
Immunological Genome Project (Immgen)	212	cell-type	cell type-specific gene expression	cell-surface markers	families of cell-types, at multiple resolutions	Canonical cell type classification is mirrored in cluster results
Spike-sorting	9900	cell activity	extracellular neuron recordings	known/synthetic clusters	spikes associated with specific neurons	SpeakEasy accurately associates spike waveforms with specific neurons
Parkinson disease rs-fMRI	264	tissue	brain resting state fMRI	permutation testing	groups of synchronized brain regions	SpeakEasy identifies disease-related changes to co-active brain regions

We test community detection across a range of biological datasets to characterize how robustly the method defines practically useful biological communities.
