Abstract
Ontologies have proven very useful for capturing knowledge as a hierarchy of terms and their interrelationships. In biology a major challenge has been to construct ontologies of gene function given incomplete biological knowledge and inconsistencies in how this knowledge is manually curated. Here we show that large networks of gene and protein interactions in Saccharomyces cerevisiae can be used to infer an ontology whose coverage and power are equivalent to those of the manually curated Gene Ontology (GO). The network-extracted ontology (NeXO) contains 4,123 biological terms and 5,766 term-term relations, capturing 58% of known cellular components. We also explore robust NeXO terms and term relations that were initially not cataloged in GO, a number of which have now been added based on our analysis. Using quantitative genetic interaction profiling and chemogenomics, we find further support for many of the uncharacterized terms identified by NeXO, including multisubunit structures related to protein trafficking or mitochondrial function. This work enables a shift from using ontologies to evaluate data to using data to construct and evaluate ontologies.
Acknowledgements
We are grateful to J. Bader and Y. Park for assistance with the HAC-ML software and to I. Lee and E. Marcotte for advice on YeastNet. We would also like to thank D. Botstein, C. Boone, R. Sharan and R. Shamir for constructive advice on the manuscript. This work was generously supported by US National Institutes of Health grants P41-GM103504 and P50-GM085764 (T.I.) and R01-GM084448 and P50-GM081879 (N.J.K.). N.J.K. is a Searle Scholar and Keck Young Investigator.
Author information
Contributions
J.D. and T.I. conceived and designed the analysis. J.D. performed initial data analysis, constructed the NeXO ontology and performed all computational experiments. M.K. designed and implemented the ontology alignment procedure with guidance from J.D. M.S. and N.J.K. performed the quantitative genetic interaction profiling and interpreted the data. R.B. and J.M.C. investigated and curated the new ontology terms and relations. J.D. and T.I. wrote the manuscript. All authors contributed to the manuscript and approved its final version.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Text and Figures
Supplementary Table 1, Supplementary Note and Supplementary Figs. 1–10 (PDF 3024 kb)
Supplementary Table 2
New terms in NeXO (XLSX 189 kb)
Supplementary Table 3
New term-term relations (XLSX 15 kb)
Supplementary Table 4
Genetic interaction profiles for 73 query genes (XLSX 531 kb)
Supplementary Table 5
New NeXO terms enriched within chemogenomic profiles (XLSX 53 kb)
Supplementary Table 6
Integrated high-confidence interaction network (XLSX 1745 kb)
Supplementary File 1
NeXO in Cytoscape Format (File NeXO.cys provided separately) Direct download: http://chianti.ucsd.edu/~janusz/nexo/NeXO.zip (ZIP 7119 kb)
Supplementary File 2
NeXO in Open Biomedical Ontology (OBO) Format (File NeXO.obo provided separately) (ZIP 422 kb)
About this article
Cite this article
Dutkowski, J., Kramer, M., Surma, M. et al. A gene ontology inferred from molecular networks. Nat Biotechnol 31, 38–45 (2013). https://doi.org/10.1038/nbt.2463
DOI: https://doi.org/10.1038/nbt.2463