Abstract
Compared with single-gene functional analysis, gene set analysis (GSA) can extract more information from gene expression profiles. Several gene set methods have been proposed, but most of them cannot detect gene sets that contain a large number of minor-effect genes. Here, we propose a novel distance-based gene set analysis method. The expression-based distance between two phenotype groups, computed over the genes of a set, should be larger if that gene set is significantly associated with the given phenotype. We calculated the distance between two groups with different phenotypes, estimated the significance (P-values) using two permutation methods and performed multiple hypothesis testing adjustments. The method was applied to one simulated data set and three real data sets. After a comparison and literature verification, we determined that the gene resampling-based permutation method is more suitable for GSA, and that the centroid statistical and average linkage statistical distance methods are efficient, especially in detecting gene sets containing more minor-effect genes. We believe that this distance-based method will assist in finding functional gene sets that are significantly related to a complex trait. Additionally, we have prepared a simple and publicly available Perl and R package (http://bioinfo.hrbmu.edu.cn/dbgsa or http://cran.r-project.org/web/packages/DBGSA/).
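The workflow summarised above (a group distance, a gene resampling-based permutation P-value, then a multiple-testing adjustment) can be illustrated with a minimal Python sketch. It is not the DBGSA package and does not reproduce its API. All function names, the genes-by-samples matrix layout, the use of Euclidean distance, and the choice of Benjamini-Hochberg as the adjustment are assumptions made for illustration; the abstract does not specify these details.

```python
import numpy as np


def centroid_distance(a, b):
    """Euclidean distance between the mean expression profiles of two groups.

    a, b: arrays of shape (genes, samples), one per phenotype group (assumed layout).
    """
    return float(np.linalg.norm(a.mean(axis=1) - b.mean(axis=1)))


def average_linkage_distance(a, b):
    """Mean Euclidean distance over all cross-group sample pairs."""
    diffs = a[:, :, None] - b[:, None, :]  # shape: genes x n_a x n_b
    return float(np.sqrt((diffs ** 2).sum(axis=0)).mean())


def gene_resampling_pvalue(expr, labels, gene_idx, distance=centroid_distance,
                           n_perm=1000, seed=0):
    """Permutation P-value obtained by resampling genes (hypothetical helper).

    The observed distance for the gene set is compared with distances for
    random gene sets of the same size drawn from the whole expression matrix.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels, dtype=bool)
    gene_idx = np.asarray(gene_idx)
    case, ctrl = expr[:, labels], expr[:, ~labels]
    observed = distance(case[gene_idx], ctrl[gene_idx])
    hits = 0
    for _ in range(n_perm):
        random_idx = rng.choice(expr.shape[0], size=gene_idx.size, replace=False)
        if distance(case[random_idx], ctrl[random_idx]) >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values (an assumed stand-in adjustment)."""
    p = np.asarray(pvalues, dtype=float)
    order = np.argsort(p)
    ranked = p[order] * p.size / np.arange(1, p.size + 1)
    # Enforce monotonicity from the largest P-value downwards.
    ranked = np.minimum.accumulate(ranked[::-1])[::-1]
    adjusted = np.empty_like(p)
    adjusted[order] = np.minimum(ranked, 1.0)
    return adjusted
```

In use, `gene_resampling_pvalue` would be called once per gene set with either distance function, and the resulting P-values passed together to `benjamini_hochberg`. Because the centroid distance aggregates small per-gene shifts across the whole set, many modest differences can add up to a large set-level distance, which is the intuition behind detecting sets of minor-effect genes.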
Acknowledgements
This work was partially supported by the National Natural Science Foundation of China (grant nos. 30871394, 61073136, 91029717, 81172842, 60932008 and 61172098), the Specialized Research Fund for the Doctoral Program of Higher Education of China (grant no. 20112302110040), the Fundamental Research Funds for the Central Universities (grant no. HIT.ICRST.2010 022) and the Fund of Heilongjiang Health Department (grant nos. 2011-204 and 2011-251).
Supplementary Information accompanies the paper on the Journal of Human Genetics website.
Cite this article
Li, J., Wang, L., Xu, L. et al. DBGSA: a novel method of distance-based gene set analysis. J Hum Genet 57, 642–653 (2012). https://doi.org/10.1038/jhg.2012.86
DOI: https://doi.org/10.1038/jhg.2012.86