Abstract
Complex diseases such as hypertension are inherently multifactorial and involve many factors with mild-to-minute effect sizes. A genome-wide association study (GWAS) typically tests hundreds of thousands of single-nucleotide polymorphisms (SNPs) and offers an opportunity to evaluate the aggregated effects of many genetic variants whose individual effects are too small to detect. Gene-set-enrichment analysis (GSEA) is a pathway-based approach that tests for such aggregated effects among genes linked by biological function. A key step in GSEA is the summary statistic (gene score) used to measure the overall relevance of a gene based on all SNPs tested in the gene. Existing GSEA methods use maximum statistics, which are sensitive to gene size and linkage disequilibrium. We propose variable set enrichment analysis (VSEA) and study new gene score methods that are less dependent on gene size. The new approach treats groups of variables (SNPs or other variants) as base units for summarizing gene scores and relies less on the gene definition itself. The power of VSEA is analyzed in simulation studies modeling various scenarios of complex multilocus interactions. The results show that the new gene scores generally performed better, some substantially so, than the existing GSEA extension to GWAS. The new methods are implemented in an R package and, when applied to a real GWAS data set, demonstrated their practical utility.
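The sensitivity of a maximum-statistic gene score to gene size can be illustrated with a small simulation. The sketch below is not the authors' implementation or their R package. The function names are hypothetical, each SNP statistic is assumed to be a 1-df chi-square value under the null, and SNPs are assumed independent, which ignores linkage disequilibrium between SNPs. Under these assumptions, the average null score of a gene rises with the number of SNPs tested in it, so larger genes look more relevant even without any true association.

```python
import random
from statistics import mean


def max_gene_score(snp_stats):
    """Gene score defined as the largest SNP-level statistic in the gene."""
    return max(snp_stats)


def null_snp_stats(n_snps, rng):
    """Simulate 1-df chi-square statistics for n_snps null SNPs.

    Assumption: SNPs are independent (no linkage disequilibrium).
    """
    return [rng.gauss(0.0, 1.0) ** 2 for _ in range(n_snps)]


def mean_null_max_score(n_snps, n_reps=2000, seed=0):
    """Average maximum-statistic gene score over simulated null genes."""
    rng = random.Random(seed)
    return mean(
        max_gene_score(null_snp_stats(n_snps, rng)) for _ in range(n_reps)
    )


if __name__ == "__main__":
    # Compare genes with 1, 10 and 100 SNPs, none truly associated.
    for size in (1, 10, 100):
        print(size, round(mean_null_max_score(size), 2))
```

Running the script prints the mean null score for each gene size. The upward trend is the size dependence that a gene score based on a maximum statistic would need to be corrected for.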
Acknowledgements
This research is supported in part by NIH grants HL091028 and HL071782, and by AHA grant 0855626G.
Links
KEGG: Kyoto Encyclopedia of Genes and Genomes: http://www.genome.jp/kegg/
BioCarta: http://www.biocarta.com
Gene Ontology: http://www.geneontology.org
Ethics declarations
Competing interests
The authors declare no conflict of interest.
Additional information
Supplementary Information accompanies the paper on the European Journal of Human Genetics website.
About this article
Cite this article
Yang, W., de las Fuentes, L., Dávila-Román, V. et al. Variable set enrichment analysis in genome-wide association studies. Eur J Hum Genet 19, 893–900 (2011). https://doi.org/10.1038/ejhg.2011.46
DOI: https://doi.org/10.1038/ejhg.2011.46