Abstract
The last decade of human genetic research witnessed the completion of hundreds of genome-wide association studies (GWASs). However, the genetic variants discovered through these efforts account for only a small proportion of the heritability of complex traits. One explanation for this missing heritability is that the common analysis approach of assessing the effect of each single-nucleotide polymorphism (SNP) individually is not well suited to detecting small effects of multiple SNPs. Gene set analysis (GSA) is one of several approaches that may contribute to the discovery of additional genetic risk factors for complex traits. Complex phenotypes are thought to be controlled by networks of interacting biochemical and physiological pathways influenced by the products of sets of genes. By assessing the overall evidence of association of a phenotype with all measured variation in a set of genes, GSA may identify functionally relevant sets of genes corresponding to relevant biomolecular pathways, which will enable more focused studies of genetic risk factors. This approach may thus contribute to the discovery of genetic variants responsible for some of the missing heritability. With the increased use of these approaches for the secondary analysis of GWAS data, it is important to understand the different GSA methods, their strengths and weaknesses, and the challenges inherent in these types of analyses. This paper provides an overview of GSA, highlighting the key challenges, potential solutions, and directions for ongoing research.
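To make the idea of set-level evidence concrete, the following minimal sketch combines per-SNP P-values from one gene set into a single P-value using Fisher's method. It is an illustration only, not the approach recommended in the paper: Fisher's method is just one of several ways to combine P-values, the function name and the P-values are hypothetical, and the calculation assumes the SNP tests are independent. SNPs in linkage disequilibrium violate that assumption, so a real analysis would typically calibrate the statistic by permutation.

```python
import math


def fisher_combined_p(p_values):
    """Combine independent P-values with Fisher's method.

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null hypothesis. For even degrees
    of freedom the upper tail has the closed form
    exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!.
    """
    if not p_values:
        raise ValueError("need at least one P-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("P-values must lie in (0, 1]")

    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # X / 2

    # Accumulate the finite series term by term to avoid large factorials.
    term = 1.0
    series = 1.0
    for i in range(1, k):
        term *= half_stat / i
        series += term
    return math.exp(-half_stat) * series


# Hypothetical single-SNP P-values for one gene set (made-up numbers).
gene_set_p_values = [0.04, 0.03, 0.06, 0.05]
combined = fisher_combined_p(gene_set_p_values)
print(f"smallest single-SNP P = {min(gene_set_p_values):.3g}")
print(f"combined gene-set P   = {combined:.3g}")
```

In this toy input no single SNP is close to a genome-wide significance threshold, yet the combined evidence for the set is stronger than that of any individual SNP, which is the situation GSA is designed to exploit.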
Acknowledgements
This research was supported by the Minnesota Partnership for Biotechnology and Medical Genomics grant, NCI grant CA136393 (Mayo Clinic SPORE in Ovarian Cancer), NCI grant CA140879, and NIAAA grant R03 AA019570. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
Ethics declarations
Competing interests
The authors declare no conflicts of interest.
About this article
Cite this article
Fridley, B., Biernacka, J. Gene set analysis of SNP data: benefits, challenges, and future directions. Eur J Hum Genet 19, 837–843 (2011). https://doi.org/10.1038/ejhg.2011.57
DOI: https://doi.org/10.1038/ejhg.2011.57