
  • Analysis

Systematic comparison of phenome-wide association study of electronic medical record data and genome-wide association study data

Abstract

Candidate gene and genome-wide association studies (GWAS) have identified genetic variants that modulate risk for human disease; many of these associations require further study to replicate the results. Here we report the first large-scale application of the phenome-wide association study (PheWAS) paradigm within electronic medical records (EMRs), an unbiased approach to replication and discovery that interrogates relationships between targeted genotypes and multiple phenotypes. We scanned for associations between 3,144 single-nucleotide polymorphisms (previously implicated by GWAS as mediators of human traits) and 1,358 EMR-derived phenotypes in 13,835 individuals of European ancestry. This PheWAS replicated 66% (51/77) of sufficiently powered prior GWAS associations and revealed 63 potentially pleiotropic associations with P < 4.6 × 10⁻⁶ (false discovery rate < 0.1); the strongest of these novel associations were replicated in an independent cohort (n = 7,406). These findings validate PheWAS as a tool to allow unbiased interrogation across multiple phenotypes in EMR-based cohorts and to enhance analysis of the genomic basis of human disease.
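The figures quoted in the abstract can be checked with a little arithmetic, and the false discovery rate cutoff can be illustrated with a short sketch. The Python below is illustrative only: it assumes a Benjamini-Hochberg step-up procedure (the abstract does not name the exact method), the function name `benjamini_hochberg_threshold` and the toy P values are hypothetical, and the test count is an upper bound that assumes every SNP was tested against every phenotype.

```python
from typing import List


def benjamini_hochberg_threshold(p_values: List[float], fdr: float = 0.1) -> float:
    """Return the largest P value passing the BH step-up rule (0.0 if none pass).

    Hypothetical helper for illustration; not taken from the article.
    """
    m = len(p_values)
    threshold = 0.0
    for rank, p in enumerate(sorted(p_values), start=1):
        # BH rule: the k-th smallest P value passes if p_(k) <= fdr * k / m.
        if p <= fdr * rank / m:
            threshold = p
    return threshold


# Numbers stated in the abstract.
n_snps = 3144
n_phenotypes = 1358

# Upper bound on SNP-phenotype pairs (assumes a full cross of SNPs and phenotypes).
n_tests = n_snps * n_phenotypes

# Replication rate among sufficiently powered prior GWAS associations.
replication_rate = 51 / 77

# Toy P values (invented) to show how an FDR cutoff becomes a P value threshold.
toy_p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.061, 0.074, 0.205, 0.212, 0.216]
toy_threshold = benjamini_hochberg_threshold(toy_p_values, fdr=0.1)

print(f"SNP-phenotype pairs (upper bound): {n_tests:,}")
print(f"Replication rate: {replication_rate:.1%}")
print(f"Toy BH threshold at FDR 0.1: {toy_threshold}")
```

Under a step-up rule like this, every association with a P value at or below the returned threshold is declared significant, which is how a cutoff such as FDR < 0.1 can correspond to a single P value threshold across millions of tests.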


Figure 1: PheWAS replication of NHGRI Catalog SNP-phenotype associations.
Figure 2: GWAS and PheWAS associations in the genome.
Figure 3: PheWAS plots for four SNPs.
Figure 4: Risk variants for skin phenotypes have different pleiotropy patterns.



Acknowledgements

This work was supported by the eMERGE Network, initiated and funded by the National Human Genome Research Institute (NHGRI), with additional funding from the National Institute of General Medical Sciences (NIGMS), through the following grants: U01-HG004610 and U01-HG006375 (Group Health Cooperative/University of Washington); U01-HG004608 (Marshfield Clinic); U01-HG004599 and U01-HG006379 (Mayo Clinic); U01-HG004609 and U01-HG006388 (Northwestern University); U01-HG006389 (Essentia Institute of Rural Health/Marshfield Clinic); U01-HG004603 and U01-HG006378 (Vanderbilt University); and U01-HG006385 (Vanderbilt University serving as the Coordinating Center). Funding support for eMERGE genotyping was provided by NHGRI through the grants: U01-HG004424 (The Broad Institute) and U01-HG004438 (Johns Hopkins University, Center for Inherited Disease Research). Replication genotypes were derived from a pharmacogenomics resource supported by NIGMS RC2-GM092318. Development of the PheWAS method is also supported by R01-LM010685 from the National Library of Medicine. BioVU received and continues to receive support through the National Center for Research Resources UL1 RR024975, which is now the National Center for Advancing Translational Sciences, 2 UL1 TR000445. Additional support for this work at the University of Washington was partially provided by the National Center for Advancing Translational Sciences grant UL1TR000427.

Author information


Contributions

The experiment was conceived by J.C.D., L.B., D.R.M. and D.M.R. J.C.D. and L.B. designed the final PheWAS algorithm, phenotype classification and matching to NHGRI Catalog phenotypes. L.B. performed the PheWAS. Statistical analysis was performed by J.C.D., L.B., R.J.C. and J.D.M. eMERGE phenotype algorithms were developed primarily by D.S.C., A.N.K. and J.C.D. Novel phenotype algorithms for skin phenotypes were generated and executed by J.C.D., L.B. and R.Z. and evaluated by J.D.M. S.A.P. performed power calculations. J.R.F., J.C.D. and L.B. reviewed the literature for previous publications for each SNP. Genetic quality control and preparation of the merged data set were performed by M.D.R. with input from D.C.C., D.R.C. and J.L.H. Data were provided by D.S.C., P.L.P., A.N.K., J.A.P., L.V.R., D.R.C., P.K.C., J.P., S.J.B. and M.A.B. J.C.D., L.B. and D.M.R. drafted the manuscript, with substantial revision and direction by D.R.M., J.L.H., D.C.C., M.D.R. and J.R.F. Guidance and critical revision were provided by T.A.M. and L.A.H. All authors edited the manuscript.

Corresponding author

Correspondence to Joshua C Denny.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–5 and Supplementary Tables 1–9 (PDF 6653 kb)


About this article

Cite this article

Denny, J., Bastarache, L., Ritchie, M. et al. Systematic comparison of phenome-wide association study of electronic medical record data and genome-wide association study data. Nat Biotechnol 31, 1102–1111 (2013). https://doi.org/10.1038/nbt.2749


  • DOI: https://doi.org/10.1038/nbt.2749
