Abstract
Candidate gene and genome-wide association studies (GWAS) have identified genetic variants that modulate risk for human disease; many of these associations require further study to replicate the results. Here we report the first large-scale application of the phenome-wide association study (PheWAS) paradigm within electronic medical records (EMRs), an unbiased approach to replication and discovery that interrogates relationships between targeted genotypes and multiple phenotypes. We scanned for associations between 3,144 single-nucleotide polymorphisms (previously implicated by GWAS as mediators of human traits) and 1,358 EMR-derived phenotypes in 13,835 individuals of European ancestry. This PheWAS replicated 66% (51/77) of sufficiently powered prior GWAS associations and revealed 63 potentially pleiotropic associations with P < 4.6 × 10⁻⁶ (false discovery rate < 0.1); the strongest of these novel associations were replicated in an independent cohort (n = 7,406). These findings validate PheWAS as a tool to allow unbiased interrogation across multiple phenotypes in EMR-based cohorts and to enhance analysis of the genomic basis of human disease.
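The abstract pairs a P-value cutoff with a false discovery rate below 0.1. As a rough illustration of how such a pairing can arise, the following minimal sketch applies a Benjamini-Hochberg style step-up procedure to a list of P values. It assumes that this kind of procedure is what links the two numbers; the abstract itself does not say so. The function name `benjamini_hochberg` and the toy P values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values, q=0.1):
    """Benjamini-Hochberg step-up procedure at FDR level q.

    Returns (threshold, rejected), where threshold is the largest P value
    declared significant (0.0 if none) and rejected is a list of booleans
    aligned with the input order.
    """
    m = len(p_values)
    # Indices of the P values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k such that p_(k) <= k * q / m.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            cutoff_rank = rank

    # Every hypothesis ranked at or below the cutoff is rejected,
    # even if its own P value missed its individual bound (step-up).
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True

    threshold = p_values[order[cutoff_rank - 1]] if cutoff_rank else 0.0
    return threshold, rejected


# Toy P values for illustration only (not data from the study).
toy_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.065, 0.074, 0.205, 0.212, 0.216]
threshold, rejected = benjamini_hochberg(toy_p, q=0.1)
print(threshold, sum(rejected))

# Replication rate quoted in the abstract: 51 of 77 powered associations.
replication_rate = 51 / 77
print(f"{replication_rate:.0%}")
```

In a scan of this kind the input list would hold one P value per tested SNP-phenotype pair, and the returned threshold plays the role of the reported cutoff. For routine use, an established library routine for multiple-testing correction is a safer choice than a hand-written loop.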
Acknowledgements
This work was supported by the eMERGE Network, initiated and funded by the National Human Genome Research Institute (NHGRI), with additional funding from the National Institute of General Medical Sciences (NIGMS), through the following grants: U01-HG004610 and U01-HG006375 (Group Health Cooperative/University of Washington); U01-HG004608 (Marshfield Clinic); U01-HG004599 and U01-HG006379 (Mayo Clinic); U01-HG004609 and U01-HG006388 (Northwestern University); U01-HG006389 (Essentia Institute of Rural Health/Marshfield Clinic); U01-HG004603 and U01-HG006378 (Vanderbilt University); and U01-HG006385 (Vanderbilt University serving as the Coordinating Center). Funding support for eMERGE genotyping was provided by NHGRI through the grants: U01-HG004424 (The Broad Institute) and U01-HG004438 (Johns Hopkins University, Center for Inherited Disease Research). Replication genotypes were derived from a pharmacogenomics resource supported by NIGMS RC2-GM092318. Development of the PheWAS method is also supported by R01-LM010685 from the National Library of Medicine. BioVU received and continues to receive support through the National Center for Research Resources UL1 RR024975, which is now the National Center for Advancing Translational Sciences, 2 UL1 TR000445. Additional support for this work at the University of Washington was partially provided by the National Center for Advancing Translational Sciences grant UL1TR000427.
Author information
Contributions
The experiment was conceived by J.C.D., L.B., D.R.M., and D.M.R. J.C.D. and L.B. designed the final PheWAS algorithm, phenotype classification and matching to NHGRI Catalog phenotypes. L.B. performed the PheWAS. Statistical analysis was performed by J.C.D., L.B., R.J.C. and J.D.M. eMERGE Phenotype algorithms were developed primarily by D.S.C., A.N.K. and J.C.D. Novel phenotype algorithms for skin phenotypes were generated and executed by J.C.D., L.B. and R.Z. and evaluated by J.D.M. S.A.P. performed power calculations. J.R.F., J.C.D. and L.B. reviewed the literature for previous publications for each SNP. Genetic quality control and the merged data set were performed by M.D.R. with input from D.C.C., D.R.C. and J.L.H. Data were provided by D.S.C., P.L.P., A.N.K., J.A.P., L.V.R., D.R.C., P.K.C., J.P., S.J.B. and M.A.B. J.C.D., L.B. and D.M.R. drafted the manuscript, with substantial revision and direction by D.R.M., J.L.H., D.C.C., M.D.R. and J.R.F. Guidance and critical revision were provided by T.A.M. and L.A.H. All authors edited the manuscript.
Ethics declarations
Competing interests
The authors declare no competing financial interests.
Supplementary information
Supplementary Text and Figures
Supplementary Figures 1–5 and Supplementary Tables 1–9 (PDF 6653 kb)
About this article
Cite this article
Denny, J., Bastarache, L., Ritchie, M. et al. Systematic comparison of phenome-wide association study of electronic medical record data and genome-wide association study data. Nat Biotechnol 31, 1102–1111 (2013). https://doi.org/10.1038/nbt.2749
DOI: https://doi.org/10.1038/nbt.2749