Abstract
Recent breakthroughs in exome-sequencing technology have made possible the identification of many causal variants of monogenic disorders. Although extremely powerful when closely related individuals (eg, child and parents) are simultaneously sequenced, sequencing of a single case is often unsuccessful due to the large number of variants that need to be followed up for functional validation. Many approaches filter out common variants above a given frequency threshold (eg, 1%), and then prioritize the remaining variants according to their functional, structural and conservation properties. Here we present methods that leverage the genetic structure across different populations to improve filtering performance while accounting for the finite sample size of the reference panels. We show that leveraging genetic structure reduces the number of variants that need to be followed up by 16% in simulations and by up to 38% in empirical data of 20 exomes from individuals with monogenic disorders for which the causal variants are known.
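The filtering idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the population labels, the allele counts, and the use of a one-sided Wilson score lower bound to account for finite panel size are all assumptions made for illustration. A variant is removed only when some reference panel shows, with reasonable confidence, that its frequency exceeds the threshold.

```python
import math


def wilson_lower_bound(alt_count, n_alleles, z=1.645):
    """One-sided Wilson score lower bound on an allele frequency.

    z=1.645 corresponds to roughly 95% one-sided confidence
    (an illustrative choice, not taken from the paper).
    """
    if n_alleles == 0:
        return 0.0
    p_hat = alt_count / n_alleles
    z2 = z * z
    centre = p_hat + z2 / (2 * n_alleles)
    margin = z * math.sqrt(
        p_hat * (1 - p_hat) / n_alleles + z2 / (4 * n_alleles ** 2)
    )
    return max(0.0, (centre - margin) / (1 + z2 / n_alleles))


def is_filtered(panel_counts, threshold=0.01, z=1.645):
    """True if the variant is confidently common in at least one panel.

    panel_counts maps a population label to (alt_count, n_alleles).
    """
    return any(
        wilson_lower_bound(alt, n, z) > threshold
        for alt, n in panel_counts.values()
    )


def candidate_variants(variants, populations, threshold=0.01):
    """Return IDs of variants that survive filtering against the given panels."""
    kept = []
    for variant_id, counts in variants.items():
        relevant = {pop: counts[pop] for pop in populations if pop in counts}
        if not is_filtered(relevant, threshold):
            kept.append(variant_id)
    return kept


# Hypothetical counts: (alternate allele count, total alleles) per panel.
variants = {
    "var_a": {"EUR": (50, 1000), "AFR": (0, 500)},  # common in EUR
    "var_b": {"EUR": (2, 100), "AFR": (0, 500)},    # 2% observed, tiny panel
    "var_c": {"EUR": (0, 1000), "AFR": (40, 500)},  # common only in AFR
}

single_panel = candidate_variants(variants, ["EUR"])
multi_panel = candidate_variants(variants, ["EUR", "AFR"])
print(single_panel)
print(multi_panel)
```

In this toy example, `var_b` is kept even though its observed frequency is 2%, because a panel of 100 alleles is too small to conclude that the true frequency exceeds 1%. Adding a second population panel removes `var_c`, which a single-panel filter would leave in the follow-up list.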
Acknowledgements
This work is supported in part by the National Institutes of Health (R03-CA162200, R01-GM053275 to BP and T32-HG002536 to RB). AE and SFN are supported by the Genomics/Informatics Core of the UCLA MUSCULAR DYSTROPHY CORE CENTER from NIAMS (P30AR057230). BR is a fellow of the Branco Weiss Foundation and an A*STAR and EMBO Young Investigator. This work was funded by a Strategic Positioning Fund for Genetic Orphan Diseases and an inaugural A*STAR Investigatorship from the Agency for Science, Technology and Research in Singapore. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.
Web resources: We provide publicly available software implementing our approach: http://bogdan.bioinformatics.ucla.edu/software/. Figure 1 was made on this website with Human Genome Diversity Panel data: http://hgdp.uchicago.edu/cgi-bin/gbrowse/HGDP/. 1000 Genomes local ancestry calls are available at: http://www.1000genomes.org/phase1-analysis-results-directory.
Ethics declarations
Competing interests
The authors declare no conflict of interest.
Additional information
Supplementary Information accompanies this paper on the European Journal of Human Genetics website.
About this article
Cite this article
Brown, R., Lee, H., Eskin, A. et al. Leveraging ancestry to improve causal variant identification in exome sequencing for monogenic disorders. Eur J Hum Genet 24, 113–119 (2016). https://doi.org/10.1038/ejhg.2015.68
DOI: https://doi.org/10.1038/ejhg.2015.68