Abstract
The choice between genotyping families and genotyping unrelated individuals is a critical factor in any large-scale linkage disequilibrium (LD) study. The use of unrelated individuals for such studies is promising, but in contrast to family designs, unrelated samples do not facilitate detection of genotyping errors, which have been shown to be of great importance for LD and linkage studies and may be even more important in genotyping collaborations across laboratories. Here we employ some of the most commonly used analysis methods to examine the relative accuracy of haplotype estimation using families vs unrelated individuals in the presence of genotyping error. The results suggest that even slight amounts of genotyping error can significantly decrease the accuracy of haplotype frequency estimation and haplotype reconstruction, and that the ability to detect such errors in large families is essential when the number or complexity of haplotypes is high (low LD and/or common alleles). In contrast, in situations of low haplotype complexity (high LD and/or many rare alleles), unrelated individuals offer such a high degree of accuracy that there is little reason to use less efficient family designs. Moreover, parent-child trios, which are the most popular family design and the most efficient in terms of the number of founder chromosomes per genotype, contain little information for error detection. They offer little or no gain over unrelated samples in nearly all cases, and thus do not seem a useful sampling compromise between unrelated individuals and large families. The implications of these results are discussed in the context of large-scale LD mapping projects such as the proposed genome-wide haplotype map.
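To make the effect described above concrete, the following is a minimal sketch of EM-based haplotype frequency estimation from unphased genotypes of unrelated individuals at two biallelic loci, run once on clean data and once after injecting genotyping error. It is not the simulation used in the paper: the function names, the error model (each single-locus genotype is independently replaced by a different genotype with a fixed probability), the sample size and the haplotype frequencies are all illustrative assumptions, and family designs are not modelled.

```python
import random

HAPLOTYPES = [(0, 0), (0, 1), (1, 0), (1, 1)]  # alleles at (locus 1, locus 2)


def em_haplotype_freqs(genotypes, n_iter=200):
    """Estimate two-locus haplotype frequencies from unphased genotypes.

    Each genotype is (g1, g2): the count (0, 1 or 2) of allele 1 at each locus.
    Returns frequencies in the order given by HAPLOTYPES.
    """
    counts = [0.0] * 4   # haplotypes whose phase is unambiguous
    n_double_het = 0     # (1, 1) genotypes: phase is unknown
    for g1, g2 in genotypes:
        if g1 == 1 and g2 == 1:
            n_double_het += 1
            continue
        # At least one locus is homozygous, so the phase is determined.
        first = (int(g1 >= 1), int(g2 >= 1))
        second = (int(g1 == 2), int(g2 == 2))
        counts[HAPLOTYPES.index(first)] += 1
        counts[HAPLOTYPES.index(second)] += 1

    total = 2 * len(genotypes)  # number of chromosomes sampled
    freqs = [0.25] * 4
    for _ in range(n_iter):
        # E-step: probability that a double heterozygote is 00/11 vs 01/10.
        cis = freqs[0] * freqs[3]
        trans = freqs[1] * freqs[2]
        p_cis = cis / (cis + trans) if cis + trans > 0 else 0.5
        expected = [
            counts[0] + n_double_het * p_cis,
            counts[1] + n_double_het * (1 - p_cis),
            counts[2] + n_double_het * (1 - p_cis),
            counts[3] + n_double_het * p_cis,
        ]
        # M-step: expected haplotype counts become the new frequencies.
        freqs = [c / total for c in expected]
    return freqs


def simulate_genotypes(true_freqs, n, rng):
    """Draw n unrelated individuals by pairing two random haplotypes each."""
    genotypes = []
    for _ in range(n):
        h1, h2 = rng.choices(HAPLOTYPES, weights=true_freqs, k=2)
        genotypes.append((h1[0] + h2[0], h1[1] + h2[1]))
    return genotypes


def add_errors(genotypes, rate, rng):
    """Assumed error model: each single-locus genotype is miscalled with
    probability `rate` as one of the two other genotypes."""
    def maybe_miscall(g):
        if rng.random() < rate:
            return rng.choice([x for x in (0, 1, 2) if x != g])
        return g
    return [(maybe_miscall(g1), maybe_miscall(g2)) for g1, g2 in genotypes]


if __name__ == "__main__":
    rng = random.Random(1)
    true_freqs = [0.4, 0.1, 0.1, 0.4]  # illustrative values only
    clean = simulate_genotypes(true_freqs, 500, rng)
    noisy = add_errors(clean, 0.02, rng)
    for label, data in (("clean", clean), ("2% error", noisy)):
        est = em_haplotype_freqs(data)
        abs_err = sum(abs(e - t) for e, t in zip(est, true_freqs))
        print(label, [round(f, 3) for f in est], "sum |est - true| =", round(abs_err, 3))
```

With unrelated individuals, a miscalled genotype cannot be flagged by a Mendelian inconsistency, so in this sketch every error passes straight into the haplotype counts. A family-based variant would need an additional step that checks genotypes against the pedigree before estimation.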
References
Adam D . Genetics group targets disease markers in the human sequence Nature 2001 412: 105
Robertson D . Racially defined haplotype project debated Nat Biotechnol 2001 19: 795–796
Jeffreys AJ, Kauppi L, Neumann R . Intensely punctate meiotic recombination in the class II region of the major histocompatibility complex Nat Genet 2001 29: 217–222
Jeffreys AJ, Ritchie A, Neumann R . High resolution analysis of haplotype diversity and meiotic crossover in the human TAP2 recombination hotspot Hum Mol Genet 2000 9: 725–733
Daly MJ, Rioux JD, Schaffner SF, Hudson TJ, Lander ES . High-resolution haplotype structure in the human genome Nat Genet 2001 29: 229–232
Taillon-Miller P, Bauer-Sardina I, Saccone NL et al. Juxtaposed regions of extensive and minimal linkage disequilibrium in human Xq25 and Xq28 Nat Genet 2000 25: 324–328
Eaves IA, Merriman TR, Barber RA et al. The genetically isolated populations of Finland and Sardinia may not be a panacea for linkage disequilibrium mapping of common disease genes Nat Genet 2000 25: 320–323
Abecasis GR, Noguchi E, Heinzmann A et al. Extent and distribution of linkage disequilibrium in three genomic regions Am J Hum Genet 2001 68: 191–197
Reich DE, Cargill M, Bolk S et al. Linkage disequilibrium in the human genome Nature 2001 411: 199–204
Douglas JA, Boehnke M, Gillanders E, Trent JM, Gruber SB . Experimentally-derived haplotypes substantially increase the efficiency of linkage disequilibrium studies Nat Genet 2001 28: 361–364
Michalatos-Beloin S, Tishkoff SA, Bentley KL, Kidd KK, Ruano G . Molecular haplotyping of genetic markers 10 kb apart by allele-specific long-range PCR Nucleic Acids Res 1996 24: 4841–4843
Clark AG . Inference of haplotypes from PCR-amplified samples of diploid populations Mol Biol Evol 1990 7: 111–122
Excoffier L, Slatkin M . Maximum-likelihood estimation of molecular haplotype frequencies in a diploid population Mol Biol Evol 1995 12: 921–927
Hawley ME, Kidd KK . HAPLO: a program using the EM algorithm to estimate the frequencies of multi-site haplotypes J Hered 1995 86: 409–411
Long JC, Williams RC, Urbanek M . An E-M algorithm and testing strategy for multiple-locus haplotypes Am J Hum Genet 1995 56: 799–810
Stephens M, Smith NJ, Donnelly P . A new statistical method for haplotype reconstruction from population data Am J Hum Genet 2001 68: 978–989
Fallin D, Schork NJ . Accuracy of haplotype frequency estimation for biallelic loci, via the expectation-maximization algorithm for unphased diploid genotype data Am J Hum Genet 2000 67: 947–959
Tishkoff SA, Pakstis AJ, Ruano G, Kidd KK . The accuracy of statistical methods for estimation of haplotype frequencies: an example from the CD4 locus Am J Hum Genet 2000 67: 518–522
Douglas JA, Boehnke M, Lange K . A multipoint method for detecting genotyping errors and mutations in sibling-pair linkage data Am J Hum Genet 2000 66: 1287–1297
Abecasis GR, Cherny SS, Cardon LR . The impact of genotype error on family-based analysis of quantitative traits Eur J Hum Genet 2001 9: 130–134
Akey JM, Zhang K, Xiong M, Doris P, Jin L . The effect that genotyping errors have on the robustness of common linkage-disequilibrium measures Am J Hum Genet 2001 68: 1447–1456
Gordon D, Leal SM, Heath SC, Ott J . An analytic solution to single nucleotide polymorphism error-detection rates in nuclear families: implications for study design Pac Symp Biocomput 2000 663–674
Lewontin RC, Kojima K . The evolutionary dynamics of complex polymorphisms Evolution 1960 14: 450–472
Sobel E, Papp JC, Lange K . Detection and integration of genotyping errors in statistical genetics Am J Hum Genet 2002 70: 496–508
Lincoln SE, Lander ES . Systematic detection of errors in genetic linkage data Genomics 1992 14: 604–610
Ott J . Detecting marker inconsistencies in human gene mapping Hum Hered 1993 43: 25–30
Weir BS, Cockerham CC . Estimation of linkage disequilibrium in randomly mating populations Heredity 1979 42: 105–111
Lander ES, Green P . Construction of multilocus genetic linkage maps in humans Proc Natl Acad Sci USA 1987 84: 2363–2367
Abecasis GR, Cherny SS, Cookson WO, Cardon LR . Merlin-rapid analysis of dense genetic maps using sparse gene flow trees Nat Genet 2002 30: 97–101
Sachidanandam R, Weissman D, Schmidt SC et al. A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms Nature 2001 409: 928–933
Marth G, Yeh R, Minton M et al. Single-nucleotide polymorphisms in the public domain: how useful are they? Nat Genet 2001 27: 371–372
Gordon D, Heath SC, Ott J . True pedigree errors more frequent than apparent errors for single nucleotide polymorphisms Hum Hered 1999 49: 65–70
Douglas JA, Skol AD, Boehnke M . Probability of detection of genotyping errors and mutations as inheritance inconsistencies in nuclear-family data Am J Hum Genet 2002 70: 487–495
Abecasis GR, Cookson WO . GOLD graphical overview of linkage disequilibrium Bioinformatics 2000 16: 182–183
Acknowledgements
This research project was supported by a Wellcome Trust Travel Award to KM Kirk. LR Cardon was supported by a Wellcome Trust Principal Research Fellowship and by NIH grant EY-12562. Lander-Green haplotype estimation was conducted using MERLIN (ref. 29; http://bioinformatics.well.ox.ac.uk/Merlin/). EM-based estimates were obtained using ldmax in the GOLD package (ref. 34; www.well.ox.ac.uk/asthma/GOLD/) and the SNPHAP program kindly provided by Dr David Clayton, Cambridge University (www-gene.cimr.cam.ac.uk/clayton/software/). Haplotype reconstruction using the EM/MCMC method of Stephens et al. (ref. 16) was performed using PHASE (www.stats.ox.ac.uk/mathgen/software.html).
Cite this article
Kirk, K., Cardon, L. The impact of genotyping error on haplotype reconstruction and frequency estimation. Eur J Hum Genet 10, 616–622 (2002). https://doi.org/10.1038/sj.ejhg.5200855
DOI: https://doi.org/10.1038/sj.ejhg.5200855


