Empirical assessment of the validity of the ‘fundamental theorem of the HapMap’ in the light of ‘cryptic’ tagging of multiple susceptibility loci

Bochdanovits, Zoltán; Heutink, Peter; van der Vaart, Aad

doi:10.1038/sj.ejhg.5201984

Short Report
Published: 16 January 2008

Zoltán Bochdanovits^1,2, Peter Heutink^2 & Aad van der Vaart^3

European Journal of Human Genetics, volume 16, pages 525–529 (2008)

Abstract

Underestimation of the sample size needed to detect genetic association may occur as a result of deviations from the ‘fundamental theorem of the HapMap’. A biologically plausible mechanism that might cause such a deviation is ‘cryptic’ tagging of multiple susceptibility loci by the same neutral marker. For complex disorders, the existence of multiple susceptibility loci on the same chromosome is probably the rule rather than the exception. Our results show that, conditional on the known haplotype structure of the genome, the probability that a tagging SNP in linkage disequilibrium (LD) with one susceptibility gene is also in LD with another susceptibility gene is not negligible. Consequently, we were able to estimate the extent and the prevalence of the bias, induced by ‘cryptic’ tagging, in the sample size necessary to find association. In general, the underestimation of the necessary sample size is modest: 5% of all association studies will underestimate the sample size by 5–30%. On the basis of our results, a safe bet is to use a sample that is 10% larger than otherwise deemed necessary.
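
The sample-size arithmetic behind these statements can be sketched as follows. This is a minimal illustration, not the authors' calculation: it assumes the usual normal approximation to a 1-df chi-square test of a 2 × 2 table, under which the required sample size scales as 1/r² for a population correlation r. The function name `required_n` and all numerical inputs are hypothetical.

```python
from statistics import NormalDist


def required_n(r: float, alpha: float = 0.05, power: float = 0.8) -> float:
    """Approximate sample size for a 1-df chi-square test of a 2 x 2 table
    whose population (phi) correlation is r.

    Assumes noncentrality ~ n * r**2 and power ~ Phi(sqrt(n) * |r| - z_{1-alpha/2}),
    ignoring the negligible opposite rejection tail.
    """
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_power = std_normal.inv_cdf(power)
    return (z_alpha + z_power) ** 2 / r ** 2


# Hypothetical inputs: correlation of the causal allele with case status,
# and LD (as a correlation) between the tagging SNP and the causal allele.
r_d_ca = 0.10
r_t_d = 0.80

n_causal = required_n(r_d_ca)
# 'Fundamental theorem': the tag behaves like the causal allele with
# correlation r_t_d * r_d_ca, so the sample size is inflated by 1 / r_t_d**2.
n_tag_planned = required_n(r_t_d * r_d_ca)
print(round(n_tag_planned / n_causal, 4))  # 1.5625 = 1 / 0.8**2

# If the realised tag-status correlation is a fraction c of the predicted one,
# the sample actually needed is n_tag_planned / c**2. A 10% margin therefore
# covers any c >= 1 / sqrt(1.10).
c_min = 1 / 1.10 ** 0.5
print(round(c_min, 4))  # 0.9535
```

Under these assumptions, a sample 10% larger than planned remains adequate as long as cryptic tagging lowers the tag-status correlation by less than about 4.7% relative to the product predicted by the ‘fundamental theorem’.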

References

Gabriel SB, Schaffner SF, Nguyen H et al: The structure of haplotype blocks in the human genome. Science 2002; 296: 2225–2229.
Terwilliger JD, Hiekkalinna T: An utter refutation of the ‘fundamental theorem of the HapMap’. Eur J Hum Genet 2006; 14: 426–437.
Thomas DC, Stram DO: An utter refutation of the ‘fundamental theorem of the HapMap’ by Terwilliger and Hiekkalinna. Eur J Hum Genet 2006; 14: 1238–1239.

Author information

Authors and Affiliations

Department of Clinical Genetics, Section Medical Genomics, VUMC, Amsterdam, The Netherlands
Zoltán Bochdanovits
Center for Neurogenomics and Cognitive Research, VU/VUMC, Amsterdam, The Netherlands
Zoltán Bochdanovits & Peter Heutink
Department of Mathematics, Section Stochastics, Faculty of Sciences, Vrije Universiteit, Amsterdam, The Netherlands
Aad van der Vaart

Corresponding author

Correspondence to Zoltán Bochdanovits.

Appendix

Derivation of formula (4)

For two events A and B, let p_A, p_B and p_AB be the probabilities of A, B and A∩B, and let r_AB be the correlation between the indicators 1_A and 1_B (equal to 1 or 0 according to whether the event occurs or not), that is,

$$ r_{AB} = \frac{p_{AB} - p_A\, p_B}{\sqrt{p_A(1-p_A)\, p_B(1-p_B)}}. $$
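
As a concrete check of this definition, the sketch below computes r_AB from the three probabilities and confirms, for a hypothetical 2 × 2 table of counts, the standard identity that the Pearson chi-square statistic (without continuity correction) equals n times the squared sample correlation of the two indicators. This identity is why r can be read as a root-noncentrality parameter: for a sample of n units the statistic is approximately noncentral chi-square with noncentrality n r². The function names and counts are illustrative assumptions.

```python
from math import sqrt


def indicator_correlation(p_a: float, p_b: float, p_ab: float) -> float:
    """Correlation r_AB between the indicators 1_A and 1_B."""
    return (p_ab - p_a * p_b) / sqrt(p_a * (1 - p_a) * p_b * (1 - p_b))


def pearson_chi2_2x2(n11: int, n12: int, n21: int, n22: int) -> float:
    """Pearson chi-square statistic (no continuity correction) of a 2 x 2 table."""
    n = n11 + n12 + n21 + n22
    rows = (n11 + n12, n21 + n22)
    cols = (n11 + n21, n12 + n22)
    observed = ((n11, n12), (n21, n22))
    return sum(
        (observed[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
        for i in range(2)
        for j in range(2)
    )


# Hypothetical counts: rows = A present/absent, columns = B present/absent.
n11, n12, n21, n22 = 30, 20, 10, 40
n = n11 + n12 + n21 + n22
r_hat = indicator_correlation((n11 + n12) / n, (n11 + n21) / n, n11 / n)
print(round(r_hat, 4), round(pearson_chi2_2x2(n11, n12, n21, n22), 4), round(n * r_hat ** 2, 4))
# 0.4082 16.6667 16.6667
```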

We assume that the allele at the tagging locus T of a randomly selected individual is conditionally independent of the case–control status Ca, given the haplotypes (or genotypes) at the disease loci. Let 𝒟 denote the set of all possible haplotypic alleles at the joint disease loci, and, for D∈𝒟, let D also denote the event that a random individual possesses allele D (possibly a multi-locus haplotype). Then it follows from the general result stated as Lemma 1 at the end of this section that

$$ r_{T,Ca} = \sum_{D\in\mathcal{D}} (1-p_D)\, r_{T,D}\, r_{D,Ca}. \tag{A.1} $$

This expresses the root-noncentrality parameter r_T,Ca of the test at the tagging locus as a linear combination, with weights (1 − p_D) r_T,D, of the root-noncentrality parameters r_D,Ca of the tests of the 2 × 2 tables that would score case–control status against the causal haplotype, for each haplotypic allele D∈𝒟 in turn. In its full generality, formula (A.1) is only mildly interesting; under special assumptions, however, it turns into easily interpretable formulas.
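
Formula (A.1) can be checked numerically on a small hypothetical model. The sketch below assumes three haplotypic alleles with made-up frequencies P(D), tag-allele probabilities P(T | D) and case probabilities P(Ca | D); T and Ca are made conditionally independent given D by construction, as the assumption above requires. It compares r_T,Ca computed directly from the joint distribution with the weighted sum Σ_D (1 − p_D) r_T,D r_D,Ca. The helper `corr` is an illustrative name.

```python
from math import isclose, sqrt


def corr(p_x: float, p_y: float, p_xy: float) -> float:
    """Correlation between the indicators of two events X and Y."""
    return (p_xy - p_x * p_y) / sqrt(p_x * (1 - p_x) * p_y * (1 - p_y))


# Hypothetical model with three haplotypic alleles at the joint disease loci.
p_d = [0.2, 0.3, 0.5]           # P(D) for each D; the events D partition the population
p_t_given_d = [0.9, 0.6, 0.1]   # P(tag allele T | D)
p_ca_given_d = [0.7, 0.5, 0.2]  # P(case | D)

# Marginal and joint probabilities, with T and Ca conditionally independent given D.
p_t = sum(pd * pt for pd, pt in zip(p_d, p_t_given_d))
p_ca = sum(pd * pc for pd, pc in zip(p_d, p_ca_given_d))
p_t_ca = sum(pd * pt * pc for pd, pt, pc in zip(p_d, p_t_given_d, p_ca_given_d))

# Left-hand side: r_T,Ca computed directly from the joint distribution.
lhs = corr(p_t, p_ca, p_t_ca)

# Right-hand side: sum over D of (1 - p_D) * r_T,D * r_D,Ca,
# using P(T and D) = P(D) P(T | D) and P(Ca and D) = P(D) P(Ca | D).
rhs = sum(
    (1 - pd) * corr(p_t, pd, pd * pt) * corr(pd, p_ca, pd * pc)
    for pd, pt, pc in zip(p_d, p_t_given_d, p_ca_given_d)
)
print(isclose(lhs, rhs))  # True
```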

As a first application, suppose there are only two possible alleles at the disease loci, say D and d. Then sum (A.1) has two terms, and the products of correlations in the two terms are equal,

$$ r_{T,D}\, r_{D,Ca} = r_{T,d}\, r_{d,Ca}, $$

because both correlations change sign upon replacing D by d (the indicators satisfy 1_d = 1 − 1_D). As the weights 1 − p_D and 1 − p_d = p_D sum to one, the formula reduces to the product formula (1), r_T,Ca = r_T,D r_D,Ca.
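
The two-allele reduction can be illustrated with hypothetical numbers. The sketch below (all probabilities are made up, and `corr` is an illustrative helper) checks that the two products of correlations coincide, that the weighted sum therefore collapses to r_T,D r_D,Ca, and that this product equals the correlation r_T,Ca computed directly from the joint distribution.

```python
from math import isclose, sqrt


def corr(p_x: float, p_y: float, p_xy: float) -> float:
    """Correlation between the indicators of two events X and Y."""
    return (p_xy - p_x * p_y) / sqrt(p_x * (1 - p_x) * p_y * (1 - p_y))


# Hypothetical biallelic disease locus with alleles D and d.
p_allele = {"D": 0.3, "d": 0.7}    # allele frequencies
p_t_given = {"D": 0.8, "d": 0.2}   # P(tag allele T | allele)
p_ca_given = {"D": 0.6, "d": 0.3}  # P(case | allele)

p_t = sum(p_allele[a] * p_t_given[a] for a in p_allele)
p_ca = sum(p_allele[a] * p_ca_given[a] for a in p_allele)
p_t_ca = sum(p_allele[a] * p_t_given[a] * p_ca_given[a] for a in p_allele)

# r_T,a * r_a,Ca for each allele a; replacing D by d flips the sign of both factors.
products = {
    a: corr(p_t, p_allele[a], p_allele[a] * p_t_given[a])
    * corr(p_allele[a], p_ca, p_allele[a] * p_ca_given[a])
    for a in p_allele
}
# Sum (A.1): the weights 1 - p_D and 1 - p_d = p_D add up to one.
weighted_sum = sum((1 - p_allele[a]) * products[a] for a in p_allele)
direct = corr(p_t, p_ca, p_t_ca)

print(isclose(products["D"], products["d"]), isclose(weighted_sum, products["D"]), isclose(direct, products["D"]))
# True True True
```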

Secondly, we derive (4) from (A.1) under assumptions (2) and (3). It follows from the latter pair of assumptions that

Substitution in formula (A.1) yields

Here E(#D) can be deleted, because and hence is uncorrelated with any variable, and can be rewritten as for D_i the event that an individual has the disease allele at the disease locus i (with the other loci unspecified). Thus, we obtain

Next we eliminate β from this formula by expressing this in the correlations . We have

for E(#D∣D_i) the expected total number of disease alleles in an arbitrary individual carrying the disease allele at locus i. Combining this with formula (3) for the prevalence, we see

Solving for β and substituting the solution in (A.2), we find that

The total number of disease alleles in a random individual can be written (The curious first equality is a consequence of our abuse of notation: as a random variable the total number of disease alleles #D in an arbitrary individual is denoted by #D if the event D occurs.) This gives

Thus,

We conclude the derivation of (4) by substituting this in (A.3).

Lemma 1. If events A and B are conditionally independent given a partition 𝒟 of the outcome space, then

$$ r_{AB} = \sum_{D\in\mathcal{D}} (1-p_D)\, r_{AD}\, r_{BD}. $$

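Proof. Because A and B are conditionally independent given 𝒟, they are conditionally uncorrelated, that is, cov(1_A, 1_B | 𝒟) = 0 almost surely. Therefore, the usual conditioning rule for covariances gives

$$ \operatorname{cov}(1_A,1_B) = E\operatorname{cov}(1_A,1_B\mid\mathcal{D}) + \operatorname{cov}\bigl(E(1_A\mid\mathcal{D}),\, E(1_B\mid\mathcal{D})\bigr) = \sum_{D\in\mathcal{D}} p_D \bigl(P(A\mid D)-P(A)\bigr)\bigl(P(B\mid D)-P(B)\bigr). $$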

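Here, on the event D, the variable E(1_A | 𝒟) − E1_A is equal to

$$ P(A\mid D) - P(A) = \frac{p_{AD} - p_A\, p_D}{p_D} = \frac{r_{AD}\sqrt{p_A(1-p_A)\, p_D(1-p_D)}}{p_D}. $$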

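Substituting this and the corresponding formula for P(B | D) − P(B) in the preceding display gives

$$ \operatorname{cov}(1_A,1_B) = \sum_{D\in\mathcal{D}} (1-p_D)\, r_{AD}\, r_{BD}\, \sqrt{p_A(1-p_A)\, p_B(1-p_B)}. $$

Dividing both sides by √(p_A(1 − p_A) p_B(1 − p_B)) gives the assertion.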
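
The steps of the proof can also be verified numerically. The sketch below draws a hypothetical partition into four events with random probabilities, imposes conditional independence of A and B given the partition, and checks (i) the covariance decomposition, (ii) the expression for P(A | D) − P(A) in terms of r_AD, and (iii) the resulting identity r_AB = Σ_D (1 − p_D) r_AD r_BD. The seed, the number of events and the helper name `r_with_d` are arbitrary illustrative choices.

```python
import random
from math import isclose, sqrt

rng = random.Random(2008)  # arbitrary seed
k = 4                      # number of events D in the partition

weights = [rng.random() + 0.01 for _ in range(k)]
total = sum(weights)
p_d = [w / total for w in weights]                       # P(D), strictly between 0 and 1
p_a_given = [rng.uniform(0.05, 0.95) for _ in range(k)]  # P(A | D)
p_b_given = [rng.uniform(0.05, 0.95) for _ in range(k)]  # P(B | D)

# Conditional independence given D: P(A and B | D) = P(A | D) P(B | D).
p_a = sum(pd * pa for pd, pa in zip(p_d, p_a_given))
p_b = sum(pd * pb for pd, pb in zip(p_d, p_b_given))
p_ab = sum(pd * pa * pb for pd, pa, pb in zip(p_d, p_a_given, p_b_given))
sd_a = sqrt(p_a * (1 - p_a))
sd_b = sqrt(p_b * (1 - p_b))


def r_with_d(p_x: float, sd_x: float, pd: float, p_x_given: float) -> float:
    """Correlation between 1_X and 1_D, using P(X and D) = P(D) P(X | D)."""
    return (pd * p_x_given - p_x * pd) / (sd_x * sqrt(pd * (1 - pd)))


# (i) cov(1_A, 1_B) = sum_D p_D (P(A | D) - P(A)) (P(B | D) - P(B)).
cov_ab = p_ab - p_a * p_b
decomposed = sum(pd * (pa - p_a) * (pb - p_b) for pd, pa, pb in zip(p_d, p_a_given, p_b_given))
assert isclose(cov_ab, decomposed, abs_tol=1e-12)

# (ii) On the event D: P(A | D) - P(A) = r_AD * sqrt(p_A (1 - p_A) p_D (1 - p_D)) / p_D.
for pd, pa in zip(p_d, p_a_given):
    r_ad = r_with_d(p_a, sd_a, pd, pa)
    assert isclose(pa - p_a, r_ad * sd_a * sqrt(pd * (1 - pd)) / pd, abs_tol=1e-12)

# (iii) Lemma 1: r_AB = sum_D (1 - p_D) r_AD r_BD.
r_ab = cov_ab / (sd_a * sd_b)
lemma_rhs = sum(
    (1 - pd) * r_with_d(p_a, sd_a, pd, pa) * r_with_d(p_b, sd_b, pd, pb)
    for pd, pa, pb in zip(p_d, p_a_given, p_b_given)
)
assert isclose(r_ab, lemma_rhs, abs_tol=1e-12)
print("all checks passed")
```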

About this article

Cite this article

Bochdanovits, Z., Heutink, P. & van der Vaart, A. Empirical assessment of the validity of the ‘fundamental theorem of the HapMap’ in the light of ‘cryptic’ tagging of multiple susceptibility loci. Eur J Hum Genet 16, 525–529 (2008). https://doi.org/10.1038/sj.ejhg.5201984

Received: 20 July 2007
Revised: 13 November 2007
Accepted: 20 November 2007
Published: 16 January 2008
Issue date: April 2008
DOI: https://doi.org/10.1038/sj.ejhg.5201984
