Abstract
An association study is a popular design for identifying susceptibility genes for common complex diseases. In such a study, inappropriate samples, such as those derived from close relatives or showing DNA contamination, can inflate the type I error rate or reduce power. Here we propose an identity-by-state (IBS)-based method for detecting inappropriate samples that takes linkage disequilibrium (LD) into account. The test statistic is, for each sample pair in an association study, the mean over single nucleotide polymorphisms (SNPs) of the proportion of alleles shared identical by state at each SNP. To account for LD, we introduce the covariance of the number of shared alleles between two SNPs. In computer-simulated data, type I error and power were estimated accurately, and when the number of SNPs analyzed was small, the method detected inappropriate samples better than the previous method under simulated LD. When applied to real association study data, accounting for LD improved the accuracy of the estimated distribution of the test statistic, and sample pairs considered to be siblings were detected. These results suggest that an LD-aware IBS-based detection method is useful for identifying inappropriate samples in association studies.
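The abstract describes the statistic only in words. As a minimal sketch of one possible reading (not the authors' implementation), the Python below assumes genotypes coded as allele counts (0, 1, 2), takes the IBS-sharing proportion at a SNP as (2 - |g1 - g2|)/2, and models a pair of unrelated samples as each carrying two haplotypes drawn independently from known two-SNP haplotype frequencies (Hardy-Weinberg equilibrium). Under those assumptions the null variance of the mean sharing over L SNPs is (sum of Var(s_l) + 2 * sum over l<m of Cov(s_l, s_m)) / L^2, and the covariance term is where LD enters. The function names `ibs_share`, `pair_statistic`, `two_snp_null_moments` and `ld_adjusted_z` are hypothetical, and the haplotype frequencies are assumed to be supplied by the user (for example, estimated from the study data).

```python
import itertools
import math


def ibs_share(g1, g2):
    """Per-SNP proportion of alleles shared IBS (0, 0.5 or 1).

    g1, g2: genotypes of two samples coded as allele counts (0, 1 or 2).
    """
    return [(2 - abs(a - b)) / 2 for a, b in zip(g1, g2)]


def pair_statistic(g1, g2):
    """Mean IBS-sharing proportion over all SNPs for one sample pair."""
    shares = ibs_share(g1, g2)
    return sum(shares) / len(shares)


def two_snp_null_moments(hap_freqs):
    """Exact null moments of the sharing proportions s1, s2 at two SNPs.

    Assumption: two unrelated individuals, each carrying two haplotypes drawn
    independently from hap_freqs, a dict mapping (allele_snp1, allele_snp2),
    alleles coded 0/1, to population haplotype frequency.
    Returns (means, variances, covariance).
    """
    # Two-SNP genotype distribution of a single individual.
    geno = {}
    for h1, h2 in itertools.product(hap_freqs, repeat=2):
        key = (h1[0] + h2[0], h1[1] + h2[1])
        geno[key] = geno.get(key, 0.0) + hap_freqs[h1] * hap_freqs[h2]
    e1 = e2 = e11 = e22 = e12 = 0.0
    # Enumerate every genotype combination of the two unrelated individuals.
    for (a1, a2), pa in geno.items():
        for (b1, b2), pb in geno.items():
            w = pa * pb
            s1 = (2 - abs(a1 - b1)) / 2
            s2 = (2 - abs(a2 - b2)) / 2
            e1 += w * s1
            e2 += w * s2
            e11 += w * s1 * s1
            e22 += w * s2 * s2
            e12 += w * s1 * s2
    return (e1, e2), (e11 - e1 ** 2, e22 - e2 ** 2), e12 - e1 * e2


def ld_adjusted_z(s_bar, means, variances, covs):
    """z-score of an observed mean sharing s_bar over L SNPs.

    covs maps SNP index pairs (l, m), l < m, to Cov(s_l, s_m); pairs that are
    absent are treated as independent. With covs empty this reduces to the
    variance that ignores LD.
    """
    L = len(means)
    var = (sum(variances) + 2 * sum(covs.values())) / L ** 2
    return (s_bar - sum(means) / L) / math.sqrt(var)


if __name__ == "__main__":
    print(pair_statistic([0, 1, 2, 2], [0, 1, 0, 1]))  # 0.625
    # Toy case: two SNPs in complete LD (only haplotypes 00 and 11), allele frequency 0.5.
    means, variances, cov = two_snp_null_moments({(0, 0): 0.5, (1, 1): 0.5})
    z_ld = ld_adjusted_z(1.0, means, variances, {(0, 1): cov})
    z_indep = ld_adjusted_z(1.0, means, variances, {})
    print(round(z_ld, 3), round(z_indep, 3))  # 1.134 1.604
```

In this toy case, ignoring the positive covariance between SNPs in LD understates the null variance, so the same observed sharing yields a larger z-score (about 1.60 instead of 1.13). This is the kind of miscalibration that an LD-aware statistic is meant to avoid.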
Acknowledgements
We thank K Yoshimura, Shumpei Ohnami, Sumiko Ohnami, N Saeki, A Kuchiba, H Totsuka, A Saito, S Chiku at the National Cancer Center for their valuable advice. This study was supported by the Program for the Promotion of Fundamental Studies in Health Sciences from the National Institute of Biomedical Innovation.
Additional information
Supplementary Information accompanies the paper on the Journal of Human Genetics website.
About this article
Cite this article
Andoh, M., Sato, Y., Sakamoto, H. et al. Detection of inappropriate samples in association studies by an IBS-based method considering linkage disequilibrium between genetic markers. J Hum Genet 55, 436–440 (2010). https://doi.org/10.1038/jhg.2010.43
DOI: https://doi.org/10.1038/jhg.2010.43