Abstract
We propose an algorithm for testing association in structured multilocus genotype data. The algorithm first clusters the data using a hierarchical clustering technique and a k-means algorithm. It then analyzes all the clusters jointly with the Mantel–Haenszel (MH) test, which examines associations common to the clusters. Because the MH test requires the number of subpopulations to be specified, this number is estimated by cross-validation (CV) together with the k-means algorithm. The algorithm was implemented in the computer program POPSTRUCT. In a simulation study, we found that combining two groups with different marker allele frequencies inflated the type I error rate. The inflation was more marked when the differences in the marker allele frequencies were larger, the difference in the minor allele frequencies at the disease locus was larger, and the genotype relative risk associated with the disease locus was higher. The simulation study indicated that the MH test reduced type I errors and increased power compared with any test performed on an individual cluster. We then compared STRUCTURE, a model-based method, with POPSTRUCT, a distance-based method. When two subgroups with different allele frequencies were mixed at a high fixed ratio, POPSTRUCT was superior to STRUCTURE in classifying the combined population into accurate clusters, each of which reflects one of the original groups.
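As a rough illustration of why a stratified test helps, the sketch below computes a Mantel–Haenszel chi-square statistic over per-cluster 2×2 tables (case/control by allele) and compares it with the same statistic on the pooled table. It is not POPSTRUCT's implementation. The function name `mantel_haenszel`, the clamped continuity correction, and the allele counts are assumptions made for this example. The counts are built so that neither cluster shows any association, while the clusters differ in both allele frequency and case proportion.

```python
import math


def mantel_haenszel(tables, correct=True):
    """Mantel-Haenszel chi-square (1 df) over a sequence of 2x2 tables.

    Each table is ((a, b), (c, d)): rows are case/control, columns are
    the two alleles. Returns (statistic, p_value).
    Hypothetical helper written for this illustration only.
    """
    diff = 0.0  # sum over strata of observed minus expected count in cell a
    var = 0.0   # sum over strata of the hypergeometric variance of cell a
    for (a, b), (c, d) in tables:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum this small carries no information
        diff += a - (a + b) * (a + c) / n
        var += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    if var == 0.0:
        raise ValueError("no informative strata")
    dev = abs(diff)
    if correct:
        # Continuity correction, clamped so it cannot create a deviation.
        dev = max(dev - 0.5, 0.0)
    stat = dev * dev / var
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Made-up allele counts: within each cluster, cases and controls carry
# the same allele frequency, so there is no true association.
clusters = [
    ((640, 160), (160, 40)),   # cluster 1: allele frequency 0.8, mostly cases
    ((40, 160), (160, 640)),   # cluster 2: allele frequency 0.2, mostly controls
]

# Ignoring the structure: add the two tables cell by cell.
pooled = [(
    (clusters[0][0][0] + clusters[1][0][0], clusters[0][0][1] + clusters[1][0][1]),
    (clusters[0][1][0] + clusters[1][1][0], clusters[0][1][1] + clusters[1][1][1]),
)]

pooled_stat, pooled_p = mantel_haenszel(pooled)
mh_stat, mh_p = mantel_haenszel(clusters)
print("pooled:", pooled_stat, pooled_p)
print("stratified MH:", mh_stat, mh_p)
```

With these counts the pooled table yields a very large statistic, which is a spurious association produced purely by the mixture, whereas the stratified statistic is zero. In practice the cluster labels would come from the clustering step rather than being known in advance.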
Cite this article
Nakamura, T., Shoji, A., Fujisawa, H. et al. Cluster analysis and association study of structured multilocus genotype data. J Hum Genet 50, 53–61 (2005). https://doi.org/10.1007/s10038-004-0220-x


