Abstract
An algorithm is proposed for estimating haplotypes associated with several quantitative phenotypes. The algorithm draws on the concept of a receiver operating characteristic (ROC) curve and considers a linear combination of the quantitative phenotypic values. The resulting set of values is divided into two parts: values for subjects with a particular haplotype and values for subjects without it. The goodness of this partition is evaluated by the area under the ROC curve (AUC). The AUC ranges from 0 to 1 and is close to 1 when the partition is highly accurate. The strength of association between phenotypes and haplotypes was therefore considered to be proportional to the AUC value. In our algorithm, the parameters representing the degree of association between the haplotypes and phenotypes are estimated so as to maximize the AUC; the haplotype with the maximum AUC is then considered to be the haplotype most strongly associated with the phenotypes. The algorithm was implemented in the R language. Its effectiveness was evaluated by applying it to real genotype data for the calpain-10 gene obtained from diabetics. The results showed that our algorithm was more reasonable and advantageous for use with several quantitative phenotypes than the generalized linear model or the neural network model.
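The idea in the abstract can be illustrated with a minimal sketch. It is written in Python rather than the R used by the authors, and it is not their implementation. It assumes exactly two phenotypes, known haplotype carrier status for each subject, and a simple grid search over unit-length weight vectors in place of whatever optimizer the paper uses. The function names (`auc`, `best_direction`), the haplotype labels and the toy data are all invented for illustration.

```python
import math


def auc(scores, carrier):
    """Empirical AUC: the fraction of (carrier, non-carrier) pairs in which
    the carrier has the higher score; ties count as one half."""
    pos = [s for s, c in zip(scores, carrier) if c]
    neg = [s for s, c in zip(scores, carrier) if not c]
    if not pos or not neg:
        raise ValueError("need at least one subject in each group")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def best_direction(phenotypes, carrier, steps=360):
    """Grid search (an assumption, not the paper's optimizer) over
    unit-length weights (w1, w2) for the combination w1*x + w2*y.
    Returns (best AUC, weights achieving it)."""
    best_auc, best_w = -1.0, None
    for k in range(steps):
        theta = 2.0 * math.pi * k / steps
        w = (math.cos(theta), math.sin(theta))
        scores = [w[0] * x + w[1] * y for x, y in phenotypes]
        a = auc(scores, carrier)
        if a > best_auc:
            best_auc, best_w = a, w
    return best_auc, best_w


# Invented toy data: two phenotype values per subject.
phenotypes = [(1.0, 5.0), (2.0, 4.0), (3.0, 6.0),
              (4.0, 3.0), (5.0, 5.5), (6.0, 4.5)]

# Hypothetical carrier status of each subject for two candidate haplotypes.
carriers_by_haplotype = {
    "H1": [False, False, False, True, True, True],
    "H2": [True, False, False, True, False, False],
}

# Maximize the AUC per haplotype, then keep the haplotype with the largest AUC.
results = {h: best_direction(phenotypes, c)
           for h, c in carriers_by_haplotype.items()}
best_haplotype = max(results, key=lambda h: results[h][0])
```

The AUC is computed here as the Mann-Whitney pair-counting statistic, which equals the area under the empirical ROC curve. Real data would also need haplotype phase uncertainty to be handled, which this sketch ignores.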
Acknowledgements
This work was supported by grants from the New Energy and Industrial Technology Development Organization.
Cite this article
Kamitsuji, S., Kamatani, N. Estimation of haplotype associated with several quantitative phenotypes based on maximization of area under a receiver operating characteristic (ROC) curve. J Hum Genet 51, 314–325 (2006). https://doi.org/10.1007/s10038-006-0363-z
DOI: https://doi.org/10.1007/s10038-006-0363-z