Abstract
Identifying the gene underlying a human genetic disorder can be difficult and time-consuming. Typically, positional data delimit a chromosomal region that contains between 20 and 200 genes. The choice then lies between sequencing large numbers of genes and setting priorities by combining positional data with the expression and phenotype data available in various internet databases. This examination of positional candidates for functional clues can be performed in many different ways, depending on the investigator's knowledge and experience. Here, we report a new tool, the GeneSeeker, which automatically gathers and combines positional data and expression/phenotypic data from nine different web-based databases. The result is a quick overview of interesting candidate genes in the region of interest. The GeneSeeker system is modular, so databases can easily be added or removed as required. Databases are searched directly through the web, which obviates the need for data warehousing. To evaluate the GeneSeeker tool, we analysed syndromes whose causative gene is known. For each of 10 syndromes, the GeneSeeker programme generated a shortlist that contained far fewer candidate genes than the critical region as a whole, yet still contained the causative gene. On average, a list of 163 genes based on position alone was reduced to a more manageable list of 22 genes based on position combined with expression or phenotype information. We are currently expanding the tool by adding other databases. The GeneSeeker is available via its web interface (http://www.cmbi.kun.nl/GeneSeeker/).
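The combination step described in the abstract can be illustrated with a minimal sketch. This is not the GeneSeeker's actual implementation: the function names (`table_source`, `combine`), the gene symbols, and the lookup tables below are all hypothetical, and in-memory dictionaries stand in for the web databases that the real tool queries. The sketch assumes only what the abstract states, namely that a gene is shortlisted when it lies in the region of interest and also matches expression or phenotype information, and that sources can be added or removed independently.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

# A "source" maps one query string to a set of gene symbols.
# Hypothetical stand-in for a single web database.
Source = Callable[[str], Set[str]]


def table_source(table: Dict[str, List[str]]) -> Source:
    """Wrap an in-memory table (lowercase keys) as a queryable source."""
    def lookup(query: str) -> Set[str]:
        return set(table.get(query.lower(), ()))
    return lookup


def combine(
    position_sources: Iterable[Source],
    function_sources: Iterable[Source],
    region: str,
    terms: Iterable[str],
) -> Tuple[Set[str], Set[str]]:
    """Return (positional candidates, shortlist).

    Positional candidates: union over all position sources.
    Shortlist: positional candidates that are also returned by at least
    one expression/phenotype source for at least one search term.
    """
    positional: Set[str] = set()
    for src in position_sources:
        positional |= src(region)

    functional: Set[str] = set()
    term_list = list(terms)
    for src in function_sources:
        for term in term_list:
            functional |= src(term)

    return positional, positional & functional


# Toy data: gene symbols and the region label are made up for illustration.
position_db = table_source({"3q27": ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]})
expression_db = table_source({"limb": ["GENE_B", "GENE_X"]})
phenotype_db = table_source({"ectrodactyly": ["GENE_C"]})

positional, shortlist = combine(
    [position_db],
    [expression_db, phenotype_db],
    region="3q27",
    terms=["limb", "ectrodactyly"],
)
print(sorted(positional))
print(sorted(shortlist))
```

In this sketch, adding or removing a database amounts to changing the lists passed to `combine`, which mirrors the modular design the abstract describes. For scale, the averages reported above (163 positional candidates reduced to 22) correspond to a reduction of about 86.5%.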
Acknowledgements
Partly supported by grants from N.W.O./Unilever (grant number 326756 to JAM Leunissen) and from the Irene kinderziekenhuis Foundation (to HG Brunner).
Cite this article
van Driel, M., Cuelenaere, K., Kemmeren, P. et al. A new web-based data mining tool for the identification of candidate genes for human genetic disorders. Eur J Hum Genet 11, 57–63 (2003). https://doi.org/10.1038/sj.ejhg.5200918
DOI: https://doi.org/10.1038/sj.ejhg.5200918