Abstract
We sequenced the genome of a Pakistani male at 25.5x coverage using massively parallel sequencing technology. More than 90% of the sequence reads were mapped to the human reference genome. In subsequent analysis, we identified 3 224 311 single-nucleotide polymorphisms (SNPs), of which 388 532 (12% of the total SNPs) had not been previously recorded in the Single Nucleotide Polymorphism database (dbSNP) or the 1000 Genomes Project database. The 5991 non-synonymous coding variants were screened for deleterious or disease-associated SNPs. Analysis of genes with deleterious SNPs identified ‘retinoic acid signaling’ and ‘regulation of transcription’ as the enriched Gene Ontology (GO) terms. Scanning of non-synonymous SNPs against the Online Mendelian Inheritance in Man (OMIM) database revealed several disease- and phenotype-associated variants in the Pakistani genome. Comparative analysis with an Indian genome sequence revealed >1.8 million shared SNPs, 32% of which were annotated in ∼14 000 genes. GO term analysis of these genes identified ‘response to jasmonic acid stimulus’, ‘aminoglycoside antibiotic metabolic process’ and ‘glycoside metabolic process’ as considerably enriched. A total of 59 558 small indels (1–5 bp) and 16 063 large structural variations were found, 54% of which were novel. The substantial number of novel structural variations discovered in the Pakistani genome reinforces previous inferences that (a) structural variations are a major type of variation in the genome and (b) compared with SNPs, they putatively have equivalent or greater functional roles. This genome sequence information will be an important reference for population-wide genomics studies of the ethnically diverse South Asian subcontinent.
Ethics declarations
Competing interests
The authors declare no conflict of interest.
Additional information
Author contributions
Conceived and designed the experiments: MKA, MIC, YZ. Performed the experiments: XS, RL, HA. Analyzed the data: MKA, CY, ZY, AK. Wrote the paper: MKA, CY, ZY, AK, YZ.
Supplementary Information accompanies the paper on the Journal of Human Genetics website.
About this article
Cite this article
Azim, M., Yang, C., Yan, Z. et al. Complete genome sequencing and variant analysis of a Pakistani individual. J Hum Genet 58, 622–626 (2013). https://doi.org/10.1038/jhg.2013.72
DOI: https://doi.org/10.1038/jhg.2013.72
This article is cited by
- Whole genome sequencing data of multiple individuals of Pakistani descent. Scientific Data (2020)
- Whole genome analysis of a Vietnamese trio. Journal of Biosciences (2015)


