  • Analysis

Structural variation in two human genomes mapped at single-nucleotide resolution by whole genome de novo assembly

Abstract

Here we use whole-genome de novo assembly of second-generation sequencing reads to map structural variation (SV) in an Asian genome and an African genome. Our approach identifies small- and intermediate-size homozygous variants (1–50 kb), including insertions, deletions and inversions, together with their precise breakpoints, and, in contrast to other methods, can resolve complex rearrangements. In total, we identified 277,243 SVs ranging in length from 1–23 kb. Validation using computational and experimental methods suggests that we achieve an overall false-positive rate of <6% and a false-negative rate of <10% in genomic regions that can be assembled, which outperforms other methods. Analysis of the SVs in the genomes of 106 individuals sequenced as part of the 1000 Genomes Project suggests that SVs account for a greater fraction of the diversity between individuals than do single-nucleotide polymorphisms (SNPs). These findings demonstrate that whole-genome de novo assembly is a feasible approach to deriving more comprehensive maps of genetic variation.
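The claim that SVs account for a greater fraction of between-individual diversity than SNPs depends on how "fraction of the diversity" is measured, and this page does not say. A minimal sketch, assuming diversity is counted as the number of nucleotides affected by each class of variant (one SNP contributes 1 bp, an SV contributes its length), shows how a modest number of SVs can outweigh many SNPs. The `Variant` record and `diversity_fractions` function are hypothetical names for illustration, and the toy counts are not data from the paper.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class Variant:
    """Hypothetical variant record used only for this sketch."""
    kind: str    # "SNP" or "SV" (insertion, deletion, inversion, ...)
    length: int  # nucleotides affected; 1 for a SNP


def diversity_fractions(variants: Iterable[Variant]) -> Dict[str, float]:
    """Share of all variant-affected nucleotides contributed by SNPs and by SVs.

    Assumption: diversity is measured in affected base pairs, not in event counts.
    """
    totals = {"SNP": 0, "SV": 0}
    for v in variants:
        if v.kind not in totals:
            raise ValueError(f"unknown variant kind: {v.kind!r}")
        totals[v.kind] += v.length
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {kind: 0.0 for kind in totals}
    return {kind: n / grand_total for kind, n in totals.items()}


if __name__ == "__main__":
    # Toy input, not data from the paper: 1,000 SNPs and 40 SVs of 50 bp each.
    toy = [Variant("SNP", 1)] * 1000 + [Variant("SV", 50)] * 40
    shares = diversity_fractions(toy)
    print(f"SNP {shares['SNP']:.3f}, SV {shares['SV']:.3f}")  # SNP 0.333, SV 0.667
```

Counting events instead of bases would reverse the conclusion in this toy case (1,000 SNPs versus 40 SVs), which is why the metric behind such a comparison needs to be stated explicitly.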

Figure 1: Mapping structural variation using whole-genome de novo assembly.
Figure 2: Simulation details.
Figure 3: Canonical structural variation profiles of genes and Alu elements in YH (red) and NA18507 (blue) genomes.
Figure 4: Selection pattern of structural variations.
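Figure 1 concerns mapping SVs by whole-genome de novo assembly. As a rough illustration of the general idea only (the authors' own pipeline is described in the full text and released as Supplementary Data Set 1), once an assembled scaffold has been aligned to the reference with a pairwise genomic aligner such as BLASTZ (ref. 33), candidate breakpoints can be read off the gaps between consecutive gap-free alignment blocks: a gap on the reference side only suggests a deletion in the sample, a gap on the scaffold side only suggests an insertion, and a reverse-strand block between two forward-strand blocks suggests an inversion. The `Block` record, the `call_svs` function and the 0-based half-open coordinate convention are assumptions made for this sketch.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Block:
    """Hypothetical gap-free alignment block of one scaffold against the reference.

    Coordinates are 0-based, half-open and ascending on both sequences;
    strand is '+' or '-' (scaffold orientation relative to the reference).
    """
    ref_start: int
    ref_end: int
    qry_start: int
    qry_end: int
    strand: str = "+"


def call_svs(blocks: List[Block]) -> List[Tuple[str, int, int]]:
    """Classify SVs from neighbouring alignment blocks of a single scaffold.

    Returns (type, reference position, length) tuples. Simplified sketch:
    it ignores repeats, heterozygosity and multi-scaffold evidence.
    """
    calls = []
    ordered = sorted(blocks, key=lambda b: b.qry_start)
    for prev, cur in zip(ordered, ordered[1:]):
        if prev.strand != "+" or cur.strand != "+":
            continue  # orientation changes are handled by the inversion pass below
        ref_gap = cur.ref_start - prev.ref_end
        qry_gap = cur.qry_start - prev.qry_end
        if ref_gap > 0 and qry_gap == 0:
            calls.append(("deletion", prev.ref_end, ref_gap))
        elif qry_gap > 0 and ref_gap == 0:
            calls.append(("insertion", prev.ref_end, qry_gap))
        elif ref_gap != 0 or qry_gap != 0:
            # Gaps (or overlaps) on both sequences: flag as complex for review.
            calls.append(("complex", prev.ref_end, max(abs(ref_gap), abs(qry_gap))))
    for left, mid, right in zip(ordered, ordered[1:], ordered[2:]):
        if left.strand == "+" and mid.strand == "-" and right.strand == "+":
            calls.append(("inversion", mid.ref_start, mid.ref_end - mid.ref_start))
    return sorted(calls, key=lambda c: c[1])


if __name__ == "__main__":
    # Hypothetical scaffold: 300 bp missing after ref 1,000; 120 bp extra after ref 5,000.
    blocks = [
        Block(0, 1000, 0, 1000),
        Block(1300, 5000, 1000, 4700),
        Block(5000, 8000, 4820, 7820),
    ]
    for call in call_svs(blocks):
        print(call)  # ('deletion', 1000, 300) then ('insertion', 5000, 120)
```

Because each call comes from the exact block boundaries, the breakpoints are reported at single-base resolution, which is the property the title emphasises; real alignments additionally need handling of repeats and ambiguous block chaining that this sketch omits.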

Accession codes

GenBank/EMBL/DDBJ

Sequence Read Archive

References

  1. Lander, E.S. et al. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).

  2. Venter, J.C. et al. The sequence of the human genome. Science 291, 1304–1351 (2001).

  3. International Human Genome Sequencing Consortium. Finishing the euchromatic sequence of the human genome. Nature 431, 931–945 (2004).

  4. Hinds, D.A. et al. Whole-genome patterns of common DNA variation in three human populations. Science 307, 1072–1079 (2005).

  5. Stefansson, H. et al. A common inversion under selection in Europeans. Nat. Genet. 37, 129–137 (2005).

  6. Ben-Shachar, S. et al. 22q11.2 distal deletion: a recurrent genomic disorder distinct from DiGeorge syndrome and velocardiofacial syndrome. Am. J. Hum. Genet. 82, 214–221 (2008).

  7. Futreal, P.A. et al. A census of human cancer genes. Nat. Rev. Cancer 4, 177–183 (2004).

  8. The Cancer Genome Atlas Research Network. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature 455, 1061–1068 (2008).

  9. Mitelman, F., Johansson, B. & Mertens, F. The impact of translocations and gene fusions on cancer causation. Nat. Rev. Cancer 7, 233–245 (2007).

  10. Frazer, K.A., Murray, S.S., Schork, N.J. & Topol, E.J. Human genetic variation and its contribution to complex traits. Nat. Rev. Genet. 10, 241–251 (2009).

  11. The International HapMap Consortium. A haplotype map of the human genome. Nature 437, 1299–1320 (2005).

  12. Chanock, S. High marks for GWAS. Nat. Genet. 41, 765–766 (2009).

  13. Hirschhorn, J.N. & Daly, M.J. Genome-wide association studies for common diseases and complex traits. Nat. Rev. Genet. 6, 95–108 (2005).

  14. Campbell, P.J. et al. Identification of somatically acquired rearrangements in cancer using genome-wide massively parallel paired-end sequencing. Nat. Genet. 40, 722–729 (2008).

  15. Tuzun, E. et al. Fine-scale structural variation of the human genome. Nat. Genet. 37, 727–732 (2005).

  16. Korbel, J.O. et al. Paired-end mapping reveals extensive structural variation in the human genome. Science 318, 420–426 (2007).

  17. Kidd, J.M. et al. Mapping and sequencing of structural variation from eight human genomes. Nature 453, 56–64 (2008).

  18. Redon, R. et al. Global variation in copy number in the human genome. Nature 444, 444–454 (2006).

  19. Bentley, D.R. et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456, 53–59 (2008).

  20. Chen, K. et al. BreakDancer: an algorithm for high-resolution mapping of genomic structural variation. Nat. Methods 6, 677–681 (2009).

  21. Lam, H.Y. et al. Nucleotide-resolution analysis of structural variants using BreakSeq and a breakpoint library. Nat. Biotechnol. 28, 47–55 (2010).

  22. Conrad, D.F. et al. Origins and functional impact of copy number variation in the human genome. Nature 464, 704–712 (2010).

  23. Pang, A.W. et al. Towards a comprehensive structural variation map of an individual human genome. Genome Biol. 11, R52 (2010).

  24. Hormozdiari, F., Alkan, C., Eichler, E.E. & Sahinalp, S.C. Combinatorial algorithms for structural variation detection in high-throughput sequenced genomes. Genome Res. 19, 1270–1278 (2009).

  25. Wong, K., Keane, T.M., Stalker, J. & Adams, D.J. Enhanced structural variant and breakpoint detection using SVMerge by integration of multiple detection methods and local assembly. Genome Biol. 11, R128 (2010).

  26. Levy, S. et al. The diploid genome sequence of an individual human. PLoS Biol. 5, e254 (2007).

  27. Simpson, J.T. et al. ABySS: a parallel assembler for short read sequence data. Genome Res. 19, 1117–1123 (2009).

  28. Zerbino, D.R. & Birney, E. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 18, 821–829 (2008).

  29. Li, R. et al. De novo assembly of human genomes with massively parallel short read sequencing. Genome Res. 20, 265–272 (2010).

  30. Gnerre, S. et al. High-quality draft assemblies of mammalian genomes from massively parallel sequence data. Proc. Natl. Acad. Sci. USA 108, 1513–1518 (2011).

  31. The 1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature 467, 1061–1073 (2010).

  32. Harris, R.S. Improved pairwise alignment of genomic DNA. PhD thesis, Penn State Univ. (2007).

  33. Schwartz, S. et al. Human-mouse alignments with BLASTZ. Genome Res. 13, 103–107 (2003).

  34. Li, R. et al. De novo assembly of human genomes with massively parallel short read sequencing. Genome Res. 20, 265–272 (2010).

  35. Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).

  36. McKernan, K.J. et al. Sequence and structural variation in a human genome uncovered by short-read, massively parallel ligation sequencing using two-base encoding. Genome Res. 19, 1527–1541 (2009).

  37. Wang, J. et al. The diploid genome sequence of an Asian individual. Nature 456, 60–65 (2008).

  38. Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinformatics, chapter 4, unit 4.10 (Wiley, 2009).

  39. Alkan, C., Sajjadian, S. & Eichler, E.E. Limitations of next-generation genome sequence assembly. Nat. Methods 8, 61–65 (2011).

  40. Alkan, C., Coe, B.P. & Eichler, E.E. Genome structural variation discovery and genotyping. Nat. Rev. Genet. 12, 363–376 (2011).

  41. Kidd, J.M. et al. Characterization of missing human genome sequences and copy-number polymorphic insertions. Nat. Methods 7, 365–371 (2010).

  42. Ye, K., Schulz, M.H., Long, Q., Apweiler, R. & Ning, Z. Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics 25, 2865–2871 (2009).

  43. Feuk, L., Carson, A.R. & Scherer, S.W. Structural variation in the human genome. Nat. Rev. Genet. 7, 85–97 (2006).

  44. Li, R. et al. Building the sequence map of the human pan-genome. Nat. Biotechnol. 28, 57–63 (2010).

  45. Lam, H.Y. et al. Nucleotide-resolution analysis of structural variants using BreakSeq and a breakpoint library. Nat. Biotechnol. 28, 47–55 (2010).

  46. Travers, A.A. & Klug, A. The bending of DNA in nucleosomes and its wider implications. Phil. Trans. R. Soc. Lond. B 317, 537–561 (1987).

  47. Chen, F.C., Chen, C.J., Li, W.H. & Chuang, T.J. Human-specific insertions and deletions inferred from mammalian genome sequences. Genome Res. 17, 16–22 (2007).

  48. Li, Y. et al. Resequencing of 200 human exomes identifies an excess of low-frequency non-synonymous coding variants. Nat. Genet. 42, 969–972 (2010).

  49. Kent, W.J. BLAT–the BLAST-like alignment tool. Genome Res. 12, 656–664 (2002).

  50. Li, R. et al. SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics 25, 1966–1967 (2009).

Acknowledgements

This work was supported by the National Basic Research Program of China (973 program no. 2011CB809200), the National Natural Science Foundation of China (30725008; 30890032; 30811130531; 30221004), the Chinese 863 program (2006AA02Z177; 2006AA02Z334; 2006AA02A302; 2009AA022707), the Shenzhen Municipal Government of China (grants JC200903190767A; JC200903190772A; ZYC200903240076A; CXB200903110066A; ZYC200903240077A; ZYC200903240076A and ZYC200903240080A) and the Ole Rømer grant from the Danish Natural Science Research Council. This project was also funded by the Shenzhen Municipal Government and the Local Government of Yantian District of Shenzhen. The 1000 Genomes Project Consortium provided the data for population analysis. A.I.F.B. is supported by Diabetes UK, the Wellcome Trust, the Medical Research Council and the Comprehensive Biomedical Research Centre, Imperial College Healthcare NHS Trust. Thanks to X. Wang from the School of Biosciences & Bioengineering, SCUT, for his excellent coordination, and to J. El-Sayed Moustafa for her help analyzing the experimental validation data. L. Goodman, S. Edmunds and A. Basford edited the manuscript.

Author information

Contributions

Jun W., Jian W. and H.Y. managed the project. Jun W., Y.L. and R. Luo designed the analyses. Y.L., R. Luo, R. Li, H. Zheng, H. Zhu, H.W., H.C., B.W., S.H., H.S., F.Z., H.M., S.F., A.J.d.S., A.I.F.B., W.Z., H.D., L.J.M.C., S.L., L.B. and K.K. performed the data analyses. G.T., J.L. and X.Z. performed the sequencing. Jun W., Y.L. and R. Luo wrote the paper.

Corresponding author

Correspondence to Jun Wang.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–8 and Supplementary Notes (PDF 1037 kb)

Supplementary Table 1

Primers, sequences of randomly selected structural variations and Sanger capillary sequencing results for PCR validation. (XLS 111 kb)

Supplementary Table 2

Summary of fosmid sequence validation results. (XLS 144 kb)

Supplementary Table 3

Structural variations predicted in the YH and NA18507 genomes, each compared with sets of variants discovered by alternative approaches. (XLS 17 kb)

Supplementary Table 4

Comparison between SVs detected in the YH genome and those reported by Levy et al.26 and Pang et al.23 (XLS 41 kb)

Supplementary Table 5

Classification of strongly conserved genes (dN/dS < 0.1) that contain SVs. (XLS 48 kb)

Supplementary Data Set 1

Source code (ZIP 5936 kb)

Supplementary Data Set 2

Supplementary array CGH results (TXT 38 kb)
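Comparisons like those in Supplementary Table 3, and the false-positive and false-negative rates quoted in the abstract, require a rule for deciding when two SV calls describe the same event. The matching criterion the authors used is not stated on this page; a common convention, assumed here, is 50% reciprocal overlap of deletion intervals on the reference. The sketch below (hypothetical function names `reciprocal_overlap` and `compare_call_sets`, toy intervals) computes both rates against a truth set such as a simulated genome or a validated call set.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # 0-based, half-open reference interval of a deletion


def reciprocal_overlap(a: Interval, b: Interval, min_fraction: float = 0.5) -> bool:
    """True if the shared span covers at least min_fraction of BOTH intervals."""
    overlap = min(a[1], b[1]) - max(a[0], b[0])
    if overlap <= 0:
        return False
    return (overlap >= min_fraction * (a[1] - a[0])
            and overlap >= min_fraction * (b[1] - b[0]))


def compare_call_sets(predicted: List[Interval], truth: List[Interval],
                      min_fraction: float = 0.5) -> Tuple[float, float]:
    """Return (false-positive rate, false-negative rate) of predicted vs. truth.

    FP rate = predicted calls matching no truth call / all predicted calls.
    FN rate = truth calls matched by no predicted call / all truth calls.
    Brute-force O(n*m) matching: fine for a sketch, too slow genome-wide.
    """
    unmatched_pred = sum(
        not any(reciprocal_overlap(p, t, min_fraction) for t in truth)
        for p in predicted)
    unmatched_truth = sum(
        not any(reciprocal_overlap(t, p, min_fraction) for p in predicted)
        for t in truth)
    fp_rate = unmatched_pred / len(predicted) if predicted else 0.0
    fn_rate = unmatched_truth / len(truth) if truth else 0.0
    return fp_rate, fn_rate


if __name__ == "__main__":
    # Toy intervals, not data from the paper.
    truth = [(100, 400), (1000, 1050), (5000, 9000)]
    predicted = [(110, 400), (1000, 1020), (5100, 9000), (20000, 20100)]
    fp, fn = compare_call_sets(predicted, truth)
    print(f"FP rate {fp:.2f}, FN rate {fn:.2f}")  # FP rate 0.50, FN rate 0.33
```

In the toy run, the predicted call (1000, 1020) covers only 40% of the 50 bp true deletion, so under the reciprocal rule it neither confirms that deletion nor counts as correct itself; a looser one-sided overlap rule would score it differently, which is why the threshold should be reported alongside any error rate.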

About this article

Cite this article

Li, Y., Zheng, H., Luo, R. et al. Structural variation in two human genomes mapped at single-nucleotide resolution by whole genome de novo assembly. Nat Biotechnol 29, 723–730 (2011). https://doi.org/10.1038/nbt.1904

  • DOI: https://doi.org/10.1038/nbt.1904
