Structural variation in two human genomes mapped at single-nucleotide resolution by whole genome de novo assembly

Li, Yingrui; Zheng, Hancheng; Luo, Ruibang; Wu, Honglong; Zhu, Hongmei; Li, Ruiqiang; Cao, Hongzhi; Wu, Boxin; Huang, Shujia; Shao, Haojing; Ma, Hanzhou; Zhang, Fan; Feng, Shuijian; Zhang, Wei; Du, Hongli; Tian, Geng; Li, Jingxiang; Zhang, Xiuqing; Li, Songgang; Bolund, Lars; Kristiansen, Karsten; de Smith, Adam J; Blakemore, Alexandra I F; Coin, Lachlan J M; Yang, Huanming; Wang, Jian; Wang, Jun

doi:10.1038/nbt.1904

Analysis
Published: 01 August 2011


Nature Biotechnology volume 29, pages 723–730 (2011)

Abstract

Here we use whole-genome de novo assembly of second-generation sequencing reads to map structural variation (SV) in an Asian genome and an African genome. Our approach identifies small- and intermediate-size homozygous variants (1–50 kb) including insertions, deletions, inversions and their precise breakpoints, and in contrast to other methods, can resolve complex rearrangements. In total, we identified 277,243 SVs ranging in length from 1–23 kb. Validation using computational and experimental methods suggests that we achieve overall <6% false-positive rate and <10% false-negative rate in genomic regions that can be assembled, which outperforms other methods. Analysis of the SVs in the genomes of 106 individuals sequenced as part of the 1000 Genomes Project suggests that SVs account for a greater fraction of the diversity between individuals than do single-nucleotide polymorphisms (SNPs). These findings demonstrate that whole-genome de novo assembly is a feasible approach to deriving more comprehensive maps of genetic variation.
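The abstract describes calling insertions, deletions and inversions with exact breakpoints by comparing assembled sequence against the reference. As a rough illustration only, the Python sketch below assumes each contig has already been aligned to the reference and reduced to a list of gap-free aligned blocks. The `Block` and `SVCall` types and the `call_svs` function are hypothetical names, and the junction rules are a deliberate simplification, not the published pipeline.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Block:
    """A gap-free aligned segment of one contig on the reference.

    Hypothetical structure: 0-based, half-open coordinates on both sequences.
    """
    ref_start: int
    ref_end: int
    ctg_start: int
    ctg_end: int
    strand: str  # '+' = same orientation as reference, '-' = reverse complement


@dataclass(frozen=True)
class SVCall:
    kind: str     # 'deletion', 'insertion', 'inversion' or 'complex'
    ref_pos: int  # breakpoint position on the reference
    ref_len: int  # reference bases removed or affected
    ctg_len: int  # contig bases inserted or affected


def call_svs(blocks: List[Block], min_len: int = 1) -> List[SVCall]:
    """Classify the junction between each pair of consecutive blocks of one contig."""
    ordered = sorted(blocks, key=lambda b: b.ctg_start)
    calls: List[SVCall] = []
    for a, b in zip(ordered, ordered[1:]):
        if a.strand == '+' and b.strand == '-':
            # Entering a reverse-complemented segment: report it as an inversion.
            calls.append(SVCall('inversion', b.ref_start,
                                b.ref_end - b.ref_start, b.ctg_end - b.ctg_start))
            continue
        if a.strand != '+' or b.strand != '+':
            # Leaving an inversion, or runs of '-' blocks: not handled in this sketch.
            continue
        # Unaligned bases between the two blocks on each sequence (overlaps clamp to 0).
        ref_gap = max(0, b.ref_start - a.ref_end)
        ctg_gap = max(0, b.ctg_start - a.ctg_end)
        if ref_gap >= min_len and ctg_gap >= min_len:
            # Both sequences have unaligned bases: a replacement-like complex event.
            calls.append(SVCall('complex', a.ref_end, ref_gap, ctg_gap))
        elif ref_gap >= min_len:
            # Reference bases absent from the contig.
            calls.append(SVCall('deletion', a.ref_end, ref_gap, 0))
        elif ctg_gap >= min_len:
            # Contig bases absent from the reference.
            calls.append(SVCall('insertion', a.ref_end, 0, ctg_gap))
    return calls


if __name__ == "__main__":
    # Toy contig: a 500 bp deletion, a 300 bp insertion and a 600 bp inverted segment.
    demo = [
        Block(0, 1000, 0, 1000, '+'),
        Block(1500, 3000, 1000, 2500, '+'),
        Block(3000, 4000, 2800, 3800, '+'),
        Block(4000, 4600, 3800, 4400, '-'),
        Block(4600, 6000, 4400, 5800, '+'),
    ]
    for call in call_svs(demo):
        print(call)
```

Clamping overlapping block ends to zero treats small overlaps at a junction (for example from microhomology) as a clean breakpoint, and `min_len` allows short gaps to be ignored. A real caller would also have to deal with heterozygous sites, repeats and consecutive reverse-strand blocks, which this sketch skips.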


**Figure 1: Mapping structural variation using whole-genome *de novo* assembly.**

**Figure 3: Canonical structural variation profiles of genes and *Alu* elements in YH (red) and NA18507 (blue) genomes.**

**Figure 4: Selection pattern of structural variations.**


Accession codes

Sequence Read Archive (GenBank/EMBL/DDBJ): SRA009271

References

Lander, E.S. et al. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).
Venter, J.C. et al. The sequence of the human genome. Science 291, 1304–1351 (2001).
International Human Genome Sequencing Consortium. Finishing the euchromatic sequence of the human genome. Nature 431, 931–945 (2004).
Hinds, D.A. et al. Whole-genome patterns of common DNA variation in three human populations. Science 307, 1072–1079 (2005).
Stefansson, H. et al. A common inversion under selection in Europeans. Nat. Genet. 37, 129–137 (2005).
Ben-Shachar, S. et al. 22q11.2 distal deletion: a recurrent genomic disorder distinct from DiGeorge syndrome and velocardiofacial syndrome. Am. J. Hum. Genet. 82, 214–221 (2008).
Futreal, P.A. et al. A census of human cancer genes. Nat. Rev. Cancer 4, 177–183 (2004).
The Cancer Genome Atlas Research Network. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature 455, 1061–1068 (2008).
Mitelman, F., Johansson, B. & Mertens, F. The impact of translocations and gene fusions on cancer causation. Nat. Rev. Cancer 7, 233–245 (2007).
Frazer, K.A., Murray, S.S., Schork, N.J. & Topol, E.J. Human genetic variation and its contribution to complex traits. Nat. Rev. Genet. 10, 241–251 (2009).
The International HapMap Consortium. A haplotype map of the human genome. Nature 437, 1299–1320 (2005).
Chanock, S. High marks for GWAS. Nat. Genet. 41, 765–766 (2009).
Hirschhorn, J.N. & Daly, M.J. Genome-wide association studies for common diseases and complex traits. Nat. Rev. Genet. 6, 95–108 (2005).
Campbell, P.J. et al. Identification of somatically acquired rearrangements in cancer using genome-wide massively parallel paired-end sequencing. Nat. Genet. 40, 722–729 (2008).
Tuzun, E. et al. Fine-scale structural variation of the human genome. Nat. Genet. 37, 727–732 (2005).
Korbel, J.O. et al. Paired-end mapping reveals extensive structural variation in the human genome. Science 318, 420–426 (2007).
Kidd, J.M. et al. Mapping and sequencing of structural variation from eight human genomes. Nature 453, 56–64 (2008).
Redon, R. et al. Global variation in copy number in the human genome. Nature 444, 444–454 (2006).
Bentley, D.R. et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature 456, 53–59 (2008).
Chen, K. et al. BreakDancer: an algorithm for high-resolution mapping of genomic structural variation. Nat. Methods 6, 677–681 (2009).
Lam, H.Y. et al. Nucleotide-resolution analysis of structural variants using BreakSeq and a breakpoint library. Nat. Biotechnol. 28, 47–55 (2010).
Conrad, D.F. et al. Origins and functional impact of copy number variation in the human genome. Nature 464, 704–712 (2010).
Pang, A.W. et al. Towards a comprehensive structural variation map of an individual human genome. Genome Biol. 11, R52 (2010).
Hormozdiari, F., Alkan, C., Eichler, E.E. & Sahinalp, S.C. Combinatorial algorithms for structural variation detection in high-throughput sequenced genomes. Genome Res. 19, 1270–1278 (2009).
Wong, K., Keane, T.M., Stalker, J. & Adams, D.J. Enhanced structural variant and breakpoint detection using SVMerge by integration of multiple detection methods and local assembly. Genome Biol. 11, R128 (2010).
Levy, S. et al. The diploid genome sequence of an individual human. PLoS Biol. 5, e254 (2007).
Simpson, J.T. et al. ABySS: a parallel assembler for short read sequence data. Genome Res. 19, 1117–1123 (2009).
Zerbino, D.R. & Birney, E. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 18, 821–829 (2008).
Li, R. et al. De novo assembly of human genomes with massively parallel short read sequencing. Genome Res. 20, 265–272 (2010).
Gnerre, S. et al. High-quality draft assemblies of mammalian genomes from massively parallel sequence data. Proc. Natl. Acad. Sci. USA 108, 1513–1518 (2011).
The 1000 Genomes Project Consortium. A map of human genome variation from population-scale sequencing. Nature 467, 1061–1073 (2010).
Harris, R.S. Improved pairwise alignment of genomic DNA. PhD thesis, Penn State Univ. (2007).
Schwartz, S. et al. Human-mouse alignments with BLASTZ. Genome Res. 13, 103–107 (2003).
Li, R. et al. De novo assembly of human genomes with massively parallel short read sequencing. Genome Res. 20, 265–272 (2010).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
McKernan, K.J. et al. Sequence and structural variation in a human genome uncovered by short-read, massively parallel ligation sequencing using two-base encoding. Genome Res. 19, 1527–1541 (2009).
Wang, J. et al. The diploid genome sequence of an Asian individual. Nature 456, 60–65 (2008).
Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinformatics, chapter 4, unit 4.10 (Wiley, 2009).
Alkan, C., Sajjadian, S. & Eichler, E.E. Limitations of next-generation genome sequence assembly. Nat. Methods 8, 61–65 (2011).
Alkan, C., Coe, B.P. & Eichler, E.E. Genome structural variation discovery and genotyping. Nat. Rev. Genet. 12, 363–376 (2011).
Kidd, J.M. et al. Characterization of missing human genome sequences and copy-number polymorphic insertions. Nat. Methods 7, 365–371 (2010).
Ye, K., Schulz, M.H., Long, Q., Apweiler, R. & Ning, Z. Pindel: a pattern growth approach to detect break points of large deletions and medium sized insertions from paired-end short reads. Bioinformatics 25, 2865–2871 (2009).
Feuk, L., Carson, A.R. & Scherer, S.W. Structural variation in the human genome. Nat. Rev. Genet. 7, 85–97 (2006).
Li, R. et al. Building the sequence map of the human pan-genome. Nat. Biotechnol. 28, 57–63 (2010).
Lam, H.Y. et al. Nucleotide-resolution analysis of structural variants using BreakSeq and a breakpoint library. Nat. Biotechnol. 28, 47–55 (2010).
Travers, A.A. & Klug, A. The bending of DNA in nucleosomes and its wider implications. Phil. Trans. R. Soc. Lond. B 317, 537–561 (1987).
Chen, F.C., Chen, C.J., Li, W.H. & Chuang, T.J. Human-specific insertions and deletions inferred from mammalian genome sequences. Genome Res. 17, 16–22 (2007).
Li, Y. et al. Resequencing of 200 human exomes identifies an excess of low-frequency non-synonymous coding variants. Nat. Genet. 42, 969–972 (2010).
Kent, W.J. BLAT–the BLAST-like alignment tool. Genome Res. 12, 656–664 (2002).
Li, R. et al. SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics 25, 1966–1967 (2009).


Acknowledgements

This work was supported by the National Basic Research Program of China (973 program no. 2011CB809200), the National Natural Science Foundation of China (30725008; 30890032; 30811130531; 30221004), the Chinese 863 program (2006AA02Z177; 2006AA02Z334; 2006AA02A302; 2009AA022707), the Shenzhen Municipal Government of China (grants JC200903190767A; JC200903190772A; ZYC200903240076A; CXB200903110066A; ZYC200903240077A; ZYC200903240076A and ZYC200903240080A) and the Ole Rømer grant from the Danish Natural Science Research Council. This project was also funded by the Shenzhen Municipal Government and the Local Government of Yantian District of Shenzhen. The 1000 Genomes Project Consortium provided the data for the population analysis. A.I.F.B. is supported by Diabetes UK, the Wellcome Trust, the Medical Research Council and the Comprehensive Biomedical Research Centre, Imperial College Healthcare NHS Trust. We thank X. Wang of the School of Biosciences & Bioengineering, SCUT, for his excellent coordination, and J. El-Sayed Moustafa for her help analyzing the experimental validation data. L. Goodman, S. Edmunds and A. Basford edited the manuscript.

Author information

Yingrui Li, Hancheng Zheng, Ruibang Luo and Honglong Wu: These authors contributed equally to this work.

Authors and Affiliations

BGI-Shenzhen, Shenzhen, China
Yingrui Li, Hancheng Zheng, Ruibang Luo, Honglong Wu, Hongmei Zhu, Ruiqiang Li, Hongzhi Cao, Boxin Wu, Shujia Huang, Haojing Shao, Hanzhou Ma, Fan Zhang, Shuijian Feng, Wei Zhang, Geng Tian, Jingxiang Li, Xiuqing Zhang, Songgang Li, Lars Bolund, Karsten Kristiansen, Huanming Yang, Jian Wang & Jun Wang
School of Bioscience and Biotechnology, South China University of Technology, Guangzhou, China
Ruibang Luo, Shujia Huang, Haojing Shao, Hanzhou Ma, Fan Zhang & Hongli Du
Department of Computer Science, The University of Hong Kong, Hong Kong, China
Ruibang Luo
Genome Research Institute, Shenzhen University Medical School, Shenzhen, China
Honglong Wu & Hongzhi Cao
Institute of Human Genetics, University of Aarhus, Aarhus, Denmark
Lars Bolund
Department of Biology, University of Copenhagen, Copenhagen, Denmark
Karsten Kristiansen & Jun Wang
Department of Genomics of Common Disease, School of Public Health, Imperial College London, London, UK
Adam J de Smith & Alexandra I F Blakemore
Department of Epidemiology and Biostatistics, School of Public Health, Imperial College, London, UK
Lachlan J M Coin
The Novo Nordisk Foundation Center for Basic Metabolic Research, University of Copenhagen, Copenhagen, Denmark
Jun Wang


Contributions

Jun W., Jian W. and H.Y. managed the project. Jun W., Y.L. and R. Luo designed the analyses. Y.L., R. Luo, R. Li, H. Zheng, H. Zhu, H.W., H.C., B.W., S.H., H.S., F.Z., H.M., S.F., A.J.d.S., A.I.F.B., W.Z., H.D., L.J.M.C., S.L., L.B. and K.K. performed the data analyses. G.T., J.L. and X.Z. performed the sequencing. Jun W., Y.L. and R. Luo wrote the paper.

Corresponding author

Correspondence to Jun Wang.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–8 and Supplementary Notes (PDF 1037 kb)

Supplementary Table 1

Primers, sequences of randomly selected structural variations and Sanger capillary sequencing results for PCR validation. (XLS 111 kb)

Supplementary Table 2

Summary of Fosmid sequences validation results. (XLS 144 kb)

Supplementary Table 3

Comparison of the structural variations predicted in the YH and NA18507 genomes with sets of variants discovered by alternative approaches. (XLS 17 kb)

Supplementary Table 4

Comparison between SVs detected in YH genome, Levy et al.⁶ and Pang et al.⁷ (XLS 41 kb)

Supplementary Table 5

Classification of those strongly conserved (dN/dS ⩽ 0.1) genes containing SVs. (XLS 48 kb)

Supplementary Data Set 1

Source code (ZIP 5936 kb)

Supplementary Data Set 2

Supplementary array CGH results (TXT 38 kb)


About this article

Cite this article

Li, Y., Zheng, H., Luo, R. et al. Structural variation in two human genomes mapped at single-nucleotide resolution by whole genome de novo assembly. Nat Biotechnol 29, 723–730 (2011). https://doi.org/10.1038/nbt.1904


Received: 04 March 2011
Accepted: 03 June 2011
Published: 01 August 2011
Issue date: August 2011
DOI: https://doi.org/10.1038/nbt.1904
