  • Analysis

Comprehensive characterization of complex structural variations in cancer by directly comparing genome sequence reads

Abstract

The development of high-throughput sequencing technologies has advanced our understanding of cancer. However, characterizing somatic structural variants in tumor genomes is still challenging because current strategies depend on the initial alignment of reads to a reference genome. Here, we describe SMUFIN (somatic mutation finder), a single program that directly compares sequence reads from normal and tumor genomes to accurately identify and characterize a range of somatic sequence variation, from single-nucleotide variants (SNVs) to large structural variants, at base-pair resolution. Performance tests on modeled tumor genomes showed average sensitivities of 92% and 74% for SNVs and structural variants, with specificities of 95% and 91%, respectively. Analyses of aggressive forms of solid and hematological tumors revealed that SMUFIN identifies breakpoints associated with chromothripsis and chromoplexy with high specificity. SMUFIN provides an integrated solution for the accurate, fast and comprehensive characterization of somatic sequence variation in cancer.
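The idea of comparing normal and tumor reads directly, without first aligning them to a reference, can be illustrated with a toy sketch. This is not SMUFIN's actual algorithm, which the abstract does not detail; it is a simplified k-mer comparison under stated assumptions. The function names `kmers` and `tumor_specific_reads`, the value of `k`, and the example reads are all hypothetical.

```python
from typing import Iterable, List, Set


def kmers(read: str, k: int) -> Set[str]:
    """Return the set of all length-k substrings of a read."""
    return {read[i:i + k] for i in range(len(read) - k + 1)}


def tumor_specific_reads(normal_reads: Iterable[str],
                         tumor_reads: Iterable[str],
                         k: int = 5) -> List[str]:
    """Return tumor reads holding at least one k-mer never seen in normal reads.

    Illustrative assumption only: real data would also require handling of
    sequencing errors, reverse complements and coverage thresholds.
    """
    normal_kmers: Set[str] = set()
    for read in normal_reads:
        normal_kmers |= kmers(read, k)
    # A read is a candidate if some of its k-mers are absent from the normal set.
    return [read for read in tumor_reads if kmers(read, k) - normal_kmers]


# Hypothetical toy reads; the second tumor read carries a C>T substitution.
normal = ["ACGTACGTAC", "CGTACGTACG"]
tumor = ["ACGTACGTAC", "ACGTATGTAC"]
candidates = tumor_specific_reads(normal, tumor, k=5)
print(candidates)
```

In this sketch, reads shared by both samples drop out, and only reads carrying sequence unique to the tumor remain as candidates for further characterization.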

Figure 1: SMUFIN.
Figure 2: Large structural variation in pediatric medulloblastoma tumor MB1.
Figure 3: Identification and validation of chromoplexy in mantle cell lymphoma tumor M003.

Acknowledgements

The ICGC-CLL Genome Consortium is funded by the Spanish Ministry of Economy and Competitiveness (MINECO) through the Instituto de Salud Carlos III (ISCIII), Red Temática de Investigación del Cáncer (RTICC) of the ISCIII (RD12/0036/0036) and National Institute of Bioinformatics (INB). This study was also supported by Ministerio de Economía y Competitividad, Secretaría De Estado De Investigación, Desarrollo e Innovación PLAN NACIONAL de I+D+i 2008-2011, Subprograma de Apoyo a Centros y Unidades de excelencia Severo Ochoa; and Plan Nacional SAF12/38432; Generalitat de Catalunya AGAUR 2009-SGR-992; Fondo de Investigaciones Sanitarias (PI11/01177); Association for International Cancer Research (12-0142). J.O.K. and M.S.W. were supported by the European Commission (Health-F2-2010-260791). C.L.-O. is an investigator of the Botin Foundation. E.C. and M.O. are ICREA Academia Researchers. We also thank S. Guijarro and C. Gómez for their excellent technical assistance.

Author information

Contributions

V.M., S.G. and D.T. conceived and designed the study. L.O.A., L.M., M.P., J.L.G., R.R. and M.O. performed data analysis. S.B., I.S., C.R., A.N., E.C. and I.G.G. generated and experimentally validated the MCL samples. M.S.-W., A.M.S. and J.O.K. generated and experimentally validated the MB1 sample. C.L.-O., X.S.P., E.C. and D.T. wrote the manuscript; and D.T. supervised the whole study.

Corresponding author

Correspondence to David Torrents.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–5 and Supplementary Tables 2 and 4 (PDF 4689 kb)

Supplementary Table 1 (XLSX)

List of somatic modifications affecting the in silico genome (XLSX 951 kb)

Supplementary Table 3 (XLSX)

Results of SMUFIN and other methods for the in silico validation analysis (XLSX 5687 kb)

Supplementary Table 5 (XLSX)

Results of SMUFIN on the Mantle Lymphoma M004 (XLSX 773 kb)

Supplementary Table 6 (XLSX)

Validation sequences of M003 large structural variants. The identifiers shown here correspond to those in Supplementary Table 8. (XLSX 23 kb)

Supplementary Table 7 (XLSX)

Results of SMUFIN on the Medulloblastoma MB1 (XLSX 208 kb)

Supplementary Table 8

Results of SMUFIN on the Mantle Lymphoma M003 (XLSX 194 kb)

Supplementary Source Code

Compressed ZIP file with SMUFIN's source code (ZIP 1659 kb)

About this article

Cite this article

Moncunill, V., Gonzalez, S., Beà, S. et al. Comprehensive characterization of complex structural variations in cancer by directly comparing genome sequence reads. Nat Biotechnol 32, 1106–1112 (2014). https://doi.org/10.1038/nbt.3027

  • DOI: https://doi.org/10.1038/nbt.3027
