Abstract
Next-generation high-throughput DNA sequencing technologies have steadily advanced sequence-based genomic research and enabled novel biological applications, with the promise of sequencing DNA at unprecedented speed. Compared with traditional sequencing methods, these non-Sanger-based technologies offer higher sequencing speed, lower per-run cost and higher accuracy. However, reads from next-generation sequencing (NGS) platforms, such as 454/Roche, ABI/SOLiD and Illumina/Solexa, are usually short, which restricts the application of NGS platforms in genome assembly and annotation. We present an overview of the challenges that these technologies face and illustrate the bioinformatics approaches to mapping and assembly developed to address them. We then compare the performance of several programs in these two fields and offer advice on selecting suitable tools for specific biological applications.
Supplementary Information accompanies the paper on Journal of Human Genetics website
About this article
Cite this article
Bao, S., Jiang, R., Kwan, W. et al. Evaluation of next-generation sequencing software in mapping and assembly. J Hum Genet 56, 406–414 (2011). https://doi.org/10.1038/jhg.2011.43
DOI: https://doi.org/10.1038/jhg.2011.43