Identification and correction of systematic error in high-throughput sequence data
  • Manuscript
  • Open access
  • Published: 06 June 2011

  • Frazer Meacham1,
  • Dario Boffelli2,
  • Joseph Dhahbi2,
  • David Martin2,
  • Meromit Singer1 &
  • Lior Pachter1 

Nature Precedings (2011)

Abstract

A feature common to all DNA sequencing technologies is the presence of base-call errors in the sequenced reads. The implications of such errors are application-specific, ranging from minor informatics nuisances to major problems affecting biological inferences. Recently developed “next-gen” sequencing technologies have greatly reduced the cost of sequencing, but have been shown to be more error-prone than previous technologies. Both position-specific (depending on the location in the read) and sequence-specific (depending on the sequence in the read) errors have been identified in Illumina and Life Technologies sequencing platforms. We describe a new type of systematic error that manifests as statistically unlikely accumulations of errors at specific genome (or transcriptome) locations. We characterize and describe systematic errors using overlapping paired reads from high-coverage data. We show that such errors occur in approximately 1 in 1000 base pairs, and that quality scores at systematic error sites do not account for the extent of errors. We identify motifs that are frequent at systematic error sites, and describe a classifier that distinguishes heterozygous sites from systematic error. Our classifier is designed to accommodate data from experiments in which the allele frequencies at heterozygous sites are not necessarily 0.5 (such as in the case of RNA-Seq). Systematic errors can easily be mistaken for heterozygous sites in individuals, or for SNPs in population analyses. Systematic errors are particularly problematic in low-coverage experiments, or in estimates of allele-specific expression from RNA-Seq data. Our characterization of systematic error has allowed us to develop a program, called SysCall, for identifying and correcting such errors. We conclude that correction of systematic errors is important to consider in the design and interpretation of high-throughput sequencing experiments.
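
To make the phrase "statistically unlikely accumulations of errors" concrete, the following is a minimal sketch of one way such sites could be flagged. It is not the SysCall classifier and is not taken from the paper. It assumes that base-call errors at a site are independent with a single known per-base error rate, and it uses an exact binomial tail probability. The function names, the example error rate of 0.01, and the threshold `alpha` are all hypothetical choices made for illustration.

```python
from math import comb


def binomial_tail(n: int, k: int, p: float) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    if k <= 0:
        return 1.0
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def flag_unlikely_sites(sites, error_rate=0.01, alpha=1e-6):
    """Return positions whose mismatch count is improbable under random error.

    sites: iterable of (position, coverage, mismatches) tuples.
    error_rate: assumed per-base error probability (illustrative value).
    alpha: assumed significance threshold (illustrative value).
    """
    flagged = []
    for position, coverage, mismatches in sites:
        if binomial_tail(coverage, mismatches, error_rate) < alpha:
            flagged.append(position)
    return flagged


# Hypothetical pileup summaries: (position, coverage, mismatches).
example_sites = [(100, 200, 2), (250, 200, 30), (400, 50, 0)]
flagged_positions = flag_unlikely_sites(example_sites)
```

A test like this only says that a site has more mismatches than independent random error would explain. A true heterozygous site would be flagged as well, which is why the abstract describes a separate classifier for telling the two apart.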

Author information

Authors and Affiliations

  1. University of California, Berkeley, CA

    Frazer Meacham, Meromit Singer & Lior Pachter

  2. Children’s Hospital Oakland Research Institute, Oakland, CA

    Dario Boffelli, Joseph Dhahbi & David Martin

Rights and permissions

Creative Commons Attribution 3.0 License.

About this article

Cite this article

Meacham, F., Boffelli, D., Dhahbi, J. et al. Identification and correction of systematic error in high-throughput sequence data. Nat Prec (2011). https://doi.org/10.1038/npre.2011.5989.1

  • Received: 04 June 2011

  • Accepted: 06 June 2011

  • Published: 06 June 2011

  • DOI: https://doi.org/10.1038/npre.2011.5989.1

Keywords

  • sequencing
  • systematic error
  • base-call error
  • Illumina