Abstract
The 16S rRNA gene (16S) is an accepted marker of bacterial taxonomic diversity, even though differences in copy number obscure the relationship between amplicon and organismal abundances. Ancestral state reconstruction methods can predict 16S copy numbers through comparisons with closely related reference genomes; however, the database of closed genomes is limited. Here, we extend the reference database of 16S copy numbers to de novo assembled draft genomes by developing 16Stimator, a method to estimate 16S copy numbers when these repetitive regions collapse during assembly. Using a read depth approach, we estimate 16S copy numbers for 12 endophytic isolates from Arabidopsis thaliana and confirm estimates by qPCR. We further apply this approach to draft genomes deposited in NCBI and demonstrate accurate copy number estimation regardless of sequencing platform, with an overall median deviation of 14%. The expanded database of isolates with 16S copy number estimates increases the power of phylogenetic correction methods for determining organismal abundances from 16S amplicon surveys.
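To make the read depth idea concrete, the following is a minimal sketch and not the 16Stimator implementation itself. It assumes that per-base read depths for a contig are already available as a list, that the collapsed 16S region is given as a half-open interval, and that the rest of the contig is single copy. The function name `estimate_copy_number` and the choice of the median as the depth summary are illustrative assumptions; the published method uses its own statistical treatment of depth.

```python
from statistics import median


def estimate_copy_number(depths, region_start, region_end):
    """Estimate copy number of a collapsed repeat from per-base read depth.

    Hypothetical helper for illustration only.
    depths: per-base read depths along one contig.
    region_start, region_end: half-open interval of the collapsed 16S region.
    Assumes every base outside the interval is single copy.
    """
    region = depths[region_start:region_end]
    background = depths[:region_start] + depths[region_end:]
    if not region or not background:
        raise ValueError("both the region and the background must be non-empty")

    background_depth = median(background)
    if background_depth <= 0:
        raise ValueError("background depth must be positive")

    # If k copies collapse into one assembled locus, reads from all k copies
    # map there, so depth is expected to be roughly k times the background.
    return median(region) / background_depth


# Toy contig: 50x single-copy background with a 150 bp region at 350x.
toy_depths = [50] * 1000 + [350] * 150 + [50] * 1000
toy_estimate = estimate_copy_number(toy_depths, 1000, 1150)
print(toy_estimate)
```

In this toy example the region sits at seven times the background depth, so the estimate is 7. Real data would also need handling for coverage bias and for background regions that are themselves repetitive, which this sketch ignores.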
Acknowledgements
We thank John Wilmes and Stefano Allesina for helpful discussions, and the Center for Research Informatics at the University of Chicago for computational resources. MP was supported by a Department of Education GAANN fellowship and an NIH Genetics & Regulation training grant. This work was supported by grant DOE DE-AC02-06CH11357 to JAG and grants NSF MCB0603515 and James S. McDonnell Foundation 220020237 to JB.
Ethics declarations
Competing interests
The authors declare no conflict of interest.
Additional information
Supplementary Information accompanies this paper on The ISME Journal website
Cite this article
Perisin, M., Vetter, M., Gilbert, J. et al. 16Stimator: statistical estimation of ribosomal gene copy numbers from draft genome assemblies. ISME J 10, 1020–1024 (2016). https://doi.org/10.1038/ismej.2015.161
DOI: https://doi.org/10.1038/ismej.2015.161