Abstract
The exploration of the microbial world has been an exciting series of unanticipated discoveries despite being largely uninformed by rational estimates of the magnitude of the task confronting us. However, in the long term, more structured surveys can be achieved by estimating the diversity of microbial communities and the effort required to describe them. The rates of recovery of new microbial taxa in very large samples suggest that many more taxa remain to be discovered in soils and the oceans. We apply a robust statistical method to large gene sequence libraries from these environments to estimate both diversity and the sequencing effort required to obtain a given fraction of that diversity. In the upper ocean, we predict some 1400 phylotypes, and a mere fivefold increase in shotgun reads could yield 90% of the metagenome, that is, all genes from all taxa. However, diversities at deep-ocean hydrothermal vents and in soils can be up to two orders of magnitude larger, and hundreds of times the current number of samples will be required just to obtain 90% of the taxonomic diversity based on a 3% difference in 16S rDNA. Obtaining 90% of the metagenome will require tens of thousands of times the current sequencing effort. Although the definitive sequencing of hyperdiverse environments is not yet possible, we can, using taxa-abundance distributions, begin to plan and develop the required methods and strategies. This would initiate a new phase in the exploration of the microbial world.
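The link between a taxa-abundance distribution and the sequencing effort needed to recover a given fraction of taxa can be illustrated with a minimal sketch. It is not the Bayesian estimation procedure used in the study. It assumes a log-normal taxa-abundance distribution and Poisson sampling of reads, and the function names, the seed and the `sigma` values are hypothetical choices made for illustration. Only the figure of 1400 taxa is borrowed from the upper-ocean estimate above.

```python
import math
import random


def lognormal_abundances(n_taxa, sigma, seed=0):
    """Draw hypothetical relative abundances from a log-normal distribution."""
    rng = random.Random(seed)
    raw = [rng.lognormvariate(0.0, sigma) for _ in range(n_taxa)]
    total = sum(raw)
    return [x / total for x in raw]


def expected_fraction_observed(abundances, n_reads):
    """Expected fraction of taxa seen at least once, assuming Poisson sampling."""
    seen = sum(1.0 - math.exp(-n_reads * p) for p in abundances)
    return seen / len(abundances)


def reads_for_fraction(abundances, target=0.9):
    """Smallest read count whose expected taxon coverage reaches `target`."""
    lo, hi = 0, 1
    # Double the upper bound until the target coverage is reached.
    while expected_fraction_observed(abundances, hi) < target:
        lo, hi = hi, hi * 2
    # Bisect between a count that falls short (lo) and one that suffices (hi).
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if expected_fraction_observed(abundances, mid) >= target:
            hi = mid
        else:
            lo = mid
    return hi


# Two assumed communities with the same richness but different evenness.
even = lognormal_abundances(1400, sigma=1.0)
skewed = lognormal_abundances(1400, sigma=3.0)
print(reads_for_fraction(even), reads_for_fraction(skewed))
```

Under these assumptions the more skewed community needs far more reads to reach the same 90% coverage, because most of its taxa are rare. This is the qualitative reason why hyperdiverse environments demand much greater sequencing effort.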
Acknowledgements
We thank Aaron Halpern, Alberto Riva, Doug Rusch and Eric W Triplett for providing the soil sequence and GOS cluster data sets. We also thank Sue Huse and Mitchell Sogin for providing a copy of the quickdist program, and Rampal S Etienne for an algorithm used in fitting the log-normal distribution. We thank Mark Bailey, Stephen Giovannoni, and two anonymous reviewers for helpful comments on an earlier version of this article. Chris Quince is supported by a Lord Kelvin Adam Smith Research Fellowship from the University of Glasgow, and Bill Sloan is supported by an Engineering and Physical Sciences Advanced Research Fellowship.
The Bayesian diversity estimation software used in this study is available from the corresponding author upon request or can be downloaded from the website: http://people.civil.gla.ac.uk/~quince/Software/BDES.html.
Cite this article
Quince, C., Curtis, T. & Sloan, W. The rational exploration of microbial diversity. ISME J 2, 997–1006 (2008). https://doi.org/10.1038/ismej.2008.69
DOI: https://doi.org/10.1038/ismej.2008.69


