Scientific Data
A watershed-scale potential pathogenic bacteria dataset from the Yangtze River Basin
  • Data Descriptor
  • Open access
  • Published: 03 March 2026

  • Jie Wang1,2,
  • Shang Wang1,
  • Tong Li1,3,
  • Weiguo Hou2 &
  • …
  • Ye Deng1,3 (ORCID: orcid.org/0000-0002-7584-0632)

Scientific Data (2026)

We are providing an unedited version of this manuscript to give early access to its findings. Before final publication, the manuscript will undergo further editing. Please note there may be errors present which affect the content, and all legal disclaimers apply.

Subjects

  • Biodiversity
  • Environmental impact
  • Freshwater ecology
  • Microbial ecology
  • Pathogens

Abstract

Microbial safety is fundamental to ensuring water quality, particularly in the Yangtze River Basin, China’s most critical drinking water source. Despite its ecological and economic importance, the basin faces significant anthropogenic pressures, including wastewater discharge, which may elevate the risk of pathogenic contamination. However, fragmented sampling efforts and limited coverage have hindered a systematic understanding of pathogenic microbial diversity and distribution across this vast ecosystem. We applied a novel bioinformatic pipeline, which uses genome-specific markers (GSMs) to accurately identify and quantify potential pathogenic taxa in metagenomic data, to 625 publicly available metagenomes spanning water, sediments, and riparian soils along the 6,300 km Yangtze River continuum. We reconstructed a potential pathogen catalog comprising 403 taxa, substantially expanding the known pathogen diversity in this large river ecosystem. We also generated richness distribution maps of potential pathogens for water, sediments, and soils along the Yangtze River. This basin-scale pathogen inventory not only establishes a baseline for potential pathogenic bacterial communities in the Yangtze River Basin but also serves as a reference library for rapid biosurveillance and risk management at genomic resolution.

Data availability

Data are available at the figshare repository (https://doi.org/10.6084/m9.figshare.30196462)29. The repository contains four items: the spatial distribution maps for water, sediment and soils; Dataset S1, metadata of the samples used for pathogen detection analysis; Dataset S2, pathogens identified by GSMer in the Yangtze River Basin and their potential hosts; and Dataset S3, georeferenced sampling locations and pathogen richness used in spatial mapping. Dataset S1 lists the sources of the original metagenomic sequencing data used in this study. Dataset S2 lists the potential pathogen species identified by GSM-based matching, together with their host information.
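As a rough illustration of how a table of georeferenced sampling locations and pathogen richness (such as Dataset S3) might be summarized by habitat, the following minimal sketch uses only the Python standard library. It assumes the spreadsheet has been exported to CSV. The column names (`sample_id`, `habitat`, `longitude`, `latitude`, `richness`) and the helper `summarize_richness` are hypothetical, and the rows are placeholders rather than values from the dataset; the actual layout should be checked against the deposited file.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Placeholder rows for illustration only: these are NOT values from Dataset S3.
# The column names below are assumptions, not the documented file layout.
EXAMPLE_CSV = """sample_id,habitat,longitude,latitude,richness
A1,water,100.0,30.0,10
A2,water,110.0,30.5,20
B1,sediment,105.0,29.5,6
C1,soil,115.0,31.0,4
C2,soil,118.0,31.5,8
"""


def summarize_richness(handle):
    """Return {habitat: (n_samples, mean_richness, max_richness)}."""
    by_habitat = defaultdict(list)
    for row in csv.DictReader(handle):
        # Group the per-sample richness values by habitat type.
        by_habitat[row["habitat"]].append(int(row["richness"]))
    return {
        habitat: (len(values), mean(values), max(values))
        for habitat, values in by_habitat.items()
    }


summary = summarize_richness(io.StringIO(EXAMPLE_CSV))
for habitat, (n, avg, top) in sorted(summary.items()):
    print(f"{habitat}: n={n}, mean richness={avg:.1f}, max={top}")
```

To run the same summary on a real export, pass an open file handle (for example `open("S3.csv", newline="")`, where the file name is again an assumption) in place of the in-memory placeholder.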

Code availability

The parameters of all programs used for the analysis are described in the main text. The GSM library construction code is available at https://github.com/yedeng-lab/humanpathogen-GSM.

References

  1. Hu, Y. et al. Annual trends and health risks of antibiotics and antibiotic resistance genes in a drinking water source in East China. Science of The Total Environment 791, 148152 (2021).

  2. Pandey, P. K., Kass, P. H., Soupir, M. L., Biswas, S. & Singh, V. P. Contamination of water resources by pathogenic bacteria. AMB Expr 4, 51 (2014).

  3. Oon, Y.-L. et al. Waterborne pathogens detection technologies: Advances, challenges, and future perspectives. Front. Microbiol. 14, 1286923 (2023).

  4. Liu, W. et al. Unraveling pathogen dynamics in rivers flowing into Taihu Lake: Insights from high-throughput sequencing and environmental correlations. Water Research X 29, 100406 (2025).

  5. Carraro, L., Mächler, E., Wüthrich, R. & Altermatt, F. Environmental DNA allows upscaling spatial patterns of biodiversity in freshwater ecosystems. Nat Commun 11, 3585 (2020).

  6. Deiner, K., Fronhofer, E. A., Mächler, E., Walser, J.-C. & Altermatt, F. Environmental DNA reveals that rivers are conveyer belts of biodiversity information. Nat Commun 7, 12544 (2016).

  7. Ding, J. et al. Impacts of land use on surface water quality in a subtropical river basin: A case study of the Dongjiang River Basin, southeastern China. Water 7, 4427–4445 (2015).

  8. McKee, A. M. & Cruz, M. A. Microbial and viral indicators of pathogens and human health risks from recreational exposure to waters impaired by fecal contamination. J. Sustainable Water Built Environ. 7, 03121001 (2021).

  9. Hofstra, N. Quantifying the impact of climate change on enteric waterborne pathogen concentrations in surface water. Current Opinion in Environmental Sustainability 3, 471–479 (2011).

  10. Hales, S. Climate change, extreme rainfall events, drinking water and enteric disease. Reviews on Environmental Health 34, 1–3 (2019).

  11. Seymour, J. R. & McLellan, S. L. Climate change will amplify the impacts of harmful microorganisms in aquatic ecosystems. Nat Microbiol 10, 615–626 (2025).

  12. Girones, R. et al. Molecular detection of pathogens in water–the pros and cons of molecular techniques. Water Res 44, 4325–4339 (2010).

  13. Gu, W. et al. Rapid pathogen detection by metagenomic next-generation sequencing of infected body fluids. Nat Med 27, 115–124 (2021).

  14. Gallagher, T., Phan, J. & Whiteson, K. Getting Our Fingers on the Pulse of Slow-Growing Bacteria in Hard-To-Reach Places. J Bacteriol 200, e00540–18 (2018).

  15. Aw, T. G. & Rose, J. B. Detection of pathogens in water: from phylochips to qPCR to pyrosequencing. Curr Opin Biotechnol 23, 422–430 (2012).

  16. Wang, J., Han, Y. & Feng, J. Metagenomic next-generation sequencing for mixed pulmonary infection diagnosis. BMC Pulm Med 19, 252 (2019).

  17. Tu, Q., He, Z. & Zhou, J. Strain/species identification in metagenomes using genome-specific markers. Nucleic Acids Research 42 (2014).

  18. Li, T. et al. Beyond water and soil: Air emerges as a major reservoir of human pathogens. Environment International 190, 108869 (2024).

  19. NCBI sequence read archive https://identifiers.org/insdc.sra:SRP288687 (2020).

  20. NCBI sequence read archive https://identifiers.org/insdc.sra:SRP217764 (2020).

  21. NCBI sequence read archive https://identifiers.org/insdc.sra:SRP394638 (2023).

  22. NCBI sequence read archive https://identifiers.org/insdc.sra:SRP201455 (2019).

  23. NGDC Genome Sequence Archive https://ngdc.cncb.ac.cn/gsa/browse/CRA006054 (2023).

  24. NGDC Genome Sequence Archive https://ngdc.cncb.ac.cn/gsa/browse/CRA008231 (2023).

  25. National Microbiology Data Center (NMDC) https://nmdc.cn/resource/genomics/project/detail/NMDC10020587 (2026).

  26. Bolger, A. M., Lohse, M. & Usadel, B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics 30, 2114–2120 (2014).

  27. Wang, B. et al. Tackling Soil ARG‐Carrying Pathogens with Global‐Scale Metagenomics. Advanced Science 10, 2301980 (2023).

  28. Steinegger, M. & Söding, J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat Biotechnol 35, 1026–1028 (2017).

  29. Wang, J., Wang, S., Li, T., Hou, W. & Deng, Y. A watershed-scale Potential pathogenic bacteria dataset from the Yangtze River Basin. figshare https://doi.org/10.6084/m9.figshare.30196462 (2026).

Acknowledgements

This work was supported by the Opening Project of State Key Laboratory of Geomicrobiology and Environmental Changes (51830100303), the National Key Research and Development Program of China (Grant 2022YFC3204703) and the National Natural Science Foundation of China (Grant 42277104).

Author information

Authors and Affiliations

  1. State Key Laboratory of Regional Environment and Sustainability, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing, 100085, China

    Jie Wang, Shang Wang, Tong Li & Ye Deng

  2. Institute of Earth Sciences, China University of Geosciences, Beijing, 100083, China

    Jie Wang & Weiguo Hou

  3. College of Resources and Environment, University of Chinese Academy of Sciences, Beijing, 100049, China

    Tong Li & Ye Deng

Contributions

J.W. generated the data and contributed to manuscript writing and revision. S.W. and Y.D. designed the study, organized the research, and contributed to manuscript writing and revision. T.L. contributed to the code writing and data analysis. W.G.H. contributed to manuscript revision.

Corresponding authors

Correspondence to Shang Wang or Ye Deng.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

S1. Metadata of samples for pathogen detection analysis (PDF)

S2. Pathogens identified by GSMer in the Yangtze River Basin and their potential hosts (PDF)

S3. Georeferenced sampling locations and pathogen richness used in spatial mapping (XLSX)

Spatial distribution maps for water, sediment and soils (PDF)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Wang, J., Wang, S., Li, T. et al. A watershed-scale potential pathogenic bacteria dataset from the Yangtze River Basin. Sci Data (2026). https://doi.org/10.1038/s41597-026-06983-0

  • Received: 30 September 2025

  • Accepted: 25 February 2026

  • Published: 03 March 2026

  • DOI: https://doi.org/10.1038/s41597-026-06983-0

Associated content

Collection

Environmental pollution in aquatic systems

Scientific Data (Sci Data)

ISSN 2052-4463 (online)
