Abstract
Riverine nitrogen is a pivotal determinant influencing water quality in inland and coastal waters. Despite the recognized utility, no spatially-explicit data on riverine nitrogen yield is available for large parts of the world, thus hindering our ability to identify the contributors to riverine nitrogen and understand aquatic nitrogen cycling. To fill the data gap for the United States, here we (1) compiled 294,996 total nitrogen (TN), 225,827 nitrate (NO3−), 204,015 ammonium (NH4+), and 158,837 total organic nitrogen (TON) concentrations, with concurrent streamflow data, across the Conterminous United States (CONUS), (2) estimated riverine nitrogen loads for over 1,800 hydrological stations, (3) derived the spatial distribution of annual riverine nitrogen yield by leveraging river and catchment connectivity information contained in the National Hydrography Dataset plus (NHDPlus), and (4) characterized nonpoint-source TN loads by excluding point-source loads. This new spatial dataset quantifies spatial sources of nitrogen yield from point and non-point sources (e.g., up to 36% from point sources across the U.S.) and serves as ground-truthing to validate water quality models.
Similar content being viewed by others
Background & Summary
Nitrogen (N) from point or nonpoint sources has been emerging as a predominant contributor causing suboptimal water quality in eutrophic streams and rivers in the United States1,2 and across the globe3. Escalations in riverine nitrogen concentrations have become pervasive across myriads of watersheds, attributed largely to shifts in land utilization practices, modifications in agricultural methodologies, and climate-induced variations4,5. Globally, about 118 tera-grams of nitrogen (TgN) per year was applied as fertilizers6, and 18% of them was leached into oceans via river runoff7. Recent estimates of riverine nitrogen loss into oceans showed a pronounced increase from 17 TgN per year during the preindustrial era to 37.6 TgN per year in the last decade8, as particularly evident for major rivers such as the Amazon, Mississippi, and Yangtze9. Elevated nitrogen in rivers was documented to transform and impair many riverine ecosystems and estuaries across the world10,11, urging for concerted efforts to manage nutrients and mitigate eutrophication as well as for more publicly accessible data to support such efforts.
Extensive data on nitrogen concentrations and streamflow exist for many parts of the world. But existing data syntheses or analyses focused mostly on large-scale accounting (e.g., N loss into the ocean from the whole Mississippi river basin) or individual small watersheds for local applications. There is a lack of spatially explicit data synthesis that covers a large area and resolves spatial heterogeneity at fine scales. New data of this kind are particularly useful because the anthropogenic sources and drivers controlling nitrogen loads often vary across different river reaches or basins12; they are instrumental in quantifying nitrogen accrual or depletion at distinct river sections and thereby, advancing our understanding of the underlying mechanisms and contributing factors. However, such data is not readily available yet to the scientific and policy communities, even for data-rich regions such as the United States. In particular, derivation of such data needs to account for river network and hydrological connectivity across the region. A comprehensive delineation of the spatial distribution of nitrogen yields from upstream land areas remains notably absent.
In addition, existing databases focused mostly on riverine nitrogen loads without accounting for the nature of the sources or only on one specific anthropogenic driver such as fertilization or urban development8,13. The existing database also rarely differentiated point from non-point sources. The differentiation is important for tracking nitrogen fates across managed landscape and is expected to be instrumental in formulating more efficacious policies to manage soil nutrients and ameliorate water quality. As such, there is also an urgent need of nitrogen pollution database to assess the critical source area of nitrogen loads from non-point (i.e., diffuse pollution from different land uses and atmospheric deposition) and point sources (e.g. concentrated discharge from sewage treatment plant or chemical plant).
We aim to address the gaps in the available nitrogen pollution databases by deriving riverine nitrogen loads (Fig. 1) at more than 1,800 hydrological monitoring stations distributed across the Conterminous United States (CONUS). By leveraging the upstream-downstream river connectivity for catchments from the NHDPlus Hydrologic Unit Code 12-digit (HUC12) geospatial dataset, we computed the riverine nitrogen yield, which is the net gain or loss of riverine nitrogen between upstream and downstream hydrological stations divided by the controlled watershed area. To further elucidate non-point source contributions to the riverine nitrogen loads, we removed the point source contributions in the recently published “Point-Source Nutrient Loads to Streams of the Conterminous United States” dataset14. This new dataset facilitates a more spatially resolved analysis of critical source areas of nitrogen pollution in the CONUS. Such dataset also helps understand control factors of riverine nitrogen gain or loss across different social and physiographic regions, as well as provides observational data for verifying credibility of watershed water quality models that are often used to support nitrogen pollution mitigation.
Methods
Data acquisition and preprocessing
To estimate nitrogen loads, we compiled collocated and concurrent daily records of both streamflow and nitrogen concentrations from 1954 to 2023 in 1,861 hydrologic stations across the CONUS. The streamflow data were sourced from the United States Geological Survey (USGS) National Water Information System (NWIS). We also downloaded nitrogen concentration data from the Water Quality Portal (https://www.waterqualitydata.us/), which is a hub of water quality data from an array of authoritative sources including the USGS, Environmental Protection Agency (EPA), as well as more than 400 state, federal, tribal, and local agencies. Note that we only used the nitrogen concentration data from USGS that are collocated and concurrent with observed streamflow data from the same USGS station. For nitrogen concentration data, when multiple records were available on a single day, we computed the average rather than a flow-weighted average due to the use of daily streamflow data. Moreover, if multiple stations available in one HUC12, we only retained the station closest to the basin outlet in the mainstem.
Estimation of Nitrogen Loads Using LOADEST
We employed LOADEST, a FORTRAN-based computer program designed for the estimation of constituent loads in streams and rivers using the Adjusted Maximum Likelihood Estimation (AMLE) approach, to compute the nitrogen loads at individual hydrological stations (Fig. 2)15. We excluded stations with fewer than 12 paired streamflow-nitrogen concentration data records to meet the statistical analysis requirements of LOADEST. Nine predefined regression models (Table 1) were compared to identify the most reliable equation for each individual hydrologic station. The selection of the best model was based on the Akaike Information Criteria (AIC), which integrates both maximum likelihood and the count of independently adjusted parameters in the model16. In LOADEST, “load” refers to the total mass of nitrogen over a specified time period. The initial output from LOADEST is measured in grams per day, which we convert to kilograms per year. We calculated different nitrogen forms including total nitrogen (TN), total organic nitrogen (TON), ammonium (NH4+) and nitrate (NO3−). For example, 294,996 records of daily TN concentration from 1,861 stations were used. The average TN load (1,916,464 kgN yr−1) is much higher than the median value (260,532 kgN yr−1), indicating that a subset of hydrological stations exhibits exceptionally high values. The exact temporal spans and the total counts of available observations at each station are described in the dataset provided.
Riverine TN loads across the CONUS calculated by the LOADEST program. Points represent the average TN loads at USGS stations, which are further linearly interpolated along the mainstem of the flowlines (i.e., the longest flow path in each HUC8) to show the changes in TN loads along river networks. Due to the skewness of the data toward high values, we applied a logarithmic transformation (\(\mathrm{ln}\left(1+x\right)\),where \(x\) is the TN load) for visualization purpose.
Watershed yield of riverine nitrogen
After calculating the nitrogen loads at each individual hydrological stations, we further leveraged the upstream-downstream topology information contained in the Hydrologic Unit Catalogue – 12 (HUC12) catchments of the National Hydrography Dataset Plus (NHDPlus) dataset (https://nhdplus.com/NHDPlus/NHDPlusV2_data.php) to estimate the net gain or loss nitrogen in watersheds between different pairs of upstream-downstream hydrological stations.
We derived spatial riverine nitrogen yield for watersheds between upstream-downstream hydrological stations using the approach described in Qiu et al.17. This process commenced with the identification of the specific HUC12 within which each hydrological station is situated. The most downstream HUC12 encompassing the hydrological station was designated as S1 (the first-level or the most downstream station). Subsequently, the tracing of upstream HUC12s was conducted until reaching one or multiple HUC12 catchments housing a hydrological station. These upstream HUC12 catchments and hydrological stations were assigned second level labels, such as S1_1, S1_2, and S1_3. The HUC12s falling within the ambit of the first-level station (S1) and the second-level stations (S1_1, S1_2, S1_3, and so on) were grouped as one watershed and marked as a unique “S1” HUC group. The net gain or loss of riverine nitrogen yield from the “S1” watershed was computed by dividing the discrepancy between the load at the first-level station and the aggregated load from the second-level stations by the area of the “S1” watershed. Next, we designated each hydrologic station from the second-level stations as a first-level station and repeat the same procedures described above. Those procedures are iterated until all hydrological stations are processed, resulting in the spatial nitrogen yield maps over the CONUS.
At the HUC12 level, there are a total of 86,744 catchments located within the CONUS. As some HUC12 catchments do not drain to any hydrological stations, the resultant long-term average watershed TN yield map (Fig. 3a) encompasses 66,694 HUC12 catchments. The map shows that U.S. Midwest and Northeast have the highest TN yield and are the major sources of nitrogen pollution. The spatial patterns of other nitrogen forms are similar to the TN yield pattern (Fig. S1).
Nonpoint-source contributions to riverine nitrogen yield
The watershed nitrogen yield maps help illustrate the gain or loss of riverine nitrogen from both non-point and point sources. Nonpoint-source nutrient loads, primarily originating from surface runoff and precipitation and notably influenced by agricultural practices and energy production18,19, often constitute the predominant cause of water quality issues. To discern nonpoint-source contributions, we removed the point source contributions from the “Point-Source Nutrient Loads to Streams of the Conterminous United States” dataset14. This dataset encompasses 5,430 major point-source facilities, such as wastewater treatment plants, chemical plants, and landfills, alongside 11,537 minor wastewater treatment facilities discharging into streams within the CONUS during the year 2012. In total, 16,967 point-source facilities are distributed across 11,601 HUC12 catchments, primarily concentrated in the eastern CONUS. We estimated point-source TN load for each HUC12 catchment by aggregating loads from all point-source facilities within that HUC12.
After removing the point-source contributions (Fig. 3c), we estimated non-point source TN yield (Fig. 3b). In general, point-source contributions are relatively small compared to that of non-point sources but can noticeably change the spatial distribution of watershed TN yield. Large point-source TN yields are primarily found in the Midwest and Southeast regions, which typically originate from wastewater treatment plants and petroleum refineries, two major sources of point discharge14. By removing point-source contributions, the area of watersheds with negative yield or loss of riverine TN increased from 1,043,641 to 1,432,602 km2. Over the CONUS, the area-weighted TN yield is 532.5 and 341.02 kgN km−2 yr−1, respectively, with and without point-source contributions. This indicates that point sources contributed ca. 36% of the riverine TN yield. However, this is likely the upper bound estimate of the point source contributions as part of the point source inputs will be removed through riverine processes20.
Data Records
The “Spatial Nitrogen Yield” Dataset is accessible through Wang et al.21. It encompasses three components: (1) Nitrogen Loads at Hydrological Stations, (2) Watershed Nitrogen Yield, and (3) Nonpoint-Source Nitrogen Yield.
The “Nitrogen Loads” dataset contains the TN loads at each hydrological station (Table 2). To better illustrate estimation uncertainty, we also provide the lower and upper nitrogen load values for the 95% confidence intervals, which were calculated using the standard error of prediction. Furthermore, this dataset includes additional information, such as the start and end years of the observational data, as well as AIC and coefficient of determination (R-Squared) for the selected LOADEST model. These attributes offer users flexibility in sub-setting the dataset based on their specific research requirements.
The “Watershed Nitrogen Yield” dataset comprises nitrogen yield at the HUC group scale (Table 3). The first sheet in the dataset contains average watershed yield for various nitrogen forms, and other sheets includes the annual nitrogen yield from 1981 to 2022. However, due to limited availability of concentration data for a specific year, the spatial coverage of annual nitrogen yield data is not as extensive as long-term average watershed yield data (Fig. S2). This dataset lends itself to enhanced integration with other spatial datasets, such as the NHDPlus and the WBD, effectively contributing to the enrichment of the preexisting national hydrography repositories. Broadly, we observe that within the CONUS, there are 55,671 HUC12s featuring TN yield greater than zero. These HUC12s exhibit an increasing spatial pattern from west to east, while TN loads are also high in the western coastal areas. Eastern areas characterized by heightened TN yields are concentrated in the Midwest and eastern regions, notably prevalent in the upper Mississippi basin and Ohio area, where intensive agricultural activities predominate. Conversely, regions situated in the western CONUS reflect lower nitrogen yields, although elevated nitrogen yields are identified in proximity to the northern coastal regions. The “Nonpoint-Source Watershed Nitrogen Yield” dataset is distinctive in its exclusion of point-source TN.
Technical Validation
To ascertain the precision of our data, we conducted a comparative analysis between the drainage area (Fig. 4a) and the drainage area data extracted from the USGS GAGES-II dataset22. In this comparison, when multiple USGS stations were situated within a single HUC12, we specifically retained the principal stem hydrological stations and omitted tributary stations. Notably, the majority of data points align closely along the 1:1 line. This agreement, coupled with a high R-square value, underscores the substantial concordance between the drainage area derived from the improved WBD and the USGS GAGES-II dataset. This robust agreement serves to affirm the reliability of our rectified upstream-downstream topology information.
To assess the derived riverine nitrogen loads, we compared the LOADEST derived estimates against an independent riverine TN load dataset derived using the Weighted Regressions on Time, Discharge, and Season (WRTDS) model23. A total of 152 hydrological stations with both WRTDS and LOADEST derived data were compared (Fig. 4b). The R-square value closely approximates unity, whereas the Root Mean Square Error (RMSE) is under 7,000 MgN yr−1. The minor discrepancies noted are likely attributed to variations in the temporal periods covered by the two datasets. This result substantiates the reliability of the riverine TN loads derived from this study.
Usage Notes
The methods and data records sections provide information for readers to acquire a comprehensive grasp of the methodologies employed for the estimation and arrangement of nitrogen loads. This foundational understanding shall empower users to optimize their utilization of the nitrogen dataset. Furthermore, given disparities in temporal coverage, availability of concentration data records, and the accuracy of regression models, it is imperative for users to judiciously select specific subsets of nitrogen loads that align with their distinct research requisites.
In a broader context, the new nitrogen load and watershed yield datasets can serve as versatile resources across an array of research domains:
-
1)
The “Nitrogen Loads at Hydrological Stations” dataset, structured at the level of hydrological stations, can form the basis for large-scale watershed model evaluations24.
-
2)
The “Watershed Nitrogen Yield” dataset, when conjoined with supplementary datasets encompassing climate, land cover, and soil properties, can be utilized to scrutinize the dominant factors influencing the net gain or loss of various nitrogen constituents. Further insights into environmental controls on riverine nitrogen loads can be garnered by considering hydraulic attributes, including reach length, width, depth, skewness, as well as streamflow data sourced from the NHDPlus and other relevant repositories.
-
3)
The “Watershed Nitrogen Yield” dataset, with its capacity to depict the spatial distribution of nitrogen yield, aids researchers in constructing comprehensive riverine nitrogen budgets. This spatial nitrogen yield data, in turn, facilitates estimations of nitrogen fluxes traversing from rivers to the ocean across the CONUS. Comparative analyses of upstream-downstream topological relationships enable assessments of the nitrogen removal potential, thereby providing valuable insights for policymakers in formulating more effective decisions.
-
4)
Nonpoint sources, predominantly influenced by human activities, exert significant control over riverine nitrogen fluxes toward coastal regions25. The “Nonpoint-Source Nitrogen Yield” dataset equips policymakers with the tools to discern regions where nitrogen loads are most substantially influenced by commercial fertilizer application and animal manure26.
The dataset derived in this study encapsulates historical average conditions grounded in observations at numerous hydrological stations. Regions characterized by sparse distribution of hydrological stations, particularly evident in southwestern areas, are devoid of nitrogen load coverage. Future research endeavors may explore the integration of machine learning techniques and the fusion of process-based and statistical models to enhance coverage and the explanatory depth in ungauged basins.
Code availability
All the codes for processing the nitrogen yields were calculated using R version 4.3.1 and archived at GitHub: https://github.com/ymwang4924/tn.git.
References
U.S. Environmental Protection Agency. Atlas of America’s polluted waters. EPA Report 840-B-00–002, https://nepis.epa.gov/Exe/ZyPURL.cgi?Dockey=20004GSG.TXT (2000).
Royer, T. V., David, M. B. & Gentry, L. E. Timing of riverine export of nitrate and phosphorus from agricultural watersheds in Illinois: Implications for reducing nutrient loading to the Mississippi River. Environ Sci Technol 40, 4126–4131 (2006).
Yao, X., Zhang, Y., Zhang, L. & Zhou, Y. A bibliometric review of nitrogen research in eutrophic lakes and reservoirs. Journal of Environmental Sciences 66, 274–285 (2018).
Grant, S. B., Azizian, M., Cook, P., Boano, F. & Rippy, M. A. Factoring stream turbulence into global assessments of nitrogen pollution. Science (1979) 359, 1266–1269 (2018).
Houser, J. N. & Richardson, W. B. Nitrogen and phosphorus in the Upper Mississippi River: Transport, processing, and effects on the river ecosystem. Hydrobiologia 640, 71–88 (2010).
Peñuelas, J., Sardans, J., Rivas-ubach, A. & Janssens, I. A. The human-induced imbalance between C, N and P in Earth’s life system. Glob Chang Biol 18, 3–6 (2012).
Galloway, J. V. et al. Human alteration of the nitrogen cycle. Threats, benefits and opportunities. UNESCO–SCOPE policy briefs,. Journal of Science 1, 25–27 (2007).
Ding, Q., Shao, H., Zhang, C. & Fang, X. The patterns of soil nitrogen stocks and C: N stoichiometry under impervious surfaces in China. Earth Syst Sci Data 15, 4599–4612 (2023).
Yamamoto, A. et al. Competing and accelerating effects of anthropogenic nutrient inputs on climate-driven changes in ocean carbon and oxygen cycles. Sci Adv 8, 9207 (2022).
Peñuelas, J. et al. Human-induced nitrogen–phosphorus imbalances alter natural and managed ecosystems across the globe. Nature Communications 4, 1–10 (2013).
Ibáñez, C. & Peñuelas, J. Changing nutrients, changing rivers. Science (1979) 365, 637–638 (2019).
Canfield, D. E., Glazer, A. N. & Falkowski, P. G. The evolution and future of earth’s nitrogen cycle. Science (1979) 330, 192–196 (2010).
Cao, P., Lu, C. & Yu, Z. Historical nitrogen fertilizer use in agricultural ecosystems of the contiguous United States during 1850-2015: Application rate, timing, and fertilizer types. Earth Syst Sci Data 10, 969–984 (2018).
Skinner, K. D. & Maupin, M. A. Point-source nutrient loads to streams of the conterminous United States, 2012. Data Series, https://doi.org/10.3133/DS1101 (2019).
Runkel, R. L., Crawford, C. G. & Cohn, T. A. Load Estimator (LOADEST): A FORTRAN Program for Estimating Constituent Loads in Streams and Rivers, https://doi.org/10.3133/tm4A5 (2004).
Akaike, H. A New Look at the Statistical Model Identification. IEEE Trans Automat Contr 19, 716–723 (1974).
Qiu, H. et al. Watershed carbon yield derived from gauge observations and river network connectivity in the United States. Scientific Data 10, 1–13 (2023). 2023 10:1.
Shen, Z., Liao, Q., Hong, Q. & Gong, Y. An overview of research on agricultural non-point source pollution modelling in China. Sep Purif Technol 84, 104–111 (2012).
Ouyang, W., Skidmore, A. K., Toxopeus, A. G. & Hao, F. Long-term vegetation landscape pattern with non-point source nutrient pollution in upper stream of Yellow River basin. J Hydrol (Amst) 389, 373–380 (2010).
Van Drecht, G., Bouwman, A. F., Knoop, J. M., Beusen, A. H. W. & Meinardi, C. R. Global modeling of the fate of nitrogen from point and nonpoint sources in soils, groundwater, and surface water. Global Biogeochem Cycles 17, 1115 (2003).
Wang, Y., Zhang, X. & Zhao, K. A dataset of riverine nitrogen yield across watersheds in the Conterminous United States. figshare https://doi.org/10.6084/m9.figshare.24645747 (2024).
Falcone, J. A. GAGES-II: Geospatial Attributes of Gages for Evaluating Streamflow. (2011).
Oelsner, G. P. et al. Water-Quality Trends in the Nation’s Rivers and Streams, 1972–2012—Data Preparation, Statistical Methods, and Trend Results (Ver. 2.0, October 2017): U.S. Geological Survey Scientific Investigations Report, https://doi.org/10.3133/sir20175006 (2017).
Wang, Q. et al. Nitrate loading projection is sensitive to freeze-thaw cycle representation. Water Res 186, 116355 (2020).
Howarth, R. W. et al. Regional nitrogen budgets and riverine N & P fluxes for the drainages to the North Atlantic Ocean: Natural and human influences. Nitrogen Cycling in the North Atlantic Ocean and its Watersheds 75–139, https://doi.org/10.1007/978-94-009-1776-7_3 (1996).
Puckett, L. J. Nonpoint and Point Sources of Nitrogen in Major Watersheds of the United States. vol. 94, https://pubs.usgs.gov/wri/wri944001/wri944001.html (1994).
Acknowledgements
This research is in part supported by the U.S. Department of Agriculture – Agricultural Research Service and the National Aeronautics and Space Administration (NASA 22-CMS22-0027). The mention of trade names or commercial products in this publication is solely for the purpose of providing specific information and does not imply recommendation or endorsement by the funding agencies.
Author information
Authors and Affiliations
Contributions
Yiming Wang processed the nitrogen concentration data and developed codes in R to derive the nitrogen load and yield. Xuesong Zhang conceived the research plan and quality-controlled the datasets. Kaiguang Zhao conducted data processing. All authors provided comments and contributed to writing the manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Wang, Y., Zhang, X. & Zhao, K. A dataset of riverine nitrogen yield across watersheds in the Conterminous United States. Sci Data 11, 712 (2024). https://doi.org/10.1038/s41597-024-03552-1
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41597-024-03552-1