Introduction

River hydrologic and biogeochemical regimes are key for linking processes across terrestrial and aquatic systems1. Within river systems, areas of groundwater and surface water mixing (hyporheic zones) contribute 4–96% of stream respiration2,3,4,5,6,7, indicating large variation in their role in local to global biogeochemical cycles. The respiratory contributions of the hyporheic zone are driven by hydrologic exchange flows and biogeochemical processing8,9,10. Hydrologic exchange flows between river water and the surrounding subsurface environments stimulate biogeochemical activity in hyporheic zone sediments primarily through provision of nutrients, mixing of dissolved reactants, changes in redox conditions, and exposure to microbes11,12. While the hydrological conceptualization of the hyporheic zone implicitly assumes saturated conditions in the streambed11,13,14,15, river systems are increasingly disturbed due, in part, to the loss of surface water16. Variable flow conditions can alter hyporheic zone biogeochemical function17,18,19,20, and while the interaction between physical controls, nutrient availability, and biogeochemical processing in perennial hyporheic zones has been widely studied11, understanding these processes in variably inundated hyporheic zones is a pressing challenge21,22. Our use of the phrase ‘variably inundated’ aligns with the definition recently provided in Stegen et al.23 whereby there is loss and gain through time of an uninterrupted aqueous barrier at the land–atmosphere interface. In river systems, variably inundated hyporheic zones can also be considered the parafluvial zone, which occurs within the active channel of streams and rivers but loses surface water when discharge is low23,24.

Recent field and laboratory experiments have highlighted the importance of dissolved organic matter (DOM) chemistry associated with elemental stoichiometry of individual molecular formulas20,25,26,27,28,29. For example, Garayburu-Caruso et al.30 found that DOM thermodynamic favorability–calculated from elemental stoichiometry–regulates aerobic respiration under conditions of carbon limitation. River system DOM chemistry tied to elemental stoichiometries is heavily influenced by biotic and abiotic processes which modify rates of production, transformation, sorption/desorption, and/or spatial movement of DOM molecules31. Specifically, abiotic processes associated with sunlight, photooxidation, and redox conditions have been linked to DOM bioavailability changes in the hyporheic zone and other subsurface systems25,32,33. Within river systems the majority of work on DOM chemistry has focused on understanding DOM chemistry in surface water or fully saturated/inundated sediments, with relatively little work on factors governing variation in DOM chemistry in variably inundated hyporheic zones.

Deeper understanding of processes driving variation in DOM chemistry, including within variably inundated hyporheic zones, can be pursued by leveraging the recent unification of DOM chemistry with meta-community ecology (i.e., meta-metabolome ecology)31. This framework treats DOM as assemblages of organic molecules with the goal of understanding the contributions of different ecological assembly processes impacting DOM transformations31. Ecological assembly processes can be either deterministic (i.e., non-random) or stochastic (i.e., random). Deterministic processes are further classified as (1) variable selection, when environmental factors (e.g., biotic or abiotic processes) are responsible for differences across DOM assemblages, or (2) homogeneous selection, when environmental processes constrain DOM assemblages to be highly similar through space or time34. Stochastic processes are linked to random events associated with transport and spatially or temporally random production or transformation of organic molecules31. Under this framework, ecological null models are used with DOM chemistry to quantify the relative influences of stochastic and deterministic processes governing DOM assemblages.

Previous work by Sengupta et al.20 showed that decreases in sediment moisture cause decreases in respiration rates along with significant shifts in the chemistry of water extractable OM (interpreted as DOM). Specifically, they found that drying decreased respiration rates, decreased DOM thermodynamic favorability, and changed the composition of metabolically active microbes. They proposed a conceptual model in which drying triggers a feedback among respiration, DOM chemistry, and microbial gene expression. One implication is that DOM chemistry in dry hyporheic zone sediments is expected to be influenced by deterministic processes. This study was, however, based on a single field system, and it is unknown whether DOM chemistry in dry hyporheic zone sediments is deterministically organized across river systems or if their results were system-specific. We are therefore motivated to study cross-site variation in assembly processes. Because drying causes a deterministic shift in DOM chemistry20, we hypothesize that determinism will generally dominate over stochasticity within variably inundated sediments. We also expect stronger determinism in dry sediments due to minimal influences of stochastic mixing34,35,36 relative to previous estimates of determinism focused on DOM in river water and pore water within inundated sediments31,37. As a conceptual extension, we further hypothesize that because low moisture causes deterministic shifts in DOM chemistry, the lower the moisture gets, the stronger deterministic selection will be. We, therefore, expect a negative relationship between deterministic selection and sediment moisture.

Here we test this hypothesis focusing on water-extractable organic molecules from shallow hyporheic zone sediments crowdsourced from across the United States. While technically the material we study is water extractable organic matter, we refer to it as DOM because it is effectively dissolved when analyzed via mass spectrometry. Sediments were collected from the variably inundated parafluvial zone in each field site. This sampling design allowed us to leverage natural variation in sediment moisture across field sites without constraining other variables. This approach provides a unique opportunity to evaluate cross-system variation in the relative influences of deterministic and stochastic processes over DOM chemistry and the role of moisture across a wide range of biomes. Our results indicate predominance of deterministic selection and that sediment moisture constrains the level of deterministic assembly imposed on organic molecules in hyporheic zone sediments.

Methods

Sample collection

During August through November of 2020, the Worldwide Hydrobiogeochemistry Observation Network for Dynamic River Systems (WHONDRS) consortium38 conducted a crowdsourced study to understand key drivers of hyporheic zone biogeochemical processes in variably inundated sediments at a continental scale. WHONDRS designed the campaign with collaborators that volunteered to sample, generated sampling protocols/videos, and sent out free sampling kits to collaborators.

Riverbed sediments from variably inundated zones were collected from 48 sites across Alaska, the contiguous United States, and Puerto Rico. Observational metadata on weather, vegetation, and geomorphological features were also collected. Along each river, sediment samples for DOM and moisture analysis were collected from 10 locations spanning ~ 20–100 m along the shoreline (i.e., from the parafluvial zone parallel to the river). Efforts were made to collect fine-grained sediments from depositional zones. Sediments (1–3 cm depth) were sampled with a stainless steel scoop into 50 mL conical tubes for DOM (GENSCIENTIFIC™ Cat #: 28-106) and 25 mL conical tubes for moisture (Eppendorf™ 0030122429). Samples were refrigerated upon collection and shipped on blue ice to Pacific Northwest National Laboratory (PNNL) in Richland, WA (United States) within 24 h of collection. Although samples from 48 sites were collected, this manuscript includes samples from 38 sites (Fig. 1). For robust cross-site comparisons, we restricted analyses to this subset of 38 sites that had high quality mass spectrometry data from all 10 locations sampled per site, per the data calibration quality standards described below.

Fig. 1

Map of the 38 sites included in this manuscript and sampled by the Worldwide Hydrobiogeochemistry Observation Network for Dynamic River Systems (WHONDRS) consortium. Yellow dots indicate sites where variably inundated sediment samples were collected. Colors display ecoregions defined by the RESOLVE Ecoregions 2017 dataset76.

Laboratory analysis

Upon sample receipt, sediments for DOM analysis were centrifuged using an Eppendorf 5430R centrifuge at 6000 g for 5 min at 4 °C. Excess water was decanted, a breathe easy membrane (Andwin Scientific Cat #: BERM2000) was applied to allow evaporation without allowing external microbes into the sediments, and the tube was recapped. Samples were then flash frozen using liquid nitrogen and lyophilized for 72 h using a Labconco FreeZone lyophilizer. At each of the ten locations within a site, sediment was collected into two 50 mL conical tubes (GENSCIENTIFIC™ Cat #: 28-106) in the field. After lyophilization in the lab, the sediment from the two tubes was combined and sieved through a 2 mm stainless steel sieve that was previously cleaned with hydrogen peroxide, thoroughly rinsed with MilliQ water, and dried. Sieved samples were homogenized, evenly distributed back into the original two 50 mL conical tubes, and then placed into a −80 °C freezer until analysis. This was repeated for each of the ten locations within each site.

Sediment samples were thawed in the dark overnight at 4 °C. Sediment DOM was extracted in 50 mL conical tubes (GENSCIENTIFIC™ Cat #: 28-106) by using a 1:2 ratio of sediment to MilliQ water (10 g of sediment with 20 mL of MilliQ water). The samples were shaken at 375 rpm for 2 h at 21 °C in an environmentally controlled room. After shaking, the tubes were centrifuged at 6000 g and 21 °C for 5 min. The supernatant was filtered into a 40 mL amber vial (Thermo Scientific™ Cat#: SS2460040) using a 0.22 µm Sterivex filter (MilliporeSigma™ Cat#: SVGP01050). Dissolved organic carbon (DOC) was measured as Non-Purgeable Organic Carbon (NPOC) in the filtered supernatant using a Shimadzu TOC-L Total Organic Carbon Analyzer connected to an ASI-L autosampler. The NPOC calibration curve spanned 0.25 mg/L – 100 mg/L as carbon. Blanks and check standards were added every 10 samples to confirm data quality. 150 µL of sample was sparged into the TOC-L furnace at 680 °C, then the best 3 out of 5 injections were averaged to get a final concentration. The quality of the calibration curve, samples, blanks, and check standards was assessed before using data. The targeted check standard concentration was 3.5 mg/L as C and the mean and standard deviation of 84 check standard samples across 8 runs were 3.49 mg/L as C and 0.22 mg/L as C, respectively. The lower limit of detection was 0.21 mg/L as C and the lower limit of quantification was 0.68 mg/L as C.

Sediment samples for moisture analysis were stored at 4 °C upon sample receipt until analysis. Moisture content in all the samples was measured at the end of the sampling campaign. Initial sample weights were recorded and subsequently samples were placed in an oven at 105 °C for approximately 24 h. After 24 h, samples were cooled to 21 °C in a desiccator, weights were recorded, and samples were returned to the oven for another 3 h. After 3 h, samples were cooled to 21 °C in a desiccator and final weights were recorded. Sediment moisture (SM) was found by first calculating water content as the difference in grams between the initial wet sediment mass (Mwet) and the final mass after drying (Mdry), and then dividing by Mdry such that SM = (Mwet − Mdry) / Mdry. We also found the fraction of the sample that was water (i.e., the water fraction; WF), whereby WF = (Mwet − Mdry) / Mwet.
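As a minimal sketch of the two moisture definitions above, the following Python snippet computes SM and WF from a wet and a dry mass. The function names and the example masses are hypothetical and are not taken from the study data.

```python
def sediment_moisture(m_wet, m_dry):
    """SM: mass of water relative to the dry sediment mass."""
    if m_dry <= 0 or m_wet < m_dry:
        raise ValueError("expected m_wet >= m_dry > 0")
    return (m_wet - m_dry) / m_dry


def water_fraction(m_wet, m_dry):
    """WF: mass of water relative to the initial wet sample mass."""
    if m_dry <= 0 or m_wet < m_dry:
        raise ValueError("expected m_wet >= m_dry > 0")
    return (m_wet - m_dry) / m_wet


# Hypothetical masses in grams (illustration only, not measured values)
m_wet, m_dry = 12.5, 10.0
sm = sediment_moisture(m_wet, m_dry)
wf = water_fraction(m_wet, m_dry)
```

The two quantities carry the same information, since WF = SM / (1 + SM); SM is unbounded above, whereas WF always lies between 0 and 1.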

Fourier transform ion cyclotron resonance mass spectrometry (FTICR-MS)

DOM chemistry was characterized following methods previously described in Garayburu-Caruso et al.39. Briefly, NPOC concentration in sediment extracts across samples was normalized to 1.5 mg C/L and acidified to pH 2 using 85% phosphoric acid. Further, samples were extracted with PPL cartridges (Bond Elut) following solid phase extraction protocols described by Dittmar et al.40. Because NPOC concentration was normalized prior to solid phase extraction, further normalization was not conducted following solid phase extraction. This has the advantage of not needing to fully dry down and re-dissolve the DOM, which may not all go back into solution but must be done to measure NPOC following solid phase extraction. However, our approach has the disadvantage of not being able to estimate extraction efficiencies. Extracts were analyzed using a 12 Tesla (12 T) Bruker SolariX Fourier transform ion cyclotron resonance mass spectrometer (FTICR-MS; Bruker, SolariX, Billerica, MA, USA) located at the Environmental Molecular Sciences Laboratory in Richland, WA. Ultrahigh-resolution mass spectra of water-extractable sediment DOM were acquired in negative mode at a resolution of 322 K at 481.164 m/z using a standard electrospray ionization (ESI) source with a voltage of +4.2 kV. The instrument was externally calibrated weekly to a mass accuracy of < 0.1 ppm; in addition, the instrument settings were optimized using a Suwannee River Fulvic Acid (SRFA) standard (Cat. No. 2S101F). Data were collected with ion accumulation times of 0.08 s, 0.1 s and 0.15 s from 150 m/z–1100 m/z at 4 M. One hundred forty-four scans were co-added for each sample and internally calibrated using a DOM homologous series separated by 14 Da (–CH2 groups). The mass measurement accuracy was typically within 1 ppm for singly charged ions across a broad m/z range (100 m/z–900 m/z).
BrukerDaltonik Data Analysis (version 4.2) was used to convert raw spectra to a list of m/z values by applying the FTMS peak picker module with a signal-to-noise ratio (S/N) threshold set to 7 and absolute intensity threshold to the default value of 100. Formultitude41 was used to align samples (0.5 ppm threshold) and assign chemical formulas using a S/N > 7 and mass measurement error < 0.5 ppm. Formulas were only assigned taking into consideration the presence of C, H, O, N, S, and P and excluding other elements. Formultitude output was processed using the R package “ftmsRanalysis”42 which removes peaks outside of a high confidence m/z range (200 m/z–900 m/z) and/or with a 13C isotopic signature. It also calculates molecular formula properties and assigns chemical classes to peaks with molecular formula assigned using oxygen-to-carbon and hydrogen-to-carbon ratios. The outputs from “ftmsRanalysis” were used to calculate DOM characteristics that are essential for null model analysis (see next subsection). Peak intensities were changed to binary presence/absence. This helps reduce impacts of ion charge competition and variation in ionization efficiencies that cause unreliable quantification of absolute or relative peak abundances43,44. Charge competition and variation in ionization efficiencies can still influence whether a given molecule is observed, so changing to presence/absence data can reduce but not completely eliminate potential biases45.
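The filtering and presence/absence steps described above can be sketched in a few lines of Python. This is an illustration only: the authors performed these steps with the R package “ftmsRanalysis”, the snippet does not reproduce that package's interface, and the peak records (field names, m/z values, intensities) are hypothetical.

```python
# Hypothetical peak table: one record per peak, intensities for two samples.
peaks = [
    {"mz": 150.02, "is_c13": False, "intensity": [1.2e6, 0.0]},
    {"mz": 255.10, "is_c13": False, "intensity": [3.4e6, 2.1e6]},
    {"mz": 399.20, "is_c13": True,  "intensity": [5.0e5, 4.0e5]},
    {"mz": 640.35, "is_c13": False, "intensity": [0.0, 8.8e6]},
    {"mz": 955.40, "is_c13": False, "intensity": [7.7e5, 0.0]},
]


def to_presence_absence(peaks, mz_min=200.0, mz_max=900.0):
    """Drop peaks outside the m/z window or flagged as 13C isotopologues,
    then recode intensities as 1 (observed) or 0 (not observed)."""
    kept = [
        p for p in peaks
        if mz_min <= p["mz"] <= mz_max and not p["is_c13"]
    ]
    return [
        (p["mz"], [1 if value > 0 else 0 for value in p["intensity"]])
        for p in kept
    ]


presence = to_presence_absence(peaks)
```

Only the peaks at 255.10 and 640.35 m/z survive the filter in this toy table, and their intensities collapse to 0/1 flags per sample.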

Ecological null modeling

We performed null modeling analysis (Fig. S1) following methods described by Danczak et al.31 and applied in Danczak et al.37. This method considers each of the FTICR-MS peaks that had a molecular formula assigned and yields an index that indicates the relative deviation from purely random chemical composition in complex DOM samples. A given sample is conceptualized as an assemblage of organic molecules, represented more directly as a collection of molecular formulas. The observed composition of each DOM assemblage is considered to have arisen through a mixture of deterministic and stochastic processes. We used null modeling to quantify the relative influences of deterministic and stochastic processes over the DOM assemblages. To do this, we calculated the β-Nearest Taxon Index (βNTI) for each pairwise assemblage comparison. The key difference of βNTI relative to other approaches (e.g., functional diversity measurements46,47,48) is that βNTI uniquely provides information on the processes leading to observed variation in DOM composition. That is, functional diversity is a measure of the variation in DOM chemistry, while βNTI is a measure of the processes leading to that variation. We refer the reader to Danczak et al.31 for example datasets and scripts to compute βNTI using FTICR-MS data.

Calculating βNTI requires a dendrogram based on how similar or different each molecular formula is relative to all other formulas. For this, we used one molecular characteristics dendrogram (MCD) per site, following methods described in and R code provided by Danczak et al.31. Only peaks with a molecular formula assigned were included in this analysis as the MCD requires information from molecular formulas. All molecular formulas were used to estimate an MCD for a given site. An alternative approach would require formulas to be observed across several or all samples at a site. We chose to not do this because it could artificially increase compositional similarity (e.g., requiring formulas to be observed across all samples would lead to perfect similarity). As in Danczak et al.31, the following properties for each formula were used to build the MCD: (1) the number of C, H, O, N, P, and S atoms and (2) molecular properties derived from molecular mass and elemental stoichiometry, including double-bond equivalent (DBE), modified aromaticity index (AIMod), and Kendrick defect (Kdef). DBE describes the degree of chemical unsaturation and gives insight into the number of potential double-bonds; AIMod estimates the degree of aromaticity; and Kdef provides information on homologous series43,49,50,51. The elemental composition and derived metrics were combined to generate a Euclidean distance matrix between chemical formulas using function ‘vegdist’ in the vegan package (v2.5–6)52. Euclidean distances were then used to perform an ‘Unweighted Pair Group Method with Arithmetic Mean’ (UPGMA) hierarchical cluster analysis using function ‘hclust’ with the ‘average’ method, in the base ‘stats’ package. The output of the UPGMA clustering was used as the MCD and two illustrative examples of the resulting dendrogram are provided in Fig. 2. This was done independently for each site such that one MCD was generated for each of the 38 sites that had 10 samples.
The intent of this approach, as opposed to analyzing all samples from all sites together, is to treat each river as an independent system. This is consistent with the overarching study design whereby sampled rivers spanned numerous watersheds that were not hydrologically connected to each other.
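To make the dendrogram step concrete, here is a small pure-Python sketch of UPGMA (average-linkage) clustering on Euclidean distances. It is a stand-in for the ‘vegdist’ and ‘hclust’ calls named above, not a port of the authors' R code. The function names are hypothetical, the feature vectors are placeholders that use only a subset of the properties listed in the text (C, H, O, DBE), and no feature scaling is applied because the text does not describe any.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def upgma_cophenetic(features):
    """Average-linkage (UPGMA) clustering of formulas.

    Returns {(i, j): height} for tips i < j, where height is the
    between-cluster average distance at which tips i and j first join.
    These heights play the role of dendrogram distances between formulas.
    """
    n = len(features)
    tip_dist = {
        (i, j): euclidean(features[i], features[j])
        for i, j in combinations(range(n), 2)
    }

    def average_distance(a, b):
        total = sum(tip_dist[(min(i, j), max(i, j))] for i in a for j in b)
        return total / (len(a) * len(b))

    clusters = [[i] for i in range(n)]
    cophenetic = {}
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest average distance.
        ia, ib = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: average_distance(clusters[p[0]], clusters[p[1]]),
        )
        a, b = clusters[ia], clusters[ib]
        height = average_distance(a, b)
        for i in a:
            for j in b:
                cophenetic[(min(i, j), max(i, j))] = height
        clusters = [c for k, c in enumerate(clusters) if k not in (ia, ib)]
        clusters.append(a + b)
    return cophenetic


# Placeholder feature vectors (C, H, O, DBE); illustration only.
features = [
    (10, 12, 5, 5),  # C10H12O5
    (10, 14, 5, 4),  # C10H14O5
    (20, 30, 2, 6),  # C20H30O2
    (22, 34, 2, 6),  # C22H34O2
]
mcd_distances = upgma_cophenetic(features)
```

This naive implementation recomputes cluster averages at every step and would be slow for the thousands of formulas found at a real site; an optimized library routine would be the practical choice there.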

Fig. 2
figure 2

Two example molecular characteristics dendrograms (MCDs). Panels ‘a’ and ‘b’ are illustrative examples of MCDs. Each tip is one molecular formula. They are from sites 0009 (with 1038 formulas) and 0041 (with 4259 formulas), which had the lowest and highest site-level median βNTI values, respectively.

To estimate βNTI requires a measurement of sample-to-sample divergence based on the MCD as well as a null expectation for what that measurement should be under stochastic conditions. The measurement of divergence is the β-mean nearest taxon distance (βMNTD). After generating an MCD for each site we used the MCD to calculate βMNTD for each pair of DOM samples within a given site. βMNTD was measured using the ‘comdistnt’ function in the picante package53. We then used a randomization to generate a distribution of 999 null values of βMNTD for each sample pair. Each tip of the MCD is linked to one molecular formula and the randomization (i.e., null model) is based on randomly changing which formula goes to which tip (i.e., formula-tip linkages are scrambled across the dendrogram). After doing this randomization we re-calculated βMNTD. This represents one iteration of the null model (i.e., randomization). It has been shown that this type of randomization provides null values of βMNTD that are expected when composition is due primarily to stochastic processes34,35. Repeating the randomization 999 times provides 999 values of βMNTD and thus a distribution of βMNTD expected under stochastic assembly; this is the null distribution.
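The divergence measure and its randomization can be sketched as follows. This assumes presence/absence data and the usual unweighted form of βMNTD (the average, over both directions, of each formula's distance to its nearest formula in the other sample). It is a pure-Python stand-in for the ‘comdistnt’ call, not the picante implementation; the function names and the toy distance matrix are hypothetical.

```python
import random


def beta_mntd(sample_a, sample_b, dist):
    """Unweighted beta-MNTD between two samples.

    sample_a, sample_b: lists of formula indices present in each sample.
    dist[i][j]: dendrogram distance between formulas i and j (0 when i == j).
    """
    def mean_nearest(source, target):
        return sum(min(dist[i][j] for j in target) for i in source) / len(source)

    return 0.5 * (mean_nearest(sample_a, sample_b) + mean_nearest(sample_b, sample_a))


def null_beta_mntd(sample_a, sample_b, dist, n_iter=999, seed=0):
    """Null distribution obtained by scrambling formula-tip linkages."""
    rng = random.Random(seed)
    n_tips = len(dist)
    null_values = []
    for _ in range(n_iter):
        tip_of = list(range(n_tips))
        rng.shuffle(tip_of)  # formula k is now attached to tip tip_of[k]
        a = [tip_of[i] for i in sample_a]
        b = [tip_of[i] for i in sample_b]
        null_values.append(beta_mntd(a, b, dist))
    return null_values


# Toy dendrogram distances for five formulas: two tight groups, far apart.
dist = [
    [0, 1, 2, 6, 6],
    [1, 0, 2, 6, 6],
    [2, 2, 0, 6, 6],
    [6, 6, 6, 0, 1],
    [6, 6, 6, 1, 0],
]
observed = beta_mntd([0, 1], [2, 3], dist)
null_values = null_beta_mntd([0, 1], [2, 3], dist)
```

Because the shuffle only relabels tips, each sample keeps the same number of formulas and the same number of shared formulas in every iteration; only their positions on the dendrogram change.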

The null βMNTD distribution was compared to the true βMNTD value for that same pair of DOM samples. The difference between true and null βMNTD values provides an estimate for the influence of stochastic and deterministic assembly processes over DOM chemistry. When true βMNTD deviates significantly from the null distribution of βMNTD values, it indicates that deterministic processes are governing variation in DOM chemistry. The difference between true βMNTD and the mean of the null βMNTD distribution is the βNTI metric, which is measured in units of standard deviations (of the null distribution). A true βMNTD value that is 2 standard deviations above the mean of the null distribution will result in a βNTI value of +2, and a value 2 standard deviations below the mean will result in a βNTI value of −2. When true βMNTD is more than 2 standard deviations from the mean of the null distribution, it indicates a statistically significant influence of deterministic processes (Fig. S1).
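In other words, βNTI is a standardized effect size. A minimal sketch, assuming the sample standard deviation (n − 1 denominator) is used for the null distribution; the function name and the toy null values are hypothetical.

```python
import statistics


def beta_nti(observed, null_values):
    """Observed beta-MNTD expressed in null-distribution standard deviations."""
    null_mean = statistics.mean(null_values)
    null_sd = statistics.stdev(null_values)  # sample SD; an assumption here
    return (observed - null_mean) / null_sd


# Toy null distribution (illustration only).
toy_null = [1.0, 2.0, 3.0, 4.0, 5.0]
two_sd_above = statistics.mean(toy_null) + 2 * statistics.stdev(toy_null)
value = beta_nti(two_sd_above, toy_null)  # 2 SD above the null mean
```

With 999 null values per sample pair, the choice between sample and population standard deviation has a negligible effect on the result.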

True βMNTD can be above or below the null distribution’s mean, whereby pairwise comparisons with |βNTI| > 2 correspond to deterministic processes and |βNTI| < 2 indicates stochastic processes34,35,54,55. Deterministic processes were further separated into two classes analogous to variable selection and homogeneous selection, which are well-established concepts in community ecology. When βNTI > 2 the observed value of βMNTD is much larger than the value of βMNTD expected under stochastic assembly. This means the observed differences in DOM chemistry are significantly greater than would be expected under stochastic assembly. This is likely when two samples are from two different environments that each select for a distinct assemblage of organic molecules, thereby causing divergence in composition and large positive βNTI values. This is known as variable selection. When βNTI < −2 the differences in DOM chemistry are less than the stochastic expectation34,35,54,55. This is likely when two samples are from two very similar environments that each select for a very similar assemblage of organic molecules, thereby causing convergence in composition and very negative βNTI values. This is known as homogeneous selection. For a given site, the fraction of βNTI values that are less than −2 indicates the fractional influence of homogeneous selection. Likewise, the fraction of βNTI values that are greater than +2 indicates the fractional influence of variable selection, and the fraction of βNTI values between −2 and +2 indicates the fractional influence of stochastic assembly.
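The fractional influences described above reduce to counting βNTI values on either side of the ±2 thresholds. The sketch below uses hypothetical βNTI values and a hypothetical function name; treating a value of exactly ±2 as stochastic is an assumption, since the text does not address the boundary case.

```python
def assembly_fractions(beta_nti_values, threshold=2.0):
    """Fraction of pairwise comparisons assigned to each assembly process."""
    n = len(beta_nti_values)
    if n == 0:
        raise ValueError("need at least one beta-NTI value")
    variable = sum(1 for v in beta_nti_values if v > threshold)
    homogeneous = sum(1 for v in beta_nti_values if v < -threshold)
    stochastic = n - variable - homogeneous  # boundary values land here
    return {
        "variable_selection": variable / n,
        "homogeneous_selection": homogeneous / n,
        "stochastic": stochastic / n,
    }


# Hypothetical within-site beta-NTI values (illustration only).
example_values = [3.1, 2.5, -2.4, 0.3, 1.9, 5.0, -0.7, 2.2, 4.4, -3.0]
fractions = assembly_fractions(example_values)
```

For these ten made-up values, five exceed +2, two fall below −2, and three lie in between, so the three fractions are 0.5, 0.2, and 0.3.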

The fractional influences of each assembly process are linked to the sampling design of any given study. In our case, within each site we sampled 10 locations within the parafluvial zone of a single stream reach. Results must be interpreted relative to that sampling design. The fractional influences of different assembly processes would likely change if we chose to compare DOM across sites (i.e., across stream reaches) and/or across different components of stream systems (e.g., comparing parafluvial DOM to surface water DOM). If making comparisons across a broader range of environmental conditions, we would expect larger values of βNTI and greater fractional influences of variable selection34,35. Conversely, if making comparisons across very well mixed systems with little variation in environmental conditions, we would expect greater fractional influences of either homogeneous selection or stochastic assembly34,35. As with any type of study and methodology, the details of sampling designs are linked to the questions being asked. For βNTI-based analyses, changes in environmental extent, spatial scales, and/or temporal scales change the questions and will likely change the quantitative values of βNTI.

Dendrograms were estimated using all molecular formulas across the 10 samples within each site, and βNTI was estimated within each site by comparing all 10 samples to each other (i.e., no between-site comparisons were examined). This provides 45 unique βNTI values per site as samples were not compared to themselves. In theory, an MCD could be estimated for each sample instead of for each site, but that would preclude estimation of βNTI because it requires multiple samples to be represented within a single MCD. Using a single MCD for each site allowed estimation of the median βNTI value for each site. This was based on comparisons of all 10 samples to each other. In turn, cross-site comparisons used these site-level median βNTI values.
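The count of 45 follows from choosing 2 samples out of 10 without regard to order, i.e., 10 × 9 / 2. A short sketch of the pair enumeration and the site-level summary, with hypothetical sample labels and placeholder βNTI values:

```python
from itertools import combinations
from statistics import median

# Hypothetical labels for the 10 samples collected at one site.
samples = [f"S{k:02d}" for k in range(1, 11)]

# Every unordered pair of distinct samples; no self-comparisons.
pairs = list(combinations(samples, 2))

# Placeholder beta-NTI values keyed by pair (illustration only).
pairwise_beta_nti = {pair: 0.1 * index for index, pair in enumerate(pairs)}

# One summary value per site, used for cross-site comparisons.
site_median = median(pairwise_beta_nti.values())
```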

Statistical analysis

Outside of the null modeling, all statistical analyses used p < 0.05 as the significance threshold. We used ordinary least squares regressions (function “lm” in the base ‘stats’ package) to evaluate relationships between site level median βNTI and site level median sediment moisture and site level median water fraction. Linear models were used because the relationships did not contain any obvious non-linearities and the response variable (i.e., βNTI) is not bounded (i.e., no asymptote is expected). However, we did observe a constraint-based relationship between site level median βNTI and site level median sediment moisture. To evaluate the statistical significance of the constraint boundary, we subdivided the site level median sediment moisture content into 10 even segments and found the maximum site level median βNTI in each of those bins. We then fit a negative linear function to the relationship between maximum site-level median βNTI and the corresponding site-level median sediment moisture. Base functions in R were used for these analyses and associated plotting.
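The constraint-boundary procedure can be sketched as below, assuming that "10 even segments" means 10 equal-width bins spanning the observed moisture range and that empty bins are skipped. The authors used base R (‘lm’); this pure-Python version reports only the slope and intercept, not p-values, and its function names and data are hypothetical.

```python
def ols(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def upper_boundary_points(moisture, beta_nti, n_bins=10):
    """Per moisture bin, keep the site with the largest median beta-NTI."""
    low, high = min(moisture), max(moisture)
    if high == low:
        raise ValueError("moisture values must span a non-zero range")
    width = (high - low) / n_bins
    best = {}
    for m, b in zip(moisture, beta_nti):
        k = min(int((m - low) / width), n_bins - 1)  # top edge joins last bin
        if k not in best or b > best[k][1]:
            best[k] = (m, b)
    return [best[k] for k in sorted(best)]


# Hypothetical site-level medians (illustration only, not study data).
moisture = [0.0, 0.04, 0.52, 0.57, 1.0]
median_beta_nti = [5.0, 9.0, 4.0, 6.0, 1.0]

boundary = upper_boundary_points(moisture, median_beta_nti)
slope, intercept = ols([m for m, _ in boundary], [b for _, b in boundary])
```

Fitting only the per-bin maxima describes the upper edge of the point cloud rather than its center, which is why this fit is reported separately from the regression on all sites.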

Results and discussion

Our null models showed that the assembly of organic molecules into DOM assemblages was primarily, but not completely, governed by deterministic selection (Figs. 3, 4). Aggregating all βNTI values across sites revealed a unimodal distribution with a peak well above the upper significance threshold (i.e., > 2) (see purple line in Fig. 3). This is consistent with a primary influence of deterministic assembly, and more specifically ‘variable selection’34,35,54,55. Variable selection causes non-random divergence in DOM chemistry due to variation in environmental conditions. Deterministic, environmentally-imposed variable selection is, therefore, responsible for spatial variation in DOM chemistry at the reach-scale (~ 20-100 m in the upstream–downstream direction) in nearly all sampled rivers. We emphasize that this result is linked to the spatial scale (i.e., reach scale) and environmental extent (i.e., within parafluvial sediment) of our sampling. As detailed in the Methods section, if we increased or decreased the spatial scale or environmental extent, we would expect changes to the βNTI values. It is also important to recall that the null model is based on the molecular characteristic dendrogram (MCD), which is based on detailed molecular properties of all observed molecular formulas. This approach focuses on inferring relative influences of deterministic and stochastic processes by simultaneously considering numerous molecular characteristics; this work does not focus on variation in those characteristics per se, but rather the processes that influence them.

Fig. 3

Within- and among-site distributions of DOM βNTI, showing cross-site variation in the influence of deterministic selection. Two example site-level βNTI distributions (blue histograms, left-hand axes, ‘within-site observations’) and the aggregate (i.e., among site) βNTI distribution across all pairwise comparisons (purple line, right-hand axes, ‘among site density’). Panels ‘a’ and ‘b’ are for sites with the lowest (site 0009) and highest (site 0041) median βNTI values, respectively. Corresponding MCDs are shown in Fig. 2. The aggregate distribution is the same in both panels and was fit as a Gaussian kernel density function. Vertical red dashed lines are at −2 and +2, which are the significance thresholds for βNTI. Values < −2 or > +2 indicate deterministic assembly, while values between −2 and +2 indicate stochastic assembly. Variable selection and homogeneous selection are inferred when βNTI is > +2 or < −2, respectively. While there is significant variation across sites, the central tendency across all sites (purple line) is well above +2, indicating variable selection as the primary driver of variation in DOM chemistry.

Fig. 4

Histogram of site-level median βNTI values, showing that variable selection is commonly the dominant assembly process over DOM chemistry. Vertical red dashed lines are at −2 and +2, which are the significance thresholds for βNTI. Values < −2 or > +2 indicate deterministic assembly, while values between −2 and +2 indicate stochastic assembly. Values > +2 specifically indicate dominance of ‘variable selection’ whereby the environment deterministically drives divergence in organic matter chemistry among locations within a stream reach. The predominance of values > +2 indicates that DOM chemistry is governed by highly localized processes that diverge significantly across samples within most sampled reaches. Figure 5a in Danczak et al.31 shows a contrasting pattern with a βNTI distribution centered near 0 and with a range from approximately −8 to +13, and Fig. 6a in Danczak et al.37 also diverges with βNTI values exclusively above the +2 significance threshold. Minimum and maximum values for median βNTI values for each study (Danczak et al.31,37) have been added to this figure for illustration purposes; however, βNTI is scale dependent and thus we caution against direct quantitative comparisons across different datasets.

Variable selection contrasts with other assembly processes, including homogeneous selection and stochastic assembly. Homogeneous selection occurs when the environment causes compositional similarity, and is indicated when βNTI is less than -2. We observed a non-zero influence of homogeneous selection (Fig. 3a), but it had a negligible influence overall. This is indicated by the very small extension of the aggregate βNTI distribution below -2 (see purple line in Fig. 3). In Fig. 3 we also show within-site distributions of βNTI for the site with the weakest determinism (i.e., most stochastic) and the site with the strongest determinism (i.e., least stochastic). These results show that there is significant variation in the influence of within-site deterministic assembly, though there is a central tendency well above the βNTI significance threshold of 2. There is, therefore, usually deterministic divergence in DOM chemistry across locations within a given stream reach, likely driven by highly localized interactions between DOM and both biotic and abiotic processes in the hyporheic zone. One or more environmental factors do, however, lead to cross-site/reach variation in the degree of deterministic divergence in DOM chemistry.

Stochastic assembly is mediated by transport and/or DOM transformation or production events that are unstructured through space and/or time31,37,56,57. We observed some influence of stochasticity (Fig. 3). In two of the 38 sites, stochasticity was the primary assembly process (Figs. 4, S2). It remains, however, that variable selection was the primary assembly process across most sites. More specifically, the distribution of site-level median βNTI values is centered well above the upper significance threshold (Fig. 4).

The strong and relatively consistent influence of deterministic ‘variable selection’ indicates that within most sampled reaches there are highly localized processes that drive the chemistry of DOM associated with sediments58,59,60,61. Because our sample-level data come from ~10 mL of sediment homogenized with far more water than they would contact in situ, we infer that processes imposing variable selection operate below the spatial scale of the sample (~10 mL). With available data we cannot unambiguously identify the specific molecular mechanisms driving variation in DOM chemistry at this scale. We hypothesize that deterministic variable selection may arise through a combination of mechanisms related to low hydrologic connectivity that decrease stochastic mixing, redox conditions that vary across sample volumes and select for or against certain DOM chemistries, and/or cross-sample variation in the type and quantity of mineral surfaces that may deterministically sorb and/or react with specific types of organic molecules. Our results in combination with previous work provide some speculative support for this hypothesis.

Two previous studies that applied ecological null modeling to DOM chemistry found that DOM from pore water was more deterministically assembled relative to adjacent stream/river water31,37. The most stochastically assembled DOM was from surface water of a large river, while DOM from a small first order stream had comparatively more influence of deterministic assembly. Surface water from a large river should have less contact with sediments than surface water from very small streams, and both will inherently have less contact with sediments than pore water. In addition, surface waters should be more mixed with more homogeneous redox conditions than pore water. This suggests that less mixing (i.e., lower hydrologic connectivity) in sediments could indirectly contribute to deterministic assembly, potentially via microsite variation in redox conditions and/or localized reactions with mineral surfaces, among others. Conversely, greater connectivity should be associated with a stronger influence of mixing-based stochastic assembly (and weaker deterministic assembly) that leads to more homogenization of DOM chemistry.

These inferences align closely with the conceptual model developed in Lynch et al.62, in which times of greater hydrologic connectivity were associated with DOM homogenization via mixing, which we conceptualize as a stochastic process. Similarly, Wagner et al.63 found DOM composition to be homogenized during high connectivity events, again due to mixing, consistent with the pulse-shunt concept in which periods of high flow rapidly move DOM downstream64. We posit that the link between hydrologic connectivity and the strength of determinism (or stochasticity) can be conceptualized via the ratio of reaction rate to flow rate, which is commonly formalized as the Damköhler number65. Systems with reaction rates that are fast relative to flow rates will likely be more deterministic than systems with slow reaction rates and fast flow rates, as proposed in Hu et al.57. Reaction rates that are fast, relative to flow rates, and paired with spatial variation in the processes influencing DOM chemistry should result specifically in deterministic variable selection57.
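As a hedged illustration of the ratio described above, one common way to write the Damköhler number compares a transport timescale with a reaction timescale. The symbols below are assumptions introduced for illustration and are not defined in this study; the first-order form is only one of several conventions.

```latex
\[
\mathrm{Da}
\;=\; \frac{\text{reaction rate}}{\text{transport (flow) rate}}
\;=\; \frac{\tau_{\mathrm{transport}}}{\tau_{\mathrm{reaction}}},
\qquad
\text{e.g. } \mathrm{Da} = k\,\tau_{\mathrm{transport}}
\text{ for first-order kinetics with rate constant } k .
\]
```

Under this reading, Da much greater than 1 corresponds to reactions that are fast relative to flow (expected to favor determinism), while Da much less than 1 corresponds to transport-dominated conditions (expected to favor mixing-based stochasticity).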

To gain additional insights into what might be driving cross-site variation in the strength of determinism we examined the statistical relationships of site-level βNTI with sediment moisture and sediment water fraction. We focus on these variables because all the processes and environmental conditions discussed above (e.g., hydrologic connectivity, mixing, redox, reaction rates, organo-mineral interactions) can be directly or indirectly influenced, in part, by the water content of sediments66,67. Here we use ‘sediment moisture’ to refer to water content normalized by sediment dry mass, which emphasizes how much water there is relative to sediments. We use ‘sediment water fraction’ to refer to water content normalized by sediment wet mass, which emphasizes the fraction of the sample that is water.
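The two normalizations can be made concrete with a minimal sketch. The function names and the example masses are hypothetical; the sketch assumes water mass is obtained as the difference between wet and dry sediment mass, and reports both quantities as percentages, as in Fig. 5.

```python
def sediment_moisture_pct(wet_mass_g, dry_mass_g):
    """Water content normalized by sediment DRY mass, as a percentage."""
    if dry_mass_g <= 0 or wet_mass_g < dry_mass_g:
        raise ValueError("expected wet_mass_g >= dry_mass_g > 0")
    water_mass_g = wet_mass_g - dry_mass_g
    return 100.0 * water_mass_g / dry_mass_g

def water_fraction_pct(wet_mass_g, dry_mass_g):
    """Water content normalized by sediment WET mass, as a percentage."""
    if dry_mass_g <= 0 or wet_mass_g < dry_mass_g:
        raise ValueError("expected wet_mass_g >= dry_mass_g > 0")
    water_mass_g = wet_mass_g - dry_mass_g
    return 100.0 * water_mass_g / wet_mass_g

# Hypothetical sample: 12.5 g wet sediment that dries to 10.0 g.
moisture = sediment_moisture_pct(12.5, 10.0)   # water relative to dry sediment
fraction = water_fraction_pct(12.5, 10.0)      # water relative to the whole sample
```

For this hypothetical sample the moisture is 25% and the water fraction is 20%. Water fraction is bounded below 100%, whereas sediment moisture is unbounded, so the two metrics spread wet samples differently along the horizontal axis.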

For the analysis we used cross-site variation in sediment moisture and water fraction at the time of field sampling (Fig. S3). Regressing site-level median βNTI against site-level median sediment moisture and water fraction revealed patterns dependent on how water content was represented. For sediment moisture, median βNTI decreased with increasing moisture, but the relationship was only significant when analyzed as an upper constraint boundary (Fig. 5a). In contrast, median βNTI showed no relationship with water fraction (Fig. 5b). The significant relationship with sediment moisture must be interpreted cautiously as there were relatively few samples at high moisture content. The relationship with sediment moisture suggests that the influence of stochastic processes, which push βNTI closer to zero, increases as the amount of water increases relative to the amount of sediment. This makes physical sense as high quantities of water relative to sediment will lead to more diffuse interactions between organic molecules and mineral surfaces68,69, thereby decreasing the influence of deterministic selection. In addition, there will be more capacity for hydrologic transport to cause random mixing of organic molecules36,62,63. This is akin to the ecological process of dispersal37, which is well known as a mechanism that increases stochastic assembly34,35. Weaker deterministic influences under conditions of high water content are also consistent with previous work that found stronger influences of stochasticity in surface water DOM than in pore water DOM31,37. Sediment moisture may, therefore, commonly influence the assembly processes governing DOM. To further evaluate this inference, it will be useful to extend null modeling across a broader range of ecosystems (e.g., soils across an aridity gradient) and examine temporal development of DOM chemistry and stochastic/deterministic processes following re-wetting.

Fig. 5

Site-level median βNTI as a function of site-level median sediment moisture (a) and sample water fraction (b). Sediment moisture is percentage-based grams of water per gram of dry sediment. Water fraction is percentage-based grams of water per gram of wet sediment. In both panels the horizontal axis was separated into 10 equal segments and the maximum βNTI value was found within each segment. The filled dots are the maximum values within each of the 10 bins. The hollow circles are values that fell below the maximum value within each bin. A linear regression was fit to the maximum values (filled dots) to evaluate whether βNTI was related to either sediment moisture or water fraction following an upper constraint boundary. The hollow circles can be thought of as falling below the constraint boundary, which is represented by the solid black regression model line. Regression statistics, based on the filled dots, are shown on each panel and the significant regression model is shown as a solid line.
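The binning procedure described in this caption can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function names are hypothetical, bins are taken to be equal-width segments spanning the observed range of the horizontal axis, empty bins are skipped, and an ordinary least squares line is fit to the per-bin maxima.

```python
def upper_boundary_points(x, y, n_bins=10):
    """Split the x range into n_bins equal-width segments and keep,
    for each non-empty segment, the point with the maximum y value."""
    x_min, x_max = min(x), max(x)
    width = (x_max - x_min) / n_bins
    best = {}
    for xi, yi in zip(x, y):
        # The largest x is assigned to the last bin rather than an extra one.
        idx = min(int((xi - x_min) / width), n_bins - 1) if width > 0 else 0
        if idx not in best or yi > best[idx][1]:
            best[idx] = (xi, yi)
    return [best[i] for i in sorted(best)]

def fit_line(points):
    """Ordinary least squares fit; returns (slope, intercept, r_squared).
    Assumes at least two points with distinct x values."""
    n = len(points)
    mean_x = sum(px for px, _ in points) / n
    mean_y = sum(py for _, py in points) / n
    sxx = sum((px - mean_x) ** 2 for px, _ in points)
    syy = sum((py - mean_y) ** 2 for _, py in points)
    sxy = sum((px - mean_x) * (py - mean_y) for px, py in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy ** 2) / (sxx * syy) if syy > 0 else 1.0
    return slope, intercept, r_squared

def constraint_boundary(x, y, n_bins=10):
    """Fit the upper constraint boundary to the per-bin maxima (filled dots)."""
    boundary = upper_boundary_points(x, y, n_bins)
    return boundary, fit_line(boundary)
```

In this sketch, `x` would hold site-level median sediment moisture (or water fraction) and `y` the site-level median βNTI. Points not returned by `upper_boundary_points` correspond to the hollow circles that fall below the boundary.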

The observation of an upper constraint boundary (Fig. 5a) indicates that sediment moisture is associated with an upper bound on the influence of deterministic variable selection. Sediment moisture explains most of the variation in deterministic influences along the constraint boundary (R = 0.74). This is consistent with previous observations of stronger variable selection in pore water DOM, relative to surface water DOM31,37. However, without considering the constraint boundary, sediment moisture and water fraction explain almost none of the variation in deterministic influences; water fraction explains 13% of the variation in determinism, but the regression was non-significant (Fig. 5b). This indicates that while moisture content plays a role, additional variables contribute significantly to cross-site variation in deterministic influences over DOM assemblages. It further suggests that stronger signals of determinism in pore water DOM, relative to surface water DOM, are not due simply to differences in mixing.

While we cannot yet identify what variable(s) cause most of the variation in determinism across sites, it is helpful to note that the vast majority of sites with relatively low moisture content fell below the upper constraint boundary (Fig. 5a). We specifically observed variable selection (i.e., non-random divergence in DOM chemistry within each stream reach), so there appears to be an upper bound–related to moisture–on how divergent the DOM chemistry is through space. This means that the DOM in most sites is less spatially differentiated than what the moisture level would allow for. The implication is that other factors are reducing the strength of variable selection. These other factors could be deterministic selection-based forces and/or be the result of stochastic spatial processes. For example, some level of homogeneous selection could be imposed if there is consistent mineralogy and/or microbial communities across sampling locations within a given stream reach. In this case, the processes selecting for specific kinds of organic molecules would be similar across different locations within a stream reach. Spatial homogeneity of selective processes can work against spatially variable selective processes, resulting in a system that is effectively stochastic. On the other hand, a signal of stochasticity can be due to the lack of selective processes. We cannot easily identify whether stochasticity is due to the lack of selective processes or selective processes acting against each other. However, many reaches/sites with relatively low moisture fell below the βNTI constraint boundary (Fig. 5a). Because moisture is low for these sites, hydrologic-driven mixing (i.e., dispersal) is likely low. This indicates that these sites fell below the constraint boundary due to homogeneous selection working against variable selection, not because of high levels of dispersal/mixing.

Despite variation in assembly processes, this study clearly shows that DOM assemblages in variably inundated sediments are primarily governed by deterministic (i.e., non-random) variable selection. This result is similar to previous work with pore water DOM and contrasts most clearly with a strong stochastic signal found in riverine DOM from a large river (i.e., with little contact between river water DOM and sediments)31. We hypothesize the existence of a continuum from stochastic to highly deterministic assembly of DOM moving from well-mixed surface water in large rivers to variably inundated/connected sediments to rarely/poorly connected riparian and upland soils (Fig. 6). This is a post-hoc conceptual model based on a combination of results from this study and results from previous studies, discussed above. The model goes beyond data and results of this study with the goal of stimulating follow-on work aimed at understanding cross-ecosystem gradients in stochastic and deterministic processes.

Fig. 6

Conceptual hypothesis on dissolved organic matter (DOM) assembly processes. We hypothesize a continuum from stochastic to highly deterministic DOM chemistry moving across zones, from well-mixed surface water to variably inundated/connected parafluvial sediments to rarely/poorly connected riparian and upland soils. In this case, assembly is evaluated within each zone (e.g., within parafluvial sediments as was done in the current study). Across this continuum, we also hypothesize that the strength of deterministic assembly increases, in part, due to more intense organo-mineral interactions as the water content decreases. We further hypothesize even stronger signals of deterministic, variable selection when comparing DOM across the continuum (e.g., surface water DOM compared to upland soil DOM). We encourage evaluation of this conceptual model across diverse water-land gradients.

We suggest that future evaluation of our conceptual model (Fig. 6) is an opportunity to place variably inundated hyporheic zones21 in the context of other watershed system features (e.g., upland hillslopes, river water)23. This is especially relevant given the predominance of non-perennial streams globally70 and ongoing changes to inundation regimes in perennial and non-perennial streams16. There are several ways to evaluate the conceptual model, spanning modeling, field, and laboratory approaches. For modeling we envision a forward-simulation reactive transport model71,72 that explicitly represents the assembly and dynamics of DOM assemblages. Such a model could be used to run numerical in silico experiments to evaluate the plausibility of our conceptual model, and offer a refined version as a model-generated hypothesis. In the field we envision sampling across spatial gradients represented in Fig. 6 and repeating that sampling across numerous regions with different climates, soil/sediment mineralogy, and land cover. Analyzing such a sample set using null modeling of FTICR-MS data would provide an observation-based evaluation. In the lab we envision manipulative experiments that start with a common substrate (e.g., homogenized sediments) and impose a continuum of dilutions (water to slurry to intact sediment) and variable inundation regimes (no fluctuation to regular wet/dry cycling to a long return interval of inundation). Such an experiment would provide a more direct evaluation, with the important caveat of being under simplified laboratory settings that may not translate to the field. Each of these approaches has strengths and weaknesses and the ideal scenario would merge them for a robust evaluation of the conceptual model.

We emphasize that application of null models to DOM is nascent and there are numerous needs and opportunities for further analyses. It would be helpful, for example, to evaluate how the βNTI values relate to more traditional metrics derived from FTICR-MS data, such as the nominal oxidation state of carbon, the aromaticity index, double bond equivalent, and many others. It may also be insightful to include a common reference point such as SRFA and/or carboxyl-rich alicyclic molecules, similar to Danczak et al.47. Another direction is analyzing nested subsets based on shared environmental characteristics. Using the current dataset one could analyze all pairwise comparisons across all sites and then extract subsets based on features such as stream size and/or ecoregion. More generally, expanding the use of null modeling will help generate transferable understanding of how the relative influences of stochasticity and determinism are linked to environmental conditions. This can be helpful to inform mechanistic models that link variation in DOM chemistry to biogeochemical rates33,73. In such models, conditions leading to more deterministic DOM assemblages can likely be modeled with a smaller set of DOM molecules, while conditions leading to stochastic assembly may need to account for a much broader and less predictable suite of DOM molecules. The approach has also been extended to the level of individual molecular formulas, whereby it has become possible to identify which types of molecular chemistry are deterministically organized and which are stochastic56. This allows direct connection to microbial sequence data that can be analyzed through the same null modeling methods, with the potential to link deterministic components of microbiomes to deterministic components of DOM chemistry.
At both the whole-assemblage level and at the individual-formula level, variation in deterministic influences that remains unexplained is a strong indicator of important, yet unmeasured, environmental features35. We have only just begun exploring how ecological assembly processes influence DOM and we encourage an expansion of analysis approaches and further connections to additional concepts.