Background & Summary

Recently, there has been a widespread development of large-sample hydrology (LSH) datasets worldwide1,2,3,4,5,6,7,8,9,10. Many of these datasets were inspired by the pioneering Catchment Attributes and MEteorology for Large-sample Studies (CAMELS) dataset, which provided a comprehensive LSH dataset for the contiguous United States6. Such datasets typically include hydro-climatic variables—such as streamflow, meteorological forcing data, and catchment properties (e.g., land use and soil types)—covering numerous catchments over extended time periods.

Yet, numerous studies have highlighted the importance of integrating long-term hydro-climatic and catchment properties with stream water quality data to derive critical insights into solute transport processes11,12. Hence, datasets that combine long-term, reliable water quality variables with other hydro-climatic and catchment properties are essential for investigating flow pathways and residence times, with practical applications in reducing pollutant loads and improving water resource management under pressures from population growth, land use intensification and climate change.

Recently, CAMELS-Chem11 was released as an openly accessible dataset for the contiguous United States—the first augmentation of a CAMELS dataset that incorporates water quality data. However, similar initiatives remain limited and freely accessible water quality parameters remain scarce in published datasets. This is primarily due to the challenges associated with measuring and providing access to such data, compared to hydro-climatic and catchment variables that are typically easier both to measure and to obtain.

Here, we introduce CAMELS-CH-Chem, an extension of the existing CAMELS-CH5 dataset. While CAMELS-CH provides hydrometeorological and streamflow data across Switzerland, CAMELS-CH-Chem builds on this by integrating up to 40 stream water quality parameters, isotopes and catchment-aggregated data on atmospheric deposition, land cover, and agriculture for 115 of the original CAMELS-CH catchments. With CAMELS-CH-Chem, we aim to make an important contribution to the field of LSH by introducing the first CAMELS extension in Europe to include water quality data, enabling data-driven modeling of water quality at larger scales.

Our intention in providing catchment-aggregated variables is to enable comprehensive, multi-scale analyses by offering a harmonized dataset ready for integration into LSH and hydro chemical modelling frameworks. Catchment aggregated data of agricultural practices and atmospheric deposition serve as critical explanatory variables for interpreting spatial and temporal variability in water quality and biogeochemical parameters. Similarly, the inclusion of precipitation and stream water isotope time series enables valuable insights into hydrological flow paths, water source contributions, and catchment-scale transit times.

Although some of the original data, dating back to 1970, is available upon request from providers such as FOEN13, CAMELS-CH-Chem specifically provides a comprehensive dataset from 1981 to 2020 to maximize the overlap between different water quality sources and the complementary CAMELS-CH. This approach aligns with the primary objective of LSH datasets: to provide long-term, standardized variables across large regions14.

The provided data are divided into three main categories:

  1. (i)

    Stream water chemistry: This includes time series of more than 30 stream water chemistry constituents, covering both field and laboratory measurements. We provide hourly and daily data on water temperature, dissolved oxygen, pH, and electrical conductivity, along with (bi)monthly measurements of dissolved organic carbon (DOC), total organic carbon (TOC), alkalinity (\({{\rm{HCO}}}_{3}^{-}\)), ammonium (\({{\rm{N}}{\rm{H}}}_{4}^{+}\)), nitrate (\({{\rm{NO}}}_{3}^{-}\)), nitrite (\({{\rm{NO}}}_{2}^{-}\)), total nitrogen (TN), dissolved reactive phosphorus (DRP), total phosphorus (TP), total filtered phosphorus (TFP), calcium (Ca²+), magnesium (Mg²+), sodium (Na+), potassium (K+), chloride (Cl), sulphate (\({{\rm{SO}}}_{4}^{2-}\)), silicic acid (H4SiO4), and total hardness. The choice of hourly and daily resolution aligns with the temporal structure of the CAMELS-CH dataset, while the (bi)monthly and other coarser intervals reflect the original sampling frequency provided in the respective data sources.

  2. (ii)

    Stream water isotopes: This includes (bi)monthly time series of stream water isotope data of deuterium (²H) and oxygen-18 (¹⁸O).

  3. (iii)

    Catchment aggregated data: This provides annual time series of atmospheric deposition concentrations for nitrate (\({{\rm{NO}}}_{3}^{-}\)), ammonium (\({{\rm{N}}{\rm{H}}}_{4}^{+}\)), ammonia (NH3), nitrite (\({{\rm{N}}{\rm{O}}}_{2}^{-}\)), and total inorganic nitrogen for 115 catchments, alongside land cover including specific information on crop type distributions and livestock density data. Finally, we also provide monthly time series of catchment aggregated precipitation isotope data to improve isotope assessments.

The manuscript is structured as follows: the Methods section describes the original data sources, and the methodology used for compiling it into the CAMELS-CH-Chem dataset. The Data Records section describes the structure of the CAMELS-CH-Chem dataset. Finally, the Technical Validation section provides a first order validation of the CAMELS-CH-Chem dataset based on selected hypotheses.

Methods

Stream water chemistry data

Water chemistry data were collected within the framework of national monitoring programs. The high-frequency water temperature data were obtained from the surface water temperature monitoring network of the Swiss Federal Office for the Environment (FOEN, in German BAFU). The water quality data were collected within the National Surface Water Quality Monitoring Programme (NAWA). Within NAWA FRACHT (previously called NADUF), the high-frequency parameters and pollutant loads are monitored in about 15 selected catchments in collaboration with the FOEN, the Swiss Federal Institute of Aquatic Science and Technology (Eawag) and the Swiss Federal Institute for Forest, Snow and Landscape Research (WSL). In the NAWA TREND, the surface water quality is monitored in more than 100 catchments in cooperation with the FOEN and the cantonal authorities. The three sources encompass both unique and redundant information for a few variables, whereby high-frequency data comprises the most complete daily and hourly time series.

Figure 1 shows the distribution of the stream water chemistry measurement stations across Switzerland with their respective catchment boundaries in the background. The colors of each dot represent the maximum number of chemical variables observed at a given location. Note that some locations of the three different data sources overlap, consequently we provide data for 115 unique locations. Figure 1 also illustrates that these catchments are well distributed across Switzerland capturing the complex topography (Fig. 1d) and climatology of the country. In total, 86 locations have high-frequency measurement data available, 24 have water sampling data from NAWA FRACHT and 76 from NAWA TREND.

Fig. 1
figure 1

Spatial distribution of the measurement locations with data available in CAMELS-CH-Chem encompassing (a) high-frequency, (b) NAWA FRACHT and (c) NAWA TREND data. The dots in each subplot have different colors representing the number of available parameters at each location. Note that for the same location, we might provide data from high-frequency measurements, NAWA FRACHT and NAWA TREND. The upstream catchment area of each station is displayed in background. Additionally, subplot (d) shows a map of the elevation, with the boundaries of Switzerland and the catchment areas of each station.

In the following section, we describe the main data sources. A detailed description of the measurements and information on data acquisition and processing, such as sensor types, accuracy, and methods used, are available in Supplementary Material, i.e., Table S1 for the high-frequency data, Table S2 for NAWA FRACHT, and Table S3 for NAWA TREND.

High-frequency (sensor) data

The FOEN13 and NAWA FRACHT15 programme provided hourly and daily time series of stream water temperature, pH, electric conductivity, and oxygen concentration from 1981 to 2020 for the 86 stations shown in Fig. 1a. It is important to note that most of the 86 stations have temperature only. These data are measured with online sensors (telemetry). Here we refer to them as “high-frequency data.” A further overview of these four variables regarding the dataset-specific information (name and description), units and resolution is shown in Table 1.

Table 1 Overview of the stream water chemistry data obtained from the high-frequency data provided by FOEN and the NAWA FRACHT programme.

Nawa fracht

Further water chemistry data were obtained from the NAWA FRACHT programme15 (Fig. 1b). The dataset provides 38 variables obtained from either installed online sensors (six variables with “_sensor” in their names) or measured in the laboratory (remaining variables) (Table 2). These data have a measurement resolution of 7 to 14 days collected between 1982 and 2020, whereby time series provide the mean of measurements between date_start and date_end. Note that the NAWA FRACHT program also has overlapping measurement locations with the high-frequency data (described above) and NAWA TREND (described below). An overview of the 38 measured variables, i.e., dataset-specific information, units, and resolution, is shown in Table 2.

Table 2 Overview of the stream water chemistry variables obtained from the National River Monitoring and Survey Programme (NAWA FRACHT, previously called NADUF).

Nawa trend

Water chemistry data are also provided from the NAWA TREND programme16 (Fig. 1c). This dataset provides 22 variables (Table 3), measured from grab samples covering 2011 through 2020 at monthly resolution. Thus, the time series represents the measurement taken at the respective date. An overview of the 22 variables, including dataset-specific information, units, and resolution, is shown in Table 3.

Table 3 Overview of the stream water chemistry variables obtained from the National Surface Water Quality Monitoring Programme (NAWA TREND).

Stream water isotopes

We also provide stream water measurements of deuterium and oxygen-18 data with a resolution from 14 days to monthly for all locations where such data are available across Switzerland from the ISOT module17 (nine stations) of the National Groundwater Monitoring (NAQUA) and from the CH-IRP dataset18 (11 stations). The spatial distribution of these stations is shown in Fig. 2.

Fig. 2
figure 2

Spatial distribution of the measurement locations with stream water isotope data available in CAMELS-CH-Chem encompassing (a) ISOT data and (b) CH-IRP. Their respective catchment delineations are shown in background for both datasets. Moreover, for CH-IRP, it is worth noting that stations Biberbrugg (2604) and Einsiedeln (2609), both depicted as red dots, are located very close but in two different rivers (Biber and Alp). Therefore, due to scale reasons this figure gives the impression of having only 10 stations.

ISOT module

The ISOT module provides isotope measurement time series covering the period from 1992 through 2022. For two stations (i.e., Thun: ID 2030 & Brienzwiler: ID 2019), the measurements are derived from grab samples taken at the respective date. For the remaining stations, the values represent mixed samples aggregated between the start and end date. Information on the data is summarized in (Table 4). Figure 2a shows the distribution of these nine measurement locations.

Table 4 Overview of the ISOT isotope data available in CAMELS-CH-Chem.

CH-IRP dataset

Apart from the nine stations monitored by the ISOT program (FOEN), here we also provide data for 11 stations monitored and made available through the CH-IRP dataset and project (Staudinger et al.18), covering the period from 2010 through 2020. The CH-IRP original dataset covers a total of 22 medium-sized alpine and pre-alpine Swiss catchments. It is worth noting that we deliberately provide data only for the 11 stations overlapping with the original CAMELS-CH stations. For a full description of the CH-IRP dataset, users should refer to their original publication (Staudinger et al.18). Information on the data is summarized in (Table 5), while Fig. 2b shows the distribution of these monitoring stations.

Table 5 Overview of the CH-IRP isotope data available in CAMELS-CH-Chem.

Catchment aggregated data

Complementing the stream water chemistry and isotope data, CAMELS-CH-Chem also provides five types of catchments aggregated data: yearly time series of i) atmospheric deposition, ii) land cover percentage, iii) crop types and iv) livestock unit data, along with (v) monthly time series of precipitation isotopes data. Figure 3 shows the locations of the 115 catchments across Switzerland where these aggregated data are available. Note that these catchments correspond to those with stream water chemistry data.

Fig. 3
figure 3

Spatial distribution of the 115 catchments used to derive the catchment aggregated data provided in CAMELS-CH-Chem. Each of their respective gauging stations is shown as black crosses. In the background, the four major basins are shown, along with their main river networks and major lakes.

It is important to note that, except for land cover data, this aggregated information was derived from data sources covering solely Switzerland. Since 23 of the 115 catchments have a part located outside Switzerland, users should be careful when dealing with the data. To provide an initial filter for users, the variable area_perc_swiss, which will be further described in the Gauge metadata section (Table 11), provides the percentage of the catchment located in Switzerland.

Atmospheric deposition

CAMELS-CH-Chem also provides time series of annual atmospheric deposition of \({{\rm{N}}{\rm{O}}}_{3}^{-}\), \({{\rm{N}}{\rm{H}}}_{4}^{+}\), NH3, \({{\rm{N}}{\rm{O}}}_{2}^{-}\), HNO3, and total inorganic nitrogen aggregated from gridded data provided by Rihm and Künzle19. Specifically, the gridded data of individual nitrogen components was based on i) emission inventories and statistical dispersion models (NH3 and \({{\rm{N}}{\rm{O}}}_{2}^{-}\)), ii) monitoring data and spatial interpolation methods (HNO3, wet deposition of \({{\rm{N}}{\rm{H}}}_{3}\) and \({{\rm{N}}{\rm{H}}}_{4}^{+}\)), and iii) monitoring data, inferential models, and spatial interpolation (dry deposition of \({{\rm{NO}}}_{3}^{-}\) and \({{\rm{NH}}}_{4}^{-}\)). Finally, the gridded data of total atmospheric nitrogen deposition are based on the combination of the above-mentioned components.

Further details on the methods used to model and spatially aggregate nitrogen deposition are available in Rihm and Achermann20 and Rihm and Künzle19. The original gridded nitrogen deposition dataset is available at a 1 km × 1 km resolution in 5-year intervals starting in 1990. Here, we provide catchment averages, which were calculated using the area-weighted mean of all map pixels inside a catchment. Table 6 provides an overview of this dataset. It is worth noting that since this dataset exists only for the years 1990, 2000, 2005, 2010, 2015 and 2020, we applied a linear interpolation to fill the years in between.

Table 6 Overview of the atmospheric deposition data available in CAMELS-CH-Chem.

Landcover data

We provide land use data for the 115 catchments included in CAMELS-CH-Chem, recomputed following the same procedure as in CAMELS-CH5, and also using the CORINE Land Cover (CLC) dataset21. As CLC data are available for the years 2000, 2006, 2012 and 2018, we applied linear interpolation to fill the years in between and repeated the value from 2018 for the two remaining years. The data were divided into 12 classes: agriculture, forest (coniferous, deciduous, and mixed), grass and herb vegetation, scrub vegetation, wetlands, ice and perpetual snow, inland water surface, rock (loose and solid), settlements/urban and unknown/blank. Table 7 summarizes this information. Users may refer to CAMELS-CH5 and the official CORINE21 publication for further details.

Table 7 Overview of the land cover data provided in CAMELS-CH-Chem.

The following two sections provide additional detail on agricultural practices within the “agriculture” land cover class described here. Specifically, the crop types and livestock density data represent subsets of this broader category, offering more granular insight into how agricultural land is used within each catchment.

Crop type data

We also estimated the area within the 115 catchments covered by certain crops and provided these data annually from 1980 to 2020. The following 10 crop types were considered: cereals, maize, sugar beet, potatoes, rapeseed, pulses, vegetables, total arable land (=sum of all crops), as well as grapevines and orchards. We utilized the Swiss census of agricultural farms, provided by the Swiss Federal Statistical Office (FSO22), for the annual statistics of all crops in Switzerland.

Until 2019, yearly crop statistics were recorded only at the municipal level, meaning the precise location of crops within each municipality was unknown. Hence, to improve spatial localization, we distributed the statistical crop data by aggregating such yearly values across the land use class 41 (period 2004-09, and standard nomenclature NOAS04) obtained from the Arealstatistik Schweiz dataset23 for the arable land; and the classes “grapevine” and “orchard” from the Topographic Landscape Model (TLM) from the Swisstopo dataset24. This step resulted in the total crop data being divided into 10 different classes for each Swiss municipality.

In the end, we aggregated the municipality data per catchment and estimated the area of each crop type for each of the 115 catchments. Each catchment has, therefore, a yearly time series for each of the 10 crop classes (Table 8). It is important to note that the data before 1996 was provided at 5-year intervals (1980, 1985, 1990 and 1996), and after 1996, at yearly time-steps until 2019. Therefore, we applied a linear interpolation between the 5-year data in the 1980–1996 period and repeated the values from 2019 for the last year.

Table 8 Overview of the crop type data available in CAMELS-CH-Chem, along with their respective temporal resolution and source.

It is worth noting that the crop_perc attribute from the CORINE land cover dataset represents a broader coverage of agricultural land, including areas not covered by the specific crop types described here. Therefore, the sum of the individual crop type areas will typically be smaller than the corresponding agricultural land cover class from CORINE.

Livestock unit data

The term livestock unit (here referred to as GVE, from the German word Grossvieheinheiten) is a reference unit that facilitates the aggregation of livestock across different species and age groups based on a standardized convention from Eurostat25. Here, we used yearly livestock unit data from the FSO23, covering the years 1980 to 2020. The original data were recorded at the municipal level, meaning the exact location of livestock within a municipality could not be determined.

Therefore, to improve spatial localization, we distributed the livestock data across the land use categories alpine and jura pastures, natural meadows, and farm pastures within each municipality. This was done by using land use classifications from the Arealstatistik Schweiz dataset26, allowing us to estimate livestock density (livestock units per hectare) for different land use types, including natural meadows, pastures, and Alpine and Jura pastures.

We distinguished between two types of areas:

  1. 1.

    Alpine and Jura Pastures: It is estimated that 20% of the total Swiss livestock population spends three months annually on these pastures. To calculate livestock density in these areas, we multiplied the total Swiss livestock units by 0.05 (20% × 1/4 year). This value was then divided by the total area of Alpine and Jura pastures, resulting in a uniform livestock unit per hectare for all such pastures.

  2. 2.

    Natural Meadows and Farm Pastures: We used land use categories 15 (natural meadows) and 16 (home pastures) from the Swiss land use statistics23. Each area was assigned a weighting factor of 1 and multiplied by the total livestock unit of the respective municipality. The resulting value was then multiplied by 0.95 (i.e., 1-0.05) and divided by the total area of natural meadows and home pastures within the municipality, yielding a municipality-specific livestock unit per hectare.

Finally, we aggregated the livestock data for the 115 catchments. An overview of the final livestock unit data is presented in Table 9. It is important to note that similarly to the crop-types data, livestock unit data before 1996 were provided at 5-year intervals (1980, 1985, 1990 and 1996) and after 1996, at yearly time-step until 2020. Therefore, we applied a linear interpolation between the years 1980 and 1996.

Table 9 Overview of the livestock unit data (GVE) available in CAMELS-CH-Chem, alongside their description, units, temporal resolution, and sources.

Precipitation isotopes data

Stable isotopes of oxygen (18O) and deuterium (2H) in precipitation and in stream water serve as natural tracers of hydrological processes. Hence, besides the stream water isotopes time series (previously described), we provide monthly catchment-aggregated precipitation isotopes for the 115 catchments from 2007 to 2020.

In Switzerland, stable isotopes of oxygen in precipitation are monitored through the ISOT17 observation network, which is part of the NAQUA National Groundwater Monitoring Programme. Here, we used monthly precipitation isotope values from the ISOT network27, which were originally spatially interpolated into gridded isotope maps (“isoscapes”) using a regression-kriging approach28.

According to the isoscapes publication27, this interpolation method involves a multiple linear regression model relating isotope values to a set of geographic and climatic variables, including elevation, coordinates, and monthly precipitation totals. The spatially correlated residuals from this regression are then interpolated using ordinary kriging to account for local deviations not explained by the predictors. It is worth noting that the elevation dataset used originally had a resolution of 25 m (Swisstopo DHM2529) and was resampled to 500 m to match the final resolution of the isoscapes, serving as a compromise to align with the coarser resolution of the other input variables.

Catchment delineation

We used the catchment boundary shapefiles from the CAMELS-CH dataset to calculate catchment aggregated data (e.g., atmospheric deposition and agricultural data). Catchment outlets in CAMELS-CH are defined based on the discharge gauging location. However, some chemical measurement locations in the NAWA FRACHT and NAWA TREND datasets are slightly different from the CAMELS-CH streamflow gauging stations. For these cases, we adjusted the CAMELS-CH catchment areas using the new outlet information. Information regarding these shifts is provided in the gauge metadata (Table 11) with details regarding the distance between streamflow and the water chemistry measurement locations. For the respective catchments, we also provide the adjusted shapefile delineation for users to decide whether to use the original CAMELS-CH or the adjusted CAMELS-CH-Chem catchment boundaries. All the remaining catchment aggregated data were derived exclusively using the catchment boundaries provided by CAMELS-CH.

Data Records

The current version of the CAMELS-CH-Chem dataset (v1.0)30 is stored in a Zenodo repository at https://doi.org/10.5281/zenodo.16158375. The repository is organized into the following (sub)folders:

  • catchment_aggregated_data: contains five subfolders. Each contains one csv file per catchment, with 115 files in total. The files are organized by time series (rows) and attribute variables (columns).

    • agricultural_data: contains one csv file per catchment with the variables described in Table 8.

    • atmospheric_deposition: like the previous, but with the variables described in Table 6.

    • landcover_data: like the previous, but with the variables described in Table 7.

    • livestock_data: like the previous, but with the variables described in Table 9.

    • rain_water_isotopes: like the previous, but with the variables described in Table 10.

      Table 10 Overview of the isoscapes precipitation isotope data available in CAMELS-CH-Chem.
  • shapefiles: contains three subfolders.

    • camels_ch_del: contains two shapefiles. One shapefile marks the location of the gauge stations (as will be described in Table 11), and the other includes the derived catchment boundaries associated with each gauge (Table 12). Both files are referenced in the Swiss coordinate system LV95 (sometimes also referred to as CH1903+) and were copied from the original CAMELS-CH.

      Table 11 Overview of the gauges metadata structure with their respective variables name, description, and units.
      Table 12 Catchment delineation metadata structure with their respective variable name, description, and units.
    • nawa_fracht_del: provides the alternative delineation shapefile for the NAWA FRACHT catchments.

    • nawa_trend_del: like the previous one, but for the NAWA TREND catchments.

  • gauges_metadata: contains one csv file covering all the metadata associated with each of the 115 gauging stations, as will be described in Table 11.

  • stream_water_chemistry: contains two subfolders.

    • timeseries: contains two nested sub-sub folders. The csv files in both are organized by time series (rows) and attribute variables (columns), and each column represents one of the four water quality variables as described in Table 1. Both nested subfolders contain 86 files.

      • daily: contains one csv file per catchment at daily resolution.

      • hourly: contains one csv file per catchment at hourly resolution.

    • interval_samples: contains two nested sub-subfolders.

      • nawa_fracht: contains one csv file per catchment covered (24 files). The rows represent the dates, and each column represents one of the water quality variables, as described in Table 2.

      • nawa_trend: contains one csv file per catchment covered (76 files) and presents a similar structure as the previous one, but now with each column covering one of the variables in Table 3.

  • stream_water_isotopes: contains two subfolders. Each contains one csv file per catchment with any isotope data. The rows represent the dates, and each column represents either deuterium or oxygen-18 data.

    • isot: contains one csv file per catchment covered (nine files), as described in Table 4.

    • ch-irp: contains one csv file per catchment covered (11 files), as described in Table 5.

  • additional_data_from_studies: contains a non-exhaustive overview of recent published studies performed in Switzerland18,31,32,33,34,35,36,37,38,39,40,41,42, where water quality and isotope measurements were monitored in catchments nested within the CAMELS-CH-Chem catchments. The data cover different periods of time, have different time resolutions, and occasionally provide also data from groundwater or snow samples. In the table reported in the csv file we recapitulate the available data, their resolution and the IDs within which the nested catchments are located. When data were not published, they were provided by the authors31,32, and are provided in the folder. For a complete description of the data, users are invited to refer to the corresponding studies.

Gauge metadata

The gauge metadata file contains the basic information to allow a proper use of the dataset. Many attributes are a repetition of those provided by CAMELS-CH. Note that the coordinate information on northing and easting is always provided in the Swiss reference system LV95, while the gauge_lon and gauge_lat are provided in WGS84. Additionally, due to the potential location difference between the measurement point of the CAMELS-CH streamflow gauge and both NAWA FRACHT and NAWA TREND, further fields were added to ensure consistency when using the data.

The attributes gauge_name_{}, gauge_easting_{}, gauge_northing_{} and area_{} refer to specific information from either NAWA FRACHT or NAWA TREND when applicable. The attribute area_swiss_perc represents the percentage of the upstream catchment area located in Switzerland and might be useful for users when using the catchment aggregated data.

The field foen_{}_dist represents the distance in kilometres between the CAMELS-CH streamflow gauge and the NAWA FRACHT or NAWA TREND measurement points (when applicable). Additionally, we also added a correction factor (q_nawat_corrector) for the NAWA TREND measurement points, which can be used to correct the streamflow discharge (as provided in CAMELS-CH) to the new catchment area when using the chemistry data. Finally, the field remarks summarize additional potential information about the gauges that should be considered before using the data.

Catchment delineations metadata

The delineated geometry of each catchment is stored in the catchment layer. This layer includes the gauge_id field, which is also used for the gauges, allowing for a link between the two datasets. Additionally, the catchment layer also includes the information shown in Table 12. These information ensure consistency between the catchment and gauge datasets, facilitating seamless integration and analysis.

Technical Validation

Calibration of the sensor, NAWA FRACHT and NAWA TREND data

The devices used to measure the variables available from sensor data are calibrated twice per year. If there are significant deviation from manual measurements, they are corrected accordingly against grab samples. For NAWA FRACHT, calibration of physico-chemical sensors at stations is performed monthly. More information about instruments accuracy and methods are available for both datasets in Tables S1, S2.

Regarding NAWA TREND, the data are measured, processed, and validated by either their respective cantonal authorities or their contracted laboratories. Although, we do not provide further details regarding the specific instruments, methodologies, or protocols used by the individual cantons or laboratory team, all the data is expected to be collected in accordance with the Swiss Modular Stepwise Procedure43 and analyzed using the methods recommended by the Swiss Lab’Eaux44.

Water chemistry measurements first “sanity check”

We provided a first assessment of the validity of some of the measured variables. Based on previous literature, we formulated three main hypotheses on the expected variable correlations among themselves. We then tested these hypotheses to determine whether the observed water chemistry measurements are consistent with expectations. Here we computed the correlations using the Spearman correlation coefficient (rs). Our hypotheses are as follows:

  1. (i)

    Stream water EC should be broadly negatively correlated with mean discharge45,46.

  2. (ii)

    Conductivity measures the ability of the stream water to conduct electricity, which is directly correlated to the amount of dissolved ions47,48. Therefore, EC should be positively correlated to the measurements of major anions, such as Cl and \({{\rm{NO}}}_{3}^{-}\).

  3. (iii)

    Increasing temperature decreases the solubility of oxygen in water, moreover, the increase in water temperature leads to an increase in biological activity, which can consequently reduce the concentrations of dissolved oxygen in the stream water49,50. Hence, stream water temperature and oxygen concentration should be negatively correlated.

Therefore, we selected the variables: ec25_lab, Cl, NO3_N and q_mean_sensor from NAWA TREND; ec20_lab, Cl, NO3_N and q_mean_sensor from NAWA FRACHT; and temp_sensor, and O2C_sensor from the high-frequency measurements data to perform our validation.

Figure 4 shows the histograms of the distributions of the Spearman correlation coefficients (rs) computed between electrical conductivity and either mean discharge (a), Cl (b) and NO3_N (c) for NAWA TREND and NAWA FRACHT data. Overall, the correlations between electrical conductivity and mean discharge in Fig. 4a were largely negative, with values of rs close to −0.50 for the three data sources. NAWA FRACHT only had one station out of 24 with a positive value, while there were 7 out of 76 for NAWA TREND. These findings are aligned with our hypothesis (i).

Fig. 4
figure 4

Histograms of the Spearman correlation coefficient between (a) EClab and Qmean, (b) EClab and Cl, and (c) EClab and \({{\rm{NO}}}_{3}^{-}\). The different colors in the subplots represent different data sources, i.e., NAWA FRACHT in orange and NAWA TREND in blue.

Moreover, Fig. 4b shows histograms for the correlations between electrical conductivity and Cl, while Fig. 4c shows the histograms for the correlations between electrical conductivity and \({{\rm{NO}}}_{3}^{-}\). Both subplots show that most of the stations exhibit a correlation above 0.50, which supports our expectation from hypothesis (ii).

Figure 5 shows the correlations between temperatures and oxygen concentrations. Figure 5a shows the histogram of the rs computed for the daily time series of oxygen and temperature for each of the 16 stations with high-frequency measurement data for these two variables (Table 1). All correlations were negative, with only one station with rs > −0.70. Figure 5b shows an example of a daily resolution time series of these two variables for the Mellingen gauge (2018) between 01.10.2019 and 30.09.2020. The figure indicates the expected pattern for the two variables, with oxygen concentrations increasing during the colder months and decreasing with rising temperatures during the summer period. Hence, these results corroborate our hypothesis (iii), which suggested a negative correlation between these two variables.

Fig. 5
figure 5

(a) Histograms of the Spearman correlation coefficient between the time series of oxygen concentration and temperature for all 16 stations covered by the high-frequency data. (b) Daily time series of oxygen concentration and temperature between 01.10.2019 and 30.09.2020 for gauge Mellingen (2018) used as an example of the typical annual course of the two variables.

Overall, the rough confirmation of the three hypotheses stated in this section can be used as broad indication of the reliability of the current water chemistry datasets provided in CAMELS-CH-Chem. We acknowledge that this section does not contain a complete validation of the dataset, yet we believe that it is sufficient as a first sanity check of the overall validity of CAMELS-CH-Chem data.

Ionic mass balance for NAWA FRACHT stations

In this section we present the mean, and the 25th, 50th and 75th percentiles of ionic mass balance for all 24 NAWA FRACHT stations, based on the computation of the ionic mass balance for each measurement. Detailed information about the respective ionic balance of each station is available in Table S4 (Supplementary Material). Figure 6 shows the spatial distribution of these values.

Fig. 6
figure 6

Distribution of the long-term (a) mean, (b) 25th percentile, (c) 50th percentile, and (d) 75th percentile ionic mass balance for the NAWA FRACHT stations.

About 70% of the stations presented a mean ionic balance below 1.5%, and a maximum 75th of 4.3%. Additionally, only two stations presented a maximum ionic balance above 10% (2143: 14.8%; and 2415: 11%), although both their 75th were below 4.3%, which indicates that such cases correspond to outliers in the stations (Table S4). Established literature51 suggests that values below 5% ionic imbalance are typically deemed acceptable, while discrepancies exceeding 10% may suggest anomalies in measurement or incomplete data. Hence, the present analysis can be used as a validation for most of the NAWA FRACHT measurements.

Distance between measurement stations

Regarding the distance between the streamflow measurement gauges and the NAWA FRACHT (foen_nawaf_dist), 16% of the stations (4 out of 24) were more than 5 km away from the respective CAMELS-CH outlet. The maximum distance is 10 km for station 2068, located in the very south of Switzerland (on the Ticino River), with an overall catchment area of 1,613.3 km2. Furthermore, for NAWA TREND (foen_nawat_dist) 14% of the stations (10 out of 72) were more than 5 km away from the CAMELS-CH outlet. Only two gauges had a distance greater than 10 km between the gauge and the sampling location. The station with the maximum distance (20 km) is 2288, located at the Rhine River, and with an overall catchment area of more than 11,000 km2.

Although the results suggest that for most of the stations, the distance between the discharge station and the water chemistry measurement locations of either NAWA FRACHT or NAWA TREND tend to be low, users can refer to the respective variable indicating this distance when deciding whether or not to use the station in their analysis with the CAMELS-CH data. Finally, for stations where the derived catchment area is considerably different, we suggest users to use the q_nawat_corrector to correct the discharge data.

Isotope measurements validation

The isotopic composition of water samples, specifically δ 2H and δ 18O, was analyzed to assess potential deviations from the Global Meteoric Water Line (GMWL). The GMWL serves as a reference for the isotopic compositions of meteoric water, following Eq. (1). This step was included in the CAMELS-CH-Chem validation phase to demonstrate the usability of the collected data for future users.

$$\delta {\rm{H}}2=8.0\,\delta {\rm{O}}18+10\textperthousand $$
(1)

where δ 2H is the deuterium fraction and δ 18O the Oxygen-18. Both measurements are in permille (‰) notation according to VSMOW.

Figure 7 shows the individual subplots a to t for the total 20 stations with isotope data (nine from ISOT and 11 from CH-IRP), with the δ²H and δ¹⁸O values plotted alongside the GMWL. The axis limits were set based on the observed range of isotope values across all stations. This comparative approach allowed for a clear identification of any deviations from the GMWL and provided insight into potential fractionation processes in the catchments.

Fig. 7
figure 7

Dual isotope plots of δ²H and δ¹⁸O for the nine ISOT and 11 CH-IRP sampling locations in reference to the Global Meteoric Water Line (GMWL, dashed orange line). Blue circles (dark for ISOT and light for CH-IRP) represent individual water samples covering their respective entire timeseries.

Overall, the alignment of the isotope data with the GMWL showed low deviation, with only few samples with apparent evaporative fractionation, indicating that this dataset provides a reliable basis for future hydrological studies in the provided catchments. For CH-IRP, users can also refer to their original publication, where Staudinger et al.18 also provide lab standards and errors for their measurement stations, along with potentially problematic measurements.

Usage Notes

Users should note that overlaps between the high-resolution and grab samples variables result from differences in temporal aggregation of the same measurement. Moreover, missing data was generally sparse and occurred sporadically, primarily due to temporary sensor or sampling interruptions, and users should acknowledge that while using the data.

Other water quality and isotope datasets for stations not covered in the current version of the CAMELS-CH-Chem dataset are available. We provide an initial, non-exhaustive list of published and unpublished datasets. This resource is intended to be expanded through ongoing contributions from the community.