Background & Summary

The Arctic is experiencing rapid changes in climate1,2,3,4, which affect living conditions for humans and ecosystem functioning5,6,7,8,9,10,11,12. Affected processes include vegetation damage and tundra C emission rates, which cause a positive feedback to climate change13,14. Research into the effects of a changing Arctic climate is therefore essential, but it has been limited by the availability of large-scale in situ data15. As a result, studies involving pan-Arctic climate have been based on remotely sensed temperatures16,17 or reanalysis products18,19,20. Although airports and critical infrastructure have been monitored since the 1940s and 1950s21, widespread in situ surface climate data collection by climate monitoring programs and universities began only in the 1990s and early 2000s22,23,24,25,26,27,28,29. Hourly to daily measurements covering most Arctic regions have therefore become publicly available only recently, and they are scattered across various local databases22,23,24,25,26,27,28,29. Such in situ datasets can be used to validate satellite or reanalysis data regionally30, but a standardized pan-Arctic option is not readily available. Presently, geopolitics and sanctions further restrict access to data, especially from the Siberian Arctic15, with important impacts on climate model performance31. Increasing the accessibility, reusability, and interoperability of Arctic environmental data32 is therefore a crucial task for furthering research.

In this publication, we present a new standardized dataset containing terrestrial in situ weather data from all the major Arctic regions, collected from publicly available data sources. Most data fall within the period 1990-2023, and the focus is on the most commonly measured variables: air temperature, surface temperature, snow depth, relative humidity and precipitation. The data have been reformatted and restructured to a standardized format and have gone through a quality check (Fig. 1). We have, however, purposefully kept the data as “raw” and unprocessed as possible to ensure flexible, accessible and interoperable use of pan-Arctic in situ weather data spanning the period from before widespread climate change in the Arctic until today. Credit for use of the data should go to the respective data sources as listed together with the data33.

Fig. 1

Overview of the workflow, comprising data collection, reformatting and normalization, quality check and compression, that created the presented dataset.

Methods

The data included in the presented dataset come from 13 different publicly available resources distributed around the Arctic, although some locations are situated south of the Arctic Circle (66° N). These sites are nevertheless included because they are part of Arctic monitoring programs, owing to their ecological similarity to or connection with sites in the geographic Arctic. The data sources are: AWI: Alfred Wegener Institute, Potsdam, Germany28,34; CALM: Circumpolar Active Layer Monitoring through Arctic Data Center, Washington DC, USA35,36,37 (https://www2.gwu.edu/~calm/data/north.htm); CEN/Nordicana: Center for Northern Studies, Quebec, Canada38,39,40,41,42 (https://nordicana.cen.ulaval.ca/en_index.aspx); FMI: Finnish Meteorological Institute, Helsinki, Finland43 (https://litdb.fmi.fi/index.php); GEM: Greenland Ecosystem Monitoring, Nuuk, Greenland/Roskilde, Denmark44 (https://G-E-M.dk); IARC: International Arctic Research Center, Fairbanks, USA45; NGEE: Next Generation Ecosystem Experiments, Fairbanks, USA46,47,48,49; NMI: Norwegian Meteorological Institute, Oslo, Norway50 (https://frost.met.no/index.html); SILA/Nordicana: SILA Network at Center for Northern Studies, Canada51,52 (https://nordicana.cen.ulaval.ca/en_index.aspx); SMHI: Swedish Meteorological and Hydrological Institute, Norrköping, Sweden53 (https://www.smhi.se/data/sok-oppna-data-i-utforskaren); WMO: World Meteorological Organization via Meteostat.net, Friedberg, Germany54 (https://dev.meteostat.net/sources.html); DMI: Danish Meteorological Institute, Copenhagen, Denmark55 (https://www.dmi.dk/frie-data); NOAA: National Oceanic and Atmospheric Administration Global Monitoring Laboratory, Washington DC, USA56 (https://gml.noaa.gov/data/data.php?site=brw). A complete list of the locations, including geographic location, data source, access link, reference and details of how to cite the data, can be found with the dataset33.

Because the data originate from different measurement programs, the instrumentation varies. In Tables S1-7 in the Supplementary Materials, we compile the information about the instrumentation from each data source to the degree of detail available at that source. Where available, we list the exact instrumentation. For data collected from World Meteorological Organization (WMO) stations through Meteostat, we assume that they adhere to the WMO standards of instrumentation and installation (Table S7)57. Data from some locations are available from several sources (e.g. Arctic Data Center and the IARC database). Figure 2 shows the data sources used in this particular dataset. Measurement methods range from manual snow probing or relative humidity sensing in measurements before 1980 to, most commonly, automated measurements of a range of weather variables at half-hourly to daily frequency.

Fig. 2

Locations and sources from which in situ data were collected. See Tables S1-7 for details of the instrumentation. A metadata file with information on how to cite the data is located together with the data. In some cases, different types of data from the same location came from different sources, so dots of different colors may overlap on the figure. Abbreviations: AWI: Alfred Wegener Institute, Potsdam28,34; CALM: Circumpolar Active Layer Monitoring through Arctic Data Center35,36,37; CEN/Nordicana: Center for Northern Studies, Canada38,39,40,41,42; FMI: Finnish Meteorological Institute43; GEM: Greenland Ecosystem Monitoring44; IARC: International Arctic Research Center, Fairbanks, US45; NGEE: Next Generation Ecosystem Experiments, Fairbanks, US46,47,48,49; NMI: Norwegian Meteorological Institute50; SILA/Nordicana: SILA Network at Center for Northern Studies, Canada51,52; SMHI: Swedish Meteorological and Hydrological Institute53; WMO: World Meteorological Organization via Meteostat.net54; DMI: Danish Meteorological Institute55; NOAA: National Oceanic and Atmospheric Administration Global Monitoring Laboratory56.

Import and Standardization

Data were located and collected using Application Programming Interfaces (for the data sources WMO and Norwegian Meteorological Institute (NMI)), manual download from databases (Finnish Meteorological Institute (FMI), International Arctic Research Center (IARC), Circumpolar Active Layer Monitoring (CALM), Next Generation Ecosystem Experiments (NGEE), Swedish Meteorological and Hydrological Institute (SMHI), Alfred Wegener Institute (AWI), Greenland Ecosystem Monitoring (GEM), Nordicana, National Oceanic and Atmospheric Administration Global Monitoring Laboratory (NOAA)) and direct communication with agencies (Danish Meteorological Institute (DMI)). For each data source, an import script was developed (Python 3.9, available here33). Each import script is called from a standardization script33, in which all data are restructured and standardized into the same tabular format (Fig. 1). Because each data source came with its own format, the standardization procedure was specific to each source. Where metadata on latitude, longitude and elevation were not already part of the data, this script also adds the location information.
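As an illustration of what such a source-specific standardization step might look like, the following minimal Python sketch maps the columns of one source onto a shared set of variable columns and fills in missing coordinates from a site registry. It is not the published script33: the column names, the `SITE_METADATA` registry, the site name and the coordinates are all hypothetical and chosen only for the example.

```python
from datetime import datetime

# Hypothetical standard variable columns; the published schema may differ.
VARIABLES = ("air_temperature", "snow_depth", "relative_humidity")

# Hypothetical site registry, used when a source file carries no coordinates.
SITE_METADATA = {
    "ExampleSite": {"latitude": 69.25, "longitude": -53.5, "elevation": 25.0},
}


def to_float(text):
    """Parse a numeric field; empty or missing fields become None."""
    if text is None or str(text).strip() == "":
        return None
    return float(text)


def standardize_rows(raw_rows, source, site, time_column, time_format, column_map):
    """Restructure the rows of one source into the shared tabular layout."""
    location = SITE_METADATA.get(site, {})
    standardized = []
    for raw in raw_rows:
        record = {
            "timestamp": datetime.strptime(raw[time_column], time_format).isoformat(),
            "source": source,
            "location": site,
        }
        # Prefer coordinates delivered with the data, else use the registry.
        for key in ("latitude", "longitude", "elevation"):
            own = to_float(raw.get(key))
            record[key] = own if own is not None else location.get(key)
        # Every record gets every standard column, even if the source lacks it.
        for variable in VARIABLES:
            record[variable] = None
        for raw_name, variable in column_map.items():
            record[variable] = to_float(raw.get(raw_name))
        standardized.append(record)
    return standardized


example = standardize_rows(
    raw_rows=[{"Date": "01.02.2020 12:00", "T_air(C)": "-18.4", "SD(cm)": ""}],
    source="GEM",
    site="ExampleSite",
    time_column="Date",
    time_format="%d.%m.%Y %H:%M",
    column_map={"T_air(C)": "air_temperature", "SD(cm)": "snow_depth"},
)
```

Keeping the per-source details (time format, column mapping) in the arguments and the shared layout in one function mirrors the idea of one import script per source feeding a common standardization step.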

Quality check

Because the data came from various sources, for some of which it was not clear what quality check had been performed, we designed a simple quality check and ran all standardized datasets through it. Its elements are inspired by previously published quality check procedures such as those in refs. 58,59,60,61. In the quality-checked datasets provided here, we made qualified evaluations based on the most commonly used criteria (see details in Tables 1, 2, 3 and 4) in order to provide the most immediately useful dataset.

Table 1 Part 1: Details of the operations performed and conditions set during the quality check for Air Temperature, Snow Depth, and Precipitation.
Table 2 Part 1: Details of the operations performed and conditions set during the quality check for Long wave incoming radiation (LW Incoming) and Long wave outgoing radiation (LW Outgoing).
Table 3 Part 2: Details of the operations performed and conditions set during the quality check for Shortwave incoming radiation (SW Incoming), Shortwave outgoing radiation (SW Outgoing), and Relative Humidity.
Table 4 Part 2: Details of the operations performed and conditions set during the quality check for Surface Temperature, Soil Temperature, and Soil Moisture.

The quality check is split into 5 modules:

  1. Removal of known common measurement errors or missing values (specifically the values -9999, 9999, ‘M-9999.0000’, ‘R-9999.0000’, 9999.0000, ‘R0.00000’, -9999.0, -9.999e+03, -999.9, -99.9, -99, 6999.000000) and rows in which all values are NaN.

  2. Removal of impossible values such as negative snow depths, relative humidity above 100 %, or air temperatures of 60°C, which are physically impossible and are instrument artifacts (named ‘spikes’ in ref. 59). If snow depths are only slightly negative (−3 to 0 cm) and the air temperature is above 2°C, snow depth is set to 0 (see Table 1).

  3. Flagging of outliers in each data type: values above or below 3 standard deviations of the preceding 7 and following 7 data points (rolling window = 15), and minimum 3 units above/below the average of the same preceding and following 7 values. This flagging is then followed by user inspection and an optional visual inspection of the dataset.

  4. Unit conversion so that units are uniform across datasets and sites. This was necessary for snow depth, which was standardized to cm, and soil moisture (vol %).

  5. Judgment of probable instrument artifacts or effects of calibration. This involves a close look at tiny, but non-zero snow depth measurements during high summer, which with very high probability are a zero-calibration issue. Sudden jumps (offsets) in air temperature, surface and soil temperature and soil moisture data (potentially due to e.g. calibration) are detected for user inspection and potential removal or offset correction.
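Modules 1 and 2 can be sketched in a few lines of Python. The sketch below is an illustration under stated assumptions, not the published quality check code33: the sentinel values and the snow depth rule are taken from the list above, whereas the function names, the record layout and the use of 60°C as an upper air temperature cutoff are assumptions (the actual cutoffs are given in Tables 1, 2, 3 and 4).

```python
import math

# Error codes listed in module 1; -9.999e+03 and 9999.0000 equal -9999.0 and 9999.0.
NUMERIC_SENTINELS = {-9999.0, 9999.0, -999.9, -99.9, -99.0, 6999.0}
STRING_SENTINELS = {"M-9999.0000", "R-9999.0000", "R0.00000"}


def clean_value(raw):
    """Module 1: turn known error codes and unparseable entries into NaN."""
    if raw is None:
        return math.nan
    if isinstance(raw, str):
        raw = raw.strip()
        if raw in STRING_SENTINELS:
            return math.nan
        try:
            raw = float(raw)
        except ValueError:
            return math.nan
    return math.nan if raw in NUMERIC_SENTINELS else float(raw)


def drop_empty(records, variables):
    """Module 1: drop rows in which all measured variables are NaN."""
    return [r for r in records if not all(math.isnan(r[v]) for v in variables)]


def remove_impossible(record):
    """Module 2: blank physically impossible values in one record (dict of floats)."""
    fixed = dict(record)
    snow = fixed["snow_depth"]        # cm
    air = fixed["air_temperature"]    # degrees C
    if snow < 0:
        # Slightly negative depth (-3 to 0 cm) with air temperature above 2 C
        # is set to zero; any other negative depth is removed.
        fixed["snow_depth"] = 0.0 if (snow >= -3 and air > 2) else math.nan
    if fixed["relative_humidity"] > 100:
        fixed["relative_humidity"] = math.nan
    if air >= 60:  # assumed cutoff, taken from the example in the text
        fixed["air_temperature"] = math.nan
    return fixed


summer = remove_impossible(
    {"snow_depth": -1.5, "air_temperature": 5.0, "relative_humidity": 104.0}
)
winter = remove_impossible(
    {"snow_depth": -1.5, "air_temperature": -5.0, "relative_humidity": 80.0}
)
```

In this example the slightly negative summer snow depth becomes 0 and the humidity of 104 % becomes NaN, while the same negative snow depth at -5°C is removed because the zero-snow condition is not met.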

An overview of the general quality check procedure can be seen in Fig. 1. Tables 1, 2, 3 and 4 give specifics of considerations and checks that were made for each data type in the dataset, including specific cutoff values.
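The outlier criterion of module 3 combines a relative threshold (3 standard deviations) with an absolute one (at least 3 units). A minimal Python sketch of this double criterion is given below. It reflects one reading of the description: the mean and standard deviation are computed from the up to 7 preceding and 7 following values, excluding the value being tested, and windows are simply truncated at the ends of the series. The published code33 may handle these details differently, and the function name and parameters are hypothetical.

```python
import math
import statistics


def flag_outliers(values, half_window=7, n_sigma=3.0, min_offset=3.0):
    """Return one boolean per value; True marks a candidate outlier."""
    flags = []
    for i, value in enumerate(values):
        lo = max(0, i - half_window)
        # Neighbours: up to 7 values before and 7 after, without NaN entries.
        neighbours = [
            v
            for j, v in enumerate(values[lo:i + half_window + 1], start=lo)
            if j != i and not math.isnan(v)
        ]
        if math.isnan(value) or len(neighbours) < 2:
            flags.append(False)
            continue
        deviation = abs(value - statistics.fmean(neighbours))
        spread = statistics.stdev(neighbours)
        # Both conditions must hold: far out in relative AND absolute terms.
        flags.append(deviation > n_sigma * spread and deviation >= min_offset)
    return flags


air_temperature = [-10.0, -10.5, -9.8, -10.2, -10.1, -9.9, -10.3, 5.0,
                   -10.0, -10.4, -9.7, -10.2, -10.1, -9.9, -10.0]
flags = flag_outliers(air_temperature)
```

The absolute threshold keeps very stable series, in which the standard deviation is close to zero, from being flagged for deviations that are too small to matter. Flags are only candidates; as described above, they are followed by user inspection.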

We also provide a reformatted and standardized version of the raw data before quality check33, which can be run through the quality check procedure so users can make their own evaluation of e.g. offsets or spikes (Table 5). The quality check can be run module by module and the user can judge which modules are necessary and appropriate.
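Running the quality check module by module amounts to applying a user-selected sequence of steps to the raw standardized data. The Python sketch below illustrates this design with two deliberately simplified stand-in modules; the module names and functions are hypothetical and do not correspond to the interface of the published scripts33.

```python
import math


def remove_sentinels(values):
    """Simplified stand-in for module 1: blank two common error codes."""
    return [math.nan if v in (-9999.0, 9999.0) else v for v in values]


def remove_negative(values):
    """Simplified stand-in for module 2: blank negative snow depths."""
    return [math.nan if v < 0 else v for v in values]


# Hypothetical registry of optional modules, applied in the order requested.
MODULES = {
    "1_known_errors": remove_sentinels,
    "2_impossible_values": remove_negative,
}


def run_quality_check(values, selected):
    """Apply only the modules the user has selected, in the given order."""
    for name in selected:
        values = MODULES[name](values)
    return values


raw_snow_depth = [12.0, -9999.0, -1.0, 9999.0, 30.0]
only_module_1 = run_quality_check(raw_snow_depth, ["1_known_errors"])
modules_1_and_2 = run_quality_check(
    raw_snow_depth, ["1_known_errors", "2_impossible_values"]
)
```

With only the first module selected, the value -1.0 is kept for the user to evaluate; adding the second module removes it.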

Table 5 Description of data files available in the repository33.

We have not performed gap filling or spatial homogenization of the data (but see ref. 62 for an example); instead, we have kept the quality-checked dataset as close to the in situ measurements as possible. Given the varied set of data sources and the varying degrees of information on instrumentation, record length and start and end dates, we did not perform general long-term drift correction (but see ref. 63 for detection practices).

Data Records

The dataset is available at Zenodo [https://doi.org/10.5281/zenodo.15388335]33. Figure 2 shows a map of all locations represented in this dataset with the original data sources denoted.

The compiled pan-Arctic dataset is available at the data repository33 in the compressed .parquet format, which is supported by most data handling programs64. The dataset is available in its entirety, and is also split by data source and by region (Scandinavia, North America, Greenland and Russia) for partial download (see an overview in Table 5). It is available both pre-quality check (as close to the raw data as possible, but restructured to have the same format) and post-quality check (Table 5).

Further, the code for 1) importing, 2) reformatting and normalizing, and 3) quality check (modules 1-5 all optional) can be found as .py files alongside the dataset, as well as a .py script that imports the metadata file and merges with the data file33. New data from these sources can thus be standardized into this format and quality checked. Finally, a metadata file (.xlsx and .csv) with locations and data sources is available, as well as a list of citations which should be used when using the data and their links to licenses (.txt).

Technical Validation

The quality check of the raw normalized data in modules 1-5 altered between 0 and 2.5 % of the measurement values in the data from each source (Table 6). However, the amount of data differed between sources: the WMO dataset is the largest, and its relative change is so small (< 0.001 %) that it is reported as 0 %.

Table 6 Cumulated percentage of data filtered from the original input as the dataset passed through each module in the quality check procedure.
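A cumulative percentage of this kind can be computed by comparing the original input with the state of the data after each module. The Python sketch below shows one way to do so; the choice of denominator (all non-NaN entries of the original input) and the function names are assumptions and may differ from how Table 6 was produced.

```python
import math


def is_altered(before, after):
    """True when a value changed; NaN staying NaN counts as unchanged."""
    if math.isnan(before) and math.isnan(after):
        return False
    return before != after


def cumulative_percent_altered(original, snapshots):
    """Percentage of original entries altered up to and including each module.

    `snapshots` maps a module name to the data as they look after that module.
    """
    n_measured = sum(not math.isnan(v) for v in original)
    if n_measured == 0:
        return {name: 0.0 for name in snapshots}
    result = {}
    for name, values in snapshots.items():
        n_changed = sum(is_altered(b, a) for b, a in zip(original, values))
        result[name] = 100.0 * n_changed / n_measured
    return result


original = [1.0, -9999.0, -2.0, 4.0]
percentages = cumulative_percent_altered(
    original,
    {
        "module 1": [1.0, math.nan, -2.0, 4.0],  # error code removed
        "module 2": [1.0, math.nan, 0.0, 4.0],   # negative value also corrected
    },
)
```

Because every snapshot is compared with the original input rather than with the previous module, the percentages accumulate from module to module (here 25 % after module 1 and 50 % after module 2).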

Data coverage and missing values

Table 7 shows the number of values for each variable and data source in the dataset. The total number of non-NaN observations is of the same order of magnitude across all Arctic regions, with the Russian Arctic subset being the smallest and the Scandinavian Arctic subset the largest. The most represented variables are air temperature, precipitation, relative humidity, snow depth and surface temperature, with subsets of the sites also focusing on soil temperatures and soil moisture. The data sources, each with its specific focus, determine the composition of the compiled and standardized dataset.

Table 7 The number of non-NaN values for each variable and data source.

Table 8 shows the total and percentage of data coverage in North America, Russia, Scandinavia and Greenland.

Table 8 The number and percentage of non-NaN values for each variable and region.

The different variables are represented to various degrees, reflecting both the measurement priorities of the represented Arctic in situ programs and access to the data. Most programs measure air temperature and relative humidity, which have a high percentage of non-NaN values, whereas snow depth, precipitation, and radiation measurements are the second most commonly represented (Table 8).

Data coverage increases over time, with most data available after 1990 and especially after 2000 (see data density plots of aggregated counts in Figs. 3, 4, 5, 6). Widespread measurement of liquid precipitation started after 2010, whereas surface temperature, air temperature and snow depth were prioritized earlier. The figures represent the data density over time as a total count of data points for each variable aggregated over each year. The figures, which show the distribution of data over time as “violin plots”, are then scaled so that the width of the “violin” represents the variable data density relative to the other variables within the plot.
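The aggregation behind such density plots can be sketched as follows in Python: non-NaN values are counted per variable and year, and the counts are then scaled relative to the largest count among the variables shown together. The scaling is described only qualitatively above, so dividing by the largest yearly count is an assumption, as are the function names and the record layout (ISO timestamps and one column per variable).

```python
import math
from collections import Counter


def yearly_counts(records, variable):
    """Count non-NaN values of one variable per calendar year."""
    counts = Counter()
    for record in records:
        value = record.get(variable)
        if value is not None and not math.isnan(value):
            counts[int(record["timestamp"][:4])] += 1  # year from ISO timestamp
    return counts


def relative_density(records, variables):
    """Scale yearly counts by the largest count among the plotted variables."""
    counts = {v: yearly_counts(records, v) for v in variables}
    peak = max((n for per_year in counts.values() for n in per_year.values()),
               default=0)
    if peak == 0:
        return {v: {} for v in variables}
    return {
        v: {year: n / peak for year, n in sorted(per_year.items())}
        for v, per_year in counts.items()
    }


records = [
    {"timestamp": "1995-01-01T00:00:00", "air_temperature": -20.0, "snow_depth": math.nan},
    {"timestamp": "2005-01-01T00:00:00", "air_temperature": -18.0, "snow_depth": 40.0},
    {"timestamp": "2005-01-02T00:00:00", "air_temperature": -17.0, "snow_depth": None},
]
density = relative_density(records, ["air_temperature", "snow_depth"])
```

Scaling all variables by a common peak keeps the widths comparable between variables within one plot, which is why the resulting values are unitless.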

Fig. 3

Count (relative to each other, unitless) of non-NaN values over time in the Greenlandic region. Only the most represented variables are shown.

Fig. 4

Count (relative to each other, unitless) of non-NaN values over time in the North American Arctic region.

Fig. 5

Count (relative to each other, unitless) of non-NaN values over time in the Russian Arctic region.

Fig. 6

Count (relative to each other, unitless) of non-NaN values over time in the Arctic Scandinavian region, including Svalbard.