Background & Summary

Continental scale droughts are expected to become more frequent as climate change induced increases in surface temperatures lead to atmospheric drying1. Higher surface temperatures and less frequent precipitation during drought drive competition for water between the land surface and the atmosphere2. Reduced precipitation and increased atmospheric demand for water lead to soil moisture anomalies, limiting water available for plant use3. During droughts, water stress can reduce vegetation growth4 and productivity5, minimizing the role of vegetation as a carbon sink6 and diminishing crop yields7. High atmospheric aridity, measured as vapor pressure deficit (VPD), has been shown to be as important as low soil moisture at driving plant water stress8. High VPD has also been associated with drying out vegetation, which then serves as potential fuel for wildfires9. With drought conditions expected to worsen through the 21st century, further attention should be paid to expanding the study of complex biological and ecohydrological responses to increased atmospheric aridity10. To do so, high spatial and temporal resolution datasets of VPD must be available for the scientific community. Here we present the first daily gridded VPD product for the continental United States, including parts of Northern Mexico and Southern Canada (CONUS+).

VPD has been identified as a major factor in driving water fluxes between the land surface and the atmosphere6,11,12, impacting photosynthesis6 and plant growth4. Vegetation growth responds differently to rising VPD depending on plant type and climate13,14. Thus, there is a need to better understand how plants modulate or adapt to changes in atmospheric aridity across gradients of climates and vegetation types. Some plants close their pores, known as stomata, during periods of elevated VPD in order to conserve water and prevent desiccation15 and hydraulic failure16. Under hydraulic failure, plants can no longer exchange water and carbon with the atmosphere, leading to reduced carbon uptake and increased likelihood of plant mortality14,17. Elevated values of atmospheric aridity have been shown to decrease plant growth4 and shut down stomatal conductance and photosynthesis18, indicating carbon assimilation is highly sensitive to changes in VPD.

Elevated VPD is associated with decreases in crop yields10, which can cause billions of dollars in financial losses for the agricultural sector7. In arid regions of Northern China, reductions in wheat, maize, and soybean yields were shown to be more sensitive to changes in VPD than to precipitation or temperature19. Similar sensitivities to rising VPD have been shown in crop yields in the Midwestern US5, central Europe20, and northeast Australia21. Lobell et al.5 found that maize and soybean yields in Iowa, Illinois, and Indiana have become increasingly sensitive to high VPD, though farmers are combating yield loss with agronomic advances. In Hungary, positive VPD anomalies, which are associated with increasing temperatures, were shown to negatively impact crop yield for winter wheat20. As climate trends point to higher surface temperatures, the detrimental effects of VPD on crop yields are expected to increase22. On the other hand, there is evidence to suggest high VPD may have a positive effect on certain crop yields in fields with sufficient soil moisture due to plant adaptations that increase water-use efficiency23. Moreover, under simulated climate change conditions with projected increases in temperature and carbon dioxide, elevated carbon dioxide levels may offset the detrimental impacts of high VPD on sorghum grain yield by increasing radiation and transpiration efficiency21. Given the uncertainty in crop responses to rising VPD9 in a changing climate, further analyses incorporating daily-scale VPD are needed10.

High VPD has also been associated with the drying of surface fuels24, increasing the risk of wildfire intensity25 and burned area26. Future climate projections indicate much of the global land area is subject to increases in temperature, resulting in changes to precipitation regimes and wildfire risk27. Increases in VPD as a result of the feedback between rising temperature and lack of precipitation are linked to increases in recent wildfire activity in the western US28,29. Even in humid regions like the Pacific Northwest or Southeast, variability in precipitation resulting from climate change increases the likelihood of drought-induced wildfires30,31. Furthermore, high VPD increases wildfire risk in forest biomes around the globe and jeopardizes their roles as carbon sinks32.

According to the Clausius-Clapeyron relationship33, the amount of water the atmosphere can hold is temperature dependent. As temperatures increase, the capacity for the atmosphere to hold water increases as well34. VPD represents the difference between the actual amount of water vapor in the atmosphere and the amount of water vapor the atmosphere can hold at saturation. It is a measure of atmospheric demand for, or capacity to hold, water35,36,37. To better understand how projected increases in VPD will impact water fluxes during drought and increase the risk of fire at local, regional, and continental scales, there is a need for a high-resolution VPD dataset that considers climate and land cover across ecoregions. Currently, research and operations that require VPD for their analyses often have to compute VPD from ground-based38,39 or satellite remote sensing40,41 measurements of temperature and relative humidity because most datasets do not contain VPD measurements. Ecosystem-to-continental scale modeling and observational studies analysing plant responses to increased VPD would benefit from a fine scale, gridded VPD data product, like the one presented here.

Currently available datasets that contain VPD for CONUS are point-scale measurements. The observational network AmeriFlux42 provides sub-daily measurements of temperature and relative humidity from eddy covariance flux towers and, for select sites, provides VPD for users. AmeriFlux has nearly 500 sites spread across North and South America, with many sites concentrated near agricultural areas, specific research stations, and universities. As a result, there are large areas missing ground observations, including parts of the Rocky Mountains and the Great Basin Desert43 (Fig. 1). Despite the sparse distribution of AmeriFlux sites across the United States, the diversity of land cover types and climate regions is well represented44.

Fig. 1

Map of Köppen-Geiger climate classification on the 1 km by 1 km CONUS+ grid. The color scheme was adopted from Peel et al.67. The 253 AmeriFlux sites are indicated with markers representing the IGBP Vegetation Land Cover.

At present, there is no single gridded dataset of VPD for all of CONUS that is freely available. Moreover, to our knowledge, no dataset has derived VPD while accounting for specific land cover and climate types. There are, however, existing methods to produce gridded, continental-scale VPD data from reanalysis or satellite remote sensing. One limitation of these approaches is the coarse spatial or temporal resolution of freely available datasets. The North American Land Data Assimilation System (NLDAS-2) is a reanalysis dataset that provides hourly temperature, pressure, and specific humidity at a 1/8th degree (12.5 km) spatial resolution that can be used to derive estimates of VPD8. The European Centre for Medium-Range Weather Forecasts (ECMWF) also produces an hourly global reanalysis dataset of meteorological variables, ERA545, with a horizontal spatial resolution of 31 km, which can be used to compute VPD46. Though the hourly resolution of the reanalysis data is useful when accounting for diurnal fluctuations in VPD, the coarse spatial resolution extends beyond the fetch of an eddy covariance tower47. Zhang et al.40 demonstrated how to compute daily VPD over China using satellite imagery from the Moderate Resolution Imaging Spectroradiometer (MODIS) at a 1 km spatial resolution. The method, which uses MODIS to estimate temperature and humidity, derives an empirical relationship from ground observations. The method is only valid in sites with weather stations and is prone to cloud contamination of surface reflectance40. Moreover, the 8-day observations likely do not capture day-to-day fluctuations in land surface and atmospheric water statuses that impact VPD.

To overcome the limitations of the previously mentioned approaches to estimating VPD, we integrated climate and land cover classifications with ground-based observations from AmeriFlux, along with a validated high-resolution temperature and atmospheric vapor pressure product available at a 1 km spatial resolution and daily timestep. This approach informs estimates of VPD, culminating in a gridded VPD product for all of CONUS+. The CONUS+ VPD datasets generated in this study have wide-ranging applications: VPD can be used to study ecosystem functioning9,14,38,48,49, monitor and predict drought8,50,51, or assess fire risk24,25,26,52,53.

Methods

Overview

In this study, we used meteorological variables of average daily atmospheric vapor pressure (e) and maximum daily temperature (Tmax) from Daymet54, alongside ground based VPD from AmeriFlux eddy covariance towers with a variety of land cover and climate types (Fig. 1), to produce a high-resolution (1 km, daily) gridded VPD data product for CONUS+. Deriving estimates of VPD for CONUS+ consisted of two phases (Fig. 2): (1) Develop land cover and climate dependent correction factors using AmeriFlux observations, (2) Apply correction factors to all pixels in the CONUS+ grid. In Phase 1, we first computed 24-hour average VPD from 253 AmeriFlux sites with varying climate and land cover classifications. We then calculated VPD using Daymet e and Tmax. We chose to use Tmax instead of another daily temperature averaging scheme (e.g.37,55,56,57) because using Tmax better recreates day-to-day variability in VPD. However, it also overestimates daily VPD (Fig. 3). In order to correct the overestimation, we computed the median ratio of AmeriFlux to Daymet-derived VPD for each land cover type and climate classification. In Phase 2, we estimated VPD for the entire CONUS+ grid. We adjusted the Daymet-derived VPD for each grid cell based on its climate or land cover type by applying the corresponding correction factor (i.e., median ratio) developed in Phase 1. This generated two 24-hour average VPD datasets: one informed by land cover and one informed by climate. The resulting Daymet-derived VPD datasets for CONUS+ are evaluated against daily average VPD computed from hourly AmeriFlux VPD data not previously used in the analysis, NLDAS-2 derived VPD, and VPD computed using a weighted temperature averaging scheme.

Fig. 2

Two-phase summary of workflow. In Phase 1, we use AmeriFlux VPD to correct estimates of VPD derived from the Daymet variables maximum daily temperature, Tmax, and average daily vapor pressure, e. In Phase 2, we apply correction factors to every grid cell in CONUS+ by matching each grid cell with the corresponding International Geosphere-Biosphere Programme (IGBP) land cover or Köppen-Geiger (KG) climate classification.

Fig. 3

Example of scaling VPD (Tmax) and comparisons with AmeriFlux VPD at US-Twt for days 100–300 of 2015. US-Twt is a rice paddy cropland (CRO) AmeriFlux site in central California with a dry, temperate climate with a hot summer (Csa).

Datasets

Daymet

Daymet is a freely available collection of daily, 1 km, gridded meteorological data products derived from weather stations throughout North America58. Weather station data is used to inform daily estimates of Daymet’s primary output variables: minimum and maximum temperature (Tmin and Tmax, respectively) and total precipitation. Daymet also produces secondary variables of average daytime shortwave radiation, average atmospheric water vapor pressure (e), and accumulated snow water equivalent, which are derived from the primary variables59,60. Day length is also provided as an estimate based on geographic location58. For this study, we used Daymet Tmin, Tmax, and e from 1995 to 2023, corresponding with available data records from ground based AmeriFlux eddy covariance towers. Tmin and Tmax were used to estimate saturated vapor pressure, esat, from which we subtract e to estimate VPD.

For this study, daily minimum and maximum temperature and average vapor pressure were acquired from Daymet Version 4 R1 (V4R1)54 (Fig. 4). Daymet data were directly downloaded from https://daac.ornl.gov/. Daymet V4R1 updated V4 by correcting a daily data feed error and imputing missing readings for 2020 and 202158. Daymet V4 uses weather inputs from the National Centers for Environmental Information Global Historical Climatology Network Daily database. Daymet V4 updated previous Daymet algorithms to account for observation reporting time and high elevation biases, which can affect daily maximum temperature and precipitation accumulations. For each 1 km by 1 km Daymet grid cell, a normalized weighted interpolation is used to estimate both Tmin and Tmax from three-dimensional temperature gradients using data from weather stations within a predefined search radius58. Daily average vapor pressure, e, is computed using Tmin as a proxy for dew point temperature and implementing aridity adjustments61, which require potential evapotranspiration derived from shortwave radiation60.

Fig. 4

Daymet maximum temperature and average vapor pressure for day 180 (i.e. June 29) 2023.

AmeriFlux

AmeriFlux is a network of eddy covariance flux towers, distributed across North, Central, and South America, collecting semi-continuous measurements of carbon, water, and energy fluxes42,44,62. The AmeriFlux data were used for rescaling Daymet-derived VPD at the tower sites and validating the CONUS+ gridded VPD product. We downloaded all AmeriFlux data used in this study from https://ameriflux.lbl.gov/, using the “Site Search” tool to select 253 of the nearly 500 AmeriFlux sites: those in CONUS+ that had half-hourly VPD, record lengths of at least one year, and data shared under the AmeriFlux CC-BY-4.0 License. Time series of VPD observations are provided at half-hourly increments, with records spanning 1995 to 2023; however, most sites do not have records spanning the entire time interval. Each of the 253 sites has at least one complete year of VPD observations, and together they total over 1400 site years of data (Tables 1 and 2). We computed daily estimates of AmeriFlux VPD (VPDAMF) by averaging over all 48 half-hour measurements within a single day. AmeriFlux reports VPD in hectopascals (hPa), except for some older sites which reported VPD in kilopascals (kPa). All VPD data from AmeriFlux were converted to units of kPa. AmeriFlux provides ancillary information about the geographic location (latitude and longitude coordinates), land cover (International Geosphere-Biosphere Programme, IGBP), and climate (Köppen-Geiger, KG) for each site. We used the IGBP and KG classifications as provided by AmeriFlux to develop relationships between AmeriFlux and Daymet daily average VPD. From the 253 sites, there were 14 unique IGBP land cover and 16 unique KG climate classifications (Tables 1 and 2, respectively).

To provide independent validation, we computed daily average VPD from five AmeriFlux sites that provided hourly VPD and were not included in estimating the correction factors from Phase 1 for the final gridded CONUS+ VPD product.
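
The daily averaging and unit conversion described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the `units` argument, and the `min_valid` rule for days with missing half-hours are assumptions, since the text does not state how incomplete days were handled.

```python
import numpy as np

def daily_mean_vpd_kpa(half_hourly_vpd, units="hPa", min_valid=48):
    """Average half-hourly VPD into daily means and return them in kPa.

    half_hourly_vpd: 1-D record whose length is a multiple of 48.
    units: units of the input record, "hPa" or "kPa".
    min_valid: assumed rule; days with fewer valid samples become NaN.
    """
    values = np.asarray(half_hourly_vpd, dtype=float)
    if values.size % 48 != 0:
        raise ValueError("record length must be a multiple of 48")
    days = values.reshape(-1, 48)              # one row per day
    n_valid = np.sum(~np.isnan(days), axis=1)  # valid samples per day
    sums = np.nansum(days, axis=1)
    daily = np.full(days.shape[0], np.nan)
    ok = n_valid >= min_valid
    daily[ok] = sums[ok] / n_valid[ok]
    if units == "hPa":
        daily = daily / 10.0                   # 1 kPa = 10 hPa
    elif units != "kPa":
        raise ValueError("units must be 'hPa' or 'kPa'")
    return daily
```

With the default `min_valid=48`, a day is kept only when all 48 half-hourly values are present, which matches the wording "averaging over all 48 half-hour measurements".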

Table 1 International Geosphere-Biosphere Programme (IGBP) land cover classifications for the 253 AmeriFlux sites used in the correction factor analysis.
Table 2 Köppen-Geiger climate classification descriptions for 247 of the 253 AmeriFlux towers used in the correction factor analysis.

A summary of the 253 AmeriFlux IGBP and KG classifications, the proportions of sites they represent, and the respective total number of site years can be found in Tables 1 and 2 and Fig. 5. KG classification Cfa (temperate, no dry season, hot summer) was the most represented classification in this analysis with 68 sites (26.9%) throughout the Southeastern US (Fig. 5). The next most common KG climate classification was Dfb (cold, no dry season, warm summer) with 52 sites (20.6%), followed by Dfa (cold, no dry season, hot summer) with 32 sites (12.6%). Of the 253 sites in this study, croplands (CRO), evergreen needleleaf forests (ENF), and grasslands (GRA) were the most-represented IGBP land cover classifications, representing 23.3%, 22.1%, and 15.8% of the total sites, respectively. The proportion of sites a classification represents does not necessarily translate to the amount of data used in this study because some sites have longer temporal records of VPD. For example, there are 16 more sites with KG classification Cfa than with Dfb, but Dfb had 331 site years of data compared to 328 site years for Cfa. With 413.8 site years, ENF was the land cover classification with the most data, nearly twice as many site years as CRO despite having only three fewer tower sites. Overall, the climate and land cover classes of the AmeriFlux towers represented the actual coverage of CONUS+ well, with a few exceptions. With 9.1% of sites and 158.9 site years of data, the KG classification Csa (temperate with dry, hot summer), a Mediterranean climate found in California, was overrepresented in the site data since only approximately 1% of the CONUS+ grid is classified as Csa. Conversely, woody savannas (WSA) were underrepresented, as they make up 2% of AmeriFlux sites but 12.6% of the CONUS+ grid.

Fig. 5

Percentage breakdown of IGBP land cover and Köppen-Geiger climate classifications from the 253 AmeriFlux sites. Unk = unknown or not provided in AmeriFlux documentation.

IGBP land cover

IGBP land cover was used to classify AmeriFlux sites and for estimating VPD at the continental scale. Land cover classifications for pixels in the CONUS+ grid were generated from the Moderate Resolution Imaging Spectroradiometer (MODIS) Land Cover Type Product (MCD12Q1) Land Cover Type 1, which provides annual land cover from 2001–2023 at a 500 m spatial resolution using IGBP classes63. MODIS MCD12Q1 Land Cover Type 1, the Annual IGBP classification, was selected to align with the AmeriFlux provided IGBP land cover. The freely available MODIS land cover data were downloaded from AρρEEARS64 (https://appeears.earthdatacloud.nasa.gov) using the “Extract Area Sample” download tool with geographic coordinate projection. The 500 m MODIS data were upscaled to the Daymet 1 km grid using a nearest neighbor interpolation algorithm. The AmeriFlux sites represented 14 of the 17 MODIS IGBP classes (Table 1). The three unrepresented IGBP land cover classes are deciduous needleleaf forests (DNF), which comprise approximately 0.005% of CONUS+ pixels, and evergreen broadleaf forests (EBF) and permanent snow/ice (SNO), which together make up approximately 0.8% of pixels.

Köppen-Geiger climate

The updated gridded Köppen-Geiger (KG) climate classifications were obtained from Beck et al.65,66. The KG classifications were also used to classify AmeriFlux sites and estimate VPD for CONUS+. The KG classifications are available at 1 km by 1 km spatial resolution, which we realigned to match the 1 km CONUS+ grid using a nearest neighbor approach. There are five main categories of classification, each broken down into subcategories, yielding a total of 30 unique classifications globally67. For the CONUS+ grid, there were only 25 unique KG values. The AmeriFlux sites represent 16 KG classifications. In order to determine correction factors for the remaining nine KG classifications, we reclassified them following Table 3.

Table 3 Summary of Köppen-Geiger replacements.

North American Land Data Assimilation System (NLDAS)

Reanalysis data from North American Land Data Assimilation System phase 2 (NLDAS-2) Forcing File A68 were used to estimate an independent VPD product for evaluation of the CONUS+ VPD dataset. NLDAS integrates a multitude of gauge-, radar-, and model-based observations using the National Centers for Environmental Prediction (NCEP) Eta Data Assimilation System (EDAS) to generate forcing data for all of CONUS at an hourly timestep and at a 0.125° by 0.125° spatial scale69. The freely available NLDAS-2 variables of 2-m above ground temperature (TMP), 2-m specific humidity (SPFH), and surface pressure (PRES) were downloaded using the NASA GES DISC (https://disc.gsfc.nasa.gov/) dataset subset tool70. VPD was calculated following the method described in Lowman et al.8. Daily average temperature from NLDAS-2 was used to estimate esat from the Tetens equation, and vapor pressure, e, was estimated using average daily specific humidity and air pressure.
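
As an illustration of the last step, vapor pressure can be recovered from specific humidity and air pressure with the standard relation \(e=qp/(\varepsilon +(1-\varepsilon )q)\), where \(\varepsilon \approx 0.622\). The sketch below assumes this relation and assumes pressure is supplied in Pa; the exact formulation in Lowman et al.8 is not reproduced here and may differ in detail, and the function name is hypothetical.

```python
import numpy as np

EPSILON = 0.622  # ratio of the molar mass of water vapor to that of dry air

def vapor_pressure_kpa(spfh, pres_pa):
    """Vapor pressure e (kPa) from specific humidity (kg/kg) and pressure (Pa).

    Uses e = q * p / (eps + (1 - eps) * q), a standard thermodynamic
    relation; assumed here, not taken from the cited method.
    """
    q = np.asarray(spfh, dtype=float)
    p_kpa = np.asarray(pres_pa, dtype=float) / 1000.0  # Pa -> kPa
    return q * p_kpa / (EPSILON + (1.0 - EPSILON) * q)

def daily_vapor_pressure_kpa(spfh_hourly, pres_pa_hourly):
    """Daily e from the daily averages of hourly humidity and pressure."""
    return vapor_pressure_kpa(np.mean(spfh_hourly), np.mean(pres_pa_hourly))
```

Daily VPD then follows by subtracting this value from esat evaluated at the daily average temperature.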

Estimating VPD for CONUS+

In order to generate a daily gridded VPD dataset for CONUS+, we used the meteorological variables from Daymet, VPD from AmeriFlux, land cover classifications from MODIS, and climate classifications from KG. The temporal coverage of MODIS land cover was the limiting factor for the start and end dates of the data records produced in this study. At the time of development, the MODIS land cover record was 2001 to 2023, so this twenty-three-year period defined the date range for land cover correction. We assumed a single KG classification for each pixel of CONUS+ for the entire study period (i.e., no pixels changed climate classification).

Calculating VPD from Daymet - Phase 1

Calculating VPD for each AmeriFlux site required first identifying the Daymet 1 km by 1 km grid cell whose center was nearest the site’s geographic coordinate and extracting the Daymet variables (Fig. 2). Daily average VPD, in kPa, was calculated as the difference between saturated, esat, and unsaturated, e, vapor pressure

$$VPD({T}^{\ast })={e}_{sat}({T}^{\ast })-e$$
(1)

where e is provided by the Daymet daily average vapor pressure data and esat is calculated using the Tetens equation33,71:

$${e}_{sat}({T}^{\ast })=0.611\,\exp \left[A\left(\frac{{T}^{\ast }}{{T}^{\ast }+B}\right)\right].$$
(2)

The variable T* (°C) is a representative temperature computed from the Daymet maximum and minimum daily temperatures (Tmax and Tmin, respectively). The parameters A and B are empirically determined, with \(A=17.27\) and \(B=237.15\) when \({T}^{\ast }\ge {0}^{\circ }{\rm{C}}\), and \(A=21.87\) and \(B=265.5\) when \({T}^{\ast } < {0}^{\circ }{\rm{C}}\)33,72. Applying the above-freezing coefficients in Eq. 2 to below-freezing temperatures is known to produce large errors in saturation vapor pressure72. In our calculations we used \({T}^{\ast }={T}_{max}\). Prior studies used a weighted average of Tmax and Tmin, \({T}^{\ast }={T}_{W}=0.606{T}_{max}+0.394{T}_{min}\), which puts more emphasis on Tmax57,73. The TW method is presented in the earliest Daymet derivations of meteorological variables as a way to estimate average daily temperature57. In this method, Tmin and Tmax are used to fit a sine curve that simulates diurnal changes in temperature, with the average value of the sine curve representing the average daytime temperature73. Unsaturated vapor pressure provided by Daymet uses Tmin as a proxy for dew point temperature57. We compared how estimating esat using Tmax or TW as T* influences VPD estimates.
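
Equations 1 and 2, together with the two choices of T*, can be sketched as below. The function names are illustrative, and the input vapor pressure is assumed to have already been converted to kPa.

```python
import numpy as np

def esat_kpa(t_star_c):
    """Saturation vapor pressure (kPa) from Eq. 2 for T* in degrees C."""
    t = np.asarray(t_star_c, dtype=float)
    a = np.where(t >= 0.0, 17.27, 21.87)   # coefficient A, by freezing regime
    b = np.where(t >= 0.0, 237.15, 265.5)  # coefficient B, by freezing regime
    return 0.611 * np.exp(a * t / (t + b))

def weighted_temperature(tmax_c, tmin_c):
    """T_W = 0.606 * Tmax + 0.394 * Tmin."""
    tmax = np.asarray(tmax_c, dtype=float)
    tmin = np.asarray(tmin_c, dtype=float)
    return 0.606 * tmax + 0.394 * tmin

def vpd_kpa(t_star_c, e_kpa):
    """Eq. 1: VPD(T*) = esat(T*) - e, with e assumed to be in kPa."""
    return esat_kpa(t_star_c) - np.asarray(e_kpa, dtype=float)
```

For example, `vpd_kpa(tmax, e)` gives \(VPD({T}_{max})\) and `vpd_kpa(weighted_temperature(tmax, tmin), e)` gives \(VPD({T}_{W})\). Because esat increases with temperature, the former is the larger of the two whenever Tmax exceeds Tmin.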

Computing Correction Factors - Phase 1

Preliminary analysis showed that \(VPD({T}_{W})\) did not capture the day-to-day variability in observed VPD from the AmeriFlux towers, whereas \(VPD({T}^{\ast }={T}_{max})\) matched day-to-day fluctuations but overestimated daily AmeriFlux VPD (VPDAMF) (Fig. 3). To reduce the difference between \(VPD({T}_{max})\) and VPDAMF, we developed a method to scale \(VPD({T}_{max})\) using ratios based on AmeriFlux climate and land cover classification. In Phase 1, we computed daily \(VPD({T}_{max})\) for each AmeriFlux site from 1995 to 2023 and grouped all computations by KG and IGBP classifications. We computed ratios, R, of AmeriFlux to Daymet VPD at every time-step for all sites and all years with data within a given classification according to

$${R}_{j}({t}_{i})=\frac{VP{D}_{AMF}^{j}({t}_{i})}{VP{D}^{j}({T}_{max}({t}_{i}))},$$
(3)

where ti is the daily timestep. We binned the ratios by IGBP (KG) and found the mean and median across all sites with the same land cover (climate) classification. Thus, a site with more years of data was weighted more when computing the correction factors across all sites. Because the mean values could be skewed by large outlier ratios, we used the median ratio as the correction factor for the jth land cover type (climate classification) across all sites and time with land cover (climate) j,

$${R}_{j}=\mathop{{\rm{median}}}\limits_{j}({R}_{j}({t}_{i})).$$
(4)

There were five sites with climate listed as unknown (Unk), which were excluded from the correction factor development. We then computed new scaled VPD values at each time step, ti, by multiplying \(VP{D}^{j}({T}_{max}({t}_{i}))\) by the appropriate land cover (climate) correction factor, Rj,

$$VP{D}_{S}^{j}({t}_{i})={R}_{j}\ast VP{D}^{j}({T}_{max}({t}_{i})).$$
(5)

Median ratios, along with their means and standard deviations, are provided for all land cover and climate classifications in Tables 4 and 5.
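
Equations 3 to 5 can be sketched as follows. This is a minimal illustration with hypothetical function names. It pools all site-days within a class before taking the median, as described above, and it skips only non-finite values and exactly zero denominators, which is an assumption made for numerical safety rather than a filtering step described in the text.

```python
import numpy as np

def correction_factors(vpd_amf, vpd_tmax, classes):
    """Median ratio R_j of AmeriFlux to Daymet VPD per class (Eqs. 3 and 4).

    Inputs are flat arrays of pooled site-days; classes holds the IGBP
    or KG label of the site that each day belongs to.
    """
    vpd_amf = np.asarray(vpd_amf, dtype=float)
    vpd_tmax = np.asarray(vpd_tmax, dtype=float)
    classes = np.asarray(classes)
    # Assumed safeguard: drop non-finite values and exact-zero denominators.
    valid = np.isfinite(vpd_amf) & np.isfinite(vpd_tmax) & (vpd_tmax != 0)
    ratios = vpd_amf[valid] / vpd_tmax[valid]          # Eq. 3
    labels = classes[valid]
    return {str(c): float(np.median(ratios[labels == c]))  # Eq. 4
            for c in np.unique(labels)}

def scale_vpd(vpd_tmax, classes, factors):
    """Eq. 5: multiply VPD(Tmax) by the correction factor of its class."""
    r = np.array([factors[str(c)] for c in classes], dtype=float)
    return r * np.asarray(vpd_tmax, dtype=float)
```

Pooling site-days means that a site with a longer record contributes more ratios to its class, and the median keeps a handful of very large ratios from dominating the factor.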

Table 4 Correction factors for each IGBP land cover classification with summaries of errors between observed VPD (VPDAMF) and VPD scaled by IGBP land cover.
Table 5 Correction factors for each Köppen-Geiger climate classification with summaries of errors between observed VPD (VPDAMF) and VPD scaled by KG climate.

Generating CONUS+ VPD - Phase 2

To generate daily maps of VPD for all of the CONUS+ grid, we computed daily \(VPD({T}_{max})\) for every pixel and multiplied the daily value by the appropriate correction factor using Eq. 5. However, the AmeriFlux towers used to generate the correction factors only provide a sample of IGBP and KG classifications, and in some instances, the interpolated land cover and climate classification for pixels containing AmeriFlux towers may differ from the classifications provided by AmeriFlux. For the three missing IGBP classes mentioned above (DNF, EBF, and SNO), no correction factor was applied to the CONUS+ \(VPD({T}_{max})\). Of the 30 KG classes, 25 were present in the CONUS+ grid and 16 were represented by the AmeriFlux sites, so correction factors had to be determined for the remaining 9 KG values. We used ratios from the most similar climate classifications as the correction factors for unrepresented KG values, as summarized in Table 3. The differences in KG classifications and their replacements are in the third letter of the short-name classifications, which indicates summer temperatures67. The nine unrepresented KG classes are Af, Am, Aw, Csc, Cwb, Cfc, Dsc, Dwa, and EF. KG classifications with a leading “A” are all tropical, and no tropical climates were represented in the AmeriFlux site data. However, extreme biases would exist if no correction factors were applied to these pixels (i.e., \({R}_{j}=1\)), which comprise <0.3% of pixels in the humid south of Florida. We therefore replaced Af, Am, and Aw with climate Cfa, which covers most of the Southeastern US, including Northern Florida.
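
The Phase 2 lookup can be sketched as building a grid of correction factors from a grid of class labels. The function name is hypothetical, and any factor values passed in are placeholders rather than values from Tables 4 and 5. Only the three tropical replacements stated above are encoded; the remaining replacements in Table 3 are not reproduced here.

```python
import numpy as np

# Replacements stated in the text; other Table 3 entries are omitted.
KG_REPLACEMENTS = {"Af": "Cfa", "Am": "Cfa", "Aw": "Cfa"}

def factor_grid(class_grid, factors, replacements=None, default=1.0):
    """Map each pixel's class label to its correction factor.

    Classes with no factor and no replacement keep `default` (1.0 means
    no correction, as for the IGBP classes DNF, EBF, and SNO).
    """
    class_grid = np.asarray(class_grid)
    replacements = replacements or {}
    out = np.full(class_grid.shape, default, dtype=float)
    for label in np.unique(class_grid):
        key = replacements.get(str(label), str(label))
        if key in factors:
            out[class_grid == label] = factors[key]
    return out
```

The scaled map for a given day is then `factor_grid(...) * vpd_tmax_grid`, which is Eq. 5 applied pixel by pixel.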

Data Records

The dataset, along with supporting data generation and evaluation codes, is available as a resource in the HydroShare repository74. Two different CONUS+ VPD datasets were created: one for \(VPD({T}_{max})\) scaled by the Köppen-Geiger (KG) climate classification, and one scaled by MODIS Land Cover Type 1 (IGBP) classification. We provide 23 years of daily, gridded VPD data spanning the period 2001–2023. The gridded data products are available as netCDF files covering CONUS+, with bounding box 25 to 50°N latitude and 67 to 125°W longitude (Fig. 6). Each netCDF file contains one year of data, with data arrays having dimensions 5731 by 3122 by 365. The first two dimensions correspond with the dimensions of the CONUS+ grid, and the third dimension represents each day of the year. Daymet provides temperature and vapor pressure for 365 days in every year. In leap years, day 365 corresponds with December 30 and no data are provided for December 31. We follow the same convention. The VPD data are stored as 16-bit unsigned integers in Pascals. Therefore, a scale factor of \(1{0}^{-3}\) must be applied to convert the values into units of kPa. In the repository, we also provide netCDF files of the CONUS+ grids of KG and yearly IGBP values, along with .csv files containing tables of the corresponding correction factors. NaNs (indicating not-a-number) are used to fill values where the VPD data are unavailable, which usually occurs over oceans and larger bodies of water. There are two additional scripts provided in the HydroShare repository to assist users in: (1) Plotting CONUS+ maps of VPD for any day of year and (2) Extracting VPD time series data for any location (i.e., pixel) or set of locations in the CONUS+ grid.
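
The storage conventions above can be sketched with two small helpers. These are illustrative only and are not the scripts provided in the repository. The function names are hypothetical, and the optional `fill_value` argument is an assumption for the case where missing pixels are encoded as an integer sentinel.

```python
import numpy as np
from datetime import date, timedelta

def to_kpa(stored_pa, fill_value=None):
    """Convert stored VPD values (Pa) to kPa using the 1e-3 scale factor.

    fill_value: assumed optional integer sentinel that is mapped to NaN.
    """
    raw = np.asarray(stored_pa)
    vpd = raw.astype(float) * 1e-3
    if fill_value is not None:
        vpd[raw == fill_value] = np.nan
    return vpd

def date_for_layer(year, day_index):
    """Calendar date of a 1-based layer along the 365-day dimension.

    Layer 365 is December 30 in leap years, so December 31 of a leap
    year has no layer.
    """
    if not 1 <= day_index <= 365:
        raise ValueError("day_index must be between 1 and 365")
    return date(year, 1, 1) + timedelta(days=day_index - 1)
```

For instance, `date_for_layer(2023, 180)` corresponds to June 29, 2023, the day mapped in Fig. 6.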

Fig. 6

Maps of the VPDS products for June 29, 2023 (DOY 180).

Technical Validation

We evaluated the performance of CONUS+ scaled VPD products (VPDS), \(VPD({T}_{W})\), and VPDNLDAS by comparing each to AmeriFlux VPD (VPDAMF) using a variety of error metrics. We found that both land cover and climate informed VPD datasets compare more favorably against AmeriFlux than daily VPD estimated by average daily temperature or NLDAS-derived VPD (Table 6). We further evaluated how the method performs across specific IGBP land cover and KG climate classifications, summarized in Tables 4 and 5. Lastly, we performed independent evaluations of VPDS at five sites that provided only hourly VPD and were not included in the correction factor development: US-Cop, US-Cwt, US-MMS, US-PFa, and US-UMB (Table 7, Fig. 7).

Table 6 Error metrics (MBE, MAE, RMSE, uRMSE, rP) for Daymet-derived and NLDAS VPD compared to AmeriFlux tower measurements and averaged across all sites.
Table 7 Summary of error metrics for five independent AmeriFlux sites.
Fig. 7

Time series of scaled Daymet VPD compared with AmeriFlux, and NLDAS for US-MMS and US-PFa, and a map of the independent sites not included in the correction factor development whose error metrics are in Table 7. Pop outs show days 100–150 and 175–225 for US-MMS and days 125–175 and 200–250 for US-PFa.

Error metrics

The error metrics presented are mean bias error (MBE), mean absolute error (MAE), root mean squared error (RMSE), unbiased RMSE (uRMSE), and the Pearson correlation coefficient (rp) for each classification j at time ti, \(i\in \{1,\ldots ,N\}\) with \(N=365\), and were computed as

$$MB{E}_{j}=\frac{1}{N}\mathop{\sum }\limits_{i=1}^{N}(VP{D}_{S}^{j}({t}_{i})-VP{D}_{AMF}^{j}({t}_{i})),$$
(6)
$$MA{E}_{j}=\frac{1}{N}\mathop{\sum }\limits_{i=1}^{N}|VP{D}_{S}^{j}({t}_{i})-VP{D}_{AMF}^{j}({t}_{i})|,$$
(7)
$$RMS{E}_{j}=\sqrt{\frac{1}{N}\mathop{\sum }\limits_{i=1}^{N}{(VP{D}_{S}^{j}({t}_{i})-VP{D}_{AMF}^{j}({t}_{i}))}^{2}},$$
(8)
$$uRMS{E}_{j}=\sqrt{RMS{E}_{j}^{2}-MB{E}_{j}^{2}},$$
(9)

and

$${r}_{j}=\frac{\mathop{\sum }\limits_{i=1}^{N}(VP{D}_{AMF}^{j}({t}_{i})-{\overline{VPD}}_{AMF}^{j})(VP{D}_{S}^{j}({t}_{i})-{\overline{VPD}}_{S}^{j})}{\sqrt{\mathop{\sum }\limits_{i=1}^{N}{(VP{D}_{AMF}^{j}({t}_{i})-{\overline{VPD}}_{AMF}^{j})}^{2}\mathop{\sum }\limits_{i=1}^{N}{(VP{D}_{S}^{j}({t}_{i})-{\overline{VPD}}_{S}^{j})}^{2}}}.$$
(10)

The error metric MBE can be any positive or negative value, while MAE and RMSE are non-negative. Positive (negative) values of MBE indicate that VPDS overestimates (underestimates) VPDAMF, on average. Large errors with opposite signs can negate one another in calculations of MBE, resulting in small bias errors. By taking the absolute value of each daily error, MAE provides a measure of the overall magnitude of the differences between VPDAMF and VPDS. As with MAE, squaring the difference between VPDAMF and VPDS in RMSE makes each error term positive, but it also gives more weight to large errors and less weight to small errors. To remove the effects of systematic bias in the bulk error estimates of RMSE for independent study sites, we computed uRMSE75. The Pearson correlation coefficient (rp), which ranges between -1 and 1, is a metric of the linear relation between VPDAMF and VPDS. The closer the magnitude of rp is to one, the more linear the relationship between the two, with the sign of rp indicating whether the relationship is positive or negative. Overall, MBE, MAE, RMSE, and uRMSE closer to zero, and rp closer to one, indicate the Daymet-derived VPD aligns well with observations from AmeriFlux.
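
The five metrics in Eqs. 6 to 10 can be computed together, as in the sketch below. The function name and the dictionary keys are illustrative, and the inputs are assumed to be paired daily series with no missing values.

```python
import numpy as np

def error_metrics(vpd_s, vpd_amf):
    """MBE, MAE, RMSE, uRMSE, and Pearson r for paired daily series."""
    s = np.asarray(vpd_s, dtype=float)
    o = np.asarray(vpd_amf, dtype=float)
    diff = s - o
    mbe = float(diff.mean())                          # Eq. 6
    mae = float(np.abs(diff).mean())                  # Eq. 7
    rmse = float(np.sqrt((diff ** 2).mean()))         # Eq. 8
    # Clip at zero so rounding cannot yield a negative radicand in Eq. 9.
    urmse = float(np.sqrt(max(rmse ** 2 - mbe ** 2, 0.0)))
    r = float(np.corrcoef(o, s)[0, 1])                # Eq. 10
    return {"MBE": mbe, "MAE": mae, "RMSE": rmse, "uRMSE": urmse, "r": r}
```

A series that differs from the observations by a constant offset has MBE, MAE, and RMSE all equal to that offset, a uRMSE of zero, and a correlation of one, which illustrates how uRMSE isolates the error that remains once systematic bias is removed.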

Overall performance of correction factors

While the Pearson correlation coefficients were similar among the VPDs, \(VPD({T}_{W})\), and VPDNLDAS methods, there were differences in the other error metrics (Table 6). The negative MBE indicates that the correction factors tended to underestimate daily averages of AmeriFlux VPD. However, for Daymet VPD scaled by IGBP or KG, the magnitude of this bias was 0.02 kPa, versus 0.18 kPa using a weighted temperature approximation. The MAE for VPDs was 29% smaller than that for \(VPD({T}_{W})\) across all sites. The RMSE for \(VPD({T}_{W})\) compared to VPDAMF was 0.14 kPa higher than the RMSE using the VPDs methods. Even after removing bias, the uRMSE for \(VPD({T}_{W})\) is still 0.1 kPa higher than that for VPDs. While VPDNLDAS has better error metrics than \(VPD({T}_{W})\), its uRMSE of 0.340 kPa is 0.06 kPa larger than that of either the land cover or climate corrected VPD. The results indicate that on average, and in extreme cases, across all sites, using land cover and climate to inform the scaling of \(VP{D}^{j}({T}_{max})\) outperformed estimates of \(VPD({T}_{W})\) and VPDNLDAS.

Within the scaling methods, scaling VPD by land cover or by climate tended to have similar values across error metrics (Table 6), but performance varied within classifications (Tables 4 and 5). While most of the correction factors had standard deviations less than 0.4, some standard deviations were orders of magnitude larger. For example, the standard deviation for the IGBP land cover grassland (GRA) was over 5. This was due to a few large ratios (over 10), with the largest being 1067. Large ratios can occur when the denominator, in this case \(VP{D}^{j}({T}_{max})\), is near zero. Nearly all of the large ratios occurred in winter months with temperatures below freezing. The coefficients for below-freezing temperatures used in Eq. 2 to compute saturated vapor pressure allowed saturated vapor pressure to be nearly the same as actual vapor pressure, resulting in VPD near zero (i.e., on the order of \(10^{-4}\) kPa). In total, this happened for the IGBP land cover class GRA on 52 days, and since there were over 1400 site years of data, we did not remove these values. Data from sites US-NR3 and US-NR4 were the leading contributors to the large standard deviations for GRA, being responsible for 38 of the 52 ratios greater than 10 and all six ratios greater than 100. In those instances where the ratios were greater than 100, \(VP{D}^{j}({T}_{max})\) (the denominator) was between 0.0002 and 0.0005 kPa while \(VP{D}_{AMF}^{j}\) (the numerator) fell between 0.1 and 0.2 kPa. Similarly, there was a very high standard deviation of 19.2 for the KG climate classification ET. Only two sites, US-NR3 and US-NR4, have KG classification ET, explaining why these two classifications (GRA and ET) had higher standard deviations for the mean of the correction factors.
Despite high standard deviations, the average errors associated with the application of correction factors to scale \(VPD({T}_{max})\) were unaffected by the few extreme values, both because of the total number of records used in the analysis and because we adopted the median values as the correction factors when applying them to CONUS+. The magnitudes of the error metrics MBE, MAE, and RMSE were smaller for the two ET sites than for the KG corrected methods across all sites (Tables 4 and 5). This suggests that the high standard deviation of the correction factors did not negatively affect the ability of the correction factors to accurately scale \(VP{D}^{j}({T}_{max})\).
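The effect of a near-zero denominator on the mean, standard deviation, and median of the ratios can be seen in a small Python sketch. The values below are made up to mimic the situation described above (one sub-freezing day with \(VP{D}^{j}({T}_{max})\) on the order of \(10^{-4}\) kPa); the helper `correction_ratios` is hypothetical and is not the code used to build the dataset.

```python
import statistics


def correction_ratios(vpd_amf, vpd_tmax):
    """Daily ratios of observed VPD to VPD(Tmax); days with a zero denominator are skipped."""
    return [a / t for a, t in zip(vpd_amf, vpd_tmax) if t != 0]


# Made-up daily values in kPa; the last day mimics a sub-freezing winter day
# where VPD(Tmax) is nearly zero while the observed VPD is 0.15 kPa
vpd_amf = [0.40, 0.55, 0.62, 0.48, 0.15]
vpd_tmax = [1.00, 1.25, 1.55, 1.20, 0.0003]

ratios = correction_ratios(vpd_amf, vpd_tmax)
mean_cf = statistics.mean(ratios)      # pulled far upward by the single extreme ratio
std_cf = statistics.stdev(ratios)      # inflated by the same ratio
median_cf = statistics.median(ratios)  # stays near the typical ratio of about 0.4
```

A single extreme ratio dominates the mean and standard deviation, whereas the median is essentially unchanged, which is why a median-based correction factor is insensitive to these few winter days.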

The rp values correlating VPDs and VPDAMF were strong (i.e., \({r}_{P} > 0.5\)) for most classifications, with most \({r}_{P} > 0.8\). The highest values came from the open shrubland (OSH) IGBP classification, with \({r}_{P}=0.945\), and KG climate Dsb (cold, dry summer, warm summer), with \({r}_{P}=0.951\). The KG classification Cwa (temperate, dry winter, hot summer) had the least linear relationship, \({r}_{P}=0.327\). This classification was only represented by 4 sites (Fig. 5) and 15.6 site years (Tables 1 and 2). Although the linear relationship with VPDAMF was weaker for Cwa, the overall accuracy indicated by the other metrics supports the performance of the correction factor.

Considering the method of scaling by IGBP land cover, the algorithm had the smallest error metrics for cropland/natural vegetation mosaics (CVM, Table 4), with MBE of -0.004 kPa, MAE of 0.158 kPa, and RMSE of 0.201 kPa. However, CVM was represented by only one site with 5.6 years of data, so it was not a well-represented land cover. With 56 sites and over 400 site years of data, evergreen needleleaf forests (ENF) were the most represented land cover. The correction factor median for ENF was 0.416, with a mean and standard deviation of 0.455 and 0.699, respectively. Despite the standard deviation being larger than the correction factor, there was a strong linear relationship (\({r}_{P}=0.840\)) between VPDs and VPDAMF. The error metrics MAE, MBE, and RMSE for ENF were comparable to the same metrics across all IGBP classifications. Climate classifications Cfb (temperate, no dry season, warm summer) and ET (polar tundra) had some of the lowest error metrics but were only represented by one and two sites, respectively. Classification Cfa (temperate, no dry season, hot summer), which covers most of the Southeast US and was represented by almost 27% of the study sites with 328 site years of data, had errors only slightly worse than the average across all climates.

VPD Uncertainty Analysis

Evaluating VPD for Independent AmeriFlux Sites

For the correction factor development presented above, we used AmeriFlux VPD for sites that reported half-hourly data. For time series analysis, we identified five sites inside CONUS+ that provided hourly data and were not included in the correction factor development: US-Cop (GRA, no climate reported but interpolated as Bsk), US-Cwt (DBF, Dfb), US-MMS (DBF, Cfa), US-PFa (MF, Dfb), and US-UMB (DBF, Dfb). We scaled \(VP{D}^{j}({T}_{max})\) for these five sites using correction factors corresponding to the AmeriFlux-provided vegetation land cover and climate. Error metrics for those sites are found in Table 7, and example time series for two sites for select years are shown in Fig. 7.

Error metrics indicate strong performance of VPDs relative to AmeriFlux VPD at US-Cwt, US-MMS, US-PFa, and US-UMB. For each of the four sites, uRMSE is smaller than the respective error for the same land cover and climate classifications (Tables 4 and 5). Performance of VPDs is weakest at US-Cop, a grassland in a cold, arid environment. AmeriFlux tends to have higher VPD than VPDs and VPDNLDAS at this site as seen in the negative MBE values (Table 7). Error metrics for the land cover and climate corrected VPD were lower than VPDNLDAS at US-MMS, US-PFa, and US-UMB, the three sites with the longest temporal records. VPDs and VPDNLDAS differed in uRMSE at US-Cwt by 0.005 kPa, indicating that our scaled VPD generally outperforms NLDAS across the five test sites. Even for NLDAS, error metrics are highest at US-Cop. The high RMSE values of 0.752 and 0.566 for land cover and climate corrected VPD, respectively, indicate the largest errors occurred during periods of elevated VPD. The lower uRMSE values for VPDs indicate that our methods capture the variability in daily VPD better than alternative methods. For example, during green up at US-MMS (days 100 to 150) and the middle part of the growing season (days 175 to 225) of 2012, the land cover and climate corrected VPD compared more favorably against AmeriFlux VPD than NLDAS, which underestimated AmeriFlux during green up and overestimated AmeriFlux in days 180 to 210 of 2012 (Fig. 7), a time period of an extreme drought in the Midwest76.

Uncertainty in Land Cover and Climate Classifications

We generated land cover and climate dependent correction factors using VPD from 253 AmeriFlux eddy covariance flux towers; however, several land cover and climate classifications had fewer than five representative sites. For IGBP land cover, there was only one site classified as cropland/natural vegetation mosaics (CVM), two sites classified as barren sparse vegetation (BSV), and three sites classified as closed shrubland (CSH). For KG climate, one site represented classification Cfb (temperate, no dry season, warm summer), while classifications Bwh (arid, desert, hot), Dsa (cold, dry and hot summer), and ET (polar tundra) each had two sites. Classifications Bsh (arid, steppe, hot) and Dwb (cold, dry winter, warm summer) had three sites each, and there were four sites with classification Cwa (temperate, dry winter, hot summer). No tropical climates were represented by the AmeriFlux sites. In contrast, several land cover and climate classifications represented substantial portions of the 253 sites. There were 59 sites classified as croplands (CRO), 56 as evergreen needleleaf forests (ENF), and 40 as grasslands (GRA). There were 68 sites with KG climate classification Cfa (temperate, no dry season, hot summer), and 52 classified as Dfb (cold, no dry season, warm summer). It is possible that having a larger sample of underrepresented classifications, in more regions across CONUS+, could have improved performance. Additionally, most AmeriFlux sites are located in heterogeneous landscapes, and the heterogeneity has been shown to influence the variability of VPD in the local microclimate9. This consideration is important for researchers investigating the variability in ecosystem responses to variations in VPD.

Usage Notes

Research to Operational Applications

This dataset offers users a chance to perform fine spatial and temporal scale investigations in a variety of climates and land covers across CONUS+, which is particularly beneficial for studying areas with limited or no available ground data records of VPD. To conduct a point or other small-scale study, a user would need to identify a latitude/longitude coordinate pair (or set of coordinate pairs for multiple sites or for a small region) and the corresponding grid location(s) from the dataset. A time series of VPD could be useful for many reasons. For example, one could follow the methods described by Yuan et al.4 or Li et al.77 to analyze the dependence of carbon assimilation and plant growth on VPD. Since the influence of VPD on vegetation growth can vary by region19, users could separate the data by land cover or climate to explore how the effects of VPD on carbon assimilation and plant growth vary by climate and land cover.
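The point-extraction step described above can be sketched in Python as follows. This is a minimal illustration that assumes the grid's latitude and longitude are available as two-dimensional arrays (as is typical for projected grids) and that VPD is indexed as [time][y][x]. The function names, array layout, and toy values are assumptions for illustration and do not describe the actual files, whose arrays would be read with a netCDF library.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    radius_km = 6371.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))


def nearest_cell(lat2d, lon2d, site_lat, site_lon):
    """Return (row, col, distance_km) of the grid cell center closest to the site."""
    best = None
    for i, (lat_row, lon_row) in enumerate(zip(lat2d, lon2d)):
        for j, (lat, lon) in enumerate(zip(lat_row, lon_row)):
            d = haversine_km(site_lat, site_lon, lat, lon)
            if best is None or d < best[0]:
                best = (d, i, j)
    return best[1], best[2], best[0]


def time_series_at(vpd, row, col):
    """Extract the daily series at one cell from VPD indexed as [time][y][x]."""
    return [day[row][col] for day in vpd]


# Toy 2 x 2 grid of cell-center coordinates (hypothetical values)
lat2d = [[39.00, 39.00], [39.01, 39.01]]
lon2d = [[-86.42, -86.41], [-86.42, -86.41]]
# Two days of made-up VPD values in kPa
vpd = [[[0.5, 0.6], [0.7, 0.8]], [[0.9, 1.0], [1.1, 1.2]]]

row, col, dist_km = nearest_cell(lat2d, lon2d, 39.009, -86.412)
series = time_series_at(vpd, row, col)
```

The brute-force search is adequate for a handful of sites; for many sites or a full-resolution grid, a spatial index or a projection-aware lookup would be a more efficient choice.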

Because VPD is linked to drought50, with high VPD corresponding to dry conditions, this dataset could be useful in tracking VPD anomalies that may be associated with drought identification and intensification for all of CONUS+, supporting a suite of indices and anomalies already implemented as drought markers78. One such index, the evaporative stress index (ESI) or ratio (ESR), considers more than precipitation anomalies by computing the ratio of actual evaporation to potential evaporation78, which depends on VPD79, and has been shown to effectively track rapid changes in drought conditions80. Generating standardized VPD anomalies in similar ways could be an additional effective way to study the direct influence of VPD on drought development. Additionally, when analyzed with covariates such as precipitation and soil moisture8, this dataset could be useful in identifying VPD-induced drought thresholds across land cover and climate classifications, similar to Lowman et al.8.
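One possible way to compute standardized VPD anomalies is a z-score against a day-of-year climatology, sketched below in Python. This is only one plausible definition, assumed here for illustration; it is not a method prescribed by this dataset or by the ESI literature, and the function name and sample records are hypothetical.

```python
import statistics
from collections import defaultdict


def standardized_anomalies(records):
    """Compute z-scores of VPD against a day-of-year climatology.

    records: list of (year, day_of_year, vpd_kpa) tuples for one grid cell.
    Returns a list of (year, day_of_year, z), where z is None if the
    climatology for that day has fewer than two values or zero spread.
    """
    by_doy = defaultdict(list)
    for _, doy, vpd in records:
        by_doy[doy].append(vpd)

    # Mean and sample standard deviation for each day of year
    climatology = {}
    for doy, values in by_doy.items():
        if len(values) >= 2:
            climatology[doy] = (statistics.mean(values), statistics.stdev(values))

    result = []
    for year, doy, vpd in records:
        mean, std = climatology.get(doy, (None, None))
        if std:  # skips a missing climatology and zero variance
            result.append((year, doy, (vpd - mean) / std))
        else:
            result.append((year, doy, None))
    return result


# Made-up values (kPa): day 190 in three years, day 191 in only one year
records = [(2010, 190, 1.0), (2011, 190, 1.2), (2012, 190, 2.0), (2012, 191, 1.5)]
anomalies = standardized_anomalies(records)
```

In practice a longer record and some temporal smoothing of the climatology (for example, a multi-day window) would likely be needed before interpreting the anomalies.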

VPD is a variable of interest to a broad range of the environmental and geophysical science communities, spanning climatology, ecology, ecophysiology, and beyond. Daymet has a meteorological record that dates back to 1980 for most of North America58. With over forty years of temperature and vapor pressure data, those interested in longer-term climatological studies could use these data and adapt our methods to extrapolate VPD back to 1980 and into the future to investigate how VPD is changing at climatological time scales, especially with changing land cover and climate classifications. For those interested in purely atmospheric studies, this dataset could also be useful for research investigating how vapor pressure changes within the vertical atmospheric column by providing surface or near-surface VPD. Because of the wide range of altitudes spanning CONUS+, these datasets could be used with elevation maps to investigate how plant function, such as photosynthesis, varies with VPD across altitudinal gradients81. Fire and other disturbances, such as tropical cyclones, can impact the local VPD by altering the local climate and landscape82. Studies of how surface VPD anomalies impact wildfire risk24 across land cover and climate regions in CONUS+ will benefit from using this dataset in conjunction with other fire-related data products.

We anticipate that land managers, using research to inform operations, will access this dataset to assess when and how crop yields begin to be impacted during VPD-induced drought, and to evaluate the resiliency of various croplands to high VPD in different climates. This type of scientific inquiry can guide agricultural strategies to mitigate the effects of changing atmospheric demand for water. Similarly, studies using these datasets to investigate the impacts of high VPD on wildfire risk could be translated into protocols for forest management decisions to determine wildfire risk status.

Alternative Ways to Use the VPD Products

We encourage users to consider the land cover and climate classifications of their study sites when using this data product. Outputs from the two VPD products are similar, but subtle differences exist (Fig. 6). In areas like the Southeastern US, where land cover can be heterogeneous but climate is homogeneous over a large region, \(VP{D}_{S}^{KG}\) tends to be smoother than \(VP{D}_{S}^{IGBP}\). In some applications users may prefer to use daily maximum VPD (i.e., VPD (Tmax)) as a proxy for maximum daily water stress50. VPD (Tmax) can be obtained for any pixel by dividing the scaled VPD by the correction factor. Additionally, users may want to account for both climate and land cover. To do so, they would divide by the correction factor before applying a correction factor from a different classification, or some combination of correction factors. For example, if a user is working with a site and wants to combine the effects of land cover and climate classification, they could scale VPD (Tmax) using the average of the IGBP and KG correction factors for that site. Alternatively, users could attempt to incorporate vegetation heterogeneity by combining correction factors for one or more vegetation types.
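The arithmetic described above can be written as a short Python sketch. The function names are hypothetical. The IGBP factor of 0.416 is the ENF median quoted earlier in the text, while the KG factor of 0.5 and the VPD value are made-up placeholders; actual correction factors should be taken from Tables 4 and 5.

```python
def recover_vpd_tmax(vpd_scaled, cf_used):
    """Undo the scaling: VPD(Tmax) = scaled VPD / correction factor that was applied."""
    if cf_used <= 0:
        raise ValueError("correction factor must be positive")
    return vpd_scaled / cf_used


def rescale_with_average(vpd_scaled, cf_used, cf_igbp, cf_kg):
    """Rescale VPD(Tmax) with the average of the IGBP and KG correction factors."""
    vpd_tmax = recover_vpd_tmax(vpd_scaled, cf_used)
    combined_cf = 0.5 * (cf_igbp + cf_kg)
    return combined_cf * vpd_tmax


cf_igbp = 0.416   # ENF median correction factor quoted in the text
cf_kg = 0.5       # hypothetical KG correction factor, for illustration only
vpd_igbp = 0.832  # made-up daily value (kPa) from the land cover scaled product

vpd_tmax = recover_vpd_tmax(vpd_igbp, cf_igbp)
vpd_combined = rescale_with_average(vpd_igbp, cf_igbp, cf_igbp, cf_kg)
```

A simple average weights land cover and climate equally; users with reason to favor one classification could substitute a weighted average instead.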

We assumed land cover changes occurred annually according to MODIS IGBP. One could update this VPD product using the newest available MODIS IGBP records in the future. Additionally, users may want to extend records further back (prior to 2001), in which case they could assume prior land cover, with caution, and use the land cover map of correction factors to apply to earlier Daymet records. With MODIS going offline in the near future, it may also be possible to follow this procedure with different land cover classification schemes. Similarly, any updated climate classification schemes could be applied to any future Daymet data records or those prior to 2001 in order to expand the available data record of VPD (Tmax). The methods presented here could also be adapted for regions across the world, especially in areas with similar climates and/or land cover to CONUS+. Studies investigating VPD in arctic and tropical regions could follow the methods presented here but would benefit from a dataset developed using additional ground observations from those regions.

Transferability Outside of CONUS+

The methods presented in this manuscript can be adapted to other study regions as long as investigators account for the resolution of the temperature, vapor pressure, land cover, and climate inputs. Here, the MODIS land cover data are at 500 m resolution and the Köppen-Geiger climate data at 1 km, and each was resampled to align with the Daymet CONUS+ grid at 1 km. This method could be directly applied to the entire Daymet spatial coverage (Alaska, Hawaii, Puerto Rico, Mexico and Canada) because there are AmeriFlux and/or FLUXNET towers to provide ground observations of VPD. Outside of those regions, Daymet is not available. Thus, some alternative meteorological data set would need to be used (e.g., Climatic Research Unit gridded Time Series83). The land cover and climate data are each available globally, so applying the methods presented requires subsetting land cover and climate and aligning them to match the resolution of the temperature and atmospheric vapor pressure data of the new study region. FLUXNET data could be applied to a wider range of regions (e.g., South America, Africa, Asia, and Europe42). Otherwise, the key limitation to applying this approach is the availability of ground observations needed to build correction factors for the daily estimates of VPD.

Working with NetCDF Files

The datasets are available in netCDF file format, a commonly used format for storing array data. NetCDF files are easily readable by most software applications (e.g., Python, R, Matlab, Fortran, etc.). For more information and resources about netCDF files, visit https://www.unidata.ucar.edu/software/netcdf/. Additionally, users can subset, visualize, or analyze the netCDF files in the HydroShare resource using the THREDDS (Thematic Real-time Environmental Distributed Data Services) data server. For help with using THREDDS, visit https://help.hydroshare.org/apps/thredds-opendap/.