Introduction

Over the past few years, there has been a rapid increase in extreme events like heat waves and droughts, primarily driven by human-induced climate change1. For instance, wildfires in high northern latitude regions are expected to increase in upcoming years2. Climate change is altering landscape fire patterns in numerous areas, leading to smoke emissions that carry various harmful air pollutants, posing potential risks to human health3. The large quantities of atmospheric pollutants, such as PM2.5, contribute to the deterioration of air quality during the fire season2. Insects are also threatened by those chemicals. When present in sufficient concentrations, atmospheric pollutants from fire smoke can impair foraging and the development of some insect species such as honey bees, with potential consequences for population dynamics3,4.

Pollinators are among the most threatened insects, facing notorious declines due to habitat loss, extreme weather events, and agro-chemical pollutants5,6,7. Importantly, their nutrition depends on access to flowering plants8, and their capacity to locate them using scent has been found to be negatively affected by air pollution9,10. Research shows that elevated ozone concentrations (O3) can affect pollinators’ nutrition by degrading floral scent9,11, by disrupting pheromone diffusion12, and by reducing overall foraging activity13,14. In honey bees, air pollutants block chemical communication, preventing them from coordinating swarming by weakening the electroantennographic response of their antennae to alarm pheromones15. Moreover, elevated levels of PM2.5 in the atmosphere affect skylight polarization, which honey bees use for navigation during their foraging trips, resulting in longer trip durations16,17. Interactions between such stressors can synergistically affect honey bees’ ability to withstand stress18, suggesting that air pollution may compromise honey bee survival. Yet, few studies have assessed the potential impacts of air pollution on pollinator survival, often constrained by limited access to large-scale data on these animals (but see18). Empirical studies suggest that ensuring access to a high-quality diet can enhance honey bee resilience to climatic stressors such as heat19. Indeed, the availability and quality of vegetation resources are important regulators of honey bee populations and health20,21. In addition, vegetation cover has been shown to reduce various types of air pollutants such as PM10 and O322,23,24. However, other studies suggest that air quality, rather than vegetation availability, is probably the main driver of bee health, but these analyses were performed without considering their interactive effect18. Hence, the relationship between vegetation and bees’ resilience to increased air contaminant levels remains unclear3.

Honey bees play a pivotal role as pollinators in (agro)ecosystems, enhancing wild plant variety and pollinating a large portion of the world’s food resources20. Thus, using honey bees as bioindicators25,26 of poor air quality would help determine whether and how vegetation availability mitigates its adverse effects, enabling the development of better biodiversity conservation, beekeeping, and land management practices. Here, we assess the impacts of air quality and vegetation availability on honey bee colony survival, combining statistical modeling and machine learning techniques tailored for survival analysis. Specifically, we address the following questions: (1) Will honey bee mortality increase with poor air quality? (2) Will high vegetation availability mitigate the negative impacts of poor air quality? Based on the model’s predictions and air quality data for 2024, we developed a risk map of honey bee mortality for 2024.

To assess the impact of various air quality-related stressors on honey bee colony survival, our analysis incorporated a range of environmental factors shown to influence bee population health16,21,27,28,29,30, including the daily average Air Quality Health Index (AQHI)31, daily average ozone concentration (O3 µg m-3), daily average temperature (°C), daily total precipitation (mm), daily average wind speed (km h-1), and the Normalized Difference Vegetation Index (NDVI) as a proxy for vegetation availability18 (see Methods for further details). We employed anonymized data with a sample size of 103 477 hives tracked over three beekeeping seasons across Canada and the United States from 2020 to 2023 (Supplementary Fig. 1). We performed the analyses using two distinct methodologies: (1) statistical modeling using generalized linear mixed models (GLMM) to account for differences in honey bee mortality among beekeeping operations, and (2) a machine learning algorithm, the Random Survival Forest (RSF)32, to predict beehive mortality and evaluate which environmental factors were most important for predicting it. Our results show that air quality is an important predictor of beehive mortality, as mortality was greater in regions of poor air quality in Canada and the U.S. We also show that greater vegetation availability dampens the negative effects of poor air quality, suggesting that proper landscape management practices may attenuate the consequences of poor air quality on honey bee survival.

Results

For the statistical analysis, we report the best out of 33 GLMMs based on the Akaike Information Criterion (Supplementary Table 1), all predicting honey bee colony mortality (y). We found strong evidence that honey bee mortality increased with elevated ozone concentrations (Table 1 and Supplementary Fig. 2). Indeed, the baseline probability of mortality at mean ozone concentrations was 45.8%, increasing to 73.5% (i.e. an increase of 27.8 percentage points) for a one standard deviation increase in ozone concentration. We also found evidence of a strong effect of vegetation availability, where greater vegetation was associated with a reduced likelihood of bee mortality (Table 1 and Supplementary Fig. 2). As expected, both variables interacted with each other, such that greater vegetation availability attenuated the adverse effects of ozone on beehive mortality (Table 1 and Fig. 1). In addition, the risk of beehive mortality was reduced when wind speeds were slower, but the effect was weaker compared to ozone concentration and vegetation availability (Table 1 and Supplementary Fig. 2). Therefore, we did not find evidence that wind speed strongly attenuated the effect of ozone concentration. Other interactions were weak (Table 1). Overall, the fixed effects explained 15.5% of the total variance in beehive mortality (R2marginal = 0.155, Table 1).
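To make the size of the ozone effect easier to interpret, the two reported probabilities can be converted to the log-odds scale on which a binomial GLMM is typically fitted. The sketch below assumes a logit link (not stated explicitly in the text) and uses only the rounded probabilities quoted above; the resulting shift is an illustration and should not be read as the coefficient reported in Table 1.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))


def inv_logit(x: float) -> float:
    """Probability corresponding to a log-odds value."""
    return 1.0 / (1.0 + math.exp(-x))


# Probabilities quoted in the text (rounded values).
p_mean_ozone = 0.458   # predicted mortality at mean ozone
p_plus_one_sd = 0.735  # predicted mortality at mean ozone + 1 SD

# Shift implied on the log-odds scale, and the matching odds ratio.
# Illustration only: this is NOT the Table 1 estimate.
delta_log_odds = logit(p_plus_one_sd) - logit(p_mean_ozone)
odds_ratio = math.exp(delta_log_odds)

print(f"implied log-odds shift per SD of ozone: {delta_log_odds:.2f}")
print(f"implied odds ratio per SD of ozone: {odds_ratio:.2f}")
```

Applying `inv_logit` to the baseline log-odds plus the shift recovers the second probability, which is a quick way to check that predictions on the two scales agree.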

Fig. 1: Predicted interactive effect of atmospheric ozone concentration and vegetation availability (NDVI) on honey bee hive mortality estimated by the GLMM with the lowest AIC.

The probability of death is represented by a color gradient, where lighter colors (yellow) indicate higher chances of mortality. The chances of mortality are at their highest when the NDVI is low and ozone concentrations are elevated. Predictions cover 99% of the data, as very high NDVI values were infrequent in our observations.

Table 1 Results of the generalized linear mixed model with the lowest AIC estimating the effect of environmental variables on the beehives’ mortality (model 33)

The random effects show an important variance in intercepts and slopes among beekeeping operations (R2conditional = 0.734, Table 1), suggesting that the impact of ozone concentration on beehive mortality varies substantially among different locations (Supplementary Fig. 3). Moreover, operations that had greater average mortalities also displayed a steeper curve for ozone concentrations, revealing a more pronounced effect of ozone on beehive mortality for such operations (i.e. random effects correlation, Table 1 and Supplementary Fig. 3). These results highlight differences in how each beekeeping operation reacts to poor air quality, indicating local and regional variation in the relationship between beehive mortality and air quality.

To assess the importance of environmental variables in predicting beehive mortality, we trained an RSF on 85% of the dataset and assessed its performance on the remaining 15%. The variables used for prediction were the average AQHI, the average NDVI, average wind speed, average precipitation, and average temperature. Optimal parameters for the RSF were selected using a random search. We evaluated the model’s performance using Uno et al.’s (2011) modified concordance index33 and the time-dependent AUC34,35,36 (see Methods for further details). The model scored 76.5% for the concordance index, indicating that, among comparable pairs of hives, the hive that died earlier received the higher predicted risk in about 76.5% of cases. The time-dependent AUC indicates that the model performed well, with an average AUC score of 93.6% (Supplementary Fig. 4).

The analysis of variable importance (see Methods for further details) revealed that the most important variables in predicting beehive mortality were precipitation, air quality (AQHI), and temperature, all having similar mean importance scores (Fig. 2). Contrary to our expectations, wind speed and vegetation availability (NDVI) were not as important as the aforementioned variables, having lower importance scores (Fig. 2).

Fig. 2: Variable importance measuring the contribution of each variable used in the Random Survival Forest to predict honey bee hive mortality. Each variable importance was computed using the permutation importance algorithm.

Panel (a) shows, for each variable, the average decrease in model performance from 15 permutations with its standard deviation. Panel (b) shows the relative contribution of each variable.

Using the risk scores predicted by the RSF and air quality data from the OpenWeather API, we computed a risk map for honey beehives in 2024 for Canada and the U.S. (Fig. 3). Predictions indicate that the risks of honey bee mortality are greater in the western U.S. as well as on the Northeast coast, regions characterized by lower air quality in 2024. Hence, conservation strategies could incorporate such risk maps to enable targeted approaches for protecting honey bee populations, directing resources towards areas with increased vulnerability due to poor air quality.

Fig. 3: Risk map of honey beehive mortality for 2024 in Canada and the U.S.

The map displays extrapolated risk scores for 2024. Those were generated using predicted risk scores from the RSF model and air quality data (AQHI) obtained from the OpenWeather API. The AQHI values represent averages from April 1 to September 1, 2024.

Discussion

Combining statistical and machine learning modeling with an extensive longitudinal dataset on honey bee health, our study shows that air quality is an important predictor of beehive mortality. Our predictive approach enabled us to build a risk map, based on air quality data, that outlines regions of critical importance for bee health monitoring. Importantly, we show that vegetation availability attenuates the negative impacts of poor air quality on beehive mortality. Therefore, our work underlines the value of using honey bees as bioindicators of poor air quality to encourage proper land management practices, and ultimately, protect multiple pollinator species.

Out of all the environmental variables we analyzed, mean air quality emerged as the second most influential in explaining honey bee mortality after precipitation. Empirical studies have outlined adverse effects of poor air quality on immune system functionality, navigation, olfaction, and foraging behavior9,14,16,37,38. Moreover, using field samples collected in Bangalore (India), Thimmegowda et al. (2020) showed that Giant Asian honey bees (Apis dorsata) exposed to higher air pollution levels had reduced chances of survival18. Our findings align with these observations, establishing a clear link between poor air quality and increased honey bee mortality. Importantly, Thimmegowda et al. (2020) identified reduced foraging as one of the most probable causes linking poor air quality to reduced survival. Consequently, air pollutants may significantly contribute to the decline of pollinator populations by directly impairing their foraging ability, thereby limiting their capacity to provide essential ecosystem services such as pollination. Crop pollination is a major source of revenue for beekeepers in North America, contributing up to $3.18 billion in additional harvest revenue in Canada alone39,40. Such findings highlight a pressing need for policies aimed at reducing air pollution in order to maintain global food security.

We also observed substantial variation in how colonies from each beekeeping operation responded to elevated ozone concentrations, suggesting that the effects of air quality may be of more critical importance in certain areas. At the regional level, these differences could be due to seasonal changes in adverse air quality events and in vegetation availability that our analyses do not take into account. For instance, in Quebec, forest fires typically influence air quality during spring and early summer while in western regions, such events are more prevalent in late summer41. Yet, median NDVI values were similar among studied regions (Supplementary Fig. 5). At the local level, differences in beekeeping practices associated with the use of agrochemicals, pest management, proper monitoring, and operation size, may also explain why the effect of ozone concentration on honey bee mortality was stronger in some beekeeping operations42,43,44. Local attributes of vegetation density and availability may also drive the observed differences among operations in the relationship between mortality and ozone concentration, which we did not analyze here.

Although not as critical for predicting honey bee mortality, we observed increased chances of survival in regions with greater vegetation availability. One explanation could be that vegetation availability provided access to a larger pool of resources. Landscape changes alter the access to resources for honey bees, which are typically exposed to expansive farmlands and cropping systems45,46,47,48. These habitats are characterized by a limited variety of dominant crops such as almonds, cranberries, blueberries, apples, and raspberries49. Studies have shown that the abundance and diversity of surrounding floral species influence how much pollen and nectar pollinators collect, and the range of pollen types they gather46,50. Pollen is rich in both macro- and micronutrients and plays a crucial role in the physiological development and the immunity of honey bees, while floral nectar is rich in micronutrients and phytochemicals, acting as the primary source of carbohydrates to sustain daily activities46,51,52,53. Pollen and nectar also increase honey bees’ resilience against external stressors38,39,40. As such, shortages in these elements have been found to alter brood production and reduce survival in mason bees (Osmia bicornis) and honey bees54,55.

Our findings also suggest that vegetation availability has a protective effect on honey bees exposed to poor air quality. Indeed, hives exposed to poor air quality in areas with greater vegetation showed better survival rates compared to those in areas with lower vegetation. On the one hand, access to more resources may help mitigate the effects of air pollutants on foraging efficiency by making it easier for bees to locate food. For example, yellow-faced bumblebees (Bombus vosnesenskii) will tend to forage further from their colony to find patches with a richer diversity of flowers48. Thus, beekeepers may benefit from sending their colonies for pollination in yards surrounded by rich vegetation. On the other hand, it is also possible that vegetation acts as a protective barrier by absorbing part of the atmospheric pollutants. This can be observed by a negative correlation between air quality and NDVI24,56 (Supplementary Fig. 6). Research shows that the primary mechanism by which trees and shrubs remove air pollutants is via uptake through leaf stomata and through the plant surface22. Additional mechanisms include microclimate regulation and wind speed reductions, which contribute to reducing ground-level ozone concentrations57. Vegetation also provides a physical barrier by intercepting airborne particles, and reduces the risk of pesticide exposure in farmlands58. However, the timing of poor air quality events may influence the protective capacity of vegetation. For example, late summer decreases in NDVI could exacerbate honey bee mortality if air quality is unfavorable at this time of the year, such as during forest fires. During the spring of 2023, Canada experienced exceptional weather conditions marked by warmth and dryness, setting the stage for a record-breaking fire season with up to 17 million hectares (ha) of land burned across the country41. Quebec and Alberta had exceptionally early starts according to the Canadian Interagency Forest Fire Centre. In Alberta, wildfire activity started at the end of April and continued past the legislated end of wildfire season into November. In Quebec, 182 fires were ignited by lightning on June 1st alone and the total area burned (4.5 million ha) was greater than the sum of the area burned over the last 20 years. In the U.S., western regions have experienced an increase in the frequency and the spread of wildfires over the past 20 years59,60,61, with evidence pointing to wildfires being the main cause of above-average PM2.5 concentrations62. The relationship between honey bee mortality and air quality combined with the increasing trend of fires in both countries is alarming. Projections suggest that wildfire-induced PM2.5 concentrations are expected to nearly double by 205063, raising concerns about the future well-being of honey bee populations. Based on air quality data for 2024, our risk map reflects this trend in the western U.S., but also in the Great Lakes and Northeast coast as the AQHI was much higher (i.e. poor quality) in these regions (Fig. 3). Air quality in the U.S. is also influenced by the smoke from Canadian wildfires64. While changes in air quality due to forest fires can be highly dynamic, our model suggests that bee health in these regions should be closely monitored to mitigate long-term consequences. Therefore, increasing vegetation cover should not be viewed as a standalone measure but rather as part of a holistic approach that includes proper monitoring alongside other strategies (e.g. biomarkers, integrated systems, pollinator ecology) to protect bee health65,66,67,68.

One limitation of our study is that our predictive model may underestimate seasonal mortality in northern-latitude regions such as Quebec, Alberta, or the eastern U.S. because mortality rates increase during winter. For instance, hives typically overwinter in Canada between mid-November and April, and mortality is usually not monitored by beekeepers during this period. Winter mortality in Canada is often calculated at the beginning of the new season in spring, leading to right-censoring of the survival data for the beekeeping season (Supplementary Fig. 7). Therefore, regions with warmer climates (e.g. California, Florida) may yield more accurate mortality data because hives are kept active during winter with supplemental food. Another caveat of our study is that the NDVI can only be seen as a proxy for resource availability and considers neither species richness nor relative abundances (i.e. resource diversity). Hence, future implementations of our approach could integrate additional metrics to assess how different vegetation characteristics influence the relationship between air quality and bee mortality.

This study unveils a critical link between honey bee mortality and air quality in Canada and the U.S. Our research solidifies evidence linking poor air quality to compromised honey bee health18,37, and establishes a clear link between honey bee mortality and limited vegetation availability. Interestingly, the survival response of honey bee colonies to poor air quality varied among beekeeping operations, prompting questions about the interplay between ecological phenomena and beekeeping practices, a concern that translates into considerable workload and economic implications for beekeepers. Yet, our observations make a strong case that vegetation availability is crucial to mitigate the consequences of poor air quality on honey bee health, emphasizing the need for research and mitigation strategies to address the upcoming challenges associated with escalating wildfires in Canada and the U.S. Hence, maintaining robust and diverse vegetation cover emerges as a simple solution to support ecosystem health and protect honey bee populations.

Methods

Honey bees dataset

We extracted anonymized data from beekeepers who use Nectar 1.19, an apiary tracking and management platform tailored for commercial beekeepers. The platform allows beekeepers to maintain a comprehensive record of all actions, locations, movements, mortality, and other information concerning their beekeeping operations. Therefore, we had access to historical data from the 2020–2021, 2021–2022, and 2022–2023 beekeeping seasons for beekeepers located in Canada and the U.S. Our dataset is structured as an open cohort, allowing new colonies to enter at any given point in time during the beekeeping seasons. We only analyzed operations with at least one full year of follow-up data to effectively track mortality rates throughout a whole beekeeping season. We defined a colony as dead if its recorded death date fell within the current beekeeping season up until the start of the next season. The beekeepers are located in the U.S. states of California, Wyoming, Florida, Maine, North Dakota, and New York, and the Canadian provinces of Alberta, Nova Scotia, and Quebec. The final dataset contained 103 477 hives with 112 626 observations.

Environmental dataset

Using publicly available data, we examined environmental factors likely to impact honey bee survival based on the literature3,16,21,27,29. The variables we extracted include daily measurements of the Air Quality Health Index (AQHI), daily average ozone concentration (O3 µg m–3), daily average temperature (°C), daily total precipitation (mm), daily average wind speed (km h–1), and a measure of vegetation availability derived from satellite imagery, the Normalized Difference Vegetation Index (NDVI).

The AQHI is used by the government of Canada to assess and communicate health risks associated with air pollution. This metric represents a general assessment of air quality on an unbounded scale from 1 to 10+ by combining air pollutant concentrations, with higher values indicating worse air quality and increased health risks. The pollutants include ozone (O3), fine particulate matter (PM2.5), and nitrogen dioxide (NO2). We extracted the air quality data using Airpyllution69, a Python package offering access to current and historical (accessible from November 27th, 2020) air quality information via the OpenWeather API. Following the formula developed by Stieb et al. (2008), we calculated the seasonal average AQHI over the period each hive spent in its yard31.
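As a minimal sketch of the index computation, the function below implements the Canadian AQHI formulation as we understand it from Stieb et al. (2008), in which the excess risks associated with the three pollutants are summed and rescaled. The coefficients and the expected input units (ozone and nitrogen dioxide in ppb, PM2.5 in µg m–3, as 3-hour averages) are stated here as assumptions and should be checked against the original reference before reuse. The conversion from the µg m–3 values returned by OpenWeather is not described in the text and is deliberately left out, and the function names are hypothetical.

```python
import math


def aqhi(o3_ppb: float, no2_ppb: float, pm25_ugm3: float) -> float:
    """AQHI from pollutant concentrations.

    Assumed inputs: 3-hour averages, O3 and NO2 in ppb, PM2.5 in ug/m3.
    Coefficients are assumed to follow Stieb et al. (2008).
    """
    excess_risk = (
        (math.exp(0.000537 * o3_ppb) - 1.0)
        + (math.exp(0.000871 * no2_ppb) - 1.0)
        + (math.exp(0.000487 * pm25_ugm3) - 1.0)
    )
    return (1000.0 / 10.4) * excess_risk


def seasonal_mean_aqhi(readings):
    """Average AQHI over (o3, no2, pm25) readings for one hive's stay in a yard."""
    values = [aqhi(o3, no2, pm25) for o3, no2, pm25 in readings]
    return sum(values) / len(values)


# Hypothetical readings for illustration only.
example = [(30.0, 10.0, 8.0), (45.0, 12.0, 20.0)]
print(round(seasonal_mean_aqhi(example), 2))
```

Because each term grows with concentration, the index has no upper bound, which is consistent with the "10+" category mentioned above.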

We used the NDVI to assess whether vegetation availability reduced the negative impacts of poor air quality on honey bee survival. This index can be used as an indicator of vegetative water, energy, and carbon balance70. Moreover, it is an important parameter in ecosystem models71, models estimating vegetation net primary production72, and models predicting crop yields73, and can be used as an indicator of resource availability and habitat quality18. We calculated the NDVI values using the near infrared and the red bands of the Harmonized Sentinel-2 MSI L2 products74. Both bands are offered at a 10 m resolution. For each hive, we defined a 3 km buffer around the yard location to take into account the foraging distance of honey bees47,75. Prior to performing the NDVI calculations for each hive, we masked clouds and water bodies to exclude extreme NDVI values. We then computed the seasonal average of the NDVI by extracting values in 10-day increments over the period each hive spent in the yard for a given season. Lastly, for the amount of precipitation and the wind speed, we averaged the values over the period spent in the yard using the daily local values. We performed the calculations in Python, using the Earth Engine client library76 for the NDVI and the Meteostat library77 for precipitation and wind speed.
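The index itself is the normalized difference between near-infrared and red reflectance, NDVI = (NIR − Red) / (NIR + Red). The sketch below illustrates that calculation and the masking step in plain Python rather than with the Earth Engine client, so that it runs without credentials. The function names, the flat pixel lists, and the boolean mask are hypothetical simplifications of the buffer-level averaging described above.

```python
from typing import Optional, Sequence


def ndvi(nir: float, red: float) -> Optional[float]:
    """Normalized difference of near-infrared and red reflectance."""
    total = nir + red
    if total == 0:
        return None  # undefined when both bands are zero
    return (nir - red) / total


def mean_ndvi(
    nir_band: Sequence[float],
    red_band: Sequence[float],
    valid_mask: Sequence[bool],
) -> Optional[float]:
    """Average NDVI over pixels that pass the mask (e.g. not cloud, not water)."""
    values = []
    for nir, red, keep in zip(nir_band, red_band, valid_mask):
        if not keep:
            continue
        value = ndvi(nir, red)
        if value is not None:
            values.append(value)
    return sum(values) / len(values) if values else None


# Three hypothetical pixels inside a buffer; the third is masked out.
print(mean_ndvi([0.5, 0.2, 0.3], [0.1, 0.2, 0.1], [True, True, False]))
```

Averaging such buffer-level values across the 10-day composites of a season would then give one seasonal NDVI per hive.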

Statistical model

We fitted 33 generalized linear mixed models (GLMM) with a binomial error distribution, all modeling honey bee colony mortality (y) as a function of combinations of environmental factors (X). We defined models that included either main effects exclusively or combinations of main and interaction effects between environmental predictors. We also tested different configurations of random effects, with models defined with random intercepts exclusively or with random slopes for either the region of study (i.e. U.S. state or Canadian province) or the operation managing the beehive. Considering the computational complexities associated with linear mixed models, we performed a convergence check before performing model comparisons. We evaluated and compared the predictive performance among models using the Akaike Information Criterion78 (Supplementary Table 1). We standardized predictor variables to zero mean and unit variance prior to fitting the models. The fixed effects of the most parsimonious model (Table 1 and Supplementary Table 1) include the average atmospheric ozone concentration (μg m–3), vegetation availability (NDVI), the average wind speed (km h–1), and all combinations of two-way interactions as well as a three-way interaction between the three variables. Random effects include random intercepts and slopes relating mortality to ozone concentration for each operation managing the beehive (n = 17). We fitted the models in Python using the Lmer function of the Pymer479 package. We computed the predictions and the figures as well as Nakagawa and Schielzeth’s R2 for GLMM80 in R version 4.4.1 with the packages lme4 (version 1-35.4), ggplot2 (version 3.5.1), and performance (version 0.12.3).
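For readers who prefer an explicit formulation, the selected model can be written as follows. This is a sketch reconstructed from the verbal description above; the logit link and the bivariate normal distribution of the random effects are assumptions (the usual defaults for a binomial GLMM), and the symbols are introduced here for illustration only. O, V, and W denote standardized ozone concentration, NDVI, and wind speed for hive i in operation j.

```latex
\begin{aligned}
y_{ij} &\sim \mathrm{Bernoulli}(p_{ij}), \\
\operatorname{logit}(p_{ij}) &= \beta_0 + \beta_1 O_{ij} + \beta_2 V_{ij} + \beta_3 W_{ij}
  + \beta_4 O_{ij} V_{ij} + \beta_5 O_{ij} W_{ij} + \beta_6 V_{ij} W_{ij} \\
  &\quad + \beta_7 O_{ij} V_{ij} W_{ij} + u_{0j} + u_{1j} O_{ij}, \\
\begin{pmatrix} u_{0j} \\ u_{1j} \end{pmatrix}
  &\sim \mathcal{N}\!\left(\mathbf{0}, \boldsymbol{\Sigma}\right),
  \qquad j = 1, \dots, 17 .
\end{aligned}
```

Under this notation, the off-diagonal element of Σ corresponds to the correlation between operation-level intercepts and ozone slopes discussed in the Results.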

Machine learning model

To predict honey bee mortality and assess the importance of environmental variables, we fitted a Random Survival Forest (RSF) using Python’s RandomSurvivalForest model32. RSF extends Breiman’s81 random forest machine learning technique to time-to-event observations. This method offers a flexible alternative to traditional survival analysis by eliminating the need for strict parametric or proportional hazards assumptions82, making it well suited for datasets with right-censored observations such as honey beehives that survived the season. We applied a min-max standardization to ensure that all variables had similar scales before fitting the model. We constructed the RSF model by first performing a grid search across common tuning parameters, followed by a randomized search to refine the hyperparameters. We tested different numbers of trees, ranging from 100 to 1000 in increments of 100, and set the number of variables considered at each split to either the square root or base-2 logarithm of the total number of predictors. We set the candidate values for the maximum depth of the trees to 10, 30, and 60, while the minimum number of samples required at each split and at each leaf node were each specified as 10, 20, or 40. We configured the randomized search algorithm to perform three-fold cross-validation over five iterations to identify the optimal combination of hyperparameters.
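To give a sense of the size of this search, the candidate values listed above can be enumerated directly. The sketch below uses only the Python standard library. The dictionary keys follow the parameter names commonly used by scikit-learn-style forest estimators and are an assumption about how the search was configured, as is the random seed.

```python
import itertools
import random

# Candidate values as described in the text; key names are assumed.
search_space = {
    "n_estimators": list(range(100, 1001, 100)),  # 100, 200, ..., 1000 trees
    "max_features": ["sqrt", "log2"],
    "max_depth": [10, 30, 60],
    "min_samples_split": [10, 20, 40],
    "min_samples_leaf": [10, 20, 40],
}

# Full grid of hyperparameter combinations.
names = list(search_space)
all_combinations = [
    dict(zip(names, values))
    for values in itertools.product(*search_space.values())
]

# A randomized search evaluates only a few sampled combinations.
n_iter, n_folds = 5, 3
rng = random.Random(0)  # seed chosen arbitrarily for reproducibility
candidates = rng.sample(all_combinations, n_iter)
n_fits = n_iter * n_folds

print(f"{len(all_combinations)} combinations in the full grid")
print(f"{n_fits} model fits for {n_iter} sampled candidates x {n_folds} folds")
```

Sampling five candidates with three-fold cross-validation therefore requires 15 model fits, against 1620 for an exhaustive three-fold search over all 540 combinations.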

We trained the RSF model using 85% of the dataset and then evaluated its performance on the remaining 15%. We used the modified concordance index33 and the time-dependent ROC-AUC34,35,36 to evaluate the model’s performance. The modified C-index assesses the model’s ability to accurately rank two random individuals (i.e. hives) according to their observed survival times, and assumes that censoring is random and independent of the covariates. The modified concordance index is based on inverse probability of censoring weights and does not depend on the distribution of censoring times in the test data, making it more consistent and robust than Harrell’s concordance index. Its values range between 0 and 1, with 0.5 being a random model and 1 a perfect model. In contrast, the time-dependent ROC-AUC assesses how well the model distinguishes between hives that died and those that survived at specific time intervals. It allows us to assess, at a given time point t, how well a model can distinguish hives that will experience an event by time t (sensitivity) from those that will not (specificity). This approach provides insight into the model’s predictive performance over time.
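The ranking logic behind a concordance index can be illustrated with the simpler Harrell’s version, sketched below in plain Python. This is not the modified index used in our evaluation: the inverse probability of censoring weights are omitted, and the function name and toy data are hypothetical.

```python
def harrell_c_index(times, events, risk_scores):
    """Unweighted (Harrell's) concordance index for right-censored data.

    A pair is comparable when the hive with the shorter follow-up time
    has an observed death. It is concordant when that hive also has the
    higher predicted risk; ties in risk count as half.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored hives cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: follow-up time in days, death observed (True) or censored (False).
times = [5, 10, 15, 20]
events = [True, True, False, True]
good_risk = [0.9, 0.7, 0.2, 0.4]  # higher risk for hives that died earlier

print(harrell_c_index(times, events, good_risk))
```

In the weighted version, each comparable pair additionally receives a weight derived from the estimated censoring distribution, which is what removes the dependence on the censoring pattern of the test data.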

Lastly, we assessed the importance of variables in predicting the survival probability of honey bee colonies using scikit-learn’s permutation importance algorithm83. This method evaluates the decrease in the model’s performance when a particular variable’s values are randomly shuffled. A large drop in performance indicates that the variable is crucial for accurate predictions. In our study, we computed variable importance based on the decrease in the C-index when each feature was permuted. We used 15 permutations for each variable.
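The permutation procedure can be sketched in a few lines of standard-library Python. The version below is a simplified stand-in for scikit-learn’s implementation: the function name mirrors it only for readability, the scoring function is passed in by the caller, and the toy example uses classification accuracy instead of the C-index.

```python
import random
from statistics import mean, stdev


def permutation_importance(score_fn, X, y, n_repeats=15, seed=0):
    """Mean and standard deviation of the score drop per shuffled column.

    X is a list of rows (each a list of feature values); score_fn(X, y)
    returns a performance score where higher is better.
    """
    rng = random.Random(seed)
    baseline = score_fn(X, y)
    results = []
    for col in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in X]
            rng.shuffle(shuffled)
            # Rebuild the rows with only this column permuted.
            X_perm = [
                row[:col] + [value] + row[col + 1:]
                for row, value in zip(X, shuffled)
            ]
            drops.append(baseline - score_fn(X_perm, y))
        results.append((mean(drops), stdev(drops)))
    return results


# Toy "model": predicts death when the first feature exceeds 0.5 and
# ignores the second feature entirely.
def accuracy(X, y):
    predictions = [row[0] > 0.5 for row in X]
    return sum(p == t for p, t in zip(predictions, y)) / len(y)


X = [[0.9, 1.0], [0.8, 0.0], [0.7, 1.0], [0.6, 0.0],
     [0.4, 1.0], [0.3, 0.0], [0.2, 1.0], [0.1, 0.0]]
y = [True, True, True, True, False, False, False, False]

for name, (avg, sd) in zip(["feature_0", "feature_1"], permutation_importance(accuracy, X, y)):
    print(f"{name}: mean drop {avg:.3f} (sd {sd:.3f})")
```

Shuffling the feature the toy model relies on lowers its score, whereas shuffling the ignored feature leaves the score unchanged, which is the contrast summarized for the RSF predictors in Fig. 2.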

Map generation

Using the Airpyllution package and the OpenWeather API, we extracted hourly air pollutant data for a grid of points equally spaced at 1-degree intervals, from April 1st to September 1st 2024. From the extracted ozone (O3), fine particulate matter (PM2.5), and nitrogen dioxide (NO2) data, we applied the formula by Stieb et al. (2008) to compute the average AQHI over the period for each point on the grid. We then used the predicted scores from the RSF model to extrapolate new predicted scores for 2024. To do so, we fitted an elastic net regression model of the RSF risk scores (y) as a function of the 2024 AQHI values (x) using the ElasticNet function from scikit-learn. We then applied a min-max transformation of the extrapolated risk scores, and performed an inverse-distance weighted interpolation to create a continuous spatial map of newly predicted risk scores. The final map was generated in ArcGIS Pro version 3.3.
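The last two steps, rescaling and interpolation, can be sketched as follows. The map itself was produced in ArcGIS Pro, so this standalone Python version is only an illustration of the underlying arithmetic. The distance power of 2, the use of planar distances on the coordinate grid, and the function names are assumptions that are not specified in the text.

```python
import math


def min_max_scale(values):
    """Rescale values linearly to the [0, 1] interval."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def idw(points, values, query, power=2.0):
    """Inverse-distance weighted estimate at a query location.

    Each known point contributes with weight 1 / distance**power
    (power=2 is an assumed default). Distances are planar.
    """
    numerator = 0.0
    denominator = 0.0
    for (x, y), value in zip(points, values):
        distance = math.hypot(query[0] - x, query[1] - y)
        if distance == 0.0:
            return value  # query coincides with a grid point
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator


# Four hypothetical grid points at 1-degree spacing with raw risk scores.
grid = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
raw_scores = [2.0, 4.0, 6.0, 10.0]

scaled = min_max_scale(raw_scores)
print(scaled)
print(idw(grid, scaled, (0.5, 0.5)))
```

At the center of the cell all four points are equidistant, so the estimate reduces to their simple average; closer to any one grid point, that point's score dominates.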

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.