Main

Global forests have been documented to represent a consistent carbon sink1 that is essential to mitigating climate warming2. The diversity of tree species in forests has been found to play a pivotal role in maintaining the functionality of forest ecosystems as a vital carbon sink on the basis of plot observations3, further sustaining ecosystem stability4,5. The global forest extent has experienced gains and losses over the past three decades6, owing primarily to forestry, agriculture and wildfires7, while climate change and rising CO2 have caused continental-specific trade-offs in forest dynamics (for example, growth or mortality)8 and have been reported to cause shifts in tree species at the plot level9,10. Yet, the response of tree species diversity to changes in forests over the past few decades remains unknown at a large spatial scale. Furthermore, the responses of forest diversity to ecosystem productivity and stability have been documented on the basis of plot measurements4,11. These relationships established at the plot level may vary depending on regional climate and soil conditions12,13, and therefore cannot easily be used to generalize across a range of diverse environmental conditions. The lack of large-scale, spatially continuous mapping of tree species diversity, and the temporal dynamics thereof, impedes further exploration of how tree species diversity contributes to various aspects of forest ecosystem functioning.

Existing studies have attempted to estimate global forest tree species diversity14 and vascular plant diversity on the basis of statistical relationships between species diversity metrics and environmental variables15, but these diversity maps represent static maps provided in coarse spatial resolution defined by the environmental input data. Satellite-based tree species diversity estimates have been conducted on the basis of spectral heterogeneity16,17,18,19 and have produced reasonable predictions of tree species diversity, thereby offering improved spatial resolution and adding a temporal dimension to the mapping. Long-term satellite-based species diversity estimation would further make it possible to identify temporal changes in diversity across large spatial scales, which currently represents a major unknown. The combined impacts of climate change and human appropriation have profoundly transformed global forest ecosystems in recent decades20, highlighting the urgent need for continuous, time-varying observations.

Boreal forests represent 30% of the Earth’s total forested area21, enduring harsh winters with freezing temperatures for approximately six to eight months and snow cover persisting for several months10,22. Boreal forests are further characterized by a much lower diversity of tree species than tropical forests21,23, and even minor positive or negative changes in species diversity may have substantial effects on the boreal ecosystems’ carbon uptake and stability24. For example, Larix spp. (the dominant species across Eurasian boreal forests) are approaching a thermal tipping point at their southern margin, which is expected to cause an abrupt ecological collapse of ecosystem functioning (for example, the capacity of carbon uptake) of Larix spp. under continued climate warming25. These unique characteristics of boreal forest ecosystems make them particularly susceptible to climate change, and boreal forests have therefore been identified as a critical ‘tipping element’ of the Earth’s climate system26. Profound climate warming and increasing frequency of fire activity have been observed in boreal ecosystems27,28, and the associated droughts have been documented to trigger widespread increases in tree mortality and decreases in the biomass carbon sink29,30,31. Yet, the extent to which boreal forest tree species diversity has undergone changes and the potential impact on ecosystem functioning remain largely unexplored to date. This knowledge would make a notable contribution to the assessment of carbon sequestration sustainability in boreal forests32 and boreal forest ecosystem transitions9,33.

In this study, we aim to quantify changes in boreal tree species diversity with climate change over recent decades and the impacts of species diversity change on the boreal ecosystem carbon cycle. We developed a framework to produce spatially continuous representations of forest tree species diversity quantified by a Shannon diversity index (H′)34,35 accounting for both species richness and evenness across the boreal forest ecosystems (Supplementary Fig. 1). This is achieved by deploying machine learning based on a combination of field and Landsat satellite observations for the years 2000, 2010 and 2020. We used 5,312 field observations including observations of 190,516 trees and a deep learning approach based on the InceptionTime architecture to train a predictive model. Subsequently, 55,560 Landsat scenes were subjected to the model to upscale tree species diversity across boreal forest ecosystems. Next, we analysed the spatiotemporal dynamics of boreal tree species diversity in relation to a comprehensive set of environmental factors potentially impacting diversity (including climate, population density, fire activity and soil conditions). Finally, we analysed the spatiotemporal changes in boreal tree species diversity associated with the productivity and temporal stability of boreal forest biomass. This analysis is particularly timely in the boreal areas given the observed distinct climate warming and increase in wildfire activity in this region, and it further contributes to generalizing our current understanding of the associations between tree diversity and ecosystem functions.

Results

Spatiotemporal changes in tree species diversity

The boreal biome has experienced a dramatic increase in air temperature over the past few decades, compared with other regions across the globe, with continued warming to 2100 being projected in future climate scenarios (Fig. 1a and Supplementary Fig. 2). This warming is expected to profoundly alter the structure and functioning of boreal forest ecosystems. We used Landsat satellite data, trained by plot-based measurements of tree species and a deep learning approach based on the InceptionTime architecture (Methods, Extended Data Fig. 1 and Supplementary Figs. 3 and 4), to predict tree species diversity as described by H′34,35, with a high accuracy (coefficient of determination (R2), 0.77; root mean square error, 0.12) for the entire boreal forest area (Extended Data Fig. 2). We observed a 12% increase in boreal tree species diversity by H′ values from 2000 to 2020, with average H′ values increasing from 0.41 ± 0.14 (mean ± standard deviation) to 0.46 ± 0.16 (Fig. 1b), representing a 5% ± 2% increase during 2000–2010 and a 7% ± 3% increase during 2010–2020.

Fig. 1: Temporal changes in tree species diversity in relation to climate warming across the boreal forest ecosystems.

a, Observed and simulated changes in air temperature during the growing season (May–October, °C) are averaged for boreal forests during 1960–2020 and 2015–2100, on the basis of CRU TS v.4.07 data86 and the outputs of the Coupled Model Intercomparison Project Phase 6 (ref. 87). The shaded areas represent 95% confidence intervals. b, Tree species diversity, represented by H′, estimated using Landsat satellite data in 2000, 2010 and 2020. The boxes and whiskers show the 5th, 25th, median, 75th and 95th percentiles of the H′ values, while the black dots show the mean of the H′ values for each year (n = 128,715,043).

Large spatial variations in tree species diversity were observed across boreal forest ecosystems (Fig. 2a–c; zoomed-in figures are provided in Extended Data Fig. 3 for improved visual interpretation). In Eurasia, relatively high tree species diversity was observed over the Okhotsk–Manchurian taiga and Scandinavian–Russian taiga, located in northern Scandinavia as well as the northwestern and central regions of Russia, with average H′ values of 0.56 ± 0.16 and 0.55 ± 0.18, respectively, whereas a lower diversity (0.28 ± 0.07) was observed in the Northeast Siberian taiga located at the northern edge of Eurasia (Fig. 2a–c and Supplementary Table 1). In North America, the eastern forest–boreal transition region, characterized by boreal and temperate tree species, had the highest diversity (0.77 ± 0.20), followed by Central Rockies forests (0.49 ± 0.16) and Mid-Continental Canadian forests (0.44 ± 0.12). The Alaska–Yukon lowland taiga and Northern Canadian Shield taiga, dominated by spruce species, had the lowest diversity, with average H′ values of 0.33 ± 0.09 and 0.31 ± 0.08, respectively.

Fig. 2: Spatiotemporal changes in boreal tree species diversity.

a–c, Spatial patterns of tree species diversity by H′ values across boreal forests in 2000 (a), 2010 (b) and 2020 (c). d, Changes in H′ between 2000 and 2020 superimposed on mean seasonal air temperature. H′ is aggregated at a spatial resolution of 1.5° × 1.5° and is shown by dots, with larger dots indicating higher diversity gains (blue) or losses (purple). Basemap in a–d from Natural Earth (https://www.naturalearthdata.com/). e, Frequency plot of boreal tree species diversity in 2000, 2010 and 2020; the vertical lines denote the mean H′ values of 0.41 in 2000, 0.43 in 2010 and 0.46 in 2020. f, Frequency plot of gains and losses in tree species diversity during 2000–2020; the vertical dashed line represents no change in the H′ value.

The extent of tree species diversity gains (calculated from the differences in H′ values between 2020 and 2000; Methods) accounted for 53% of boreal forest areas (approximately 8,165,000 km2), and losses accounted for 17% (approximately 2,684,000 km2), while areas of no distinct change (determined by changes in H′ values ranging from −0.01 to 0.01) were observed for 20% of all boreal forest areas (Supplementary Table 2). Diversity gains in boreal forest were primarily observed in the Scandinavian–Russian taiga (Fig. 2d and Supplementary Table 3), with H′ values increasing by 35% ± 16% (0.18 ± 0.08); the Okhotsk–Manchurian taiga (33% ± 17%, 0.17 ± 0.09); and the eastern forest–boreal transition region (27% ± 11%, 0.19 ± 0.08). Diversity losses occurred mainly in the Kamchatka Mountain forest and the West Siberian taiga, with H′ values decreasing by 25% ± 16% (−0.13 ± 0.08) and 26% ± 10% (−0.13 ± 0.05), respectively. Areas of gains in diversity were mainly observed in warmer regions, while losses tended to occur in colder regions (Fig. 2d). Moreover, a clear reduction in the proportions of relatively low diversity values (approximately H′ < 0.4) was observed, accompanied by an increase in the proportions of relatively high diversity values (H′ > 0.4) at the pixel level for the three epochs from 2000 to 2020 (Fig. 2e); also, a higher frequency of pixels of diversity gains than losses was observed (a gain-to-loss ratio of 3:1) (Fig. 2f).
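The gain, loss and no-change classes described above can be illustrated with a minimal sketch. It assumes only the ±0.01 threshold on the H′ difference stated in the text; the function names and the example pixel values are hypothetical and are not taken from the study's processing chain.

```python
def classify_change(h_start, h_end, tol=0.01):
    """Label one pixel from its H' values at the start and end of the period.

    Differences within [-tol, tol] count as 'no change', following the
    -0.01 to 0.01 range quoted in the text.
    """
    delta = h_end - h_start
    if delta > tol:
        return "gain"
    if delta < -tol:
        return "loss"
    return "no change"


def class_shares(pairs):
    """Return the fraction of pixels falling in each class."""
    if not pairs:
        raise ValueError("at least one (start, end) pair is required")
    counts = {"gain": 0, "loss": 0, "no change": 0}
    for h_start, h_end in pairs:
        counts[classify_change(h_start, h_end)] += 1
    return {label: n / len(pairs) for label, n in counts.items()}


# Hypothetical H' values for four pixels (start, end); illustration only.
example_pixels = [(0.40, 0.50), (0.30, 0.305), (0.50, 0.40), (0.45, 0.60)]
shares = class_shares(example_pixels)
```

Multiplying each share by the total forest area would give class extents comparable in form to those reported in Supplementary Table 2.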

We further assessed the relative contributions of changes in species richness and evenness to diversity dynamics using a multiple linear regression model, on the basis of forest inventory data with repeated measurements (n = 648; most of the plots are located in North America) (Extended Data Fig. 4). The results show that in the eastern forest–boreal transition region and Canadian Shield forests, species richness and evenness contribute almost equally to the changes in tree species diversity (H′ values), while the contribution of evenness (β = 0.47; β represents the sensitivity of change in diversity against changes in the explanatory variable) significantly exceeds that of richness (β = 0.21) in the Northern Canadian Shield taiga and the Central Rockies forests (0.56 versus 0.28).

Determinants of tree species diversity changes

We quantified the spatial variability in boreal tree species diversity for the three epochs driven by a range of potential environmental variables accounting for geographic variations in climate, soil properties and disturbances (fire activity and human population density) (Supplementary Fig. 5) using a boosted regression tree (BRT) algorithm (Methods). We show that BRT can explain, on average, 62% of the spatial variability in boreal tree species diversity (Fig. 3a). We found mean seasonal temperature to be the most important predictor (53%), followed by mean seasonal precipitation (21%), while each of the remaining individual variables contributed less than 10% to the variability in diversity, including elevation (9%), human population density (6%), fire activity (4%), cation exchange capacity (3%), topsoil organic carbon (2%) and topsoil sand fraction (2%). When investigating the diversity response to an individual variable independent of other variables (partial dependency plots; Methods), we found that temperature shows a strong positive impact on diversity when mean seasonal temperature is below 12 °C (Supplementary Fig. 6), while the impact appears to reach a plateau for values above this threshold (Fig. 3b). Precipitation shows a clear positive effect with mean seasonal precipitation above 100 mm that tends to saturate after reaching 400 mm (Fig. 3c). Elevation generally shows a weak positive impact on diversity until approximately 700 m, beyond which elevation shows a weak negative impact (Fig. 3d). Human population density shows a weak positive effect on diversity with population density increasing to 1.5 people per km2, from where population density exerts a slightly negative effect on diversity (Fig. 3e). Similar patterns of the partial responses were observed for each individual period (Extended Data Fig. 5 and Supplementary Figs. 7–9).

Fig. 3: Environmental determinants of observed spatial variability in boreal tree species diversity.

a, The relative importance of predictor variables controlling the spatial variability of boreal tree species diversity, determined by a BRT model. The model was run ten times to avoid stochastic errors. The coloured circles represent the relative importance of each predictor variable for each run, the bars are the mean relative importance and the error bars are one standard deviation of the mean. TMP, mean seasonal temperature; PR, mean seasonal precipitation; POP, human population density; OC, topsoil organic carbon; SAND, topsoil sand fraction; CEC, cation exchange capacity; FIRE, fire activity frequency; DEM, digital elevation model. b–e, Partial dependency plots of the top four variables explaining variability in boreal tree species diversity for 2020. The blue lines are smoothed representations of the responses, with fitted values (model predictions based on the original data), and the shaded areas represent 95% confidence intervals. The distributions of the predictor observations are indicated by the density of the vertical grey lines above the x axis.

Mean seasonal temperature and precipitation exerted strong positive effects on the spatial variation of boreal tree species diversity, whereas temporal changes in diversity were observed to show a negative response to increasing temperature over the past 20 years (ρ = −0.51, P < 0.01) (Fig. 4a,d). While lower rates of increasing temperature are associated with increasing trends in diversity, this relationship gradually changes towards higher rates of increasing temperature (exceeding 0.065 °C yr−1) associated with decreasing trends in diversity. These areas of decreasing trends are primarily observed to be located in northeastern Siberia (Fig. 4a). Similarly, the trends in diversity show a negative response to increasing fire activity frequency (Fig. 4b,e). The diversity trend generally shows a positive response to precipitation in cases of minor positive and negative precipitation trends, whereas for more extreme precipitation trends, the positive diversity trends approach zero (Fig. 4c,f). Furthermore, a negative relationship between diversity and stand age was observed across boreal forests (Extended Data Fig. 6), with higher gains in diversity in young forests than in mature forests. Accounting for these variables together, our analysis reveals that temperature trends and stand age exert a greater relative influence on regulating changes in diversity than precipitation trends and the frequency of fire activity (Supplementary Table 4). When the effect of stand age is removed, temperature trends play a predominant role in controlling changes in diversity (Supplementary Table 4).
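The ρ reported above is a Spearman rank correlation (see the Fig. 4 legend). The following is a minimal, standard-library sketch of how such a coefficient is computed: rank both variables, using average ranks for ties, and take the Pearson correlation of the ranks. The variable names and example values are hypothetical, and the sketch does not reproduce the significance test used in the study.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant input")
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical grid-cell trends (warming rate, H' trend); illustration only.
temp_trend = [0.02, 0.03, 0.05, 0.07, 0.09]
diversity_trend = [0.004, 0.003, 0.002, -0.001, -0.002]
rho = spearman_rho(temp_trend, diversity_trend)
```

Because it works on ranks, the coefficient captures monotonic associations without assuming linearity, which suits the curved responses shown in Fig. 4d–f.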

Fig. 4: Temporal changes in boreal tree species diversity in relation to changes in environmental factors.

a–c, Spatial distribution of trends in tree species diversity associated with trends in TMP (a), FIRE (b) and PR (c). H′ is aggregated at a spatial resolution of 1.5° × 1.5° and is shown by dots, with larger dots indicating higher tree species diversity gains (blue) or losses (purple). Basemap in a–c from Natural Earth (https://www.naturalearthdata.com/). d, Response of tree species diversity trend to TMP trend. e, Response of tree species diversity trend to fire activity frequency trend. f, Response of tree species diversity trend to PR trend. The colours of the dots indicate the locations of changes in TMP, FIRE and PR corresponding to the panels above. The sizes of the dots indicate one standard deviation of tree species diversity trends at a spatial resolution of 1.5° × 1.5°, with larger dots indicating higher variations of species diversity. The solid lines are fitted by a linear or quadratic model, and the shaded ribbons indicate the 95% confidence intervals of the fitted lines. The rugs on the x axis show the distribution of dots, while the grey contour lines show the 50%, 75% and 95% quantiles of the occurrence probability of the dots. ρ indicates the Spearman correlation coefficient between detected variables based on the raw data. The two-sided Student’s t-test is used for statistical testing, and P values are indicated.

Association with carbon fluxes, stocks and stability

We quantified the associations between boreal tree species diversity (and spatiotemporal changes therein) and six independent indicators (and spatiotemporal changes therein) characterizing forest carbon, including carbon fluxes (net primary production (NPP), kernel normalized difference vegetation index (kNDVI) and vegetation optical depth climate archive Ku-band (VOD Ku-band)), carbon stocks (aboveground biomass (AGB) derived from L-band passive microwave data (AGB_1) and from LiDAR and optical satellite data (AGB_2)) and the temporal stability of boreal forest biomass (Methods). Our results, based on a multiple linear regression, show significantly (P < 0.05) positive associations between diversity and forest carbon fluxes and stocks across spatial and temporal scales (Fig. 5 and Supplementary Figs. 10–12).

Fig. 5: Spatiotemporal changes in boreal tree species diversity associated with the forest carbon cycle.

The colour scale indicates the strength of the relationship (standardized coefficients β) between each predictor (bottom) and each response variable (left) (NPP, kNDVI, AGB and temporal stability of AGB) in a multiple linear regression model. The black dots indicate significant impacts at a 95% confidence level, and the hashed areas indicate no statistical relationships. Diversity, climate and disturbances have mean values and temporal trends, with Δ indicating temporal trends of variables from 2000 to 2020. MODIS NPP, NPP from the MODIS MOD17A3HGF v.6.1 product; AGB_1, L-VOD-based AGB from the Soil Moisture and Ocean Salinity (SMOS) satellite; AGB_2, AGB from ref. 64; Stability, temporal stability of boreal forest biomass (calculated from AGB_2 from 2000 to 2019); AGE, stand age.

The spatial variability of all carbon indicators was significantly associated with tree species diversity and stand age, as well as with climate, disturbances, soil properties and topography; climate variables had the highest impact on forest carbon, followed by diversity, disturbances and stand age (Fig. 5 and Supplementary Fig. 10). In the spatial domain, diversity generally showed a positive relationship with forest carbon stock and fluxes, while stand age showed a negative relationship with carbon fluxes and a positive relationship with carbon stock (Fig. 5). Trends in NPP, kNDVI and AGB_2 were significantly positively correlated with diversity trends, while stand age showed significant negative relationships with trends for most carbon indicators (Fig. 5 and Supplementary Fig. 11). Temperature and precipitation changes showed significant positive relationships with trends in NPP and AGB_1, whereas temperature and precipitation had varying relationships with other carbon indicators (Fig. 5 and Supplementary Fig. 11). Disturbances were generally negatively correlated with most carbon indicators, with the impact of fire activity being larger than that of human population density. Elevation showed a positive correlation with most carbon indicator trends, whereas soil characteristics were both positively and negatively correlated (Fig. 5 and Supplementary Fig. 12). The temporal stability of boreal forest biomass was also found to be significantly associated with diversity at both spatial and temporal scales, while being co-regulated by climate, fire activity and topography.

Discussion

We developed a data-driven tree species diversity assessment framework that utilized remote sensing data in combination with in situ observations to generate spatially continuous boreal tree species diversity maps with a high level of spatial detail (30 m × 30 m), but also with a temporal dimension covering three different epochs around 2000, 2010 and 2020. This approach thus provides distinct advantages over other global mapping methods for diversity assessment15,23 and offers an unparalleled evaluation of the nature of spatiotemporal changes in boreal tree species diversity in response to global environmental changes at different scales of time and space. The success of the satellite-remote-sensing-based estimates of tree species diversity was mainly attributed to spectral heterogeneity (for example, plant chemical properties of the tissue related to photosynthetic pigments and water, branching structure, leaf size and colour, leaf clumping and leaf angle distribution) being sufficiently characterized by information from the visible, near-infrared and short-wave infrared regions16,36, while the InceptionTime deep learning approach used here captured well the complex relationships under the varying phases. Uncertainties in the prediction of diversity may still exist owing to data quality, environmental conditions (for example, lighting conditions and shadows) and the reflectance influenced by understory vegetation (that is, shrubs and grasses) in sparse forests18,37. However, our approach, including the use of segmentation and multiple vegetation indices in the model building, has largely reduced the direct effects of changes in greenness and tree cover on changes in H′ (indicated by the smaller R2; Extended Data Fig. 7). Future work could nonetheless consider techniques such as spectral unmixing, radiative transfer model inversion and data fusion to further reduce these uncertainties38.

Not surprisingly, the spatial patterns of forest tree species diversity show signs of latitudinal dependency, with decreasing species diversity towards the tundra biome, largely related to the temperature gradient from higher to lower temperatures23. However, supported by the high spatial resolution of the satellite data, the H′ maps (accounting for tree species richness and evenness) present distinct spatial variations in diversity across areas at the same latitude, unlike previous studies of tree species richness generally displaying monotonic and homogeneous changes in diversity across boreal forest ecosystems15,23. This satellite-based spatial pattern of diversity may thus better reflect natural diversity changes in boreal forests.

We documented an overall increase in boreal tree species diversity during the period of analysis linked to climate warming, which is consistent with predictions and observations of other studies9,10,39. Warming of the climate fosters conditions conducive to the expansion of boreal forests and to the proliferation of species, achieved through mechanisms such as earlier snowmelt providing more time for seed germination40, sapling growth41, altered disturbance regimes42 and the augmentation of soil nutrient availability9,10. Particularly, moderate disturbances could catalyse tree community responses to climate change42, potentially shifting forest composition (both species richness and evenness) towards warm-adapted species (Extended Data Fig. 4), for example, a transition from coniferous species to mixed species (temperate and boreal), as documented in southern boreal forest areas43,44. However, we observe that the diversity increase responds negatively to increasing temperatures, suggesting that warming can promote boreal tree diversity only to a certain extent. A rapid increase in temperature is possibly detrimental to boreal tree diversity, because these species cannot adapt to such abrupt changes in temperature8, whereas pioneer species may adapt to the changes more quickly, thereby encroaching on the habitat of other species and potentially limiting their space and resources43,45. Moreover, extreme warming is likely to surpass the thermal tolerance of trees and trigger wildfires, resulting in tree mortality and thereby a reduction in tree diversity46,47. Such effects could particularly be occurring in northern regions with lower tree species diversity and scarce environmental resources (for example, low soil nutrients and seed availability)12,40, hindering the recovery of tree species. Here we observe a warming rate exceeding 0.065 °C yr−1 to have a negative impact on temporal changes in diversity. Such negative effects can also be enhanced by the increased frequency of fire occurrences induced by rising temperatures48,49, as well as by the co-occurrence with warming droughts50 and an increasing vapour pressure deficit exerting a higher demand for water availability51.

Increasing boreal tree species diversity is further found to be associated with high carbon stocks and the stability of boreal ecosystems, and thus there is no potential conflict of interest between preserving boreal tree species diversity and emphasising the role of boreal forest ecosystems as an essential global source of carbon sequestration3. Additionally, changes in tree species diversity and the stability of forest ecosystems can also be largely regulated by tree stand age. Our results suggest that young forests have higher increases in diversity and carbon cycling dynamics than mature forests, which is probably because young forests are undergoing rapid compositional/structural transitions. Finally, we note that some uncertainty may exist regarding the observed relationships introduced by the different data sources on carbon fluxes, carbon stocks and diversity, despite the efforts made (for example, the use of different data sources) to reduce such potential bias or uncertainty.

Our results indicate that increasing temperatures across the boreal zone overall corresponded to an increase in tree species diversity over the past two decades, but these increasing trends of diversity were reversed in areas of extreme warming, providing new insights into the long-term temporal changes in diversity in response to climate change across boreal forest ecosystems. However, here we did not account for the response of possible changes in dominant tree species due to climate warming, which may contribute to the change in productivity and stability of boreal ecosystems52,53. Further studies are also needed on the responses of forest structural diversity54 and functional trait diversity55 to climate change and their associated impacts on ecosystem functions. Incorporating these variables would enhance our understanding of the underlying mechanisms governing the associations between forest tree diversity, productivity and the temporal stability of ecosystems. Additionally, increasing diversity is probably associated with the emergence of a biome shift52,56, with trees and shrubs expanding towards tundra regions56,57, reshaping the structure and functioning of boreal ecosystems. Taken together, such an expanded knowledge base would provide a stronger foundation for promoting solutions for sustainable ecosystem functioning and mitigating the risk of destabilizing the terrestrial carbon sink under climate warming.

Methods

National forest inventory data

In this study, we collected boreal tree species diversity datasets from six countries (Supplementary Table 5). The datasets include 5,312 field observations of a total of 190,516 trees divided into 254 tree species. Most of the datasets come from national forest inventory databases, including Canada and China, while the remaining data come from publicly available datasets covering the United States (Alaska)58, northeastern Siberia59 and Northern Europe3,60. H′ was calculated on the basis of the number and species of trees in each sample plot35,61. Considering differences in plot size across datasets, we normalized forest tree species diversity to a common basis of 900 m2 in area and 10 cm in threshold diameter at breast height using plot area and diameter at breast height as predictor variables, following the approach adopted by ref. 23.
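As an illustration of the plot-level calculation, the sketch below computes a Shannon diversity index from a list of tree records, one species label per tree. It assumes the natural logarithm, since the log base is not stated here, and it adds Pielou's evenness (H′ divided by the log of richness) only as one common way to separate evenness from richness; the study's own evenness measure may differ. The plot-size and diameter normalization described above is not reproduced, and the example plot is hypothetical.

```python
import math
from collections import Counter


def shannon_index(species):
    """Shannon diversity H' = -sum(p_i * ln(p_i)) over species proportions.

    `species` holds one label per tree in the plot. The natural log is an
    assumption made for this sketch.
    """
    counts = Counter(species)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("the plot contains no trees")
    proportions = [n / total for n in counts.values()]
    return sum(-p * math.log(p) for p in proportions)


def pielou_evenness(species):
    """Pielou's evenness J = H' / ln(S), with S the species richness.

    Returns 0.0 for a single-species plot, where J is otherwise undefined.
    """
    richness = len(set(species))
    if richness < 2:
        return 0.0
    return shannon_index(species) / math.log(richness)


# Hypothetical plot: eight spruce and two birch trees.
plot = ["spruce"] * 8 + ["birch"] * 2
h_prime = shannon_index(plot)
evenness = pielou_evenness(plot)
```

A plot dominated by one species yields a low H′ even when richness is unchanged, which is why changes in evenness alone can shift the index.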

Landsat data

The Landsat surface reflectance data, with a spatial resolution of 30 m × 30 m, from the US Geological Survey Earth Resources Observation and Science archive were used to upscale boreal tree species diversity. We obtained three sets of Landsat data covering the growing season (May to October) for three time periods: 1999–2001, 2009–2011 and 2019–2021. We applied a three-year time interval for data compositions to increase the number of clear-sky satellite observations of the different time epochs. These data were obtained from Landsat-5 Thematic Mapper, Landsat-7 Enhanced Thematic Mapper Plus and Landsat-8 Operational Land Imager, and they were synthesized into monthly composites for the best-quality collection (that is, minimal cloud, fog and snow cover) of each period using published cross-calibration coefficients for surface reflectance (Supplementary Fig. 3).

Carbon indicator data

Multiple and independent long-term datasets related to carbon flux and stock were obtained to assess the association of tree species diversity with boreal forest carbon indicators. Two datasets were used to calculate the mean and changes of boreal forest carbon fluxes: the latest optimized annual NPP from the MODIS MOD17A3HGF v.6.1 product and the kNDVI introduced by ref. 62. The MOD17A3HGF is generated on the basis of the radiation use efficiency model that takes photosynthetically active radiation, leaf area index, climate factor and biome parameter as input. The kNDVI has high correlations with plot-level measurements of primary productivity and satellite retrievals of solar-induced fluorescence, and has thus been proposed as a robust proxy for terrestrial carbon sink dynamics51. We calculated the kNDVI for boreal forests from 2000 to 2020 using MODIS reflectance bands, on the basis of a method proposed by ref. 62 (equation (1)). The formula is as follows:

$${\rm{kNDVI}}=\tanh \left({\left(\frac{{\rm{NIR}}-{\rm{red}}}{2{\rm{\sigma }}}\right)}^{2}\right)$$
(1)

where σ represents the sensitivity of the index to sparsely/densely vegetated regions; in this study, σ = 0.5 (NIR + red).
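As an illustration of equation (1), the following minimal sketch computes kNDVI for a single pixel. The function names and the handling of zero reflectance are assumptions made for this example and are not taken from the processing chain used in the study. With σ = 0.5 (NIR + red), the denominator 2σ equals NIR + red, so the index reduces to tanh(NDVI²).

```python
import math


def ndvi(nir: float, red: float) -> float:
    """Normalized difference vegetation index for one pixel."""
    return (nir - red) / (nir + red)


def kndvi(nir: float, red: float) -> float:
    """kNDVI from equation (1) with sigma = 0.5 * (NIR + red)."""
    sigma = 0.5 * (nir + red)
    if sigma == 0:
        # Assumption for this sketch: no signal in either band gives 0.
        return 0.0
    return math.tanh(((nir - red) / (2.0 * sigma)) ** 2)


# With this choice of sigma, kNDVI equals tanh(NDVI squared).
example = kndvi(0.4, 0.1)
same_value = math.tanh(ndvi(0.4, 0.1) ** 2)
```
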

In addition to carbon flux indicators, three products measuring carbon stocks were obtained: VOD Ku-band from the Vegetation Optical Depth Climate Archive63; AGB_1, driven by the L-band microwave radiometer of SMOS missions; and AGB_2, from integrated ground and airborne measurements and MODIS and PALSAR observations by ref. 64. The VOD Ku-band data (period 1987–2017) were generated with combinations of multiple sensors (SSM/I, TMI, AMSR-E, WindSat and AMSR2) using the Land Parameter Retrieval Model and are closely related to the density, biomass and water content of vegetation63. AGB_1 was derived from the SMOS L-VOD (vegetation optical depth of L-band microwave missions) ascending product in the IC v.1.05 (ref. 65). SMOS L-VOD was converted to carbon density using the previously published biomass map66 as a reference by a linear regression with mean L-VOD. Xu et al.64 mapped live biomass carbon stocks (used in this study as AGB_2) over all woody vegetation globally from 2000 to 2019 by using a large number of ground inventory plots, in combination with LiDAR data (ICESat) and optical and microwave satellite data (MODIS and PALSAR).

Finally, using time series of AGB_2 data, we quantified the temporal stability of boreal forest biomass for the three epochs as the ratio of mean AGB to its temporal standard deviation over a five-year period (2000–2004, 2008–2012 and 2014–2019), as similarly done in several other studies67,68.
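The stability metric described above can be sketched for a single pixel as follows. The function name is hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, because the text does not state which estimator was used.

```python
import statistics


def temporal_stability(agb_series):
    """Ratio of mean AGB to its temporal standard deviation for one epoch."""
    mean_agb = statistics.fmean(agb_series)
    # Assumption: sample standard deviation (n - 1 denominator).
    sd_agb = statistics.stdev(agb_series)
    if sd_agb == 0:
        # A perfectly constant series has unbounded stability.
        return float("inf")
    return mean_agb / sd_agb


# Illustrative annual AGB values for one pixel within one epoch (made up).
stability = temporal_stability([10.0, 12.0, 11.0, 13.0, 14.0])
```

Higher values indicate biomass that varies little relative to its mean over the epoch.
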

Environmental data

We collected a set of environmental variables to explore the underlying mechanism of the variations of boreal tree species diversity. These variables were grouped into four categories: climate (that is, precipitation and temperature), soil properties (that is, topsoil sand fraction, topsoil organic carbon and cation exchange capacity), disturbances (that is, fire activity and population density) and topography (that is, elevation) (Supplementary Table 6). Incoming solar radiation can also strongly impact diversity but was excluded due to its high collinearity with temperature for the boreal biome. Seasonal precipitation and temperature from 2000 to 2020 obtained from the ECMWF Reanalysis v.5 were aggregated for the vegetation growing seasons from May to October. Topsoil sand fraction, organic carbon and cation exchange capacity of the clay fraction were obtained from the Regridded Harmonized World Soil Database v.1.2. We quantified the fire frequency using the monthly MODIS burned area product (MCD64A1) by summing the number of fire occurrences for each 500 m pixel from 2001 to 2020. The global human population density, provided in 30-arcsecond (approximately 1 km) grid cells, was used here as an indicator of human disturbance impact on forest resources. These data were derived from the Gridded Population of the World Version 4 Revision 11, which holds the estimates of population density for 2000, 2010 and 2020. The global stand age map was generated from ref. 69 using forest inventories and biomass and climate data, which were divided into several age classes (0–150+ with a decadal interval) at a spatial resolution of 1 km. A digital elevation model with 30 m spatial resolution was acquired from the ASTER Global Elevation Model. These variables were resampled to a 500 m × 500 m spatial resolution using the bilinear interpolation method.

Shannon diversity index

In this study, we applied H′ (equation (2))34,35,61, which represents alpha diversity, to characterize tree species diversity within each defined spatial unit in boreal forests. H′ considers both tree species richness and evenness to provide a comprehensive measure of alpha diversity:

$${H}^{{\prime} }=-\mathop{\sum }\limits_{{i}=1}^{S}{p}_{i}\ln{{p}_{i}}$$
(2)

where S is the total number of tree species in a plot, and p_i is the proportional abundance of species i relative to the total abundance of all S species in the plot.
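Equation (2) can be illustrated with a short sketch that takes one species label per tree in a plot. The function name and the input format are assumptions for this example; the inventory datasets may store abundances differently.

```python
import math
from collections import Counter


def shannon_index(species_labels):
    """Shannon diversity H' from a list with one species label per tree."""
    counts = Counter(species_labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    proportions = [n / total for n in counts.values()]
    # H' = -sum(p_i * ln(p_i)) over the species present in the plot.
    return sum(-p * math.log(p) for p in proportions)


# A plot with two equally abundant species has H' = ln(2).
h_even = shannon_index(["Picea", "Picea", "Betula", "Betula"])
# A plot dominated by one species has a lower H' despite equal richness.
h_uneven = shannon_index(["Picea", "Picea", "Picea", "Betula"])
```

The second plot has the same richness as the first but lower evenness, which is why H′ decreases.
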

Mapping boreal tree species diversity

Satellite-remote-sensing-based diversity estimates are based on the spectral variability hypothesis18,19, which relates the spectral heterogeneity determined by plant biochemical and morphological differences (including photosynthetic pigments, branching structure, leaf clumping and leaf angle distribution) to environmental heterogeneity. A higher spectral heterogeneity is thus expected to be associated with higher environmental heterogeneity that sustains more species, providing a proxy for species diversity36,70. To reduce the potential impact of other land-cover types on spectral heterogeneity, we masked out all non-forested areas where shrubs and herbaceous vegetation dominate (tree cover less than 30%) across the boreal forests.

Accordingly, we developed a workflow for mapping boreal tree species diversity using satellite remote sensing imagery, consisting of the following five steps: (1) object segmentation, (2) spectral metrics, (3) matching to in situ H′, (4) prediction of H′ and (5) post-processing (Extended Data Fig. 1).

Object segmentation

To improve the categorization of morphologically similar species, we used object clustering, based on a simple non-iterative clustering algorithm. By grouping pixels on the basis of their spectral characteristics, shape, texture and spatial relationship with the surrounding pixels to accurately estimate spatial/spectral metrics, we better captured the spectral heterogeneity compared with an individual-pixel-level-based spectral characterization. The simple non-iterative clustering algorithm was performed by initializing cluster centres (called seeds) at regular grid points throughout the images of Landsat-based spectral bands and vegetation indices. Each pixel in the image was then assigned to the nearest cluster centre on the basis of both spatial distance and feature similarity. Various spacing distances (in pixels) between seeds (that is, 5, 10, 15, 20, 25, 30, 40 and 50) were tested to derive the optimal seed spacing based on the boundary recall71. After initial assignment, the cluster centres were updated to the mean position of all the pixels assigned to that cluster, so that the cluster centres better represent the pixels belonging to that cluster. The clusters thus represent forest segments (the smallest unit of a forest community), and the calculations of spectral metrics were performed within each segment, with representative spectral metrics obtained by averaging pixel-wise metrics per segment. This analysis was implemented in Google Earth Engine.

Spectral metrics

We obtained three classes of commonly used satellite-based spectral metrics17,18,19. First, we calculated the spectral heterogeneity metrics defined as the degree of spatial variations in spectral reflectance—that is, the coefficient of variation, spectral angle mapper (spectral dilation and spectral gradient) and texture features (dissimilarity and entropy) (Supplementary Table 7). These metrics have been proposed on the basis of different mathematical principles (that is, variance, distance, angle and volume) and have proved effective in capturing the spectral heterogeneity of a given area. Second, we calculated spectral/temporal metrics and derived five statistical metrics: median, minimum, maximum, standard deviation and mean for each spectral band and vegetation index. Third, we used the original spectral bands of the Landsat imagery and calculated six vegetation indices for each month during the growing season (May to October). The spectral bands, vegetation indices and temporal metrics help differentiate between trees, shrubs and herbaceous vegetation, thereby reducing the impact of changes in tree cover on the prediction of H′. We thus derived a total of 217 spectral metrics including 85 spectral heterogeneity metrics, 60 spectral/temporal metrics (related to temporal variations), 36 spectral bands and 36 vegetation indices, which were ultimately calculated per segment and used as predictors for the modelling (Supplementary Fig. 3 and Supplementary Table 8).

Matching to in situ H′

We established the spatial matching between segment-based spectral metrics and in situ H′ included in national forest inventory records across 5,312 sites to derive the training samples. To improve the model robustness, we also implemented data augmentation to increase the size of training samples. The augmentation was applied by calculating the satellite-based spectral metrics averaged over window sizes of 1 × 1, 3 × 3 and 5 × 5 pixels centred on each plot location accounting for varying sizes in the segmented patches. This increases variations in the training data and allows control over the number of training samples, thereby improving the generalization performance of modelling. By removing missing values and outliers induced by clouds and shadows, we finally derived 20,100 samples (each with 217 metrics) paired with in situ H′, of which 70% were used for training the model, 10% were used for validation and 20% were used as the test dataset.
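The window-based augmentation can be sketched as below for one spectral metric stored as a two-dimensional grid. The function names, the nested-list grid and the use of None for pixels removed as missing or contaminated are assumptions for this illustration.

```python
def window_mean(grid, row, col, size):
    """Mean of valid pixels in a size x size window centred on (row, col)."""
    half = size // 2
    values = []
    for r in range(row - half, row + half + 1):
        for c in range(col - half, col + half + 1):
            inside = 0 <= r < len(grid) and 0 <= c < len(grid[0])
            if inside and grid[r][c] is not None:
                values.append(grid[r][c])
    return sum(values) / len(values) if values else None


def augment_plot(grid, row, col, sizes=(1, 3, 5)):
    """One averaged metric value per window size for a single plot."""
    return {size: window_mean(grid, row, col, size) for size in sizes}


# Toy 5 x 5 metric grid with a distinct value at the plot location.
toy = [[1.0] * 5 for _ in range(5)]
toy[2][2] = 10.0
samples = augment_plot(toy, 2, 2)
```

Each window size yields a separate predictor value that can be paired with the same in situ H′.
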

Prediction of H′

We applied the InceptionTime deep learning architecture to establish a predictive model with in situ H′ as the response variable and the 217 spectral metrics as predictors. InceptionTime has been extensively used for classification and regression72, because it allows the model to simultaneously analyse patterns exhibited at different convolutional scales and cope with time series with complex patterns and varying temporal frequencies. We fine-tuned the InceptionTime architecture by deleting the maximum pooling layer, reducing inception modules, adding dropout layers and modifying residual connections to achieve the best architecture for the training data of this study (Supplementary Fig. 4). A grid search method combined with cross-validation was used to determine the optimal hyperparameters such as filters, kernel sizes, batch size and learning rate.

The performance of the predictive model was evaluated using a tenfold cross-validation method, which ensures that the validation set is independent and spans the entire range of the data. Mean values of R² and the root mean square error over the ten iterations were computed to quantify the model performance. Finally, we established the best model for predicting H′ using the optimal hyperparameters and the selected predictors. The resultant model was used to generate H′ maps for the entire study area in 2000, 2010 and 2020.
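The evaluation procedure can be sketched generically as follows. This is not the InceptionTime model: a one-predictor least-squares line (fit_line, a hypothetical stand-in) replaces the network so that the example stays self-contained, and all function names and the fold assignment are assumptions.

```python
import math
import random


def rmse(obs, pred):
    """Root mean square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def r_squared(obs, pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_obs = sum(obs) / len(obs)
    ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - ss_res / ss_tot


def kfold_indices(n, k=10, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def fit_line(xs, ys):
    """Stand-in model: ordinary least squares with a single predictor."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    slope /= sum((a - mx) ** 2 for a in xs)
    return lambda v: my + slope * (v - mx)


def cross_validate(xs, ys, fit, k=10, seed=0):
    """Mean R2 and mean RMSE over k held-out folds."""
    scores = []
    for held_out in kfold_indices(len(ys), k, seed):
        held = set(held_out)
        train = [i for i in range(len(ys)) if i not in held]
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        obs = [ys[i] for i in held_out]
        pred = [predict(xs[i]) for i in held_out]
        scores.append((r_squared(obs, pred), rmse(obs, pred)))
    return (sum(s[0] for s in scores) / k, sum(s[1] for s in scores) / k)
```

Each sample is held out exactly once, so every prediction used for scoring comes from a model that did not see that sample during fitting.
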

Post-processing

The distortion of spectral reflectance caused by tree shadowing, cloud cover, topography, diverse understory vegetation and other factors may lead to larger variability or dispersion of data records around the mean. We thus used standard errors based on the inventory data to minimize the uncertainty of predicted H′ (Supplementary Fig. 13). We calculated the standard deviation of H′ from forest inventory plots with repeated measurements of H′ in different years (n = 648 plots) and took the 95th quantile of these values, 0.20, as the maximum standard deviation; this value served as the threshold for retaining H′ estimates with the lowest potential uncertainty (Supplementary Fig. 14). About 10% of the study area (approximately 1,531,000 km²) was marked as uncertain, probably due to noise and randomness in the data, and was excluded from further analysis.
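The derivation of the threshold from repeated plot measurements can be sketched as below. The function names, the dictionary input, the linear-interpolation quantile and the sample standard deviation are all assumptions for this example; the text does not specify these details, nor how the threshold was applied to the mapped pixels, so that step is not shown.

```python
import statistics


def percentile(values, q):
    """q-th percentile (0 to 100) with linear interpolation between ranks."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)


def sd_threshold(repeated_h, q=95):
    """Percentile of per-plot standard deviations of repeated H' values."""
    # Assumption: sample standard deviation; plots need two or more visits.
    sds = [statistics.stdev(v) for v in repeated_h.values() if len(v) >= 2]
    return percentile(sds, q)


# Hypothetical repeated H' measurements keyed by plot identifier.
plots = {"plot_a": [1.00, 1.00], "plot_b": [1.00, 1.20], "plot_c": [0.8]}
threshold = sd_threshold(plots)
```
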

Statistical analysis

We quantified the changes of tree species diversity in boreal forests by calculating the per-pixel difference in H′ values between the 2000s and the 2020s. We defined distinct changes larger than 0.01 as a diversity gain and less than −0.01 as a diversity loss, while a change ranging between −0.01 and 0.01 was defined as no distinct change, according to the minimal units of changes in H′ values and the dependency of the two parameters (that is, species richness and evenness)73,74. We used a BRT analysis to assess the relative impacts of the explanatory variables, including temperature, rainfall and soil properties, on the spatial distribution of boreal tree diversity. This method has been used extensively in ecological studies to study response variables75,76. BRT is an advanced machine learning algorithm that iteratively fits and combines multiple regression tree models to improve predictive performance76,77. We randomly selected 92,416 samples (pixels) with a minimum distance of 50 km between neighbouring samples to establish the relationships between the explanatory variables and the response variable (H′) with the BRT model. The model was iterated ten times to avoid stochastic errors. Partial dependency plots resulting from the BRT analysis were derived to describe how the tree species diversity responds to change in each predictor independent of the other predictors.
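The change classification described above amounts to a simple per-pixel rule, sketched here with a hypothetical function name. The ±0.01 threshold is taken from the text; everything else is illustrative.

```python
def classify_change(h_start, h_end, threshold=0.01):
    """Label the per-pixel change in H' between two epochs."""
    delta = h_end - h_start
    if delta > threshold:
        return "gain"
    if delta < -threshold:
        return "loss"
    return "no distinct change"


# Illustrative H' values for three pixels (start epoch, end epoch).
labels = [classify_change(a, b) for a, b in [(1.0, 1.2), (1.2, 1.0), (1.0, 1.005)]]
```
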

We used a multiple linear regression model to explore the responses of spatiotemporal changes in carbon indicators to diversity as well as changing environmental variables. The carbon indicators were used as dependent variables, while H′, stand age and the other environmental variables were used as explanatory variables. To avoid bias introduced by spatial autocorrelation between neighbouring samples, we calculated Moran’s index for each variable and used the maximum distance of 50 km for the random selection of samples (Supplementary Table 9). All explanatory variables were standardized (with an average of 0 and a standard deviation of 1) to obtain standardized coefficients β. Significance tests were set at a 95% confidence level (P < 0.05). The analyses were performed and the graphs produced using R v.4.2.0 (ref. 78) with the following packages: caret79, gbm80, tidyverse81, lme4 (ref. 82) and ggplot2 (ref. 83).
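The standardization step can be illustrated with the sketch below, written in Python rather than R for self-containment. It is a single-predictor simplification of the multiple regression used in the study; the function names and the sample standard deviation (n − 1 denominator) are assumptions.

```python
import statistics


def standardize(values):
    """Rescale a variable to mean 0 and standard deviation 1."""
    mean = statistics.fmean(values)
    sd = statistics.stdev(values)  # assumption: sample SD (n - 1)
    return [(v - mean) / sd for v in values]


def standardized_slope(x, y):
    """Slope of y on z-scored x (one-predictor simplification)."""
    z = standardize(x)
    mean_y = statistics.fmean(y)
    # z has mean 0, so the least-squares slope is sum(z * dy) / sum(z * z).
    numerator = sum(zi * (yi - mean_y) for zi, yi in zip(z, y))
    return numerator / sum(zi * zi for zi in z)


# Change in y per one standard deviation of x (values are made up).
beta = standardized_slope([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0, 10.0])
```

Because every predictor is expressed in standard deviation units, the resulting coefficients can be compared across variables measured on different scales.
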

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.