Main

Forests are a major carbon sink1, sequestering ~24% of annual anthropogenic carbon emissions2,3. However, these ecosystems are under severe threat4, with 420 million ha of forest lost globally between 1990 and 2020, leading to substantial reductions in carbon storage5. Contiguous forest landscapes are increasingly fragmented6, as large tracts are divided into smaller patches with new edges along their perimeters. Currently, 30% of the world’s forested areas lie within 100 meters of an edge, and 70% lie within 1 km7. This fragmentation introduces ‘edge effects’: gradual changes from the forest edge to the interior in biodiversity7,8, microclimate9,10, soil conditions11 and exposure to human influences12,13,14 such as nutrient inputs and selective logging. These changes can directly influence the growth potential and carbon storage capacity of forests7,9. As fragmentation accelerates15, understanding the scale and implications of these ‘edge effects’ is becoming critical for predicting terrestrial carbon storage under current and future climate scenarios.

Near forest edges, more solar radiation can usually penetrate the canopy, driving increases in air and soil temperatures, which in turn elevate vapour pressure deficit (VPD) and decrease soil moisture9. Forest edges also typically experience stronger wind exposure and are more vulnerable to fire16 and biotic disturbances such as invasive species17. These microenvironmental gradients between forest edges and interiors are expected to influence vegetation growth and biomass density. Yet empirical evidence on the sign and magnitude of edge effects on forest biomass across the globe is mixed. In tropical regions, biomass density has been shown to decrease near forest edges18,19. In temperate forests, the effects are highly variable, with studies showing positive, negative or negligible impacts in different regions6,8,20,21,22,23. In boreal forests, both higher vegetation productivity and higher tree mortality have been observed along edges, leading to contrasting impacts on total forest biomass6,24.

The lack of consensus on the biogeographic patterns of edge effects in different ecoregions limits our capacity to represent them in carbon stock accounting efforts6,19,20,25. Current carbon accounting efforts thus generally overlook these indirect effects of forest fragmentation6, focusing instead on the direct effects of absolute forest loss on carbon stocks5,26. For example, the Tier 1 methodology of the Intergovernmental Panel on Climate Change instructs countries to estimate greenhouse gas inventories using fixed per-hectare carbon stock values for each forest type, without differentiating between edge and interior areas19,27. Where edge effects are pronounced, this approach risks substantial over- or underestimation of actual carbon stocks19, hampering effective carbon stock assessments and climate change policy28,29.

Here we address this issue by empirically quantifying the relationship between aboveground forest biomass (AGB) and distance from forest edges on a global scale. Recognizing that various mechanisms influence biomass near edges, we define edge effects as the net outcome of all factors shaping biomass variation along the edge–interior gradient. To better understand the potential drivers behind this variation, we employ interpretable machine-learning techniques to identify key environmental and anthropogenic contributors. Finally, we estimate the total global impact of edge effects on forest AGB.

Results and discussion

Global variation in edge effects

To measure forest edge effects at a global scale, we combined the 30-m-resolution global forest cover map from ref. 30 with the 30-m-resolution global aboveground forest biomass map from ref. 31. We overlaid a 100 km × 100 km grid across the global forest area and sampled 500 random points within each grid cell. The spatial distribution of these points followed that of global forested areas (Fig. 1b,c). Using these sampled points, we fitted a spatial log-linear regression model to each grid cell, predicting biomass density as a function of the log10-transformed distance to the forest edge while accounting for spatial autocorrelation (Methods). The resulting slopes, denoted \(\frac{\Delta {\rm{AGB}}}{\Delta D}\), represent the local relationship between forest biomass density and the log10-transformed distance to edge D within each grid cell.

Fig. 1: Direction and magnitude of edge effects on forest biomass density.

a, Map of edge effects (\(\frac{\Delta {\rm{Aboveground}}\; {\rm{biomass}}\; {\rm{density}}}{\Delta {\rm{log}}_{10}({\rm{Distance}}\; {\rm{to}}\; {\rm{edge}})}\) or \(\frac{\Delta {\rm{AGB}}}{\Delta D}\)), calculated individually within each 100-km grid cell using spatial log-linear regression (Methods). A more positive \(\frac{\Delta {\rm{AGB}}}{\Delta D}\) corresponds to a greater decrease in aboveground biomass density near the forest edge relative to the interior. b,c, Histograms of longitude (b) and latitude (c) for sampled forest locations. d, Mean ± s.d. of \(\frac{\Delta {\rm{AGB}}}{\Delta D}\) by forest biomes weighted by inverse coefficients of variation.

We found that most (96.1%) grid cells displayed \(\frac{\Delta {\rm{AGB}}}{\Delta D}\) > 0.1 (Fig. 1a; see Extended Data Fig. 1 for uncertainty). The mean \(\frac{\Delta {\rm{AGB}}}{\Delta D}\) was positive across all forest biomes (Fig. 1d), indicating negative edge effects, in which biomass density near edges is consistently lower than in interior forests. Tropical forests exhibited the strongest negative edge effects, particularly in Southeast Asia, the Amazon, Central America and the Congo Basin (Fig. 1a,d). The mean \(\frac{\Delta {\rm{AGB}}}{\Delta D}\) of temperate forests was 19% lower than that of tropical forests (43 versus 53; Fig. 1d). Nevertheless, we observed strong negative edge effects in temperate regions such as Europe and the United States. Boreal forests generally showed weaker negative edge effects, except in the Western Siberian grain belt in Russia, where strong negative edge effects were observed.
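
Because X in equation (1) is log10-transformed, a slope has units of Mg ha−1 per tenfold increase in distance to edge. As an illustration of the fitted functional form only (not an additional result), the modelled difference in biomass density between two distances \(D_1 < D_2\) within one grid cell is

$$\Delta \mathrm{AGB} = \beta_{1}\left(\log_{10} D_{2} - \log_{10} D_{1}\right) = \beta_{1}\log_{10}\frac{D_{2}}{D_{1}}$$

so, taking the tropical mean slope of 53 as a hypothetical single-cell value, moving from 30 m to 3,000 m from the edge corresponds to

$$53 \times \log_{10}\frac{3000\ \mathrm{m}}{30\ \mathrm{m}} = 53 \times 2 = 106\ \mathrm{Mg\,ha^{-1}}$$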

Positive edge effects (\(\frac{\Delta {\rm{AGB}}}{\Delta D}\) < 0) accounted for only 3.7% of the total observed values and were primarily restricted to regions near the biophysical growth limits of trees, such as high-latitude boreal forests. A negligible edge effect (\(\frac{\Delta {\rm{AGB}}}{\Delta D}\) between −0.1 and 0.1) was observed in only 0.2% of grid cells.
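
The sign convention and the ±0.1 band used above can be made explicit with a short Python sketch. It only classifies slopes that have already been estimated; the function name `classify_edge_effects` and the example slope values are hypothetical, and treating values of exactly ±0.1 as negligible is an assumption about the boundary.

```python
def classify_edge_effects(slopes, tol=0.1):
    """Percentage of grid cells with negative, negligible and positive edge effects.

    Sign convention from the text: slope > tol means biomass is lower near edges
    (a negative edge effect); slope < -tol means a positive edge effect.
    """
    counts = {"negative": 0, "negligible": 0, "positive": 0}
    for s in slopes:
        if s > tol:
            counts["negative"] += 1
        elif s < -tol:
            counts["positive"] += 1
        else:
            counts["negligible"] += 1
    n = len(slopes)
    return {k: 100 * v / n for k, v in counts.items()}

# Hypothetical slopes (Mg ha^-1 per log10 m) for five grid cells
shares = classify_edge_effects([53.0, 43.0, 0.05, -12.0, 8.0])
print(shares)  # {'negative': 60.0, 'negligible': 20.0, 'positive': 20.0}
```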

To confirm the robustness of our results, we conducted supplemental analyses. First, to ensure that the statistical method does not bias the results, we replaced log-linear regression with non-parametric Spearman correlations, which produced qualitatively similar results (Extended Data Fig. 2a). Second, we excluded points within 30 m of forest edges to ensure that mixed pixels at the border of forests were not driving the observed patterns. This analysis also returned consistent results (Extended Data Fig. 2b). Lastly, to address potential inaccuracies in the aboveground forest biomass map, we repeated the analysis using tree canopy cover30 as a response variable instead of biomass. We found analogous results (Extended Data Fig. 3), suggesting that reduced biomass near edges is most probably attributable to decreases in canopy cover rather than data artefacts.
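
A minimal sketch of the first two robustness checks, assuming NumPy is available: a Spearman rank correlation per grid cell (computed as the Pearson correlation of average ranks) and an optional filter that drops points within 30 m of the edge. Because Spearman correlation depends only on ranks, it is unaffected by the log10 transformation of distance. The function names and example values are illustrative and are not taken from the authors' R code.

```python
import numpy as np

def average_ranks(values):
    """Ranks starting at 1; tied values receive the average of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    ranks = np.empty(len(values))
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        i = j + 1
    return ranks

def spearman_edge_effect(distance_m, agb, min_distance_m=0.0):
    """Spearman correlation between distance to edge and AGB in one grid cell,
    keeping only points farther than min_distance_m from the edge."""
    distance_m = np.asarray(distance_m, dtype=float)
    agb = np.asarray(agb, dtype=float)
    keep = distance_m > min_distance_m
    return float(np.corrcoef(average_ranks(distance_m[keep]),
                             average_ranks(agb[keep]))[0, 1])

# Hypothetical points from one grid cell
dist = [30, 60, 90, 300, 1200, 3000]
agb = [80, 95, 95, 150, 160, 210]
rho_all = spearman_edge_effect(dist, agb)
rho_core = spearman_edge_effect(dist, agb, min_distance_m=30)  # drop points within 30 m
print(round(rho_all, 3), round(rho_core, 3))
```

A positive correlation plays the same role as a positive slope: biomass increases with distance from the edge.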

Environmental drivers of edge effects

To identify the environmental factors underlying global variation in edge effects, we combined an Extreme Gradient Boosting (XGBoost) machine-learning model32 and Shapley Additive Explanation (SHAP) values33 (Methods). Because edge effects were already quantified at the grid level using spatial log-linear regressions (Fig. 1), the purpose of this machine-learning analysis was not to generate new predictions, but rather to interpret which environmental variables contribute most to observed variation in edge-effect magnitude. SHAP values are particularly well suited for this task, as they quantify the contribution of each variable to the model’s estimation of \(\frac{\Delta {\rm{AGB}}}{\Delta D}\) for a given grid cell. The mean |SHAP| value indicates the overall importance of a variable across all locations. For example, if a variable has both high values and high positive SHAP values—as observed for agricultural land cover in Fig. 2—this suggests that areas with extensive agriculture tend to exhibit stronger negative edge effects. In contrast, if high values of a variable are associated with negative SHAP values, it implies that the variable tends to suppress the magnitude of the edge effect.
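
SHAP values are Shapley values of a model's prediction: each variable receives its weighted average marginal contribution over all coalitions of the other variables. For XGBoost models they are usually computed with the TreeSHAP algorithm, which is exact and efficient for tree ensembles. The brute-force Python sketch below instead evaluates the Shapley formula directly for a tiny hypothetical model (`toy_model`, standing in for the fitted XGBoost model) with three illustrative variables, using an interventional baseline in which variables outside a coalition take their values from background rows. Its cost grows as 2^p, so it only illustrates what a single SHAP value means.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, background):
    """Exact (brute-force) Shapley values of model at x.

    v(S) is the mean prediction over background rows, with variables in S set
    to x's values and all other variables taken from the background row.
    """
    p = len(x)

    def value(coalition):
        preds = [model([x[j] if j in coalition else row[j] for j in range(p)])
                 for row in background]
        return sum(preds) / len(preds)

    phi = [0.0] * p
    for i in range(p):
        others = [j for j in range(p) if j != i]
        for size in range(p):
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi

def toy_model(z):
    """Hypothetical stand-in model: z = [MAT, agriculture %, MAP]."""
    mat, agri, map_mm = z
    return 2.0 * mat + 0.5 * agri + 0.001 * mat * map_mm

x = [25.0, 60.0, 2000.0]                                  # one hypothetical grid cell
background = [[5.0, 10.0, 800.0], [15.0, 30.0, 1500.0]]   # hypothetical reference cells
phi = shapley_values(toy_model, x, background)
baseline = sum(toy_model(b) for b in background) / len(background)
print(phi, sum(phi), toy_model(x) - baseline)
```

The printed values illustrate the additivity property that makes SHAP interpretable: the SHAP values of one grid cell sum to its prediction minus the mean background prediction.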

Fig. 2: Contribution of environmental variables to edge effects \((\frac{\Delta {{\bf{AGB}}}}{\Delta {\boldsymbol{D}}})\).

a, SHAP summary plot showing the contribution of each environmental variable to predicted edge effects across grid cells. Variables are ranked by their mean absolute SHAP value (|SHAP|), with the most influential variables listed at the top. The x axis indicates the SHAP value (that is, contribution to prediction) and each dot represents a local (grid-cell level) SHAP value. The overall distribution of points illustrates the global importance of each variable. b,c, SHAP dependence plots for MAT (b) and agricultural land cover (c). The x axis shows the variable value and the y axis shows its corresponding SHAP value. Colour shading reflects the density of data points, with lighter colours indicating higher density. d,e, SHAP dependence plots for MAP, coloured by MAT (d) and agriculture (e) to illustrate interaction effects between variables. The red lines in b–e represent LOESS-smoothed trends. f, Dominant environmental variable by biome. Each grid cell is coloured according to the SHAP value of the single most important variable, defined by the highest mean |SHAP|, from biome-specific XGBoost models for tropical/subtropical, temperate and boreal forests. Only the top variable per biome is shown; for full variable sets and biome-specific results, see Methods and Extended Data Fig. 4.

To propagate uncertainty in our grid-cell estimates of \(\frac{\Delta {\rm{AGB}}}{\Delta D}\), we weighted each estimate by the inverse of its coefficient of variation. This approach, commonly used in meta-analyses, gives greater emphasis to effect size estimates with lower uncertainty34. To ensure that spatial patterns did not bias our evaluation of machine-learning model performance, we used a spatially buffered leave-one-out cross-validation approach to calculate R² for our models (Extended Data Table 1). Using this approach, we developed both global and biome-specific models (Fig. 2 and Extended Data Fig. 4), selecting environmental predictors on the basis of previous literature (Extended Data Table 2) and confirming the absence of multicollinearity (Spearman’s rank correlation coefficients <0.7 and variance inflation factors (VIFs) <3 for all predictors).
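
The inverse-coefficient-of-variation weighting behind the biome means in Fig. 1d can be sketched as follows, assuming NumPy. The text does not spell out the CV formula, so this sketch assumes CV = SE(β1)/|β1| for each grid-cell slope, which makes each weight |β1|/SE(β1); the function name and example numbers are hypothetical.

```python
import numpy as np

def inverse_cv_weighted_summary(beta, se):
    """Weighted mean and s.d. of grid-cell slopes, weighting each by 1 / CV.

    Assumption: CV = SE(beta) / |beta|, so the weight is |beta| / SE(beta).
    """
    beta = np.asarray(beta, dtype=float)
    se = np.asarray(se, dtype=float)
    weights = np.abs(beta) / se
    mean = np.average(beta, weights=weights)
    sd = np.sqrt(np.average((beta - mean) ** 2, weights=weights))
    return float(mean), float(sd)

# Hypothetical slopes (Mg ha^-1 per log10 m) and their standard errors
mean, sd = inverse_cv_weighted_summary([50.0, 60.0, 40.0], [5.0, 30.0, 8.0])
print(round(mean, 2), round(sd, 2))
```

Under this assumed definition, slopes close to zero receive little weight even when their standard errors are small, because their CV is large.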

Our global-scale machine-learning model had an R² of 0.67 on the randomly selected 20% test set (Methods; spatially buffered cross-validation results are given in Extended Data Table 1). Among the predictors, mean annual temperature (MAT) was the most important variable, with a mean |SHAP| value of 7.2, followed by the percentage of cultivated and managed vegetation (Agriculture in Fig. 2; 4.9) and mean annual precipitation (MAP; 3.9) (Fig. 2a).

In colder regions such as boreal forests, temperature is a key limiting factor for plant growth6. In these areas, higher temperatures near forest edges can promote vegetation growth during the growing season35, resulting in negative SHAP values at low MAT (Fig. 2b). In contrast, low temperature is generally not limiting in tropical forests. There, higher temperatures near the edge may instead increase the vulnerability of trees to heat stress6, resulting in positive SHAP values at high MAT (Fig. 2b).

Agricultural land cover was the second most important variable globally, with a high fraction of agriculture associated with stronger negative edge effects (Fig. 2a,c), in line with the findings of ref. 19 in the tropics. Fires near edges, often driven by agricultural expansion, have been identified as a key driver of Amazon forest fragmentation and degradation, a process exacerbated during droughts36,37. In our analysis, this effect of agriculture was particularly evident in the Western Siberian grain belt in Russia (Extended Data Fig. 5). There, the strength of edge effects was comparable to that in tropical forests but was driven primarily by the high fraction of agriculture rather than by climatic factors such as MAT and MAP, which are more influential in the tropics.

The third most important variable in the global analysis, MAP, emerged as the top predictor of edge-effect magnitude within tropical biomes (Fig. 2a,f). Near forest edges, increased air and soil temperatures and increased wind exposure raise VPD, heightening vegetation vulnerability to water stress38. While drought-adapted plants in tropical dry forests possess biochemical and morphological mechanisms to conserve water39, forests in water-limited but not drought-adapted areas appear particularly susceptible. This is reflected in the SHAP dependence plot, where SHAP values increase with MAP up to a threshold, indicating stronger negative edge effects in regions with moderate precipitation (Fig. 2d). In regions with extremely high precipitation (>3,000 mm), the impact of MAP on edge effects diminished (Fig. 2d), probably because water is not limiting in these areas. This finding aligns with previous research, which indicates that water availability primarily limits plant growth in ecosystems receiving less than ~2,000 mm of precipitation6,40.

Several interactions were evident among the key predictors (Fig. 2d,e). First, at higher MAT, SHAP values of MAP increased more sharply with rising MAP compared with lower MAT (Fig. 2d). This suggests a synergistic effect between high temperature and precipitation in amplifying edge effects. One likely explanation is that elevated MAT increases VPD near forest edges, intensifying moisture gradients between edge and interior environments, particularly in forests that are not drought adapted38. Second, interactions between agriculture and MAP were particularly pronounced in drier regions (MAP < 1,000 mm; Fig. 2e). In such areas, agricultural activity can substantially alter atmospheric and soil moisture dynamics41,42, exacerbating local water stress. Widespread use of irrigation in low-precipitation regions43 may create artificially wetter conditions at forest edges, reducing the microclimatic contrast with wetter forest interiors. This could help explain the weaker edge effects (reflected by steeper SHAP value declines) in regions with low MAP.

To further examine the influence of environmental variables, we ran an XGBoost model using edge effects on tree canopy cover30 (\(\frac{\Delta {\rm{Tree}}\; {\rm{cover}}}{\Delta D}\), Extended Data Fig. 3) as the response variable instead of biomass-based edge effects (\(\frac{\Delta {\rm{AGB}}}{\Delta D}\)). Overall, the results were broadly consistent; however, notable differences emerged in regions with high agricultural land cover (>75%), where forest cover is minimal, and in tropical regions (Extended Data Fig. 6). These discrepancies probably stem from the fact that tree cover does not directly equal biomass density, as biomass is additionally influenced by tree height, diameter and wood density. Notably, in tropical regions, the edge effect on tree cover was less negative than that on biomass, suggesting that tree cover near edges remained relatively stable despite declines in biomass. Indeed, large trees at the tropical edge tend to be thinner and shorter, yet retain similar crown width compared to interior trees44. These edge-induced changes in tree architecture may explain how canopy cover remains relatively stable while biomass declines, especially in structurally complex forests such as those of the Amazon44.

The observed contributions of macro-scale environmental covariates to global edge-effect variation, identified by our XGBoost and SHAP analysis, are probably modulated by fine-scale factors such as soil conditions and forest microclimate. Changes in soil conditions near forest edges, including reduced soil carbon, lower enzyme activity and altered soil texture with increased freeze–thaw cycles11, can limit tree growth and biomass accumulation by elevating nutrient constraints and environmental stress. Similarly, the observed reduction in canopy cover at edges (Extended Data Fig. 3) weakens forests’ microclimatic buffering capacity, leading to greater temperature fluctuations and local heating45,46. This in turn could amplify the sensitivity of edge environments to macroclimatic factors such as temperature and precipitation46. Future studies incorporating fine-scale soil and microclimatic data will be needed to explicitly test the mechanistic roles of each variable and disentangle the interactions between these local factors and the broader environmental drivers that explain global variation in edge effects.

AGB difference between edge and interior

To assess the impact of edge effects on forest carbon stocks, we quantified how biomass differences between forest edges and interiors scale up to influence total forest AGB. Within each grid cell, we compared the observed AGB in edge areas to the expected AGB, as predicted by our spatial log-linear regression models at the grid-cell level. To construct a counterfactual scenario without edge effects, we assumed that edge areas would have the same biomass density as nearby forest interiors. Edge areas were defined by the depth of edge influence, a threshold distance beyond which edge effects are considered to dissipate (Fig. 3a; see Methods). The global mean depth of edge influence was 336 m, with biome-specific averages of 826 m for tropical forests, 235 m for temperate forests and 258 m for boreal forests. Globally, mean biomass density in edge areas was 16% lower than in interior forests. Using this depth-of-influence framework, we quantified the ‘missing biomass’ (sensu ref. 19) as the difference between observed edge-affected AGB and the counterfactual AGB expected in the absence of edge effects.
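
A simplified Python sketch of the ‘missing biomass’ calculation for one grid cell. It assumes the depth of edge influence has already been estimated (the procedure for that threshold is not reproduced here) and takes the counterfactual edge density to be the log-linear model's prediction at that depth, that is, the interior level; this counterfactual choice, the function name and the example values are hypothetical. Each retained point is treated as one 30 m × 30 m pixel (0.09 ha).

```python
import math

PIXEL_AREA_HA = 0.09  # one 30 m x 30 m pixel

def missing_biomass_tg(beta0, beta1, dei_m, edge_pixels):
    """Missing AGB (Tg) in one grid cell under a simplified counterfactual.

    edge_pixels: iterable of (distance_m, observed_agb_mg_per_ha).
    Pixels at or beyond the depth of edge influence dei_m are ignored.
    Counterfactual density (assumed): beta0 + beta1 * log10(dei_m).
    """
    interior_density = beta0 + beta1 * math.log10(dei_m)
    missing_mg = 0.0
    for distance_m, observed in edge_pixels:
        if distance_m < dei_m:
            missing_mg += (interior_density - observed) * PIXEL_AREA_HA
    return missing_mg * 1e-6  # Mg -> Tg

# Hypothetical cell: fitted beta0 = 20, beta1 = 50, depth of edge influence 1,000 m
edge_pixels = [(30, 100.0), (300, 150.0), (2000, 180.0)]
print(missing_biomass_tg(20.0, 50.0, 1000.0, edge_pixels))
```

In cells with positive edge effects the difference is negative, so summing across cells gives a net rather than a gross loss.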

Fig. 3: Quantification of AGB considering forest edges.

a, Estimated depth of edge influence (in meters) at the grid-cell level, defined as the threshold distance beyond which biomass density stabilizes and no longer exhibits a notable gradient with respect to edge proximity. b, Spatial distribution of AGB loss (in teragrams, Tg) attributable to edge effects at the grid-cell level. AGB loss is calculated as the difference between observed biomass and the counterfactual biomass if edge areas had the same biomass density as nearby forest interiors. c, Total AGB loss due to edge effects, aggregated at global and biome levels (in petagrams, Pg). The numbers in parentheses indicate the percent reduction in observed AGB relative to a counterfactual scenario without edge effects. Error bars represent 95% CIs.

The spatial distribution of missing biomass followed the global patterns of edge effects, with tropical biomes showing the highest AGB losses due to their pronounced edge effects (Figs. 3b and 1d). Globally, we estimated a cumulative AGB loss of 58 Pg (95% confidence interval (CI): 49–68 Pg) (Fig. 3c), representing a 9% AGB decrease relative to a counterfactual scenario without edge effects. At the biome level, estimated AGB losses amounted to 28 Pg (95% CI: 23–32 Pg; 7% of the biome’s total biomass) in tropical/subtropical forests, 11 Pg (95% CI: 9–12 Pg; 10%) in temperate forests and 8 Pg (95% CI: 6–9 Pg; 11%) in boreal forests (Fig. 3c).

This global biomass loss of 58 Pg is more than twice the AGB of all forests in Europe excluding the Russian Federation5. Translating biomass to carbon, this equals 28 Pg C of aboveground carbon loss, assuming a mean wood carbon concentration of 47.6% (ref. 47). When accounting for belowground carbon stored in roots (assumed to comprise 22% of total tree biomass48), total carbon losses rise to ~36 Pg C.
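
The biomass-to-carbon conversion above is simple arithmetic. It is shown here as a Python sketch using the stated 47.6% carbon concentration and a root share of 22% of total tree biomass, so total tree carbon equals aboveground carbon divided by (1 − 0.22). The function name is hypothetical.

```python
def agb_to_carbon(agb_pg, carbon_fraction=0.476, root_share=0.22):
    """Convert an AGB loss (Pg biomass) into aboveground and total tree carbon (Pg C).

    root_share is the fraction of *total* tree biomass held in roots, so
    total carbon = aboveground carbon / (1 - root_share).
    """
    above_c = agb_pg * carbon_fraction
    total_c = above_c / (1 - root_share)
    return above_c, total_c

above_c, total_c = agb_to_carbon(58)
print(round(above_c, 1), round(total_c, 1))  # 27.6 35.4
```

Starting from the rounded 28 Pg C gives 28 / 0.78 ≈ 35.9 Pg C, which matches the ~36 Pg C quoted in the text; the last digit depends on where rounding is applied.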

These results highlight the substantial contribution of forest fragmentation to biomass loss and underscore the need to incorporate edge effects into carbon stock assessments using standardized methodologies. Our analysis addresses this need using biomass data for the year 2000 (ref. 31), as this dataset is the highest-resolution (30 m) global biomass map currently available. However, substantial forest changes have occurred since 2000. For example, temperate forests in East Asia and parts of the boreal zone have experienced tree cover gains, while tropical regions such as the Amazon and Southeast Asia have seen substantial losses49. These changes could affect our results if they were accompanied by disproportionate biomass gains or losses between forest edges and interiors. Deforestation, for instance, often exposes interior forests to edge conditions, leading to faster biomass loss, while regrowth near edges can be slower owing to harsher environmental conditions. Together, these dynamics can amplify overall biomass loss across the landscape. In heavily fragmented landscapes, true interior forests are becoming scarce, complicating the detection of edge–interior biomass gradients. Despite this limitation, our analysis provides a robust baseline for understanding global edge effects around the year 2000. Future studies could build on this work by assessing the sensitivity of edge-effect estimates to temporal changes, using newer biomass datasets where available, even if at coarser spatial resolutions or for specific regions.

Importance of context dependency

Previous studies in temperate forests have suggested higher biomass and tree basal area near forest edges20,50,51, often attributed to factors such as elevated nitrogen deposition from anthropogenic sources and increased light availability near edges50. These studies were typically conducted at the field scale and focused on the first 30 m from the forest edge, a distance that would fall entirely within a single pixel in our 30-m-resolution global analysis. As a result, such fine-scale edge enhancements may not be detectable in our study. These contrasting patterns highlight the importance of considering both spatial scale and methodology when interpreting edge effects. While localized field studies provide valuable mechanistic insights at fine spatial resolution, our approach is designed to detect broader, landscape-scale patterns of biomass variation that extend well beyond the immediate forest edge.

The large-scale findings from the United States by ref. 20 using national forest inventory data (FIA plots) suggest higher tree density (number of trees per ha) and basal area in forest edges compared with interior forests. However, their analysis defined forests as areas with ≥10% canopy cover20, whereas we applied a more conservative threshold of 30%. As a result, low canopy cover areas that were classified as ‘interior’ in their analysis would have been excluded entirely from our study. This difference may have contributed to their relatively lower estimates of interior forest biomass. Moreover, while ref. 20 used a binary classification to distinguish ‘edge’ from ‘interior’ plots, our approach modelled biomass as a continuous function of distance to edge.

To better understand the differences between our findings and those of ref. 20, we conducted an additional analysis using plot-level FIA data52 for the United States from the year 2000. Our objective was to replicate their binary ‘edge’ versus ‘interior’ classification on the basis of proximity to forest edges (see Methods) and assess whether applying a similar approach would yield comparable results. Our analysis also included an ‘intermediate’ category for a more detailed comparison. Basal area52, tree density52 and biomass density values53 were analysed to compare these categories. Our analysis showed that all three metrics were significantly higher in interior plots compared with edge plots (Kruskal–Wallis and Dunn’s test, P < 0.001; Extended Data Fig. 7). Intermediate plots showed values similar to interior plots, especially for biomass density and basal area (Kruskal–Wallis and Dunn’s test, P > 0.05). These results were robust across different threshold definitions for edge and interior classification (Extended Data Fig. 7). This plot-level analysis corroborates our findings and highlights the critical role of methodological definitions of forest in interpreting edge effects. The results suggest that edge effects on forest biomass are context dependent, varying with landscape fragmentation and forest canopy cover. Further research is needed to explore how definitions of forest and edge influence outcomes, particularly in human-impacted areas such as agricultural land with low tree cover (10–30%).

Conclusions

High temperature emerged as a key predictor of negative edge effects (Fig. 2), highlighting the potential for climate warming to amplify edge-induced forest degradation and carbon loss. Our results suggest that regional climate warming and localized temperature increases at forest edges9 may interact synergistically to drive reductions in aboveground biomass. This effect was particularly pronounced in tropical and temperate forests, where combined warming pushes trees beyond physiological limits, increasing heat stress, reducing growth and elevating mortality rates54. While absolute biomass density differences between edges and interiors were smaller in boreal forests due to their overall lower biomass, edge effects led to substantial relative aboveground biomass loss of 11% (Fig. 3c). Given that boreal forests are anticipated to experience disproportionately severe warming compared with other biomes1, these findings highlight their heightened vulnerability. Rapid warming may exceed the adaptive capacities of boreal forest vegetation, exacerbating susceptibility to disturbances such as wildfires and drought55,56—disturbances already intensified at forest edges due to their exposed and drier conditions.

The widespread prevalence of negative edge effects (Fig. 1a) suggests a troubling synergy between two potent global change pressures: forest fragmentation and climate warming. Fragmentation exposes more forested areas to edge effects, while rising temperatures exacerbate biomass loss at these edges. Together, these forces pose a serious threat to the global forest carbon sink, with potentially compounding negative feedbacks to climate. While direct forest removal is responsible for 15–25% of annual human carbon emissions31, our work reveals the substantial indirect carbon costs of forest fragmentation. Our findings call for an urgent reorientation of forest management and conservation strategies to address the dual threats of deforestation and fragmentation. Policies must not only prioritize reducing forest loss but also mitigate the indirect impacts of fragmentation to preserve the critical role of forests as a global carbon sink.

Methods

Data acquisition and preparation

We used the 30-m-resolution forest cover map for the year 2000 published by ref. 30. We defined forest as any pixel with at least 30% canopy cover of trees taller than 5 m. To compare edge effects across the globe, we used Google Earth Engine57 to overlay the Earth’s land surface with a grid of 100 km × 100 km cells. We then sampled 500 random points per grid cell and calculated, for each point falling within forest, the Euclidean distance in meters to the nearest non-forest pixel. For each point, we also extracted aboveground forest biomass density from the 30-m-resolution map of aboveground forest biomass for the year 2000 produced by ref. 31, which is the most comprehensive global aboveground biomass dataset available at this fine resolution. To ensure temporal consistency, we used the year-2000 datasets from refs. 30,31, allowing forest cover and biomass to be compared at the same point in time.
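
The per-point distance calculation can be sketched in plain Python on a toy raster. This is an illustration under stated assumptions only: the canopy-cover grid is hypothetical, the > 5 m height criterion is assumed to be applied beforehand, points are snapped to pixel centres and distance is measured between pixel centres. A brute-force search like this is impractical for a real 100 km cell (about 3,333 × 3,333 pixels), where a Euclidean distance transform would be used instead.

```python
import math
import random

PIXEL_M = 30  # pixel size of the forest cover map (m)

def forest_mask(canopy_cover_pct, threshold=30):
    """1 = forest (canopy cover >= threshold %), 0 = non-forest.
    Assumes the tree-height criterion is already built into the cover values."""
    return [[1 if c >= threshold else 0 for c in row] for row in canopy_cover_pct]

def distance_to_edge_m(mask, r, c):
    """Distance (m) from the centre of pixel (r, c) to the nearest non-forest pixel centre."""
    best = math.inf
    for i, row in enumerate(mask):
        for j, v in enumerate(row):
            if v == 0:
                best = min(best, math.hypot(i - r, j - c) * PIXEL_M)
    return best

def sample_forest_points(mask, n_points=500, seed=0):
    """Draw random pixels in one grid cell and keep those that fall in forest,
    returning (row, col, distance_to_edge_m) for each retained point."""
    rng = random.Random(seed)
    nrow, ncol = len(mask), len(mask[0])
    points = [(rng.randrange(nrow), rng.randrange(ncol)) for _ in range(n_points)]
    return [(r, c, distance_to_edge_m(mask, r, c)) for r, c in points if mask[r][c] == 1]

# Toy 5 x 5 cell: the first column is cleared (0% cover), the rest is forest
canopy = [[0, 80, 80, 80, 80] for _ in range(5)]
mask = forest_mask(canopy)
points = sample_forest_points(mask, n_points=50)
print(len(points), points[:3])
```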

After filtering out points falling outside forest, we retained 8,077,835 biomass/edge measurements across 17,309 grid cells. Because small samples may give biased results, we retained only grid cells containing more than 20 points, leaving 8,074,224 points. To ensure that edge effects were detectable, we also excluded grid cells in which less than 3% of points fell within 100 m of a non-forest pixel, resulting in a final dataset of 7,837,233 points. Downstream analyses were carried out using the R statistical programming language (v.4.2.1)58.
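
The two grid-cell filters can be written as one small predicate. In this Python sketch the function name is hypothetical, and counting a point at exactly 100 m as ‘within 100 m’ is an assumption about the boundary.

```python
def keep_grid_cell(distances_m, min_points=20, near_edge_m=100, min_near_share=0.03):
    """Return True if a grid cell passes both filters described above.

    distances_m: distances to edge (m) of the forest points sampled in the cell.
    """
    n = len(distances_m)
    if n <= min_points:  # keep only cells with more than 20 forest points
        return False
    n_near = sum(1 for d in distances_m if d <= near_edge_m)
    return n_near / n >= min_near_share  # at least 3% of points within 100 m

print(keep_grid_cell([60] + [500] * 20))  # True: 21 points, 1 near the edge (4.8%)
print(keep_grid_cell([60] + [500] * 39))  # False: 40 points, 1 near the edge (2.5%)
print(keep_grid_cell([60] * 20))          # False: only 20 points
```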

Spatial log-linear regression

Within each individual grid cell, we conducted a spatial log-linear regression:

$$Y={\beta }_{0}+{\beta }_{1}\times X+\varepsilon \tag{1}$$

where Y is the aboveground biomass density (Mg ha−1), X is the log10-transformed distance to the nearest forest edge (m), β0 is the intercept, β1 is the slope coefficient describing the relationship between biomass density and distance to the edge, and ε is the error term. This model structure was chosen to reflect the expected non-linear nature of edge influence, in which biomass loss is strongest near the edge and diminishes progressively with distance. Because grid cells with 20 or fewer points had already been excluded, no additional statistical methods were used to predetermine sample size. To ensure that spatial autocorrelation did not bias our results, we used log-linear regression with eigenvector-based spatial filtering, as implemented in the R package spfilteR59.
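
Eigenvector spatial filtering adds eigenvectors of a doubly centred spatial connectivity matrix as extra covariates, so that spatially structured variation is absorbed by them rather than by β1. The NumPy sketch below is a simplified illustration, not spfilteR's selection procedure: it assumes a binary ‘neighbours within a distance threshold’ connectivity rule and keeps every eigenvector whose eigenvalue (proportional to its Moran's I) is at least 25% of the largest one. The threshold, the 0.25 fraction, the function name and the simulated data are all hypothetical.

```python
import numpy as np

def esf_slope(coords, x, y, dist_threshold_m, mi_frac=0.25):
    """Slope of y on x after adding Moran eigenvectors as spatial covariates."""
    coords = np.asarray(coords, dtype=float)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Binary connectivity: points within dist_threshold_m are neighbours (assumed rule)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    C = ((d > 0) & (d <= dist_threshold_m)).astype(float)
    # Eigenvectors of the doubly centred matrix M C M are candidate spatial patterns
    M = np.eye(n) - np.ones((n, n)) / n
    eigvals, eigvecs = np.linalg.eigh(M @ C @ M)
    keep = eigvals >= mi_frac * eigvals.max()  # strongest positive autocorrelation only
    E = eigvecs[:, keep]
    # OLS of Y = b0 + b1 * X + E @ gamma + error; return b1 and the number of filters
    design = np.column_stack([np.ones(n), x, E])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(coef[1]), E.shape[1]

# Simulated points in one 100-km cell with a smooth spatial trend in biomass
rng = np.random.default_rng(0)
n = 300
coords = rng.uniform(0, 100_000, size=(n, 2))           # metres
log_dist = np.log10(rng.uniform(30, 5_000, size=n))     # X: log10 distance to edge
spatial_trend = 20 * np.sin(coords[:, 0] / 20_000)      # autocorrelated signal
agb = 80 + 50 * log_dist + spatial_trend + rng.normal(0, 10, n)
beta1, n_vectors = esf_slope(coords, log_dist, agb, dist_threshold_m=15_000)
print(round(beta1, 1), n_vectors)
```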

We extracted the estimates of β1, which quantify the observed local relationship between biomass density and distance from the forest edge within each individual grid cell. The most extreme 2.5% of estimates in each tail (377 values per side) were discarded as outliers, leaving 15,094 β1 values. We refer to β1 as \(\frac{\Delta {\rm{AGB}}}{\Delta D}\), an abbreviated form of \(\frac{\Delta {\rm{Aboveground}}\; {\rm{biomass}}\; {\rm{density}}}{\Delta \log_{10}({\rm{Distance}}\; {\rm{to}}\; {\rm{edge}})}\). A positive \(\frac{\Delta {\rm{AGB}}}{\Delta D}\) indicates that biomass density increases with log10-transformed distance to edge; that is, biomass density is lower near the edge. A negative \(\frac{\Delta {\rm{AGB}}}{\Delta D}\) indicates that biomass density is higher near the edge. We calculated the mean and standard deviation of \(\frac{\Delta {\rm{AGB}}}{\Delta D}\) at the global scale and for tropical/subtropical, temperate and boreal biomes, weighted by the corresponding inverse coefficients of variation60. To ensure that inaccuracies in the aboveground forest biomass map did not cause us to erroneously detect edge effects, we also performed the spatial log-linear regression using tree canopy cover30 as the response variable. The results were analogous to those of our primary analysis (Extended Data Fig. 3), suggesting that the lower forest biomass near edges is most probably attributable to decreases in canopy cover rather than to errors in the biomass data product.

In addition, we analysed edge effects using forest inventory data52 for the United States using FIA plots from 2000. Because FIA plot locations have a built-in positional uncertainty, with coordinates randomly displaced within a 1-mile (1,609.34 m) radius for privacy reasons, we implemented a buffer-based approach to account for this imprecision. We created a 1,609.34 m buffer zone around each plot and calculated the average distance to the nearest forest edge within this buffer. Distance calculation was done using the forest cover map from ref. 30, as in the main analysis. We then classified plots into three categories: ‘interior’ (plots located more than 150 m from the nearest edge), ‘edge’ (plots within 50 m of the nearest edge) and ‘intermediate’ (plots between 50 and 150 m from the edge). For each plot, basal area and tree density values were directly derived from the FIA data, while biomass density values for each plot were obtained from ref. 53. To identify possible differences in these metrics among edge, intermediate and interior plots, we applied Kruskal–Wallis and Dunn’s tests. To ensure the robustness of our classifications, we varied the edge and interior thresholds by increasing the edge threshold from 50 m to 100 m and the interior threshold from 150 m to 200 m. We then reran Kruskal–Wallis and Dunn’s tests to assess the sensitivity of our findings to these threshold changes (Extended Data Fig. 7).
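
The plot classification and the omnibus test can be sketched in plain Python. Assumptions in this sketch: the boundaries are ≤ 50 m for ‘edge’ and > 150 m for ‘interior’; tied values receive average ranks but the tie correction to H is omitted; and the Dunn's post hoc comparisons are not reproduced. With three groups, H is compared with a chi-square distribution on 2 degrees of freedom, whose survival function is exactly exp(−H/2); the chi-square approximation is only reasonable for group sizes much larger than in the toy example. Function names and plot values are hypothetical.

```python
import math

def classify_plot(mean_dist_m, edge_max_m=50, interior_min_m=150):
    """Edge / intermediate / interior class from the buffer-averaged distance to edge."""
    if mean_dist_m <= edge_max_m:
        return "edge"
    if mean_dist_m > interior_min_m:
        return "interior"
    return "intermediate"

def kruskal_wallis_3groups(groups):
    """Kruskal-Wallis H (no tie correction) and chi-square p-value for three groups."""
    if len(groups) != 3:
        raise ValueError("this sketch handles exactly three groups")
    values = [v for g in groups for v in g]
    labels = [k for k, g in enumerate(groups) for _ in g]
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for idx in order[i:j + 1]:
            ranks[idx] = (i + j) / 2 + 1  # average of ranks i+1 .. j+1
        i = j + 1
    n = len(values)
    rank_sums = [0.0, 0.0, 0.0]
    for idx, k in enumerate(labels):
        rank_sums[k] += ranks[idx]
    h = 12 / (n * (n + 1)) * sum(r ** 2 / len(g) for r, g in zip(rank_sums, groups)) - 3 * (n + 1)
    return h, math.exp(-h / 2)  # p-value from chi-square with 2 df

# Hypothetical plots: (buffer-averaged distance to edge in m, biomass in Mg ha^-1)
plots = [(20, 95), (35, 110), (45, 90), (80, 150), (120, 160), (140, 140),
         (300, 170), (600, 185), (900, 175)]
groups = {"edge": [], "intermediate": [], "interior": []}
for dist, agb in plots:
    groups[classify_plot(dist)].append(agb)
h, p = kruskal_wallis_3groups([groups["edge"], groups["intermediate"], groups["interior"]])
print(round(h, 2), round(p, 4))
```

Rerunning with `edge_max_m=100` and `interior_min_m=200` mirrors the threshold sensitivity check described above.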

In our analysis, we seek to examine the influence of large-scale environmental gradients on edge effects, rather than to explain or statistically account for small-scale variation in edge effects within individual grid cells. For this reason, we include environmental covariates only in the downstream XGBoost models and not in the grid-cell-level spatial regressions. The grid-cell-level regressions are intended simply to measure the realized edge effect in each grid cell, without reference to potential environmental drivers (which are addressed downstream). Keeping the grid-cell estimates free of covariate adjustment allows us to examine their variation meaningfully at larger scales.

Machine-learning models and interpretation

To examine how environmental gradients influence the strength of edge effects across the globe, we used an interpretable machine-learning approach to identify the drivers of variation in the \(\frac{\Delta {\rm{AGB}}}{\Delta D}\) estimates from our grid-cell-level spatial log-linear regressions. We first selected environmental covariates on the basis of previous literature and verified that no pair of covariates was highly correlated (Spearman’s rank correlation coefficients < 0.7). These covariates included mean annual temperature (MAT), mean annual precipitation (MAP), mean annual wind speed, mean soil moisture, elevation, slope and the percentage of cultivated and managed vegetation (hereafter, agriculture) (see Extended Data Table 2 for details of the environmental covariates). We calculated their mean values within each grid cell using Google Earth Engine57.
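The pairwise collinearity screen can be sketched as follows, computing Spearman's coefficient as the Pearson correlation of average ranks. This is a NumPy-only illustration; the function names and the dictionary input format are assumptions, not part of the original workflow.

```python
import itertools
import numpy as np

def average_ranks(x):
    """1-based ranks with ties assigned their average rank."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    ranks = np.empty(len(x))
    ranks[order] = np.arange(1, len(x) + 1)
    # Replace the ordinal ranks of tied values by their mean.
    _, inverse = np.unique(x, return_inverse=True)
    sums = np.bincount(inverse, weights=ranks)
    counts = np.bincount(inverse)
    return (sums / counts)[inverse]

def correlated_pairs(covariates, threshold=0.7):
    """Return (name_a, name_b, rho) for every pair with |Spearman rho| >= threshold."""
    names = list(covariates)
    ranks = {name: average_ranks(covariates[name]) for name in names}
    flagged = []
    for a, b in itertools.combinations(names, 2):
        rho = np.corrcoef(ranks[a], ranks[b])[0, 1]
        if abs(rho) >= threshold:
            flagged.append((a, b, float(rho)))
    return flagged
```

A covariate set passes the screen when `correlated_pairs` returns an empty list.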

We then fit an XGBoost model32 to investigate how environmental factors influence the direction and magnitude of edge effects (\(\frac{\Delta {\rm{AGB}}}{\Delta D}\)) across the globe33. Because we had already measured grid-cell-level edge effects directly in our upstream analysis, the goal of the XGBoost model was not prediction but the empirical identification of the primary drivers of variation. Hyperparameters were tuned by Bayesian optimization61 using 80% of the original data as training data. To account for uncertainty in the edge-effect estimates from our spatial log-linear regressions, each training data point was weighted by its inverse coefficient of variation (see Extended Data Fig. 1 for a map of the coefficients of variation). Model performance was evaluated using root mean squared error (RMSE), R² and mean absolute error (MAE) on the test dataset, which consisted of the remaining, randomly selected 20% of the original data. The final XGBoost model achieved RMSE = 13.02, R² = 0.67 and MAE = 9.44. We also computed these metrics using spatial leave-one-out cross-validation with various spatial buffer sizes (Extended Data Table 1).
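The evaluation protocol (random 80/20 split, RMSE, R² and MAE, and buffered spatial leave-one-out splits) can be sketched with NumPy alone. This is an illustrative sketch under stated assumptions: coordinates are taken to be planar (for example, projected kilometres), and the function names are hypothetical. The model itself could be fitted with XGBoost's scikit-learn interface, whose `XGBRegressor.fit` accepts a `sample_weight` argument for the inverse-CV weights; that step is not shown here.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """RMSE, R-squared and MAE of predictions against observations."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return {
        "RMSE": float(np.sqrt(np.mean(resid ** 2))),
        "R2": float(1.0 - ss_res / ss_tot),
        "MAE": float(np.mean(np.abs(resid))),
    }

def random_split(n, test_frac=0.2, seed=0):
    """Random train/test index split (80/20 by default)."""
    idx = np.random.default_rng(seed).permutation(n)
    n_test = int(round(test_frac * n))
    return idx[n_test:], idx[:n_test]

def buffered_loo_splits(coords, buffer):
    """Spatial leave-one-out: for each held-out point, train only on points
    farther than `buffer` from it (planar distance; the point itself is excluded)."""
    coords = np.asarray(coords, dtype=float)
    for i in range(len(coords)):
        dist = np.linalg.norm(coords - coords[i], axis=1)
        yield np.flatnonzero(dist > buffer), np.array([i])
```

Increasing `buffer` removes more spatially autocorrelated neighbours from each training set, which is why metrics are typically reported across several buffer sizes.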

We also fit separate biome-level XGBoost models for tropical/subtropical, temperate and boreal forests, following the biome designations of ref. 62. Grid cells that straddled multiple biomes were excluded from these models. As for the global model, highly correlated covariates (Spearman’s rank correlation coefficients > 0.7) were excluded, as were covariates of low importance in the global-scale model (see Extended Data Fig. 4 for the environmental covariates used in each biome-scale model). Model performance was evaluated in the same way as for the global model. The biome-level XGBoost models achieved RMSE = 16.97, R² = 0.40 and MAE = 13.34 for tropical forests; RMSE = 14.71, R² = 0.50 and MAE = 11.27 for temperate forests; and RMSE = 6.79, R² = 0.70 and MAE = 5.08 for boreal forests. In addition, we fit an XGBoost model with edge effects on tree canopy cover30 (that is, \(\frac{\Delta {\rm{Tree}}\; {\rm{cover}}}{\Delta D}\) in Extended Data Fig. 3) as the response variable, which yielded RMSE = 5.30, R² = 0.64 and MAE = 3.92.
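The exclusion of grid cells that straddle biomes can be sketched as below. This assumes, for illustration only, that every forest pixel carries a grid-cell identifier and a biome label; the original work may instead have intersected grid cells with biome polygons. Function names are hypothetical.

```python
from collections import defaultdict

def single_biome_cells(pixel_cells, pixel_biomes):
    """Map each grid cell to its biome, dropping cells whose pixels span >1 biome."""
    biomes_in_cell = defaultdict(set)
    for cell, biome in zip(pixel_cells, pixel_biomes):
        biomes_in_cell[cell].add(biome)
    return {cell: next(iter(b)) for cell, b in biomes_in_cell.items() if len(b) == 1}

def cells_by_biome(cell_to_biome):
    """Group retained grid cells by biome, one group per biome-level model."""
    groups = defaultdict(list)
    for cell, biome in cell_to_biome.items():
        groups[biome].append(cell)
    return dict(groups)
```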

Finally, we performed SHAP analyses to interpret the XGBoost models63. SHAP is an additive feature attribution method, in which an individual prediction is explained as a sum of the contribution values of the input features64; SHAP values are these contribution values. For our XGBoost model, the base value (the expected model output) plus the sum of the SHAP values of all environmental variables equals the predicted \(\frac{\Delta {\rm{AGB}}}{\Delta D}\) for the corresponding grid cell. A large positive (negative) SHAP value indicates that the variable pushes the prediction towards a more positive (negative) \(\frac{\Delta {\rm{AGB}}}{\Delta D}\). The |SHAP| value represents the magnitude of the variable’s contribution to the local prediction. We averaged |SHAP| values across grid cells to compare the global contribution of each variable32. We visualized model interpretation with SHAP summary plots and SHAP dependence plots.
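The additive property described above can be illustrated with exact, brute-force Shapley values. This sketch enumerates all feature coalitions and fills absent features from a background dataset (interventional expectation). It is not the TreeSHAP algorithm that the `shap` library uses for XGBoost models, and it is feasible only for a handful of features; the function names are hypothetical.

```python
import itertools
import math
import numpy as np

def exact_shap(model, x, background):
    """Exact Shapley values of model(x) by enumerating all feature coalitions.

    Features in a coalition are fixed to x; the rest keep their background values,
    and the coalition's value is the mean prediction over the background rows.
    Returns (base_value, phi) with base_value + phi.sum() == model(x).
    """
    x = np.asarray(x, dtype=float)
    background = np.asarray(background, dtype=float)
    m = x.size

    def coalition_value(subset):
        data = background.copy()
        cols = list(subset)
        data[:, cols] = x[cols]
        return float(np.mean(model(data)))

    phi = np.zeros(m)
    for i in range(m):
        others = [j for j in range(m) if j != i]
        for k in range(m):
            weight = math.factorial(k) * math.factorial(m - k - 1) / math.factorial(m)
            for subset in itertools.combinations(others, k):
                phi[i] += weight * (coalition_value(subset + (i,)) - coalition_value(subset))
    return coalition_value(()), phi

def mean_abs_shap(model, X, background):
    """Global importance: mean |SHAP| of each feature across the rows of X."""
    phis = np.array([exact_shap(model, row, background)[1] for row in np.asarray(X, dtype=float)])
    return np.mean(np.abs(phis), axis=0)
```

For a linear model with independent features, each Shapley value reduces to the coefficient times the deviation of the feature from its background mean, which makes the additivity easy to check by hand.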

Quantification of missing biomass

To estimate the impact of edge effects on the global carbon cycle, we quantified AGB at global and forest-biome scales. For the global scale, we used the extent of our forest pixels; for the biome scale, we delineated the extents of the tropical/subtropical, temperate and boreal forest biomes following ref. 62. We quantified ‘missing biomass’ sensu ref. 19 by comparing the observed AGB from ref. 31 with a counterfactual ‘expected AGB’, that is, the AGB that would be expected if forest edge areas had the same biomass density as nearby forest interiors. To do so, we used our grid-cell-level spatial log-linear regression models (equation 1).

Specifically, we first defined a distance threshold separating ‘edge’ areas from ‘interior’ areas. Because the strength of edge effects varies globally, we varied this threshold according to our observations rather than imposing a single, uniform value across all regions. Within each grid cell, the threshold was defined as the mean distance to the edge of points at the 90th percentile of biomass density for that grid cell. This logic is similar to that of ref. 19, which defined the depth at which 90% of asymptotic biomass is reached. We refer to this threshold distance as the ‘depth of edge influence’. Forest pixels closer to the edge than the depth of edge influence were classified as edge areas, and those at or beyond it as interior areas.
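The per-grid-cell threshold can be sketched as follows. This sketch assumes that "points at the 90th percentile of biomass density" means points whose biomass density is at or above that percentile; if a narrower band around the percentile was intended, only the mask changes. Function names are hypothetical.

```python
import numpy as np

def depth_of_edge_influence(distance_m, biomass, pct=90.0):
    """Mean distance to edge of points at or above the pct-th biomass percentile.

    Assumption: the percentile is read as a lower cutoff on biomass density.
    """
    d = np.asarray(distance_m, dtype=float)
    b = np.asarray(biomass, dtype=float)
    cutoff = np.percentile(b, pct)  # linear interpolation (NumPy default)
    return float(d[b >= cutoff].mean())

def split_edge_interior(distance_m, depth_m):
    """Boolean masks: edge = closer than depth; interior = at or beyond depth."""
    d = np.asarray(distance_m, dtype=float)
    is_edge = d < depth_m
    return is_edge, ~is_edge
```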

Next, the biomass density of each forest pixel (in Mg ha⁻¹) was multiplied by the pixel area (in ha) to obtain the pixel’s AGB stock (in Mg). The actual AGB was calculated as the sum of the AGB of all pixels within each forest biome. To calculate the counterfactual expected AGB without edge effects, the biomass density of every pixel within the depth of edge influence was replaced with the value predicted at the depth of edge influence, using the spatial regression intercept (β0) and \(\frac{\Delta {\rm{AGB}}}{\Delta D}\) (β1) estimated for each grid cell. For example, if the depth of edge influence was 200 m, the biomass density of all forest pixels within 200 m of the nearest edge would be replaced with \({\beta }_{0}+{\beta }_{1}\times {\log }_{10}200\) according to equation (1) and then multiplied by the pixel area. The counterfactual expected AGB was then calculated as the sum of the interior AGB and the newly estimated edge AGB.
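For a single grid cell, the counterfactual calculation can be sketched as below, assuming equation (1) has the form biomass density = β0 + β1 × log10(distance). The function name is hypothetical; per-biome totals would be obtained by summing the actual and counterfactual stocks over grid cells, and dividing by 1e9 converts Mg to Pg.

```python
import numpy as np

MG_PER_PG = 1e9  # 1 Pg = 1e15 g = 1e9 Mg

def missing_biomass(density, area_ha, distance_m, beta0, beta1, depth_m):
    """Actual AGB, counterfactual AGB, difference (Mg) and % difference for one grid cell.

    density: pixel biomass density (Mg/ha); area_ha: pixel areas (ha);
    distance_m: pixel distance to the nearest edge (m).
    """
    density = np.asarray(density, dtype=float)
    area = np.asarray(area_ha, dtype=float)
    is_edge = np.asarray(distance_m, dtype=float) < depth_m
    actual = np.sum(density * area)
    # Edge pixels take the density predicted at the depth of edge influence.
    density_at_depth = beta0 + beta1 * np.log10(depth_m)
    counterfactual = np.sum(density[~is_edge] * area[~is_edge]) + density_at_depth * np.sum(area[is_edge])
    diff = counterfactual - actual
    return float(actual), float(counterfactual), float(diff), float(diff / counterfactual * 100)
```

With three 1 ha pixels of 100, 150 and 200 Mg ha⁻¹ at 10, 50 and 300 m from the edge, β0 = β1 = 50 and a 200 m depth of edge influence, the two edge pixels are raised to about 165 Mg ha⁻¹ each, giving a counterfactual of about 530 Mg against an actual 450 Mg.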

Finally, we calculated the absolute difference in AGB (counterfactual AGB − actual AGB, in Pg) and the percentage difference in AGB (\(\frac{{\rm{absolute\; difference\; of\; AGB}}}{{\rm{counterfactual\; AGB}}}\times 100\), in %). To estimate the uncertainty in AGB differences, we first calculated the 95% confidence interval (CI) of each \(\frac{\Delta {\rm{AGB}}}{\Delta D}\) from its standard error (s.e.): lower bound = \(\frac{\Delta {\rm{AGB}}}{\Delta D}\) − 1.96 × s.e.; upper bound = \(\frac{\Delta {\rm{AGB}}}{\Delta D}\) + 1.96 × s.e. We then propagated these bounds through the same counterfactual calculation to obtain lower and upper bounds on the absolute AGB differences, ensuring consistent uncertainty propagation. We explored different definitions of the depth of edge influence, which showed similar patterns across biomes but naturally yielded different totals of estimated missing biomass (see Extended Data Fig. 8 for an example).
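The uncertainty step can be sketched by recomputing the counterfactual difference at the lower and upper bounds of β1. In this sketch the intercept β0 and the depth of edge influence are assumed to stay fixed while β1 varies; the text does not state how β0 was treated. Function names are hypothetical.

```python
import numpy as np

def beta1_ci(beta1, se, z=1.96):
    """Standard-error-based 95% bounds for beta1."""
    return beta1 - z * se, beta1 + z * se

def agb_difference(density, area_ha, distance_m, beta0, beta1, depth_m):
    """Counterfactual minus actual AGB (Mg) for one grid cell."""
    density = np.asarray(density, dtype=float)
    area = np.asarray(area_ha, dtype=float)
    is_edge = np.asarray(distance_m, dtype=float) < depth_m
    actual = np.sum(density * area)
    edge_density = beta0 + beta1 * np.log10(depth_m)
    counterfactual = np.sum(density[~is_edge] * area[~is_edge]) + edge_density * np.sum(area[is_edge])
    return float(counterfactual - actual)

def difference_with_ci(density, area_ha, distance_m, beta0, beta1, se, depth_m):
    """(lower, estimate, upper) AGB difference, holding beta0 and depth fixed (assumption)."""
    lo, hi = beta1_ci(beta1, se)
    return tuple(agb_difference(density, area_ha, distance_m, beta0, b, depth_m) for b in (lo, beta1, hi))
```

Because log10 of the depth is positive for depths above 1 m, the difference increases monotonically with β1, so the bounds on β1 map directly onto lower and upper bounds of the difference.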

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.