Introduction

The land carbon cycle is a critical component of the global carbon budget1, modulating the warming rate of the global climate by absorbing approximately 25% of anthropogenic CO2 emissions2,3. Photosynthesis (i.e., gross primary productivity; GPP) and ecosystem respiration (ER) are the two primary carbon fluxes in the land carbon cycle and determine the magnitude of the sink1. However, global estimates of GPP and ER from models are highly uncertain, ranging from roughly 100 to 200 PgC/year4,5. The considerable uncertainty in the land carbon cycle persists across several generations of dynamic global vegetation models (DGVMs) and Earth system models (ESMs), posing a long-standing challenge in carbon cycle simulations and climate predictions6,7.

Considerable efforts have been made to identify the sources of uncertainty in models for land carbon cycle simulations, such as assessing climate forcing datasets8,9,10 and improving representations of ecological processes (e.g., thermal acclimation of photosynthesis11, explicit representation of microbial processes12, plant hydraulics13, and competition for resources14), and parameterizations (e.g., photosynthetic capacity15,16 and microbial carbon use efficiency17) in process-based models. However, these adjustments have yet to significantly reduce the uncertainty in GPP and ER, as evidenced by the similar spread of land carbon simulations from DGVMs over the past decades1,18. Meanwhile, remote sensing-based proxies19 and data-driven machine learning approaches20,21 have also been developed to provide estimates of GPP and ER. While these approaches are reported to estimate global GPP and ER from observations and have less uncertainty, they have limitations in identifying the mechanisms that are missed or misrepresented in carbon cycle simulations. These limitations include prescribed climate sensitivity22, lack of representation in the tropics20, empirical nature of the scaling factor between GPP and spectral signals23, and lack of or inconsistent consideration of CO2 fertilization effect24,25.

Here, we hypothesize that the key source of uncertainty in land carbon cycle simulations is biogeography – the distribution of plant functional types (PFTs), based on multiple lines of evidence. Biogeography is known to affect the simulations of the land carbon cycle in DGVMs26,27,28, causing up to a 24% difference in global GPP simulations within a single model26,27,28. The large impact implies that the inter-model differences in estimated land carbon budget are potentially linked to the inter-model differences in biogeography simulations. In particular, a recent study29 shows that the inter-model differences in C4 vegetation distribution, a key PFT, explained over 80% of the uncertainties in GPP estimates for C4 vegetation, providing direct evidence of the role of biogeography in land carbon cycle simulations. Additionally, the current framework for improving land carbon cycle simulations in DGVMs implies the importance of biogeography, given that changes in model parameters, structures, and even the emerging climate sensitivities of simulated carbon fluxes can also be driven by PFT change, as DGVMs largely regard these processes as PFT-specific26,30,31,32.

To test the hypothesis on the dependence of the simulated land carbon cycle on biogeography, we use the outputs from an ensemble of state-of-the-art DGVMs to assess the relationships between total GPP and total area for each PFT. We aim to quantify the dependence of PFT total GPP on PFT distribution and quantify the contribution of vegetation biogeography to the uncertainties in global GPP and ER estimation (see Methods). The dependence will then be applied to narrow inter-model discrepancies in GPP and ER using remote sensing-based (RS) PFT maps. Using the PFT-specific relationships between total GPP and total area, and the simulated changes in PFT distributions from some DGVMs, we further explore the contribution of biogeography changes to the changes in global GPP over the past two decades.

Results

The relationships between PFT GPP and area

We first established the relationships between total GPP (in PgC/year) and total area (in km²) for each PFT, using the simulations from an ensemble of DGVMs (see Methods). For each of the twelve PFTs on the land surface, we observed a strong and significant relationship (p < 0.001) between GPP and area across models (Fig. 1). Using alternative approaches (i.e., linear regression without forcing through zero or multivariate regression), we consistently observed the strong dependence of total GPP on area for most of the PFTs (Fig. 1; Supplementary Fig. 10; Supplementary Table 5). The strong relationships indicate that the inter-model discrepancy in the total GPP of each PFT is closely tied to differences in the prescribed or simulated PFT area across DGVMs, and that DGVMs simulate a similar per-area GPP on average for each PFT (i.e., the slopes of the GPP-area relationships in Fig. 1). Specifically, the average per-area GPP, in PgC/year per 10⁶ km², is 2.48 ± 0.07 for evergreen broadleaf forests (EBF), 1.38 ± 0.08 for temperate evergreen needleleaf forests (ENFt), 0.94 ± 0.02 for boreal evergreen needleleaf forests (ENFb), 1.22 ± 0.01 for deciduous broadleaf forests (DBF), 1.10 ± 0.02 for deciduous needleleaf forests (DNF), 0.66 ± 0.02 for shrubs (SHR), 0.78 ± 0.06 for warm C3 grasses (C3Gw), 0.67 ± 0.03 for cool C3 grasses (C3Gk), 0.34 ± 0.04 for cold C3 grasses (C3Gc), 0.94 ± 0.10 for C3 crops (C3C), 1.40 ± 0.09 for C4 grasses (C4G), and 1.38 ± 0.04 for C4 crops (C4C).

Fig. 1: The relationships between total gross primary productivity (GPP) and total area for each PFT.

The panels represent the relationships between GPP and area for a evergreen broadleaf forests (EBF), b temperate evergreen needleleaf forests (ENFt), c boreal evergreen needleleaf forests (ENFb), d deciduous broadleaf forests (DBF), e deciduous needleleaf forests (DNF), f shrubs (SHR), g warm C3 grasses (C3Gw), h cool C3 grasses (C3Gk), i cold C3 grasses (C3Gc), j C3 crops (C3C), k C4 grasses (C4G), and l C4 crops (C4C). The uncertainty of the relationship was quantified as the upper and lower boundary of the relationship through a bootstrapping method for the regression (see Methods). The regression lines were forced through (0,0) because, in theory, there would be no GPP if PFT area was zero. The strength of the relationships was assessed by the coefficient of determination (R²). The R² values for the regression lines not forced through (0,0) are given in parentheses. The vertical green lines stand for the average PFT areas from five remote sensing-based (RS) PFT maps: MODIS v6.1 (ref. 33), ESA CCI v2.0.8 (ref. 34), CGLS_LC100 v3 (ref. 35), GLC_FCS30D (ref. 36), and GLASS-GLC (ref. 37). The vertical shadings indicate one standard deviation of the PFT area estimated by the five RS PFT maps. The horizontal green lines and shading intervals represent the constrained total GPP for each PFT using RS PFT areas.

Updated land carbon cycle estimation using remotely sensed PFT area

Since there is a large spread in PFT area across DGVMs, the uncertainty in PFT area propagates into substantial uncertainty in global GPP estimation (Fig. 1). The global GPP estimated by the ensemble of DGVMs ranged from 113.7 to 194.0 PgC/year, with a mean and standard deviation (SD) of 146.32 ± 26.39 PgC/year (Fig. 2a). The global ER estimated by the DGVMs was 143.39 ± 25.99 PgC/year (Fig. 2b), similar in magnitude to GPP because global GPP and ER were strongly coupled across DGVMs (Fig. 2c). The tight relationships between PFT-specific total GPP and area across DGVMs (Fig. 1) allow the uncertainty in global GPP, and in turn ER (Fig. 2c), to be reduced, provided that robust PFT distribution maps are available from observations.

Fig. 2: The constrained global gross primary productivity (GPP) and ecosystem respiration (ER).

a The comparison of global GPP estimates from dynamic global vegetation models (DGVMs) from TRENDY v9 project, the constrained GPP using remote sensing-based PFT maps (DGVMRS), and the GPP estimates from other approaches, including upscaled eddy covariance fluxes using machine learning20,21 (MLEC), solar-induced fluorescence19,80,81 (SIF), soil respiration39 (SR) and plant carbonyl sulfide40 (OCS). The MLEC includes the RS_METEO output of FLUXCOM – the ensemble estimates upscaled using different machine learning methods and different meteorological forcings20, as well as the X-BASE product from the FLUXCOM-X21. The solid lines represent the mean and the error bars represent one standard deviation of the GPP estimates. b The comparison of global ER estimates from DGVMs, DGVMRS, and MLEC. c The relationship between global ER and GPP across DGVMs. d The total GPP changes driven by elevated CO2 (CO2_total), the total GPP changes driven by CO2-driven PFT changes (CO2_PFT), the total GPP changes driven by climate change (CLI_total) and the total GPP changes driven by climate change-driven PFT changes (CLI_PFT).

We therefore used the PFT areas from five widely used RS PFT maps (MODIS v6.1 (ref. 33), ESA CCI v2.0.8 (ref. 34), CGLS_LC100 v3 (ref. 35), GLC_FCS30D (ref. 36), and GLASS-GLC (ref. 37); see Methods) in the GPP-area relationships (Fig. 1) to estimate the PFT-specific total GPP, and then summed these to obtain global GPP. With this constraint, the global GPP was updated to 135.66 ± 6.54 PgC/year on average from 2001 to 2019. The uncertainty in the global GPP estimate was reduced by approximately 75% (i.e., from 26.39 PgC/year to 6.54 PgC/year; Fig. 2a). The reduced uncertainty in GPP is mainly due to the smaller spread in PFT area among RS PFT maps than among the PFT maps used in DGVMs. Specifically, the total GPP is 41.35 ± 4.67 PgC/year for EBF, 6.76 ± 1.41 PgC/year for ENFt, 5.20 ± 0.92 PgC/year for ENFb, 14.95 ± 2.59 PgC/year for DBF, 5.27 ± 0.93 PgC/year for DNF, 8.21 ± 1.84 PgC/year for SHR, 7.51 ± 1.03 PgC/year for C3Gw, 5.80 ± 1.08 PgC/year for C3Gk, 1.42 ± 0.20 PgC/year for C3Gc, 10.55 ± 2.36 PgC/year for C3C, 24.07 ± 0.98 PgC/year for C4G, and 4.51 ± 0.15 PgC/year for C4C. The constrained global GPP was within the range of other GPP estimates such as those based on solar-induced fluorescence19,38 and soil respiration39, but lower than those inferred from plant carbonyl sulfide40 and higher than those upscaled from eddy covariance observations using machine learning20,21 (Fig. 2a). The constrained global GPP was further used to estimate global ER (Fig. 2c), with the uncertainty reduced from 25.99 to 6.53 PgC/year (Fig. 2b). Using linear regression not forced through zero, we still achieved a substantial, albeit smaller, reduction (i.e., 48% compared to 75%) in the spread of inter-model GPP and ER (Supplementary Fig. 11).
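The arithmetic behind these headline numbers can be checked with a short sketch. It uses only the values quoted in this section; the helper name `spread_reduction` is introduced here for illustration and is not part of the study's code.

```python
# PFT-specific constrained total GPP (PgC/year), copied from the text above.
pft_gpp = {
    "EBF": 41.35, "ENFt": 6.76, "ENFb": 5.20, "DBF": 14.95,
    "DNF": 5.27, "SHR": 8.21, "C3Gw": 7.51, "C3Gk": 5.80,
    "C3Gc": 1.42, "C3C": 10.55, "C4G": 24.07, "C4C": 4.51,
}


def spread_reduction(sd_before, sd_after):
    """Fractional reduction in inter-model spread (standard deviation)."""
    return 1.0 - sd_after / sd_before


# Global GPP as the sum of the twelve PFT totals.
global_gpp = sum(pft_gpp.values())

# Reported standard deviations before and after the RS constraint.
gpp_reduction = spread_reduction(26.39, 6.54)
er_reduction = spread_reduction(25.99, 6.53)

print(f"sum of PFT GPP: {global_gpp:.2f} PgC/year")
print(f"GPP spread reduction: {gpp_reduction:.1%}")
print(f"ER spread reduction: {er_reduction:.1%}")
```

The twelve rounded PFT values sum to 135.60 PgC/year, which agrees with the reported 135.66 PgC/year to within rounding, and both spread reductions come out at roughly 75%.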

Contribution of PFT change to global GPP change

We further used the relationships between PFT-specific total GPP and area to estimate the changes in global GPP induced by PFT area changes (Fig. 2d), using the annually adjusted PFT area from four DGVMs that enable dynamic biogeography simulations (Supplementary Table 1). From 2001 to 2019, these DGVMs demonstrated that elevated CO2 increased global GPP by 23.47 ± 3.25 PgC/year and climate change increased global GPP by 2.05 ± 0.45 PgC/year. Of the global GPP increase induced by elevated CO2, we found that approximately 20 ± 4% of the increase resulted from elevated CO2-driven PFT changes, primarily due to a decrease in C4 grass distribution and an increase in C3 plant distribution simulated by these DGVMs (Supplementary Fig. 3). Similarly, of the global GPP increase induced by climate change, we found that 56 ± 21% resulted from climate change-driven PFT changes. Among these DGVMs, the climate change-driven PFT changes include the expansion of C4 grasses, C4 crops, and deciduous forests in high latitudes, in replacement of C3 grasses and evergreen forests. The expansion of C4 was likely due to its anatomical traits, which confer higher water-use efficiency41,42.
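As a rough illustration of how a change in PFT area translates into a change in GPP under the GPP-area relationships, the sketch below multiplies the per-area GPP slopes reported above by a set of area changes. The slopes are taken from the text; the function name and the area changes in the example are hypothetical and are not outputs of any DGVM.

```python
# Per-area GPP slopes (PgC/year per 10^6 km^2), from the GPP-area
# relationships reported in the Results.
SLOPES = {
    "EBF": 2.48, "ENFt": 1.38, "ENFb": 0.94, "DBF": 1.22,
    "DNF": 1.10, "SHR": 0.66, "C3Gw": 0.78, "C3Gk": 0.67,
    "C3Gc": 0.34, "C3C": 0.94, "C4G": 1.40, "C4C": 1.38,
}


def gpp_change_from_area_change(area_change, slopes=SLOPES):
    """Return the GPP change (PgC/year) implied by PFT area changes.

    area_change maps PFT name -> change in area (10^6 km^2).
    Assumes the slopes stay fixed while the areas change.
    """
    unknown = set(area_change) - set(slopes)
    if unknown:
        raise KeyError(f"unknown PFT(s): {sorted(unknown)}")
    return sum(slopes[pft] * delta for pft, delta in area_change.items())


# Hypothetical example: 0.5 x 10^6 km^2 of boreal evergreen needleleaf
# forest replaced by deciduous broadleaf forest.
example = {"ENFb": -0.5, "DBF": 0.5}
print(f"{gpp_change_from_area_change(example):.2f} PgC/year")
```

Holding the slopes constant is a simplification; as noted in the Discussion, the slopes themselves may shift with climate and CO2.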

Large uncertainty in biogeography simulations

We further examined the multi-year average PFT distribution simulated by each DGVM in the framework of Whittaker bioclimatic schemes43 (Supplementary Fig. 4), where the distribution was presented in a space of mean annual temperature (MAT) and mean annual precipitation (MAP). While these biogeography simulations showed similar PFTs along the gradients from warm to cold (e.g., EBF to DBF, and to ENF) and from wet to dry (e.g., EBF to SHR, and to C3G and C4G), there were clear differences in the bioclimatic boundaries of all PFTs between models, as well as between models and RS PFT maps. For example, the MAT range for C4G was narrow (20 to 30 °C) for CLASSIC and RS PFT maps, but much broader (−20 to 30 °C) for JULES. The MAP range for DBF was 0–2000 mm/year for JSBACH and 1000–2000 mm/year for RS PFT maps, but could reach 0–4000 mm/year for LPX_Bern and ORCHIDEE (Supplementary Fig. 4). Examining the spatial distribution of PFTs across models (Supplementary Fig. 5; Supplementary Fig. 6), we identified the hotspots of PFT disagreement between DGVMs. Those hotspots of biogeography uncertainty include EBF in the southern U.S. and Mexico, ENF in the boreal-temperate transition zones, DNF in North America, C3G in the low latitudes, C4G in Central Asia, and DBF and SHR across the globe. Given the differences in the mechanisms each model uses to adjust PFT distribution, it is challenging to generalise the reasons for biases in PFT distribution across models.

Moreover, we found that RS PFT maps also show large differences in their estimates of PFT total area and spatial distributions (Fig. 3; Supplementary Table 3). For example, there were large spreads in the estimates of global EBF area (from 13.97 to 21.56 × 10⁶ km²) and C3C area (from 7.66 to 16.34 × 10⁶ km²) across the five remote sensing products. The hotspots of biogeography disagreement among RS PFT maps include EBF in the eastern U.S. and western Europe, ENF in northern Canada and eastern Russia, DNF in North America and Europe, and SHR and C3G across the globe. The discrepancy across the RS PFT maps propagated into the uncertainty of our constrained global GPP (Fig. 1; Fig. 2).

Fig. 3: The difference in the distribution of PFTs from remote sensing-based (RS) products.

The panels represent the coefficient of variation (CV) for a evergreen broadleaf forests (EBF), b temperate evergreen needleleaf forests (ENFt), c boreal evergreen needleleaf forests (ENFb), d deciduous broadleaf forests (DBF), e deciduous needleleaf forests (DNF), f shrubs (SHR), g warm C3 grasses (C3Gw), h cool C3 grasses (C3Gk), i cold C3 grasses (C3Gc), j C3 crops (C3C), k C4 grasses (C4G), and l C4 crops (C4C) across RS products. The CV is calculated as the ratio of the standard deviation to the mean PFT area, with higher CV values indicating greater disagreement in PFT area among RS products.
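The CV defined in this caption can be sketched as follows. The caption does not say whether the population or sample standard deviation was used, so the sketch exposes that choice as a parameter; the function name and the example areas are hypothetical.

```python
import math


def coefficient_of_variation(areas, sample=False):
    """Ratio of the standard deviation to the mean of PFT areas.

    areas: PFT area estimates for one grid cell (or the globe), one per
    RS product. sample=False uses the population standard deviation;
    sample=True divides by n - 1 instead. Which one the study used is
    not stated, so this is an assumption.
    """
    n = len(areas)
    if n < 2:
        raise ValueError("need at least two products")
    mean = sum(areas) / n
    if mean == 0:
        raise ValueError("CV is undefined when the mean area is zero")
    ddof = 1 if sample else 0
    variance = sum((a - mean) ** 2 for a in areas) / (n - ddof)
    return math.sqrt(variance) / mean


# Hypothetical areas of one PFT in one grid cell from five RS products.
print(round(coefficient_of_variation([120.0, 95.0, 140.0, 110.0, 135.0]), 3))
```

Grid cells where every product reports zero area have no defined CV, which is why the sketch raises an error rather than returning zero.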

Discussion

In this study, we demonstrated that biogeography is a main source of uncertainty in modelling the key fluxes in the land carbon cycle, including global photosynthesis and ecosystem respiration. Using the strong relationships between PFT-specific total GPP and area, and five RS PFT maps, we reduced the uncertainty in global GPP and ER estimates by approximately 75%. The changes in PFT distribution driven by climate change and elevated CO2 have also contributed substantially to the changes in global GPP over the past decades, accounting for 20 ± 4% of the elevated CO2-driven increase in global GPP and 56 ± 21% of the climate change-driven increase in global GPP.

Our study found that DGVMs consistently show large differences in biogeography simulations (i.e., PFT distribution), and the issue persists across simulation scenarios (Supplementary Fig. 9). The reasons for these differences are manifold, including the base map of land cover adopted from remote sensing, bioclimatic boundaries, and the considerations of fire, mortality, reproduction, and competition for resources44,45. We noted that some models (e.g., JSBACH and YIBs) did not enable the dynamic vegetation component and instead adopted prescribed PFT maps, i.e., they did not simulate changes in biogeography over time. Additionally, when a remote sensing PFT map (e.g., ESA CCI) is used to prescribe PFTs in a model (e.g., ORCHIDEE), the uncertainty in the remote sensing map will inevitably propagate into the uncertainty of model simulations. Recent progress in implementing demography46, traits at the leaf and individual levels47, fire48, and land-atmosphere feedback49 in some DGVMs can likely affect the simulations of biogeography. However, these processes have not been consistently implemented across DGVMs. Using a consistent biogeography baseline in DGVMs can be a first step to reduce PFT-induced uncertainty in land carbon cycle simulations.

Our study demonstrated that using PFT maps from remote sensing reduced the spread in modelled GPP by 75% in our statistical framework. However, this does not imply that biogeography alone accounts for 75% of the total reduced uncertainty in GPP modelling, since the effects of model structures and parameters on biogeography are often intertwined with their effects on biogeochemistry in dynamic vegetation models. For instance, PFT-specific photosynthetic parameters influence carbon sequestration rates (biogeochemistry)50, which in turn affect the competition among PFTs (e.g., foliage area), driving shifts in PFT distribution (biogeography) and impacting parameterization again for carbon sequestration26. What we presented is the combined role of biogeographic processes – both directly linked to PFT distribution (e.g., climate boundaries) and indirectly linked to PFT distribution (e.g., photosynthesis parameterizations) – in contributing to uncertainties in land carbon cycle modelling. This perspective does not downplay the importance of parameters and structures in driving uncertainties in land carbon cycle modelling. Instead, it highlights how these factors interact with biogeography to influence GPP simulations, offering insights into their interconnected impacts.

To properly assess the relative importance of biogeography and biogeochemistry on GPP modelling, we conducted a preliminary analysis where we used machine learning to develop emulators for DGVMs and then forced emulators with a consistent PFT map from MODIS. We found that by applying the MODIS PFT map to all DGVM emulators (i.e., EM(DGVMs, MODIS) in Supplementary Fig. 7), we reduced GPP uncertainty by 46% from the original DGVMs (Supplementary Fig. 7), compared to a 75% reduction when using the relationship between PFT-specific total GPP and total area (i.e., DGVMRS). This analysis sheds light on how DGVMs simulate PFT distribution – an accurate distribution of PFTs involves correctly modelling both the location and area of PFTs. In the DGVMRS, we adjusted only the PFT area, achieving a 75% reduction in GPP uncertainty (Fig. 1; Fig. 2). In contrast, the emulator approach (EM(DGVMs, MODIS)) used both consistent PFT area and location, yet the uncertainty was reduced only by 46%, with the remaining uncertainty attributed to biogeochemistry processes. When PFT locations were consistent, uncertainty in GPP (i.e., EM(DGVMs, MODIS)) increased compared to DGVMRS, implying model parameters and structures on biogeochemistry are inducing substantial changes in GPP estimates at the PFT locations. Overall, the EM(DGVMs, MODIS) approach perhaps provides a more realistic and conservative estimate of biogeography’s role in GPP modelling, and indicates that improvements in biogeography in GPP modelling should be accompanied by improvements in biogeochemistry.

We acknowledge that variables like leaf area index (LAI) could also serve as observables in reducing the uncertainty of GPP estimates, given GPP’s strong dependence on LAI and the availability of global LAI from remote sensing. However, we used the relationship between GPP and area, rather than that between GPP and LAI, for several reasons. First, biogeography and LAI are closely connected, as LAI simulations and associated parameters are likely PFT-specific (e.g., EBF typically has higher LAI than grasslands). Second, biogeography is more broadly linked to major carbon fluxes and stocks in the land carbon cycle (Supplementary Fig. 8b), while LAI primarily relates only to GPP. Lastly, we found that the variation in PFT GPP is more closely linked to variation in PFT area than to that in LAI or leaf photosynthetic rates (Supplementary Fig. 8a). Using other approaches (i.e., multivariate analysis and variance decomposition analysis), we reaffirmed that PFT area is more important than LAI and leaf photosynthetic rates in controlling the inter-model GPP variation for each PFT (Supplementary Table 5). This suggests that biogeography is likely a more important factor influencing the global GPP than LAI and leaf biogeochemistry.

This framework also helped explain the variability in the strength of GPP-area relationships among PFTs (i.e., R² and slope values in Fig. 1): the relatively lower R² values, such as those for C3 grasses and crops, indicate potentially greater inter-model differences in LAI or leaf photosynthetic rates for these PFTs than for others. Previous studies indeed reported a slightly larger inter-model spread in the maximum carboxylation rate of photosynthesis for C3 grasses than for forest types50 and a likely greater inter-model LAI discrepancy for grasses51,52. Consistently, our statistical analysis shows that LAI differences in C3 grasses contribute strongly to inter-model GPP variation (Supplementary Table 5), implying a considerable difference in grass LAI among models. However, without a comprehensive comparison of model protocols and structures, it remains difficult to pinpoint the specific biogeochemical processes or parameterizations responsible for the varying strength of the GPP–area relationship across PFTs. The greater slope in the GPP-area relationship for a given PFT is likely related to the greater leaf photosynthetic capacity or leaf area abundance prescribed for it in the models53,54. Under our statistical framework, the future changes in carbon fluxes are driven by biogeographic changes (i.e., PFT area) and the changes in biogeography–carbon cycle relationships (e.g., PFT area-GPP relationships). While our results emphasize the role of PFT area in modulating future GPP, we acknowledge that the changes in the slopes of PFT area-GPP relationships might also cause future GPP changes. This is likely because the slopes essentially reflect impacts of PFT LAI, leaf photosynthetic rate, and other biogeochemical processes, which are known to change with climate and elevated CO2 (refs. 55,56).

Our study reported that climate change and elevated CO2 caused changes in biogeography, which in turn affected the land carbon cycle. The climate change-driven PFT changes have been reported by some studies, mostly characterised by an increase in deciduous forests replacing evergreen forests in boreal zones57. As for CO2-driven PFT changes, the most significant changes include the decrease in C4 vegetation distribution29,58 and the increase in C3 grasses and woody plants59. Elevated CO2 can also reduce transpiration and alleviate the water demand for plants, indirectly influencing their distribution in arid environments in the future60. Compared to the widely reported land cover changes driven by human activities, such as deforestation for crop expansion61,62, the impacts of elevated CO2 and climate change on PFT changes have been less examined, though our estimates on several DGVMs suggest they have substantial impacts on the land carbon cycle (Fig. 2d).

Our study provided a framework to quantify the magnitude of the land carbon cycle using RS PFT maps. However, we must emphasize the considerable differences among RS PFT maps (Fig. 1; Fig. 3). A long line of research63,64 has indicated large disagreement between RS PFT maps, and those differences remain challenging to reconcile. The key differences likely lie in the global area assigned to specific PFTs (especially EBF and C3 crops). For example, MODIS suggests the EBF area is 21.56 ± 1.23 × 10⁶ km², but ESA CCI suggests it is only 14.23 ± 2.85 × 10⁶ km². The discrepancy in RS PFTs can propagate into the global GPP estimates, leading to a difference of approximately 20 PgC/year (Supplementary Fig. 2). Nevertheless, the uncertainty in RS PFTs (Fig. 3) is still smaller than that in model PFTs (Supplementary Fig. 5), making it feasible to use RS PFTs to constrain the land carbon cycle in DGVMs.

In addition to the inter-product difference in RS PFT mapping, we note that each individual RS PFT map also carries considerable uncertainty from its production process28,65 (e.g., uncertainty in the metrics used for classification and in cross-walking from land cover to PFT). However, we were unable to consider the uncertainty associated with the production of each RS PFT map, as such assessments were often not provided or did not report all sources of uncertainty in the production process66,67,68 (see Methods). Therefore, the uncertainty in RS PFT area we quantified, which refers to the inter-product difference in PFT area, was not comprehensive (Fig. 1). However, by examining the magnitude of production uncertainty for MODIS and ESA CCI PFT maps (Supplementary Table 3; Supplementary Fig. 2), we note that the inter-product difference in PFT area is often greater than the production uncertainty of each RS PFT map. We also note that while recent advances in using traits to replace PFT maps for carbon cycle modelling56,69 are critical steps toward improving the modelling, they are not necessarily free from biogeography, because PFT maps are needed to produce accurate trait maps70,71. To improve biogeography processes and land carbon cycle simulations, we suggest prioritizing efforts to resolve hotspots of biogeography uncertainty (e.g., those identified in Fig. 3) and to develop a more reliable global PFT map.

In conclusion, our study found that the uncertainty in modelling the land carbon cycle is largely associated with the uncertainty in global biogeography, and integrating remote sensing-based PFT maps can effectively reduce the uncertainty. Our findings highlight the crucial role of biogeography in contributing to the uncertainty in land carbon cycle simulation and potentially climate prediction, and indicate the urgency to advance biogeography studies to support carbon and climate sciences.

Methods

Dynamic global vegetation models (DGVMs)

To examine the relationship between the PFT total GPP and the corresponding PFT area, we used simulations from an ensemble of DGVMs participating in the TRENDY v9 project72. The DGVMs were coordinated to perform simulations under four different scenarios (S0, S1, S2, and S3), following the TRENDY protocol (https://blogs.exeter.ac.uk/trendy/protocol/). S0 is the control simulation, with no changes in the external forcings for DGVMs (i.e., time-invariant pre-industrial CO2, climate and land use mask). In S1, only atmospheric CO2 concentration varied when forcing DGVMs. In S2, atmospheric CO2 and climate varied. In S3, atmospheric CO2, climate and land use varied when forcing DGVMs.

We mainly used the simulations from the S2 scenario to examine the relationship between PFT-specific total GPP and area. Switching to other scenarios, such as S3, would not affect the relationship between PFT total GPP and PFT area across models (Supplementary Fig. 9); however, we chose S2 to exclude the effects of land use and land cover change from this analysis. A total of 11 models with varying spatial resolutions provided the required PFT-specific outputs (Supplementary Table 1). We obtained the annual total GPP and total area for each PFT for the period 2001 to 2019, and examined the relationships between them using the multi-year averages. Since not all models adopted the same PFT classification system, we harmonized the PFT classifications in each DGVM into twelve common PFTs, which cover the major functional differences in plant photosynthetic pathways, climate adaptation, and phenological cycles. These PFTs include evergreen broadleaf forests (EBF), temperate evergreen needleleaf forests (ENFt), boreal evergreen needleleaf forests (ENFb), deciduous broadleaf forests (DBF), deciduous needleleaf forests (DNF), shrubs (SHR), warm C3 grasses (C3Gw), cool C3 grasses (C3Gk), cold C3 grasses (C3Gc), C4 grasses (C4G), C3 crops (C3C), and C4 crops (C4C). The availability of PFTs in each DGVM is presented in Supplementary Table 1. For those DGVMs that only provide evergreen needleleaf forests (ENF) and C3 grasses (C3G) without subtypes, we used a set of bioclimatic boundaries73 to divide ENF into ENFt and ENFb, and to divide C3G into C3Gw, C3Gk, and C3Gc (Supplementary Table 2), as distinct climates might otherwise mask the relationship between PFT area and GPP (Supplementary Fig. 1). The climate data were obtained from CRU JRA v2.174, which is also the climate forcing dataset used for DGVMs in the TRENDY v9 project.

We used the PFT-specific GPP outputs from the DGVMs to establish the PFT-specific relationships between total GPP and total area (Fig. 1). However, since the DGVMs under TRENDY v9 did not supply PFT-specific ER, we were unable to establish PFT-specific relationships between total ER and area. Instead, we could only establish a relationship between global GPP and global ER (Fig. 2c).

Remote sensing estimates and uncertainties of global PFT distribution

We adopted five widely used remote sensing-based (RS) PFT products, namely MODIS (ref. 33), ESA CCI (ref. 34), CGLS_LC100 (ref. 35), GLC_FCS30D (ref. 36), and GLASS-GLC (ref. 37), to constrain the GPP estimates based on the relationship between PFT-specific total GPP and total area.

The MODIS and ESA CCI datasets directly provide PFT maps (MCD12Q1 v6.1 and ESA CCI v2.0.8), with some assessments of their production uncertainties28,65. Specifically, the MODIS PFT map was translated from the land cover classification within the same product at a spatial resolution of 500 metres using a series of climate-based rules33,75. ENFt and ENFb were classified from the ENF land cover type, with ENFt requiring a minimum temperature of the coldest month (Tc) above −19 °C and accumulated growing degree-days above 5 °C (GDD5) exceeding 1200 °C, while ENFb occupied colder conditions75. Within the grassland cover, pixels with GDD5 > 1000 °C, Tc > 22 °C and monthly precipitation > 25 mm were classified as C4 grass, and all others as C3 grass75. C3 grass was further classified into C3Gw, C3Gk, and C3Gc subtypes using DGVM rules (Supplementary Table 2). EBF, DBF, DNF, and SHR were mapped from their corresponding land cover types from MODIS33.
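The climate-based rules above can be sketched as two small classifiers. The thresholds are those quoted in the text; the function names are hypothetical, and two points are assumptions rather than details from the source: any ENF pixel that fails the ENFt criteria is treated as ENFb, and the "monthly precipitation" criterion is applied to a single value supplied by the caller.

```python
def classify_enf(tc, gdd5):
    """Split evergreen needleleaf forest into temperate or boreal.

    tc: minimum temperature of the coldest month (deg C).
    gdd5: accumulated growing degree-days above 5 deg C.
    Assumption: pixels failing the ENFt criteria are labelled ENFb.
    """
    if tc > -19.0 and gdd5 > 1200.0:
        return "ENFt"
    return "ENFb"


def classify_grass(tc, gdd5, monthly_precip_mm):
    """Split grassland into C4 or C3 grass using the quoted thresholds.

    Assumption: monthly_precip_mm is one representative monthly value;
    the text does not say which month or statistic is used.
    """
    is_c4 = gdd5 > 1000.0 and tc > 22.0 and monthly_precip_mm > 25.0
    return "C4G" if is_c4 else "C3G"


# Hypothetical pixels.
print(classify_enf(tc=-10.0, gdd5=1500.0))
print(classify_grass(tc=24.0, gdd5=3000.0, monthly_precip_mm=40.0))
```

The further split of C3 grass into C3Gw, C3Gk, and C3Gc relies on rules in Supplementary Table 2, which are not reproduced here.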

The ESA CCI PFT map was translated from the ESA CCI land cover product at a 300-metre resolution from 1992 to 2020, with high-resolution auxiliary datasets such as tree cover and tree canopy height data34,76,77. Pixels classified as tree types (e.g., EBF, DBF, ENF, and DNF) in the land cover maps were assigned to corresponding PFTs, with tree cover percentage determined using a 30-metre resolution tree cover dataset. For shrubland classes (a mix of tree and shrub woody vegetation types), a tree canopy height dataset was used to separate shrubs from trees34,75.

The CGLS_LC100 provided the annual fraction of land cover classes, including EBF, ENF, DNF, DBF, SHR, mixed forest (MF), unknown forest (UNF), cropland and grassland, at a 100-m resolution from 2015 to 201935. The GLC_FCS30D provided the categorical information for the same classes on a 30-m grid from 1985 to 2022, updated every five years before 2000 and annually thereafter36. The GLASS-GLC provided the annual distribution of forest (F), SHR, grassland and cropland at 5-km resolution from 1982 to 201537. After aggregating each land cover map to 0.5-degree resolution, the MF fractions were translated to forest PFTs and SHR according to the translation rules (Supplementary Table 2). The UNF and F were translated to EBF, ENF, DBF, and DNF PFTs according to the same criteria. To make the RS PFT maps compatible with those from DGVMs, we further divided the RS PFTs using environmental constraints78 when needed (e.g., from ENF to ENFt and ENFb).

Since all the RS PFT maps lack explicit information on C4/C3 grass and C4/C3 crop classification, we used a newly developed C4 vegetation map30 to divide croplands and grasslands into their C3 and C4 counterparts. The C4 vegetation map was generated using a global database of photosynthetic pathways, satellite observations, and a photosynthetic optimality theory. It contains C4 grass and C4 crop distribution over the globe at a 0.5-degree resolution from 2001 to 201929.

We quantified the uncertainty in the RS PFT maps as one standard deviation of the PFT area among the MODIS, ESA CCI, CGLS_LC100, GLC_FCS30D and GLASS-GLC maps. We did not consider the production uncertainty of individual RS PFT maps because such assessments are lacking for most products (Supplementary Table 3)28,65.
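As a minimal sketch of this spread calculation, the Python snippet below computes one standard deviation of the total area of each PFT across the land cover products. The function name and the nested-dictionary input layout are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, since the text does not state which estimator was applied.

```python
from statistics import stdev


def pft_area_uncertainty(areas_by_product):
    """Return {pft: standard deviation of area across products}.

    areas_by_product: {product_name: {pft_name: total_area}}.
    Only PFTs present in every product are evaluated.
    Assumption: sample standard deviation (n - 1 denominator).
    """
    products = list(areas_by_product.values())
    common_pfts = set(products[0])
    for areas in products[1:]:
        common_pfts &= set(areas)
    return {
        pft: stdev([areas[pft] for areas in products])
        for pft in sorted(common_pfts)
    }
```

With five products the choice between sample and population estimators changes the result by a factor of about 1.12, so the assumption matters when comparing against reported values.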

The relationship between PFT total GPP and PFT area

In this study, we identified the relationships between the simulated total GPP and total area for each PFT across DGVMs (i.e., GPP-area relationships). We obtained the multi-year average total GPP and total area for each PFT over the period 2001-2019, and examined the relationships between them.

$${{GPP}}_{{PFT}\_i}={\beta }_{i}\times {{area}}_{{PFT}{{\_}}i}$$
(1)

where \({\beta }_{i}\) is the slope of the relationship between total GPP (\({{GPP}}_{{PFT\_i}}\)) and area (\({{area}}_{{PFT\_i}}\)) across DGVMs for the \({i}^{{th}}\) PFT. The index \(i\) ranges from 1 to 12, representing EBF, ENFt, ENFb, DNF, DBF, SHR, C3Gw, C3Gk, C3Gc, C4G, C3C, and C4C, respectively. We forced the intercepts of the linear models to zero, based on the assumption that PFT-specific GPP should be zero if the PFT area is zero. We also examined the relationships without this assumption. We estimated the uncertainties in \({\beta }_{i}\) through a bootstrapping method73,79, in which one DGVM was removed at a time and the regression was refitted to obtain the upper and lower bounds of the relationships.
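To illustrate the fitting procedure, the sketch below estimates a zero-intercept slope by ordinary least squares, for which the closed form is the sum of area × GPP divided by the sum of squared areas, and then refits the slope with each model left out in turn. The function names are hypothetical, and taking the minimum and maximum of the leave-one-out slopes as the bounds is an assumption about how the upper and lower boundaries were derived.

```python
def slope_through_origin(areas, gpps):
    """Least-squares slope of gpp = beta * area with the intercept fixed at zero."""
    sum_xx = sum(a * a for a in areas)
    if sum_xx == 0:
        raise ValueError("all areas are zero; slope is undefined")
    return sum(a * g for a, g in zip(areas, gpps)) / sum_xx


def leave_one_model_out(areas, gpps):
    """Return (beta_all, beta_low, beta_high).

    areas, gpps: lists with one entry per DGVM for a single PFT.
    Assumption: bounds are the min and max slope over the refits.
    """
    beta_all = slope_through_origin(areas, gpps)
    refits = [
        slope_through_origin(areas[:k] + areas[k + 1:], gpps[:k] + gpps[k + 1:])
        for k in range(len(areas))
    ]
    return beta_all, min(refits), max(refits)
```

Because the intercept is fixed at zero, models with a large PFT area carry more weight in the slope than they would in a regression with a free intercept.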

Based on the GPP-area relationships and the area of each PFT from the RS PFT maps (\({{RSarea}}_{{PFT\_i}}\)), we estimated the constrained global GPP (\({{GPP}}_{{global\_C}}\)).

$${{GPP}}_{{global}{{\_}}C}=\sum _{i=1}^{12}{\beta }_{i}\times {{RSarea}}_{{PFT}{{\_}}i}$$
(2)
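Equation (2) is a sum of slope × area products over the 12 PFTs. A minimal Python sketch is given below. The function name and the dictionary inputs keyed by PFT name are hypothetical, and the units of the result follow from whatever units the slopes and areas carry.

```python
def constrained_global_gpp(beta, rs_area):
    """Sum beta_i * RSarea_i over all PFTs that have a fitted slope.

    beta: {pft_name: slope of the GPP-area relationship}.
    rs_area: {pft_name: PFT area from a remote sensing map}.
    """
    missing = sorted(set(beta) - set(rs_area))
    if missing:
        raise KeyError(f"no remote sensing area for: {missing}")
    return sum(beta[pft] * rs_area[pft] for pft in beta)
```

Applying the function separately to each RS PFT map gives one constrained estimate per product.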

We also identified a relationship between global ecosystem respiration (ER) and global GPP across DGVMs,

$${{ER}}_{{global}}=\gamma \times {{GPP}}_{{global}}$$
(3)

where \(\gamma\) is the slope, and \({{ER}}_{{global}}\) and \({{GPP}}_{{global}}\) stand for the global ER and global GPP in each DGVM. The global ER in each model was calculated as the sum of the available autotrophic and heterotrophic respiration. Combining \(\gamma\) and the constrained global GPP (\({{GPP}}_{{global\_C}}\)), we estimated the constrained global ER (\({{ER}}_{{global\_C}}\)).

$${{ER}}_{{global}{{\_}}C}=\gamma \times {{GPP}}_{{global}{{\_}}C}$$
(4)
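Equations (3) and (4) can be sketched in the same way. The example below sums autotrophic and heterotrophic respiration per model, fits a zero-intercept slope across models, and scales a constrained GPP value. The function names are hypothetical, and the least-squares estimator is an assumption, as the fitting method for \(\gamma\) is not stated beyond the form of Equation (3).

```python
def fit_gamma(global_gpp, autotrophic_resp, heterotrophic_resp):
    """Zero-intercept least-squares slope of ER against GPP across models.

    Each argument is a list with one global total per DGVM.
    ER per model is the sum of autotrophic and heterotrophic respiration.
    """
    er = [ra + rh for ra, rh in zip(autotrophic_resp, heterotrophic_resp)]
    sum_xx = sum(g * g for g in global_gpp)
    if sum_xx == 0:
        raise ValueError("all GPP values are zero; slope is undefined")
    return sum(g * e for g, e in zip(global_gpp, er)) / sum_xx


def constrained_global_er(gamma, gpp_global_c):
    """Equation (4): constrained ER from gamma and the constrained GPP."""
    return gamma * gpp_global_c
```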

Machine learning emulators

To assess the relative importance of biogeography and biogeochemistry in GPP modelling, we developed a series of emulators for DGVMs by training random forest models globally for each PFT of each DGVM. Specifically, each emulator was trained to predict the PFT GPP rate at the grid-cell level using annual climate forcings (temperature, precipitation, radiation, pressure, and wind speed) from the CRU JRA v2.1 datasets74 from 2001 to 2019, consistent with the climate forcing data for DGVMs in the TRENDY v9 project. We excluded the LPJ-GUESS model from the emulator analysis because it does not provide the pixel-level PFT GPP rate needed for training. We conducted the training in MATLAB R2024b, with automatic tuning of key parameters, including the number of trees and the minimum leaf size. These emulators were independently validated via cross-validation and showed consistently high predictive accuracy (mean R² values > 0.8 for all PFTs, Supplementary Table 4). After developing these emulators, we used them to estimate the PFT-specific GPP rate within each grid cell and obtained PFT-specific GPP using a consistent PFT fraction map from MODIS33. We then aggregated the PFT-specific GPP of all grid cells to calculate global GPP.
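The aggregation step at the end of this procedure can be illustrated as follows. The Python sketch multiplies a per-area GPP rate by the PFT fraction and the grid cell area, then sums over cells. It does not reproduce the random forest training itself. The function names, the spherical-Earth radius, the 0.5-degree cell size, and the input units (kgC m⁻² yr⁻¹) are all assumptions made for the example.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean radius of a spherical Earth


def cell_area_m2(lat_south_deg, res_deg=0.5):
    """Area of a lat-lon grid cell whose southern edge is at lat_south_deg."""
    phi_south = math.radians(lat_south_deg)
    phi_north = math.radians(lat_south_deg + res_deg)
    return (
        EARTH_RADIUS_M ** 2
        * math.radians(res_deg)
        * (math.sin(phi_north) - math.sin(phi_south))
    )


def global_gpp_pgc(cells, res_deg=0.5):
    """Aggregate PFT-specific GPP over grid cells and return PgC per year.

    cells: iterable of (lat_south_deg, gpp_rate_kgC_m2_yr, pft_fraction).
    """
    total_kgc = sum(
        rate * fraction * cell_area_m2(lat, res_deg)
        for lat, rate, fraction in cells
    )
    return total_kgc / 1e12  # 1 PgC = 1e12 kgC
```

Holding the PFT fraction map fixed across emulators means that differences in the aggregated totals come from the emulated GPP rates rather than from PFT distribution.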

The contribution of PFT changes to GPP changes

Using the relationships between PFT-specific GPP and area, we also estimated the contribution of PFT changes driven by elevated CO2 and climate change to the GPP changes. Specifically, we first obtained the simulated PFT area from DGVMs under the S0 (time-invariant CO2 and climate), S1 (varying CO2 and time-invariant climate), and S2 (varying CO2 and climate) scenarios. By subtracting the PFT areas of S0 from those of S1, we calculated the PFT area changes driven by elevated CO2. Similarly, by subtracting the PFT areas of S1 from those of S2, we obtained the PFT area changes driven by climate change. We then applied the PFT area changes to the GPP-area relationship (Fig. 1) to infer the associated change in GPP. These PFT-driven changes in GPP were further compared against the total GPP changes driven by elevated CO2 and climate change. Note that in this analysis, we used only the four DGVMs (i.e., JULES, LPJ-GUESS, LPX-Bern, and SDGVM) that dynamically adjust PFT distribution over time (Supplementary Table 1).
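The scenario differencing described above can be sketched as below, assuming the slopes of the GPP-area relationships are available per PFT. The function name and dictionary inputs are hypothetical, and the sketch covers only the PFT-driven part of the GPP change, not the comparison against total GPP changes.

```python
def pft_driven_gpp_change(beta, area_from, area_to):
    """GPP change implied by PFT area changes between two scenarios.

    beta: {pft_name: slope of the GPP-area relationship}.
    area_from, area_to: {pft_name: PFT area} for the two scenarios,
    e.g. (S0, S1) for elevated CO2 or (S1, S2) for climate change.
    """
    return sum(
        beta[pft] * (area_to[pft] - area_from[pft])
        for pft in beta
    )
```

For example, calling the function with the S0 and S1 areas isolates the CO2-driven component, and calling it with the S1 and S2 areas isolates the climate-driven component.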