GPP estimation based on CatBoost and analysis of change driving factors in Shanxi Province, China

Li, Yujie; Liu, Xuanguang; Zhang, Zhenchao; Lang, Jun

doi:10.1038/s41598-025-08927-x


Article
Open access
Published: 01 July 2025


Yujie Li^1,2,
Xuanguang Liu^1,3,
Zhenchao Zhang^1,3 &
…
Jun Lang²

Scientific Reports volume 15, Article number: 22346 (2025)

Abstract

The gross primary productivity (GPP) of Shanxi Province, China, plays an important role in the carbon cycle of the Loess Plateau ecosystem. However, Shanxi Province lacks carbon flux stations, leading to imprecise GPP estimation results. Additionally, few studies have explored the drivers of long-term GPP change in Shanxi Province. Therefore, in this study, we aimed to estimate the GPP in Shanxi Province from 2001 to 2022 and determine the driving factors of long-term GPP trends. To this end, we proposed an improved GPP estimation method based on the CatBoost model. Our CatBoost GPP model reduces model overfitting in few-shot scenarios and effectively captures the time dependence in time-series data. In addition, it integrates the change characteristics of vegetation ecological indicators and topography constraints, improving GPP estimation accuracy. Subsequently, we explored the spatiotemporal variations in GPP and their driving forces using Theil–Sen median trend analysis and the Geodetector model. Our results show that (1) Compared with existing methods, the proposed CatBoost GPP method achieved superior site-level accuracy, with an $R^{2}$ value of 0.890, root mean square error (RMSE) of 1.155 gC$\,\textrm{m}^{-2}\,\textrm{day}^{-1}$, and mean absolute error (MAE) of 0.772 gC$\,\textrm{m}^{-2}\,\textrm{day}^{-1}$. Furthermore, we compared our results with previous GPP products to further assess the regional-level accuracy; (2) The GPP in Shanxi Province displayed a fluctuating increase, with a growth rate of 20.58 gC$\,\textrm{m}^{-2}\,\textrm{year}^{-1}$ from 2001 to 2022. The overall spatial variation was characterized by low GPP in the northwest and high GPP in the southeast. The GPP change was mainly characterized by weak anti-persistence; thus, approximately 58.8% of the area may experience degradation in the future; and (3) Land use type significantly influenced GPP changes in Shanxi, with the restoration and improvement of grassland being the main contributor to the increase in GPP. The interaction between precipitation and temperature had the most complex and significant impact on GPP, affecting approximately 62.05% of the study area. The results of this study provide a theoretical basis for ecological protection and sustainable development in Shanxi Province.

Introduction

Terrestrial ecosystems are vital carbon reservoirs that play a key role as global carbon sinks¹. Gross primary productivity (GPP) is an important indicator for evaluating the health of terrestrial ecosystems² and reflects the total amount of organic matter produced through photosynthesis. Furthermore, it influences the rate at which plants absorb carbon dioxide from the atmosphere³. Therefore, the accurate estimation of GPP is of theoretical and practical importance for quantifying the carbon sink functions of terrestrial ecosystems⁴.

Due to the complexity of GPP field measurements and the limitations of observational scales⁵, regional-scale GPP estimation generally relies on models, such as BIOME-BGC⁶ and EC-LUE⁷, that are based on remote sensing data. However, it is challenging to represent complex nonlinear interactions between different vegetation types and the exchange rate between carbon dioxide and water vapor using these models^8,9. In addition, interference from specific environmental stressors such as moisture, temperature, and extreme weather reduces the accuracy and reliability of these models^10,11.

In recent years, with the development of machine learning (ML) technologies, ML-based GPP estimation methods have become a research hotspot^12,13. For example, Papale et al.¹⁴ used an artificial neural network to construct a GPP estimation model suitable for Europe. Yao et al.¹⁵ generated a carbon sink dataset for Chinese terrestrial ecosystems with a spatial resolution of 1 km based on model tree regression. Subsequently, Cho et al.¹⁶ utilized a support vector machine (SVM) to estimate the national-scale GPP of South Korea from 2000 to 2018. Recently, Guo et al.¹⁷ and Sarkar et al.¹⁸ used random forest (RF) models to develop time-series GPP datasets. These models were trained using site observation data (SOD) and extrapolated using remote sensing data^19,20. These ML-based methods efficiently capture the complex nonlinear relationships in a dynamic system of carbon fluxes^21,22. Compared with traditional ecological process models, ML-based methods achieve superior GPP estimation accuracy²³. However, most ML-based GPP estimation methods do not account for the effects of collinearity between variables, leading to low accuracy¹⁸. To address this problem, Lu et al.²⁴ developed a deep learning method to merge 23 CMIP6 datasets to generate highly accurate monthly GPP products with a spatial resolution of $0.25^{\circ }$. Deep learning models improve the accuracy of GPP estimation, but the complexity of network structures and large demand for training data lead to high computation and training costs. Conversely, the novel gradient-boosting algorithm CatBoost²⁵ has a lightweight model structure. It is largely insensitive to collinearity between variables and overcomes gradient biases and prediction shifts, efficiently handling time-series data. CatBoost has been applied to remote sensing estimation tasks^26,27. For example, Zheng et al.²⁸ used CatBoost as a comparative model to estimate the annual GPP in Europe and North America. Additionally, Liu et al.²⁹ integrated the extreme gradient boosting (XGBoost), CatBoost, and Shapley additive explanations (SHAP) models to estimate gross ecosystem productivity. However, the existing models fail to effectively integrate the temporal dynamics of vegetation parameters and lack adaptive modeling of terrain heterogeneity during the design process.

Shanxi Province is situated in the eastern part of the Loess Plateau (LP), within the mid-section of the Yellow River Basin in northern China. It serves as a crucial component of China’s national ecological security strategy (“Two Screens and Three Belts”) and is a key region for national soil and water erosion control projects³⁰. The ecological environment in this region is extremely fragile. Meanwhile, Shanxi Province is an energy-intensive heavy chemical industry base that is primarily driven by coal resource development. Large-scale and frequent coal mining and the development of high-energy-consuming industries have continuously increased the concentration of greenhouse gases in the atmosphere³¹, negatively affecting ecological health and development. However, there is a lack of carbon flux towers in Shanxi Province, and few studies have examined long-term GPP changes in this region. Furthermore, previous research mainly focused on the Yellow River Basin³² and LP³³. For example, Zhang et al.³³ explored the vulnerability of the LP ecosystem based on the GPP. Meanwhile, Gong et al.³⁴ used a modified vegetation photosynthesis model to obtain a long-term series of GPP in the LP from 2001 to 2022 and investigate the impacts of different land use patterns and meteorological factors. Previous studies^35,36,37 have revealed the spatiotemporal patterns of GPP in the LP and the Yellow River Basin, as well as their climatic and ecological driving mechanisms. However, as a core component of a low-carbon economic zone, Shanxi Province exhibits unique characteristics, such as coal mining pressure, a fragile ecosystem, and a “green, low-carbon and energy economy” synergistic development model that has not yet been adequately explored in existing models. Therefore, it is crucial to comprehensively quantify the impacts of factors such as climate, topography, human activities, and their interactions on the GPP in Shanxi Province.

In this study, we developed a CatBoost-based GPP estimation method. Using this method, we aimed to achieve long-term GPP estimation and explore the driving factors of GPP in Shanxi Province. The three major contributions of this study can be summarized as follows: (1) We proposed a CatBoost-based GPP estimation model that effectively captures temporal dependence and embeds terrain constraints, achieving better results than comparative methods. (2) We applied the CatBoost GPP model to estimate long-term GPP from 2001 to 2022 in Shanxi Province, analyze the GPP spatiotemporal change trends, and identify GPP declines and potential degradation areas. (3) We further explored the driving mechanisms of GPP changes and elucidated the contributions of natural and anthropogenic factors to GPP in Shanxi Province.

Materials and methods

Study area

Shanxi Province, China ($34^\circ 34.8'$–$40^\circ 43.4'$N, $110^\circ 14.6'$–$114^\circ 33.4'$E) is located in the eastern LP and is bordered to the west and east by the Lvliang and Taihang mountains, respectively (Fig. 1). The natural environment within the study region is complex and includes diverse landform types. The region is characterized by uneven spatiotemporal water and heat distributions influenced by the mid-latitude temperate continental monsoon climate and geographical factors. Soil erosion and water loss are severe in Shanxi Province^38,39 primarily because most areas are arid with low rainfall³⁰ and are prone to frequent meteorological disasters. Furthermore, the coarse texture and loose structure of the soil as well as human activities weaken its erosion resistance.

Shanxi Province is known in China as the “hometown of coal” and is rich in mineral resources⁴⁰. However, coal mining and steel smelting have caused severe environmental pollution in recent decades. Therefore, the Shanxi provincial government implemented ecological protection measures in 1997, and consequently, the ecological environment was gradually restored and improved. It is important to accurately monitor vegetation productivity and explore its driving forces in this region.

Data sources and preprocessing

Construction of the modeling dataset

We collected eddy covariance (EC) flux data, moderate-resolution satellite remote sensing imagery (MOD11A2, MOD13A3, MOD15A2H, and MCD12Q1), and digital elevation model (DEM) data to construct the GPP dataset.

The EC flux data were obtained from the ChinaFlux (http://www.chinaflux.org/) and FLUXNET2015 (http://fluxnet.fluxdata.org) datasets. Flux data were measured using the EC method.

For the EC flux data from ChinaFLUX, we obtained flux data measured every half hour, including net ecosystem exchange (NEE, gC$\, \textrm{m}^{-2}\, \textrm{day}^{-1}$) and ecosystem respiration (RE, gC$\, \textrm{m}^{-2}\, \textrm{day}^{-1}$). These data underwent standardized quality control by site researchers, including coordinate rotation, outlier filtering, Webb-Pearman-Leuning correction, gap filling, and flux partitioning⁴¹. The processed data, which have high reliability for GPP modeling^42,43, were aggregated to a daily resolution in ChinaFLUX. Additional details of the data processing method can be found in Yu⁴⁴. In this study, the daily GPP was derived from NEE and RE data using the following equation:

$$\begin{aligned} GPP = RE - NEE, \end{aligned}$$

(1)

The FLUXNET2015 dataset provided two GPP estimates (gC$\, \textrm{m}^{-2}\, \textrm{day}^{-1}$) derived from NEE partitioning using the variable Ustar (u*) threshold (VUT): (1) night-time method (GPP_NT_VUT_REF)⁴¹, and (2) day-time method (GPP_DT_VUT_REF)⁴⁵. All GPP data were calculated using the average of the night- and day-time partitioning methods. Specifically, we removed data points where the day-time and night-time partitioning results differed by more than 3 gC $\textrm{m}^{-2}\, \textrm{day}^{-1}$, and replaced them with the average of the 2 days before and after the current day⁴⁶.
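The two derivations above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function names are hypothetical, and two details are assumptions not stated in the text, namely that flagged neighbouring days are skipped when averaging and that a day with no usable neighbours is returned as NaN.

```python
from statistics import mean


def gpp_from_re_nee(re, nee):
    """Eq. (1): daily GPP from ecosystem respiration and net ecosystem exchange."""
    return re - nee


def combine_partitioned_gpp(gpp_nt, gpp_dt, max_diff=3.0, window=2):
    """Average night-time and day-time GPP and patch days where they disagree.

    Days where |NT - DT| exceeds `max_diff` (gC m-2 day-1) are replaced by the
    mean of the combined values up to `window` days before and after.
    """
    n = len(gpp_nt)
    combined = [(nt + dt) / 2.0 for nt, dt in zip(gpp_nt, gpp_dt)]
    flagged = [abs(nt - dt) > max_diff for nt, dt in zip(gpp_nt, gpp_dt)]
    result = list(combined)
    for i in range(n):
        if not flagged[i]:
            continue
        # Assumption: neighbours that are themselves flagged are not used.
        neighbours = [
            combined[j]
            for j in range(max(0, i - window), min(n, i + window + 1))
            if j != i and not flagged[j]
        ]
        result[i] = mean(neighbours) if neighbours else float("nan")
    return result
```

For example, a day on which the night-time estimate is 8 and the day-time estimate is 2 gC m-2 day-1 would be flagged and replaced by the mean of the surrounding four days.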

Furthermore, the EC flux data were excluded if less than 80% of the data were observed during the year, and stations with less than 2 years of observations were not used. Following screening, we used 14 EC ground stations, including eight ChinaFLUX sites and six FLUXNET2015 sites. Table 1 shows the EC ground station locations, vegetation information, and observation period.

We utilized MOD11A2, MOD13A3, MOD15A2H, and MCD12Q1 data products (Table 2) collected by the National Aeronautics and Space Administration from 2001 to 2022. We reclassified the land use type according to the International Geosphere-Biosphere Programme vegetation classification system (Table 3) to match the flux site observation information^47,48. Simultaneously, we verified whether the land use raster pixels within 1000 m around the observation site had the same land use type as that listed on the official website of the observation site to ensure data consistency. Due to cloud coverage and instrument failures, there was noise in the data products. We processed the MOD11A2, MOD15A2H, and MOD13A3 products to ensure temporal continuity. We searched for pixels based on the image quality control information. When the quality control information indicated poor quality, the average of the pixel values in the previous and next images was used to fill in the missing data. If the quality of both neighboring periods was also poor, the multi-year average of the good-quality images was used to fill in the data. Finally, the data were smoothed using the Savitzky-Golay filter to obtain a smoother MODIS time series dataset.
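The gap-filling rule can be sketched as follows for a single pixel's time series. This is a hedged illustration: the function name and arguments are hypothetical, "multi-year average" is interpreted as the mean of good-quality values from the same composite period in other years, and the fallback is applied whenever either neighbour is poor (the text only describes the case where both are). The subsequent smoothing could be done with, for example, `scipy.signal.savgol_filter`; the window length and polynomial order are not given in the text.

```python
def fill_poor_quality(values, good, periods_per_year):
    """Fill poor-quality composites in one pixel's flat, year-after-year series.

    values: list of numbers; good: same-length list of booleans from the QC band;
    periods_per_year: number of composites per year (e.g. 46 for 8-day products).
    """
    n = len(values)
    filled = list(values)
    for i in range(n):
        if good[i]:
            continue
        prev_ok = i > 0 and good[i - 1]
        next_ok = i + 1 < n and good[i + 1]
        if prev_ok and next_ok:
            # Average of the previous and next composites.
            filled[i] = (values[i - 1] + values[i + 1]) / 2.0
            continue
        # Assumed fallback: same composite period in other years, good quality only.
        same_period = [
            values[j]
            for j in range(i % periods_per_year, n, periods_per_year)
            if j != i and good[j]
        ]
        if same_period:
            filled[i] = sum(same_period) / len(same_period)
        # If nothing usable exists, the original value is left unchanged.
    return filled
```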

Table 1 GPP observation site information.


Table 2 Description of MOD11A2, MOD13A3, MOD15A2H, and MCD12Q1 data. Target Band Name indicates the band of each product needed for our study.


We collected the DEM data from the SRTM 90 m Digital Elevation Database. Subsequently, we input the DEM data into the ArcGIS slope calculation tool to generate slope data.

To reduce the effects of missing data on model accuracy, we processed the modeling data using the following steps:

(1)
The daily values of GPP were aggregated to an 8 d resolution using a sum aggregation strategy to match the remote sensing data intervals. Additionally, the SOD were projected onto the WGS1984 coordinate system.
(2)
We resampled seven types of feature data, including time, temperature, Fraction of Absorbed Photosynthetically Active Radiation (FPAR), Normalized Difference Vegetation Index (NDVI), land use type, DEM, and slope, to a consistent 1000 m grid. Subsequently, we collected pixel points within a Euclidean distance of 1000 m from the observation site. Based on the aforementioned steps, invalid values were removed from the collected points. Finally, the mean of the remaining pixel values was calculated and used as the representative value for the observation site.
(3)
The temporal resolution of all feature data was converted to 8 d. We spatially matched the SOD and feature data through spatial linkages (Fig. 2). Following vectorization, the data were organized into columns to construct the GPP dataset (each column represents a unique feature). The dataset consisted of eight columns: GPP observation value, time, temperature, FPAR, NDVI, land use type, DEM, and slope. The final dataset contained 3864 samples, split into training and validation sets in an approximately 7:3 ratio.
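Steps (1) and (2) can be sketched as below. The helper names are hypothetical, and the sketch assumes that pixel and site coordinates are already in a projected system with metre units so that the 1000 m Euclidean distance can be computed directly; the text does not state how the distance was evaluated.

```python
import math


def aggregate_8day(daily_gpp):
    """Sum one calendar year of daily GPP into 8-day composites.

    The last composite of the year is shorter (5 or 6 days), as in MODIS products.
    """
    return [sum(daily_gpp[s:s + 8]) for s in range(0, len(daily_gpp), 8)]


def site_mean(pixels, site_xy, radius=1000.0, nodata=None):
    """Mean of valid pixel values within `radius` metres of the site.

    pixels: iterable of (x, y, value); invalid values are None, NaN, or `nodata`.
    """
    kept = [
        v
        for x, y, v in pixels
        if v is not None and v == v and v != nodata  # v == v is False for NaN
        and math.hypot(x - site_xy[0], y - site_xy[1]) <= radius
    ]
    return sum(kept) / len(kept) if kept else float("nan")
```

Each row of the modeling dataset then pairs one 8-day GPP sum with the site-level means of the feature layers for the same period.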

Table 3 Reclassification of the land use type.


Factor data

We selected 2005, 2010, 2015, and 2020 as the key research nodes. To analyze the drivers of GPP spatial variation in Shanxi Province, we selected seven variables, including four natural variables (elevation, slope, temperature, and precipitation) and three anthropogenic variables (GDP, population, and land use type), as driving factors (Table 4). Furthermore, we performed corresponding reprojection, resampling, and clipping operations on the data of these seven factors to align their spatiotemporal resolutions.

Table 4 Sources and details of factor data.


GPP validation dataset

To evaluate the accuracy of the proposed CatBoost GPP method, we compared the results of CatBoost GPP with those of two open-source GPP products: MODIS GPP⁶ and GOSIF GPP⁴⁹ (Table 5). Following resampling, the spatial resolutions of the MODIS GPP and GOSIF GPP data were 1000 and 5000 m, respectively. We removed invalid values from the collected pixels and calculated the average of the remaining pixel values as the site-specific value (as described in “Construction of the modeling dataset”) to create the GPP comparison dataset.

Table 5 GPP data product sources and overview.


Methods

The workflow of this study is illustrated in Fig. 2. First, we proposed a novel GPP estimation method based on the CatBoost technique. We used a self-constructed GPP dataset to train the proposed CatBoost GPP model and then applied the best model weights to estimate the GPP of Shanxi Province from 2001 to 2022. Second, we combined the Theil–Sen approach (TSA) and Mann–Kendall (MK) trend test to analyze GPP change trends. In addition, we utilized the Hurst exponent to explore the persistence of spatiotemporal variations in the GPP and predict future GPP trends. Third, we revealed the factors controlling the spatiotemporal variation in the GPP in Shanxi Province using a Geodetector model.

CatBoost GPP model construction

CatBoost is a gradient boosting decision tree framework²⁵. It employs ordered boosting instead of traditional gradient estimation methods to reduce gradient and prediction bias along with overfitting risks⁵⁰. Previous studies have demonstrated that CatBoost outperforms other decision tree methods when dealing with high-dimensional input features and noisy data⁵¹. For GPP estimation, the data exhibit complex periodic, non-stationary, and nonlinear patterns, making CatBoost particularly suitable for model construction. The algorithm automatically encodes categorical land use features into numerical values, minimizing encoding errors and mitigating bias from low-frequency categories. Additionally, CatBoost’s feature combination mechanism effectively addresses spatiotemporal heterogeneity (e.g., between static topographic features such as elevation and dynamic variables such as NDVI), enhancing the model’s environmental driver characterization. Hence, we developed a CatBoost GPP model to estimate the GPP of Shanxi Province. The construction of CatBoost GPP consisted of the following four steps: (1) We enumerated various segmentation methods to construct trees. The CatBoost GPP model grades each tree based on the predicted result accuracy during the training phase and assigns higher weights to decision trees with better predictive performance. (2) The GPP dataset was divided into training and validation sets as follows: data beyond the 5th year were considered as the validation set when the data at a site spanned more than 5 years; otherwise, final-year data were considered as the validation set. In total, 3864 samples, comprising 2806 training and 1058 validation samples, were collected. (3) We utilized the GPP training and validation samples to train CatBoost GPP. Guided by the loss function, root mean square error (RMSE) (Eq. 2), the proposed model gradually reduces the bias of noisy points to achieve an optimal regression tree structure. (4) We applied the best model trained on the self-constructed GPP dataset to estimate the GPP of Shanxi Province. The final GPP estimation result was obtained by calculating the weighted sum of all decision trees. The RMSE was calculated as follows:

$$\begin{aligned} RMSE=\sqrt{\frac{1}{N}\sum _{i=1}^{N}(Y_{0,i} - Y_{m,i} )^{2}}, \end{aligned}$$

(2)

where N denotes the total number of samples, and $Y_{0,i}$ and $Y_{m,i}$ are the observed and estimated values of sample i, respectively.
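The site-wise temporal split described in step (2) can be sketched as follows. This is an illustrative reading, not the authors' code: the function and the `site`/`year` keys are hypothetical, and a site's "span" is taken to be its number of distinct observation years.

```python
def temporal_split(samples, max_train_years=5):
    """Split samples per site: years after the 5th go to validation,
    or only the final year when a site has 5 or fewer years of data.

    samples: list of dicts with at least 'site' and 'year' keys.
    """
    years_by_site = {}
    for s in samples:
        years_by_site.setdefault(s["site"], set()).add(s["year"])

    train, valid = [], []
    for s in samples:
        years = sorted(years_by_site[s["site"]])
        if len(years) > max_train_years:
            held_out = set(years[max_train_years:])
        else:
            held_out = {years[-1]}
        (valid if s["year"] in held_out else train).append(s)
    return train, valid
```

Splitting by year within each site, rather than at random, keeps the validation samples later in time than the training samples for the same site.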

We implemented CatBoost GPP using the CatBoost regressor library in Python 3.7. The model was optimized during training using five parameters: n_estimators, learning_rate, max_depth, l2_leaf_reg, and loss_function²⁵. First, we used “RMSE” as the loss_function. Second, based on prior experience, we initially set n_estimators to 400 and selected the most probable values of the other parameters as the search space (Table 6). Then, we combined the grid search method with 10-fold cross-validation to determine the optimal parameter values of the CatBoost model. Finally, we tested all n_estimators values in the range of 50–1500 with an interval of 50 and observed that the RMSE stabilized when n_estimators = 750 (Fig. 3).
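A minimal sketch of this tuning procedure is given below, assuming the `catboost` and `scikit-learn` packages and a pandas DataFrame of features. The grid values are placeholders, since the search space of Table 6 is not reproduced here, and the function name and the `land_use` column name are hypothetical.

```python
# Placeholder search space; NOT the values from Table 6.
PARAM_GRID = {
    "learning_rate": [0.03, 0.05, 0.1],
    "max_depth": [4, 6, 8],
    "l2_leaf_reg": [1, 3, 5],
}

# Final sweep over the number of trees: 50 to 1500 in steps of 50.
N_ESTIMATORS_SWEEP = list(range(50, 1501, 50))


def tune_catboost_gpp(X, y, cat_features=("land_use",), n_estimators=400):
    """Grid search with 10-fold cross-validation, scored by RMSE.

    X: pandas DataFrame of features; y: observed GPP.
    Returns the best parameter combination and its cross-validated RMSE.
    """
    # Imported lazily so this module loads even without the two packages.
    from catboost import CatBoostRegressor
    from sklearn.model_selection import GridSearchCV

    model = CatBoostRegressor(
        n_estimators=n_estimators,
        loss_function="RMSE",
        cat_features=list(cat_features),  # land use is treated as categorical
        verbose=0,
    )
    search = GridSearchCV(
        model, PARAM_GRID, cv=10, scoring="neg_root_mean_squared_error"
    )
    search.fit(X, y)
    return search.best_params_, -search.best_score_
```

After the grid search, the model could be refitted for each value in `N_ESTIMATORS_SWEEP` with the selected parameters to locate the point at which the validation RMSE levels off.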

Table 6 Parameter range and optimal parameters for grid search.


Accuracy assessment and performance evaluation

We compared the results of CatBoost GPP with those of seven other models: SVM, RF, LightGBM, XGBoost, LSTM (Long Short-Term Memory), MODIS GPP, and GOSIF GPP. Three metrics were used to evaluate the accuracy of the results: $R^{2}$, RMSE, and mean absolute error (MAE). The equations for $R^{2}$ and MAE are as follows:

$$\begin{aligned} & R^{2} =\left[ \frac{ {\textstyle \sum _{i=1}^{N}}\left( Y_{0,i}-\overline{Y_{0} } \right) \left( Y_{m,i}-\overline{Y_{m} } \right) }{\sqrt{ {\textstyle \sum _{i=1}^{N}} \left( Y_{0,i}-\overline{Y_{0} } \right) ^{2} {\textstyle \sum _{i=1}^{N}\left( Y_{m,i}-\overline{Y_{m} } \right) ^{2}} } } \right] ^{2}, \end{aligned}$$

(3)

$$\begin{aligned} & MAE=\frac{1}{N} \sum _{i=1}^{N} \left| Y_{0,i}- Y_{m,i} \right| , \end{aligned}$$

(4)

where N denotes the total number of samples; $Y_{0,i}$ and $Y_{m,i}$ are the observed and estimated values of i, respectively; and $\overline{Y_{0}}$ and $\overline{Y_{m}}$ represent the averages of the observed and estimated data, respectively.
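Equations (2)–(4) translate directly into code. The sketch below uses only the standard library; the function names are illustrative.

```python
import math


def rmse(obs, est):
    """Eq. (2): root mean square error."""
    return math.sqrt(sum((o - e) ** 2 for o, e in zip(obs, est)) / len(obs))


def mae(obs, est):
    """Eq. (4): mean absolute error."""
    return sum(abs(o - e) for o, e in zip(obs, est)) / len(obs)


def r_squared(obs, est):
    """Eq. (3): squared Pearson correlation between observed and estimated values."""
    mo = sum(obs) / len(obs)
    me = sum(est) / len(est)
    cov = sum((o - mo) * (e - me) for o, e in zip(obs, est))
    so = sum((o - mo) ** 2 for o in obs)
    se = sum((e - me) ** 2 for e in est)
    return cov ** 2 / (so * se)
```

Because Eq. (3) is a squared correlation, a constant bias in the estimates leaves $R^{2}$ unchanged while raising RMSE and MAE, which is one reason to report all three metrics together.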

TSA and MK trend test

We combined the TSA⁵² and MK⁵³ trend tests to assess the GPP change trends. The TSA can be computed as follows:

$$\begin{aligned} S=Median\left( \frac{x_{j}-x_{i} }{j-i} \right) \end{aligned}$$

(5)

where $x_{j}$ and $x_{i}$ represent the GPP in years j and i, respectively ($2001\le i< j\le 2022$); $S> 0$ indicates an upward trend in the GPP in Shanxi Province during this time series, whereas $S< 0$ indicates a downward trend.
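For a single pixel, Eq. (5) can be sketched as the median of all pairwise slopes of its annual GPP series. The function name is illustrative, and the series is assumed to be equally spaced in time (one value per year).

```python
from statistics import median


def theil_sen_slope(series):
    """Eq. (5): median of the slopes between all pairs of years (i < j)."""
    n = len(series)
    slopes = [
        (series[j] - series[i]) / (j - i)
        for i in range(n - 1)
        for j in range(i + 1, n)
    ]
    return median(slopes)
```

Because the median is used, a single anomalous year has little effect: for the series 0, 1, 2, 100, 4 the estimated slope is still 1 per year.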

The MK trend test is a nonparametric statistical method used for trend analysis of time-series data. The equation is as follows:

$$\begin{aligned} Z=\left\{ \begin{array}{cl} \frac{S-1}{\sqrt{\operatorname {VAR}(S)}} & S>0, \\ 0 & S=0, \\ \frac{S+1}{\sqrt{\operatorname {VAR}(S)}} & S<0, \end{array}\right. \end{aligned}$$

(6)

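where Z is the standardized test statistic; S is the MK statistic, i.e., the sum of the signs of all pairwise differences $x_{j}-x_{i}$ ($i<j$), which is distinct from the Theil–Sen median slope in Eq. (5); and $\operatorname{VAR}(S)$ is its variance. At a given significance level $\alpha$, $\left| Z \right| > Z_{1-(\alpha /2 )}$ indicates a significant change in the data series at the $\alpha$ level. We selected a significance level of $\alpha =0.05$ to classify the GPP trend changes as shown in Table 7.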
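Eq. (6) can be sketched as follows for one pixel's series. Two assumptions are made that the text does not spell out: S is taken to be the usual Mann–Kendall sign statistic, and the variance uses the standard form without a correction for tied values, $\operatorname{VAR}(S)=n(n-1)(2n+5)/18$.

```python
import math


def mann_kendall_z(series):
    """Standardized Mann-Kendall statistic Z (Eq. 6), no tie correction."""
    n = len(series)
    # S: number of increasing pairs minus number of decreasing pairs.
    s = sum(
        (series[j] > series[i]) - (series[j] < series[i])
        for i in range(n - 1)
        for j in range(i + 1, n)
    )
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        return (s - 1) / math.sqrt(var_s)
    if s < 0:
        return (s + 1) / math.sqrt(var_s)
    return 0.0
```

At $\alpha = 0.05$ the critical value is approximately 1.96, so a pixel with $|Z| > 1.96$ would be classed as having a significant trend, with the sign of the Theil–Sen slope giving its direction.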

Table 7 Classification criteria of GPP temporal changes at a significance level of $\alpha =0.05$.


Hurst exponent

The Hurst exponent (H) quantitatively represents the persistence of time-series data. We distinguished five cases of H: (1) H approaches zero: greater anti-persistence; (2) $0<H<0.5$: future trends are expected to reverse from past trends; (3) $H = 0.5$: the series behaves randomly, with no long-term dependence; (4) $0.5< H < 1$: the process has continuous characteristics, and the future trend is consistent with the past; and (5) H approaches 1: greater persistence⁵⁴. We combined the TSA results and Hurst exponents to classify the results into five categories (Table 8). The classification results were used to analyze the sustainability of GPP changes.
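The text does not state how H was computed. A common choice for annual vegetation series is rescaled range (R/S) analysis, and the sketch below implements that variant as an assumption: for each sub-series length τ it computes the range R of the cumulative mean-adjusted sums and the standard deviation S, then takes H as the least-squares slope of log(R/S) against log(τ).

```python
import math


def hurst_rs(series):
    """Hurst exponent by R/S analysis (assumed method; not stated in the text)."""
    n = len(series)
    log_tau, log_rs = [], []
    for tau in range(2, n + 1):
        window = series[:tau]
        mean = sum(window) / tau
        cum, cum_dev = 0.0, []
        for x in window:
            cum += x - mean
            cum_dev.append(cum)
        r = max(cum_dev) - min(cum_dev)  # range of cumulative deviations
        s = math.sqrt(sum((x - mean) ** 2 for x in window) / tau)
        if r > 0 and s > 0:
            log_tau.append(math.log(tau))
            log_rs.append(math.log(r / s))
    m = len(log_tau)
    if m < 2:
        return float("nan")  # e.g. a constant series
    mx = sum(log_tau) / m
    my = sum(log_rs) / m
    sxx = sum((a - mx) ** 2 for a in log_tau)
    sxy = sum((a - mx) * (b - my) for a, b in zip(log_tau, log_rs))
    return sxy / sxx
```

With only 22 annual values per pixel, estimates of this kind are noisy, so values close to 0.5 are best read as weak evidence either way.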

Table 8 Classification of the sustainability of GPP changes.


Geodetector model

We used a Geodetector model to quantify the effects of different variables on the spatial distribution of GPP in Shanxi Province by performing factor and interaction detection.

(1) Factor detection: We quantified the magnitude of the influence of environmental factors on GPP. A factor that significantly influences the GPP should have a spatial distribution that closely resembles that of the GPP. Suppose $q\in \left[ 0, 1 \right]$ refers to the explanatory power of a single factor on GPP spatial variation, and $h = 1, \ldots , L$ indexes the strata (categories) of factor X. The numbers of units in stratum h and in the whole region are $N_{h}$ and N, respectively, and $\sigma _{h}^{2}$ and $\sigma ^{2}$ are the variances of GPP within stratum h and over the whole study area, respectively. The value of q is calculated as follows⁵⁵:

$$\begin{aligned} q=1-\frac{ {\textstyle \sum _{h=1}^{L}}N_{h} \sigma _{h}^{2} }{N\sigma ^{2} } \end{aligned}$$

(7)

(2) Interaction detection: The interactive effect of the two drivers on GPP is determined by comparing the q-values of the two environmental factors after interaction $\left[ q(X_{1} \cap X_{2}) \right]$ with the q-values of the individual environmental factors $[q(X_{1})$ and $q(X_{2})]$⁵⁶ (Table 9).
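Both detectors can be sketched with the same function, since the interaction term is simply the q-value of the strata formed by overlaying the two factors. The function names are illustrative, and each factor is assumed to have already been discretized into categories.

```python
from statistics import pvariance


def q_statistic(values, strata):
    """Eq. (7): share of the variance of `values` explained by `strata`."""
    groups = {}
    for v, h in zip(values, strata):
        groups.setdefault(h, []).append(v)
    total_var = pvariance(values)
    if total_var == 0:
        return float("nan")  # no spatial variation to explain
    within = sum(len(g) * pvariance(g) for g in groups.values())
    return 1.0 - within / (len(values) * total_var)


def interaction_q(values, strata_1, strata_2):
    """q(X1 ∩ X2): q-value of the strata formed by overlaying two factors."""
    return q_statistic(values, list(zip(strata_1, strata_2)))
```

For example, with GPP values 1, 2, 3, 4 and factor categories (a, a, b, b) and (x, y, x, y), the single-factor q-values are 0.8 and 0.2 and the interaction q-value is 1.0. Comparing the interaction q-value with the two single-factor values and their sum determines the interaction type.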

Table 9 Types of interaction.


Partial and composite correlation analyses

We used partial⁵⁷ and composite correlation analyses⁵⁸ to establish classification criteria for climate-driven factors (Table 10) at a significance level of p = 0.05 to clarify the response mechanism of GPP to precipitation and temperature in Shanxi Province.
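For three variables, the partial and composite (multiple) correlation coefficients follow from the pairwise Pearson coefficients. The sketch below uses the standard first-order formulas; the function names are illustrative, and the significance tests applied at p = 0.05 are omitted.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def partial_corr(x, y, z):
    """Correlation between x and y with the linear effect of z removed."""
    r_xy, r_xz, r_yz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def multiple_corr(x, y, z):
    """Composite correlation of x with y and z taken together."""
    r_xy = pearson(x, y)
    r_xz_y = partial_corr(x, z, y)
    return math.sqrt(1 - (1 - r_xy ** 2) * (1 - r_xz_y ** 2))
```

With x as the annual GPP series of a pixel, y as temperature, and z as precipitation, `partial_corr(x, y, z)` corresponds to $PR_{GPP-T}$, `partial_corr(x, z, y)` to $PR_{GPP-P}$, and `multiple_corr(x, y, z)` to $MR_{GPP-T-P}$.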

Table 10 Rules of climatic driving factors for GPP variation. $PR_{GPP-T}$ is the partial correlation coefficient between GPP and temperature, $PR_{GPP-P}$ is the partial correlation coefficient between GPP and precipitation, and $MR_{GPP-T-P}$ is the composite correlation coefficient between GPP and temperature/precipitation.


Results

Model performance evaluation

Figure 4 shows the results of the accuracy evaluation of each method. The CatBoost GPP achieved the highest consistency with SOD ($R^{2}$ = 0.890, RMSE = 1.155 gC$\, \textrm{m}^{-2}\, \textrm{day}^{-1}$, and MAE = 0.772 gC$\, \textrm{m}^{-2}\, \textrm{day}^{-1}$). Compared to traditional estimation models (MODIS GPP and GOSIF GPP), CatBoost GPP exhibited a 0.438 higher $R^{2}$ value, 1.42 gC$\, \textrm{m}^{-2}\, \textrm{day}^{-1}$ lower RMSE, and 0.756 gC$\, \textrm{m}^{-2}\, \textrm{day}^{-1}$ lower MAE. Furthermore, CatBoost consistently outperformed all evaluated mainstream ML methods, including RF, SVM, LightGBM, XGBoost, and LSTM. Specifically, the $R^{2}$ of CatBoost improved by 1.14–2.53%, while the RMSE and MAE decreased by 3.91–8.48% and 3.86–7.66%, respectively. Notably, CatBoost achieved a 5.62% reduction in MAE compared to the second-best method, XGBoost. This demonstrates that the proposed method effectively addresses the overestimation and underestimation issues of the comparative models. CatBoost GPP accurately captures complex nonlinear relationships between GPP and feature variables. The fitted line of our method is closest to the ideal 1:1 line, and its predictions demonstrate a more uniform distribution around this line, indicating greater reliability compared with other methods.

Spatiotemporal variations in GPP in Shanxi Province from 2001 to 2022

The proposed method was applied to estimate the GPP in Shanxi Province from remote sensing images. We compared the results of CatBoost GPP with those of MODIS GPP and GOSIF GPP products to evaluate the accuracy and validity of our method. Finally, we analyzed the temporal and spatial changes in the GPP in Shanxi Province.

Temporal variations in GPP

The annual average GPP values from CatBoost GPP, MODIS GPP, and GOSIF GPP exhibited significant upward trends (Fig. 5). The CatBoost GPP values had an annual fluctuation range of 555.31–1008.14 gC$\, \textrm{m}^{-2}\, \textrm{year}^{-1}$, with a mean value of 844.45 gC$\, \textrm{m}^{-2}\, \textrm{year}^{-1}$ and an annual increase of $\sim$20.58 gC$\, \textrm{m}^{-2}\, \textrm{year}^{-1}$. The results of CatBoost GPP were consistent with those of MODIS GPP and GOSIF GPP in terms of trends and growth. All three annual GPP curves exhibited low values in 2001, 2005, 2009, 2011, 2015, and 2019, which may have been due to extreme meteorological events⁵⁹.

Spatial variations in GPP

We normalized the maximum value of GPP from 2001 to 2022 that was obtained using different methods. This decreased the discrepancies between GPP estimates, allowing us to accurately compare and analyze the spatial distribution patterns of GPP in Shanxi Province. The spatial distributions of the CatBoost GPP, GOSIF GPP, and MODIS GPP results were highly consistent (Fig. 6), exhibiting a pattern of “low in the northwest and high in the southeast,” with distinct banding characteristics. A high GPP was observed in the western Lvliang Mountains, eastern Taihang Mountains, and central and southern river valley plains. The GPP was low in the central basin, loess hilly belt along the Yellow River west of the Lvliang Mountains, and urban areas.

From the perspective of administrative regions, Jincheng had the highest GPP, followed by Yuncheng and Changzhi. The three cities with the lowest values were Xinzhou, Datong, and Shuozhou. Jincheng’s GPP was 970.15 gC$\, \textrm{m}^{-2}\, \textrm{year}^{-1}$, whereas Shuozhou’s GPP was only 462.15 gC$\, \textrm{m}^{-2}\, \textrm{year}^{-1}$. Spatial differences in the GPP were affected by both natural and anthropogenic factors. Shuozhou had a higher latitude, lower precipitation, and lower temperatures than Jincheng, leading to a more fragile ecosystem⁶⁰. Furthermore, during the study period, Shuozhou experienced rapid GDP growth and large-scale mining activities⁶¹, leading to low GPP.

Significant differences were observed between the GPP results of different methods because of differences in data sources, parameter settings, model selection, spatial scales, and estimation timescales. MODIS GPP estimates GPP based on a light-use efficiency model, while GOSIF GPP directly estimates the GPP based on the solar-induced chlorophyll fluorescence emitted by plants during photosynthesis, reflecting the actual growth status of vegetation. As GOSIF GPP considers various environmental stressors and anthropogenic factors, it can capture the spatial distribution in greater detail^62,63. The CatBoost GPP results were highly consistent with those of GOSIF GPP with respect to both temporal and spatial variations. However, CatBoost GPP yielded a more detailed spatial distribution with a spatial resolution of 1000 m. As GOSIF GPP and CatBoost GPP have different spatial resolutions, the $R^{2}$ value between CatBoost GPP and GOSIF GPP was 0.63 (Fig. 7a). Wang et al.⁶⁴, Liu et al.⁶⁵, and Yu et al.⁶⁶ reported that MODIS GPP underestimates the GPP, especially in northern China. The values obtained with CatBoost GPP were higher than those obtained with MODIS GPP, with an $R^{2}$ value of 0.73 between the two models (Fig. 7b), confirming that our method effectively addresses GPP underestimation.

In addition, we compared the CatBoost GPP results with the results of previous GPP estimation research (Table 11). Following cross-validation, our method obtained reasonable GPP results for Shanxi Province. Therefore, CatBoost GPP is suitable for investigating the spatiotemporal trends in and drivers of GPP in Shanxi Province.

Table 11 Estimation of GPP in Shanxi Province using different terrestrial models.


Annual GPP trends and consistency of future GPP trends

Overall, the GPP in Shanxi Province noticeably improved from 2001 to 2022 (Fig. 8a). The percentages of areas characterized by significant improvement, slight improvement, slight degradation, and considerable degradation were 87%, 9%, 3%, and 1%, respectively. The areas of significant decline were mainly concentrated in densely populated urban areas. The largest proportion of GPP decline (8.1%) occurred in Taiyuan, for two main reasons. First, Taiyuan’s urbanization process has accelerated, and the type of land cover has changed rapidly, with large areas of vegetation being converted to construction land. Second, the increase in carbon emissions from human activities indirectly led to a decrease in GPP. Meanwhile, approximately 98% of Linfen and Lvliang had significant increases in GPP. Ecological improvement in the two cities mainly stemmed from the Chinese government’s ecological projects, such as afforestation efforts in the Loess Plateau (LP) since the late 20th century⁷⁵.
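The per-pixel trend classes described above can be illustrated with a minimal Python sketch of the Theil-Sen median slope combined with a Mann-Kendall significance test. The pairing with Mann-Kendall, the 0.05 significance level, the mapping of results to the four class names, and all function names are assumptions made for illustration; they are not taken from the study's implementation.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist, median


def theil_sen_slope(values):
    """Median of all pairwise slopes for an evenly spaced (annual) series."""
    slopes = [(values[j] - values[i]) / (j - i)
              for i, j in combinations(range(len(values)), 2)]
    return median(slopes)


def mann_kendall_z(values):
    """Mann-Kendall Z statistic (simplified: no correction for ties)."""
    n = len(values)
    s = sum((values[j] > values[i]) - (values[j] < values[i])
            for i, j in combinations(range(n), 2))
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        return (s - 1) / sqrt(var_s)
    if s < 0:
        return (s + 1) / sqrt(var_s)
    return 0.0


def classify_trend(values, alpha=0.05):
    """Assign one of the four trend classes (class rules are assumed)."""
    slope = theil_sen_slope(values)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    significant = abs(mann_kendall_z(values)) > z_crit
    if slope > 0:
        return "significant improvement" if significant else "slight improvement"
    if slope < 0:
        return "considerable degradation" if significant else "slight degradation"
    return "no change"


# Hypothetical annual GPP series for one pixel, 2001-2022 (22 values).
gpp = [500 + 20 * year for year in range(22)]
print(theil_sen_slope(gpp), classify_trend(gpp))
```

In practice the same functions would be applied to every pixel of the annual GPP stack, and the class percentages would follow from counting pixels per class.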

The spatial distribution of the Hurst exponents for the GPP time series is shown in Fig. 8b. Regions in which the GPP trend was expected to remain consistent (0.5 < Hurst exponent < 1) covered 39.54% of the study area, whereas those in which the trend was expected to reverse (0 < Hurst exponent < 0.5) covered 60.46%. The average Hurst exponent for Shanxi Province was 0.48, suggesting weak anti-persistence in GPP development. This suggests that, although GPP may continue to rise, its growth rate is likely to decline gradually. Furthermore, the vegetation carbon sequestration capacity may stabilize or even decrease without human intervention.
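As an illustration of how a Hurst exponent can be obtained from a short annual series, the following Python sketch uses rescaled range (R/S) analysis: for each sub-series length it computes the range of cumulative deviations divided by the standard deviation, and then takes the slope of the log-log regression. The choice of R/S analysis on expanding windows and the function name `hurst_rs` are assumptions; the study may have used a different variant.

```python
from math import log, sqrt


def hurst_rs(series):
    """Estimate the Hurst exponent of a series by rescaled range analysis."""
    n = len(series)
    if n < 3:
        raise ValueError("need at least three observations")
    log_tau, log_rs = [], []
    for tau in range(2, n + 1):
        window = series[:tau]
        mean = sum(window) / tau
        # Cumulative deviations from the window mean.
        running, cum_dev = 0.0, []
        for x in window:
            running += x - mean
            cum_dev.append(running)
        r = max(cum_dev) - min(cum_dev)                       # range
        s = sqrt(sum((x - mean) ** 2 for x in window) / tau)  # std. deviation
        if r > 0 and s > 0:
            log_tau.append(log(tau))
            log_rs.append(log(r / s))
    # Ordinary least-squares slope of log(R/S) against log(tau).
    m = len(log_tau)
    mx, my = sum(log_tau) / m, sum(log_rs) / m
    sxx = sum((x - mx) ** 2 for x in log_tau)
    sxy = sum((x - mx) * (y - my) for x, y in zip(log_tau, log_rs))
    return sxy / sxx


# Hypothetical series: a steady trend (persistent) and an alternating one.
trend = [float(t) for t in range(1, 23)]
alternating = [1.0 if t % 2 == 0 else -1.0 for t in range(22)]
print(hurst_rs(trend) > 0.5, hurst_rs(alternating) < 0.5)
```

Values above 0.5 indicate that the past trend tends to persist, and values below 0.5 indicate a tendency to reverse, which matches the interpretation used in the text.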

We combined the trend analysis and Hurst exponent results to predict the sustainability of GPP variations (Fig. 8c). In Shanxi Province, 37.2% of regions exhibited continuous improvement, 58.8% exhibited degradation (a likely reversal of the current improving trend), 2.34% exhibited continuous degradation, and 1.67% exhibited improvement (a likely reversal of the current degrading trend). Notably, the improved and continuously degraded areas were concentrated in the centers and surrounding areas of the municipal administrative divisions. These areas had dense populations and underwent rapid land use changes, resulting in a high stochasticity of vegetation change. The persistently improved areas were mainly distributed in the Yuncheng Basin, Datong Basin, and Yellow River area on the western side of the Lvliang Mountains. The GPP degradation area was the largest, indicating that the ecological environment in most parts of Shanxi Province was unstable. Continued vegetation protection efforts are necessary to achieve sustainable development.
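The combination of trend direction and Hurst exponent can be sketched as a simple lookup. The rule below (trend sign from the Theil-Sen slope, persistence from whether the Hurst exponent exceeds 0.5) is an assumed reading of the four classes named above, and the function name is hypothetical.

```python
def sustainability_class(slope, hurst):
    """Combine the current trend direction with persistence (assumed rule)."""
    if slope == 0 or hurst == 0.5:
        return "undetermined"
    improving = slope > 0
    persistent = hurst > 0.5
    if improving and persistent:
        return "continuous improvement"
    if improving:
        return "degradation"            # improving now, expected to reverse
    if persistent:
        return "continuous degradation"
    return "improvement"                # degrading now, expected to reverse


# Hypothetical pixel: rising GPP with a Hurst exponent of 0.48.
print(sustainability_class(20.58, 0.48))
```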

Quantitative analysis of GPP changes

Effects of natural and anthropogenic factors

The degree of influence of each factor on GPP was in the following order: land use type > precipitation > temperature > slope > elevation > population > GDP (Fig. 9). The land use type, with a q-value of 0.46, had the strongest explanatory power for GPP changes, followed by precipitation, with a q-value of 0.32, which was the main natural factor influencing spatial GPP variation. The effects of various driving factors changed from 2001 to 2022. Among natural factors, the q-values of temperature and elevation increased continuously, whereas those of precipitation and slope decreased. The explanatory power of anthropogenic factors (land use type, population, and GDP) continuously increased. This indicates that human activities, such as ecological protection measures, natural resource management, and urban development, had a significant impact on GPP changes. Human activities positively contributed to the ecological development of Shanxi Province (Figs. 8a, 9).
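For readers unfamiliar with the q-value, the factor detector of the Geodetector method measures how much of the variance of GPP is explained by stratifying the study area according to a factor: q = 1 − Σ(N_h σ_h²) / (N σ²), where N_h and σ_h² are the sample count and variance within stratum h. The following Python sketch is a minimal illustration; the function name and the toy data are hypothetical.

```python
from collections import defaultdict
from statistics import pvariance


def geodetector_q(values, strata):
    """Factor-detector q statistic: share of variance explained by strata."""
    groups = defaultdict(list)
    for value, stratum in zip(values, strata):
        groups[stratum].append(value)
    total_var = pvariance(values)
    if total_var == 0:
        raise ValueError("values have zero variance; q is undefined")
    within = sum(len(g) * pvariance(g) for g in groups.values())
    return 1 - within / (len(values) * total_var)


# Hypothetical GPP samples and land use classes for four pixels.
gpp = [1.0, 1.0, 5.0, 5.0]
print(geodetector_q(gpp, ["grass", "grass", "forest", "forest"]))  # strata explain all variance
print(geodetector_q(gpp, ["grass", "forest", "grass", "forest"]))  # strata explain none
```

Continuous factors such as precipitation or elevation would first need to be discretized into strata; the discretization scheme is not specified here.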

Effects of interactions between factors

The effects of the interactions between factors on the GPP were stronger than the effects of single factors, and interactions between factors were categorized as bivariable enhanced or nonlinear enhanced (Fig. 10). The interaction between land use type and precipitation yielded the highest q-value, and their respective interactions with other factors were also significant. Among these, the interaction between precipitation and temperature significantly affected the spatial distribution of GPP, evolving from nonlinear enhanced to bivariable enhanced over time. The q-values of the interactions between precipitation and elevation, precipitation and slope, temperature and elevation, and temperature and slope significantly increased compared with those of single factors. This is because elevation and slope serve as the underlying surfaces for vegetation growth, affecting sunlight and hydrothermal conditions, thereby influencing the spatial distribution of GPP. The interactions between population and elevation, as well as those between population and temperature, are complex.
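The interaction categories can be made concrete with the standard Geodetector comparison between the q-value of the overlaid factors, q(X1 ∩ X2), and the single-factor q-values. The sketch below follows that commonly used rule set; the label wording follows this paper, and the function name, tolerance, and example numbers are hypothetical.

```python
def classify_interaction(q1, q2, q12, tol=1e-9):
    """Classify a two-factor interaction from q(X1), q(X2) and q(X1 ∩ X2)."""
    low, high, total = min(q1, q2), max(q1, q2), q1 + q2
    if abs(q12 - total) <= tol:
        return "independent"
    if q12 > total:
        return "nonlinear enhanced"
    if q12 > high:
        return "bivariable enhanced"
    if q12 > low:
        return "single-factor nonlinear weakened"
    return "nonlinear weakened"


# Hypothetical q-values for two factors and their overlay.
print(classify_interaction(0.30, 0.20, 0.45))
print(classify_interaction(0.10, 0.05, 0.40))
```

The overlay q-value itself would be computed by stratifying on the combination of both factors (for example, each unique pair of land use class and precipitation class) before applying the factor detector.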

Discussion

Interpretability analysis based on SHAP

We employed SHAP value-based interpretability analysis to quantitatively evaluate the relative importance of features (Fig. 11a) and their direction of influence on GPP estimation (Fig. 11b). The SHAP values effectively quantified each feature’s contribution to the model output, with larger absolute values indicating that the feature had a stronger impact on the results⁷⁶. According to the SHAP values of the driving factors (Fig. 11a), time-series features were essential for constructing the CatBoost GPP estimation model. Among them, NDVI demonstrated the most significant impact on the CatBoost GPP model, followed by FPAR. Notably, NDVI and FPAR can directly characterize the spatiotemporal heterogeneity of vegetation growth status and photosynthetic capacity^77,78, thereby playing a dominant role in the model’s estimation process. As GPP varies considerably more within a year than between years, temporal features play a crucial role in the GPP estimation process. LST is an important environmental influencing factor in GPP estimation²⁰, yet its influence on the model was lower than that of the temporal features. This is mainly because LST changes are not completely synchronous with vegetation growth rhythms, and they vary in impact across different growth stages⁷⁹. Moreover, GPP is influenced by other environmental factors (e.g., precipitation and evapotranspiration), which reduces the effect of LST on the model. Land use changes directly affect GPP and are the most immediate manifestation of human impacts on terrestrial ecosystems⁸⁰. However, as a categorical feature, land use exhibited a weaker effect on the model than the time-series features. Meanwhile, topographic factors such as DEM and slope, which represent terrain elevation and gradient, respectively, affect vegetation spatial distribution patterns by mediating the redistribution of hydrothermal factors, thereby affecting vegetation composition and structure⁸¹. Incorporated as constraints in CatBoost GPP estimation, these topographic factors help prevent overestimation in steep, high-elevation areas.

In Fig. 11b, the vertical axis represents feature variables, while the horizontal axis shows the SHAP values. Each point corresponds to a real sample, and broader point distributions indicate higher sample densities. The color gradient from blue (low) to red (high) indicates the feature value, so the plot shows both the degree and direction of influence of each feature. Specifically, higher values of NDVI, FPAR, and LST are associated with positive SHAP values, suggesting that these features are associated with higher estimated vegetation GPP. High values of these features typically indicate favorable vegetation growth conditions and environments, which enhance the vegetation’s carbon sequestration capacity. Conversely, the SHAP values of DEM and slope are mostly negative at high values, indicating that high elevations and steep slopes have a significant inhibitory effect on vegetation GPP. This is because high elevations and steep slopes suppress vegetation productivity through direct environmental stressors, such as low temperature and drought, as well as indirect resource limitations, such as soil erosion and nutrient depletion. This characteristic improves the accuracy of vegetation GPP estimation in regions with large variations in elevation and slope, such as Shanxi Province.
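To make the notion of a SHAP value concrete, the following Python sketch computes exact Shapley values for a toy model by enumerating all feature coalitions, with absent features replaced by a baseline value. This is only an illustration of the underlying definition: the linear toy model, its coefficients, the baseline, and the function names are hypothetical, and the brute-force enumeration is not the tree-based algorithm that would typically be used with a CatBoost model.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline)."""
    n = len(x)

    def value(subset):
        # Features in the coalition take their real value, others the baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return phi


def toy_gpp_model(z):
    """Hypothetical linear stand-in for a GPP model: NDVI, FPAR, slope."""
    ndvi, fpar, slope = z
    return 3.0 * ndvi + 2.0 * fpar - 1.0 * slope


sample = [0.8, 0.6, 0.5]    # hypothetical feature values for one sample
baseline = [0.5, 0.5, 0.2]  # hypothetical reference values
print(shapley_values(toy_gpp_model, sample, baseline))
```

For this linear toy model, each value equals the coefficient times the feature's departure from the baseline, so the two vegetation features receive positive contributions and the slope feature a negative one, mirroring the sign pattern discussed above.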

Analysis of factors affecting GPP in Shanxi Province

We utilized a Geodetector model to identify the main drivers of GPP spatiotemporal variation in Shanxi Province and found that land use type was the main driver of GPP evolution. Previous studies indicated that land use change driven by human activity has substantially altered the structure and function of natural ecosystems, affecting the terrestrial carbon flux⁸². Between 2001 and 2022, 27% of the land in Shanxi Province underwent a change in land use type (Fig. 12). The farmland area had the greatest increase (27,621 km$^{2}$), which primarily occurred in Linfen, Yuncheng, southern Lvliang, and western Jincheng. The high carbon sequestration capacity of farmland and forest land had a notable positive impact on the GPP of Shanxi Province. Conversely, the continuous expansion of urban construction land led to a substantial decline in GPP in medium and large cities. Furthermore, considerable uncertainty surrounds future GPP trends (Fig. 8c).

Despite variations in land use type, GPP increased overall, with grassland experiencing the greatest increase in GPP (29.07 Tg C). Grassland constitutes the primary land use type in Shanxi Province, accounting for approximately 53% of the region, primarily in the northwestern section (e.g., Datong and Shuozhou). These areas have been impacted by coal mining, desertification, and land degradation, resulting in a fragile ecological environment. Nevertheless, the implementation of national ecological projects, including the Beijing–Tianjin Sandstorm Source Control Project, Three-North Shelterbelt Project, and Grain for Green Program^30,32, has led to improvements in the ecological environment and consequently, an increase in GPP.

The interaction detector revealed that the interaction between temperature and precipitation had a complex effect on the GPP spatial pattern in the study area. Therefore, we used partial⁵⁷ and composite correlation analyses⁵⁸ to establish classification criteria for climate-driven factors (Table 10) at a significance level of p = 0.05 to clarify the response mechanism of GPP to precipitation and temperature in Shanxi Province.

The results of partial and composite correlation analyses are shown in Fig. 13. The average partial correlation coefficient between GPP and temperature was 0.28, and that between GPP and precipitation was 0.37. The composite correlation coefficient was 0.44. The spatial distribution exhibited a pattern of “weak in the north and south and strong in the central region”, with climatic factors driving approximately 90.54% of GPP variation in Shanxi Province. However, the majority of areas (54.41%) were weakly driven by temperature and precipitation. Areas with significant GPP changes were primarily influenced by precipitation (23.53%) and were concentrated in the central part of the province. The regions where GPP was driven solely by temperature (4.96%) were scattered across the study area. Notably, in the southwestern and northwestern regions of Shanxi Province, GPP increases were closely related to human activities.
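The two statistics reported above can be sketched in Python as follows. The partial correlation between GPP and one climate variable removes the linear effect of the other, and the composite coefficient is interpreted here as the multiple correlation of GPP with both variables; that interpretation, the function names, and the toy series are assumptions for illustration.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / sqrt(saa * sbb)


def partial_corr(x, y, z):
    """Correlation between x and y after controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def multiple_corr(x, y, z):
    """Multiple correlation of x with y and z taken together."""
    rxy = pearson(x, y)
    rxz_y = partial_corr(x, z, y)
    return sqrt(1 - (1 - rxy ** 2) * (1 - rxz_y ** 2))


# Hypothetical per-pixel annual series (four years for brevity).
precip = [1.0, 2.0, 3.0, 4.0]
temp = [1.0, -1.0, -1.0, 1.0]
gpp = [p + t for p, t in zip(precip, temp)]
print(partial_corr(gpp, precip, temp), multiple_corr(gpp, precip, temp))
```

In a per-pixel analysis, each coefficient would be accompanied by a significance test at the chosen level before assigning a climate-driver class.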

These findings are closely related to the geographical location of Shanxi Province. Precipitation is a major factor controlling vegetation growth and largely determines the spatial distribution of vegetation⁸³. Shanxi is a semi-humid and semi-arid region with low rainfall. In particular, the correlation between precipitation and GPP was significantly lower in the northern areas than in other regions. Although rainfall has increased in recent years, it has historically been a key factor limiting vegetation growth in the northern part of Shanxi Province. Furthermore, Shanxi is located in the mid-latitude region, in which the temperature difference between the north and south is small and the temperature is stable. This resulted in a lower sensitivity of GPP to temperature. However, the temperature affects surface evapotranspiration. As the temperature increases, the effective water content in the soil decreases, thereby slowing the growth and development of vegetation⁸⁴.

Topographic conditions such as slope and elevation generally remain stable in natural environments, and they mainly influence vegetation biomass, carbon storage, and carbon sequestration potential by regulating soil, water, and light resources⁸⁵. In Shanxi Province, the explanatory power of these factors was lower than that of climatic factors; however, the interacting effect of topographic and climatic factors exerted a notable influence on the GPP. This may have been related to the local redistribution of precipitation and temperature caused by topographical factors⁸⁶. In addition, areas with gradual slopes or level ground experience frequent human interference, hindering vegetation growth, but steep-sloped areas with infrequent human activity enable vegetation to grow better and therefore have a greater impact on GPP. As for elevation, in recent years, large-scale ecological restoration and afforestation projects have altered the natural vertical distribution of vegetation. The distribution of planted forests spans various elevations, which may reduce the effect of elevation on vegetation.

Study limitations

Although the proposed CatBoost GPP model showed superior performance in estimating GPP at the site scale, several issues still limit its performance. The uncertainties and limitations of the proposed method can mainly be attributed to the following aspects. First, owing to the difficulty and high cost of obtaining field-measured GPP data and the lack of publicly available GPP observations for Shanxi Province, we cross-validated the estimation results using data from MODIS GPP, GOSIF GPP, and previous studies. Although these datasets have been widely validated for GPP estimation, incorporating field measurements would further reduce the uncertainty in the accuracy assessment. Second, vegetation photosynthesis is a complex process influenced by multi-scale environmental, atmospheric, and physiological factors. In the future, we will consider additional drivers, such as carbon dioxide concentration⁸⁷ and solar radiation¹⁶, to explore the effect of each characteristic variable on the model. Finally, we employed the dataset division method used in previous studies²³, which has scientific validity. However, because observational datasets are scarce, this division may be biased by inter-annual variations. In future studies, we will collect more data to compare the effects of different dataset division methods on the model estimation results and further verify the validity of the dataset division.

Conclusions

We developed a novel GPP estimation framework based on CatBoost that synergistically combines ML techniques with vegetation ecophysiological mechanisms. The CatBoost GPP model exhibited superior performance ($R^{2}$ = 0.890, RMSE = 1.155 gC $\textrm{m}^{-2} \, \textrm{day}^{-1}$, MAE = 0.772 gC $\, \textrm{m}^{-2}\, \textrm{day}^{-1}$) compared with that of the RF, SVM, LightGBM, XGBoost, LSTM, GOSIF GPP, and MODIS GPP methods. We applied the CatBoost GPP model to estimate the GPP of Shanxi Province from 2001 to 2022. Based on the results, we further investigated the spatiotemporal evolution of the GPP and its driving factors. The main conclusions were as follows:

(1) Spatially, the GPP was generally low in the northwest and high in the southeast, with distinct band features. From 2001 to 2022, the GPP exhibited significant fluctuating growth at a rate of 20.58 gC $\textrm{m}^{-2}\, \textrm{year}^{-1}$. However, the recent trend in GPP indicates weak anti-persistence, with 58.8% of the land potentially facing degradation in the future.

(2) Human activities contributed to an increase in the GPP, with land use type being the most significant factor. In the northwestern and southwestern regions of Shanxi Province, GPP changes were primarily driven by human activities, whereas GPP changes in the central region were predominantly affected by climatic factors, particularly precipitation.

(3) Factor interactions were either bivariable or nonlinear enhanced. The combined effect of precipitation and temperature had the most complex and significant impact on GPP, with an explanatory power of 46%, covering approximately 62.05% of the study area.

Data availability

The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.

References

Shilong, P. et al. Evaluation of terrestrial carbon cycle models for their response to climate variability and to CO2 trends. Glob. Change Biol. https://doi.org/10.1111/gcb.12187 (2013).
Chen, M. et al. Regional contribution to variability and trends of global gross primary productivity. Environ. Res. Lett. 12, 105005. https://doi.org/10.1088/1748-9326/aa8978 (2017).
Wu, C., Munger, J. W., Niu, Z. & Kuang, D. Comparison of multiple models for estimating gross primary production using MODIS and eddy covariance data in Harvard Forest. Remote Sens. Environ. 114, 2925–2939. https://doi.org/10.1016/j.rse.2010.07.012 (2010).
Xing, L. et al. Solar-induced chlorophyll fluorescence is strongly correlated with terrestrial photosynthesis for a wide variety of biomes: First global analysis based on OCO2 and flux tower observations. Glob. Change Biol. https://doi.org/10.1111/gcb.14297 (2018).
Göckede, M. et al. Quality control of CarboEurope flux data – Part 1: Coupling footprint analyses with flux data quality assessment to evaluate sites in forest ecosystems. Biogeosciences 5, 433–450. https://doi.org/10.5194/bg-5-433-2008 (2008).
Running, S. W. et al. A continuous satellite-derived measure of global terrestrial primary production. BioScience 54, 547–560. https://doi.org/10.1641/0006-3568(2004)054[0547:ACSMOG]2.0.CO;2 (2004).
Yuan, W. et al. Deriving a light use efficiency model from eddy covariance flux data for predicting daily gross primary production across biomes. Agric. Forest Meteorol. 143, 189–207. https://doi.org/10.1016/j.agrformet.2006.12.001 (2007).
Law, B. et al. Environmental controls over carbon dioxide and water vapor exchange of terrestrial vegetation. Agric. Forest Meteorol. 113, 97–120. https://doi.org/10.1016/S0168-1923(02)00104-1 (2002). FLUXNET 2000 Synthesis.
Huntzinger, D. et al. Uncertainty in the response of terrestrial carbon sink to environmental drivers undermines carbon-climate feedback predictions. Sci. Rep. 7, 4765 (2017).
Frank, D. et al. Effects of climate extremes on the terrestrial carbon cycle: Concepts, processes and potential future impacts. Glob. Change Biol. 21, 2861–2880. https://doi.org/10.1111/gcb.12916 (2015).
Stocker, B. D. et al. Drought impacts on terrestrial primary production underestimated by satellite monitoring. Nat. Geosci. 12, 264–270. https://doi.org/10.1038/S41561-019-0318-6 (2019).
Ueyama, M. et al. Upscaling terrestrial carbon dioxide fluxes in Alaska with satellite remote sensing and support vector regression. J. Geophys. Res. Biogeosci. 118, 1266–1281. https://doi.org/10.1002/jgrg.20095 (2013).
Bai, Y., Liang, S. & Yuan, W. Estimating global gross primary production from sun-induced chlorophyll fluorescence data and auxiliary information using machine learning methods. Remote Sens. https://doi.org/10.3390/rs13050963 (2021).
Papale, D. et al. Effect of spatial sampling from European flux towers for estimating carbon and water fluxes with artificial neural networks. J. Geophys. Res. Biogeosci. 120, 1941–1957. https://doi.org/10.1002/2015JG002997 (2015).
Yao, Y. et al. A new estimation of China’s net ecosystem productivity based on eddy covariance measurements and a model tree ensemble approach. Agric. For. Meteorol. 253–254, 84–93. https://doi.org/10.1016/j.agrformet.2018.02.007 (2018).
Cho, S. et al. Evaluation of forest carbon uptake in South Korea using the national flux tower network, remote sensing, and data-driven technology. Agric. Forest Meteorol. 311, 108653. https://doi.org/10.1016/j.agrformet.2021.108653 (2021).
Guo, R. et al. Estimating global GPP from the plant functional type perspective using a machine learning approach. J. Geophys. Res. Biogeosci. https://doi.org/10.1029/2022jG007100 (2023).
Sarkar, D. P., Uma Shankar, B. & Ranjan Parida, B. A novel approach for retrieving GPP of evergreen forest regions of India using random forest regression. Remote Sens. Appl. Soc. Environ. 33, 101116. https://doi.org/10.1016/j.rsase.2023.101116 (2024).
Xiao, J., Chen, J., Davis, K. J. & Reichstein, M. Advances in upscaling of eddy covariance measurements of carbon and water fluxes. J. Geophys. Res. https://doi.org/10.1029/2011JG001889 (2012).
Ichii, K. et al. New data-driven estimation of terrestrial CO2 fluxes in Asia using a standardized database of eddy covariance measurements, remote sensing data, and support vector regression. J. Geophys. Res. Biogeosci. 122, 767–795. https://doi.org/10.1002/2016JG003640 (2017).
Qin, Z. et al. Identification of important factors for water vapor flux and CO2 exchange in a cropland. Ecol. Model. 221, 575–581. https://doi.org/10.1016/j.ecolmodel.2009.11.007 (2010).
Wu, W., Gong, C., Li, X., Guo, H. & Zhang, L. An online deep convolutional model of gross primary productivity and net ecosystem exchange estimation for global forests. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 12, 5178–5188. https://doi.org/10.1109/JSTARS.2019.2954556 (2019).
Dou, X. & Yang, Y. Modeling and predicting carbon and water fluxes using data-driven techniques in a forest ecosystem. Forests https://doi.org/10.3390/f8120498 (2017).
Lu, J., Wang, G., Feng, D. & Nooni, I. K. Improving the gross primary production estimate by merging and downscaling based on deep learning. Forests https://doi.org/10.3390/f14061201 (2023).
Prokhorenkova, L., Gusev, G., Vorobev, A., Dorogush, A. V. & Gulin, A. Catboost: unbiased boosting with categorical features. Adv. Neural Inf. Process. Syst. 31. https://doi.org/10.48550/arXiv.1706.09516 (2018).
Wu, C., Ju, Y., Yang, S., Zhang, Z. & Chen, Y. Reconstructing annual xCO2 at a 1 km x 1 km spatial resolution across China from 2012 to 2019 based on a spatial CatBoost method. Environ. Res. 236, 116866. https://doi.org/10.1016/j.envres.2023.116866 (2023).
Ding, Y., Chen, Z., Lu, W. & Wang, X. A CatBoost approach with wavelet decomposition to improve satellite-derived high-resolution pm2.5 estimates in Beijing-Tianjin-Hebei. Atmos. Environ. 249, 118212. https://doi.org/10.1016/j.atmosenv.2021.118212 (2021).
Zheng, Y. & Rad, R. Transforming GPP estimation in terrestrial ecosystems using remote sensing and transformers. In 2024 IEEE Conference on Artificial Intelligence (CAI). 1456–1461. https://doi.org/10.1109/CAI59869.2024.00262 (2024).
Liu, Y. et al. Ada-xg-CatBoost: A combined forecasting model for gross ecosystem product (GEP) prediction. Sustainability https://doi.org/10.3390/su16167203 (2024).
Gong, C., Lyu, F. & Wang, Y. Spatiotemporal change and drivers of ecosystem quality in the Loess plateau based on RSEI: A case study of Shanxi, China. Ecol. Indic. 155, 111060. https://doi.org/10.1016/j.ecolind.2023.111060 (2023).
Zou, X., Wang, R., Hu, G., Rong, Z. & Li, J. CO2 emissions forecast and emissions peak analysis in Shanxi province, China: An application of the leap model. Sustainability https://doi.org/10.3390/su14020637 (2022).
Li, H., He, Y., Zhang, L., Cao, S. & Sun, Q. Spatiotemporal changes of gross primary production in the Yellow River Basin of China under the influence of climate-driven and human-activity. Glob. Ecol. Conserv. 46, e02550. https://doi.org/10.1016/j.gecco.2023.e02550 (2023).
Zhang, X., Liu, K., Li, X., Wang, S. & Wang, J. Vulnerability assessment and its driving forces in terms of NDVI and GPP over the Loess Plateau, China. Phys. Chem. Earth Parts A/B/C 125, 103106. https://doi.org/10.1016/j.pce.2022.103106 (2022).
Gong, E., Zhang, J., Wang, Z. & Wang, J. Estimating the dynamics and driving factors of gross primary productivity over the Chinese Loess Plateau by the modified vegetation photosynthesis model. Ecol. Inform. 83, 102838. https://doi.org/10.1016/j.ecoinf.2024.102838 (2024).
Cao, R. et al. Shifts in ecosystem water use efficiency on China’s Loess Plateau caused by the interaction of climatic and biotic factors over 1985–2015. Agric. For. Meteorol. 291, 108100. https://doi.org/10.1016/j.agrformet.2020.108100 (2020).
Li, D. et al. Drought limits vegetation carbon sequestration by affecting photosynthetic capacity of semi-arid ecosystems on the Loess Plateau. Sci. Total Environ. 912, 168778. https://doi.org/10.1016/j.scitotenv.2023.168778 (2024).
Zhang, L. et al. Assessing the responses of ecosystem patterns, structures and functions to drought under climate change in the Yellow River Basin, China. Sci. Total Environ. 929, 172603. https://doi.org/10.1016/j.scitotenv.2024.172603 (2024).
Xiong, L.-Y. et al. Past rainfall-driven erosion on the Chinese loess plateau inferred from archaeological evidence from Wucheng City, Shanxi. Commun. Earth Environ. 4, 4. https://doi.org/10.1038/s43247-022-00663-8 (2023).
Fu, B. et al. Assessing the soil erosion control service of ecosystems change in the Loess Plateau of China. Ecol. Complex. 8, 284–293. https://doi.org/10.1016/j.ecocom.2011.07.003 (2011).
Li, S., Wang, J., Zhang, M. & Tang, Q. Characterizing and attributing the vegetation coverage changes in North Shanxi Coal Base of China from 1987 to 2020. Resour. Policy 74, 102331. https://doi.org/10.1016/j.resourpol.2021.102331 (2021).
Reichstein, M. et al. On the separation of net ecosystem exchange into assimilation and ecosystem respiration: Review and improved algorithm. Glob. Change Biol. 11, 1424–1439 (2005).
Yang, Q. et al. Quantitative assessment of the parameterization sensitivity of the Noah-MP land surface model with dynamic vegetation using chinaflux data. Agric. Forest Meteorol. 307, 108542 (2021).
Wang, Y. et al. Daily estimation of gross primary production under all sky using a light use efficiency model coupled with satellite passive microwave measurements. Remote Sens. Environ. 267, 112721. https://doi.org/10.1016/j.rse.2021.112721 (2021).
Yu, G.-R. et al. Overview of ChinaFLUX and evaluation of its eddy covariance measurement. Agric. For. Meteorol. 137, 125–137 (2006).
Lasslop, G. et al. Separation of net ecosystem exchange into assimilation and respiration using a light response curve approach: Critical issues and global evaluation. Glob. Change Biol. 16, 187–208 (2010).
Joiner, J. et al. Estimation of terrestrial global gross primary production (GPP) with satellite data-driven models and eddy covariance flux data. Remote Sens. 10, 1346 (2018).
Liang, D. et al. Evaluation of the consistency of Modis land cover product (mcd12q1) based on Chinese 30 m globeland30 datasets: A case study in Anhui Province, China. ISPRS Int. J. Geo-Inf. 4, 2519–2541. https://doi.org/10.3390/ijgi4042519 (2015).
Yao, Z., Zhang, L., Tang, S., Li, X. & Hao, T. The basic characteristics and spatial patterns of global cultivated land change since the 1980s. J. Geogr. Sci. 27, 771–785. https://doi.org/10.1007/s11442-017-1405-5 (2017).
Li, X. & Xiao, J. Mapping photosynthesis solely from solar-induced chlorophyll fluorescence: A global, fine-resolution dataset of gross primary production derived from OCO-2. Remote Sens. 11, 2563. https://doi.org/10.3390/rs11212563 (2019).
Zhang, Y., Zhao, Z. & Zheng, J. Catboost: A new approach for estimating daily reference crop evapotranspiration in arid and semi-arid regions of Northern China. J. Hydrol. 588, 125087. https://doi.org/10.1016/j.jhydrol.2020.125087 (2020).
Hancock, J. T. & Khoshgoftaar, T. M. Catboost for big data: An interdisciplinary review. J. Big Data 7, 94. https://doi.org/10.1186/s40537-020-00369-8 (2020).
Sayemuzzaman, M. & Jha, M. K. Seasonal and annual precipitation time series trend analysis in North Carolina, United States. Atmos. Res. 137, 183–194. https://doi.org/10.1016/j.atmosres.2013.10.012 (2014).
Gocic, M. & Trajkovic, S. Analysis of changes in meteorological variables using Mann-Kendall and Sen’s slope estimator statistical tests in Serbia. Glob. Planet. Change 100, 172–182. https://doi.org/10.1016/j.gloplacha.2012.10.014 (2013).
Tong, S. et al. Analyzing vegetation dynamic trend on the Mongolian plateau based on the hurst exponent and influencing factors from 1982–2013. J. Geogr. Sci. 28, 595–610. https://doi.org/10.1007/s11442-018-1493-x (2018).
Zhang, S. et al. Using the geodetector method to characterize the spatiotemporal dynamics of vegetation and its interaction with environmental factors in the Qinba Mountains, China. Remote Sens. 14, 5794. https://doi.org/10.3390/rs14225794 (2022).
Wang, Y., Zhang, Z. & Chen, X. Quantifying influences of natural and anthropogenic factors on vegetation changes based on geodetector: A case study in the Poyang Lake Basin, China. Remote Sens. 13, 5081. https://doi.org/10.3390/rs13245081 (2021).
Zhan, Y. et al. Analysis on vegetation cover changes and the driving factors in the mid-lower reaches of Hanjiang River Basin between 2001 and 2015. Open Geosci. 13, 675–689. https://doi.org/10.1515/geo-2020-0259 (2021).
Yan, Y. et al. Impacts of climate change and human activities on vegetation dynamics on the Mongolian Plateau, East Asia from 2000 to 2023. J. Arid Land 16, 1062–1079. https://doi.org/10.1007/s40333-024-0082-3 (2024).
Liu, Y. et al. Changes of net primary productivity in China during recent 11 years detected using an ecological model driven by Modis data. Front. Earth Sci. 7, 112–127. https://doi.org/10.1007/s11707-012-0348-5 (2013).
Li, S., Zhao, Y., Xiao, W., Yue, W. & Wu, T. Optimizing ecological security pattern in the coal resource-based city: A case study in Shuozhou City, China. Ecol. Indic. 130, 108026. https://doi.org/10.1016/j.ecolind.2021.108026 (2021).
Yang, R., Bai, Z. & Shi, Z. Linking morphological spatial pattern analysis and circuit theory to identify ecological security pattern in the Loess Plateau: Taking Shuozhou City as an example. Land 10, 907. https://doi.org/10.3390/land10090907 (2021).
Li, X. & Xiao, J. A global, 0.05-degree product of solar-induced chlorophyll fluorescence derived from OCO-2, Modis, and reanalysis data. Remote Sens. 11, 517. https://doi.org/10.3390/rs11050517 (2019).
Wood, J. D. et al. Multiscale analyses of solar-induced florescence and gross primary production. Geophys. Res. Lett. 44, 533–541. https://doi.org/10.1002/2016GL070775 (2017).
Wang, X. et al. Validation of MODIS-GPP product at 10 flux sites in Northern China. Int. J. Remote Sens. 34, 587–599. https://doi.org/10.1080/01431161.2012.715774 (2013).
Liu, Z., Shao, Q. & Liu, J. The performances of MODIS-GPP and -ET products in China and their sensitivity to input data (FPAR/LAI). Remote Sens. 7, 135–152. https://doi.org/10.3390/rs70100135 (2014).
Yu, T., Zhang, Q. & Sun, R. Comparison of machine learning methods to up-scale gross primary production. Remote Sens. 13, 2448. https://doi.org/10.3390/rs13132448 (2021).
Li, X. et al. Estimation of gross primary production over the terrestrial ecosystems in China. Ecol. Model. 261, 80–92. https://doi.org/10.1016/j.ecolmodel.2013.03.024 (2013).
Yuan, W. et al. Global estimates of evapotranspiration and gross primary production based on MODIS and global meteorology data. Remote Sens. Environ. 114, 1416–1431. https://doi.org/10.1016/j.rse.2010.01.022 (2010).
Yao, Y. et al. Spatiotemporal pattern of gross primary productivity and its covariation with climate in China over the last thirty years. Glob. Change Biol. 24, 184–196. https://doi.org/10.1111/gcb.13830 (2018).
Bo, Y. et al. Three decades of gross primary production (GPP) in China: Variations, trends, attributions, and prediction inferred from multiple datasets and time series modeling. Remote Sens. 14, 2564. https://doi.org/10.3390/rs14112564 (2022).
Huang, Y., Yang, S. & Zhao, H. Distinct contributions of climate change and anthropogenic activities to evapotranspiration and gross primary production variations over Mainland China. Remote Sens. 16, 475. https://doi.org/10.3390/rs16030475 (2024).
Xu, X. & Chen, D. Estimating global annual gross primary production based on satellite-derived phenology and maximal carbon uptake capacity. Environ. Res. 252, 119063. https://doi.org/10.1016/j.envres.2024.119063 (2024).
Gong, E., Ma, Z., Wang, Z. & Zhang, J. Spatiotemporal dynamics of vegetation productivity and its response to meteorological factors in China. Atmosphere 15, 491. https://doi.org/10.3390/atmos15040491 (2024).
Li, X., Zou, L., Xia, J., Wang, F. & Li, H. Identifying the responses of vegetation gross primary productivity and water use efficiency to climate change under different aridity gradients across China. Remote Sens. 15, 1563. https://doi.org/10.3390/rs15061563 (2023).
Ren, Z., Tian, Z., Wei, H., Liu, Y. & Yu, Y. Spatiotemporal evolution and driving mechanisms of vegetation in the Yellow River Basin, China during 2000–2020. Ecol. Indic. 138, 108832. https://doi.org/10.1016/j.ecolind.2022.108832 (2022).
Lundberg, S. M. & Lee, S. A unified approach to interpreting model predictions. CoRR abs/1705.07874 (2017).
Zhang, R., Zhou, Y., Luo, H., Wang, F. & Wang, S. Estimation and analysis of spatiotemporal dynamics of the net primary productivity integrating efficiency model with process model in karst area. Remote Sens. 9, 477. https://doi.org/10.3390/rs9050477 (2017).
Chang, X. et al. Evaluating gross primary productivity over 9 ChinaFLUX sites based on random forest regression models, remote sensing, and eddy covariance data. Sci. Total Environ. 875, 162601. https://doi.org/10.1016/j.scitotenv.2023.162601 (2023).
Shestakova, T., Gutiérrez, E., Valeriano, C., Lapshina, E. & Voltas, J. Recent loss of sensitivity to summer temperature constrains tree growth synchrony among boreal Eurasian forests. Agric. Forest Meteorol. 268, 318–330. https://doi.org/10.1016/j.agrformet.2019.01.039 (2019).
Levy, P., Friend, A., White, A. & Cannell, M. The influence of land use change on global-scale fluxes of carbon from terrestrial ecosystems. Clim. Change 67, 185–209. https://doi.org/10.1007/s10584-004-2849-z (2004).
Xue, P. et al. How hydrothermal factors and CO2 concentration affect vegetation carbon sink over time and elevation gradient. J. Clean. Prod. 449, 141800. https://doi.org/10.1016/j.jclepro.2024.141800 (2024).
Brown, S. & Lugo, A. E. Trailblazing the carbon cycle of tropical forests from Puerto Rico. Forests 8, 101. https://doi.org/10.3390/f8040101 (2017).
Zhao, X., Tan, K., Zhao, S. & Fang, J. Changing climate affects vegetation growth in the arid region of the northwestern China. J. Arid Environ. 75, 946–952. https://doi.org/10.1016/j.jaridenv.2011.05.007 (2011).
Cui, L. & Shi, J. Temporal and spatial response of vegetation NDVI to temperature and precipitation in eastern China. J. Geogr. Sci. 20, 163–176. https://doi.org/10.1007/s11442-010-0163-4 (2010).
Yuan, Z. et al. Few large trees, rather than plant diversity and composition, drive the above-ground biomass stock and dynamics of temperate forests in Northeast China. Forest Ecol. Manag. 481, 118698. https://doi.org/10.1016/j.foreco.2020.118698 (2021).
Stroppiana, D., Antoninetti, M. & Brivio, P. A. Seasonality of MODIS LST over southern Italy and correlation with land cover, topography and solar radiation. Eur. J. Remote Sens. 47, 133–152. https://doi.org/10.5721/EuJRS20144709 (2014).
Lu, Q., Liu, H., Wei, L., Zhong, Y. & Zhou, Z. Global prediction of gross primary productivity under future climate change. Sci. Total Environ. 912, 169239. https://doi.org/10.1016/j.scitotenv.2023.169239 (2024).

Acknowledgements

This research was supported in part by the Program of Key Laboratory of Smart Earth under Grant No. KF2023YB02-09, the Youth Science Research Project of the Shanxi Basic Research Program under Grant No. 202303021222252, and the General Program of Chongqing Natural Science Foundation under Grant No. cstc2021jcyj-msxmX0897.

Author information

Authors and Affiliations

Institute of Geospatial Information, Information Engineering University, Zhengzhou, 450001, China
Yujie Li, Xuanguang Liu & Zhenchao Zhang
Lyuliang University, Lüliang, 033001, China
Yujie Li & Jun Lang
National Key Laboratory of Intelligent Spatial Information, Beijing, 100000, China
Xuanguang Liu & Zhenchao Zhang

Contributions

Conceptualization, YL; methodology, YL and XL; formal analysis, YL and XL; data curation, YL; validation, YL and XL; visualization, YL; writing–original draft preparation, YL; writing–review and editing, YL, XL, JL, and ZZ; supervision, XL, JL, and ZZ; funding acquisition, ZZ and JL. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Xuanguang Liu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Li, Y., Liu, X., Zhang, Z. et al. GPP estimation based on CatBoost and analysis of change driving factors in Shanxi Province, China. Sci Rep 15, 22346 (2025). https://doi.org/10.1038/s41598-025-08927-x

Received: 23 December 2024
Accepted: 24 June 2025
Published: 01 July 2025
DOI: https://doi.org/10.1038/s41598-025-08927-x
