Estimation of woody vegetation biomass in Australia based on multi-source remote sensing data and stacking models

Liu, Chenxi; Shi, Shuo; Liao, Zhanmang; Wang, Tong; Gong, Wei; Shi, Zixi

doi:10.1038/s41598-025-18891-1

Download PDF

Article
Open access
Published: 07 October 2025

Estimation of woody vegetation biomass in Australia based on multi-source remote sensing data and stacking models

Chenxi Liu¹^na1,
Shuo Shi^1,2,3,
Zhanmang Liao⁴,
Tong Wang²^na1,
Wei Gong^1,2,3 &
…
Zixi Shi²

Scientific Reports volume 15, Article number: 34975 (2025) Cite this article

2406 Accesses
3 Citations
Metrics details

Subjects

Abstract

Vegetation serves as the most critical carbon reservoir within terrestrial ecosystems and plays a vital role in mitigating global climate change. Australia features a vast and diverse landscape, ranging from dense eucalyptus forests to sparse woodlands, and harbors rich biodiversity. However, the significant spatial heterogeneity across the continent presents substantial challenges for accurately estimating regional aboveground biomass (AGB). This study aims to assess the accuracy of various models in AGB estimation. The dataset includes field-measured biomass and multi-source remote sensing data, such as vegetation canopy height products, Landsat imagery, topographic data, and climate variables. To build biomass estimation models, a Stacking regressor is constructed, and extensive comparative experiments were conducted. The Stacking model comprises seven base learners and one meta-learner. The meta-learner learns to optimally combine the predictions of the base models by minimizing prediction error. The experiments include: (1) comparing the performance of the Stacking model with seven individual machine learning models using K-fold cross-validation, and (2) evaluating the impact of recursive feature elimination (RFE) on model performance before and after feature selection. Based on predictive performance, the Stacking, Gradient Boosting Regressor (GBR), and Random Forest (RF) models are selected for biomass mapping. In addition, we introduced the Monte Carlo simulation approach to evaluate the uncertainty of biomass estimation. Results show that the Stacking model outperforms all individual machine learning models across all forest types. The final $R^2$ values for the Stacking, GBR, and RF models are 0.74, 0.72, and 0.71, respectively, with RMSE values of 49.79 Mg/ha, 51.00 Mg/ha, and 52.18 Mg/ha. The integration of multi-source data effectively addresses the limitations of single-source datasets, and the ensemble learning approach enhances the robustness of biomass estimation. This study provides theoretical and technical support for large-scale AGB estimation in heterogeneous vegetation landscapes.

Aboveground biomass estimation using multimodal remote sensing observations and machine learning in mixed temperate forest

Article Open access 24 August 2025

Remote sensing estimation of aboveground biomass of different forest types in Xinjiang based on machine learning

Article Open access 20 February 2025

Advancing forest carbon stocks’ mapping using a hierarchical approach with machine learning and satellite imagery

Article Open access 09 September 2024

Introduction

Woody vegetation, characterized by well-developed woody tissues and perennial growth, constitutes a primary component of natural terrestrial carbon sequestration. Through photosynthesis, the dense branches and leaves of woody plants convert substantial amounts of carbon dioxide into organic compounds, storing carbon within the plant biomass¹. This process not only contributes to maintaining the ecological balance of the Earth but also plays a crucial role in global climate regulation^2,3,4. Forests are biotic communities predominantly composed of woody plants. Accurately quantifying the spatial distribution of woody vegetation biomass is crucial for developing scientifically sound regional vegetation management policies and promoting the sustainable development of carbon sequestration in vegetation^5,6.

Traditional field survey methods have limitations, including delays, restricted spatial scale, and high labor costs⁷. Remote sensing technology, with its advantages of extensive coverage and high spatiotemporal resolution, provides new perspectives for obtaining vegetation biomass^8,9.The estimation of biomass and forest structural parameters through remote sensing is primarily achieved using passive optical imagery and active remote sensing, such as light detection and ranging (lidar). In recent years, freely available optical imagery has gained unique advantages in estimating forest AGB due to its spatial continuity, and high temporal and spatial resolution, covering large spatial areas^3,10. Optical remote sensing images establish empirical relationships to estimate vegetation biomass between vegetation indices and plot-based biomass^11,12. However, the ability of optical images to penetrate the forest canopy is limited. When the tree canopy closes or the biomass exceeds a certain range, optical signals are no longer sensitive to AGB variations, resulting in the signal saturation problem¹³. Feldpausch et al.¹⁴ found that tree height is a crucial factor in biomass estimation. Ignoring height information can lead to an underestimation of carbon stocks in tropical regions. Future research should incorporate tree height information to reduce errors in biomass estimates. Airborne LiDAR, utilizing laser pulse ranging principles, can accurately acquire elevation and structural information of ground vegetation¹⁵. Moreover, it is not affected by signal saturation^16,17,18. However, its limitations include a relatively small coverage and high flight costs, which pose challenges for acquiring extensive data¹⁹. Spaceborne LiDAR, with its extensive coverage and high spatial resolution, has the potential for large-scale biomass estimation²⁰. Nevertheless, the discrete footprint distribution prevent continuous spatial coverage. Therefore, current research increasingly focuses on integrating multiple data sources to generate large-scale and high-spatial-resolution maps of forest AGB distribution²¹.

In recent years, numerous scholars have conducted research on biomass estimation in Australia. Keith et al.²² noted that analysis of biomass data from 136 global primary forest sites revealed that the average carbon density in Australia’s temperate moist eucalypt forests is higher than that of northern and tropical forests. This is attributed to Australia’s unique climate (characterized by relatively high precipitation and lower temperatures) and minimal human disturbance. Taking the forests of Victoria, Australia, as an example, Grierson et al.²³ developed age-based regression equations for forest biomass accumulation across different forest types, based on field survey data, to estimate the average aboveground carbon density of the primary forests in Victoria. Liu et al.²⁴ compared tree parameters derived from airborne laser scanning with field-measured AGB data. They subsequently proposed a localized DBH (diameter at breast height) regression model, which ultimately led to the development of a biomass estimation model based on airborne LiDAR-derived tree parameters. Nguyen et al.²⁵ developed a Random Forest (RF)-based k-nearest neighbors (kNN) model using Landsat time series data and a single plot output, generating annual forest AGB maps for Victoria from 1988 to 2017.

Although some significant progress has been made in remote sensing biomass estimation studies in Australia, challenges persist. For instance, while airborne LiDAR has demonstrated good performance in vegetation biomass estimation, the high cost of data acquisition limits its widespread application at large scales^26,27. This highlights the need for effective model integration across multiple data sources to improve the accuracy of biomass predictions over large areas.

With the development of computer technology, data-driven machine learning algorithms have increasingly been applied in the field of remote sensing, and have become an effective tool for large-scale biomass estimation^28,29. However, different models exhibit varying applicability across regions^30,31, making it particularly important to explore a robust method for large-scale biomass estimation. Stacking model is an efficient ensemble learning method that combines the predictions of multiple base learners of different types as inputs and then a meta-learner is trained to make the final prediction³². Unlike traditional ensemble algorithms such as Bagging and Boosting, the Stacking model can further improve predictive accuracy by integrating information from different learners³³. Its primary advantage lies in reducing both the bias and variance of individual models, thereby effectively leveraging the strengths of each learner. In addition, Stacking can flexibly handle various types of input data and complex nonlinear relationships, making it highly promising for tasks such as forest AGB estimation³⁴. Although Stacking is more complex in terms of model construction and training, it achieves greater predictive power than individual models through sophisticated combination strategies. This advantage is particularly evident in applications involving large-scale datasets and diverse data sources, where Stacking has demonstrated significant advantages.

The main objectives of this study are: (1) to perform feature selection using RFE and compare model performance; (2) to generate spatial maps of aboveground biomass; and (3) to analyze associated uncertainties. Using Australia as the study area and field-measured biomass data as a reference, we extracted predictor variables from the vegetation canopy height map developed by Scarth et al. (2019)³⁵, and integrated them with Landsat imagery and topographic data as inputs to the biomass estimation models. For model evaluation, the Stacking regression model was compared with several individual machine learning algorithms. Finally, regional AGB maps of woody vegetation were generated using the three models with the highest predictive performance.

Study area and data

Biomass plot data

We utilized the biomass library released by the Australian Tern (https://portal.tern.org.au) as the reference data for aboveground Biomass³⁶. The primary purpose of establishing this biomass plot database was to calibrate and validate satellite-derived products in support of national-scale aboveground biomass mapping. The plot database integrates tree inventory data from a wide range of sources, including federal, state, and local government agencies, academic institutions, private enterprises, and other relevant organizations across Australia. It includes comprehensive tree and shrub census data collected nationwide. The generalized allometric equations used in the database are based on the universal allometric model developed by Paul et al.³⁷, which is tailored to Australian plant functional types and accounts for aboveground, belowground, and total biomass, including both live and dead components. These equations are further adjusted using decay correction factors proposed by Lucas et al³⁸. The biomass density at the plot level is derived by aggregating the biomass of individual trees within each plot. Following the methodology described by Liao et al.³⁹, we excluded plots with an area smaller than 0.03 hectares and those with an absolute error exceeding 75 Mg/ha or a relative error exceeding 25%. Outliers were removed using the 2-sigma rule^40,41. The final dataset after screening contained 3,054 sample plots from 2007 to 2011 (Fig. 1). However, the absence of non-woody plots in the dataset led to an overestimation of vegetation biomass³⁹. Therefore, we randomly added 1550 non-woody sample plots at a nearly 1:2 ratio to enhance the AGB database. The non-woody sample plots were selected based on a combination of land use data and vegetation cover data. The vegetation cover information was derived from the canopy height product³⁵, which includes a vegetation cover layer. The land cover data were obtained from the European Space Agency (ESA). Land use data were used to identify non-woody areas, while the vegetation cover data were used to select regions with no tree cover. This approach ensured that the selected non-woody plots corresponded to vegetation types such as grasslands, bare land, urban areas, and water bodies. Specifically, random points were generated in ArcGIS, and the “Extract Multi Values to Points” tool was applied to obtain attribute values (i.e., land cover type and vegetation cover). Points were then selected where the vegetation cover was zero and the land cover type was neither shrubland nor forest.

Landsat data

Since the launch of Landsat-1 in 1972, the Landsat program has provided long-term, high-quality global coverage data. Due to its appropriate spatial and temporal resolution, reliable data quality, and open-access policy, Landsat data have been widely used by governments and the remote sensing scientific community^42,43.

When sunlight reaches vegetation leaves, chlorophyll within the leaves strongly absorbs visible light to facilitate photosynthesis, while the leaf cellular structure reflects near-infrared radiation^44,45. The visible spectrum reflects the “greenness” of vegetation, whereas the near-infrared and shortwave infrared bands are highly responsive to vegetation water content and dry matter content⁴⁶. A relatively simple vegetation index is the Ratio Vegetation Index (RVI)⁴⁷, which is calculated by dividing the reflectance in the near-infrared band by that in the red band. A higher ratio typically indicates healthy vegetation, while lower values are indicative of soil or water. The Normalized Difference Vegetation Index (NDVI) assesses vegetation status by computing the difference and sum ratio between near-infrared and visible light⁴⁸. NDVI values range from $-1$ to 1. Values close to 1 indicate dense vegetation, values near 0 suggest tundra or bare ground, and negative values often correspond to clouds or snow. The Green Normalized Difference Vegetation Index (GNDVI) is a modified form of NDVI that substitutes the red band with the green band, making it more sensitive to chlorophyll content and less prone to saturation, thus better suited for dense canopies^49,50.The Soil Adjusted Vegetation Index (SAVI) introduces a soil adjustment factor (L) to the NDVI formula to mitigate the influence of soil brightness, effectively reducing background interference while retaining vegetation signals⁵¹. This makes it more applicable in areas with sparse vegetation where soil is visible through the canopy. The Visible Atmospherically Resistant Index (VARI) is designed to minimize atmospheric effects on vegetation indices and better account for illumination variability⁵². Similarly, the Atmospherically Resistant Vegetation Index (ARVI) is less sensitive to aerosols, making it suitable for regions with high aerosol content such as areas affected by smoke or precipitation⁵³. The Leaf Area Index (LAI)⁵⁴, defined as the total leaf area per unit ground surface area, reflects both leaf density and canopy structure. The Enhanced Vegetation Index (EVI)⁵⁵ was developed to address NDVI saturation in regions with high LAI, while simultaneously minimizing soil background and atmospheric interference to improve vegetation monitoring. The Difference Vegetation Index (DVI)⁵⁶, calculated as the difference between near-infrared and red reflectance, can be used to evaluate vegetation presence and density. In remote sensing imagery, texture features capture spatial heterogeneity and patterns of land surface cover^57,58,59. Even when different land cover types exhibit similar spectral characteristics, their spatial arrangements can be distinguished through texture, making it a valuable non-spectral complement with strong discriminatory capabilities. A variety of remote sensing features are considered to address diverse conditions, facilitating more accurate estimation of vegetation biomass.

We acquired Landsat data (USGS Landsat 5 Level 2, Collection 2, Tier 1) through the Google Earth Engine (GEE) platform. The data underwent atmospheric correction, and cloud masking was performed using the “SR_CLOUD_QA” band during subsequent processing. The imagery was collected between September 2008 and May 31, 2009 (growing season). To represent peak vegetation conditions, we selected the scene with the highest NDVI value to generate a composite image for the peak of the growing season. Extracted features included reflectance from the visible spectrum (blue, green, red), near-infrared (NIR), and shortwave infrared (SWIR) bands. In addition, we calculated multiple vegetation indices commonly used for biomass estimation, and extracted texture features using the Gray-Level Co-occurrence Matrix (GLCM) method (Table 1).

Table 1 Remote sensing feature extraction from Landsat imagery during the vegetation growing season, including various vegetation indices, spectral bands from visible to shortwave infrared, and texture features derived using the gray-level co-occurrence matrix (GLCM) method. These features are critical for biomass estimation and vegetation analysis.

Full size table

Vegetation height parameter acquisition

We obtained vegetation height and structure products³⁵ from the Australia TERN website (https://portal.tern.org.au/metadata/TERN/de1c2fef-b129-485e-9042-8b22ee616e66). This data product is derived from a combination of ICESat GLAS, ALOS PALSAR, and Landsat data products, including features such as vegetation height, age class, and coverage. The data product is from the year 2009 and has a spatial resolution of 30 meters. We selected four canopy height images from this dataset, which included four canopy height features: the 25th, 50th, 75th, and 95th.

Land cover classification data and DEM

The land cover classification data was obtained from the European Space Agency (ESA) website http://maps.elie.ucl.ac.be/CCI/viewer/index.php with a resolution of 300 m. The 2009 Australian land cover comprised 22 categories, and we reclassified it into six types based on their vegetation characteristics (Fig. 2a): 1. Broadleaved forest, 2. Needleaved forest, 3. Tree and shrub mix, 4. Shrubland, 5. Other (non-forested and non-shrubland, including cropland, grassland, herbaceous vegetation, water bodies, and bare soil.), and 6. Urban areas.

Topographic factors can influence ecological processes such as sunlight exposure, airflow, and vegetation growth, thereby impacting vegetation biomass. Particularly in the tall mountains of eastern Australia, which block warm and humid airflows from the Pacific Ocean, the vegetation growth on either side of the mountains varies. Therefore, slope and aspect are crucial factors influencing vegetation growth. Digital Elevation Model (DEM) data were obtained from the Shuttle Radar Topography Mission (SRTM) released by NASA and the National Imagery and Mapping Agency (NIMA). The spatial resolution of DEM is 30m. We calculated the elevation, slope, and aspect based on the DEM data (Fig. 2b).

Temperature and precipitation data

Australia spans a vast area with significant regional variations in precipitation and temperature—both of which play critical roles in vegetation growth. Favorable temperature and precipitation conditions can promote healthy vegetation development, whereas extreme heat may lead to drought and wildfires, resulting in substantial biomass loss⁶⁰. Our study area covers a wide range of climatic zones, including tropical rainforest, humid subtropical, tropical savanna, subtropical steppe, temperate oceanic, Mediterranean, and arid desert climates. These climate zones differ in seasonal patterns of temperature and rainfall, both of which are closely linked to vegetation dynamics^61,62. Moreover, the impact of soil temperature on vegetation growth varies across different elevations and climatic regions^63,64.

To account for these environmental influences, daily mean air temperature, soil temperature, and precipitation were incorporated as predictors into the biomass estimation model. Climate data were acquired from the GEE platform, including temperature and soil temperature from the ERA5-Land hourly dataset (ERA5-Land Hourly - ECMWF Climate Reanalysis), and precipitation from the CHIRPS daily dataset (Climate Hazards Center InfraRed Precipitation With Station Data (Version 2.0 Final)) (Fig. 3).

Comparison with other biomass maps

The result of this study were compared with two existing biomass maps, including the comprehensive pan-tropical biomass map published by Avitabile et al.⁶⁵ and the global aboveground and belowground biomass carbon density map provided by NASA ORNL DAAC for 2010. The second biomass map was generated by Spawn et al.⁶⁶, and we obtained it through the GEE (https://developers.google.com/earth-engine/datasets/catalog/NASA_ORNL_biomass_carbon_density_v1). The “Avitabile” map integrates existing aboveground biomass data from “Saatchi” and “Baccini” using weighted linear averaging and bias removal techniques to create a pan-tropical vegetation biomass map. The spatial resolution is 1 km, suitable for the early 2000s. The “Spawn” atlas integrated biomass remote sensing images based on land cover types (including woody, agricultural, grassland, etc.) in 2010, with a spatial resolution of 300 m.

Research methods

The overall methodological framework of this study is illustrated in Fig. 4. We collected Landsat, SRTM, and canopy height data to extract a variety of remote sensing features related to biomass. Biomass plot data were obtained, and the corresponding remote sensing features at each plot location were extracted. These plots and their associated feature parameters were used to train the models, including several individual machine learning models and an Stacking model. First, a total of 37 feature variables were used to compare the predictive performance of the Stacking regression model with that of other models. Subsequently, RFE was applied to all remote sensing parameters to identify the optimal subset of features, which were then used for further model training. Finally, the optimal model was selected based on accuracy evaluation to generate a spatial distribution map of biomass, which was compared with existing biomass products.

Methods for biomass estimation

In this study, we compared the Stacking regression model with several other models, including seven individual machine learning algorithms (Fig. 5). The Stacking regression model, along with two relatively high-performing machine learning models, was employed to estimate biomass in the study area.

Stacking regression is a powerful ensemble learning method^32,67. The main process is as follows: First, multiple base regression models are trained, and the predictions of these base models are used as new features to form a new training set for the meta-model^68,69. The meta-model (typically a simple linear regression, ridge regression, etc.) is used to learn how to optimally combine the predictions from different base models⁷⁰. The goal of the meta-model is to learn how to weight the predictions from the base models by minimizing the error. The Stacking regression model is highly flexible and allows for the selection of different base models and meta-models according to the task requirements. The base learning models include seven machine learning models, while the meta-model employs Multiple Linear Regression (MLR).

Base models

The base models we selected included RF, GBR, Support Vector Regression (SVR), Decision Tree Regression (DT), KNN, Adaptive Boosting (AdaBoost), and XGBoost (Extreme Gradient Boosting).

DT and RF are both based on decision trees as base learning models^71,72. They recursively split the data into smaller parts, selecting a feature and setting a threshold at each split, until a stopping condition is met. Random Forest, however, consists of multiple decision trees, each making independent predictions, with the final prediction being the average of all tree predictions. Compared to DT, Random Forest is more robust.

GBR, AdaBoost, and XGBoost are commonly used boosting algorithms. GBR is an ensemble learning method based on gradient descent, which incrementally adds the predictions of decision trees to the current model through an additive model^73,74. Each new weak learner (usually a decision tree) fits the residuals (prediction errors) of the previous model to reduce the overall loss of the model. AdaBoost optimizes the model by adjusting the weights of the samples. Unlike GBR, AdaBoost updates the model output through weighted averaging. XGBoost is an optimized version of GBR, which introduces a regularization term to control the model’s complexity and prevent overfitting⁷⁵.

KNN predicts by calculating the distance between the input feature point and other sample points in the training set, and then making predictions based on the K nearest neighbors⁷⁶. The output of KNN is the average (or weighted average) of the target variable values of the K neighbors.

The core idea of SVR is to tolerate small errors (within the $\varepsilon$ range) as much as possible during the regression process, and only penalize larger errors, thereby resulting in a more robust regression model⁷⁷.

We selected the optimal hyperparameter combinations for multiple models through repeated comparative experiments(Table 2).

Table 2 Optimized hyperparameter settings of various regression models, including Random Forest (RF), K-Nearest Neighbors (KNN), AdaBoost, Gradient Boosting Regression (GBR), Support Vector Regression (SVR), Decision Tree (DT), and Extreme Gradient Boosting(XGBoost), aimed at enhancing model performance.

Full size table

Meta-model

The core function of the meta-model is to integrate the predictions from various base learners⁷⁸. It learns how to adjust the weights based on the outputs of the base learners, thereby improving the accuracy of the final prediction. Unlike simple averaging methods (such as Bagging), Stacking performs more refined integration through the meta-model, capturing the strengths of different base learners. Typically, the meta-model is a simple regression model, such as MLR, Ridge Regression, or other types of regression models. The study employed MLR to determine the optimal combination weights, aiming to effectively integrate the predictions from base models in order to minimize the overall prediction error.

Feature selection of the image spectrum

Due to the large number of image feature variables, multicollinearity may exist, or the model may suffer from overfitting. RFE is a commonly used feature selection method⁷⁹. It calculates the “importance” or “contribution” of each feature based on coefficients, weights, or other model evaluation metrics. Features are ranked according to their importance in the model, and the least important features (i.e., those with the smallest importance) are removed. The model is then retrained, and the most useful features for model prediction are ultimately selected. The RFE method helps reduce the number of features and enhances the model’s interpretability⁸⁰.

Accuracy assessment and validation

Model performance is assessed using K-fold cross-validation. The data is divided into K equal subsets, with each subset used in turn as the validation set, while the remaining subsets are used for training. For each fold, the evaluation result is calculated, and the average of the K evaluations is used as the final performance assessment of the model⁸¹. The accuracy of the results is assessed using the coefficient of determination ($R^2$ ), Root Mean Square Error (RMSE), and Mean Absolute Error (MAE) to evaluate model performance, as described by the following formulas:

$$\begin{aligned} & R^2=1-\frac{\sum _{i=1}^n\left( y_i-{\hat{y}}_i\right) ^2}{\sum _{i=1}^n\left( y_i-{\bar{y}}\right) ^2} \end{aligned}$$

(1)

$$\begin{aligned} & RMSE=\sqrt{\frac{\sum _{i=1}^n\left( y_i-{\hat{y}}_i\right) ^2}{n}} \end{aligned}$$

(2)

$$\begin{aligned} & M A E=\frac{1}{n} \sum _{i=1}^n\left| \left( {\hat{y}}_i-y_i\right) \right| \end{aligned}$$

(3)

where ${\hat{y}}_i$ represents the prediction value of AGB, $y_i$ is the true value of AGB, and n is the number of samples.

Uncertainty analysis

To evaluate the uncertainty of model predictions, we incorporated Monte Carlo simulation into the Stacking Regressor framework. The Monte Carlo method, through repeated random perturbations and resampling, characterizes the variability of model outputs under changes in inputs and parameters, thereby reflecting the confidence intervals of the predictions. In biomass estimation, the presence of nonlinear relationships and complex interdependencies among variables makes it difficult for a single prediction to quantify such uncertainty. Monte Carlo simulation provides an effective approach to address this issue. By performing statistical analyses (e.g., standard deviation, extreme values, and quantiles) on the results of multiple simulations, we were able to systematically characterize the distribution of prediction uncertainty.

Results

Performance of biomass estimation models

To select the optimal biomass estimation model for regions with large-scale spatial heterogeneity, this study employed a Stacking ensemble framework that integrates the advantages of multiple base models through a dual-level optimization mechanism at both the feature and decision levels (Table 3). The results indicate that, in typical ecological study areas of Australia, the Stacking model ($R^2$ = 0.73) significantly outperforms all individual machine learning models, demonstrating the theoretical suitability of ensemble learning—specifically the Stacking approach—for heterogeneous landscapes. Specifically, DT model exhibited the poorest performance ($R^2$ = 0.45, RMSE = 71.66 Mg/ha) due to limitations in capturing feature interactions, highlighting the constraints of single models in fragmented habitats. Compared with the relatively optimal single model, RF model ($R^2$ = 0.72), Stacking reduced RMSE by 0.96 Mg/ha (a 1.86% decrease), and by 21.03 Mg/ha (a 29.4% decrease) relative to the worst-performing DT model.

These findings confirm that Stacking, through a bias–variance trade-off mechanism, effectively mitigates the fitting risks of individual machine learning models across diverse ecological regions, offering a novel theoretical pathway for developing robust regional carbon estimation paradigms.

Table 3 Performance metrics of various regression models for biomass estimation, including ${R}^2$, RMSE, and MAE. The Stacking model achieves the highest accuracy, demonstrating its superior effectiveness in enhancing predictive performance compared with individual machine learning models (best results are highlighted in bold, and the second-best results are italics).

Full size table

Model feature selection

To address the modeling uncertainty arising from redundancy in high-dimensional ecological features, this study proposes a“feature selection-ensemble learning” collaborative optimization framework.Initially, 37 image feature parameters were selected with the aim of improving AGB estimation accuracy through the integration of diverse feature types. However, an excessive number of features can lead to multicollinearity, increased computational complexity, and a higher risk of model overfitting. To evaluate the relationships and significance levels among variables, Pearson correlation analysis (correlation coefficient r range: [-1, 1]) was conducted (Fig. 6). The results revealed strong correlations between various spectral vegetation indices and image bands. Therefore, feature selection proved essential to reduce the number of model parameters. Ultimately, the RFE method was applied, resulting in the selection of 15 features, including four canopy height metrics (25th, 50th, 75th, and 95th), one vegetation index (GNDVI), four spectral bands (B1, B4, B5, and B7), one texture feature (gray_savg), two topographic variables (elevation and slope), and three climatic factors (precipitation, soil temperature, and temperature). Based on the correlation analysis (Fig. 6) and feature selection results: 1. Canopy height remains the most critical predictor of biomass, highlighting that the three-dimensional canopy structure is a fundamental parameter for AGB estimation. 2. The synergistic mechanism of multi-source features and the combination of key parameters validate the cross-scale coupling theory of “terrain-climate-spectral” factors. Specifically, terrain (elevation and slope) governs the large-scale spatial patterns of biomass, climate factors (precipitation/temperature) explain phenology-driven mechanisms, and near-infrared bands (B5/B7) quantify canopy biochemical components. Additionally, GNDVI was identified as an important variable in this study, consistent with previous research, indicating its excellent performance in biomass estimation, particularly in medium- to high-density forests.

After inputting the optimized feature set into the Stacking ensemble framework (Table 4), our model still achieved the best performance: $R^2$ = 0.74, RMSE = 49.79 Mg/ha, and MAE = 30.56 Mg/ha. The key findings are as follows: 1. Validation of the ecological response advantage of tree-based models: RF and GBR maintained the strongest performance among individual models after feature optimization (RMSE = 51.00 and 52.18 Mg/ha, respectively), confirming their inherent robustness in mapping heterogeneous environmental parameters. 2. The DT model still performed the worst ($R^2$ = 0.47, RMSE = 70.38 Mg/ha), owing to its limitations in modeling high-dimensional feature interactions and complex nonlinear relationships, thereby highlighting the inherent deficiencies of a single-tree model in complex ecosystems. 3.Superiority of the Stacking model: Compared with the best single model (RF), the Stacking model further reduced RMSE by 1.21 Mg/ha (a 2.4% decrease), demonstrating an enhanced capacity for regional biomass estimation beyond that of individual models. This collaborative framework successfully establishes a dual-path optimization paradigm of “feature compression–model fusion”, providing a novel perspective for biomass estimation.

In addition, independent validations were conducted using data from different years to evaluate the robustness of our model. The results demonstrated that the model maintained consistently high and stable performance across all tested years. Specifically, we divided the data by year, selecting five individual years from 2007 to 2011. For each iteration, data from four years were used for model training, while data from the remaining year were used for validation to assess model accuracy. Among all five years, our model consistently achieved the highest predictive accuracy, followed by the RF and GBR models (Table 5). The $R^2$ values of the RF model were only 0.01 lower than ours during 2008–2011. However, in 2007, the $R^2$ of the RF model decreased by 0.03, and the RMSE increased by 1.86 Mg/ha. For the GBR model, its $R^2$ was 0.01–0.02 lower than ours during 2007–2009. In 2010 and 2011, the $R^2$ values were 0.05 and 0.04 lower, respectively, with corresponding increases in RMSE of 4.56 Mg/ha and 3.87 Mg/ha.

Table 4 Results of biomass estimation using various regression models after RFE-based feature selection. The Stacking model achieved the highest performance with an ${R}^2$ of 0.74, while RF and GBR also performed well. The metrics include ${R}^2$, RMSE, and MAE for each model (best results are highlighted in bold, and the second-best results are italics).

Full size table

Table 5 Annual comparison of biomass estimation performance across various regression models from 2007 to 2011. The results show that some models demonstrate high stability and accuracy, with the Stacking model consistently exhibiting superior predictive capability over the others (best results are highlighted in bold, and the second-best results are italics).

Full size table

Spatial distribution and uncertainty analysis of biomass

We selected three models (Stacking model, GBR, RF) with relatively high predictive performance to estimate aboveground biomass, aiming to reveal their estimation potential at a large regional scale. The results included scatter plots of model predictions (Fig. 7), spatial distribution maps of AGB (Fig. 8) , and AGB uncertainty and relative uncertainty maps (Fig. 9).

As shown in Fig. 8, we observed that the three models exhibited similar trends in spatial distribution. The eastern and southwestern regions of the study area have higher biomass, while the central and western regions exhibit lower biomass. The uneven distribution of precipitation is the primary cause of this spatial heterogeneity. The Great Dividing Range in Australia stretches for thousands of kilometers along the eastern coast, playing a significant role in uplifting precipitation and forming a relatively moist local climate at higher altitudes, which contributes to the lush forests in eastern Australia (AGB ranging from 100 to 300 Mg/ha). The mountain range also blocks the warm, moist air from the Pacific, resulting in a significant reduction in precipitation on the western side of the mountains, leading to a marked decline in biomass (AGB ranging from 40 to 100 Mg/ha). Secondly, the hilly areas in the southwestern region also exhibit relatively high AGB levels (ranging from 40 to 300 Mg/ha), which can be attributed to the influence of the South Indian Ocean monsoon and local maritime climate that together create favorable conditions for vegetation growth. In contrast, the central region of Australia is dominated by vast basins and desert landscapes with poor soil water retention, coupled with extremely low annual precipitation and high evaporation, resulting in very low biomass (AGB ranging from 0 to 30 Mg/ha). The Stacking model predicted a biomass range from 0 to 307.14 Mg/ha, while the GBR and RF models predicted biomass ranges of 0 to 306.7 Mg/ha and 0 to 279.03 Mg/ha, respectively.

The uncertainty metrics of the Stacking model are presented in Fig. 9. Figure 9 (1) shows the prediction standard deviation, where larger values indicate greater variability of pixel-level predictions, i.e., higher uncertainty, whereas smaller values suggest more stable predictions. Overall, the uncertainty values are primarily concentrated within 0–20 Mg/ha, accounting for as much as 99.95%, which indicates that the predictions are generally stable and associated with high confidence. Figure 9 (2) and (3) display the lower (minimum) and upper (maximum) bounds of the predictions, respectively. Figure 9 (4) and (5) present the 25th and 75th percentile, which describe the interquartile range of the prediction distribution. Together, these statistics provide a more comprehensive understanding of the uncertainty range in regional biomass estimation. Figure 9 (6) illustrates the distribution of relative uncertainty, showing that most areas exhibit extremely low relative uncertainty, further confirming the robustness of the model predictions.

Comparison with other AGB maps

We compared our three predicted AGB maps with other existing biomass products (Fig. 10). We calculated the mean and standard deviation of the spatial differences between our model predictions and the existing products were calculated (Table 6). Compared with our results, the “Avitabile” map shows higher AGB values, while the “Spawn” product shows relatively lower values. The mean spatial differences between our three AGB maps and the “Avitabile” product range from –18.59 to –17.82 Mg/ha, while those with the “Spawn” product range from 14.99 to 15.48 Mg/ha. The standard deviation of the spatial differences between our maps and the two existing products ranges from 21.90 to 26.17 Mg/ha.

We conducted spatial correlation analyses between each of our three biomass distribution maps and two existing biomass products (as shown in Table 7). The results indicate that our products exhibit high overall spatial correlations with the other two datasets, demonstrating consistency in the spatial variation patterns of biomass.

Table 6 Mean error and standard deviation of biomass estimates from three different models compared with existing remote sensing products (“Avitabile” and “Spawn”). The table highlights the mean differences in biomass values, indicating the relative performance of each model with respect to the reference products.

Full size table

Table 7 Correlation coefficients (r) and significance levels (p-values) between biomass products estimated by the Stacking, GBR, and RF models and two global biomass products (“Avitabile” and “Spawn”). The high correlations indicate strong consistency in the spatial distribution patterns of biomass across different datasets.

Full size table

Discussion

Influence of feature variable selection on aboveground biomass estimation

Feature variable selection plays a critical role in the modeling of aboveground biomass estimation. In this study, a total of 37 image features were initially considered, including spectral reflectance, vegetation indices, topographic factors, texture information, and climatic variables. These features collectively provided comprehensive information across spectral, biophysical, and structural dimensions, while also incorporating key environmental conditions that influence vegetation growth. However, an excessive number of variables can introduce multicollinearity, increase model complexity, and elevate the risk of overfitting. To address this, the RFE method was employed to select 15 optimal variables from the initial 37. This feature reduction enhanced model robustness and generalizability.

Among the selected variables, canopy height percentiles (25th, 50th, 75th, and 95th) exhibited strong correlations with AGB and emerged as critical predictors—findings consistent with previous studies that emphasize the close relationship between tree height and carbon storage capacity^82,83. Notably, the inclusion of GNDVI highlighted the utility of spectral indices in monitoring medium- to high-density vegetation. Compared to traditional NDVI, GNDVI offers improved resistance to saturation, particularly in closed-canopy forests^84,85. The selection of topographic factors, specifically elevation and slope, supports the role of the Great Dividing Range in modulating hydrothermal gradients across eastern Australia. In addition, the inclusion of climate variables (precipitation, soil temperature, and air temperature) reflects the significant influence of environmental heterogeneity on vegetation growth at the continental scale. It is worth noting that climatic variables are typically excluded in studies conducted over small or localized regions, but they are especially relevant and informative in large-scale biomass mapping efforts. Furthermore, the study found that texture features also contributed meaningfully to the model, indicating that spatial heterogeneity can effectively complement spectral information. Texture metrics such as those derived from the GLCM are capable of capturing spatial arrangements of surface features—such as patch size, vegetation density variation, and edge transitions—that are not easily detected by spectral data alone⁸⁶. Following feature selection, the model’s RMSE was reduced by 0.84 Mg/ha, validating the effectiveness of parameter simplification in mitigating overfitting. However, this also underscores the need for future development of advanced techniques to detect and model feature interactions, which may further enhance predictive accuracy and model interpretability.

Robustness analysis of the method

Our biomass estimation model exhibits strong robustness across multiple dimensions. First, we incorporated diverse data sources during model construction, including Landsat imagery, SRTM-derived topographic data, climate information, and canopy height data provided by TERN. This multi-source data fusion strategy effectively addressed the limitations of single-source datasets in terms of spatial coverage and information diversity, thereby enhancing the model’s adaptability to complex environmental conditions^87,88.

Second, we employed a Stacking regression model based on ensemble learning, integrating multiple heterogeneous base learners. Compared to single machine learning models, the Stacking approach leverages the strengths of individual learners to reduce both variance, thus improving overall prediction accuracy and robustness^89,90. Our experimental results demonstrates that the Stacking model consistently outperformed individual models across different forest types (Table 8). Specifically, the $R^2$ values were improved by 0.01 to 0.28 in shrublands and 0.02 to 0.35 in broadleaf forests. This indicates that the Stacking method can effectively adapt to varying biomass distribution patterns and possesses strong generalization capability. Moreover, we conducted independent validation using datasets from different years. The Stacking model maintains consistently high and stable prediction performance over time, further confirming its temporal robustness.

Table 8 Comparison of accuracy metrics for various biomass estimation models in shrubland and broadleaf forest. The Stacking model consistently demonstrates superior performance in terms of ${R}^2$, RMSE, and MAE, highlighting its effectiveness in biomass estimation compared with individual models (best results are highlighted in bold, and the second-best results are italics).

Full size table

In addition, feature selection further contributes to model robustness. By applying the RFE method, we reduce the original 37 features to 15 optimal variables. This not only enhances the model’s predictive accuracy but also reduced its complexity, which improves interpretability and operational stability. Even after feature reduction, the Stacking model achieves an $R^2$ of 0.74, underscoring the importance of feature engineering in improving model robustness.

In summary, this study systematically enhanced the robustness of biomass estimation through multi-source data integration, ensemble learning, and optimized feature selection. The proposed Stacking model demonstrates strong and stable predictive performance across different forest types and time periods, providing a reliable technical foundation for biomass mapping at broader spatial and temporal scales. This is particularly valuable for supporting ecological carbon sink monitoring and management. Nevertheless, future research should explore the integration of ecological process mechanisms into data-driven models to further improve the accuracy and interpretability of biomass estimation⁹¹.

Analysis of spatial distribution differences and uncertainty

This study not only produces a high-resolution spatial distribution map of aboveground biomass, but also systematically analyzed both the absolute and relative uncertainties of model predictions. Furthermore, the results are compared with two AGB products—“Avitabile” and “Spawn”—through difference and correlation analyses.

The findings indicate that our predicted AGB product shows higher biomass values in the eastern tropical rainforest region and the southwestern forested zone, while substantially lower values are observed in the central arid areas. This pattern of spatial heterogeneity is consistent with previous studies and observed climatic conditions, thereby enhancing the credibility of the model. Comparisons with the “Avitabile” and “Spawn” datasets further confirm the advantages of our model in structurally complex regions. The mean difference between our product and the Spawn dataset was approximately 15 Mg/ha, whereas the difference with the “Avitabile” dataset was around -18 Mg/ha. Despite these systematic biases, a high spatial correlation (Pearson’s r > 0.80) is observed between the products, indicating strong consistency in spatial trends⁹².

Our results indicate that combining ensemble learning with Monte Carlo simulation provides an effective approach for characterizing uncertainty in biomass estimation. The study shows that, although uncertainty is relatively high in a few regions (e.g., humid areas in the eastern part), the standard deviation and relative uncertainty for the vast majority of pixels remain low (Fig. 9 (1) and (6)), indicating strong stability and reliability of the model for large-scale predictions. The spatial distributions of the minimum and maximum values (Fig. 9 (2) and (3)) as well as the 25th and 75th percentiles (Fig. 9 (4) and (5)) further reveal the prediction range across different regions. Relative uncertainty is slightly higher in arid and data-sparse regions, while it is lower in humid forest areas, which can be attributed to the distribution of training samples across ecological zones and terrain complexity.

Compared with traditional point-estimate methods, Monte Carlo simulation not only provides mean predictions but also reveals prediction intervals and probability distributions, offering more informative references for subsequent forest resource management and carbon stock assessment. However, the Monte Carlo approach has certain limitations. First, it is computationally intensive, particularly when applied to large-scale, high-resolution raster data, as a large number of iterations are required to obtain stable distribution estimates. Second, Monte Carlo simulation assumes that input error distributions are known and independent, whereas in practice, correlations among different remote sensing features may exist, potentially leading to under- or overestimation of uncertainty. Finally, for boundary regions or data-sparse areas (e.g., the white regions in Fig. 9 (2)), simulation results may be constrained by insufficient training samples, affecting the reliability of local predictions. Therefore, incorporating additional auxiliary data or adopting spatially correlated modeling in these regions may further improve the robustness of predictions and the accuracy of uncertainty characterization.

Limitations

This study presents several challenging technical limitations in vegetation biomass modeling in Australia. First, due to current technological constraints and monitoring costs, the sample plot data primarily cover woody vegetation, making it difficult to obtain large-scale ground truth data for non-woody vegetation such as grasslands and croplands. At the same time, we added 1,550 zero-biomass plots for field validation, which may introduce systematic bias (e.g., sparse vegetation or young trees being misclassified as zero). This data gap may affect the model’s generalization capability in complex ecosystems. Second, although machine learning models perform well in data-driven modeling, the physical mechanisms underlying vegetation biomass formation (e.g., carbon cycle dynamics and species interactions) have not been fully integrated into the model framework. Mechanistic modeling requires the integration of knowledge from multiple disciplines, such as ecology and remote sensing, posing a significant challenge. While this study provides a feasible pathway for large-scale biomass estimation, improving model accuracy still requires overcoming key technical bottlenecks, including multi-source data fusion in heterogeneous environments and the integration of mechanistic and data-driven modeling. Addressing these challenges necessitates continuous methodological innovation and cross-disciplinary collaboration.

Conclusion

Based on field plot data from Australia, this study integrates Landsat imagery, vegetation canopy height data, and auxiliary variables such as meteorological and topographic information. After applying RFE for feature selection, three high-performing models are used to develop aboveground biomass estimation models for the Australian continent. The main conclusions are as follows:

1.Our results demonstrates that for large-scale regions with high spatial heterogeneity, in addition to conventional vegetation indices and spectral bands, topographic and climatic variables also play a critical role in biomass estimation.

2. In this study, the Stacking regression model outperformed all individual models in estimating AGB. It consistently achieved the best performance across different forest types and data splitting scenarios, demonstrating strong robustness. The Stacking model yielded an $R^2$ of 0.74 and an RMSE of 49.79 Mg/ha. Compared with previous studies, this work highlights the advantage of combining multi-source remote sensing data with ensemble learning approaches.

3. We selected the top three performing models—Stacking models, GBR, and RF—to generate regional AGB maps. The results revealed significant spatial heterogeneity in biomass distribution across Australia, with higher AGB values mainly concentrated in the eastern coastal and southwestern regions. In addition, AGB uncertainty analysis indicated that the model maintained strong predictive stability in high-biomass areas.

However, the lack of long-term tree height data remains a limiting factor for accurately estimating AGB over extended time periods. This study provides a valuable reference for applying multi-source remote sensing data and ensemble modeling to AGB estimation in data-scarce or monitoring-limited environments.

Data availability

The datasets generated and/or analyzed during the current study are publicly available. The biomass plot data and vegetation canopy height data are available from the TERN platform (https://portal.tern.org.au). Landsat, SRTM, and climate data (temperature and precipitation) can be accessed through the Google Earth Engine platform (https://earthengine.google.com/). Land use data can be obtained from the European Space Agency (ESA) platform (http://maps.elie.ucl.ac.be/CCI/viewer/index.php).

References

Shi, S. et al. Analyzing canopy structure effects based on LiDAR for GPP-SIF relationship and GPP estimation. Front. Plant Sci. 16. https://doi.org/10.3389/fpls.2025.1561826 (2025).
Asner, G. P. et al. High-resolution forest carbon stocks and emissions in the amazon. Proc. Natl. Acad. Sci. 107, 16738–16742 (2010).
CAS PubMed PubMed Central ADS Google Scholar
Mondal, P., McDermid, S. S. & Qadir, A. A reporting framework for sustainable development goal 15: Multi-scale monitoring of forest degradation using modis, landsat and sentinel data. Remote Sens. Environ. 237, 111592 (2020).
Google Scholar
Li, Y. et al. Deforestation-induced climate change reduces carbon storage in remaining tropical forests. Nat. Commun. 13, 1964 (2022).
PubMed PubMed Central ADS Google Scholar
Saatchi, S. S. et al. Benchmark map of forest carbon stocks in tropical regions across three continents. Proc. Natl. Acad. Sci. USA 108, 9899–9904 (2011).
CAS PubMed PubMed Central ADS Google Scholar
Sha, Z. et al. The global carbon sink potential of terrestrial vegetation can be increased substantially by optimal land management. Commun. Earth Environ. 3, 8 (2022).
ADS Google Scholar
Bilyk, A., Pulkki, R., Shahi, C. & Larocque, G. R. Development of the Ontario Forest Resources Inventory: A historical review. Can. J. For. Res. 51, 198–209 (2021).
ADS Google Scholar
Goetz, S. & Dubayah, R. Advances in remote sensing technology and implications for measuring and monitoring forest carbon stocks and change. Carbon Manag. 2, 231–244 (2011).
Google Scholar
Rodríguez-Veiga, P., Wheeler, J., Louis, V., Tansey, K. & Balzter, H. Quantifying forest biomass carbon stocks from space. Curr. For. Rep. 3, 1–18 (2017).
Google Scholar
H. Nguyen, T., Jones, S., Soto-Berelov, M., Haywood, A. & Hislop, S. Landsat time-series for estimating forest aboveground biomass and its dynamics across space and time: A review. Remote Sens. 12, 98 (2019).
Dong, T. et al. Estimating crop biomass using leaf area index derived from Landsat 8 and Sentinel-2 data. ISPRS J. Photogramm. Remote Sens. 168, 236–250 (2020).
ADS Google Scholar
Li, Y., Li, M., Li, C. & Liu, Z. Forest aboveground biomass estimation using Landsat 8 and Sentinel-1a data with machine learning algorithms. Sci. Rep. 10, 9952 (2020).
CAS PubMed PubMed Central ADS Google Scholar
Aklilu Tesfaye, A. & Gessesse Awoke, B. Evaluation of the saturation property of vegetation indices derived from Sentinel-2 in mixed crop-forest ecosystem. Spat. Inf. Res. 29, 109–121 (2021).
Google Scholar
Feldpausch, T. R. et al. Tree height integrated into pantropical forest biomass estimates. Biogeosciences 9, 3381–3403 (2012).
ADS Google Scholar
Deng, Y., Pan, J., Wang, J., Liu, Q. & Zhang, J. Mapping of forest biomass in Shangri-La City based on lidar technology and other remote sensing data. Remote Sens 14, 5816 (2022).
ADS Google Scholar
Healey, S. P., Yang, Z., Gorelick, N. & Ilyushchenko, S. Highly local model calibration with a new Gedi Lidar asset on google earth engine reduces Landsat forest height signal saturation. Remote Sens. 12, 2840 (2020).
ADS Google Scholar
Shi, S. et al. Waveform information accurate extraction for massive and complex waveform data of hyperspectral lidar. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 18, 1020–1038. https://doi.org/10.1109/JSTARS.2024.3495039 (2025).
Article ADS Google Scholar
Wang, A. et al. Potential of hyperspectral LiDAR in individual tree segmentation: A comparative study with multispectral LiDAR. Urban For. Urban Green. 104, 128658. https://doi.org/10.1016/j.ufug.2024.128658 (2025).
Article Google Scholar
Lu, J. et al. Estimation of aboveground biomass of Robinia Pseudoacacia forest in the yellow river delta based on UAV and backpack lidar point clouds. Int. J. Appl. Earth Obs. 86, 102014 (2020).
Google Scholar
Narine, L. L. et al. Estimating aboveground biomass and forest canopy cover with simulated Icesat-2 data. Remote Sens. Environ. 224, 1–11 (2019).
ADS Google Scholar
Nandy, S., Srinet, R. & Padalia, H. Mapping forest height and aboveground biomass by integrating Icesat-2, Sentinel-1 and Sentinel-2 data using random forest algorithm in northwest Himalayan Foothills of India. Geophys. Res. Lett 48, e2021GL093799 (2021).
ADS Google Scholar
Keith, H., Mackey, B. G. & Lindenmayer, D. B. Re-evaluation of forest biomass carbon stocks and lessons from the world’s most carbon-dense forests. Proc. Natl. Acad. Sci. 106, 11635–11640 (2009).
CAS PubMed PubMed Central ADS Google Scholar
Grierson, P., Adams, M. & Attiwill, P. Estimates of carbon storage in the aboveground biomass of Victorias forests. Aust. J. Bot. 40, 631–640 (1992).
CAS Google Scholar
Liu, L., Lim, S., Shen, X. & Yebra, M. Assessment of generalized allometric models for aboveground biomass estimation: A case study in Australia. Comput. Electron. Agric. 175, 105610 (2020).
Google Scholar
Nguyen, T. H., Jones, S. D., Soto-Berelov, M., Haywood, A. & Hislop, S. Monitoring aboveground forest biomass dynamics over three decades using Landsat time-series and single-date inventory data. Int. J. Appl. Earth Obs. 84, 101952 (2020).
Google Scholar
Liao, Z., Dijk, A. I., He, B., Larraondo, P. R. & Scarth, P. F. Woody vegetation cover, height and biomass at 25-m resolution across Australia derived from multiple site, airborne and satellite observations. Int. J. Appl. Earth Obs. 93, 102209 (2020).
Google Scholar
Shi, S. et al. Analysis simulation and scanning geometry calibration of palmer scanning units for airborne hyperspectral light detection and ranging. Remote Sens. 17(8). https://doi.org/10.3390/rs17081450 (2025).
Lary, D. J., Alavi, A. H., Gandomi, A. H. & Walker, A. L. Machine learning in geosciences and remote sensing. Geosci. Front. 7, 3–10 (2016).
Google Scholar
Zhou, X. et al. Estimation of biomass in wheat using random forest regression algorithm and remote sensing data. Crop J. 4, 212–219 (2016).
Google Scholar
Zhang, Y., Ma, J., Liang, S., Li, X. & Li, M. An evaluation of eight machine learning regression algorithms for forest aboveground biomass estimation from multiple satellite data products. Remote Sens. 12, 4015 (2020).
ADS Google Scholar
Su, H., Shen, W., Wang, J., Ali, A. & Li, M. Machine learning and geostatistical approaches for estimating aboveground biomass in Chinese subtropical forests. For. Ecosyst. 7, 1–20 (2020).
Google Scholar
Pavlyshenko, B. Using stacking approaches for machine learning models. In 2018 IEEE Second International Conference on Data Stream Mining & Processing (DSMP). 255–258 (IEEE, 2018).
Yao, H. et al. Combination of hyperspectral and quad-polarization SAR images to classify marsh vegetation using stacking ensemble learning algorithm. Remote Sens. 14, 5478 (2022).
ADS Google Scholar
Yang, H. et al. Estimation of potato chlorophyll content from UAV multispectral images with stacking ensemble algorithm. Agronomy 12, 2318 (2022).
CAS Google Scholar
Scarth, P., Armston, J., Lucas, R. & Bunting, P. A structural classification of Australian vegetation using Icesat/Glas, Alos Palsar, and Landsat sensor data. Remote Sens. 11, 147 (2019).
ADS Google Scholar
Joint, R. S. R. P. Biomass Plot Library—National Collation of Stem Inventory Data and Biomass Estimation, Australian Field Sites (2021). 04 Nov 2024.
Paul, K. I. et al. Testing the generality of above-ground biomass allometry across plant functional types at the continent scale. Glob. Change Biol. 22, 2106–2124 (2016).
ADS Google Scholar
Lucas, R. et al. An evaluation of the Alos Palsar l-band backscatter–above ground biomass relationship Queensland, Australia: Impacts of surface moisture condition and vegetation structure. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 3, 576–593 (2010).
ADS Google Scholar
Liao, Z., Liu, X., Dijk, A., Yue, C. & He, B. Continuous woody vegetation biomass estimation based on temporal modeling of Landsat data. Int. J. Appl. Earth Obs. 110, 102811 (2022).
Google Scholar
Hordo, M., Kiviste, A., Sims, A. & Lang, M. Outliers and/or measurement errors on the permanent sample plot data. In Proceedings of the Sustainable Forestry in Theory and Practice: Recent Advances in Inventory and Monitoring, Statistics and Modeling, Information and Knowledge Management, and Policy Science. USDA For Serv Gen Tech Rep, PNW-688. Portland, OR, USA. Vol. 15 (2008).
Rostampour, M. Comparison of outlier detection methods and their impact on rangeland measurement and assessment studies. J. Range Watershed Manag. 75, 639–660 (2022).
Google Scholar
Hemati, M., Hasanlou, M., Mahdianpari, M. & Mohammadimanesh, F. A systematic review of Landsat data for change detection applications: 50 years of monitoring the earth. Remote Sens. 13, 2869 (2021).
ADS Google Scholar
Wulder, M. A. et al. Fifty years of Landsat science and impacts. Remote Sensing Environ. 280, 113195 (2022).
Google Scholar
Xu, A., Wang, F. & Li, L. Vegetation information extraction in karst area based on UAV remote sensing in visible light band. Optik 272, 170355 (2023).
Google Scholar
Pinty, B., Lavergne, T., Widlowski, J.-L., Gobron, N. & Verstraete, M. On the need to observe vegetation canopies in the near-infrared to estimate visible light absorption. Remote Sens. Environ. 113, 10–23 (2009).
ADS Google Scholar
Holzman, M. E., Rivas, R. E. & Bayala, M. I. Relationship between TIR and NIR-Swir as indicator of vegetation water availability. Remote Sensing 13, 3371 (2021).
ADS Google Scholar
Xue, J. & Su, B. Significant remote sensing vegetation indices: A review of developments and applications. J. Sens. 2017, 1353691 (2017).
Google Scholar
Pettorelli, N. The Normalized Difference Vegetation Index (Oxford University Press, 2013).
Google Scholar
Richard, J. & Abah, I. A. Derivation of land surface temperature (LST) from Landsat 7 & 8 imageries and its relationship with two vegetation indices (NDVI and GNDVI). Int. J. Res. Granthaalayah 7, 108–120 (2019).
Google Scholar
Gitelson, A. A. & Merzlyak, M. N. Remote sensing of chlorophyll concentration in higher plant leaves. Adv. Sp. Res. 22, 689–692 (1998).
CAS ADS Google Scholar
Huete, A. R. A soil-adjusted vegetation index (SAVI). Remote Sens. Environ. 25, 295–309 (1988).
ADS Google Scholar
Emard, S. Time Series of a Forest Canopy: Detecting Changes Using the Visible Atmospherically Resistant Index and a Low-Cost Drone. Ph.D. thesis, Toronto Metropolitan University (2024).
Kaufman, Y. J. & Tanre, D. Atmospherically resistant vegetation index (ARVI) for EOS-MODIS. IEEE Trans. Geosci. Remote Sens. 30, 261–270 (1992).
ADS Google Scholar
Weiss, M., Baret, F., Smith, G., Jonckheere, I. & Coppin, P. Review of methods for in situ leaf area index estimation (LAI) determination: Part II. of LAI, errors and sampling. Agric. For. Meteorol. 121, 37–53 (2004).
ADS Google Scholar
Rodriguez, P. S., Schwantes, A. M., Gonzalez, A. & Fortin, M.-J. Monitoring changes in the enhanced vegetation index to inform the management of forests. Remote Sens. 16, 2919 (2024).
ADS Google Scholar
Zeng, Y. et al. Optical vegetation indices for monitoring terrestrial ecosystems globally. Nat. Rev. Earth Environ. 3, 477–493 (2022).
ADS Google Scholar
Yang, C. Plant leaf recognition by integrating shape and texture features. Pattern Recognit. 112, 107809 (2021).
Google Scholar
Duan, M., Song, X., Liu, X., Cui, D. & Zhang, X. Mapping the soil types combining multi-temporal remote sensing data with texture features. Comput. Electron. Agric. 200, 107230 (2022).
Google Scholar
Li, H. et al. Estimation of winter wheat LAI based on color indices and texture features of RGB images taken by UAV. J. Sci. Food Agric. 105, 189–200 (2025).
CAS PubMed Google Scholar
Liu, C. et al. Analysis of net primary productivity variation and quantitative assessment of driving forces—A case study of the Yangtze River Basin. Plants 12, 3412 (2023).
PubMed PubMed Central Google Scholar
Chuai, X., Huang, X. J., Wang, W. & Bao, G. NDVI, temperature and precipitation changes and their relationships with different vegetation types during 1998–2007 in Inner Mongolia, China. Int. J. Climatol. 33, 1696–1706 (2013).
Google Scholar
Pitt, M. & Heady, H. Responses of annual vegetation to temperature and rainfall patterns in Northern California. Ecology 59, 336–350 (1978).
Google Scholar
Wang, X. et al. Soil temperature change and its regional differences under different vegetation regions across China. Int. J. Climatol. 41, E2310–E2320 (2021).
ADS Google Scholar
Ni, J., Cheng, Y., Wang, Q., Ng, C. W. W. & Garg, A. Effects of vegetation on soil temperature and water content: Field monitoring and numerical modelling. J. Hydrol. 571, 494–502 (2019).
ADS Google Scholar
Avitabile, V. et al. An integrated pan-tropical biomass map using multiple reference datasets. Glob. Change Biol. 22, 1406–1420 (2016).
ADS Google Scholar
Spawn, S. A., Sullivan, C. C., Lark, T. J. & Gibbs, H. K. Harmonized global maps of above and belowground biomass carbon density in the year 2010. Sci. Data 7, 112 (2020).
PubMed PubMed Central Google Scholar
Zhang, Y., Ma, J., Liang, S., Li, X. & Liu, J. A stacking ensemble algorithm for improving the biases of forest aboveground biomass estimations from multiple remotely sensed datasets. GISci. Remote Sens. 59, 234–249 (2022).
Google Scholar
Wang, D. & Yue, X. The weighted multiple meta-models stacking method for regression problem. In 2019 Chinese Control Conference (CCC). 7511–7516 (IEEE, 2019).
Baskin, I. I., Marcou, G., Horvath, D. & Varnek, A. Stacking. Tutorials in Chemoinformatics. 271–278 (2017).
Aguiar, G. J., Santana, E. J., Carvalho, A. C. & Junior, S. B. Using meta-learning for multi-target regression. Inf. Sci. 584, 665–684 (2022).
Google Scholar
Fu, B. et al. Comparison of optimized object-based rf-dt algorithm and segnet algorithm for classifying karst wetland vegetation communities using ultra-high spatial resolution UAV data. Int. J. Appl. Earth Obs. Geoinf. 104, 102553 (2021).
Google Scholar
Wu, Y., Zhao, Q., Yin, X., Wang, Y. & Tian, W. Multi-parameter health assessment of jujube trees based on unmanned aerial vehicle hyperspectral remote sensing. Agriculture 13, 1679 (2023).
CAS Google Scholar
Sahin, E. K. Assessing the predictive capability of ensemble tree methods for landslide susceptibility mapping using xgboost, gradient boosting machine, and random forest. SN Appl. Sci. 2, 1308 (2020).
CAS Google Scholar
Tian, Y. et al. Aboveground biomass of typical invasive mangroves and its distribution patterns using UAV-lidar data in a subtropical estuary: Maoling River Estuary, Guangxi, China. Ecol. Indic. 136, 108694 (2022).
Google Scholar
Zhen, J. et al. Performance of xgboost ensemble learning algorithm for mangrove species classification with multisource spaceborne remote sensing data. J. Remote Sens. 4, 0146 (2024).
Google Scholar
Chirici, G. et al. A meta-analysis and review of the literature on the k-nearest neighbors technique for forestry applications that use remotely sensed data. Remote Sens. Environ. 176, 282–294 (2016).
ADS Google Scholar
Chen, L., Ren, C., Zhang, B. & Wang, Z. Multi-sensor prediction of stand volume by a hybrid model of support vector machine for regression kriging. Forests 11, 296 (2020).
Google Scholar
Xu, X. et al. Unleashing the power of machine learning and remote sensing for robust seasonal drought monitoring: A stacking ensemble approach. J. Hydrol. 634, 131102 (2024).
Google Scholar
Wang, Y. & Li, Y. Mapping the ratoon rice suitability region in China using random forest and recursive feature elimination modeling. Field Crops Res. 301, 109016 (2023).
ADS Google Scholar
Liu, H., An, H. & Zhang, Y. Analysis of worldview-2 band importance in tree species classification based on recursive feature elimination. Curr. Sci. 115, 1366–1374 (2018).
Google Scholar
Tchakoucht, T. A., Domingos, T., Proença, V. et al. Integrating k-fold cross-validation with advanced classification techniques for shrub coverage mapping in fire-prone landscapes using sentinel-2 imagery. In IGARSS 2024-2024 IEEE International Geoscience and Remote Sensing Symposium. 7554–7559 (IEEE, 2024).
Ehlers, D. et al. Mapping forest aboveground biomass using multisource remotely sensed data. Remote Sens. 14, 1115 (2022).
ADS Google Scholar
Liu, Y. et al. Estimation of potato above-ground biomass based on unmanned aerial vehicle red-green-blue images with different texture features and crop height. Front. Plant Sci. 13, 938216 (2022).
PubMed PubMed Central Google Scholar
Daliman, S. et al. Dual vegetation index analysis and spatial assessment in Kota Bharu, Kelantan using GIS and remote sensing. BIO Web Conf. 131, 05009 (2024) (EDP Sciences).
Google Scholar
Mangewa, L. J. et al. Comparative assessment of UAV and Sentinel-2 NDVI and GNDVI for preliminary diagnosis of habitat conditions in Burunge Wildlife Management Area, Tanzania. Earth 3, 769–787 (2022).
ADS Google Scholar
Liu, Y. et al. Estimating potato above-ground biomass based on vegetation indices and texture features constructed from sensitive bands of UAV hyperspectral imagery. Comput. Electron. Agric. 220, 108918 (2024).
Google Scholar
Tian, Y., Wu, Z., Li, M., Wang, B. & Zhang, X. Forest fire spread monitoring and vegetation dynamics detection based on multi-source remote sensing images. Remote Sens. 14, 4431 (2022).
ADS Google Scholar
Lu, J. et al. Lithology classification in semi-arid area combining multi-source remote sensing images using support vector machine optimized by improved particle swarm algorithm. Int. J. Appl. Earth Obs. Geoinf. 119, 103318 (2023).
Google Scholar
Lu, M. et al. A stacking ensemble model of various machine learning models for daily runoff forecasting. Water 15, 1265 (2023).
Google Scholar
Lin, X. et al. Time-series simulation of alpine grassland cover using transferable stacking deep learning and multisource remote sensing data in the Google Earth Engine. Int. J. Appl. Earth Obs. Geoinf. 131, 103964 (2024).
Google Scholar
Wang, Z. et al. Learning ensembles of process-based models for high accurately evaluating the one-hundred-year carbon sink potential of China’s Forest Ecosystem. Heliyon 9 (2023).
Tang, X. et al. Satellite evidence for China’s leading role in restoring vegetation productivity over global karst ecosystems. Forest Ecol. Manag. 507, 120000 (2022).
Google Scholar

Download references

Acknowledgements

Thanks to Google Inc. for the free and open data platform. We also thank the Python developer. Data was sourced from Terrestrial Ecosystem Research Network (TERN) infrastructure, which is enabled by the Australian Government’s National Collaborative Research Infrastructure Strategy (NCRIS).

Funding

This research was funded by the National Natural Science Foundation of China (42471413), Natural Science Foundation of Hubei Province (2024AFA069) and LIESMARS Special Research Funding.

Author information

Tong Wang and Chenxi Liu contributed equally to this work.

Authors and Affiliations

Electronic Information School, Wuhan University, Wuhan, 430072, China
Chenxi Liu, Shuo Shi & Wei Gong
State Key Laboratory of Information Engineering in Surveying Mapping and Remote Sensing, Wuhan University, Wuhan, 430079, China
Shuo Shi, Tong Wang, Wei Gong & Zixi Shi
Perception and Effectiveness Assessment for Carbon-neutrality Efforts, Engineering Research Center of Ministry of Education, Wuhan University, Wuhan, 430079, China
Shuo Shi & Wei Gong
School of Resources and Environment, University of Electronic Science and Technology of China, Chengdu, 611731, China
Zhanmang Liao

Authors

Chenxi Liu
View author publications
Search author on:PubMed Google Scholar
Shuo Shi
View author publications
Search author on:PubMed Google Scholar
Zhanmang Liao
View author publications
Search author on:PubMed Google Scholar
Tong Wang
View author publications
Search author on:PubMed Google Scholar
Wei Gong
View author publications
Search author on:PubMed Google Scholar
Zixi Shi
View author publications
Search author on:PubMed Google Scholar

Contributions

C.X.L. and T.W. contributed equally to this work. C.X.L. and T.W. designed and implemented the study with the supervision from S.S.; Z.M.L. and W.G. advised on the method and framework. C.X.L. drafted the manuscript and prepared the figures with the review and comments from T.W. and Z.X.S. All authors discussed the results and contributed to the final version of the manuscript.

Corresponding authors

Correspondence to Shuo Shi or Wei Gong.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Reprints and permissions

About this article

Cite this article

Liu, C., Shi, S., Liao, Z. et al. Estimation of woody vegetation biomass in Australia based on multi-source remote sensing data and stacking models. Sci Rep 15, 34975 (2025). https://doi.org/10.1038/s41598-025-18891-1

Download citation

Received: 02 March 2025
Accepted: 04 September 2025
Published: 07 October 2025
Version of record: 07 October 2025
DOI: https://doi.org/10.1038/s41598-025-18891-1

Subjects

Abstract

Similar content being viewed by others

Aboveground biomass estimation using multimodal remote sensing observations and machine learning in mixed temperate forest

Remote sensing estimation of aboveground biomass of different forest types in Xinjiang based on machine learning

Advancing forest carbon stocks’ mapping using a hierarchical approach with machine learning and satellite imagery

Introduction

Study area and data

Biomass plot data

Landsat data

Vegetation height parameter acquisition

Land cover classification data and DEM

Temperature and precipitation data

Comparison with other biomass maps

Research methods

Methods for biomass estimation

Base models

Meta-model

Feature selection of the image spectrum

Accuracy assessment and validation

Uncertainty analysis

Results

Performance of biomass estimation models

Model feature selection

Spatial distribution and uncertainty analysis of biomass

Comparison with other AGB maps

Discussion

Influence of feature variable selection on aboveground biomass estimation

Robustness analysis of the method

Analysis of spatial distribution differences and uncertainty

Limitations

Conclusion

Data availability

References

Acknowledgements

Funding

Author information

Authors and Affiliations

Contributions

Corresponding authors

Ethics declarations

Competing interests

Additional information

Publisher’s note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Quick links