Background & Summary

Three-dimensional vegetation structure (or vegetation vertical structure) refers to the distribution of plant biomass from the ground to the top of the canopy1,2,3. Vegetation vertical structure is an Essential Biodiversity Variable, a set of biological variables designed to monitor biodiversity changes in response to the current environmental crisis that the planet is experiencing4. Vegetation vertical structure influences hydrological cycles5,6,7, climatic regulation8,9,10, primary productivity11,12,13,14, nutrient fluxes15,16, habitat quality17,18,19,20, and biodiversity21,22,23. The most consistent and low-cost method to study vegetation vertical structure over large extents consists of using LiDAR (Light Detection and Ranging) sensors to estimate metrics that describe 3D vegetation structure, due to LiDAR’s ability to penetrate canopies and measure the sub-canopy distribution of vegetation24,25,26,27,28,29.

The NASA Global Ecosystem Dynamics Investigation (GEDI) LiDAR was designed to study vegetation vertical structure near-globally between approximately 51.6 degrees north and south latitude. It acquired data from April 2019 to March 2023, then was paused for 13 months30,31, and began reacquiring data in April 2024. GEDI uses eight laser beams which measure forest structure within ~25 m footprints. Along track, these footprints are spaced by 60 m, with 600 m spacing between beams32. Although GEDI tends to acquire fewer high-quality footprints in the tropics due to the geometrical characteristics of its orbit and persistent cloud cover3,30, never have there been so many detailed measurements of forest vertical structure in tropical ecosystems, the most diverse terrestrial areas on the planet33 and where the highest rates of natural habitat loss occur34.

GEDI footprints have limitations for spatially-continuous mapping because these footprint-level products represent samples of the land area, leaving most of the land surface without observations. GEDI is capable of discontinuously sampling only ~4% of the land surface every two-years30,31. Consequently, some research groups have integrated GEDI footprints with wall-to-wall multispectral data to enable the spatial prediction of GEDI information for consistent gridded maps of vegetation structure metrics and aboveground biomass. These predictions include a canopy height map at 30 m over the GEDI domain using Landsat predictors and RH95 as an indicator of height35, a global map of canopy height at 10 m using energy level RH98 and Sentinel-2 predictors36, maps of mean and standard deviation of canopy height at 1 km using the energy level RH10037, global maps of relative height metrics at 100 m, 200 m, 500 m, and 1000 m spatial resolutions integrating GEDI and ICESat2 (Ice, Cloud, and Land Elevation Satellite 2)38, and gridded mean aboveground biomass density at 1 km based on the canopy heights generated by GEDI39. This type of work modeled canopy height but did not map metrics related to the distribution of biomass between the ground and the canopy height. Burns et al.3 developed and published annual global maps from 2019 to 2023 of 26 GEDI structural metrics related to entire vertical vegetation profile at coarse spatial resolutions (1 km, 6 km, and 12 km), gridding the aggregated footprint values3. There are limited published maps of vegetation structure variables generated by GEDI predictions or interpolation that describe the entire vertical vegetation profile with a detailed resolution (<= 30 m) for large regions or countries. Those that have been published show great promise for enhancing our understanding of forest structure gradients and species habitat relationships40.

The objective of this research is to elucidate the construction and make available five maps of metrics of forest vertical structure (Table 1) with relatively high spatial resolution (25 m) for the year 2020 in Colombia, one of the most biodiverse countries on the planet. Colombia includes vegetation types that range from dry, moist, to rain forest at altitudes from sea level to >~5000 m. The maps were constructed by developing predictions for each metric of forest vertical structure using a set of 82 remote sensing predictors (temporal metrics) that included data from multispectral (Sentinel-2) and synthetic aperature radar (SAR) (Sentinel-1 and ALOS-PALSAR) sensors. The inclusion of the two SAR sensors allowed the use of regions of the electromagnetic spectrum that have been related to leaf density (Sentinel-1 C-band)41,42,43,44 and forest height (ALOS-PALSAR L-band)45,46,47, increasing the number of possible predictors and potentially reducing the error of the models. Each of these five national maps of forest structure was formed by a mosaic of regional maps corresponding to the five natural regions into which Colombia is divided. We did this to reduce errors in model predictions related to contrasting environmental conditions among regions, and relative uniformity within regions.

Table 1 Description of the five GEDI metrics selected for mapping.

Methods

Study area

Colombia’s mainland territory presents an area of ~1.142 million km2 in the northwestern corner of South America. Colombia is categorized as a megadiverse country since it contains record high numbers in counts of several taxa (e.g., birds, mammals, amphibians, butterflies, freshwater fish, orchids, vascular plants), ecosystems, types of vegetation, and types of forests48. Colombian environmental authorities divide the country into five primary natural regions, Andean, Caribbean, Amazon, Chocó (Pacific), and Orinoquía (Fig. 1). It was estimated, in 2020, that 52.1% of Colombia is covered by forests distributed as follows: 64.8% in the Amazon, 17.2% in the Andes, 7.7% in Chocó, 5.5% in the Caribbean, and 4.8% in Orinoquía49. The Amazon is dominated by Tropical Moist-Forest, the Chocó by Tropical Rain-Forest, the Caribbean and Orinoquía by Tropical Dry Forest, and the Andes presents mosaics of Tropical Dry-Forest, Tropical Moist-Forest, and Tropical Rain-Forest, separated by small distances in some areas due to the high environmental variability generated by the branching of the Andes Mountain range into three mountain ranges (Western, Central, and Eastern Mountain ranges)50.

Fig. 1
figure 1

Colombian terrestrial land (A) and natural regions (B).

GEDI response variables

We downloaded all the L2A and L2B granule data of GEDI (version 2.1) for the Colombian territory corresponding to the years 2019, 2020, and 2021 to build regional datasets of the five metrics (Table 1). High quality footprints were afterward selected using the comprehensive filtering process published by Burns et al.3. First, we selected quality shots that suitably estimated ground elevation and vegetation structure metrics. Selection criteria included minimal surface water, minimal urban cover, leaf on vegetation status, vegetation structure metrics within expected ranges, and ground elevation agreement with a reference DEM, among others. Then, we linked the filtered L2A, L2B, and L4A datasets by shot number. Finally, we used a dictionary of local outlier granules produced by University of Maryland to exclude orbit segments that were identified as local outliers, typically associated with low clouds. This quality-filtering procedure resulted in 5,720,940 high-quality footprints for the Amazon region, 5,620,920 for the Andean region, 5,584,260 for the Caribbean region, 5,630,300 for Orinoquía, and 1,105,860 for Chocó.

SAR and multispectral predictors

We constructed 76 mosaics of temporal and textural metrics using the pixel values of all imagery of Sentinel-1 (SAR data of the c-band) and Sentinel-2 (multispectral data) available between 1 January 2019 and 31 December 2021 in Google Earth Engine – GEE51. By using all imagery of these three years in all our calculations, one year before and one year after 2020, we maximized the use of data for robust estimation, i.e. reduced error and uncertainty. The temporal metrics were average (X) and standard deviation (SD)21,41,42 while the textural metrics were sum average (SAVG) and difference variance (DVAR)52. These four metrics represented the central tendency (X and SAVG) and evaluated the data dispersion (SD and DVAR), generating balance among predictors. Textural metrics were calculated in neighborhoods of 3 × 3 pixels using the glcmTexture and map functions of GEE, which allowed us to estimate texture in each image of the temporal collection and later obtain an average. A description of the temporal and textural metrics is found in Table 2.

Table 2 Temporal and textural metrics.

To develop the Sentinel-1 mosaics, the Sentinel-1 SAR GRD (C-band Synthetic Aperture Radar Ground Range Detected) product data sets53 were processed by applying an angular-based radiometric slope correction using a backscatter coefficient gamma nought, in addition to the calibration and ortho-correction of these data sets54. To develop the Sentinel-2 mosaics, we initially created image mosaics from Sentinel 2A surface reflectance products; however, because images were processed in tiles and bidirectional reflectance distribution function (BRDF) adjustments had not been applied, noticeable artifacts due to surface anisotropy and tile boundaries were present. To overcome this limitation, we applied a normalization approach based on the method outlined in Potapov et al. (2012) to Sentinel-2 Level 1 C top-of-atmosphere (TOA) imagery and constructed mosaics from these normalized images55. The approach reduces artifacts caused by surface anisotropy and variations in the viewing and solar geometries that remain in Sentinel-2A Level-2A products, resulting in mosaics with more consistent reflectance across scenes and acquisition dates. However, because the procedure adjusts TOA reflectance rather than performing full atmospheric correction, it does not provide true surface reflectance as other physics-based methods do. This method uses MODIS BRDF-adjusted reflectance as the normalization target. Here, we used a 10-year median of MODIS land surface reflectance bands, filtered to include only good-quality observations as indicated by the QA bands. We first selected relatively clear pixels from each image by using the scene classification map from the corresponding Sentinel 2 A SR product, which is developed by ESA and effectively removes most clouds and cloud shadows from L1C (Top-of-Atmosphere) and L2A (Surface Reflectance) imagery56. Next, the mean bias between MODIS and Sentinel-2 reflectance was calculated and used to adjust Sentinel-2 TOA reflectance, excluding pixels with large reflectance differences. To account for surface anisotropy, a linear regression between reflectance bias and distance from the center of each Sentinel-2 scene was applied to each spectral band independently. Table 3 shows the corresponding bands between the Sentinel-2 MSI and MODIS sensors used in the normalization process; however, there are no direct MODIS equivalents for the Sentinel-2 red edge bands. We generated synthetic MODIS red edge bands by modeling Sentinel-2 red edge bands as linear combinations of the MODIS red and near-infrared (NIR) bands. To do this, we convolved known surface reflectance spectra from the ECOSTRESS spectral library57,58 with the spectral response functions (SRFs) of the Sentinel-2 red edge bands and the MODIS red and NIR bands (SRFs obtained from the Pyspectral Python library59). The simulated reflectance values from the Sentinel-2 red edge bands served as dependent variables, while MODIS red and NIR band reflectances were used as independent variables.

Table 3 Sentinel 2 MSI (The MultiSpectral Imager) bands with analogous bands from the MODIS platform.

We also constructed six mosaics for ALOS-PALSAR data applying a variation to the previous methodology described for Sentinel. We first obtained two metrics for the two polarizations of ALOS-2-PALSAR data, the average of years 2019, 2020, and 2021 using the GEE product 25 m PALSAR/PALSAR-2 mosaic47, since this is a one-date annual product created by mosaicking imagery from PALSAR/PALSAR-2. We then obtained four textural metrics over the previous annual mean, SAVG and DVAR for each polarization, estimated in neighborhoods of 3 × 3 pixels. A summary of each backscatter coefficient, band, and index used to build the 82 mosaics for the Sentinel-1, Sentinel-2, and ALOS-2-PALSAR data is shown in Table 4 and scripts used to build these mosaics are available in the section Code availability.

Table 4 Summary of the multispectral and SAR data used to build temporal (average-X and standard deviations-SD) and textural (sum average-SAVG and difference variance-DVAR) metrics.

Prediction and mapping

To construct maps of the five GEDI metrics that describe the structure of Colombian forests at the year 2020 (Table 1), we first built maps for each natural region for each GEDI metric and then mosaiced these regional maps to create final national maps. We used this mapping approach because each natural region tends to have some similarity in forest types and environmental conditions (e.g., climate, topography, altitude) which allowed us to control sources of error in spatial modeling60,61. Other approaches typically applied in remote sensing modeling of large areas, such as mapping throughout the entire study area62 or mapping across regular grids that cover the study area35, could combine different forest types and environmental conditions, increasing modeling errors, given the highly heterogeneous characteristics of the Colombian territory.

Each regional map was constructed using the numerical values of each GEDI metric as the response variable, the associated values of the temporal and textural SAR and multispectral metrics as predictors, and the Random Forest algorithm (RF)63,64. Although in most regions we identified more than 5 million high-quality GEDI footprints we randomly subsampled 1,200,000 of these footprints for each regional model. This is the approximate maximum number of observations that our high-performance computing system could process for RF modeling with 82 predictors. The Choco region did not require any sub-sampling as we identified 1,105,860 high-quality footprints there. We then tuned RF hyperparameters, including the number of variables randomly sampled as candidates at each split and minimum size of terminal nodes. Once the best regional model was identified, the regional map for each GEDI metric was built based on the 82 mosaics of the SAR and multispectral predictors mentioned previously. We used the R packages “randomForest”64 and “Caret”65 for the RF modeling, “Boruta”66 to apply the Boruta algorithm for feature selection, and “raster” for mapping67.

Data Records

Maps of Colombian forest vertical structure for the year 2020 (Fig. 2) are available to download in GeoTiff format in Zenodo68: https://zenodo.org/records/15493516. These maps are also accessible in Google Earth Engine in the links below, which are organized corresponding to a tile shapefile, where each map is split into eleven tiles, with tile numbering starting at one and running from left to right, top to bottom, starting at the top left. The shapefile consists of four rows and three columns but note that the top row has only two tiles as the upper right tile does not contain any forest pixels in Colombia.

Fig. 2
figure 2

Maps of forest structure of Colombia in the year 2020. (A) Canopy height (CH) in meters, (B) Total cover (COVER) in percentage, (C) Foliage Height Diversity (FHD) in index, (D) Total Plant Area Index (PAI) in index-m2/m2, and (E) Height of half the accumulated energy (RH50) in meters.

Tile shapefile

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/COLOMBIA_FOREST_TILES

CH (canopy height)

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/ch/CH_COLOMBIA_FOREST_1

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/ch/CH_COLOMBIA_FOREST_2

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/ch/CH_COLOMBIA_FOREST_3

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/ch/CH_COLOMBIA_FOREST_4

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/ch/CH_COLOMBIA_FOREST_5

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/ch/CH_COLOMBIA_FOREST_6

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/ch/CH_COLOMBIA_FOREST_7

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/ch/CH_COLOMBIA_FOREST_8

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/ch/CH_COLOMBIA_FOREST_9

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/ch/CH_COLOMBIA_FOREST_10

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/ch/CH_COLOMBIA_FOREST_11

COVER (canopy cover)

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/cover/COVER_COLOMBIA_FOREST_1

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/cover/COVER_COLOMBIA_FOREST_2

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/cover/COVER_COLOMBIA_FOREST_3

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/cover/COVER_COLOMBIA_FOREST_4

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/cover/COVER_COLOMBIA_FOREST_5

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/cover/COVER_COLOMBIA_FOREST_6

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/cover/COVER_COLOMBIA_FOREST_7

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/cover/COVER_COLOMBIA_FOREST_8

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/cover/COVER_COLOMBIA_FOREST_9

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/cover/COVER_COLOMBIA_FOREST_10

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/cover/COVER_COLOMBIA_FOREST_11

FHD_PAI (foliage height diversity calculated from plant area index)

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/fhd_pai/FHD_PAI_COLOMBIA_FOREST_1

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/fhd_pai/FHD_PAI_COLOMBIA_FOREST_2

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/fhd_pai/FHD_PAI_COLOMBIA_FOREST_3

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/fhd_pai/FHD_PAI_COLOMBIA_FOREST_4

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/fhd_pai/FHD_PAI_COLOMBIA_FOREST_5

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/fhd_pai/FHD_PAI_COLOMBIA_FOREST_6

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/fhd_pai/FHD_PAI_COLOMBIA_FOREST_7

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/fhd_pai/FHD_PAI_COLOMBIA_FOREST_8

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/fhd_pai/FHD_PAI_COLOMBIA_FOREST_9

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/fhd_pai/FHD_PAI_COLOMBIA_FOREST_10

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/fhd_pai/FHD_PAI_COLOMBIA_FOREST_11

PAI (plant area index)

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/pai/PAI_COLOMBIA_FOREST_1

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/pai/PAI_COLOMBIA_FOREST_2

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/pai/PAI_COLOMBIA_FOREST_3

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/pai/PAI_COLOMBIA_FOREST_4

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/pai/PAI_COLOMBIA_FOREST_5

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/pai/PAI_COLOMBIA_FOREST_6

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/pai/PAI_COLOMBIA_FOREST_7

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/pai/PAI_COLOMBIA_FOREST_8

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/pai/PAI_COLOMBIA_FOREST_9

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/pai/PAI_COLOMBIA_FOREST_10

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/pai/PAI_COLOMBIA_FOREST_11

RH50 (height at which 50% of lidar energy is returned)

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/rh50/RH50_COLOMBIA_FOREST_1

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/rh50/RH50_COLOMBIA_FOREST_2

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/rh50/RH50_COLOMBIA_FOREST_3

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/rh50/RH50_COLOMBIA_FOREST_4

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/rh50/RH50_COLOMBIA_FOREST_5

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/rh50/RH50_COLOMBIA_FOREST_6

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/rh50/RH50_COLOMBIA_FOREST_7

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/rh50/RH50_COLOMBIA_FOREST_8

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/rh50/RH50_COLOMBIA_FOREST_9

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/rh50/RH50_COLOMBIA_FOREST_10

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/rh50/RH50_COLOMBIA_FOREST_11

Data Overview

The five national Maps of forest vertical structure for Colombia presented in this publication correspond to the 2020 year (Fig. 2), have the coordinate reference system EPSG:4326, spatial resolution of 25 m, and the data type Float32. Forest areas were identify by masking out all areas with <70% tree cover based on Hansen Global Forest Change database v1.12 (2000–2024)34. The map of Canopy height (CH) is in meters, the map of total cover (COVER) in percentage of cover, the map of Foliage Height Diversity (FHD) in the FHD index, the map of Total Plant Area Index (PAI) in the PAI index, and the map of height of half the accumulated energy (RH50) is in meters. The details of how map units were calculated are described in Table 1.

Technical Validation

We implemented three types of validation: 1) cross validation using sample data (VSD), 2) validation using external data (VED), and 3) Validation testing the interrelationship curves of the forest structural variables between footprint data vs. predicted data (VRC). VSD refers to error estimates calculating two metrics that allow comparisons between different units, RAE (Relative Absolute Error) and RRSE (Root Relative Squared Error), and two error metrics for absolute data to recognize the magnitude of the error, MAE (Mean Absolute Error) and RMSE (Root Mean Squared Error). These four-error metrics were calculated by sampling data partitions, using the sample data (GEDI footprints) where 70% of the footprints were used for building the maps and 30% were used for testing the resulting maps. These validations using sample data were estimated in each regional map for each of the five-forest structural metrics applying resampling of 5000 on the testing data to estimate value-ranges. We found error differences among the natural regions; the Amazon and Andean regions tended to present the highest RAE and RRSE values (Fig. 3) with maximum RMSE magnitudes of ~5.8 m for CH, ~0.25 for COVER, ~0.42 for FHD, 1.69 of m2/m2 for PAI, and ~5.4 m for RH50 (Table 5).

Fig. 3
figure 3

Error estimation, RAE (Relative Absolute Error) and RRSE (Root Relative Squared Error), of forest the forest structural maps of Colombia.

Table 5 Error-estimates of the Validation using Sample Data (VSD) for the regional maps of forest structural metrics using MAE (Mean Absolute Error) and RMSE (Root Mean Squared Error).

VED refers to error estimates using GEDI footprints simulated using 578 km2 of discrete ALS-LiDAR across the Chocó natural region. A description of this data set can be found at Fagua et al.21. We followed the approach described by Hancock et al.69 to simulate GEDI footprints using the ALS-LiDAR data and to estimate values of CH, RH50, FHD, and COVER. We note that PAI could not be simulated due to the lack of some parameters necessary for its estimation. The process to simulate the GEDI footprints first consisted of noise removal using the Statistical Outlier Removal method of the R package lasR70. Next, we established a grid with the same resolution as the vertical structure maps. We later identified the centroid of each raster cell that was contained within one of the LiDAR tiles to derive a simulated GEDI footprint and its corresponding CH, RH50, FHD, and COVER values, using the Rgedisimulator tool of the R package rGEDI71. We finally estimated the same error metrics described above, RAE, RRSE, MAE and RMSE, by comparing 5000 resamples of CH, RH50, FHD, and COVER simulated-values with the corresponding values from the resulting maps in the Choco. Simulated GEDI footprints were selected randomly using a spatial filter of 200 m. Parameters and scripts of GEDI simulation using ALS-LiDAR can be found at the github site for this manuscript (see Code Availability). We found higher errors for the VED validation compared with VSD validation in the Choco (Table 6). This was expected since error estimates from ALS-LiDAR can be considered field validation24,35,72,73, which usually results in higher errors compared with errors estimated by cross-validation with reserved sample data.

Table 6 Error-estimates of the Validation using External Data (VED) for the Choco maps calculating RAE (Relative Absolute Error), RRSE (Root Relative Squared Error), MAE (Mean Absolute Error) and RMSE (Root Mean Squared Error).

Finally, VRC evaluates the extent to which interrelationship curves between the sample data (our five metrics of GEDI footprints that describe forest structure) are preserved in the predicted data (mapped pixel data)21. Forest structure metrics covary (see Footprint values of Fig. 4); it is therefore important evaluate whether our independent models preserve the interrelationships of forest metrics as observed by GEDI. We consider this approach a useful complement to typical procedures because it indicates to users that even though the metrics were modeled independently, the predicted values reproduce observed relationships between structure metrics that may be important for forest ecology and conservation. We randomly selected 5000 footprints and 5000 pixels in the resulting maps to compare the interrelationship curves among the metrics. These 5000 pixels of predicted data did not coincide with the locations of the footprints. We observe that interrelationship curves and their parameters between the variable pairs of the footprints were maintained in the mapped pixel data (Fig. 4 and Table 7).

Fig. 4
figure 4

Interrelationship curves of the forest structural variables for footprint and predicted data.

Table 7 Regression models for the interrelationship between pairs of forest structural variables using GEDI footprints and predicted pixel values.

Usage Notes

Since the five produced maps correspond to Essential Biodiversity Variables, at moderately high spatial resolution (25 m) and, provide coverage throughout the continental territory of Colombia, they can be used to monitor and map the state of biodiversity and other environmental variables across the country. Previous works show that similar forest structural metrics have allowed precise mapping of tree alpha diversity, carbon content, and forest degradation, among others21,39,74,75. We note the regional approach in the creation of the maps accounts for the natural environmental variation of Colombia’s forests, in addition to reducing errors, which thereby provides more representative maps compared to global estimates that calculate without regional distinctions or are developed at lower spatial resolutions. Another point to highlight is that our maps were made for the forest areas of Colombia for the year 2020, using sample data for forested areas only. Forest areas were identify based on Hansen Global Forest Change database v1.12 (2000–2024)34. By focusing on forest cover type, we sought to reduce uncertainty for forest specific applications, such as mapping of forest diversity, carbon stock estimation, or forest degradation. These five national maps of forest structural metrics were formed by mosaicking of the regional maps using the average of the values in the transition zones. Although this method is commonly used in this type of analysis, possible unrepresentative values might be found in transition zones.

We note the error estimates of our CH maps in the Amazon and Andean regions, where errors were highest, are similar to the error estimates of an existing global CH map35 while the error estimates in other regions, such as Caribbean and Orinoquía, were lower than reported in such maps (Table 5). This, combined with the reported validations, indicates our maps are appropriate for forest assessments and related applications in Colombia.