Maps of forest vertical structure for Colombia, a megadiverse country

Camilo Fagua, J.; Jantz, Patrick; Burns, Patrick; Jantz, Samuel M.; Kilbride, John B.; Goetz, Scott J.

doi:10.1038/s41597-025-06297-7

Data Descriptor
Open access
Published: 03 December 2025

Scientific Data volume 13, Article number: 1 (2026)

Abstract

Vegetation vertical structure refers to the 3D distribution of vegetation aboveground biomass. Vegetation vertical structure of tropical forests influences other ecological and environmental variables that are essential for the functioning of the ecosystems. Integrating over 5.9 million Global Ecosystem Dynamics Investigation (GEDI) LiDAR (Light Detection and Ranging) footprints with multispectral and synthetic aperture radar (SAR) imagery, we built national maps at 25 m resolution of five forest structural metrics for Colombia, South America, for the year 2020. We mapped canopy height, the height of half the cumulative returned energy from GEDI (RH50), total canopy cover, foliage height diversity, and total plant area index. The resulting maps tended to have the highest errors in the Amazon and Andean regions. Total cover had the highest relative error. Interrelationship curves between forest structural metrics of GEDI footprints are maintained across mapped metrics, indicating that the predictive models preserve structural relationships observed in GEDI data. Due to the medium-high spatial resolution and national coverage of the forest structural maps presented in this work, these maps will be useful for evaluating and mapping other ecological variables and conservation priorities in Colombia.

Background & Summary

Three-dimensional vegetation structure (or vegetation vertical structure) refers to the distribution of plant biomass from the ground to the top of the canopy^1,2,3. Vegetation vertical structure is one of the Essential Biodiversity Variables, a set of biological variables designed to monitor biodiversity changes in response to the current environmental crisis that the planet is experiencing⁴. Vegetation vertical structure influences hydrological cycles^5,6,7, climatic regulation^8,9,10, primary productivity^11,12,13,14, nutrient fluxes^15,16, habitat quality^17,18,19,20, and biodiversity^21,22,23. The most consistent and low-cost method to study vegetation vertical structure over large extents consists of using LiDAR (Light Detection and Ranging) sensors to estimate metrics that describe 3D vegetation structure, due to LiDAR’s ability to penetrate canopies and measure the sub-canopy distribution of vegetation^24,25,26,27,28,29.

The NASA Global Ecosystem Dynamics Investigation (GEDI) LiDAR was designed to study vegetation vertical structure near-globally between approximately 51.6 degrees north and south latitude. It acquired data from April 2019 to March 2023, then was paused for 13 months^30,31, and began reacquiring data in April 2024. GEDI uses eight laser beams which measure forest structure within ~25 m footprints. Along track, these footprints are spaced by 60 m, with 600 m spacing between beams³². Although GEDI tends to acquire fewer high-quality footprints in the tropics due to the geometrical characteristics of its orbit and persistent cloud cover^3,30, never have there been so many detailed measurements of forest vertical structure in tropical ecosystems, the most diverse terrestrial areas on the planet³³ and where the highest rates of natural habitat loss occur³⁴.

GEDI footprints have limitations for spatially continuous mapping because these footprint-level products represent samples of the land area, leaving most of the land surface without observations. GEDI is capable of discontinuously sampling only ~4% of the land surface every two years^30,31. Consequently, some research groups have integrated GEDI footprints with wall-to-wall multispectral data to enable the spatial prediction of GEDI information for consistent gridded maps of vegetation structure metrics and aboveground biomass. These predictions include a canopy height map at 30 m over the GEDI domain using Landsat predictors and RH95 as an indicator of height³⁵, a global map of canopy height at 10 m using energy level RH98 and Sentinel-2 predictors³⁶, maps of mean and standard deviation of canopy height at 1 km using the energy level RH100³⁷, global maps of relative height metrics at 100 m, 200 m, 500 m, and 1000 m spatial resolutions integrating GEDI and ICESat-2 (Ice, Cloud, and Land Elevation Satellite 2)³⁸, and gridded mean aboveground biomass density at 1 km based on the canopy heights generated by GEDI³⁹. This type of work modeled canopy height but did not map metrics related to the distribution of biomass between the ground and the canopy height. Burns et al.³ developed and published annual global maps from 2019 to 2023 of 26 GEDI structural metrics related to the entire vertical vegetation profile at coarse spatial resolutions (1 km, 6 km, and 12 km) by gridding the aggregated footprint values. There are few published maps of vegetation structure variables, generated by GEDI predictions or interpolation, that describe the entire vertical vegetation profile at a detailed resolution (≤30 m) for large regions or countries. Those that have been published show great promise for enhancing our understanding of forest structure gradients and species habitat relationships⁴⁰.

The objective of this research is to describe the construction of, and make available, five maps of metrics of forest vertical structure (Table 1) with relatively high spatial resolution (25 m) for the year 2020 in Colombia, one of the most biodiverse countries on the planet. Colombia includes vegetation types that range from dry and moist forest to rain forest, at altitudes from sea level to above ~5000 m. The maps were constructed by developing predictions for each metric of forest vertical structure using a set of 82 remote sensing predictors (temporal and textural metrics) that included data from multispectral (Sentinel-2) and synthetic aperture radar (SAR) (Sentinel-1 and ALOS-PALSAR) sensors. The inclusion of the two SAR sensors allowed the use of regions of the electromagnetic spectrum that have been related to leaf density (Sentinel-1 C-band)^41,42,43,44 and forest height (ALOS-PALSAR L-band)^45,46,47, increasing the number of possible predictors and potentially reducing the error of the models. Each of these five national maps of forest structure was formed by a mosaic of regional maps corresponding to the five natural regions into which Colombia is divided. We did this to reduce errors in model predictions arising from contrasting environmental conditions among regions, taking advantage of the relative uniformity within regions.

Table 1 Description of the five GEDI metrics selected for mapping.

Methods

Study area

Colombia’s mainland territory covers an area of ~1.142 million km² in the northwestern corner of South America. Colombia is categorized as a megadiverse country since it holds record-high counts of several taxa (e.g., birds, mammals, amphibians, butterflies, freshwater fish, orchids, vascular plants), ecosystems, types of vegetation, and types of forests⁴⁸. Colombian environmental authorities divide the country into five primary natural regions: Andean, Caribbean, Amazon, Chocó (Pacific), and Orinoquía (Fig. 1). In 2020, an estimated 52.1% of Colombia was covered by forests, distributed as follows: 64.8% in the Amazon, 17.2% in the Andes, 7.7% in Chocó, 5.5% in the Caribbean, and 4.8% in Orinoquía⁴⁹. The Amazon is dominated by Tropical Moist Forest, the Chocó by Tropical Rain Forest, and the Caribbean and Orinoquía by Tropical Dry Forest, while the Andes present mosaics of Tropical Dry Forest, Tropical Moist Forest, and Tropical Rain Forest, separated by small distances in some areas due to the high environmental variability generated by the branching of the Andes into three mountain ranges (Western, Central, and Eastern)⁵⁰.

GEDI response variables

We downloaded all the L2A and L2B granule data of GEDI (version 2.1) for the Colombian territory corresponding to the years 2019, 2020, and 2021 to build regional datasets of the five metrics (Table 1). High-quality footprints were then selected using the comprehensive filtering process published by Burns et al.³. First, we selected quality shots that suitably estimated ground elevation and vegetation structure metrics. Selection criteria included minimal surface water, minimal urban cover, leaf-on vegetation status, vegetation structure metrics within expected ranges, and ground elevation agreement with a reference DEM, among others. Then, we linked the filtered L2A, L2B, and L4A datasets by shot number. Finally, we used a dictionary of local outlier granules produced by the University of Maryland to exclude orbit segments that were identified as local outliers, typically associated with low clouds. This quality-filtering procedure resulted in 5,720,940 high-quality footprints for the Amazon region, 5,620,920 for the Andean region, 5,584,260 for the Caribbean region, 5,630,300 for Orinoquía, and 1,105,860 for Chocó.

SAR and multispectral predictors

We constructed 76 mosaics of temporal and textural metrics using the pixel values of all imagery of Sentinel-1 (C-band SAR data) and Sentinel-2 (multispectral data) available between 1 January 2019 and 31 December 2021 in Google Earth Engine (GEE)⁵¹. By using all imagery of these three years in all our calculations, one year before and one year after 2020, we maximized the use of data for robust estimation, i.e., reduced error and uncertainty. The temporal metrics were average (X) and standard deviation (SD)^21,41,42, while the textural metrics were sum average (SAVG) and difference variance (DVAR)⁵². These four metrics represented the central tendency (X and SAVG) and the dispersion of the data (SD and DVAR), providing balance among predictors. Textural metrics were calculated in neighborhoods of 3 × 3 pixels using the glcmTexture and map functions of GEE, which allowed us to estimate texture in each image of the temporal collection and later obtain an average. A description of the temporal and textural metrics is found in Table 2.
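As a minimal sketch of the two temporal metrics, the snippet below computes the per-pixel average (X) and standard deviation (SD) across a stack of co-registered images. The nested-list image layout, the use of `None` for masked pixels, and the function name `temporal_metrics` are illustrative assumptions; the authors computed these metrics in GEE, and the text does not state whether a population or sample SD was used (a population SD is assumed here).

```python
from statistics import mean, pstdev


def temporal_metrics(stack):
    """Per-pixel mean and SD over a list of 2D images (lists of rows).

    Masked observations (e.g. clouds) are given as None and skipped.
    Pixels with no valid observation are returned as None.
    """
    rows, cols = len(stack[0]), len(stack[0][0])
    avg = [[None] * cols for _ in range(rows)]
    sd = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Time series of valid values for this pixel
            series = [img[r][c] for img in stack if img[r][c] is not None]
            if series:
                avg[r][c] = mean(series)
                sd[r][c] = pstdev(series)  # population SD (assumption)
    return avg, sd


# Two toy "images" of one row and two pixels; the second pixel is masked once.
stack = [[[1.0, 2.0]], [[3.0, None]]]
avg, sd = temporal_metrics(stack)
```

The same reduction applied to each band, index, and backscatter coefficient would yield one X mosaic and one SD mosaic per input layer.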

Table 2 Temporal and textural metrics.

To develop the Sentinel-1 mosaics, the Sentinel-1 SAR GRD (C-band Synthetic Aperture Radar Ground Range Detected) product data sets⁵³ were processed by applying an angular-based radiometric slope correction using a backscatter coefficient gamma nought, in addition to the calibration and ortho-correction of these data sets⁵⁴. To develop the Sentinel-2 mosaics, we initially created image mosaics from Sentinel-2A surface reflectance products; however, because images were processed in tiles and bidirectional reflectance distribution function (BRDF) adjustments had not been applied, noticeable artifacts due to surface anisotropy and tile boundaries were present. To overcome this limitation, we applied a normalization approach based on the method outlined in Potapov et al. (2012) to Sentinel-2 Level-1C top-of-atmosphere (TOA) imagery and constructed mosaics from these normalized images⁵⁵. The approach reduces artifacts caused by surface anisotropy and variations in the viewing and solar geometries that remain in Sentinel-2A Level-2A products, resulting in mosaics with more consistent reflectance across scenes and acquisition dates. However, because the procedure adjusts TOA reflectance rather than performing full atmospheric correction, it does not provide true surface reflectance as other physics-based methods do. This method uses MODIS BRDF-adjusted reflectance as the normalization target. Here, we used a 10-year median of MODIS land surface reflectance bands, filtered to include only good-quality observations as indicated by the QA bands. We first selected relatively clear pixels from each image by using the scene classification map from the corresponding Sentinel-2A SR product, which is developed by ESA and effectively removes most clouds and cloud shadows from L1C (Top-of-Atmosphere) and L2A (Surface Reflectance) imagery⁵⁶.
Next, the mean bias between MODIS and Sentinel-2 reflectance was calculated and used to adjust Sentinel-2 TOA reflectance, excluding pixels with large reflectance differences. To account for surface anisotropy, a linear regression between reflectance bias and distance from the center of each Sentinel-2 scene was applied to each spectral band independently. Table 3 shows the corresponding bands between the Sentinel-2 MSI and MODIS sensors used in the normalization process; however, there are no direct MODIS equivalents for the Sentinel-2 red edge bands. We generated synthetic MODIS red edge bands by modeling Sentinel-2 red edge bands as linear combinations of the MODIS red and near-infrared (NIR) bands. To do this, we convolved known surface reflectance spectra from the ECOSTRESS spectral library^57,58 with the spectral response functions (SRFs) of the Sentinel-2 red edge bands and the MODIS red and NIR bands (SRFs obtained from the Pyspectral Python library⁵⁹). The simulated reflectance values from the Sentinel-2 red edge bands served as dependent variables, while MODIS red and NIR band reflectances were used as independent variables.
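The red edge step described above can be sketched as an ordinary least-squares fit of a simulated Sentinel-2 red edge reflectance on simulated MODIS red and NIR reflectances. This is only an illustration under stated assumptions: the function name `fit_red_edge` is hypothetical, the fit has no intercept (the text says "linear combinations" without specifying one), and the sample reflectances are synthetic values, not ECOSTRESS spectra.

```python
def fit_red_edge(red, nir, red_edge):
    """Least-squares coefficients (a, b) for red_edge ~ a * red + b * nir.

    Solves the 2 x 2 normal equations directly; no intercept term
    is fitted (an assumption of this sketch).
    """
    s_rr = sum(r * r for r in red)
    s_nn = sum(n * n for n in nir)
    s_rn = sum(r * n for r, n in zip(red, nir))
    s_ry = sum(r * y for r, y in zip(red, red_edge))
    s_ny = sum(n * y for n, y in zip(nir, red_edge))
    det = s_rr * s_nn - s_rn ** 2
    if det == 0:
        raise ValueError("red and NIR samples are collinear")
    a = (s_ry * s_nn - s_rn * s_ny) / det
    b = (s_rr * s_ny - s_rn * s_ry) / det
    return a, b


# Synthetic band-averaged reflectances for four hypothetical surfaces.
modis_red = [0.05, 0.10, 0.04, 0.20]
modis_nir = [0.40, 0.30, 0.50, 0.25]
s2_red_edge = [0.6 * r + 0.4 * n for r, n in zip(modis_red, modis_nir)]

a, b = fit_red_edge(modis_red, modis_nir, s2_red_edge)
# A synthetic MODIS red edge value for a new pixel would then be:
synthetic = a * 0.08 + b * 0.35
```

Once fitted, the coefficients let a MODIS-like red edge target be generated wherever MODIS red and NIR reflectances exist, which is what the normalization needs for the Sentinel-2 bands that have no MODIS counterpart.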

Table 3 Sentinel-2 MSI (MultiSpectral Instrument) bands with analogous bands from the MODIS platform.

We also constructed six mosaics for ALOS-PALSAR data, applying a variation of the methodology described above for Sentinel. We first obtained two metrics, one for each of the two polarizations of ALOS-2-PALSAR data, as the average of the years 2019, 2020, and 2021 using the GEE product 25 m PALSAR/PALSAR-2 mosaic⁴⁷, since this is a one-date annual product created by mosaicking imagery from PALSAR/PALSAR-2. We then obtained four textural metrics from these three-year means, SAVG and DVAR for each polarization, estimated in neighborhoods of 3 × 3 pixels. A summary of each backscatter coefficient, band, and index used to build the 82 mosaics for the Sentinel-1, Sentinel-2, and ALOS-2-PALSAR data is shown in Table 4, and the scripts used to build these mosaics are available in the section Code availability.

Table 4 Summary of the multispectral and SAR data used to build temporal (average-X and standard deviations-SD) and textural (sum average-SAVG and difference variance-DVAR) metrics.

Prediction and mapping

To construct maps of the five GEDI metrics that describe the structure of Colombian forests for the year 2020 (Table 1), we first built maps for each natural region for each GEDI metric and then mosaicked these regional maps to create final national maps. We used this mapping approach because each natural region tends to have some similarity in forest types and environmental conditions (e.g., climate, topography, altitude), which allowed us to control sources of error in spatial modeling^60,61. Other approaches typically applied in remote sensing modeling of large areas, such as mapping throughout the entire study area⁶² or mapping across regular grids that cover the study area³⁵, could combine different forest types and environmental conditions, increasing modeling errors, given the highly heterogeneous characteristics of the Colombian territory.

Each regional map was constructed using the numerical values of each GEDI metric as the response variable, the associated values of the temporal and textural SAR and multispectral metrics as predictors, and the Random Forest algorithm (RF)^63,64. Although in most regions we identified more than 5 million high-quality GEDI footprints, we randomly subsampled 1,200,000 of these footprints for each regional model. This is the approximate maximum number of observations that our high-performance computing system could process for RF modeling with 82 predictors. The Chocó region did not require any subsampling, as we identified 1,105,860 high-quality footprints there. We then tuned RF hyperparameters, including the number of variables randomly sampled as candidates at each split and the minimum size of terminal nodes. Once the best regional model was identified, the regional map for each GEDI metric was built based on the 82 mosaics of the SAR and multispectral predictors mentioned previously. We used the R packages “randomForest”⁶⁴ and “caret”⁶⁵ for the RF modeling, “Boruta”⁶⁶ to apply the Boruta algorithm for feature selection, and “raster” for mapping⁶⁷.

Data Records

Maps of Colombian forest vertical structure for the year 2020 (Fig. 2) are available to download in GeoTIFF format from Zenodo⁶⁸: https://zenodo.org/records/15493516. These maps are also accessible in Google Earth Engine through the links below. The links are organized according to a tile shapefile in which each map is split into eleven tiles, with tile numbering starting at one at the top left and running from left to right, top to bottom. The shapefile grid consists of four rows and three columns, but the top row has only two tiles because the upper right tile does not contain any forest pixels in Colombia.

Tile shapefile

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/COLOMBIA_FOREST_TILES

CH (canopy height)

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/ch/CH_COLOMBIA_FOREST_1

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/ch/CH_COLOMBIA_FOREST_2

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/ch/CH_COLOMBIA_FOREST_3

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/ch/CH_COLOMBIA_FOREST_4

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/ch/CH_COLOMBIA_FOREST_5

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/ch/CH_COLOMBIA_FOREST_6

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/ch/CH_COLOMBIA_FOREST_7

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/ch/CH_COLOMBIA_FOREST_8

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/ch/CH_COLOMBIA_FOREST_9

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/ch/CH_COLOMBIA_FOREST_10

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/ch/CH_COLOMBIA_FOREST_11

COVER (canopy cover)

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/cover/COVER_COLOMBIA_FOREST_1

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/cover/COVER_COLOMBIA_FOREST_2

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/cover/COVER_COLOMBIA_FOREST_3

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/cover/COVER_COLOMBIA_FOREST_4

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/cover/COVER_COLOMBIA_FOREST_5

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/cover/COVER_COLOMBIA_FOREST_6

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/cover/COVER_COLOMBIA_FOREST_7

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/cover/COVER_COLOMBIA_FOREST_8

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/cover/COVER_COLOMBIA_FOREST_9

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/cover/COVER_COLOMBIA_FOREST_10

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/cover/COVER_COLOMBIA_FOREST_11

FHD_PAI (foliage height diversity calculated from plant area index)

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/fhd_pai/FHD_PAI_COLOMBIA_FOREST_1

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/fhd_pai/FHD_PAI_COLOMBIA_FOREST_2

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/fhd_pai/FHD_PAI_COLOMBIA_FOREST_3

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/fhd_pai/FHD_PAI_COLOMBIA_FOREST_4

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/fhd_pai/FHD_PAI_COLOMBIA_FOREST_5

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/fhd_pai/FHD_PAI_COLOMBIA_FOREST_6

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/fhd_pai/FHD_PAI_COLOMBIA_FOREST_7

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/fhd_pai/FHD_PAI_COLOMBIA_FOREST_8

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/fhd_pai/FHD_PAI_COLOMBIA_FOREST_9

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/fhd_pai/FHD_PAI_COLOMBIA_FOREST_10

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/fhd_pai/FHD_PAI_COLOMBIA_FOREST_11

PAI (plant area index)

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/pai/PAI_COLOMBIA_FOREST_1

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/pai/PAI_COLOMBIA_FOREST_2

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/pai/PAI_COLOMBIA_FOREST_3

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/pai/PAI_COLOMBIA_FOREST_4

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/pai/PAI_COLOMBIA_FOREST_5

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/pai/PAI_COLOMBIA_FOREST_6

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/pai/PAI_COLOMBIA_FOREST_7

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/pai/PAI_COLOMBIA_FOREST_8

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/pai/PAI_COLOMBIA_FOREST_9

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/pai/PAI_COLOMBIA_FOREST_10

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/pai/PAI_COLOMBIA_FOREST_11

RH50 (height at which 50% of lidar energy is returned)

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/rh50/RH50_COLOMBIA_FOREST_1

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/rh50/RH50_COLOMBIA_FOREST_2

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/rh50/RH50_COLOMBIA_FOREST_3

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/rh50/RH50_COLOMBIA_FOREST_4

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/rh50/RH50_COLOMBIA_FOREST_5

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/rh50/RH50_COLOMBIA_FOREST_6

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/rh50/RH50_COLOMBIA_FOREST_7

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/rh50/RH50_COLOMBIA_FOREST_8

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/rh50/RH50_COLOMBIA_FOREST_9

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/rh50/RH50_COLOMBIA_FOREST_10

https://code.earthengine.google.com/?asset=projects/ee-jantzenator/assets/colombia_forest_structure/rh50/RH50_COLOMBIA_FOREST_11
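The tile layout and asset naming pattern listed above can be generated programmatically. The sketch below rebuilds the asset identifiers from the links in this section and maps a tile number to its grid position, assuming (as described above) a grid of four rows and three columns whose upper right slot is empty. The helper names `asset_id` and `tile_position` are illustrative and not part of the published dataset.

```python
BASE = "projects/ee-jantzenator/assets/colombia_forest_structure"
METRICS = ("CH", "COVER", "FHD_PAI", "PAI", "RH50")
N_TILES = 11
N_COLS = 3


def asset_id(metric, tile):
    """Earth Engine asset path for one metric and tile number (1-11)."""
    if metric not in METRICS:
        raise ValueError(f"unknown metric: {metric}")
    if not 1 <= tile <= N_TILES:
        raise ValueError(f"tile must be between 1 and {N_TILES}")
    return f"{BASE}/{metric.lower()}/{metric}_COLOMBIA_FOREST_{tile}"


def tile_position(tile):
    """Zero-based (row, column) of a tile, counting from the top left."""
    if not 1 <= tile <= N_TILES:
        raise ValueError(f"tile must be between 1 and {N_TILES}")
    # Tiles 1-2 fill the top row; the third slot of that row is empty,
    # so every later tile is shifted by one grid slot.
    slot = tile - 1 if tile <= 2 else tile
    return divmod(slot, N_COLS)


all_assets = [asset_id(m, t) for m in METRICS for t in range(1, N_TILES + 1)]
print(asset_id("CH", 1))
print(tile_position(3))
```

Each identifier can be appended to `https://code.earthengine.google.com/?asset=` to reproduce the corresponding link.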

Data Overview

The five national maps of forest vertical structure for Colombia presented in this publication correspond to the year 2020 (Fig. 2) and have the coordinate reference system EPSG:4326, a spatial resolution of 25 m, and the data type Float32. Forest areas were identified by masking out all areas with <70% tree cover based on the Hansen Global Forest Change database v1.12 (2000–2024)³⁴. The map of canopy height (CH) is in meters, the map of total cover (COVER) in percentage of cover, the map of foliage height diversity (FHD) in the FHD index, the map of total plant area index (PAI) in the PAI index, and the map of the height of half the accumulated energy (RH50) in meters. The details of how map units were calculated are described in Table 1.
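The forest mask can be illustrated with a short sketch that sets every pixel with less than 70% tree cover to no-data. The nested-list raster layout, the `None` no-data value, and the function name `mask_non_forest` are assumptions for illustration; how the tree cover layer for 2020 was derived from the Global Forest Change product is not detailed in this section.

```python
def mask_non_forest(metric, tree_cover, threshold=70.0):
    """Keep metric values only where tree cover (%) meets the threshold.

    Both rasters are lists of rows on the same grid; pixels below the
    threshold are replaced with None (no-data).
    """
    return [
        [value if cover >= threshold else None
         for value, cover in zip(metric_row, cover_row)]
        for metric_row, cover_row in zip(metric, tree_cover)
    ]


canopy_height = [[20.0, 15.0], [31.5, 8.0]]   # toy CH values in meters
tree_cover = [[85.0, 40.0], [70.0, 69.9]]     # toy tree cover in percent
masked = mask_non_forest(canopy_height, tree_cover)
```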

Technical Validation

We implemented three types of validation: 1) cross-validation using sample data (VSD), 2) validation using external data (VED), and 3) validation testing the interrelationship curves of the forest structural variables between footprint data and predicted data (VRC). VSD refers to error estimates based on two relative metrics that allow comparisons between different units, RAE (Relative Absolute Error) and RRSE (Root Relative Squared Error), and two absolute metrics that convey the magnitude of the error, MAE (Mean Absolute Error) and RMSE (Root Mean Squared Error). These four error metrics were calculated by partitioning the sample data (GEDI footprints): 70% of the footprints were used for building the maps and 30% were used for testing the resulting maps. These validations were estimated for each regional map and each of the five forest structural metrics, applying resampling of 5000 on the testing data to estimate value ranges. We found error differences among the natural regions; the Amazon and Andean regions tended to present the highest RAE and RRSE values (Fig. 3), with maximum RMSE magnitudes of ~5.8 m for CH, ~0.25 for COVER, ~0.42 for FHD, 1.69 m²/m² for PAI, and ~5.4 m for RH50 (Table 5).
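For reference, the four error metrics can be sketched as below. The formulas are the commonly used definitions of MAE, RMSE, RAE, and RRSE, where the relative metrics compare the model errors with those of a baseline that always predicts the observed mean; the text does not spell out the exact formulas, so treat these as an assumption. The function name and sample values are illustrative.

```python
from math import sqrt


def error_metrics(observed, predicted):
    """Return MAE, RMSE, RAE, and RRSE for paired observations and predictions."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(observed)
    mean_obs = sum(observed) / n
    abs_err = sum(abs(p - o) for o, p in zip(observed, predicted))
    sq_err = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    # Errors of the baseline that always predicts the observed mean
    abs_dev = sum(abs(o - mean_obs) for o in observed)
    sq_dev = sum((o - mean_obs) ** 2 for o in observed)
    if sq_dev == 0:
        raise ValueError("observed values are constant; RAE and RRSE are undefined")
    return {
        "MAE": abs_err / n,
        "RMSE": sqrt(sq_err / n),
        "RAE": abs_err / abs_dev,
        "RRSE": sqrt(sq_err / sq_dev),
    }


# Toy canopy heights (m) for three held-out footprints and their mapped values.
metrics = error_metrics([10.0, 20.0, 30.0], [12.0, 18.0, 33.0])
```

Because RAE and RRSE are unitless ratios, they can be compared across metrics with different units (for example CH in meters and COVER in percent), whereas MAE and RMSE stay in the units of each metric.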

Table 5 Error-estimates of the Validation using Sample Data (VSD) for the regional maps of forest structural metrics using MAE (Mean Absolute Error) and RMSE (Root Mean Squared Error).

VED refers to error estimates using GEDI footprints simulated from 578 km² of discrete ALS-LiDAR across the Chocó natural region. A description of this data set can be found in Fagua et al.²¹. We followed the approach described by Hancock et al.⁶⁹ to simulate GEDI footprints using the ALS-LiDAR data and to estimate values of CH, RH50, FHD, and COVER. We note that PAI could not be simulated due to the lack of some parameters necessary for its estimation. The process to simulate the GEDI footprints first consisted of noise removal using the Statistical Outlier Removal method of the R package lasR⁷⁰. Next, we established a grid with the same resolution as the vertical structure maps. We then identified the centroid of each raster cell that was contained within one of the LiDAR tiles to derive a simulated GEDI footprint and its corresponding CH, RH50, FHD, and COVER values, using the Rgedisimulator tool of the R package rGEDI⁷¹. Finally, we estimated the same error metrics described above (RAE, RRSE, MAE, and RMSE) by comparing 5000 resamples of CH, RH50, FHD, and COVER simulated values with the corresponding values from the resulting maps in the Chocó. Simulated GEDI footprints were selected randomly using a spatial filter of 200 m. Parameters and scripts of the GEDI simulation using ALS-LiDAR can be found at the GitHub site for this manuscript (see Code availability). We found higher errors for the VED validation compared with the VSD validation in the Chocó (Table 6). This was expected since error estimates from ALS-LiDAR can be considered field validation^24,35,72,73, which usually results in higher errors compared with errors estimated by cross-validation with reserved sample data.

Table 6 Error-estimates of the Validation using External Data (VED) for the Chocó maps calculating RAE (Relative Absolute Error), RRSE (Root Relative Squared Error), MAE (Mean Absolute Error) and RMSE (Root Mean Squared Error).

Finally, VRC evaluates the extent to which interrelationship curves between the sample data (our five metrics of GEDI footprints that describe forest structure) are preserved in the predicted data (mapped pixel data)²¹. Forest structure metrics covary (see Footprint values of Fig. 4); it is therefore important to evaluate whether our independent models preserve the interrelationships of forest metrics as observed by GEDI. We consider this approach a useful complement to typical procedures because it indicates to users that, even though the metrics were modeled independently, the predicted values reproduce observed relationships between structure metrics that may be important for forest ecology and conservation. We randomly selected 5000 footprints and 5000 pixels in the resulting maps to compare the interrelationship curves among the metrics. These 5000 pixels of predicted data did not coincide with the locations of the footprints. We observed that the interrelationship curves and their parameters between the variable pairs of the footprints were maintained in the mapped pixel data (Fig. 4 and Table 7).

Table 7 Regression models for the interrelationship between pairs of forest structural variables using GEDI footprints and predicted pixel values.

Usage Notes

Since the five maps correspond to Essential Biodiversity Variables at moderately high spatial resolution (25 m) and provide coverage throughout the continental territory of Colombia, they can be used to monitor and map the state of biodiversity and other environmental variables across the country. Previous works show that similar forest structural metrics have allowed precise mapping of tree alpha diversity, carbon content, and forest degradation, among others^21,39,74,75. We note that the regional approach used to create the maps accounts for the natural environmental variation of Colombia’s forests, in addition to reducing errors, and thereby provides more representative maps than global estimates that are calculated without regional distinctions or are developed at lower spatial resolutions. Another point to highlight is that our maps were made for the forest areas of Colombia for the year 2020, using sample data for forested areas only. Forest areas were identified based on the Hansen Global Forest Change database v1.12 (2000–2024)³⁴. By focusing on the forest cover type, we sought to reduce uncertainty for forest-specific applications, such as mapping of forest diversity, carbon stock estimation, or forest degradation. The five national maps of forest structural metrics were formed by mosaicking the regional maps, using the average of the values in the transition zones. Although this method is commonly used in this type of analysis, unrepresentative values might be found in transition zones.
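The averaging of regional maps in transition zones can be sketched as a per-pixel mean over whichever regional maps have a value at that pixel. The common-grid nested-list layout, the `None` no-data value, and the function name `mosaic_mean` are assumptions for illustration and do not reproduce the authors' mosaicking script.

```python
def mosaic_mean(regional_maps):
    """Mosaic regional rasters on a common grid by averaging overlaps.

    Each raster is a list of rows; None marks pixels outside the region.
    Pixels covered by several regions (transition zones) get the mean
    of the available values; uncovered pixels stay None.
    """
    rows, cols = len(regional_maps[0]), len(regional_maps[0][0])
    mosaic = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            values = [m[r][c] for m in regional_maps if m[r][c] is not None]
            if values:
                mosaic[r][c] = sum(values) / len(values)
    return mosaic


# Two toy regional CH maps that overlap in the middle pixel.
andes = [[1.0, 2.0, None]]
amazon = [[None, 4.0, 5.0]]
national = mosaic_mean([andes, amazon])
```

In this toy case the middle pixel takes the mean of the two regional predictions, which is where the unrepresentative values mentioned above could appear if the two regional models disagree strongly.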

We note that the error estimates of our CH maps in the Amazon and Andean regions, where errors were highest, are similar to those of an existing global CH map³⁵, while the error estimates in other regions, such as the Caribbean and Orinoquía, were lower than those reported for such maps (Table 5). This, combined with the reported validations, indicates that our maps are appropriate for forest assessments and related applications in Colombia.

Data availability

The maps resulting from this research are publicly accessible on Zenodo: https://zenodo.org/records/15493516.

Code availability

The code is publicly accessible on GitHub⁷⁶: https://github.com/CamiloFaguaUNAL/Forest_Structure_Colombia.

References

McElhinny, C., Gibbons, P., Brack, C. & Bauhus, J. Forest and woodland stand structural complexity: Its definition and measurement. For. Ecol. Manage. 218, 1–24 (2005).
Hall, F. G. et al. Characterizing 3D vegetation structure from space: Mission requirements. Remote Sens. Environ. 115, 2753–2775 (2011).
Burns, P., Hakkenberg, C. R. & Goetz, S. J. Multi-resolution gridded maps of vegetation structure from GEDI. Sci. Data 11, 881 (2024).
Pereira, H. M. et al. Essential Biodiversity Variables. Science 339, 277–278 (2013).
Pérez-Suárez, M., Arredondo-Moreno, J. T., Huber-Sannwald, E. & Serna-Pérez, A. Forest structure, species traits and rain characteristics influences on horizontal and vertical rainfall partitioning in a semiarid pine-oak forest from Central Mexico. Ecohydrology 7, 532–543 (2014).
Aron, P. G., Poulsen, C. J., Fiorella, R. P. & Matheny, A. M. Stable Water Isotopes Reveal Effects of Intermediate Disturbance and Canopy Structure on Forest Water Cycling. J. Geophys. Res. 124, 2958–2975 (2019).
Sun, J. et al. Effects of forest structure on hydrological processes in China. J. Hydrol. 561, 187–199 (2018).
Thom, D. & Keeton, W. S. Stand structure drives disparities in carbon storage in northern hardwood-conifer forests. For. Ecol. Manage. 442, 10–20 (2019).
Foley, J. A. et al. Amazonia revealed: forest degradation and loss of ecosystem goods and services in the Amazon Basin. Front. Ecol. Environ. 5, 25–32 (2007).
Frey, S. J. K. et al. Spatial models reveal the microclimatic buffering capacity of old-growth forests. Sci. Adv. 2, e1501392 (2016).
Gough, C. M., Atkins, J. W., Fahey, R. T. & Hardiman, B. S. High rates of primary production in structurally complex forests. Ecology 100, e02864 (2019).
Clark, D. B., Olivas, P. C., Oberbauer, S. F., Clark, D. A. & Ryan, M. G. First direct landscape-scale measurement of tropical rain forest Leaf Area Index, a key driver of global primary productivity. Ecol. Lett. 11, 163–172 (2008).
Coops, N. C., Hermosilla, T., Hilker, T. & Black, T. A. Linking stand architecture with canopy reflectance to estimate vertical patterns of light-use efficiency. Remote Sens. Environ. 194, 322–330 (2017).
Liu, X. et al. Enhancing ecosystem productivity and stability with increasing canopy structural complexity in global forests. Sci. Adv. 10, eadl1947 (2024).
Asner, G. P. et al. High-resolution mapping of forest carbon stocks in the Colombian Amazon. Biogeosciences 9, 2683–2696 (2012).
Meyer, V. et al. Forest degradation and biomass loss along the Choco region of Colombia. Carbon Balance Manag. 14 (2019).
Sanchez-Daz, B. et al. Modeling of the vertical structure of shade trees in cacao agroforestry systems. Theor. Appl. Ecol. 28–37, https://doi.org/10.25750/1995-4301-2023-1-028-037 (2023).
Basham, E. W. et al. Large, old trees define the vertical, horizontal, and seasonal distributions of a poison frog. Oecologia 199, 257–269 (2022).
Li, S., Hou, Z. Y., Ge, J. P. & Wang, T. M. Assessing the effects of large herbivores on the three-dimensional structure of temperate forests using terrestrial laser scanning. For. Ecol. Manage. 507 (2022).
Coops, N. C. et al. A forest structure habitat index based on airborne laser scanning data. Ecol. Indic. 67, 346–357 (2016).
Fagua, J. C. et al. Mapping tree diversity in the tropical forest region of Chocó-Colombia. Environ. Res. Lett. 16, 54024 (2021).
Marselis, S. M. et al. Evaluating the potential of full-waveform lidar for mapping pan-tropical tree species richness. Glob. Ecol. Biogeogr. n/a (2020).
Feng, G., Zhang, J., Girardello, M., Pellissier, V. & Svenning, J. C. Forest canopy height co-determines taxonomic and functional richness, but not functional dispersion of mammals and birds globally. Glob. Ecol. Biogeogr. 29, 1350–1359 (2020).
Drake, J. B. et al. Estimation of tropical forest structural characteristics using large-footprint lidar. Remote Sens. Environ. 79, 305–319 (2002).
Dubayah, R. O. et al. Estimation of tropical forest height and biomass dynamics using lidar remote sensing at La Selva, Costa Rica. J. Geophys. Res. 115 (2010).
Hancock, S., Disney, M., Muller, J.-P., Lewis, P. & Foster, M. A threshold insensitive method for locating the forest canopy top with waveform lidar. Remote Sens. Environ. 115, 3286–3297 (2011).
Asner, G. P. et al. A universal airborne LiDAR approach for tropical forest carbon mapping. Oecologia 168, 1147–1160 (2012).
Coops, N. C. et al. Modelling lidar-derived estimates of forest attributes over space and time: A review of approaches and future trends. Remote Sens. Environ. 260, 112477 (2021).
Tompalski, P. et al. Estimating Changes in Forest Attributes and Enhancing Growth Projections: a Review of Existing Approaches and Future Directions Using Airborne 3D Point Cloud Data. Curr. For. Reports 7, 1–24 (2021).
Dubayah, R. et al. The Global Ecosystem Dynamics Investigation: High-resolution laser ranging of the Earth’s forests and topography. Sci. Remote Sens. 1, 100002 (2020).
GEDI-team. GEDI Ecosystem Lidar. Available at: https://gedi.umd.edu/ (2024).
Eegholm, B. et al. Global Ecosystem Dynamics Investigation (GEDI) instrument alignment and test. in Proc.SPIE 11103, 1110308 (2019).
Primack, R. B. & Corlett, R. T. Tropical Rain Forests: An Ecological and Biogeographical Comparison. (Blackwell Publishing, 2009).
Hansen, M. C. et al. High-Resolution Global Maps of 21st-Century Forest Cover Change. Science 342, 850–853 (2013).
Potapov, P. et al. Mapping global forest canopy height through integration of GEDI and Landsat data. Remote Sens. Environ. 253, 112165 (2021).
Lang, N., Jetz, W., Schindler, K. & Wegner, J. D. A high-resolution canopy height model of the Earth, https://doi.org/10.48550/ARXIV.2204.08322 (2022).
Dubayah, R. O. et al. GEDI L3 Gridded Land Surface Metrics, Version 2, https://doi.org/10.3334/ORNLDAAC/1952 (2021).
Saatchi, S. S. & Favrichon, S. Global Vegetation Height Metrics from GEDI and ICESat2. https://doi.org/10.3334/ORNLDAAC/2294 (2024).
Dubayah, R. et al. GEDI launches a new era of biomass inference from space. Environ. Res. Lett. 17, 95001 (2022).
Vogeler, J. C. et al. Evaluating GEDI data fusions for continuous characterizations of forest wildlife habitat. Front. Remote Sens. 4 (2023).
Fagua, J. C. & Jantz, P. Mapping Tropical Dry Forest Gradients in an Andean Region with High Environmental Variability. Ecol. Indic. 168, 112744 (2024).
Fagua, J. C., Rodríguez-Buriticá, S. & Jantz, P. Advancing High-Resolution Land Cover Mapping in Colombia: The Importance of a Locally Appropriate Legend. Remote Sensing 15 (2023).
Stendardi, L. et al. Exploiting Time Series of Sentinel-1 and Sentinel-2 Imagery to Detect Meadow Phenology in Mountain Regions. Remote Sensing 11 (2019).
Vreugdenhil, M. et al. Sensitivity of Sentinel-1 Backscatter to Vegetation Dynamics: An Austrian Case Study. Remote Sensing 10 (2018).
Qin, Y. et al. Annual dynamics of forest areas in South America during 2007–2010 at 50-m spatial resolution. Remote Sens. Environ. 201, 73–87 (2017).
Fagua, J. C., Jantz, P., Rodriguez-Buritica, S., Laura, D. & Goetz, S. J. Integrating LiDAR, Multispectral and SAR Data to Estimate and Map Canopy Height in Tropical Forests. Remote Sens. 11(1), 20 (2019).
Shimada, M. et al. New global forest/non-forest maps from ALOS PALSAR data (2007–2010). Remote Sens. Environ. 155, 13–31 (2014).
IDEAM, I. de H. M. y E. A., INVERMAR, I. de I. M. y C. J. B. V. de A., IIAP, I. de I. A. del P. & IAvH, I. H. Informe del Estado del Ambiente y de los Recursos Naturales Renovables. (IDEAM, 2016).
IDEAM, I. de H. M. y E. A. Resultados del monitoreo deforestación año 2020-2021. (2022).
Etter, A. et al. Ecosistemas colombianos: amenazas y riesgos. (Pontificia Universidad Javeriana, 2020).
Gorelick, N. et al. Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote Sens. Environ. 202, 18–27 (2017).
Haralick, R. M., Shanmugam, K. & Dinstein, I. Textural Features for Image Classification. IEEE Trans. Syst. Man. Cybern. SMC-3, 610–621 (1973).
ESA, E. S. A. Sentinel-1 SAR GRD: C-band Synthetic Aperture Radar Ground Range Detected, log scaling. Earth Engine Data Catalog Available at: https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_S1_GRD (2022).
Vollrath, A., Mullissa, A. & Reiche, J. Angular-Based Radiometric Slope Correction for Sentinel-1 on Google Earth Engine. Remote Sens. 12 (2020).
Potapov, P. V. et al. Quantifying forest cover loss in Democratic Republic of the Congo, 2000-2010, with Landsat ETM plus data. Remote Sens. Environ. 122, 106–116 (2012).
Pasquarella, V. J., Brown, C. F., Czerwinski, W. & Rucklidge, W. J. Comprehensive quality assessment of optical satellite imagery using weakly supervised video learning. in 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW) 2125–2135, https://doi.org/10.1109/CVPRW59228.2023.00206 (2023).
Baldridge, A. M., Hook, S. J., Grove, C. I. & Rivera, G. The ASTER spectral library version 2.0. Remote Sens. Environ. 113, 711–715 (2009).
Meerdink, S. K., Hook, S. J., Roberts, D. A. & Abbott, E. A. The ECOSTRESS spectral library version 1.0. Remote Sens. Environ. 230, 111196 (2019).
Dybbroe, A. et al. Satellite Sensor Relative Spectral Response data, https://doi.org/10.5281/zenodo.14008148 (2024).
Wang, J. et al. Enhancing Land Cover Mapping in Mixed Vegetation Regions Using Remote Sensing Evapotranspiration. IEEE Trans. Geosci. Remote Sens. 62 (2024).
Tsendbazar, N. et al. Towards operational validation of annual global land cover maps. Remote Sens. Environ. 266, 112686 (2021).
Venter, Z. S. & Sydenham, M. A. K. Continental-Scale Land Cover Mapping at 10 m Resolution Over Europe (ELC10). Remote Sens. 13 (2021).
Breiman, L. Random forests. Mach. Learn. 45, 5–32 (2001).
Liaw, A. Package ‘randomForest’: Breiman and Cutler’s Random Forests for Classification and Regression. (2018).
Kuhn, M. et al. Package ‘caret’: Classification and Regression Training. Available at: https://github.com/topepo/caret/ (2022).
Kursa, M. B. & Rudnicki, W. R. Feature Selection with the Boruta Package. J. Stat. Softw. 36, 1–13 (2010).
Hijmans, R. et al. Package ‘raster’. (r-project.org, 2016).
Fagua, J. C. & Jantz, P. Maps of forest vertical structure for Colombia, a megadiverse country. Zenodo https://doi.org/10.5281/zenodo.15493516 (2025).
Hancock, S. et al. The GEDI Simulator: A Large-Footprint Waveform Lidar Simulator for Calibration and Validation of Spaceborne Missions. Earth Sp. Sci. 6, 294–310 (2019).
Russel, J. lasR: Fast and Pipeable Airborne LiDAR Data Tools. (2025).
Silva, C. A. rGEDI: NASA’s Global Ecosystem Dynamics Investigation (GEDI) Data Visualization and Processing. (r-project.org, 2021).
Mascaro, J. et al. Controls over aboveground forest carbon density on Barro Colorado Island, Panama. Biogeosciences 8, 1615–1629 (2011).
Meyer, V. et al. Detecting tropical forest biomass dynamics from repeated airborne lidar measurements. Biogeosciences 10, 5421–5438 (2013).
Torresani, M. et al. LiDAR GEDI derived tree canopy height heterogeneity reveals patterns of biodiversity in forest ecosystems. Ecol. Inform. 76, 102082 (2023).
Liang, M., Duncanson, L., Silva, J. A. & Sedano, F. Quantifying aboveground biomass dynamics from charcoal degradation in Mozambique using GEDI Lidar and Landsat. Remote Sens. Environ. 284, 113367 (2023).
Fagua, J. C. Code for generating maps (rasters at 25m of spatial resolution) of forest vertical structure for Colombia (South America) from GEDI spaceborne LiDAR. GitHub Available at: https://github.com/CamiloFaguaUNAL/Forest_Structure_Colombia (2025).

Acknowledgements

We acknowledge Departamento de Biología of Universidad Nacional de Colombia (Sede Bogota D.C) and the School of Informatics, Computing, and Cyber Systems at Northern Arizona University for providing access to high performance computing resources. Support for J.C.F. was provided by Universidad Nacional de Colombia—Sede Bogotá; Proyecto HERMES 66218 and Semillero de investigación 2971. Support for P.J. was provided by NASA Group on Earth Observations Work Program, Grant #80NSSC18K0338.

Author information

Authors and Affiliations

Grupo de Biodiversidad, Biotecnología y Conservación de Ecosistemas, Departamento de Biología, Facultad de Ciencias, Universidad Nacional de Colombia—Sede Bogotá, Bogotá, DC, 111321, Colombia
J. Camilo Fagua
Global Earth Observation & Dynamics of Ecosystems Lab (GEODE), School of Informatics, Computing, and Cyber Systems (SICCS), Northern Arizona University, Flagstaff, AZ, 86011, USA
Patrick Jantz, Patrick Burns & Scott J. Goetz
National Institute for Modeling Biological Systems (NIMBioS), University of Tennessee, Knoxville, TN, 37996, USA
Samuel M. Jantz
Renoster Systems Inc, 21750 Hardy Oak Blvd Ste 104, PMB 37519, San Antonio, TX, 78258-4946, USA
John B. Kilbride

Authors

J. Camilo Fagua
Patrick Jantz
Patrick Burns
Samuel M. Jantz
John B. Kilbride
Scott J. Goetz

Contributions

Conceptualization, J.C.F., P.J. and S.J.G.; Methodology, J.C.F., P.J., P.B., S.M.J. and J.B.K.; Formal analysis, J.C.F. and P.J.; Investigation, J.C.F., P.J. and S.J.G.; Primary writing, review and editing, J.C.F. and P.J. All authors have reviewed, edited, and agreed to the submitted version of the manuscript.

Corresponding author

Correspondence to J. Camilo Fagua.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Camilo Fagua, J., Jantz, P., Burns, P. et al. Maps of forest vertical structure for Colombia, a megadiverse country. Sci Data 13, 1 (2026). https://doi.org/10.1038/s41597-025-06297-7

Received: 28 May 2025
Accepted: 10 November 2025
Published: 03 December 2025
Version of record: 03 January 2026
DOI: https://doi.org/10.1038/s41597-025-06297-7