Introduction

Biodiversity is fundamental to maintaining ecosystem functions, supporting food security, regulating climate, and safeguarding human well-being1,2. However, accelerating anthropogenic pressures, such as habitat destruction, climate instability, and the intensification of land use, continue to erode biodiversity at an alarming pace3,4. Crucially, the speed and scale of these human-driven impacts now outstrip the pace at which science can document, understand, and respond to biodiversity loss5. As a result, species vanish undocumented, ecological interactions are disrupted, and opportunities to understand and conserve unique biotas are irreversibly lost6. Large-scale, nationally coordinated efforts to sample and monitor biodiversity and its habitats are urgently needed to close this widening gap7,8,9. Such initiatives are key to advancing conservation strategies and addressing critical knowledge gaps in underrepresented non-forest ecosystems like the South American savannas.

South America’s Cerrado is the most floristically diverse savanna on Earth10 and encompasses roughly 2 million km² in central Brazil, in addition to small and discontinuous extensions in eastern Paraguay (over 8000 km²) and Bolivia (about 30000 km²)11. In Brazil, the Cerrado also serves as the nation’s “Cradle of Waters”, supplying runoff to three of South America’s major river basins12. Although the Cerrado harbours an estimated 10500 vascular plant species, including 1605 tree taxa, just 30 species (<2%) account for half of all individual trees13,14. The Brazilian Cerrado has already lost nearly 50% of its native vegetation to agricultural conversion and other land-use changes, leading to projections that nearly 500 species may lose over 80% of their habitat, crossing critical thresholds for extinction risk15. Conservation efforts remain insufficient: strictly protected areas cover only 8% of the original extent16, far below the thresholds needed to secure species persistence in this highly biodiverse savanna. In this context, targeted biodiversity-mapping efforts can pinpoint high-priority regions where expanding protection would most effectively strengthen the overall conservation capacity of the Brazilian Cerrado.

Tree α-diversity, here quantified by Fisher’s α from standardized inventory plots, denotes the local richness of tree species and arises from the combined influence of climate, soil properties, and disturbance regimes17,18. In the Cerrado, these drivers interact with the fire regime to generate a fine-scale mosaic of vegetation types, from open grasslands to dense woodlands and riparian forests, each supporting distinct assemblages of tree species along a diversity gradient19. The Brazilian Cerrado comprises seven biogeographic districts that differ markedly in climatic conditions, habitat loss, and species composition20,21. Generally, tree diversity and richness increase from nutrient-poor, frequently burned savannas toward more fertile, less fire-prone woodlands and gallery forests21,22. The variation in these ecological factors across the Cerrado, and its effect on the floristic composition of vegetation types, has been studied primarily at small spatial scales, mostly at individual sites14,23. Oliveira-Filho & Ratter19 and Ribeiro & Walter19 scaled these local studies up to the entire Brazilian Cerrado, but using a floristic composition approach.

Fig. 1: Distribution and sampling structure of vegetation plots in the Brazilian Cerrado.

A Spatial distribution of vegetation plots across the Brazilian Cerrado. Green points represent plots located in forest formations, and yellow points represent plots in savanna formations. The inset map at the top left shows a zoomed area with high sampling density. B Number of sampled plots in each vegetation type (savanna and forest formations). Total number of plots: 556 forest, 1247 savanna. C Frequency distribution of the number of individuals sampled per plot.

In complex tropical biomes, robust, high-volume, and spatially well-distributed sampling is essential for elucidating diversity patterns and ecosystem geography18,24. Despite numerous botanical inventories in the Cerrado, critical gaps persist at the large scale13,25: What is the total tree species richness, and how is diversity spatially structured? To address these questions, we compiled a unique, biome-wide dataset of 1803 botanically inventoried vegetation plots from national efforts, enabling the first spatially explicit assessment of tree α-diversity alongside species richness across the entire Brazilian Cerrado (Fig. 1). We generated maps of tree species richness at 0.1° resolution (~11 km), using only plot coordinates stratified by major vegetation types (savannas and forests) and then summarised by the floristic biogeographic districts of the Brazilian Cerrado20. We tested potential drivers of the observed patterns in tree species richness in savannas and forests, including climatic variables, soil variables, anthropogenic impact, and fire frequency. Our model, based on the first large-scale dataset of vegetation plots spanning the entire Brazilian Cerrado, provides the most comprehensive and fine-scale portrait of tree species richness in this biodiversity hotspot.

Results

Tree species richness and alpha diversity in sampled plots

Tree density (N ha–1) exhibited high variability, averaging 266.7 ± 267.5 (mean ± sd) individuals per hectare. Plots in forest formation reached a mean density of 334.0 ± 319.0, whereas those in savanna formation averaged 237.0 ± 235.0 individuals per hectare (Fig. 2A, B). Tree alpha-diversity, measured as Fisher’s alpha, averaged 9.3 ± 6.6 across all plots, with a moderately right-skewed distribution (Fig. 2C). When analysed by vegetation formation, forest formation showed higher mean values (10.8 ± 7.8) than savanna formation (8.6 ± 5.8) (Fig. 2D). Tree species richness per hectare (Sha) followed a similar pattern, with a mean of 28.1 ± 17.8 species per hectare across all plots. Richness was greater in forest formation (33.1 ± 20.1) compared to savanna formation (25.9 ± 16.2).

Fig. 2: Patterns of tree community structure across the Brazilian Cerrado vegetation types.

A Tree density (Nha, individuals per hectare), B tree species richness (number of species per hectare), C tree alpha-diversity measured as Fisher’s alpha index, D tree density per hectare by vegetation formations and physiognomies (individuals per hectare), E tree species richness per hectare by vegetation formations and physiognomies, and F tree alpha-diversity measured as Fisher’s alpha index by vegetation formations and physiognomies. Colours indicate vegetation formation: green for forest formations and yellow for savanna formations. Black dashed vertical lines represent the overall mean value for each variable.

When disaggregated by Cerrado physiognomies, clear differences in tree alpha-diversity, richness, and density emerged (Fig. 2D–F). Dry Forest and Woodland savannas exhibited the highest tree alpha-diversity, with a mean alpha-diversity of 10.9 ± 7.8, tree species richness (Sha) of 33.1 ± 20.2, and average tree density of 319.0 ± 255.0 individuals per hectare. Riparian Forests also supported high diversity levels, with a mean tree alpha-diversity of 10.1 ± 7.1, species richness (Sha) of 32.8 ± 19.7, and the highest density recorded across groups (491.0 ± 689.0 ind./ha), albeit with substantial variability. In contrast, the cerrado sensu stricto, a typical savanna, presented lower diversity metrics: alpha-diversity averaged 8.6 ± 5.8, species richness 25.9 ± 16.3, and density 238.0 ± 235.0 individuals per hectare. Palm-dominated vegetation showed the lowest values of tree species richness (17.6 ± 7.0) and tree density (77.5 ± 39.7), yet maintained a moderate alpha-diversity of 9.0 ± 5.5.

Spatial patterns of tree species richness

We used tree species richness per hectare as a summary diversity metric due to its high correlation with other diversity indices (Supplementary Fig. 1). Tree species richness, defined as the number of species per hectare, ranged from approximately 10 to 90 across the Cerrado region, with an average of 36.3 species/ha. Predictions showed moderate skewness, with higher frequencies of low to intermediate richness values (Fig. 3C). Vegetation formation explained a substantial portion of this variation, with forest formations presenting higher predicted richness (mean = 33.1 species/ha) compared to savanna formations (mean = 25.9 species/ha) (Fig. 3C).

Fig. 3: Spatial patterns and model performance for predicted tree species richness in the Brazilian Cerrado.

A Spatial distribution of predicted tree species richness (species per hectare), interpolated using a LOESS model across the Brazilian Cerrado. B Relationship between observed and predicted values of tree species richness per hectare (Sha), coloured by vegetation formation. The dashed line indicates the 1:1 relationship. C Frequency distribution of predicted tree species richness by vegetation formation. Dashed vertical lines represent the mean richness for each formation (green = forest, yellow = savanna). D Tree species richness per hectare (Sha) across biogeographic districts of the Brazilian Cerrado. Boxplots represent the median, interquartile range, and dispersion of values within each district: Central-west (CW), Central (CE), External North (ExN), North-east (NE), North-west (NW), South-east (SE), South-west (SW), and South (S).

The spatial model based on a LOESS smoother (span = 0.2) captured clear spatial gradients in predicted tree species richness across the Cerrado (Fig. 3A). Areas of highest tree species richness were concentrated in the central and western portions of the Cerrado. Tree species richness also varied markedly across biogeographic districts (Fig. 3D). The Central-west (CW) district exhibited the highest tree species richness, followed by the South-west (SW), South-east (SE), and Central (CE) districts. In contrast, the External North (ExN) and North-east (NE) districts had the lowest values, indicating reduced diversity in these northern savanna-dominated regions (Fig. 3D).

Fig. 4: Environmental drivers of tree species richness across vegetation formations in the Brazilian Cerrado.

A Mean annual precipitation (1986–2020), B fire frequency (annual fire events, 1985–2020), C annual temperature range (1985–2020), D soil bulk density (0–30 cm), E soil aluminium concentration (cmol/kg), F soil clay content (%), and G latitude (decimal degrees). Each point represents a plot classified as forest (green) or savanna (yellow) formation. Red lines represent quantile regression models for the 50th percentile (solid) and the 95th percentile (dashed), capturing both median and upper-bound trends in species richness.

Model performance was reasonable (R² = 0.47), with a strong positive relationship between predicted and observed tree species richness per hectare (Fig. 3B). Residuals from the LOESS model showed no significant spatial autocorrelation (Moran’s I = 0.002, p > 0.05), indicating that most of the spatial signal was effectively captured by the model. The residual error map (Supplementary Fig. 2a) shows the spatial distribution of model residuals, calculated as the difference between observed and predicted values of tree species richness (Sha). Residuals from the combined spatial model had a mean close to zero and differed little between vegetation formations (Supplementary Fig. 2b).
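To make the residual autocorrelation check concrete, the following Python sketch computes Moran's I for a set of plot residuals. It is an illustration only: the text does not state which spatial weights were used, so the inverse-distance weights here are an assumption, the function name `morans_i` is hypothetical, and no significance test is included.

```python
import math


def morans_i(lon, lat, residuals):
    """Moran's I with inverse-distance weights (weights are an assumption)."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [r - mean for r in residuals]
    weighted_cross = 0.0
    weight_total = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dist = math.hypot(lon[i] - lon[j], lat[i] - lat[j])
            if dist == 0:
                continue  # co-located plots get no weight in this sketch
            w = 1.0 / dist
            weight_total += w
            weighted_cross += w * dev[i] * dev[j]
    variance_term = sum(v * v for v in dev)
    return (n / weight_total) * weighted_cross / variance_term


# Toy transect: spatially clustered residuals versus alternating residuals.
lon = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
lat = [0.0] * 6
clustered = morans_i(lon, lat, [1, 1, 1, -1, -1, -1])
alternating = morans_i(lon, lat, [1, -1, 1, -1, 1, -1])
```

Values near zero indicate little spatial structure left in the residuals; clustered residuals give a positive value and alternating residuals a negative one.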

The standard error for Fisher’s alpha was mostly low (Supplementary Fig. 3a, c) and consistent across vegetation formations and biogeographic districts (Supplementary Fig. 3b, d), although it was higher for the South district, resulting in higher standard errors in this undersampled portion of the Cerrado.

The effects of abiotic factors on tree diversity and richness

We used quantile regression to assess the effects of abiotic variables on tree species richness, following the approach of Steege17. Tree species richness per hectare was significantly influenced by a combination of climatic, fire, and edaphic factors across the sampled vegetation plots (Table S1). Among the climatic predictors, both mean annual precipitation (τ = 0.5, p = 0.032) and annual temperature range (τ = 0.5, p = 0.0016) were positively associated with tree species richness. Fire frequency had a significant negative effect on tree species richness (τ = 0.5, p = 0.0024).

Soil conditions were also critical in shaping tree richness patterns. Increased soil bulk density was negatively associated with species richness (τ = 0.5, p < 0.01). Similarly, soil aluminium concentration exhibited a negative relationship (τ = 0.5, p = 0.001). In contrast, soil clay content had a positive effect (τ = 0.5, p < 0.01). Lastly, latitude had a negative association with tree species richness (τ = 0.5, p < 0.01).
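As an illustration of what a quantile regression at τ = 0.5 or τ = 0.95 estimates, the Python sketch below fits a single-predictor quantile line by minimising the pinball (check) loss. It relies on the known property that, for one predictor plus an intercept, an optimal line passes through at least two data points, so it simply tries every pair. This brute-force search is an assumption made for clarity on small examples; it is not the solver used in the study, and the function names are hypothetical.

```python
from itertools import combinations


def pinball_loss(residuals, tau):
    """Check loss: under-predictions cost tau, over-predictions cost 1 - tau."""
    return sum(tau * r if r >= 0 else (tau - 1) * r for r in residuals)


def quantile_line(x, y, tau):
    """Return (intercept, slope) of a tau-quantile line for small datasets."""
    best = None
    for i, j in combinations(range(len(x)), 2):
        if x[i] == x[j]:
            continue  # a vertical line cannot be a regression line
        slope = (y[j] - y[i]) / (x[j] - x[i])
        intercept = y[i] - slope * x[i]
        residuals = [yy - (intercept + slope * xx) for xx, yy in zip(x, y)]
        loss = pinball_loss(residuals, tau)
        if best is None or loss < best[0]:
            best = (loss, intercept, slope)
    if best is None:
        raise ValueError("need at least two distinct x values")
    return best[1], best[2]


# Toy data: a linear trend with one extreme upper value.
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 5.0, 7.0, 100.0]
median_fit = quantile_line(x, y, tau=0.5)
upper_fit = quantile_line(x, y, tau=0.95)
```

The median line ignores the extreme point, whereas the 95th-percentile line is pulled toward it, which is why the two quantiles can describe central and upper-bound trends separately.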

Discussion

This study presents the first comprehensive, biome-wide assessment of tree species richness and alpha-diversity across the Brazilian Cerrado, leveraging an extensive dataset of 1803 standardized inventory plots. By integrating ground-based data with spatial modelling, we provide a portrait of tree species richness patterns and identify key environmental drivers, offering a robust approach for using data from nationally coordinated inventory programmes to inform conservation.

Our findings reveal substantial variation in tree species richness and alpha-diversity across the Brazilian Cerrado, structured by vegetation formation and physiognomy. As expected, and in line with previous localized studies21,26, forest formations, particularly woodland savannas, dry forests, and riparian forests, exhibited significantly higher tree diversity values than savanna formations. These denser and often wetter environments function as tree diversity hotspots and important refugia that sustain higher tree diversity within the broader savanna matrix. In contrast, savanna formations (cerrado sensu stricto), shaped by frequent disturbances and nutrient-poor soils, exhibited lower richness and alpha-diversity26. It is important to note, however, that the absolute values of richness and density reported here may appear lower than those found in other Cerrado-specific inventories14,23,27,28. This is probably due to the adoption of a stricter inclusion criterion, Diameter at Breast Height (DBH ≥ 10 cm), as opposed to the more commonly used Diameter at Soil Height (DSH > 5 cm)29,30. This methodological difference likely excluded a considerable number of individuals, especially in more open and fire-prone vegetation, where trees often remain below 10 cm DBH. Such variation underscores a key challenge in biome-wide biodiversity assessments: achieving methodological consistency across heterogeneous vegetation types. However, some studies that applied the same methodological approach in the Bolivian Cerrado reported similar tree species richness per hectare in forest formations31,32,33. Despite this limitation, we were able to identify a consistent large-scale pattern in tree diversity and to assess the influence of environmental gradients on tree community structure. Therefore, while absolute values should be interpreted with caution, the relative differences across formations and physiognomies remain ecologically meaningful and informative for conservation planning.

The map illustrates a non-uniform distribution of tree species richness across the sampled region of the Cerrado. A distinct hotspot of higher richness (approximately 50 species per hectare) is evident in the central-western portion and in the south-western portion of the transitional zone. The spatial patterns identified, with higher richness concentrated in the central and western portions of the Cerrado, are consistent with biogeographic patterns previously suggested for the Brazilian Cerrado24. The spatial gradients in tree species richness observed across the Cerrado are characterized by a core region of high diversity and transitional patterns toward the Amazonian border. It is well established that savanna trees in the Cerrado–Amazon transition may exhibit greater basal area or biomass than similar formations located in the core region of the Cerrado34,35,36. Our findings reveal a comparable pattern for tree species diversity. This is consistent with recent evidence showing that both the central Cerrado and transitional regions, including the Bolivian Cerrado, exhibit similar patterns of tree species richness28. It is therefore essential to account for both the central Cerrado and its transition zones when delineating conservation units.

In fact, the core Cerrado exhibits relatively high species richness due to its proximity to the centre of species dispersal, whereas more peripheral regions tend to be poorer in species despite the influence of adjacent biomes37, with the exception of the southern Amazonian transition area34. In the South-west, at the transition zone with the Amazon rainforest and in the contiguous Bolivian Cerrado, a belt of high richness is evident due to the proximity to the hyperdiverse Amazonian forests17,34. In these contact zones, ecotonal regions of ecological tension, the interchange of Amazonian species contributes to increased richness and shapes species composition38,39,40. In addition, trees in these humid and hyperdynamic environments may grow faster34 and, as a consequence, surpass the minimum diameter threshold (e.g., DBH ≥ 10 cm) more often, thereby increasing apparent richness due to a higher inclusion rate of individuals. Conversely, at the northeastern edge, approaching the Caatinga, species richness declines sharply, likely reflecting both increased climatic seasonality and a genuine decrease in tree species richness. However, it is also plausible that this pattern partly results from fewer individuals meeting the inclusion criteria, due to stunted growth under more xeric conditions41,42. In this sense, our results reflect natural patterns driven by seasonality and precipitation, as well as methodological aspects that favour sampling primarily in moist forest formations. Thus, both ecological gradients and sampling thresholds contribute to the observed richness patterns.

The positive association we observed between species richness and both mean annual precipitation and annual temperature range aligns with general patterns in tropical ecosystems, where water availability and energy input are key diversity drivers15. This reinforces the sensitivity of Cerrado tree communities to climatic conditions43. Furthermore, the significant negative impact of fire frequency on richness underscores the role of disturbance in shaping these ecosystems44,45,46. While fire is a natural component of the Cerrado, our results suggest that high fire frequency, potentially exacerbated by human activities, can suppress tree diversity, likely by favouring a smaller subset of fire-tolerant species and hindering the establishment of less resistant ones47,48,49,50.

Soil conditions proved equally critical. The negative relationships between richness and both soil bulk density and aluminium concentration, together with the positive relationship with clay content, point towards the importance of soil physical structure and chemistry. In general, soil bulk density has strong effects on forest structure, and clay content on biodiversity51. Lower bulk density likely facilitates better root penetration and water infiltration52, while the association with higher aluminium concentration (lower pH, more acidic soils) is consistent with patterns in many high-diversity, nutrient-poor tropical systems, including specific Cerrado formations where diverse flora thrives on dystrophic soils53,54,55. In addition, clayey soils are associated with flatter areas, which generally have more nutrient-rich soils56,57. Consequently, clay-rich soils can support greater tree diversity, likely due to their enhanced water and nutrient retention capacity58.

The pronounced negative association between richness and latitude observed in our plots reflects a well-known global biodiversity gradient. However, interpreting this within the Cerrado requires considering the biome’s internal heterogeneity. Recent Brazilian Cerrado-wide analyses13, utilizing hundreds of plots, provide a more comprehensive picture of spatial diversity patterns and hyperdominance. Their work reveals extreme hyperdominance, where fewer than 2% of species comprise half of all individuals, a pattern mirroring Amazonia. While our study focused on richness drivers within specific plots, our findings contribute to understanding the factors that structure these broader patterns. For instance, the environmental factors we identified likely influence the distribution and abundance of both hyperdominant and rare species across the Brazilian Cerrado.

The critical conservation status of the Cerrado, with estimates of massive tree loss since 1985 and potential extinction threats for hundreds of species due to deforestation (exceeding that of the Amazon in recent years), lends urgency to understanding diversity drivers13. Our identification of specific climatic, fire, and edaphic factors influencing richness provides valuable information for targeted conservation strategies. Managing fire regimes, protecting areas with favourable soil conditions, and considering the impacts of climate change on precipitation patterns are crucial steps. Furthermore, understanding the factors promoting richness in specific locations can help prioritize areas to safeguard the substantial number of rare species and potentially undiscovered diversity within the biome13.

In conclusion, our biome-wide analysis reveals clear spatial patterns and environmental controls on tree diversity in the Brazilian Cerrado. The central Cerrado and the Amazonian transition zone exhibit the highest levels of tree diversity, a pattern strongly associated with precipitation, temperature range, fire frequency, and soil properties. These findings underscore the urgent need for expanded and strategically placed conservation efforts, informed by spatially explicit data, to safeguard the unique tree flora of the Brazilian savanna against ongoing anthropogenic pressures. By establishing a robust baseline and identifying key diversity drivers, this work provides an essential foundation for future monitoring, research, and evidence-based conservation planning in the Brazilian Cerrado.

Methods

Tree-inventory data and sampling design

Tree-inventory data were obtained from the Brazilian National Forest Inventory (NFI, IFN in Portuguese59), following its standardized protocol for the Brazilian Cerrado. Sampling points were laid out in a regular grid (Grade Nacional de Pontos Amostrais, GNPA) at 20 km intervals across the entire Brazilian portion of the Cerrado biome (hereafter referred to as the Cerrado), not including the Bolivian and Paraguayan extensions. At each GNPA point, we installed a “Maltese-cross” cluster of four rectangular subplots (20 × 50 m; total area 1000 m²), oriented toward the four cardinal directions. Within each subplot, all free-standing woody individuals with diameter at breast height (DBH, 1.30 m) ≥ 10 cm were measured (DBH, total height) and identified to the lowest taxonomic level possible (species, genus, or family). Field sampling was carried out between 2018 and 2020. We retained only plots located within native vegetation (savanna or forest formations), as classified by field observation. We classified woodland savanna, riparian forest, and dry forest as forest formations, while the cerrado sensu stricto, considered typical savanna (including cerrado típico, cerrado ralo, cerrado rupestre, and cerrado denso), was classified as a savanna formation. In addition, babaçuais (palm-dominated vegetation formed primarily by Attalea speciosa) and veredas (palm swamp vegetation dominated by Mauritia flexuosa) were also considered savanna formations, following the physiognomic classification of Ribeiro and Walter19. The raw database comprised 3030 plots; we removed those in non-native vegetation or affected by access impediments. To ensure taxonomic and ecological consistency, we excluded any plot where more than 15% of stems lacked species-level identification and retained only subplots located within native forest and savanna vegetation. Additionally, exotic species were excluded based on consultations with the REFLORA60 Virtual Herbarium database. After applying these filters, 1084 plots remained for all subsequent calculations and spatial mapping.

Calculating tree diversity and species richness

Observed species richness (S) was first tallied as the total number of tree species recorded in each plot. Because the effective sampling area (Aᵢ, in hectares) could vary, due to access impediments or partial subplots, we standardized richness to a common area. For each plot, tree alpha-diversity was expressed as Fisher’s alpha, a diversity measure theoretically insensitive to sample size, by iteratively solving α = S/ln(1 + N/α), with N as the total number of individuals and S as the total number of morpho-species per plot61. Species richness per ha (Sha) was then estimated as Sha = α × ln(1 + Nha/α)1, where Nha is the number of individuals per hectare. We also calculated Hill numbers (q = 0, 1, 2)62 and rarefied richness for 100 individuals using the iNEXT package63. However, due to high correlations among these metrics, we retained Fisher’s alpha and tree species richness per hectare (Sha) as the primary diversity indicators for subsequent analyses.
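The two formulas above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code (the study reports using R with the vegan package): the bisection solver, the function names, and the example plot values are all assumptions introduced here.

```python
import math


def fishers_alpha(n_individuals, n_species, tol=1e-10):
    """Solve S = alpha * ln(1 + N / alpha) for alpha by bisection."""
    if not 0 < n_species < n_individuals:
        raise ValueError("Fisher's alpha needs 0 < S < N")

    def excess(alpha):
        # Species count implied by this alpha, minus the observed count.
        return alpha * math.log1p(n_individuals / alpha) - n_species

    lo, hi = 1e-9, 1.0
    while excess(hi) < 0:  # widen the bracket until it contains the root
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if excess(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def richness_per_hectare(alpha, stems_per_ha):
    """Sha = alpha * ln(1 + Nha / alpha)."""
    return alpha * math.log1p(stems_per_ha / alpha)


# Hypothetical plot: 60 stems and 20 species recorded on 0.4 ha.
n_stems, n_species, area_ha = 60, 20, 0.4
alpha = fishers_alpha(n_stems, n_species)
s_ha = richness_per_hectare(alpha, n_stems / area_ha)
```

Because the left-hand side increases monotonically with α and approaches N, a unique solution exists whenever S < N, which is why the sketch rejects plots where every stem is a different species.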

We calculated diversity and density metrics for each plot according to the physiognomies observed in the field within the broader savanna and forest formations. We report our results both by the two major vegetation formations (savanna and forest) and by physiognomy. Palm-dominated vegetation, comprising babaçuais (dominated by Attalea speciosa) and palm swamps (veredas), and the cerrado sensu stricto were considered savanna formations, whereas riparian forest, dry forest, and woodland savanna were considered forest formations.

All calculations were performed in R Software (v4.3.1) using base functions and the vegan package for α-diversity64. This two-step approach, raw richness standardized by area, plus Fisher’s α-based extrapolation, ensures comparability of diversity across plots with different sampling extents and stem densities.

Modelling diversity and richness patterns

The spatial predictions of tree alpha-diversity and tree species richness for the Cerrado vegetation were plotted on a map with a resolution of 0.1 degree (11 × 11 km), based on the original extent of the vegetation formations of the Brazilian Cerrado. The map was stratified into the two major vegetation types, savanna and forest formations, without considering detailed vegetation physiognomies.

For our spatial interpolations we used loess regression, using only longitude, latitude, and their interaction as independent variables and tree alpha-diversity and species richness as the dependent variables17,63. We used a span of 0.2 for all loess regressions, a 2nd degree polynomial, and no extrapolation17.
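The local regression described here can be sketched as follows in pure Python. This is a simplified illustration under stated assumptions, not a reimplementation of R's `loess`: it fits a weighted full quadratic in longitude and latitude (including their interaction) directly at each query point, uses tricube weights over the nearest `span × n` plots, and skips the predictor normalisation and surface interpolation that R applies. The function names are hypothetical.

```python
import math


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular local design; too few distinct neighbours")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def loess_predict(lon, lat, values, lon0, lat0, span=0.2):
    """Local quadratic fit at (lon0, lat0) from the nearest span * n plots."""
    n = len(values)
    k = min(n, max(10, math.ceil(span * n)))  # 6 coefficients need spare points
    nearest = sorted(
        (math.hypot(x - lon0, y - lat0), i) for i, (x, y) in enumerate(zip(lon, lat))
    )[:k]
    bandwidth = nearest[-1][0] or 1.0
    xtx = [[0.0] * 6 for _ in range(6)]
    xty = [0.0] * 6
    for dist, i in nearest:
        w = max(0.0, 1.0 - (dist / bandwidth) ** 3) ** 3  # tricube kernel
        dx, dy = lon[i] - lon0, lat[i] - lat0
        row = [1.0, dx, dy, dx * dx, dx * dy, dy * dy]
        for p in range(6):
            xty[p] += w * row[p] * values[i]
            for q in range(6):
                xtx[p][q] += w * row[p] * row[q]
    # Coordinates are centred on the query point, so the intercept is the fit.
    return solve_linear(xtx, xty)[0]
```

A small span such as 0.2 keeps each fit local, so the surface can follow regional gradients; a larger span would smooth them away.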

For each of the two categories, forest formations and savanna formations, we constructed a separate spatial interpolation model of tree alpha-diversity and tree species richness across the Cerrado. For example, for tree species richness, we made a single spatial interpolation for all plots located in savannas. This interpolation was then used to predict tree species richness for each grid cell, using savanna vegetation class boundaries extracted from the TerraBrasilis platform, developed by the Brazilian National Institute for Space Research (INPE) under the Brazilian Biomes Monitoring Program65. The same procedure was applied to all plots established in forest formations (Supplementary Fig. 4).

Whereas the TerraBrasilis map is based on major vegetation types, the vegetation type of each plot was determined independently of this map, based on the field observations of those who established the plot65. These classifications followed the physiognomic criteria of the classical Brazilian vegetation system, distinguishing savanna and forest formations according to Ribeiro & Walter19. Consequently, it is possible that a plot classified by observers as savanna is located in a grid cell classified as forest on the map. Such a plot was nevertheless used in the savanna spatial model, as the field observations are considered to be correct. As we allowed no extrapolation, pixels too far from the plots were not given a value. As a 2nd degree polynomial may produce upward and downward exaggerations, values higher than the observed maximum in the data were set to the maximum value and those lower than the minimum to the minimum value.
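The two post-processing rules in this paragraph can be sketched as a single Python helper. The distance cut-off is not reported in the text, so `max_dist` is a hypothetical parameter, as is the function name; the sketch only illustrates the logic of masking distant cells and clamping predictions to the observed range.

```python
from typing import Optional, Sequence


def finalize_cell(prediction: float, dist_to_nearest_plot: float,
                  observed: Sequence[float], max_dist: float) -> Optional[float]:
    """Mask cells far from any plot and clamp the rest to the observed range."""
    if dist_to_nearest_plot > max_dist:
        return None  # no extrapolation: the cell is left without a value
    return min(max(prediction, min(observed)), max(observed))


# Hypothetical observed richness values and a 1-degree cut-off.
observed_richness = [5.0, 22.0, 41.0, 90.0]
too_high = finalize_cell(120.0, 0.1, observed_richness, max_dist=1.0)
too_low = finalize_cell(-3.0, 0.1, observed_richness, max_dist=1.0)
in_range = finalize_cell(40.0, 0.1, observed_richness, max_dist=1.0)
far_away = finalize_cell(40.0, 2.0, observed_richness, max_dist=1.0)
```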

Lastly, to provide a clear guideline on which biogeographical districts harbour greater tree species richness, we summarized the predicted values according to the biogeographic districts of the Brazilian Cerrado as proposed by Françoso et al. (2019), allowing for improved visualization.

Environmental, climatic, and anthropogenic drivers of diversity

To investigate the drivers of tree diversity patterns across vegetation formations, we evaluated a set of climatic, edaphic, and anthropogenic variables. Prior to model fitting, all numerical predictors were tested for multicollinearity. Because the soil dataset contained a large number of highly correlated indicators, we first performed a preliminary Pearson correlation analysis exclusively among the edaphic variables to reduce redundancy before combining them with the other predictors. After this initial filtering step, all remaining predictors were evaluated using Pearson’s correlation, and highly correlated variables (r > 0.7) were excluded to avoid redundancy (Supplementary Fig. 5), resulting in the selection of 12 variables (Supplementary Fig. 6).
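A correlation filter of this kind can be sketched in Python as below. Two choices are assumptions, since the text does not specify them: the filter is greedy (a variable is kept only if it is not highly correlated with any variable already kept, so the input order decides which member of a pair survives), and the threshold is applied to the absolute value of r. The function and column names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # undefined for constant columns


def drop_correlated(columns, threshold=0.7):
    """Greedily keep predictors whose |r| with every kept one is <= threshold."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) <= threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical predictors: "ph_kcl" nearly duplicates "ph_water".
predictors = {
    "ph_water": [1.0, 2.0, 3.0, 4.0, 5.0],
    "ph_kcl": [2.0, 4.0, 6.0, 8.0, 10.5],
    "clay": [3.0, 1.0, 4.0, 1.0, 5.0],
}
selected = drop_correlated(predictors)
```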

Climatic predictors were selected based on their ecological relevance to tree community structure. We extracted temperature-related variables from the ERA566 reanalysis dataset provided by the Copernicus Climate Data Store and precipitation data from the CHIRPS dataset (Climate Hazards Group InfraRed Precipitation with Station data)67. Variables considered included mean annual temperature, temperature seasonality, and temperature annual range (from ERA5), along with annual precipitation and precipitation seasonality (from CHIRPS). In addition, we calculated the number of months with less than 100 mm of rainfall as a proxy for intra-annual drought intensity.
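The dry-month proxy is a simple count, sketched below in Python. The text does not say whether the count was made on long-term monthly means or year by year, so the input here (twelve mean monthly totals for one plot) and the function name are assumptions.

```python
def count_dry_months(monthly_rain_mm, threshold_mm=100.0):
    """Number of months with rainfall below the threshold."""
    return sum(1 for mm in monthly_rain_mm if mm < threshold_mm)


# Hypothetical mean monthly rainfall (mm), January to December.
rain = [250, 220, 180, 90, 40, 10, 5, 8, 45, 110, 190, 240]
dry_months = count_dry_months(rain)
```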

Soil data were obtained from two main sources. Soil bulk density (BDOD, g cm⁻³) was extracted from the SoilGrids database68 at 250 m resolution, averaged across the upper 0–30 cm soil layer. The remaining edaphic variables, including organic matter, organic carbon, pH in water and KCl, nitrogen, sodium, magnesium, calcium, potassium, aluminium, exchangeable hydrogen, aluminium saturation, sum of bases, base saturation, cation exchange capacity, clay, silt, and sand, were obtained from a harmonized, model-based global soil dataset69, which integrates multiple soil profiles and environmental covariates across continents. We performed a preliminary Pearson correlation analysis exclusively among the edaphic variables to reduce redundancy. This step allowed us to identify and retain the variables that best summarized the major properties of the soil before model fitting. Based on this soil correlation matrix (Supplementary Fig. 7), we selected a subset of soil predictors for subsequent analyses: bulk density (BDOD), aluminium concentration (Al), cation exchange capacity (CEC), pH, clay content, total carbon, and base saturation. Only after this filtering step were these soil variables combined with the climatic and anthropogenic predictors for the full correlation assessment.

Anthropogenic pressures were evaluated at both the landscape and plot levels. We used land cover data from PRODES (Projeto de Monitoramento do Desmatamento na Amazônia Legal por Satélite), developed by the Brazilian National Institute for Space Research (INPE)70. The PRODES dataset, accessed via the TerraBrasilis platform (https://terrabrasilis.dpi.inpe.br/), provides annual deforestation maps at 30 m spatial resolution, allowing for consistent monitoring of forest loss over time. Using this dataset, we derived historical metrics of forest cover and land-use history. For each vegetation plot, we delineated three buffer zones with radii of 1000 m. Within each buffer, we calculated total native vegetation cover in 2020, including grassland, savanna, and forest formations.
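The buffer calculation can be illustrated with a minimal sketch that computes the share of native vegetation among the pixels whose centres fall within a given radius of a plot. It assumes projected coordinates in metres and uses hypothetical class labels; it is not the original workflow.

```python
from math import hypot

# Hypothetical class labels standing in for the native formations.
NATIVE_CLASSES = {"forest", "savanna", "grassland"}


def native_cover_fraction(pixels, plot_xy, radius_m=1000.0):
    """Fraction of pixels inside the buffer that are native vegetation.

    pixels: iterable of (x, y, land_cover_class), coordinates in metres.
    """
    px, py = plot_xy
    inside = [cls for x, y, cls in pixels if hypot(x - px, y - py) <= radius_m]
    if not inside:
        raise ValueError("no pixels fall inside the buffer")
    return sum(cls in NATIVE_CLASSES for cls in inside) / len(inside)


# Invented pixel centres around a plot located at the origin.
pixels = [
    (0, 0, "savanna"),
    (300, 400, "forest"),
    (500, 500, "pasture"),
    (-900, 0, "grassland"),
    (1500, 0, "soybean"),  # outside the 1000 m buffer, ignored
]
cover = native_cover_fraction(pixels, (0, 0))
print(cover)
```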

Land-use history was characterized over a 35-year period (1985–2020) from MapBiomas71 using 1000 m buffers around each vegetation plot. Three metrics were derived:

Land-use duration

The average number of years in which each pixel was classified as non-native vegetation. We defined non-native vegetation as any pixel not classified as forest formation, savanna formation, or grassland in MapBiomas. This variable measures the persistence of anthropogenic land use in areas where native Cerrado vegetation has been replaced by human-modified land uses.

Land-use change frequency

The average number of land cover transitions per pixel over the time series.

Fire frequency

The total number of years a pixel was recorded as burned, based on MODIS fire detection data (500 m resolution) extracted for a 5000 m buffer. Only one fire event per year per pixel was counted, regardless of the number of detections. Fire frequency represents the cumulative number of fire years before the sampling date of each vegetation plot (either 2018 or 2020).

These metrics were computed via the Google Earth Engine platform using custom scripts applied to MapBiomas71 and MODIS72 products.
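The logic of the three land-use history metrics can be sketched on per-pixel annual class sequences. This is an illustrative example and not the Google Earth Engine scripts used in the study; the class labels and function names are hypothetical, and "before the sampling date" is assumed here to mean strictly earlier years.

```python
# Hypothetical class labels standing in for the native formations.
NATIVE = {"forest", "savanna", "grassland"}


def land_use_duration(series_by_pixel):
    """Mean number of years per pixel classified as non-native vegetation."""
    counts = [sum(c not in NATIVE for c in s) for s in series_by_pixel]
    return sum(counts) / len(counts)


def change_frequency(series_by_pixel):
    """Mean number of year-to-year class transitions per pixel."""
    counts = [sum(a != b for a, b in zip(s, s[1:])) for s in series_by_pixel]
    return sum(counts) / len(counts)


def fire_frequency(burn_detection_years, sampling_year):
    """Number of distinct burned years before sampling_year for one pixel.

    Using a set counts at most one fire event per year.
    """
    return len({y for y in burn_detection_years if y < sampling_year})


# Invented five-year class sequences for three pixels in one buffer.
buffer_pixels = [
    ["savanna", "savanna", "pasture", "pasture", "pasture"],
    ["forest", "forest", "forest", "forest", "forest"],
    ["savanna", "pasture", "savanna", "soybean", "soybean"],
]
duration = land_use_duration(buffer_pixels)
changes = change_frequency(buffer_pixels)
fires = fire_frequency([2001, 2001, 2010, 2019, 2020], sampling_year=2020)
print(duration, changes, fires)
```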

Testing the model fit and data analysis

We calculated the percentage of variation explained by the combination of the spatial models for tree species richness by relating observed and predicted values in a simple linear regression. We assessed the goodness of prediction by mapping the standard error of the loess regression and by examining it by region and forest type. To further assess the validity of the model predictions, we tested for spatial autocorrelation in the residuals using the function Moran.I() in the ape package and mapped the residuals to assess any remaining spatial signal73,74. A histogram of all values was constructed for each variable, as well as a boxplot by region and forest type. Each map fitted with one plot left out was then used to predict the species richness/ha for that plot, which can be considered an unbiased estimate of the quality of the resulting map.
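For readers unfamiliar with the residual autocorrelation test, the following sketch computes Moran's I with inverse-distance weights. The weighting scheme is an assumption for illustration, since the weights used are not stated above, and the code is a plain re-implementation of the statistic rather than a call to the ape package.

```python
from math import hypot


def morans_i(values, coords):
    """Moran's I with inverse-distance weights (assumed weighting scheme).

    values: residuals, one per plot.
    coords: (x, y) per plot; coincident locations are not supported.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    weighted_cross = 0.0
    weight_sum = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dist = hypot(coords[i][0] - coords[j][0], coords[i][1] - coords[j][1])
            if dist == 0:
                raise ValueError("coincident plot locations")
            w = 1.0 / dist
            weight_sum += w
            weighted_cross += w * dev[i] * dev[j]
    variance_term = sum(e * e for e in dev)
    return (n / weight_sum) * (weighted_cross / variance_term)


# Four plots on a transect with invented residuals.
coords = [(0, 0), (1, 0), (2, 0), (3, 0)]
clustered = morans_i([1, 1, -1, -1], coords)    # similar residuals adjacent
alternating = morans_i([1, -1, 1, -1], coords)  # opposite residuals adjacent
expected_null = -1 / (len(coords) - 1)          # expectation without autocorrelation
print(clustered, expected_null, alternating)
```

Values above the null expectation of -1/(n - 1) indicate positive spatial autocorrelation and values below it indicate negative autocorrelation; a significance test would still be needed to judge whether the departure is meaningful.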

We modelled the effects of climatic variables, anthropogenic pressures, and large-scale soil properties on species richness in forest and savanna formations, which had the highest number of sampled plots in the Cerrado. We analysed the effects of climate, soil properties, and anthropogenic impacts on tree species richness only, for two main reasons: (1) species richness is more straightforward to interpret than Fisher’s alpha, and (2) species richness is strongly correlated with Fisher’s alpha and other diversity metrics. We applied quantile regression for a more robust estimation, as it is less sensitive to outliers49,50. We used tau = 0.5 to estimate the median response and tau = 0.9 to estimate the upper bound of the relationship, representing the maximum potential value of species richness for a given predictor. This approach has previously been effective in revealing limiting factors in tropical ecosystems17. All analyses were conducted in the R programming environment using mostly custom scripts75.
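The role of tau can be illustrated with a small sketch of quantile regression for a single predictor. It minimises the asymmetric absolute (pinball) loss by brute force over all lines through two data points, which is sufficient because, with one predictor and an intercept, an optimal line always passes through at least two observations. This is a didactic stand-in suitable only for small samples, not the R routine used for the analyses; all names and data are invented.

```python
def pinball_loss(residual, tau):
    """Asymmetric absolute loss minimised by quantile regression."""
    return residual * tau if residual >= 0 else residual * (tau - 1)


def fit_quantile_line(x, y, tau):
    """Return (intercept, slope) minimising total pinball loss.

    Brute-force search over lines through pairs of points; O(n^3).
    """
    best = None
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            if x[i] == x[j]:
                continue  # vertical line, not a valid candidate
            slope = (y[j] - y[i]) / (x[j] - x[i])
            intercept = y[i] - slope * x[i]
            loss = sum(
                pinball_loss(yk - (intercept + slope * xk), tau)
                for xk, yk in zip(x, y)
            )
            if best is None or loss < best[0]:
                best = (loss, intercept, slope)
    if best is None:
        raise ValueError("need at least two distinct predictor values")
    return best[1], best[2]


# Invented wedge-shaped data: richness spreads out as the predictor increases.
predictor, richness = [], []
for xv in range(1, 6):
    for k in (1, 2, 3):
        predictor.append(float(xv))
        richness.append(float(k * xv))

median_fit = fit_quantile_line(predictor, richness, tau=0.5)
upper_fit = fit_quantile_line(predictor, richness, tau=0.9)
print(median_fit, upper_fit)
```

In this toy data set the tau = 0.9 line follows the upper edge of the point cloud while the tau = 0.5 line runs through its centre, which is the distinction between the upper bound and the median response.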

Statistics and reproducibility

All tests were carried out using all plots. All tests and data are available in the online supplementary material, so the analyses can be reproduced.