Introduction

Biodiversity of plants encompasses various facets, including species diversity, structural diversity, functional diversity, and genetic diversity, all of which are increasingly threatened by the ongoing global biodiversity crisis1. Among these, the loss of plant functional diversity—the variety of plant functional traits within a community or an ecosystem—poses significant risks to ecosystem productivity2, functioning, and stability3,4,5,6. In this context, plant functional traits are valuable indicators of ecosystem processes and reliable proxies for assessing ecosystem conditions7,8. Thus, monitoring of functional diversity is crucial for a deeper understanding of ecosystem dynamics. Ideally, such monitoring should occur continuously and extend globally. However, measuring functional diversity requires assessing variations in multidimensional trait space9,10,11,12,13, and obtaining such plant trait data at the community or ecosystem level through traditional fieldwork and laboratory assays is impractical14.

This is where hyperspectral remote sensing may offer a powerful alternative. Hyperspectral data captures spectral information across a wide range of wavelengths and, through specific absorption features, enables the estimation of a series of plant traits15,16,17. Therefore, hyperspectral satellite-borne remote sensing can provide an avenue for continuous monitoring of functional diversity across time and space18. Current and upcoming spaceborne hyperspectral missions, such as EnMAP19, PRISMA20, GaoFen-521, CHIME22, and SBG23, promise to deliver unprecedented volumes of hyperspectral data, important steps towards a global system to track changes in plant traits and, by extension, functional diversity14,24,25.

To harness the full potential of these hyperspectral datasets for functional diversity monitoring, it is essential to develop robust models capable of accurately predicting plant traits over time and across different ecosystems and vegetation types. Data-driven approaches, particularly machine learning, are powerful tools for retrieving trait information from hyperspectral imagery by learning complex relationships between spectral signatures and plant traits26,27. In particular, the combination of large data compilations and deep learning has shown great promise in improving the accuracy and scalability of trait prediction models17,28,29.

Previous studies have managed to map functional diversity locally30,31,32,33. However, so far, most of these surveys only cover a single point in time and ignore possible changes in functional diversity due to phenology. Other studies performed multi-temporal analyses of functional traits, but only for small areas, and did not specifically address functional diversity34,35. A third group of studies focused on the link between spectral signals and functional diversity, but only through simulations36,37,38. It is well known that values for plant functional traits change significantly throughout the vegetation period39,40 and also that these changes critically affect the retrieval of plant functional traits from spectral data41. It therefore remains unclear how representative functional diversity maps are across seasons.

Here, we assess the seasonal variation in remotely sensed functional diversity and examine whether these patterns differ across biomes and along latitudinal gradients (see Fig. 1). We assembled a large global dataset of hyperspectral satellite images acquired by the EnMAP mission42, covering a two-year period from 2022 to 2024. For each scene of 30 km² with a 30 m × 30 m spatial resolution, we retrieved 20 essential plant functional traits through a deep learning model17. We calculated two functional diversity indices for the whole scene: Rao’s Q, which measures trait dissimilarity, and functional richness, which captures the range of trait values. As our dataset provided broad spatial coverage rather than repeated observations at the same coordinates, these indices were compared based on the recording time of the scene and its geographical location to derive variations of functional diversity for five major biomes. While interpreting ecological drivers is beyond the scope of this study, our results demonstrate that functional diversity varies considerably across seasons and biomes, with metric-specific differences. These patterns suggest that single-time-point snapshots may not fully capture ecosystem functional diversity and hence impair assessments of ecosystem stability and functioning.

Fig. 1: General workflow of the multi-seasonal functional diversity analysis58.

Hyperspectral data of the EnMAP satellite mission19 is fed into a one-dimensional (1D) convolutional neural network17, which predicts 20 different plant functional traits per scene. These traits are then used to calculate functional diversity indices and analyze their changes over time.

Results

Temporal variability among biomes

Across biomes, we found the data to vary substantially in time (Figs. 2 and 3). Overall, values of Rao’s Q ranged from 1.54 to 6.85 for the entire time period, with higher values indicating higher functional diversity. Rainforests exhibited the narrowest range of values (2.69–5.35), whereas the widest range for Rao’s Q occurred in the Mediterranean ecosystems (2.20–6.85; Fig. 2). Savannas and Shrublands showed the most pronounced seasonal changes, followed by Temperate Grasslands and Mediterranean ecosystems. In contrast, Temperate Forests and Rainforests displayed the least variation. In Temperate Forests, subtle seasonal variations were observed for the Northern hemisphere, with Rao’s Q values peaking during spring and autumn (Figs. 2 and 3). Temperate Grasslands in the Southern hemisphere also exhibited their maximum Rao’s Q values during spring and autumn (Figs. 2 and 3). For Savannas and Shrublands, pronounced seasonal patterns were detected, with higher Rao’s Q values occurring between April and October in tropical zones and between March and November South of the tropics, corresponding to the wet season (Figs. 2 and 3). Rainforests showed a similar tendency, but exhibited no clear seasonal patterns in Rao’s Q, which is consistent with the more stable climatic conditions in these ecosystems (Figs. 2 and 3). The Mediterranean ecosystems, Temperate Grasslands, as well as Savannas and Shrublands displayed substantial overall variation in Rao’s Q throughout the vegetation period (Supplementary Fig. 1).

Fig. 2: Scatter plots of Rao’s Q (trait dissimilarity) values for the five different biomes.

Values North of the tropics were excluded for Rainforests as well as Savannas and Shrublands due to sparse data. Image acquisition dates are merged across years. A Gaussian smoothed median (GSM) with a moving window of 90 days is shown for each latitude group of each plot, along with colored bars representing the range from 25% to 75% quantiles.

Fig. 3: Polar plots of Rao’s Q (trait dissimilarity) values for the five different biomes.

Values North of the tropics are not shown for Rainforests and Savannas and Shrublands due to sparse data. Image acquisition dates are merged across years. The monthly median is shown for each latitude group of each plot, along with colored bars representing a range from 25% to 75% monthly quantiles.

Differences between metrics

Functional richness generally displayed similar seasonal trends to those observed in Rao’s Q (Supplementary Figs. 2–6), but interquartile ranges were comparatively higher (Fig. 2, Supplementary Fig. 3). Overall, functional richness values derived from the convex hull ranged from 89.99 to 11798.05 for the entire time period, with higher values also indicating higher functional diversity. Temperate Grasslands displayed the lowest value range (89.99–6545.27), whereas the highest range of values for functional richness was found in Mediterranean ecosystems (457.14–11798.05). Functional richness patterns calculated by kernel density estimation hypervolume (KDE) closely resembled those derived from the convex hull (Supplementary Figs. 5 and 6) and exhibited a higher overall range of values (10203.09–305246.55).

To further quantify the differences between the two metrics, we calculated the means of monthly coefficients of variation (meanCVs). Rao’s Q showed low to medium variability, with meanCVs ranging from 0.13 in Temperate Forests to 0.31 in Temperate Grasslands (Supplementary Fig. 7). In contrast, functional richness exhibited much higher variability, with meanCVs ranging from 0.33 in Temperate Forests to 0.96 in Temperate Grasslands (Supplementary Fig. 8).
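The meanCV described above can be sketched in a few lines. This is only an illustration under stated assumptions: the coefficient of variation is taken as the population standard deviation divided by the mean of the scene-level values within one calendar month, and months with a single scene are skipped. The function name `mean_monthly_cv` and the toy values are hypothetical and not taken from the study.

```python
from collections import defaultdict
from statistics import mean, pstdev


def mean_monthly_cv(records):
    """Mean of monthly coefficients of variation (meanCV).

    records: iterable of (month, value) pairs, one pair per scene.
    Assumption: CV = population standard deviation / mean per month.
    """
    by_month = defaultdict(list)
    for month, value in records:
        by_month[month].append(value)

    cvs = []
    for month, values in sorted(by_month.items()):
        if len(values) < 2:
            continue  # a single scene has no spread to measure
        cvs.append(pstdev(values) / mean(values))
    return mean(cvs)


# Toy values for illustration only (month, diversity value).
toy_records = [(1, 2.0), (1, 4.0), (2, 3.0), (2, 3.0), (3, 5.0)]
toy_mean_cv = mean_monthly_cv(toy_records)
```

Averaging the monthly CVs rather than computing a single CV over all scenes keeps the seasonal signal itself from inflating the variability estimate.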

Discussion

This study documents the pronounced temporal variation of remotely sensed functional diversity across five major biomes. Our results highlight the importance of incorporating temporal variation into functional diversity assessments. The observed temporal dynamics in functional diversity have implications for ecological research and biodiversity monitoring through Earth observation, as well as through conventional field surveys. It has to be noted, however, that the functional traits considered in our analysis are restricted to those that can be addressed with optical remote sensing data. This leads to a bias towards aboveground traits, and specifically towards leaf traits of the dominant and sun-exposed canopy layer, whereas belowground, propagation, and dispersal traits are not included. Also note that, given the 30 m pixel size of EnMAP, we primarily address the functional diversity of plant communities. We therefore cannot make statements about all possible aspects of functional diversity.

Temporal variation in functional diversity differs extensively across biomes, underscoring the influence of ecological and climatic contexts. Rainforests, for example, exhibit minimal seasonal variation in Rao’s Q and functional richness, reflecting the more stable climatic conditions of these ecosystems. In contrast, Savannas and Shrublands show stronger seasonal changes that accompany the prevailing wet-dry cycles. One caveat of these results is data scarcity in wet seasons due to fundamental limitations of optical remote sensing concerning high cloud cover14. At the same time, Mediterranean ecosystems and Temperate Grasslands exhibit substantial seasonal variation, where functional diversity peaks mainly during the spring and autumn months. These peaks were also observed for Temperate Forests in a less pronounced form, suggesting that the times of budburst and leaf senescence lead to high levels of remotely sensed functional diversity. In general, all these findings align with previous work indicating the strong phenological and environmental controls on plant traits39,40,43,44, which evidently apply also to functional diversity (compare Durán et al.45). The observed seasonal trends emphasize that temporal snapshots may not reveal the full picture of functional diversity (see Guimarães-Steinicke et al.46). This also highlights that comparing functional diversity estimated in different seasons may be misleading. Moreover, comparing functional diversity estimates across regions, which may not have a synchronous seasonal behavior, may also be misleading. Uncertainties will be particularly severe in ecosystems with high overall value ranges for diversity estimates, which in our study were found for Temperate Grasslands, Savannas and Shrublands and the Mediterranean (Supplementary Fig. 1). It should be noted, however, that Supplementary Figs. 1, 2 and Fig. 4 do not show which biomes are the most functionally diverse overall, but only provide a comparison at a resolution of 30 m × 30 m and a spatial extent of 30 km². This means that we do not observe functional diversity at the species level, but quantify diversity gradients of plant communities on a landscape scale. This inevitably leads to lower overall diversity, but it allows us to observe biodiversity patterns at the biome level. Since plant sizes and compositional patterns differ widely between life forms and ecosystems, more comprehensive and reliable comparisons between biomes would have to take the effects of scale and multiple spatial resolutions into account36,37.

Fig. 4: Location of the five biomes in our study using the Mollweide projection.

Dotted black lines separate the three latitude groups. Boxplots show the monthly Rao’s Q (trait dissimilarity) medians for all biomes without differentiation by latitude. The boxes show the interquartile range (IQR) and the median of medians, while the whiskers extend to the smallest and largest median values within 1.5 times the IQR.

The choice of metric for functional diversity also influences the observed temporal patterns. Rao’s Q shows relatively low variation, while functional richness exhibits higher variation, particularly during the seasons with the highest values (Supplementary Figs. 3, 4). This can be explained by the dependency of functional richness on species numbers (Villéger et al., 2008), which have a less prominent influence on Rao’s Q. Functional richness also exhibits coefficients of variation that are up to three times higher than those of Rao’s Q (Supplementary Figs. 7, 8). In general, these findings highlight the importance of selecting functional diversity metrics carefully. Rao’s Q and functional richness capture different facets of functional diversity, and while Rao’s Q is more frequently used, the choice depends on the assessment’s focus. Thus, combining multiple metrics can provide a more nuanced understanding of functional diversity dynamics37.

The possible extent of uncertainty in our results depends on the accumulated uncertainties of the model’s trait predictions. Here, the robustness of the model itself remains a possible caveat since we cannot assess the temporal variation of the model in detail due to a general lack of validation data. Assembling a benchmark dataset of plant trait observations at a global scale, with sufficiently large plots and co-located hyperspectral data, is a long-standing goal in vegetation remote sensing, one that will require sustained, coordinated efforts across the research community14,18. However, we assessed the trait predictions in our study over time (Supplementary Figs. 9–28), particularly for two of the seasonally most affected traits, Chlorophyll and LAI (Supplementary Figs. 29, 30), and found consistent seasonal patterns in line with the expectations. We also know that the model of Cherif et al.17 was trained and successfully evaluated on 42 datasets of different biomes with large temporal variation. According to their results and our own assessments of temporal variation, we therefore assume that the model predictions are robust. Regarding uncertainty in the functional diversity metrics, we are aware that functional richness calculated via convex hull volume can be sensitive to outliers, as extreme trait combinations disproportionately influence the geometry of the hull. To address this, we decided to include KDE as an alternative method for calculating functional richness. KDE-based methods are generally less sensitive to outliers, as they weigh the density of trait distributions rather than relying solely on the outermost points. KDE results closely resemble those obtained from the convex hull, which indicates the robustness of functional richness estimates (Supplementary Figs. 5, 6). Rao’s Q has low sensitivity to outliers, as it is based on pairwise trait dissimilarities across the trait space, rather than being dependent on the outer envelope. It integrates over all distances and does not emphasize extremes. As such, Rao’s Q is often considered a more stable metric when dealing with continuous trait distributions47,48. Both Rao’s Q and convex hull are parameter-free and deterministic once the trait distance matrix is defined. There is no inherent stochasticity or calibration involved in their calculation, and uncertainty arises primarily from input variability rather than from the metric formulation itself. Another uncertainty factor is any misclassification in the European Space Agency (ESA) WorldCover product that we used to mask out anthropogenic land cover types in the EnMAP scenes. Examples would be agricultural areas misclassified as Temperate Grasslands or tree plantations misclassified as Rainforests. However, the ESA WorldCover is currently the most accurate product available49. It has a 10 m × 10 m spatial resolution, which means that the extent of misclassification in our scenes with 30 m × 30 m pixels was considered to be minimal. Lastly, at current processing levels, EnMAP products do not include Bidirectional Reflectance Distribution Function (BRDF) correction, i.e. the removal of effects resulting from different illumination and observation angles. The model in our study was trained on airborne data with varying levels of BRDF and illumination effects. BRDF effects primarily result in changes in the magnitude of reflectance rather than the shape. Our method (Convolutional Neural Networks) is primarily dependent on shape-related features and was augmented during training with systematic shifts in reflectance magnitudes17. It should therefore be robust to satellite-level BRDF.

Our study underscores the value of hyperspectral remote sensing for tracking temporal dynamics in functional diversity. While remote sensing primarily reveals the functional diversity of dominant plants and may miss the diversity beneath the upper canopy, it remains a powerful tool due to its standardized, repeated, and large-scale observations14. The ability to retrieve plant traits across time and space provides unprecedented opportunities for ecosystem monitoring, particularly in remote or inaccessible regions. However, we also encountered challenges, including data gaps in tropical rainforests and savanna regions due to cloud cover and limited satellite scene availability. It should also be noted that we excluded boreal forests and the tundra biome after data collection due to severe data scarcity. These gaps necessarily limit the generalizability of our findings. Nevertheless, the seasonal changes observed in temperate biomes, especially since they correlate with leaf phenology, imply that boreal forests and the tundra biome should also exhibit pronounced seasonality effects. Additionally, sampling bias likely plays a role in our results due to EnMAP’s snapshot-on-demand coverage that lacks spatiotemporal continuity, particularly in the relatively stable pattern of functional diversity in Temperate Forests (Fig. 2, Supplementary Fig. 3). This biome includes both deciduous and mixed forests, and values during winter are more likely to come from mixed forests dominated by conifers than from deciduous ones. Addressing these limitations through enhanced satellite coverage and improved cloud-masking algorithms will be essential for achieving global functional diversity monitoring. More frequently available Landsat or Sentinel data with a few multispectral bands cannot fill the data gap since rich hyperspectral data are crucial to retrieve the analyzed key plant traits, including leaf water and nitrogen content18. Therefore, future hyperspectral missions such as CHIME22 and SBG23 promise a more comprehensive spatiotemporal picture across all biomes. Above all, there is an urgent need for denser temporal coverage of hyperspectral acquisitions. Additionally, the sensitivity of trait retrieval to seasonal changes41 reinforces the need for robust, temporally adaptive models. The opportunities provided by deep learning-based trait estimation are especially promising in this context, as large, curated datasets17 now enable the inference of a wider range of traits essential for describing functional diversity. As more researchers share data, including those of underrepresented traits and regions (see Mederer et al.50), the capacity to monitor functional diversity will continue to grow.

Conclusion

This study underscores the importance of integrating temporal dynamics into functional diversity assessments. Our findings provide evidence that functional diversity exhibits substantial variation across seasons, biomes, and diversity metrics. Seasonal trends revealed in our analysis suggest that single temporal snapshots fail to capture the full complexity of functional diversity, which underscores the risk of misinterpretation when comparing estimates from different time points. Similarly, cross-regional comparisons of functional diversity may yield misleading conclusions if they do not account for asynchronous seasonal patterns among regions. These insights call for a shift from static to multitemporal functional diversity monitoring, leveraging the growing availability of hyperspectral satellite data. However, using hyperspectral data does not automatically provide a comprehensive solution (cf. Jetz et al.14). The choice of functional diversity metric depends on the specific goals of a study, as different metrics emphasize different aspects of diversity. Each has its limitations regarding trait selection, spatial resolution and temporal coverage. At the same time, more efforts are needed to promote data sharing across research communities, which will enable the inclusion of underrepresented regions in global analyses. By fostering collaborative initiatives and leveraging multitemporal monitoring, we can bridge critical data gaps and enhance the accuracy of functional diversity assessments. This will make it possible to unlock the full potential of functional diversity assessments to inform biodiversity science and sustainable ecosystem management on a global scale.

Methods

Dataset collection

We acquired 4157 EnMAP scenes from the corresponding website51. Their global distribution is shown in Fig. 5. The data were preprocessed to Level-2A data in GeoTIFF format with combined land and water correction and no ozone correction. We selected all available EnMAP scenes at the time (15.09.2024) that met the following criteria:

  • located in one of the five biomes we studied (Fig. 4)

  • covering mostly natural vegetation; national parks and protected areas were prioritized, and when in doubt we cross-checked with the ESA WorldCover dataset52

  • cloud cover below 50%

  • no dispersed cloud cover

Fig. 5: Distribution of EnMAP satellite scenes in our dataset using the Mollweide projection.

One red dot represents one scene, while dotted black lines separate the three latitude groups.

Our study includes five biomes of the World Wildlife Fund Terrestrial Ecoregions of the World dataset53. These were (with abbreviations in brackets): Tropical and subtropical moist broadleaf forests (Rainforests), Temperate broadleaf and mixed forests (Temperate Forests), Tropical and subtropical grasslands, savannas and shrublands (Savannas and Shrublands), Temperate grasslands, savannas and shrublands (Temperate Grasslands) and Mediterranean forests, woodlands and scrub (Mediterranean).

Data coverage

Rainforests as well as Savannas and Shrublands North of the tropics were excluded due to insufficient coverage, while areas South of the tropics had limited representation, reflecting their latitude-dependent distribution. Across the tropics, high cloud cover and relatively few requested satellite scenes posed challenges (Figs. 4, 5). Nevertheless, the vast natural expanses of these regions provided sufficient data, with 538 scenes available for analysis. Savannas and Shrublands exhibited the highest data coverage (1693 scenes), primarily due to the extensive areas in Africa and Australia that combine high scene requests with low levels of anthropogenic disturbance. Temperate Forests (806 scenes) and Temperate Grasslands (645 scenes) were well covered despite extensive anthropogenic land cover changes, particularly from agricultural expansion. Mediterranean ecosystems, while represented by the smallest total number of scenes (475), benefited from very high coverage relative to their limited geographic extent.

Model implementation

We employed a pre-trained one-dimensional CNN model for retrieving plant traits from spectral data, provided by Cherif et al.17 via GitHub54. The architecture, adapted from EfficientNet-B055 for one-dimensional input, incorporates depthwise separable convolutions and network scaling techniques. The model predicts 20 plant functional traits at once and was trained on a global collection of 42 different datasets. This collection includes hyperspectral data obtained from various remote sensing platforms and sensors (e.g. AVIRIS, HyMap, HySpex, NEON Airborne Observation Platform AOP). Despite having different spectral properties, they cover a comparable wavelength range of the solar electromagnetic spectrum (see Table A.1 in Cherif et al.17). Measurements were unified across the full range of 400–2500 nm in 1 nm steps by applying a forward and backward linear interpolation17. The model transferability was evaluated with a block cross-validation across 42 independent datasets. Evaluation results were then averaged for the final R-squared (R²) and normalized Root Mean Squared Error (nRMSE) values, which ranged from 0.10 to 0.69 for R² and from 19.92 to 10.65 for nRMSE. Notably, many of the most important plant traits such as chlorophyll (0.51 R², 16.92 nRMSE), nitrogen (0.42 R², 14.28 nRMSE) and leaf mass per area (0.69 R², 10.65 nRMSE) were predicted with high accuracy, showing that the model is able to predict these traits reliably across different biomes and ecosystems. All model settings were kept identical to those in Cherif et al.’s original work and were implemented in Python.
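The unification of spectra to a common 1 nm grid can be illustrated with a short sketch. It assumes plain linear interpolation with `numpy.interp`, which holds the first and last measured values constant outside the measured range; whether this matches the exact forward and backward interpolation used by Cherif et al. is an assumption. The function name `resample_to_1nm` and the toy band values are hypothetical.

```python
import numpy as np


def resample_to_1nm(wavelengths_nm, reflectance, start=400, stop=2500):
    """Linearly interpolate one spectrum onto a 1 nm grid (400-2500 nm).

    Assumption: values outside the measured range are filled with the
    nearest measured value (default behaviour of numpy.interp).
    """
    wavelengths_nm = np.asarray(wavelengths_nm, dtype=float)
    reflectance = np.asarray(reflectance, dtype=float)
    order = np.argsort(wavelengths_nm)          # numpy.interp needs increasing x
    target = np.arange(start, stop + 1)         # 400, 401, ..., 2500
    resampled = np.interp(target, wavelengths_nm[order], reflectance[order])
    return target, resampled


# Toy spectrum with three band centres (illustrative values only).
band_centres = np.array([420.0, 430.0, 450.0])
band_values = np.array([0.10, 0.20, 0.40])
grid, spectrum = resample_to_1nm(band_centres, band_values)
```

With the default range, the resampled spectrum has 2101 values, which is the fixed input length a one-dimensional network would then receive regardless of the original sensor.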

Preprocessing

We used the five masks provided by EnMAP (cloud, cirrus, cloud-shadow, haze, snow) to eliminate non-vegetation surface elements in the scene. To deal with mask artifacts, a binary dilation buffer with a radius of 40 pixels was chosen after testing different values. Topography correction using a digital elevation model had already been applied by the EnMAP preprocessing pipeline for L2A data. All masks were merged and then applied to the scene. Furthermore, since EnMAP scenes have a rhomboid shape with padding of no-data values around them, each scene was cropped to an axis-parallel rectangle using the outer coordinates of the scene. Next, we masked all man-made surface elements and water (built-up area, croplands, permanent water and snow/ice) identified in the ESA WorldCover V2 2021 dataset52.
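A minimal sketch of the mask merging and buffering step is given below. It assumes that the buffer is a disk-shaped structuring element and that the masks are boolean arrays of equal shape; the text does not state the shape of the buffer, so the disk is an assumption. The helper names `disk` and `merge_and_buffer` are hypothetical, and the demonstration uses a radius of 3 pixels so that the result is easy to check by hand (the study used 40).

```python
import numpy as np
from scipy import ndimage


def disk(radius):
    """Boolean disk-shaped structuring element (assumed buffer shape)."""
    y, x = np.ogrid[-radius:radius + 1, -radius:radius + 1]
    return x * x + y * y <= radius * radius


def merge_and_buffer(masks, radius=40):
    """Union of all quality masks, grown outwards by `radius` pixels."""
    merged = np.logical_or.reduce(masks)
    return ndimage.binary_dilation(merged, structure=disk(radius))


# Toy example: one flagged cloud pixel, an empty haze mask.
cloud = np.zeros((21, 21), dtype=bool)
cloud[10, 10] = True
haze = np.zeros_like(cloud)

buffered = merge_and_buffer([cloud, haze], radius=3)
usable = ~buffered  # pixels that remain available for trait prediction
```

Dilating the merged mask rather than each mask separately gives the same result for a union and needs only one pass over the scene.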

Prediction and evaluation

We drew a random sample of n = 5000 pixels from the remaining areas comprising natural and semi-natural, sunlit, vegetated pixels. On these pixels, we applied the trait model, predicting 20 plant functional trait values, resulting in a total of 100,000 trait values per scene. A heatmap with mean trait values and ranges per biome is available in the supplementary material (Supplementary Fig. 31). The values were then standardized to a mean of 0 and a standard deviation of 1 before being subjected to a Principal Component Analysis (PCA). Standardization and PCA were first done globally for all scenes at once, and then the component loadings were used for the individual scenes (Supplementary Fig. 32). We created a PCA biplot to visualize how the traits influence the main two axes (Supplementary Fig. 33). Over 90% of the variance is explained by the first five components of the PCA (Supplementary Fig. 34); we therefore chose these as the basis for calculating both the Rao’s Q and functional richness value for each scene. Both metrics were chosen because they are widely used by the community and measure different aspects of functional diversity.
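The global standardization and PCA, followed by projection of individual scenes, can be sketched as follows. This is an illustration using a singular value decomposition in NumPy, not the original implementation; the function names `fit_global_pca` and `project_scene`, the random seed, and the synthetic trait matrix are all hypothetical.

```python
import numpy as np


def fit_global_pca(traits, n_components=5):
    """Standardize all trait rows and derive PCA axes from them.

    traits: array of shape (n_pixels_all_scenes, n_traits).
    Returns the per-trait mean and standard deviation, the first
    n_components principal axes, and the explained variance ratios.
    """
    mean = traits.mean(axis=0)
    std = traits.std(axis=0)
    z = (traits - mean) / std
    # Rows of vt are the principal axes of the standardized data.
    _, singular_values, vt = np.linalg.svd(z, full_matrices=False)
    explained = singular_values**2 / np.sum(singular_values**2)
    return mean, std, vt[:n_components], explained


def project_scene(scene_traits, mean, std, components):
    """Apply the global standardization and loadings to one scene."""
    return ((scene_traits - mean) / std) @ components.T


# Synthetic stand-in for predicted traits (20 traits per pixel).
rng = np.random.default_rng(0)
all_traits = rng.normal(size=(2000, 20))
scene_rows = rng.choice(all_traits.shape[0], size=500, replace=False)
scene_traits = all_traits[scene_rows]

mean, std, components, explained = fit_global_pca(all_traits)
scores = project_scene(scene_traits, mean, std, components)
```

Fitting the axes once on all scenes keeps the five-dimensional trait space identical for every scene, so that the resulting diversity values are comparable between scenes.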

Rao’s Q measures diversity as trait dissimilarity by considering the weighted average of all pairwise differences between values, where the weights are based on the relative proportions of each data point following Eq. (1).

$$Q=\sum_{i=1}^{n}\sum_{j=1}^{n}{p}_{i}{p}_{j}\,d\left({x}_{i},{x}_{j}\right)\tag{1}$$

where \({p}_{i}\) and \({p}_{j}\) are the relative proportions associated with data points \({x}_{i}\) and \({x}_{j}\) and \(d({x}_{i},{x}_{j})\) is the difference between \({x}_{i}\) and \({x}_{j}\). Functional richness measures diversity as the range of plant functional trait values and was calculated in two ways. First, it was calculated as the volume of the convex hull in the 5-dimensional PCA space. For this, we used the convex hull function from SciPy in Python56. Second, functional richness was calculated as the kernel density estimation hypervolume (KDE) of the 5-dimensional PCA space, as this method might be less prone to data outliers. Here, we used the KernelDensity function from scikit-learn in Python57. Bandwidth was set to 0.5 after applying GridSearchCV from scikit-learn to ten scenes from each of the five biomes. The threshold was set to the 95% quantile and the number of generated samples to 50,000. Values below the 5% and above the 95% quantiles for both metrics were removed to prevent distortions from extreme outliers.
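Eq. (1) and the convex hull variant of functional richness can be sketched as below. Two choices are assumptions rather than statements from the text: every pixel receives the same weight \(1/n\), and \(d\) is the Euclidean distance in PCA space. The function names `raos_q` and `convex_hull_richness` are hypothetical, and the toy inputs are low-dimensional so the results can be verified by hand.

```python
import numpy as np
from scipy.spatial import ConvexHull
from scipy.spatial.distance import pdist, squareform


def raos_q(points, weights=None):
    """Rao's Q following Eq. (1): sum_i sum_j p_i p_j d(x_i, x_j).

    Assumptions: equal weights 1/n if none are given, Euclidean d.
    The full distance matrix needs O(n^2) memory.
    """
    points = np.asarray(points, dtype=float)
    n = points.shape[0]
    if weights is None:
        weights = np.full(n, 1.0 / n)
    distances = squareform(pdist(points, metric="euclidean"))
    return float(weights @ distances @ weights)


def convex_hull_richness(points):
    """Functional richness as the volume of the convex hull.

    For 2-D input, ConvexHull.volume is the enclosed area.
    """
    return float(ConvexHull(np.asarray(points, dtype=float)).volume)


# Two points at distance 2 with equal weights: Q = 2 * 0.5 * 0.5 * 2 = 1.
q_demo = raos_q([[0.0, 0.0], [2.0, 0.0]])

# Unit square plus an interior point: the hull area stays 1.
square = [[0, 0], [1, 0], [1, 1], [0, 1], [0.5, 0.5]]
richness_demo = convex_hull_richness(square)
```

The interior point in the toy square changes Rao's Q but not the hull volume, which mirrors the different sensitivity of the two metrics to the outer envelope of the trait space.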