Background & Summary

Chlorophyll is an essential pigment for photosynthesis, through which green plants harvest solar radiation and assimilate carbon dioxide. Leaf chlorophyll content (LCC) is closely related to the maximum carboxylation rate (Vcmax)1 and can therefore be used to estimate the primary productivity of plants2. LCC also reflects plant responses to light, temperature, and water stress, as well as to pests and diseases. Accurate large-scale LCC estimates can therefore improve the performance of terrestrial global carbon cycle models3,4 and strengthen ecosystem monitoring. Remote sensing, which exploits the distinct absorption and scattering properties of chlorophyll in different spectral bands, is the only practical approach to retrieving LCC at continental or global scales.

Over the past decades, several global LCC products have been generated. The MERIS LCC product was the first global LCC product, with a spatial resolution of 300 m and a temporal resolution of 7 days5. It was derived from MERIS data using look-up tables (LUTs) constructed with the leaf-level radiative transfer model PROSPECT coupled with the 4-Scale model (for woody vegetation) and the SAIL model (for non-woody vegetation). The MODIS LCC product was generated from MODIS data using a vegetation index (VI) matrix method6, with a spatial resolution of 500 m and a temporal resolution of 8 days. The GLCC products were derived from ENVISAT MERIS and Sentinel-3 OLCI data with a spatial resolution of 500 m and a temporal resolution of 7 days7, using LUTs constructed from the PROSPECT-D + 4-Scale model (for heterogeneous vegetation) and the PROSPECT-D + 4SAIL model (for homogeneous vegetation). Another global product is the GLOBMAP MERIS LCC8, for which a neural network trained on radiative transfer model (RTM) simulations was used to derive LCC from ENVISAT MERIS data at 300 m/7 days from 2003 to 2012. The spatial resolution of current global LCC products therefore ranges from 300 m to 500 m. A recent study compared the performance of these products in China and found RMSEs ranging from 21.0 μg/cm2 to 32.3 μg/cm2 9, indicating that their accuracy still requires systematic improvement. At the 300–500 m scale, a large proportion of vegetated areas falls within mixed pixels10, and the mixed-pixel effect introduces large uncertainties into the retrieval of vegetation parameters11. Enhancing the spatial resolution is therefore a practical way to improve the accuracy of current products12, allowing researchers and analysts to better monitor vegetation status13 and predict crop yields14. Higher-resolution products also provide a more effective reference for decision-making in precision agriculture and grazing management15 and a more reasonable input for global and regional ecosystem models used in carbon cycle modeling16.

The only published large-scale, high-resolution LCC product is the LCC product of the Multi-source data Synergized Quantitative remote sensing production system (MuSyQ LCC17,18). Using Sentinel-2 MSI reflectance and an empirical regression method based on the Chlorophyll Sensitive Index (CSI), its resolution was improved to 30 m/10 days19. A previous study also showed that the CSI-based algorithm is stable and suitable for generating large-scale LCC products when applied to Gaofen-6 images20. Validation over China showed that MuSyQ LCC had the highest accuracy among the compared products and high overall spatial consistency with MODIS LCC9. However, the MuSyQ LCC product covers only China for 2019–2020, and no high-resolution global LCC product is available.

The objectives of this study are twofold: (1) to generate a high-resolution global LCC product by establishing empirical relationships between CSI and LCC through radiative transfer modeling, with the 100m-resolution product archived in Science Data Bank and higher-resolution versions accessible via the provided code; and (2) to validate the multi-resolution LCC products through direct comparison with ground measurements and indirect evaluation against the MODIS LCC product.

Methods

Input data

Sentinel-2 Multispectral Instrument (MSI) images were used to generate the MuSyQ Global LCC product. The MSI onboard Sentinel-2 has 13 bands, including red-edge bands that are sensitive to variations in LCC. Its spatial resolution is 10 m for the visible and near-infrared (NIR) bands and 20 m for the red-edge bands. The Sentinel-2 MSI Level-2A (L2A) surface reflectance product, which has undergone radiometric calibration and geometric and atmospheric correction, is well suited to calculating the CSI and retrieving LCC. The Sentinel-2 MSI L2A dataset is available from both the official website (https://dataspace.copernicus.eu/) and the Google Earth Engine (GEE) platform (https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_S2_SR_HARMONIZED). In this study, the Sentinel-2 MSI L2A data for 2019–2022 were processed on the GEE platform to calculate LCC, and the calculated LCC was resampled to the target resolution using the nearest-neighbour method. The Global Land Cover product with a Fine Classification System at 30 m (GLC_FCS30)21 was used to define vegetation types. Based on GLC_FCS30, vegetation worldwide was reclassified into five major types: broadleaf forest, needleleaf forest, cropland, grassland, and shrub. Empirical regression relationships between LCC and CSI were constructed for each type.

Data processing

Figure 1 illustrates the workflow for generating the high-resolution LCC product and the validation procedure.

Fig. 1

Workflow for generating the LCC product.

First, CSI was calculated from the Sentinel-2 MSI L2A product using the following equation:

$$\mathrm{CSI}=2.5\times \frac{\rho_{NIR}-\rho_{RE1}}{\rho_{NIR}+\rho_{RE1}}\times \frac{\rho_{Blue}}{\rho_{RE1}} \tag{1}$$

where \(\rho_{Blue}\), \(\rho_{RE1}\), and \(\rho_{NIR}\) represent the reflectances of the blue, red-edge 1, and NIR bands of Sentinel-2, respectively. CSI is the product of the red-edge normalized difference vegetation index, NDVIre \(\left(\frac{\rho_{NIR}-\rho_{RE1}}{\rho_{NIR}+\rho_{RE1}}\right)\), and the \(\frac{\rho_{Blue}}{\rho_{RE1}}\) factor. NDVIre increases with both LCC and leaf area index (LAI), whereas the \(\frac{\rho_{Blue}}{\rho_{RE1}}\) factor increases with LCC but decreases with LAI. Their product therefore enhances sensitivity to LCC while reducing interference from LAI19.
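As a minimal, non-authoritative sketch of Eq. (1), the function below evaluates CSI per pixel with NumPy. It assumes reflectances already scaled to 0–1 and that blue, red-edge 1 and NIR correspond to Sentinel-2 bands B2, B5 and B8; the band numbers are our assumption, since the text does not list them. The production workflow runs on GEE; this offline version only makes the arithmetic explicit.

```python
import numpy as np


def chlorophyll_sensitive_index(blue, re1, nir):
    """Compute CSI following Eq. (1).

    blue, re1, nir: surface reflectance (0-1) of the blue, red-edge 1 and
    NIR bands; scalars or arrays with broadcast-compatible shapes.
    Pixels with RE1 <= 0 or NIR + RE1 <= 0 are returned as NaN.
    """
    blue, re1, nir = (np.asarray(b, dtype=float) for b in (blue, re1, nir))
    with np.errstate(divide="ignore", invalid="ignore"):
        # NDVIre: increases with both LCC and LAI
        ndvi_re = (nir - re1) / (nir + re1)
        # Blue/RE1 factor: increases with LCC but decreases with LAI
        blue_ratio = blue / re1
        csi = 2.5 * ndvi_re * blue_ratio
    # Mask pixels whose denominators are not positive
    invalid = (re1 <= 0) | (nir + re1 <= 0)
    return np.where(invalid, np.nan, csi)
```

For example, reflectances of 0.04 (blue), 0.1 (RE1) and 0.4 (NIR) give NDVIre = 0.6 and a Blue/RE1 factor of 0.4, so CSI is approximately 2.5 × 0.6 × 0.4 = 0.6.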

During the calculation, the 20m-resolution red-edge bands of Sentinel-2 were resampled to 10 m using the nearest-neighbour method. The global vegetation cover map derived from the GLC_FCS30D product21 was likewise resampled to 10 m using the nearest-neighbour method and then reclassified into the five types listed in Table 1.
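The two nearest-neighbour steps and the reclassification can be illustrated with the sketch below. It assumes the coarser grids are exactly aligned with the 10 m grid, so upsampling reduces to repeating each cell (factor 2 for the 20 m red-edge bands, factor 3 for the 30 m land cover). The mapping from GLC_FCS30 codes to the five types is hypothetical: the codes are taken from our reading of the GLC_FCS30 legend, and the text does not state which classes were merged.

```python
import numpy as np

# HYPOTHETICAL grouping of GLC_FCS30 class codes into the five vegetation
# types (1 broadleaf forest, 2 needleleaf forest, 3 cropland, 4 grassland,
# 5 shrub). Codes are assumed from the GLC_FCS30 legend and must be checked
# against the official legend; mixed-leaf forest and other classes are left out.
GLC_TO_VEG = {
    51: 1, 52: 1, 61: 1, 62: 1,   # evergreen/deciduous broadleaf forest
    71: 2, 72: 2, 81: 2, 82: 2,   # evergreen/deciduous needleleaf forest
    10: 3, 11: 3, 12: 3, 20: 3,   # rainfed and irrigated cropland
    130: 4,                       # grassland
    120: 5, 121: 5, 122: 5,       # shrubland
}


def upsample_nearest(grid, factor):
    """Nearest-neighbour upsampling: each cell becomes a factor x factor block."""
    grid = np.asarray(grid)
    return np.repeat(np.repeat(grid, factor, axis=0), factor, axis=1)


def reclassify(land_cover, mapping=GLC_TO_VEG, fill=0):
    """Map land-cover codes to vegetation-type codes; unmapped codes get `fill`."""
    land_cover = np.asarray(land_cover)
    out = np.full(land_cover.shape, fill, dtype=np.int16)
    for code, veg in mapping.items():
        out[land_cover == code] = veg
    return out
```

Repeating cells is equivalent to nearest-neighbour resampling only when the grids share an origin and the ratio of pixel sizes is an integer; otherwise a georeferenced resampler is needed.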

Table 1 Empirical regression equations between CSI and LCC for different vegetation types.

Second, LCC was calculated for each vegetation type on the GEE platform using the type-specific regression equations in Table 1. These equations were derived from radiative transfer model simulations, with the key parameters set as listed in Table S2 of the Supplementary Information. The regression models between CSI and LCC had RMSEs of 6.04–10.21 μg/cm2 and R2 values of 0.68–0.99 (Table 1). A cloud score computed from the Sentinel-2 MSI data was used to estimate cloud probability, and cloud-contaminated pixels were identified using the algorithm (https://github.com/openforis/gee-gateway/blob/master/gee_gateway/gee/utils.py#L691). To cap the product at a maximum of 80 μg/cm2, all values larger than 80 μg/cm2 were set to 80 μg/cm2. In this way, 1-day LCC maps at the native Sentinel-2 MSI resolution (10 m) were generated.
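Per pixel, this step combines a vegetation-type lookup, a cloud mask and the 80 μg/cm2 cap. The sketch below shows that logic outside GEE. The regression coefficients in `EXAMPLE_MODELS` are placeholders invented for illustration, because the Table 1 equations are not reproduced here, and the cloud mask is assumed to be precomputed (the cloud-score algorithm linked above is not re-implemented).

```python
import numpy as np

LCC_MAX = 80.0  # maximum product value (ug/cm2) stated in the text

# PLACEHOLDER models for illustration only; NOT the Table 1 equations.
# Keys are vegetation-type codes (1 = broadleaf forest, 3 = cropland, ...).
EXAMPLE_MODELS = {
    1: lambda csi: 40.0 * csi + 5.0,
    3: lambda csi: 30.0 * csi + 10.0,
}


def retrieve_lcc(csi, veg_type, cloudy, models, lcc_max=LCC_MAX):
    """Apply type-specific CSI-to-LCC regressions to one Sentinel-2 scene.

    csi      : CSI values per pixel
    veg_type : vegetation-type code per pixel (0 = non-vegetation)
    cloudy   : True where the cloud algorithm flags the pixel
    models   : dict {type code: function mapping CSI to LCC}
    Cloudy, non-vegetated and unmodelled pixels are returned as NaN.
    """
    csi = np.asarray(csi, dtype=float)
    veg_type = np.asarray(veg_type)
    cloudy = np.asarray(cloudy, dtype=bool)

    lcc = np.full(csi.shape, np.nan)
    for code, model in models.items():
        sel = (veg_type == code) & ~cloudy
        lcc[sel] = model(csi[sel])
    # Values above the maximum are set to lcc_max; NaN stays NaN
    return np.where(lcc > lcc_max, lcc_max, lcc)
```

With the placeholder models, a cropland pixel with CSI = 0.5 yields 25 μg/cm2, and a broadleaf pixel with CSI = 3.0 would yield 125 μg/cm2 before being capped at 80 μg/cm2.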

Third, the 1-day LCC maps of the same tile were averaged over the compositing period required by each user and then resampled to produce LCC products at customized spatial resolutions. The method for generating the high-resolution LCC product is available on the GEE platform (see the Supplementary Information).
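A minimal sketch of this compositing and resampling step, assuming a stack of co-registered 1-day maps with NaN marking missing pixels. The NaN-aware mean and the choice of keeping the pixel at offset `factor // 2` within each block are our simplifications of the nearest-neighbour resampling described above; GEE's own resampling may pick pixels differently.

```python
import numpy as np


def composite_mean(daily_maps):
    """Average a stack of 1-day LCC maps (time, rows, cols), ignoring NaN.

    Pixels with no valid observation in the window stay NaN.
    """
    stack = np.asarray(daily_maps, dtype=float)
    valid = ~np.isnan(stack)
    count = valid.sum(axis=0)
    total = np.where(valid, stack, 0.0).sum(axis=0)
    out = np.full(count.shape, np.nan)
    # Divide only where at least one valid observation exists
    np.divide(total, count, out=out, where=count > 0)
    return out


def downsample_nearest(lcc, factor):
    """Nearest-neighbour downsampling by an integer factor (e.g. 10 m -> 100 m).

    Simplification: the pixel at offset factor // 2 in each block is kept;
    incomplete blocks at the right/bottom edge are dropped.
    """
    lcc = np.asarray(lcc)
    start = factor // 2
    return lcc[start::factor, start::factor]
```

For a 10-day, 100 m product, the 1-day maps falling in the window would be passed to `composite_mean` and the result to `downsample_nearest(..., 10)`, following the average-then-resample order described in the text.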

Finally, the retrieval rate (RR) of the product was calculated for different compositing periods to evaluate the proportion of missing pixels. In addition, the products at different resolutions were validated using ground-measured data and compared with the MODIS LCC product.
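The text does not give an explicit formula for RR. A plausible definition, assumed here, is the percentage of vegetated pixels that carry a valid LCC value in a composite; the sketch below computes it that way.

```python
import numpy as np


def retrieval_rate(lcc_composite, veg_type):
    """Retrieval rate (%) = valid LCC pixels / vegetated pixels * 100.

    Assumed definition: vegetated pixels are those with veg_type > 0,
    and a retrieval is valid when the composite value is not NaN.
    """
    lcc = np.asarray(lcc_composite, dtype=float)
    vegetated = np.asarray(veg_type) > 0
    n_veg = int(vegetated.sum())
    if n_veg == 0:
        return float("nan")
    n_valid = int((vegetated & ~np.isnan(lcc)).sum())
    return 100.0 * n_valid / n_veg
```

Computing this value for 10-, 20- and 30-day composites of the same period shows how lengthening the window fills gaps left by clouds.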

Because the global high-resolution product requires substantial storage space and uploading it would be extremely time-consuming, we uploaded only the global LCC product at 100 m/10 days resolution for 2019–2023 to the online server. For higher-resolution LCC products, we provide code and a web interface based on the GEE platform (https://code.earthengine.google.com/a06dfc261ad8019e025153d5bd0e68ca), allowing users to select the desired temporal and spatial extents and the corresponding resolutions.

Data Record

The dataset (MuSyQ Global LCC product at 100 m/10 days resolution) is available at Science Data Bank22,23,24,25,26, and the link for each year’s product is given in Table 2. Each year’s data are stored in a folder named after the corresponding year (e.g., ‘2019’). Each year folder contains subfolders named by longitude and latitude (e.g., ‘E0N10’), within which the LCC images are stored in *.tif format and named ‘LCC_longitude-latitude_day-of-year.tif’ (e.g., ‘LCC_E0N10_001.tif’). The mean file size is 10.76 MB. The dataset is licensed under CC BY 4.0. Higher-resolution products, or products with customized spatial and temporal resolutions and extents, can be generated and downloaded through the GEE-based link (https://code.earthengine.google.com/a06dfc261ad8019e025153d5bd0e68ca). For detailed instructions, please refer to Section 3 of the Supplementary Information.
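To script access to the archive, the naming convention above can be parsed. This is a minimal sketch under assumptions not stated in the text: E/W and N/S give the sign of longitude and latitude, the numbers are whole degrees, and the three-digit suffix is the day of year on which the composite starts. Whether the tile coordinates refer to a corner or the centre of the tile is not specified.

```python
import re
from datetime import date, timedelta

# Pattern for names such as 'LCC_E0N10_001.tif' (tile ID, then 3-digit day of year)
_NAME = re.compile(r"^LCC_([EW])(\d+)([NS])(\d+)_(\d{3})\.tif$")


def parse_lcc_filename(name, year):
    """Return (longitude, latitude, start_date) encoded in a product file name.

    Assumptions: W and S map to negative values, and the day of year is
    taken as the first day of the composite period.
    """
    m = _NAME.match(name)
    if m is None:
        raise ValueError(f"unexpected file name: {name}")
    ew, lon, ns, lat, doy = m.groups()
    lon = int(lon) * (-1 if ew == "W" else 1)
    lat = int(lat) * (-1 if ns == "S" else 1)
    start = date(year, 1, 1) + timedelta(days=int(doy) - 1)
    return lon, lat, start


def lcc_path(year, tile, doy):
    """Build the relative path '<year>/<tile>/LCC_<tile>_<doy>.tif'."""
    return f"{year}/{tile}/LCC_{tile}_{doy:03d}.tif"
```

For example, `parse_lcc_filename("LCC_E0N10_001.tif", 2019)` returns `(0, 10, datetime.date(2019, 1, 1))`, and `lcc_path(2019, "E0N10", 1)` rebuilds the example path from the text.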

Table 2 Links of the MuSyQ Global Leaf Chlorophyll Content product with 100 m/10 days resolution.

Technical Validation

Validation methods

Ground-measured LCC data from several studies were collected to validate the MuSyQ Global LCC product; details are given in the Ground Measurement section of the Supplementary Information. The data comprise 1139 measurements from different field campaigns, including the National Ecological Observatory Network (NEON) in the USA and the Huailai and Gaocheng field experiments in China. The smallest plot size in these experiments is 20 m × 20 m, which is sufficient to validate the 10m-resolution product. To minimize temporal mismatch, we compared 10-day mean composites of the 10m-resolution MuSyQ Global LCC product with ground observations collected within the closest possible time windows. For validating the 100m-resolution and 500m-resolution LCC products, the 30m-resolution land cover product (GLC_FCS30) was used to assess whether the plots in Huailai and NEON were located in homogeneous areas. The centre coordinates of each plot and its vegetation type in GLC_FCS30 were extracted. For the 100m-resolution product, sampling points were selected only if all pixels within the surrounding 5 × 5 GLC_FCS30 pixel window (i.e., 150 m × 150 m) had the same class value. For the 500m-resolution product, sampling points were selected only if all pixels within a 17 × 17 pixel window (i.e., 510 m × 510 m) shared the same GLC_FCS30 class value.
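The homogeneity screening can be sketched as a window check on the 30 m land-cover grid. The sketch assumes each plot centre has already been converted to a row and column index on that grid (the conversion is not shown), and treats windows that extend beyond the array edge as non-homogeneous, which is our choice rather than a rule from the text.

```python
import numpy as np

# Window sizes on the 30 m GLC_FCS30 grid described in the text
WINDOW_100M = 5    # 5 x 5 pixels = 150 m x 150 m
WINDOW_500M = 17   # 17 x 17 pixels = 510 m x 510 m


def is_homogeneous(land_cover, row, col, window):
    """True if every pixel in a window x window box centred on (row, col)
    has the same land-cover class. Windows crossing the edge return False.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd so it can be centred on a pixel")
    half = window // 2
    lc = np.asarray(land_cover)
    r0, r1 = row - half, row + half + 1
    c0, c1 = col - half, col + half + 1
    if r0 < 0 or c0 < 0 or r1 > lc.shape[0] or c1 > lc.shape[1]:
        return False
    block = lc[r0:r1, c0:c1]
    return bool(np.all(block == block[0, 0]))
```

A plot would then be kept for the 100 m validation when `is_homogeneous(glc, row, col, WINDOW_100M)` is true, and for the 500 m validation when the same check passes with `WINDOW_500M`.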

Direct validation

Figure 2 shows the validation results of the 10m-resolution MuSyQ Global LCC product. Across the five vegetation types, the overall RMSE and rRMSE of the product were 13.69 μg/cm2 and 33.70%, respectively, and the R2 was 0.37. The retrieved and ground-measured LCC values were distributed along the 1:1 line, with underestimation under high-LCC conditions and overestimation under low-LCC conditions. Figure 2b–f shows the detailed results for each vegetation type. Accuracy varied among the five types, with RMSEs between 11.28 and 15.22 μg/cm2. Cropland had the highest accuracy, with an RMSE of 11.28 μg/cm2, an rRMSE of 19.36%, and a bias of 1.57 μg/cm2. Grassland had an RMSE of 11.93 μg/cm2, an rRMSE of 35.30%, and a bias of 1.25 μg/cm2. The RMSE, rRMSE, and bias of broadleaf forest were higher, at 14.25 μg/cm2, 36.24%, and 3.97 μg/cm2, respectively. Needleleaf forest had the lowest accuracy, with an RMSE of 15.22 μg/cm2. Figure 2b–f also shows that the product tended to underestimate LCC above 60 μg/cm2 for all five types and to overestimate LCC below 30 μg/cm2 for forests, grasslands, and shrubs.
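The accuracy statistics above can be computed as in the sketch below. The definitions are assumptions, since the text does not state them: bias is the mean of retrieved minus measured LCC, rRMSE is RMSE divided by the mean measured LCC, and R2 is the squared Pearson correlation (the coefficient of determination relative to the 1:1 line would give different values).

```python
import numpy as np


def accuracy_metrics(measured, retrieved):
    """RMSE, rRMSE (%), bias and R2 between ground and retrieved LCC.

    Assumed definitions: bias = mean(retrieved - measured);
    rRMSE = RMSE / mean(measured) * 100; R2 = squared Pearson correlation.
    """
    obs = np.asarray(measured, dtype=float)
    est = np.asarray(retrieved, dtype=float)
    ok = ~(np.isnan(obs) | np.isnan(est))   # drop pairs with missing values
    obs, est = obs[ok], est[ok]
    diff = est - obs
    rmse = float(np.sqrt(np.mean(diff ** 2)))
    return {
        "rmse": rmse,
        "rrmse": 100.0 * rmse / float(np.mean(obs)),
        "bias": float(np.mean(diff)),
        "r2": float(np.corrcoef(obs, est)[0, 1] ** 2),
    }
```

The same function can be applied separately to each vegetation type, and to the 100 m, 500 m and MODIS comparisons, by passing the matched measurement and retrieval pairs for that subset.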

Fig. 2

(a) Validation of the 10m-resolution MuSyQ Global LCC; validation results of the 10m-resolution MuSyQ Global LCC in broadleaf forest (b), needleleaf forest (c), cropland (d), grassland (e), and shrub (f).

Figure 3 shows the product’s bias under different LCC conditions. Because few ground measurements were available for shrubs, only the other four types were compared. In general, the product overestimated LCC when the ground-measured LCC was below 20 μg/cm2, with biases exceeding 10 μg/cm2 for broadleaf forest, needleleaf forest, and grassland. When LCC was 20–40 μg/cm2, the overestimation became less pronounced, especially for cropland, whose bias was close to 0 μg/cm2. As LCC increased further, overestimation gradually turned into underestimation for broadleaf forest, needleleaf forest, and grassland. When the ground-measured LCC was 60–80 μg/cm2, the bias of the product for broadleaf forest and grassland declined to below 15 μg/cm2. The bias for cropland fluctuated as LCC increased; apart from the 40–60 μg/cm2 interval, its mean bias was close to 0 μg/cm2, indicating that both overestimation and underestimation were relatively slight.
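Grouping errors by ground-measured LCC interval, as in Fig. 3, can be sketched as follows. The 20 μg/cm2 interval edges follow the text; treating the intervals as half-open and ignoring values outside 0–80 μg/cm2 are our assumptions. In practice the function would be applied to each vegetation type separately.

```python
import numpy as np

# LCC intervals (ug/cm2) used to group ground measurements
BIN_EDGES = np.array([0.0, 20.0, 40.0, 60.0, 80.0])


def bias_by_lcc_bin(measured, retrieved, edges=BIN_EDGES):
    """Mean bias (retrieved - measured) for each measured-LCC interval.

    Intervals are half-open, [edges[i], edges[i+1]); values outside
    [edges[0], edges[-1]) are ignored. Empty intervals give NaN.
    """
    obs = np.asarray(measured, dtype=float)
    diff = np.asarray(retrieved, dtype=float) - obs
    # np.digitize returns i such that edges[i-1] <= x < edges[i]
    idx = np.digitize(obs, edges)
    out = np.full(len(edges) - 1, np.nan)
    for i in range(1, len(edges)):
        sel = idx == i
        if sel.any():
            out[i - 1] = diff[sel].mean()
    return out
```

A positive value in the first interval and a negative value in the last reproduce the over/underestimation pattern described above.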

Fig. 3

Bias of the LCC product under different LCC conditions. Because of the limited number of shrub samples, only the first four vegetation types are compared. The black line within each violin represents the mean bias, and the box spans the lower to upper quartiles.

The published 100m-resolution MuSyQ Global LCC product was validated using 361 ground measurements (Fig. 4a and Table 3). Across all five vegetation types combined, its accuracy (RMSE = 15.11 μg/cm2, bias = 3.69 μg/cm2, R2 = 0.15) was lower than that of the 10m-resolution product. Grassland had the highest accuracy, with an RMSE of 11.19 μg/cm2 and a bias of −0.19 μg/cm2. The 500m-resolution product, which has the same resolution as the MODIS LCC product, had lower overall accuracy than the 100m-resolution product (RMSE = 15.77 μg/cm2, overall bias = 6.02 μg/cm2), suggesting that spatial resolution is an important factor affecting the accuracy of LCC products. Overestimation occurred for broadleaf and needleleaf forests. Because validation samples for cropland and shrub at 500 m resolution were limited, the accuracy assessment for these types has relatively low representativeness, and a more robust evaluation will require additional ground-based measurements.

Fig. 4

Validation of the 100m-resolution (a) and the 500m-resolution MuSyQ Global LCC product (b) in different vegetation types.

Table 3 Accuracy of 100m-resolution LCC product in different vegetation types.

Spatial patterns of MuSyQ Global LCC

Figure 5a,b illustrate the global distribution of the MuSyQ Global LCC product in January and July 2020, and the line charts (Fig. 5c,d) show how the average LCC varies with latitude. Overall, global LCC was lower in January than in July. In January, LCC was relatively high between 0° and 30°S, with an average close to 40 μg/cm2. The highest LCC values were observed in the low- to mid-latitude regions of eastern South America and Africa. Low-latitude areas of the Northern Hemisphere, such as the Indian subcontinent, also exhibited high LCC values. Elsewhere in the Northern Hemisphere, LCC was generally below 30 μg/cm2 because of winter conditions. In July, the average LCC was above 20 μg/cm2 between 30°N and 60°N and generally above 30 μg/cm2 between 45°N and 75°N. The map shows that the highest LCC values were found in the northern parts of the Eurasian continent and the mid-latitude eastern regions of North America. In contrast, the Southern Hemisphere generally exhibited lower LCC values, mostly below 20 μg/cm2. Figure 5e shows details for typical sites of different vegetation types throughout 2020.
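Latitudinal profiles like those in Fig. 5c,d can be sketched as band-wise means of a gridded map. The 5° band width and the simple unweighted mean over valid pixels are illustrative assumptions; the text does not say how the profiles were aggregated or whether pixels were area-weighted.

```python
import numpy as np


def latitudinal_mean(lcc, lat, band_width=5.0):
    """Mean LCC per latitude band for a gridded map.

    lcc : 2-D array (rows = latitude), NaN for non-vegetated or missing pixels
    lat : 1-D array of row-centre latitudes in degrees
    Returns (band_centres, band_means); bands without valid pixels give NaN.
    """
    lcc = np.asarray(lcc, dtype=float)
    lat = np.asarray(lat, dtype=float)
    edges = np.arange(-90.0, 90.0 + band_width, band_width)
    centres = edges[:-1] + band_width / 2
    means = np.full(centres.size, np.nan)
    for i in range(centres.size):
        rows = (lat >= edges[i]) & (lat < edges[i + 1])
        vals = lcc[rows]
        vals = vals[~np.isnan(vals)]   # ignore missing pixels
        if vals.size:
            means[i] = vals.mean()
    return centres, means
```

Applied to monthly mean maps resampled to 1 km, as described in the Fig. 5 caption, this yields one curve per month that can be plotted against the band centres.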

Fig. 5

Global distribution of the MuSyQ Global LCC product. The product was resampled to 1 km, and the mean LCC values in January 2020 (a) and July 2020 (b) are shown in the maps. The line charts (c) and (d) show the averaged LCC at different latitudes. (e) Details of specific sites for different vegetation types throughout 2020.

Consistency assessment between MODIS LCC and MuSyQ Global LCC

Figure 6 illustrates the differences between the MuSyQ Global LCC and MODIS LCC. In January (Fig. 6a), the MuSyQ Global LCC was generally lower than the MODIS LCC in the Southern Hemisphere, whereas in the mid- to high-latitude regions of the Northern Hemisphere it was slightly higher. The histogram on the right shows that the differences (ΔLCC = MuSyQ Global LCC − MODIS LCC) were concentrated at negative values, indicating that the MuSyQ Global LCC values in January were generally lower than those of MODIS LCC. In addition, ΔLCC was within ±5 μg/cm2 for most pixels, suggesting good consistency between the two products. Figure 6b shows the spatial distribution of the differences in July. The largest ΔLCC occurred in the northeastern Eurasian continent, northeastern North America, southern Africa, and the eastern regions of the Southern Hemisphere, and the smallest ΔLCC occurred in the western Eurasian continent, the Northern Hemisphere part of Africa, and western Australia. The July histogram peaked at around 0 μg/cm2 and was more symmetrical than in January, with no clear skew toward negative values.
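The ΔLCC statistics discussed above can be sketched as follows, assuming both maps are already co-registered on a common grid with NaN for missing pixels. The ±40 μg/cm2 histogram range with 1 μg/cm2 bins is an arbitrary display choice, not taken from the text.

```python
import numpy as np

# Display-only histogram bins for dLCC (ug/cm2); an assumed choice
DEFAULT_BINS = np.arange(-40.0, 41.0, 1.0)


def difference_summary(musyq, modis, tolerance=5.0, bins=DEFAULT_BINS):
    """Summarise dLCC = MuSyQ - MODIS over pixels valid in both maps.

    Returns the fraction of pixels with |dLCC| <= tolerance, the mean
    dLCC, and histogram counts and edges.
    """
    d = np.asarray(musyq, dtype=float) - np.asarray(modis, dtype=float)
    d = d[~np.isnan(d)]          # NaN in either product drops the pixel
    if d.size == 0:
        raise ValueError("no pixel is valid in both products")
    counts, edges = np.histogram(d, bins=bins)
    return {
        "within_tolerance": float(np.mean(np.abs(d) <= tolerance)),
        "mean_difference": float(d.mean()),
        "counts": counts,
        "edges": edges,
    }
```

A negative `mean_difference` together with a high `within_tolerance` fraction would correspond to the January pattern described above.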

Fig. 6

Difference between MuSyQ Global LCC and MODIS LCC in January (a) and July (b) 2020.

Because the MODIS LCC product (2000–2020) and the Sentinel-2 L2A imagery available on the GEE platform (from 2019 onward) overlap only in 2019 and 2020, validation points from these two years were selected for the intercomparison (Fig. 7). In total, 57 validation points were located where the two 500 m products overlapped during 2019–2020. The two products had similar accuracy, with the MuSyQ Global LCC product slightly more accurate than the MODIS LCC product: the RMSE and bias changed from 14.74 μg/cm2 and −2.65 μg/cm2 for MODIS LCC to 14.16 μg/cm2 and 1.68 μg/cm2 for MuSyQ Global LCC. At the 500 m scale, both products clearly underestimated LCC under high-LCC conditions. The MuSyQ Global LCC product had a lower RMSE for broadleaf forests and grasslands, but a higher RMSE for needleleaf forests than the MODIS product.

Fig. 7

Validation of the 500m-resolution MuSyQ Global LCC product and 500m-resolution MODIS LCC.

Temporal profiles of global LCC products

Figure 8 compares the temporal profiles of the two products for four typical vegetation types. Both products effectively captured the phenological characteristics of these types, with LCC first increasing and then decreasing. The MuSyQ Global LCC product generally fluctuated between adjacent time steps, whereas the MODIS LCC time series was smooth. In addition, the 10m-resolution MuSyQ Global LCC showed higher values than the 500m-resolution MODIS LCC for all four types during summer, which is the primary difference between the two products. For broadleaf forests, needleleaf forests, and crops, the MuSyQ Global LCC exceeded 70 μg/cm2 in summer, while the maximum MODIS LCC values were all below 60 μg/cm2. For grasslands, the maximum MuSyQ Global LCC approached 50 μg/cm2, whereas the MODIS LCC remained below 40 μg/cm2. The MuSyQ LCC product exhibits stronger seasonal variability than MODIS LCC, likely because MODIS LCC undergoes temporal reconstruction, which smooths its time series. In addition, non-vegetation components within the 500 m mixed pixels of MODIS may further lower its LCC values. Future work will explore time-series smoothing methods for the high-resolution MuSyQ Global LCC.
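The way temporal smoothing damps both short-term fluctuations and single-date peaks can be illustrated with a generic NaN-aware centred moving mean. This is neither the MODIS temporal reconstruction algorithm nor a method proposed for MuSyQ Global LCC; it is an assumed, deliberately simple filter used only for illustration.

```python
import numpy as np


def moving_mean(series, window=3):
    """Centred moving mean of a 1-D series that skips NaN values.

    Edge positions use the available neighbours; positions whose whole
    window is NaN remain NaN.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    x = np.asarray(series, dtype=float)
    half = window // 2
    out = np.full(x.size, np.nan)
    for i in range(x.size):
        seg = x[max(0, i - half): i + half + 1]
        seg = seg[~np.isnan(seg)]   # ignore missing composites
        if seg.size:
            out[i] = seg.mean()
    return out
```

In the series 30, 60, 30 μg/cm2, a 3-point moving mean lowers the central peak from 60 to 40 μg/cm2, which is consistent in direction with the lower summer maxima of the smoothed MODIS series, although the mixed-pixel effect also contributes.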

Fig. 8

Temporal profiles of LCC in (a) broadleaf forest; (b) needleleaf forest; (c) cropland; (d) grassland.

Retrieval rate of MuSyQ Global LCC

Figure 9 presents the RR of the product under different temporal compositing strategies. RR increased steadily as the compositing period lengthened: the 10-day composite product had an RR of 33.2%–70.7%, the 20-day composite product reached 42.8%–79.9%, and the 30-day composite product further improved to 47.9%–83.4%. Across seasons, RR was lower in winter and spring, mostly below 60%, and higher in summer, when the 20-day and 30-day composite products generally exceeded 70%.

Fig. 9

Retrieval rate (RR) of the MuSyQ Global LCC under different temporal compositing strategies in 2020.

Uncertainties of MuSyQ Global LCC

High-resolution satellites typically have narrow swath widths, resulting in long revisit periods. Sentinel-2 has a revisit period of 5 days, so a 10-day window may contain as few as two acquisitions, and cloud and rain cover can therefore leave a high proportion of pixels missing in the 10-day composite product (Fig. 9). Time-series reconstruction algorithms for LCC are expected to further enhance the applicability of the MuSyQ Global LCC product in future work. In addition, the MuSyQ Global LCC product uses the blue band, which is sensitive to atmospheric aerosols. Consequently, variations in atmospheric conditions during Sentinel-2 imaging can cause fluctuations in the retrieved LCC values, resulting in uneven brightness patterns. More accurate atmospheric correction and cloud masking algorithms are expected to further improve the accuracy of this product.
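The link between blue-band errors and CSI follows directly from Eq. (1), in which CSI is proportional to \(\rho_{Blue}\). Holding \(\rho_{RE1}\) and \(\rho_{NIR}\) fixed and writing an aerosol-induced error in the blue reflectance as a relative perturbation \(\varepsilon\) (a simplifying assumption, since aerosols also affect the other bands, though less strongly at longer wavelengths):

$$\mathrm{CSI}' = 2.5\times \frac{\rho_{NIR}-\rho_{RE1}}{\rho_{NIR}+\rho_{RE1}}\times \frac{(1+\varepsilon)\,\rho_{Blue}}{\rho_{RE1}} = (1+\varepsilon)\,\mathrm{CSI}$$

A 10% error in blue reflectance therefore produces a 10% error in CSI, which is then passed to LCC through the type-specific regression in Table 1. Because the blue reflectance of vegetation is typically only a few percent, even a small absolute correction error can correspond to a large relative error \(\varepsilon\).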