Introduction

Agriculture in Chile is currently being affected by climate change, which manifests as a reduction in total annual rainfall, increasing maximum and minimum temperatures in interior regions, a high frequency of high temperatures during summer, polar frosts during winter, and changes in seasonal rainfall patterns. These symptoms make the climate more challenging for agriculture, leading to increased instability and higher stress1. Additionally, water sources have shown signs of exhaustion. The farming industry, alongside others such as mining and power generation, has excessive water demands, resulting in a deficit of this resource2. Improving the management of orchards and crops will help address the water deficit and maintain the quality of crops3.

The use of robotic platforms combined with multi-spectral and hyperspectral imaging systems has become a common technique for quickly assessing the health status of vegetation across relatively large areas of land3,4. Spectral information from vegetation is used to compute vegetation indices, which are sensitive to both physical and chemical properties of the vegetation. By integrating vegetation indices derived from spectral information and in-situ measurements, it is possible to train machine-learning models to predict important agronomic indicators, such as chlorophyll and nitrogen content, biomass, and yield, across a range of cultivars, including maize5,6, rice7,8, ratoon rice9, cotton10,11, onion12, potato13, wheat14, sugarcane15, soybean16, and sweet potato17.

There are numerous datasets in the literature containing spectral information about leaves. One notable example is the ANGERS Leaf Optical Properties Database (2003)18, which includes reflectance and transmittance measurements for 276 leaf samples from 39 different tree species in the Angers region of France. The spectra cover wavelengths from 400 to 2400 nm. Data collection occurred in June 2003, and the dataset also includes information on pigment content and physical parameters such as leaf mass and area.

A similar dataset is the Leaf Optical Properties Experiment Database (LOPEX93)19, which includes reflectance and transmittance values for 330 leaf samples from 45 different species in the Ispra region, Italy. The experiments were conducted in 1993. The dataset also provides information on the biochemical and physical parameters of the leaves, similar to the ANGERS dataset. Both datasets are highly relevant as they were used to calibrate the PROSPECT-4 and PROSPECT-5 models20, which are employed to predict the contribution of biochemical parameters (chlorophyll a, chlorophyll b, carotenoids, water, and dry matter) to the optical properties of leaves.

Both the LOPEX93 and ANGERS datasets contain information from trees in the Mediterranean region. In contrast, the dataset presented in21,22 contains leaf spectral data from tropical trees in the Tapajos National Forest, Brazil. Besides the spectral information, the authors also provide leaf area, leaf mass, leaf relative water content, and relative canopy environment for each leaf.

The dataset presented in23 shows the leaf spectra and biochemical traits of agricultural species. The crops included are common sunflower, cotton, pumpkin, cucumber, tomato, kidney bean, soybean, and sweet basil. The dataset comprises 184 reflectance values along with the corresponding biochemical parameters. A similar dataset is detailed in24, which includes information on watered and drought-stressed crops. This dataset features crop species such as Carolina poplar, cayenne pepper, common sunflower, radish, field pumpkin, foxtail millet, and sorghum.

As seen, there are numerous datasets with spectral information from leaves of various species. However, these datasets typically provide only a single spectrum per leaf, which corresponds to a unique area within the leaf. In contrast, a multi-spectral camera can record multiple pixels from a single leaf, making it preferable for covering larger areas of land4. In this context, a few datasets in the literature address this need. For example, Ryckewaert et al.25 proposed a dataset of hyperspectral images of grapevine leaves, both healthy and exhibiting disease symptoms. This dataset includes 204 hyperspectral images of grapevine fields, covering the range of 400–900 nm. In addition to the hyperspectral images, the dataset provides information on chlorophyll content, flavonoid content, and nitrogen crop status. Another relevant dataset, presented by Vayssade26, consists of multi-spectral images in 6 bands and is used for the identification of crops and weeds.

More recent datasets concerned with chemical and biological traits of vegetation include the one presented by Przybylska et al.27, which provides hyperspectral reflectance and functional traits of Arabidopsis thaliana. This dataset is still only concerned with hyperspectral reflectance and does not provide images. Another dataset with similar features is the one presented in28, where reflectance and measurements from tundra vegetation are shared with the scientific community.

Other datasets, such as the one presented in29, provide multi-spectral images of tomato leaves for the assessment of pests and diseases. One disadvantage of this dataset is that it includes neither information about chemical or biological traits of vegetation nor hyperspectral reflectance.

Most datasets in the literature focus on a single type of information, whether it be full spectral data or images. Each approach has its limitations: images may lack spectral resolution, while hyperspectral data collected with a spectrometer may lack spatial resolution. Combining these types of information could leverage the advantages of both approaches: the cost-effectiveness, portability, and ability to survey large areas provided by multi-spectral cameras, alongside the rich, detailed information obtained from hyperspectral imaging systems.

Thus, we propose a dataset composed of spectral reflectance ranging from 350 to 2500 nm, multi-spectral images with five bands (blue, green, red, red-edge, and near-infrared), as well as data on chlorophyll content (in SPAD values), leaf nitrogen content, and leaf water content. The samples include leaves from common cultivars in the Valparaíso region of Chile, specifically avocado, olive, and grape. To analyze the water content of the leaves, the samples were subjected to a drying process, resulting in five dehydration stages. The final dataset is composed of 106 leaves of avocado, 111 leaves of olive, and 106 leaves of grape.

Methods

This section describes the methodology used to collect the samples for the database, including the equipment utilized, the dehydration process of the leaves, and the post-processing applied to both the images and numerical data.

Equipment

The equipment consists of:

  • Micasense Red-Edge Camera: captures multi-spectral images in five bands (blue, green, red, red-edge, and near-infrared).

  • ASD TerraSpec Hi-Res Spectrometer: measures the spectral reflectance of leaves in the range of 350–2500 nm.

  • TYS-4N Plant Nutrition Meter: measures chlorophyll and nitrogen content in each leaf.

  • Kern PFB 120-3 Balance: used to weigh the leaves.

  • Memmert UN30 Universal Oven: used to dry the leaves, producing five different drying stages.

Table 1 provides the main features of the equipment, and Fig. 1 displays images of the instruments used to gather the dataset.

Table 1 Equipment used to create the dataset.
Fig. 1 Instruments used to create the dataset: (a) spectrometer, (b) multi-spectral camera, (c) plant nutrition analyzer, (d) balance, (e) oven.

Sample collection process

Three species were chosen for this study: avocado, olive, and grape, which are some of the most representative cultivars in the Valparaíso region of Chile.

For all the trees, the leaf collection process was as follows: Individual branches were cut directly from the trees, and healthy leaves were selected through visual inspection. We considered a healthy leaf to be one with no holes or internal ruptures in the leaf area and, if possible, with no visible symptoms of deterioration, such as brown spots. If a leaf showed any of these symptoms, we ensured that the majority of the leaf area remained unaffected. These leaves were then placed in bags and transported to the laboratory for analysis and measurement within the first hour of collection30,31.

For the avocado trees, leaf collection occurred during April and May of 2023, with a total of 106 leaves collected. Olive leaf collection took place in July and August of 2023, resulting in a total of 111 leaves collected. Finally, for the grapes, the collection process was carried out between December 2023 and January 2024, with a total of 106 leaves collected. A summary of the total number of leaves and the collection process is provided in Table 2.

Table 2 Leaf collection summary.

Measurement process

The dataset includes the following measurements: leaf weight, spectral reflectance ranging from 350 to 2500 nm, multi-spectral images from the blue, green, red, red-edge, and near-infrared bands, and chlorophyll and nitrogen content. Each measurement was recorded at five different dehydration stages of the leaves.

The drying process follows the guidelines provided by the authors in30, which are standard practices for analyzing fuel moisture content in leaves. The process begins with fresh leaves plucked from the branches. First, the spectral reflectance was recorded using the ASD TerraSpec Hi-Res spectrometer. The probe and light source were turned on for at least 15 minutes to warm up, as per the vendor’s instructions. Next, the instrument parameters were optimized, and the measurements were calibrated using a white Spectralon reflectance panel.

Once the calibration and warming up of the instrument were completed, measurements of each leaf were taken. The contact probe was placed flush against each sample, which was positioned on a white background (Spectralon surface). Three individual measurements were collected from different zones of the leaf: one at the bottom, one in the middle, and one at the top. The approximate positions where the probe was placed to collect the measurements are shown in Fig. 2. The final spectral reflectance for each leaf was calculated as the average of these three measurements.

Next, the chlorophyll and nitrogen content were measured using the TYS-4N plant nutrition meter. The leaves were placed in the instrument’s probe, and again, three measurements were taken per leaf from the bottom, middle, and top zones. The final measurement was recorded as the average value.

The leaves were then weighed using the electronic balance. Finally, multi-spectral images were captured. The MicaSense RedEdge camera was mounted on a tripod at a height of 45 cm above the leaves. Each image contained a small batch of samples (4 leaves for avocado, 6 to 8 leaves for olive, and 1 leaf for grape), with the leaves placed against a white background (white paper sheets). To ensure consistent lighting conditions during the measurement process, artificial white light was used, and windows and other natural light sources were blocked. The artificial white light was provided by a fluorescent lamp with a color temperature of 4100 K and a luminous flux of 2850 lumens. The lamps were mounted on the ceiling of the laboratory.

The dehydration proceeded as follows: After the initial measurements were collected, the leaves were placed in an oven set to a temperature of 65 °C. This temperature was selected based on common practices for assessing the flammable content of leaves30. The leaves were dried for 15 minutes. After this process, they were removed from the oven, and a new set of measurements was recorded at the first drying stage.

This process was repeated two more times, resulting in additional sets of measurements at the second and third drying stages. Finally, the leaves were placed in the oven for 24 h to remove all remaining traces of water. After this period, a final set of measurements was recorded with the completely dry leaves.

Both the spectrometer and the nutrition meter were recalibrated after each batch of leaves was measured at each drying stage. A batch consisted of sixteen leaves for avocado and eight leaves for grape. For olive, a complete batch consisted of 32 leaves, but calibration was performed after every sixteen samples. The spectrometer was calibrated using a white Spectralon surface, and the instrument parameters were optimized. The nutrition meter was calibrated using its internal reference, following the vendor’s instructions. Images of the calibration panels were collected after all measurements were completed in each trial.

The laboratory temperature was kept between 20 and °C using an air conditioning system and the relative humidity varied between 55 and 70%.

This measurement process resulted in five distinct sets of measurements at various dehydration stages of the leaves: fresh leaves, dehydration stages one through three, and the completely dehydrated stage. This process allows the percentage water content of the leaves to be determined at each stage.

Fig. 2 Approximate positions where the probe was located in each leaf to collect the measurements in (a) avocado, (b) olive, and (c) grape leaves.

Multi-spectral image post-processing

Radiometric calibration was performed for each band of the multi-spectral images. To achieve this, images of a calibrated reflectance panel were captured during each trial. The images were then post-processed using the metadata from each file, following the vendor’s instructions. The radiometric calibration process was conducted using the free Python software provided by Micasense (https://github.com/micasense/imageprocessing).

With the calibrated images, a segmentation process was conducted to extract individual leaves from the batches. The segmentation process is as follows:

  • The binarization of the image was performed using a threshold method. The value of the threshold was chosen using the Otsu method32.

  • With the binarized image, the individual contours were found for each leaf.

  • The original image was masked with each contour.

  • Each leaf was cropped from the original image, and saved in a separate file.

The segmentation algorithm was implemented in a Python v3.11 environment using the OpenCV library. The results of the segmentation algorithm can be seen in Fig. 3.
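The steps listed above can be illustrated with a small dependency-free sketch. It is not the authors' implementation, which used OpenCV (where `cv2.threshold` with the `cv2.THRESH_OTSU` flag and `cv2.findContours` play the corresponding roles). The function names, the 8-bit toy image, the assumption that leaves are darker than the white background, and the use of 4-connected regions in place of contours are all choices made here for illustration.

```python
from collections import deque


def otsu_threshold(pixels, levels=256):
    """Return the grey level t that maximises the between-class variance
    when the pixels are split into the classes `<= t` and `> t`."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels in the dark class
    sum0 = 0  # sum of grey levels in the dark class
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        m0, m1 = sum0 / w0, (sum_all - sum0) / w1
        var_between = w0 * w1 * (m0 - m1) ** 2
        if var_between > best_var:
            best_t, best_var = t, var_between
    return best_t


def segment_leaves(image):
    """Binarize, find connected dark regions, mask, and crop each region.

    `image` is a list of rows of integer grey levels. Assumption: leaf
    pixels are darker than the background. Returns one cropped image per
    region, with non-leaf pixels inside the bounding box set to 0.
    """
    rows, cols = len(image), len(image[0])
    t = otsu_threshold([p for row in image for p in row])
    mask = [[p <= t for p in row] for row in image]
    seen = [[False] * cols for _ in range(rows)]
    leaves = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first search over 4-connected neighbours.
            queue, region = deque([(r, c)]), set()
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                region.add((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < rows and 0 <= nx < cols
                            and mask[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            y0, y1 = min(y for y, _ in region), max(y for y, _ in region)
            x0, x1 = min(x for _, x in region), max(x for _, x in region)
            # Mask the original image with the region, then crop it.
            leaves.append([[image[y][x] if (y, x) in region else 0
                            for x in range(x0, x1 + 1)]
                           for y in range(y0, y1 + 1)])
    return leaves
```

Leaves that touch each other end up in one connected region in a sketch like this, which is consistent with the need for the manual correction described below.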

Fig. 3 Segmentation algorithm: (a) a multi-spectral image of a batch of olive leaves, (b) the binarization result using the Otsu threshold, (c) the result of masking the original image with the binarized image, and (d) the individual leaves generated.

A visual assessment was performed for each leaf segmented by the algorithm. In some cases, two or more leaves were too close to each other, resulting in them being segmented together. In other instances, parts of the leaf were mistakenly identified as the background. In such cases, manual segmentation was performed using the Image Segmenter app in MATLAB.

Leaf water content

Based on the weight of the leaves, the leaf water content can be expressed as fuel moisture content using either a dry basis or a fresh basis30. The equations to calculate the fuel moisture content are shown below:

$$\begin{aligned} FMC_f&= \dfrac{W_{f,t}-W_d}{W_{f,t}} \times 100 \end{aligned}$$
(1)
$$\begin{aligned} FMC_d&= \dfrac{W_{f,t}-W_d}{W_d} \times 100 \end{aligned}$$
(2)

In these equations, \(FMC_f\) is the fuel moisture content on a fresh basis, \(FMC_d\) is the fuel moisture content on a dry basis, \(W_{f,t}\) is the leaf weight at drying stage t, and \(W_d\) is the dry leaf weight after 24 h in the oven.
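As a minimal sketch of Eqs. (1) and (2), the function below computes both quantities from two weights. The function and argument names are illustrative and are not part of the dataset.

```python
def fuel_moisture_content(weight_at_stage, dry_weight):
    """Return (FMC_f, FMC_d) in percent, following Eqs. (1) and (2).

    weight_at_stage: leaf weight W_{f,t} at drying stage t.
    dry_weight: leaf weight W_d after 24 h in the oven (same unit).
    """
    if weight_at_stage <= 0 or dry_weight <= 0:
        raise ValueError("weights must be positive")
    water = weight_at_stage - dry_weight
    fmc_fresh = water / weight_at_stage * 100.0  # Eq. (1), fresh basis
    fmc_dry = water / dry_weight * 100.0         # Eq. (2), dry basis
    return fmc_fresh, fmc_dry


# A leaf weighing 1.0 g that dries down to 0.25 g.
print(fuel_moisture_content(1.0, 0.25))  # (75.0, 300.0)
```

On a fresh basis the value cannot exceed 100%, whereas on a dry basis it can, as the example shows. At the completely dehydrated stage both values are zero because \(W_{f,t} = W_d\).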

Data records

All of the data can be found in this repository33.

The dataset is organized into three main folders: Avocado, Olive, and Vineyard, each containing data specific to the respective species (the Vineyard folder contains the files for the grape samples). Within each of these folders, there is a subfolder called Multi-spectral images. This subfolder contains the multi-spectral images for each leaf sample, with five bands and five drying stages, resulting in a total of 25 images per leaf.

Each image is a 16-bit .tif file. The file sizes and image dimensions vary because the leaves were cropped and the background was removed.

The naming convention is as follows. Each file is named leaf###dy_x.tif, where ### is a three-digit number identifying the sample, y is the drying stage (ranging from 0 to 4, with 0 representing the fresh stage and 4 representing the completely dehydrated stage), and x is the multi-spectral band, which can take the following values: 1 (blue), 2 (green), 3 (red), 4 (near-infrared), and 5 (red edge). The Avocado and Vineyard Multi-spectral image folders contain 2,600 files each, while the Olive folder contains 2,775 files.
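The naming convention can be decoded with a short helper such as the following sketch. The names `LeafImage` and `parse_leaf_filename` are hypothetical, and the pattern assumes the file names follow the convention exactly as described above.

```python
import re
from typing import NamedTuple

# Band numbering as given in the naming convention.
BAND_NAMES = {1: "blue", 2: "green", 3: "red", 4: "near-infrared", 5: "red edge"}

_PATTERN = re.compile(r"leaf(\d{3})d([0-4])_([1-5])\.tif")


class LeafImage(NamedTuple):
    sample: int        # three-digit sample number
    drying_stage: int  # 0 = fresh, 4 = completely dehydrated
    band: str          # name of the multi-spectral band


def parse_leaf_filename(name):
    """Split a name such as 'leaf012d3_5.tif' into its components."""
    match = _PATTERN.fullmatch(name)
    if match is None:
        raise ValueError(f"unexpected file name: {name!r}")
    sample, stage, band = (int(group) for group in match.groups())
    return LeafImage(sample, stage, BAND_NAMES[band])


print(parse_leaf_filename("leaf012d3_5.tif"))
# LeafImage(sample=12, drying_stage=3, band='red edge')
```

Note that band 4 is near-infrared and band 5 is red edge, which differs from the order in which the bands are listed elsewhere in the text.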

Each species folder also contains the following files: FMC_species.mat, Nitrogen_species.mat, Chlorophyll_species.mat, Weight_species.mat, and Spectral_species.mat, where species is avocado, olive, or vineyard. These files can be read using MATLAB. Once they are loaded, matrices containing the chlorophyll content, nitrogen content, weight, and fuel moisture content on a dry and a fresh basis appear in the workspace. All of these matrices are of size \(N\times 5\), where N is the number of samples and the five columns correspond to the drying stages. When Spectral_species.mat is loaded, five variables appear in the workspace, one per drying stage. Each is a matrix of size \(N\times 2151\), where N is the number of samples, which depends on the species, and the 2151 columns are the reflectance values over the wavelength range from 350 to 2500 nm.
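Since 2500 − 350 + 1 = 2151, the spectral matrices appear to be sampled at 1 nm intervals. Under that assumption, and assuming the first column corresponds to 350 nm, a wavelength can be mapped to a column as in the sketch below. The helper names are illustrative, and the index is zero-based as in Python (in MATLAB it would be one higher).

```python
FIRST_NM, LAST_NM = 350, 2500
N_BANDS = LAST_NM - FIRST_NM + 1  # 2151 reflectance values per spectrum


def column_for_wavelength(wavelength_nm):
    """Zero-based column of a wavelength, assuming 1 nm sampling from 350 nm."""
    if not FIRST_NM <= wavelength_nm <= LAST_NM:
        raise ValueError(f"wavelength outside {FIRST_NM}-{LAST_NM} nm")
    return wavelength_nm - FIRST_NM


def reflectance_at(spectrum, wavelength_nm):
    """Look up one reflectance value in a single 2151-element spectrum."""
    if len(spectrum) != N_BANDS:
        raise ValueError(f"expected {N_BANDS} values, got {len(spectrum)}")
    return spectrum[column_for_wavelength(wavelength_nm)]


print(N_BANDS, column_for_wavelength(1450))  # 2151 1100
```

Outside MATLAB, .mat files of this kind can usually be read with `scipy.io.loadmat`, provided they were not saved in the HDF5-based v7.3 format.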

The tree structure of the dataset and a MATLAB workspace with all the variables loaded can be seen in Fig. 4, and a sample of the complete set of 25 images can be seen in Fig. 5.

Fig. 4 (a) File tree with all the files and folders contained in the dataset, (b) variables created in a MATLAB workspace once all the .mat files have been loaded.

Fig. 5 Images of the five bands and five dehydration stages from an avocado leaf: a.1) to a.5) fresh leaf, b.1) to b.5) dehydration stage 1, c.1) to c.5) dehydration stage 2, d.1) to d.5) dehydration stage 3, e.1) to e.5) completely dry leaf.

Technical validation

All experiments conducted to gather the dataset followed the protocols specified by previous authors30,31,34. The drying process was implemented to achieve various levels of leaf water content, which will aid in analyzing the spectral response of leaves to different levels of water stress. The drying temperature for all species was selected based on established protocols to ensure that leaves do not lose any flammable gases during the process. These temperatures have been tested for forest trees such as Eucalyptus globulus and Pinus radiata30,31,35,36, as well as for crops such as potatoes37.

The spectral reflectance values for all species and drying stages are shown in Figs. 6, 7, and 8. Additionally, Fig. 9 displays the mean values at different drying stages for the three species. As depicted in the figures, the areas in the short-wave infrared (SWIR) region exhibit changes in response across the different drying stages. This phenomenon is attributed to the response of infrared wavelengths to water molecules30,34. In particular, reflectance values in the bands around 975 nm, 1200 nm, 1400 nm, and 1900 nm systematically increase across the drying stages, because these bands are associated with absorption by the vibrational modes of water molecules38.

For the water content of the leaves, it is evident that as dehydration progresses, each leaf in all species shows a considerable loss of water content. Table 3 presents information on the water content for each species at each dehydration stage, including the mean, maximum (max), minimum (min), and standard deviation (std). The completely dehydrated stage was excluded from the table since the water content is zero in all cases.

Table 3 Descriptive statistics about the leaf moisture content at each drying stage.
Fig. 6 Spectral reflectance for the avocado leaves. The mean value is darker and the shaded area corresponds to the maximum and minimum values: (a) fresh leaves, (b) first dehydration stage, (c) second dehydration stage, (d) third dehydration stage, and (e) completely dehydrated leaves.

Fig. 7 Spectral reflectance for the olive leaves. The mean value is darker and the shaded area corresponds to the maximum and minimum values: (a) fresh leaves, (b) first dehydration stage, (c) second dehydration stage, (d) third dehydration stage, and (e) completely dehydrated leaves.

Fig. 8 Spectral reflectance for the grape leaves. The mean value is darker and the shaded area corresponds to the maximum and minimum values: (a) fresh leaves, (b) first dehydration stage, (c) second dehydration stage, (d) third dehydration stage, and (e) completely dehydrated leaves.

Fig. 9 Mean reflectance value for the full spectrum at the five different drying stages: (a) avocado samples, (b) olive samples, (c) grape samples.

Figure 10 shows the violin plots for additional measurements from the dataset, including chlorophyll content and leaf water content expressed as fuel moisture content on both a fresh and dry basis. As expected, at the final dehydration stage, all values of fuel moisture content are zero, due to the complete removal of water traces during the drying process in the oven.

Fig. 10 Violin plots for (a) chlorophyll content, (b) fuel moisture content using a dry basis, and (c) fuel moisture content using a fresh basis. The graphs also show the p-values of t-tests checking whether the means at each drying stage are significantly different from each other.

As can be seen, the variation in the spectra is related to the variation in the physical traits of the leaves, particularly the leaf moisture content. The variations in leaf moisture content shown in the violin plots are also reflected in the spectral measurements in Figs. 6, 7, and 8, suggesting that leaf moisture content can be estimated using hyperspectral information, as suggested by30.

Another notable result is that, in all the species, the median value of the chlorophyll content increases with the dehydration stage. The SPAD value is computed using two transmittance values, from red light at 650 nm and infrared light at 940 nm. The equation is shown below39:

$$\begin{aligned} SPAD = k\times [\log (T_{940})-\log (T_{650})]+C \end{aligned}$$
(3)

where \(T_x\) is the transmittance at wavelength x, \(\log\) is the common logarithm function, k is a confidential proportional coefficient, and C is a compensation value in the instrument software39.
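Equation (3) can be sketched as follows. Because k and C are internal to the instrument, the values used in the example (k = 1, C = 0) are placeholders chosen only to show the behaviour of the formula; they are not the instrument's constants, and the resulting numbers are not real SPAD readings.

```python
import math


def spad_value(t_940, t_650, k, c):
    """Evaluate Eq. (3): SPAD = k * [log10(T_940) - log10(T_650)] + C.

    t_940, t_650: transmittances in (0, 1] at 940 nm and 650 nm.
    k, c: instrument constants (not public; supply placeholder values).
    """
    if t_940 <= 0 or t_650 <= 0:
        raise ValueError("transmittances must be positive")
    return k * (math.log10(t_940) - math.log10(t_650)) + c


# Placeholder constants: a lower red transmittance gives a higher value.
low_red = spad_value(t_940=0.5, t_650=0.05, k=1.0, c=0.0)
high_red = spad_value(t_940=0.5, t_650=0.10, k=1.0, c=0.0)
print(low_red > high_red)  # True
```

The formula depends only on the ratio of the two transmittances, so any change in the leaf that alters red transmittance relative to infrared transmittance shifts the reading, even if the amount of chlorophyll is unchanged.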

It has been demonstrated that SPAD values are highly correlated with the chlorophyll content per area40. However, the SPAD value can be affected by external conditions such as illumination, nitrogen treatment40, and water stress induced by flooding41,42.

Thus, it is reasonable to expect variations in the SPAD value across the different drying stages of the leaves. Figure 9 shows the mean reflectance values across the spectrum. In the avocado and olive plots, there is a clear change in the region from around 750 nm to 1000 nm, where the fresh stage shows a steep change in the reflectance values, whereas the dehydration stages show a smoother change. These changes in the reflectance values correlate with the increase in the SPAD values shown in the violin plots of Fig. 10, particularly for olive and avocado, where the biggest change in the mean SPAD value occurs from the fresh stage to the first dehydration stage. In the grape leaves, such changes in the reflectance values in the 750–1000 nm region are less evident, and the SPAD measurements accordingly vary little around their mean value. Table 4 shows the mean, maximum (max), minimum (min), and standard deviation (std) of the chlorophyll content for each species at each drying stage.

A t-test was conducted to determine if there are significant differences between the means of measurements at different drying stages. The test showed that the expected values for \(FMC_f\) and \(FMC_d\) are significantly different from each other at every drying stage for each of the three species. However, for chlorophyll content, dehydration stages 1 through 3 are not significantly different from each other in the avocado and grape samples. In contrast, for the olive dataset, the means of the three stages are significantly different from each other, except between stages 2 and 3 of dehydration.
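The text does not state which variant of the t-test was used. As one possible illustration, the sketch below computes the statistic and degrees of freedom of Welch's unequal-variance t-test for two groups of measurements. This choice is an assumption made here; a paired test could be equally appropriate, since the same leaves are measured at every stage. The p-value, which requires the t distribution, is left out to keep the example dependency-free.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t, degrees of freedom) for Welch's two-sample t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each sample needs at least two values")
    # Squared standard errors of the two sample means.
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    if se2_a + se2_b == 0:
        raise ValueError("both samples have zero variance")
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Made-up values standing in for two drying stages of one measurement.
stage_a = [1.0, 2.0, 3.0, 4.0, 5.0]
stage_b = [2.0, 3.0, 4.0, 5.0, 6.0]
print(welch_t(stage_a, stage_b))
```

With a statistics package available, the corresponding p-value can be obtained from the t distribution with the returned degrees of freedom.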

Table 4 Descriptive statistics about the chlorophyll content at each drying stage.

Finally, to further validate the spectral dataset, we replicated the study conducted by Villacres et al.30, in which water-related vegetation indices were computed and linear regressions were used to assess their relationships with fuel moisture content. For our analysis, we selected the top two performing vegetation indices based on the correlation factor (\(R^2\)): the Leaf Water Index (LWI) and the Double Difference Index (DDI), as well as one of the lowest-performing indices, the Normalized Difference Water Index 1 (NDWI1). The equations for these indices are provided below, where \(R_x\) represents the reflectance value at wavelength x:

$$\begin{aligned} LWI&= \dfrac{R_{1300}}{R_{1450}} \end{aligned}$$
(4)
$$\begin{aligned} DDI&= 2R_{1530}-R_{1005}-R_{2055} \end{aligned}$$
(5)
$$\begin{aligned} NDWI1&= \dfrac{R_{860}-R_{1240}}{R_{860}+R_{1240}} \end{aligned}$$
(6)
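Equations (4) to (6) can be evaluated on a single spectrum as in the following sketch. It assumes the spectrum is a sequence of 2151 reflectance values sampled every 1 nm starting at 350 nm, so that the value at wavelength x sits at position x − 350; the function name and this layout are assumptions for illustration.

```python
def water_indices(spectrum, first_nm=350):
    """Return LWI, DDI and NDWI1 (Eqs. 4-6) for one reflectance spectrum.

    spectrum: reflectance values at 1 nm steps starting at `first_nm`.
    """
    def r(wavelength_nm):
        # Reflectance R_x at wavelength x, assuming 1 nm sampling.
        return spectrum[wavelength_nm - first_nm]

    lwi = r(1300) / r(1450)                           # Eq. (4)
    ddi = 2 * r(1530) - r(1005) - r(2055)             # Eq. (5)
    ndwi1 = (r(860) - r(1240)) / (r(860) + r(1240))   # Eq. (6)
    return {"LWI": lwi, "DDI": ddi, "NDWI1": ndwi1}


# Synthetic flat spectrum with a dip at 1450 nm, standing in for a water
# absorption feature; these are not measured values.
spectrum = [0.5] * 2151
spectrum[1450 - 350] = 0.25
print(water_indices(spectrum))
# {'LWI': 2.0, 'DDI': 0.0, 'NDWI1': 0.0}
```

In the synthetic example only LWI responds, because only the reflectance at 1450 nm was changed.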

Table 5 shows the fitted equations (\(y=ax+b\)) together with the evaluation metrics: the correlation factor \(R^2\) and the root mean squared error. The linear plots for the three indices and the corresponding species are shown in Fig. 11. The results obtained by Villacres et al.30 and in this technical validation imply that there is a linear relationship between the fuel moisture content in vegetation and some vegetation indices, particularly LWI and DDI. However, the metrics in30 reach higher values: the highest correlation factor achieved in30 for the LWI is 0.96, whereas our result is 0.84. This can be attributed to different factors, such as variations in the reflectance values across the different species and a wider spread in the VIs computed in this study.
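A fit of the form \(y=ax+b\) with its \(R^2\) and root mean squared error can be reproduced with ordinary least squares, as in the sketch below, where x would be a vegetation index and y the fuel moisture content. The function name is illustrative and the data in the example are made up; the source does not specify the software used for the regressions.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit of y = a*x + b.

    Returns (a, b, r2, rmse), where r2 is the coefficient of determination
    and rmse is the root mean squared error of the residuals.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length, at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values are all identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    a = sxy / sxx
    b = mean_y - a * mean_x
    ss_res = sum((yi - (a * xi + b)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    rmse = math.sqrt(ss_res / n)
    return a, b, r2, rmse


# Made-up index values (x) and moisture contents (y) lying on y = 2x + 1.
print(fit_line([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0]))
```

For points that lie exactly on a line, the fit recovers the slope and intercept with \(R^2 = 1\) and zero error; scatter in real measurements lowers \(R^2\) and raises the error.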

Table 5 Linear regression results for the prediction of fuel moisture content using different vegetation indices.
Fig. 11 Linear regression results for the tested VIs for fuel moisture content prediction.

Usage notes

There are numerous applications for this dataset. First, the data collection process was carried out following strict protocols, and the dataset includes information not provided by other datasets, particularly multi-spectral images. In addition, the various levels of leaf water content allow exploration of the relationship between this biotic factor and the respective spectral signature in the range from 350 to 2500 nm, as well as with the more limited information provided by multi-spectral cameras.

Multi-spectral images can be used to classify leaves according to their dehydration level, as in the study conducted in43, but applied to agricultural species.

Finally, the addition of SPAD values can be explored through the use of vegetation indices and multi-spectral images. Furthermore, the relationship between the SPAD values and the leaf moisture content can be explored with the use of the current dataset.