Background & Summary

Rice, wheat, and maize are the main sources of human food, accounting for approximately 42.5% of the calorie supply and 79.2% of the total cropland1. However, the rapid increase in global population has led to an increase in food demand, exacerbating the expansion and transformation of cropland use patterns2. Meanwhile, climate change, urbanization, and land degradation have led to significant changes in crop cultivation, posing a threat to global food security3. Therefore, accurate and timely information on crop spatial distribution and long-term changes is crucial, especially for agricultural ecosystem management and environmental sustainability policy formulation4.

Remote sensing has long been used to map crop types5,6. However, in large-scale remote sensing images, different crop types may have similar spectral and texture features within the same growing season7. Phenological differences therefore play an important role in large-scale crop classification8,9,10. Meanwhile, with the development of machine learning, the random forest (RF) algorithm has been successfully used for crop classification11,12. Combining phenological features with machine learning techniques has enabled large-scale crop classification13.

The most important cropland for food security is main grain land (MGL), which is used mainly to grow three food crops: rice, wheat, and maize14. These three staple crops constituted 30.37%, 20.06%, and 40.38%, respectively, of China's total grain production in 202315. Moreover, the implementation of agricultural measures depends on information about MGL use change. Although research on cropland change is increasing, large-scale and long-term MGL changes have not been examined. The existing time series1,16 has a resolution of only 500 m and lacks MGL distribution before 2000. Because of the high diversity and complexity of crop planting systems, spatially explicit datasets of MGL types in China, including single wheat, single rice, single maize, wheat & maize, wheat & rice, double rice, and non-MGL, are not available, which hinders research on global food security17. In addition, many previous studies have focused on mapping single crop types without considering different planting patterns or multiple crop types over long time series18. Given the short time spans and low resolution of existing products19,20,21,22, there is an urgent need for high-resolution, long-term, spatially explicit MGL maps of China. Accurate and timely MGL datasets are of great significance for scientific research and policy formulation.

To address this gap, we focused on China, a typical agricultural country, and constructed an MGL dataset by integrating multi-source data and high-resolution satellite imagery, from which we quantified MGL use change. The main aims of this study are: (1) to reveal the crop types and spatial distribution of MGL; (2) to evaluate the accuracy of the MGL maps against ground-truth crop types and grain statistics from the National Bureau of Statistics of China (NBSC); and (3) to investigate the spatiotemporal patterns of MGL change. This study provides scientific support for optimizing MGL farming practices, ensuring global food security, and promoting agricultural management.

Methods

Study framework

The study framework for mapping MGL comprises data preprocessing, MGL mapping, accuracy evaluation, and quantification of MGL change patterns. We first constructed phenological indicators on the Google Earth Engine (GEE) platform, including the Normalized Difference Vegetation Index (NDVI) and the Normalized Difference Water Index (NDWI). NDVI is widely used to characterize surface vegetation23 and evaluate crop growth status, and is one of the key indicators for identifying crops24. NDWI is a key indicator for distinguishing water from land and is often used to identify rice planting areas25,26. Phenological change information characterized by NDVI, NDWI, and the Landsat RGB bands was therefore combined to train an RF model for identifying MGL. In addition, we collected all available land use/cover change (LUCC) data and spatial distribution datasets of the major grain crops from 1985 to 2020 to construct annual MGL training samples. Combining the phenological indicators with these MGL samples, an RF-based machine learning algorithm was trained to identify the different MGL types. Finally, we developed an MGL dataset and analyzed the spatiotemporal changes in cropland and MGL in China over 36 years (Fig. 1).

Fig. 1

Study framework.

To ensure the accuracy of the MGL mapping results, we adopted three validation methods. First, field observation data from different years were used to verify the identified types at the plot scale. Second, the total MGL area for each year from 1985 to 2020 was calculated and compared with data reported by the National Bureau of Statistics of China (NBSC) to verify MGL identification at the time-series scale. Third, we validated the spatial accuracy of the dataset against the provincial planting areas of different crops reported by NBSC. Finally, based on the MGL mapping results, we used time-series analysis to extract critical crop planting patterns, including crop types, MGL gain time (the time when a non-MGL pixel first transformed into MGL), planting intensity (the total number of years in which a pixel was planted as MGL), and MGL loss time (the time when an MGL pixel first transformed into non-MGL). These are the core indicators of change in China's MGL over the past 36 years.

Study area and data source

This study focuses on China, where the agricultural planting system is highly diverse and complex27 (Fig. 2). Wheat, rice, and maize are the most important staple crops in China28. Globally, wheat, rice, and maize account for 79% of the total grain harvest area29. In fact, in China, these three main crops accounted for about 97% of the country’s grain area in 2020 (www.stats.gov.cn/english/).

Fig. 2

Study area. The land cover data are sourced from the 1985–2020 China Land Cover Dataset32.

The study mainly used datasets of cropland and MGL distribution, field crop monitoring data, and statistical data (Table 1). Spatial distribution datasets of rice19,20, wheat21, and maize22 in China from 1985 to 2020 were collected to assist in producing national MGL training samples30. Landsat imagery in GEE was used to generate the MGL maps. Additionally, the annual 30 m China cropland dataset31, covering 1986–2021, was used to constrain the extent of MGL, and cropland for 1985 was supplemented with the 1985 China land use dataset32.

Table 1 Data source of this study.

Identification of MGL

We collected spatial distribution datasets of rice, wheat, and maize in China with resolutions ranging from 30 m to 1 km, spanning 1985 to 2020. Based on these existing datasets, we used data fusion to create a nationwide fundamental sample dataset of MGL. In the fusion process, we first extended the spatial extent of all available datasets to the whole of China, filling missing pixels with NoData and resampling all datasets to 30 m resolution for subsequent statistics and fusion. Second, we used a probability distribution model (Eqs. (1) and (2)) to calculate, from all existing crop datasets (\(A\)), the probability (\(P\left(T\right)\)) that each pixel belongs to non-MGL (code: 0, N), wheat (code: 1, W), maize (code: 2, M), single rice (code: 3, R), or double rice (code: 4, DR).

$$A=\left\{T\mid T\in \left\{0,1,2,3,4\right\}\right\}$$
(1)
$$P\left(T\right)=\frac{{N}_{{Type}}}{{N}_{A}}\times 100 \% $$
(2)

Because the multiple cropping index of cropland in China does not exceed 3, this study assumes that a pixel can belong to no more than three crop types in a year, so the threshold of the probability distribution model is set to 33%. Based on the probability distribution of each crop in each pixel, each pixel is assigned to every crop type whose probability exceeds this threshold. If a pixel belongs only to a single crop type other than double rice, it is classified as a single-cropping MGL type, namely single wheat, single rice, or single maize. If a pixel belongs only to double rice, it is classified as double rice. If a pixel belongs to multiple crop types, it is classified as a multiple-cropping MGL type, such as wheat & maize, wheat & rice, or maize & rice. Because no pixels belong to both maize and rice, the fundamental sample dataset comprises seven basic MGL types, including non-MGL (Fig. 3).
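
The per-pixel fusion rule can be sketched as follows. This is a minimal illustration, not the authors' code: Eq. (2) leaves \({N}_{{Type}}\) and \({N}_{A}\) implicit, so the sketch assumes \({N}_{{Type}}\) is the number of source datasets that label the pixel as type \(T\) and \({N}_{A}\) is the number of datasets with a valid (non-NoData) value there. Treating a pixel where no crop passes the threshold as non-MGL is also an assumption of this sketch, and the helper names are hypothetical.

```python
from collections import Counter

# Integer codes following Eq. (1): 0 = non-MGL, 1 = wheat, 2 = maize,
# 3 = single rice, 4 = double rice. None marks NoData in a source dataset.
N, W, M, R, DR = 0, 1, 2, 3, 4
THRESHOLD = 33.0  # percent; at most three crop types per pixel per year

# Set of crops above the threshold -> basic MGL type (Fig. 3).
MGL_TYPES = {
    frozenset({W}): "single wheat",
    frozenset({R}): "single rice",
    frozenset({M}): "single maize",
    frozenset({W, M}): "wheat & maize",
    frozenset({W, R}): "wheat & rice",
    frozenset({DR}): "double rice",
}


def type_probabilities(labels):
    """Eq. (2): share (%) of valid datasets assigning each type to one pixel."""
    valid = [t for t in labels if t is not None]
    if not valid:
        return {}
    counts = Counter(valid)
    return {t: 100.0 * n / len(valid) for t, n in counts.items()}


def fuse_pixel(labels, threshold=THRESHOLD):
    """Return the basic MGL type of one pixel, or None if unresolved."""
    probs = type_probabilities(labels)
    if not probs:
        return None  # every source dataset is NoData at this pixel
    crops = frozenset(t for t, p in probs.items() if t != N and p > threshold)
    if not crops:
        return "non-MGL"  # assumption: no crop passes the threshold
    return MGL_TYPES.get(crops)  # None for combinations absent from Fig. 3


# Hypothetical pixel seen by six datasets, one of which is NoData.
print(fuse_pixel([W, M, M, W, None, N]))  # prints wheat & maize
```

Here wheat and maize each reach 40% of the five valid labels, so the pixel is assigned to wheat & maize, while non-MGL (20%) stays below the threshold.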

Fig. 3

Data fusion process of the fundamental sample dataset of MGL.

Considering all seven types (single wheat, single rice, single maize, wheat & maize, wheat & rice, double rice, and non-MGL), we automatically selected 100,000 sample points nationwide from the fundamental sample dataset using stratified sampling. Meanwhile, using Landsat imagery on the GEE platform, we constructed a remote sensing feature set for each year. The available spectral features, namely the red, green, blue, and near-infrared bands together with NDVI, the Normalized Difference Built-up Index (NDBI), and NDWI, were composited every two months. Combining the six bimonthly composites of each year, we formed a phenological feature dataset. Finally, we used the random forest algorithm to map MGL separately for each year. Merging all MGL types together with non-MGL cropland yields the cropland map of China from 1985 to 2020.
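
The construction of one pixel's annual feature vector can be sketched in plain Python. Several details are assumptions of this sketch rather than statements of the method: the median as the compositing statistic, the green/NIR form of NDWI, and a SWIR1 band (not listed above) used to compute NDBI. On GEE, such stacked features would then be passed to a random forest classifier (for example `ee.Classifier.smileRandomForest`), which is not shown here.

```python
from statistics import median

# Assumed per-observation record: acquisition month (1-12) plus surface
# reflectance for red, green, blue, nir and swir1 (swir1 is an assumption,
# needed only for NDBI).
BANDS = ("red", "green", "blue", "nir")
INDICES = ("ndvi", "ndbi", "ndwi")
FEATURES = BANDS + INDICES
N_PERIODS = 6  # bimonthly: Jan-Feb, Mar-Apr, ..., Nov-Dec


def nd(a, b):
    """Normalized difference (a - b) / (a + b), guarding against a zero sum."""
    s = a + b
    return (a - b) / s if s != 0 else 0.0


def add_indices(obs):
    """Return a copy of one observation with NDVI, NDBI and NDWI added."""
    out = dict(obs)
    out["ndvi"] = nd(obs["nir"], obs["red"])
    out["ndbi"] = nd(obs["swir1"], obs["nir"])
    out["ndwi"] = nd(obs["green"], obs["nir"])  # green/NIR NDWI (assumed)
    return out


def annual_feature_vector(observations):
    """Median-composite each feature per bimonthly period and stack them.

    Returns N_PERIODS * len(FEATURES) values in period-major order; periods
    without any observation are left as None (gap filling is not shown).
    """
    periods = [[] for _ in range(N_PERIODS)]
    for obs in observations:
        periods[(obs["month"] - 1) // 2].append(add_indices(obs))
    vector = []
    for group in periods:
        for f in FEATURES:
            vector.append(median(o[f] for o in group) if group else None)
    return vector


# Hypothetical single July observation of a vegetated pixel.
obs = [{"month": 7, "red": 0.05, "green": 0.08, "blue": 0.04,
        "nir": 0.35, "swir1": 0.15}]
vec = annual_feature_vector(obs)
print(len(vec))  # prints 42
```

With seven features and six periods, each year yields a 42-value vector per sample point; the July observation fills only the May-June... no, the July-August slot (period index 3).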

Quantification of cropland planting patterns

Based on the MGL dataset from 1985 to 2020, we assigned a value of 1 to MGL pixels and 0 to non-MGL pixels, and likewise 1 to cropland and 0 to non-cropland. The planting intensity, gain time, and loss time of cropland and MGL were then calculated.

Detection of MGL planting intensity

Using time-series analysis, we counted the number of years in which each pixel (\({C}_{{ij}}\)) belonged to cropland or MGL over the past 36 years (Eq. (3)), which we term the cropland planting intensity (\({{PI}}_{{cropland}}\)) and the MGL planting intensity (\({{PI}}_{{MGL}}\)), respectively. The \({{PI}}_{{MGL}}\) map ranges from 0 to 36, where 0 represents non-MGL cropland. Values from 1 to 36 give the number of years in which the pixel was used as MGL over the past 36 years. Similarly, in the \({{PI}}_{{cropland}}\) map, 0 represents non-cropland, and the other values are interpreted as for \({{PI}}_{{MGL}}\).

$${PI}=\mathop{\sum }\limits_{k=1985}^{2020}{C}_{{ijk}}$$
(3)
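
Eq. (3) for a single pixel can be sketched as below. The sketch derives the binary indicator \({C}_{{ijk}}\) from the published class codes (0 = non-cropland, 1 to 6 = MGL types, 7 = non-MGL cropland); the function names and the example pixel are hypothetical. On a full raster stack the same sum would simply run along the year axis.

```python
YEARS = range(1985, 2021)  # 36 annual MGL maps, 1985-2020


def is_mgl(code):
    """C_ijk for MGL: codes 1-6 are MGL types (0 = non-cropland, 7 = non-MGL)."""
    return 1 if 1 <= code <= 6 else 0


def is_cropland(code):
    """C_ijk for cropland: every code except 0 (non-cropland)."""
    return 0 if code == 0 else 1


def planting_intensity(codes, indicator):
    """Eq. (3): sum of the binary indicator over the 36 years of one pixel."""
    if len(codes) != len(YEARS):
        raise ValueError("expected one code per year, 1985-2020")
    return sum(indicator(code) for code in codes)


# Hypothetical pixel: non-cropland for 5 years, non-MGL cropland for 5,
# then double rice (code 6) for the remaining 26 years.
pixel = [0] * 5 + [7] * 5 + [6] * 26
print(planting_intensity(pixel, is_mgl), planting_intensity(pixel, is_cropland))  # prints 26 31
```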

Detection of MGL gain time

Using time-series analysis, we identified the first year in which each pixel (\({C}_{{ij}}\)) belonged to cropland or MGL over the past 36 years (Eq. (4)), which we term the cropland gain time (\({{GT}}_{{cropland}}\)) and the MGL gain time (\({{GT}}_{{MGL}}\)), respectively. The \({{GT}}_{{MGL}}\) map ranges from 1986 to 2020, and each pixel value is the year in which the pixel first transformed into MGL. The \({{GT}}_{{cropland}}\) map also ranges from 1986 to 2020, and each pixel value is the year in which the pixel was first developed into cropland.

$${GT}=\mathop{\min }\limits_{k=1985\to 2020}\left[k\times {C}_{{ijk}}\left|{C}_{{ijk}}=1\right.\right]$$
(4)
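
A per-pixel sketch of Eq. (4) is given below; the function name and example series are hypothetical. It takes one binary value per year and returns the earliest year with \({C}_{{ijk}}=1\).

```python
YEARS = list(range(1985, 2021))  # 36 annual maps


def gain_time(series):
    """Eq. (4): earliest year k in 1985-2020 with C_ijk = 1.

    `series` holds one binary value per year (1 = cropland/MGL). Returns
    None for pixels that are never cropland/MGL.
    """
    if len(series) != len(YEARS):
        raise ValueError("expected one value per year, 1985-2020")
    years = [k for k, c in zip(YEARS, series) if c == 1]
    return min(years) if years else None


# Hypothetical pixel that becomes MGL in 1998 and stays MGL afterwards.
print(gain_time([0] * 13 + [1] * 23))  # prints 1998
```

Taken literally, Eq. (4) returns 1985 for a pixel that is already MGL in the first year, whereas the reported map range starts at 1986; the text above does not state how such pixels are encoded, so the sketch simply follows the equation.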

Detection of MGL loss time

To ensure the stability of the cropland loss time (\({{LT}}_{{cropland}}\)) and MGL loss time (\({{LT}}_{{MGL}}\)), we define pixels (\({C}_{{ij}}\)) that have been non-cropland or non-MGL for three consecutive years as stable loss pixels. Using time-series analysis, we identified the first year in which each pixel became a stable loss pixel over the past 36 years (Eqs. (5) and (6)), yielding \({{LT}}_{{cropland}}\) and \({{LT}}_{{MGL}}\), respectively. The \({{LT}}_{{MGL}}\) map ranges from 1987 to 2020, and each pixel value is the year in which the pixel first transformed into non-MGL. The \({{LT}}_{{cropland}}\) map also ranges from 1987 to 2020, and each pixel value is the year in which the pixel was first abandoned or converted from cropland to other uses, including long-term fallow. The year of re-cultivation after fallow is recorded in the \({GT}\) map.

$${V}_{{ijk}}=\max \left({C}_{{ijk}},{C}_{{ij},k+1},{C}_{{ij},k+2}\right),\quad k=1985,\ldots ,2018$$
(5)
$${LT}=\mathop{\min }\limits_{k=1987\to 2020}\left[k\times (1-{V}_{{ijk}})\left|{V}_{{ijk}}=0\right.\right]$$
(6)
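
The stable-loss rule of Eqs. (5) and (6) can be sketched for one pixel as follows; names and the example series are hypothetical. Following the window in Eq. (5), the sketch slides a three-year window forward from each year and returns the first year of the earliest window in which the pixel is never cropland/MGL.

```python
YEARS = list(range(1985, 2021))  # 36 annual maps
RUN = 3  # consecutive non-cropland/non-MGL years defining a stable loss


def loss_time(series, run=RUN):
    """Eqs. (5)-(6): first year k whose window max(C_k, ..., C_k+run-1) is 0.

    `series` holds one binary value per year (1 = cropland/MGL). Returns the
    first year of the earliest all-zero window, or None if there is none.
    """
    if len(series) != len(YEARS):
        raise ValueError("expected one value per year, 1985-2020")
    for i in range(len(series) - run + 1):
        if max(series[i:i + run]) == 0:  # V_ijk = 0: stable loss
            return YEARS[i]
    return None


# Hypothetical pixel: MGL until 1994, a two-year gap (1995-1996) that is
# too short to count, MGL again 1997-2001, then non-MGL from 2002 on.
pixel = [1] * 10 + [0, 0] + [1] * 5 + [0] * 19
print(loss_time(pixel))  # prints 2002
```

If the loss year is instead recorded at the end of the window, as the 1987–2020 index range of Eq. (6) suggests, add `run - 1` to the result. Pixels that were never cropland also satisfy \({V}_{{ijk}}=0\), so it seems sensible to apply the function only to pixels with a nonzero planting intensity.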

Crop planting area statistics

Annual sown area and the area of MGL are different concepts. Because national agricultural statistics report only the total sown area of each individual crop, not the area of composite MGL types, the accuracy of MGL cannot be verified directly. To verify the mapped MGL area, we calculated the annual total sown area of each crop (\({A}_{i}\)) as the sum of the areas of all MGL types planted with that crop (\({{AMGL}}_{{ij}}\)), each weighted by the number of sowings \({m}_{{ij}}\) (Eq. (7)). Accounting for multiple cropping in this way makes the results comparable with the national statistics. Note that double-rice area is counted twice when calculating the rice sown area (\({m}_{{ij}}=2\)).

$${A}_{i}=\mathop{\sum }\limits_{j=1}^{n}{m}_{{ij}}\times {{AMGL}}_{{ij}}$$
(7)
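
Eq. (7) amounts to a weighted sum over MGL types, sketched below with hypothetical areas for one province and year. The table of crops per MGL type follows the type names; \({m}_{{ij}}=2\) for rice on double-rice land is from the text, and \({m}_{{ij}}=1\) elsewhere is the reading assumed here.

```python
# Crops contained in each basic MGL type and their number of sowings m_ij.
CROPS_PER_TYPE = {
    "single wheat": {"wheat": 1},
    "single rice": {"rice": 1},
    "single maize": {"maize": 1},
    "wheat & maize": {"wheat": 1, "maize": 1},
    "wheat & rice": {"wheat": 1, "rice": 1},
    "double rice": {"rice": 2},  # double rice counted twice (m_ij = 2)
}


def sown_areas(mgl_areas):
    """Eq. (7): A_i = sum over MGL types j of m_ij * AMGL_ij, per crop i.

    MGL types that contain no staple crop (e.g. non-MGL) contribute nothing.
    """
    totals = {}
    for mgl_type, area in mgl_areas.items():
        for crop, m in CROPS_PER_TYPE.get(mgl_type, {}).items():
            totals[crop] = totals.get(crop, 0.0) + m * area
    return totals


# Hypothetical mapped MGL areas (kha) for one province in one year.
areas = {"single wheat": 10, "wheat & maize": 20, "wheat & rice": 5,
         "double rice": 8, "single rice": 3, "non-MGL": 100}
print(sown_areas(areas))  # prints {'wheat': 35.0, 'maize': 20.0, 'rice': 24.0}
```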

Data Records

MGL map in China

The datasets are available from the figshare repository in GeoTIFF format with the EPSG:4326 (WGS_1984) spatial reference system33. The dataset covers MGL in China from 1985 to 2020 (Fig. 4) and includes 36 bands: bands 0 to 35 represent the MGL maps for 1985 to 2020, respectively. In each annual MGL map, codes 0 to 7 represent non-cropland, single wheat, single rice, single maize, wheat & maize, wheat & rice, double rice, and non-MGL, respectively. The cropland dataset can be obtained by merging codes 1 to 7 of the MGL dataset.
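
A small decoding helper for the band and class conventions described above might look like this. The names are hypothetical, and reading the GeoTIFF itself is left to a raster library: GDAL and rasterio number bands from 1, so if the band numbering above is zero-based, band 0 (1985) would be read there as band 1.

```python
FIRST_YEAR, LAST_YEAR = 1985, 2020

# Class codes of each annual MGL band.
MGL_CODES = {
    0: "non-cropland", 1: "single wheat", 2: "single rice", 3: "single maize",
    4: "wheat & maize", 5: "wheat & rice", 6: "double rice", 7: "non-MGL",
}


def band_for_year(year):
    """Zero-based band index of the MGL map for `year` (0 -> 1985, 35 -> 2020)."""
    if not FIRST_YEAR <= year <= LAST_YEAR:
        raise ValueError("the dataset covers 1985-2020")
    return year - FIRST_YEAR


def cropland_mask(codes):
    """Merge codes 1-7 into cropland (1); code 0 stays non-cropland (0)."""
    return [[0 if c == 0 else 1 for c in row] for row in codes]


# Hypothetical 2 x 2 window of one annual band.
print(band_for_year(2005), cropland_mask([[0, 3], [7, 0]]))  # prints 20 [[0, 1], [1, 0]]
```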

Fig. 4

MGL in China from 1985 to 2020. (a) Overall distribution of MGL in 2020. (b–d) MGL mapping results for 1985, 2005, and 2020, respectively.

MGL change pattern map

The dataset also includes three layers describing MGL planting patterns: MGL planting intensity, MGL gain time, and MGL loss time (Fig. 5). The MGL gain time map ranges from 1986 to 2020, and each pixel value is the year in which the pixel first transformed into MGL. The MGL planting intensity map ranges from 0 to 36, where 0 represents non-MGL cropland, and values from 1 to 36 give the number of years in which the pixel was used as MGL over the past 36 years. The MGL loss time map ranges from 1987 to 2020, and each pixel value is the year in which the pixel first transformed into non-MGL. The corresponding planting pattern layers of the cropland dataset use the same pixel attributes.

Fig. 5

The change pattern of MGL.

Technical Validation

Comparison with field data

We compared our maps with field data from agro-meteorological sites during 1985–2020, comprising a total of 3113 on-site verification points, an average of over 100 points for each of the 31 provincial-level administrative units in China. The overall accuracy (OA), user's accuracy (UA), and producer's accuracy (PA) of the annual rice, wheat, and maize maps were calculated from the monitoring data (Table 2). The OA for the three grain crops was 93.57%. The UAs of rice, wheat, and maize were 92.84%, 96.00%, and 95.36%, respectively, and the PAs were 94.98%, 91.45%, and 94.90%, respectively.
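
The three accuracy measures follow the standard confusion-matrix definitions, sketched below with a hypothetical two-class matrix (rows are reference classes, columns are mapped classes). This illustrates the metrics only; it does not reproduce Table 2.

```python
def accuracy_metrics(confusion, classes):
    """Return OA (%) and per-class UA and PA (%) from a confusion matrix.

    confusion[i][j] = number of validation points whose reference class is
    classes[i] and whose mapped class is classes[j].
    """
    n = len(classes)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    oa = 100.0 * correct / total
    ua, pa = {}, {}
    for k, name in enumerate(classes):
        mapped = sum(confusion[i][k] for i in range(n))  # column sum
        reference = sum(confusion[k])  # row sum
        ua[name] = 100.0 * confusion[k][k] / mapped if mapped else float("nan")
        pa[name] = 100.0 * confusion[k][k] / reference if reference else float("nan")
    return oa, ua, pa


# Hypothetical validation counts for two classes.
oa, ua, pa = accuracy_metrics([[8, 2], [1, 9]], ["rice", "wheat"])
print(oa, pa["rice"], pa["wheat"])  # prints 85.0 80.0 90.0
```

UA measures how often a mapped class is correct (commission errors), while PA measures how much of the reference class was captured (omission errors).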

Table 2 Accuracy assessment using field sites.

Validation of MGL total planting area

We calculated the total MGL planting area in China for each year and compared it with the annual major grain planting area reported by NBSC (Fig. 6). Over the full time series, the MGL dataset is highly consistent with the NBSC data (R² = 0.824), which supports the mapping accuracy of the dataset.
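
The consistency statistic can be sketched as follows. The text does not define how R² was computed; this sketch assumes the squared Pearson correlation between the mapped and reported series, which equals the coefficient of determination of an ordinary least-squares line with an intercept. The example series are hypothetical.

```python
def r_squared(mapped, reported):
    """Squared Pearson correlation between mapped and reported areas."""
    n = len(mapped)
    if n != len(reported) or n < 2:
        raise ValueError("need two equal-length series with at least 2 values")
    mx = sum(mapped) / n
    my = sum(reported) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(mapped, reported))
    sxx = sum((x - mx) ** 2 for x in mapped)
    syy = sum((y - my) ** 2 for y in reported)
    return sxy * sxy / (sxx * syy)


# Hypothetical annual totals (same units for both series).
print(r_squared([1, 2, 3, 4], [2, 1, 4, 3]))  # prints 0.36
```

If R² were instead computed against the 1:1 line (mapped = reported), values could differ, so the definition matters when comparing across studies.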

Fig. 6

Time series consistency between MGL total area and statistical data.

Spatial validation with agricultural statistical data

The planting areas derived from the MGL maps were compared with provincial agricultural statistics reported by NBSC from 1985 to 2020 (Fig. 7). The rice planting area of each province agrees closely with the statistical data, with R² close to 1 (0.97–0.99). The wheat planting areas also agree well with the statistics (R² = 0.92–1). The spatial consistency of maize is lower than that of wheat and rice, but its R² is still above 0.80 (0.80–0.97). These levels of consistency indicate that the MGL dataset has high spatial accuracy.
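
The text does not state how pixel counts were converted into planting areas. Because the GeoTIFF uses EPSG:4326, a pixel's ground footprint shrinks with latitude (roughly in proportion to cos φ), so one option is to compute each pixel's area on a sphere, as in this sketch. The spherical approximation, the mean Earth radius, the row layout, and the helper names are all assumptions made for illustration.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius (spherical approximation)


def cell_area_m2(lat_top_deg, lat_bottom_deg, width_deg):
    """Area (m^2) of a latitude-longitude cell on a sphere."""
    phi_top = math.radians(lat_top_deg)
    phi_bottom = math.radians(lat_bottom_deg)
    return (EARTH_RADIUS_M ** 2 * math.radians(width_deg)
            * abs(math.sin(phi_top) - math.sin(phi_bottom)))


def class_area_ha(rows, pixel_deg, class_code):
    """Total area (ha) of pixels equal to `class_code`.

    `rows` is a list of (lat_top_deg, [codes...]) pairs, one per raster row;
    all pixels in a row share the same area.
    """
    total_m2 = 0.0
    for lat_top, codes in rows:
        area = cell_area_m2(lat_top, lat_top - pixel_deg, pixel_deg)
        total_m2 += area * sum(1 for c in codes if c == class_code)
    return total_m2 / 10_000.0


# Hypothetical pixel size of roughly 30 m expressed in degrees.
PIXEL_DEG = 30 / 111195
print(round(cell_area_m2(PIXEL_DEG, 0.0, PIXEL_DEG)))  # prints 900
```

Summing such areas per MGL type and province, then applying Eq. (7), would give sown areas comparable with the provincial statistics.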

Fig. 7

Spatial validation with statistical data from different provinces.

Comparison of accuracy with existing datasets

To assess the accuracy of the MGL product, we compared the overall accuracy (OA, %) of our mapping results with that of existing high-resolution crop maps using the same samples. The comparison datasets are crop maps from around 2020 with a resolution of 30 m or finer: the 30 m wheat maps of Dong et al., the 30 m maize maps of Peng et al., and the 10–20 m rice maps of Shen et al. Our wheat map has an OA of 91.34%, slightly higher than the 89.88% of Dong's wheat map (Fig. 8a). Our maize map has an OA of 93.14%, higher than the 80.06% of Peng's maize map (Fig. 8b). Similarly, our rice map has an OA of 93.88%, higher than the 85.23% of Shen's rice map (Fig. 8c). The provincial OA of the MGL results is also highly consistent with that of the single-crop maps (Fig. 8d–f; R² = 0.9995 for wheat, 0.9986 for maize, and 0.8447 for rice). These results indicate that the MGL maps are slightly more accurate than comparable available products of the same resolution.

Fig. 8

Comparison of accuracy with existing datasets. (a,b) are the OA comparison and consistency results of wheat. (c,d) are the OA comparison and consistency results of maize. (e,f) are the OA comparison and consistency results of rice. The codes 1–31 on the horizontal axis represent the 31 provincial-level administrative units in China.