Abstract
Changing crop patterns are primary driver of land use change and can impact global atmospheric cycles. While existing studies have mapped the distribution of several crops in China, harvest area maps for a complete set of crops over the past decades are lacking. This study pioneered the development of a spatiotemporal dataset of harvest area maps for 16 crop types in China at a 1-km resolution from 1990 to 2020 with 5-year intervals. Prefecture-level crop statistics were allocated to grids based on synthetical crop suitability score, which is evaluated by natural and socioeconomic factors. County-level validations demonstrated the built dataset is highly consistent with statistics, especially for primary grains and oilseed. Moreover, crop harvest area at sub-pixel level can better represent gradient changes within urban-rural zones. The built crop maps revealed the harvest zones for maize, rice and soybeans in Northern China have steadily expanded since 1990. This dataset fully supports identification of spatiotemporal changes in China’s crop patterns and can serve as critical input for biogeochemical and agricultural models.
Similar content being viewed by others
Background & Summary
As one of the largest food producers in the world, China occupies only 9% of global cropland but produces 25% of the world’s food to feed approximately 20% of the global population1. However, China’s crop cultivation has experienced a significant transformation since the initiation of rural reforms2. The cropland area has expanded from 169 million hectares to 172 million hectares from 1986 to 20213. In particular, change in the annual crop mix has also occurred. The maize cultivation area has increased by 105% over the past thirty years, while the area of rice and wheat has decreased by 11% and 19%, respectively4. Moreover, the increase in cropland and fertilizer application has resulted in severe environmental challenges in China, such as soil degradation, biodiversity loss, and non-point source pollution5,6,7. Therefore, revealing the spatially explicit evolution of crop patterns in China is critical for supporting food security strategies and assessing the impact of agricultural practices on the environment.
The geographical distribution of cropland in China has changed substantially over the past three decades, leading to significant transformations in crop patterns. The area of cropland in Southeastern China has declined by 12% since 1990, particularly in the Yangtze River Delta and Pearl River Delta regions, while cropland reclamation in Northeastern China has increased by 15%8. Meanwhile, the proportion of grain production in Northern China (56%) has surpassed that in Southern China, indicating a shift in the center of grain production from south to north9,10. In particular, grain production in Southeastern coastal provinces, such as Guangdong, Jiangsu, and Zhejiang, has declined dramatically11. In contrast, the grain output in the Northeastern provinces, including Heilongjiang and Jilin, has increased from 12% to 25% of national grain production12. However, these changes are primarily analyzed using administrative-level statistics, which overlook spatial heterogeneity within statistical units13,14. Spatially explicit maps with high resolution that illustrate the evolution of crop patterns in China over an extended period are yet to be developed.
Several studies have attempted to depict crop patterns in China using high-resolution Earth observation imagery. The distribution of the three main crops (maize, rice, and wheat) of China in the past two decades has been mapped at the resolutions of 1 km, 500 m, and 30 m in existing works based on satellite images from the MODIS (Moderate Resolution Imaging Spectroradiometer) and Landsat project15,16,17. While crop maps with higher spatial resolutions (10 m) are now available, the high-resolution imagery from the Sentinel-2 mission only offers coverage for periods starting from 201618,19,20. Although there have been ongoing advancements in the spatial and temporal resolutions of remote sensing imagery, the acquisition of free or low-cost remote sensing data remains challenging21,22. Furthermore, since crop recognition and classification rely on substantial training samples23,24, most studies have focused on the spatial distribution of individual grain crops, while harvest area maps for a comprehensive set of crops currently remain absent.
To map the distribution of multiple crops, data fusion techniques have been applied to integrate statistical data with remote sensing observations. For instance, the M3 and MIRCA (Monthly IRrigated and rainfed Crop Area) datasets depict the distribution of over 100 crop types by fusing national crop statistics with the global cropland extent in 2000, which were updated to the CROPGRIDS dataset for 202025,26,27. The GAEZ (Global Agro-Ecological Zones) programs illustrate the patterns of crop natural growth suitability for 2000 and 2010, and have been extended to include data for 2015 in the GAEZ 2015 + dataset28. Detailed information about these datasets is provided in Table S1 in the Supplement. However, the spatial resolution of global crop datasets is generally coarse (5 arcminutes, ~9 km), and their temporal coverage is limited to discrete years after 2000, as illustrated in Fig. 1. The SPAM (Spatial Production Allocation Model) datasets provide crop grids in China from 2000 to 2020 at 5-year intervals, nevertheless, the input data, crop types and methodologies are different for each period, which limits its application in time-series analysis applications29,30. Therefore, it remains a key task to develop high spatial resolution maps for multiple crops in China that are consistent with long-term statistical data.
Crop spatial distribution datasets. The dashed border rectangles indicate datasets from data fusion and the solid border rectangles indicate datasets from remote sensing observations. The x-axis represents the annual periods, the left y-axis represents the crop types included in the dataset, and the right y-axis represents the spatial resolution of the dataset.
The objective of this study is to map spatiotemporal changes in multi-crop harvested areas across China at a 1-km resolution from 1990 to 2020 at five-year intervals. We apply the maximum-score optimization approach to downscale prefecture-level statistics to grid cells, which is an extension of the minimum cross-entropy approach originally used in the SPAM datasets31,32. The cross-entropy framework has demonstrated high accuracy in mapping crop distributions at 5 arcminutes resolution globally, as well as for Sub-Saharan Africa and China33,34,35,36,37. Nevertheless, since this framework was originally developed for 5 arcminute grid allocation, it exhibits difficulty in converging and reduced allocation accuracy when applied to finer spatial resolutions38,39. To address these limitations, the maximum-score optimization approach has proven capable for generating 30 arcseconds maps and avoids the fragmented allocations by the cross-entropy method40.
The development of the dataset involves the following steps: firstly, this study involves the calculation and compilation of statistical data for multiple annual crops at the prefecture level. Subsequently, the spatial distribution of these crops is created by allocating the statistical records to corresponding grids. To enhance the accuracy and rationality of crop distribution over time, multi-source land use data, such as cropland and urban boundaries, temporally aligned with the statistical records, are integrated. Finally, the reliability of the dataset is validated through comparisons with county-level statistical data and existing gridded datasets, utilizing both quantitative metrics and spatial pattern analyses. The built dataset could provide critical input to biogeochemical models and dynamic agricultural models to support land use management and the sustainable transformation of agricultural production.
Methods
Data collection
Statistical data
This study collected agricultural statistical data at the national, provincial, prefectural, and county levels in China, covering the period from 1988 to 2022. The data sources include the statistical yearbooks from the National Bureau of Statistics of China (https://www.stats.gov.cn) and the Big Data Research Platform for China’s Economy and Society (https://data.cnki.net). Moreover, historical crop price records were obtained from the FAO (Food and Agriculture Organization of the United Nations) (https://www.fao.org/faostat)41. To evaluate irrigation and agricultural input levels, crop fertilizer application ratios were derived from the inventories of “Compilation of the National Agricultural Costs and Returns”42. Recommended fertilizer application ratios for different crops were obtained from He et al.43. The proportion of irrigated land was obtained from the third simulation round of the Inter-Sectoral Impact Model Intercomparison Project (ISIMIP3a) dataset44, which provides data on rainfed and irrigated cropland for crops from 1901 to 2021.
Spatial data
Several spatial datasets relevant to crop distribution, including land use, crop natural growth suitability, urban boundaries, and population density, have been used to facilitate the mapping of crop area. The land use data were employed to delineate the extent of cropland and irrigated cropland, including the China Land Cover Dataset (CLCD)45, the China Land Use and Land Cover Remote Sensing Monitoring Dataset (CNLUCC)46, the Global Land Cover Fine-Grained Classification Product (GLC_FCSD)47, the China Arable Land Dataset (CACD)3, and the China Irrigated Cropland Dataset48. The crop natural growth suitability and potential yield distribution were derived from the GAEZ v4 dataset49. The population density data originated from the China Population Spatial Distribution Dataset50. The urban extent data were obtained from the China Urban Boundary Dataset (GAIA)51. Additionally, the transport accessibility data were obtained from Weiss et al.52. All spatial datasets were rasterized and resampled to a 1-km resolution grid. The details of these datasets are provided in Table 1.
Modeling framework
The research framework is illustrated in Fig. 2, which consists of four key steps: pre-processing statistical and spatial data, allocating statistical harvest area data, validating the accuracy of the results, and analyzing spatiotemporal changes of crop patterns. First, national, provincial, and prefectural statistical data were harmonized and downscaled to grids, while county-level data were used to validate the accuracy of the mapping results. Then, crop cultivation was categorized into three different production systems, i.e., irrigated, high-input rainfed, and low-input rainfed based on crop irrigation and fertilizer application ratios. Multiple land use datasets were then integrated to develop a synthetic map of cropland and irrigated cropland. Meanwhile, pre-processing was performed on spatial datasets such as city boundaries, population distribution, and transportation accessibility (Figs. S1–S3). Then, a spatial allocation framework was used to downscale the prefecture-level crop area statistics into 1-km grids. Moreover, the accuracy of the dataset was evaluated in terms of both validation metrics and spatial patterns. Finally, changes in the spatiotemporal heterogeneity of crop distribution were analyzed.
Harmonization of statistical data
This study collected harvest area statistics for 16 different crop types across four different political administration levels. The detailed list of crop names is provided in Table 2. The time coverage of this built dataset spans from 1990 to 2020. We endeavored to collect crop statistics from publicly accessible sources, generally ensuring data completeness for major production regions (Fig. S4). However, due to the extended temporal coverage and reliance solely on public data, many regions inevitably have incomplete time series. To ensure data availability and methodological consistency, we followed previous studies by representing each time point with a 5-year average, using data from the two years before and after each time point26,29,53,54. For instance, statistical values from 1988 to 1992 were utilized to map the crop patterns in 1990, while those from 2018 to 2022 were employed to map the crop patterns in 2020. Besides, the linear interpolation was used to fill the missing values. Due to the lack of prefecture-level statistical data for Xinjiang, Qinghai, and Xizang, spatial allocations for these provinces were based on provincial-level statistical data.
In order to ensure the consistency of statistical data across different administrative levels, this study used national-level statistical data as a benchmark and successively corrected for each province and prefecture. First, the crop harvest area for each province \(({{HA}}_{j,t,p}^{{stat}})\) was adjusted according to Eq. (1), based on the national total harvest area \(({{HA}}_{j,t,{national}}^{{stat}})\), resulting in the revised provincial harvest area denoted as \({{HA}}_{j,t,p}^{{revise}}\):
where the j denotes the crop type, t represents the time node, and p indicates the provincial-level administrations. The sum of the revised area at the provincial level is highly in agreement with the national statistics, as shown in Fig. S5.
Based on the revised provincial-level data, a similar approach is applied to adjust the harvest area for the prefecture-level city \(({{HA}}_{j,c}^{{stat}})\) as Eq. (2):
where the \(c\) represents the prefecture-level city with province \(p\). The sum of the harmonized data at the prefecture level is equal to the revised provincial values, as shown in Fig. S6. Therefore, the total harmonized harvest area of all provinces and prefectures equals the national statistics.
Classification of crop production systems
Based on the ratios of irrigation and fertilizer application, each crop cultivation was categorized into three production systems: irrigated, high-input rainfed, and low-input rainfed. The proportion of irrigated crop for each 5-year node was extracted from the ISIMIP3a dataset44. The proportions of high-input and low-input systems are determined based on actual crop fertilizer application statistics compared with the recommended amounts. The shares of irrigated system for crops \(({H{A}_{{Ratio}}}_{j,t,p}^{{irrigat}{ion}})\) are determined by averaging the gridded values at the prefecture level. Owing to the lack of crop fertilizer application data at the prefecture level, this study assumed that all prefectures within the same province share the same average fertilizer application ratios. The details of the crop production system classification are as follows.
First, the crop fertilizer application ratios of each province \(({{Fert}}_{j,t,p}^{{unit}})\) were obtained from the “Compilation of the National Agricultural Costs and Returns” as presented in Table S2. Then, the sum of fertilizer application for each crop \(({{Fert}}_{j,t,p}^{{Sum}})\) was calculated using Eq. (3). The average crop fertilizer applications for all provinces are shown in Table S3.
Then, the high-input production area of the crop \(({{HA}}_{j,t,p}^{{high}})\) is obtained from dividing the total fertilizer application by the recommended fertilizer application ratios for the crop \(({{Fert}}_{j,p}^{{recommend}})\), as shown in Eq. (4):
The ratios of high-input system in the total area \(({H{A}_{{Ratio}}}_{j,t,p}^{{high}})\) are calculated following Eq. (5):
Finally, the ratios of the low-input system \(({H{A}_{{Ratio}}}_{j,t,p}^{{low}})\) are calculated by subtracting the proportion of the irrigated and high-input area following Eq. (6). Tables S4–S6 show the detailed proportions for each production system.
Synthesis of cropland
The cropland distribution from different sources was synthesized using the multi-source data integration approach proposed by Lu et al.55. The cropland extents were obtained from CACD, CLCD, GLC_FCSD, and CNLUCC datasets. The spatial resolutions of these datasets were resampled to 1 km, with temporal periods aligned to the time nodes of the statistics.
First, the average and maximum cropland areas for each grid were derived by spatially overlaying multiple datasets. Then, a ranking scoring table was developed to assess the credibility of the cropland distribution. As shown in Table S7, the agreement level reflects the degree of consensus among the datasets in classifying a grid cell as cropland. The rank value indicates the order of reliability across different data combinations. For example, an agreement level of 4 indicates that all datasets classify the grid cell as cropland; conversely, a value of 1 indicates that only one dataset identifies the cell as cropland. The ranking of the datasets was determined through existing data quality comparison studies and expert judgment. Given that the CACD was specifically developed for cropland mapping, it is considered to have the highest level of credibility56. Additionally, comparative studies on data accuracy indicate that the CLCD demonstrates higher overall accuracy when compared to both the GLC_FCSD and the CNLUCC datasets57,58. Finally, the synthesized average cropland area maps are illustrated in Fig. S7.
The approach for irrigated cropland synthesis is similar to cropland. The irrigated cropland maps were obtained from the China Irrigated Cropland Dataset, GLC_FCSD, and CNLUCC datasets. A scoring table for these irrigated cropland datasets is provided in Table S8. The synthesized maps of average irrigated cropland area for the period from 1990 to 2020 are shown in Fig. S8.
Spatial allocation of crop statistics
Prefecture-level statistics were allocated to 1-km grids using the maximum-score optimization approach developed by van Dijk et al.40. This method builds upon the minimum cross-entropy framework originally proposed by You and Wood (2005, 2006)31,32. The cross-entropy framework calculates the prior probability of crop distribution based on biophysical and socioeconomic factors, and then determines the actual probability under statistical constraints by minimizing cross-information entropy33. To enable multi-temporal, high-resolution mapping, the maximum plant suitability score was used in place of the minimum cross-entropy metric, resulting in improved spatial resolution and faster convergence. The main steps of the model are described below.
First, the crop harvest area of the prefecture \(({{HA}}_{j,t,c}^{{revise}})\) is converted into the physical area \(({{PA}}_{j,t,c})\) based on crop planting intensity \(({{CropIntensity}}_{j,t,c})\) as expressed in Eq. (7). The assumption is that there is no difference in cropping intensity between prefectures within the same province. The average crop planting intensities for different crops are shown in Table S9.
Then, the crop plant suitability scores for different crop production systems \(l\) within the grid \(i\) are calculated. The irrigated and rainfed high-input systems are mainly located in large farms, and their score is influenced by both biophysical characteristics and socio-economic factors, including transport accessibility and crop prices. Therefore, their crop plant suitability score is presented as Eq. (8):
where the \({{access}}_{i}\) represents the accessibility from the grid to cities, and where \({{rev}}_{{ijl}}\) represents the economic benefit of the crop, which is calculated via Eq. (9):
where the \({{pot\_yield}}_{i,j,l}\) is the potential yield of crop \(j\) in grid \(i\) under \(l\) system, which is obtained from the GAEZ v4 dataset. The changes in crop prices \(({{price}}_{j})\) are shown in Table S10.
For rained low-input system, its plant suitability score is constrained mainly by the natural environment, which is calculated following Eq. (10):
The \({s{uit}{\_index}}_{{ijl}}\) denotes the plant suitability of crop \(j\) in grid \(i\) under \(l\) system, which is derived from the GAEZ v4 dataset as well.
Finally, based on the plant suitability scores of crops, the objective function designed to maximize the plant suitability score is formulated as Eq. (11):
The \({s}_{i,j,l}\) represents the ratio of the area allocated to crop \(j\) under system \(l\) to the total area of the grid, which is the target solution value of the model. Meanwhile, the subsequent constraints should be abided by during the model-solving process as indicated in Eqs. (12–16).
-
1.
The area allocated to crops within the grid must not exceed the total area of the grid.
$$0\le {s}_{i,j,l}\le 1,\forall \,i\forall \,j\forall \,l$$(12) -
2.
The cumulative ratios of different production systems within the grid must sum to 1.
$$\sum _{l}{s}_{i,j,l}=1,\forall \,i\forall \,j$$(13) -
3.
The crop area is equal to the statistical data.
$$\sum _{l}{{PA}}_{j,c}\times {s}_{i,j,l}={{PA}}_{j,c},\forall \,i$$(14) -
4.
The crop area does not exceed the total cropland area within the prefecture (\({{cropland}{\rm{\_}}{area}}_{c}\)).
$$\sum _{l}{{PA}}_{j,l,c}\times {s}_{i,j,l}\le {{cropland}{\rm{\_}}{area}}_{c},\forall \,i\in c$$(15) -
5.
The irrigated crop area does not exceed the total irrigated cropland area within the prefecture (\({{irrigation\_area}}_{c}\)).
To address data inconsistencies between different types of constraints, the slack variables are incorporated into the objective function. Higher weights (106) are assigned to constraints relating to cropland availability and irrigated area to limit their slack, while lower weights (105) are assigned to constraints relating to subnational statistics. This configuration ensures solution feasibility and quantitatively clarifies trade-offs in constraint prioritization40,59. Furthermore, all weights, parameters, and implementation details have been made publicly available to enable rigorous validation and to ensure the robustness and reproducibility of the results.
The optimization model was implemented in the GAMS (General Algebraic Modeling System) using a linear programming framework, with the CPLEX (C-Simplex algorithm) solver employed to efficiently address large-scale spatial allocation problems. Data exchange between model components was managed by GDX (GAMS Data eXchange) format files. All data preprocessing and visualization procedures were performed in R60. To improve allocation accuracy for China-specific crop patterns, we enhanced the original mapspamc package38, thereby maintaining methodological consistency with global applications.
Validation metrics
The county-level statistics are used to evaluate the accuracy of the allocation results at the prefecture level. The scatter plot illustrates the relationship between the statistically derived harvest area (\({{HA}}^{{stat}}\)) and the allocated harvest area (\({{HA}}^{{alloc}}\)). Meanwhile, the Pearson correlation coefficient \((C{or})\), the coefficient of determination \(\left({R}^{2}\right)\), and the root mean square error \(({RMSE})\) is used to quantitatively evaluate the accuracy of the results. The calculation methods for the different accuracy verification metrics are presented in Eqs. (17–19):
Analysis of crop distribution change
The Standard Deviation Ellipse (SDE) model was used to analyse the spatiotemporal evolution of crop area at 1-km resolution. The SDE effectively captures the directional trends of crop distribution and includes key elements such as the ellipse center, rotation angle, and semi-axes61. First, the center coordinates of the ellipse \(\left(\bar{x},\bar{y}\right)\) are calculated, weighted by the crop area in grid \(i\) across different years, as expressed in Eq. (20):
where the coordinates \(({x}_{i},{y}_{i})\) represent the location of grid \(i\) and the \({\omega }_{i}\) denotes the crop harvest area within that grid. Then, the rotation angle depicts the trend of crop distribution and is calculated following Eq. (21):
where \(\widetilde{x}\) and \(\widetilde{y}\) are the differences between the grid and center, indicating the displacements of the grid relative to the overall distribution. Finally, the standard deviations along the x-axis and y-axis are defined as the long semi-axis \(({\sigma }_{x})\) and short semi-axis \(({\sigma }_{y})\) as shown in Eqs. (22, 23):
Spatiotemporal changes of crop harvest area in China
China’s crop structure from 1990 to 2020 is shown in Fig. 3. It is clear that maize, rice, and wheat are the three dominant crops, together accounting for more than 65% of the total harvest area of crops. Their area shares decreased from 70.6% in 1990 to 67.3% in 2005 and then gradually increased to 72.1%. Meanwhile, the area of maize surpassed wheat and rice in 2005 and 2010 to become the largest crop in China. In contrast, the shares of rice and wheat decreased by 5.0% and 7.7%, respectively. Furthermore, the shares of non-staple crops such as soybeans, rapeseed, and groundnuts remained relatively stable during this period. In contrast, the harvest area of millet and sorghum decreased significantly. Among economic crops, the cotton area decreased from 4.7% in 1990 to 2.2% in 2020, while the tea area increased steadily from 0.7% to 2.1%. In addition, local cultivation crops, including tobacco, sugarbeet, sesame, sunflower, and sugarcane, have remained relatively marginal and stable over time.
Figure 4 shows the spatiotemporal changes in harvest area for four crops (maize, rice, wheat, and soybean) from 1990 to 2020. The center of crop harvest is gradually shifting northward, with the Northeastern China emerging as a critical region for agricultural production. The maize cultivation is concentrated in the Northern and Northeastern China (Fig. 4a). There has been a steady increase in the area under maize cultivation since 1900, with a particularly accelerated pace after 2005. In particular, the most significant maize expansion occurred in the Northeastern China, resulting in a shift of the center by approximately 214 kilometers to the northeast. Rice cultivation is concentrated mainly in the middle and lower reaches of the Yangtze River, followed by the Northeastern China (Fig. 4b). Between 1990 and 2000, a significant decline was observed in the Zhejiang and Jiangsu Provinces. In contrast, there was a notable expansion in the Northeastern China as well, resulting in a shift of about 344 kilometers to the northeast. The wheat cultivation is clustered in northern China, including Henan, Hebei and Shandong Provinces (Fig. 4c). As wheat production has become increasingly concentrated in the Huang-Huai-Hai Plain, the center of wheat production has shifted only 44 kilometers. Soybean cultivation is mainly concentrated in the Northeastern and Northern China (Fig. 4d). Heilongjiang and Inner Mongolia Provinces have been constantly extending their soybean area, resulting in a shift of the center by about 354 km to the northeast. The spatial distribution of non-staple crops exhibits significant heterogeneity, yet their primary cultivation centers are spatiotemporally stable (Fig. S9). For instance, rapeseed is predominantly concentrated in Central China (e.g., Hubei and Hunan provinces), whereas peanuts, millet, sunflower seeds, sorghum, and sesame are mainly cultivated in northern regions (e.g., Henan, Hebei, and Inner Mongolia). Furthermore, the primary harvest of potatoes, tea, sugarcane, and tobacco occurs in the southwestern provinces, including Yunnan, Guizhou, and Guangxi. Moreover, the primary production regions for cotton and sugar beet have shifted from North China to Xinjiang province.
Spatiotemporal change of the crop harvest area in China. The distinct colors represent different years. The ellipses denote the standard deviation ellipses of crop harvest area distribution, while the dots indicate the centers of these ellipses. Profile graphs illustrating harvest area along both longitudinal and latitudinal directions are provided at the top and right side of the map. The background map is derived from the free open-source map dataset provided by Natural Earth (https://www.naturalearthdata.com).
Data Records
The constructed dataset offers comprehensive crop harvest area maps for 16 crop types in China at a 1-km spatial resolution from 1990 to 2020 with 5-year intervals, and it is accessible at Science Databank62. Unlike crop maps with pure pixels, the pixel value represents the crop harvest area (unit: hectare) at the sub-pixel level, which effectively mitigates the problem of mixed pixels and showcases the gradient variations in urban-rural transition zones. Additionally, the crop proportions of different agricultural production systems were generated based on the ratios of crop irrigation and fertilization obtained from statistical data.
The crop harvest area maps are stored in GeoTIFF format at a 1-km spatial resolution with the EPSG:4326 spatial reference. The various folders within the dataset contain raster data for different agricultural production systems, such as the Irrigated system (I), the Rainfed High-input system (H), the Rainfed Low-input system (L), and the total amount of the All system (A = I + H + L). The file name adheres to the format “ha_{*}_{**}_30 sec_{***}_CHN.tif”. Here, “ha” indicates the harvest area, “30sec” represents a spatial resolution of 30 arcseconds (approximately 1 km), “*” stands for the crop name, “**” denotes the production system, and “***” represents the time of the data.
Technical Validation
Accuracy assessment with county-level statistics
Figure 5 presents the validation of the spatial allocation results with county-level statistics. The three main food crops—maize, rice, and wheat—exhibit high accuracy, with correlation coefficients \((r)\) of 0.89, 0.87, and 0.90, and coefficient of determination \(({R}^{2})\) values of 0.80, 0.75, and 0.81, respectively. Among the industrial crops, the sugarcane, cotton, and sunflower show relatively considerable allocation accuracy with \(r\) values of 0.91, 0.78, and 0.76 and \({R}^{2}\) values are 0.84, 0.61, and 0.57, respectively. The allocation maps for oilseed crops such as soybeans, rapeseed, and groundnuts also show accuracy with \(r\) values exceeding 0.69 and \({R}^{2}\) values above 0.48. The \(r\) values of non-staple crops, such as sorghum, millet, potato, sesame, and sugarbeet, are approximately within the range of 0.41 to 0.60. Moreover, the \(r\) values for tobacco and tea are less than 0.40. Additionally, the results are highly consistent with the prefecture-level statistics, proving the reliability of the downscaling model, as shown in Fig. S10. Therefore, the constructed crop maps adequately depict the spatial pattern of crops, especially for primary grains and oilseed crops.
Validation at the county level. The subgraphs are arranged in descending order based on crop harvest area. The X-axis represents the county-level aggregation of allocated results, while the Y-axis represents the actual values from county-level statistical data. To enhance the clarity of the validation, a logarithmic transformation is performed for both axes. The validation metrics are displayed in the upper left corner of each subgraph, including the sample size \(\left(N\right)\), correlation coefficient \((r)\), coefficient of determination (\({R}^{2}\)), and root mean square error \(({RMSE})\). The count of legend represents the number of validation samples in one square block in the figure. The unit of legend (Count) represents the number of validation samples within each square block in the figure.
Comparison of spatial patterns with existing datasets
The constructed dataset was compared with four representative datasets for three main grain crops (maize, rice, and wheat) for the year 2010. These datasets include the Spatial Production Allocation Model dataset (SPAM, ~10 km), the China Crop Area dataset (CCA, 1 km), the China Cropping Patterns dataset (CCP, 500 m), and the China Crop Dataset (CCD, 30 m)16,18,55,63. The CCP and CCD data pixels were aggregated into a 1-km grid for comparison.
As shown in Fig. 6, the 1-km dataset from this study significantly upgrades the spatial detail of crop distribution compared to the 10 km resolution SPAM data. The CCA and CCP datasets provide homogeneous crop pixels, resulting in an overestimation of crop area in mixed pixels that contain both impervious surfaces and cropland (Fig. 6c1–c3,d1–d3). In contrast, the maps from this study comprise sub-pixel crop area attributes, which effectively reflect the gradient changes from urban to rural zones64. Therefore, the dataset built provides much more explicit details in estimating the area of crops in mixed pixels situated at urban fringes (Fig. 6a1–a3). Furthermore, the crop patterns of the constructed maps are similar to the CCD data (Fig. 6a3,e3), demonstrating that the integration of urban boundaries is necessary to improve the rationality of crop maps65.
Comparison of datasets at the grid level. The prefectures under consideration are located in Shijiazhuang for maize (38.08°N,114.82°E), Changsha for rice (28.28°N,112.51°E), Zhumadian for wheat (33.04°N,113.98°E), and the reference year is 2010. The first column is the high-resolution imaginary layers from the GEOVIS Earth (https://geovisearth.com/en). The second column shows the crop maps developed in this study. The third column presents the crop grids derived from the SPAM (Spatial Production Allocation Model) dataset. The fourth column illustrates the crop area pixels obtained from the CCA (China Crop Area) dataset. The fifth column displays the crop pixels from the CCP (China Cropping Patterns) dataset. The final column represents the crop distribution data sourced from the CCD (China Crop Dataset).
Comparison with SPAM datasets
The county-level validation metrics of the built dataset and the 10-km SPAM were compared as shown in Fig. 7. The spatial allocation accuracy of the three primary crops (maize, rice, and wheat) in this study was closely in line with the SPAM dataset, as manifested by the \(r\) and \({R}^{2}\) values. For tea, cotton, sugarcane, millet, and sorghum, the \(r\) and \({R}^{2}\) values of this study’s dataset surpass those of SPAM. However, the accuracies for soybean, rapeseed, groundnut, potato, tobacco, sunflower, sesame, and sugarbeet are lower than those recorded in SPAM. Owing to the existence of more fragmented pixels in the 1-km maps, the \({RMSE}\) values in this study are higher than those of SPAM. Therefore, the dataset generated in this study exhibits high accuracy in mapping the distribution of major grain crops. Nevertheless, it is also noted that the increase in resolution brings about a certain degree of uncertainty regarding the spatial distribution of non-major food crops66,67.
Limitations and prospects
The dataset developed in this study shows high accuracy in spatial allocation for domain food crops. However, its accuracy is comparatively lower for certain industrial and economic crops. This inconsistency is primarily due to the coarse resolution (10 km) of the GAEZ crop natural growth suitability data, which is not consistent with the high-resolution (30 m) input land use datasets. The resolution mismatch causes significant uncertainty in the spatial allocation of crops40. As the GAEZ dataset is designed for global agricultural assessments49, it may underestimate crop plant suitability in some regions, potentially causing uncertainty in crop mapping. Furthermore, the accuracy is also limited by the lack of complete prefecture-level statistical data. For instance, linear interpolation was used to estimate missing values in prefecture-level statistics, which oversimplified the variation trends in the harvest area22,68. Moreover, there is inherent uncertainty in the county-level statistical validation data27.
Therefore, the spatial distribution of crop plant suitability is the foundation for accurate crop spatial downscaling. Future research should focus on improving methods for assessing the plant suitability and potential yield of crops under changing climatic and anthropogenic conditions69,70. For instance, integrating the crop plant suitability assessment framework AEZ (Agro-Ecological Zoning) into the crop spatial allocation model could help improve the simulation accuracy71,72. Moreover, machine learning technologies provide a promising approach for assessing crop plant suitability. For example, the maximum entropy model has proven effective in evaluating the plant suitability of crops such as rice, wheat, and soybean39,69,73. Furthermore, incorporating deep learning-based crop recognition through satellite imagery into the data fusion framework will enhance the accuracy of the allocated maps and provide more detailed and reliable data support for agricultural monitoring and analysis74. Moreover, the advent of large language models introduces robust tools capable of extracting valuable agricultural insights from vast collections of historical and contemporary inventories75. Leveraging these tools will significantly advance the construction of multi-scale and long-term agricultural statistical databases (covering crop yields, fertilizer use, field management, etc.)76. This progress will facilitate data downscaling at finer statistical units, expand the variables of the crop maps, and promote their application in climate change adaptation and sustainable agricultural development.
Usage Notes
This study integrated multiscale agricultural statistical data and multisource spatial datasets to map the harvest area of 16 crops in China at a 1-km resolution from 1990 to 2020 with 5-year intervals. The prefecture-level crop statistics were downscaled to grids based on the integration of natural and economic factors affecting agricultural practices. The built dataset addresses the data gap in spatially explicit maps for multiple crops over a long time span in China. It can offer valuable insights for analyzing changes in China’s crop cultivation patterns and their impacts on environmental pollution and ecosystem services. Additionally, the dataset could be used as crucial inputs for land-surface ecosystem models (e.g., Lund-Potsdam-Jena managed Land model) and integrated assessment models (e.g., Global Change Analysis Model). Since the dataset aligns well with national, provincial, and prefectural statistics, it may help support further analysis in socioeconomic research.
In future research, a key challenge is to assess crop plant suitability scores at a high spatial resolution to improve the accuracy of crop mapping. Moreover, it is also crucial to integrate crop-specific maps derived from remote sensing products into the crop statistical data allocation framework. To further improve the usability and accuracy of these datasets, it is essential to collect statistical data over longer periods with continuous, complete time series and at finer spatial resolutions. With these potential improvements in the future, this dataset will become a dynamic and open-source program that can be continually updated to reflect changing crop patterns. This ongoing development will provide a solid foundation for future applications in agricultural modelling and land use management.
Code availability
The code is available at: https://github.com/KaixuanDai/mapspamc_chn.
References
Zhou, Y., Li, X. & Liu, Y. Cultivated land protection and rational use in China. Land Use Policy 106, 105454 (2021).
Lai, Z., Chen, M. & Liu, T. Changes in and prospects for cultivated land use since the reform and opening up in China. Land Use Policy 97, 104781 (2020).
Tu, Y. et al. A 30 m annual cropland dataset of China from 1986 to 2021. Earth System Science Data Discussions 1–34 https://doi.org/10.5194/essd-2023-190 (2023).
National Bureau of Statistics of China. National Data. (2024).
Li, K. et al. Human-altered soil loss dominates nearly half of water erosion in China but surges in agriculture-intensive areas. One Earth S2590332224004317 https://doi.org/10.1016/j.oneear.2024.09.001 (2024).
Kong, L. et al. Natural capital investments in China undermined by reclamation for cropland. Nat Ecol Evol 7, 1771–1777 (2023).
Gu, B. et al. Cost-effective mitigation of nitrogen pollution from global croplands. Nature 613, 77–84 (2023).
Kuang, W. et al. Cropland redistribution to marginal lands undermines environmental sustainability. National Science Review 9, nwab091 (2022).
Lu, C. et al. Changes in China’s Grain Production Pattern and the Effects of Urbanization and Dietary Structure. Journal of Resources and Ecology 11, 358 (2020).
Wang, J. et al. North-to-south transfer of grain and meat products significantly reduces PM 2. 5 pollution and associated health risk in China. Resources, Environment and Sustainability 17, 100168 (2024).
Sun, S. K. et al. Geographical Evolution of Agricultural Production in China and Its Effects on Water Stress, Economy, and the Environment: The Virtual Water Perspective. Water Resources Research 55, 4014–4029 (2019).
Qi, P. et al. Allocation of water and land resources and ecological security in the black soil area of Northeast China. CSB 69, 4063–4078 (2024).
Liu, Z., Yang, P., Wu, W. & You, L. Spatiotemporal changes of cropping structure in China during 1980–2011. J. Geogr. Sci. 28, 1659–1671 (2018).
Wang, Y. et al. Mapping Crop Distribution Patterns and Changes in China from 2000 to 2015 by Fusing Remote-Sensing, Statistics, and Knowledge-Based Crop Phenology. Remote Sensing 14, 1800 (2022).
Luo, Y., Zhang, Z., Chen, Y., Li, Z. & Tao, F. ChinaCropPhen1km: a high-resolution crop phenological dataset for three staple crops in China during 2000–2015 based on leaf area index (LAI) products. Earth System Science Data 12, 197–214 (2020).
Qiu, B. et al. Maps of cropping patterns in China during 2015–2021. Sci Data 9, 479 (2022).
Peng, Q. et al. A twenty-year dataset of high-resolution maize distribution in China. Sci Data 10, 658 (2023).
Mei, Q. et al. ChinaSoyArea10m: a dataset of soybean planting areas with a spatial resolution of 10 m across China from 2017 to 2021. Earth System Science Data Discussions 1–27, https://doi.org/10.5194/essd-2023-467 (2023).
Peng, Y. et al. Where is tea grown in the world: A robust mapping framework for agroforestry crop with knowledge graph and sentinels images. Remote Sensing of Environment 303, 114016 (2024).
Yang, G. et al. Automated in-season mapping of winter wheat in China with training data generation and model transfer. ISPRS Journal of Photogrammetry and Remote Sensing 202, 422–438 (2023).
Liu, S., Wang, L. & Zhang, J. The dataset of main grain land changes in China over 1985–2020. Sci Data 11, 1430 (2024).
Ye, S., Cao, P. & Lu, C. Annual time-series 1 km maps of crop area and types in the conterminous US (CropAT-US): cropping diversity changes during 1850–2021. Earth System Science Data 16, 3453–3470 (2024).
Kim, K.-H., Doi, Y., Ramankutty, N. & Iizumi, T. A review of global gridded cropping system data products. Environ. Res. Lett. 16, 093005 (2021).
You, L. & Sun, Z. Mapping global cropping system: Challenges, opportunities, and future perspectives. Crop and Environment 1, 68–73 (2022).
Monfreda, C., Ramankutty, N. & Foley, J. A. Farming the planet: 2. Geographic distribution of crop areas, yields, physiological types, and net primary production in the year 2000. Global Biogeochemical Cycles 22, (2008).
Portmann, F. T., Siebert, S. & Döll, P. MIRCA2000—Global monthly irrigated and rainfed crop areas around the year 2000: A new high-resolution data set for agricultural and hydrological modeling. Global Biogeochemical Cycles 24, (2010).
Tang, F. H. M. et al. CROPGRIDS: a global geo-referenced dataset of 173 crops. Sci Data 11, 413 (2024).
Grogan, D., Frolking, S., Wisser, D., Prusevich, A. & Glidden, S. Global gridded crop harvested area, production, yield, and monthly physical area data circa 2015. Sci Data 9, 15 (2022).
Yu, Q. et al. A cultivated planet in 2010 – Part 2: The global gridded agricultural-production maps. Earth System Science Data 12, 3545–3572 (2020).
Lee, D. et al. HarvestStat Africa – Harmonized Subnational Crop Statistics for Sub-Saharan Africa. Sci Data 12, 690 (2025).
You, L. & Wood, S. Assessing the spatial distribution of crop areas using a cross-entropy method. International Journal of Applied Earth Observation and Geoinformation 7, 310–323 (2005).
You, L. & Wood, S. An entropy approach to spatial disaggregation of agricultural production. Agricultural Systems 90, 329–347 (2006).
Tan, J. et al. Spatial evaluation of crop maps by the spatial production allocation model in China. JARS 8, 085197 (2014).
Koo, J. et al. CELL5M: A geospatial database of agricultural indicators for Africa South of the Sahara. F1000Res 5, 2490 (2016).
Yu, Q. et al. Assessing the harvested area gap in China. Agricultural Systems 153, 212–220 (2017).
Hu, Y. et al. Rice production and climate change in Northeast China: evidence of adaptation through land use shifts. Environ. Res. Lett. 14, 024014 (2019).
Ru, Y. et al. Estimating local agricultural gross domestic product (AgGDP) across the world. Earth System Science Data 15, 1357–1387 (2023).
van Dijk, M., Wood-Sichra, U., Ru, Y., Guo, Z. & You, L. mapspamc: An R package to create crop distribution maps for country studies using a downscaling approach. Preprint at https://doi.org/10.21203/rs.3.rs-2497136/v1 (2023).
Yue, Y., Zhang, P. & Shang, Y. The potential global distribution and dynamics of wheat under multiple climate change scenarios. Science of the Total Environment (2019).
van Dijk, M. et al. Generating multi-period crop distribution maps for Southern Africa using a data fusion approach. Preprint at https://doi.org/10.21203/rs.3.rs-2491830/v1 (2023).
Food and Agriculture Organization. FAOSTAT statistical database. (2023).
Department of Price of National Development and Reform Commission. Compilation of the National Agricultural Costs and Returns. (2024).
He, P., Xu, X. & Zhou, W. Principles and Applications of Fertilizer Nutrient Recommendations. (China Science Publishing & Media Ltd, Beijing, China, 2021).
Volkholz, V. & Ostberg, S. ISIMIP3a landuse input data (v1.2). ISIMIP Repository https://doi.org/10.48364/ISIMIP.571261.2 (2022).
Yang, J. & Huang, X. The 30 m annual land cover dataset and its dynamics in China from 1990 to 2019. Earth System Science Data 13, 3907–3925 (2021).
Xu, X. et al. China’s multi-period land use land cover remote sensing monitoring data set (CNLUCC). (2018).
Zhang, X. et al. GLC_FCS30D: the first global 30 m land-cover dynamics monitoring product with a fine classification system for the period from 1985 to 2022 generated using dense-time-series Landsat imagery and the continuous change-detection method. Earth System Science Data 16, 1353–1381 (2024).
Zhang, C., Dong, J. & Ge, Q. Mapping 20 years of irrigated croplands in China using MODIS and statistics and existing irrigation products. Sci Data 9, 407 (2022).
Fischer, G. et al. Global Agro-ecological Zones (GAEZ v3.0)- Model Documentation. (2012).
Xu, X. China population spatial distribution grid dataset. (2020).
Gong, P. et al. Annual maps of global artificial impervious area (GAIA) between 1985 and 2018. Remote Sensing of Environment 236, 111510 (2020).
Weiss, D. J. et al. A global map of travel time to cities to assess inequalities in accessibility in 2015. Nature 553, 333–336 (2018).
You, L., Wood, S. & Wood-Sichra, U. Generating plausible crop distribution maps for Sub-Saharan Africa using a spatially disaggregated data fusion and optimization approach. Agricultural Systems 99, 126–140 (2009).
Wood-Sichra, U., Joglekar, A. B. & You, L. Spatial Production Allocation Model (SPAM) 2005: Technical Documentation. (2016).
Lu, M. et al. A cultivated planet in 2010 – Part 1: The global synergy cropland map. Earth System Science Data 12, 1913–1928 (2020).
Xu, H., Jiang, L. & Liu, Y. Assessing the Accuracy and Consistency of Cropland Products in the Middle Yangtze Plain. Land 13, 301 (2024).
Cui, P. et al. Comparison and Assessment of Different Land Cover Datasets on the Cropland in Northeast China. Remote Sensing 15, 5134 (2023).
Zhang, C., Dong, J. & Ge, Q. Quantifying the accuracies of six 30-m cropland datasets over China: A comparison and evaluation analysis. Computers and Electronics in Agriculture 197, 106946 (2022).
Boyd, S. & Vandenberghe, L. Convex Optimization. (Cambridge University Press, Cambridge, 2004).
R Core Team. R: A Language and Environment for Statistical Computing. (R Foundation for Statistical Computing, Vienna, Austria, 2023).
Hou, M., Deng, Y. & Yao, S. Spatial Agglomeration Pattern and Driving Factors of Grain Production in China since the Reform and Opening Up. Land 10, 10 (2021).
Dai, K. et al. Mapping harvest area of comprehensive crop types in China from 1990 to 2020 at a 1-km resolution (V1). Science Data Bank https://doi.org/10.57760/sciencedb.18098 (2024).
Shen, R., Peng, Q., Li, X., Chen, X. & Yuan, W. CCD-Rice: A long-term paddy rice distribution dataset in China at 30 m resolution. Earth System Science Data Discussions 1–31, https://doi.org/10.5194/essd-2024-147 (2024).
Huang, C., Hou, J., Li, X., Zhang, Y. & Guo, J. Fractional crop-planting area projection by integrating geographic grid data and agricultural statistics based on random forest regression. International Journal of Digital Earth 16, 4446–4470 (2023).
Li, X. et al. The impacts of spatial resolutions on global urban-related change analyses and modeling. iScience 25, 105660 (2022).
Folberth, C., Yang, H., Wang, X. & Abbaspour, K. C. Impact of input data resolution and extent of harvested areas on crop yield estimates in large-scale agricultural modeling for maize in the USA. Ecological Modelling 235–236, 8–18 (2012).
Sun, P., Congalton, R. G., Grybas, H. & Pan, Y. The Impact of Mapping Error on the Performance of Upscaling Agricultural Maps. Remote Sensing 9, 901 (2017).
Klein Goldewijk, K., Beusen, A., Doelman, J. & Stehfest, E. Anthropogenic land use estimates for the Holocene – HYDE 3.2. Earth System Science Data 9, 927–953 (2017).
Li, X. et al. Mapping cropland suitability in China using optimized MaxEnt model. Field Crops Research 302, 109064 (2023).
Liang, K. et al. Multi-scenario comparisons to identify the spatial distribution, land type, and effectiveness of cultivated land restoration in the main grain-producing area. Habitat International 154, 103211 (2024).
Alvar-Beltrán, J. et al. An FAO model comparison: Python Agroecological Zoning (PyAEZ) and AquaCrop to assess climate change impacts on crop yields in Nepal. Environmental Development 47, 100882 (2023).
Shi, W. Wheat redistribution in Huang-Huai-Hai, China, could reduce groundwater depletion and environmental footprints without compromising production. (2024).
Yu, X. et al. Predicting potential cultivation region and paddy area for ratoon rice production in China using Maxent model. Field Crops Research 275, 108372 (2022).
Hu, Q. et al. Integrating coarse-resolution images and agricultural statistics to generate sub-pixel crop type maps and reconciled area estimates. Remote Sensing of Environment 258, 112365 (2021).
Polak, M. P. & Morgan, D. Extracting accurate materials data from research papers with conversational language models and prompt engineering. Nat Commun 15, 1–11 (2024).
Davis, K. F. et al. HarvestStat: A global effort towards open and standardized sub-national agricultural data. Environ. Res. Lett. https://doi.org/10.1088/1748-9326/adcb54 (2025).
Acknowledgements
This study was supported by the National Natural Science Foundation of China (Grant No. 42041007), and the Research Fellowship provided by the Alexander von Humboldt Foundation (Recipient: Xudong Wu).
Author information
Authors and Affiliations
Contributions
Conceptualization: C.C., X.W., K.D.; Data curation: K.D., N.M., Z.L., S.Y.; Formal analysis: K.D., B.L., Z.W.; Funding acquisition: C.C., X.W.; Investigation: N.M., Z.L., S.Y.; Methodology: K.D., Z.W.; Visualization: K.D., X.W., J.A.G.; Validation: K.D., B.L., Z.W.; Supervision: C.C., X.W.; Writing – original draft preparation: K.D.; Writing – review & editing: C.C., X.W., Y.X., J.A.G.
Corresponding authors
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Dai, K., Cheng, C., Li, B. et al. Mapping the harvest area of a comprehensive set of crop types in China from 1990 to 2020 at a 1-km resolution. Sci Data 12, 1371 (2025). https://doi.org/10.1038/s41597-025-05723-0
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41597-025-05723-0