A 30-m annual paddy rice dataset in Northeastern China during period 2000–2023

Hou, Dawei; Chen, Jing; Dong, Jinwei; Ji, Chao; Feng, Jingxuan; Du, Guoming; Yang, Lixiao

doi:10.1038/s41597-025-05715-0

Download PDF

Data Descriptor
Open access
Published: 05 August 2025

A 30-m annual paddy rice dataset in Northeastern China during period 2000–2023

Dawei Hou^1,2^na1,
Jing Chen¹,
Jinwei Dong ORCID: orcid.org/0000-0001-5687-803X³^na1,
Chao Ji⁴,
Jingxuan Feng⁵,
Guoming Du^1,2 &
…
Lixiao Yang^1,2^na1

Scientific Data volume 12, Article number: 1355 (2025) Cite this article

1695 Accesses
1 Citations
Metrics details

Subjects

Abstract

As the cornerstone of China’s food security, Northeastern China contributes nearly 20% of national rice production. However, we are still lacking of high-resolution rice maps with detailed and long time-series in this region, impeding crop management decisions for food security. Here we generated an annual 30 m resolution rice distribution dataset for Northeastern China since the 21st century (NECAR) using the Google Earth Engine platform and random forest classification. The workflow involved (1) hierarchical screening principle to select ground samples, (2) the linear interpolation and Whittaker smoothing Landsat5/7/8 time series data and (3) enhanced spectral-feature sets. The resultant annual maps have high overall accuracy (OA) ranging from 0.93 to 0.99, and the satellite estimates corresponded well with statistics for most cities (R² ≥ 0.7, p < 0.01), with higher accuracy than that of similar crops mapping datasets. This is the first attempt in Northeastern China to reconstruct paddy rice patterns at a 30-m resolution over a detailed and extended time series, enabling in-depth analysis of potential environmental and economic impacts.

The 10-m crop type maps in Northeast China during 2017–2019

Article Open access 02 February 2021

High-resolution maps of rice cropping intensity across Southeast Asia

Article Open access 12 August 2025

A long-term paddy rice distribution dataset in Asia at a 30 m spatial resolution

Article Open access 21 June 2025

Background & Summary

As a crucial staple crop, paddy rice is not only vital for food security worldwide, but also are significant sources of greenhouse gas¹ (GHG) emissions, including contributed to ~22% of global agricultural methane (CH₄) emissions and 11% of nitrous oxide emissions². As one of the primary rice-producing regions, Northeastern China plain has been recognized as the “rice golden belt” as well as an important grain bowl for the country³. However, the pattern of rice cultivation in this region has undergone drastic evolution year by year due to growing food demands and policy incentives. Up to date, we are still lacking of comprehensive, long-term and high-resolution maps of rice fields in this region, which impedes our advanced understanding about relevant yield variation and environmental outcomes.

Benefiting from the increased accessibility of satellite imagery, advancements in algorithmic innovation as well as maturation of cloud computing infrastructure, the domain of crop mapping has experienced substantial progress over the past few decades. Nevertheless, selecting appropriate remote sensing data remains a critical issue due to the trade-offs among spatial, temporal, and spectral resolutions. Among such a large remote-sensing products nowadays, the Moderate Resolution Imaging Spectroradiometer (MODIS), featuring a high temporal resolution and wide coverage, has been extensively utilized in rice mapping based on the application of various vegetation indices (e.g. LSWI, EVI and NDVI^4,5,6,7). However, MODIS with the spatial resolution ranging from 250 to 500 meters might be confronted with challenges to generate reliable rice maps in areas where land-use underwent frequent transition and field boundaries are small and fragmented, such as Brazil’s Mate Grosso state⁸ and Vietnam’s Mekong Delta⁹. Although some other images with higher spatial resolution (such as Sentinel-2A/2B, the Gaofen satellites, GeoEye, etc.) can capture fine field boundaries and detailed crop structure, they might also be limited by labor processing cost and data availability for long-term time coverage. In contrast, the Landsat program has provided continuous data archive spanning over four decades (1972-) and acceptable spatial resolution (ranging from 15 to 100 m), while simultaneously its consistent data quality and global coverage can ensure uniform measurement and large-scale spatial reference¹⁰. At present, the feasibility relying on Landsat images to identify land-cover or crop patterns had been well-validated. For instance, Gong (2013) et al. demonstrated Landsat’s utility in high-resolution land-cover mapping through generating the first global 30 m Land cover map based on Landsat images and 91,433 visually interpreted training samples¹¹.Wang (2020) et al. proved Landsat’s effectiveness in large-scale crop classification by leveraging the now-public Landsat archive and cloud computing services to map corn and soybean at 30 m resolution across the US Midwest from 1999–2018¹². Moreover, the research conducted by Qiu (2020) et al. highlighted the significant potential of utilizing multi-year Landsat enhanced vegetation index (EVI) data for deriving annual phenological indicators of dual-season agricultural lands over multiple years¹³.

Despite the substantial superiorities, in a certain extent, the frequent cloud coverage and the failure of the scan line corrector (SLC) might weaken Landsat’s ability in precisely interpreting objects’ features and phenological trends. To solve this problem, researchers have developed a variety of interpolation and fusion methods to improve the accuracy of crop classification. For instance, Zhu (2010) et al. developed an improved spatiotemporal data fusion algorithm based on the STARFM model to stacking multi-source remote sensing data for generating high-resolution time series¹⁴. Moreno-Martinez (2018) et al. proposed an optimal interpolation method to generate a smooth and notched Landsat reflectance time series by fusing MODIS and Landsat data via Google Earth Engine platform¹⁵. Chen et al. proposed a GF-SG (Gap Filling - Savitzky-Golay) interpolation smoothing method combining MODIS and Landsat, which showed good results in constructing high-quality NDVI time series and significantly improved spatio-temporal continuity and data integrity¹⁶. In Brazil, Zhu (2016) et al. constructed the Seasonal Dynamic Index (SDI) of MODIS EVI time series and fused it with Landsat data to establish a regression model to estimate the proportion of cultivated land, achieving farmland mapping at high spatiotemporal resolution¹⁷. In general, these studies show the validity of the fusion method and time-series classification technology in improving the accuracy of crop mapping.

Extensive efforts have been made globally to map rice distribution and extract phenological information using remote sensing time series. Overall, paddy rice mapping methods have evolved through four main stages: Phase I relied on reflectance and basic image statistics; Phase II incorporated vegetation indices with enhanced statistical approaches; Phase III introduced time-series analysis based on vegetation indices or RADAR backscatter and Phase IV focused on identifying key phenological stages through remote sensing techniques¹⁸. In recent years, phenology-driven methods that exploit the distinct growth patterns of paddy rice have been increasingly adopted in rice mapping studies. For instance, Xiao (2002) et al. found that rice can be effectively extracted by leveraging the unique relationship between the Normalized Difference Vegetation Index (NDVI), Enhanced Vegetation Index (EVI), and Land Surface Water Index (LSWI) during the flooding and transplanting stages, in combination with the masking of non-farmland areas¹⁹. Dong (2015) et al. successfully traced the expansion trend of rice in one tile (path/row 113/27) of Northeastern China from 1986 to 2010 using time-series Landsat images and a phenological algorithm based on the unique spectral characteristics of rice at flood/transplanting stage²⁰. Despite such progress, existing studies might also confront with some difficulties in accurately eliminating the commission error of non-rice types by identifying rice only through the time window of rice transplantation. Therefore, Qiu (2015) et al. proposed a novel index “the Ratio of Change amplitude of LSWI to EVI2 (RCIE)”, which quantifies the ratio of the change amplitude of the Land Surface Water Index (LSWI) to the two-band Enhanced Vegetation Index (EVI2) during the tillering to heading stage of rice growth. This index was designed to comprehensively capture both vegetation phenology and surface water dynamics, and served as the core of the Comprehensive Consideration of Vegetation Phenology and Surface Water Change (CCVS) method²¹. Based on the 8-day composite time series data set and the Landsat time series data set with a relatively cloud-free interval of 16 days, the rice location algorithm combining vegetation phenology and surface water change (CCVS) was effectively applied^21,22.

Although significant efforts have been made, producing consistent annual rice maps for the entire Northeastern China is still difficult, particularly when retrospectively analyzing long-term historical data. First and foremost, the large spatial extent introduces significant intra-class variability in crop spectra and phenology driven by diverse climates, varieties, and cultivation methods, which may limit classification accuracy²³. Furthermore, the broad spatial and temporal coverage limits not only the availability of valid satellite observations due to differences in orbital paths, acquisition dates, and cloud contamination²⁴, but also the generation of timely intensive and independent samples for training and validation²⁵. Moreover, poor understanding of feature effectiveness may lead to the omission of key features or inclusion of irrelevant ones, both negatively impacting classification accuracy²⁶.

To deal with the challenges mentioned above, we developed a framework based on the combination of multiple revised pathways to generate annual rice maps in Northeastern China. To alleviate the potentially negative impacts from crops’ spectral and phenological variability in different location, we first generated regionally independent classifiers by considering Agro-climate zones (ACZs) and cropping systems. After that, regular time series of Landsat images for each Agri-climate zone was produced based on the linear interpolation and Whittaker smoothing algorithms to obtain homogeneous time series and fill the data gaps. For improving the pertinence of sample points, we applied an algorithm based on vegetation phenology and surface water change (CCVS) to extract rice samples. Further, we developed a sophisticated feature selection procedure to select optimal features from the huge size of L5\7\8-based feature candidates, to avoid the possible Hughes effect (also known as the “curse of the dimensionality”) and save computing time.

The overarching goal of our study was to generate annual maps of rice fields in the Northeastern China from 2000 to 2023 at 30-m spatial resolution using (1) Agro-climate zone-specific random forest classifiers, (2) interpolated and smoothing Landsat5\7\8 time-series imagery, (3) rice samples extracted by CCVS methods and (4) optimized features from spectral-, temporal- and textural- information. Our consistent rice maps can be utilized to monitor rice dynamics and assess the potentially environmental effects driven by drastic land-use/land-cover change.

Methods

Study area

Northeastern China, extending roughly from 38°N, 115°E to 54°N, 135°E, encompasses the Heilongjiang province, the Jilin province, the Liaoning province as well as the four municipalities in eastern Inner Mongolia (Fig. 1). Northeastern China covered an area of 1.2 million km², which is about 13% of China’s territory. According to the “the Regionalization of Agro-climate of China”²⁷, Northeastern China spans six Agro-climate zones (ACZs), which are the North Greater Khingan (GK), the Sanjiang Plain (SP), the Lesser Khingan and Changbai Mountains (LK), the Songliao Plain (SL), the Liaodong Peninsula (LP), and East Inner Mongolia (IM), respectively. Precipitation usually ranges from 500 to 800 mm, which is mainly concentrated in July and August. The number of frost-free days varies between 140 and 170 days. The unique geographical and climatic conditions in Northeastern China, especially the long sunshine time in summer, abundant water resources and fertile black soil, provide favorable conditions for rice production²⁸. As the pillar of China’s food security²⁹, the importance of Northeastern China in rice production cannot be underestimated. Furthermore, driven by climate change and advances in agricultural technology, the cultivation boundary has gradually shifted northward, enhancing the importance of Northeastern China, especially its northern region, in the country’s agricultural production³⁰.

Overview of the rice classification

We applied the Random Forest (RF) algorithm and integrated multiple relevant features to generate annual maps of paddy rice based on. Google Earth Engine Platform (Fig. 2). In each of the six ACZs, the annual rice map was independently generated following four steps: (1) The Landsat 5/7/8 Surface Reflectance imagery was synthesized, interpolated and smoothed to produce classification-ready imagery used as input for the rice classifier. The 300 m land use dataset from the ESA Climate Change Initiative (CCI)³¹ was used as annual cropland masks to exclude the non-crop pixels. (2) The CCVS method was employed to identify rice sample points and exclude non-rice sample points. In this study, the classification was carried out within the cropland range, meaning that only crop-type samples were needed. Our existing crop products with 10-m resolution from 2017 to 2019³² were utilized to generate random sample points for the same period. The RCIE dataset was subsequently developed using the CCVS method to distinguish between rice and non-rice sample points. Following accuracy verification, the CCVS method was extended to other years, and the sample points were expanded by integrating existing datasets with field sampling data. High-resolution image interpretation was used to eliminate clear errors. (3) Feature collections—including vegetation index, texture harmonic regression features as well as the RCIE dataset—were used to train a crop classifier (rice vs. non-rice) based on the RF algorithm. This classifier was applied to the pre-processed Landsat imagery. To reduce the interference of small and medium-sized patches in the classification results and enhance the spatial coherence of the classification map, post-processing was carried out on the preliminary classification results, encompassing majority voting based on super-pixel segmentation as well as connected pixel filtering and smoothing operations. (4) Accuracy assessment and comparative analysis of the rice mapping dataset. A Random Forest classifier was trained using 70% of the samples, while the remaining 30% were used to generate confusion matrices and compute accuracy metrics (OA, UA, PA, and Kappa). The classification results (NECAR) were further validated through comparison with municipal-level rice planting area statistics derived from agricultural statistical yearbooks (2000–2023), and statistical fit (R²) was used to quantify consistency. In addition, annually ground-truth reference samples were constructed via visual interpretation of Landsat and Sentinel-2 imagery, supported by Google Earth high-resolution images and field survey records collected during the 2017–2021 growing seasons. These samples were also used to evaluate the performance of three widely used rice products via identical accuracy assessment procedures. For each dataset, confusion matrices and associated accuracy metrics were calculated using the same validation samples. Additionally, municipal-level rice area estimates from these datasets were compared with official statistics to assess their spatial consistency. Cross-comparisons between NECAR and other rice datasets were also performed at the municipal level to evaluate spatial agreement and potential discrepancies.

Landsat series images and pre-processing

We obtained Landsat Surface Reflectance Tier 1 imagery for the period 2000–2023 from Landsat 5³³, 7³⁴ and 8³⁵ collections via the Google Earth Engine platform³⁶. Following the acquisition of annually valid Landsat observations, we processed the time-series data using a standardized three-step procedure. Specifically, we first generated 16-day composite images by computing the median of valid surface reflectance observations within each interval. Subsequently, linear interpolation and the Whittaker smoother were applied to ensure temporal coverage, suppress residual noise and enhance the phenological signal. However, we might confront with challenges in data continuity gaps particularly for 2012 due to the temporal validity of Landsat 5 and Landsat 8 as well as the scan-line corrector (SLC) failure of Landsat 7 To mitigate these issues and ensure temporal continuity, we applied a data fusion approach based on the Spatial and Temporal Adaptive Reflectance Fusion Model (STARFM)³⁷. This method integrated 8-day MODIS surface reflectance data (MOD09A1)³⁸ with a cloud-free Landsat reference image acquired in July 2011 to reconstruct a synthetic 30 m time series for the 2012 growing season (Fig. 3). As a consequence, we yielded a consistent, gap-filled and cloud-free 16-day composite time-series images across all years, including the synthetic 2012 series reconstructed via STARFM (See the annual valid Landsat observations in Supplementary Figure S1).

CCVS method to identify rice samples

In order to accurately identify rice sample points, we adopted the comprehensive consideration method based on vegetation phenology and surface water change (CCVS) proposed by Qiu (2015) et al.²¹. This method identifies rice objects by considering the characteristics of LSWI and EVI2 ranging ratios from the transplanting stage to the heading stage of rice (RCIE). Due to post-transplanting flooding, paddy fields exhibit sustained high LSWI values from transplanting to heading stages, while EVI2 values rise sharply during this same period. In contrast, upland crops typically show lower LSWI values and a similarly strong EVI2 increase (Fig. 4).

Based on these differences, we calculated the Ratio of Change in LSWI to EVI2 (RCIE) as follows:

$${\rm{RCIE}}=\frac{\triangle {LSWI}}{\triangle {EVI}2}$$

(1)

and applied a threshold-based classification rule to distinguish paddy rice:

$$\text{if}\left(\,,{{LSWI}}_{\min } > {\theta }_{1}\;{and\; RCIE} < {\theta }_{2}\right){rice;\; else\; other}$$

(2)

where RCIE was the range ratio index, representing the range ratio of LSWI and EVI2 from transplanting stage to heading stage; LSWI_min was the minimum observed LSWI value of rice from transplanting stage to heading stage; the constants θ₁ and θ₂ represent thresholds for water saturation and spectral balance, respectively. Following Qiu et al.^21,22, default values of θ₁= 0.1 and θ₂ = 0.7 were adopted as initial references. In this study, we further optimized the constants θ₂ (RCIE < 0.7) for each agro-climatic zone to enhance regional classification accuracy. Specifically, the Jeffries–Matusita (JM) distance between rice and non-rice samples was used as a separability criterion. The constants θ₂ was iteratively adjusted from 0.4 to 1.0 (step = 0.05), and the value that yielded the maximum JM distance was selected as the optimal threshold for each zone (Please see the detailed methods illustration, Table S1 and Figures S2–S9 in Supplementary Materials).

Based on the abovementioned principles, we constructed a 16-day composite EVI2 and LSWI time series based on cloud-free observations from Landsat imagery and processed the time series using Whittaker smoothing to form a smoothed time series. For each suspected rice sample point, a 50 m buffer zone was generated, and rice and non-rice were distinguished by analyzing the characteristics of the time series within the buffer zone (Fig. 5). To verify the accuracy of the CCVS method, we conducted seven consecutive years of fieldwork (2017–2023). Specifically, the field study was conducted annually in early April and mid-September. We systematically selected 30% of the rice and other crop samples. The locations and categories were recorded using a mobile GIS device (an iPad equipped with the OvitalMap software), and the accuracy of the sample points was validated through on-site visits.

Training and validation data

We first used a stepwise screening method to collect ground samples from existing crop classification products in Northeastern China: (1) the annual rice planting area dataset for the Asian monsoon region (2000–2020) (PRA)³⁹, (2) the 30 m resolution crop maps from 2013 to 2021(NETCs-30m)⁴⁰, and (3) cropping mapping datasets from 2017 to 2019 (NETCs-10m)³². In general, all of above three datasets had satisfactory accuracy and similarity, and thus can be used to acquire training/validation samples. To avoid the resolution differences and products errors, we conducted double-validation for initially selected samples using CCVS and field surveys.

Take the generation of sample points in 2019 as an example: we first separately generated 20,000 random points within our study area from the PRA, NETCs-30m and NETCs-10m, and only retained those points with consistent crop types in the three datasets as the basic samples. Secondly, the utilization of an optimized CCVS method for validation ensures the retention of valid data points that accurately reflect either rice or non-rice characteristics. Then, all the valid samples were visually checked using high-resolution images in Google Earth and two S2 RGB composites, including the RGB composite (R: SWIR1, G: NIR, B: Red) from the mid-April to mid-June and the RGB composite (R: NIR, G: SWIR1, B: SWIR2) during early-July to late-August. The samples with obvious errors (such as incorrectly labeled the nature vegetation as crops) were excluded. The samples lied in the roads or the field boundaries were also removed. We ultimately obtained a comprehensive sample set encompassing rice and other crops. Finally, we got a number of ground samples in the order of 10,755 in 2019 (Fig. 6).

Based on existing products, we formed samples for a limited number of years, and extended the sample set by CCVS method, so that the finite year samples can converted to use samples of other years. In each year, the samples were randomly and equally divided into two parts: one part (70%) used for training and classification and the other part used for accuracy evaluation and validation (30%).

Features selection

To account for seasonal spectral changes, seven 16-day spectral indicators were added to the classifier according to the crop calendar during the growing period (DOY 90–300) in Northeastern China (Table 1), including: NDVI (Normalized Difference Vegetation Index), EVI (Enhanced Vegetation Index), EVI2 (Two-band Enhanced Vegetation Index), LSWI (Land Surface Water Index), NDSVI (Normalized Difference Senescent Vegetation Index), NDTI (Normalized Difference Tillage Index), GCVI (Green Chlorophyll Vegetation Index). Using 16-day composite images has been shown to improve time series processing by enhancing spatiotemporal resolution and aiding crop classification^41,42. Furthermore, to mitigate the influence of clouds, cloud shadows and snow, we calculated the 10th, 25th, 50th, 75th, 90th percentiles and the interquartile range (25th–75th) of each pixel. This method addressed data gaps by incorporating spectral-temporal metrics and enriching time-series images with temporal context⁴¹. To enhance the discrimination of spectrally similar crops, texture metrics derived from NDVI and EVI were also computed using the gray-level co-occurrence matrix (GLCM) following previous research⁴³.

Table 1 The formulation of the seven indices and RCIE used in the study.

Full size table

Moreover, the time-series image harmonic regression features of the three vegetation (NDVI, EVI, LSWI) indices were extracted to characterize the phenological stages of paddy rice, such as transplanting, tillering, heading, and ripening. These harmonic features help capture the seasonal periodicity and subtle spectral variations of rice growth, while suppressing noise caused by cloud contamination and missing observations⁴⁴. We conducted the harmonic regression (discrete Fourier transform) on the original valid observations to extract the temporal characteristics of the time series curves.

$$Y\left(t\right)={\beta }_{i,0}+{\beta }_{i,1}\;\cos \;\left(3\omega t\right)+{\beta }_{i,2}\;\sin \;\left(3\omega t\right)+{\beta }_{i,3}\;\cos \;\left(6\omega t\right)+{\beta }_{i,4}\;\sin \;\left(6\omega t\right)$$

(3)

Where Y(t) is the value of the $i$-th vegetation index (e.g., NDVI, EVI, LSWI) at time $i$. ${\beta }_{i,0}$ is the constant term for the $i$-th vegetation index. ${\beta }_{i,1}$ and ${\beta }_{i,2}$ are the coefficients for the cosine and sine terms at frequency $3\omega $ for the $i$-th vegetation index. ${\beta }_{i,3}$ and ${\beta }_{i,4}$ are the coefficients for the cosine and sine terms at frequency $6\omega $ for the $i$-th vegetation index. $\omega $ is the angular frequency of the periodic component. All the features used for classification can be seen in Table 2.

Table 2 The summary of the feature candidates for rice classification.

Full size table

Random forest algorithm

Random Forest (RF), a decision tree–based ensemble method, is widely used in remote sensing classification due to its enhanced generalization through random sampling of data and features⁴⁵. RF is composed of multiple decision trees, each built independently using bootstrap samples and random feature subsets, without pruning or dependence on other trees. Majority voting among trees determines the final classification outcome, enhancing model stability and mitigating overfitting. RF has been proven to deliver higher accuracy and robustness compared to conventional methods like maximum likelihood, standalone decision trees, and single-layer neural networks. In this study, we adjusted two key parameters of the RF algorithm in GEE when training rice classifiers: (1) number Of Trees: number of trees determines how many binary CART trees are used in the model. While a higher number can slightly improve accuracy, it also increases computational cost linearly. The number Of Trees in our study was set to 200 following previous work⁴⁶. (2) min Leaf Population: the minimum number of samples required at a leaf node. We set it to 10 to control tree depth and prevent overfitting. The other parameters—Variables Per Split (default: square root of feature count), Bag Fraction (default: 0.5), Out-of-Bag Mode, and Seed—were kept at their default settings in GEE.

Data Records

The dataset provides annual paddy rice distribution maps for Northeastern China from 2000 to 2023 at a nominal spatial resolution of 30 meters. The dataset is stored in GeoTIFF format, with pixel values coded as 1 for rice and 0 for non-rice areas. The spatial extent of the dataset ranges from 38.7°N to 53.8°N latitude and 115.5°E to 135.0°E longitude. All data layers can be viewed and downloaded via Figshare https://doi.org/10.6084/m9.figshare.28407710⁴⁷, and are ready for use in GIS software or remote sensing analysis platforms.

Technical Validation

Accuracy assessment with testing samples

We used the remain 30% samples as validation by generating the confusion matrix which the Producer Accuracy (PA), User Accuracy (UA), producer accuracies (PA) and the Kappa coefficient for the annual rice maps were used as validation indicators. Our results indicated that the Overall Accuracy (OA) for all classified results are higher than 0.9, of which the average OA is 0.95 and the highest value emerged in 2007 (0.97). Moreover, the Kappa coefficient varies from 0.89 to 0.95, with the highest value (0.95) in 2007 and the lowest value (0.89) in 2012, with the average at 0.92 (Fig. 7). In general, the high accuracies indicated our classification algorithm performs well and the annual rice maps were reliable.

Comparison with agricultural statistics

Since the 2023 Statistical Yearbook has not been updated for the time being, we calculated the annual paddy rice areas using the pixel number approach and then compared with the annual statistical data at the municipal level. Upon comparing the mapped paddy rice areas with statistical data, it was observed that the mapped paddy rice areas had relatively high correlations with the statistical data for 2000–2022, indicated by R² values between 0.7 and 0.99 at the municipal level (Fig. 8). (See Supplementary Figure S10 for annual comparisons between our rice map and municipal-level statistical yearbook data)

Validation with independent ground truth samples and product comparison

To assess spatial consistency and reliability, we compared our rice classification product (NECAR) with three existing datasets with varying resolutions and temporal coverages: PRA (500 m, 2000–2020)³⁹, NETCs-30m (30 m, 2013–2021)⁴⁰, and NETCs-10m (10 m, 2017–2019)³². All datasets were resampled or aggregated to 30 m resolution and compared at the municipal level against official statistical yearbook data and among each other.

To evaluate the reliability of the datasets, we first compared the estimated paddy rice area from PRA, NETCs-30m, and NETCs-10m with the official municipal statistical yearbook data. To ensure spatial consistency, all products were resampled to 30 m resolution using the nearest neighbor method. The rice pixels from each product were then aggregated at the municipal level for comparison (Fig. 8). The results revealed moderate to strong correlations (R² > 0.4) between satellite-derived estimates and official statistics, suggesting that all three products maintain a reasonable level of spatial agreement with ground-reported rice areas. Furthermore, a municipal-level comparison was conducted between our rice mapping product (NECAR) and three reference products (PRA, NETCs-30m, and NETCs-10m) (Fig. 9). The correlation coefficients between NECAR and PRA ranged from 0.78 to 0.98 across multiple years, indicating the temporal robustness and reliability. In comparison with NETCs-30m, which shares the same spatial resolution (30 m), the correlations consistently exceeded 0.97, demonstrating a high level of agreement and spatial consistency. For NETCs-10m, the correlation coefficients for the selected years were 0.96, 0.98, and 0.98, further confirming the accuracy and reliability of NECAR in capturing paddy rice distribution patterns.

To comprehensively evaluate the classification accuracy of rice mapping products, we also constructed annual ground reference datasets from 2000 to 2021, which were consistent with the temporal coverage of existing products (The annual distribution and total number of truly ground sample points were shown in the Supplement Figure S11). These reference samples were generated through visual interpretation of remote sensing imagery, supported by high-resolution Google Earth images, and validated using long-term ground survey data collected during the period from 2017 to 2021. Based on these annually updated ground samples, we assessed the classification performance of NECAR as well as three existing products with different spatial resolutions. In the main text, we present the Overall Accuracy (OA) for each product and year (Fig. 9), while additional accuracy metrics—including the Kappa coefficient, User’s Accuracy (UA), and Producer’s Accuracy (PA)—are provided in the Supplement Table S2. NECAR consistently outperformed or matched the classification accuracy of the three existing rice mapping products across all evaluated years. Specifically, NECAR exhibited the highest Overall Accuracy (OA) in most years, indicating its robustness and generalization capability over time. In terms of spatial resolution, NETCs-10m showed advantages from 2017 to 2019 due to its finer granularity, but NECAR maintained a more stable performance across the fulltime span. The Kappa coefficients, UA, and PA metrics (Table S2) further confirmed NECAR’s superior balance between commission and omission errors, highlighting its reliability in capturing both rice presence and absence. These findings collectively validate NECAR as a highly accurate and temporally consistent product for annual paddy rice mapping in the study area.

We also compared five locations in 2004, 2010, 2013, 2017 and 2019 to show the differences among the four datasets (Fig. 10). Overall, our rice patch is more elaborate and captures subtler differences in contrast to PRA. Furthermore, our map is more in line with the rice distribution extracted from NETCs-30m and is even more precise. In comparison with NETCs-10m, it could potentially be slightly coarser due to the dissimilar resolutions of Landsat and Sentinel-2. The details of our 2022 rice classification products were shown in Fig. 11.

Code availability

JavaScript code used to generate the paddy rice maps are available from the figshare repository⁴⁷.

References

Zhang, G. et al. Fingerprint of rice paddies in spatial–temporal dynamics of atmospheric methane concentration in monsoon Asia. Nat Commun. 11, 554, https://doi.org/10.1038/s41467-019-14155-5 (2020).
Article ADS PubMed PubMed Central CAS Google Scholar
Qian, H. et al. Greenhouse gas emissions and mitigation in rice agriculture. Nat Rev Earth Environ. 4, 716, https://doi.org/10.1038/s43017-023-00482-1 (2023).
Article ADS CAS Google Scholar
Yuan, S. et al. Sustainable intensification for a larger global rice bowl. Nat Commun. 12 https://doi.org/10.1038/s41467-021-27424-z (2021).
Xiao, X. et al. Mapping paddy rice agriculture in South and Southeast Asia using multi-temporal MODIS images. Remote Sens. Environ. 100, 95–113, https://doi.org/10.1016/j.rse.2005.10.004 (2006).
Article ADS Google Scholar
Zhang, G. et al. Mapping paddy rice planting areas through time series analysis of MODIS land surface temperature and vegetation index data. ISPRS J. Photogramm. Remote Sens. 106, 157–171, https://doi.org/10.1016/j.isprsjprs.2015.05.011 (2015).
Article ADS PubMed PubMed Central Google Scholar
Son, N.-T., Chen, C.-F., Chen, C.-R., Duc, H.-N. & Chang, L.-Y. A phenology-based classification of time-series MODIS data for rice crop monitoring in Mekong Delta, Vietnam. Remote Sens. 6, 135, https://doi.org/10.3390/rs6010135 (2014).
Article ADS Google Scholar
Nuarsa, I. W., Nishio, F., Hongo, C. & Mahardika, I. G. Using variance analysis of multitemporal MODIS images for rice field mapping in Bali Province, Indonesia. Int. J. Remote Sens. 33, 5402, https://doi.org/10.1080/01431161.2012.661091 (2012).
Article Google Scholar
Brown, J. C., Kastens, J. H., Coutinho, A. C., Victoria, D. C. & Bishop, C. R. Classifying multiyear agricultural land use data from Mato Grosso using time-series MODIS vegetation index data. Remote Sens. Environ. 130, 39–50, https://doi.org/10.1016/j.rse.2012.11.009 (2013).
Article ADS Google Scholar
Kontgis, C., Schneider, A. & Ozdogan, M. Mapping rice paddy extent and intensification in the Vietnamese Mekong River Delta with dense time stacks of Landsat data. Remote Sens. Environ. 169, 255–269, https://doi.org/10.1016/j.rse.2015.08.004 (2015).
Article ADS Google Scholar
Wulder, M. A. et al. Fifty years of Landsat science and impacts. Remote Sens. Environ. 280, 113195, https://doi.org/10.1016/j.rse.2022.113195 (2022).
Article Google Scholar
Gong, P. et al. Finer resolution observation and monitoring of global land cover: first mapping results with Landsat TM and ETM+ data. Int. J. Remote Sens. 34, 2607–2654, https://doi.org/10.1080/01431161.2012.748992 (2013).
Article Google Scholar
Wang, S., Di Tommaso, S., Deines, J. M. & Lobell, D. B. Mapping twenty years of corn and soybean across the US Midwest using the Landsat archive. Sci Data. 7, 307, https://doi.org/10.1038/s41597-020-00646-4 (2020).
Article PubMed PubMed Central Google Scholar
Qiu, T., Song, C. & Li, J. Deriving Annual Double-Season Cropland Phenology Using Landsat Imagery. Remote Sens. 12, 3275, https://doi.org/10.3390/rs12203275 (2020).
Article ADS Google Scholar
Zhu, X., Chen, J., Gao, F., Chen, X. & Masek, J. G. An enhanced spatial and temporal adaptive reflectance fusion model for complex heterogeneous regions. Remote Sens. Environ. 114, 2610–2623, https://doi.org/10.1016/j.rse.2010.05.032 (2010).
Article ADS Google Scholar
Moreno-Martínez et al. Interpolation and gap filling of Landsat reflectance time series. In IGARSS 2018 – 2018 IEEE International Geoscience and Remote Sensing Symposium, 349–352 https://doi.org/10.48550/arXiv.2012.07987 (IEEE, 2018).
Chen, Y., Cao, R., Chen, J., Liu, L. & Matsushita, B. A practical approach to reconstruct high-quality Landsat NDVI time-series data by gap filling and the Savitzky–Golay filter. ISPRS J. Photogramm. Remote Sens. 180, 174–190, https://doi.org/10.1016/j.isprsjprs.2021.08.015 (2021).
Article ADS Google Scholar
Zhu, C., Lu, D., Victoria, D. & Dutra, L. V. Mapping fractional cropland distribution in Mato Grosso, Brazil using time series MODIS enhanced vegetation index and Landsat Thematic Mapper data. Remote Sens. 8, 22, https://doi.org/10.3390/rs8010022 (2016).
Article ADS Google Scholar
Dong, J. & Xiao, X. Evolution of regional to global paddy rice mapping methods: A review. ISPRS J. Photogramm. Remote Sens. 119, 214, https://doi.org/10.1016/j.isprsjprs.2016.05.010 (2016).
Article ADS Google Scholar
Xiao, X. et al. Observation of flooding and rice transplanting of paddy rice fields at the site to landscape scales in China using VEGETATION sensor data. Int. J. Remote Sens. 23, 3009–3022, https://doi.org/10.1080/01431160110107734 (2002).
Article Google Scholar
Dong, J. et al. Tracking the dynamics of paddy rice planting area in 1986–2010 through time series Landsat images and phenology-based algorithms. Remote Sens. Environ. 160, 99–113, https://doi.org/10.1016/j.rse.2015.01.004 (2015).
Article ADS Google Scholar
Qiu, B., Li, W., Tang, Z., Chen, C. & Qi, W. Mapping paddy rice areas based on vegetation phenology and surface moisture conditions. Ecol. Indic. 56, 79, https://doi.org/10.1016/j.ecolind.2015.03.039 (2015).
Article Google Scholar
Qiu, B., Lu, D., Tang, Z., Chen, C. & Zou, F. Automatic and adaptive paddy rice mapping using Landsat images: case study in Songnen Plain in Northeast China. Sci. Total Environ. 598, 581, https://doi.org/10.1016/j.scitotenv.2017.03.221 (2017).
Article ADS PubMed CAS Google Scholar
Wang, S., Azzari, G. & Lobell, D. B. Crop type mapping without field-level labels: Random forest transfer and unsupervised clustering techniques. Remote Sens. Environ. 222, 303, https://doi.org/10.1016/j.rse.2018.12.026 (2019).
Article ADS Google Scholar
Wu, M., Niu, Z., Wang, C., Wu, C. & Wang, L. Use of MODIS and Landsat time series data to generate high-resolution temporal synthetic Landsat data using a spatial and temporal reflectance fusion model. J. Appl. Remote Sens. 6, 63507, https://doi.org/10.1117/1.JRS.6.063507 (2012).
Article Google Scholar
Whitcraft, A. K., Vermote, E. F., Becker-Reshef, I. & Justice, C. O. Cloud cover throughout the agricultural growing season: Impacts on passive optical earth observations. Remote Sens. Environ. 156, 438, https://doi.org/10.1016/j.rse.2014.10.009 (2015).
Article ADS Google Scholar
Hu, Q. et al. A phenology-based spectral and temporal feature selection method for crop mapping from satellite time series. Int. J. Appl. Earth Obs. Geoinf. 80, 218–229, https://doi.org/10.1016/j.jag.2019.04.014 (2019).
Article ADS Google Scholar
Liu, J. et al. Spatiotemporal characteristics, patterns, and causes of land-use changes in China since the late 1980s. J. Geogr. Sci. 24, 195–210, https://doi.org/10.1007/s11442-014-1082-6 (2014).
Article Google Scholar
Liu, Z., Zhang, G. & Yang, P. Geographical Variation of Climate Change Impact on Rice Yield in the Rice-Cropping Areas of Northeast China during 1980–2008. Sustainability. 8, 670, https://doi.org/10.3390/su8070670 (2016).
Article Google Scholar
Qin, S., Chen, H., Tran, T. T. & Wang, H. Analysis of the Spatial Effect of Capital Misallocation on Agricultural Output—Taking the Main Grain Producing Areas in Northeast China as an Example. Sustainability. 14, 5782, https://doi.org/10.3390/su14105782 (2022).
Article Google Scholar
Ha, T. T. V., Fan, H. & Shuang, L. Climate change impact assessment on Northeast China’s grain production. Environ Sci Pollut Res. 28, 14508, https://doi.org/10.1007/s11356-020-11602-5 (2021).
Article Google Scholar
European Space Agency (ESA). Land Cover CCI Product User Guide, Version 2.0. Available at: maps.elie.ucl.ac.be/CCI/viewer/download/ESACCI-LC-Ph2-PUGv2_2.0.pdf European Space Agency (2017).
You, N. et al. The 10-m crop type maps in Northeast China during 2017–2019. Sci Data. 8, 41. figure https://doi.org/10.6084/m9.figshare.13090442.v1 (2021).
United States Geological Survey. Landsat 5 Surface Reflectance Tier 1 Collection 2 Level-2 Science Products. https://doi.org/10.5066/P9OGBGM6.
United States Geological Survey. Landsat 7 Surface Reflectance Tier 1 Collection 2 Level-2 Science Products. https://doi.org/10.5066/P975JDFX.
United States Geological Survey. Landsat 8 Surface Reflectance Tier 1 Collection 2 Level-2 Science Products. https://doi.org/10.5066/P9W1TO6E.
Gorelick, N. et al. Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote Sens. Environ. 202, 18–27, https://doi.org/10.1016/j.rse.2017.06.031 (2017).
Article ADS Google Scholar
Gao, F., Masek, J., Schwaller, M. & Hall, F. On the blending of the Landsat and MODIS surface reflectance: predicting daily Landsat surface reflectance. IEEE Trans. Geosci. Remote Sens. 44, 2207–2218, https://doi.org/10.1109/TGRS.2006.872081 (2006).
Article ADS Google Scholar
NASA LP DAAC. MOD09A1 Version 6.1, Terra Surface Reflectance 8-Day L3 Global 500 m. Distributed by NASA EOSDIS Land Processes DAAC. https://doi.org/10.5067/MODIS/MOD09A1.061.
Han, J. et al. Annual paddy rice planting area and cropping intensity datasets and their dynamics in the Asian monsoon region from 2000 to 2020. Agric. Syst. 200, 103437. Zenodo. https://doi.org/10.5281/zenodo.5557022 (2022).
Fu, X. et al. Mapping crop type in Northeast China during 2013–2021 using automatic sampling and tile-based image classification. Int. J. Appl. Earth Obs. Geoinf. 117, 103178. figure https://doi.org/10.6084/m9.figshare.20411424.v1 (2023).
Roy, D. P. et al. Landsat-8: Science and product vision for terrestrial global change research. Remote Sens. Environ. 145, 154–172, https://doi.org/10.1016/j.rse.2014.02.001 (2014).
Article ADS Google Scholar
Pasquarella, V. J., Holden, C. E., Kaufman, L. & Woodcock, C. E. From imagery to ecology: leveraging time series of all available Landsat observations to map and monitor ecosystem state and dynamics. Remote Sens Ecol Conserv. 2, 152, https://doi.org/10.1002/rse2.24 (2016).
Article Google Scholar
Haralick, R. M., Shanmugam, K. & Dinstein, I. H. Textural features for image classification. IEEE Trans. Syst. Man Cybern. SMC-3, 610–621, https://doi.org/10.1109/TSMC.1973.4309314 (1973).
Article Google Scholar
Sakamoto, T. et al. A crop phenology detection method using time-series MODIS data. Remote Sens. Environ. 96, 366–374, https://doi.org/10.1016/j.rse.2005.03.008 (2005).
Article ADS Google Scholar
Belgiu, M. & Dragut, L. Random forest in remote sensing: A review of applications and future directions. ISPRS J. Photogramm. Remote Sens. 114, 24–31, https://doi.org/10.1016/j.isprsjprs.2016.01.011 (2016).
Article ADS Google Scholar
Gislason, P. O., Benediktsson, J. A. & Sveinsson, J. R. Random Forests for land cover classification. Pattern Recognit. Lett. 27, 294–300, https://doi.org/10.1016/j.patrec.2005.08.011 (2006).
Article ADS Google Scholar
Hou, D. et al. A 30-m annual paddy rice dataset in Northeastern China during period 2000–2023. Figshare https://doi.org/10.6084/m9.figshare.28407710 (2025).
J, T. C., Red and Photographic Infrared l, lnear Combinations for Monitoring Vegetation. Remote Sens. Environ. 8 127 https://doi.org/10.1016/0034-4257(79)90013-0 (1979).
Huete, A. R., Liu, H. Q., Batchily, K. & van Leeuwen, W. A comparison of vegetation indices over a global set of TM images for EOS-MODIS. Remote Sens. Environ. 59, 440, https://doi.org/10.1016/S0034-4257(96)00112-5 (1997).
Article ADS Google Scholar
Jiang, Z., Huete, A. R., Didan, K. & Miura, T. Development of a two-band enhanced vegetation index without a blue band. Remote Sens. Environ. 112, 3833–3845, https://doi.org/10.1016/j.rse.2008.06.006 (2008).
Article ADS Google Scholar
Sakamoto, T. et al. Detecting temporal changes in the extent of annual flooding within the Cambodia and the Vietnamese Mekong Delta from MODIS time-series imagery. Remote Sens. Environ. 109, 295–313, https://doi.org/10.1016/j.rse.2007.01.011 (2007).
Article ADS Google Scholar
Zheng, B., Campbell, J. B. & de Beurs, K. M. Remote sensing of crop residue cover using multi-temporal Landsat imagery. Remote Sens. Environ. 117, 177–183, https://doi.org/10.1016/j.rse.2011.09.016 (2012).
Article ADS Google Scholar
Zhong, L., Hu, L. & Zhou, H. Deep learning based multi-temporal crop classification. Remote Sens. Environ. 221, 430, https://doi.org/10.1016/j.rse.2018.11.032 (2019).
Article ADS Google Scholar
Gitelson, A. A. K. Y. J. M. Use of a Green Channel in Remote Sensing of Global Vegetation from EOS-MODIS. Remote Sens. Environ. 58, 289, https://doi.org/10.1016/S0034-4257(96)00072-7 (1996).
Article ADS Google Scholar

Download references

Acknowledgements

This study was supported by the National Natural Science Foundation of China (Grant No. 42301316), the Natural Science Foundation of Heilongjiang Province of China (Grant No. LH2024D002), the Heilongjiang Province Provincial Undergraduate University “Excellent Young Teacher Basic Research Support Plan” (Grant No. YQJH2023214), the Northeast Agricultural University “Young Talents” Program (Grant No. 22QC25) and the funding project of China Association for Science and Technology (Grant No. XMSB20240927024).

Author information

These authors contributed equally: Dawei Hou, Jinwei Dong, Lixiao Yang.

Authors and Affiliations

School of Public Administration & Law, Northeast Agricultural University, Haerbin, 150030, China
Dawei Hou, Jing Chen, Guoming Du & Lixiao Yang
The Center of land remote-sensing engineering technology for Heilongjiang Province, Northeast Agricultural University, Haerbin, 150030, China
Dawei Hou, Guoming Du & Lixiao Yang
Institute of Geographic Sciences and natural Resources Research, Chinese Academy of Sciences, Beijing, 100101, China
Jinwei Dong
Yunnan University of Finance and Economics, Kunming, 650221, China
Chao Ji
College of Horticulture and Landscape Architecture, Northeast Agricultural University, Haerbin, 150030, China
Jingxuan Feng

Authors

Dawei Hou
View author publications
Search author on:PubMed Google Scholar
Jing Chen
View author publications
Search author on:PubMed Google Scholar
Jinwei Dong
View author publications
Search author on:PubMed Google Scholar
Chao Ji
View author publications
Search author on:PubMed Google Scholar
Jingxuan Feng
View author publications
Search author on:PubMed Google Scholar
Guoming Du
View author publications
Search author on:PubMed Google Scholar
Lixiao Yang
View author publications
Search author on:PubMed Google Scholar

Contributions

Dawei Hou: Writing–original draft, Writing–review & editing, Methodology, Data production, Conceptualization, Funding acquisition. Jing Chen: Data production, Writing–original draft, Methodology, Writing–review & editing. Jinwei Dong: Conceptualization, Conceptualization, Funding acquisition. Chao Ji: Data production, Methodology. Jingxuan Feng: Validation. Guoming Du: Validation, Conceptualization. Lixiao Yang: Writing–review & editing, Methodology, Validation, Project administration, Funding acquisition, Conceptualization.

Corresponding authors

Correspondence to Dawei Hou or Jinwei Dong.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Figures and Tables

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Reprints and permissions

About this article

Cite this article

Hou, D., Chen, J., Dong, J. et al. A 30-m annual paddy rice dataset in Northeastern China during period 2000–2023. Sci Data 12, 1355 (2025). https://doi.org/10.1038/s41597-025-05715-0

Download citation

Received: 24 March 2025
Accepted: 28 July 2025
Published: 05 August 2025
DOI: https://doi.org/10.1038/s41597-025-05715-0