Background & Summary

As a crucial staple crop, paddy rice is not only vital for food security worldwide, but also are significant sources of greenhouse gas1 (GHG) emissions, including contributed to ~22% of global agricultural methane (CH4) emissions and 11% of nitrous oxide emissions2. As one of the primary rice-producing regions, Northeastern China plain has been recognized as the “rice golden belt” as well as an important grain bowl for the country3. However, the pattern of rice cultivation in this region has undergone drastic evolution year by year due to growing food demands and policy incentives. Up to date, we are still lacking of comprehensive, long-term and high-resolution maps of rice fields in this region, which impedes our advanced understanding about relevant yield variation and environmental outcomes.

Benefiting from the increased accessibility of satellite imagery, advancements in algorithmic innovation as well as maturation of cloud computing infrastructure, the domain of crop mapping has experienced substantial progress over the past few decades. Nevertheless, selecting appropriate remote sensing data remains a critical issue due to the trade-offs among spatial, temporal, and spectral resolutions. Among such a large remote-sensing products nowadays, the Moderate Resolution Imaging Spectroradiometer (MODIS), featuring a high temporal resolution and wide coverage, has been extensively utilized in rice mapping based on the application of various vegetation indices (e.g. LSWI, EVI and NDVI4,5,6,7). However, MODIS with the spatial resolution ranging from 250 to 500 meters might be confronted with challenges to generate reliable rice maps in areas where land-use underwent frequent transition and field boundaries are small and fragmented, such as Brazil’s Mate Grosso state8 and Vietnam’s Mekong Delta9. Although some other images with higher spatial resolution (such as Sentinel-2A/2B, the Gaofen satellites, GeoEye, etc.) can capture fine field boundaries and detailed crop structure, they might also be limited by labor processing cost and data availability for long-term time coverage. In contrast, the Landsat program has provided continuous data archive spanning over four decades (1972-) and acceptable spatial resolution (ranging from 15 to 100 m), while simultaneously its consistent data quality and global coverage can ensure uniform measurement and large-scale spatial reference10. At present, the feasibility relying on Landsat images to identify land-cover or crop patterns had been well-validated. For instance, Gong (2013) et al. demonstrated Landsat’s utility in high-resolution land-cover mapping through generating the first global 30 m Land cover map based on Landsat images and 91,433 visually interpreted training samples11.Wang (2020) et al. proved Landsat’s effectiveness in large-scale crop classification by leveraging the now-public Landsat archive and cloud computing services to map corn and soybean at 30 m resolution across the US Midwest from 1999–201812. Moreover, the research conducted by Qiu (2020) et al. highlighted the significant potential of utilizing multi-year Landsat enhanced vegetation index (EVI) data for deriving annual phenological indicators of dual-season agricultural lands over multiple years13.

Despite the substantial superiorities, in a certain extent, the frequent cloud coverage and the failure of the scan line corrector (SLC) might weaken Landsat’s ability in precisely interpreting objects’ features and phenological trends. To solve this problem, researchers have developed a variety of interpolation and fusion methods to improve the accuracy of crop classification. For instance, Zhu (2010) et al. developed an improved spatiotemporal data fusion algorithm based on the STARFM model to stacking multi-source remote sensing data for generating high-resolution time series14. Moreno-Martinez (2018) et al. proposed an optimal interpolation method to generate a smooth and notched Landsat reflectance time series by fusing MODIS and Landsat data via Google Earth Engine platform15. Chen et al. proposed a GF-SG (Gap Filling - Savitzky-Golay) interpolation smoothing method combining MODIS and Landsat, which showed good results in constructing high-quality NDVI time series and significantly improved spatio-temporal continuity and data integrity16. In Brazil, Zhu (2016) et al. constructed the Seasonal Dynamic Index (SDI) of MODIS EVI time series and fused it with Landsat data to establish a regression model to estimate the proportion of cultivated land, achieving farmland mapping at high spatiotemporal resolution17. In general, these studies show the validity of the fusion method and time-series classification technology in improving the accuracy of crop mapping.

Extensive efforts have been made globally to map rice distribution and extract phenological information using remote sensing time series. Overall, paddy rice mapping methods have evolved through four main stages: Phase I relied on reflectance and basic image statistics; Phase II incorporated vegetation indices with enhanced statistical approaches; Phase III introduced time-series analysis based on vegetation indices or RADAR backscatter and Phase IV focused on identifying key phenological stages through remote sensing techniques18. In recent years, phenology-driven methods that exploit the distinct growth patterns of paddy rice have been increasingly adopted in rice mapping studies. For instance, Xiao (2002) et al. found that rice can be effectively extracted by leveraging the unique relationship between the Normalized Difference Vegetation Index (NDVI), Enhanced Vegetation Index (EVI), and Land Surface Water Index (LSWI) during the flooding and transplanting stages, in combination with the masking of non-farmland areas19. Dong (2015) et al. successfully traced the expansion trend of rice in one tile (path/row 113/27) of Northeastern China from 1986 to 2010 using time-series Landsat images and a phenological algorithm based on the unique spectral characteristics of rice at flood/transplanting stage20. Despite such progress, existing studies might also confront with some difficulties in accurately eliminating the commission error of non-rice types by identifying rice only through the time window of rice transplantation. Therefore, Qiu (2015) et al. proposed a novel index “the Ratio of Change amplitude of LSWI to EVI2 (RCIE)”, which quantifies the ratio of the change amplitude of the Land Surface Water Index (LSWI) to the two-band Enhanced Vegetation Index (EVI2) during the tillering to heading stage of rice growth. This index was designed to comprehensively capture both vegetation phenology and surface water dynamics, and served as the core of the Comprehensive Consideration of Vegetation Phenology and Surface Water Change (CCVS) method21. Based on the 8-day composite time series data set and the Landsat time series data set with a relatively cloud-free interval of 16 days, the rice location algorithm combining vegetation phenology and surface water change (CCVS) was effectively applied21,22.

Although significant efforts have been made, producing consistent annual rice maps for the entire Northeastern China is still difficult, particularly when retrospectively analyzing long-term historical data. First and foremost, the large spatial extent introduces significant intra-class variability in crop spectra and phenology driven by diverse climates, varieties, and cultivation methods, which may limit classification accuracy23. Furthermore, the broad spatial and temporal coverage limits not only the availability of valid satellite observations due to differences in orbital paths, acquisition dates, and cloud contamination24, but also the generation of timely intensive and independent samples for training and validation25. Moreover, poor understanding of feature effectiveness may lead to the omission of key features or inclusion of irrelevant ones, both negatively impacting classification accuracy26.

To deal with the challenges mentioned above, we developed a framework based on the combination of multiple revised pathways to generate annual rice maps in Northeastern China. To alleviate the potentially negative impacts from crops’ spectral and phenological variability in different location, we first generated regionally independent classifiers by considering Agro-climate zones (ACZs) and cropping systems. After that, regular time series of Landsat images for each Agri-climate zone was produced based on the linear interpolation and Whittaker smoothing algorithms to obtain homogeneous time series and fill the data gaps. For improving the pertinence of sample points, we applied an algorithm based on vegetation phenology and surface water change (CCVS) to extract rice samples. Further, we developed a sophisticated feature selection procedure to select optimal features from the huge size of L5\7\8-based feature candidates, to avoid the possible Hughes effect (also known as the “curse of the dimensionality”) and save computing time.

The overarching goal of our study was to generate annual maps of rice fields in the Northeastern China from 2000 to 2023 at 30-m spatial resolution using (1) Agro-climate zone-specific random forest classifiers, (2) interpolated and smoothing Landsat5\7\8 time-series imagery, (3) rice samples extracted by CCVS methods and (4) optimized features from spectral-, temporal- and textural- information. Our consistent rice maps can be utilized to monitor rice dynamics and assess the potentially environmental effects driven by drastic land-use/land-cover change.

Methods

Study area

Northeastern China, extending roughly from 38°N, 115°E to 54°N, 135°E, encompasses the Heilongjiang province, the Jilin province, the Liaoning province as well as the four municipalities in eastern Inner Mongolia (Fig. 1). Northeastern China covered an area of 1.2 million km2, which is about 13% of China’s territory. According to the “the Regionalization of Agro-climate of China”27, Northeastern China spans six Agro-climate zones (ACZs), which are the North Greater Khingan (GK), the Sanjiang Plain (SP), the Lesser Khingan and Changbai Mountains (LK), the Songliao Plain (SL), the Liaodong Peninsula (LP), and East Inner Mongolia (IM), respectively. Precipitation usually ranges from 500 to 800 mm, which is mainly concentrated in July and August. The number of frost-free days varies between 140 and 170 days. The unique geographical and climatic conditions in Northeastern China, especially the long sunshine time in summer, abundant water resources and fertile black soil, provide favorable conditions for rice production28. As the pillar of China’s food security29, the importance of Northeastern China in rice production cannot be underestimated. Furthermore, driven by climate change and advances in agricultural technology, the cultivation boundary has gradually shifted northward, enhancing the importance of Northeastern China, especially its northern region, in the country’s agricultural production30.

Fig. 1
figure 1

The location (a), topographical characteristic (c) and six Agro-climate zones (ACZs) of Northeastern China (b). The Landsat tiles covered Northeastern China were showed in subplot (c). The total number of images used from 2000 to 2023 and the number of valid images were shown in subplot (d).

Overview of the rice classification

We applied the Random Forest (RF) algorithm and integrated multiple relevant features to generate annual maps of paddy rice based on. Google Earth Engine Platform (Fig. 2). In each of the six ACZs, the annual rice map was independently generated following four steps: (1) The Landsat 5/7/8 Surface Reflectance imagery was synthesized, interpolated and smoothed to produce classification-ready imagery used as input for the rice classifier. The 300 m land use dataset from the ESA Climate Change Initiative (CCI)31 was used as annual cropland masks to exclude the non-crop pixels. (2) The CCVS method was employed to identify rice sample points and exclude non-rice sample points. In this study, the classification was carried out within the cropland range, meaning that only crop-type samples were needed. Our existing crop products with 10-m resolution from 2017 to 201932 were utilized to generate random sample points for the same period. The RCIE dataset was subsequently developed using the CCVS method to distinguish between rice and non-rice sample points. Following accuracy verification, the CCVS method was extended to other years, and the sample points were expanded by integrating existing datasets with field sampling data. High-resolution image interpretation was used to eliminate clear errors. (3) Feature collections—including vegetation index, texture harmonic regression features as well as the RCIE dataset—were used to train a crop classifier (rice vs. non-rice) based on the RF algorithm. This classifier was applied to the pre-processed Landsat imagery. To reduce the interference of small and medium-sized patches in the classification results and enhance the spatial coherence of the classification map, post-processing was carried out on the preliminary classification results, encompassing majority voting based on super-pixel segmentation as well as connected pixel filtering and smoothing operations. (4) Accuracy assessment and comparative analysis of the rice mapping dataset. A Random Forest classifier was trained using 70% of the samples, while the remaining 30% were used to generate confusion matrices and compute accuracy metrics (OA, UA, PA, and Kappa). The classification results (NECAR) were further validated through comparison with municipal-level rice planting area statistics derived from agricultural statistical yearbooks (2000–2023), and statistical fit (R²) was used to quantify consistency. In addition, annually ground-truth reference samples were constructed via visual interpretation of Landsat and Sentinel-2 imagery, supported by Google Earth high-resolution images and field survey records collected during the 2017–2021 growing seasons. These samples were also used to evaluate the performance of three widely used rice products via identical accuracy assessment procedures. For each dataset, confusion matrices and associated accuracy metrics were calculated using the same validation samples. Additionally, municipal-level rice area estimates from these datasets were compared with official statistics to assess their spatial consistency. Cross-comparisons between NECAR and other rice datasets were also performed at the municipal level to evaluate spatial agreement and potential discrepancies.

Fig. 2
figure 2

Workflow of the rice mapping.

Landsat series images and pre-processing

We obtained Landsat Surface Reflectance Tier 1 imagery for the period 2000–2023 from Landsat 533, 734 and 835 collections via the Google Earth Engine platform36. Following the acquisition of annually valid Landsat observations, we processed the time-series data using a standardized three-step procedure. Specifically, we first generated 16-day composite images by computing the median of valid surface reflectance observations within each interval. Subsequently, linear interpolation and the Whittaker smoother were applied to ensure temporal coverage, suppress residual noise and enhance the phenological signal. However, we might confront with challenges in data continuity gaps particularly for 2012 due to the temporal validity of Landsat 5 and Landsat 8 as well as the scan-line corrector (SLC) failure of Landsat 7 To mitigate these issues and ensure temporal continuity, we applied a data fusion approach based on the Spatial and Temporal Adaptive Reflectance Fusion Model (STARFM)37. This method integrated 8-day MODIS surface reflectance data (MOD09A1)38 with a cloud-free Landsat reference image acquired in July 2011 to reconstruct a synthetic 30 m time series for the 2012 growing season (Fig. 3). As a consequence, we yielded a consistent, gap-filled and cloud-free 16-day composite time-series images across all years, including the synthetic 2012 series reconstructed via STARFM (See the annual valid Landsat observations in Supplementary Figure S1).

Fig. 3
figure 3

Reconstruction of NDVI imagery and time series in 2012 using STARFM and Whittaker smoothing (April 1st, 2012 as an example). (a) Original Landsat 7 NDVI image with striping and data gaps; (b) NDVI image reconstructed through STARFM fusion using MODIS time series and a cloud-free Landsat reference image; (c) Smoothed NDVI image after applying the Whittaker smoother; (d,e) NDVI time series of sample point A B for the whole year of 2012, respectively.

CCVS method to identify rice samples

In order to accurately identify rice sample points, we adopted the comprehensive consideration method based on vegetation phenology and surface water change (CCVS) proposed by Qiu (2015) et al.21. This method identifies rice objects by considering the characteristics of LSWI and EVI2 ranging ratios from the transplanting stage to the heading stage of rice (RCIE). Due to post-transplanting flooding, paddy fields exhibit sustained high LSWI values from transplanting to heading stages, while EVI2 values rise sharply during this same period. In contrast, upland crops typically show lower LSWI values and a similarly strong EVI2 increase (Fig. 4).

Fig. 4
figure 4

The 16-day composite time series of EVI2, LSWI for (a) paddy rice and (b) maize. Note: a1 and a2 represent the images of rice at transplanting and heading stage. b1 and b2 represent maize images at the same stage.

Based on these differences, we calculated the Ratio of Change in LSWI to EVI2 (RCIE) as follows:

$${\rm{RCIE}}=\frac{\triangle {LSWI}}{\triangle {EVI}2}$$
(1)

and applied a threshold-based classification rule to distinguish paddy rice:

$$\text{if}\left(\,,{{LSWI}}_{\min } > {\theta }_{1}\;{and\; RCIE} < {\theta }_{2}\right){rice;\; else\; other}$$
(2)

where RCIE was the range ratio index, representing the range ratio of LSWI and EVI2 from transplanting stage to heading stage; LSWImin was the minimum observed LSWI value of rice from transplanting stage to heading stage; the constants θ1 and θ2 represent thresholds for water saturation and spectral balance, respectively. Following Qiu et al.21,22, default values of θ1 = 0.1 and θ2 = 0.7 were adopted as initial references. In this study, we further optimized the constants θ2 (RCIE < 0.7) for each agro-climatic zone to enhance regional classification accuracy. Specifically, the Jeffries–Matusita (JM) distance between rice and non-rice samples was used as a separability criterion. The constants θ2 was iteratively adjusted from 0.4 to 1.0 (step = 0.05), and the value that yielded the maximum JM distance was selected as the optimal threshold for each zone (Please see the detailed methods illustration, Table S1 and Figures S2S9 in Supplementary Materials).

Based on the abovementioned principles, we constructed a 16-day composite EVI2 and LSWI time series based on cloud-free observations from Landsat imagery and processed the time series using Whittaker smoothing to form a smoothed time series. For each suspected rice sample point, a 50 m buffer zone was generated, and rice and non-rice were distinguished by analyzing the characteristics of the time series within the buffer zone (Fig. 5). To verify the accuracy of the CCVS method, we conducted seven consecutive years of fieldwork (2017–2023). Specifically, the field study was conducted annually in early April and mid-September. We systematically selected 30% of the rice and other crop samples. The locations and categories were recorded using a mobile GIS device (an iPad equipped with the OvitalMap software), and the accuracy of the sample points was validated through on-site visits.

Fig. 5
figure 5

The variation of smoothed EVI2 and LSWI time series curves over a 16-day time step during the key phenological stages of rice.

Training and validation data

We first used a stepwise screening method to collect ground samples from existing crop classification products in Northeastern China: (1) the annual rice planting area dataset for the Asian monsoon region (2000–2020) (PRA)39, (2) the 30 m resolution crop maps from 2013 to 2021(NETCs-30m)40, and (3) cropping mapping datasets from 2017 to 2019 (NETCs-10m)32. In general, all of above three datasets had satisfactory accuracy and similarity, and thus can be used to acquire training/validation samples. To avoid the resolution differences and products errors, we conducted double-validation for initially selected samples using CCVS and field surveys.

Take the generation of sample points in 2019 as an example: we first separately generated 20,000 random points within our study area from the PRA, NETCs-30m and NETCs-10m, and only retained those points with consistent crop types in the three datasets as the basic samples. Secondly, the utilization of an optimized CCVS method for validation ensures the retention of valid data points that accurately reflect either rice or non-rice characteristics. Then, all the valid samples were visually checked using high-resolution images in Google Earth and two S2 RGB composites, including the RGB composite (R: SWIR1, G: NIR, B: Red) from the mid-April to mid-June and the RGB composite (R: NIR, G: SWIR1, B: SWIR2) during early-July to late-August. The samples with obvious errors (such as incorrectly labeled the nature vegetation as crops) were excluded. The samples lied in the roads or the field boundaries were also removed. We ultimately obtained a comprehensive sample set encompassing rice and other crops. Finally, we got a number of ground samples in the order of 10,755 in 2019 (Fig. 6).

Fig. 6
figure 6

The distribution of the ground truth samples in 2019 in (a), Note: The red dots in the figure represent the rice after our field verification, with a total of 1519. The number of the annual samples from 2000 to 2023 were displayed in subplot (b).

Based on existing products, we formed samples for a limited number of years, and extended the sample set by CCVS method, so that the finite year samples can converted to use samples of other years. In each year, the samples were randomly and equally divided into two parts: one part (70%) used for training and classification and the other part used for accuracy evaluation and validation (30%).

Features selection

To account for seasonal spectral changes, seven 16-day spectral indicators were added to the classifier according to the crop calendar during the growing period (DOY 90–300) in Northeastern China (Table 1), including: NDVI (Normalized Difference Vegetation Index), EVI (Enhanced Vegetation Index), EVI2 (Two-band Enhanced Vegetation Index), LSWI (Land Surface Water Index), NDSVI (Normalized Difference Senescent Vegetation Index), NDTI (Normalized Difference Tillage Index), GCVI (Green Chlorophyll Vegetation Index). Using 16-day composite images has been shown to improve time series processing by enhancing spatiotemporal resolution and aiding crop classification41,42. Furthermore, to mitigate the influence of clouds, cloud shadows and snow, we calculated the 10th, 25th, 50th, 75th, 90th percentiles and the interquartile range (25th–75th) of each pixel. This method addressed data gaps by incorporating spectral-temporal metrics and enriching time-series images with temporal context41. To enhance the discrimination of spectrally similar crops, texture metrics derived from NDVI and EVI were also computed using the gray-level co-occurrence matrix (GLCM) following previous research43.

Table 1 The formulation of the seven indices and RCIE used in the study.

Moreover, the time-series image harmonic regression features of the three vegetation (NDVI, EVI, LSWI) indices were extracted to characterize the phenological stages of paddy rice, such as transplanting, tillering, heading, and ripening. These harmonic features help capture the seasonal periodicity and subtle spectral variations of rice growth, while suppressing noise caused by cloud contamination and missing observations44. We conducted the harmonic regression (discrete Fourier transform) on the original valid observations to extract the temporal characteristics of the time series curves.

$$Y\left(t\right)={\beta }_{i,0}+{\beta }_{i,1}\;\cos \;\left(3\omega t\right)+{\beta }_{i,2}\;\sin \;\left(3\omega t\right)+{\beta }_{i,3}\;\cos \;\left(6\omega t\right)+{\beta }_{i,4}\;\sin \;\left(6\omega t\right)$$
(3)

Where Y(t) is the value of the \(i\)-th vegetation index (e.g., NDVI, EVI, LSWI) at time \(i\). \({\beta }_{i,0}\) is the constant term for the \(i\)-th vegetation index. \({\beta }_{i,1}\) and \({\beta }_{i,2}\) are the coefficients for the cosine and sine terms at frequency \(3\omega \) for the \(i\)-th vegetation index. \({\beta }_{i,3}\) and \({\beta }_{i,4}\) are the coefficients for the cosine and sine terms at frequency \(6\omega \) for the \(i\)-th vegetation index. \(\omega \) is the angular frequency of the periodic component. All the features used for classification can be seen in Table 2.

Table 2 The summary of the feature candidates for rice classification.

Random forest algorithm

Random Forest (RF), a decision tree–based ensemble method, is widely used in remote sensing classification due to its enhanced generalization through random sampling of data and features45. RF is composed of multiple decision trees, each built independently using bootstrap samples and random feature subsets, without pruning or dependence on other trees. Majority voting among trees determines the final classification outcome, enhancing model stability and mitigating overfitting. RF has been proven to deliver higher accuracy and robustness compared to conventional methods like maximum likelihood, standalone decision trees, and single-layer neural networks. In this study, we adjusted two key parameters of the RF algorithm in GEE when training rice classifiers: (1) number Of Trees: number of trees determines how many binary CART trees are used in the model. While a higher number can slightly improve accuracy, it also increases computational cost linearly. The number Of Trees in our study was set to 200 following previous work46. (2) min Leaf Population: the minimum number of samples required at a leaf node. We set it to 10 to control tree depth and prevent overfitting. The other parameters—Variables Per Split (default: square root of feature count), Bag Fraction (default: 0.5), Out-of-Bag Mode, and Seed—were kept at their default settings in GEE.

Data Records

The dataset provides annual paddy rice distribution maps for Northeastern China from 2000 to 2023 at a nominal spatial resolution of 30 meters. The dataset is stored in GeoTIFF format, with pixel values coded as 1 for rice and 0 for non-rice areas. The spatial extent of the dataset ranges from 38.7°N to 53.8°N latitude and 115.5°E to 135.0°E longitude. All data layers can be viewed and downloaded via Figshare https://doi.org/10.6084/m9.figshare.2840771047, and are ready for use in GIS software or remote sensing analysis platforms.

Technical Validation

Accuracy assessment with testing samples

We used the remain 30% samples as validation by generating the confusion matrix which the Producer Accuracy (PA), User Accuracy (UA), producer accuracies (PA) and the Kappa coefficient for the annual rice maps were used as validation indicators. Our results indicated that the Overall Accuracy (OA) for all classified results are higher than 0.9, of which the average OA is 0.95 and the highest value emerged in 2007 (0.97). Moreover, the Kappa coefficient varies from 0.89 to 0.95, with the highest value (0.95) in 2007 and the lowest value (0.89) in 2012, with the average at 0.92 (Fig. 7). In general, the high accuracies indicated our classification algorithm performs well and the annual rice maps were reliable.

Fig. 7
figure 7

The overall accuracy (OA), user accuracies (UA), producer accuracies (PA), and the Kappa coefficient (Kappa) for annual rice classification from 2000 to 2023.

Comparison with agricultural statistics

Since the 2023 Statistical Yearbook has not been updated for the time being, we calculated the annual paddy rice areas using the pixel number approach and then compared with the annual statistical data at the municipal level. Upon comparing the mapped paddy rice areas with statistical data, it was observed that the mapped paddy rice areas had relatively high correlations with the statistical data for 2000–2022, indicated by R2 values between 0.7 and 0.99 at the municipal level (Fig. 8). (See Supplementary Figure S10 for annual comparisons between our rice map and municipal-level statistical yearbook data)

Fig. 8
figure 8

Temporal comparison of rice classification accuracy from different products against statistical yearbook data (2000–2022). The figure shows the coefficient of determination (R2) between annual rice planting areas derived from four products (NECAR, PRA, NETCs-30m, and NETCs-10m) and city statistical yearbook data in Northeastern China from 2000 to 2022.

Validation with independent ground truth samples and product comparison

To assess spatial consistency and reliability, we compared our rice classification product (NECAR) with three existing datasets with varying resolutions and temporal coverages: PRA (500 m, 2000–2020)39, NETCs-30m (30 m, 2013–2021)40, and NETCs-10m (10 m, 2017–2019)32. All datasets were resampled or aggregated to 30 m resolution and compared at the municipal level against official statistical yearbook data and among each other.

To evaluate the reliability of the datasets, we first compared the estimated paddy rice area from PRA, NETCs-30m, and NETCs-10m with the official municipal statistical yearbook data. To ensure spatial consistency, all products were resampled to 30 m resolution using the nearest neighbor method. The rice pixels from each product were then aggregated at the municipal level for comparison (Fig. 8). The results revealed moderate to strong correlations (R² > 0.4) between satellite-derived estimates and official statistics, suggesting that all three products maintain a reasonable level of spatial agreement with ground-reported rice areas. Furthermore, a municipal-level comparison was conducted between our rice mapping product (NECAR) and three reference products (PRA, NETCs-30m, and NETCs-10m) (Fig. 9). The correlation coefficients between NECAR and PRA ranged from 0.78 to 0.98 across multiple years, indicating the temporal robustness and reliability. In comparison with NETCs-30m, which shares the same spatial resolution (30 m), the correlations consistently exceeded 0.97, demonstrating a high level of agreement and spatial consistency. For NETCs-10m, the correlation coefficients for the selected years were 0.96, 0.98, and 0.98, further confirming the accuracy and reliability of NECAR in capturing paddy rice distribution patterns.

Fig. 9
figure 9

Municipal-level validation and accuracy assessment of the rice mapping products. (a) Correlation coefficients between NECAR and three reference products (PRA, NETCs-30m, and NETCs-10m) at the municipal scale from 2000 to 2021, reflecting temporal agreement and spatial consistency. (b) Annual Overall Accuracy (OA) of NECAR and reference PRA (500 m), NETCs-30m, and NETCs-10m.

To comprehensively evaluate the classification accuracy of rice mapping products, we also constructed annual ground reference datasets from 2000 to 2021, which were consistent with the temporal coverage of existing products (The annual distribution and total number of truly ground sample points were shown in the Supplement Figure S11). These reference samples were generated through visual interpretation of remote sensing imagery, supported by high-resolution Google Earth images, and validated using long-term ground survey data collected during the period from 2017 to 2021. Based on these annually updated ground samples, we assessed the classification performance of NECAR as well as three existing products with different spatial resolutions. In the main text, we present the Overall Accuracy (OA) for each product and year (Fig. 9), while additional accuracy metrics—including the Kappa coefficient, User’s Accuracy (UA), and Producer’s Accuracy (PA)—are provided in the Supplement Table S2. NECAR consistently outperformed or matched the classification accuracy of the three existing rice mapping products across all evaluated years. Specifically, NECAR exhibited the highest Overall Accuracy (OA) in most years, indicating its robustness and generalization capability over time. In terms of spatial resolution, NETCs-10m showed advantages from 2017 to 2019 due to its finer granularity, but NECAR maintained a more stable performance across the fulltime span. The Kappa coefficients, UA, and PA metrics (Table S2) further confirmed NECAR’s superior balance between commission and omission errors, highlighting its reliability in capturing both rice presence and absence. These findings collectively validate NECAR as a highly accurate and temporally consistent product for annual paddy rice mapping in the study area.

We also compared five locations in 2004, 2010, 2013, 2017 and 2019 to show the differences among the four datasets (Fig. 10). Overall, our rice patch is more elaborate and captures subtler differences in contrast to PRA. Furthermore, our map is more in line with the rice distribution extracted from NETCs-30m and is even more precise. In comparison with NETCs-10m, it could potentially be slightly coarser due to the dissimilar resolutions of Landsat and Sentinel-2. The details of our 2022 rice classification products were shown in Fig. 11.

Fig. 10
figure 10

Visual comparison of four rice datasets (NECAR, PRA, NETCs-30m and NETCs-10m) with Google Earth high-resolution imagery across five locations (Columns 1–5). The top row shows the location and high-resolution reference imagery. The subsequent four rows correspond to a specific dataset. Blank cells represent unavailable data.

Fig. 11
figure 11

The spatial details of rice map in 2022.