Abstract
This study introduces the crop and land cover land use (CLCLU) dataset, a 30 m resolution product providing annual maps of CLCLU across the transnational Middle Rio Grande (MRG) region, spanning both the U.S. and Mexico from 1994 to 2024. The model was trained using the Cropland Data Layer (CDL) on the U.S. side. Dual-month (July and December) Landsat composites and a semantic segmentation model, MANet with a ResNeXt-101 encoder, were used under four training strategies to address sensor and temporal variability. This model architecture was chosen for its intrinsic ability to capture detailed spatial patterns and contextual dependencies through its attention-based design and ResNeXt-101 encoder, which demonstrated strong performance, particularly in generalizing across data-scarce regions in Mexico. The dataset achieved 97.10% overall accuracy and 78.85% mean Intersection over Union (mIoU) when validated on a held-out CDL subset. Validation against NLCD and MCD12Q1-UMD confirmed high agreement. Data availability differences, minimal ground truth on the Mexican side, and cloud-related artifacts in early years led to some misclassification.
Background & Summary
Land cover (LC) is a fundamental element of the Earth system, linking the biosphere, atmosphere, and hydrosphere through its influence on energy exchange, water balance, and carbon cycling1,2,3,4,5. Organized into hierarchical categories, LC supports diverse habitats and plays a key role in environmental and ecological modeling6,7,8,9. Over recent decades, rapid urban expansion and intensified human activity have significantly altered land cover, placing growing pressure on ecosystems and affecting water resources, air quality, food systems, and biodiversity6,10,11,12,13. As such, accurate and timely information on LC and Land Cover and Land Use Changes (LCLUC) is critical for understanding Earth system dynamics, supporting sustainable development, ensuring food and water security, and managing natural resources effectively14,15,16.
Accurate LC data is especially important in transboundary regions, where ecosystems and natural resources often extend beyond political boundaries. Coordinated management in such areas depends on consistent and reliable datasets17,18,19. However, imbalances in data availability, such as along the U.S.–Mexico border, can hinder joint efforts to monitor and manage shared environments20,21. High-resolution land cover maps support cross-border collaboration, inform sustainable resource management, and enhance the resilience of both ecological systems and human communities22,23. Satellite remote sensing has significantly advanced LC monitoring by providing long-term, high-resolution Earth observation (EO) data24,25. Numerous studies have leveraged satellite imagery for global LC mapping. For instance, MODIS-based products like MCD12Q126 and GLASS datasets27 offer long time series but at coarse resolutions (500 m and 5 km), limiting their utility for detailed analyses. Efforts like the ESA Climate Change Initiative (300 m) have improved accuracy using multi-source data and machine learning, yet still lack fine spatial detail28. The emergence of freely available high-resolution EO data, such as Landsat and Sentinel-2, has enabled finer-scale mapping. Landmark contributions include the FROM_GLC (Finer Resolution Observation and Monitoring of Global Land Cover) 30 m global LC map and its 10 m successor, both utilizing extensive training data and advanced classification methods29,30. Several studies have developed integrated frameworks for land cover mapping that combine multiple classifiers and multisource remotely sensed data. For instance, the LCMM (Land-Cover Mapping with Multiple Classifiers and Multisource data) framework utilizes monthly time-series imagery from sensors such as Landsat and MODIS to enhance classification performance across heterogeneous landscapes31. 
Other approaches have employed comprehensive geographical partitioning in conjunction with hierarchical classification decision trees and benchmark-based change detection methods to ensure both spatial consistency and temporal reliability in long-term land cover mapping32. Despite recent advancements in LC mapping, many existing datasets classify land cover into broad categories such as forest, shrubland, developed areas, or agricultural fields, which limits their utility for detailed analysis33. More granular information on specific agricultural practices is essential in regions like the Middle Rio Grande (MRG), where effective management of interstate shared resources is critical20,23. The Cropland Data Layer (CDL) is currently the only publicly available dataset that offers detailed insights into crop-specific land use in the US34. However, its temporal coverage, beginning in 2008, and spatial limitation to the conterminous United States (CONUS) restrict its applicability in transboundary regions, where agricultural activities often rely on shared surface water basins and transboundary groundwater aquifers.
Recent advancements in deep learning architectures, particularly their ability to leverage transfer learning, have substantially enhanced the performance of LCLUC mapping35,36. A wide range of image segmentation models have been evaluated on benchmark remote sensing datasets such as Gaofen-2, ISPRS Urban Segmentation, DeepSat (SAT-4), EuroSAT, and the Munich dataset37,38,39,40,41,42,43. These high-resolution and diverse datasets provide robust platforms for assessing segmentation accuracy in EO applications. Leveraging these datasets, deep learning models have achieved significant improvements in classification precision, thereby advancing the effectiveness of LCLUC monitoring.
For example, U-Net has been employed to differentiate between bare and cultivated fields using Sentinel-2 imagery achieving an overall accuracy (OA) of 90%44, while U-Net++ has been evaluated using varying patch sizes and encoder backbones for crop delineation tasks. The model achieved OA ranging from 96.86% to 97.72% and F1 scores from 71.29% to 80.75% over Sentinel-2 imagery, outperforming its application on Gaofen imagery, where OA ranged from 75.34% to 97.72% and F1 scores from 54.89% to 73.25%44,45. Integrating U-Net within a generative adversarial network (GAN) framework has further improved mapping accuracy resulting in a mIoU of 58.62%, which outperformed a standard CNN with mIoU: 49.28%46. Comparative studies involving architectures such as U-Net, DeepLabv3+, and SegNet applied to Landsat 8 data have demonstrated their capability to classify LC into key categories including urban, agricultural, forest, and water classes, where U-Net achieved the highest mIoU of 55.09% and OA of 81.93%. In the same study, traditional machine learning classifiers were also evaluated, such as Maximum Likelihood Estimation (mIoU: 21.23%, OA: 68.12%), Random Forest (mIoU: 26.14%, OA: 74.69%) and Support Vector Machines (mIoU: 24.67%, OA: 72.86%), which were significantly outperformed by deep learning approaches47. These investigations highlight the growing utility of deep learning for high-resolution, scalable LC classification5,45,48.
More recently, the effectiveness of multitemporal input imagery and different semantic segmentation architectures has been investigated for the specific task of Crop and Land Cover Land Use (CLCLU) mapping33. In addition, related research explored the utility of multi-sensor imagery, including Landsat 8 optical data, Synthetic Aperture Radar (SAR), and Digital Elevation Models (DEMs), to enhance the accuracy and robustness of CLCLU classification49. In the aforementioned studies, models were trained using data from the United States, and transfer learning was applied to predict land cover patterns in Mexico, demonstrating the potential for cross-border generalization and the scalability of deep learning approaches in data-limited regions.
However, there remains a critical need for a high-resolution, long-term historical CLCLU dataset for the transboundary MRG region, one that maps major LC types along with predominant agricultural practices over multiple decades. To address this gap, we utilized multitemporal Landsat imagery (from Landsat 5 and 8), the only EO sensor providing consistent high-resolution data since 1984, to generate annual CLCLU maps50. To the best of our knowledge, this is the first dataset that captures agricultural practices in the MRG region across both sides of the U.S.–Mexico border. For model training and validation, we used the annual CDL for the U.S. side of the MRG (2008–2019) as ground truth and developed a semantic segmentation model using the Multi-Attention Network (MANet) architecture with a ResNeXt-101 encoder34,51,52. The trained model was then applied to predict CLCLU patterns on the Mexican side of the border. The spatial continuity and proximity of agricultural fields across the border enabled the use of transfer learning, under the assumption that agricultural practices exhibit similar patterns on both sides of the border.
The resulting CLCLU product represents the first binational dataset encompassing both the U.S. and Mexican regions of the Middle Rio Grande (MRG), providing annual land cover maps from 1994 to 2024 that capture key crop types and agricultural practices at a 30-meter spatial resolution. The generated maps were validated against the National Land Cover Database (NLCD) and MODIS MCD12Q1-UMD products for core LC classes, including water bodies, developed areas, and croplands53,54. Additionally, we performed a temporal consistency analysis between our product and the CDL over overlapping years. To the best of our knowledge, this is the only data-driven product that relies exclusively on optical EOs to provide a scalable, long-term CLCLU monitoring solution. The insights offered by this dataset are especially valuable given the socio-environmental significance of agriculture in the U.S.–Mexico borderlands, which supports rural livelihoods, food production, and shared water resources. The region faces mounting challenges, including diminishing water availability, declining water quality, and increasing economic and environmental pressures55,56,57. This product can help track long-term trends in land use and cropping patterns, identify key drivers of change, and support early warning efforts for emerging risks such as severe climatic events, prolonged droughts, and yield reductions that may impact regional agricultural sustainability.
Methods
This section outlines the processing workflow employed in this study to generate the annual CLCLU dataset for the MRG. The workflow encompasses data preprocessing and analysis, training and validation dataset construction, model training procedures, accuracy assessment, prediction, and inter-comparison with benchmark products (Fig. 1). Each step is described in detail in the following sections.
Study area
This study investigates CLCLU in the Middle Rio Grande Basin (36,988 km²), extending from San Antonio, NM, to Presidio, TX, and Ojinaga, Chihuahua (Fig. 2). The Rio Grande, originating in Colorado’s San Juan Mountains, is a vital hydrological resource governed by the international treaties of 1906 and 194458,59. The MRG region features a variety of land cover types, ranging from urban areas and crop fields to natural landscapes. The CLCLU dataset aims to support adaptive water management strategies in this ecologically and socio-economically significant transboundary watershed, addressing challenges such as climate-driven alterations in water availability, increasing anthropogenic demand, and declining river discharge.
Satellite data source
The Landsat mission provides valuable EO data at a 30-meter spatial resolution since 1984, making it well-suited for large-scale LC monitoring. This study uses atmospherically corrected surface reflectance and land surface temperature (LST) datasets derived from both Landsat 5 Thematic Mapper (TM) (https://developers.google.com/earth-engine/datasets/catalog/LANDSAT_LT05_C02_T1_L2#description) and Landsat 8 Operational Land Imager/Thermal Infrared Sensor (OLI/TIRS) imagery (https://developers.google.com/earth-engine/datasets/catalog/LANDSAT_LC08_C02_T1_L2#description). The selected bands include four visible and near-infrared (VNIR) bands, two short-wave infrared (SWIR) bands, and one thermal infrared (TIR) band, all standardized to a 30-meter resolution through orthorectification. These spectral bands were chosen due to their established relevance in distinguishing key land cover types and crop phenological stages: VNIR and SWIR for capturing vegetation structure and soil properties and conditions, and TIR for representing thermal conditions associated with irrigation and cropping intensity. Additionally, NDVI was included as a derived feature to enhance sensitivity to green vegetation cover60,61,62,63,64. The Landsat Collection 2 Level-2 Tier 1 products used here incorporate improved geometric and radiometric corrections, removal of extreme latitude DEM constraints, and adjustments for TIRS anomalies, thereby increasing data fidelity, especially at high latitudes50,65,66. Additional processing details are available in the Data Format Control Books67.
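As an illustration of the derived NDVI feature, the following minimal sketch applies the standard definition, NDVI = (NIR − Red) / (NIR + Red), to two reflectance arrays. The function name `ndvi`, the `eps` guard, and the toy values are illustrative assumptions and are not taken from the authors' code. In the Collection 2 Level-2 products, the NIR and red bands are SR_B5 and SR_B4 for Landsat 8 and SR_B4 and SR_B3 for Landsat 5.

```python
import numpy as np

def ndvi(nir, red, eps=1e-6):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    nir = np.asarray(nir, dtype=np.float64)
    red = np.asarray(red, dtype=np.float64)
    denom = nir + red
    out = np.zeros_like(denom)
    # Leave NDVI at 0 where both bands are (near) zero to avoid division by zero
    valid = np.abs(denom) > eps
    out[valid] = (nir[valid] - red[valid]) / denom[valid]
    return out

# Toy surface-reflectance values (hypothetical, for illustration only)
nir_demo = np.array([[0.45, 0.30], [0.05, 0.0]])
red_demo = np.array([[0.05, 0.10], [0.04, 0.0]])
ndvi_demo = ndvi(nir_demo, red_demo)
```

Dense green vegetation yields values close to 1, whereas bare soil and water yield values near or below 0.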
Benchmark LC products
The USDA National Agricultural Statistics Service (NASS) Cropland Data Layer (CDL) (https://developers.google.com/earth-engine/datasets/catalog/USDA_NASS_CDL), derived from Landsat 8/9 OLI/TIRS and Sentinel-2A/2B multispectral imagery throughout growing seasons, provides comprehensive 30-m resolution coverage across the CONUS since 2008, achieving classification accuracies between 85% and 95%, varying by crop type34. Agricultural ground reference data originates from the Farm Service Agency (FSA) Common Land Unit (CLU) Program, supplemented by additional state-specific datasets from sources such as the US Bureau of Reclamation and the California Department of Water Resources68.
The MODIS Land Cover Type Product (MCD12Q1) (https://developers.google.com/earth-engine/datasets/catalog/MODIS_061_MCD12Q1) provides global LC maps at an annual temporal resolution and a spatial resolution of 500 meters from 2001 to the present. This dataset includes several science data sets (SDSs) representing multiple classification schemes, such as the International Geosphere-Biosphere Program (IGBP), University of Maryland (UMD), alongside a three-layer scheme based on the Land Cover Classification System (LCCS) of the Food and Agriculture Organization (FAO). The MCD12Q1 product is generated using supervised classification techniques applied to smoothed spectro-temporal features derived from MODIS Nadir BRDF-Adjusted Reflectance (NBAR) data, subsequently refined by Hidden Markov Models to reduce inter-annual variability26,54.
The National Land Cover Database (NLCD) (https://developers.google.com/earth-engine/datasets/catalog/USGS_NLCD_RELEASES_2019_REL_NLCD) is a multi-temporal dataset developed by the U.S. Geological Survey (USGS) and multiple federal partners, providing detailed land cover and land cover change information for the CONUS at 30-meter resolution. Initially released for 1992 and updated in 2001, 2006, and 2011, the latest generation, NLCD, integrates advanced methodologies, including multi-source data fusion, hierarchical classification strategies, and expert knowledge. It employs automated processes for assembling and preprocessing Landsat imagery, multi-temporal change detection, and continuous field modeling to ensure spatial and temporal consistency. NLCD enhances previous databases by including new LC classes, improving shrubland and grassland representation, and refining urban impervious surfaces and tree canopy cover layers69.
Spectral configuration
Considering the phenological cycle of dominant crop types in the MRG region, where July captures peak vegetation conditions and December reflects post-harvest or dormant states, median composite imagery from these two months was identified as the optimal temporal configuration for CLCLU delineation. Among the input strategies tested, the July–December combination yielded the highest segmentation performance using MANet (mIoU: 75.81), outperforming RGB bands of annual median (mIoU: 72.99), annual medians (mIoU: 74.70), and seasonal median composites (mIoU: 75.31)33. Thus, the same spectral combination was consistently employed as input features for training, validation, and prediction in this study. Additional spectral data, such as synthetic aperture radar imagery or terrain information, were not included due to their unavailability for the entire study period (1994–2024). Table 1 provides detailed information on the spectral combinations used.
Training sample generation
The data utilized in this study were sourced from the Google Earth Engine (GEE) catalog (Landsat Collection 2, Level 2, Tier 1)70. Dual-month (July and December) Landsat imagery composites were generated for the Region of Interest (RoI) through the GEE platform, using median pixel values to mitigate cloud cover and ensure reliability. All composite bands were normalized using their minimum and maximum values. Based on the CDL raster for the U.S. side of the MRG, 41 LC types were identified from 2008 to 2023. Agricultural classes were categorized into Alfalfa/Hay, Cotton, Pecan, and an aggregated “other crops” class, while non-agricultural areas were grouped into Forest/Shrubland, Barren/Grassland, Water bodies, Developed, and Background. The Background represents pixels corresponding to no-data values, those falling outside the defined class mapping scheme, or those located beyond the defined RoI (Table S.1). Annually, 1,000 tiles of 64 × 64 pixels per class were generated for validation, and 10,000 tiles per class for training, each uniquely identified by its center pixel. Tiles were randomly generated with uniform distribution across classes in terms of center pixel, resulting in a dataset configuration of 80,000 tiles for training and approximately 8,000 tiles for validation corresponding to each year.
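The normalization and class-balanced tile sampling described above can be sketched as follows. This is a minimal NumPy illustration under stated assumptions, not the authors' implementation: the function names, the toy raster, and the rule that a tile must fit entirely inside the raster are all hypothetical choices.

```python
import numpy as np

def minmax_normalize(band):
    """Scale one band to [0, 1] using its own minimum and maximum."""
    band = np.asarray(band, dtype=np.float64)
    lo, hi = np.nanmin(band), np.nanmax(band)
    if hi == lo:
        return np.zeros_like(band)
    return (band - lo) / (hi - lo)

def sample_tile_centers(labels, class_id, n_tiles, tile=64, rng=None):
    """Draw up to n_tiles distinct center pixels of one class (assumed sampling rule)."""
    rng = np.random.default_rng() if rng is None else rng
    half = tile // 2
    h, w = labels.shape
    rows, cols = np.nonzero(labels == class_id)
    # Keep only centers whose full tile lies inside the raster
    keep = (rows >= half) & (rows <= h - half) & (cols >= half) & (cols <= w - half)
    rows, cols = rows[keep], cols[keep]
    if rows.size == 0:
        return []
    idx = rng.choice(rows.size, size=min(n_tiles, rows.size), replace=False)
    return list(zip(rows[idx].tolist(), cols[idx].tolist()))

def extract_tile(image, center, tile=64):
    """Cut a (bands, tile, tile) window around a center pixel."""
    half = tile // 2
    r, c = center
    return image[:, r - half:r + half, c - half:c + half]

# Toy data: a 4-band image and a 9-class label raster (random, for illustration only)
rng = np.random.default_rng(0)
labels_demo = rng.integers(0, 9, size=(256, 256))
image_demo = np.stack([minmax_normalize(rng.random((256, 256))) for _ in range(4)])
centers_demo = sample_tile_centers(labels_demo, class_id=2, n_tiles=5, rng=rng)
tiles_demo = [extract_tile(image_demo, c) for c in centers_demo]
```

Sampling the same number of centers for every class is what makes the tile set uniform across classes with respect to the center pixel, even though each tile still contains a mixture of classes.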
Semantic segmentation architecture
Multi-Attention-Network (MANet) is a novel semantic segmentation architecture, designed specifically for high resolution remote sensing imagery51. This architecture incorporates an innovative attention mechanism that efficiently explores intricate feature combinations extracted by the encoder module for the task at hand33,49. The MANet’s key innovation lies in its attention mechanism, which integrates kernel attention with linear complexity, thereby significantly reducing the computational requirements of the attention module. This approach represents an advancement over traditional methods by generalizing the dot-product attention (represented in Eq. 1) to a kernel-based formulation (expressed in Eq. 2), referred to as the Kernel Attention Mechanism (KAM).
where
The Channel Attention Mechanism (CAM) further enhances the segmentation outputs, as follows.
The MANet architecture employed in this study leverages a ResNeXt-101 encoder to extract deep, high-level feature representations from the input data. For additional details on the network architecture and its underlying mechanisms, please refer to the supplementary document.
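To illustrate why a kernel-based attention can run in linear rather than quadratic complexity, the sketch below compares two algebraically equivalent ways of evaluating the same kernel attention. It assumes a softplus feature map applied to queries and keys, which is our reading of the MANet design and should be checked against the original MANet paper51. The function names and toy dimensions are hypothetical, and the sketch omits the learned projections and the channel attention branch.

```python
import numpy as np

def softplus(x):
    """Numerically stable softplus, log(1 + exp(x)); always positive."""
    return np.logaddexp(0.0, x)

def kernel_attention_quadratic(q, k, v):
    """Reference form: builds the full (N, N) similarity matrix."""
    sim = softplus(q) @ softplus(k).T
    return (sim @ v) / sim.sum(axis=1, keepdims=True)

def kernel_attention_linear(q, k, v):
    """Same result, but K^T V is aggregated first, so cost grows linearly with N."""
    fq, fk = softplus(q), softplus(k)
    kv = fk.T @ v                 # (d, d_v), independent of N after the sum
    norm = fq @ fk.sum(axis=0)    # (N,), per-query normalizer
    return (fq @ kv) / norm[:, None]

# Toy inputs: N = 50 positions, d = 8 query/key channels, d_v = 16 value channels
rng = np.random.default_rng(0)
q_demo = rng.normal(size=(50, 8))
k_demo = rng.normal(size=(50, 8))
v_demo = rng.normal(size=(50, 16))
out_quadratic = kernel_attention_quadratic(q_demo, k_demo, v_demo)
out_linear = kernel_attention_linear(q_demo, k_demo, v_demo)
```

The two functions differ only in the order of the matrix products. Because the similarity factorizes into a product of feature maps, the (N, N) matrix never has to be formed, which is the source of the reduced computational requirement.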
Training protocol
For the period from 2013 to 2024, individual models were trained separately for each year using the corresponding annual training dataset (S.0). Each model was trained for 200 epochs with a batch size of 128 and an initial learning rate of 0.001, optimized using the Adaptive Moment Estimation (Adam) algorithm. To address class imbalance—common in Land Cover (LC) classification—weighted cross-entropy loss was employed, with class weights assigned inversely proportional to class frequency. Additionally, stratified sampling was incorporated using PyTorch’s WeightedRandomSampler to balance class representation within each training batch. For the Landsat 5 period (1994–2011), three distinct training strategies were implemented: S.1 involved initial training on Landsat 8 imagery (2013–2024) followed by fine-tuning using Landsat 5 imagery from years 2008 and 2011; S.2 combined training imagery from 2008, 2011, and the Landsat 8 period (2013–2024); S.3 utilized only Landsat 5 imagery from 2008 and 2011. To improve generalization and robustness, data augmentation techniques such as random rotations and horizontal/vertical flips were applied. All training processes were conducted using two NVIDIA RTX A6000 GPUs.
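The inverse-frequency class weighting can be illustrated with a small NumPy stand-in for the loss computation. The sketch below is not the authors' PyTorch code. It assumes that weights are normalized to sum to one (the normalization is a free choice) and that the weighted mean divides by the sum of the sample weights, which matches the behaviour of PyTorch's `CrossEntropyLoss` with a `weight` argument and mean reduction. All names and toy labels are hypothetical.

```python
import numpy as np

def inverse_frequency_weights(labels, n_classes):
    """Class weights proportional to 1 / class frequency (absent classes get 0)."""
    counts = np.bincount(labels.ravel(), minlength=n_classes).astype(np.float64)
    weights = np.zeros(n_classes)
    present = counts > 0
    weights[present] = 1.0 / counts[present]
    return weights / weights.sum()

def weighted_cross_entropy(logits, labels, weights):
    """Weighted mean of per-sample negative log-likelihoods.

    logits: (N, C) raw scores; labels: (N,) integer classes; weights: (C,).
    """
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    nll = -log_probs[np.arange(labels.size), labels]
    w = weights[labels]
    return float((w * nll).sum() / w.sum())

# Toy imbalanced labels: 90, 9, and 1 samples for classes 0, 1, and 2
labels_demo = np.array([0] * 90 + [1] * 9 + [2] * 1)
weights_demo = inverse_frequency_weights(labels_demo, n_classes=3)
# With all-zero logits every class is equally likely, so the loss equals log(3)
loss_demo = weighted_cross_entropy(np.zeros((100, 3)), labels_demo, weights_demo)
```

The per-sample values `weights_demo[labels_demo]` are the kind of sampling weights that could be passed to a weighted sampler so that rare classes are drawn more often within each batch.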
Data Records
The dataset is available at Zenodo71: https://doi.org/10.5281/zenodo.15116835. The CLCLU dataset comprises 30 annual raster files (see Table 2 for descriptive characteristics) spanning from 1994 to 2024, representing the first long-term, high-resolution Crop and Land Cover Land Use (CLCLU) classification product for the transboundary Middle Rio Grande (MRG) region. Annual classified maps are provided in GeoTIFF format with a spatial resolution of 30 meters and are referenced to the EPSG:4326 coordinate system. Each file contains pixel-level land cover and crop type classifications derived from Landsat 5 and Landsat 8 imagery. The pixel values represent the following CLCLU classes:
- 0 when the pixel is Alfalfa or Hay
- 1 when the pixel is Cotton
- 2 when the pixel is Pecan
- 3 when the pixel is other crops
- 4 when the pixel is Forest or Shrubland
- 5 when the pixel is Grassland or Barren
- 6 when the pixel is Water body
- 7 when the pixel is Developed
- 8 when the pixel is Background
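For users of the rasters, the pixel coding above can be captured in a lookup table. The sketch below tallies pixels per class for an array that is assumed to have already been read from one annual GeoTIFF with any raster library. The names `CLCLU_CLASSES` and `class_pixel_counts` and the toy array are hypothetical.

```python
import numpy as np

# Pixel value -> class name, following the coding listed above
CLCLU_CLASSES = {
    0: "Alfalfa or Hay",
    1: "Cotton",
    2: "Pecan",
    3: "Other crops",
    4: "Forest or Shrubland",
    5: "Grassland or Barren",
    6: "Water body",
    7: "Developed",
    8: "Background",
}

def class_pixel_counts(raster):
    """Count pixels per CLCLU class; values outside 0-8 are ignored."""
    values, counts = np.unique(np.asarray(raster), return_counts=True)
    return {
        CLCLU_CLASSES[int(v)]: int(c)
        for v, c in zip(values, counts)
        if int(v) in CLCLU_CLASSES
    }

# Toy 2 x 3 raster standing in for one annual map
raster_demo = np.array([[0, 0, 2], [5, 8, 8]])
counts_demo = class_pixel_counts(raster_demo)
```

Because the files are referenced to EPSG:4326, the ground area of a pixel varies with latitude, so converting pixel counts to areas would require reprojection to an equal-area system or a latitude-dependent pixel area.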
Technical Validation
For models trained on the Landsat 8 period (2013–2024), validation was performed using the corresponding year’s validation dataset. For the Landsat 5 period, models S.1, S.2, and S.3 were validated using data from the years 2008 and 2011 only. Years 2009 and 2010 were excluded from both training and validation due to significant inconsistencies in the CDL ground truth maps, while 2012 was omitted due to data quality issues in Landsat 7 imagery. Once trained and validated, all three models (S.1, S.2, and S.3) were used to predict LC for each year from 1994 to 2011. The final prediction for each year was derived by consolidating outputs: common predictions among models were retained, and discrepancies were resolved by selecting the output from the model with the highest Intersection over Union (IoU) score, supported by visual inspection. The evaluation metrics used to assess segmentation performance are summarized in Table 3.
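The consolidation rule for the Landsat 5 period can be sketched as below. This is one possible reading of the procedure, not the authors' code. It assumes that "common predictions" means unanimous agreement among the three models and that each model is summarized by a single validation IoU score. The visual inspection step cannot be automated and is omitted. The scores in the example are made-up placeholders, not reported results.

```python
import numpy as np

def consolidate(preds, scores):
    """Merge per-model class maps into one map.

    preds:  (M, H, W) integer class maps from M models.
    scores: length-M validation scores (e.g. mIoU), higher is better.
    Returns the merged map and a boolean mask of pixels where all models agree.
    """
    preds = np.asarray(preds)
    best = preds[int(np.argmax(scores))]
    agree = np.all(preds == preds[0], axis=0)
    # Keep unanimous labels; elsewhere fall back to the best-scoring model
    merged = np.where(agree, preds[0], best)
    return merged, agree

# Toy 2 x 2 predictions from three models (S.1, S.2, S.3) with placeholder scores
p_s1 = np.array([[0, 1], [2, 3]])
p_s2 = np.array([[0, 1], [2, 4]])
p_s3 = np.array([[0, 5], [2, 3]])
merged_demo, agree_demo = consolidate([p_s1, p_s2, p_s3], scores=[0.70, 0.75, 0.73])
```

In this toy case the second model has the highest score, so its labels are used at the two pixels where the models disagree.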
Accuracy assessment of CLCLU
The performance of trained MANet models for CLCLU delineation was assessed using multiple metrics across various classes and temporal validation datasets (Fig. 3 and Table S.2). The Intersection over Union (IoU) metric indicated consistent classification accuracy among the evaluated categories. Specifically, Alfalfa/Hay exhibited an average IoUC of 76.01%, ranging from a minimum of 68.76% to a maximum of 81.49%. The Cotton class achieved an average IoUC of 79.85%, with values spanning from 71.92% to 87.23%. For Pecan, the mean IoUC was 82.93%, ranging between 76.63% and 88.06%. The ‘Other crops’ category demonstrated an average IoUC of 71.66%, varying from 60.85% to 79.03%. Forest/Shrubland achieved a mean IoUC of 78.07%, with minimum and maximum values of 73.96% and 82.81%, respectively. Grassland/Barren had an average IoUC of 62.76%, exhibiting considerable variability (48.78–70.64%). The Water bodies category showed high accuracy, averaging 84.11% IoUC, and ranged from 80.23% to 86.76%. Developed areas displayed a mean IoUC of 74.06%, fluctuating between 63.60% and 79.17%. The Background class consistently demonstrated the highest IoUC, averaging 98.33%, with a narrow range of 97.53% to 98.92%.
Overall classification accuracy across all evaluated years averaged 97.10%, ranging from 96.40% (S.2) to 97.69% (2024). The mIoU was 78.85%, with values spanning from 73.21% (S.2) to 82.77% (2016), indicating robust class separability across land cover types. Recall demonstrated strong average performance at 90.58%, varying between 88.49% (2022) and 92.34% (2016), reflecting the model’s consistency in detecting relevant classes. Notably, the F1-score averaged 87.83%, ranging from a minimum of 83.82% (S.2) to a maximum of 90.40% (2016), underscoring the model’s reliable overall performance. The F2-score, which emphasizes recall more heavily, showed similarly strong results, averaging 89.40%, with a minimum of 86.45% (S.2) and maximum of 91.52% (2016). These results highlight the model’s precision and sensitivity in delineating key land cover and crop classes across diverse years. Detailed metrics are presented in Table S.2 of the supplementary materials.
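For reference, the metrics reported above can all be derived from a confusion matrix. The sketch below uses the conventional definitions with macro (unweighted) averaging across classes. The exact formulas used in this study are those given in Table 3, so the averaging choice here is an assumption, and the function names and toy labels are hypothetical.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are reference classes, columns are predicted classes."""
    idx = np.asarray(y_true).ravel() * n_classes + np.asarray(y_pred).ravel()
    return np.bincount(idx, minlength=n_classes ** 2).reshape(n_classes, n_classes)

def _safe_div(num, den):
    """Element-wise division that returns NaN where the denominator is not positive."""
    out = np.full(num.shape, np.nan, dtype=np.float64)
    np.divide(num, den, out=out, where=den > 0)
    return out

def segmentation_metrics(cm, beta=2.0):
    """Overall accuracy, per-class IoU, and macro-averaged mIoU, recall, F1, F-beta."""
    cm = cm.astype(np.float64)
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    iou = _safe_div(tp, tp + fp + fn)
    precision = _safe_div(tp, tp + fp)
    recall = _safe_div(tp, tp + fn)
    f1 = _safe_div(2 * precision * recall, precision + recall)
    b2 = beta ** 2
    fbeta = _safe_div((1 + b2) * precision * recall, b2 * precision + recall)
    return {
        "oa": tp.sum() / cm.sum(),
        "iou": iou,
        "miou": np.nanmean(iou),
        "recall": np.nanmean(recall),
        "f1": np.nanmean(f1),
        "fbeta": np.nanmean(fbeta),
    }

# Toy example with three classes and six pixels
y_true_demo = np.array([0, 0, 1, 1, 2, 2])
y_pred_demo = np.array([0, 1, 1, 1, 2, 0])
cm_demo = confusion_matrix(y_true_demo, y_pred_demo, n_classes=3)
metrics_demo = segmentation_metrics(cm_demo)
```

With beta set to 2, the F-beta score weights recall more heavily than precision, which is the sense in which the F2-score emphasizes recall.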
Intercomparison with existing thematic products
In addition to comparisons with the CDL for the available years (2008–2014), our dataset was also evaluated against other established land cover products, including the National Land Cover Database (NLCD) and the MODIS Land Cover Type Product (MCD12Q1-UMD classification). These thematic datasets were spatially aggregated to a uniform grid size of 0.05° × 0.05°, facilitating the computation of the area fraction for each class. To assess the level of agreement among these products, scatter plots were generated, and the coefficient of determination (R²) and root mean square error (RMSE) were calculated to quantify their correspondence. This grid-based area fraction comparison method was adopted from a study that applied a similar approach in the evaluation of a 30-meter annual land cover dataset for China against products with significant differences in resolution and thematic classification schemes5. Figure 4 illustrates the area fractions comparing our product with NLCD datasets for crop fields, developed areas, and water bodies on the U.S. side of the MRG for the years 2001, 2004, and 2006. On average, our dataset achieved an R² of 0.8780 and an RMSE of 0.0938 for crop fields, an R² of 0.9143 and an RMSE of 0.0647 for developed areas, and an R² of 0.7953 and an RMSE of 0.0757 for water bodies.
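The grid-based area fraction comparison can be sketched as follows. This is an illustrative NumPy example under stated assumptions, not the authors' workflow: the block size in pixels stands in for the 0.05° grid cell and depends on the raster's actual pixel size, R² is computed here as the squared Pearson correlation (other definitions exist), and all names and toy masks are hypothetical.

```python
import numpy as np

def area_fraction(mask, block):
    """Fraction of pixels belonging to a class inside each block x block cell."""
    h, w = mask.shape
    h, w = h - h % block, w - w % block  # drop incomplete edge cells
    m = mask[:h, :w].astype(np.float64)
    return m.reshape(h // block, block, w // block, block).mean(axis=(1, 3))

def r2_rmse(x, y):
    """Squared Pearson correlation and root mean square error of two fraction grids."""
    x, y = np.ravel(x), np.ravel(y)
    r = np.corrcoef(x, y)[0, 1]
    rmse = np.sqrt(np.mean((x - y) ** 2))
    return float(r ** 2), float(rmse)

# Toy binary class masks: "ours" versus a perturbed "reference" product
rng = np.random.default_rng(0)
ours_demo = rng.random((40, 40)) > 0.5
ref_demo = ours_demo.copy()
ref_demo[:5] = ~ref_demo[:5]  # flip a strip of pixels to create disagreement
frac_ours = area_fraction(ours_demo, block=10)
frac_ref = area_fraction(ref_demo, block=10)
r2_demo, rmse_demo = r2_rmse(frac_ours, frac_ref)
```

Comparing area fractions on a common coarse grid sidesteps the need to resample one product to the other's native resolution, which is why the approach suits products as different as a 30 m map and a 500 m map.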
Notably, the year 2004 exhibited the lowest performance metrics, underscoring the significant adverse impact of cloud cover and shadows, as well as their positional impact, on accurate CLCLU delineation and overall classification reliability (Fig. 5). Specifically, cloud contamination and associated shadows disrupted spectral signatures, leading to substantial misclassification and reduced model accuracy.
Figures 6 and 7 present comparisons of area fractions for developed areas and crop fields between our dataset and the MCD12Q1-UMD product for the period from 2001 to 2023, encompassing both the U.S. and Mexican sides of the Middle Rio Grande (MRG). On average, our dataset exhibits strong spatial agreement with MCD12Q1-UMD, yielding an R² of 0.8661 and an RMSE of 0.1207 for developed areas, and an R² of 0.8557 and an RMSE of 0.0990 for crop fields. However, a noticeable decline in R² is observed for developed areas from 2021 to 2023. We attribute this reduction primarily to improvements in the classification of developed areas in the CDL map beginning in 2021, combined with the classification criteria of the MCD12Q1-UMD product, which requires a pixel to exhibit at least 60% impervious (built-up) surface to be categorized as developed. For more details regarding the data associated with the intercomparison of our product with NLCD and MCD12Q1-UMD, please refer to the Supplementary Materials section S.2.
LC dynamics and temporal trends
Figure 8 presents temporal trends of various land cover classes for the U.S. side of the Middle Rio Grande (MRG), comparing predicted land cover areas (1994–2024) against the CDL dataset (2008–2024). Overall, strong agreement is observed between our predictions and CDL across most years. However, notable discrepancies occur in cotton areas for 2009 and developed areas for 2010. To further investigate and evaluate these discrepancies, we have visualized both our predicted land cover and the corresponding CDL classifications for these specific years (Fig. 9).
Figure 9 compares our product with CDL maps for the years 2008 to 2011. In 2009, CDL notably misclassified certain areas as cotton, whereas our models identified land cover classes more consistent with the input imagery, demonstrating enhanced temporal stability. Similarly, in 2010, the CDL maps underrepresented developed areas, particularly around El Paso, while our models again provided more accurate and temporally stable classifications when compared to the Landsat 5 imagery.
Usage Notes
The CLCLU maps presented in this study constitute the first product capable of providing annual land cover monitoring at a 30-meter spatial resolution over an extended temporal period (1994–2024) utilizing solely optical imagery as inputs. However, reliance on optical data inherently introduces vulnerability to artifacts arising from cloud coverage and shadow presence. Although we have used median Landsat imagery composites, these artifacts disrupt spectral signatures associated with various land cover classes, thereby adversely affecting classification accuracy.
The impact of cloud and shadow-induced artifacts is less pronounced for the years where CDL maps are available on the U.S. side of the MRG region (2008–2024), as dedicated models have been fine-tuned annually for each specific year. In contrast, for earlier years (1994–2007), three distinct training strategies have been employed, potentially making these years more susceptible to such disruptions and undermining the temporal stability of the generated maps. The training strategies include: (S.1) initial training on Landsat 8 imagery (2013–2024) with subsequent fine-tuning using Landsat 5 imagery from 2008 and 2011; (S.2) combined training with imagery from 2008, 2011, and the Landsat 8 period (2013–2024); and (S.3) exclusive use of Landsat 5 imagery from 2008 and 2011. As demonstrated in Fig. 5, the susceptibility to cloud and shadow presence can affect the robustness and consistency of land cover classifications for earlier periods. Previous studies have demonstrated the effectiveness of incorporating radar imagery to mitigate the adverse effects of cloud contamination49. However, for the earlier period (1994–2014), radar sensor imagery is unavailable72. To overcome this limitation, some research has suggested the application of generative models to synthesize radar imagery corresponding to available optical datasets73,74. Despite this innovative approach, the use of generated radar imagery can introduce additional uncertainties into the classification models. Furthermore, it was indicated that while incorporating multi-sensor data such as synthetic aperture radar imagery or digital elevation models (DEM) significantly contributes to the delineation of developed areas and water bodies, it does not substantially improve the discrimination of various crop types49. Thus, the benefits of multi-sensor fusion for crop delineation, specifically within agricultural domains, remain limited.
The CDL, while highly detailed in its delineation of crop types and agricultural practices, does not seem to incorporate any form of temporal consistency framework34,75,76. This absence is reasonable, particularly given the temporal variability inherent in agricultural land use, which contrasts with the relatively stable patterns observed in other land cover types. Constraining predictions through posterior probabilities or temporal conditioning, as is common in approaches utilizing Markov Chains or similar methods, may inadvertently introduce additional uncertainties into model outputs due to the dynamic nature of agricultural activities. Therefore, in this study, no temporal consistency strategies were implemented to condition predicted classes on prior or posterior class probabilities. The development of novel methodologies capable of integrating temporal consistency into simultaneous CLCLU delineation represents a critical area for future research. Such frameworks could enhance the robustness and reliability of annual mapping products, particularly in regions characterized by rapid or recurrent changes in agricultural practices.
For comparative analysis, we examined two widely used LC datasets: NLCD and the MCD12Q1-UMD product. Because the CDL lacks temporally continuous data for model fine-tuning from 1994 to 2007, the maps for those years reflect direct predictions generated by our proposed method. Quantitative comparisons demonstrate a high degree of spatial agreement between our product and these reference datasets. However, neither NLCD nor MCD12Q1-UMD provides information on crop-specific classifications or management practices to the extent that CDL does. As a result, accurate validation and inference regarding individual crop types for the years prior to 2008 remain limited. Because no ground-reference surveys were directly used for the transboundary MRG region and the CDL was the sole surrogate label source during training, the maps, especially those for 1994–2007, are model-derived estimates generated without fine-tuning or direct observations; they should therefore be interpreted with caution.
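As an illustration of how agreement between two categorical maps can be quantified, the sketch below computes overall accuracy and mean Intersection over Union (mIoU) from a confusion matrix. It is a minimal sketch, not the evaluation script used in this study. It assumes both maps have already been resampled to a common grid, remapped to a shared legend of integer class codes, and flattened to equal-length sequences. The function name is hypothetical.

```python
def agreement_metrics(pred, ref, n_classes):
    """Overall accuracy and mean IoU between two flattened label maps.

    pred, ref: equal-length sequences of integer class codes in [0, n_classes).
    Classes absent from both maps are skipped when averaging IoU.
    """
    # conf[r][p] counts pixels with reference class r and predicted class p
    conf = [[0] * n_classes for _ in range(n_classes)]
    for p, r in zip(pred, ref):
        conf[r][p] += 1

    total = sum(sum(row) for row in conf)
    overall_accuracy = sum(conf[k][k] for k in range(n_classes)) / total

    ious = []
    for k in range(n_classes):
        tp = conf[k][k]
        fp = sum(conf[r][k] for r in range(n_classes)) - tp
        fn = sum(conf[k]) - tp
        union = tp + fp + fn
        if union > 0:
            ious.append(tp / union)
    mean_iou = sum(ious) / len(ious)
    return overall_accuracy, mean_iou


# Toy example: six pixels, three classes, one disagreement.
reference = [0, 0, 1, 1, 2, 2]
predicted = [0, 0, 1, 2, 2, 2]
oa, miou = agreement_metrics(predicted, reference, n_classes=3)
```

Because mIoU averages over classes rather than pixels, a single error in a small class lowers it more than it lowers overall accuracy, which is why the two figures can differ substantially for the same map.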
Code availability
The PyTorch implementation of the model architecture used in this study, along with the corresponding training and evaluation scripts, is available at the following GitHub repository: https://github.com/waterdmd/Semantic_segmentation_for_LCLUC. This study used publicly available satellite and land cover datasets. Landsat 8 surface reflectance data were obtained from the United States Geological Survey (USGS) through the GEE platform: https://developers.google.com/earth-engine/datasets/catalog/LANDSAT_LC08_C02_T1_L2. Historical imagery from Landsat 5 was similarly accessed via the GEE platform: https://developers.google.com/earth-engine/datasets/catalog/LANDSAT_LT05_C02_T1_L2. Cropland classification labels were obtained from the USDA National Agricultural Statistics Service Cropland Data Layer (CDL): https://developers.google.com/earth-engine/datasets/catalog/USDA_NASS_CDL. Additionally, land cover information was sourced from the National Land Cover Database (NLCD): https://developers.google.com/earth-engine/datasets/catalog/USGS_NLCD_RELEASES_2019_REL_NLCD and the MODIS Land Cover Type Product (MCD12Q1): https://developers.google.com/earth-engine/datasets/catalog/MODIS_061_MCD12Q1. The final CLCLU product generated in this study is publicly accessible via Zenodo at: https://doi.org/10.5281/zenodo.15116835.
References
Azizi, K., Diko, S. K. & Meier, C. I. A Citizen Science Approach to the Characterisation and Modelling of Urban Pluvial Flooding. 16, (2023).
Ebrahimi, S. et al. Assessing field scale spatiotemporal heterogeneity in salinity dynamics using aerial data assimilation. Agric. Water Manag. 305, 109114 (2024).
Huang, K. et al. Facilitating urban climate forecasts in rapidly urbanizing regions with land-use change modeling. Urban Clim. 36, 100806 (2021).
Justice, C., Gutman, G. & Vadrevu, K. P. NASA Land Cover and Land Use Change (LCLUC): An interdisciplinary research program. J. Environ. Manage. 148, 4–9 (2015).
Yang, J. & Huang, X. The 30 m annual land cover dataset and its dynamics in China from 1990 to 2019. Earth Syst. Sci. Data 13, 3907–3925 (2021).
Tay, W. & He, Y. Detecting land cover and land use change and its impact on biomass carbon emission from 2001 to 2019 in Arkansas, U.S.A. Biogeochemistry 165, 29–42 (2023).
Xian, G. Z. et al. Climate sensitivity to decadal land cover and land use change across the conterminous United States. Glob. Planet. Change 192, 103262 (2020).
Akhani, M., Davatgari-Tafreshi, M. & Pezeshk, S. Adjusting Central and Eastern United States ground-motion models for use in the Coastal Plain considering the sediment thickness. Earthq. Spectra 40, 2669–2691 (2024).
Davatgari‐Tafreshi, M. & Pezeshk, S. Analysis of Single‐Station Standard Deviation Based on Empirical Data Set from the Central and Eastern United States. Bull. Seismol. Soc. Am. https://doi.org/10.1785/0120240191 (2025).
Dumitrașcu, M. et al. Land Use and Land Cover Changes in a Global Environmental Change Context—The Contribution of Geography. in Research Directions, Challenges and Achievements of Modern Geography (eds. Bański, J. & Meadows, M.) 175–203. https://doi.org/10.1007/978-981-99-6604-2_10 (Springer Nature, Singapore, 2023).
Houston, N. A., Thomas, J. V., Foster, L. K., Pedraza, D. E. & Welborn, T. L. Hydrogeologic Framework and Groundwater Characterization in Selected Alluvial Basins in the Upper Rio Grande Basin, Colorado, New Mexico, and Texas, United States, and Chihuahua, Mexico, 1980 to 2015. Scientific Investigations Report https://doi.org/10.3133/sir20215035 (2021).
Ghoochani, S., Hadiuzzaman, M., Mirza, N., Brown, S. P. & Salehi, M. Effects of water chemistry and flow on lead release from plastic pipes versus copper pipes, implications for plumbing decontamination. Environ. Pollut. 337, 122520 (2023).
Khosravi, M. et al. Multivariate Multi-Step Long Short-Term Memory Neural Network for Simultaneous Stream-Water Variable Prediction. Eng 4, 1933–1950 (2023).
Chen, B., Huang, B. & Xu, B. Multi-source remotely sensed data fusion for improving land cover classification. ISPRS J. Photogramm. Remote Sens. 124, 27–39 (2017).
Wu, B. et al. Dynamics of land cover changes and driving forces in China’s drylands since the 1970s. Land Use Policy 140, 107097 (2024).
Ebrahimi, S., Azizi, K., Khorram, M., Kashani, A. R. & Ali, A. S. A. Evaluation of hydrological variabilities of water quality models considering event-based scenarios: A case study. Stoch. Environ. Res. Risk Assess. 38, 2425–2449 (2024).
Elliott, M., Borja, Á. & Cormier, R. Managing marine resources sustainably – Ecological, societal and governance connectivity, coherence and equivalence in complex marine transboundary regions. Ocean Coast. Manag. 245, 106875 (2023).
Golovina, E. & Shchelkonogova, O. Possibilities of Using the Unitization Model in the Development of Transboundary Groundwater Deposits. Water 15, 298 (2023).
Namany, S., Govindan, R. & Al-Ansari, T. Operationalising transboundary cooperation through game theory: An energy water food nexus approach for the Middle East and North Africa. Futures 152, 103198 (2023).
Betito, G., Arellano, A. & Sorooshian, A. Influence of Transboundary Pollution on the Variability of Surface Ozone Concentrations in the Desert Southwest of the U.S.: Case Study for Arizona. Atmosphere 15, 401 (2024).
Lara-Valencia, F. et al. Water Management on the U.S.-Mexico Border: Achieving Water Sustainability and Resilience through Cross-Border Cooperation. J. Borderl. Stud. 38, 323–334 (2023).
Matherne, A.-M. & Megdal, S. B. Advances in Transboundary Aquifer Assessment. Water 15, 1208 (2023).
Mubako, S., Belhaj, O., Heyman, J., Hargrove, W. & Reyes, C. Monitoring of Land Use/Land-Cover Changes in the Arid Transboundary Middle Rio Grande Basin Using Remote Sensing. Remote Sens. 10, 2005 (2018).
Coulter, L. L. et al. Classification and assessment of land cover and land use change in southern Ghana using dense stacks of Landsat 7 ETM+ imagery. Remote Sens. Environ. 184, 396–409 (2016).
Dong, Y. et al. Recording Urban Land Dynamic and Its Effects during 2000–2019 at 15-m Resolution by Cloud Computing with Landsat Series. Remote Sens. 12, 2451 (2020).
Hansen, M. C., Defries, R. S., Townshend, J. R. G. & Sohlberg, R. Global land cover classification at 1 km spatial resolution using a classification tree approach. Int. J. Remote Sens. 21, 1331–1364 (2000).
Liu, H. et al. Annual dynamics of global land cover and its long-term changes from 1982 to 2015. Earth Syst. Sci. Data 12, 1217–1243 (2020).
Reinhart, V. et al. Comparison of ESA climate change initiative land cover to CORINE land cover over Eastern Europe and the Baltic States from a regional climate modeling perspective. Int. J. Appl. Earth Obs. Geoinformation 94, 102221 (2021).
Fan, H. et al. An improved Deeplab based Model for Extracting Cultivated Land Information from High Definition Remote Sensing Images. in 2019 IEEE International Conference on Signal, Information and Data Processing (ICSIDP) 1–6. https://doi.org/10.1109/ICSIDP47821.2019.9173010 (2019).
Li, B. et al. An improved global land cover mapping in 2015 with 30 m resolution (GLC-2015) based on a multisource product-fusion approach. Earth Syst. Sci. Data 15, 2347–2373 (2023).
Zhong, B. et al. Finer Resolution Land-Cover Mapping Using Multiple Classifiers and Multisource Remotely Sensed Data in the Heihe River Basin. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 8, 4973–4992 (2015).
Yang, A. et al. 30 m 5-yearly land cover maps of Qilian Mountain Area (QMA_LC30) from 1990 to 2020. Sci. Data 11, 1339 (2024).
Ebrahimi, S. & Kumar, S. Semantic segmentation for simultaneous crop and land cover land use classification using multi-temporal Landsat imagery. Remote Sens. Appl. Soc. Environ. 37, 101505 (2025).
Boryan, C., Yang, Z., Mueller, R. & Craig, M. Monitoring US agriculture: the US Department of Agriculture, National Agricultural Statistics Service, Cropland Data Layer Program. Geocarto Int. 26, 341–358 (2011).
Himeur, Y., Rimal, B., Tiwary, A. & Amira, A. Using artificial intelligence and data fusion for environmental monitoring: A review and future perspectives. Inf. Fusion 86–87, 44–75 (2022).
Svoboda, J., Štych, P., Laštovička, J., Paluba, D. & Kobliuk, N. Random Forest Classification of Land Use, Land-Use Change and Forestry (LULUCF) Using Sentinel-2 Data—A Case Study of Czechia. Remote Sens. 14, 1189 (2022).
Huppertz, R., Nakalembe, C., Kerner, H., Lachyan, R. & Rischard, M. Using transfer learning to study burned area dynamics: A case study of refugee settlements in West Nile, Northern Uganda. Preprint at https://doi.org/10.48550/arXiv.2107.14372 (2021).
Potapov, P. et al. The Global 2000–2020 Land Cover and Land Use Change Dataset Derived From the Landsat Archive: First Results. Front. Remote Sens. 3, (2022).
Wang, J. et al. Deep High-Resolution Representation Learning for Visual Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 43, 3349–3364 (2021).
Basu, S. et al. DeepSat: a learning framework for satellite imagery. in Proceedings of the 23rd SIGSPATIAL International Conference on Advances in Geographic Information Systems 1–10. https://doi.org/10.1145/2820783.2820816 (Association for Computing Machinery, New York, NY, USA, 2015).
Gallo, I., La Grassa, R., Landro, N. & Boschetti, M. Sentinel 2 Time Series Analysis with 3D Feature Pyramid Network and Time Domain Class Activation Intervals for Crop Mapping. ISPRS Int. J. Geo-Inf. 10, 483 (2021).
Helber, P., Bischke, B., Dengel, A. & Borth, D. EuroSAT: A Novel Dataset and Deep Learning Benchmark for Land Use and Land Cover Classification. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 12, 2217–2226 (2019).
Rußwurm, M. & Körner, M. Multi-Temporal Land Cover Classification with Sequential Recurrent Encoders. ISPRS Int. J. Geo-Inf. 7, 129 (2018).
Kramarczyk, P. & Hejmanowska, B. UNET NEURAL NETWORK IN AGRICULTURAL LAND COVER CLASSIFICATION USING SENTINEL-2. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. XLVIII-1-W3-2023, 85–90 (2023).
Wang, L., Wang, J., Liu, Z., Zhu, J. & Qin, F. Evaluation of a deep-learning model for multispectral remote sensing of land use and crop classification. Crop J. 10, 1435–1451 (2022).
Kulkarni, A., Mohandoss, T., Northrup, D., Mwebaze, E. & Alemohammad, H. Semantic Segmentation of Medium-Resolution Satellite Imagery using Conditional Generative Adversarial Networks. Preprint at https://doi.org/10.48550/arXiv.2012.03093 (2020).
Boonpook, W. et al. Deep Learning Semantic Segmentation for Land Use and Land Cover Types Using Landsat 8 Imagery. ISPRS Int. J. Geo-Inf. 12, 14 (2023).
Thomas Ramos, L. & Sappa, A. D. Multispectral Semantic Segmentation for Land Cover Classification: An Overview. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 17, 14295–14336 (2024).
Ebrahimi, S. & Kumar, S. What Helps to Detect What? Explainable AI and Multi-Sensor Fusion for Semantic Segmentation of Simultaneous Crop and Land Cover Land Use Delineation. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 1–22 https://doi.org/10.1109/JSTARS.2025.3532829 (2025).
Crawford, C. J. et al. The 50-year Landsat collection 2 archive. Sci. Remote Sens. 8, 100103 (2023).
Li, R. et al. Multiattention Network for Semantic Segmentation of Fine-Resolution Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 60, 1–13 (2022).
Mahapatra, S. K., Pattanayak, B. K. & Pati, B. Attendance monitoring of masked faces using ResNext-101. J. Stat. Manag. Syst. 26, 117–131 (2023).
Fry, J. A., Coan, M., Homer, C. G., Meyer, D. K. & Wickham, J. D. Completion of the National Land Cover Database (NLCD) 1992–2001 Land Cover Change Retrofit Product. Open-File Report https://pubs.usgs.gov/publication/ofr20081379, https://doi.org/10.3133/ofr20081379 (2009).
Sulla-Menashe, D. & Friedl, M. A. User Guide to Collection 6 MODIS Land Cover (MCD12Q1 and MCD12C1) Product. (2018).
Belhaj, O. et al. Determination of Change in Surface Waterbodies in The Middle Rio Grande Basin by Modified Normalized Difference Water Index (MNDWI) 1994–2020. 4, 67–80 (2022).
Samimi, M. et al. Climate Change Impacts on Agricultural Water Availability in the Middle Rio Grande Basin. JAWRA J. Am. Water Resour. Assoc. 58, 164–184 (2022).
Samimi, M. et al. Adapting irrigated agriculture in the Middle Rio Grande to a warm-dry future. J. Hydrol. Reg. Stud. 45, 101307 (2023).
Pascolini-Campbell, M., Seager, R., Pinson, A. & Cook, B. I. Covariability of climate and streamflow in the Upper Rio Grande from interannual to interdecadal timescales. J. Hydrol. Reg. Stud. 13, 58–71 (2017).
Schmandt, J. Bi-national water issues in the Rio Grande/Río Bravo basin. Water Policy 4, 137–155 (2002).
Alparone, L., Garzelli, A. & Zoppetti, C. Fusion of VNIR Optical and C-Band Polarimetric SAR Satellite Data for Accurate Detection of Temporal Changes in Vegetated Areas. Remote Sens. 15, 638 (2023).
Çimtay, Y. Estimating Plant Nitrogen by Developing an Accurate Correlation between VNIR-Only Vegetation Indexes and the Normalized Difference Nitrogen Index. Remote Sens. 15, 3898 (2023).
Jenal, A. et al. Development of a VNIR/SWIR Multispectral Imaging System for Vegetation Monitoring with Unmanned Aerial Vehicles. Sensors 19, 5507 (2019).
Milewski, R. et al. Analyses of the Impact of Soil Conditions and Soil Degradation on Vegetation Vitality and Crop Productivity Based on Airborne Hyperspectral VNIR–SWIR–TIR Data in a Semi-Arid Rainfed Agricultural Area (Camarena, Central Spain). Remote Sens. 14, 5131 (2022).
Zhong, B. et al. Land cover mapping using time series HJ-1/CCD data. Sci. China Earth Sci. 57, 1790–1799 (2014).
Micijevic, E., Haque, M. O. & Barsi, J. Radiometric performance of Landsat 8 Collection 2 products. in Earth Observing Systems XXVI (eds. Butler, J. J., Xiong, X. (Jack) & Gu, X.) 23. https://doi.org/10.1117/12.2596198 (SPIE, San Diego, United States, 2021).
Rengarajan, R., Choate, M. J., Storey, J. C., Franks, S. & Micijevic, E. Landsat Collection 2 geometric calibration updates. in vol. 11501 (2020).
Earth Resources Observation and Science (EROS) Center. Landsat 8-9 Operational Land Imager/Thermal Infrared Sensor Level-2, Collection 2. U.S. Geological Survey https://doi.org/10.5066/P9OGBGM6 (2013).
USDA - National Agricultural Statistics Service - Research and Science - Cropland Data Layers. https://www.nass.usda.gov/Research_and_Science/Cropland/sarsfaqs2.php.
Yang, L. et al. A new generation of the United States National Land Cover Database: Requirements, research priorities, design, and implementation strategies. ISPRS J. Photogramm. Remote Sens. 146, 108–123 (2018).
Gorelick, N. et al. Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote Sens. Environ. 202, 18–27 (2017).
Kumar, S. & Ebrahimi, S. Annual Crop & Land Cover Land Use (CLCLU) Dataset for the Middle Rio Grande (MRG) Region (1994–2024). Zenodo https://doi.org/10.5281/zenodo.15116835 (2025).
Sentinel-1 SAR GRD: C-band Synthetic Aperture Radar Ground Range Detected, log scaling | Earth Engine Data Catalog. Google for Developers https://developers.google.com/earth-engine/datasets/catalog/COPERNICUS_S1_GRD (2024).
Ghelichkhani, S. et al. Generative Adversarial Networks for Artificial Satellite Image Creation and Manipulation. in Adversarial Multimedia Forensics (eds. Nowroozi, E., Kallas, K. & Jolfaei, A.) 125–143. https://doi.org/10.1007/978-3-031-49803-9_5 (Springer Nature Switzerland, Cham, 2024).
Ramathilagam, A. B., Natarajan, S. & Kumar, A. SAR2NDVI: a pix2pix generative adversarial network for reconstructing field-level normalized difference vegetation index time series using Sentinel-1 synthetic aperture radar data. J. Appl. Remote Sens. 17, 024514 (2023).
Copenhaver, K., Hamada, Y., Mueller, S. & Dunn, J. B. Examining the Characteristics of the Cropland Data Layer in the Context of Estimating Land Cover Change. ISPRS Int. J. Geo-Inf. 10, 281 (2021).
Lin, L. et al. Validation and refinement of cropland data layer using a spatial-temporal decision tree algorithm. Sci. Data 9, 63 (2022).
Acknowledgements
This study is supported by the NASA Land-Cover Land-Use Change (LCLUC) program grant number 80NSSC23K0507, titled “Exploring the Nexus between LCLUC, Socio-Economic Factors, and Water for a Vulnerable Arid US-Mexico Transboundary Region”.
Author information
Contributions
Saman Ebrahimi: Writing – review & editing, Writing – original draft, Visualization, Validation, Software, Methodology, Formal analysis, Data curation, Conceptualization. Mahdis Khorram: Writing – review & editing, Visualization, Validation. Raquel Neri Barranco: Writing – review & editing, Visualization, Validation. Rosario Sanchez: Writing – review & editing, Funding acquisition. Rocky Talchabhadel: Writing – review & editing, Funding acquisition. Santosh Palmate: Writing – review & editing, Validation. Marisol Dominguez: Writing – review & editing, Validation. Elizabeth F. Racine: Writing – review & editing, Funding acquisition. Saurav Kumar: Writing – review & editing, Supervision, Resources, Project administration, Methodology, Investigation, Funding acquisition, Conceptualization.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Ebrahimi, S., Khorram, M., Neri Barranco, R. et al. 30 Years of simultaneous crop & land cover land use maps for Middle Rio Grande from 1994 to 2024. Sci Data 12, 1462 (2025). https://doi.org/10.1038/s41597-025-05771-6
DOI: https://doi.org/10.1038/s41597-025-05771-6











