Predicting wheat yield using deep learning and multi-source environmental data

Ashfaq, Muhammad; Khan, Imran; Shah, Dilawar; Ali, Shujaat; Tahir, Muhammad

doi:10.1038/s41598-025-11780-7


Article
Open access
Published: 21 July 2025


Muhammad Ashfaq¹,
Imran Khan¹,
Dilawar Shah²,
Shujaat Ali² &
…
Muhammad Tahir³

Scientific Reports volume 15, Article number: 26446 (2025)


Abstract

Accurate forecasting of crop yields is essential for ensuring food security and promoting sustainable agricultural practices. Winter wheat, a key staple crop in Pakistan, faces challenges in yield prediction because of the complex interactions among climatic, soil, and environmental factors. This study introduces DeepAgroNet, a novel three-branch deep learning framework that integrates satellite imagery, meteorological data, and soil characteristics to estimate winter wheat yields at the district level in southern Pakistan. The framework employs three leading deep learning models—convolutional neural networks (CNN), recurrent neural networks (RNN), and artificial neural networks (ANN)—trained on detrended yield data from 2017 to 2022. The Google Earth Engine platform was used to process and integrate remote sensing, climate, and soil data. CNN emerged as the most effective model, achieving an R² value of 0.77 and a forecast accuracy of 98% one month before harvest. The RNN and ANN models also demonstrated moderate predictive capabilities, with R² values of 0.72 and 0.66, respectively. The results showed that all models achieved less than 10% yield error rates, highlighting their ability to effectively integrate spatial, temporal, and static data. This study emphasizes the importance of deep learning in addressing the limitations of traditional manual methods for yield prediction. By benchmarking the results against Crop Report Services data, this study confirms the reliability and scalability of the proposed framework. The findings demonstrate the potential of DeepAgroNet to improve precision agriculture practices, contributing to food security and sustainable agricultural development in Pakistan. Furthermore, this adaptable framework can serve as a model for similar applications in other agricultural regions around the world.


Introduction

Wheat (Triticum aestivum L.) is one of Pakistan’s three primary staple foods and plays a pivotal role in the nation’s food security. At both the national and district levels, timely and precise wheat production data are essential for informed agricultural decision-making and sustainable growth. Multiple factors influence grain yield, including soil quality, climatic conditions, field management practices, agricultural subsidy policies, and grain market pricing. For instance, high grain prices typically motivate farmers to invest more resources to attain larger yields¹.

Consequently, projecting wheat yields over wide geographic areas remains a formidable challenge for researchers and policymakers. In recent decades, significant efforts have been devoted to forecasting crop yields using remotely sensed data². One such method, known as data assimilation, retrieves critical crop status variables, such as the Leaf Area Index (LAI) and evapotranspiration (ET), from remote sensing data. These variables are then employed to recalibrate and enhance the grain yield predictions generated by crop growth models³,⁴. However, the data assimilation approach faces two major challenges. First, it demands local calibration and an abundance of crop-specific inputs (including crop characteristics, field management practices, and climatic and soil data) to effectively model crop growth and development throughout the entire crop cycle⁵,⁶. Second, increasing the observational resolution escalates the processing costs of the data assimilation system, rendering it impractical for large-scale applications⁷. Another category of models used for crop yield projection is statistical regression-based approaches, which are commonly employed for regional yield forecasts⁸,⁹. These methods establish empirical relationships between historical yields and remote sensing variables, such as the Normalized Difference Vegetation Index (NDVI) derived from AVHRR or MODIS satellite data. They are often straightforward to implement and do not require an extensive array of inputs. However, a drawback of these empirical approaches is that the correlations they rely on are often limited in scope and challenging to generalize to other agricultural regions⁵,¹⁰.
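As a minimal sketch of the statistical regression-based approach described above, the following Python snippet computes NDVI from near-infrared and red reflectance using the standard definition, NDVI = (NIR − Red) / (NIR + Red), and fits a straight line between NDVI and yield. The reflectance and yield values are hypothetical and serve only to illustrate the mechanics; they are not taken from this study, and the function name `ndvi` is an illustrative choice.

```python
import numpy as np

def ndvi(nir, red):
    """Standard NDVI definition, computed element-wise: (NIR - Red) / (NIR + Red)."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    denom = nir + red
    # Return 0 where both bands are zero instead of dividing by zero.
    return np.divide(nir - red, denom, out=np.zeros_like(denom), where=denom != 0)

# Hypothetical reflectances and yields for five district-years (illustrative only).
nir = np.array([0.45, 0.50, 0.40, 0.55, 0.48])
red = np.array([0.10, 0.08, 0.12, 0.07, 0.09])
yield_t_ha = np.array([2.6, 3.0, 2.3, 3.3, 2.8])

peak_ndvi = ndvi(nir, red)

# Empirical relationship: ordinary least-squares line, yield ~ slope * NDVI + intercept.
slope, intercept = np.polyfit(peak_ndvi, yield_t_ha, deg=1)
predicted = slope * peak_ndvi + intercept
```

A line fitted this way encodes only the correlation present in its own training region and years, which is the generalization limitation noted above.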

In recent years, machine learning and deep learning techniques have gained prominence in various domains, including image recognition, language translation, and signal processing¹¹. Support Vector Machine (SVM) and Random Forest (RF) methods have been widely employed for satellite image classification¹²,¹³, parameter inversion¹⁴, and agricultural yield prediction¹⁵,¹⁶. Deep learning techniques, notably CNN and Long Short-Term Memory (LSTM) models¹⁷, have also been used for crop production estimation and forecasting. For instance, one deep learning system for crop yield prediction used various environmental and agricultural factors, including weather patterns, soil conditions, and crop growth stages, to improve the accuracy of yield forecasts¹⁸. Another was trained using a novel feature representation based on raw image histograms and demonstrated high accuracy and transfer learning capabilities¹⁹. In another study²⁰, six machine learning models (ordinary least squares (OLS), LASSO, SVM, RF, AdaBoost, and DNN) were used to forecast winter wheat yields at the county level in the United States during the growing season. AdaBoost emerged as the most effective algorithm, with an R-squared (R²) value of 0.86 and a Root Mean Square Error (RMSE) of 0.51 t ha⁻¹.
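For readers unfamiliar with the two scores quoted above, the following is a small Python sketch of how R² and RMSE are conventionally computed from observed and predicted yields. The sample values are hypothetical, and the function names are illustrative rather than taken from any of the cited studies.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error, in the same units as the yields (e.g. t/ha)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

# Hypothetical observed and predicted yields (t/ha), for illustration only.
observed = [3.0, 2.5, 4.0, 3.5]
predicted = [2.8, 2.7, 3.9, 3.6]

print(f"RMSE = {rmse(observed, predicted):.3f} t/ha")   # RMSE = 0.158 t/ha
print(f"R^2  = {r_squared(observed, predicted):.2f}")   # R^2  = 0.92
```

RMSE is expressed in yield units, whereas R² is unitless, so the two are usually reported together.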

Given the inherent nonlinearity of the factors influencing crop output, a robust model is essential for this task. Artificial Neural Networks (ANN) have shown promise in predicting crop production because of their ability to handle both linear and nonlinear data relationships²¹. Consequently, many researchers have turned to ANN for their investigations. Deep learning models, in particular, have gained preference over traditional machine learning models because of their capacity for enhanced feature learning and automatic feature extraction from raw datasets, resulting in higher accuracy in agricultural yield prediction²².

CNNs, in particular, excel at extracting features from climatic parameters because they can process data in various array formats, including one-dimensional data (signals and sequences), two-dimensional data (images), and three-dimensional data (videos)²³. Combining one-dimensional convolution with pooling allows CNNs to capture temporal dependencies in meteorological datasets and to summarize the input features compactly, which makes them easier to interpret²³. Recurrent Neural Networks (RNNs), on the other hand, are best suited to sequential data and time dependencies, making them suitable for forecasting crop yields from historical data¹⁷. Furthermore, an ensemble model for cocoa yield prediction in southwest Nigeria has been presented, inspired by the excellent predictive performance of CNNs and ensemble models in ecological applications¹⁷.
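The one-dimensional convolution and pooling operations mentioned above can be illustrated with a few lines of NumPy. This is a hand-written sketch of the two operations on a hypothetical weekly temperature series, not the network used in this study; the kernel here is fixed by hand, whereas a CNN would learn its kernel weights during training.

```python
import numpy as np

def conv1d_valid(signal, kernel):
    """Slide `kernel` along `signal` with stride 1 and no padding."""
    signal = np.asarray(signal, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    # np.correlate does not flip the kernel, matching the "convolution" used in CNNs.
    return np.correlate(signal, kernel, mode="valid")

def max_pool1d(features, size):
    """Non-overlapping max pooling; trailing values that do not fill a window are dropped."""
    features = np.asarray(features, dtype=float)
    n_windows = len(features) // size
    return features[: n_windows * size].reshape(n_windows, size).max(axis=1)

# Hypothetical weekly mean temperatures (degrees C), for illustration only.
weekly_temp = np.array([12.0, 14.0, 13.0, 17.0, 21.0, 20.0, 24.0, 23.0, 27.0])

# A hand-picked kernel that responds to the change over a two-week span.
kernel = np.array([-1.0, 0.0, 1.0])

feature_map = conv1d_valid(weekly_temp, kernel)  # [1, 3, 8, 3, 3, 3, 3]
pooled = max_pool1d(feature_map, size=2)         # [3, 8, 3]
```

The convolution turns nine raw values into seven local-change features, and pooling condenses those into three, keeping the strongest response in each window.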

Another often overlooked issue when training machine learning algorithms on multi-year data is the presence of yield trends. In recent decades, Pakistan’s wheat production has grown rapidly, primarily because of the adoption of new crop varieties, improved agricultural management practices, and policy reforms, resulting in an annual growth rate exceeding 1%²⁴,²⁵. Detrending algorithms, such as linear, quadratic, and cubic models and moving averages, are commonly employed to account for yield trends²⁶. To estimate the trend, these algorithms typically use the year as the independent variable. The detrended yield, referred to as the “yield anomaly”, is then considered to be connected with annual fluctuations in weather conditions, such as water stress²⁷. In this study, we focused on predicting winter wheat yields at the district level in Pakistan’s primary production areas, leveraging deep learning algorithms and long-term statistical yield data alongside a diverse array of data sources. The primary goals of this research are as follows: (1) exploring the viability of deep learning-based yield forecasting using historical yield data; (2) showcasing the significance of yield detrending and the influence of climate, soil, and socioeconomic factors on yield estimation; and (3) quantitatively assessing the uncertainty inherent in the yield prediction model using Gal’s methods²⁸. This study also aimed to validate and verify a region-specific dataset while proposing a method for estimating wheat yield. It seeks to select the most effective deep learning forecasting methods for crops and analyze the ideal window of time for winter wheat preparation, examining variations in yield forecasts across local areas and emphasizing the significance of various factors for Pakistan.

The motivation for this study was to reduce reliance on the conventional manual process of wheat yield estimation in Pakistan and to make accurate yield prediction possible before the harvesting season. Table 1 summarizes the existing studies, listing the references, the techniques employed, the factors considered, and the limitations of each study. Traditional crop yield prediction models often rely on statistical methods that use environmental variables, weather data, and historical yield records to predict future outcomes. Although these models have provided valuable insights, they are often limited by their inability to account for the complex, nonlinear interactions between the diverse factors affecting crop growth. This limitation creates a significant gap in the accuracy and reliability of predictions, particularly in the face of changing climatic conditions and other uncertainties.
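The linear detrending mentioned in the introduction can be sketched as follows, under the assumption that a straight line in the year is fitted by ordinary least squares and subtracted from the reported yields. The yield series is hypothetical, and the function name `linear_detrend` is illustrative; the detrending variant actually used in this study is described in the methods, not here.

```python
import numpy as np

def linear_detrend(years, yields):
    """Fit yield ~ a * year + b and return (trend, anomalies)."""
    years = np.asarray(years, dtype=float)
    yields = np.asarray(yields, dtype=float)
    # Center the years so the least-squares fit stays well conditioned.
    x = years - years.mean()
    slope, intercept = np.polyfit(x, yields, deg=1)
    trend = slope * x + intercept
    # Yield anomalies: what remains after the long-term trend is removed.
    return trend, yields - trend

# Hypothetical district yields (t/ha) for 2017-2022, for illustration only.
years = np.arange(2017, 2023)
yields = np.array([2.70, 2.78, 2.74, 2.90, 2.88, 3.01])

trend, anomalies = linear_detrend(years, yields)
for year, anomaly in zip(years, anomalies):
    print(f"{year}: anomaly = {anomaly:+.3f} t/ha")
```

Because the fitted line includes an intercept, the anomalies sum to zero over the fitting period. A quadratic or cubic trend can be tried by changing `deg`, although with only a handful of years a higher-order fit risks absorbing weather-driven variation into the trend.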

Table 1 Overview of previous research with applied methods, key factors, and identified limitations.
