AI-enabled Barilai–Borwein–Blinder–Oaxaca–Bernoulli Deep Classifier for Enhanced Crop Yield Prediction

Dhanaraj, Rajesh Kumar; Sivakumar, Nithya Rekha; Khan, Firoz; Al-Khasawneh, Mahmoud Ahmad

doi:10.1038/s41598-025-03935-3

Article
Open access
Published: 02 July 2025

Scientific Reports volume 15, Article number: 23225 (2025)

Abstract

This article explores the integration of advanced Artificial Intelligence (AI) enabled deep learning methods for accurate crop yield prediction. The objective of the work is to enhance the accuracy, sensitivity, and specificity of crop yield prediction while minimizing false positive and false negative cases. An AI-enabled Barilai–Blinder–Oaxaca–Bernoulli Deep Classifier (BBO-BDC) is proposed, comprising preprocessing, feature selection, and crop yield prediction. First, raw samples are collected from the crop yield prediction dataset, and Barilai–Borwein Gradient Min–max Normalization-based preprocessing is applied to eliminate all missing values. Second, to provide fine-grained feature subsets, the Blinder–Oaxaca Statistical Decomposition-based feature selection method is used. Finally, an AI-enabled Bernoulli Deep Belief Network is designed to predict the crop yield. The empirical results demonstrate that, with feature selection, the BBO-BDC technique improves accuracy by up to 12%, specificity by up to 15%, and sensitivity by up to 3%. Furthermore, compared to conventional methods with feature selection, the BBO-BDC technique achieves a 29% reduction in convergence speed and a 51% reduction in overhead. The study integrates an innovative AI method with Min–max Normalization, the Barilai–Borwein gradient, the Blinder–Oaxaca decomposition function, a Deep Belief Network, the Xavier Initialization function, the Bernoulli distribution function, and Principal Components to achieve better performance in crop yield prediction.

Introduction

Agriculture is one of the world’s oldest and most predominant industries¹. With the swift increase in the world’s population, the demand for food and employment is also increasing¹. Because the conventional methods employed by farmers are insufficient to meet these requirements, novel automated mechanisms are being developed to meet food needs¹. Technologies such as AI and ML are making their way into virtually every industry. Research is in progress to enhance the quality and quantity of agricultural products by making them connected and intelligent through smart farming.

A Multi-Layer Perceptron (MLP) was used in¹ to analyze the effect of temperature, pesticides, and other factors on sustainable agriculture and on economic conditions at the farm level in Saudi Arabia. The model was also used to predict future crop yield values in Saudi Arabia. Here, an AI technique was utilized to evaluate the influence of environmental features and agro-technical criteria on crop yield prediction. Moreover, by employing Artificial Neural Networks (ANNs), an efficient MLP model was constructed to forecast crop yield precisely on the basis of environmental data while reducing the training and testing error¹.

An attention-based random forest built through meta-learning, called MetaRF, was designed in². With this design, new reagents were predicted swiftly. In addition, to enhance the learning performance, a dimensionality reduction-based sampling method was introduced for determining valuable samples to be learned with minimal error.

For crop yield prediction, a hybrid of Convolutional Neural Networks (CNN) and Deep Neural Networks (DNN), called Hybrid CNN-DNN, was proposed in³. In addition, XGBoost was utilized as an estimator for selecting essential features to exploit its speed and effectiveness. The CNN was employed to capture data dependencies and extract pertinent information. Finally, the DNN was utilized as a feed-forward network for making accurate and timely predictions.

As discussed in⁴, the evolution of technology has brought a considerable shift in several industries globally. AI has begun to play a significant part in everyday tasks, extending our perception and our capability to improve the environment around us. Numerous applications of AI in agriculture, specifically for spraying, yield prediction, and weeding, were investigated.

An outline of modern research in the area of AI-enabled agriculture, together with the identification of the most prominent applications of artificial intelligence, was discussed in⁵. A comprehensive review of AI-enabled ML for forecasting crop yield, with particular emphasis on palm oil yield prediction, was presented in⁶. Researchers are working toward novel IoT techniques that help farmers use AI technology in crop protection, seed development, and fertilizer use. A holistic survey of AI applications in the agricultural sector, such as machine learning and computer vision, was presented in⁷.

As noted in⁸, the Indian economy is chiefly dependent on agriculture, which is the main source of income for the vast majority of Indian farmers. Agriculture remains a significant factor in the country’s financial development. Nevertheless, farmers are often unable to acquire cultivation-related crop information, predict market prices, or improve productivity. Many new agricultural technologies, such as AI, are being deployed to help farmers grow more efficiently and profitably. A review of how vegetation indices and environmental variables influence agricultural output, revealing research gaps using deep learning, was presented.

In⁹, another hybrid method was presented that focuses on the accuracy of crop yield prediction by taking environmental factors and management practices into consideration, employing AI-enabled CNN and Recurrent Neural Networks (RNN). However, precision was not covered. To address precision, AI and a family of ML algorithms were presented in¹⁰, where a recommendation system was also utilized to achieve precise results.

In¹¹, it was noted that accurate crop yield prediction, aided by sophisticated and region-specific insights, is required to improve agricultural breeding and to safeguard against varied climatic conditions. An LSTM-based RNN was proposed for measuring weekly weather parameters precisely. Nevertheless, these data involve both spatial and temporal classification, which considerably degrades management performance.

In¹², spatio-temporal semantic data management for improved interoperability was presented, and training and validation were analyzed using neural networks. However, resource efficiency was not analyzed. With the inception of AI, which has reorganized conventional agricultural mechanisms, improved crop productivity and quality were both ensured in¹³ by employing a distance vector hop positioning algorithm. Despite improvements in accuracy and precision, the prediction error rate was not addressed. A predictive regression model for corn and soybean fields was presented in¹⁴. This regression model not only reduced error but, by selecting features both spatially and temporally, also improved accuracy considerably.

In¹⁵, it was observed that innovations in the agriculture field help increase farmland yield and thereby ease the market economy. Predicting the yield of a specific crop from selected pertinent features would help increase food production. AI-enabled ML algorithms based on random forest were employed; with the aid of forward feature selection, they improved the accuracy score, laying a foundation for farmers to improve both economic and social stability.

Another method with detailed performance analysis for selecting relevant features using AI-enabled ML techniques was designed in¹⁶. With this design, accuracy was improved and error was reduced significantly. Climatic conditions play a major role in predicting crop yield. To analyze them, eleven combinations of climate and geography were discussed in¹⁷. Although yield prediction was performed accurately, a deficiency remains: the mapping between raw data and crop yield depends heavily on the features being extracted. In¹⁸, a Deep Recurrent Q-Network method was presented for forecasting crop yield under varied climatic conditions. An ensemble of deep learning methods was proposed in¹⁹, and feature selection-enabled methods using AI-enabled ML were investigated in²⁰. Advanced ensemble machine learning methods were presented in²¹ for enhancing predictive accuracy. However, specificity was not considered.

Objectives of this paper

The objectives of the AI-enabled Barilai–Blinder–Oaxaca–Bernoulli Deep Classifier (BBO-BDC) for paddy and rice crop yield prediction are discussed below.

To provide accurate and precise detection of rice and paddy crop yield, a novel AI-enabled Barilai–Blinder–Oaxaca–Bernoulli Deep Classifier (BBO-BDC) for crop yield prediction is proposed.
To improve the convergence speed, Barilai–Borwein Gradient Min–max Normalization-based preprocessing is applied.
To select relevant and pertinent features with fewer false positives and false negatives, Blinder–Oaxaca Statistical Decomposition-based feature selection is employed.
To classify crop yield with higher accuracy, an AI-enabled Bernoulli Deep Belief Network has been developed.

The novelty of this paper

The novelty of the proposed BBO-BDC for crop yield prediction is given below,

The proposed BBO-BDC is designed through preprocessing, feature selection, and crop yield forecast process to obtain precise crop yield prediction.
Barilai–Borwein Gradient Min–max Normalization is developed to perform preprocessing. The data in the four distinct input matrices are normalized by the min–max normalization function, and the Barilai–Borwein gradient function is used to enhance the convergence speed. In this way, missing values are eliminated.
Feature selection is carried out with the novel Blinder–Oaxaca Statistical Decomposition, which is employed to choose the most significant features with fewer false positives and false negatives.
An AI-enabled Bernoulli Deep Belief Network Classifier is utilized to perform classification with several layers. The Xavier Initialization function is used to perform weight initialization. The Bernoulli distribution function is employed to allocate neurons between the hidden and visible layers. Principal Components are utilized to determine the total number of hidden layer nodes. With this, rice and paddy crop yields are categorized with maximum accuracy.
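
The weight initialization and Bernoulli sampling named in the last item can be illustrated with a minimal Python sketch. It assumes the common Xavier (Glorot) uniform rule and the usual Bernoulli hidden-unit sampling of a restricted Boltzmann machine layer, the building block of a Deep Belief Network. The function names `xavier_uniform` and `sample_hidden`, the layer sizes, and the input vector are illustrative assumptions and are not taken from the paper.

```python
import math
import random


def xavier_uniform(n_visible, n_hidden, rng):
    """Draw weights uniformly from [-limit, limit], limit = sqrt(6 / (fan_in + fan_out))."""
    limit = math.sqrt(6.0 / (n_visible + n_hidden))
    return [[rng.uniform(-limit, limit) for _ in range(n_hidden)]
            for _ in range(n_visible)]


def sample_hidden(visible, weights, hidden_bias, rng):
    """Return (activation probabilities, Bernoulli-sampled binary states) of hidden units."""
    probs, states = [], []
    for j in range(len(hidden_bias)):
        activation = hidden_bias[j] + sum(
            v * weights[i][j] for i, v in enumerate(visible))
        p = 1.0 / (1.0 + math.exp(-activation))  # sigmoid gives P(h_j = 1 | v)
        probs.append(p)
        states.append(1 if rng.random() < p else 0)  # one Bernoulli(p) draw
    return probs, states


# Hypothetical sizes: 7 visible units (one per selected feature), 4 hidden units.
rng = random.Random(0)
N_VISIBLE, N_HIDDEN = 7, 4
weights = xavier_uniform(N_VISIBLE, N_HIDDEN, rng)
visible = [0.2, 0.9, 0.5, 0.0, 1.0, 0.3, 0.7]  # made-up normalized feature values
probs, states = sample_hidden(visible, weights, [0.0] * N_HIDDEN, rng)
```

Seeding the generator keeps the sketch reproducible; the hidden-layer size is fixed by hand here, whereas the paper states that Principal Components are used to choose it.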

Structure of manuscript

The manuscript is organized as follows. Different crop yield prediction methods employing AI-enabled ML and DL methods are discussed in the section “Related works”. The system model of BBO-BDC is provided in the section "Materials and methods", where the various phases of BBO-BDC are defined comprehensively using figures and pseudo-code representations. The comparison of BBO-BDC with other similar prediction methods, together with the experimental setup, is provided in the section "Experimentation, results and analysis", and a quantitative analysis of BBO-BDC using graphical representations follows in the section "Discussion". Finally, the manuscript is summarized in the section "Conclusion".

Related works

AI methods for crop yield prediction

AI is an ingenious tool that simulates human intelligence and capabilities by machines, specifically digital equipment²². Applications of AI include analog-to-digital conversion, speech recognition, and expert systems that mimic perception, to name a few. The viability of the agriculture sector is essential to ensure food security for an ever-increasing population, and the significance of AI and ML for the agriculture sector was investigated. Beyond selecting the correct crop, the chief mechanism for increasing crop yield is an in-depth analysis of soil that takes several meteorological constituents into consideration. However, insufficient expertise in soil fertility remains the major reason for moderate crop production.

In²³, a method was presented that takes factors such as slope, temperature, rainfall, soil moisture, and humidity into consideration and produces a list of crops that helps farmers make efficient decisions. In²⁴, a comparison of spatial and temporal methods employing ANN for crop yield forecasting was presented. DMA techniques were designed in²⁵ for analyzing and validating both present and future patterns for predicting crop yield. An AI-enabled ML system for crop monitoring, employing a random forest algorithm and focusing on optimization aspects, was examined in²⁶. A review of AI for crop yield forecasting employing ML was presented in²⁷.

Learning methods for crop yield prediction

Deep Learning (DL) and Machine Learning (ML) are closely related. In²⁸, a precise mechanism utilizing an ML algorithm for crop selection towards maximal yield was presented. At present, there is consensus among researchers that using AI to improve the match between land and crop types can improve crop yield. However, several issues remain, such as limited crop phenotypic information and poor execution of AI techniques. In²⁹, with maize as an example, both environmental climate and crop phenotypic features were taken into consideration using a graph neural network to validate crop suitability assessment. With this approach, significant improvements were observed in terms of precision.

A deep learning-enhanced remote sensing approach was designed in³⁰ to predict rice yield. The designed approach handles the difficulties associated with processing large target datasets. Another statistical and machine learning technique focusing on climate and rice yield was presented in³¹. A comprehensive literature review of ML for crop yield prediction based on extracting significant features was presented in³².

As noted in³³, the increasing international demand for food owing to unprecedented population growth has resulted in food insecurity in certain populated areas such as Africa. Another major factor is the change in climatic conditions and its variability. An ML-based method for predicting the yield of six crops, such as rice and seed cotton, throughout the year was designed. Here, several factors, such as weather information, yields, and chemical and climatic information, were merged to help both farmers and decision-makers predict crop yields annually. For this purpose, three different mechanisms were employed.

A method employing gradient boosting to predict annual rice production in Bangladesh was proposed in³⁴. Despite the demonstrated efficiency of ML techniques as an appropriate replacement for traditional statistical methods, their application to price forecasting has remained an area of debate. Different statistical and ML techniques were proposed in³⁵ with the objective of obtaining accurate forecasts for pricing policies. A hybrid CNN and RNN method was designed in³⁶ to predict wheat yield with minimum error. Variational Autoencoders (VAEs) and Generative Adversarial Networks (GANs) were developed in³⁷ for generating data. However, large datasets were not considered. A review of machine learning and deep learning techniques for evaluating crop yield was discussed in³⁸. A comparison of existing methods and their drawbacks is listed in Table 1.

Table 1 Comparison of existing methods.

Motivation

Crop yield prediction data can be utilized to predict the yields of different crops at different time periods accurately. For the prediction of rice and paddy crop yield, AI-enabled deep neural network methods can be utilized. This procedure is particularly important for predicting yield according to several factors such as rainfall, temperature, and pesticide use. Although AI supports different application types, as discussed in the previous sections, it also suffers from overhead and convergence speed-related issues, which in turn compromise the overall sensitivity and specificity of the predictions being made. Present methods were found to be deficient in overhead and convergence speed and hence are also susceptible to storage issues. In addition, they do not possess fine-tuned feature selection. Hence, it is crucial to incorporate feature analysis in an AI-enabled environment that can overcome the limitations of existing methods. This motivates us to design a new AI-enabled Barilai–Blinder–Oaxaca–Bernoulli Deep Classifier (BBO-BDC) for crop yield forecasting.

Materials and methods

Materials

Agriculture is one of the sectors that plays a notable part in the global economy. The science of training machines to learn and to generate models for early forecasts has been utilized considerably over the past few decades. In addition, with the rapid growth of the human population, crop yield remains the key concern of smart agriculture. As a result, crop yield forecasting has been studied over the past few decades as a paramount agricultural issue³⁹. Agricultural yield is regulated by numerous factors, such as pesticides, rain, and temperature. The Crop Yield Prediction Dataset employed in our work, obtained from https://www.kaggle.com/datasets/patelris/crop-yield-prediction-dataset, is divided into four distinct files, namely pesticide, rainfall, temperature, and yield, which are finally combined into a single yield data file (yield_df.csv). The pesticide file comprises the features domain, area, element, item, year, unit, and value, whereas the rainfall file consists of area, year, and average rainfall. The details of the Crop Yield Prediction Dataset are given below in Table 2.

Table 2 Crop Yield Prediction Dataset description.

As given in the above table, seven features are present in the pesticide CSV file, three features are present in each of the temperature and rainfall CSV files, and 12 features are present in the yield CSV file.

Proposed methodology: AI-enabled Barilai–Blinder–Oaxaca–Bernoulli deep classifier (BBO-BDC)

Crop yield forecasting calls for a variety of decision criteria and techniques. Techniques for identifying the most prognostic features for crop yield are employed by a few farmers⁴⁰, whereas other techniques are employed to derive predictions⁴¹. This section presents an AI-enabled deep-learning method for precise and early crop yield prediction. The proposed BBO-BDC is designed with Barilai–Borwein Gradient Min–max Normalization, Blinder–Oaxaca Statistical Decomposition, and an AI-enabled Bernoulli Deep Belief Network. The Barilai–Borwein Gradient Min–max Normalization-based preprocessing algorithm is employed to eliminate the missing values. The Blinder–Oaxaca Statistical Decomposition-based feature selection algorithm is designed to select the pertinent features. The AI-enabled Bernoulli Deep Belief Network classification algorithm is used to provide exact crop yield prediction. Figure 1 shows the structure of the AI-enabled Barilai–Blinder–Oaxaca–Bernoulli Deep Classifier (BBO-BDC) method.

As shown in the above figure, the input data set consists of every detail related to the crop acquired from the crop yield prediction dataset. Then preprocessing is performed to eliminate all those missing values in the input Crop Yield Prediction Dataset using the Barilai–Borwein Gradient Min–max Normalization-based preprocessing algorithm. Then pertinent features are selected using the Blinder–Oaxaca Statistical Decomposition-based feature selection algorithm. Feature selection assists in accomplishing precise and accurate results by obtaining differences in the means of a dependent variable (i.e., dependent feature) and independent variable (i.e., independent feature) between groups (i.e., from different input vector matrices acquired from pesticides, rainfall, and temperature). Finally, the AI-enabled Bernoulli Deep Belief Network classification algorithm is employed for accurate and precise crop yield prediction.

System model

In this section, the system model involved in the design of the AI-enabled DL-based crop yield forecast is introduced. The system model represents a process-oriented flow between four input matrices, i.e., pesticide, rainfall, temperature, and yield. In our work, these four CSV files are stored separately in the form of matrices as given below.

$$P=\left[\begin{array}{cccc}{P}_{11}& {P}_{12}& \dots & {P}_{1p}\\ {P}_{21}& {P}_{22}& \dots & {P}_{2p}\\ \dots & \dots & \dots & \dots \\ {P}_{i1}& {P}_{i2}& \dots & {P}_{ip}\end{array}\right], where\, i=7$$

(1)

$$R=\left[\begin{array}{cccc}{R}_{11}& {R}_{12}& \dots & {R}_{1r}\\ {R}_{21}& {R}_{22}& \dots & {R}_{2r}\\ \dots & \dots & \dots & \dots \\ {R}_{j1}& {R}_{j2}& \dots & {R}_{jr}\end{array}\right], where\, j=3$$

(2)

$$T=\left[\begin{array}{cccc}{T}_{11}& {T}_{12}& \dots & {T}_{1t}\\ {T}_{21}& {T}_{22}& \dots & {T}_{2t}\\ \dots & \dots & \dots & \dots \\ {T}_{k1}& {T}_{k2}& \dots & {T}_{kt}\end{array}\right], where\, k=3$$

(3)

$$Y=\left[\begin{array}{cccc}{Y}_{11}& {Y}_{12}& \dots & {Y}_{1y}\\ {Y}_{21}& {Y}_{22}& \dots & {Y}_{2y}\\ \dots & \dots & \dots & \dots \\ {Y}_{l1}& {Y}_{l2}& \dots & {Y}_{ly}\end{array}\right], where\, l=12$$

(4)

The four matrices above, obtained separately for pesticide ‘$P$’, rainfall ‘$R$’, temperature ‘$T$’, and yield ‘$Y$’, form the input to the proposed crop yield prediction.
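
As a minimal illustration of holding each file as a separate matrix, the Python sketch below parses CSV text into a header and a list of rows. The column names follow the feature lists given for the pesticide and rainfall files; the helper `load_matrix`, the one-sample-per-row layout, and every data value are placeholders assumed for illustration and do not come from the dataset.

```python
import csv
import io

# Placeholder rows for illustration only; the real files come from the Kaggle dataset.
PESTICIDE_CSV = """domain,area,element,item,year,unit,value
DomainX,AreaA,Use,Pesticides (total),2000,tonnes,10.0
DomainX,AreaB,Use,Pesticides (total),2000,tonnes,20.0
"""

RAINFALL_CSV = """area,year,average_rainfall
AreaA,2000,100.0
AreaB,2000,200.0
"""


def load_matrix(csv_text):
    """Parse CSV text into (feature names, matrix with one sample per row)."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)          # first line holds the feature names
    rows = [row for row in reader]  # remaining lines are the samples
    return header, rows


p_features, P = load_matrix(PESTICIDE_CSV)
r_features, R = load_matrix(RAINFALL_CSV)
```

Here `len(p_features)` and `len(r_features)` correspond to the feature counts 7 and 3 stated for the pesticide and rainfall matrices.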

Barilai–Borwein gradient min–max normalization-based preprocessing

The Crop Yield Prediction Dataset, with its four distinct input matrices, contains certain missing values, and it is mandatory to eliminate them at the preliminary stage. Preprocessing is used to eliminate the missing values⁴² present in the input dataset. The proposed technique uses normalization together with a multi-variable gradient function, called Barilai–Borwein Gradient Min–max Normalization-based preprocessing.

As illustrated in Fig. 2, the crop yield prediction dataset is given as input and preprocessed data are produced in a convergence speed-effective manner. Because the range of the raw data varies widely across the four distinct input matrices employed in our work, crop yield prediction will not work properly without normalization⁴³. As a result, the range of all features is normalized so that each feature present in the four input matrices contributes approximately proportionately to the final distance. Min–max Normalization rescales the feature range to ‘$\left[\text{0,1}\right]$’. The mathematical formulation is given below.

$${S}{\prime}=\frac{S-Min\left(S\right)}{Max\left(S\right)-Min\left(S\right)}$$

(5)
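
Eq. (5) can be sketched in a few lines of Python. The function name `min_max_normalize`, the handling of a constant column, and the sample values are assumptions made for illustration; the sketch covers only the rescaling step and does not handle missing entries.

```python
def min_max_normalize(column):
    """Rescale a list of numbers to [0, 1] following Eq. (5)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Constant column: Eq. (5) would divide by zero, so map to 0.0 (assumed convention).
        return [0.0 for _ in column]
    return [(s - lo) / (hi - lo) for s in column]


# Made-up rainfall values, used only to exercise the function.
rainfall = [500.0, 1000.0, 1500.0, 2000.0]
normalized = min_max_normalize(rainfall)
```

The smallest value maps to 0 and the largest to 1, so features measured in different units end up on the same scale.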

Based on Eq. (5), the data in the four distinct input matrices are normalized so that every feature carries similar weight. Here, ‘$S$’ denotes a raw feature value and ‘${S}{\prime}$’ its normalized value, while ‘$Min\left(S\right)$’ and ‘$Max \left(S\right)$’ denote the minimum and maximum values of the feature, respectively. Because four distinct input matrices are used, a multi-variable function via the Barilai–Borwein gradient function is applied to obtain good convergence speed. The Barilai–Borwein gradient function is mathematically stated as given below.

$$\gamma =\frac{{\left({S}_{N}{\prime}-{S}_{N-1}{\prime}\right)}^{T}\left[\nabla fun \left({S}_{N}{\prime}\right)-\nabla fun \left({S}_{N-1}{\prime}\right)\right]}{{\left\Vert \nabla fun \left({S}_{N}{\prime}\right)-\nabla fun \left({S}_{N-1}{\prime}\right)\right\Vert }^{2}}$$

(6)

In Eq. (6), ‘${S}_{N}{\prime}$’ and ‘${S}_{N-1}{\prime}$’ denote the normalized samples at two successive iterations and ‘$\nabla fun$’ the gradient of the function ‘$fun$’. With the step size ‘$\gamma$’ from Eq. (6), convergence to a local minimum is assured. When the function ‘$fun$’ of the min–max normalized samples is convex, all local minima⁴⁴,⁴⁵ are global minima, and hence the Barilai–Borwein gradient function converges to the global solution. With this, all the missing values are eliminated from further processing. The pseudo-code representation of Barilai–Borwein Gradient Min–max Normalization-based preprocessing is given below.

As given in the above algorithm, the four distinct input vector matrices form the samples from the crop yield prediction dataset, and each input vector matrix possesses a different number of features. The min–max normalization function is first applied to these input vector matrices to return the normalized values. Next, to improve the convergence speed, the Barilai–Borwein gradient function is applied to the normalized values. This not only improves the convergence speed but also improves the true positive and true negative rates significantly.
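
The step-size rule can be illustrated with a small gradient descent loop. The Python sketch below uses the standard form of the Barilai–Borwein step (commonly spelled Barzilai–Borwein in the optimization literature): the inner product of the iterate difference and the gradient difference, divided by the squared norm of the gradient difference. The convex quadratic test function, its constants, the fixed first step, and the name `bb_descent` are assumptions chosen for illustration, since the objective ‘$fun$’ is not specified in the text.

```python
# Convex test function f(x) = 0.5 * sum(a_i * (x_i - c_i)^2); all constants are illustrative.
CURVATURE = [1.0, 10.0]
TARGET = [0.25, 0.75]  # minimizer of f, chosen inside the normalized range [0, 1]


def gradient(x):
    """Gradient of the test function at x."""
    return [a * (xi - c) for a, xi, c in zip(CURVATURE, x, TARGET)]


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(p * q for p, q in zip(u, v))


def bb_descent(x0, first_step=0.01, max_iters=100, tol=1e-12):
    """Gradient descent whose step size is recomputed from the last two iterates."""
    x_prev = list(x0)
    g_prev = gradient(x_prev)
    # Bootstrap with one small fixed step, because the rule needs two iterates.
    x = [xi - first_step * gi for xi, gi in zip(x_prev, g_prev)]
    for _ in range(max_iters):
        g = gradient(x)
        dx = [a - b for a, b in zip(x, x_prev)]  # difference of iterates
        dg = [a - b for a, b in zip(g, g_prev)]  # difference of gradients
        denom = dot(dg, dg)
        if denom < tol:                          # gradients stopped changing
            break
        gamma = dot(dx, dg) / denom              # step size from the two differences
        x_prev, g_prev = x, g
        x = [xi - gamma * gi for xi, gi in zip(x, g)]
    return x


solution = bb_descent([0.0, 0.0])
```

No line search or hand-tuned learning rate is needed after the first step, which is the usual reason for choosing this rule when convergence speed matters.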

Blinder–Oaxaca Statistical Decomposition-based feature selection

Feature selection techniques help identify the features that are pertinent⁴⁶ to crop yield forecasting algorithms. The well-defined feature subsets chosen are utilized for crop yield prediction. Compared with the full feature set⁴⁷, feature subsets provide fine-grained results with less computational time. Moreover, accuracy is also improved by selecting a fine-grained subset, since overfitting is minimized. Feature selection algorithms⁴⁸,⁴⁹ are relevant for identifying the essential features that are closely associated with crop yield prediction. In this section, Blinder–Oaxaca Statistical Decomposition-based feature selection (Fig. 3) is designed to obtain fine-grained results with less computational time.

In Fig. 3, with the preprocessed data acquired as input, the Blinder–Oaxaca decomposition function is employed to measure how closely two features are associated with one another. The Blinder–Oaxaca decomposition function is frequently employed for identifying dependent features or variables. We determine the Blinder–Oaxaca decomposition function between the most significant feature and crop yields to explore the influence of climatic changes on crop yields. The Blinder–Oaxaca decomposition function is a statistical function that describes the difference in the means of a dependent variable (i.e., dependent feature) between two groups (i.e., two input vector matrices).

The function decomposes this gap into two portions: one due to differences in the mean values of the independent variable (i.e., independent feature) within the input matrices, and one due to group differences in the effects of the independent variable. Using the formulation in Eqs. (7) and (8), the association between two input vector matrices, average temperature ‘$AT$’ and crop yields ‘$CY$’, is given below.

$$\ln \left({FS}_{{AT}_{i}}\right)={X}_{{AT}_{i}}{\beta }_{AT}+{\mu }_{{AT}_{i}}$$

(7)

$$\ln \left({FS}_{{CY}_{i}}\right)={X}_{{CY}_{i}}{\beta }_{CY}+{\mu }_{{CY}_{i}}$$

(8)

From Eqs. (7) and (8), ‘${X}_{{AT}_{i}}$’ represents the vector of explanatory variables such as year and country, and ‘${X}_{{CY}_{i}}$’ denotes the vector of explanatory variables such as domain code, domain, area code, area, element code, element, item code, item, year code, year, unit, and value. Similarly, ‘${\beta }_{AT}$’ and ‘${\beta }_{CY}$’ represent the vectors of coefficients, with ‘${\mu }_{{AT}_{i}}$’ and ‘${\mu }_{{CY}_{i}}$’ denoting the error terms, respectively. Let ‘${b}_{AT}$’ and ‘${b}_{CY}$’ denote the regression estimates of ‘${\beta }_{AT}$’ and ‘${\beta }_{CY}$’; then the average value of residuals (i.e., features selected with respect to average temperature and crop yield) is mathematically formulated as given below.

$${FS}_{ACY}=mean\left(\ln\left({FS}_{{AT}_{i}}\right)\right)-mean\left(\ln\left({FS}_{{CY}_{i}}\right)\right)={b}_{AT}\left(mean\left({FS}_{AT}\right)-mean\left({FS}_{CY}\right)\right)+mean\left({FS}_{CY}\right)\left({b}_{AT}-{b}_{CY}\right)$$

(9)

Next, we calculate the association between two input vector matrices, pesticides in tonnes ‘$PT$’ and crop yields ‘$CY$’, as given below.

$$\ln \left({FS}_{{PT}_{i}}\right)={X}_{{PT}_{i}}{\beta }_{PT}+{\mu }_{{PT}_{i}}$$

(10)

$$\ln \left({FS}_{{CY}_{i}}\right)={X}_{{CY}_{i}}{\beta }_{CY}+{\mu }_{{CY}_{i}}$$

(11)

From Eqs. (10) and (11), ‘${X}_{{PT}_{i}}$’ represents the vector of explanatory variables such as domain, area, element, item, year, unit, and value, and ‘${X}_{{CY}_{i}}$’ denotes the vector of explanatory variables for crop yield prediction. ‘${\beta }_{PT}$’ and ‘${\beta }_{CY}$’ represent the vectors of coefficients, with ‘${\mu }_{{PT}_{i}}$’ and ‘${\mu }_{{CY}_{i}}$’ denoting the error terms, respectively. Let ‘${b}_{PT}$’ and ‘${b}_{CY}$’ denote the regression estimates of ‘${\beta }_{PT}$’ and ‘${\beta }_{CY}$’; then the average value of residuals (i.e., features selected with respect to pesticides and crop yield) is mathematically formulated as given below.

$${FS}_{PCY}=mean\left(\ln\left({FS}_{{PT}_{i}}\right)\right)-mean\left(\ln\left({FS}_{{CY}_{i}}\right)\right)={b}_{PT}\left(mean\left({FS}_{PT}\right)-mean\left({FS}_{CY}\right)\right)+mean\left({FS}_{CY}\right)\left({b}_{PT}-{b}_{CY}\right)$$

(12)

Finally, the association between two input vector matrices, average rainfall ‘$AR$’ and crop yield ‘$CY$’ is mathematically represented as given below.

$$\ln \left({FS}_{{AR}_{i}}\right)={X}_{{AR}_{i}}{\beta }_{AR}+{\mu }_{{AR}_{i}}$$

(13)

$$\ln \left({FS}_{{CY}_{i}}\right)={X}_{{CY}_{i}}{\beta }_{CY}+{\mu }_{{CY}_{i}}$$

(14)

From Eqs. (13) and (14), ‘${X}_{{AR}_{i}}$’ represents the vector of explanatory variables such as area, year, and average rainfall, and ‘${X}_{{CY}_{i}}$’ denotes the vector of explanatory variables for crop yield prediction. ‘${\beta }_{AR}$’ and ‘${\beta }_{CY}$’ represent the vectors of coefficients, with ‘${\mu }_{{AR}_{i}}$’ and ‘${\mu }_{{CY}_{i}}$’ denoting the error terms, respectively. Let ‘${b}_{AR}$’ and ‘${b}_{CY}$’ denote the regression estimates of ‘${\beta }_{AR}$’ and ‘${\beta }_{CY}$’; then the average value of residuals (i.e., features selected with respect to rainfall and crop yield) is mathematically formulated as given below.

$${FS}_{ARCY}=mean\left(\ln\left({FS}_{{AR}_{i}}\right)\right)-mean\left(\ln\left({FS}_{{CY}_{i}}\right)\right)={b}_{AR}\left(mean\left({FS}_{AR}\right)-mean\left({FS}_{CY}\right)\right)+mean\left({FS}_{CY}\right)\left({b}_{AR}-{b}_{CY}\right)$$

(15)

$$FS={FS}_{ACY} \cup { FS}_{PCY} \cup {FS}_{ARCY}$$

(16)

Finally, the equation given above (16) forms the resultant features selected based on the crop yield differential between three different groups, i.e., pesticides in tonnes, average rainfall, and average temperature respectively. The pseudo-code representation of Blinder–Oaxaca Statistical Decomposition-based feature selection is given below.

In Algorithm 2, with the objective of reducing false positives and false negatives, relevant and pertinent features are retained whereas irrelevant features are discarded from further processing. With this objective, the Blinder–Oaxaca Statistical Decomposition function is applied to the preprocessed data. The Blinder–Oaxaca Statistical Decomposition function identifies and quantifies the separate contributions of group variances (i.e., pesticide, rainfall and temperature with respect to crop yield) in quantifiable features such as domain, area, element, item, year, unit, value, etc. This in turn assists in minimizing false positives and false negatives considerably. The features selected by the Blinder–Oaxaca Statistical Decomposition-based feature selection algorithm are listed in Table 3.

Table 3 Relevant features selected.


With the above relevant features selected (i.e., area, item, year, unit, value, average_rainfall, average_temperature) crop yield prediction is discussed in the next section.

AI-enabled Bernoulli Deep Belief Network classifier for crop yield prediction

Farmers employing AI-powered systems obtain accurate and precise methods that guide the optimal management of water and nutrients, the optimal harvesting of crops, and so on. AI has the potential to steer an agricultural revolution at a time when the world needs to produce additional food with minimal resources. However, accurate and precise crop yield prediction remains a concern. In this work, an AI-enabled Bernoulli Deep Belief Network Classifier for crop yield prediction (i.e., rice and paddy for the years between 1960 and 1980) is designed. Figure 4 shows the structure of the AI-enabled Bernoulli Deep Belief Network Classifier.

As illustrated in Fig. 4, the preprocessed, feature-selected samples form the input to the input layer. Following this, three hidden layers are employed in our work, with which the actual output of crop yield for rice and paddy for the years between 1960 and 1980 is obtained. The AI-enabled Bernoulli Deep Belief Network represents an arrangement of unsupervised networks, where each hidden layer serves as the visible layer for the next layer. Each such network is modeled with a visible input layer and a hidden layer, with connections between the layers but not within them. In the AI-enabled Bernoulli Deep Belief Network, ‘$H$’ and ‘$V$’ represent the hidden and visible units respectively, and the parameter set is mathematically formulated as given below.

$$\theta =\left\{W, VE, HE\right\},\quad \text{where } VE=\left\{{p}_{i}\in {R}^{m}\right\} \text{ and } HE=\left\{{q}_{j}\in {R}^{n}\right\}$$

(17)

From the above Eq. (17), ‘$W$’, ‘$VE$’ and ‘$HE$’ represent the weights initialized using the Xavier Initialization function, the visible elements and the hidden elements, respectively. In addition, the ‘$i-th$’ visible unit threshold (i.e., for rice and paddy) is governed by ‘${p}_{i}$’ and the ‘$j-th$’ hidden unit threshold (i.e., year between 1960 and 1980) is governed by ‘${q}_{j}$’. With this, a total of ‘$5000$’ neurons are assigned in each layer with a learning rate of ‘$0.03\%$’.

The objective of using the Xavier Initialization function is to initialize the weights⁵⁰,⁵¹ in such a manner as to keep the variance of the activations the same across every layer. This constant variance helps prevent the gradient from vanishing. Weight initialization is performed using the Xavier Initialization function as given below.

$${W}_{ij}=UD\left(S\left[FS\right]\right)\left[-\frac{\sqrt{6}}{\sqrt{{Size}_{PL}+{Size}_{CL}}},\frac{\sqrt{6}}{\sqrt{{Size}_{PL}+{Size}_{CL}}}\right]$$

(18)
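As an illustration of the uniform Xavier bounds in Eq. (18), the following minimal sketch draws a weight matrix with NumPy. The function name `xavier_uniform` and the layer sizes (7 inputs, 16 hidden units) are assumptions chosen to keep the example small; the paper itself assigns 5000 neurons per layer.

```python
import numpy as np

def xavier_uniform(size_prev, size_curr, rng):
    """Draw weights uniformly from [-limit, limit], limit = sqrt(6) / sqrt(size_prev + size_curr)."""
    limit = np.sqrt(6.0) / np.sqrt(size_prev + size_curr)
    return rng.uniform(-limit, limit, size=(size_prev, size_curr))

rng = np.random.default_rng(0)
size_prev, size_curr = 7, 16          # hypothetical layer sizes for illustration
W = xavier_uniform(size_prev, size_curr, rng)
limit = np.sqrt(6.0 / (size_prev + size_curr))
print(W.shape, float(np.abs(W).max()) <= limit)
```

The bound depends only on the sizes of the preceding and current layers, so wider layers receive proportionally smaller initial weights.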

From the above Eq. (18), with a uniform distribution over the sample features selected as input ‘$UD\left(S\left[FS\right]\right)$’, the weight at every layer is initialized using the size of the preceding layer ‘${Size}_{PL}$’ and the size of the current layer ‘${Size}_{CL}$’. Following this, the neurons of the hidden and visible layers are distributed according to the Bernoulli distribution function, and the energy equation is mathematically formulated as given below.

$$Energy\left(V,H\left[\theta \right]\right)=-\sum_{i=1}^{m}{p}_{i}{V}_{i}-\sum_{j=1}^{n}{q}_{j}{H}_{j}-\sum_{i=1}^{m}\sum_{j=1}^{n}{V}_{i}{W}_{ij}{H}_{j}$$

(19)
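The energy in Eq. (19) can be evaluated directly for a small configuration. The sketch below is only illustrative: the function name `rbm_energy` and all numeric values are hypothetical, with `p` and `q` standing for the visible and hidden thresholds.

```python
import numpy as np

def rbm_energy(v, h, W, p, q):
    """Energy of a joint visible/hidden configuration: -p.v - q.h - v^T W h."""
    return -(p @ v) - (q @ h) - (v @ W @ h)

# Hypothetical toy configuration: 3 visible units, 2 hidden units.
v = np.array([1.0, 0.0, 1.0])
h = np.array([1.0, 1.0])
W = np.array([[0.5, -0.2],
              [0.1, 0.3],
              [-0.4, 0.2]])
p = np.array([0.1, 0.2, 0.3])   # visible thresholds
q = np.array([-0.1, 0.4])       # hidden thresholds

energy = rbm_energy(v, h, W, p, q)
print(energy)
```

Lower energy corresponds to a more probable joint configuration of visible and hidden units under the model.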

The energy function in the above Eq. (19) comprises the energy contributed by the visible nodes ‘${p}_{i}{V}_{i}$’, the hidden nodes ‘${q}_{j}{H}_{j}$’ and the weights associating hidden and visible nodes ‘${V}_{i}{W}_{ij}{H}_{j}$’. Following this, the total hidden layer employing Principal Components (i.e., features selected) is mathematically stated as given below.

$$\sum Var\left({PC}_{i}\right)=\sum Complexity\left({H}_{i}\right)$$

(20)

From the above Eq. (20), ‘${PC}_{i}$’ denotes the principal components (i.e., the features selected as input in our work correspond to ‘$7$’ principal components) and ‘${H}_{i}$’ denotes the hidden layer. With only three principal components, a total of three hidden layers were employed in our work. Then, the probability of a neuron being active in either the visible or the hidden layer is mathematically formulated as given below.

$$Prob\left({H}_{j}=1|V\right)=\sigma \left({q}_{j}+\sum_{i}{V}_{i}{W}_{ij}\right)$$

(21)

$$Prob\left({V}_{i}=1|H\right)=\sigma \left({p}_{i}+\sum_{j}{H}_{j}{W}_{ij}\right)$$

(22)
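The conditional probabilities in Eqs. (21) and (22) can be sketched as one step of Gibbs sampling with Bernoulli units. This is an illustrative sketch only; the function names, the layer sizes, and the random weights are hypothetical and do not reproduce the authors' training procedure.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def prob_hidden_given_visible(v, W, q):
    """Prob(H_j = 1 | V) = sigmoid(q_j + sum_i V_i W_ij)."""
    return sigmoid(q + v @ W)

def prob_visible_given_hidden(h, W, p):
    """Prob(V_i = 1 | H) = sigmoid(p_i + sum_j H_j W_ij)."""
    return sigmoid(p + W @ h)

def gibbs_step(v, W, p, q, rng):
    """Sample Bernoulli hidden units from V, then Bernoulli visible units from H."""
    ph = prob_hidden_given_visible(v, W, q)
    h = (rng.random(ph.shape) < ph).astype(float)
    pv = prob_visible_given_hidden(h, W, p)
    v_new = (rng.random(pv.shape) < pv).astype(float)
    return v_new, h

rng = np.random.default_rng(0)
W = rng.normal(0.0, 0.1, size=(3, 2))   # hypothetical 3 visible x 2 hidden weights
p = np.zeros(3)
q = np.zeros(2)
v0 = np.array([1.0, 0.0, 1.0])
v1, h0 = gibbs_step(v0, W, p, q, rng)
print(v1, h0)
```

Each unit is switched on with the probability given by the logistic function, which is what makes the units Bernoulli distributed.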

With the results of the above Eqs. (21) and (22), the rice and paddy crop yield forecast for the years between 1960 and 1980 is retrieved. The pseudo-code representation of the AI-enabled Bernoulli Deep Belief Network Classifier is given below.

As given in the above algorithm, an AI-enabled Bernoulli Deep Belief Network is designed with the objective of generating accurate and precise results as output. First, the preprocessed, feature-selected samples are provided as input to the visible layer. Following this, the Xavier Initialization function is used to initialize the weights so that the variance of the activations is uniform across layers, thereby ensuring an optimal distribution. Next, the Bernoulli distribution function is applied with the purpose of distributing neurons equivalently. Finally, the total number of hidden layer nodes is determined using Principal Components (i.e., features selected), thereby providing an accurate and precise means of crop yield prediction.

Experimentation, results and analysis

Experimental evaluation

This section describes the results of the simulations employed to validate the proposed method, the AI-enabled Barilai–Blinder–Oaxaca–Bernoulli Deep Classifier (BBO-BDC) for crop yield prediction. The experiment uses the Crop Yield forecast dataset. The dataset consists of test files, training files, and sample submission files, each with a different number of samples. The proposed BBO-BDC method has been implemented in Python. The simulation results from the proposed BBO-BDC method and the existing methods, Multi-Layer Perceptron (MLP) [1], attention-based random forest with meta-learning (MetaRF) [2], and Hybrid CNN-DNN [3], are detailed below in terms of the performance parameters sensitivity, specificity, accuracy, convergence speed, and overhead. To ensure a fair comparison between the proposed BBO-BDC and the existing methods [1], [2] and [3], similar samples from the Crop Yield Prediction dataset are applied and results are averaged over 10 different simulation runs. Afghanistan is the country considered for the crop yield investigation, and the rice and paddy crops are forecasted in this work.

Experimental parameters

To measure sensitivity, specificity and accuracy, four performance factors are involved, namely, true positive ‘$TP$’, true negative ‘$TN$’, false positive ‘$FP$’ and false negative ‘$FN$’. True positive refers to a crop that yields a certain percentage where the test gives a positive result. True negative refers to a crop that does not yield a certain percentage where the test gives a negative result. False positive refers to a crop that does not produce a certain yield where the test gives a positive result. False negative refers to a crop that produces a certain yield where the test gives a negative result. Sensitivity⁵² measures the ability of a test (i.e., crop yield prediction) to correctly identify crops with a specific yield.

$$Sen=\frac{TP}{TP+FN}$$

(23)

Specificity measures the ability of a test (i.e., crop yield prediction) to correctly identify crops without a significant yield.

$$Spe=\frac{TN}{TN+FP}$$

(24)

Accuracy⁵³ is a statistical measure of how well a classification test correctly identifies or excludes a condition (here, a given crop yield). Accuracy is the ratio of correct forecasts (i.e., both true positives and true negatives) to the total number of sample cases considered for simulation. Accuracy is mathematically formulated as given below.

$$Acc=\frac{TP+TN}{TP+TN+FP+FN}$$

(25)
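The three metrics can be computed from confusion-matrix counts as in the following minimal sketch. Specificity is computed here with the standard definition TN / (TN + FP); the function name `classification_metrics` and the counts passed to it are hypothetical and are not values from the experiments.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (sensitivity, specificity, accuracy) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)                  # share of actual positives detected
    specificity = tn / (tn + fp)                  # share of actual negatives detected
    accuracy = (tp + tn) / (tp + tn + fp + fn)    # share of all cases classified correctly
    return sensitivity, specificity, accuracy

# Hypothetical counts, chosen only to illustrate the arithmetic.
sen, spe, acc = classification_metrics(tp=90, tn=70, fp=10, fn=10)
print(f"sensitivity={sen:.4f} specificity={spe:.4f} accuracy={acc:.4f}")
```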

Convergence speed refers to the time taken to carry out the entire process, i.e., the crop yield prediction. The convergence speed is mathematically stated as given below.

$$CS=\sum_{i=1}^{n}{S}_{i}*Time \left(Prediction\right)$$

(26)

From the above Eq. (26), convergence speed ‘$CS$’ is evaluated employing the sample instances ‘${S}_{i}$’ and the actual time consumed in prediction ‘$Time \left(Prediction\right)$’ (i.e., involving preprocessing, feature selection and classification). It is measured in milliseconds (ms). Finally, overhead measures the memory consumed in the prediction process and is mathematically given as below.

$$Overhead=\sum_{i=1}^{n}{S}_{i}*Mem \left(Prediction\right)$$

(27)

From the above Eq. (27), overhead ‘$Overhead$’, is measured based on the samples ‘${S}_{i}$’ involved in simulation as well as memory utilized in the actual prediction ‘$Mem \left(Prediction\right)$’ process. It is measured in kilobytes (KB).
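One way to obtain the time and memory quantities used in Eqs. (26) and (27) is sketched below with the Python standard library. This is an assumption about how such measurements could be taken, not the authors' measurement code: `measure` and the placeholder predictor are hypothetical, and the sketch reports total wall-clock time and peak traced memory over all samples rather than a per-sample product.

```python
import time
import tracemalloc

def measure(predict, samples):
    """Run predict on every sample; return outputs, elapsed time (ms), peak memory (KB)."""
    tracemalloc.start()
    start = time.perf_counter()
    outputs = [predict(s) for s in samples]
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    _, peak_bytes = tracemalloc.get_traced_memory()   # returns (current, peak)
    tracemalloc.stop()
    return outputs, elapsed_ms, peak_bytes / 1024.0

# Placeholder predictor standing in for the full pipeline
# (preprocessing, feature selection and classification).
outputs, elapsed_ms, peak_kb = measure(lambda x: x > 0.5, [0.2, 0.7, 0.9])
print(outputs, elapsed_ms, peak_kb)
```

Note that `tracemalloc` only tracks allocations made through Python's memory allocator, so memory used inside native libraries may be under-reported.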

Results

Sensitivity, specificity, accuracy with and without feature selection

In this section, the performance analysis of sensitivity, specificity, and accuracy with and without feature selection is discussed in detail. Table 4 shows the comparison between the proposed AI-enabled Barilai–Blinder–Oaxaca–Bernoulli Deep Classifier (BBO-BDC) and the existing methods, Multi-Layer Perceptron (MLP) [1], attention-based random forest with meta-learning (MetaRF) [2] and Hybrid CNN-DNN [3]. The outcomes show that the BBO-BDC method achieves better accuracy, sensitivity and specificity than the existing methods [1], [2], and [3] through the relevant features selected.

Table 4 Comparison of accuracy, sensitivity, specificity of different methods for crop yield prediction both with and without feature selection.


Table 4 reports the accuracy, sensitivity, and specificity of the different methods. To guarantee a fair comparison among the proposed BBO-BDC and the existing methods, MLP [1], MetaRF [2], and Hybrid CNN-DNN [3], similar samples from the Crop Yield Prediction dataset are employed and results are averaged over 10 different simulation runs. The same metrics are used for analyzing the performance of all four methods.

Figure 5 depicts graphical representations of accuracy, sensitivity and specificity for the proposed BBO-BDC and the existing methods, Multi-Layer Perceptron (MLP) [1], attention-based random forest with meta-learning (MetaRF) [2], and Hybrid CNN-DNN [3]. To ensure a fair comparison, 10,000 sample images were employed for each method. The number of true positives using the proposed technique (i.e., with feature selection) was found to be 1865, whereas using the existing methods [1], [2] and [3] it was observed to be 1800, 1750 and 1725, respectively. Similarly, the number of false negatives using the proposed method was 180, whereas it was 200, 222 and 250 using [1], [2] and [3], respectively, with feature selection.

As a result, the overall sensitivity of the four methods was found to be 91.19%, 90%, 88.74% and 87.34%, respectively, with feature selection. Without feature selection, the true positive and false negative counts were 1845 and 195 using the proposed BBO-BDC method, 1750 and 215 using [1], 1720 and 237 using [2], and 1700 and 265 using [3]; the resulting sensitivity was therefore 90.44%, 89.50%, 87.88% and 86.51%, respectively.

Similarly, the specificity with feature selection was found to be 87.50% for the BBO-BDC method, 81.25% for [1], 75% for [2] and 72.04% for [3], whereas without feature selection it was 85%, 77.5% [1], 72.5% [2] and 71.83% [3], respectively.

With feature selection, the accuracy was found to be 88.25% using the proposed method, compared with 83% [1], 77.7% [2] and 75.14% [3]. Without feature selection, the accuracy was found to be 85.82% using the proposed method, compared with 79.77% [1], 75.52% [2] and 74.22% [3].

With feature selection, the sensitivity of the BBO-BDC method is improved by 2%, 3% and 4% compared with [1], [2] and [3], respectively; without feature selection, it is improved by 2%, 3% and 5%. With feature selection, the specificity of the BBO-BDC method is enhanced by 8%, 17% and 21% compared with [1], [2] and [3], respectively; without feature selection, it is improved by 10%, 17% and 18%. With feature selection, the accuracy of the BBO-BDC method is enhanced by 6%, 14% and 17% compared with [1], [2] and [3], respectively; without feature selection, it is improved by 8%, 14% and 16%.
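The improvement percentages appear to be relative gains over each baseline. As a hedged check under that assumption, the sketch below applies (proposed − baseline) / baseline to the specificity values reported with feature selection; the function name `relative_improvement` is hypothetical, and not every reported figure is necessarily reproduced exactly by this formula.

```python
def relative_improvement(proposed, baseline):
    """Relative gain of the proposed score over a baseline score, in percent."""
    return (proposed - baseline) / baseline * 100.0

# Specificity (%) with feature selection, as reported in the text.
specificity_fs = {
    "BBO-BDC": 87.50,
    "MLP": 81.25,
    "MetaRF": 75.00,
    "Hybrid CNN-DNN": 72.04,
}

gains = {
    name: round(relative_improvement(specificity_fs["BBO-BDC"], value))
    for name, value in specificity_fs.items()
    if name != "BBO-BDC"
}
print(gains)
```

Under this reading, the rounded gains match the 8%, 17% and 21% specificity improvements quoted above.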

From these results, accuracy, sensitivity and specificity were found to be comparatively better using the BBO-BDC method than using [1], [2] and [3]. The enhancement is owing to the application of the AI-enabled Bernoulli Deep Belief Network Classifier for crop yield prediction. Here, the preprocessed, feature-selected samples served as input to the visible layer. Next, weight initialization was performed using the Xavier Initialization function instead of a threshold, which ensured optimal distributions between the input and hidden layers. In addition, neurons were distributed equivalently by means of the Bernoulli distribution function, which in turn aided in minimizing the false positive and false negative samples. This improved the overall sensitivity, specificity and accuracy significantly.

Performance analysis on convergence speed and overhead with and without feature selection

In this section, the convergence speed and overhead values obtained using the proposed BBO-BDC and the three existing methods, Multi-Layer Perceptron (MLP) [1], attention-based random forest with meta-learning (MetaRF) [2] and Hybrid CNN-DNN [3], with and without feature selection are presented. Table 5 tabulates the validation results of the proposed BBO-BDC method and the compared methods [1], [2] and [3] for all ten distinct samples.

Table 5 Comparison of convergence speed and overhead using different methods for crop yield prediction both with and without feature selection.


Figure 6 illustrates the convergence speed and overhead values obtained by the proposed BBO-BDC method and the compared methods, Multi-Layer Perceptron (MLP) [1], attention-based random forest with meta-learning (MetaRF) [2] and Hybrid CNN-DNN [3], on differing samples over 10 distinct simulation runs, both with and without feature selection. From the figure it is inferred that the BBO-BDC method gave better results in terms of both convergence speed and overhead with feature selection than without it. However, the compared methods [1], [2] and [3], owing to their lower true positive and true negative counts, may not perform well at larger sample sizes. The improvement in convergence speed and overhead is owing to the preprocessing and feature selection stages. First, convergence speed was addressed by the Barilai–Borwein Gradient Min–max Normalization-based preprocessing model. Here, min–max normalization along with the Barilai–Borwein gradient function was applied to handle missing data, even in the case of a multi-variable function involving different input vector matrices. This reduced the convergence time of the BBO-BDC method by 27% compared with [1], [2] and by 13% compared with [3]. In addition, by using the Blinder–Oaxaca Statistical Decomposition function, a fine-grained feature subset was selected. This in turn reduced the overhead incurred in crop yield prediction using the BBO-BDC method by 42%, 21% and 12% compared with [1], [2] and [3], respectively.

Discussion

In this study, the proposed BBO-BDC method and the existing MLP [1], MetaRF [2], and Hybrid CNN-DNN [3] methods are evaluated on the Crop Yield forecast dataset with several parameters, namely sensitivity, specificity, accuracy, convergence speed, and overhead. In Table 5, the BBO-BDC method was evaluated against the existing MLP [1], MetaRF [2], and Hybrid CNN-DNN [3]. Convergence speed is improved and overhead is reduced by using the Barilai–Borwein Gradient Min–max Normalization and the Blinder–Oaxaca Statistical Decomposition function. In addition, missing values are eliminated and relevant features are chosen. From the results in Table 4, the sensitivity, specificity, and accuracy of the BBO-BDC method are higher than those of the existing MLP [1], MetaRF [2] and Hybrid CNN-DNN [3]. The reason for the higher sensitivity, specificity, and accuracy is the application of the AI-enabled Bernoulli Deep Belief Network Classifier. In addition, the Xavier Initialization function is used for weight initialization, and the optimal distributions are determined to achieve crop yield prediction. A limitation of this work is that it does not analyze the diverse geographies involved in predicting crop yield.

Conclusion

In this paper, an efficient method called the AI-enabled Barilai–Blinder–Oaxaca–Bernoulli Deep Classifier (BBO-BDC) for crop yield prediction has been proposed, which employs Barilai–Borwein Gradient Min–max Normalization for preprocessing, Blinder–Oaxaca Statistical Decomposition for feature selection, and an AI-enabled Bernoulli Deep Belief Network for the actual crop yield prediction. Three distinct processes were performed. First, the samples were subjected to normalization using the Barilai–Borwein Gradient Min–max Normalization-based preprocessing algorithm. Second, computationally efficient features were selected using the Blinder–Oaxaca Statistical Decomposition function. Third, the actual classification was done by the AI-enabled Bernoulli Deep Belief Network Classifier, which classifies the preprocessed, feature-selected samples for the appropriate crop yield (i.e., rice and paddy) in an accurate and precise manner. Experiments were conducted on the Crop Yield Prediction dataset to check the performance of the proposed method. The experimental outcomes demonstrate that the BBO-BDC method achieved high accuracy and sensitivity with minimum convergence speed and overhead in comparison with the state-of-the-art methods.

In future work, the proposed method will be extended to provide precise, real-time suggestions tailored to specific agricultural factors such as weather, soil type, and time by using a novel deep learning network⁵⁴. An attention mechanism will be used to choose the most significant features for precise crop recommendations. Explainable AI (XAI) methods such as LIME and SHAP offer transparency and reliability in modern agriculture⁵⁵. Future work will also investigate how XAI can be incorporated into crop recommendation systems to address the “black box” nature of AI models and to create explainable AI-driven recommendations for farmers.

Availability of data and materials

All the data generated or analyzed during this study are included in this research article.

Code availability

The code and supporting materials used to reproduce the findings of this study have been uploaded as a supplementary file.

References

Al-Adhaileh, M. H. & Aldhyani, T. H. H. Artificial intelligence framework for modeling and predicting crop yield to enhance food security in Saudi Arabia. PeerJ Comput. Sci. 8, e1104. https://doi.org/10.7717/peerj-cs.1104 (2022).
Chen, K. et al. MetaRF: attention-based random forest for reaction yield prediction with a few trails. J. Cheminform. 15(1), 43. https://doi.org/10.1186/s13321-023-00715-x (2023).
Oikonomidis, A. et al. Hybrid deep learning-based models for crop yield prediction. Appl. Artif. Intell.: AAI https://doi.org/10.1080/08839514.2022.2031823 (2022).
Talaviya, T., Shah, D., Patel, N., Yagnik, H. & Shah, M. Implementation of artificial intelligence in agriculture for optimisation of irrigation and application of pesticides and herbicides. Artif. Intell. Agric. 4, 58–73. https://doi.org/10.1016/j.aiia.2020.04.002 (2020).
Subeesh, A. & Mehta, C. R. Automation and digitization of agriculture using artificial intelligence and internet of things. Artif. Intell. Agric. 5, 278–291. https://doi.org/10.1016/j.aiia.2021.11.004 (2021).
Rashid, M. et al. A comprehensive review of crop yield prediction using machine learning approaches with special emphasis on palm oil yield prediction. IEEE Access: Pract. Innov. Open Solut. 9, 63406–63439. https://doi.org/10.1109/access.2021.3075159 (2021).
Elbasi, E. et al. Artificial intelligence technology in the agricultural sector: a systematic literature review. IEEE Access: Pract. Innov. Open Solut. 11, 171–202. https://doi.org/10.1109/access.2022.3232485 (2023).
Bharadiya, J. P. et al. Predicting crop yield using deep learning and remote sensing. J. Eng. Res. Rep. 24(12), 29–44. https://doi.org/10.9734/jerr/2023/v24i12858 (2023).
Khaki, S., Wang, L. & Archontoulis, S. V. A CNN-RNN framework for crop yield prediction. Front. Plant Sci. 10, 1750. https://doi.org/10.3389/fpls.2019.01750 (2020).
Apat, S. K. et al. An artificial intelligence-based crop recommendation system using machine learning. J. Sci. Ind. Res. (JSIR) 82(05), 558–567. https://doi.org/10.56042/jsir.v82i05.1092 (2023).
Shook, J. et al. Crop yield prediction integrating genotype and weather variables using deep learning. PLoS ONE 16(6), e0252402. https://doi.org/10.1371/journal.pone.0252402 (2021).
de la Parte, M. S. E., Martínez Ortega, J. F., Hernández Díaz, V. & Lucas Martínez, N. Big Data and precision agriculture: a novel spatio-temporal semantic IoT data management framework for improved interoperability. J. Big Data 10(1), 26. https://doi.org/10.1186/s40537-023-00703-1 (2023).
Wei, Y., Han, C. & Yu, Z. An environment safety monitoring system for agricultural production based on artificial intelligence, cloud computing and big data networks. Journal of Cloud Computing: Advances, Systems and Applications 12, 25. https://doi.org/10.1186/s13677-023-00389-6 (2023).
Ansarifar, J. et al. An interaction regression model for crop yield prediction. Sci. Rep. 11(1), 17754. https://doi.org/10.1038/s41598-021-97221-7 (2021).
Mohapatra, S. & Chaudhary, N. Statistical analysis and evaluation of feature selection techniques and implementing machine learning algorithms to predict the crop yield using accuracy metrics. Eng. Sci. 21(3), 182–193 (2023).
Gopal, M. & Bhargavi, R. Performance evaluation of best feature subsets for crop yield prediction using machine learning algorithms. Appl. Artif. Intell.: AAI 33(7), 621–642. https://doi.org/10.1080/08839514.2019.1592343 (2019).
Guo, Y. et al. Integrated phenology and climate in rice yields prediction using machine learning methods. Ecol. Indic. 120(106935), 106935. https://doi.org/10.1016/j.ecolind.2020.106935 (2021).
Elavarasan, D. & Durairaj-Vincent, P. M. Crop yield prediction using deep reinforcement learning model for sustainable agrarian applications. IEEE Access: Pract. Innov. Open Solut. 8, 86886–86901. https://doi.org/10.1109/access.2020.2992480 (2020).
Olofintuyi, S. S. et al. An ensemble deep learning approach for predicting cocoa yield. Heliyon 9(4), e15245. https://doi.org/10.1016/j.heliyon.2023.e15245 (2023).
Gupta, S. et al. Machine learning- and feature selection-enabled framework for accurate crop yield prediction. J. Food Qual. 2022, 1–7. https://doi.org/10.1155/2022/6293985 (2022).
Akkem, Y. et al. Streamlit application for advanced ensemble learning methods in crop recommendation systems: a review and implementation. Indian J. Sci. Technol. 16(48), 4688–4702. https://doi.org/10.17485/ijst/v16i48.2850 (2023).
Ayed, R. B. & Hanana, M. Artificial intelligence to improve the food and agriculture sector. J. Food Qual. 2021, 1–9. https://doi.org/10.1155/2021/5572464 (2021).
Anjana, N. et al. An efficient algorithm for predicting crop using historical data and pattern matching technique. Glob. Transit. Proc. 2(2), 294–298. https://doi.org/10.1016/j.gltp.2021.08.060 (2021).
Guo, W. W. & Xue, H. Crop yield forecasting using artificial neural networks: a comparison between spatial and temporal models. Math. Probl. Eng. 2014, 1–7. https://doi.org/10.1155/2014/857865 (2014).
Kamath, P. et al. Crop yield forecasting using data mining. Glob. Transit. Proc. 2(2), 402–407. https://doi.org/10.1016/j.gltp.2021.08.008 (2021).
Adebiyi, M. O., Ogundokun, R. O. & Abokhai, A. A. Machine learning–based predictive farmland optimization and crop monitoring system. Scientifica 2020, 1–11. https://doi.org/10.1155/2020/8812586 (2020).
Fraisse, C. et al. Artificial intelligence (AI) for crop yield forecasting: Ae571/Ae571, 4/2022. EDIS https://doi.org/10.32473/edis-ae571-2022 (2022).
Ikram, A. et al. Crop yield maximization using an IoT-based smart decision. J. Sens. 2022, 7696417. https://doi.org/10.1155/2022/7696417 (2022).
Zhang, Q. et al. Suitability evaluation of crop variety via graph neural network. Comput. Intell. Neurosci. 2022, 5614974. https://doi.org/10.1155/2022/5614974 (2022).
Jeong, S., Ko, J., Ban, J.-o, Shin, T. & Yeom, J.-M. Deep learning-enhanced remote sensing-integrated crop modeling for rice yield prediction. Ecol. Inform. 84, 1–11 (2024).
Wickramasinghe, L. et al. Modeling the relationship between rice yield and climate variables using statistical and machine learning techniques. J. Math. 2021, 1–9. https://doi.org/10.1155/2021/6646126 (2021).
van Klompenburg, T. et al. Crop yield prediction using machine learning: a systematic literature review. Comput. Electron. Agric. 177(105709), 105709. https://doi.org/10.1016/j.compag.2020.105709 (2020).
Cedric, L. S. et al. Crops yield prediction based on machine learning models: case of West African countries. Smart Agric. Technol. 2, 100027. https://doi.org/10.1016/j.atech.2022.100027 (2022).
Noorunnahar, M., Chowdhury, A. H. & Mila, F. A. A tree based eXtreme Gradient Boosting (XGBoost) machine learning model to forecast the annual rice production in Bangladesh. PLoS ONE 18(3), e0283452. https://doi.org/10.1371/journal.pone.0283452 (2023).
Kumari, P. et al. Recurrent neural network architecture for forecasting banana prices in Gujarat, India. PLoS ONE 18(6), e0275702. https://doi.org/10.1371/journal.pone.0275702 (2023).
Srivastava, A. K. et al. Winter wheat yield prediction using convolutional neural networks from environmental and phenological data. Sci. Rep. 12(1), 1–13. https://doi.org/10.1038/s41598-022-24400-3 (2022).
Akkem, Y., Biswas, S. K. & Varanasi, A. A comprehensive review of synthetic data generation in smart farming by using variational autoencoder and generative adversarial network. Eng. Appl. Artif. Intell. 127, 107562. https://doi.org/10.1016/j.engappai.2023.107562 (2024).
Akkem, Y., Biswas, S. K. & Varanasi, A. Smart farming using artificial intelligence: a review. Eng. Appl. Artif. Intell. 116, 105428. https://doi.org/10.1016/j.engappai.2022.105428 (2023).
Almeyda, E. & Ipanaque, W. Recent developments of artificial intelligence for banana: application areas, learning algorithms and future challenges. Eng. Agríc. 42(1), 1–13. https://doi.org/10.1590/1809-4430-Eng.Agric.v42n1e20210143/2022 (2022).
Singh, R., Singh, R. & Kaur, P. Rice crop yield prediction study by artificial intelligence techniques. Int. J. Adv. Multidiscip. Res. Stud. 3(2), 186–190 (2023).
Dhanaraj, R. K. & Chandraprabha, M. Ant lion optimization in deep neural network for forecasting the rice crop yield based on soil nutrients. Prog. Artif. Intell. https://doi.org/10.1007/s13748-024-00351-y (2024).
Kumar, I., Rawat, J., Mohd, N. & Husain, S. Opportunities of artificial intelligence and machine learning in the food industry. J. Food Qual. 2021, 1–10. https://doi.org/10.1155/2021/4535567 (2021).
Archana, S. & Kumar, P. S. A survey on deep learning based crop yield prediction. Nat. Environ. Pollut. Technol. 22(2), 579–592. https://doi.org/10.46488/nept.2023.v22i02.004 (2023).
Han, X. et al. Research on rice yield prediction model based on deep learning. Comput. Intell. Neurosci. 2022, 1922561. https://doi.org/10.1155/2022/1922561 (2022).
Tian, H. et al. Mapping winter crops in China with multi-source satellite imagery and phenology-based algorithm. Remote Sens (Basel, Switzerland) 11(7), 820. https://doi.org/10.3390/rs11070820 (2019).
Liu, Q. et al. Machine learning crop yield models based on meteorological features and comparison with a process-based model. Artif. Intell. Earth Syst. https://doi.org/10.1175/aies-d-22-0002.1 (2022).
Aggarwal, S. et al. Rice disease detection using artificial intelligence and machine learning techniques to improvise agro-business. Sci. Program. 2022, 1–13. https://doi.org/10.1155/2022/1757888 (2022).
Sahni, V. et al. Modelling techniques to improve the quality of food using artificial intelligence. J. Food Qual. 2021, 1–10. https://doi.org/10.1155/2021/2140010 (2021).
ElBeheiry, N. & Balog, R. S. Technologies driving the shift to smart farming: a review. IEEE Sens. J. 23(3), 1752–1769. https://doi.org/10.1109/jsen.2022.3225183 (2023).
Vimalajeewa, D. et al. A service-based joint model used for distributed learning: application for smart agriculture. IEEE Trans. Emerg. Top. Comput. https://doi.org/10.1109/tetc.2020.3048671 (2021).
Amaratunga, V., Wickramasinghe, L., Perera, A., Jayasinghe, J. & Rathnayake, U. Artificial neural network to estimate the paddy yield prediction using climatic data. Math. Probl. Eng. 2020, 7629840. https://doi.org/10.1155/2020/7629840 (2020).
Ikram, A. et al. Crop yield maximization using an IoT-based smart decision. J. Sens. 2022, 5173018. https://doi.org/10.1155/2022/5173018 (2022).
Yan, L. Development of international agricultural trade using data mining algorithms-based trade equality. Mob. Inf. Syst. 2021, 1–9. https://doi.org/10.1155/2021/5046244 (2021).
Akkem, Y. & Biswas, S. K. Analysis of an intellectual mechanism of a novel crop recommendation system using improved heuristic algorithm-based attention and cascaded deep learning network. IEEE Trans. Artif. Intell. 6(5), 1100–1113. https://doi.org/10.1109/TAI.2024.3508654 (2025).
Akkem, Y., Biswas, S. K. & Varanasi, A. Streamlit-based enhancing crop recommendation systems with advanced explainable artificial intelligence for smart farming. Neural Comput. Appl. 36(32), 20011–20025. https://doi.org/10.1007/s00521-024-10208-z (2024).

Funding

Open access funding provided by Symbiosis International (Deemed University). No funding was received for this research.

Author information

Authors and Affiliations

Symbiosis Institute of Computer Studies and Research (SICSR), Symbiosis International (Deemed University), Pune, India
Rajesh Kumar Dhanaraj
Department of Computer Sciences, College of Computer and Information Science, Princess Nourah Bint Abdulrahman University, Riyadh, Saudi Arabia
Nithya Rekha Sivakumar
Center for Information and Communication Sciences, Ball State University, Muncie, USA
Firoz Khan
Hourani Center for Applied Scientific Research, Al-Ahliyya Amman University, Amman, Jordan
Mahmoud Ahmad Al-Khasawneh
School of Computing, Skyline University College, University City Sharjah, 1797, Sharjah, United Arab Emirates
Mahmoud Ahmad Al-Khasawneh

Contributions

Rajesh Kumar Dhanaraj: conceptualization, methodology, and overall supervision of the study. Nithya Rekha Sivakumar, Firoz Khan, and Mahmoud Ahmad Al-Khasawneh: overall supervision of the study.

Corresponding author

Correspondence to Rajesh Kumar Dhanaraj.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Supplementary Information.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Dhanaraj, R.K., Sivakumar, N.R., Khan, F. et al. AI-enabled Barilai–Borwein–Blinder–Oaxaca–Bernoulli Deep Classifier for Enhanced Crop Yield Prediction. Sci Rep 15, 23225 (2025). https://doi.org/10.1038/s41598-025-03935-3

Received: 11 February 2025
Accepted: 23 May 2025
Published: 02 July 2025
DOI: https://doi.org/10.1038/s41598-025-03935-3
