A novel early stage drip irrigation system cost estimation model based on management and environmental variables

Pourgholam-Amiji, Masoud; Ahmadaali, Khaled; Liaghat, Abdolmajid

doi:10.1038/s41598-025-88446-x

Article
Open access
Published: 03 February 2025

Scientific Reports volume 15, Article number: 4089 (2025)

Abstract

Cost estimation is one of the most significant, intricate, and little-discussed aspects of pressurized irrigation. This study models the early-stage cost of drip irrigation systems using a database of 515 projects, with the cost divided into four sections: the cost of the pumping station and central control system (TC_P), the cost of on-farm equipment (TC_F), the cost of installation and operation on-farm and at the pumping station (TC_I), and the total cost (TC_T). First, 39 environmental and management features affecting the cost of these sections were extracted for each of the 515 projects. A database (a 515 × 43 matrix) was created, and the costs of all projects were updated to the baseline year of 2022. Then, several feature selection algorithms, such as WCC, LCA, GA, PSO, ACO, ICA, LA, HTS, FOA, DSOS, and CUK, were employed to choose the features with the greatest influence on system cost. Feature selection was carried out both for all 39 features and for the 18 easily available features (those known before the irrigation system’s design phase). Next, different machine learning models, such as Multivariate Linear Regression, Support Vector Regression, Artificial Neural Networks, Gene Expression Programming, Genetic Algorithms, Deep Learning, and Decision Trees, were used to estimate the cost of each of the aforementioned sections. Among the feature selection options examined, the support vector machine (SVM) was the best learner and optimization algorithms (wrapper methods) were the best feature selection technique. According to the evaluation criteria, the LCA and FOA algorithms produced the best estimates. Their RMSE for all features was 0.0020 and 0.0018, respectively, and their R² was 0.94 for both. For the easily available features, RMSE was 0.0006 and R² was 0.95 for both algorithms. For the full feature set, early-stage cost modeling with the selected features showed that the SVM model (with RBF kernel) was the best model across the four cost sections. Its evaluation criteria were R² = 0.923, RMSE = 0.008, and VE = 0.082 in the training stage and R² = 0.893, RMSE = 0.009, and VE = 0.102 in the testing stage. For the easily available feature subset, the ANN model (MLP) was the best, with R² = 0.912, RMSE = 0.008, and VE = 0.083 in the training stage and R² = 0.882, RMSE = 0.009, and VE = 0.103 in the testing stage. The findings of this study can be used to estimate the cost of local irrigation systems with high accuracy from the identified environmental and management parameters using the selected models.

Introduction

Background research

Usually, cost estimation is an experience-based task that includes the evaluation of unknown conditions and complex relationships of factors affecting the cost. An Artificial Neural Network (ANN) is an analogy-based process that is best suited for cost estimation. The main advantages of ANNs include their ability to learn by example (past projects) and to generalize solutions to applications (future projects)^1,2. In general, the mechanism that drives advanced technologies and gives rise to innovative tools, especially in the agricultural sector, is ML^3,4,5.

In this regard, Elhag & Boussabaine⁶ investigated an artificial neural system for estimating the cost of construction projects. Two ANN models were developed to predict the lowest tender price using data from 30 projects. Model I used 13 cost-determining features, whereas model II used only 4 input variables. The findings showed that both ANN models learned well in the training phase and generalized well in the test phase. Models I and II achieved average accuracies of 79.3% and 82.2%, respectively. Ahiaga-Dagbui & Smith⁷ used ANN to model the final cost of water projects, using data from 98 water-related construction projects carried out in Scotland between 2007 and 2011. As a prototype for more extensive research, the final model performed very satisfactorily, and the results indicated the high ability of ANN to capture the interaction between the estimator features and the final cost. Elfaki et al.⁸ conducted a 10-year review of intelligent techniques in construction project cost estimation and emphasized the high capability of ANN in recognizing, checking, and weighting important criteria and, finally, estimating the initial and final cost of projects. Also, Juszczyk et al.⁹ investigated an ANN approach for estimating the construction costs of sports fields. Apart from the general conclusion about the applicability of ANNs, the Multi Layer Perceptron (MLP) model was selected from a wide set of different networks. The analysis of the results showed good performance of the selected network in terms of the correlation between the actual and estimated cost. The level of errors was acceptable, and the accuracy of the model was judged suitable.

Roxas & Ongpeng¹⁰ used an artificial neural network approach to estimate the cost of construction projects in the Philippines. The objective of the study was to develop an ANN model, with six input parameters, that can predict the total structural cost of construction projects. Data from 30 construction projects were collected and randomly divided into three sets: 60% for training, 20% for performance validation, and 20% as a fully independent test of network generalization. The results indicated that the resulting ANN model reasonably predicted the total structural cost of construction projects, with favorable results in the training and testing phases.

Yadav et al.¹¹ also developed a Cost Estimation Model (CEM) using ANN that can predict the total structural cost of residential complexes by considering different parameters. Data from the preceding 23 years were collected. The resulting ANN model reasonably predicted the total structural cost of construction projects, with a correlation coefficient of R = 0.9960 and R-squared = 0.9905, giving favorable results in the training and testing phases. Also, Leszczyński & Jasiński¹² used an ANN approach to estimate product cost in a case study. The aim was to present ANN as a method for estimating product cost, both theoretically and practically; the main problem was modeling neural networks for product cost estimation under advanced production technology. The theoretical and experimental analysis showed that ANN models are the most innovative tools for product cost estimation in an industrial environment with advanced technology and digitized production. In another study, Sharma et al.¹³ evaluated machine learning and deep learning (transfer learning) techniques for detecting rice diseases such as bacterial blight, rice blast, and brown spot. Their comparative analysis demonstrated that transfer learning models, particularly InceptionResNetV2 and XceptionNet, outperformed conventional machine learning techniques. This research underscores the potential of these methods in enabling early disease diagnosis to assist farmers. The authors recommend that future studies focus on larger datasets to improve generalizability.

Estimating the cost of construction projects in the early stages with higher accuracy plays a vital role in the success of any project. For this purpose, Chandanshive & Kambekar¹⁴ investigated the estimation of building construction cost using ANN. Based on a data set of 78 construction projects from the city of Mumbai and its surrounding area, the most influential design parameters of building construction cost were identified as inputs, and the total cost of the structural skeleton was the output of the neural network models. The trained neural network model was able to predict the cost of construction projects in the early stages of construction. In another study, Omotayo et al.¹⁵ presented an artificial neural network approach to predict the most applicable Post-Contract Cost Controlling Techniques (PCCTs) in construction projects. The study aimed to propose a structured decision support method for predicting the most applicable PCCTs using ANN, based on data from 135 samples. The use of ANNs enabled the development of a structured decision support methodology for analyzing the most suitable PCCTs for deployment at different stages of the construction process; an RMSE of 0.073 and an R-squared of 0.726 were obtained in the validation stage. Singh & Singh¹⁶ used Featurewiz as an effective method for data normalization. Experiments were conducted on 18 benchmark datasets to demonstrate the effectiveness of the proposed approach compared to conventional normalization, and the resulting data were evaluated with four learning algorithms. The results showed that Featurewiz performed better than conventional data normalization with four well-known machine learning algorithms (KNN, MLP, GNB, and L-SVM), and the statistical analysis confirmed this.
Sharma & Kumar¹⁷ explored transfer learning models for diagnosing rice plant diseases using publicly available datasets from Mendeley and Kaggle. Their study tested individual models like InceptionV3, ResNet152V2, and DenseNet201, and introduced model ensembling to enhance diagnostic accuracy. Results showed that most ensemble models outperformed individual transfer learning models and even advanced approaches like Convolution-XGBoost, highlighting the potential of ensemble techniques in improving automated rice disease detection systems.

Definition of the problem

In Iran, less than 30% of irrigated lands are covered by modern irrigation systems. This issue shows the importance of examining the weak and strong points, determining the factors affecting the costs of different irrigation systems, and then economic modeling of pressurized irrigation systems, especially estimating the cost of an irrigation system before construction. On the other hand, one of the most important needs of the Ministry of Agricultural Jihad, the Ministry of Energy, and other trustees of the water industry (in Iran and all over the world) as well as employers, consultants, and contractors is to estimate the costs of projects in the initial stage and even before design.

Since the development of pressurized irrigation systems is part of the strategic policies of the government and the Ministry of Agricultural Jihad of Iran, knowledge and foresight of the factors affecting the costs of an irrigation system in different regions, before implementation, will greatly help cost management. Also, considering the significant extent of land covered by pressurized irrigation systems and their development potential in Iran and the world, identifying the features that affect cost and then modeling it will play a significant role in the country’s annual budgeting and deserves attention. On the other hand, the cost per hectare of implementing an irrigation system varies widely with the type of crop, area, water, climate, and other geometric and geographical factors, and a single figure cannot be applied to all conditions. As a result, it is necessary to identify the features affecting the cost of irrigation systems and to model the costs before designing and implementing them so that the cost of all parts can be estimated.

The importance of pressurized irrigation systems

Regarding the development of pressurized irrigation systems in Iran during the country’s first to sixth development plans, economic issues and the payment of facilities by the government for implementing these systems have been the most important and most influential factors in changing the trend in the area of irrigation systems implemented in the country. According to the statistics of the Ministry of Agricultural Jihad, a total of 1,987,548 hectares of modern pressurized irrigation systems were implemented in Iran from 1990 to the end of 2021.

Although the quantitative development of modern pressurized irrigation systems has fluctuated with economic conditions and commercial factors, the overall trend in equipping land with these systems in Iran has been upward. According to surveys, if statistics up to 2022 are considered, the area of pressurized irrigation systems implemented in Iran has increased to more than 2.6 million hectares¹⁸.

Cost estimation

The use of techniques and formulas that can estimate the cost of a project from a set of basic information will be of great help to consulting engineering companies active in water engineering. It can serve as a comprehensive guideline for estimating the cost of building a pressurized system before its implementation. The methods that can be used for cost estimation at different stages of a project include traditional detailed cost estimation, simple cost estimation, cost estimation based on cost functions, activity-based cost estimation, the cost index method, and expert systems^19,20,21,22,23,24,61.

Therefore, early cost estimation plays an important role in the early decisions of a construction project, even when the project is not yet finalized and very limited information about the detailed design is available^14,25,26. In addition, cost estimation plays a key role in the successful completion of construction projects. Because information, details, maps, and many important factors affecting cost estimation are lacking in the initial and planning stages, the project will be at risk. Cost estimation with high accuracy and low error is therefore urgently needed for construction project decisions and for project success^27,28,29,30,31.

In this regard, various methods are available for cost estimation. With the increase in computing power, there is now a greater tendency to use methods based on Machine Learning (ML), such as Artificial Neural Networks (ANN), Fuzzy Logic (FL), Deep Learning (DL), and Genetic Algorithm (GA), including Gene Expression Programming (GEP) and Genetic Programming (GP), for more accurate estimation of project duration and costs. These methods can still be reliable despite insufficient details at the initial stage, even with little data, and can identify non-linear relationships between cost factors and project costs^23,30,32. While the use of ANN for cost estimation from the perspective of contractors has been extensively investigated, there are limited studies on the development and application of ML-based methods for consulting engineering firms. Considering that the products and services provided by consulting engineering companies are inherently different from those of contractors, and that the type and level of detail of the information available at the bidding stage also differ, it is important to investigate the application of ML-based methods for cost estimation in consulting companies^20,26,27,33,34,35,36 (Drenthe et al., 2019).

Feature selection

Feature Selection (FS) and data processing are fundamental components of many classification, modeling, and regression problems. Some data have the same effect, some have a misleading effect, and some have no effect on classification or regression problems, so choosing an optimal, minimal set of features can be useful^37,38,39,40. Feature selection, as one of the most important data processing problems, is also an important and current research topic in Pattern Recognition (PR), Machine Learning (ML), and Data Mining (DM). The approach is very widely used, and projects become more valuable when feature selection and Sensitivity Analysis (SA) are applied in them. With the development of information storage, input data have a large number of attributes, many of which may be irrelevant or unimportant. Unnecessary features often lead to low learning efficiency and make learning harder. Therefore, selecting relevant and necessary features for a given learning task is a very important step^41,42,43 and an important part of the data science model development workflow. There are various feature selection techniques and methods that data scientists use to remove redundant features.

Summary of previous studies and innovation of this research

In studies within Iran, the economic analysis of irrigation systems when switching from one system to another, and its effect on crop performance and economic productivity, has been discussed. International studies have addressed the identification of influential components and the estimation of final costs in road and construction engineering projects, while studies related to the water industry and irrigation systems have been the missing link in these discussions. Moreover, in the past, when linear and regression relationships, artificial neural networks, and machine learning and artificial intelligence algorithms in general were used, all features and variables were included. This approach had several major flaws: (1) it made execution long and costly in time; (2) the results were not user-friendly, and not everyone could take sufficient advantage of the extracted relations; (3) most importantly, using all the features led to complicated and impractical results. Now, with the advancement of information technology, the feature selection approach seeks to facilitate this process. Therefore, the current research seeks a relationship that identifies the important factors affecting the final cost of an irrigation system and formulates it for use in areas with different characteristics. Specifically, it aims to estimate the cost of pressurized irrigation projects in the early stages of design using machine learning methods and the data of many pressurized irrigation projects carried out in different parts of Iran in different years. Further goals include finding a single, generalizable algorithm for estimating the final cost of an irrigation system and identifying the most important components influencing the cost of implementing an irrigation system. Considering the previous studies, such a study has not been done until now. This article is also distinguished from previous research by its use of new and numerous models, software, and approaches to modeling the cost of pressurized irrigation systems.

Materials and methods

Early-stage cost modeling of pressurized irrigation projects was carried out in several stages: collecting the required data, updating the cost of the projects, selecting the best features, and training and validating the cost estimation models for four parts: the cost of the pumping station and central control system (TC_P), the cost of on-farm equipment (TC_F), the cost of installation and operation on-farm and at the pumping station (TC_I), and the total cost (TC_T).

Collecting the required data

In this research, a comprehensive data bank was prepared from the statistics and information of 515 drip irrigation systems implemented between 2006 and 2020 in different parts of Iran, obtained from reputable consulting engineering companies. For each irrigation system, the statistics and information were extracted from irrigation plan reports, AutoCAD maps, and Excel files of the design calculations, and the cost of the systems was categorized into two general parts: pumping station cost and farm cost. The candidate input variables for the drip irrigation system cost estimation models include land geometry, soil, water source, plant, and climate variables, as well as irrigation and hydraulic management variables, which were extracted as follows:

General project information: province, city, owner, type of cultivated crop, water source, energy, and year of implementation.
Water source information: the amount and status of water rights, electrical conductivity, acidity, sodium, potassium, calcium, magnesium, carbonate, and bicarbonate.
Crop information: the distance between rows of trees, the distance between trees on the row, the maximum daily evapotranspiration of the crop, the shaded area, and the depth of root development.
Soil information: water holding capacity of the soil, percentage of allowed moisture discharge, percentage of wetted surface, final permeability, and apparent specific gravity.
Irrigation system information: average operating flow rate, average operating pressure, type of arrangement of laterals, number of emitters for each plant, and emitter distance.
Farm irrigation management information: irrigation interval obtained in the design, net irrigation requirement, gross irrigation requirement, irrigation duration, maximum irrigation hours in a day and night, maximum number of irrigation turns in each interval, number of irrigation turns, average area of each irrigation unit, and average discharge per irrigation unit.
Farm characteristics information: geometric shape of the farm, average slope, height difference from the water source to the highest point of the farm, length of laterals, and diameter and length of main pipes, semi-main pipes, manifolds, and connections.
Pumping station information: pump type, engine power, pumping height, pumping flow rate (discharge), central control system connections, and equipment and accessories.

Project cost updating and data preprocessing

In the current research, the prices of all 515 drip irrigation projects (2006–2020) were updated to the base year of 2022 using annual inflation (in a stepwise manner) and the following relationship⁴⁴:

$${X}_{t}={X}_{0}{(1+r)}^{n}$$

(1)

where ${X}_{t}$ is the current value of capital, ${X}_{0}$ is the base value of capital (the investment value in the year the system was implemented), $r$ is the average annual bank interest rate, and $n$ is the number of years from the year of implementation to the base year.
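
As a rough illustration of Eq. (1), the following Python sketch compounds a cost forward with a constant rate and, as one possible reading of the stepwise updating, with a separate rate for each year. The function names, the 20% rate, and the cost value are hypothetical and are not taken from the study.

```python
from math import prod

def update_cost(base_cost, rate, years):
    """Eq. (1): X_t = X_0 * (1 + r)^n with a constant annual rate r."""
    return base_cost * (1.0 + rate) ** years

def update_cost_stepwise(base_cost, yearly_rates):
    """Stepwise variant (an assumption): compound a different rate for each year."""
    return base_cost * prod(1.0 + r for r in yearly_rates)

# Hypothetical project: cost of 1000 (arbitrary units) in 2019, updated to 2022.
constant = update_cost(1000.0, 0.20, 3)
stepwise = update_cost_stepwise(1000.0, [0.20, 0.20, 0.20])
print(round(constant, 2), round(stepwise, 2))
```

When every yearly rate equals the constant rate, the two functions agree, which is a quick consistency check on the stepwise form.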

To pre-process the data, they were standardized using different standardization methods, and after ensuring the randomness of the data, they were split into sets for model training and for testing and validation. In this research, 75–80% of the data were used for training and 20–25% for testing^39,45. After extracting the variables affecting the cost of drip irrigation systems, the next step was to select the features with the greatest impact on the model output (i.e., cost). Table 1 shows the candidate variables used to determine the relationship between the independent and dependent variables.
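
The pre-processing step can be sketched as follows. The study does not state which standardization was finally applied, so z-score scaling is assumed here, and the 80/20 split is one point in the reported 75–80% range. The function names and the stand-in data are hypothetical.

```python
import random
from statistics import mean, pstdev

def standardize(column):
    """Z-score one feature column: (x - mean) / standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    return [(x - mu) / sigma if sigma else 0.0 for x in column]

def train_test_split(rows, train_fraction=0.8, seed=0):
    """Shuffle row indices reproducibly, then cut into train and test parts."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    cut = round(train_fraction * len(rows))
    train = [rows[i] for i in indices[:cut]]
    test = [rows[i] for i in indices[cut:]]
    return train, test

# Hypothetical stand-in for the 515-project table (one entry per project).
rows = list(range(515))
train, test = train_test_split(rows, train_fraction=0.8)
print(len(train), len(test))
```

Shuffling before the cut is what makes the split random; fixing the seed keeps the split reproducible between runs.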

Table 1 Candidate variables for cost modeling of drip irrigation systems.

Feature selection technique

After extracting the variables affecting the cost of pressurized irrigation systems, the next step was to select the most important ones, a step referred to as feature selection (FS). It is very important to choose the features that have the greatest impact on the cost of the different parts^46,47. Selecting a subset of features is an active area of research in machine learning and is used for both regression and classification problems. Feature selection and extraction are two main steps in machine learning programs and modeling. In feature extraction, informative features are derived from the existing data. However, not all derived features are useful in the learning process, and the most important ones should be identified using different models, methods, and algorithms^45,48.

Feature selection techniques using supervised models are usually classified into three main categories, or in some of the literature into five^37,39,40,42,49. The five categories are Filter Methods (FM), Wrapper Methods (WM), Embedded Methods (EM), Online Methods (OM), and Hybrid Methods (HM) (Fig. 1).

In Fig. 1, the filter method weights and selects the features. The wrapper method obtains a subset of features based on the learner’s performance. The embedded method selects features based on the learner’s selection order. The online method is based on online tools, and the hybrid method combines different methods to achieve better results⁴⁵. Applying these methods usually requires coding in different environments or a lot of time, but newer tools have been developed for feature selection; these are discussed below.
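
To make the filter category concrete, the sketch below ranks features by the absolute Pearson correlation of each one with the cost, without involving any learner. This is only one common filter score, chosen here for illustration; the feature names and values are hypothetical toy data, not data from the study.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def filter_rank(features, target):
    """Filter method: score every feature on its own, then sort by score."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical toy data: cost follows area closely, slope less so,
# and row spacing hardly at all.
features = {
    "area_ha": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "slope_pct": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0],
    "row_spacing_m": [5.0, 3.0, 6.0, 2.0, 4.0, 5.0],
}
cost = [10.0, 21.0, 29.0, 41.0, 50.0, 61.0]
print(filter_rank(features, cost))
```

A wrapper method differs in that it would score whole subsets of these columns by training and evaluating a learner on each candidate subset.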

Feature selection methods

Eureqa Formulize

In this research, the relationships between parameters were discovered and modeled using evolutionary algorithms, the best known of which is the Genetic Algorithm (GA), together with its subfields Genetic Programming (GP) and Gene Expression Programming (GEP)³⁶. For this purpose, Eureqa Formulize software was used to identify the features affecting the cost of drip irrigation systems. This software automatically applies data pre-processing such as normalization, removal of outliers, and data randomization, thereby reducing noise in the data and minimizing calculation error.

This program was later developed by the Nutonian company (http://nutonian.wikidot.com). It provides the user with a final equation in the form of symbolic regression by presenting a set of mathematical relations, and it simplifies the analysis of the resulting model by providing different outputs. Here, after data pre-processing, 70% of the data were used for training and 30% for testing⁵⁰. The software can identify the most important features and can perform modeling with high accuracy.

winGamma

In this study, the Gamma Test (GT) and M-test (MT), together with three techniques, Genetic Algorithm (GA), Hill Climbing (HC), and Full Embedding (FE), were used to identify the most important parameters affecting the cost of different parts of drip irrigation systems and the optimal percentages of training and testing data for cost modeling of each part⁵¹. The Gamma Test was specifically developed for modeling and predicting nonlinear systems. Using this software and the gamma test, it is possible to obtain the order of importance of the input variables and the best combination among all possible combinations. The gamma test was first reported by Koncar⁵² and Stefánsson et al.⁵³ and was later used by other researchers such as Durrant⁵⁴ and Tsui et al.⁵⁵. It was introduced as a software package by Durrant⁵⁴ (https://users.cs.cf.ac.uk/O.F.Rana/Antonia.J.Jones/GammaArchive/Gamma%20Software/winGamma/winGamma.htm).

One of the basic and much-discussed topics in data series modeling is choosing the right interval for model preparation and the range for model testing; this is the purpose of the M-test (MT). Theoretical discussions often suggest using 70% of the data series for training (model preparation) and 30% for testing. However, the M-test provides a scientific basis for defining and determining these two intervals. The goals of using the Gamma test and M-test in winGamma software can be summarized as follows^51,52,53,54,56: finding the minimum data required to produce a near-optimal model, scientifically determining the number of data points for the training and testing stages of modeling, determining the best embedding dimension and lag time for time series, quickly and automatically creating a neural network structure with minimum weights that models the data in the best way, and determining the best set of inputs from the list of possible inputs to a neural controller.
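
The idea behind the two tests can be sketched in plain Python. This is a minimal version written from the general description of the Gamma test (nearest-neighbour statistics followed by a linear regression whose intercept estimates the noise variance of the output), not a reproduction of winGamma; the function names, the choice of p = 10 neighbours, and the synthetic data are assumptions.

```python
import math
import random

def gamma_test(X, y, p=10):
    """Return (Gamma, slope A) from the regression gamma(k) = Gamma + A * delta(k)."""
    m = len(X)
    deltas = [0.0] * p
    gammas = [0.0] * p
    for i in range(m):
        # Squared input-space distances from point i to every other point.
        neighbours = sorted(
            (sum((a - b) ** 2 for a, b in zip(X[i], X[j])), j)
            for j in range(m) if j != i
        )
        for k in range(p):
            dist_sq, j = neighbours[k]
            deltas[k] += dist_sq / m                    # mean squared input distance
            gammas[k] += (y[j] - y[i]) ** 2 / (2 * m)   # half mean squared output gap
    # Ordinary least-squares line through the p (delta, gamma) points.
    mean_d = sum(deltas) / p
    mean_g = sum(gammas) / p
    slope = (sum((d - mean_d) * (g - mean_g) for d, g in zip(deltas, gammas))
             / sum((d - mean_d) ** 2 for d in deltas))
    return mean_g - slope * mean_d, slope

def m_test(X, y, sizes, p=10):
    """Gamma statistic computed on the first M points, for each M in sizes."""
    return [(m, gamma_test(X[:m], y[:m], p)[0]) for m in sizes]

# Hypothetical data: a smooth function plus Gaussian noise of variance 0.01.
rng = random.Random(1)
X = [(rng.uniform(0, 1), rng.uniform(0, 1)) for _ in range(400)]
y = [math.sin(2 * a) + b ** 2 + rng.gauss(0, 0.1) for a, b in X]
for m, g in m_test(X, y, [100, 200, 400]):
    print(m, round(g, 4))
```

In an M-test, the sample size at which the Gamma statistic stops changing appreciably is taken as a guide to how many points are needed for training.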

Featurewiz

Feature selection in Python is the process of automatically or manually selecting the features in a dataset that contribute most to the estimated variable or desired output. Not all features in a dataset are important for achieving the best model performance. The four main reasons for applying feature selection are: (1) it improves the accuracy of the model if the appropriate subset is selected; (2) it reduces overfitting; (3) it enables the machine learning algorithm to train faster; and (4) it reduces the complexity of a model and makes it easier to interpret^57,58. A real dataset has many features, some of which are useful for training a robust data science model, while others are redundant and can degrade model performance. Feature selection is an important element of the data science model development workflow.

There are various feature selection techniques and methods that data scientists use to remove redundant features. A new, improved, and fast way to select the best features in a dataset is Featurewiz, which offers feature engineering capabilities (https://github.com/AutoViML/featurewiz). The Featurewiz API has a “Feature_Engg” parameter that can be set to “Interactions”, “Grouping” and “Target”, creating hundreds of features in one go. Also, it can reduce the number of features and select the best set of features to train a robust model. Featurewiz uses two algorithms to select the best features from the dataset⁵⁹: SULOV and Recursive XGBoost.

SULOV Algorithm: SULOV stands for Searching for Uncorrelated List of Variables, and the algorithm is very similar to the MRMR algorithm. It searches the list of variables for pairs whose correlation exceeds the correlation threshold and that are therefore considered highly correlated.
Recursive XGBoost Algorithm: After the SULOV algorithm has selected the set of features with low correlation and high mutual information scores, the recursive XGBoost algorithm is used to find the best features among the remaining variables.

Featurewiz thus uses the two algorithms discussed above to find the best set of features, which can then be used to train a robust machine learning model. Featurewiz can handle datasets with one target variable as well as datasets with several target variables. The output of this step is important because it shows which features are more important and which are less important for predicting the target variable^57,58,59.
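
The pairwise pruning idea behind SULOV can be sketched as below. This is not the Featurewiz implementation: the relevance score here is the absolute correlation with the target, used as a simple stand-in for the mutual information score, and the recursive XGBoost stage is not shown. The function names, the 0.7 threshold, and the toy data are assumptions.

```python
from itertools import combinations
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def sulov_like(features, target, corr_limit=0.7):
    """Drop the less relevant member of each highly correlated feature pair."""
    # Stand-in relevance score: |correlation with the target|.
    relevance = {n: abs(pearson(c, target)) for n, c in features.items()}
    removed = set()
    for a, b in combinations(features, 2):
        if abs(pearson(features[a], features[b])) > corr_limit:
            removed.add(a if relevance[a] < relevance[b] else b)
    return [n for n in features if n not in removed]

# Hypothetical toy data: pump flow and pump power carry nearly the same information.
features = {
    "pump_flow_lps": [10.0, 12.0, 15.0, 18.0, 20.0, 25.0],
    "pump_power_kw": [5.1, 6.0, 7.4, 9.2, 9.9, 12.6],
    "avg_slope_pct": [3.0, 1.0, 4.0, 1.0, 5.0, 2.0],
}
cost = [30.0, 36.0, 45.0, 54.0, 60.0, 75.0]
print(sulov_like(features, cost))
```

Of the two near-duplicate pump features, only the one more strongly related to the cost survives, while the weakly correlated slope feature is left for a later model-based stage to judge.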

FeatureSelect

This study used three types of learners for feature selection. The first is the Support Vector Machine (SVM). The second is the ANN, which has only one parameter (the number of training iterations). An examination of different types of artificial neural networks showed that optimization algorithms can lead to better results in the training phase of the ANN. It is also possible to select features with SVM or DT and then use an ANN to obtain an efficient model. The third learner is the Decision Tree (DT). Three types of feature selection methods were also evaluated: (1) the wrapper method (optimization algorithms); (2) the filter method, which comprises five common methods; and (3) the hybrid-ensemble method, a two-stage feature selection that combines filter and wrapper methods. Experimental results show that each learner and method has its own perspective on the dataset, but wrapper methods generally lead to better results than filter methods. In the wrapper method section, 11 algorithms were used to select the best feature(s) from the set of features. The algorithms available for this task in the FeatureSelect software are World Competitive Contest (WCC), League Championship Algorithm (LCA), GA, Particle Swarm Optimization (PSO), Ant Colony Optimization (ACO), Imperialist Competitive Algorithm (ICA), Learning Automata (LA), Heat Transfer Optimization Algorithm (HTS), Forest Optimization Algorithm (FOA), Discrete Symbiotic Organisms Search (DSOS), and Cuckoo Optimization (CUK)^39,45.

FeatureSelect is a new program for feature selection based on machine learning methods. It was developed by Masoudi-Sobhanzadeh et al.⁴⁵ in the Laboratory of System Biology and Bioinformatics of the University of Tehran. The software can be applied to problems that require selecting an important and effective subset of features from the entire feature set. Previous studies have introduced tools such as WEKA. However, whereas these tools are based on filter methods, which are less efficient than wrapper methods, FeatureSelect includes optimization and learning algorithms in addition to filter methods. Data normalization and data fuzzification are also performed. Using FeatureSelect therefore has three main goals: (1) easy use of LIBSVM, ANN, and DT; (2) feature selection for regression problems; and (3) feature selection for classification problems. FeatureSelect is a feature (or gene) selection application available on GitHub (https://github.com/LBBSoft/FeatureSelect).

Different data mining algorithms

The AI models used in this study are widely recognized and commonly applied in water and environmental sciences. Previous research has shown that they are among the most frequently used models and have provided excellent results^60,61,62,63. More details are given in the following sections.

MLR

Linear regression is divided into two types: simple linear regression and Multivariate Linear Regression (MLR). Simple linear regression predicts the value of a dependent variable from the value of a single independent variable, whereas multiple regression assesses the joint and individual contributions of two or more independent variables to the changes in a dependent variable. Multivariate regression therefore has a much wider application. By definition, a regression coefficient is the amount of change in the dependent variable that results from a unit change in the corresponding independent variable^64,65,66. The coefficients indicate the degree of association between the predictor variables and the dependent variable. In the present study, the regression is determined using the following relationship:

$$Y={\beta }_{0}+{\beta }_{1}{X}_{1}+{\beta }_{2}{X}_{2}+\dots +{\beta }_{n}{X}_{n}+\varepsilon$$

(2)

where Y is the dependent variable, ${\beta }_{0}$ is the constant coefficient, ${\beta }_{1}$, …, ${\beta }_{n}$ are the regression coefficients, $\varepsilon$ is the error term, and ${X}_{1}$, ${X}_{2}$, …, ${X}_{n}$ are the independent variables used in the models of this study⁶⁷. The cost of the whole system was taken as the dependent variable and the costs of the components of the farm and pumping parts as independent variables, and the data were analyzed using multivariate linear regression. There are five methods for entering variables into the regression model: Enter, Backward, Forward, Remove, and Stepwise⁶⁶. Depending on the purpose, several of them were used.
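As an illustration of Eq. (2), the coefficients can be estimated by ordinary least squares. The sketch below corresponds to entering all variables at once; it solves the normal equations in plain Python, uses hypothetical noise-free data, and is not the statistical software procedure used in the study. The function names `fit_mlr` and `predict` are illustrative.

```python
def fit_mlr(X, y):
    """Ordinary least squares for Y = b0 + b1*X1 + ... + bn*Xn.

    X is a list of rows (one list of predictor values per observation).
    Returns [b0, b1, ..., bn] by solving (A^T A) b = A^T y with
    Gauss-Jordan elimination and partial pivoting. Perfectly collinear
    predictors are not handled in this sketch.
    """
    A = [[1.0] + list(row) for row in X]  # prepend the intercept column
    k = len(A[0])
    # Augmented matrix [A^T A | A^T y]
    M = [
        [sum(r[i] * r[j] for r in A) for j in range(k)]
        + [sum(r[i] * t for r, t in zip(A, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        for r in range(k):
            if r != col:
                f = M[r][col]
                M[r] = [a - f * b for a, b in zip(M[r], M[col])]
    return [M[i][k] for i in range(k)]

def predict(beta, row):
    """Evaluate b0 + sum(b_i * x_i) for one observation."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], row))

# Hypothetical data generated from Y = 1 + 2*X1 + 3*X2 (no noise)
X = [[1, 2], [2, 1], [3, 5], [4, 3], [5, 8]]
y = [1 + 2 * a + 3 * b for a, b in X]
beta = fit_mlr(X, y)
print([round(b, 6) for b in beta])
```

Because the toy data contain no error term, the fitted coefficients recover the generating values up to rounding.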

SVR

The Support Vector Regression (SVR) method was introduced in 1995 by Cortes and Vapnik. Although it is used less often than SVM, SVR has proven to be an effective tool for estimating real-valued functions⁶⁸. The main difference between SVM and SVR is the type of output. With SVM, both linear and non-linear models can be created and their parameters calculated; non-linear models are obtained by using a non-linear kernel function (such as a polynomial). The choice of kernel for SVR depends on the amount of training data and the dimension of the feature vector. In practice, four types of kernel are used: linear, polynomial, hyperbolic tangent, and Gaussian^69,70. The SVR problem is most easily formulated from a geometric point of view, using the one-dimensional example in Fig. 2. The continuous-valued function being approximated can be written as Eq. (3). For multidimensional data, x is augmented by one and b is included in the w vector, which gives the multivariate regression in Eq. (4)^71,72:

$$y=f\left(x\right)=\langle w,x\rangle +b=\sum_{i=1}^{M}{w}_{i}{x}_{i}+b,\quad y,b\in {\mathbb{R}},\; x,w\in {\mathbb{R}}^{M}$$

(3)

$$f\left(x\right)={\left[\begin{array}{c}w\\ b\end{array}\right]}^{T}\left[\begin{array}{c}x\\ 1\end{array}\right]={\mathbf{w}}^{T}\mathbf{x}+b,\quad x,w\in {\mathbb{R}}^{M+1}$$

(4)
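The equivalence between Eqs. (3) and (4) can be checked numerically: appending 1 to x and b to w leaves the value of the linear function unchanged. The weights and inputs below are hypothetical, and the sketch covers only the linear form, not SVR training.

```python
def f_plain(w, x, b):
    """Eq. (3): inner product of w and x plus the bias b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def f_augmented(w, x, b):
    """Eq. (4): [w; b]^T [x; 1], with the bias folded into the weight vector."""
    w_aug = list(w) + [b]
    x_aug = list(x) + [1.0]
    return sum(wi * xi for wi, xi in zip(w_aug, x_aug))

# Hypothetical values with M = 3
w = [0.5, -1.0, 2.0]
x = [4.0, 3.0, 1.5]
b = 0.25
print(f_plain(w, x, b), f_augmented(w, x, b))
```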

ANN

Artificial Neural Networks (ANN) were first introduced in 1943 by McCulloch and Pitts⁷³ and were developed in a serious and influential way by Rosenblatt in 1962⁷⁴. Later, with the development of computers and the appearance of the backpropagation training algorithm for feedforward neural networks by Rumelhart et al.⁷⁵, their use entered a new stage. ANN is an information-processing approach that is inspired by the biological nervous system and processes information like the human brain. The system consists of many elements called neurons that work together to solve a problem. A neuron is the smallest information processing unit and forms the basis of ANN operation. Each network consists of an input layer, an output layer, and one or more intermediate layers. Figure 3 shows a schematic of ANN. In this figure, i is an input vector of the system consisting of a number of causal variables that influence the behavior of the system, and o is an output vector of the system consisting of a number of result variables that represent the behavior of the system.

MLP

The Multilayer Perceptron (MLP) neural network is a type of ANN in which weights and biases can be trained to produce a specific target. MLP is noteworthy because of its good performance⁷⁶. The network is a set of neurons placed in successive layers and is therefore a complex, non-linear system. The MLP is trained by supervised learning, which involves providing inputs and outputs to the network and minimizing the estimation error⁷⁷. Figure 4 shows the schematic of an MLP. In this research, the error backpropagation (BP) training algorithm was used to train the MLP, with the sigmoid transfer function for the hidden layer and the linear transfer function for the output layer. The output of the MLP is determined by the following relationship, where n is the net input, w is the weight, p is the input, b is the bias, f is the transfer (activation) function, and a is the output.

$$n=wp+b\;\Rightarrow\; a=f\left(n\right)=f\left(wp+b\right)$$

(5)
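Eq. (5) can be sketched as a forward pass through one sigmoid hidden layer and one linear output layer, matching the transfer functions named above. The weights and inputs are hypothetical, and backpropagation training is not shown.

```python
import math

def sigmoid(n):
    """Sigmoid transfer function used for the hidden layer."""
    return 1.0 / (1.0 + math.exp(-n))

def layer(p, weights, biases, f):
    """Apply a = f(w p + b) for every neuron in one layer."""
    return [
        f(sum(w * x for w, x in zip(row, p)) + b)
        for row, b in zip(weights, biases)
    ]

def mlp_forward(p, w_hidden, b_hidden, w_out, b_out):
    """Sigmoid hidden layer followed by a linear output layer."""
    hidden = layer(p, w_hidden, b_hidden, sigmoid)
    return layer(hidden, w_out, b_out, lambda n: n)

# Hypothetical network: 2 inputs, 2 hidden neurons, 1 output
w_hidden = [[0.4, -0.6], [0.1, 0.8]]
b_hidden = [0.0, -0.2]
w_out = [[1.5, -0.5]]
b_out = [0.3]
a_out = mlp_forward([0.7, 0.2], w_hidden, b_hidden, w_out, b_out)
print(a_out)
```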

RBF

The neural network based on the Radial Basis Function (RBF) has a very strong mathematical foundation based on regularization theory. In general, this network consists of three parts: an input layer, a hidden layer, and an output layer⁷⁸. The Gaussian transfer function is used in the hidden layer and the linear transfer function in the output layer. In Figure 5, the RBF neuron is a Gaussian function. The input of this function is the Euclidean distance between the input vector and the vector stored in the neuron, which has the same dimension as the input vector. The Gaussian function is given by the following relationship⁷⁷:

$$f\left({X}_{r},b\right)={e}^{-{\left(\Vert {X}_{r}-{X}_{b}\Vert \times \frac{0.8326}{h}\right)}^{2}}$$

(6)

In this relation, ${X}_{r}$ is the input of the network with unknown output, ${X}_{b}$ is the input of observations in time or place, and b and h are the parameters that control the width of the Gaussian function. The output of this function is a value between zero and one⁷⁷. The output ${Y}_{r}$ for the independent variable ${X}_{r}$ is calculated as follows:

$${Y}_{r}=LW\times f \left({X}_{r} , b\right)+Bias$$

(7)

In this relation, LW is the weight matrix connecting the hidden layer to the output layer, and Bias is the bias matrix of the output layer⁷⁷.
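Eqs. (6) and (7) can be evaluated directly. The sketch below assumes one hidden neuron per stored vector and a single output. Since 0.8326 is approximately the square root of ln 2, the Gaussian returns about 0.5 when the distance equals h. The centers, weights, and bias are hypothetical.

```python
import math

def gaussian(x_r, x_b, h):
    """Eq. (6): Gaussian of the Euclidean distance between x_r and x_b."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_r, x_b)))
    return math.exp(-((dist * 0.8326 / h) ** 2))

def rbf_output(x_r, centers, h, lw, bias):
    """Eq. (7): linear output layer applied to the hidden-layer responses."""
    return sum(w * gaussian(x_r, c, h) for w, c in zip(lw, centers)) + bias

# Hypothetical stored vectors, output weights, and bias
centers = [[0.0, 0.0], [3.0, 4.0]]
lw = [2.0, 4.0]
y_r = rbf_output([0.0, 0.0], centers, h=5.0, lw=lw, bias=1.0)
print(round(y_r, 3))
```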

GRNN

The Generalized Regression Neural Network (GRNN) is a statistics-based network for solving regression problems and is another type of RBF network. GRNN was introduced in 1991 by Specht⁷⁹. GRNN is a three-layer network in which the number of neurons is much easier to choose than in MLP because it is set equal to the number of observations. Figure 6 shows a GRNN network. Like RBF, this network uses the Gaussian function in the middle layer, but in the output layer the weighted sum of the targets is additionally normalized by the sum of the Gaussian outputs. The following relationship was used to calculate the output of this network:

$${Y}_{r}=\frac{1}{\sum_{b=1}^{n}f\left({X}_{r},b\right)}\times \sum_{b=1}^{n}\left[f\left({X}_{r},b\right)\times {T}_{b}\right]$$

(8)

where ${T}_{b}$ is the target corresponding to the b-th observation and n is the number of observations⁷⁷.
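A GRNN prediction in its standard normalized form is the kernel-weighted average of the training targets, that is, the sum of f × T_b divided by the sum of f. The sketch below assumes that form together with the Gaussian of Eq. (6). The training data and the width h are hypothetical.

```python
import math

def gaussian(x_r, x_b, h):
    """Gaussian of the Euclidean distance, as in Eq. (6)."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_r, x_b)))
    return math.exp(-((dist * 0.8326 / h) ** 2))

def grnn_predict(x_r, x_train, t_train, h):
    """Kernel-weighted average of the targets (one neuron per observation).

    If x_r is very far from every observation, all weights can underflow
    to zero; this sketch does not guard against that case.
    """
    weights = [gaussian(x_r, x_b, h) for x_b in x_train]
    return sum(w * t for w, t in zip(weights, t_train)) / sum(weights)

# Hypothetical one-dimensional training set
x_train = [[0.0], [1.0], [2.0]]
t_train = [10.0, 20.0, 30.0]
y_mid = grnn_predict([1.0], x_train, t_train, h=1.0)
print(round(y_mid, 6))
```

Because the output is a weighted average, it always lies between the smallest and largest training targets.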

ANFIS

The Adaptive Neuro-Fuzzy Inference System (ANFIS) was introduced by Jang in 1993⁸⁰. ANFIS is similar to a multilayer neural network, with the difference that, in addition to ANN learning algorithms, it also uses Fuzzy Logic (FL). An ANFIS model consists of five layers: the information input layer, the fuzzy rule weight calculation layer, the rule weight normalization layer, the rule calculation layer, and the summation (network output) layer⁷⁷. In this research, the trapezoidal membership function was used and the network was trained with the hybrid method. Figure 7 shows a schematic of ANFIS.

DL

Deep learning (DL) is a type of ML that is modeled on the structure and function of the human brain and uses artificial neural networks to perform complex calculations on large amounts of data. These algorithms learn their own representations^81,82,83,84. Deep learning is therefore a sub-branch of machine learning based on a set of algorithms that attempt to model high-level abstract concepts in data. The deeper the hierarchy of layers, the more nonlinear features are obtained, which is why more layers are used in deep learning^82,84,85,86,87,88.

The word deep in deep learning refers to the number of layers through which data is transformed into output. Deep learning models can extract better features than shallow models and hence additional layers help in learning features^81,83. The difference between deep learning and neural networks is that deep learning has a wider scope than neural networks and includes Reinforcement Learning algorithms. Figure 8 shows the structure of the deep neural network and its schematic.

GEP and GP

Gene Expression Programming (GEP), which emerged from the development of intelligent models, is an evolutionary algorithm and, like the other methods of this family, is based on Darwin’s theory of evolution. GEP, which is the developed form of Genetic Programming (GP), was presented by Ferreira in 1999⁸⁹. This algorithm can automatically select the input variables that have the most influence on the modeling⁹⁰. One of the main advantages of the GEP and GP algorithms is that they can be used under the following conditions: (1) the relationship between the variables of the problem is not well known, or the validity of the current knowledge of that relationship is doubtful; (2) finding the final solution to the problem under consideration is difficult; (3) no conventional mathematical (analytical) solution is available; (4) an approximate solution is acceptable; and (5) the amount of data that must be tested, categorized, and summarized by the computer is large (such as satellite data)^89,91,92,93. GEP, like GA and GP, is a genetic algorithm: it uses a population of individuals, selects them according to fitness, and applies genetic changes using one or more genetic operators (such as mutation and recombination). In general, the basic difference between these three algorithms lies in the nature of their individuals.

The modeling process for estimating the early-stage cost of constructing drip irrigation systems based on farm and pumping costs was carried out as follows. (1) The first step is to choose an appropriate fitness function. In this study, the Root Mean Square Error (RMSE) was chosen as the fitness function. (2) The second step is to select a set of input variables (terminals) and a set of functions to produce the chromosomes. In the present problem, the terminals consist of the amounts of costs in different years and different irrigation systems. Four main operators {$\div , \times , -, +$} and the mathematical functions {$\log \left(x\right),\sqrt{x},\sqrt[3]{x},{x}^{2},{x}^{3},\tan \left(x\right)$} were used. (3) The third step is to choose the structure and architecture of the chromosomes. (4) The fourth step is to select the link function; in this study, addition was used to link the sub-branches. (5) Finally, in the fifth step, the genetic operators and the rate of each of them were selected. Figure 9 shows an example of a gene expression program.
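Steps (1), (2), and (4) above can be illustrated with a small evaluator: each gene is an expression tree built from the function set, the genes are linked by addition, and the fitness is the RMSE. The tree encoding, the protected versions of division, square root, and logarithm, and the toy data are assumptions of this sketch. Chromosome encoding and genetic operators are not shown.

```python
import math

# Function set from the text; protected variants are an assumption.
FUNCS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
    "/": lambda a, b: a / b if b != 0 else 1.0,
    "log": lambda a: math.log(abs(a)) if a != 0 else 0.0,
    "sqrt": lambda a: math.sqrt(abs(a)),
    "cbrt": lambda a: math.copysign(abs(a) ** (1.0 / 3.0), a),
    "sq": lambda a: a * a,
    "cube": lambda a: a ** 3,
    "tan": math.tan,
}

def evaluate(tree, row):
    """Evaluate one expression tree for one data row (dict of terminals)."""
    if isinstance(tree, str):           # terminal: a variable name
        return row[tree]
    if isinstance(tree, (int, float)):  # terminal: a numeric constant
        return float(tree)
    op, *args = tree                    # function node: (name, child, ...)
    return FUNCS[op](*(evaluate(a, row) for a in args))

def chromosome_output(genes, row):
    """Link the sub-expression trees by addition (step 4)."""
    return sum(evaluate(g, row) for g in genes)

def rmse_fitness(genes, rows, targets):
    """Fitness of a chromosome: RMSE between its outputs and the targets."""
    errs = [(chromosome_output(genes, r) - t) ** 2 for r, t in zip(rows, targets)]
    return math.sqrt(sum(errs) / len(errs))

# Hypothetical chromosome with two genes: a*a and sqrt(b)
genes = [("*", "a", "a"), ("sqrt", "b")]
rows = [{"a": 2, "b": 9}, {"a": 3, "b": 16}]
targets = [7, 13]
print(rmse_fitness(genes, rows, targets))
```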

DT

A Decision Tree (DT) is a data mining method and a powerful, common tool for classification and prediction or estimation which, unlike ANN, produces rules. That is, a DT explains its prediction in the form of a series of rules, whereas an ANN provides only the prediction, and how it was obtained remains hidden in the network itself. In addition, unlike ANN, DT can use non-numerical data⁹⁴. The DT approach is used in many fields, including pattern identification, classification, decision support systems, and expert systems. Another advantage is that it can handle numerical, non-numerical, analytical, qualitative, and ranked data.

A DT usually consists of several nodes known as input and output nodes. The rules created in a DT are expressed as “if” and “then”. Different algorithms can predict or estimate the target (dependent) variable based on the independent variables. One of the most important and widely used of them is the CART algorithm^95,96. It is also reported that one of the most important algorithms used to construct DTs is the C5 algorithm, which is a development of ID3 (Iterative Dichotomiser 3). Other DT algorithms include C&RT, QUEST (Quick Unbiased Efficient Statistical Tree), and CHAID. The construction of trees is usually based on three principles (Fig. 10):

A set of questions of the form “x ≤ d?”, where x is an independent variable and d is a fixed value, and the answer to each question is yes or no.
Determining the best criterion for branching in order to choose the best independent variable to create a branch.
Generating summary statistics for the terminal node⁹⁷.
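The three principles above can be sketched for a single numeric variable: candidate questions "x ≤ d?" are scored, the best threshold is chosen, and each terminal node is summarized. The sum of squared errors is used as the branching criterion, as in CART-style regression trees. This criterion, the function names, and the toy data are assumptions of the sketch.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def best_split(xs, ys):
    """Return (d, score) for the question 'x <= d?' with the lowest total SSE.

    Candidate thresholds are midpoints between consecutive distinct x values.
    Returns None when all x values are identical.
    """
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        d = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= d]
        right = [y for x, y in zip(xs, ys) if x > d]
        score = sse(left) + sse(right)
        if best is None or score < best[1]:
            best = (d, score)
    return best

def leaf_summary(ys):
    """Summary statistics for a terminal node."""
    return {"n": len(ys), "mean": sum(ys) / len(ys)}

# Hypothetical data: cost jumps when x exceeds about 6.5
xs = [1, 2, 3, 10, 11, 12]
ys = [5, 6, 5, 20, 21, 19]
d, score = best_split(xs, ys)
print(d, leaf_summary([y for x, y in zip(xs, ys) if x <= d]))
```

The resulting rule reads: if x ≤ d, then predict the mean of the left node; otherwise predict the mean of the right node.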

A DT is a combination of several logical implications (if–then rules). Decision trees are not only a representation of the decision-making process but can also be used to solve classification problems. The set of rules extracted from a DT is usually the most important information obtained from it^98,99. The purpose of this research is to investigate the effectiveness of the DT model in predicting the early cost of implementing drip irrigation systems based on different variables and system components (independent variables) that affect the final cost (dependent variable)¹⁰⁰.

The schematic and classification of the types of methods used to model the cost of drip irrigation systems are shown in Fig. 11.

Hyperparameter setting

Due to the wide range of models used, their training was done in several stages. We adopted a range of machine learning algorithms to capture different perspectives on feature importance and regression performance. For this reason, we have provided a table detailing the hyperparameter settings for each algorithm to increase clarity (Table 2).

Table 2 Hyperparameter setting for each of the used algorithms and models in this research.


Evaluation criteria

To evaluate the models/algorithms and compare the results of different approaches and methods with observational data, three evaluation criteria were used: Coefficient of Determination (R²), Root Mean Square Error (RMSE), and Volume Error (VE). These criteria are defined as follows:

$${R}^{2}=\frac{{[\sum_{i=1}^{n}({o}_{i}-\overline{o} )({p}_{i}-\overline{p} )]}^{2}}{\sum_{i=1}^{n}{({o}_{i}-\overline{o})}^{2}\times \sum_{i=1}^{n}{({p}_{i}-\overline{p})}^{2}}$$

(9)

$$RMSE= \sqrt{\frac{\sum_{i=1}^{n}{({p}_{i}-{o}_{i})}^{2}}{n}}$$

(10)

$$VE=\frac{\sum_{i=1}^{n}\left|\frac{{o}_{i}-{p}_{i}}{{o}_{i}}\right|}{n}\times 100$$

(11)

In these relationships, ${o}_{i}$ is the observed value, ${p}_{i}$ is the predicted value, $\overline{o}$ is the average of the observed values, $\overline{p}$ is the average of the predicted values, and n is the number of data points^101,102. A model with higher R² and lower RMSE and VE is more desirable. This research used the MATLAB and Python software environments to code the models and algorithms. Minitab and SAS software were also used for statistical analysis. To examine the linear and nonlinear correlation between independent and dependent variables, correlation and statistical significance tests were used. Since the relationship between the variables is expected to be nonlinear, the use of artificial intelligence algorithms and machine learning models is appropriate. Note that a large P-value (usually greater than 0.05) indicates that the observed results could have occurred by chance, so the null hypothesis is not rejected, whereas a small P-value (usually less than 0.05) indicates that the observed results are unlikely to have occurred by chance, so the null hypothesis is rejected. Figure 12 describes the general process of this research, from data collection to training and testing the models.
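The three evaluation criteria of Eqs. (9) to (11) translate directly into code. The sketch below uses plain Python with hypothetical observed and predicted values. It is not the MATLAB or Python code used in the study.

```python
import math

def r_squared(o, p):
    """Eq. (9): squared correlation between observed and predicted values."""
    n = len(o)
    mo, mp = sum(o) / n, sum(p) / n
    num = sum((a - mo) * (b - mp) for a, b in zip(o, p)) ** 2
    den = sum((a - mo) ** 2 for a in o) * sum((b - mp) ** 2 for b in p)
    return num / den

def rmse(o, p):
    """Eq. (10): root mean square error."""
    return math.sqrt(sum((b - a) ** 2 for a, b in zip(o, p)) / len(o))

def volume_error(o, p):
    """Eq. (11): mean absolute relative error in percent (observed must be nonzero)."""
    return sum(abs((a - b) / a) for a, b in zip(o, p)) / len(o) * 100

# Hypothetical observed and predicted costs
observed = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
print(r_squared(observed, predicted), rmse(observed, predicted),
      volume_error(observed, predicted))
```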

Results and discussion

Examining the correlation between variables

Table 3 shows the correlation between the 39 independent variables and the cost of the pumping station and central control system (TC_P), the cost of on-farm equipment (TC_F), the cost of installation and operation of the on-farm and pumping station (TC_I), and the total cost (TC_T). The results can be summarized as follows. For TC_P, 12 variables were significant at the one percent probability level, four were significant at the five percent probability level, and 23 showed no significant correlation. For TC_F, 11 variables were significant at the one percent probability level and 28 showed no significant correlation. For TC_I, seven variables were significant at the one percent probability level, four were significant at the five percent probability level, and 28 showed no significant correlation. Finally, for TC_T, nine variables were significant at the one percent probability level, six were significant at the five percent probability level, and 24 showed no significant correlation (Table 3).

Table 3 Correlation and significance results (P-Value) between independent and dependent variables.


Analysis of cost to area ratio of projects

The analysis and pre-processing of the data from the 515 drip irrigation system projects showed that the total cost (TC_T) per hectare of drip irrigation system is 510 million Rials on average. Of this, 12.4% is related to the cost of the pumping station and central control system (TC_P), 62.1% to the cost of on-farm equipment (TC_F), and 25.5% to the cost of installation and operation of the on-farm and pumping station (TC_I). Meanwhile, the ratio of equipment purchase to total cost, based on the updated data of the 515 drip irrigation projects, was 71% (Fig. 13). Although the government currently allocates a fixed budget of 500 to 550 million Rials per hectare, actual costs vary far more than this and differ with location, crop, topography, climatic conditions, and other factors. Based on information received from reputable consulting engineering companies whose designs had undergone quality control, and on a review of the database prepared for this research, the minimum and maximum total costs of a drip irrigation system are 127.3 and 1707.1 million Rials, respectively. The cost per hectare therefore fluctuates widely and requires specific investigation.
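The reported shares can be converted into per-hectare amounts with simple arithmetic. The figures below are taken from the text. The ratio of maximum to minimum cost assumes that both values refer to cost per hectare, as the surrounding discussion suggests.

```python
mean_total = 510.0  # average total cost TC_T, million Rials per hectare
shares = {"TC_P": 12.4, "TC_F": 62.1, "TC_I": 25.5}  # percent of TC_T

# Per-hectare amount of each cost section, in million Rials
per_hectare = {name: mean_total * pct / 100 for name, pct in shares.items()}

cost_min, cost_max = 127.3, 1707.1  # million Rials, as reported
spread_ratio = cost_max / cost_min  # how many times the maximum exceeds the minimum

for name, value in per_hectare.items():
    print(f"{name}: {value:.2f} million Rials per hectare")
print(f"max/min ratio: {spread_ratio:.1f}")
```

On these figures, TC_P, TC_F, and TC_I correspond to about 63.2, 316.7, and 130.1 million Rials per hectare, and the most expensive system costs roughly 13 times as much as the cheapest.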

Examination of the final cost of the projects showed a standard deviation of 200 million Rials. This means that the cost per hectare of a drip irrigation system is not uniformly close to 510 million Rials but varies from land to land. The standard deviation also indicates that the relevant calculations should be made for each specific geometric and geographical condition before an estimated cost is announced. Modeling the cost of drip irrigation systems is therefore necessary, because it helps prevent excessive costs and provides an accurate price estimate for each project. To obtain a simple relationship between the area and the final cost of the projects, the area of the 515 drip irrigation system projects and the updated total cost (TC_T) were plotted in Fig. 14. The fitted linear (green) and polynomial (red) lines have determination coefficients of 0.90 and 0.91, respectively.

Selection of features

Evaluation of feature selection algorithms in the train phase

After reviewing different methods, models, and algorithms for feature selection, the results of the best method, FeatureSelect, are briefly presented in this section. The results are divided into two parts. (1) The first part covers all the features that affect the costs of a drip irrigation system, which, as mentioned earlier, number 39. (2) The second part covers the features that are available before an irrigation system is designed and implemented. There are 18 of these features, and the selection was made from among them.

The models were trained with 80% of the data after initial preprocessing, and the evaluation criteria were first obtained for the total features section. The learner was SVM and the feature selection method was optimization algorithms (wrapper). These were chosen on the recommendation of Masoudi-Sobhanzadeh et al.⁴⁵, who emphasized this point in their results. Training on the total feature set, which included 39 features, gave an RMSE, SVM elapsed time (in seconds), and R² of 0.007, 1.30, and 0.92, respectively. According to previous studies, such values are reasonable and indicate accurate input data and correct training of the algorithms^45,103. Model training and initial evaluation were also carried out for the features available before the design stage (18 features). In this case, the RMSE, SVM elapsed time (in seconds), and R² were 0.003, 2.01, and 0.89, respectively. The slight change in the evaluation criteria compared with the previous section arises because 18 features were separated from the full set of features affecting the costs of the drip irrigation system (the cost of the pumping station and central control system (TC_P), the cost of on-farm equipment (TC_F), the cost of installation and operation of the on-farm and pumping station (TC_I), and the total cost (TC_T)), and the selection was then made from among them. The training accuracy of the models therefore decreased slightly, but the criteria remained within the acceptable and excellent range. Other researchers have also reported such values in their studies and considered them a sign of correct training¹⁰⁴.

Results of selected algorithms

The results of the selected algorithms for the regression problems are shown in Fig. 15 for all the features affecting the costs of the drip irrigation system and in Fig. 16 for the features before the design phase (BD). Among the 11 algorithms, four (WCC, LCA, LA, and FOA) were selected to identify the most important features. There were two reasons for this choice. (1) Masoudi-Sobhanzadeh et al.⁴⁵, the developers of the FeatureSelect software, stated after many reviews and tests that these algorithms achieve the best results. (2) Running these algorithms took hours and sometimes several days and nights, so spending more time on the other algorithms was not justified. The graphs produced using the SVM learner and the optimization algorithms as the feature selection method (Figs. 15 and 16) compare the performance of the algorithms based on error, RMSE, and correlation scores. Convergence, average convergence, and stability graphs are shown for each score. The purpose of presenting these figures is to check whether the algorithms were implemented correctly. The convergence criterion is that the answers should improve as the number of iterations or the execution time allocated to the algorithms increases. In addition to convergence, there is also the concept of average convergence. The difference between the two is that convergence is obtained by extracting the best answer at the end of each iteration, while average convergence is calculated from the average score of the potential solutions at the end of each iteration. Stability describes how the results of the algorithms behave over time and over repeated executions.
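The three kinds of curves can be computed from the scores that an optimization algorithm produces. In the sketch below, convergence takes the best score in each iteration, average convergence takes the mean score in each iteration, and stability is represented by the final best score of each independent execution. The data layout and the scores are hypothetical, and this is not the FeatureSelect code.

```python
def convergence(history, lower_is_better=True):
    """Best score among the candidate solutions at the end of each iteration."""
    pick = min if lower_is_better else max
    return [pick(population) for population in history]

def average_convergence(history):
    """Mean score of the candidate solutions at the end of each iteration."""
    return [sum(population) / len(population) for population in history]

def stability(runs, lower_is_better=True):
    """Final best score of each independent execution of the algorithm."""
    pick = min if lower_is_better else max
    return [pick(run[-1]) for run in runs]

# Hypothetical error scores: 3 iterations with 3 candidate solutions each
run_a = [[0.9, 0.5, 0.7], [0.6, 0.4, 0.5], [0.3, 0.4, 0.2]]
run_b = [[0.8, 0.6, 0.7], [0.5, 0.5, 0.4], [0.3, 0.3, 0.4]]

print(convergence(run_a))          # best error per iteration
print(average_convergence(run_a))  # mean error per iteration
print(stability([run_a, run_b]))   # final best error of each run
```

For a correlation score, `lower_is_better=False` would be passed so that the largest value is taken as the best.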

As can be seen, the potential solutions generated by all algorithms except LA (that is, WCC, LCA, and FOA) improve with increasing iterations. According to Fig. 15, for the convergence of error, as the number of iterations increases (up to 30), the error of these algorithms decreases and reaches a plateau. The LA algorithm, however, followed a different trend and stopped improving after a certain number of iterations. In the convergence section, the best models registered an error below 0.002 and a correlation greater than 0.93. The same pattern is seen for the convergence of correlation. In the average convergence of error and correlation, the LA algorithm did not reach a reasonable result even after many iterations, and the WCC algorithm reached a plateau, but its RMSE is higher than that of the LCA and FOA algorithms. In the average convergence section, the best model (FOA) showed an RMSE of less than 0.003 and a high correlation of 0.92. In the stability graphs, an algorithm that is better than the others appears ahead of them, and its average results are better. The stability section shows that the FOA and LCA algorithms have the smallest changes, and with more executions the variation in their error and correlation decreased and finally stabilized. The error and correlation of the best algorithms are about 0.002–0.025 and 0.93–0.94, respectively. Although the WCC algorithm has better results than LA, it has weaknesses compared with the two selected algorithms. The findings of other researchers are also in line with the results of this research^104,105.

Figure 16 shows the results of the selected algorithms for the features before the design phase (BD) that affect the cost of drip irrigation systems. As in Fig. 15, the convergence, average convergence, and stability criteria were analyzed and interpreted. In the convergence of error and correlation, all algorithms except LA converged and reached a plateau. The error of the best algorithms (WCC, LCA, and FOA) is around 0.0006 and the correlation of the best algorithm is above 0.95, which shows that the algorithm was trained well and predicted accurately over many iterations. For the features before the design phase, the FOA algorithm outperforms the others by a clear margin, and LCA and WCC are not competitive with it. The results of the average convergence of error and correlation also indicate that the best-trained algorithm is FOA, with the lowest error (0.0007) and the highest correlation (0.96) among the algorithms. The stability criterion after 30 executions showed that the LCA and FOA algorithms were better than the LA and WCC algorithms by a margin: their error was minimal and their correlation was higher than 0.95 (Fig. 16). These results show that the algorithms were trained properly, as reflected in the convergence, average convergence, and stability criteria for both error and correlation (Figs. 15 and 16). The findings of this section are in complete agreement with the results of other researchers^106,107.

Evaluation of algorithms in the test phase

The results shown in Tables 4 and 5 are calculated from the stability results of Figs. 15 and 16, respectively. After examining the training of the models and the efficiency of the different algorithms in the previous two sections, the features selected by these algorithms were tested (validated), and the results of the evaluation criteria are presented here. Table 4 presents the results of the evaluation criteria for selecting the most important features among all the features affecting the total cost of drip irrigation systems (39 features) for the verification stage. Because this table is derived from the stability results of the previous figures, it contains main criteria and several sub-criteria. The main criteria are Error Rate (ER), Correlation (CR), Number of selected Features (NOF), and Elapsed Time (ET). The sub-criteria Standard Deviation (STD), Confidence Interval (CI), Probability Value (P-Value), and Test Statistic (TS) are reported for both the error and the correlation measures.

Table 4 The results of evaluation criteria for selecting the most important features among all the features affecting the total cost of drip irrigation systems.


Table 5 The results of evaluation criteria for selecting the most important features among the features before the design phase affecting the total cost of drip irrigation systems.


Table 4 shows the NOF identified by each algorithm, a critical parameter for this research. The WCC, LCA, LA, and FOA algorithms selected 17, 11, 15, and 8 features, respectively. Each algorithm was executed 30 times, and the best result was selected for each. ET is the execution time, in seconds, needed to reach the best result for an algorithm; ETs differ because the algorithms involve different execution steps. The ER confirms the algorithms identified in the previous step: its value for the superior LCA and FOA algorithms was 0.0020 and 0.0018, respectively, lower than for the others. The ER-STD sub-criterion indicates how far the results deviate from their mean, so a small value is desirable; it was 0.004 for the WCC and LA algorithms and 0.002 for the LCA and FOA algorithms. The ER-CI sub-criterion is a range of values within which the results are expected to fall with a specified probability. To increase accuracy, this criterion was computed twice. Both times almost the same result was reported, and the two top algorithms, LCA and FOA, again had the lowest value.

The ER-P sub-criterion is one of the most important parameters in evaluating the results of models and algorithms. The P-Value expresses how similar the obtained results are to random values, so an algorithm with a smaller P-Value is more reliable. The results show that all four algorithms were reliable, because their P-Values tend to zero. The ER-TS sub-criterion is usually used to reject or accept a null hypothesis; when TS is at its maximum, the P-Value is at its minimum. The CR criterion is excellent for all algorithms: its value for WCC, LCA, LA, and FOA was 0.9345, 0.9365, 0.9165, and 0.9378, respectively. This shows the ability and accuracy of these algorithms, especially LCA and FOA. The sub-criteria examined for the error (ER) were also computed for the correlation (CR), and the trend of the results was the same. For brevity, further explanation is omitted (Table 4). The results of this section are consistent with the findings of Schubert et al.¹⁰⁸, Panday et al.¹⁰⁹, and Masoudi-Sobhanzadeh et al.⁴⁵.
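The sub-criteria described above (STD, CI, TS, and P-Value over repeated executions) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `summarize_runs`, the null hypothesis of a zero mean, the normal approximation used for the interval and the P-Value, and the 30 synthetic error values are all assumptions made for illustration.

```python
import math
import statistics
from statistics import NormalDist

def summarize_runs(values, null_mean=0.0, confidence=0.95):
    """Summarize one criterion (e.g. error rate) over repeated runs.

    Assumption: a normal approximation is used for the confidence
    interval and the two-sided P-Value; the paper does not state
    which test it applied.
    """
    n = len(values)
    mean = statistics.fmean(values)
    std = statistics.stdev(values)            # sample standard deviation (STD)
    sem = std / math.sqrt(n)                  # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    ci = (mean - z * sem, mean + z * sem)     # confidence interval (CI)
    ts = (mean - null_mean) / sem             # test statistic (TS)
    p_value = 2 * (1 - NormalDist().cdf(abs(ts)))
    return {"mean": mean, "std": std, "ci": ci, "ts": ts, "p_value": p_value}

# Synthetic error rates for 30 runs of one algorithm (NOT the paper's data).
runs = [0.0018 + 0.0002 * math.sin(i) for i in range(30)]
summary = summarize_runs(runs)
```

In this sketch a large TS goes together with a P-Value close to zero, which is the relationship the text describes.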

Table 5 shows the evaluation criteria for selecting the most important features among the features before the design phase affecting the total cost of drip irrigation systems (18 features). The NOF criterion was the same for all algorithms (WCC, LCA, LA, and FOA): 6 out of 18 features. In the remaining criteria and sub-criteria, the LCA and FOA algorithms had higher accuracy and correlation and lower error. For LCA and FOA, the ET criterion was 344.5740 and 153.7386 s, respectively; the ER criterion was 0.0006 for both; and the ER-STD sub-criterion was 0.0003 for both. The ER-CI sub-criterion was 0.0008 and 0.0009 in the first iteration and 0.0010 and 0.0011 in the second; the ER-P sub-criterion was zero for both algorithms; and the ER-TS sub-criterion was 7706.18 and 2511.19 for LCA and FOA, respectively. Finally, the CR criterion was 0.9541 for both algorithms. The trend observed in the error sub-criteria (ER) also applies to the correlation (CR) (Table 5). The results of this section are consistent with the findings of other researchers^45,106.

Selecting the best features

After verifying (testing) the results of the algorithms and checking the evaluation criteria, the desired features were extracted (Table 6). The main goal of this research was to choose the set of most useful, effective, and accurate features for modeling the early-stage cost of pressurized irrigation systems, especially drip systems. In the selection from all features (39 variables), the WCC, LCA, LA, and FOA algorithms selected 17, 11, 15, and 8 features, respectively. In the selection from the features before the design phase (18 variables), all four algorithms agreed on six features. On this basis, the best features in each section were determined in the last step by combining the common features in that section. From an expert's point of view, the selected features in both sections are vital to the design and implementation of irrigation systems and play an essential role in costs. Even a non-professional or an expert with little experience in consulting engineering companies can confirm this claim.
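The text does not spell out how the common features were combined. One plausible reading is a vote across the algorithms, sketched below under that assumption. The feature names, the selections, and the helper `common_features` are hypothetical placeholders, not the paper's variables or method.

```python
from collections import Counter

# Hypothetical selections per algorithm; names are placeholders only.
selected = {
    "WCC": {"area", "crop_type", "pump_power", "emitter_flow"},
    "LCA": {"area", "crop_type", "pump_power"},
    "LA": {"area", "soil_texture", "pump_power"},
    "FOA": {"area", "crop_type", "pump_power"},
}

def common_features(selections, min_votes=None):
    """Return features chosen by at least min_votes algorithms.

    With min_votes=None, a feature must be chosen by every algorithm
    (a plain set intersection).
    """
    votes = Counter(f for feats in selections.values() for f in feats)
    if min_votes is None:
        min_votes = len(selections)
    return sorted(f for f, v in votes.items() if v >= min_votes)

unanimous = common_features(selected)              # chosen by all four
majority = common_features(selected, min_votes=3)  # chosen by at least three
```

A stricter threshold gives a smaller, more conservative feature set; a looser one keeps features that only some algorithms found useful.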

Table 6 Summary of the results of different feature selection methods to identify features that affect the cost of drip irrigation systems.


An important part of this research was the application of new feature selection methods, which were coded in MATLAB and Python. To this end, different feature selection algorithms, models, and methods were trained and tested, and the best features were extracted and presented in Table 6. The table also summarizes the results of the different feature selection methods for comparison. It shows that changing the feature selection method and using more and better algorithms and models progressively improved the result, as the RMSE and R² evaluation criteria reflect. Finally, the feature selection result of the FeatureSelect method, in two separate sections (selection from all features and from the features before the design phase), was chosen as the best basis for cost modeling of drip irrigation systems in Iran.

Cost modeling with artificial intelligence algorithms

After the features with the greatest effect on the cost of the different parts of drip irrigation systems were selected (identified in the last two rows of Table 7, for all features and for features before the design phase), cost modeling was carried out. Six main models covering statistical methods, neural networks (NN), artificial intelligence (AI), and machine learning (ML) were developed for this work: MLR, DT, DL, GP, ANN, and SVM. The ANN model has four sub-types (MLP, RBF, GRNN, and ANFIS), which are discussed further during economic modeling.

Table 7 The results of the relationship between selected features and the cost of different parts of drip irrigation systems.


The results of this research began with an examination of the correlation between the input variables and the cost of the different parts. Here, after the different feature selection methods have been applied and the features with the greatest effect on the cost of drip irrigation systems identified, the relationship between the independent variables (features) and the dependent variables (costs) is presented again in Table 7. The selected features can be summarized by feature type, the R² evaluation criterion, and the cost part.

Cost modeling based on all features

First, cost modeling was done based on all features (39 variables affecting the costs of drip irrigation systems). At this stage, 70% of the data was used for training and the remaining 30% for testing the artificial intelligence and machine learning models. The statistical evaluation showed that artificial neural networks (ANN) and support vector machines (SVM) were among the best models and gave the best statistical indicators. The evaluation criteria in Table 8 show acceptable accuracy for the algorithms used in the cost modeling of the different parts, but the highest correlation and accuracy and the lowest error belong to the ANN and SVM models in both the training and testing stages. For the cost of the pumping station and central control system (TC_P), the evaluation criteria of the best model, ANN, were R² = 0.877, RMSE = 0.009, and VE = 0.093 in the training phase and R² = 0.847, RMSE = 0.010, and VE = 0.113 in the testing phase. The R² criterion, which indicates the validity of the models, together with the small, near-zero RMSE and VE, indicates that the ANN model was better trained than the others and estimated the TC_P cost more accurately than the other models in the test phase. The SVM model also achieved excellent results, close to those of the ANN model, and should not be neglected. In estimating the TC_P cost, the least accurate model was DT, followed by MLR, in both the training and testing stages.
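The 70/30 split and two of the evaluation criteria can be sketched in Python as follows. This is an illustration under stated assumptions, not the study's code: the helper names are hypothetical, the split is assumed to be random (the text elsewhere mentions a 2017–2022 testing period, so the actual split may have been chronological), and R² is computed as the coefficient of determination, which may differ from the definition the authors used. VE is omitted because its formula is not given in this section.

```python
import math
import random

def split_train_test(rows, train_fraction=0.7, seed=0):
    """Shuffle rows reproducibly and split them into train and test parts."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]

def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)

def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot

# 515 placeholder project indices standing in for the project database.
projects = list(range(515))
train, test = split_train_test(projects)
```

With 515 projects this split yields 360 training and 155 testing projects. RMSE values as small as those reported suggest the costs were normalized before modeling, although this section does not say so.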

Table 8 Results of cost modeling evaluation criteria for drip irrigation systems using all features in different parts.


Except for the TC_P cost part, where the ANN model had the lowest error, the SVM model achieved the best result in the cost of on-farm equipment (TC_F), the cost of installation and operation on-farm and pumping station (TC_I), and the total cost (TC_T). This is attributable to the type and structure of the model, which uses a kernel function to map the data into a higher-dimensional space and separate them with a hyperplane. The results of the SVM and ANN models are very close to each other, and their evaluation criteria are difficult to tell apart. Models such as MLR and DT, however, consistently fell well behind the other models, and the DL and GP models were moderate in most cases. For the superior model (SVM) in estimating the TC_F cost, the evaluation criteria in the training and testing stages were R² = 0.923 and 0.893, RMSE = 0.008 and 0.009, and VE = 0.082 and 0.102, respectively. The same pattern was repeated in the other two cost parts, TC_I and TC_T. In general, in modeling the cost of drip irrigation systems using all the features (39 variables), the SVM model had the best results. The best criteria were obtained in the TC_F, TC_T, TC_I, and TC_P parts, in that order (Table 8).
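The kernel function mentioned above can be made concrete with a short sketch of the two kernel types named for the selected SVM models in this article, RBF and sigmoid. The formulas are the standard textbook definitions; the parameter values (`gamma`, `coef0`) and the sample vectors are assumptions and are not the tuned values reported in the paper.

```python
import math

def rbf_kernel(x, y, gamma=1.0):
    """Radial basis function kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

def sigmoid_kernel(x, y, gamma=1.0, coef0=0.0):
    """Sigmoid kernel: tanh(gamma * <x, y> + coef0)."""
    dot = sum(a * b for a, b in zip(x, y))
    return math.tanh(gamma * dot + coef0)

# Two hypothetical, already-normalized feature vectors.
a = [0.2, 0.5, 0.1]
b = [0.3, 0.4, 0.1]
similarity_same = rbf_kernel(a, a)   # identical points
similarity_near = rbf_kernel(a, b)   # nearby points
```

The kernel value acts as a similarity score, so the model never has to compute coordinates in the higher-dimensional space explicitly.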

To summarize the findings of this section, the evaluation criteria were generally better in the training phase than in the testing phase. Although each model's criteria were somewhat weaker in the testing stage, most of the models showed good statistical performance in both stages. Because their values fall within the permissible range of error and accuracy, the models are suitable for estimating the costs of drip irrigation systems. To distinguish among the models, ANN and especially SVM achieved the best results in Part I (cost modeling based on all features), in both the training and testing stages. This study therefore recommends that future researchers apply and develop the SVM model for estimating and predicting the cost of drip irrigation systems. ANN was the second-best model according to its performance in the training and testing phases. Apart from the MLR and DT models, whose results lagged behind the others, the DL and GP (GA and GEP) models could give promising results if combined with other models into a hybrid model, and they can be developed and used in that form. The results of this section are consistent with the findings of Aghelpour et al.¹¹⁰ and Elbeltagi et al.¹¹¹.

Figure 17 complements the conclusions drawn from the statistical analysis of Table 8. It plots the observed (actual) against the estimated (predicted) values of the different cost parts of drip irrigation systems in Iran, based on the best models in the training and testing stages, with the emphasis on showing the performance of the top models in the testing stage. These results were obtained using all features (39 variables) modeled with artificial intelligence and machine learning algorithms. The scatter diagrams in Fig. 17 show that, in all cost parts, the fitted regression line between the observed and estimated values deviates only slightly from the X = Y line (the angle bisector of the first and third quadrants). The points are also tightly clustered around their regression line, most of all in the graph for the TC_F part (R² = 0.923). The scatter plots likewise show that the best performance belongs to the predictions of the SVM model, which has the highest R² in the cost of on-farm equipment (TC_F). Accordingly, the ANN model with MLP architecture for TC_P, the RBF-type SVM model for TC_F, the RBF-type SVM model for TC_I, and the Sigmoid-type SVM model for TC_T had the best results and were chosen as the best models.
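Agreement between a scatter plot and the X = Y line can be checked numerically by fitting a least-squares line to the predicted-versus-observed pairs: a slope near 1 and an intercept near 0 indicate agreement. The sketch below illustrates this with ordinary least squares. The helper `fit_line` and the five data pairs are hypothetical and are not taken from Fig. 17.

```python
def fit_line(observed, predicted):
    """Ordinary least-squares fit of predicted = slope * observed + intercept."""
    n = len(observed)
    mean_x = sum(observed) / n
    mean_y = sum(predicted) / n
    sxx = sum((x - mean_x) ** 2 for x in observed)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(observed, predicted))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical normalized costs (NOT the paper's data).
observed = [0.10, 0.20, 0.30, 0.40, 0.50]
predicted = [0.12, 0.19, 0.31, 0.38, 0.52]
slope, intercept = fit_line(observed, predicted)
```

For these made-up values the fitted slope is close to 1 and the intercept is close to 0, the pattern expected when estimates track the observations.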

Cost modeling based on features before the design phase

Since the final goal of this research is to build comprehensive software that estimates the cost of the different parts of a drip irrigation system from a minimum of features, the results of this part are the more important ones. Naturally, using all features gives better results and the best evaluation criteria. The key point, however, is whether good cost modeling remains possible using only the most effective features that are readily available and accessible to everyone. The results of such modeling are well suited to software development because they require the least input and still provide appropriate results.

Accordingly, Table 9 presents the evaluation criteria for cost modeling of drip irrigation systems using the features before the design phase (18 variables) in the different cost parts. With these features, the ANN and SVM models showed the best results (maximum correlation and accuracy, minimum error) in the TC_P and TC_F parts, respectively. In the training phase, the evaluation criteria were R² = 0.867, RMSE = 0.010, and VE = 0.103 for the ANN model in the TC_P part, and R² = 0.912, RMSE = 0.008, and VE = 0.083 for the SVM model in the TC_F part. In the testing phase (2017–2022), the corresponding values for the TC_P and TC_F parts were R² = 0.837 and 0.882, RMSE = 0.011 and 0.009, and VE = 0.123 and 0.103 for the ANN and SVM models, respectively. DL and GP gave intermediate results, and MLR and DT had the weakest evaluation criteria in all cost parts. Finally, in the TC_I and TC_T parts, the ANN model was the best, followed by the SVM. Therefore, the ANN model, or its combination with SVM, can be used to develop the results and build software.

Table 9 Results of cost modeling evaluation criteria for drip irrigation systems using features before the design phase in different parts.


Table 9 also gives the range over which the evaluation criteria vary. In the training phase, the difference in the R² criterion between the worst and the best model ranged from a minimum of 17% to a maximum of 30%; in the testing phase it ranged from 26.3% to 38.7%. The same pattern was repeated for the other evaluation criteria. Based on the above, the selected model should have the minimum error and the maximum accuracy and correlation. Otherwise, the results cannot be developed and generalized, nor can a model be built from them, because the superior model must reach all its optimal parameters through an evolutionary process and its evaluation criteria should fluctuate within a certain range. These findings are consistent with the results of other researchers^20,26. Also, Sharma et al.¹¹² conducted a systematic literature review to explore the application of Machine Learning (ML) and Deep Learning (DL) in monitoring and diagnosing rice crop health and disorders. Their study, spanning 91 articles (2013–2023), highlights the strength of these advanced techniques in addressing critical challenges like disease detection and nutrient deficiency diagnosis. ML/DL models enable accurate classification, efficient segmentation, and effective feature selection, providing a robust framework for enhancing rice crop productivity.

Figure 18 shows the scatter diagrams of the observed against the estimated cost of drip irrigation systems, based on the best models for the features before the design phase. It complements the statistical analysis and, using the best models in the training and testing phases, focuses on the performance of the best models in the testing phase. The results show that when the features before the design phase are used (7 selected out of 18 variables), the accuracy of the top models remains high compared with the selection of 10 out of 39 variables, and their error is minimal. Usually, when fewer features and less data are used for modeling, some error is accepted in exchange for simpler models and input features that are easy to obtain. However, Fig. 18 and Table 9 showed that the superior models not only have no such weakness but also have evaluation criteria comparable to those obtained using all features. Cost modeling of drip irrigation systems was therefore achieved with the best artificial intelligence and machine learning models using the fewest features. The scatter diagrams in Fig. 18 show that, in all cost parts, the estimated values lie around the angle bisector of the first and third quadrants (X = Y line), and the fitted line deviates little from it. The best model in this section is ANN, which was the top model in three of the four cost parts; the highest accuracy was obtained in the TC_F part (R² = 0.912). In summary, the GRNN-type ANN model in the TC_P part, the RBF-type SVM model in the TC_F part, the GRNN-type ANN model in the TC_I part, and the MLP-type ANN model in the TC_T part obtained the best results and were selected as the top models. Finally, the ANN model can be recommended for modeling and software development because it gives the most accurate results and recurs across the different cost parts.

Choosing the optimal combination of parameters of selected models

Table 10 summarizes the optimal parameters of the top models in the four cost sections (TC_P, TC_F, TC_I, and TC_T). When all the features (39 variables) were used, the SVM model was the best and the most frequent, while the ANN model achieved the best result in the TC_P cost section. When the features before the design phase (18 variables) were used, the ANN and SVM models were superior; their optimal parameters are given in Table 10. Other researchers obtained similar optimal parameters when using machine learning algorithms and artificial intelligence models^45,62,104,105,106.

Table 10 Optimal parameters of selected models used to estimate the costs of drip irrigation systems.


Conclusion

Estimation of construction costs is generally based on qualitative criteria derived from the experience of experts, which are of limited practical use because they cannot be used directly in mathematical models. Traditionally, a modeler builds mathematical models such as artificial neural networks by trial and error over different input combinations, which is very time-consuming because every candidate combination must be trained and tested. Machine learning and artificial intelligence algorithms make it possible to identify the best data with the highest evaluation criteria and to achieve accurate, low-error modeling with the least time and cost. Given accurate data and knowledge of the relationships between the characteristics, the modeling technique also makes it possible to estimate any output from the input variables with an acceptable result. Knowledge-based and numerical methods, whose required parameters are difficult to obtain or take a lot of time and money to measure, have therefore received less attention from researchers. Data-based computational intelligence models are used instead; they have high accuracy and reliability and require fewer and more accessible input parameters. In this regard, this research used machine learning algorithms and feature selection to model the early-stage cost of drip irrigation systems in Iran. The results of this extensive study showed that:

The FeatureSelect tool is one of the best approaches for selecting features from a large feature set, and its results allow artificial intelligence models to deliver better output.
In general, the evaluation criteria of the models were better for the selection from all features (10 out of 39) than for the features before the design phase (7 out of 18), because more of the important variables were involved in the estimation and modeling.
In both sections, all models estimated the total cost and the on-farm cost better, because the features have a higher correlation with the total cost, as the results showed. The modeling accuracy for the other cost parts was also appropriate, with close correlations and small errors.
It is natural for the evaluation criteria to be higher in the training phase, but their insignificant difference from the test phase shows that the models learned accurately and that their estimates are acceptable.
Importantly, the cost of pressurized irrigation systems can be modeled accurately using only the features before the design phase.
The excellent results for the features before the design phase indicate that suitable modeling can be achieved with the fewest features, and that cost estimation software for pressurized irrigation systems can be developed with the least input data.
Among the different data mining models, ANN and SVM gave the best estimates; similar results have been obtained in most studies in water and environmental science and engineering.

Currently, decisions by engineers, consultants, and policymakers on allocating budgets for pressurized irrigation systems are based on experience or traditional calculations. The results of the feature selection methods in this study, however, showed that knowing a few important parameters that affect the costs of drip irrigation systems is enough to estimate the costs accurately before design and implementation. Cost modeling also helps managers make the best decisions in different situations. The general results of this research further showed that optimal models for the cost of the different parts of drip irrigation systems can be obtained by building on these results. By modeling the cost of an irrigation system before implementing it, one can correctly understand the costs involved and properly manage budgeting and credit allocation.

Data availability

The datasets generated and/or analysed during the current study are not publicly available because this article was extracted from a doctoral dissertation that is still ongoing, but they are available from the corresponding author on reasonable request.

References

Arora, S. & Mishra, N. Software cost estimation using artificial neural network. In Soft Computing: Theories and Applications 51–58 (Springer, 2018).
Mevellec, P. Cost systems: A new approach. Academia Letters, Article 858 (2021)
Arora, S. & Mishra, N. Software cost estimation using single layer artificial neural network. Int. J. Adv. Eng. Res. Sci. 4(9), 237250 (2017).
Sharma, A., Jain, A., Gupta, P. & Chowdary, V. Machine learning applications for precision agriculture: A comprehensive review. IEEE Access 9, 4843–4873 (2020).
Teksin, S., Azginoglu, N. & Akansu, S. O. Structure estimation of vertical axis wind turbine using artificial neural network. Alex. Eng. J. 61(1), 305–314 (2022).
Elhag, T. M. S., & Boussabaine, A. H. An artificial neural system for cost estimation of construction projects. In 14th Annual ARCOM Conference. University of Reading: Association of Researchers in Construction Management 219–226 (1998)
Ahiaga-Dagbui, D. D., & Smith, S. D. Neural networks for modelling the final target cost of water projects (2012)
Elfaki, A. O., Alatawi, S. & Abushandi, E. Using intelligent techniques in construction project cost estimation: 10-year survey. Adv. Civil Eng. https://doi.org/10.1155/2014/107926 (2014).
Juszczyk, M., Leśniak, A. & Zima, K. ANN based approach for estimation of construction costs of sports fields. Complexity 2018, 1–11 (2018).
Roxas, C. L. C., & Ongpeng, J. M. C. An artificial neural network approach to structural cost estimation of building projects in the Philippines. Proc. DLSU Res. Congr. (2014)
Yadav, R., Vyas, M., Vyas, V. & Agrawal, S. Cost estimation model (CEM) for residential building using artificial neural network. Int. J. Eng. Res. Technol. (IJERT) 5(1), 430–432 (2016).
Leszczyński, Z. & Jasiński, T. An artificial neural networks approach to product cost estimation. The case study for electric motor. Informatyk Ekonomiczna 1(47), 72–84 (2018).
Sharma, M., Kumar, C. J. & Deka, A. Early diagnosis of rice plant disease using machine learning techniques. Arch. Phytopathol. Plant Prot. 55(3), 259–283 (2022).
Chandanshive, V. & Kambekar, A. R. Estimation of building construction cost using artificial neural networks. J. Soft Comput. Civil Eng. 3(1), 91–107 (2019).
Omotayo, T., Bankole, A. & Olubunmi Olanipekun, A. An artificial neural network approach to predicting most applicable post-contract cost controlling techniques in construction projects. Appl. Sci. 10(15), 5171 (2020).
Singh, D. & Singh, B. Feature wise normalization: An effective way of normalizing data. Pattern Recognit. 122, 108307 (2022).
Sharma, M. & Kumar, C. J. Improving rice disease diagnosis using ensemble transfer learning techniques. Int. J. Artif. Intell. Tools 31(08), 2250040 (2022).
Kiani, A. & Shaker, M. Evaluating the effectiveness of pressurized irrigation systems in Iran. Water Manag. Agric. 8(2), 167–182 (2022) (In Persian).
Arafa, M. & Alqedra, M. Early stage cost estimation of buildings construction projects using artificial neural networks. J. Artif. Intell. 4(1), 63–75 (2011).
Matel, E., Vahdatikhaki, F., Hosseinyalamdary, S., Evers, T. & Voordijk, H. An artificial neural network approach for cost estimation of engineering services. Int. J. Constr. Manag. 22(7), 1274–1287 (2022).
Pettang, C., Mbumbia, L. & Foudjet, A. Estimating building materials cost in urban housing construction projects, based on matrix calculation: The case of Cameroon. Constr. Build. Mater. 11(1), 47–55 (1997).
Zhang, Y. F. & Fuh, J. Y. H. A neural network approach for early cost estimation of packaging products. Comput. Ind. Eng. 34(2), 433–450 (1998).
YousefiNajafabadiTohidi, H. N. M. G. S. H. E. A new well-balanced spectral volume method for solving shallow water equations over variable bed topography with wetting and drying. Eng. Comput. 39(5), 3099–3130 (2023).
Islam, A. et al. Hydro-chemical characterization and irrigation suitability assessment of a tropical decaying river in India. Sci. Rep. 14(1), 20096 (2024).
Ekung, S., Lashinde, A. & Adu, E. Critical risks to construction cost estimation. J. Eng. Proj. Prod. Manag. 11(1), 19–29 (2021).
Waliulu, Y. E. P. R. & Adi, T. J. W. A system dynamic thinking for modeling infrastructure project duration acceleration. Proc. Comput. Sci. 197, 420–427 (2022).
Alshahethi, A. A. A. & Radhika, K. L. Estimating the final cost of construction project using neural networks: A case of Yemen construction projects. Int. J. Res. Appl. Sci. Eng. Technol. 6(11), 2141–2151 (2018).
Arage, S. S., & Dharwadkar, N. V. Cost estimation of civil construction projects using machine learning paradigm. In 2017 International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC) (pp. 594–599). IEEE (2017)
Ashrafi, A., Ebrahimian, H., Maarefi, T., Dehghanisanij, H. & Sharifi, M. White water footprint: Valuable subdivision in water footprint. Water Int. 49(7), 849–851 (2024).
Babaei, M., Rashidi-baqhi, A. & Rashidi, M. Estimating project cost under uncertainty using universal generating function method. J. Constr. Eng. Manage. 148(2), 04021194 (2022).
Karbachevsky, A. et al. Early-stage neural network hardware performance analysis. Sustainability 13(2), 717 (2021).
Cheng, M. Y., Tsai, H. C. & Sudjono, E. Conceptual cost estimates using evolutionary fuzzy hybrid neural network for projects in construction industry. Expert Syst. Appl. 37(6), 4224–4231 (2010).
Lester, E. I. A. Estimating. In Project management, planning and control 61–65 (Elsevier, 2017).
NASA Executive Cost Analysis Steering Group. Vol. 63(4), 52 pp. (NASA cost estimating handbook. NASA: Washington, DC, USA, 2015).
Gransberg, D. D. & Rueda, J. A. Construction equipment management for engineers, estimators, and owners (CRC Press, 2020).
Pourgholam-Amiji, M., Liaghat, A. & Ahmadaali, K. Early stage cost modeling of drip irrigation systems. Irrig. Drain. Struct. Eng. Res. 22(82), 1–22 (2021) (In Persian).
Chandrashekar, G. & Sahin, F. A survey on feature selection methods. Comput. Electr. Eng. 40(1), 16–28 (2014).
Miao, J. & Niu, L. A survey on feature selection. Proc. Comput. Sci. 91, 919–926 (2016).
Pourgholam-Amiji, M., Ahmadaali, K. & Liaghat, A. Identifying the features affecting the cost of drip irrigation systems using feature selection methods. J. Water Res. Agric. 36(4), 421–440 (2023) (In Persian).
Solorio-Fernández, S., Carrasco-Ochoa, J. A. & Martínez-Trinidad, J. F. A review of unsupervised feature selection methods. Artif. Intell. Rev. 53(2), 907–948 (2020).
Liu, J., Lin, Y., Lin, M., Wu, S. & Zhang, J. Feature selection based on quality of information. Neurocomputing 225, 11–22 (2017).
Pazoki, M., Yadav, A. & Abdelaziz, A. Y. Pattern-recognition methods for decision-making in protection of transmission lines. In Decision making applications in modern power systems 441–472 (Academic Press, 2020).
Talukdar, S. et al. Coupling geographic information system integrated fuzzy logic-analytical hierarchy process with global and machine learning based sensitivity analysis for agricultural suitability mapping. Agric. Syst. 196, 103343 (2022).
Park, C. S. Fundamentals of engineering economics (Pearson Education, 2012).
Masoudi-Sobhanzadeh, Y., Motieghader, H. & Masoudi-Nejad, A. FeatureSelect: A software for feature selection based on machine learning approaches. BMC Bioinform. 20(1), 170 (2019).
Dickinson, R. P. & Gelinas, R. J. Sensitivity analysis of ordinary differential equation systems—a direct method. J. Comput. Phys. 21(2), 123–143 (1976).
Saltelli, A. et al. Why so many published sensitivity analyses are false: A systematic review of sensitivity analysis practices. Environ. Model. Softw. 114, 29–39 (2019).
Ghaddar, B. & Naoum-Sawaya, J. High dimensional data classification and feature selection using support vector machines. Eur. J. Oper. Res. 265(3), 993–1004 (2018).
Rahmaninia, M. & Moradi, P. OSFSMI: Online stream feature selection method based on mutual information. Appl. Soft Comput. 68, 733–746 (2018).
Schmidt, M. & Lipson, H. Distilling free-form natural laws from experimental data. Science 324(5923), 81–85 (2009).
Pourgholam-Amiji, M., Ahmadaali, K. & Liaghat, A. Sensitivity analysis of parameters affecting the early cost of drip irrigation systems using meta-heuristic algorithms. Iran. J. Irrig. Drain. 15(4), 737–756 (2021) (In Persian).
Koncar, N. Optimisation methodologies for direct inverse neurocontrol (Doctoral dissertation, University of London, 1997).
Stefánsson, A., Končar, N. & Jones, A. J. A note on the gamma test. Neur. Comput. Appl. 5(3), 131–133 (1997).
Durrant, P. J. winGamma: A non-linear data analysis and modelling tool with applications to flood prediction. Unpublished PhD thesis, Department of Computer Science, Cardiff University, Wales, UK (2001).
Tsui, A. P., Jones, A. J. & Guedes de Oliveira, A. The construction of smooth models using irregular embeddings determined by a gamma test analysis. Neur. Comput. Appl. 10(4), 318–329 (2002).
Otani, M., & Jones, A. J. Guiding chaotic orbits. Research Report, Imperial College of Science Technology and Medicine, 130 (1997)
Alsahaf, A., Petkov, N., Shenoy, V. & Azzopardi, G. A framework for feature selection through boosting. Exp. Syst. Appl. 187, 115895 (2022).
De Gregorio, G., Della Cioppa, A., & Marcelli, A. Negative Selection Algorithm for Alzheimer’s Diagnosis: Design and Performance Evaluation. In International Conference on the Applications of Evolutionary Computation (Part of EvoStar) (pp. 531–546). (Springer, Cham, 2022)
Ferrato Melo de Carvalho, L. V. Machine learning in poultry companies’ data: Applications and methodologies. Ph.D. dissertation, North Carolina State University, 139 (2022).
Arefinia, A., Bozorg-Haddad, O. & Chang, H. The role of data mining in water resources management. In Essential Tools for Water Resources Analysis Planning and Management 85–99 (Singapore: Springer, 2021).
Arefinia, A. et al. Estimation of geographical variations in virtual water content and crop yield under climate change: Comparison of three data mining approaches. Environ. Dev. Sustain. 24(6), 8378–8396 (2022).
Ogbu, A. D., Iwe, K. A., Ozowe, W. & Ikevuje, A. H. Advances in machine learning-driven pore pressure prediction in complex geological settings. Comput. Sci. IT Res. J. 5(7), 1648–1665 (2024).
Sarzaeim, P., Bozorg-Haddad, O., Bozorgi, A. & Loáiciga, H. A. Runoff projection under climate change conditions with data-mining methods. J. Irrig. Drain. Eng. 143(8), 04017026 (2017).
Naseem, I., Togneri, R. & Bennamoun, M. Linear regression for face recognition. IEEE Trans. Pattern Anal. Mach. Intell. 32(11), 2106–2112 (2010).
Welham, S. J., Gezan, S. A., Clark, S. J. & Mead, A. Statistical methods in biology: design and analysis of experiments and regression (CRC Press, 2014).
Young, D. S. Handbook of regression methods (CRC Press, 2018).
Balan, B., Mohaghegh, S. & Ameri, S. State-of-the-art in permeability determination from well log data: Part 1-A comparative study, model development (Society of Petroleum Engineers, 1995).
Cortes, C. & Vapnik, V. Support-vector networks. Mach. Learn. 20(3), 273–297 (1995).
Deka, P. C. Support vector machine applications in the field of hydrology: A review. Appl. Soft Comput. 19, 372–386 (2014).
Hamel, L. H. Knowledge discovery with support vector machines (John Wiley & Sons, 2011).
Awad, M. & Khanna, R. Support vector regression. In Efficient learning machines (eds Awad, M. & Khanna, R.) 67–80 (Apress, Berkeley, 2015).
Drucker, H., Burges, C. J., Kaufman, L., Smola, A. J. & Vapnik, V. Support vector regression machines. Adv. Neur. Inform. Process. Syst. 9, 155–161 (1997).
McCulloch, W. S. & Pitts, W. A logical calculus of the ideas immanent in nervous activity. Bull. Math. Biophys. 5(4), 115–133 (1943).
Rosenblatt, F. Principles of neurodynamics (Spartan Books, 1962).
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. Learning internal representations by error propagation (No. ICS-8506). California Univ San Diego La Jolla Inst for Cognitive Science (1985)
Pal, S. K. & Mitra, S. Multilayer perceptron, fuzzy sets, and classification. IEEE Trans. Neural Netw. 3(5), 683–697 (1992).
Ahmadaali, K., Liaghat, A., Heydari, N. & Bozorg-Haddad, O. Application of artificial neural network and adaptive neural-based fuzzy inference system techniques in estimating of virtual water. Int. J. Comput. Appl. 76, 12–19 (2013).
Chen, S., Cowan, C. F. N. & Grant, P. M. Orthogonal least squares learning algorithm for radial basis function networks. IEEE Trans. Neural Netw. 2(2), 302–309 (1991).
Specht, D. F. A general regression neural network. IEEE Trans. Neur. Netw. 2(6), 568–576 (1991).
Jang, J. S. ANFIS: adaptive-network-based fuzzy inference system. IEEE Trans. Syst. Man Cybern. 23(3), 665–685 (1993).
Bengio, Y., Goodfellow, I. & Courville, A. Deep learning Vol. 1 (MIT press, 2017).
Eilschou, A. Deep learning?. BMJ Br. Med. J 319(7209), 1–16 (2014).
Huang, G. B. What are extreme learning machines? Filling the gap between Frank Rosenblatt’s dream and John von Neumann’s puzzle. Cogn. Comput. 7(3), 263–278 (2015).
Tappert, C. C. Who is the father of deep learning?. In 2019 International Conference on Computational Science and Computational Intelligence (CSCI) (pp. 343–348). IEEE. (2019)
Buduma, N., Buduma, N. & Papa, J. Fundamentals of deep learning (O’Reilly Media, Inc, 2022).
Kamilaris, A. & Prenafeta-Boldú, F. X. Deep learning in agriculture: A survey. Comput. Electr. Agric. 147, 70–90 (2018).
LeCun, Y., Bengio, Y. & Hinton, G. Deep learning. Nature 521(7553), 436–444 (2015).
Schmidhuber, J. Deep learning in neural networks: An overview. Neur. Netw. 61, 85–117 (2015).
Ferreira, C. Gene expression programming: a new adaptive algorithm for solving problems. arXiv preprint cs/0102027 (2001)
Nourani, V., Baghanam, A. H., Adamowski, J. & Kisi, O. Applications of hybrid wavelet–artificial intelligence models in hydrology: A review. J. Hydrol. 514, 358–377 (2014).
Banzhaf, W., Nordin, P., Keller, R. E. & Francone, F. D. Genetic programming 512 (Springer, 1998).
Danandeh Mehr, A. et al. Genetic programming in water resources engineering: A state-of-the-art review. J. Hydrol. 566, 643–667 (2018).
Shiri, J. & Kişi, Ö. Comparison of genetic programming with neuro-fuzzy systems for predicting short-term water table depth fluctuations. Comput. Geosci. 37(10), 1692–1701 (2011).
Rokach, L. & Maimon, O. Decision trees. In Data mining and knowledge discovery handbook 165–192 (Springer, 2005).
Loh, W. Y. Classification and regression trees. Wiley Interdiscip. Rev.: Data Min. Knowl. Discov. 1(1), 14–23 (2011).
Steinberg, D. & Colla, P. CART: Classification and regression trees. In The top ten algorithms in data mining (eds Wu, X. & Kumar, V.) 179 (Chapman Hall/CRC, 2009).
Breiman, L., Friedman, J., Stone, C. J. & Olshen, R. A. Classification and regression trees (CRC Press, 1984).
Drucker, H. & Cortes, C. Boosting decision trees. Adv. Neur. Inform. Process. Syst. 8, 479–485 (1996).
Gonzalez, O., O’Rourke, H. P., Wurpts, I. C. & Grimm, K. J. Analyzing Monte Carlo simulation studies with classification and regression trees. Struct. Equ. Model.: A Multidiscip. J. 25(3), 403–413 (2018).
Enayati, M., Bozorg-Haddad, O., Pourgholam-Amiji, M., Zolghadr-Asli, B. & Tahmasebi Nasab, M. Decision tree (DT): A valuable tool for water resources engineering. In Computational Intelligence for Water and Environmental Sciences 201–223 (Singapore: Springer Nature, 2022).
Russell, S. & Norvig, P. Artificial intelligence: A modern approach (Prentice Hall, Upper Saddle River, NJ, USA, 2002).
Winston, P. H. Artificial intelligence (Addison-Wesley Longman Publishing Co., 1984).
Masoudi-Sobhanzadeh, Y. & Motieghader, H. World competitive contests (WCC) algorithm: A novel intelligent optimization algorithm for biological and non-biological problems. Inform. Med. Unlocked 3, 15–28 (2016).
Kashan, A. H. League championship algorithm (LCA): An algorithm for global optimization inspired by sport championships. Appl. Soft Comput. 16, 171–200 (2014).
Alweshah, M. Solving feature selection problems by combining mutation and crossover operations with the monarch butterfly optimization algorithm. Appl. Intell. 51(6), 4058–4081 (2021).
Ghaemi, M. & Feizi-Derakhshi, M. R. Feature selection using forest optimization algorithm. Pattern Recogn. 60, 121–129 (2016).
Rastegar, R., Rahmati, M. & Meybodi, M. R. A clustering algorithm using cellular learning automata based evolutionary algorithm. In Adaptive and Natural Computing Algorithms 144–150 (Springer, 2005).
Schubert, A. L., Hagemann, D., Voss, A. & Bergmann, K. Evaluating the model fit of diffusion models with the root mean square error of approximation. J. Math. Psychol. 77, 29–45 (2017).
Panday, D., de Amorim, R. C. & Lane, P. Feature weighting as a tool for unsupervised feature selection. Inform. Process. Lett. 129, 44–52 (2018).
Aghelpour, P., Varshavian, V., Khodamorad Pour, M. & Hamedi, Z. Comparing three types of data-driven models for monthly evapotranspiration prediction under heterogeneous climatic conditions. Sci. Rep. 12(1), 17363 (2022).
Elbeltagi, A. et al. Forecasting vapor pressure deficit for agricultural water management using machine learning in semi-arid environments. Agric. Water Manag. 283, 108302 (2023).
Sharma, M., Kumar, C. J. & Bhattacharyya, D. K. Machine/deep learning techniques for disease and nutrient deficiency disorder diagnosis in rice crops: A systematic review. Biosyst. Eng. 244, 77–92 (2024).

Acknowledgements

The present study results from several years of effort by the authors to prepare the data, apply software and models, analyze, code, interpret, and write up the results. A review of the literature showed that such a study was needed, which prompted the authors to address this issue and submit the results to Scientific Reports as a research article. This article is extracted from the Ph.D. dissertation of the first author. The authors thank the Department of Irrigation and Reclamation Engineering, Faculty of Agriculture, College of Agriculture and Natural Resources, University of Tehran, Karaj, Iran, for its cooperation and for providing the relevant laboratories. To the authors' knowledge, no comparable research has been carried out elsewhere so far. On behalf of all authors, the corresponding author states that there is no conflict of interest.

Author information

Authors and Affiliations

Department of Irrigation and Reclamation Engineering, Faculty of Agriculture, College of Agriculture and Natural Resources, University of Tehran, P. O. Box 4111, Karaj, 31587-77871, Iran
Masoud Pourgholam-Amiji, Khaled Ahmadaali & Abdolmajid Liaghat

Authors

Masoud Pourgholam-Amiji
Khaled Ahmadaali
Abdolmajid Liaghat

Contributions

M.P.A: Data Preparation and Review, Software, Resources, Results Interpretation, Writing-Original Draft Preparation, Visualization. K.A: Methodology, Conceptualization, Formal Analysis and Investigation, Final Review. A.L: Results Interpretation, Review and Editing, Supervision, Final report review.

Corresponding author

Correspondence to Khaled Ahmadaali.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Correspondence and requests for materials should be addressed to K.A.

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Pourgholam-Amiji, M., Ahmadaali, K. & Liaghat, A. A novel early stage drip irrigation system cost estimation model based on management and environmental variables. Sci Rep 15, 4089 (2025). https://doi.org/10.1038/s41598-025-88446-x

Received: 21 September 2024
Accepted: 28 January 2025
Published: 03 February 2025
Version of record: 03 February 2025
DOI: https://doi.org/10.1038/s41598-025-88446-x
