Abstract
The purpose of this study is to develop and validate a risk-driven predictive model for estimating project duration and cost in irrigation canal lining projects, where uncertainties often lead to delays and budget overruns. Ninety-three factors were first reduced to twenty using AHP–RII (Cronbach’s \(\alpha = 0.954\)). A multi-layer perceptron (128–64–32, ReLU, Adam, early stopping) was trained on 5000 simulated scenarios and validated on eight projects with leave-one-project-out cross-validation. The model had \(R^2 = 0.92\) (training), 0.82 (testing), and made errors within the limits of 0.87 months (time) and EGP 102,500 (cost) on average.The developed model was deployed as a Python-based desktop application, enabling engineers and planners to generate accurate time and cost forecasts during early project stages. This research introduces an integrated ANN-based framework that combines expert-driven risk assessment with machine learning, providing a practical decision-support tool for infrastructure projects.
Similar content being viewed by others
Introduction
Egyptian irrigation canal lining projects (EICLPs) are critical infrastructure projects under the national plan to modernize and conserve water resources in rural areas. These projects are typically characterized by complex cost structures, unpredictable timelines, and multiple risks, necessitating precise cost and time estimations. Conventionally, estimators have relied on intuition and generic percentages to allocate contingency funds, often overlooking specific risks and project attributes. Estimators must identify and evaluate the various elements that influence a project’s overall cost and duration in order to overcome these obstacles. Additionally, risk assessment determines the probability and possible outcomes of unfavorable events, enabling the anticipated impact to be included as a contingency in the original estimates1. Traditional estimation techniques, including the Critical Path Method (CPM), Linear Scheduling Method (LSM), and contingency-based approaches, largely depend on historical averages and subjective expert judgment. Although these methods provide baseline estimates, they fail to capture the nonlinear and interdependent nature of risks inherent in complex projects. As a result, their predictive capability under uncertainty remains limited, often leading to insufficient contingency planning and unexpected budget overruns2,3,4. Advances in artificial intelligence–especially Artificial Neural Networks (ANNs)–have opened new possibilities for modeling complex, nonlinear relationships among project variables without the need for strict assumptions5,6,7. Despite their strong predictive capabilities, most previous research in construction has addressed either time or cost in isolation, with limited efforts to integrate risk-related factors into a unified framework. In addition, only a small number of studies have attempted to implement these models in the form of practical, user-oriented tools, particularly within the domain of irrigation-related infrastructure projects8,9. To address this gap, this study develops and validates a risk-driven ANN-based model for predicting time and cost in irrigation canal lining projects. Using the Relative Importance Index (RII), twenty significant cost, time, and danger variables were found and added as inputs to a Multi-Layer Perceptron architecture. The model was trained on historical data and validated using eight real-world projects. Finally, the model was implemented in a Python-based application, enabling engineers and decision-makers to perform real-time forecasting during early project planning stages10. This study contributes to the existing body of knowledge by introducing an integrated ANN-based framework that simultaneously predicts both time and cost under risk conditions, a feature rarely addressed in previous research. Unlike earlier works that focused on either time or cost in isolation, our model incorporates twenty risk-driven factors to provide a holistic estimation approach. Furthermore, the development of a user-friendly Python desktop application bridges the gap between theory and practice, enabling real-time predictions during early project planning–an aspect that has not been sufficiently explored in similar studies. Traditional methods, such as CPM and PERT, are based on deterministic assumptions that don’t consider uncertainty and interdependencies which ultimately produces unrealistic project predictions on large projects11. While regression models can produce very useful studies they demand either large and clean datasets or deal poorly with non-linear risks. Monte Carlo does deal with probabilistic variance but again, relies upon using the right inputs, which makes it only useful for the environment in which accurate inputs can be established during the earliest stages of a project or one with a lot of noisy data12.
This research provides three main contributions:
-
1.
The new risk input design using RII-weighted inputs based on a structured stakeholder survey, which then reduced the 93 risk factors down to 20 tried and tested predictors.
-
2.
The multi-layer perceptron (MLP) model which featured a specific risk-to-base adjustment with calibrated factors without the usual RII dependent weighting, providing useful future performance prediction of duration and cost under risk.
-
3.
The creation of a practical Python/Tkinter desktop tool that incorporated sensitivity analysis, tornado visualization, and parameter-space exploration which served to bridge the theoretical modeling with the practical decision support needed in a public irrigation infrastructure project.
Literature review
Cost-related factors that affect construction projects
Cost-related factors have been identified in several studies as having a major influence on ICLP contingency estimates. By determining the essential cost parameters for Field Canal Improvement Projects (FCIPs), one study offered an extensive and useful conceptual cost-estimation model to help organizations plan and carry out irrigation improvements8. Another study used a prompt list to analyze the cost components of Continuous Flight Auger (CFA) piles, classified them using an affinity-diagramming technique, and created a comprehensive cost breakdown structure (CBS) for CFA construction13. A feasibility and economic analysis was conducted to determine the lining costs for each of the three engineering approaches that were suggested for the rehabilitation of Egyptian irrigation canals9. By analyzing the work into its main cost-driving components, another study looked at the costs of canal lining for various cross-sectional shapes14. Additionally, it was shown that precise cost estimation is largely dependent on the cross-sectional perimeter measurements of a channel15. In conclusion, a nonlinear optimization model was created to identify the smallest practical dimensions and incorporate design constraints in order to minimize canal-lining costs per unit length16.
Time-related factors that affecting construction projects
Time-related factors influencing contingency predictions for ICLPs have been the subject of numerous studies. To estimate the time needed to construct Continuous Flight Auger (CFA) piles in Egypt, one study created a time-management module using information from forty prior projects17. In a precast water canal project, another examined the differences between the Critical Path Method (CPM) and the Linear Scheduling Method (LSM)2. Furthermore, 26 factors were found to have an impact on the estimation of execution times for drainage and irrigation projects18. Likewise, variables influencing irrigation project timelines were listed, which resulted in the creation of a mathematical prediction model that was subsequently verified with the aid of contemporary technology5.
Risk factors affecting the estimation of ICLPs’ contingency
Risk factors affecting the estimation of ICLP contingencies were drawn from several prior studies. Forty-five common risks associated with Continuous Flight Auger (CFA) pile construction under Egyptian conditions were compiled, specifying each risk’s causes, events, and impacts Risk factors affecting the estimation of ICLP contingencies were drawn from several prior studies. Forty-five common risks associated with Continuous Flight Auger (CFA) pile construction under Egyptian conditions were compiled, specifying each risk’s causes, events, and impacts19. Key risks in irrigation network rehabilitation were identified and organized into eight categories: technical, economic, financial, governance, safeguards, environmental, resettlement, and Indigenous peoples20. A modified Fuzzy Group Decision-Making Approach (FGDMA) was suggested and validated to evaluate 57 possible risks across nine groupings in power plant projects. In Egyptian irrigation canal rehabilitation, the compatibility risks between various canal-lining techniques and site conditions were investigated21. In Egyptian irrigation canal rehabilitation, the compatibility risks between various canal-lining techniques and site conditions were investigated9. Finally, a survey comprising 51 risk factors for construction-related water supply projects was developed22.
Contingency definitions, types, and methods of calculation
Multiple studies have defined, classified, and proposed methods for calculating contingency. For example, contingency is described as “an amount of funds added to the base cost estimate to cover estimate uncertainty and risk exposure3. Another work distinguishes between project contingency and process contingency–both intended to address common engineering uncertainties–and divides calculation approaches into deterministic and probabilistic methods6 . A separate source identifies three contingency types–construction, design, and management23. Lastly, a summary of fourteen different methods for estimating project cost contingency is provided, along with a review of a large number of studies on contingency estimation3,4.
Contingency estimation MODELS in construction projects
To estimate contingency costs for highway construction and evaluate the influence of risk factors during bidding, the fundamental Analytic Hierarchy Process (AHP) model was introduced1. A mathematical model combining AHP and Multi-Attribute Utility Theory (MAUT) was developed to determine the optimal contingency value for construction projects6. Power plant development employed a Classification and Regression Tree (CART) approach to forecast project costs and the contingency required to cover potential overruns during the development phase7. For construction projects, a hybrid Work Breakdown Structure (WBS)–Pareto model was developed that combines the 80/20 principle with cost–time risk analysis24. Lastly, to improve the distribution of cost contingency reserves during the planning phase, a Monte Carlo Simulation (MCS) framework was suggested25.
Current studies emphasize the growing role of ANNs in predicting infrastructure expenses and timelines. As an illustration, deep-learning models that utilize BIM properties in the schematic design phase have given cost estimates that are more accurate by a large margin than those obtained by conventional methods26. Artificial Neural Networks (ANN) were employed to analyze data collected in highway construction from 2011 to 2023, leading to very high correlation coefficients (\(R \approx 0.989\)), thus demonstrating their ability to predict large-scale infrastructure projects27. In the same way, modified ANN architectures having more interesting activation functions were able to surpass the time-series and regression models in regional construction cost indices forecasting28. For example, in research aimed at predicting time and cost overruns, ANN models optimized with Tabu Search achieved an R2 of approximately 0.94 despite having limited project data29. A bibliometric survey conducted recently has also recorded the increasing use of ANNs in the built environment, which is in line with the time of our research30.While previous studies have applied machine learning techniques such as regression models and Monte Carlo simulations to address uncertainty in project estimation, few have operationalized these models into practical tools that incorporate risk-based adjustments for both time and cost. This research fills that gap by combining expert-driven risk assessment with artificial neural networks in a deployable application, thereby extending the applicability of ANN models in real-world infrastructure projects. Recent advances in global ANN-based approaches. Also indicate the growing need for models that are both accurate and user-oriented, which this study addresses directly.
The role of RII on risk weighting
RII is recognized as a valid method for quantifying expert judgment in risk prioritization, as it enables the transformation of subjective evaluation scales into quantifiable values and weights. The RII process, however, struggles with establishing calibration (validity and reliability) post hoc in complex, multi-faceted decision-making environments like construction safety systems31 and managing urban infrastructure sites32. The RII process’s most significant limitation is the reliance on the linear combination of expert judgment, as the RII approach is only valid assuming the consistency of the experts and a stable decision-making context. The various perspectives and realities of different experts, coupled with emergent project opportunities or conditions, can introduce irreducible bias into the RII process. This is why internal reliability should be assessed to verify experts’ internal consistency through calculation metrics like Cronbach’s alpha10.
In terms of meaningfully addressing this limitation, authors can recommend some form of validation procedures, as demonstrated in advanced predictive analytics and occupational safety forecasting applications using ensemble machine learning33. This ensures that the derived weights are statistically reliable in addition to being contextually responsive and representative of real-world situations.
Small-N regression problems
ANNs can discover non-linear structures but are also likely to overfit. It has been shown that ensemble and kernel-based mechanisms typically outperform ANNs in this small-N type of problem: Stevanović et al.34 found XGBoost to be more accurate from smaller building datasets; Wang et al.35 found XGBoost to be a higher-ranked algorithm than ANN, SVM, and Random Forest with only 45 samples; Oukhouya and El Himdi36 found SVR and XGBoost to outperform MLP for stock prediction; and Qi et al.37 confirmed that XGBoost and SVR were better in hydropower forecasts.
In this present study, we addressed the issue of limited irrigation data by generating 5000 simulated samples for ANN training, while 8 real-world projects were used in the validation step. This approach led to the development of a combined RII weighting with dual risk-adjusted outputs, implemented into a usable application.
Applications of ANN and ML in irrigation project estimation
In studies of irrigation, artificial neural networks (ANN) have primarily been used for cost or hydraulic prediction purposes but with some evident constraints. For example, ElMousalami et al.8 investigated using ANN for field canal cost estimates, and Pourgholam-Amiji et al.38 implemented machine learning (ML) processes for identifying drip irrigation costs; both provided only point estimates. Aghayee et al.39 sought to address uncertainty in canal operations, Selim et al.40 developed ANN to model seepage losses, and Zeleňáková et al.15 evaluated the impacts of lining on rehabilitation costs.
Significantly, the focus of these studies on stand-alone parameters does not use risk adjustment or phenomenon-based tools for deployment. Alternatively, this study utilizes scenarios in conjunction with expert-based spawning probability attached to the utility of the risk index framework (RII), prediction using ANN, risk-adjusted or alternative prediction outputs, and a usable desktop tool, thereby filling a distinct methodological gap and practical void.As shown in Table 1, the comparison highlights the research gap in irrigation-specific estimation work.
Research gap
Traditional methods such as CPM and regression are incapable of recognizing the complexity of risk interactions in the delivery of irrigation canal construction projects. Even though ANNs offer some predictive ability, the majority of research considered only cost or time in isolation, did not integrate risk, and did not serve as an implemented tool for use in practice.
This study addresses these gaps by developing an integrated ANN model to utilize risk factors from a survey using a weighting system (practicable for the construction delivery of irrigation projects), while addressing the following two research questions:
-
RQ1: Can survey-weighted risk factors drive accurate predictions of both duration and cost in canal-lining projects?
-
RQ2: How should risk coefficients be calibrated to minimize forecast error across realistic risk levels?
An eight-phase process (Fig. 1) is employed to address these questions, including risk identification, expert evaluation, reliability assessment, designing artificial neural networks (ANN), calibration from 5000 simulations, validation on eight projects, sensitivity analysis, and deployment through Python. This process results in a reliable, practical tool for planning projects in the early stages.
Research methodology
A phase-eight research model (Fig. 1) was employed in this research to estimate project duration and cost related to waterway canal lining by way of expert judgement and an artificial neural network approach.
Phase 1: Factor identification
93 cost, time, and risk factors previously published in the literature were assessed and grouped into 12 groupings.
Phase 2: Expert survey and data governance
Risks were assessed using an AHP-RII methodology10. All participants responded anonymously, and each participant required a minimum of 10 years of relevant experience to participate.
Phase 3: Key variables selection
20 key variables were identified by the participants, with an RII > 70, providing confidence in their selection10.
Phase 4: Reliability verification
The key variables were found to be consistent (Cronbach’s \(\alpha\) = 0.954).
Phase 5: ANN configuration
ANNs were configured using a MLP architecture (128-64-32 nodes) with fixed random seed (42), and kept the software versions specific (Python 3.13.0, scikit-learn 1.4.0, and TensorFlow 2.15.0).
Phase 6: Model calibration
5000 risk scenarios were generated using Latin Hypercube Sampling to train the ANN models.
Phase 7: Model validation
Model performance was evaluated in 8 real life projects implementing leave-one-out for cross-validation.
Phase 8: Application & analysis
The ANN model was developed as an application in Python, with sensitivity and tornado chart assessment.
Questionnaire stage
Several factors that influence the estimation of the contingency of EICLPs have been identified and categorized by prior research. After further refinement, a total of 93 factors were obtained from previous literature reviews and categorized into three main groups: cost, time, and risk-related factors. These factors were then further subdivided into 12 groups, which included three cost-related factors (design, construction, and overheads), three time-related factors (owner, contractor, and project), and six risk-related factors (technical, environmental, financial, operational, legal and regulatory, and social)-related elements. Subsequently, the 93 factors were examined using an AHP-RII survey (full details are presented in our previously published research10), which confirmed the reliability of the instrument (Cronbach’s \(\alpha = 0.954\)). The survey identified the 20 most impactful factors, which were then applied as the explanatory variables for designing the current ANN model. A previous foundational study we did10 covered in great detail the survey design and administration. Here the key parameters are just enumerated for the first time, as they are required:
Survey design
Roles of respondents: A stratified sample of 150 experts was selected for the survey, aiming to include equal numbers of representatives from the three main stakeholder groups: contractors (40%), consultants (40%), and government owners (20%).
Size of sampling: The purposive sampling frame consisted of Egyptian professionals with at least ten years of experience working on irrigation projects. The sample size was \(N = 150\).
Rate of response: 112 completed the survey and returned it.
So, there was a high rate of response (74.6%). Distribution of respondents: The respondents were scattered all over the country but had mainly project experience in the Nile Delta and Upper Egypt regions. These regions are, therefore, the places where the most Egyptian Irrigation Canal Lining Projects (EICLPs) are carried out.
Scales, equations, and weighting justification
Scales: Two 10-point Likert scales were used:
- Frequency index (FI): 1 (very rare) to 10 (very frequent).
- Impact index (II): 1 (very low impact) to 10 (very high impact).
Equations: The relative importance index (RII) for each factor was calculated using the following equations, as shown in Eqs. (1)–(3)10:
where \(F_i, I_i\) are individual scores; a is the upper scale limit (10); N is the number of respondents (150).
Justification for \({\rm FI \times II}\) product: The product \(({\rm FI \times II})\) was used instead of a simple mean to prioritize factors that are both high-frequency and high-impact. This ensures that a factor with a high frequency but low impact does not outweigh a less frequent but critically severe factor, providing a more robust and realistic risk weighting for input into the ANN.
Key important variables that affect construction project lining
The key significant cost, time, and risk-related variables influencing the estimation of EICLPs’ contingency were found in 20 out of 93, and their relative weight was greater than 70%. The hierarchical analysis of critical success factors in irrigation canal lining projects reveals that technical-execution factors dominate the highest impact tier (F.I. \(\ge 0.10\)). Specifically: CC7 (concrete pouring works for canal beds/slopes, F.I.=0.1140), TP5 (water rotation management, F.I.=0.1124) and TP10 (labor productivity, F.I.= 0.1083)
These factors collectively underscore the operational primacy of construction quality control and resource coordination. They are complemented by design parameter CD2 (hydraulic cross-section area, F.I.=0.1065) and contractual factor TO3 (owner payment policies, F.I.=0.1063), indicating that project success hinges on integrating engineering precision with stakeholder governance.
Notably, the top five factors account for 54.75% of the maximum relative weight ( \(\ge 93.25\%\) ), while environmental (ER5, F.I.=0.0833) and socioeconomic factors (SR4, F.I.=0.0806) demonstrate marginal influence. This evidence suggests that optimization efforts should prioritize technical-administrative synergies over broader sustainability metrics in canal lining contexts10.
Table 2 demonstrates high consistency among the items. With an average inter-item correlation of 0.509, the scale exhibits strong internal reliability. The low variance (0.031) further indicates that these correlations are uniform across all items. In sum, the evidence suggests a well-designed scale whose components cohesively measure the targeted construct. As shown in Fig. 2 below:
The inter-item correlation matrix revealed strong relationships among several of the 20 input variables, which reflect the interdependence of technical, environmental, contractual, and resource-related risks. These variables were previously identified by using the relative importance index (RII). Two key metrics were evaluated using a 1–to–10 scale, where 1 indicates low effectiveness and 10 indicates high effectiveness. The first metric examined how frequently cost, time, and risk factors influenced contingency estimates for EICLPs. The second metric assessed the level of impact these factors had on those estimates. Based on these metrics, two indices were then calculated: the Frequency Index (FI) and the Impact Index (II). The final index was determined by multiplying FI and II, followed by applying the appropriate weights. The key inputs for training an Artificial Neural Network (ANN) model as shown in Fig. 3 below.
Variable reliability
The reliability of this variable was tested, and the SPSS Ver.25, the alpha coefficient value for 20 components was 0.954, indicating that it was reliable as shown in Table 3 below. Table 2 shows substantial variation in how the 20 risk factors were rated. Importance scores ranged from 4.70 (±2.64) for lack of skilled labor to 7.66 (±2.82) for concrete pouring works–highlighting clear differences in perceived criticality. The high standard deviations, like ±3.24 for owner’s bid evaluation policy, reveal significant disagreement among raters. This likely stems from different project contexts (urban vs. rural sites) and professional perspectives (contractors vs. engineers). Notably, physical factors such as canal cross-section area (±2.82) showed 18% less variability than operational risks, suggesting technical parameters are easier to quantify consistently. Rather than flawed methodology, this spread reflects Egypt’s complex infrastructure reality–precisely the kind of messy relationships our ANN model handles effectively.
The Cronbach’s alpha of the variables and the item-total statistics are shown in the following Table 4.
Subsequent analyses were performed to further assess the factorability and dimensionality of the 20-item scale. The measure of sampling adequacy, as shown in Table 5 by Kaiser-Meyer-Olkin (KMO), was 0.940 and Bartlett’s Test of Sphericity was statistically significant, \(\chi ^2(190) = 2612.43, \; p <.001\), as shown in Table 5. Both statistics provide strong evidence of the correlation matrix’s factorability.
An exploratory factor analysis (EFA) was conducted using principal component analysis with promax rotation. The results revealed that the four-factor solution provided the cleanest representation. The four factors—Technical-Execution, Resource Management, Financial-Contractual, and External-Environmental—are consistent with the notion of risk domains that are theoretically distinct.
The whole scale demonstrated very good internal consistency (\(\alpha =.954\)), and the newly formed subscales also showed good reliability (\(\alpha\) ranging from 0.82 to 0.91).
Supplemental Tables S1 and S2 provide the descriptions of each factor, along with its ID and subscale designation.
Artificial neural network model
Artificial neural networks (ANNs) are models built on the computer to solve difficult problems by loosely basing them on the way that animal brains are designed to process information. In perception-style networks, the artificial neurons (or nodes) are the units of the processors, which are organized in separate layers and connected by the weighted connections like the synapses. Supervised learning allows these neurons to pick up and distribute signals, thus creating a model to make predictions that will categorize the data that have been stored. An average ANN has an input layer, at least one hidden layer, and an output layer. Every neuron in one layer is connected to each neuron in the next layer, while neurons within the same layer are not connected, as shown in Fig. 4 below.
The ANN structure consisted of three hidden layers with 128–64–32 ReLU activation neurons and was trained using the Adam optimizer. Input data were normalized prior to training. The model was calibrated over 5000 simulated scenarios to capture a wide range of risk conditions and subsequently tested on 8 real project cases. The performance of the ANN model was evaluated using \(R^{2}\), MAE, RMSE, and MAPE to assess overall model validation.
Model calibration and sensitivity analysis
The ANN model was calibrated via a systematic process to identify optimal risk influence coefficients on duration and cost adjustments. The calibration process included: Latin Hypercube Sampling (LHS): 5000 risk scenarios were generated representing the entire spectrum of risk (0–100%) to account for realistic uncertainty conditions for canal linings. Grid Search Over Coefficients: A comprehensive grid search was performed over 150 coefficient combinations:
-
Duration coefficient: 15 values ranging from 0.10 to 0.40
-
Cost coefficient: 10 values ranging from 0.05 to 0.30
Each coefficient combination was assessed based on mean absolute error (MAE) and mean absolute percentage error (MAPE). Optimal coefficient selection: The combination (0.30 for duration, 0.20 for cost) produced the lowest MAE values (0.87 months for duration, EGP 102,500 for cost), thus this combination was selected as the optimal combination asshown in Fig. 5Response surface analysis: The response surface of MAE as a function of duration and cost coefficients is shown in Fig. 5. The surface clearly indicates that the global minimum is located at (0.30, 0.20), confirming optimality across the full parameter space.
Robustness validation: Leave-One-Project-Out Cross-Validation (LOPO-CV) was implemented to validate the stability of the coefficients across subsets of projects. The optimal coefficients were consistent across all folds and the change in MAE was less than 5%, demonstrating strong robustness.
Sensitivity profiling: The sensitivity to coefficient changes for MAE is observed in Fig. 6. A 10% change in duration coefficients resulted in an MAE increase of 18%, while cost coefficients had an MAE increase of 7%. This evidence demonstrates that project duration is more sensitive to risk factors than cost in canal lining projects.
This calibration framework ensures that the model is both accurate in terms of optimality and maintains robust uncertainty for various risk conditions. Figure 5 demonstrates the relationship between the coefficient values and error reduction.
Validation of ANN model for the affecting factors
The contingency model from the data of eight Egyptian irrigation canal lining projects has been proven valid. The comparison of the model’s forecasts with the actual expenditures revealed that, besides the expected budget, there were unplanned additional costs that resulted in the real project contingencies –2.48% to 23.20%. The average contingency obtained from the actual data (9.68%) very much resembles the model’s predicted average contingency (9.43%), thus supporting the model’s reliability. This confirmation indicates to project managers and decision-makers that they can be more practical and efficient in preparing for and managing the risk of overrun cost.
Model deployment (Python 3.13 interface)
As a practical implementation and to facilitate reproducibility, the trained ANN was produced as an independent Python 3.13/Tkinter desktop application (Fig. 7). The upper portion contains the inputs (project parameters and the level of risk factors). The lower portion displays the outputs (predicted project duration in months and total cost in EGP, with uncertainty bands).
The application incorporates input validity checks (for types and ranges) and applies the same preprocessing pipeline (standardization and Min–Max scaling fitted on training data) to ensure conformity with the model training. Embedded error handling provides informative messages to the user in the case of invalid input, prediction errors, or model training failures.
The application is available in a basic version (v1.0.0) and includes a list of environment dependencies (numpy, matplotlib, pandas, scikit-learn v1.4). The full source code and a requirements.txt file are provided for transparency and reproducibility, along with a synthetic demo dataset (in the Supplementary Material) for independent checking and reproduction of the figures reported here.
The application integrates three main components:
The application integrates three main components:
-
Data preprocessing: To enable equal contributions of variables during training, input features were standardized using zero-mean normalization. Mathematically, the zero-mean normalization is defined in Eq. (4):
$$\begin{aligned} x' = \frac{x - \mu }{\sigma }, \end{aligned}$$(4)where x = original feature value, \(\mu\) = mean of the feature in the training set, \(\sigma\) = standard deviation of the feature in the training set, \(x'\) = standardized (normalized) value of the input feature. Output targets (project duration and cost) were scaled into the range [0,1] through Min–Max normalization as shown in Eq. (4):
$$\begin{aligned} y' = \frac{y - y_{\min }}{y_{\max } - y_{\min }}, \end{aligned}$$(5)where y = the original output value, \(y_{\min }\) = the minimum observed target value in the training set, \(y_{\max }\) = the maximum observed target value in the training set, \(y'\) = the scaled value of the output target. All of the normalization parameters for input and output values were fitted to only the training set and only then applied to the validation and test sets to ensure no data leakage. Lastly, as all variables related to project and risk factors were continuous, there was no need to encode any categorical variables. Normalizing the data follows standard conventions in machine learning46.
-
ANN architecture and training: The predictive engine used was a Multi-Layer Perceptron (MLP) with three hidden layers (128-64-32 neurons) and a ReLU activation function. The model was trained with the Adam optimizer (\(\beta _1 = 0.9\), \(\beta _2 = 0.999\)) using a learning rate of 0.001, a batch size of 32, and early stopping (patience of 50) with a maximum of 1000 epochs. Weights were initialized using the Glorot uniform method, resulting in approximately 25,000 learnable parameters. Training was conducted on a standard workstation (Intel i7 CPU, 16 GB RAM). In ablation studies, the three-layer architecture achieved the least validation error while remaining computationally efficient.
-
Performance comparison: To validate the predictive strength of our proposed ANN, we benchmarked performance against the OLS, Ridge, Lasso, Random Forest, and XGBoost baseline models. As shown in Fig. 8, the ANN provided approximately 22% improvement in predictive accuracy (\(R^2\)) compared to OLS, 15–17% improvement on Ridge and Lasso, 12% improvement over Random Forest, and 4% improvement over XGBoost. In terms of relative error, the ANN achieved 1.0–1.3% relative error, compared to XGBoost (2.0%), Random Forest (2.5–3.0%), Ridge/Lasso (3.5%), and OLS (more than 4.5%). The results show that the ANN not only achieved the highest accuracy, but it also produced validation errors that were a full 3.5 percentage points lower. The loss curves show optimal convergence, with training loss decreasing from 0.0078 to 0.0024 and validation loss stabilizing at 0.1596. This indicates effective learning without overfitting, confirming the model’s strong generalization capability as shown in Fig. 9.
-
Risk adjustment mechanism the composite risk formula now consistently reflects a total of 20 risk factors, as shown in Ref. 10 is calculated as shown in Eq. (6). The corresponding weights (\(w_k\)) associated with these risk factors can be found in Table S1, located in the supplemental materials for this study.
$$\begin{aligned} \text {Risk factor} = \frac{\sum _{k=1}^{20} (w_k \times r_k)}{\sum _{k=1}^{20} w_k}, \end{aligned}$$(6)where
-
\(w_k\) = Weight of risk factor (from RII analysis)
-
\(r_k\) = Risk value (0.0–1.0) for factor k
-
20 risk factors with pre-defined weights (sum of weights = 1.0)
The base values (without risk) are then derived from the predicted adjusted values as shown in Eqs. (7) and (8):
$$\begin{aligned} & \text {Adjusted Duration} = \text {Base Duration} \times (1 + 0.3 \times \text {Risk Factor}), \end{aligned}$$(7)$$\begin{aligned} & \text {Adjusted Cost} = \text {Base Cost} \times (1 + 0.2 \times \text {Risk Factor}), \end{aligned}$$(8)The coefficients of duration and cost are both recorded at 0.30 and 0.20, respectively, and were systematically calibrated using the ANN model. They were subsequently determined to be optimal with respect to the model error (MAE and MAPE) across the 5000 generated risk scenarios. This approach ensures that both base estimates and risk-adjusted estimates are generated, offering planners the ability to account for different uncertainty levels during early project stages.
-
Prediction workflow
Prediction workflow is shown in Fig. 10 below.
Results and discussion
Model performance
The predicted project time and cost based on project risks of irrigation canal lining screen are shown in Fig. 11.
The ANN model exhibits excellent skills for estimation of time and costs of Irrigation Canal Lining Projects (ICLPs),the simulation experiments executed by using 5000 generated samples. For the actual case study dataset of eight irrigation canal lining projects, the integrated Python model achieved \(\hbox {R}^{2}\) = 0.92 (training) and 0.82 (testing) as shown in Fig. 12. This desktop tool, which is written in Python, allows for live forecasting, and thus planners are able to decide what to do at the most important stage of the projects on the canal and make the best choice for the situation. The model relied on data from eight irrigation canal lining projects to be confirmed. The forecasts were mostly close to the truth figures with a few exceptions at the highest risk conditions when the deviations were small. This indicates the model’s capability to integrate risk-driven variables efficiently, thus providing more credible scenario analyses relative to the deterministic approach.
Tornado analysis of key risk factors
To gain a better intuitive sense of effects on model behavior with respect to individual risk factors, a Tornado assessment was also performed through a One-at-a-Time (OAT) perturbation procedure that altered each factor while holding the others fixed at baseline values. While global approaches capture more of the complexity of interrelationships across factors, the horizontal bars presented in Fig. 13 are OAT \(\Delta\) impacts, and are not global Sobol or SHAP impacts. To make the output easier to interpret, descriptive names for each factor code were also presented in the output (e.g. TP5: Water Rotation Issues, CC7: Concrete Pouring Delays) along with indicator unit labels: duration impacts on project duration are in days, and cost impacts are presented in Egyptian Pounds (EGP). Overall, results validate that, as MLP importance rankings would dictate, all technical-execution factors, such as Water Rotation Issues (TP5) and Concrete Pouring Delays (CC7), have the greatest impact on both cost and duration.
Parameter space exploration and optimal selection
First, thorough research was carried out in order to select the most suitable risk impact coefficients. The search was limited to 150 possible combinations of 15 values for the duration coefficient (from 0.10 up to 0.40) and 10 values for the cost coefficient (from 0.05 to 0.30). Using the Latin Hypercube Sampling technique, 5000 validation scenarios for each combination were generated to represent risk levels ranging from 0% to 100%. The outcomes revealed that the coefficients which delivered the best performance were 0.30 for the duration and 0.20 for the cost. The combination resulted in the lowest overall Mean Absolute Error (MAE) of 0.87 months for time and EGP 102,500 for money. In addition, it gave the smallest error variance (\(\sigma ^2 = 0.04\) as compared to an average of 0.21), which means both precision and stability under various risk conditions. These findings strongly illustrate the dominating effect of time-centered risks on project performance and stress the crucial role of focusing on the schedule in the contingency plan.
Validation
Table 6 presents the validation results across the two tested configurations below.
The artificial neural network (ANN) model was validated with real data from eight Egyptian projects on irrigation canal lining. Because of the small sample (N = 8), a Leave-One-Project-Out Cross-Validation (LOPO-CV) scheme was implemented to maximize the use of available data and, in addition, to estimate the performance reliably, where each project was the test set once.
Validation results from the two leading risk impact coefficients (0.30 duration, 0.20 cost), which were used to compare the next best alternative. The top solution, which resulted from LOPO-CV, had a mean absolute error (MAE) of \(0.87 \pm 0.11\) months for the project duration and EGP \(102{,}500 \pm 8200\) for cost. The second setting (0.28 duration, 0.22 cost), which produced slightly higher errors (\(0.93 \pm 0.15\) months and EGP \(110{,}300 \pm 12{,}500\), cost), supported the steadiness and reliability of the coefficients we selected.
Limitations and future work
The proposed artificial neural networks framework exhibits great accuracy for irrigation canal and road projects in Egyptian contexts. Nonetheless, two limitations still need to be addressed. Firstly, the model generalizability to places that have very different construction regulations from those used in the study needs to be confirmed. Secondly, although the model shows a good result for irrigation canals and roads, its capability of being used for the infrastructure with dynamic complexity such as dams and tunnels is still problematic.The third stage of empirical validation used only 8 irrigation canal lining projects as case studies to validate the model. While this was achieved and showed internal validity (\(R^{2} = 0.92\) training, 0.82 testing), the small number of validation projects limits the research’s external validity. Future studies should address this limitation by evaluating a greater number of irrigation canal lining projects to provide a level of external validity and to expand applicability of the model to a more general context.
In addition, the model was designed specifically on the 20 most salient risk factors, systematically reduced from an original 93 factors through a validated AHP–RII survey instrument10. Therefore, the model is based on an empirically validated and representative set of predictors, rather than a subjectively or arbitrarily reduced set.
Conclusions
This study demonstrates a creation of a model for the prediction of the contingency of projects of the lining of irrigation canals (ICLPs) in the case of Egypt. Rather than concentrating on the factor selection phase, this research carries on a previously set of 20 variables that most influence the cost, time, and the risk, which were identified through the earlier AHP-based studies, and uses them as direct inputs to the supervised machine learning model. An Artificial Neural Network (ANN) was trained with records of past ICLPs. The model was able to predict project contingency with high precision. It had an average predicted contingency of 9.43%, which is very close to the actual average of 9.68% from the eight test projects. The ANN obtained a correlation coefficient of 97.42% which means that the predicted values are very similar to the observed ones. To facilitate practical implementation, the authors have incorporated the trained model into a desktop application written in Python 3.13 using the Tkinter interface. This tool provides project estimators with the ability to enter the key parameters and get reliable contingency predictions without any delay, thus enabling them to make a more informed decision and have a more detailed budget for the future. These results confirm that the infrastructure project forecasting models based on ANN are feasible and also, they open new prospects of incorporating real-time data analytics and advanced AI for even better predictive performance. From a practical point of view, the developed Python-based application gives project managers an easily accessible decision-support tool to make accurate, risk-adjusted estimates at the early stages. This ability can greatly lower the risk of delays and cost overruns, thus raising the efficiency of public expenditure in critical water infrastructure projects. At the level of society, the model is a beneficent factor of the better resource allocation and sustainable water management, without which it would be impossible to enhance agricultural productivity and economic stability. The authors suggest that the model can be extended to include real-time IoT-based monitoring data as a future research direction. Further, it can be also verified that the model can be used to other infrastructure domains such as transportation and energy projects.
Data availability
The datasets generated during the current study are available from the corresponding author (B.T.) upon reasonable request.
References
El-Touny, A. S., Ibrahim, A. H. & Amer, M. I. Estimating cost contingency for highway construction projects using analytic hierarchy process. Int. J. Comput. Sci. Issues 11, 73–84 (2014).
Ramani, P. V., Selvaraj, P., Shanmugapriya, T. & Gupta, A. Application of linear scheduling in water canal construction with a comparison of critical path method. J. Constr. Dev. Ctries. 27, 189–212 (2022).
Dykman, A., Bakshi, N. & Donn, M. Interpreting traditional cost contingency methods in the construction industry. In Proc. of the International Conference of Architectural Science Association, 165–174 (2019).
Ammar, T., Abdel-Monem, M. & El-Dash, K. Appropriate budget contingency determination for construction projects: State-of-the-art. Alex. Eng. J. 78, 88–103 (2023).
Al-Somaydaii, J. et al. Using support vector machine (svm) technology to predict the duration of irrigation canal projects in Western Iraq. Int. J. Nonlinear Anal. Appl. 14, 73–81 (2023).
Shash, A. A., Al-Salti, M., Al-Shibani, A. & Hadidi, L. Predicting cost contingency using analytical hierarchy process and multi attribute utility theory. J. Eng. Proj. Prod. Manag. 11, 228–242 (2021).
Arifuzzaman, M., Gazder, U., Islam, M. S. & Skitmore, M. Budget and cost contingency cart models for power plant projects. J. Civ. Eng. Manag. 28, 680–695 (2022).
El-Mousalami, H. H., El-yamany, A. H. & Ibrahim, A. H. Predicting conceptual cost for field canal improvement projects. J. Constr. Eng. Manag. 144, 1–8 (2018).
Abu-zeid, T. S. Compatibility between canals lining methods and sites conditions case study: Al-khofoog canal, el-minia. (dept.c). Mansoura Eng. J. 46, 1–9 (2021).
Ibrahim, A. H. An integrated model for estimation of Egyptian irrigation canal lining projects based on risk. Int. J. Multiphys. 18, 744–756 (2024).
Ragel, L. J. B., Subia, G. S., Mina, J. C. & Campos Jr, R. B. Limitations of PERT/CPM in construction management planning: Inputs to mathematics in architecture education. Turkish J. Comput. Math. Edu. (TURCOMAT). 12(10), 5218–5223 (2021).
Zhasmukhambetova, A. et al. Integrating risk assessment and scheduling in highway construction: A systematic review of techniques, challenges, and hybrid methodologies. Future Transp. 5, 85 (2025).
Hosny, H. E., Ibrahim, A. H. & Fraig, R. F. Cost analysis of continuous flight auger piles construction in Egypt. Alex. Eng. J. 55, 2709–2720 (2016).
Khasiya, R. & Patel, J. Computation of seepage losses and economic analysis of canal lining- a case study of Hajira branch canal. Res. Square https://doi.org/10.21203/rs.3.rs-3090309/v1 (2023).
Zelenakova, M. et al. The effect of lining hydraulic properties on the efficiency and cost of irrigation canal reconstruction. Res. Square https://doi.org/10.21203/rs.3.rs-3734693/v1 (2023).
Rahman, S. H. & Sarma, B. Optimization of sukla irrigation canal. In Advances in Sustainability Science and Technology (eds Rahman, S. H. & Sarma, B.) 118–130 (Springer, 2022).
Hosny, H., Ibrahim, A. & Fraig, R. Deterministic assessment of continuous flight auger construction durations using regression analysis. Int. J. Eng. Res. Appl. 5, 86–91 (2015).
Hameed, A. A. & Al-Zwainy, F. M. Statistical evaluation of the planning and scheduling management process for irrigation and drainage projects. J. Algebr. Stat. 13, 259–282 (2022).
Hosny, H. E., Ibrahim, A. H. & Fraig, R. F. Risk management framework for continuous flight auger piles construction in Egypt. Alex. Eng. J. 57, 2667–2677 (2018).
Nakao, T. N. Proposed loan republican state enterprise Kazvodkhoz irrigation rehabilitation project (guaranteed by the republic of Kazakhstan) (2019).
Islam, M. S. Modelling cost overrun risks in power plant projects (2019).
Nguyen, M. T., Mai, S. H. & Vu, Q. H. Risk assessments in construction of water supply projects in Hanoi, Vietnam. Am. J. Ind. Bus. Manag. 11, 232–250 (2021).
Adaurhere, R. E. Contingency planning and management for construction projects in South Africa (2019).
Jaya, N. M. & Sudarsana, D. K. Development of contingency value analysis model using cost-time risk integration and wbs-pareto hybrid in construction projects. Int. J. Eng. Appl. https://doi.org/10.15866/irea.v11i2.23235 (2023).
Curto, D., Poza Garcia, D. J., Villafanez Cardenoso, F. A. & Acebes Senovilla, F. Cost contingency estimation: A new method for quantitative risk analysis applied to a real construction project. DYNA Ingenieria Ind. 99, 106–112 (2024).
Park, D. & Yun, S. Construction cost prediction using deep learning with bim properties in the schematic design phase. Appl. Sci. 13, 7207 (2023).
Abd, A. M., Kareem, Y. A. & Zehawi, R. N. Prediction and estimation of highway construction cost using machine learning. Eng. Technol. Appl. Sci. Res. 14, 17222–17231 (2024).
Aslam, B., Maqsoom, A., Inam, H., Basharat, M. U. & Ullah, F. Forecasting construction cost index through artificial intelligence. Societies 13, 219 (2023).
Al Manseer, R., Al-Smadi, S. & Al-Bdour, H. Machine learning-aided time and cost overrun prediction in construction projects. Asian J. Civil Eng. 24, 2583–2593 (2023).
Kaushik, A. K., Islam, R., Elbahy, S. & Arif, M. Artificial neural network application in construction and the built environment: A bibliometric analysis. Buildings 14, 2423 (2024).
Shehadeh, A., Alshboul, O. & Saleh, E. Enhancing safety and reliability in multistory construction: A multi-state system assessment of shoring/reshoring operations using interval-valued belief functions. Reliab. Eng. Syst. Saf. 252, 110458 (2024).
Shehadeh, A., Alshboul, O. & Arar, M. Enhancing urban sustainability and resilience: Employing digital twin technologies for integrated WEFE nexus management to achieve SDGs. Sustainability 16, 7398 (2024).
Shehadeh, A. & Alshboul, O. Enhancing occupational safety in construction: Predictive analytics using advanced ensemble machine learning algorithms. Eng. Appl. Artif. Intell. 159, 111761 (2025).
Stevanović, S., Dashti, H., Milošević, M., Al-Yakoob, S. & Stevanović, D. Comparison of ann and xgboost surrogate models trained on small numbers of building energy simulations. PLoS ONE 19, e0312573 (2024).
Wang, J., Shahani, N. M., Zheng, X., Hongwei, J. & Wei, X. Machine learning-based analyzing earthquake-induced slope displacement. PLoS ONE 20, e0314977 (2025).
Oukhouya, H. & El Himdi, K. Comparing machine learning methods–svr, xgboost, lstm, and mlp–for forecasting the moroccan stock market. Comput. Sci. Math. Forum 7, 39 (2023).
Qi, Z., Feng, Y., Wang, S. & Li, C. Enhancing hydropower generation predictions: A comprehensive study of xgboost and support vector regression models with advanced optimization techniques. Ain Shams Eng. J. 16, 103206 (2025).
Pourgholam-Amiji, M., Ahmadaali, K. & Liaghat, A. A novel early stage drip irrigation system cost estimation model based on management and environmental variables. Sci. Rep. 15, 4089 (2025).
Aghayee, Z., Ghodousi, H. & Shahverdi, K. Uncertainty analysis of irrigation canals operation. Iran. J. Sci. Technol. Trans. Civil Eng. 48, 4769–4779 (2024).
Selim, T., Elshaarawy, M. K., Elkiki, M. & Eltarabily, M. G. Estimating seepage losses from lined irrigation canals using nonlinear regression and artificial neural network models. Appl. Water Sci. 14, 90 (2024).
Jiang, Q. Estimation of construction project building cost by back-propagation neural network. J. Eng. Design Tech. 18(3), 601–609 (2020).
Rezaee Arjroody, A., Hosseini, S. A., Akhbari, M., Safa, E. & Asadpour, J. Accurate estimation of cost and time utilizing risk analysis and simulation (case study: road construction projects in Iran). Int. J. Const. Manag. 24(1), 19–30 (2024).
Al-Rawe, Y. H. A. & Naimi, S. Project construction risk estimation in Iraq based on Delphi, RII, spearman's rank correlation coefficient (DRS) using machine learning. (2023).
Salman, A. Analyzing Contingency Estimation for Residential Turnkey Projects in Saudi Arabia: A Neural Network Approach. Buildings 14(6), 1844 (2024).
Sing, M. C., Ma, Q. & Gu, Q. Improving Cost Contingency Estimation in Infrastructure Projects with Artificial Neural Networks and a Complexity Index. Appl. Sci. 15(7), 3519 (2025).
Han, J., Pei, J. & Tong, H. Data Mining: Concepts and Techniques (Morgan Kaufmann, 2022).
Acknowledgements
The authors extend their appreciation to the Deanship of Research and Graduate Studies at King Khalid University, KSA, for funding this work through General Research Project under grant number GRP/57/46.
Funding
Open access funding provided by The Science, Technology & Innovation Funding Authority (STDF) in cooperation with The Egyptian Knowledge Bank (EKB).
Author information
Authors and Affiliations
Contributions
Conceptualization, A.H.I. and B.T.; methodology, A.H.I., B.T., and A.A.S.; validation, A.A.S. and B.T.; formal analysis, B.T. and A.A.S; investigation, A.H.I. and B.T.; data curation, A.H.I., B.T., and A.A.S.; writing-original draft preparation, B.T. and A.A.S.; writing-review and editing, A.H.I. and B.T.; visualization, A.H.I., B.T., and A.A.S.; supervision, A.H.I; project administration, B.T. and A.A.S. All authors have read and agreed to the published version of the manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Taha, B., Ibrahim, A.H. & Soliman, A.A. Risk-indexed artificial neural network for predicting duration and cost of irrigation canal-lining projects using survey-based calibration and python validation. Sci Rep 15, 40316 (2025). https://doi.org/10.1038/s41598-025-24125-1
Received:
Accepted:
Published:
Version of record:
DOI: https://doi.org/10.1038/s41598-025-24125-1















