Introduction

Bangladesh, situated in the Ganges-Brahmaputra Delta, experiences perpetual flooding due to its low elevation (much of the nation lies less than five meters above sea level) and its position at the confluence of three enormous river systems1. In a normal year, about 21% of the country, roughly 31,000 square kilometers, is submerged, and extreme events far exceed this baseline, as occurred in 1998 when over two-thirds of the nation was covered by water2. This vulnerability is driven by a combination of upstream water inputs, responsible for 80% of Bangladesh’s overall streamflow, from the Ganges, Brahmaputra-Jamuna, and Meghna rivers, coupled with flash floods from surrounding hills and intense local precipitation, further compounded by poor drainage, generating a near-annual flood cycle3. Sylhet, the region of Bangladesh that receives the most water, has an annual rainfall of approximately 4,000 mm and is hence highly susceptible to flash and local floods4. The district’s topography, surrounded by hills, channels runoff rapidly, overburdening its drainage system. Records show that Sylhet’s western region has been frequently flooded, with the most significant floods occurring in 1781, 1853, 1902, 1966, 1968, 1988, and in recent years, namely 1998, 2000, 2004, 2007, 2010, 2012, 2015, 2016, 2017, 2019, and 20224. Heavy rainfall in Assam and Meghalaya also worsens flooding in Sylhet5. Similarly, along the southeast coast, Chittagong, one of the key urban centers of the country, suffers from severe urban flooding primarily due to excess rainfall. Waterlogging during the monsoon is a perpetual problem, covering primary roads, alleys, and business hubs in both the old and new parts of the city, interrupting day-to-day life and freezing economic activity in the business capital of the nation6. Given the severe socio-economic impacts of these floods, accurate prediction of rainfall in Sylhet (Bangladesh), Chittagong (Bangladesh), Assam (India), and Meghalaya (India) is a necessity for the protection of human lives, food security, and sustainable development in Bangladesh.

For rainfall forecasting, time series models such as ARIMA and SARIMA have been used extensively by researchers, but they tend to struggle to account for rainfall variability arising from the noise in historical data7,8,9,10,11. Different time series models have been developed to forecast rainfall in Sylhet, Chittagong, Assam, and Meghalaya, the regions of interest of this research, each specific to the respective climatic patterns7,12,13,14. However, high forecasting error rates have been reported in these investigations. Machine learning models, such as Support Vector Machines (SVM)15,16, Artificial Neural Networks (ANN)17,18, and Long Short-Term Memory (LSTM)19, have been demonstrated to outperform conventional time series models, not only in the regions covered by this research but worldwide. Research20,21,22,23,24,25,26,27,28,29,30 employed ANN to predict rainfall, and research31,32,33,34,35,36,37,38,39,40 employed SVM. Research41 employed LSTM, Stacked-LSTM, and Bidirectional-LSTM networks, XGBoost, and an ensemble of a Gradient Boosting Regressor, Linear Support Vector Regression, and an Extra-Trees Regressor for the same purpose. Research40,42 employed Decision Tree, Naïve Bayes, K-Nearest Neighbors, and Support Vector Machines, as well as Bayesian Linear Regression (BLR), Boosted Decision Tree Regression (BDTR), Decision Forest Regression (DFR), and Neural Network Regression (NNR). Research43 applied Linear Regression, Multi-Layer Perceptron (MLP), Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Bidirectional Long Short-Term Memory (BLSTM) for rainfall modelling. A number of hybrid models have been proposed in recent years to enhance the accuracy of rainfall forecasting. Research44 combined Convolutional Neural Networks (CNN) with Long Short-Term Memory (LSTM) networks in a successful effort to detect spatial and temporal correlations in rainfall data. Research45 employed a Neural Network reinforced by Gaussian Random Fuzzy Numbers, and tree models such as Random Forest and Dual Perturb and Combine Tree, to improve prediction accuracy. Research46 employed lazy learner approaches such as K-star and instance-based K-nearest neighbors hybridized with a Rotation Forest (ROF) model. These methods were chosen for their ability to extract meaningful patterns from past rainfall data and thereby improve the precision and reliability of rainfall predictions.

The intricate relationship between meteorological factors and the inherent noise in short-term rainfall records remains an obstacle to further refining the precision of rainfall forecasting. While machine learning (ML) models have been shown to outperform their time series counterparts, the high forecasting errors reported in the literature accentuate the need for more sophisticated methods. To address these challenges, researchers have explored various hybrid techniques for rainfall forecasting. In particular, the use of advanced signal processing techniques, e.g., Empirical Mode Decomposition (EMD), Ensemble Empirical Mode Decomposition (EEMD), Discrete Wavelet Transform (DWT), and Singular Spectrum Analysis (SSA), has proven extremely beneficial for feature extraction in ML models47,48,49,50,51,52. Such techniques help break down intricate, non-stationary rainfall data into interpretable parts, improving model performance. Although hybrid models are successful, further improvement is needed to mitigate the effects of noise and data variability. Addressing these limitations is important not just for the areas covered by the current research but also for worldwide application. The complex interplay of meteorological variables and the inherent noise in short-term rainfall data therefore demand more robust and nuanced models to enhance the precision and reliability of rainfall prediction significantly.

Artificial Neural Networks (ANNs) have been found to be effective tools for non-linear modeling of hydrological behavior53,54. The ANN is one of the most well-known algorithms for advanced tasks that can handle high-level data patterns55. Scientists have used ANNs in rainfall forecasting to overcome these challenges effectively, increasing performance and precision. Their layered architecture, activation functions, and adjustable weights enable them to learn intricate patterns in challenging datasets40. However, the performance of ANNs depends critically on the optimal tuning of their weights56. An effective weight initialization technique would ideally initialize the weights so as to maximize the speed of training as well as the performance of the neural network57. Historically, optimizers such as Gradient Descent and its variants have been employed for this purpose58. While appropriate for many applications, these gradient-based methods tend to get stuck in local minima and struggle in the high-dimensional, non-convex landscapes of ANN weight spaces59. Metaheuristics such as Genetic Algorithms (GA)60,61, Particle Swarm Optimization (PSO)62, Ant Colony Optimization (ACO)63, Differential Evolution (DE)64, the Harmony Search Algorithm65, Simulated Annealing (SA)66, Ant Lion Optimization67, and the Bat Algorithm (BA)68 have been extensively employed in an attempt to circumvent these limitations. These methods enable global search but can be slow to converge and computationally expensive. Despite these advances, the efficient and effective optimization of ANN weights for challenging hydrological data, particularly in the context of short-term rain forecasting, remains a daunting task, often requiring innovative approaches to traverse the high-dimensional search space without overfitting. There is an increasing need for more powerful and efficient optimization systems that enhance performance and accuracy in solving complicated problems. Alternatively, existing optimization processes can be developed further through hybrid methods, which combine several techniques to reach a more accurate optimal solution.

This study addresses the limitations of existing methodologies, including the challenge of handling noise and complexity in short-term rainfall data, the reliance on single-step optimization for ANN weight adjustment, and the limited use of recent and robust nature-inspired metaheuristic optimization algorithms. To address these limitations, this study proposes a new dual-step optimization framework for ANN weight optimization that is crucial for guaranteeing high predictive accuracy in complex rainfall data structures. This study utilizes recent nature-inspired metaheuristic algorithms, Egret Swarm Optimization Algorithm (ESOA), Harris Hawks Optimization (HHO), and Hippopotamus Optimization (HO), that have not been extensively applied in rainfall modeling, alongside well-known optimizers such as PSO and GA. Egret Swarm Optimization Algorithm (ESOA), a metaheuristic based on the hunting behavior of egrets, was introduced in 202269. Harris Hawks Optimization (HHO), presented in 2019, is derived from the group behavior and surprise-pounce hunting tactic of Harris’ hawks in nature70. Hippopotamus Optimization (HO), introduced in 2024, draws inspiration from the inherent behavior observed among hippopotamuses71. These recently developed nature-inspired metaheuristic optimization methods have great potential for optimizing ANN weights under noisy and complicated rainfall data conditions. This research not only employs these newly developed algorithms but also introduces a novel dual-stage optimization framework for ANN weight optimization, demonstrating the strength and efficacy of such algorithms in improving model performance. The proposed method begins with HHO, which efficiently explores the search space for a well-conditioned and converged set of weights and prevents early convergence. Subsequently, in the second step, a supporting optimization algorithm (HHO, ESOA, HO, GA, or PSO) adjusts and refines these weights to achieve the best possible convergence. In addition, this paper extends weight optimization by incorporating probabilistic optimization in the form of Bayesian Optimization (BO) to fine-tune the hyperparameters of the model. This blended method enhances both weight learning and hyperparameter adjustment, offering a stronger and better-performing ANN-based rainfall forecasting technique.

Moreover, this study introduces a novel application of Non-negative Matrix Factorization (NMF) for short-term rainfall data preprocessing and feature extraction. While NMF has been used in various fields, its application to hydrological time series, particularly rainfall forecasting, is not widespread72,73. NMF has, however, been used effectively in rainfall-runoff modelling, with high accuracy and reliability in capturing the complex nonlinear relationship between rainfall inputs and runoff responses74. NMF’s inherent non-negativity constraint is well suited to rainfall data, which is intrinsically non-negative, thereby ensuring physical interpretability and model stability75,76. Furthermore, this research proposes a novel data-driven component selection method for NMF based on the analysis of reconstruction error plots and their derivatives. This approach departs from traditional heuristic-based methods and offers an objective and stable way of determining the optimal dimensionality reduction.

By coupling this novel dual-step metaheuristic optimization with NMF and its data-driven component selection method, this research provides an integrated and efficient framework for rainfall forecasting in flood-prone areas such as Sylhet, Chittagong, Assam, and Meghalaya. The framework not only addresses the issues of noise and non-linearity but also enhances the robustness and accuracy of ANN-based rainfall forecasting. The value of this work lies in providing more accurate and credible rainfall predictions, thereby enabling better flood risk management, disaster readiness, and sustainable development in Bangladesh.

Theoretical background

Non-negative matrix factorization

Non-negative Matrix Factorization (NMF) is a dimensionality reduction and matrix factorization technique that factorizes a non-negative matrix V into the product of two lower-rank non-negative matrices: V ≈ WH, where V is the original m×n non-negative matrix, W is an m×k non-negative matrix (the basis matrix), H is a k×n non-negative matrix (the coefficient matrix), and k is the reduced rank (chosen to be smaller than both m and n). Figure 1 shows the matrix formation. NMF factorizes V in such a manner that the columns of W are latent patterns or features of the data, and the columns of H contain the weights or contributions of these features in the approximation of V. Mathematically, NMF finds W and H by solving the optimization problem (Eq. 1):

$$\min_{W,H} \left\| V - WH \right\|_{F}^{2}$$
(1)

Subject to W, H ≥ 0, where F denotes the Frobenius norm. Due to the non-negativity of the matrix entries, the decomposition naturally yields an additive, parts-based description of the data. In contrast to approaches like Principal Component Analysis (PCA), which allow negative components, NMF focuses on constructive combinations and therefore lends itself to interpretability on real datasets. In rainfall data analysis, for instance, W may represent underlying factors and H indicates the degree to which each factor contributes.
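For illustration, the factorization of Eq. 1 can be computed with scikit-learn's NMF implementation; the matrix V below is a random stand-in for real rainfall data, and the chosen rank k is arbitrary:

```python
# NMF of a non-negative matrix V into W (m x k) and H (k x n), per Eq. 1.
import numpy as np
from sklearn.decomposition import NMF

V = np.abs(np.random.rand(100, 12))   # stand-in for an m x n rainfall matrix
k = 4                                 # reduced rank, k < min(m, n)
model = NMF(n_components=k, init="nndsvda", max_iter=500, random_state=0)
W = model.fit_transform(V)            # basis matrix (latent patterns)
H = model.components_                 # coefficient matrix (pattern weights)
frobenius_error = model.reconstruction_err_  # ||V - WH||_F
```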

Artificial neural network

Artificial Neural Networks (ANNs) are computational models inspired by the structure and function of biological neural networks6. An ANN consists of processing nodes called neurons, which are networked in layers: an input layer, hidden layers, and an output layer77. Figure 2 shows the basic working of a neural network. The fundamental mechanism of an ANN is to receive input data, process it through weighted connections, apply activation functions, and then generate an output77. Mathematically, the output of a neuron in a hidden layer is given by Eq. 2:

$$Z = f\left( \sum_{i=1}^{n} w_{i} x_{i} + b \right)$$
(2)

where \(x_i\) represents the input features, \(w_i\) are the associated weights, \(b\) is the bias term, and \(f(\cdot)\) is the activation function56. A single hidden layer was employed in the artificial neural network (ANN) model, and the number of hidden nodes was determined through iterative experimentation78. This study used the activation functions Leaky ReLU, ReLU, tanh, and sigmoid; their behavior is shown in Eqs. (3-6).
Fig. 1 Matrix formation of NMF.

Fig. 2 Basic working system of a neural network.

$$\text{Sigmoid: } f(x) = \frac{1}{1 + e^{-x}}$$
(3)
$$\text{Tanh: } f(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$
(4)
$$\text{ReLU: } f(x) = \max(0, x)$$
(5)
$$\text{Leaky ReLU: } f(x) = \max(\alpha x, x), \text{ where } \alpha \text{ is a small constant}$$
(6)

A transfer function, also referred to as an activation function, produces an output depending on the net input value, which is the weighted sum of the inputs and biases79. The final output of the ANN is computed as (Eq. 7):

$$Y = f_{{out}} \left( {\mathop \sum \limits_{{j = 1}}^{m} w_{j} Z_{j} + b_{{out}} } \right)$$
(7)

where \(Z_j\) are the activations from the hidden layer, \(w_j\) are the weights of the output layer, and \(b_{out}\) is the bias term.
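As a concrete illustration of Eqs. 2 and 7, a single-hidden-layer forward pass can be written in a few lines of NumPy; the ReLU choice and identity output here are illustrative assumptions for a regression setting, not the study's tuned configuration:

```python
# Forward pass of a single-hidden-layer ANN (Eqs. 2 and 7).
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def forward(x, W_hidden, b_hidden, w_out, b_out):
    Z = relu(W_hidden @ x + b_hidden)  # Eq. 2: hidden activations
    return w_out @ Z + b_out           # Eq. 7 with an identity output function
```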

Metaheuristic optimization

This study employs nature-inspired metaheuristic optimization algorithms for modeling rainfall using artificial neural networks (ANN). It contrasts the performance of newly introduced and popular algorithms, namely Harris Hawks Optimization (HHO), Egret Swarm Optimization Algorithm (ESOA), and Hippopotamus Optimization (HO), with conventional ones, namely Genetic Algorithm (GA) and Particle Swarm Optimization (PSO). There is abundant literature in favor of the effectiveness of GA and PSO for ANN-based prediction issues in a broad spectrum of areas, including but not limited to rainfall prediction80.

On the other hand, HHO, HO, and ESOA are recent metaheuristic developments, inspired by the complex behavioral patterns of animals and swarms. These algorithms are targeted at solving highly nonlinear optimization issues and have demonstrated promising results in various applications. Their use in rainfall modeling is largely untested, and hence, their application in the current research forms a new contribution. By including both classical and new optimizers, this research provides a comprehensive assessment of their impact on ANN performance in rainfall prediction. In the interest of maintaining the brevity of the main manuscript, detailed mathematical equations and algorithmic descriptions of HHO, HO, ESOA, GA, and PSO are provided in Appendices A, B, C, D, and E, respectively.

Fig. 3 Proposed framework.

Design and implementation of the proposed method

This study introduces two novel approaches that are aimed at enhancing the performance of predictive models for rainfall data. It initially applies Non-negative Matrix Factorization (NMF) as a data preprocessing technique to rainfall data, which is then provided as input to an Artificial Neural Network (ANN) model. NMF is used to factor out underlying features from rainfall data, thus obtaining dimensionality reduction and improved input feature quality for the ANN. This preprocessing step helps in the improved capture of hidden patterns in the data and thus improves the predictive accuracy of the ANN model.

In the second approach, three newly introduced nature-inspired metaheuristic algorithms, Harris Hawks Optimization (HHO), Hippopotamus Optimization (HO), and Egret Swarm Optimization Algorithm (ESOA), and two classical algorithms, Genetic Algorithm (GA) and Particle Swarm Optimization (PSO), are utilized for optimizing the weights of the Artificial Neural Network (ANN) model. A novel two-stage optimization framework is proposed, wherein the algorithms are combined in hybrid pairs: HHO-HHO, HHO-HO, HHO-ESOA, HHO-GA, and HHO-PSO. The efficiency of these hybrid two-step optimization methods is compared with that of conventional single-step optimization by the respective algorithms (HHO, HO, ESOA, GA, and PSO). This paradigm not only takes advantage of each optimization technique in isolation but also benefits synergistically from the combined approach, achieving superior results in fine-tuning the parameters of the model. The combination of NMF preprocessing and two-step optimization produces a robust platform for accurate rainfall forecasting, a significant advance in predictive modeling. The methodology is divided into a preprocessing phase and an optimization phase, which together represent the primary objective and novelty of this research. The overall methodology is presented in Fig. 3.

Preprocessing phase

In the first step, Non-Negative Matrix Factorization (NMF) is applied for preprocessing the data before Artificial Neural Network (ANN) training. NMF is a technique of matrix factorization that breaks down a given non-negative matrix X, rainfall data, into two non-negative matrices: W (Basis Matrix) and H (Coefficient Matrix). It decomposes iteratively following a multiplicative update rule. The selection of the number of NMF components is a critical choice because it directly influences model accuracy, training time, and the ability to generalize. An inappropriate decision may lead to underfitting if too few features are chosen, resulting in a loss of useful information, or overfitting if too many features are picked, incorporating irrelevant or noisy features. Since ANN training requires an optimized feature space, a wrong choice of features aggravates model complexity by necessitating additional layers for feature extraction, affecting weight initialization and convergence rate in ANN training. To determine the optimal number of components, a systematic approach is taken in which the number of components is changed and the reconstruction error is evaluated. The aim is to determine where more components do not yield much more improvement in reconstruction accuracy to avoid underfitting and overfitting.

To address this, the changes in reconstruction error are inspected using derivative-based and curvature-based methods. The first derivative (ΔError) measures the rate at which the error diminishes as the number of components increases, and the second derivative (Δ²Error) is the acceleration of this diminishing error. The third derivative (Δ³Error) provides information about the stability of the second derivative, and the curvature analysis (K) evaluates the bending of the error function to find the greatest improvement in reconstruction quality. By ascertaining the number of components where the second derivative is maximized, the third derivative peaks, or the curvature is highest, an optimal trade-off is achieved between feature representation and computational complexity. This ensures that the preserved components capture the salient patterns in the data without including redundancy or noise. The low-dimensional representation thus obtained is then used for ANN training, leading to improved model performance and increased generalization capacity.
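A minimal sketch of this selection procedure, assuming NumPy and an array `errors` holding the reconstruction error for component counts 1..K (an assumed input, not the study's code):

```python
# Derivative- and curvature-based selection of the NMF component count.
import numpy as np

errors = np.asarray(errors, dtype=float)  # reconstruction error for k = 1..K
d1 = np.diff(errors, n=1)                 # ΔError
d2 = np.diff(errors, n=2)                 # Δ²Error
d3 = np.diff(errors, n=3)                 # Δ³Error
curvature = np.abs(d2) / (1.0 + d1[:-1] ** 2) ** 1.5  # curvature K

# Max Δ²Error criterion: +2 maps the second-difference index back to k.
k_optimal = int(np.argmax(d2)) + 2
```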

Optimization phase

In this study, a two-stage hybrid optimization methodology is presented to enhance the accuracy and efficiency of Artificial Neural Network (ANN) training. The initial stage is used to optimize the ANN architecture, and the second stage applies state-of-the-art hybrid algorithms to optimize the network weights. By combining Bayesian Optimization (BO) to set architecture and a range of robust hybrid optimization techniques to tune weights, the study aims to improve the overall performance of the ANN so that it can converge more quickly with reduced error when trained.

Stage 1: Bayesian optimization for architecture configuration

The initial step of the optimization process employs Bayesian Optimization (BO) to design the ANN structure (Appendix F). In this scenario, BO is employed to determine two crucial settings of the network: the transfer function and the number of hidden neurons. The activation function of each layer in the ANN is chosen from ‘relu’, ‘sigmoid’, ‘tanh’, and ‘leakyrelu’, with BO identifying the best-fitted function. The optimal number of hidden neurons, which plays a significant role in determining the capacity and complexity of the model, is likewise optimized using BO by searching over candidate values to find the configuration that yields the best performance. Bayesian Optimization is particularly well suited to this task since it models a probabilistic distribution of performance across different configurations. This probabilistic character allows the optimization process to account for uncertainty and to aggressively search for the architecture minimizing the validation error (Mean Squared Error, or MSE). Utilizing BO ensures the network architecture is selected in a computationally efficient, data-driven manner, avoiding overfitting and underfitting.
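A minimal sketch of this stage, assuming scikit-optimize (skopt) for the Gaussian-process search; `build_and_score_ann` is a hypothetical helper that trains a candidate network and returns its validation MSE:

```python
# Stage 1: Bayesian Optimization over activation function and hidden neurons.
from skopt import gp_minimize
from skopt.space import Categorical, Integer

search_space = [
    Integer(1, 30, name="hidden_neurons"),  # range used in this study
    Categorical(["relu", "sigmoid", "tanh", "leakyrelu"], name="activation"),
]

def objective(params):
    hidden_neurons, activation = params
    return build_and_score_ann(hidden_neurons, activation)  # validation MSE

result = gp_minimize(objective, search_space, n_calls=30, random_state=42)
best_hidden_neurons, best_activation = result.x
```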

Stage 2: weight tuning using hybrid dual stage algorithms

Once the architecture has been optimized in the initial stage, the second phase focuses on fine-tuning the ANN weights using a combination of Harris Hawks Optimization (HHO) and other leading-edge optimization techniques (Appendix A). Hybrid algorithms are used to improve training efficiency and performance by finding optimal weights, yielding quicker convergence and smaller total errors during training. The overall methodology is illustrated in Fig. 3, and the pseudocode is provided in a later section.

Hybrid HHO-PSO (Harris Hawks optimization and particle swarm optimization)

In the HHO-PSO hybrid method, the process begins with HHO conducting a global search for optimal ANN weights in the entire solution space. The HHO algorithm is tasked with searching the weight space widely, identifying regions that are most likely to contain optimal solutions. After potential regions are identified, the Particle Swarm Optimization (PSO) algorithm is employed to locally optimize these solutions (Appendix E). PSO updates the particles’ positions with their individual best-known position and global best-known position discovered by the swarm. The combination of HHO’s global exploration and PSO’s local enhancement ensures strong convergence towards an optimal solution, improving the ANN’s training performance.

First-stage HHO: global exploration of optimal regions.

The HHO algorithm is used during the first stage to perform a global exploration of the whole weight space of the ANN. This stage aims to examine different regions and identify regions in which optimal weight settings could lie. HHO mimics the cooperative hunting strategy of Harris hawks with dynamic exploration mechanisms such as:

  • Levy flight-based search: The hawks explore the solution space extensively, evading premature convergence by taking huge random leaps.

  • Surrounding and besieging the prey: Solutions are modified based on fitness evaluations, and variable strategies are utilized to keep the search adaptable.

  • Escaping local minima: Using dynamic adjustment in exploration parameters, the initial-stage HHO ensures that the search is over a wide and diverse set of solutions.

After the completion of this phase, the HHO algorithm has shortlisted the most promising regions where optimal ANN weights will be obtained.
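For context, the switch between these exploration and exploitation behaviors in HHO is governed by the prey's escaping energy, E = 2E0(1 − t/T), as defined in the original HHO paper70. A minimal sketch, assuming NumPy, is given below; the surrounding update logic is simplified for illustration:

```python
# Escaping-energy schedule that drives HHO's exploration/exploitation switch
# (a simplified sketch of the mechanism described above).
import numpy as np

def escaping_energy(t, T):
    E0 = np.random.uniform(-1.0, 1.0)  # initial energy, redrawn each iteration
    return 2.0 * E0 * (1.0 - t / T)    # decays toward 0 as t approaches T

# |E| >= 1 -> exploration (wide, Levy-flight-style moves)
# |E| <  1 -> exploitation (soft/hard besiege around the current best)
```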

Second-stage PSO: local optimization.

Once high-potential areas of the search space have been identified by the HHO algorithm, the PSO algorithm performs local optimization and fine-tuning of the selected solutions, enhancing convergence by refining the weights within the recognized areas. The critical PSO mechanisms of this phase are listed below (a minimal update sketch follows the list):

  • Particle-Based Search: Each solution is depicted as a particle, which moves across the search space based on the best-known positions.

  • Individual and Global Best Updates: Each particle updates its position based on Personal Best Position and Global Best Position.

  • Velocity and Position Updates: The movement of particles is controlled using velocity updates, which produce a localized adaptation and gentle convergence.
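The core of this second stage is the standard PSO velocity and position update. A minimal sketch is given below, assuming NumPy; the inertia weight w and acceleration coefficients c1 and c2 are illustrative values, not those used in the study:

```python
# Minimal PSO update step for a weight vector x.
import numpy as np

def pso_step(x, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5):
    r1, r2 = np.random.rand(*x.shape), np.random.rand(*x.shape)
    v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)  # velocity update
    return x + v, v                                            # position update
```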

Hybrid HHO-HHO (Harris Hawks optimization and Harris Hawks optimization)

The HHO-HHO hybrid makes use of two instances of the HHO algorithm. In this setup, the first HHO algorithm globally searches the solution space to determine the areas where the potentially optimal ANN weights exist. The second HHO algorithm then fine-tunes these discovered solutions through a local search. This two-stage procedure provides enhanced exploration and exploitation phases in the optimization process. By applying HHO in two stages, the method increases robustness and flexibility to complex search spaces with enhanced performance and accuracy.

First-stage HHO: global exploration.

In the first phase, the HHO algorithm performs a global search across the whole solution space (Appendix A). The prime aim of this phase is to find the potentially good regions in which the optimal ANN weights would be likely located. The HHO algorithm imitates the cooperative hunting behavior of Harris hawks using different strategies such as:

  • Exploration phase: The hawks randomly search the search space based on patterns of Levy flights to avoid premature convergence.

  • Surrounding prey: The algorithm estimates diverse solutions at each stage, changing their positions based on the solution quality.

  • Diverse searching strategies: The search is dynamically diversified across different movement patterns (e.g., soft and hard besiege strategies) for enhancing adaptability.

After this stage, the initial HHO algorithm effectively narrows down the potential regions where better-performing ANN weight configurations are found.

Second-stage HHO: local exploitation.

Once the initial HHO has identified potential areas, the second HHO algorithm performs a local search within these more specific areas. The chosen weight values are further refined through finer exploitation methods during this stage, ensuring better convergence towards the optimal point. The key steps are:

  • Focused adjustment: Instead of investigating the entire solution space, this phase operates in the areas found in the first phase, producing faster and better optimization.

  • Exploitation phase: The second HHO instance refines solutions, employing adaptive dynamic parameters to control the movement of the hawks based on the fitness landscape.

  • Avoiding local minima: Through adaptive switching between different escaping and besieging strategies, the algorithm avoids stagnation and improves convergence reliability.

Hybrid HHO-HO (Harris Hawks optimization and hippopotamus optimization)

The HHO-HO hybrid method combines HHO with the Hippopotamus Optimization (HO) algorithm (Appendix B). In the hybridization, HHO conducts the global search for promising regions of the weight space, and HO is employed to locally refine these solutions. HO, inspired by hippopotamuses’ territorial and foraging behavior, enhances the algorithm’s ability to explore solution spaces more thoroughly. By combining the global search of HHO with the local exploitation of HO, the hybrid method improves the efficiency and effectiveness of the optimization process with faster convergence to the optimal solution.

First-stage HHO: global search for promising regions.

Optimization begins with HHO, which is tasked with searching the ANN weight space globally for promising regions. Inspired by the distributed hunting behavior of Harris hawks, HHO uses adaptive search strategies:

  • Exploration stage: During this stage, HHO employs Levy flight-based random movements, diversifying the search while avoiding premature convergence.

  • Surrounding and besieging prey: Several adaptive strategies, such as soft/hard siege mechanisms, provide robust search coverage.

  • Adaptive search mechanism: HHO dynamically oscillates between exploitation and exploration based on the fitness landscape.

After this phase, HHO has identified high-potential areas in the solution space where the optimal ANN weight configurations are likely to be discovered.

Second-stage HO: local refinement and exploitation.

Once HHO has identified potential regions, the HO algorithm is applied to optimize the solutions found. HO is inspired by hippopotamus foraging and territoriality, which gives rise to an effective local search strategy:

  • Local exploration based on social behavior: Hippos explore within their territory, enabling a focused improvement of solutions. This mechanism ensures gradual but precise modifications in the ANN weight space.

  • Self-adaptation strategy: Unlike purely random search algorithms, HO dynamically adjusts its movement step size based on optimization progress. This allows HO to enhance the exploitation of near-optimal solutions and improve the convergence rate.

  • Territorial dominance for optimal solution selection: Solutions are updated iteratively according to their relative fitness ranking within the population. This method facilitates the removal of suboptimal weights, leading to improved ANN performance.

Hybrid HHO-ESOA (Harris Hawks optimization and Egret swarm optimization algorithm)

The HHO-ESOA hybrid method combines the global search capability of HHO with the swarm intelligence of the Egret Swarm Optimization Algorithm (ESOA) (Appendix C). Here, HHO performs the global search of the solution space and finds promising regions for ANN weights. After the global search, ESOA optimizes these solutions using cooperative behavior inspired by egret birds, which cooperate to achieve their goals. This cooperative search improves the precision of the local search, leading to more accurate solutions. The combination of HHO’s global search and ESOA’s local tuning improves the convergence and optimization of the weights of the ANN.

First-stage HHO: global search for potential regions.

The optimization process begins with HHO, which performs a wide search of the ANN weight space. The HHO algorithm is inspired by the hunting style of Harris hawks, wherein cooperative hunting strategies are used to locate prey efficiently. Main Mechanisms of HHO during the Global Search Stage:

  • Exploration using levy flight movements: HHO explores the solution space randomly first to avoid premature convergence.

  • Adaptive switching between phases: HHO switches between different search strategies, such as soft and hard besiege mechanisms, dynamically based on fitness evaluations.

  • Identification of high-potential regions: HHO narrows down the search space after a few iterations, pinpointing the high-potential regions where the optimal ANN weights might be located.

HHO has effectively performed a global scan at the end of this phase, identifying the regions in the search space where refinement is needed.

Second-stage ESOA: cooperative local optimization.

Once HHO has identified the high-potential regions, the ESOA algorithm is tasked with optimizing and fine-tuning these solutions. ESOA is inspired by the cooperative foraging behavior of egret birds, which work together to locate food efficiently. Key Mechanisms of ESOA during the Local Optimization Phase:

  • Cooperative search behavior: ESOA employs group-level intelligence wherein egret-inspired agents plan and coordinate their movement to enhance the quality of the solution.

  • Leader-follower dynamics: The best solutions guide the others, preventing stagnation and enabling continual improvement.

  • Dynamic step size adjustment: ESOA adjusts its movement strategies using fitness feedback, enabling accurate fine-tuning of the ANN weights.

By integrating cooperation-based intelligence into the optimization process, ESOA provides greater precision in ANN weight optimization, leading to a stronger and more efficient model.

Hybrid HHO-GA (Harris Hawks optimization and genetic algorithm)

In the HHO-GA hybrid approach, HHO performs the global search to identify potential regions for the optimal ANN weights. Once these regions are identified, the Genetic Algorithm (GA) is used to further refine the solutions (Appendix D). GA evolves and searches the population of solutions using evolutionary principles such as crossover, mutation, and selection. The genetic operators introduce diversity into the search process to avoid premature convergence and achieve robust optimization. The interaction between HHO’s global search and GA’s local search accelerates convergence and optimizes the ANN’s weight configuration.

Initial-stage HHO: global search for potential ANN weights.

Optimization starts with HHO, which efficiently searches the search space to locate regions of high promise for the best ANN weights. Inspired by the cooperative hunting strategy of Harris hawks, HHO employs adaptive search mechanisms:

Main Mechanisms of HHO in Global Search:

  • Exploration using adaptive movements: Employs Levy flight-based randomness to prevent stagnation in local minima.

  • Dynamic search strategy: Alternates between soft and hard besiege phases, mimicking real hunting tactics.

  • Identification of high-fitness areas: Targets fruitful areas where the best solutions are likely to be.

By the end of this phase, HHO has discovered the most promising ANN weight regions, which require further fine-tuning for higher accuracy.

Second-stage GA: evolutionary refinement of solutions.

After HHO has narrowed the search space, the Genetic Algorithm (GA) is employed to optimize and fine-tune the solutions further. GA follows natural selection processes, improving solutions over generations. The primary mechanisms of GA in local optimization are listed below (a minimal sketch follows after the list):

  • Crossover (recombination): Combines genetic information from two parent solutions to generate improved offspring solutions.

  • Mutation: Introduces random changes in candidate solutions to maintain diversity and avoid premature convergence.

  • Selection (survival of the fittest): Chooses the best-performing solutions of the next generation, accelerating convergence towards an optimal ANN weight configuration.

GA’s preservation of diversity makes the optimization process far less likely to be trapped in poor local minima, increasing overall robustness.
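The following is a minimal sketch of these GA operators acting on ANN weight vectors, assuming NumPy; the arithmetic-crossover choice, mutation rate, and tournament size are illustrative assumptions rather than the study's exact settings:

```python
# Sketch of GA refinement operators on ANN weight vectors (illustrative).
import numpy as np

def crossover(parent_a, parent_b, alpha=0.5):
    # Arithmetic recombination of two parent weight vectors.
    return alpha * parent_a + (1.0 - alpha) * parent_b

def mutate(weights, rate=0.05, scale=0.1):
    # Gaussian perturbation applied to a random subset of weights.
    mask = np.random.rand(weights.size) < rate
    return weights + mask * np.random.normal(0.0, scale, weights.size)

def select(population, fitness, k=2):
    # Tournament selection: the fitter (lower-MSE) of k random candidates survives.
    idx = np.random.choice(len(population), k, replace=False)
    return population[min(idx, key=lambda i: fitness[i])]
```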

The hybrid optimization techniques function iteratively, alternating between global exploration and local exploitation. HHO (or another global search technique) performs an initial broad search of the weight space to discover regions with potentially optimal ANN weights. Once these regions are discovered, a secondary optimization technique such as HHO, HO, ESOA, GA, or PSO searches the solutions locally for refinement. This balance between exploration and exploitation ensures that the optimization process explores the solution space extensively yet converges rapidly to an optimal solution.

Optimal model hyperparameter ranges

Another crucial aspect of the optimization process is the selection of appropriate hyperparameter ranges, as they play a major role in determining the overall performance of the model. In the present study, several key parameters must be specified with care across the various optimizers: the number of hawks, the maximum number of iterations, the boundary constraints on the weights, the range of transfer functions, and the range of hidden neurons in the model. The selection of these hyperparameters directly influences the ability of the model to learn and generalize, so it is crucial to specify a well-defined range for each parameter. Specifying such ranges has its constraints, however, since too broad or too tight a range can lead to poor performance or convergence issues. In this study, the population size of hawks in the HHO algorithm is set to 50, and the maximum number of iterations is also set to 50. For the activation functions, four common transfer functions, ReLU, sigmoid, tanh, and leaky ReLU, are selected so that the performance of each can be tested during model training. Further, the number of neurons in the hidden layer is searched between 1 and 30 to identify the optimal network setting. By accurately tuning these hyperparameters, the model aims to achieve improved predictive accuracy and stability and to avoid potential problems in the optimization process.
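Collected as a configuration object, the stated ranges might look as follows (a sketch; the key names are illustrative, not from the study's code):

```python
# Hyperparameter ranges stated above, gathered into one configuration dict.
config = {
    "n_hawks": 50,                                    # HHO population size
    "max_iterations": 50,                             # optimizer iteration budget
    "activations": ["relu", "sigmoid", "tanh", "leakyrelu"],
    "hidden_neurons_range": (1, 30),                  # single hidden layer
}
```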

Pseudocode of the proposed method

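The proposed pipeline can be summarized in the following Python-style sketch; the helper names (select_components_by_second_derivative, nmf_features, bayesian_search, run_hho, run_secondary, build_ann) are hypothetical stand-ins for the steps described above, not functions from the study's code:

```python
# Python-style pseudocode of the proposed NMF + dual-step optimization method.
def dual_step_rainfall_forecast(V_rainfall):
    # 1) Preprocessing: NMF with data-driven component selection (Max Δ²Error rule)
    k = select_components_by_second_derivative(V_rainfall)
    W_features = nmf_features(V_rainfall, k)

    # 2) Stage 1: Bayesian Optimization of the architecture
    activation, n_hidden = bayesian_search(W_features)

    # 3) Stage 2: dual-step weight optimization
    weights = run_hho(W_features, activation, n_hidden,
                      n_hawks=50, max_iter=50)        # global exploration
    weights = run_secondary(weights, W_features,
                            method="ESOA")            # local refinement
                                                      # (or HHO, HO, GA, PSO)
    return build_ann(activation, n_hidden, weights)
```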
Fig. 4 Study area map.

Study area and data

Four major heavy-precipitation areas have been selected in this study: Bangladesh’s Chittagong and Sylhet and India’s Assam and Meghalaya (Fig. 4). These regions contribute significantly to the flooding that occurs in Bangladesh each year, driven primarily by intense rainfall over these catchment areas. Heavy monsoon rains fill the rivers within a very short time, creating widespread inundation across Bangladesh. Each year, torrential floods disrupt livelihoods and infrastructure and gravely imperil agriculture, public health, and regional ecosystems. Accurate rainfall modeling in these regions is therefore vital for improving flood management procedures in Bangladesh.

The data required for developing the rainfall prediction model were collected using Google Earth Engine (GEE)81. GEE is a cloud-based geospatial analysis platform that allows users to visualize and analyze satellite imagery of the planet. This study uses a dataset comprising 43 years of daily rainfall data, from 1981 to 2023. Such a long-term dataset captures both short- and long-term rainfall trends82,83,84. When developing a machine learning (ML) model, splitting the data into training, validation, and testing sets is a critical step. The available research predominantly notes that a 70:15:15 split provides an effective and well-structured approach: it allows sufficient data for learning (70% training), hyperparameter tweaking and fine-tuning (15% validation), and final performance evaluation (15% testing), leading to stable and consistent predictions85,86. The gathered rainfall data were preprocessed to ensure accuracy and consistency; missing values were handled either by interpolation or by replacing them with the mean of surrounding observations. The map of the study area shown in Fig. 4 was generated using Google Maps (https://maps.google.com/). Historical rainfall records for the study area are shown in Fig. 5.
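A minimal sketch of this preprocessing and chronological 70:15:15 split, assuming the daily series is held in a pandas Series named `rainfall` (an assumed variable):

```python
# Fill missing daily values, then split chronologically into 70/15/15.
import pandas as pd

rainfall = rainfall.interpolate()                    # linear gap filling
n = len(rainfall)
train = rainfall.iloc[: int(0.70 * n)]               # 70% training
val   = rainfall.iloc[int(0.70 * n): int(0.85 * n)]  # 15% validation
test  = rainfall.iloc[int(0.85 * n):]                # 15% testing
```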

Model performance evaluation

Several key statistical metrics can help measure the accuracy and efficiency of predictive models, each providing different insights. Table 1 lists all evaluation metrics. The key metrics in this research are Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and the Coefficient of Determination (R²).

Fig. 5 Time series plot of rainfall.

Table 1 Evaluation metrics.

Mean Squared Error (MSE) computes the average of the squared differences between actual and predicted values, as represented by Eq. 8. MSE punishes larger errors more than smaller ones and is therefore outlier-sensitive; a smaller MSE indicates a better model. Root Mean Squared Error (RMSE) is the square root of MSE, as represented by Eq. 9; it has the same unit as the target variable and is therefore easier to interpret. It is also susceptible to large errors and is commonly used in evaluating regression models. Mean Absolute Error (MAE) computes the mean of the absolute differences between actual and predicted values, as represented by Eq. 10. MAE does not square the errors as MSE and RMSE do and is therefore less influenced by outliers; a lower MAE indicates greater predictive accuracy. The Coefficient of Determination (R²) measures how well the model explains the variance in the data and is defined by Eq. 11. It typically takes values between 0 and 1, where higher values indicate a better fit, although it can be negative if the model performs worse than simply predicting the mean. In general, MSE and RMSE put more emphasis on larger errors, MAE provides a straightforward average error metric, and R² quantifies the model’s explanatory power.
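These four metrics can be computed directly, for example with scikit-learn (a sketch; `y_true` and `y_pred` are assumed arrays of observed and predicted rainfall):

```python
# Evaluation metrics of Eqs. 8-11 via scikit-learn.
import numpy as np
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

mse = mean_squared_error(y_true, y_pred)   # Eq. 8
rmse = np.sqrt(mse)                        # Eq. 9
mae = mean_absolute_error(y_true, y_pred)  # Eq. 10
r2 = r2_score(y_true, y_pred)              # Eq. 11
```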

Fig. 6 Reconstruction error plot and its derivatives (Sylhet).

Fig. 7 Reconstruction error plot and its derivatives (Chittagong).

Fig. 8 Reconstruction error plot and its derivatives (Assam).

Fig. 9 Reconstruction error plot and its derivatives (Meghalaya).

Results and discussion

This research presents a rainfall prediction model for Chittagong, Sylhet, Assam, and Meghalaya. Heavy rainfall at these locations has been the primary driver of flooding in Bangladesh every year. In this research, Artificial Neural Networks (ANN) are applied to predict rainfall, with a focus on enhancing ANN training through Non-negative Matrix Factorization (NMF). In addition, this work integrates newly proposed nature-inspired metaheuristic optimization algorithms, Harris Hawks Optimization (HHO), Hippopotamus Optimization (HO), and Egret Swarm Optimization Algorithm (ESOA), and well-known optimizers, Genetic Algorithm (GA) and Particle Swarm Optimization (PSO), to optimize the ANN weights and biases. This paper also introduces a new dual-step approach through algorithm combinations, namely HHO-HHO, HHO-HO, HHO-ESOA, HHO-GA, and HHO-PSO, to produce better prediction capabilities. Moreover, a probabilistic technique, Bayesian Optimization, is employed to select the optimal ANN transfer function and number of hidden neurons.

Results of the model training phase

The first step involves training the neural network using Non-negative Matrix Factorization (NMF), where the number of NMF components is optimized by selecting the optimal value from the reconstruction error plot. The objective is to select the point beyond which more components give only minor improvements in reconstruction accuracy, avoiding both underfitting and overfitting. To achieve this, the changes in reconstruction error are analyzed using derivative-based and curvature-based methods. The first derivative (ΔError) measures the rate at which the error drops with an increasing number of components, and the second derivative (Δ²Error) measures the acceleration of this error reduction. The third derivative (Δ³Error) measures the stability of the second derivative, and curvature analysis (K) evaluates the bending of the error function to identify the peak improvement in reconstruction quality. Figures 6, 7, 8 and 9 plot the error and its derivatives, and the selection of the training component (W) is done based on these analyses. Table 2 shows the optimal number of components based on Figs. 6, 7, 8 and 9.

In this study, the second derivative of the error (Max Δ²Error) was used as the criterion for selecting the optimal number of components in the Non-Negative Matrix Factorization (NMF) method. The second derivative can correctly identify the “elbow point,” where the rate of reduction in error plateaus, giving a good trade-off between model complexity and accuracy87. As opposed to the third derivative (Max Δ³Error) or curvature (Max Curvature), which may be too sensitive to minor variations and noise, the second derivative is a more interpretable and stable choice criterion88. Component selection based on the second derivative also avoids overfitting by retaining only the meaningful components and avoiding unnecessary computational complexity89.

Table 2 Optimum number of components.

Results of the optimization phase

In the optimization phase of the Artificial Neural Network (ANN), Bayesian Optimization (BO) is first utilized to identify the best hyperparameters, i.e., the transfer function and the structure of the hidden layer. The results of the Bayesian Optimization for each station are shown in Table 3. The weight optimization of the ANN model is then carried out using different single-step optimization techniques, namely HHO, ESOA, HO, GA, and PSO, each of which optimizes the weights independently to enhance the performance of the ANN. In addition, the study analyzes dual-step optimization models that integrate two different optimization techniques step by step, introducing the combinations HHO-HHO, HHO-ESOA, HHO-HO, HHO-GA, and HHO-PSO. By applying these optimization techniques in sequence, the dual-step models aim to elevate the entire optimization process and model performance further.

Results obtained for the Sylhet region

At Sylhet, the performance of the different optimization methods varies significantly, as shown in Table 3. For better readability, an MAE-based radar chart is also presented in Fig. 10, where the axes distinguish single- and double-phase methods and points near the center represent lower prediction errors. Among the independent models, Harris Hawks Optimization (HHO) proves the most accurate, with an MSE of 2.96, RMSE of 1.72, MAE of 1.12, and an R² value of 0.99, indicating highly precise predictions with little error. Egret Swarm Optimization (ESOA), by contrast, has higher error values (MSE = 25.94, RMSE = 5.09, R² = 0.91), showing that its predictive accuracy is poorer than that of HHO. Particle Swarm Optimization (PSO) performs strongly, with an MSE of 4.97, RMSE of 2.23, MAE of 1.06, and R² of 0.98, making it highly competitive though not as accurate as HHO. The Genetic Algorithm (GA) follows with an MSE of 7.86, RMSE of 2.8, MAE of 1.42, and R² of 0.97, indicating moderate accuracy but lagging behind HHO and PSO. In contrast, Hippopotamus Optimization (HO) shows the weakest predictive potential, with an MSE of 199.65, RMSE of 14.12, and a much lower R² of 0.34, and is hence not well suited for accurate modeling here.

When hybrid models come into play, significant improvements can be seen. The HHO-ESOA model performs best overall, with an MSE of 1.71, RMSE of 1.31, MAE of 0.64, and R² of 0.99, surpassing the accuracy of any of the individual models. In a similar vein, HHO-HHO performs very well, with an MSE of 1.75, RMSE of 1.32, MAE of 0.59, and R² of 0.99, making it another highly effective hybrid model. Other hybrid models, such as HHO-PSO (MSE = 4.92, RMSE = 2.21, R² = 0.98) and HHO-GA (MSE = 1.94, RMSE = 1.39, R² = 0.99), offer competitive accuracy, demonstrating that integrating HHO with other optimization techniques can further enhance prediction performance. Conversely, HHO-HO (MSE = 5.06, RMSE = 2.24, R² = 0.98) shows a remarkable improvement over HO in isolation but is weaker than the other hybrid models.

Results obtained for the Chittagong region

For the Chittagong region, the predictive ability of the different optimization techniques varies considerably. Among the individual models, Harris Hawks Optimization (HHO) is outstanding, with an MSE of 0.12, an RMSE of 0.34, and an R² value of 0.99, indicating nearly error-free predictions. Particle Swarm Optimization (PSO) does even better, with an MSE of 0.078, RMSE of 0.27, MAE of 0.16, and an R² of 0.99, making it the best individual model. The Genetic Algorithm (GA), while maintaining satisfactory predictive capability, produces higher error measures (MSE = 2.7, RMSE = 1.64, R² = 0.98) than HHO and PSO. Egret Swarm Optimization (ESOA) performs considerably worse, with an MSE of 6.29, an RMSE of 2.5, and an R² of 0.96, making it less suitable for precise forecasting in this case. Hippopotamus Optimization (HO) has the worst performance, with a very high MSE of 43.12, RMSE of 6.56, and an R² of 0.78, indicating weak predictive power and unsuitability for this purpose.

Table 3 Results of the single- and double-step frameworks.
Fig. 10 Radar chart based on log-scaled MAE values for comparative analysis of results.

For the hybrid models, substantial improvements are observed. The HHO-HHO model achieves a low MSE of 0.11, RMSE of 0.33, and R² of 0.99, confirming its strength. The HHO-PSO hybrid outperforms all other models with an MSE of 0.0083, an RMSE of 0.091, and an R² of 1, achieving near-perfect prediction accuracy. The other hybrid models, such as HHO-ESOA (MSE = 0.18, RMSE = 0.42, R² = 0.99) and HHO-GA (MSE = 0.5, RMSE = 0.71, R² = 0.99), also make very good predictions but slightly less so than HHO-PSO. The HHO-HO hybrid (MSE = 1.43, RMSE = 1.19, R² = 0.99) is better than HO on its own but weaker than the other hybrid combinations.

Results obtained for the Meghalaya region

For Meghalaya, predictive accuracy varies across the individual and hybrid models. Among the individual approaches, Harris Hawks Optimization (HHO) delivers strong predictive performance with an MSE of 3.17, RMSE of 1.78, and R² of 0.98. Particle Swarm Optimization (PSO) provides nearly similar values, with an MSE of 2.59, RMSE of 1.61, and R² of 0.98, making it a serious contender to HHO. The Genetic Algorithm (GA) produces larger errors than HHO and PSO, with an MSE of 4.33, RMSE of 2.08, and R² of 0.97, reflecting a slight decline in precision. Egret Swarm Optimization (ESOA), however, shows a substantial decrease in performance, with an MSE of 14.92, RMSE of 3.86, and an R² of 0.93, and is hence less effective for this particular task. Hippopotamus Optimization (HO) performs comparably to HHO, though very slightly less effectively, with an MSE of 3.68, RMSE of 1.92, and R² of 0.98.

When hybrid models are considered, dramatic accuracy improvements are observable. HHO-HHO (MSE = 2.95, RMSE = 1.71, R² = 0.98) possesses strong predictive power. Hybrid blends with other algorithms work even better: HHO-ESOA (MSE = 1.55, RMSE = 1.24, R² = 0.99) and HHO-PSO (MSE = 1.49, RMSE = 1.22, R² = 0.99) achieve much higher accuracy, further cementing their feasibility. The other hybrid variants, HHO-GA (MSE = 1.51, RMSE = 1.23, R² = 0.99) and HHO-HO (MSE = 1.43, RMSE = 1.19, R² = 0.99), also show strong prediction capabilities, with HHO-HO recording the lowest MSE among the hybrid techniques.

Results obtained for the Assam region

In Assam, the predictive capability of the different optimization algorithms varies. Among the single models, Harris Hawks Optimization (HHO) succeeds with an MSE of 2.64, RMSE of 1.62, MAE of 0.96, and R² of 0.97, indicating solid predictive potential. Particle Swarm Optimization (PSO) exhibits a comparable level of performance, with an MSE of 2.48, RMSE of 1.57, and R² of 0.97, making it a good alternative to HHO. The Genetic Algorithm (GA), in contrast, declines in performance with an MSE of 4.48, RMSE of 2.11, and R² of 0.95, reflecting a moderate increase in error. Egret Swarm Optimization (ESOA), with an MSE of 4.68, RMSE of 2.16, and R² of 0.95, is on par with GA but less effective than HHO and PSO. Hippopotamus Optimization (HO), with an MSE of 2.6, RMSE of 1.61, and R² of 0.97, proves as effective as HHO and, by MSE, even marginally better.

When hybrid models are considered, predictive accuracy improves significantly. HHO-ESOA, with an MSE of 1.12, RMSE of 1.06, and R² of 0.98, is the top-performing combination, substantially minimizing error values and improving precision. HHO-HHO (MSE = 3.2, RMSE = 1.78, R² = 0.96) and HHO-PSO (MSE = 2.06, RMSE = 1.43, R² = 0.97) also perform well, with the latter exhibiting a considerable improvement over the individual models. The remaining hybrid models, HHO-GA (MSE = 1.59, RMSE = 1.26, R² = 0.98) and HHO-HO (MSE = 1.58, RMSE = 1.26, R² = 0.98), demonstrate high predictive accuracy, indicating the merits of combining optimization techniques for enhanced forecasting.

Station-wise analysis of best-performing optimization models

Among the models at the Sylhet station, the hybrid HHO-ESOA optimizer was the overall best performer, yielding an MSE of 1.71, MAE of 0.64, RMSE of 1.31, and an R² value of 0.99. Notably, this model was also able to correctly forecast the peak rainfall values, reflecting its effectiveness in handling highly fluctuating conditions. Table 3 presents the best-performing models for each station. The HHO-PSO optimizer gave outstanding results at the Chittagong station, with an exceptionally low MAE of 0.04, MSE of 0.0083, and RMSE of 0.091, along with a near-perfect R² value of 1.00, indicating very accurate predictions across the whole range, including peak points. At Meghalaya, the rainiest of all stations, the HHO-HO optimizer performed best, recording the lowest MSE of 1.43, MAE of 0.57, and RMSE of 1.19, while maintaining a very high R² value of 0.99. The model was particularly effective in modeling sharp increases in rainfall, detecting peak events with reasonable accuracy. Similarly, at the Assam station, hybrid optimizers outperformed the others, and the HHO-ESOA model gave the best performance: MSE of 1.12, RMSE of 1.06, MAE of 0.60, and R² of 0.98. It also demonstrated strong capability in detecting the maximum rainfall intensities, thereby improving prediction accuracy during critical periods.

Result comparison between single-step and dual-step optimization frameworks

Table 4 provides comparative information about the various optimization techniques applied to improve predictive accuracy at the four stations: Sylhet, Chittagong, Meghalaya, and Assam. Each single optimizer (HHO, HO, ESOA, GA, PSO) is compared with the two-step hybrid approaches (HHO-HHO, HHO-HO, HHO-ESOA, HHO-GA, HHO-PSO). Negative percentage values indicate improvement, i.e., the double-step optimization method reduces the error of the single optimizer. The results indicate that the hybrid models outperform the single optimizers. Figure 10 presents a radar chart showcasing the performance enhancements obtained through the two-step optimized models.

Table 4 Error reduction through hybrid optimization systems.

For Sylhet, HHO-HHO reduces MSE by 40.54%, RMSE by 23.26%, and MAE by 47.32%, indicating that applying HHO twice enhances performance. The hybrid HHO-ESOA achieves a 93.39% reduction in MSE, demonstrating that hybridization greatly improves ESOA. Notably, HHO-HO gains the largest improvement in Sylhet, reducing MSE, RMSE, and MAE by 97.46%, 84.22%, and 81.99%, respectively, showing that the synergy between the global exploration of HHO and the local tuning of HO is highly effective. HHO-GA and HHO-PSO also improve results, although the PSO-based hybrid improves only marginally at this station. In Chittagong, the hybrid models perform well, particularly HHO-ESOA, which reduces MSE, RMSE, and MAE by 97.10%, 85.20%, and 85.83%, respectively. HHO-HO and HHO-GA also show considerable improvements, reducing MSE by 80.11% and 81.48%, respectively, evidence that GA benefits from hybridization as well. Unexpectedly, HHO-PSO records a large 89.33% reduction in MSE, in contrast to Sylhet, confirming that PSO's gains are dataset-dependent. Meghalaya follows the same trend, though with smaller gains for some combinations: HHO-ESOA again records the largest MSE reduction at 89.57%, while HHO-HO and HHO-GA provide moderate boosts. In Assam, the two-step approaches maintain their advantage, with HHO-ESOA reducing MSE by 76.09%, HHO-HO by 65.73%, and HHO-GA by 64.61%.

These results confirm that the double-step optimization approach significantly enhances prediction accuracy by reducing errors across all key metrics. The performance of all double-step and single-step models is depicted in Fig. 11 using the Lorenz curve. The Lorenz curve is often used to compare model efficiency by graphing the cumulative proportion of actuals against the cumulative proportion of predictions. The perfect equality line represents an ideal 1:1 relation, and deviation from it indicates variation in model performance.
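As an illustration of how such a curve can be constructed, the sketch below uses hypothetical arrays standing in for observed and predicted rainfall (not the study's data) and one common construction: both series are sorted by the predicted values and accumulated as proportions of their totals, so a perfect model traces the 1:1 diagonal.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical stand-ins for observed and model-predicted rainfall series;
# the study's own station data would be substituted here.
rng = np.random.default_rng(42)
actual = rng.gamma(shape=2.0, scale=5.0, size=200)
predicted = actual * rng.normal(1.0, 0.1, size=200)

# Sort both series by the predicted values, then accumulate each as a
# proportion of its total: a perfect model traces the 1:1 diagonal.
order = np.argsort(predicted)
cum_pred = np.insert(np.cumsum(predicted[order]) / predicted.sum(), 0, 0.0)
cum_act = np.insert(np.cumsum(actual[order]) / actual.sum(), 0, 0.0)

plt.plot(cum_pred, cum_act, label="Model")
plt.plot([0, 1], [0, 1], "k--", label="Perfect equality line")
plt.xlabel("Cumulative proportion of predictions")
plt.ylabel("Cumulative proportion of actuals")
plt.legend()
plt.show()
```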

Fig. 11

Lorenz curve for actual and predicted values.

Justification for employing a dual-step optimization approach

Two-step optimization outperforms one-step optimization for Artificial Neural Network (ANN) weight tuning in rainfall modeling because of several critical benefits in search efficiency, convergence rate, and robustness to local minima. The primary reason hybrid two-step approaches (e.g., HHO-HHO, HHO-HO, HHO-ESOA, HHO-GA, HHO-PSO) outperform single optimizers (HHO, HO, ESOA, GA, PSO) is that single-step algorithms tend to fall short in balancing exploration (global search for improvement) against exploitation (local tuning for precision). Double-step optimization, by contrast, leverages the strengths of two optimizers: the first step performs the global search and the second fine-tunes the weights locally. For instance, in HHO-HO, HHO efficiently explores the solution space and HO fine-tunes the outcomes, so that premature convergence is avoided. This leads to more accurate rainfall predictions.

Rainfall data is highly nonlinear, and ANN training is typically plagued by local-minima traps that degrade performance. A single optimizer sometimes cannot escape such local minima, limiting predictive accuracy. A two-stage hybrid optimizer such as HHO-ESOA employs HHO first to coarsely search for good ANN weights and then ESOA to refine them, preventing the network from becoming stuck in a local minimum. Single optimizers may also converge slowly, i.e., require many iterations to reach optimal ANN weights. A two-stage technique accelerates convergence by first approximating an optimal region and then refining it in detail. For instance, HHO-GA applies the strong exploratory capability of HHO first and then GA for more precise weight selection, reducing training time while improving prediction stability. Finally, single-step optimizers can overfit the training data, especially when optimizing ANN weights for highly fluctuating rainfall series. Hybrid optimization helps prevent overfitting by letting the first phase find a stable global solution while the second phase enhances the network's generalization ability, making the model less sensitive to noise in real rainfall predictions. A schematic sketch of this two-stage procedure is given below.
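To make the mechanics concrete, the following minimal numpy sketch illustrates the two-stage idea on a toy one-hidden-layer ANN. A simplified random-perturbation population search stands in for the actual metaheuristics (HHO, HO, ESOA, GA, and PSO are considerably more elaborate), and all data, sizes, and function names are illustrative assumptions, not the study's implementation.

```python
import numpy as np

def mse(weights, X, y, n_hidden):
    """MSE of a one-hidden-layer ANN whose weights are packed in a flat vector."""
    n_in = X.shape[1]
    s1 = n_in * n_hidden
    W1 = weights[:s1].reshape(n_in, n_hidden)
    b1 = weights[s1:s1 + n_hidden]
    W2 = weights[s1 + n_hidden:s1 + 2 * n_hidden]
    b2 = weights[-1]
    hidden = np.tanh(X @ W1 + b1)   # hidden-layer activations
    pred = hidden @ W2 + b2         # linear output layer
    return np.mean((pred - y) ** 2)

def population_search(loss, dim, n_pop=30, n_iter=200, scale=1.0, seed=0, start=None):
    """Schematic population-based optimizer: random perturbations around the
    current best solution. A stand-in for HHO/HO/ESOA/GA/PSO, not those algorithms."""
    rng = np.random.default_rng(seed)
    best = start if start is not None else rng.normal(0, scale, dim)
    best_loss = loss(best)
    for t in range(n_iter):
        step = scale * (1 - t / n_iter)  # shrink the search radius over time
        for _ in range(n_pop):
            cand = best + rng.normal(0, step, dim)
            cand_loss = loss(cand)
            if cand_loss < best_loss:
                best, best_loss = cand, cand_loss
    return best, best_loss

# Hypothetical toy data in place of the preprocessed rainfall series.
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] + 0.1 * rng.normal(size=100)

n_hidden = 5
dim = X.shape[1] * n_hidden + n_hidden + n_hidden + 1
loss = lambda w: mse(w, X, y, n_hidden)

# Step 1: coarse global exploration (wide search radius).
w1, loss1 = population_search(loss, dim, scale=1.0, seed=0)
# Step 2: local exploitation seeded with step 1's best weights (narrow radius).
w2, loss2 = population_search(loss, dim, scale=0.1, seed=1, start=w1)
print(f"after global step: MSE={loss1:.4f}, after refinement: MSE={loss2:.4f}")
```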

Sensitivity analysis on the number of hidden neurons

One of the most significant parameters influencing Artificial Neural Network (ANN) performance is the number of hidden neurons, as noted in previous research90. To investigate this influence, a sensitivity analysis was conducted by expanding the optimization range of hidden neurons from 1–30 to 1–100. The objective was to assess whether increasing model capacity would improve prediction performance at the different stations; a sketch of such a sweep is given below.
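The sweep can be pictured with the following scikit-learn sketch over hypothetical data. Note that the study tunes ANN weights with metaheuristics rather than scikit-learn's built-in solver, so this is only an illustrative proxy for the procedure of varying the hidden-layer width and recording the test MSE.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Hypothetical data standing in for a station's preprocessed rainfall series.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = np.sin(X[:, 0]) + 0.3 * X[:, 1] ** 2 + 0.1 * rng.normal(size=500)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

results = {}
for k in range(1, 101, 10):  # sweep the hidden-neuron count across 1..100
    model = MLPRegressor(hidden_layer_sizes=(k,), max_iter=2000, random_state=0)
    model.fit(X_tr, y_tr)
    results[k] = mean_squared_error(y_te, model.predict(X_te))

for k, err in results.items():
    print(f"hidden neurons = {k:3d}  test MSE = {err:.4f}")
```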

The results varied by optimizer and station. Significant improvement was observed for HO (98.25% MSE reduction), GA (58.52%), and ESOA (57.67%) with the expanded neuron range. Some models, however, such as HHO (− 55.23%), HHO-HHO (− 105.74%), and HHO-GA (− 63.74%), showed increased errors, indicating a performance drop at larger neuron counts. At the Chittagong station, HHO-GA (98.08% error reduction), HO (96.67%), and GA (93.86%) benefited from the larger neuron search space. Conversely, some models—particularly HHO-PSO (− 698.80%), HHO-ESOA (− 523.67%), and PSO (− 2,947.56%)—experienced sharp spikes in error, implying that their best settings already lay within the smaller neuron range.

In Meghalaya, expanding the number of neurons improved models such as GA (60.25%), ESOA (39.82%), and HHO-HHO (45.93%), whereas HHO-ESOA (− 119.42%) and HHO-HO (− 113.14%) saw errors grow. Similarly, at the Assam station, HHO-HHO (52.02%), PSO (32.43%), and HHO-PSO (27.15%) improved with more neurons, whereas HHO (− 170.08%) and HHO-GA (− 32.70%) deteriorated.

The results indicate that the influence of the hidden-neuron count is highly sensitive to both the optimizer choice and the character of the data. Figure 12 illustrates, in a 3D view, the effect of varying the number of hidden neurons on model performance, highlighting the role of network complexity in prediction accuracy. While additional neurons enhance performance in some models, they can also cause overfitting or instability in others. Careful tuning of the number of hidden neurons is therefore required to balance accuracy and generalizability. Figure 13 presents log-scaled MSE values for all configurations, where positive bars indicate MSEs > 1 and negative bars indicate MSEs < 1, a consequence of the logarithmic scale. Figure 14 displays the percentage change in MSE with more neurons, clipped at ± 200% for legibility; green bars represent improvements (error decrease) and red bars represent performance degradation, so a positive percentage indicates improvement and a negative value an increase in prediction error.

Fig. 12

3D visualization of MSE (%) change.

Fig. 13

Log-scale MSE comparison for neuron ranges 1–30 vs. 1–100.

Fig. 14

Percentage change in MSE with increased neurons (clipped at ± 200% for readability).

Limitations and future recommendations

The proposed methodology performs well in developing a rainfall prediction model for the four stations: Chittagong, Sylhet, Assam, and Meghalaya. Although the framework presents promising results, several limitations were identified, as explained below:

  • Resolution of the data in Google Earth Engine (GEE): Google Earth Engine (GEE) was used in this research to retrieve rainfall data. The resolution of the data offered by GEE is comparatively low. To improve the reliability of real-time rainfall modeling, future studies should use platforms offering higher-resolution datasets.

  • Improving rainfall prediction with more variables: This work uses rainfall data alone for modeling. To improve the performance and generalizability of the proposed rainfall prediction model, incorporating other meteorological variables such as temperature, relative humidity, and wind speed is recommended. These parameters influence atmospheric moisture content and precipitation characteristics, and their addition is likely to provide a better representation of the underlying physical mechanisms. By enriching the input feature space, the model's ability to capture complex interactions and temporal relationships can be significantly enhanced, ultimately leading to more accurate and reliable rainfall forecasts.

  • Refining the ideal number of components in non-negative matrix factorization (NMF): NMF models were trained as part of data preprocessing, and selecting the number of components is critical for model accuracy. This study used the reconstruction error plot and its derivatives to guide the selection, examining up to 100 components (see the sketch after this list). Future studies are encouraged to explore alternative approaches for determining the optimal number of components, which could further enhance the reliability of real-time rainfall forecasting.

  • Artificial neural network (ANN) parameter optimization: In the optimization procedure, the ranges of transfer functions and neurons for the ANN model were kept small. To provide greater flexibility, it is recommended that future research widen the ranges of hidden layers and transfer functions. Moreover, recently proposed metaheuristic optimization algorithms could be employed to enhance model performance for real-time forecasting.

  • Population size in optimization algorithms: Throughout this study, the optimization algorithms—Harris Hawks Optimization (HHO), Hippopotamus Optimization (HO), Egret Swarm Optimization Algorithm (ESOA), Genetic Algorithm (GA), and Particle Swarm Optimization (PSO)—and their hybrid versions were run with a small population size. Enlarging the initial solution space could yield more optimal weight values for the ANN model.

  • Dual-step optimization strategy: The study employed a dual-step optimization strategy with HHO as the first-stage optimizer. To further enhance optimization performance, future research should explore configurations with other algorithms as the first-stage optimizer.

  • Employing deep learning models: Although this study employs a standard ANN model, it acknowledges recent developments in advanced deep learning models such as LSTM, RNN, and CNN that have demonstrated great success in time series forecasting. Future research should include these models as benchmarks, especially for handling complex rainfall dynamics. Closing this gap would help clarify the relative strengths of hybrid metaheuristic-tuned ANNs and advanced deep learning models in rainfall forecasting.
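Complementing the NMF point above, here is a minimal sketch of component selection by reconstruction error using scikit-learn's NMF. The matrix V, the component range, and the elbow threshold are all illustrative assumptions rather than the study's exact derivative-based criterion.

```python
import numpy as np
from sklearn.decomposition import NMF

# Hypothetical non-negative matrix standing in for the preprocessed rainfall data.
rng = np.random.default_rng(0)
V = rng.gamma(shape=2.0, scale=1.0, size=(120, 40))

# Fit NMF for a range of component counts and record the reconstruction error.
ks = range(1, 21)
errors = [NMF(n_components=k, init="nndsvda", max_iter=500,
              random_state=0).fit(V).reconstruction_err_ for k in ks]

# A simple elbow heuristic: pick the k where the error's first difference
# (its discrete derivative) flattens out below a chosen tolerance.
diffs = np.diff(errors)
tol = 0.05 * abs(diffs[0])  # illustrative threshold
elbow = next((k for k, d in zip(list(ks)[1:], diffs) if abs(d) < tol), ks[-1])
print(f"suggested number of components: {elbow}")
```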

By addressing these limitations, future research can render the proposed methodology more robust and versatile for real-time rainfall forecasting applications.

Conclusion

This study addresses the challenge of noisy and complex rainfall patterns by using advanced signal processing and recently introduced metaheuristic optimization algorithms to optimize Artificial Neural Network (ANN) weights. The proposed two-stage optimization method integrates Non-negative Matrix Factorization (NMF) with well-known (GA, PSO) and recently developed metaheuristic algorithms, namely Harris Hawks Optimization (HHO), Hippopotamus Optimization (HO), and Egret Swarm Optimization Algorithm (ESOA). The results demonstrate the potential of the dual-step hybrid models over traditional single-step optimization approaches, improving model accuracy significantly. Relative to single-step optimized models, the reductions in Mean Squared Error (MSE) range from 1.00% to 97.46% in Sylhet, 8.33% to 97.10% in Chittagong, 6.96% to 89.57% in Meghalaya, and 17.74% to 76.09% in Assam. To provide transparency and facilitate further work in this area, the study presents step-by-step pseudocode, allowing future researchers and practitioners to replicate, verify, or extend the suggested methodology. These findings demonstrate the robustness of the presented method in extreme rainfall forecasting situations. The sensitivity analysis of the hidden neurons further shows that the impact of neuron count depends on the optimizer and the dataset. While the work achieves promising outcomes, some limitations were identified: the resolution of the GEE rainfall data, the need for finer-grained selection of NMF components, partial tuning of ANN parameters, small population sizes in the metaheuristic algorithms, and incomplete exploration of hybrid combinations. These should be remedied in future work using higher-resolution datasets, adaptive NMF methods, a more extensive ANN architecture search, and more heterogeneous hybridization options. Advanced deep learning models such as LSTM, RNN, and CNN, which have shown great promise in time series prediction, are also recommended as benchmarks, particularly for understanding complex rainfall dynamics. Overall, the proposed methodology shows strong promise for real-time rainfall forecasting and has practical implications for improving flood management policies in Bangladesh.