Introduction

Over the past twenty years, meta-heuristic optimization approaches have gained immense popularity. Many of them, such as SSA1, EO2, HHO3, GWO4, BMO5, MRFO6, WOA7, AO8, AOA9, and HGSO10, are well recognized by scientists across multiple disciplines as well as by machine learning experts. These approaches have been applied to a wide range of research topics and used to solve numerous optimization problems, including tasks that are non-linear, non-differentiable, or computationally demanding with many local minima. In addition, a substantial number of scientific studies have been devoted to these methodologies. The remarkable popularity of metaheuristics can be attributed to four main factors: simplicity, flexibility, derivative-free operation, and the ability to avoid local optima4,11. Typically, these techniques can be categorized into four distinct groups12: evolution-based, physics-inspired, swarm-based13,14, and human-based algorithms.

Evolution-based models: integrate mechanisms such as chemical sensing and movement, reproductive processes, removal, distribution, and movement patterns15. Among these, the Genetic Algorithm (GA) stands out as a common and powerful evolutionary technique16. In particular, GA does not require derivatives, unlike many mathematical optimization methods. By emulating successful individuals, GA improves the population over generations and can escape local optima. Over time, different approaches have been suggested to improve the efficiency of GA. Furthermore, other evolutionary techniques have emerged following the success of GA17, including Evolutionary Programming (EP)18, Differential Evolution (DE)19, Evolution Strategies (ES)20, and the Artificial Algae Algorithm (AAA)21.

Physics-based: simulate the physical laws governing our world. One of the most recognized algorithms in this category is Simulated Annealing (SA)22. SA mimics the thermodynamics of physical materials: annealing, the process of heating and then slowly cooling a metal so that it crystallizes, is used to reach a low-energy state. Additionally, several newer physics-inspired algorithms have been established, such as the Gravitational Search Algorithm (GSA)23, Lévy Flight Distribution (LFD)24, and the Archimedes Optimization Algorithm (AOA)25.

Swarm-based algorithms: strive to replicate the social behaviors observed in creatures, such as self-organizing mechanisms and the division of labor26. Two notable examples in this domain are Particle Swarm Optimization (PSO)27 and Ant Colony Optimization (ACO)28. PSO, inspired by bird flocking, adjusts each agent according to both its personal best performance and the global best within the group. ACO, on the other hand, draws inspiration from the foraging habits of ant colonies and the decay of pheromone trails over time, which ants exploit to find the most efficient path from their nest to a food source. In addition, other swarm-inspired techniques include Glowworm Swarm Optimization (GSO)29, Harris Hawks Optimization (HHO)3, and cuckoo search (CS)30, as well as Artificial Ecosystem-based Optimization (AEO)31.

Human-based algorithms: mainly derived from human behavior, where each individual has a unique way of accomplishing tasks that can affect overall performance, which motivates researchers to build corresponding models32. The most well-known human-based algorithm is Teaching-Learning-Based Optimization (TLBO), developed to simulate classroom interactions between an instructor and students33. Human Mental Search (HMS)34 was designed by simulating human behavior in online auction platforms. The Doctor and Patient Optimization algorithm (DPO)35 was designed around the interactions between healthcare providers and patients, including illness prevention, examination, and therapy.

The No Free Lunch Theorem36 in optimization states that no single optimizer performs optimally across all optimization scenarios. Consequently, the pursuit of robust swarm-inspired optimizers has become a driving force for researchers aiming to tackle intricate real-world problems37,38. In this study, we propose eight hybrid frameworks that incorporate modern metaheuristic techniques. These frameworks are specifically designed to fine-tune support vector regression parameters for forecasting the daily maximum concentration of Particulate Matter (\(PM_{2.5}\)).

According to the literature, various methods with different characteristics have been employed. Among these, optimization methods have proven their efficiency in solving \(PM_{2.5}\) forecasting problems compared to traditional approaches. However, SVR combined with optimization methods has been underutilized, despite its potential to provide more reliable forecasting solutions. Existing search methods often face limitations in performance, model complexity, and the time required to build and solve the problem; consequently, achieving accurate results can be challenging. Furthermore, as highlighted in39, a significant gap in this problem lies in the complex process of model establishment, which necessitates a comprehensive understanding of each variable’s impact on the target value; unfortunately, some factors may be overlooked during implementation40. Although most current studies focus on non-linear models for \(PM_{2.5}\) forecasting, only a few have explored advanced machine learning and optimization techniques. This claim is reinforced by41, in which a cutting-edge optimization technique is utilized as a prediction system relying on unstructured data, leading to more accurate and coherent forecasts. In general, the consensus in the literature is that the \(PM_{2.5}\) forecasting problem is highly intricate and requires an efficient approach42.

This study compares the proposed HHO hybrid model with other recognized metaheuristic optimization techniques. These encompass GWO, WOA, SSA, BMO, HGSO, MRFO, and EO. Table 1 shows a summary of each of the algorithms, detailing their core principles, strengths, and limitations.

Table 1 Summary of comparative algorithms used in this study.

The paper presents the following contributions:

  • The study introduces a hybrid model that employs Support Vector Regression (SVR) with Harris Hawks Optimization (HHO) for the accurate prediction of \(PM_{2.5}\) concentrations.

  • The effectiveness of the suggested approach is evaluated by the Mean Absolute Percentage Error (MAPE), Average, Standard Deviation (SD), Best Fit, Worst Fit, and CPU time.

  • All models were trained on recent real-world data from the Centers for Disease Control and Prevention’s National Environmental Public Health Tracking Network, a governmental data source, using five county-level datasets (FIPS codes 1001, 1003, 1005, 1007, and 1009).

The paper is organized as follows: “Materials and methods” section provides an in-depth discussion of the general principles underlying the SVR model, along with an exploration of the HHO algorithm. In “The proposed HHO-SVR model” section, we delve into the proposed HHO-SVR model, detailing its configuration and settings. Moving on to “Experimental results analysis and discussion” section, we present the definition, analysis, and measurement criteria employed to assess precision, as well as an interpretation and thorough discussion of the obtained outcomes. Finally, “Conclusion and future directions” section offers the conclusion, summarizing the key findings and implications (Table 2).

Table 2 Sample of the studies related to \(PM_{2.5}\) forecasting.

Materials and methods

In the following section, the fundamental concepts of SVR along with HHO are addressed.

Support Vector Regression (SVR)

SVR is a data-driven approach derived from the Support Vector Machine (SVM). SVR is used for regression tasks by employing the \(\varepsilon\)–insensitive loss function; for further information and a detailed description of SVM, refer to66. Suppose that the training samples are given by \(D=\left\{ {\left( {{x_i},{y_i}}\right) }\right\}\), where the input is \({x_i}\in R\) and the output is \({y_i} \in R\) for \(i=1,2,3, \cdots , N\), with N denoting the number of samples. The goal of SVR is to determine a functional relationship, denoted as f(x), that links the input variables \(x_{i}\) to the output variable \(y_{i}\). This is done without any prior knowledge of the joint distribution P of the variables (x, y). In the linear case the formula is \(f(x)= \langle {w, x}\rangle + b\), where w is the weight vector and b is the bias term. A non-linear mapping, denoted \(\Phi\), is utilized to convert a hard non-linear task into a more tractable linear one. The regression function is presented in Eq. (1).

$$\begin{aligned} f(x) = \left\langle {w,x} \right\rangle + b \end{aligned}$$
(1)

The function f(x) should fit the training set with flexibility while aiming for a minimal slope, which is achieved by reducing the norm of w to avoid overfitting. To handle constraints that would otherwise be infeasible, two slack variables, denoted \(\xi_i\) and \({\xi }_{i}^{*}\), are introduced. The feasibility of convex optimization in this context relies on the existence of a function that approximates all data pairs \(({x_i},{y_i})\) with a suitable accuracy level, denoted \(\varepsilon\). The problem is then formulated as a convex optimization task, as depicted in Eq. (2).

$$\begin{aligned} \begin{array}{l} {\textrm{minimize}}\,\,\frac{1}{2}{\left\| w \right\| ^2} + C\sum \limits _{i = 1}^l {\left( {{\xi _i} + \xi _{i}^{*} } \right) } \\ {\textrm{subject to}}\,\,\left\{ {\begin{array}{*{20}{c}} {\left\langle {w,\Phi \left( {{x_i}} \right) } \right\rangle + b - {y_i} \le \varepsilon + {\xi _i}}\\ {{y_i} - \left\langle {w,\Phi \left( {{x_i}} \right) } \right\rangle - b \le \varepsilon + \xi _{i}^{*} }\\ {{\xi _i},\xi _{i}^{*} \ge 0} \end{array}} \right. \end{array} \end{aligned}$$
(2)

where C is the penalty factor constant, and \({\xi _i},{\xi }_{i}^{*}\) denote the deviations between the predicted and the target values outside the \(\varepsilon\)-tube.

The optimization problem is more easily solved in its dual form. Using \(K\left( {{x_i},{x_j}} \right) = {\Phi ^T}\left( {{x_i}} \right) \Phi \left( {{x_j}} \right)\) as a direct substitute in the saddle-point condition instead of \(\Phi (\cdot )\) explicitly, Eq. (3) yields the kernel version of the dual optimization problem after eliminating the dual variables \({{\eta }_i},\eta _{i}^{*}\). The kernel function that satisfies the Mercer condition is denoted as \(K\left( {x,x'} \right)\)67.

$$\begin{aligned} \begin{array}{l} {\textrm{minimize}}\,\,\frac{1}{2}\sum \limits _{i,j = 1}^l {\left( {\alpha _{i}^{*} - {\alpha _i}} \right) \left( {\alpha _{j}^{*} - {\alpha _j}} \right) } K\left( {{x_i} \cdot {x_j}} \right) \\ + \varepsilon \sum \limits _{i = 1}^l {\left( {\alpha _{i}^{*} + {\alpha _i}} \right) - \sum \limits _{i = 1}^l {{y_i}\left( {\alpha _{i}^{*} - {\alpha _i}} \right) } } \\ {\textrm{subject to}}\,\,\left\{ {\begin{array}{*{20}{c}} {\sum \limits _{i = 1}^l {\left( {\alpha _{i}^{*} - {\alpha _i}} \right) = 0,} }\\ {0 \le {\alpha _i},\alpha _{i}^{*} \le C} \end{array}} \right. \,\, \end{array} \end{aligned}$$
(3)

Because \(\alpha _i\) and \(\alpha _{i}^{*}\) are the Lagrange multipliers, w is obtained directly in Eq. (4) after solving the dual optimization problem. Support Vectors (SVs) are the samples whose multipliers \(\alpha _i\) or \(\alpha _{i}^{*}\) are non-zero. At the optimal solution, the product of the dual variables and the constraints must vanish, which is enforced through the Karush-Kuhn-Tucker (KKT) conditions; these conditions define the necessary and sufficient requirements for a global optimum. The parameter b is determined in Eq. (5), and the function f(x) is expressed in the support vector expansion shown in Eq. (6). The complexity of the function depends solely on the number of SVs and is independent of the dimensionality of the input space.

$$\begin{aligned} w= & \sum \limits _{i = 1}^l {\left( {\alpha _{i}^{*} - {\alpha _i}} \right) {x_i}} \end{aligned}$$
(4)
$$\begin{aligned} b= & {y_i} - \left\langle {w,\Phi \left( {{x_i}} \right) } \right\rangle + \varepsilon \,\,{\textrm{for}}\,\,\,0< {\alpha _i}< C\nonumber \\ b= & {y_i} - \left\langle {w,\Phi \left( {{x_i}} \right) } \right\rangle - \varepsilon \,\,{\textrm{for}}\,\,\,0< \alpha _{i}^{*}< C \end{aligned}$$
(5)
$$\begin{aligned} f\left( x \right)= & \sum \limits _{i = 1}^l {\left( {\alpha _{i}^{*} - {\alpha _i}} \right) } K\left( {{x_i},x} \right) + b \end{aligned}$$
(6)

The SVR technique raises an interesting question about the seemingly arbitrary process of selecting a kernel for specific data patterns68,69. The Gaussian RBF kernel works better than other kernel functions in terms of simplicity of use and effective mapping capability. Therefore, in this article, \(K\left( {x,x'} \right) = \exp \left( { - \frac{{{{\left\| {x - x'} \right\| }^2}}}{{2{\sigma ^2}}}} \right)\) represents the Gaussian RBF kernel. Two parameters are involved in the SVR method: the C parameter controls the trade-off between the complexity of the function and the frequency with which errors are tolerated, while the \(\sigma\) parameter controls the complexity of the model and governs the mapping of the input variables into the feature space. As mentioned in70, it is consequently important to determine appropriate parameters, and the value of \(\sigma\) should be chosen even more carefully than C. The SVR pseudo-code is displayed in Algorithm 1.

Algorithm 1
figure a

The pseudo-code of SVR model.
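
To make the roles of the C and \(\sigma\) parameters concrete, the following minimal Python sketch fits an RBF-kernel SVR with scikit-learn (an illustration only, not the MATLAB implementation used in this study). scikit-learn parameterizes the RBF kernel as \(\exp (-\gamma \left\| x-x' \right\| ^2)\), so the \(\sigma\) above is mapped to \(\gamma = 1/(2\sigma ^2)\); the data and parameter values are placeholders.

```python
import numpy as np
from sklearn.svm import SVR

# Toy 1-D regression data (illustrative placeholders only).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(200)

sigma = 1.0      # RBF width discussed in the text (assumed value)
C = 10.0         # penalty factor (assumed value)
epsilon = 0.1    # width of the epsilon-insensitive tube

# scikit-learn's RBF kernel is exp(-gamma * ||x - x'||^2),
# so gamma = 1 / (2 * sigma^2) reproduces the Gaussian kernel above.
model = SVR(kernel="rbf", C=C, epsilon=epsilon, gamma=1.0 / (2.0 * sigma**2))
model.fit(X, y)

print("number of support vectors:", len(model.support_))
print("prediction at x = 2.5:", model.predict([[2.5]]))
```

In this setting, a larger C tolerates fewer training errors, while a smaller \(\sigma\) makes the kernel narrower and the model more flexible, which is why \(\sigma\) must be chosen carefully.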

Harris Hawks Optimization (HHO)

Figure 1 shows all phases of HHO, which are described in the next subsections.

Fig. 1
figure 1

Different phases of HHO3.

Exploration phase

In HHO, the Harris’ hawks exhibit random perching behavior at various locations, employing two distinct strategies to detect and capture their prey.

$$\begin{aligned} X(t+1) = \left\{ \begin{matrix} X_{rand}(t)-r_{1}\left| X_{rand}(t)-2r_{2}X(t) \right| & q\ge 0.5 \\ (X_{rabbit}(t)-X_{m}(t))-r_{3}(LB+r_{4}(UB-LB)) & q<0.5 \end{matrix}\right. \end{aligned}$$
(7)

where \(X(t+1)\) represents the position vector of the hawks in the next iteration, \(X_{rabbit}(t)\) denotes the position of the rabbit, and X(t) is the current position vector of the hawks. Additionally, \(r_{1}\), \(r_{2}\), \(r_{3}\), \(r_{4}\), and q are random numbers uniformly distributed between 0 and 1, updated in each iteration. The variables LB and UB represent the lower and upper bounds of the hawk positions. Furthermore, \(X_{rand}(t)\) corresponds to a randomly selected hawk from the current population, and \(X_{m}\) represents the average position of the current hawk population. The average position of the hawks is obtained using Eq. (8):

$$\begin{aligned} X_{m}(t)=\frac{1}{N}\sum _{i=1}^{N}X_{i}(t) \end{aligned}$$
(8)

where \(X_{i}(t)\) represents the position of each hawk at iteration t, whereas N represents the total number of hawks.
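
The exploration update of Eqs. (7) and (8) can be summarized in a few lines of NumPy; the sketch below is a simplified transcription of the published update rules (not the authors' MATLAB code), with boundary clipping added so the new positions stay inside [LB, UB].

```python
import numpy as np

def hho_exploration_step(X, X_rabbit, lb, ub, rng):
    """One exploration update of the hawk population X (N x D array), Eq. (7)."""
    N, _ = X.shape
    X_mean = X.mean(axis=0)                       # Eq. (8): average hawk position
    X_new = np.empty_like(X)
    for i in range(N):
        q, r1, r2, r3, r4 = rng.random(5)
        if q >= 0.5:                              # perch relative to a random hawk
            X_rand = X[rng.integers(N)]
            X_new[i] = X_rand - r1 * np.abs(X_rand - 2.0 * r2 * X[i])
        else:                                     # perch relative to rabbit and mean position
            X_new[i] = (X_rabbit - X_mean) - r3 * (lb + r4 * (ub - lb))
    return np.clip(X_new, lb, ub)                 # keep hawks inside [LB, UB]
```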

Exploration to exploitation transition

To model this transition, the escaping energy of the rabbit is defined as:

$$\begin{aligned} E=2E_{0}\left( 1-\frac{t}{T}\right) \end{aligned}$$
(9)

where E represents the escaping energy of the prey. The variable T denotes the maximum number of iterations, while \(E_{0}\) represents the initial energy state of the prey. The temporal dynamics of E are also illustrated in Fig. 2.

Fig. 2
figure 2

An illustration of the variable E during the execution of two runs and 500 rounds3.
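
As a brief illustration of Eq. (9), the escaping energy decays linearly with the iteration counter; following the reference description of HHO, the initial energy \(E_{0}\) is assumed here to be redrawn uniformly from (-1, 1) at every iteration.

```python
import numpy as np

def escaping_energy(t, T, rng):
    """Eq. (9): escaping energy of the prey at iteration t out of T."""
    E0 = rng.uniform(-1.0, 1.0)       # initial energy, assumed redrawn each iteration
    return 2.0 * E0 * (1.0 - t / T)   # |E| shrinks linearly as t approaches T
```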

Exploitation phase

In this phase, Harris’ hawks execute a surprise pounce on the prey identified in the preceding stage. Prey typically attempts to evade capture, resulting in diverse chasing behaviors in real-world scenarios. To model the attack phase, HHO incorporates four potential strategies based on the prey’s escape behaviors and the hawks’ pursuit tactics. The prey consistently strives to escape from threats, with the probability \(r\) representing the likelihood of successful evasion (\(r < 0.5\)) or failure (\(r \ge 0.5\)) before the pounce. Regardless of the prey’s actions, the hawks employ either a hard or soft besiege to capture it, encircling the prey from multiple directions based on its remaining energy. In natural settings, the hawks progressively close in on the prey, enhancing their cooperative hunting success via a surprise pounce. Over time, the escaping prey becomes increasingly fatigued, allowing the hawks to intensify their besiege and capture the exhausted prey with greater ease. To simulate this strategy within the HHO algorithm, the \(E\) parameter is utilized: a soft besiege occurs when \(|E| \ge 0.5\), while a hard besiege is employed when \(|E| < 0.5\).

Soft besiege The following rules are used to model this behavior:

$$\begin{aligned} X(t+1)= & \Delta X(t)-E\left| JX_{rabbit}(t)-X(t)\right| \end{aligned}$$
(10)
$$\begin{aligned} \Delta X(t)= & X_{rabbit}(t)-X(t) \end{aligned}$$
(11)

where \(\Delta X(t)\) represents the difference between the position vector of the rabbit and the hawk’s current location at iteration t, while \(r_{5}\) is a random number between 0 and 1. The variable J, defined as \(2(1-r_{5})\), represents the random jump strength of the rabbit during the escape, and its value changes randomly in each iteration to mimic the unpredictable nature of rabbit movements.

Hard besiege In this situation, the current positions are updated using Eq. (12):

$$\begin{aligned} X(t+1)=X_{rabbit}(t)-E \left| \Delta X(t) \right| \end{aligned}$$
(12)

A simple illustration of this step with one hawk is depicted in Fig. 3.

Fig. 3
figure 3

An illustration of all vectors in the context of hard besiege3.
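
The two basic besiege updates of Eqs. (10)–(12) reduce to simple vector operations; the sketch below mirrors those formulas for a single hawk position x (a NumPy illustration under the same notation, not the original implementation).

```python
import numpy as np

def soft_besiege(x, x_rabbit, E, rng):
    """Eqs. (10)-(11): soft besiege (|E| >= 0.5 and the prey fails to escape)."""
    J = 2.0 * (1.0 - rng.random())    # random jump strength of the rabbit
    delta = x_rabbit - x              # Eq. (11)
    return delta - E * np.abs(J * x_rabbit - x)

def hard_besiege(x, x_rabbit, E):
    """Eq. (12): hard besiege (|E| < 0.5 and the prey fails to escape)."""
    return x_rabbit - E * np.abs(x_rabbit - x)
```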

Soft besiege with progressive rapid dives We assume that in order to execute a soft besiege, the hawks can assess (decide) their next step in accordance with the following rule in Eq. (13):

$$\begin{aligned} Y=X_{rabbit}(t)-E\left| JX_{rabbit}(t)-X(t)\right| \end{aligned}$$
(13)

We assume that the hawks dive according to the following rule, based on Lévy flight (LF) patterns:

$$\begin{aligned} Z=Y+S\times LF(D) \end{aligned}$$
(14)

where LF is the Lévy flight function, determined by applying Eq. (15), D represents the problem’s dimension, and S is a random vector of size \(1\times D\).

$$\begin{aligned} LF(x)=0.01\times \frac{u\times \sigma }{\left| v \right| ^{\frac{1}{\beta }}}, \quad \sigma =\left( \frac{\Gamma (1+\beta )\times \sin (\frac{\pi \beta }{2})}{\Gamma (\frac{1+\beta }{2})\times \beta \times 2^{(\frac{\beta -1}{2})}} \right) ^{\frac{1}{\beta }} \end{aligned}$$
(15)

where u and v represent random values between 0 and 1, and \(\beta\) denotes a default constant set to 1.5.
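
Equation (15) is a Mantegna-style Lévy step and can be transcribed directly; note that, while the text above describes u and v as random values between 0 and 1, the widely used reference implementation of HHO draws them from standard normal distributions, which is the assumption made in this sketch.

```python
import numpy as np
from math import gamma, pi, sin

def levy_flight(D, beta=1.5, rng=None):
    """Lévy step of dimension D following Eq. (15) (Mantegna-style sampling)."""
    if rng is None:
        rng = np.random.default_rng()
    sigma = (gamma(1 + beta) * sin(pi * beta / 2) /
             (gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))) ** (1 / beta)
    u = rng.standard_normal(D) * sigma   # numerator noise, scaled by sigma
    v = rng.standard_normal(D)           # denominator noise
    return 0.01 * u / np.abs(v) ** (1 / beta)
```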

Therefore, Eq. (16) can be used as the final strategy for updating the positions of hawks during the soft besiege phase.

$$\begin{aligned} X(t+1)=\left\{ \begin{matrix} Y & if F(Y)<F(X(t)) \\ Z & if F(Z)<F(X(t)) \\ \end{matrix}\right. \end{aligned}$$
(16)

where Y and Z are obtained using Eqs. (13) and (14).

A simple illustration of this step for one hawk is demonstrated in Fig. 4.

Fig. 4
figure 4

An illustration of all vectors in the context of soft besiege with progressive rapid dives3.
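
The greedy selection in Eq. (16) keeps whichever of the two candidate moves Y and Z improves the fitness; a compact sketch is given below, reusing the levy_flight helper defined earlier and assuming a minimization objective fobj.

```python
import numpy as np

def soft_besiege_rapid_dives(x, x_rabbit, E, fobj, rng, beta=1.5):
    """Eqs. (13)-(14) combined with the greedy selection of Eq. (16)."""
    D = x.size
    J = 2.0 * (1.0 - rng.random())
    Y = x_rabbit - E * np.abs(J * x_rabbit - x)   # Eq. (13): candidate soft-besiege move
    S = rng.random(D)                             # random 1 x D vector
    Z = Y + S * levy_flight(D, beta, rng)         # Eq. (14): Levy-flight rapid dive
    if fobj(Y) < fobj(x):                         # Eq. (16): keep an improving candidate
        return Y
    if fobj(Z) < fobj(x):
        return Z
    return x                                      # otherwise stay at the current position
```

The hard besiege with progressive rapid dives described next follows the same pattern, with the mean position \(X_{m}(t)\) replacing X(t) in the first candidate move.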

Hard besiege with progressive rapid dives The following rule is applied in the hard besiege condition:

$$\begin{aligned} X(t+1)=\left\{ \begin{matrix} Y & if F(Y)<F(X(t)) \\ Z & if F(Z)<F(X(t)) \\ \end{matrix}\right. \end{aligned}$$
(17)

where Y and Z are obtained using the new rules in Eqs. (18) and (19).

$$\begin{aligned} Y= & X_{rabbit}(t)-E\left| JX_{rabbit}(t)-X_{m}(t)\right| \end{aligned}$$
(18)
$$\begin{aligned} Z= & Y+S\times LF(D) \end{aligned}$$
(19)

where \(X_{m}(t)\) is obtained using Eq. (8).

A simple illustration of this step is demonstrated in Fig. 5.

Fig. 5
figure 5

An illustration of all vectors in the context of hard besiege with progressive rapid dives in 2-D and 3-D spaces.

The proposed HHO-SVR model

The HHO algorithm is combined with SVR to tune the SVR parameters. Figure 6 presents the workflow of the proposed HHO-SVR model, which divides the procedure into three main phases: (1) pre-processing, (2) parameter tuning, and (3) prediction and evaluation. Additionally, the pseudo-code for the HHO-SVR algorithm is provided in Algorithm 2.

Fig. 6
figure 6

The Workflow of the suggested HHO-SVR model.

Algorithm 2
figure b

Pseudo-code of HHO-SVR.
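
To connect the two components, each hawk position is interpreted as a candidate pair of SVR hyperparameters and scored by the forecasting error. The sketch below shows one plausible fitness evaluation in Python with scikit-learn; the (C, \(\sigma\)) encoding, the fixed \(\varepsilon\), and the hold-out split are illustrative assumptions rather than the exact MATLAB configuration used in the experiments, which relied on 10-fold cross-validation.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_percentage_error
from sklearn.model_selection import train_test_split

def svr_fitness(position, X, y):
    """Decode a 2-D hawk position into SVR hyperparameters and return the MAPE objective."""
    C, sigma = np.clip(position, 1.0, 1000.0)     # search bounds used in the experiments
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
    model = SVR(kernel="rbf", C=C, gamma=1.0 / (2.0 * sigma**2), epsilon=0.1)
    model.fit(X_tr, y_tr)
    return mean_absolute_percentage_error(y_te, model.predict(X_te))   # Eq. (20)
```

Each metaheuristic in the comparison minimizes such a fitness function; only the position-update rules differ.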

Objective function

To assess the efficacy of the proposed model, the HHO solutions are rigorously evaluated throughout the iterative process. During prediction, the data is partitioned into a training subset and a testing subset. The MAPE serves as the objective function employed by HHO, acting as a statistical gauge of the precision of the prediction model. MAPE averages the absolute percentage errors over the test samples, providing a straightforward, scale-independent measure; its advantage lies in offering a clear view of the actual prediction error, making it easily interpretable and quantifiable.

$$\begin{aligned} MAPE = \frac{1}{T}\sum \limits _{i = 1}^T {\left| {\frac{{{real_i} - {pred_i}}}{{{real_i}}}} \right| } \end{aligned}$$
(20)

where \({real_i}\) and \({pred_i}\) denote the observed and predicted values, respectively, and T is the number of test samples.
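
For reference, Eq. (20) can be written directly as a short NumPy function (a straightforward transcription that assumes the observed values are non-zero):

```python
import numpy as np

def mape(real, pred):
    """Mean absolute percentage error, Eq. (20)."""
    real = np.asarray(real, dtype=float)
    pred = np.asarray(pred, dtype=float)
    return np.mean(np.abs((real - pred) / real))
```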

Computational complexity of HHO-SVR

To evaluate the computational complexity of the proposed HHO-SVR hybrid model, we analyze the time and space requirements. For the HHO, the complexity of each iteration is O(ND), where N is the population size and D is the problem dimension. The SVR model involves solving a quadratic optimization problem with complexity \(O(n^3)\), where n is the number of training samples. Thus, the overall complexity of HHO-SVR is:

$$\begin{aligned} O(T \cdot (ND + n^3)), \end{aligned}$$
(21)

where T is the number of iterations. This makes HHO-SVR computationally efficient for moderate data sizes.

Theoretical justification for the proposed HHO-SVR

The improved performance of the HHO-SVR model can be attributed to the synergistic integration of HHO and SVR. The key features include:

  • The exploration and exploitation mechanisms in HHO, such as soft and hard besiege strategies, enhance the search for optimal SVR parameters.

  • The use of Levy flight-based randomization allows HHO to escape local minima, ensuring robust convergence.

  • The regularization and kernelization capabilities of SVR improve model flexibility and generalization, leading to higher predictive accuracy.

The theoretical studies in3 and71 confirm that such hybridization can effectively mitigate overfitting and improve computational efficiency.

Experimental results analysis and discussion

To mitigate potential bias in the selection of testing and training sets, this study employs 10-fold cross-validation for SVR. The proposed model’s efficiency is evaluated by comparing it with seven other well-known algorithms, all implemented in Matlab according to their original, thoroughly documented studies. Specifically, each comparative algorithm is combined with the same well-established machine learning model, SVR. Five \(PM_{2.5}\) datasets are utilized to evaluate the proposed model’s effectiveness, and each comparative model is executed 10 times with 30 agents and 50 iterations.

The experiments are conducted on a machine with an Intel(R) Xeon(R) CPU E5-2698 v3 @ 2.30GHz (32 CPUs), 2.3GHz, 524288MB RAM, running Windows Server 2016 Datacenter, and MathWorks Matlab. The parameters for all evaluated approaches are defined as follows: 30 agents, a dimensionality of 2, 50 cycles, 10 independent trials, a minimum bound of 1, and a maximum bound of 1000. A selection of contemporary meta-heuristic algorithms, such as WOA, GWO, SSA, EO, HHO, HGSO, BMO, and MRFO, are considered for assessing the proposed technique. Each comparative algorithm employed an identical number of stochastic solutions. The objective function (fobj) is elaborated in earlier sections.

It is important to note that the parameter settings for all methods are summarized in Table 3. All tests were executed with an equivalent number of iterations (i.e., 50) and independent executions (i.e., 10).

Table 3 Parameter settings.
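
For reproducibility, the shared experimental settings listed above can be collected in a small configuration structure; the dictionary below simply restates those values (the key names, and the assumption that the two decision variables are the SVR hyperparameters, are illustrative).

```python
# Shared settings applied to every metaheuristic-SVR hybrid in the comparison.
EXPERIMENT_CONFIG = {
    "n_agents": 30,         # population size
    "dimension": 2,         # two decision variables (assumed to be the SVR hyperparameters)
    "n_iterations": 50,     # cycles per run
    "n_runs": 10,           # independent trials per algorithm and dataset
    "lower_bound": 1.0,     # minimum value of each decision variable
    "upper_bound": 1000.0,  # maximum value of each decision variable
    "cv_folds": 10,         # 10-fold cross-validation for SVR
}
```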

Data description

The dataset includes values for particulate matter levels (\(PM_{2.5}\)) generated by the Environmental Protection Agency’s (EPA) Downscaler model. These data are used by the Centers for Disease Control and Prevention’s (CDC) National Environmental Public Health Tracking Network to calculate air quality metrics. The dataset provides county-level information from 2001 to 2014, including maximum, median, mean, and population-weighted mean concentrations of \(PM_{2.5}\). The Downscaler \(PM_{2.5}\) dataset is derived from a Bayesian downscaling fusion model, combining \(PM_{2.5}\) observations from the EPA’s Air Quality System with simulated data from the Models-3/Community Multiscale Air Quality (CMAQ) deterministic prediction model. Raw data processing involved extracting air quality monitoring data from the NAMS/SLAMS network, limited to Federal Reference Method (FRM) samplers. CMAQ data, from version 4.7.1 using the Carbon Bond Mechanism05 (CB-05), provides daily 24-hour average \(PM_{2.5}\) concentrations on a 12 km x 12 km grid for the continental U.S. Additional processing standardized variable names and expanded FIPS variables into statefips and countyfips. Daily maximum, mean, median, and population-weighted values were computed for each county based on census tract estimates and 2010 U.S. Census tract-level population data. The Downscaler model synthesizes monitoring data and estimated \(PM_{2.5}\) concentration surfaces from CMAQ to predict \(PM_{2.5}\) levels across space and time, using optimal linear relationships to derive predictions and associated standard errors. Data can be found here (https://data.cdc.gov/Environmental-Health-Toxicology/Daily-County-Level-PM2-5-Concentrations-2001-2014/qjju-smys/about_data).

The data utilized in this experiment are summarized in Table 4, and the descriptive statistics are summarized in Table 5. Potential limitations of the used dataset include:

  • Geographic bias: the data do not represent all regions.

  • Temporal bias: seasonal variations may affect model performance across years.

Future studies should incorporate more diverse datasets to improve model generalizability.

Table 4 Description of dataset variables (24-hour average).
Table 5 Descriptive Statistics on datasets.

Data preprocessing

To improve forecasting results, we normalize the data using the min-max scaling technique as described by72, following Eq. (22):

$$\begin{aligned} \hat{x_{i}} = \frac{x_{i}-\min (x)}{\max (x)-\min (x)}, \quad i\in [1,2,...,n] \end{aligned}$$
(22)

where \(\hat{x_{i}}\) represents the normalized value within the n samples at index i.
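
Equation (22) is standard min-max scaling; a short NumPy version is shown here for reference (per-feature scaling is assumed).

```python
import numpy as np

def min_max_normalize(x):
    """Scale each column of x to [0, 1] following Eq. (22)."""
    x = np.asarray(x, dtype=float)
    return (x - x.min(axis=0)) / (x.max(axis=0) - x.min(axis=0))
```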

Evaluation metrics

The proposed approach is validated and assessed using the following metrics, computed from the best objective value fobj achieved in each run i (a compact sketch that computes these statistics is given after the list):

  • Average value of the objective function achieved by running the method M times, computed using Eq. (23).

    $$\begin{aligned} Average = \frac{{\sum \nolimits _{i = 1}^M {fobj_i} }}{M} \end{aligned}$$
    (23)
  • Standard Deviation (SD) is used to measure the variance of the objective function calculated from running the method M times. It indicates the integrity and robustness of the model. Large SD values suggest inconsistent results, whereas smaller values indicate convergence of the algorithm to similar results across runs. SD is calculated using Eq. (24).

    $$\begin{aligned} SD = \sqrt{\frac{1}{{M - 1}}\sum \nolimits _{i = 1}^M {{{\left( {fobj_i - mean} \right) }^2}} } \end{aligned}$$
    (24)
  • The best objective function corresponds to the minimum objective value achieved by the method over M runs. This value is computed using Eq. (25).

    $$\begin{aligned} Best = \mathop {\min }\limits _{i = 1}^M \left( {fobj_i} \right) \end{aligned}$$
    (25)
  • The worst objective function is the highest objective value obtained from running the algorithm M times. This value is calculated using Eq. (26).

    $$\begin{aligned} Worst = \mathop {\max }\limits _{i = 1}^M \left( {fobj_i} \right) \end{aligned}$$
    (26)
  • CPU Time refers to the total time the central processing unit (CPU) utilizes to run the model M times.
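
The run statistics of Eqs. (23)–(26), together with the accumulated CPU time, can be computed as follows (a small NumPy sketch; the dictionary keys are illustrative).

```python
import numpy as np

def run_statistics(fobj_values, cpu_times):
    """Summarize M independent runs: Eqs. (23)-(26) plus total CPU time."""
    fobj_values = np.asarray(fobj_values, dtype=float)
    return {
        "Average": fobj_values.mean(),         # Eq. (23)
        "SD": fobj_values.std(ddof=1),         # Eq. (24), sample standard deviation
        "Best": fobj_values.min(),             # Eq. (25)
        "Worst": fobj_values.max(),            # Eq. (26)
        "CPU time": float(np.sum(cpu_times)),  # total CPU time over the M runs
    }
```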

Results analysis and discussion

The empirical results of HHO are compared with those of the other hybrid approaches. Table 6 presents the results of the proposed model using Friedman’s ANOVA test on all datasets. The rank column indicates each algorithm’s ranking among the comparative methods based on Friedman’s ANOVA. The proposed approach demonstrated significant improvements, achieving the best results in three of the eight experiments.

Table 6 Results of the Friedman Test on the datasets and approaches.

The algorithms were ranked on the basis of their performance on several criteria: best, worst, average, standard deviation, and CPU time. The best-performing algorithm is highlighted in boldface. Tables 7, 8, 9, 10, and 11 present the MAPE results, comparing the proposed approach with the seven recent approaches. The first column indicates the run number, while the subsequent columns represent the compared algorithms. Additionally, Tables 12, 13, 14, 15, and 16 list the measured values for all methods, encompassing the best, worst, average, and standard deviation values, alongside CPU time, the C value, and the Alpha value.

In Table 9, it is evident that the hybrid SVR models achieved superior results with HHO compared to other comparative algorithms. The proposed approach displayed robust search capabilities, consistently nearing optimal solutions in various runs according to the MAPE metric. Similarly, Tables 7 and 10 underscore the consistent performance of the proposed hybrid approach in securing the best solution (Fig. 7).

Results varied across datasets. For instance, GWO demonstrated promising outcomes in the initial four runs, as depicted in Tables 7 and 8, whereas MRFO consistently outperformed other methods in all runs of dataset 1003. BMO closely followed HHO in achieving favorable results, as evident in Table 11, and exhibited superior performance in the initial six runs, as indicated in Table 10.

Figure 8 illustrates the accuracy of the different algorithms across the datasets, showing the value obtained in each run. HHO consistently outperformed all other methods in Fig. 8c, as well as in some runs depicted in Fig. 8a and d.

In addition, radar charts of MAPE across all time horizons for the different models are depicted in Fig. 7. These charts summarize the error of all selected models for each dataset. HHO consistently demonstrated superior performance across almost all datasets, while WOA produced inferior results for most runs, as indicated by its radar lines enclosing those of the other methods.

Furthermore, Fig. 9 illustrates the convergence curves of various methods. As training iterations increase, the MAPE values calculated from different metaheuristics decrease. In Fig. 9a, WOA initially exhibited the highest MAPE results, while GWO outperformed other metaheuristic methods on the 1001 dataset in subsequent iterations. Similarly, in Fig. 9b, HHO initially showed the highest MAPE results, while EO outperformed other methods on dataset 1003 in subsequent iterations. Figure 9c–e also depict the convergence behaviour of different methods in the datasets 1005, 1007, and 1009, respectively.

Table 7 MAPE Comparative performance metric results between the proposed HHO-SVR model and other hybrid SVR approaches for dataset 1001 (50 Iterations and 10 Runs).
Table 8 MAPE Comparative performance metric results between the proposed HHO-SVR model and other hybrid SVR approaches for dataset 1003 (50 Iterations and 10 Runs).
Table 9 MAPE comparative performance metric results between the proposed HHO-SVR model and other hybrid SVR approaches for dataset 1005 (50 Iterations and 10 Runs).
Table 10 MAPE comparative performance metric results between the proposed HHO-SVR model and other hybrid SVR approaches for dataset 1007 (50 Iterations and 10 Runs).
Table 11 MAPE comparative performance metric results between the proposed HHO-SVR model and other hybrid SVR approaches for dataset 1009 (50 Iterations and 10 Runs).
Table 12 MAPE statistics for 1001 dataset.
Table 13 MAPE statistics for 1003 dataset.
Table 14 MAPE statistics for 1005 dataset.
Table 15 MAPE statistics for 1007 dataset.
Table 16 MAPE statistics for 1009 dataset.
Fig. 7
figure 7

MAPE values on the proposed datasets.

Fig. 8
figure 8

Accuracy on the proposed datasets.

Fig. 9
figure 9

Sample of convergence curves on the proposed datasets.

The HHO-SVR model exhibits enhanced performance relative to alternative optimization algorithms, attributed to its distinctive integration of the HHO and SVR methodologies. The HHO algorithm efficiently balances exploration and exploitation in parameter tuning by employing mechanisms such as soft and hard besiege strategies, along with Lévy flight-based randomization. These mechanisms enable HHO to explore diverse areas of the search space while concentrating the search on promising solutions, thereby ensuring effective optimization of the SVR hyperparameters. The adaptability of HHO-SVR facilitates enhanced prediction accuracy, especially in high-dimensional parameter spaces where conventional optimization algorithms frequently stagnate. HHO effectively addresses the complexities of the \(PM_{2.5}\) forecasting problem through dynamic adjustment of the escape energy and jump strength.

The model’s performance undergoes further validation via statistical tests and supplementary metrics. Friedman’s ANOVA test indicates that HHO-SVR consistently achieves the highest predictive accuracy across various datasets, with statistically significant p-values (<0.01) noted in the majority of instances. The comparative analysis of convergence curves indicates that HHO-SVR demonstrates a faster convergence rate than alternative algorithms, reaching optimal solutions in fewer iterations. Furthermore, HHO-SVR exhibits enhanced robustness, characterized by diminished variability in MAPE across various iterations, signifying a reduction in overfitting and consistent performance. The findings, along with HHO-SVR’s superior performance in MAPE and CPU time metrics compared to competing algorithms, highlight its efficacy in air quality forecasting and its potential for wider environmental applications.

Conclusion and future directions

In this study, we introduced an innovative hybrid approach for predicting \(PM_{2.5}\) concentrations, the HHO-SVR model, which combines Support Vector Regression (SVR) with the Harris Hawks Optimization (HHO) algorithm. Through experimentation and comparison with seven other optimization algorithms, the proposed HHO-SVR demonstrated promising performance in specific scenarios. Furthermore, the proposed HHO-SVR model consistently showed superior predictive accuracy, achieving the best results in three of the eight experiments across five distinct \(PM_{2.5}\) datasets. Statistical analysis using Friedman’s ANOVA test affirmed the robustness and high ranking of the HHO-SVR model, highlighting its effectiveness in diverse environmental contexts. Overall, the hybrid HHO-SVR model outperformed competing approaches, demonstrating its potential for practical application in environmental monitoring and management. In conclusion, this study presented a promising avenue for improving \(PM_{2.5}\) prediction precision by exploiting the synergistic benefits of SVR and HHO. The demonstrated superiority of the proposed HHO-SVR model underscores its potential to advance environmental forecasting capabilities, enabling informed decision-making for sustainable environmental management. In future work, the proposed HHO-SVR model will be applied to other climate-related problems, such as forecasting temperature increases and other climate change factors.