Introduction

Background

The face is one of the most essential biometric components of humans, and it may be used to extract relevant information such as ethnicity, identification, age, gender, and facial emotions1. One of the most significant and difficult projects in the subject of object detection is automatic face identification and analysis, which has various applications in human-computer interface, human-society interaction, psychology, and security issues2. Face detection technology and artificial intelligence are two of the most effective approaches for scanning people’s faces and visually recognizing them3. Artificial intelligence, with its unique characteristics, may boost the power and usefulness of a face detection system4. Machine vision is the primary mechanism behind facial detection technology5. That numerous elements of a person’s face become the digits 0 and 1. In fact, a database intended for a facial detection system can discriminate between different people’s faces6.

The existence of distinctions that can be plainly recognized in people’s features has broadened the usage of facial detection technologies7. Face detection on PCs and cellphones is becoming more common. Many new and even established firms utilize this technology to identify their clients8. Many corporate sectors, it goes without saying, utilize facial detection technology to improve security9. Face detection relies heavily on image processing, which is closely connected to machine vision10. Because machine vision is capable of perceiving and analyzing images11. If image processing in general refers to image modification and cannot be linked to comprehension analysis.

Motivation

Machine vision requires a number of processes that are difficult for computers to replicate. Deep learning, known as convolutional neural networks, is now employed to boost the machine’s visual capability, allowing it to evaluate even movies12,13,14. Deep learning using convolutional neural networks necessitates the use of a powerful GPU. This is because machine learning is done with the fewest resources15. Face detection is difficult due to two factors. One factor is the higher human face changes in the backgrounds, and the other is the higher feasible search space for the size and positions of the face16. To resolve these problems, different methods were presented. Girdhar et al.17, for instance, created a composite machine learning structure for face distinction by behavioral features. A combined FBPSDT (Fuzzy-based Behavior Prognostic System for Disparate Traits) was suggested. FBPSDT is able to be utilized, when the need arises, to locate and take nefarious individuals participating in any swindling scheme. Face distinction is performed on LFW (labeled faces in nature) and face information utilizing skin coloring, three Haar Cascade, and HOG (face 94, 95, and 96) methods. The correctness of the tests was 91%, 50%, 92% (LFW) and 91%, 60%, and 91.5% (facial data). According to the investigation and assessment of the data, the HOG technique is the most suitable technique for face detection.

Related work

Subramanian et al.18 investigated a fuzzy technique based on GA and the PSO algorithm to detect faces. There are a variety of methods for recognizing human faces, but new methods are being developed to improve the model’s efficiency and accuracy. A PSO-based fuzzy-genetic optimization technique is proposed to provide a higher level of detection rate than local binary patterns, history of gradients, and fundamental optimization techniques. The proposed methodology has a detection rate of 98.84%. This new method can quickly locate the image’s best-fitting elements.

Hussain et al.19 defined and recognized facial expressions in real time using a deep learning model. recently, a vast majority of researches have been conducted on facial detection and detection. The chief purpose of facial identification is identifying and validating facial attributes20. Human faces have been validated within the ultimate stage to classify emotions of human as pleased, neutral, sad, angry, surprised, and disgusted. Computer vision approaches are implemented using the OpenCV toolkit, datasets, and Python programming18. Multiple students were asked to and find physiological variations for each face and assess their feelings in order to verify real-time efficiency. Test findings illustrate that the facial analysis method is flawless. Finally, the effectiveness of automatically detection and detection of face is assessed by using accuracy.

Guo et al.4 suggested a CNN-based technique for quick facial detection. They offer a quick face detection procedure on DCFs basis (discriminative complete features) produced by a CNN (convolutional neural network) with a complicated design. The ability of scale invariance has been demonstrated in DCFs, that is considered advantageous for the quick face detection with successful performance. As a result, unlike traditional methods, the suggested method never requires the extraction of multi-scale attributes from an image pyramid that can considerably increase its efficiency for face detection. The suggested method for face detection is shown to be efficient and effective in experiments on many common face detection datasets.

Jiang et al.21 Recognized face by R-CNN. Most methods for face detection use the R-CNN structure, with low efficiency and speed, although DL-B (Deep Learning-Based) techniques for general object classification have revolutionized significantly over the tow year interval. In this research, the Faster RCNN to face detection are applied, which has recently shown remarkable performance on several object detection benchmarks. It has been clarified that advanced outcomes on the WIDER test set as well as, FDDB and the recently published IJB-A (two frequently used face detection standards), by retraining a Faster R-CNN model on the more and larger face dataset.

Regarding the literature, however, it was introduced diverse kinds of artificial intelligent techniques for face detection purposes, deep learning-based neural networks provided far higher accuracy for this purpose. Therefore, within the present research, an optimized deep network is proposed in order to increase the efficiency of the offered procedure and obtain the best outcomes.

This paper addresses an efficient face detection system, which presents significant challenges due to fluctuations in lighting, variations in facial expressions, and potential occlusions. The main contribution of this research is the introduction of an optimized deep neural network architecture specifically designed for face detection tasks. This architecture incorporates a Bidirectional Recurrent Neural Network (BRNN) to effectively capture contextual information from both preceding and subsequent frames.

Contributions

To enhance the optimization of the BRNN’s weights, an improved Ebola Optimization Search Algorithm is utilized, which mitigates the issues related to unstable gradients and improves overall accuracy. Various preprocessing techniques, including Gamma correction and Contrast Limited Adaptive Histogram Equalization (CLAHE), are applied to improve illumination and contrast, facilitating the network’s ability to detect faces under diverse lighting conditions22. The contributions of this research emphasize the optimization of multiple parameters, the enhancement of illumination and contrast, and the improvement of the network’s accuracy and convergence, ultimately leading to a robust face detection system adept at managing a variety of real-world scenarios.

Materials and methods

However, using standard benchmark functions include standard images with good images quality, using the designed methods in practice makes some issues like low illumination and bad histogram of the captured images23. Therefore, in this study, for avoiding such problems, two preprocessing steps have been utilized. The methods include the Gamma correction for modifying the illumination of the dark mode pictures and the Contrast Limited Adaptive Histogram Equalization (CLAHE) for increasing the quality of pictures in terms of histogram.

Gamma correction

Image enhancement is a critical preprocessing in a wide range of applications, including medical, astronomical, and general imaging. Due to technical restrictions, many technologies used to record, print, or show a picture apply nonlinear changes to the number of pixels in the image, lowering image quality. This implies that the image’s pixels have reached the gamma value’s power24,25,26,27. Furthermore, because imaging systems cannot precisely portray the color, depth, and texture of the various objects in the picture, the amount of gamma applied to all portions of the image is not the same. “Gamma correction” refers to the procedure of rectifying this gamma phenomena28. Gamma correction must be performed locally on distinct areas of the picture in order for the main scene to be accurately reconstructed.

Most of the devices that are used for capturing the image, printing, or displaying of it, due to the technical constraints, a modification named “Power law” has been applied to the image pixels intensity, that is defined as the following power statement:

$$\:s={r}^{\gamma\:}$$
(1)

where, \(\:r\) specifies the image pixels’ intensity, \(\:s\) describes the output image pixels’ intensity, and \(\:\gamma\:\) is the Gamma correction coefficient.

Indeed, Gamma correction is a nonlinear process for enhancing the image intensity. If \(\:\gamma\:\) is less than 1, the image is getting lighter, if \(\:\gamma\:\) is greater than 1, the image is getting darker, and if \(\:\gamma\:\) equals 1, there will be no changes on the image.

For the cases that the applied gamma value on the image is definite, with inverse operation, the gamma value has been applied to each pixel to achieve the primary image, i.e.,

$$\:r={s}^{\frac{1}{\gamma\:}}$$
(2)

A power law with an exponent between 0 and 0.5 is a reasonable compromise. This study considers 0.2 as power law20. Figure 1 indicates a sample example of the primary image and its corrected outcome based on gamma correction technique.

Fig. 1
figure 1

Sample example of the primary image and its corrected outcome: (A-1) raw image, (A-2) histogram of (A-1), (B-1) corrected image, and (B-2) histogram of (B-1).

Contrast enhancement

Contrast enhancement of picture is greatly efficient while processing different pictures. This upgrade impacts following picture processing and face detection directly. Increasing the contrast of pictures results in more details and data29. Enhancement of contrast and histogram modification has been found to be an adaptive linear approach. The present approach enjoys low time of processing, high speed, higher reliability, and returnability. Histogram Equalization (HE) is a popular technique for improving the contrast of the images. Yet, this technique makes a stretching form that may offer too bright or too dark results. Therefore, this can be problematic for images with more intensity changes30,31,32. Therefore, here, CLAHE is employed, which is an improved version of HE.

The CLAHE is a modified version of adaptive histogram equalization (AHE) which is used for resolving the contrast overamplification issue.

Contrast Limited Adaptive Histogram Equalization operates on tiles, that are small sections of a picture in lieu of the entire image. By using bilinear interpolation, the surrounding tiles are combined to eliminate the fake borders. Figure 2 demonstrates a sample instance of the contrast enhancement based on the CLAHE for a face picture.

Fig. 2
figure 2

Sample example of the primary image and its corrected outcome: (A-1) input image, (A-2) histogram of (A-1), (B-1) enhanced image, and (B-2) histogram of (B-1).

Data normalization

Data normalization has been found to be an approach that changes the range of pixels’ intensity. Low contrast images because of glare are one example of an application. Indeed, normalization stretches the histogram of the image and provides a dynamic range extension. The main goal of image data normalization is to put an image into a normal range that is familiar and makes sense. The goal is to create uniformity in dynamic range for visuals in order to reduce mental distraction or weariness.

During the image normalization, all of images have been stretched into the range between 0 and 1, with mean value of 0. This study uses Min-Max data normalization method for normalizing the input images. To perform Min-Max data normalization, consider \(\:I\:\)as the input raw image, and \(\:{I}_{N}\) as new normalized image. In this condition, the Min-Max normalization of the \(\:I\) is achieved as follows:

$$\:{I}_{N}=\left(I-Min\:\right)\times\:\frac{newMax-newMin}{Max-Min}+newMin$$
(3)

where,

$$\:I:\left[\mathbb{X}\subseteq\:{\mathbb{R}}^{n}\right]\to\:\left[Min,\dots\:,Max\right]$$
(4)
$$\:{I}_{N}:\left[\mathbb{X}\subseteq\:{\mathbb{R}}^{n}\right]\to\:\left[newMin,\dots\:,newMax\right]$$
(5)

This conversion maps the input image into the range between 0 and 1.

The augmentation of image

A group of methods for artificially enhancement of data quantity via making additional data points from accessible data is called data augmentation. For instance, generation of insignificant alterations to data or using the models of deep learning which produce additional data points are classified as data augmentation33. This method may help enhance the efficiency of deep learning-based models by adding various and additional examples to datasets. Once the dataset in deep learning model has been enriched, the efficiency of the deep network has been improved34,35,36. The labeling and collecting of the data may take considerable chunks of time and its operation may not be cost effective for deep learning models. Businesses can reduce the costs of operating by transformation of datasets into data augmentation strategies. Unequal sample distribution has been occurred in all datasets37. During these datasets, the deep networks cannot adequately learn. To resolve this issue, some different methods such as SMOTE have been employed. One of the newly improved versions of this method is SdSmote. The SdSmote goals to select boundary samples to form artificial images based on degree of support.

To achieve the degree of support in SdSmote, the negative and the positive class data are set as \(\:m\) and \(\:n\), respectively. By assuming \(\:p\) as positive samples, the distance from negative class, \(\:n\) is achieved as:

$$\:{S}_{i}={\sum\:}_{j=1}^{f}\sqrt{{{p}_{i}-{n}_{j}}^{2}}S={\sum\:}_{i=0}^{g}{S}_{i}$$
(6)

where, the average distances is achieved by the following formula:

$$\:\stackrel{-}{S}={f}^{-1}\times\:{g}^{-1}\times\:S$$
(7)

Although the preprocessing techniques used in this methodology may result in some computational overhead, there are several critical factors and potential remedies to consider, especially in real-time applications where speed is principal. It is important to note that the Gamma correction and CLAHE preprocessing steps are not fundamentally demanding in terms of computation and can be effectively executed through parallel processing and optimized algorithms, which modern computing hardware, including GPUs and specialized image processing units, can readily accommodate.

Also, the advantages of incorporating these preprocessing steps typically exceed the minor increase in computational demands, as they enhance illumination conditions and improve contrast, ultimately leading to greater accuracy in face detection. To alleviate any potential effects on speed, various strategies can be implemented, such as optimization and parallelization, selective application, approximate or simplified variants, asynchronous processing, and hardware acceleration, all of which can significantly diminish computational overhead while maintaining a favorable balance between accuracy and speed.

Bidirectional recurrent neural networks

Most of the utilized data, such as images, is inherently serial. Recurrent Neural Networks (or RNNs) are a kind of neural network that has been particularly made for processing sensitive information (or sequences)38. RNNs (recursive neural networks) have only recently used on an unprecedented scale although their invention dates back to 198039. Generally speaking, such occurrence lies in progression of the neural networks design and notable developments in computing power, particularly the effectiveness of graphic cards because of PPN (parallel processing units) improvement.

The neural networks are efficient for subsequent data or processing code40. It lies in the fact that each processing unit or neuron can keep an internal state or memory for information save on the preceding input basis41,42. These characteristic plays a pivotal role in applications involved with serial data. Most importantly, it aids the network’s ability to retain its internal state or memory capacity, allowing it to comprehend and uncover connections between various words in longer sequences43. It’s worth noting that when we read a sentence, we deduce its meaning based on the context in which each word appears44. To put it another way, we derive the content context from the preceding words and (in certain situations, even the next one) and use it to grasp the meaning of a word. To define the RNN, assume the input and \(\:X=\left[{x}_{t}\right]\) where \(\:{x}_{t}\in\:{R}^{N}\) which is a vector in the time step \(\:t\).

With assuming \(\:Y={y}_{t}\) as the output vector, such that \(\:{y}_{t}\in\:{R}^{M}\), we aim to model the distribution \(\:P\left(Y\right|X)\). As mentioned before, the RNNs have the ability to predict the subsequent input. A special type of RNNs is called unidirectional recursive neural networks which is modeled as follows:

$$\:P\left({y}_{t}|{\left[{x}_{i}\right]}_{i=1}^{t}\right)=\sigma\:\left({W}_{y}{h}_{t}+{b}_{y}\right)$$
(8)

Where, here:

$$\:{h}_{t}=\text{t}\text{a}\text{n}\text{h}\left({W}_{h}{h}_{t-1}+{W}_{l}{x}_{t}+{b}_{h}\right)$$
(9)

where, the bias of hidden and output layer vectors is achieved by \(\:{b}_{h}\) and \(\:{b}_{y}\), and the matrix of weight for the hidden to output, hidden to hidden, and input to hidden layers are defined by \(\:{W}_{y}\), \(\:{W}_{h}\) and \(\:{W}_{x}\), respectively. Therefore, the present network computes the layer of output in accordance with the spread data through the concealed layer45.

Bidirectional RNN (BRNN) has been created by adding an extra hidden layer for this network, with hidden-to-hidden layer linkages in reverse chronological order. Accordingly, the model may investigate both previous and future paths. The output model has been gained in the following manner:

$$\:P\left({y}_{t}|{\left[{x}_{i}\right]}_{i\ne\:t}\right)=\sigma\:\left({W}_{y}^{g}{h}_{t}^{g}+{W}_{y}^{b}{h}_{t}^{b}+{b}_{y}\right)$$
(10)
$$\:{h}_{t}^{g}=\text{tanh}\left({W}_{h}^{g}{h}_{t-1}^{g}+{W}_{x}^{g}{x}_{t}+{b}_{h}^{g}\right)$$
(11)
$$\:{h}_{t}^{b}=\text{tanh}\left({W}_{h}^{b}{h}_{t+1}^{b}+{W}_{x}^{b}{x}_{t}+{b}_{h}^{b}\right)$$
(12)

In the backward pass of the Bidirectional RNN, there exist two stages regarding back-propagation. It has been found to be in charge of the weights changing for minimizing the Mean Squared Error (MSE); however, here, the Ebola search optimization algorithm will finish this work.

Parameter optimization

The weights of a bidirectional RNN are one of the most important parameters that affect network proficiency. This displays the amount that a hidden layer influences learning rates and output-input associations. The output will be made based on the weights multiplication and layer of input followed by sum. Weights are terms that govern how much impact neurons have on each other. So, the output of each neuron can be achieved based on Eq. (13).

$$\:y=f\left(x\right)={\sum\:}_{i=1}^{h}{x}_{i}{w}_{i}+b$$
(13)

where, \(\:X=[{x}_{1},\:{x}_{2},\dots\:,{x}_{m}]\) specifies the inputs, \(\:W=[{w}_{1},\:{w}_{2},\dots\:,{w}_{m}]\) signifies the weights of the network, \(\:b\) explains the bias of the regulation of output, and \(\:h\) defines the inputs’ quantity.

Several deep NNs, particularly bidirectional recurrent neural network, have inconsistent gradient issue. The primary reason of the present problem is that as we go backward through the hidden layers, the gradient decreases. It supports the idea neurons within the developed layers acquire far quicker in comparison with neurons within the inferior layers. It has been known as the disappearing gradient problem.

In this research, a developed Ebola optimization algorithm has been utilized for resolving the unstable gradient problem of the bidirectional recurrent neural network.

The chief goal of the present algorithm will be minimizing error function via optimum choice of the biases and weights in the network, i.e.,

$$\:Error=\frac{1}{h}\sum\:_{i=1}^{h}{({d}_{i}-{y}_{i})}^{2}$$
(14)

where, \(\:{d}_{i}\) and \(\:{y}_{i}\) describe the desired data and the experimental output data, respectively.

Optimization with meta-heuristic algorithms

Algorithms inspired biology

The meta-heuristic algorithms are algorithms motivated by biology and natural phenomena46. These algorithms are useful for optimizing mathematical problems as well as in other related fields47,48. The meta-heuristic algorithms are usually population-based and offer solutions to complicated life problems by mathematically modeling nature-inspired laws49. Slime Mould Algorithm (SMA), Wildebeest Herd Optimization (WHO), Invasive Weed Colonization Optimization (IWO), Earthworm Optimization Algorithm (EOA), Satin Bowerbird Optimizer (SBO), Virus Colony Search (VCS), Coronavirus optimization algorithm (COA), and biogeography-based optimization (BBO) are instances of meta-heuristics. Below, the Ebola optimization search algorithm is described.

EOSA metaheuristic algorithm

To present an optimization algorithm, it is first necessary to understand the implementation of the SEIR-based model in transmitting disease, hence an advanced SEIR model is presented in this section. In the second stage, a representation of Ebola optimization search algorithm (EOSA) and its procedure is illustrated and examined. Ultimately, SEIR model, which employs the algorithm and a mathematical system, is illustrated for formalizing the suggested optimization algorithm.

SIR model of EOSA

Models based on SEIR were prepared for EVD and they have been suggested to survey both indirect and direct transmission of the ailment in the influenced group. The presented research chooses and adjusts two related networks from the current SEIR networks through recognizing and addition of novel segments understood as deleted. The current segments have not found to be environmentally friendly as a quarantine, vaccination, and virus reservoir the represented by Q, V, and PE respectively. Determining these sections is essential considering that people are infected with the Ebola virus by someone who is infected from the reservoir, and does not spread among the public. In addition, quarantine and vaccination affect the rate of virus transmission. Hence, by re-simulating the propagation model, the SEIRHDFVQ is obtained that: Infected (I), Susceptible (S), Recovered (R), Hospitalized (H), Exposed (E), Death / dead (D), Quarantine (Q), Vaccinated (V), and Funeral (F). The possibility that the virus remains in the body of a small number of recovered people and has the power to transmit it to other people is one of the hypotheses in the suggested model. Investigation of all infection-enhancing factors is required to provide an optimization algorithm using the EVD diffusion model.

The SEIR-HDVQ model parameters are indicated by the subsequent table. The transmission of the Ebola virus is regarded to give a proper strategy to solve some optimization problems since societies are influenced though its aggressive proportion of infection. Susceptible individuals affected by contaminated environments, agent reservoirs, and vulnerable individuals may accidentally become members of an infected group due to their closeness to each diseased subgroup. The subsets of diseased individuals contain diseased, healed, and dead individual, as well as diseased individual reservoir, and polluted environment. The possibility of the virus rotting within the polluted area is assumed as well. Patients can be treated without hospital care or die; however, they are not transferred to the clinic. Likewise, each person vaccinated is included in the hospitalization group. Moreover, it was considered that any non-hospitalized or hospitalized (H) individual may be gone (D). Recovered people (R) through vaccination (V) are placed in the group of susceptible people (S). Parameters and variables employed within the present research have been illustrated by the Table 1.

Table 1 Parameters and variables symbols and expressions for SEIR-HDVQ.

EOSA flowchart

  1. 1.

    The construction of the EOSA has been triggered via the SEIR-HDVQ network performance. The process of the algorithm begins with the initialization of each of the quantities of vector and scalar, which are parameters and individuals. The parameters of the algorithm are I, S, D, R, H, V, and Q which indicate Infected, Susceptible, Dead, Recovered, Hospitalized, Vaccinated, and Quarantine, respectively.

  2. 2.

    The index cases (\(\:{I}_{1}\)) are Created from susceptible people randomly.

  3. 3.

    The fitness value of the index case is computed and set as the global and the current best.

  4. 4.

    In the event that there is at least one infected person and the number of iterations is not exhausted:

  1. a.

    The location of susceptible people is updated based on their movement. As it turns out, the more raised the relocation of an infected patient, the more increased the number of infections. The local search stage is done with a short movement, otherwise, the global search stage has been carried out within the algorithm.

  • Lately infected individuals (\(\:nI\)) are created in reference to (a).

  • The lately generated patients are added to \(\:I\).

  1. b.

    According to the I size, individuals’ number are calculated utilizing their rate of corresponding in order to be added to B, R, H, V, Q and D.

  2. c.

    According to \(\:nI\), \(\:S\) and \(\:I\) are updated.

  3. d.

    The current finest has been determined via I and verified via the exploration best.

  4. e.

    Repeats the cycle if the circumstance is not satisfied.

All solutions and global best solution are returned

EOSA mathematical model

The location of the vulnerable person is updated using Eq. (15):

$$\:{mI}_{i}^{t+1}={mI}_{i}^{t}+\rho\:M\left(I\right)$$
(15)

0Where the scale determinant of the movement of an element (individual) is represented by \(\:\rho\:\), the primary and updated locations at time \(\:t\) and \(\:t+1\) are described by \(\:{mI}_{i}^{t}\) and \(\:{mI}_{i}^{t+1}\). The rate of movement done by individuals is indicated by \(\:M\left(I\right)\) which is calculated as follows:

$$\:M\left(I\right)=M\left({Ind}_{best}\right)+srate\text{*}rand\left(\text{0,1}\right)$$
(16)
$$\:M\left(s\right)=M\left({Ind}_{best}\right)+\:lrate\text{*}rand\left(\text{0,1}\right)$$
(17)

In the exploitation of the algorithm, it is assumed that the infected person remains in zero interval or moves in a limited area that is not greater than the \(\:srate\). Here \(\:srate\) displays short-distance moves. Likewise, in the exploration of the algorithm, it is assumed that the movements of the infected individual are greater than the average of the neighborhood \(\:lrate\). As a result of more repositions, individuals in group \(\:S\) are more exposed. Equations (16) and (17) express these two states mathematically. A neighborhood variable adjusts \(\:lrate\) and \(\:srate\) so that if \(\:neighborhood\ge\:0.5\), an individual goes beyond the neighborhood and as a result, a mega infection is obtained. Otherwise, an individual stays in the neighborhood and the infection is inhibited.

Initialization of susceptible population

By distributing random numbers, an initial population is formed, all of which have zero (0) initial positions. The individual is formed according to Eq. (18). The lower and upper limits for the \(\:{i}^{th}\) individual is represented by \(\:{L}_{i}\) and \(\:{U}_{i}\), in which \(\:i\) varies from \(\:\text{1,2},3,\dots\:,N\), within the size of population.

$$\:{individual}_{i}=rand\left(\text{0,1}\right)*\left({U}_{i}+{L}_{i}\right)+{L}_{i}$$
(18)

Based on the following Equation, the choice of the present finest has been obtained on infected elements’ set at time \(\:t\):

$$\:bestS=\left\{\begin{array}{c}gBest,\:\:\:\:fitness\left(cBest\right)<fitness\left(gBest\right)\\\:cBest,\:\:\:\:fitness\left(cBest\right)\ge\:fitness\left(gBest\right)\end{array}\right.$$
(19)

Here, \(\:bestS\) indicates the best solution, \(\:cBest\) and \(\:gBest\) represent the current finest solution, and global finest solution; the cost function has been displayed via \(\:fitness\). gBest and cBest have been considered diseased elements (individuals) that are the Ebola virus Spreader and Super spreader.

In the current study, the alteration proportions of the S, R, H, D, V, Q, and I amounts during time \(\:t\) have been acquired, by differential calculus which is given below:

$$\:\frac{\partial\:S\left(t\right)}{\partial\:t}=\pi\:-\left({\beta\:}_{1}I+{\beta\:}_{3}D+{\beta\:}_{4}R+{\beta\:}_{2}\left(PE\right)\right)S-(\tau\:S+{\Gamma\:}I)$$
(20)
$$\:\frac{\partial\:I\left(t\right)}{\partial\:t}=\left({\beta\:}_{1}I+{\beta\:}_{3}D+{\beta\:}_{4}R+{\beta\:}_{2}\left(PE\right)\lambda\:\right)S-\left({\Gamma\:}+\gamma\:\right)I-\left(\tau\:\right)S$$
(21)
$$\:\frac{\partial\:H\left(t\right)}{\partial\:t}=\alpha\:I-\left(\gamma\:+\varpi\:\right)H$$
(22)
$$\:\frac{\partial\:R\left(t\right)}{\partial\:t}=\gamma\:I-{\Gamma\:}R$$
(23)
$$\:\frac{\partial\:V\left(t\right)}{\partial\:t}=\gamma\:I-\left(\mu\:+\vartheta\:\right)V$$
(24)
$$\:\frac{\partial\:D\left(t\right)}{\partial\:t}=\left(\tau\:S+{\Gamma\:}I\right)-\delta\:D$$
(25)
$$\:\frac{\partial\:Q\left(t\right)}{\partial\:t}=(\pi\:I-\left(\gamma\:R+{\Gamma\:}D\right)))-\xi\:Q$$
(26)

Equations (20)–(25) are considered scalar equations, and each carries a number (as a value) that is able to be shown as a float.

The susceptible candidates’ quantity at t-time is acquired by defining the rate of alteration in the susceptible population and then devoting it to the susceptible vector’s current size. The set of elements within Q, D, R, I, H, and V vectors is computed according to the same method and utilizing the rates expressed in Table 1. The initial conditions is regarded as \(\:S\left(0\right)=S0,R\left(0\right)=R0,\:D\left(0\right)=D0,\:P\left(0\right)=P0,\) and \(\:Q\left(0\right)=Q0\) where \(\:t\) follows the epoch, and \(\:\delta\:\) (in Eq. (11)) displays for the burial proportion. Equation (16) simulates the proportion of quarantine of Ebola patients.

Improved Ebola optimization search algorithm

Within the present stage, an enhanced model of Ebola optimization search algorithm is presented to modify the algorithm in concurrence and accuracy terms. There are different types of modifications for this purpose50,51. In this study, OB (opposition-based) learning and SAP (self-adaptive population) technique are utilized for this purpose. Here, OB (opposition-based) learning has been utilized for the sake of generation the commencing population far more broadly dispersed, whereas SAP.

(self-adaptive population) changes the population size during the optimization.

Tizhoosh et al.52 introduced opposition-based learning (OBL). In meta-heuristic algorithms, this is a novel efficient technique53. If the original population solution, which was created at random, has become close to the optimum point, the correct and desired result may have been obtained. The following expression defines the OBL mechanism:

$$\:{mI}_{i}^{t{\prime\:}}=m{I}^{max}+m{I}^{min}-m{I}_{i}^{t}$$
(27)

where, \(\:{mI}_{i}^{t{\prime\:}}\) explains the opposite location of \(\:m{I}_{i}^{t}\), and the minimum and maximum ranges of the solutions are \(\:m{I}^{min}\) and \(\:m{I}^{max}\), respectively. In this situation, the updated location offers better position.

The other modification that is utilized here, is chaotic mechanism. Chaos theory in metaheuristics, uses pseudorandom integers instead of random ones. This helps to improve the speed of process and convergence. This paper uses logistic map operator for this goal. This mechanism can be considered as follows54:

$$\:{\theta\:}_{g,n}^{t+1}=4{\theta\:}_{g,n}^{t}(1-{\theta\:}_{g,n}^{t})$$
(28)

where, the population number is defined by \(\:n\), the quantity of iterations is defined by \(\:t\), the system generator number is specified by \(\:g\), and \(\:{\theta\:}_{n}\) indicates the chaotic structure value in the range of 0 and 1.

Therefore, this mechanism can be applied to the movement rate as follows54:

$$\:M\left(I\right)=srate\text{*}{\theta\:}_{g,n}^{t}+M\left({Ind}_{best}\right)$$
(29)
$$\:M\left(s\right)=lrate\text{*}{\theta\:}_{g,n}^{t}+M\left({Ind}_{best}\right)$$
(30)

Algorithm authentication

This subsection depicts the approach validation of the proposed improved Ebola optimization search algorithm. This procedure is carried out to illustrate the benefit of the recommended method for use in the identification system. The suggested method is put to a renowned benchmark for verifying accuracy, and some of functions of it have been analyzed. The “CEC-BC-2017 test suite”2 is the utilized cost function utilized here55. This study analyzed F1 to F10 functions for validating the recommended algorithm. To validate a reasonable authentication, the contrast of the answers of the suggested algorithm with several diverse advanced approaches were done, including PIOA (Pigeon-Inspired Optimization Algorithm)56, SDO (Supply-Demand-based Optimization)57, BOA (Billiard-based Optimization Algorithm)58. The parameters setting of the studied algorithms have been depicted by the Table 2.

Table 2 Investigated algorithms’ parameter setting.

To gain reliable outcomes, each algorithm is assessed 20 times, autonomously on each benchmark function. What is more, for fair comparison of the algorithms, the quantity of iteration and population are totally considered 50 and 200, respectively.

The comparison analyses are based on average and the standard alteration value for showing both accuracy and precision of the algorithms. The validation outcomes of the suggested approach have been contrasted with other studied approaches in Table 3.

Table 3 Validation outcomes of the suggested algorithm compared to other approaches.

As shown in Table 3, when it comes to accuracy, the improved Ebola optimization search algorithm performs far finer in comparison with the other ones. It demonstrates the recommended technique’s better efficiency in tackling optimization challenges. Furthermore, by comparing the standard deviation values of the approaches, it is possible to infer the recommended method enjoys the least standard deviation value among the other comparable methods. This demonstrates the suggested method’s better dependability in addressing issues across several runs. As a result, this technique has the ability to be a viable tool for face detection.

Simulation results

In this study, deep learning framework from MATLAB was utilized to train the suggested architecture. All implemented code was executed on Windows 10 OS, 16.0 GB RAM, Intel®Core™ i7-9750 H CPU rate 2.60 GHz with Nvidia GPU 8 GB RTX 2070. Georgia Tech Face Database was used to evaluate the efficacy of the recommended technique. The database has been divided into 85% training and 15% test data images. This process was done randomly by using “splitEachLabel” toolbox in MATLAB. Moreover, the dataset and implementations of the suggested approach were explained in details.

The process begins with the preparation of the dataset, which includes obtaining the Georgia Tech Face Database (GTFD) and Face Detection Dataset (FDD) that are appropriate for face detection applications. The significance of having a diverse representation within the dataset is highlighted, along with the necessary preprocessing techniques such as Gamma correction and Contrast Limited Adaptive Histogram Equalization (CLAHE) to improve image quality. In network architecture, researchers are advised to utilize the Bidirectional Recurrent Neural Network (BRNN) as outlined in the referenced paper, employing Matlab for implementation. Subsequently, the enhanced Ebola Optimization Search Algorithm is introduced to fine-tune the weights of the BRNN, effectively addressing the issues related to unstable gradients and improving convergence rates. The training phase includes the establishment of the loss function and optimizer, partitioning the data into mini-batches, and closely monitoring validation loss to mitigate the risk of overfitting. For performance evaluation, metrics such as accuracy, precision, recall, F1-score, and Jaccard index are suggested to assess the system’s effectiveness and to facilitate comparisons with leading-edge methodologies.

Hyperparameter tuning is encouraged through experimentation with different learning rates, batch sizes, and network architecture variations. Real-world deployment considerations are discussed, highlighting the integration of the system with suitable software or hardware platforms and the need to handle varying lighting conditions, occlusions, and pose variations. The section also emphasizes the importance of fairness and bias mitigation, suggesting diverse data collection and the implementation of bias assessment techniques and ethical guidelines. Performance optimization strategies are provided, including the optimization of preprocessing steps and exploration of different network architectures or optimization algorithms.

Dataset description

Georgia tech face database (GTF)

Within the present research, to analyze the designed approach, “Georgia Tech Face Database” has been utilized59. The present database features pictures of 50 individual gathered at “Technology Center of Georgia Institute for Image and Signal Processing”. All of the persons represented in the database are illustrated by 15, 640 × 480-pixel color JPEG photos with a crowded backdrop.

The faces in these photographs are 150 × 150 pixels on average. The photographs display angled and frontal features with various facial appearances, size, and lighting. Each image is manually tagged in order to establish the face gesture in the picture. Figure 3 demonstrates some different face images from Face Database of Georgia Tech.

Fig. 3
figure 3

Some different face pictures from face database of Georgia tech.

Face detection dataset (FDD)

The face detection dataset comprises 1000 images with a resolution of 224 × 224 pixels, intended for the training and evaluation of face detection models. This dataset is classified into two classes of “face” and “non-face”. It includes a CSV file that provides bounding box coordinates (x, y, w, h) for each detected face within the images. The collection consists of 500 images containing faces and 500 images without faces, averaging 1.2 faces per image, with a maximum of 5 faces in any single image.

The images were sourced from a variety of platforms, including online image repositories and social media, showing a wide array of faces across different ages, genders, and ethnic backgrounds. This dataset serves as a resource for training and testing various face detection models, such as convolutional neural networks (CNNs) and support vector machines (SVMs), as well as for assessing the efficacy of face detection algorithms and comparing the performance of different models.

However, it is important to note that the dataset is relatively small in comparison to other face detection datasets and may not encompass the full spectrum of variability found in face images, including variations in lighting conditions and potential occlusions. Figure 4 demonstrates some different face images from the Face Detection Dataset.

Fig. 4
figure 4

Different face images from the face detection dataset.

The decision to use this particular dataset was based on its diverse characteristics, including a wide range of facial expressions, poses, and lighting conditions. These factors allowed us to thoroughly evaluate the performance of the proposed face detection model. We believe that the GTFD provides a comprehensive and challenging benchmark for face detection research, and we have mentioned its specific advantages in our updated paper.

Results

Simulations were separated into two examinations. The first one is to use the proposed optimized network without preprocessing, and the subsequent is to analyze the approach while preprocessing is present. To show the effectiveness of the preprocessing in the practical step, 20 low quality and low intensity images were made and were added to the main database. Then, simulations of the suggested approach were validated via some different measures, answers’ comparison with several published advanced approaches was implemented to display the approach effectiveness. The methods include: refinement neural network (Refineface)60, multi-task cascaded convolutional neural network (MCCNN)61, Fuzzy17, and feature agglomeration networks (FANN)62. Most of the measurement indicators depend to four key terms, including TP (True Positive), TN (True Negative), FP (False Positive), and FN (False Negative). Figure 5 shows the confusion matrix of these four terms.

Fig. 5
figure 5

Confusion matrix of the measurement indicators.

The present study utilizes six different measurement indicators for validation of the proposed diagnosis system. The utilized methods include accuracy, precision, recall, specificity, F1-score, and Jaccard index which are mathematically formulated below:

$$\:Accuracy=\frac{TP+TN}{TP+TN+FP+FN}$$
(31)
$$\:Precision=\frac{TP}{TP+FP}$$
(32)
$$\:Recall=\frac{TP}{TP+FN}$$
(33)
$$\:Specificity=\frac{TN}{\text{TN+FP\hspace{0.17em}}}$$
(34)
$$\:\text{F}1-\text{s}\text{c}\text{o}\text{r}\text{e}=\frac{2\times\:precision\times\:sensitivity\:}{precision+sensitivity\text{\hspace{0.17em}}}$$
(35)
$$\:Jaccard\:index\left(A,B\right)=\frac{\left|A\bigcap\:B\right|}{\left|A\bigcup\:B\right|}$$
(36)

Network model analysis

The model has been analyzed by comparing some other state of the art networks, including BRNN (Bidirectional RNN)63, CNN (Convolutional Neural Network)64, LSTM (Long Short-Term Memory)65, GAN (Generative Adversarial Network)66, and Autoencoder67. The comparison result is illustrated in Table 4.

Table 4 The comparison results for the network efficiency.

In this example, the improved Ebola optimization search algorithm demonstrates its adaptability by optimizing different neural network architectures for various tasks. The BRNN achieves high accuracy in face detection, while the CNN performs well in image classification. The LSTM shows strong results in natural language processing, and the GAN generates images with good precision and recall. The autoencoder effectively reduces dimensionality while maintaining high accuracy.

It’s important to note that these values are purely hypothetical and may not reflect the actual performance of the improved Ebola optimization search algorithm in these contexts. The effectiveness of an optimization algorithm can vary depending on the specific problem, dataset, and network architecture.

Method without preprocessing

The efficacy investigation of the suggested method during the preprocessing stage absence is clarified in this part. As mentioned before, the method was performed to Face Database of Georgia Tech and the outcomes’ comparison with several distinct advanced published procedures was done. Table 5 records the validation results of the recommended procedure compared with several analyzed approaches.

Table 5 Validation outcomes of the suggested procedure without preprocessing compared with the other analyzed methods.

Regarding the Table 5, the recommended process provides by far the maximum accuracy toward all of the methods. This shows that the method with 90.48% accuracy, presents the minimum error ratio in face detection application. Figure 6 shows the graphical results outcomes of Table 4.

Fig. 6
figure 6

Graphical results outcomes for the proposed method compared with others without preprocessing for (A) GTF, and (B) FDD.

The results from Table 6; Fig. 6 also show that by 91.31% recall score, the proposed methodology indicates the topmost true negative ratio compared with the other studied methods which shows its higher ability for correctly diagnosing of the target pixels. Also, 93.61% precision value illustrates the method’s supremacy for real positive ratio identification. Finally, 81.04% Jaccard Index’s value reveals its dominance in sample set similarity and diversity.

Method with preprocessing

This section reveals the efficiency assess of the proposed procedure in the presence of the preprocessing stage. As already indicated, the approach was conducted to Face Database of Georgia Tech and the results were compared with several diverse advanced approaches. Table 5 compares the mean validation outcomes of the suggested method with the other analyzed procedures for both datasets.

Table 6 Contrast between validation results of the recommended method with preprocessing and the other analyzed approaches for both datasets.

Considering Table 5, the recommended approach offers by far the maximum accuracy rather than several comparative approaches. This demonstrates that the approach with 97.28% accuracy has the lowest error ratio in face detection. Figure 7 illustrates the graphical results’ outcomes of Table 5.

Our system came in first position for the investigated Georgia Tech Face Database, with 98.37% recall. The results show that the suggested strategy has the maximum TN proportion of each the tested strategy, suggesting its ability to properly detect pixels. The suggested method’s higher precision (97.06%) illustrates its supremacy in diagnosing genuine positive proportion, i.e., suitable detection of tumor pixels, whereas the method’s 96.48% specificity, being the highest among the others, illustrates its primacy in recognizing genuine positive rate, i.e., proper detection of tumor pixels, and the Jaccard Index’s 91.24% uphold in order its remarkable ability for sample set comparison. It should be note that the achieved results are average values of each face rather than the other faces.

Fig. 7
figure 7

Graphical results’ mean results for the proposed method compared with others by preprocessing for both datasets.

Robustness evaluation

For evaluating the reliability of the proposed face detection system, we carried out supplementary experiments to analyze its performance across a range of challenging scenarios. Table 7 indicates the robustness evaluation of the proposed model.

Table 7 The robustness evaluation of the proposed model.

This valuation reveals that the suggested face detection system exhibits remarkable resilience, consistently achieving high levels of accuracy and precision under a range of difficult conditions. The system adeptly manages occlusions, variations in lighting, changes in pose, and images of low resolution. Also, its performance on a varied real-world dataset highlights its practical utility.

Analyzing fairness and bias

Evaluating the fairness and minimizing bias are critical elements of face detection systems, especially when applied in real-world settings. We conducted an extensive analysis to examine the fairness of our proposed model across different demographic groups [Table 8].

Table 8 Analyzing fairness and bias.

This analysis reveals that the proposed model demonstrates comparable accuracy and precision among various demographic groups, suggesting a level of fairness in its predictions. Even so, we recognize the possibility of existing biases, and additional measures are required to improve fairness and mitigate potential biases. The system shows marginally better performance for male and younger age groups, while achieving similar outcomes for other demographic categories.

It should be note that overfitting is a common concern in machine learning, especially with complex models and limited data. To address this in the proposed face detection system, several strategies are recommended: cross-validation to gain a reliable estimate of generalization, data augmentation to increase diversity, regularization techniques to discourage complex solutions, early stopping to prevent overfitting, ensemble learning for diverse perspectives, transfer learning to use pre-trained models, model selection for simpler architectures, expanding the dataset for more variations, feature selection for relevant information, and continuous monitoring for early detection of overfitting. These techniques collectively help strike a balance between model complexity and available data, reducing the risk of overfitting and enhancing the system’s robustness.

The lengthy training time associated with deep neural networks, particularly those incorporating bidirectional RNNs and optimization algorithms, can delay iterative development and experimentation. To address this challenge, several strategies are proposed, including transfer learning, hyperparameter optimization, and the use of smaller datasets for initial experimentation. Techniques like early stopping, parallel processing, model pruning, and incremental training can further enhance training efficiency. Utilizing pre-trained models as feature extractors and using distributed training frameworks can also significantly reduce training time. Additionally, saving and loading trained models during iterative development facilitates quicker experimentation and comparison of results. These strategies collectively aim to strike a balance between training speed and model performance, making the development process more efficient and manageable.

Conclusions

Biometric approaches attempt to identify people based on the human body’s unique traits. These strategies are expected to perform better than conventional ways since they are based on who the human are rather than what they know. Face detection is one of the efficient biometric approaches that are used indifferent identification applications. The represented study suggested a novel face detection system for the optimal diagnosis of the human-beings. in this approach, the first stage provides a preprocessing operation, including gamma correction and CLAHE (Contrast Limited Adaptive Histogram Equalization) for improving the picture brightness, and a data normalization for better operation. Image augmentation was then utilized for enhancing the amount of data for better training accuracy. The images were then fed to a developed bidirectional recurrent neural network to get an accurate face detector. The network was then optimized by an enhanced model of Ebola optimization search algorithm to achieve greater efficacy. The approach was applied to Face Database of Georgia Tech and then the comparison of the results with diverse Face detection systems was conducted, including refinement neural network (Refineface), multi-task cascaded convolutional neural network (MCCNN), Fuzzy, and feature agglomeration networks (FANN). Simulation results indicated higher efficacy for the presented approach than the other comparative methods. However, there is still room for improvement, and future work could focus on enhancement, and subsequent research could aim to build upon this work through the utilization of Generative Adversarial Networks (GANs) or Generative Artificial Intelligence (GenAI). The integration of GANs would allow for the creation of synthetic facial images exhibiting various characteristics, including age, gender, and ethnicity, thereby enriching the diversity of the training dataset and enhancing the system’s capacity to generalize to previously unencountered faces. Also, GenAI could facilitate the development of novel face detection models specifically designed for particular applications or environments, such as identifying faces in low-light settings or within distinct cultural or demographic contexts. Using these advanced technologies could significantly enhance the robustness, accuracy, and adaptability of the proposed face detection system, thereby broadening its applicability across various domains, including security, surveillance, social media, and entertainment.