Introduction

The advancement of energy systems utilizing renewable resources is currently on the rise, aimed at mitigating the constraints associated with fossil fuel dependency1,2,3. Anaerobic digestion is a sustainable, natural process for managing organic waste, in which microorganisms convert it into biogas and digestate in the absence of oxygen. Biogas can serve as an environmentally friendly energy source for electricity, heating, or transportation4,5,6, and can also be used as an ingredient in the manufacturing of food and beverages7,8. Additionally, digestate, a byproduct of biogas production, can be applied as a fertilizer or soil enhancer to improve soil quality and increase crop output9. Anaerobic digestion reduces greenhouse gas emissions, waste volume, and odor, while also increasing farm revenue and stimulating the rural economy10,11. Anaerobic methods are used to treat sewage sludge, the residual material in solid, semisolid, or slurry form remaining after wastewater treatment12. These methods can reduce the volume and mass of sludge, eradicate pathogenic agents, and produce biogas that may be utilized as a sustainable energy resource13. Anaerobic digestion, which proceeds through four steps (hydrolysis, acidogenesis, acetogenesis, and methanogenesis), is one of the most common anaerobic processes for sludge treatment14. Its major advantages include the generation of biogas, which can be utilized as a source of power and heat, and the stabilization of sludge, resulting in reduced odor and microbial content15. Its primary disadvantages include the long retention time required, susceptibility to fluctuations in temperature and pH, and the potential emission of greenhouse gases that contribute to climate change16.

Currently, machine learning-assisted models are extensively utilized across multiple domains, particularly in the forecasting of energy systems, and a significant number of researchers have documented their application. For example, Ashraf et al.17 proposed a detailed, systematic approach for integrating Industry 4.0 within an operating coal power plant. The methodology employs both traditional and advanced artificial intelligence techniques to facilitate extensive data visualization; Monte Carlo simulations are conducted on artificial neural network (ANN) and least squares support vector machine (LSSVM) process models, alongside interval adjoint significance analysis (IASA), to identify and remove non-essential control variables. In18, an artificial intelligence (AI)-based modeling and optimization framework was created and implemented to enhance the smart and efficient operation of a 660 MW supercritical coal power plant. The analysis demonstrated the potential of AI models for industrial competitiveness and showed that AI model-based analysis could enhance coal power plant performance, yielding a 1.3% improvement in thermal efficiency and a reduction of 50.5 kt/y in emissions, although the power generation capacity of the plant was reduced under the tighter emissions constraint. Krzywanski et al.19 examined the role of AutoML and fluidized bed innovations in double-effect desalination and cooling production in adsorption chillers (AC). The new approach allows various cooling and desalination adsorption systems to be considered, and such models could constitute powerful tools for optimizing AC system performance. Shahzad et al.20 explored the application of machine learning in predicting the production of solar-derived liquid fuels. The findings highlighted the significant implications of solar liquid fuel technology as a sustainable solution within the renewable energy sector; furthermore, the study marked a pivotal advancement in renewable energy storage, contributing to efforts aimed at addressing climate change on a global scale. Ahmad Jamil and colleagues21 devised a machine learning-driven approach for the design of an innovative sustainable cooling system, specifically a novel indirect evaporative cooler characterized by high efficiency and minimal maintenance needs. The system successfully maintained supply air temperatures within the range of 23 to 27 °C; notably, a temperature reduction of up to 20 °C was attained even under extreme climatic conditions, and the highest coefficient of performance recorded was 32, as determined through artificial neural network modeling and experimental validation.

Sludge processed by anaerobic digestion can be further treated and disposed of in several ways, including dewatering, drying, composting, incineration, or land application22. The choice of the most suitable approach depends on several factors, including the quality and quantity of the sludge, the availability of land and markets, environmental and health regulations, and the economic and societal costs and benefits. Urban sewage treatment plants generate different amounts of sludge depending on the type and effectiveness of the wastewater treatment methods, the properties of the wastewater, and the sludge-handling practices23. Generating and utilizing biogas is a challenging endeavor due to the substantial financial investment and labor required, as well as issues related to efficiency, technical difficulties, market competition, and policy uncertainty. Hence, it is crucial to ensure the affordability and profitability of biogas, enabling its long-term sustainability and strong competitiveness within the energy market24.

To produce more biogas, the organic substrates need to be made more digestible through various pre-treatment procedures. These procedures apply physical, chemical, biological, and thermal treatments, such as crushing, acid addition, enzymatic hydrolysis, and heating25. Such treatments make the organic matter more accessible and soluble, and therefore accelerate and increase biogas production. The treatment of domestic or industrial wastewater produces solid or semi-solid wastes called wastewater sludges. They contain large amounts of organic matter and nutrients and can serve as a feedstock for biogas production, but they need to be treated before being used for this purpose26. A newer approach is to use nanomaterials, such as nanoparticles, nanotubes, and nanofibers, which can improve biogas production by accelerating digestion, lowering toxicity, and raising gas quality27. To maximize biogas yield, the treatment parameters also need to be optimized28. These are the factors that influence how biogas is produced, such as temperature, pH, mixing intensity, retention time, and organic loading; they must be monitored and adjusted so that the microorganisms that produce the biogas perform at their best. Biogas and biosolids production are negatively correlated: as the biogas output increases, the biosolids output decreases, and vice versa. This is because biogas is formed by the decomposition of organic matter in the wastewater sludge, while biosolids are the solid or semi-solid residue remaining after treatment; hence, the more organic matter is transformed into biogas, the less remains in the biosolids29. Organic matter is decomposed without oxygen by a biological process called anaerobic digestion (AD). Sludge from municipal wastewater treatment plants (MWTPs), facilities that remove pollutants from sewage and industrial wastewater, is commonly treated by AD, which lowers the sludge volume and converts it into biogas, a gas mixture that can be used as a renewable energy source. Biogas consists mainly of methane (CH4) and carbon dioxide (CO2), together with small amounts of other gases (H2S and NH3) that cause odor problems and can damage human health and the environment; biogas must therefore be cleaned and upgraded before it can be used to generate heat and electricity. The most valuable component of biogas is CH4, which has a high energy content and can be used as fuel. How much CH4 is produced depends on many factors, such as the kind and quality of the waste, the temperature and pH of the digestion, and the presence of inhibitors or promoters of the process. By adjusting and improving these factors, CH4 production can be increased, which makes the AD process more effective.

Ahmad et al.30 employed three machine learning techniques, Gradient Boosting, eXtreme Gradient Boosting, and Light Gradient Boosting Machine regression, to predict biodiesel yield from waste cooking oil, while also applying a genetic algorithm for optimization purposes. The study detailed in31 focused on the prediction and optimization of biodiesel production yield and the physicochemical characteristics of waste cooking oil. This was achieved through the application of machine learning techniques, response surface methodology, and genetic algorithms, aimed at enhancing the efficiency of CI engines. The findings indicated that the Gradient Boosting model exhibited the highest accuracy in its predictions, as determined by the coefficient of determination. A similar study reported that the synergistic impact of biodiesel and biogas led to a decrease in NOx and smoke emissions; nevertheless, the emissions of hydrocarbons and carbon monoxide were observed to be elevated32. The assessment and enhancement of the performance and emission attributes of a dual-fuel diesel engine utilizing biogas and biodiesel were also conducted. In33, a Box–Behnken design was utilized to create an L29 orthogonal array that incorporated four factors at three distinct levels. The optimal engine performance metrics determined for brake thermal efficiency, brake specific fuel consumption, carbon monoxide, hydrocarbons, nitrogen oxides, and smoke emissions were found to be 28.3%, 0.3 kg/kWh, 3 g/kWh, 0.026 g/kWh, 4.42 g/kWh, and 42.85%, respectively. Ahmad et al.34 intensified biodiesel synthesis using ultrasonic-assisted technology; a hybrid RSM-GA-PSO approach was employed to understand the effect of operating parameters on the biodiesel yield. Significant process intensification benefits were observed in terms of higher yield and reduced time, and the optimal combination of parameters could lead to a biodiesel yield of approximately 95.3%. Sharma et al.35 highlighted the promise of waste-to-hydrogen technologies in promoting sustainability and minimizing waste, emphasizing the need for efficient solutions for hydrogen storage and transportation.

Many factors, such as temperature, pH, organic loading rate, waste composition, inoculum dosage, and the carbon-to-nitrogen ratio of the waste, can change the quantity and quality of the biogas. In addition, each city's wastewater plant uses different treatment processes, such as activated sludge, filtration, or membranes, which have different designs and settings36. It is therefore difficult to determine how to adjust and improve the AD process to produce more biogas, because each wastewater plant is different and has its own problems. Hence, many studies have relied on data-driven models. For example, in37 the primary objective was to forecast and enhance the generation of biogas through the combined digestion of palm oil mill effluent (POME) and cow manure (CM) using a solar bioreactor in conjunction with an ANN-PSO model. To achieve this, the researchers employed various proportions of POME and CM while incorporating hydrogen peroxide and ammonium bicarbonate to augment the biogas output. Data collected from the experimental process were utilized to train and evaluate the ANN-PSO model. The outcomes indicated that the ANN-PSO model exhibited exceptional precision and adaptability in predicting biogas production. The numerical findings were as follows: mean squared error (MSE) = 0.0143, correlation coefficient (R) = 0.9923, and the maximum biogas yield was recorded as 0.64 L/g volatile solids (VS) under the conditions of 80% POME, 20% CM, 0.5% hydrogen peroxide, and 0.1% ammonium bicarbonate.

In38, the primary objective was to create and compare various data-centric models that could estimate and optimize the biogas output from a municipal wastewater treatment plant (MWTP) anaerobic digestion process. The authors employed regression, artificial neural network (ANN), and adaptive neuro-fuzzy inference system (ANFIS) models with both processed and unprocessed input variables. Additionally, they performed an uncertainty analysis utilizing Monte Carlo simulation, and a genetic algorithm (GA) was used to determine the ideal operating parameters for maximizing the production of biogas. The regression model displayed a correlation coefficient (R) of 0.81, a root mean squared error (RMSE) of 0.95, and an index of agreement (IA) of 0.89. The ANN model, on the other hand, showed a higher R value of 0.94 while maintaining a lower RMSE of 0.51 and a higher IA of 0.96. Furthermore, the ANFIS model demonstrated even better results with an R value of 0.97, an RMSE of 0.35, and an IA of 0.98. In terms of maximum biogas production rates, the regression model recorded the highest rate at 28.6 m3/min, while the ANN and ANFIS models had slightly lower rates of 22.0 m3/min and 23.1 m3/min, respectively. These findings suggest that the ANFIS model is the most effective in predicting and optimizing biogas production from an MWTP anaerobic digestion process.

The anaerobic digestion process of each MWTP varies depending on how it operates and where it is located. This includes the Nanjing Jiangnan Wastewater Treatment Plant (NJWTP), which is the focus of this study. Therefore, it is important to develop data-driven models that are tailored to each facility and can provide accurate and reliable results for optimizing biogas production. However, the modelling framework presented in this study can also serve as a useful tool for creating models that are specific to other facilities around the world. In this regard, this study presents an innovative approach to enhancing biogas production through anaerobic digestion at the NJWTP. Utilizing data-driven modeling and optimization methods, the research focuses on improving the sustainability and cost-effectiveness of waste-to-energy conversion processes. The core of the study involves the comparison of three distinct models: DBN, DBN-OOA, and DBN-BOOA. Deep Belief Network models were formulated using both PCA-processed and raw input data, and these models were then examined using OOA and BOOA to ascertain the ideal process variables for the greatest biogas yield. In summary, the research objectives and contributions can be stated as follows:

  1. Employing analytical methods based on data to enhance the eco-friendliness and economic efficiency of transforming waste into energy.

  2. Evaluating three different frameworks, a standard DBN, a DBN-OOA, and a DBN-BOOA, to determine the most effective in precision and optimization potential.

  3. Determining the best operational conditions for the highest yield of biogas.

  4. Enhancing biogas output, which is advantageous for increasing energy generation from the anaerobic digestion of municipal wastewater treatment plant (MWTP) sludges and concurrently contributes to the diminution of MWTP biosolids, thereby potentially lowering the expenses linked to their management.

  5. Achieving strong performance in terms of the correlation coefficient, root mean square error, and index of agreement.

The model proposed in this study has been formulated and analyzed so that it can be readily generalized to other similar biogas plants; this can be done by substituting the input data and operating conditions of the target biogas plant.

Modelling and optimizing wastewater treatment

The following sections explain each part of this study in detail.

Anaerobic digestion process and data at the NJWTP

As a state-of-the-art plant, the Nanjing Jiangnan Wastewater Treatment Plant contributes significantly to preserving the environment. It has a three-stage treatment system and can effectively eliminate microplastics from wastewater. The plant uses various technologies and processes, such as the A2O biological pool, a denitrification deep-bed filter, and a membrane bioreactor. The Nanjing Jiangnan Wastewater Treatment Plant is a BNR-type MWTP that treats the city’s sewage and also serves as an eco-friendly hub for environmental businesses. The plant faces challenges from seasonal temperature changes, which affect its open-air treatment methods and cause variations in the temperature of the wastewater and sludge. The plant uses three anaerobic digesters to convert the concentrated activated sediments from the dissolved air flotation (DAF) system and the fermenter into biogas, which is then used as an internal heat source. The plant also has an EGSB reactor, a high-performance anaerobic digester that can process high-strength wastewater and generate high-quality biogas39. The biogas from the EGSB reactor had a higher methane content (71.5%) and a lower carbon dioxide content (28.5%) than the biogas from the anaerobic digesters (66.7% and 33.3%, respectively). The hydrogen sulphide content was relatively low, but varied from 0.01 to 0.05%.

A variety of parameters are regularly measured by the staff at the Nanjing Jiangnan Wastewater Treatment Plant for the DAF unit and the fermenter. These parameters include fixed solids (FS), total solids (TS), volatile fatty acids (VFA), volatile solids (VS), pH, thickened waste activated sludge (TWAS), and waste fermented sludge (WFS), among others shown in Table 1. The data obtained from these parameters are essential for the effective functioning and management of the plant. In total, 180 data points were gathered from 2016 to 2018 for the purposes of the current study. This section summarizes how the data set was obtained and processed for the study of biogas generation from sewage sludge anaerobic digestion at the Nanjing Jiangnan Wastewater Treatment Plant. Correlation coefficient tests and Principal Component Analysis (PCA) were applied to the operating parameters to identify the relevant input variables, and the original and the processed factors were employed in three scenarios for estimating biogas production rates. For Scenario 1, Fixed Solids (FS%) and Fixed Solids 2 (FS2%) were chosen to substitute Volatile Solids (VS%) and Volatile Solids 2 (VS2%) because they had a high and meaningful correlation, which was predictable since they should total 100% (Fixed Solids% + Volatile Solids% ~ 100%). The original input variables were Volatile Fatty Acid (VFA, mg/L), Total Solids (TS%), Fixed Solids (FS%), pH, and Waste Fermenter Sludge (WFS, m3/day) from the fermenter, and Total Solids 2 (TS2%), Fixed Solids 2 (FS2%), and Thickened Waste Activated Sludge (TWAS, m3/day) from the Dissolved Air Flotation unit. For Scenario 2, Total Solids (%) was correlated with Fixed Solids (%), Volatile Solids (%), Total Solids 2 (%), Fixed Solids 2 (%), Volatile Solids 2 (%), Volatile Fatty Acid (mg/L), and Thickened Waste Activated Sludge (m3/day), so it was chosen to stand for them; the input variables for Scenario 2 were therefore Total Solids (%), pH, and Waste Fermenter Sludge (m3/day). For Scenario 3, Principal Component Analysis was applied to the input variables to generate five new and uncorrelated factors. This decision was based on the fact that the first five eigenvalues collectively explained over 92% of the total variability.
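As an illustration of the Scenario 3 pre-processing, the sketch below shows how principal components could be retained until roughly 92% of the variance is explained; the file name and column layout are hypothetical stand-ins for the NJWTP operating parameters.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Hypothetical matrix of operating parameters: rows = samples,
# columns = VFA, TS, FS, pH, WFS, TS2, FS2, TWAS (assumed file layout)
X_raw = np.loadtxt("njwtp_operating_parameters.csv", delimiter=",", skiprows=1)

# Standardize before PCA so variables with large ranges (e.g. VFA in mg/L) do not dominate
X_std = StandardScaler().fit_transform(X_raw)

# Keep the smallest number of components whose cumulative explained variance exceeds 92%
pca = PCA(n_components=0.92)
X_pca = pca.fit_transform(X_std)

print("components kept:", pca.n_components_)
print("cumulative explained variance:", pca.explained_variance_ratio_.cumsum())
```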

Table 1 Assessment variables for the Nanjing Jiangnan Wastewater Treatment Plant (NJWTP): minimum, maximum, and mean values with standard deviations.

Normalization techniques for deep neural network model

To train deep neural network models effectively and quickly, normalization techniques are essential: they transform features to a similar scale and ensure stability. Normalization aims to change the values of the numeric columns in the dataset to a common scale without losing the variation in value ranges40. It is usually the final step in data preprocessing and should be performed just before training machine learning models, mapping values from an existing range to a new one. The following is one of the data normalization methods that can be applied:

  • Min-Max normalization method.

This technique scales the variable between 0 and 1, where 0 represents the minimum value of x and 1 represents the maximum value of x. The general formula of a min–max scaler is as follows:

$$MinMaxScaler=\frac{x_{i}-\min\left(x\right)}{\max\left(x\right)-\min\left(x\right)}$$
(1)

where the maximum and minimum values of the feature \(x\) are represented by \(\max\left(x\right)\) and \(\min\left(x\right)\), respectively.
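A minimal sketch of this scaling step, following Eq. (1); the example matrix of digester measurements is illustrative only.

```python
import numpy as np

def min_max_scale(X: np.ndarray) -> np.ndarray:
    """Scale each column of X to the [0, 1] range, following Eq. (1)."""
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    # Guard against constant columns, where max(x) == min(x)
    span = np.where(x_max > x_min, x_max - x_min, 1.0)
    return (X - x_min) / span

# Example: scale a hypothetical matrix of digester measurements before DBN training
X = np.array([[6.1, 7.0, 520.0], [6.4, 7.2, 610.0], [5.9, 6.9, 480.0]])
print(min_max_scale(X))
```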

Deep belief network

DBNs (Deep Belief Networks) are unsupervised learning methods used in artificial intelligence that rely on efficient algorithms and probability distributions. They originate from the Boltzmann machine modeled by Geoffrey Hinton41. The main principle of Deep Belief Networks is to extract abstract properties using multiple layers of “hidden units”. These networks are widely used in pattern recognition tasks, such as regression and classification, especially in areas like speech recognition, computer vision, and natural language processing. A significant advantage of DBNs is their ability to learn from unstructured data, which reduces their dependence on structured data. Where structured data are scarce, for example in image classification and text processing, this feature of Deep Belief Networks becomes valuable. Moreover, Deep Belief Networks are recognized for their ability to generalize across various tasks and data sets. A DBN is formed by a complex arrangement of interconnected layers. Every DBN consists of neurons arranged in a specific structure whose role is to receive input data. The input layer is the network’s first layer. “Hidden” neurons, which extract properties from the input data, make up the hierarchical layers. The network’s final layer, the output layer, predicts or classifies the input data. After training, all layers can be used to identify patterns in input data and generate predictions.

To maximize the likelihood of the data provided, a suitable configuration of the Deep Belief Network algorithm is needed; this is achieved through maximum likelihood estimation. Different methods, such as bio-inspired and back-propagation techniques, can be used to optimize the Deep Belief Network, and regularization techniques such as dropout and weight decay can be used to address overfitting and improve the model’s generalization capability. This paper proposes a new optimized version of the DBN method that improves the fundamental structure of the Deep Belief Network. The number of data features determines the number of input nodes, while the number of classes determines the number of output nodes, which explains why this approach is important for obtaining better results. The number of hidden-layer neurons in the base layer and the particle size also affect the structure. Therefore, an improved artificial intelligence method, called the boosted osprey optimizer, is used to produce a set of potential initial structures. Figure 1 illustrates a typical Deep Belief Network architecture.

Deep Belief Networks (DBNs) represent a sophisticated deep learning architecture composed of a series of Restricted Boltzmann Machines (RBMs). This layered structure allows each subsequent layer to learn increasingly intricate features compared to its predecessors. DBNs exemplify a compelling intersection of unsupervised learning techniques and neural network frameworks. In the context of deep learning, DBNs are recognized for their adaptability and effectiveness. Their distinctive architecture, characterized by multiple RBM layers, differentiates them from other deep learning methodologies, such as autoencoders and standard RBMs, which typically process raw input data. DBNs initiate with an input layer that contains one neuron for each element of the input vector, progressing through several layers before culminating in an output layer that generates results based on the probabilistic information derived from the preceding layers42.

The foundational design of a DBN is anchored in a sequence of RBMs, where each RBM functions as a generative model that learns the probability distribution of the input data. As one advances through the layers of the DBN, it becomes evident that the first layer captures the essential characteristics of the data, while the subsequent layers concentrate on more abstract features. In applications such as classification or regression, the final layer of the DBN is instrumental in producing the desired outputs. A basic block diagram illustrating the architecture of a DBN is presented as follows:

Input Layer
↓
Hidden Layer 1 (RBM)
↓
Hidden Layer 2 (RBM)
↓
…
↓
Hidden Layer N-1 (RBM)
↓
Output Layer (Supervised Learning)

In this representation:

  • The “Input Layer” serves as the initial stage, comprising one neuron for each element of the input vector.

  • “Hidden Layer 1” functions as the first Restricted Boltzmann Machine (RBM), which captures the essential patterns within the data.

  • “Hidden Layer 2” and the subsequent layers consist of additional RBMs that extract increasingly abstract features as the data progresses through the network. The number of hidden layers can vary based on the complexity of the problem at hand.

  • The “Output Layer” is designated for supervised learning applications, such as classification or regression tasks.

The arrows denote the direction of information flow between layers, while the connections between neurons in adjacent layers signify the weights that are adjusted during the training process.

We train the Deep Belief Network (DBN) coupled with the Boosted Osprey Optimization Algorithm (BOOA) on our training dataset. The DBN-BOOA method identifies the most effective parameters, and constructs the network structure of the biogas production optimization model using these parameters. We then initialize multiple Restricted Boltzmann Machines (RBMs) using the contrastive divergence method. After the pre-training phase, we stack the RBMs to form the DBN, and fine-tune it using error back-propagation. The final biogas production optimization model relies on the DBN architecture that we developed and implemented using our method. Figure 2 illustrates the process of improving a method for optimizing biogas production.
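The pre-train-then-fine-tune pipeline just described can be sketched as follows, using scikit-learn's BernoulliRBM for the contrastive-divergence pre-training and a small regressor head for the supervised stage. The layer sizes are illustrative, only the regression head is fine-tuned here (a full DBN would also back-propagate into the RBM weights), and the BOOA structure search itself is omitted.

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM, MLPRegressor

def build_dbn(X_train, y_train, hidden_sizes=(64, 32)):
    """Greedy layer-wise RBM pre-training followed by supervised fine-tuning of a head."""
    rbms, layer_input = [], X_train
    for n_hidden in hidden_sizes:
        rbm = BernoulliRBM(n_components=n_hidden, learning_rate=0.05,
                           n_iter=30, random_state=0)
        layer_input = rbm.fit_transform(layer_input)   # contrastive-divergence pre-training
        rbms.append(rbm)
    # Back-propagation fine-tuning on top of the stacked RBM features
    head = MLPRegressor(hidden_layer_sizes=(16,), max_iter=2000, random_state=0)
    head.fit(layer_input, y_train)
    return rbms, head

def predict(rbms, head, X):
    h = X
    for rbm in rbms:
        h = rbm.transform(h)
    return head.predict(h)

# Placeholder data in [0, 1] (e.g. min-max-scaled digester measurements)
rng = np.random.default_rng(0)
X, y = rng.random((144, 8)), rng.random(144)
rbms, head = build_dbn(X, y)
print(predict(rbms, head, X[:3]))
```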

Fig. 1. The architecture of DBN.

Fig. 2. The process of improving a method for optimizing biogas production.

Equation (2) describes the performance measure used as the fitness criterion of the boosted osprey optimizer in the Deep Belief Network/boosted osprey optimizer framework:

$$FF=\left(1-\alpha_{1}+\alpha_{2}+\alpha_{3}+\alpha_{4}+\alpha_{5}\right)\times N_{E}+\alpha_{1}\times\frac{n}{Y^{\max}}+\alpha_{2}\times\frac{\sum_{i}x_{i}}{z\times M^{\max}}+\alpha_{3}\times FPA+\alpha_{4}\times FNA+\alpha_{5}\times RA$$
(2)

where the number of errors is denoted by \(N_{E}\), the value of the middle layer in the DBN is indicated by \(z\), the total number of middle layers is represented by \(\sum_{i}x_{i}\), and the maximum size of the Deep Belief Network is given by \(Y^{\max}\). The weight coefficients \(\alpha_{1}\) through \(\alpha_{5}\) vary from 0 to 1. Finally, the recognition amount (RA), the false positive amount (FPA), and the false negative amount (FNA) are obtained as follows:

$$RA=\frac{FN}{FN+TP}$$
(3)
$$FPA=\frac{FP}{TP+FP}$$
(4)
$$FNA=\frac{FN}{FN+TN}$$
(5)

Several evaluation measures are available to assess the recognition technique, such as the false negative amount, the recognition amount, accuracy (ACC), and the false positive amount. These measures can help to improve the network settings; the fitness value decreases as the optimization improves.

$$ER=1-ACC$$
(6)

ACC indicates the accuracy rate of classification and is calculated as follows:

$$ACC=\frac{TP+TN}{TP+TN+FP+FN}$$
(7)

In this classification setting, normal and abnormal data that are correctly classified are denoted by TN and TP, respectively, while normal and abnormal data that are incorrectly classified are referred to as false positives (FP) and false negatives (FN), respectively. Figure 3 illustrates how the error parameters are derived from the actual and predicted data.

Fig. 3. Specifying the error parameter from predicted and actual data.

A model that reliably forecasts biogas production resulting from the anaerobic digestion of municipal wastewater is characterized by several key terms. Specifically:

True Positive (TP) refers to accurate predictions when the model correctly anticipates biogas production.

Conversely, a False Negative (FN) occurs when the model fails to predict biogas production despite it being present.

A False Positive (FP) arises when the model incorrectly predicts biogas production where none should occur.

Lastly, a True Negative (TN) signifies the model’s accurate classification of situations without biogas production. To achieve this, the method employs Eq. (2) to dynamically adjust the deep belief network’s structure, determining hidden layer depth and overall network configuration. Subsequently, an optimization algorithm fine-tunes the operating parameters to maximize biogas production.
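The error terms feeding the fitness function can be illustrated with the short sketch below, which evaluates Eqs. (3) through (7) from confusion-matrix counts; the counts are placeholders, and the network-size terms of Eq. (2) are not included.

```python
def error_rates(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Confusion-matrix based measures used in the fitness function (Eqs. 3-7)."""
    acc = (tp + tn) / (tp + tn + fp + fn)
    return {
        "RA":  fn / (fn + tp),      # recognition amount, Eq. (3) as stated in the text
        "FPA": fp / (tp + fp),      # false positive amount, Eq. (4)
        "FNA": fn / (fn + tn),      # false negative amount, Eq. (5)
        "ACC": acc,                 # accuracy, Eq. (7)
        "ER":  1.0 - acc,           # error rate, Eq. (6)
    }

# Hypothetical counts from validating the biogas-production classifier
print(error_rates(tp=120, tn=40, fp=10, fn=10))
```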

A DBN consists of a backpropagation (BP) neural network and restricted Boltzmann machines. An RBM is a two-layer neural network with a visible (input) layer and a hidden layer, with no direct connections between nodes within the same layer. The training process of a DBN starts with the bottom layer, where the input data are fed, and then moves up gradually through each RBM layer to gather information for the hidden layer, which serves as the input for the next network; this can be either a BP neural network or another RBM. The hidden layers are composed of n RBMs, and the \(d^{th}\) hidden layer is denoted \(h_{d}\) (where d goes from 1 to n).

The number of nodes in this layer is denoted Num(\(h_{d}\)). A DBN has an output layer, an input layer, and a specific number of neuron nodes in each hidden layer. The number of input-layer nodes is determined by the number of features, and the number of output-layer nodes is determined by the total number of data categories. This study uses a modified version of the metaheuristic, called the Boosted Osprey Optimizer (BOO), to optimize the network structure and find the optimal number of hidden layers and Num(\(h_{d}\)).

Osprey optimization algorithm

The Osprey Optimization Algorithm is composed of two phases: broad-scale probing and localized progression. The initialization of the population in the conventional Osprey Optimization Algorithm mirrors that of other meta-heuristic algorithms.

Initializing the osprey population: each osprey’s location serves as a potential solution to the problem. A matrix \(X\) of dimensions \(N\times D\), constructed from the locations of the N ospreys, is utilized as the initial osprey population. The location of every osprey is determined randomly, as per Eq. (8):

$$X_{i,j}=lb_{j}+r_{i,j}\cdot\left(ub_{j}-lb_{j}\right),\ i=1,2,\dots,N;\ j=1,2,\dots,D$$
(8)

where \(\:{X}_{i,j}\) represents the starting location of the \(\:{i}^{th}\) osprey in the \(\:{j}^{th}\) dimension, while \(\:{lb}_{j}\) and \(\:{ub}_{j}\) denote the minimum and maximum limits of the jth variable of the problem, respectively. \(\:{r}_{i,j}\) is a random value within the range (0, 1); \(\:N\) signifies the total number in the population; \(\:D\) is the dimensionality of the problem’s solution; and \(\:j\) corresponds to the \(\:{j}^{th}\) dimension.

Each osprey symbolizes a possible answer to the problem, with the suitability of each answer being evaluated based on the fitness function F. The calculation of the cost value was performed using Eq. (9):

$$F_{i}=F\left(X_{i}\right),\ i=1,2,\dots,N$$
(9)

The cost value of the \(\:{i}^{th}\) osprey is denoted as \(\:{F}_{i}\), while its location is represented by \(\:{X}_{i}\).

Locating and Hunting (stage 1); Ospreys are known for their diet of fish. Once a fish is spotted underwater, the osprey launches an attack and dives into the water to capture its prey. This action results in a significant shift in the osprey’s position within the search space, marking the global exploration phase of the Osprey Optimization Algorithm (OOA). In the OOA, this behavior is simulated, and the positions of other ospreys with superior cost values in the search space are perceived as underwater fish by each osprey. Consequently, the location of every osprey is represented in Eq. (10).

$$FL_{i}=\left\{X_{s}\mid s\in\{1,2,\dots,N\}\ \wedge\ F_{s}<F_{i}\right\}\cup\left\{X_{F}\right\},\ i=1,2,\dots,N$$
(10)

where \(\:{FL}_{i}\) is the set of locations for the \(\:{i}^{th}\) osprey, \(\:N\) is the total number of ospreys, and \(\:{F}_{s}\)​ and \(\:{F}_{i}\)​ are the cost values of the \(\:{s}^{th}\) and \(\:{i}^{th}\) osprey, respectively. \(\:{X}_{F}\)​ is the finest osprey’s location. The osprey randomly finds and attacks a fish in the search space. The following formula shows how the osprey’s position changes when it moves towards the fish:

$$X_{i,j}^{L1}=r_{i,j}\cdot\left(CF_{i,j}-K_{i,j}\cdot X_{i,j}\right)+X_{i,j},\ i=1,2,\dots,N;\ j=1,2,\dots,D$$
(11)

The new location of the \(i^{th}\) osprey in phase one is denoted as \(X_{i}^{L1}\), with \(X_{i,j}^{L1}\) representing its \(j^{th}\) component. \(X_{i}\) denotes the initial location of the \(i^{th}\) osprey, and \(X_{i,j}\) signifies its \(j^{th}\) component. The fish selected by the \(i^{th}\) osprey is represented as \(CF_{i}\), with \(CF_{i,j}\) denoting its \(j^{th}\) dimension. \(r_{i,j}\) is a random value within the range (0, 1), while \(K_{i,j}\) is randomly selected from the set {1, 2}.

Equation (12) is used for the boundary when the new position goes beyond it. The new position becomes the lower bound value if it is smaller than the problem’s lower limit. The new position becomes the upper bound value if it is bigger than the problem’s upper limit.

$$X_{i,j}^{L1}=\left\{\begin{array}{ll}X_{i,j}^{L1}, & lb_{j}\le X_{i,j}^{L1}\le ub_{j}\\ lb_{j}, & X_{i,j}^{L1}<lb_{j}\\ ub_{j}, & X_{i,j}^{L1}>ub_{j}\end{array}\right.$$
(12)

In the case where the updated location, as determined by Eqs. (11) and (12), yields a superior cost value, it supersedes the previous position. This operation is illustrated in Eq. (13), ultimately resulting in the acquisition of the osprey’s updated location.

$$X_{i}^{1}=\left\{\begin{array}{ll}X_{i}^{L1}, & F_{i}^{L1}<F_{i}\\ X_{i}, & F_{i}^{L1}\ge F_{i}\end{array}\right.$$
(13)

where \(\:{F}_{i}^{L1}\) refers to the cost value of the \(\:{i}^{th}\) osprey’s updated position following phase one. Meanwhile, \(\:{X}_{i}^{1}\) denotes the position occupied by said osprey after the completion of phase one.

Position the Fish at the Right Place (Stage 2); After catching a fish, the osprey moves it to a safe place for eating. This changes the osprey’s location slightly in the search area, which helps the OOA to explore locally. This is called the local development phase, and the position change follows Eq. (14). Like the global exploration step, this phase also needs boundary processing operations, shown by Eq. (15).

$$X_{i,j}^{L2}=\frac{lb_{j}+r_{i,j}\cdot\left(ub_{j}-lb_{j}\right)}{t}+X_{i,j}^{1},\ i=1,2,\dots,N;\ j=1,2,\dots,D;\ t=1,2,\dots,T$$
(14)
$$X_{i,j}^{L2}=\left\{\begin{array}{ll}X_{i,j}^{L2}, & lb_{j}\le X_{i,j}^{L2}\le ub_{j}\\ ub_{j}, & X_{i,j}^{L2}>ub_{j}\\ lb_{j}, & X_{i,j}^{L2}<lb_{j}\end{array}\right.$$
(15)

where \(X_{i}^{L2}\) represents the updated location of the \(i^{th}\) osprey during stage 2, with \(X_{i,j}^{L2}\) denoting its \(j^{th}\) dimension. The variable \(r_{i,j}\) denotes a randomly generated number within the range (0, 1). Additionally, \(t\) denotes the current iteration count of the algorithm, while \(T\) signifies the maximum number of iterations allowed. As in the global exploration phase, if the recalculated position demonstrates an improved cost value per Eqs. (14) and (15), it supplants the prior position. As per Eq. (16), the new location of the osprey is derived at this step:

$$X_{i}^{2}=\left\{\begin{array}{ll}X_{i}^{L2}, & F_{i}^{L2}<F_{i}^{L1}\\ X_{i}^{1}, & F_{i}^{L2}\ge F_{i}^{L1}\end{array}\right.$$
(16)

where \(X_{i}^{2}\) indicates the location of the osprey following stage 2, and \(F_{i}^{L2}\) is the cost value of location \(X_{i}^{L2}\). After the previously described phases are finished, the Osprey Optimization Algorithm (OOA) updates the positions of every osprey and performs population calculations iteratively until the maximum number of iterations is reached or the optimal solution to the problem is obtained.
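A compact sketch of the two OOA stages described above, written for a generic minimization problem; the population size, iteration count, and test objective are illustrative rather than the settings used in this study.

```python
import numpy as np

def ooa(objective, dim=2, n=30, t_max=200, lb=-100.0, ub=100.0, seed=0):
    """Plain Osprey Optimization Algorithm: exploration (Eq. 11) + local development (Eq. 14)."""
    rng = np.random.default_rng(seed)
    X = lb + rng.random((n, dim)) * (ub - lb)                  # Eq. (8)
    F = np.apply_along_axis(objective, 1, X)                   # Eq. (9)
    for t in range(1, t_max + 1):
        best = X[F.argmin()].copy()
        for i in range(n):
            # Stage 1: attack a randomly chosen "fish" (a better osprey, or the global best)
            better = np.flatnonzero(F < F[i])
            fish = best if better.size == 0 else X[rng.choice(better)]
            k = rng.integers(1, 3, dim)                        # K in {1, 2}
            x1 = np.clip(X[i] + rng.random(dim) * (fish - k * X[i]), lb, ub)      # Eqs. (11)-(12)
            f1 = objective(x1)
            if f1 < F[i]:
                X[i], F[i] = x1, f1                            # Eq. (13)
            # Stage 2: carry the fish to a suitable position (local refinement)
            x2 = np.clip(X[i] + (lb + rng.random(dim) * (ub - lb)) / t, lb, ub)   # Eqs. (14)-(15)
            f2 = objective(x2)
            if f2 < F[i]:
                X[i], F[i] = x2, f2                            # Eq. (16)
    return X[F.argmin()], F.min()

# Example run on the sphere function
best_x, best_f = ooa(lambda x: float(np.sum(x ** 2)))
print(best_x, best_f)
```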

Boosted Osprey optimization algorithm (BOOA)

  • Sobol sequence utilization for initial population setup.

In the conventional osprey optimization technique, the initial population is set up randomly. Nonetheless, an enhancement has been made by integrating a population-initialization method that utilizes the Sobol sequence within the BOOA structure. The resulting distribution of osprey population positions is given by the following equation:

$$X_{i,j}=M_{i,j}\left(ub_{j}-lb_{j}\right)+lb_{j},\ i=1,2,\dots,N;\ j=1,2,\dots,D$$
(17)

where \(\:{M}_{i,j}\) denotes a value from the Sobol sequence that falls within the (0, 1) interval. In this research, the dimension of the search space is defined as \(\:D\) = 2 and the size of the population as \(\:N\:\)= 100. The lower limit is established at 0, while the upper limit is fixed at 1. The initial population formed using the Sobol sequence demonstrates a superior degree of uniformity and diversity.
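A brief sketch of Sobol-based initialization using SciPy's quasi-Monte Carlo module, with the D = 2, N = 100 settings and 0-1 bounds quoted above; the power-of-two draw is a practical detail of the Sobol generator, not part of the published method.

```python
import numpy as np
from scipy.stats import qmc

D, N = 2, 100                       # dimensions and population size quoted in the text
lb, ub = np.zeros(D), np.ones(D)    # lower and upper limits (0 and 1)

sampler = qmc.Sobol(d=D, scramble=True, seed=0)
M = sampler.random_base2(m=7)[:N]   # 2**7 = 128 Sobol points in (0, 1); keep the first N
X0 = lb + M * (ub - lb)             # Eq. (17): map Sobol points onto the search bounds
print(X0.shape)
```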

  • Step size determination using Weibull distribution.

In the structure of BOOA, random values that adhere to a Weibull distribution are integrated as a factor determining the step size during the position update phase of the first stage in the osprey optimization algorithm. The equation below represents the probability density function of the Weibull distribution:

$$f\left(x;\eta,s\right)=\left\{\begin{array}{ll}0, & x<0\\ \frac{s}{\eta}\left(\frac{x}{\eta}\right)^{s-1}e^{-\left(\frac{x}{\eta}\right)^{s}}, & x\ge 0\end{array}\right.$$
(18)

where \(\:x\) stands for a variable that takes on random values, \(\:\eta\:\:>0\:\)is the scale variable, and \(\:s>0\) signifies the variable of shape. The Weibull distribution is incorporated into the location update Formula (11) during the first phase of the osprey optimization algorithm, leading to an updated equation for position updates as shown in Eq. (19):

$$X_{i,j}^{L1}=X_{i,j}+wblrnd_{i,j}\cdot\left(CF_{i,j}-K_{i,j}\cdot X_{i,j}\right),\ i=1,2,\dots,N;\ j=1,2,\dots,D$$
(19)

where \(\:{wblrnd}_{i,j}\) is indicative of a random step factor that adheres to the Weibull distribution, characterized by a scale variable of 1 and a shape variable of 0.5.
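The Weibull step factor of Eq. (19) can be drawn as in the short sketch below; NumPy's Weibull generator produces unit-scale samples, so the scale parameter of 1 quoted above is applied explicitly.

```python
import numpy as np

rng = np.random.default_rng(0)
shape, scale = 0.5, 1.0            # shape and scale parameters quoted in the text
N, D = 100, 2

# Weibull-distributed step factors for the stage-1 position update (Eq. 19)
wblrnd = scale * rng.weibull(shape, size=(N, D))
print(wblrnd.mean())
```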

  • Firefly Disturbance.

After the second phase of the osprey optimization algorithm is finished, a disturbance method inspired by the Firefly Algorithm (FA) is implemented for each osprey population in this study. The main factors taken into account when modifying a firefly's position are changes in light intensity and attraction. In the FA, the light intensity \(I\) varies monotonically and exponentially with the distance \(r\), and the corresponding fluorescence intensity \(I\) is formulated as:

$$I=I_{0}\cdot e^{-\gamma r_{i,j}}$$
(20)

Where \(\:{I}_{0}\) represents the peak light intensity, while \(\:\gamma\:\) denotes the light intensity absorption coefficient, a value set to 0.01 in this investigation. The inter-firefly attraction is directly linked to the light intensity observed by adjacent fireflies, with the attraction factor \(\:\alpha\:\) expressed as:

$$\alpha=\alpha_{0}\cdot e^{-\gamma r_{i,j}^{2}}$$
(21)

where \(\:{\alpha\:}_{0}\) refers to the attraction at \(\:r=0\), which is set to 1 for this research. Firefly i is drawn towards a more appealing and brighter firefly j, and upon its introduction to the second phase of the OOA, it updates its location utilizing Eq. (22). Subsequently, boundary processing is performed in accordance with Eq. (23):

$$X_{i}^{FA}=X_{i}^{2}+\alpha\left(X_{j}^{2}-X_{i}^{2}\right)+\beta\left(r-\frac{1}{2}\right)$$
(22)
$$X_{i,j}^{FA}=\left\{\begin{array}{ll}ub_{j}, & X_{i,j}^{FA}>ub_{j}\\ lb_{j}, & X_{i,j}^{FA}<lb_{j}\\ X_{i,j}^{FA}, & lb_{j}\le X_{i,j}^{FA}\le ub_{j}\end{array}\right.$$
(23)

where \(\:{X}_{i}^{2}\) and \(\:{X}_{j}^{2}\) represent the locations of firefly \(\:i\) and \(\:j\), respectively, after completing the second stage. \(\:{X}_{i}^{FA}\) denotes the locations after the firefly disturbance. The parameter \(\:\beta\:\) falls within the range of (0, 1) and is set to 0.2 in this study. The variable \(\:r\) follows a uniform distribution (0, 1). If the updated location, determined by Eqs. (22) and (23), yields a superior cost value, it replaces the previous location. By utilizing Eq. (24), the updated location of the osprey at this stage can be achieved:

$$X_{i}^{F}=\left\{\begin{array}{ll}X_{i}^{2}, & F_{i}^{FA}\ge F_{i}^{L2}\\ X_{i}^{FA}, & F_{i}^{FA}<F_{i}^{L2}\end{array}\right.$$
(24)

where \(\:{F}_{i}^{FA}\) represents the cost value of the location \(\:{X}_{i}^{FA}\). Furthermore, \(\:{X}_{i}^{F}\) corresponds to the location of the \(\:{i}^{th}\) osprey subsequent to the disturbed stage. The suggested BOO algorithm process is shown in Fig. 4.

Fig. 4. The process of BOOA.
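For completeness, a small sketch of the firefly-style disturbance of Eqs. (20) to (23), using the γ = 0.01, α0 = 1, and β = 0.2 values stated above; the two stage-2 positions are placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
gamma, alpha0, beta = 0.01, 1.0, 0.2      # values used in this study
lb, ub = 0.0, 1.0

def firefly_disturbance(x_i, x_j):
    """Move osprey i towards a brighter osprey j (Eqs. 21-23)."""
    r_ij = np.linalg.norm(x_i - x_j)
    alpha = alpha0 * np.exp(-gamma * r_ij ** 2)            # attraction, Eq. (21)
    r = rng.random(x_i.shape)
    x_new = x_i + alpha * (x_j - x_i) + beta * (r - 0.5)   # disturbed position, Eq. (22)
    return np.clip(x_new, lb, ub)                          # boundary handling, Eq. (23)

# Placeholder stage-2 positions of two ospreys
print(firefly_disturbance(np.array([0.2, 0.8]), np.array([0.5, 0.4])))
```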

Validation of the proposed algorithm

The performance of the proposed BOOA was evaluated using different test functions. The test routines came from the “CEC-BC-2017 test suite”43. The effectiveness of the recommended method was measured by F1-F10. In addition, various newly developed metaheuristic algorithms were applied for validation and comparison purposes44. The applied algorithms include some recently presented metaheuristic algorithms, such as Butterfly Optimization Algorithm (BOA)45, Dwarf Mongoose Optimization Algorithm (DMOA)46, and Honey Badger Algorithm (HBA)47. The parameter settings are shown in Table 2.

Table 2 Parameter settings of the algorithms.

The parameters are set to be as close as feasible to ensure a fair comparison. The population size is 200 and the maximum number of iterations is 300 in this case49. Each algorithm was run 40 times on each function to ensure consistency. The solution range for all test problems is -100 to 100, and all test functions are in 10 dimensions50. Table 3 displays the experimental findings of the analyzed algorithms using mean and standard deviation values. The Boosted Osprey Optimization Algorithm (BOOA) demonstrates high accuracy across all benchmark functions, as indicated in Table 3. The results also indicate that the Boosted Osprey Optimization Algorithm has a low standard deviation, making it reliable across multiple runs.

Table 3 The experimental results of all algorithms on the considered test suite.

Optimization of anaerobic digestion systems at the NJWTP

Enhancing the anaerobic digestion process at the MWTP could increase biogas generation, leading to more economical and dependable energy production. Outputs from data-oriented models can offer appropriate criteria for optimization methods, such as evolutionary algorithms (EA), to identify the most effective operational variables that boost biogas output. The Boosted Osprey Optimization Algorithm (BOOA) is a derivative-free search algorithm that is applicable to a wide spectrum of continuous and discrete fitness functions; it identifies the best solution from a vast array of potential solutions. The BOOA begins with an initial population of ospreys, each defined by features that represent different operating parameters of anaerobic digestion (for instance, pH and TS (%)). Each osprey, associated with a biogas production rate, signifies a potential solution. The superior solutions are assessed utilizing the provided fitness function and are chosen to undergo position updates and perturbations, leading to the creation of a new population. Position updates are generally applied to the ospreys, transferring their strengths to expedite BOOA convergence. To avoid getting stuck at local optima and to help reach a global optimum, perturbations are applied to a select number of ospreys by altering all or some features. After several iterations, the BOOA's convergence stabilizes, indicating that the optimal solution has been found. The algorithm's processing time is influenced by factors such as its structural complexity, initial population size, the coefficients of position update and perturbation, and non-differentiable fitness functions, which may extend the computation time. More comprehensive descriptions of BOOA can be found in other research studies51.

Results and discussion

Evaluating biogas production models

To thoroughly examine the models for estimating biogas output (DBN, DBN-OOA, DBN-BOOA), 80% of the data points were used for training and 20% for verification. This was achieved by randomly dividing the whole data set into six parts and applying a cross-validation technique. The data-driven models were applied to each segment, and the overall model performance was determined by aggregating the results of all six trials. Three metrics were used for this assessment: the index of agreement (IA), the correlation coefficient (R), and the root mean square error (RMSE).
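A sketch of this splitting scheme, assuming the 180 NJWTP records are held in arrays X and y; six folds are used, and the mean of the training targets stands in for a trained DBN/DBN-OOA/DBN-BOOA predictor purely for illustration.

```python
import numpy as np
from sklearn.model_selection import KFold

rng = np.random.default_rng(0)
X = rng.random((180, 8))     # placeholder for the 180 NJWTP records (input variables)
y = rng.random(180)          # placeholder for the measured biogas flow rate (m3/min)

kf = KFold(n_splits=6, shuffle=True, random_state=0)
rmses = []
for train_idx, test_idx in kf.split(X):
    # A trained DBN/DBN-OOA/DBN-BOOA model would be fitted on X[train_idx] here;
    # the mean of the training targets is used below only as a stand-in predictor.
    y_pred = np.full(len(test_idx), y[train_idx].mean())
    rmses.append(np.sqrt(np.mean((y_pred - y[test_idx]) ** 2)))

print("fold RMSEs:", np.round(rmses, 3), "aggregate:", round(float(np.mean(rmses)), 3))
```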

  • Root Mean Square Error (RMSE).

$$RMSE=\sqrt{\frac{\sum_{i=1}^{n}\left(P_{i}-O_{i}\right)^{2}}{n}}$$
(25)

where \(\:{P}_{i}\:\)refers to the forecasted value, \(\:{O}_{i}\) denotes the actual value, and \(\:n\) signifies the total count of data points.

  • Correlation Coefficient (R).

$$R=\frac{\sum_{i=1}^{n}\left(X_{i}-\bar{X}\right)\left(Y_{i}-\bar{Y}\right)}{\sqrt{\sum_{i=1}^{n}\left(X_{i}-\bar{X}\right)^{2}\sum_{i=1}^{n}\left(Y_{i}-\bar{Y}\right)^{2}}}$$
(26)

where \(X_{i}\) and \(Y_{i}\) represent the individual data points indexed by \(i\), \(\bar{X}\) and \(\bar{Y}\) stand for the average values of the \(X\) and \(Y\) variables, respectively, and \(n\) indicates the total count of observations or data points.

  • Index of Agreement (IA).

$$IA=1-\frac{\sum_{i=1}^{n}\left(O_{i}-P_{i}\right)^{2}}{\sum_{i=1}^{n}\left(\left|P_{i}-\bar{O}\right|+\left|O_{i}-\bar{O}\right|\right)^{2}}$$
(27)

where \(O_{i}\) is the observed value, \(P_{i}\) is the forecast value, \(\bar{O}\) is the average of the observed values, and \(n\) is the total number of data points.
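The three metrics can be computed as in the short sketch below (Eqs. 25 to 27); the observed and predicted biogas flow rates are illustrative values only.

```python
import numpy as np

def rmse(obs, pred):
    return float(np.sqrt(np.mean((pred - obs) ** 2)))           # Eq. (25)

def corr(obs, pred):
    return float(np.corrcoef(obs, pred)[0, 1])                   # Eq. (26)

def index_of_agreement(obs, pred):
    denom = np.sum((np.abs(pred - obs.mean()) + np.abs(obs - obs.mean())) ** 2)
    return float(1.0 - np.sum((obs - pred) ** 2) / denom)        # Eq. (27)

# Illustrative observed vs. predicted biogas flow rates (m3/min)
obs = np.array([20.1, 22.4, 19.8, 24.0, 21.5])
pred = np.array([19.7, 22.9, 20.3, 23.4, 21.9])
print(rmse(obs, pred), corr(obs, pred), index_of_agreement(obs, pred))
```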

Biogas generation rate estimation using deep belief network and boosted osprey optimization algorithm model

Figure 5 displays the results of the DBN, DBN-OOA, and DBN-BOOA models in forecasting the biogas flow rate (m3/min) from the NJWTP anaerobic digesters under three distinct scenarios. The DBN-BOOA model showed better consistency between the collected data and the model's forecasts across all three scenarios when compared to the other models. The DBN-BOOA model yielded the most accurate estimations of the biogas production rate in Scenario 1, with statistical findings showing an R-value of 0.98, an RMSE of 0.015, and an IA of 0.99. This also implies that processing the input variables using correlations and PCA did not improve the models, and may even have made them worse. These results were somewhat expected, because anaerobic digestion data are nonlinear rather than linear.

Fig. 5. The results of the DBN, DBN-OOA, and DBN-BOOA models in forecasting the biogas flow rate (m3/min) from the NJWTP anaerobic digesters under three distinct scenarios.

This research was compared with previous studies by running different models for the first scenario. The method used and the model performance metrics (R, RMSE, IA) for each method are summarized in Table 4.

Table 4 Comparison of different methods for biogas production prediction.

According to the table, the most effective and efficient machine learning method for predicting biogas production from the input data is the DBN-BOOA. This method is a hybrid of a deep neural network, which is a type of artificial neural network with multiple layers of neurons that can learn complex patterns and features from the input data, and a metaheuristic optimization algorithm, which is a type of algorithm that can find near-optimal solutions to difficult optimization problems by exploring the search space in a stochastic and adaptive way. The DBN-BOOA method has the highest values of R and IA, and the lowest value of RMSE, which are the metrics that measure the accuracy and reliability of the machine learning models. The higher the values of R and IA, and the lower the value of RMSE, the better the performance of the model.

As mentioned earlier, three methods, DBN, DBN-OOA (Deep Belief Network with the Osprey Optimization Algorithm), and DBN-BOOA, were applied to find the combination of parameters that maximizes biogas production from the anaerobic digesters using real-world data. Figure 6 displays the outcomes of fitness function evolution and optimization utilizing DBN, DBN-OOA, and DBN-BOOA to enhance biogas production from the anaerobic digesters. The fitness function quantifies the effectiveness of a solution for a specific problem: the higher the fitness, the better the solution. The optimization process involves creating and testing different combinations of input variables until the best ones are found; each combination corresponds to an iteration, and the best one is the combination that yields the highest biogas flow rate.

Fig. 6. The outcomes of fitness function evolution and optimization utilizing DBN, DBN-OOA, and DBN-BOOA.

According to the figure, the maximum biogas flow estimates increased from the first iteration to the final iteration for each method, converging to values of about 24.38 m3/min for DBN, 26.5 m3/min for DBN-OOA, and 31.35 m3/min for DBN-BOOA. This means that process optimization significantly improves biogas production and that there is great potential to increase biogas yield at the NJWTP facilities. The corresponding optimized input variables are shown in Table 5, which lists the values of the input variables that lead to the maximum biogas flow rate for each method.

Table 5 The value of optimum input variables.

The DBN-BOOA method is the best for optimizing biogas production from the anaerobic digestion of municipal wastewater, based on the criteria and table data. It has the highest values for TS (6.28%), TS2 (9.84%), VS (78.9%), VS2 (75.7%), and VFA (769 mg/L), indicating a higher and balanced substrate concentration and availability for biogas production. It also maintains a neutral pH (7) and increases the recycling rate of WFS (531 m³/day) and TWAS (689 m³/day), resulting in more biogas and less sludge. Compared to the other methods and the measured operating ranges, the DBN-BOOA method provides the most optimal conditions for maximizing biogas production.

  • The issue of noisy data:

The issue of noisy data is a common challenge in real-world applications like MWTPs. Addressing measurement noise is critical to ensure the reliability and accuracy of model predictions and optimizations. The following strategies could be implemented to handle noisy data effectively:

  • Data Preprocessing and Filtering: Techniques such as moving averages, Gaussian filters, or Savitzky-Golay filters can be applied to smooth noisy data while retaining critical trends and patterns (a brief sketch is given at the end of this subsection).

  • Robust Feature Engineering: Employing robust statistical methods, such as median-based or quantile-based transformations, can help reduce the impact of outliers and noise on the model inputs.

  • Noise-Resilient Model Architectures: The DBN-BOOA model could be augmented with mechanisms that make it less sensitive to noise, such as dropout regularization during training or adversarial training to improve robustness against perturbed inputs.

  • Outlier Detection and Removal: Implementing methods like interquartile range (IQR) analysis or z-score analysis can identify and exclude extreme outliers that may result from measurement errors.

  • Data Augmentation: Generating synthetic data by introducing controlled noise could help the model learn to generalize better in the presence of noisy inputs.

  • Ensemble Approaches: Combining predictions from multiple models or using ensemble methods can help reduce the influence of noise on final predictions, as each model may respond differently to noisy data.

  • Advanced Noise Modeling: Techniques such as denoising autoencoders or robust optimization methods can be integrated to explicitly account for and mitigate the effects of measurement noise during the training and optimization phases.

These approaches not only help to address the challenges posed by noisy data but also enhance the overall performance and reliability of the DBN-BOOA model in real-world MWTP operations. Future work could incorporate these strategies to further refine the model and ensure its resilience against data noise.
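As a small illustration of the first two strategies listed above, the sketch below applies Savitzky-Golay smoothing and IQR-based outlier screening to a noisy measurement series; the window length, polynomial order, and 1.5 x IQR fence are conventional choices rather than values from this study.

```python
import numpy as np
from scipy.signal import savgol_filter

rng = np.random.default_rng(0)
signal = np.sin(np.linspace(0, 6, 180)) + rng.normal(0, 0.15, 180)   # stand-in for a noisy sensor series

# Savitzky-Golay smoothing: keeps the trend while damping high-frequency measurement noise
smoothed = savgol_filter(signal, window_length=11, polyorder=2)

# IQR-based outlier screening on the smoothed series
q1, q3 = np.percentile(smoothed, [25, 75])
iqr = q3 - q1
mask = (smoothed >= q1 - 1.5 * iqr) & (smoothed <= q3 + 1.5 * iqr)
cleaned = smoothed[mask]
print(f"kept {mask.sum()} of {len(signal)} points after outlier screening")
```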

  • Limitations:

Note that the assumption that input and output variables are independent and identically distributed (i.i.d.) simplifies model formulation and optimization but may not accurately reflect the dynamic and interdependent nature of variables in MWTP systems. In real-world applications, deviations from this assumption can introduce biases in the model, potentially reducing its accuracy and reliability. For instance, temporal correlations or system-level interactions that are not captured by the model could lead to suboptimal predictions or less robust optimization results. To mitigate this limitation, future research could explore advanced modeling techniques that explicitly account for the dependency and variability in MWTP systems. For example:

  i) Incorporating Temporal and Spatial Dependencies: Employing models like recurrent neural networks (RNNs) or long short-term memory (LSTM) networks could capture temporal dependencies in the data, reflecting the dynamic behavior of MWTP processes more accurately.

  ii) Using Probabilistic and Uncertainty-Aware Models: Bayesian frameworks or Monte Carlo simulations could help quantify uncertainties in the input and output variables, leading to more robust optimization and decision-making.

  iii) Feature Engineering and Data Augmentation: Developing new features or augmenting the dataset to include lagged variables or system-specific interdependencies could improve the model’s ability to capture non-i.i.d. behaviors.

Addressing this assumption would enhance the generalizability and applicability of the model, making it more robust and reliable for real-world MWTP operations. We acknowledge this as an area for improvement and a key direction for future research.

Incorporating uncertainty and variability into the DBN-BOOA model would indeed enhance its robustness and applicability under real-world operational conditions. To address this, several methodologies could be integrated into the model:

  1) Probabilistic Modeling: Extending the DBN-BOOA framework to include Bayesian inference could allow the model to quantify uncertainty in its predictions. Bayesian deep learning, for instance, could be employed to capture both epistemic uncertainty (due to model limitations) and aleatoric uncertainty (due to data variability).

  2) Monte Carlo Simulations: Performing Monte Carlo simulations within the DBN-BOOA optimization process could help evaluate the effects of input variability on output predictions. This would provide a range of possible outcomes, offering a more comprehensive understanding of system performance under uncertain conditions.

  3) Scenario Analysis: The DBN-BOOA model could be adapted to evaluate multiple operational scenarios, simulating different combinations of input variables to assess their impact on biogas production. This approach would help operators identify robust operating conditions that perform well across various scenarios.

  4) Stochastic Optimization: Incorporating stochastic elements into the BOOA algorithm could enable it to handle variability in the input data more effectively. This would involve optimizing the system over a range of probabilistic distributions for the input variables rather than fixed values.

  5) Ensemble Modeling: Combining the DBN-BOOA model with other machine learning models or ensemble techniques could reduce the impact of uncertainty by aggregating predictions and providing confidence intervals for the results.

These approaches could significantly enhance the model’s robustness by explicitly considering uncertainty and variability, ensuring that its predictions and optimizations remain reliable under varying operational conditions. Future work will focus on integrating these methodologies into the DBN-BOOA framework to further improve its utility for MWTP operators.

The most valuable contribution of this work lies in the synergistic integration of the model, procedure, and insights, each playing a pivotal role in advancing the research topic. However, among these, the procedure and its practical applicability stand out as the most impactful. The development of the DBN-BOOA framework provides a structured and efficient methodology for optimizing biogas production in MWTPs, combining advanced machine learning techniques with meta-heuristic optimization. This procedural innovation simplifies complex optimization tasks, eliminates the need for input variable pre-processing, and offers a user-friendly approach for MWTP operators. Additionally, the insights gained through the study—such as the identification of optimal operational parameters and the understanding of how these parameters influence biogas production—represent a significant contribution to the domain of waste-to-energy conversion. These insights not only validate the effectiveness of the DBN-BOOA model but also offer actionable knowledge that can be generalized to other facilities. While the model itself (DBN-BOOA) demonstrates superior accuracy and robustness, its value is amplified by the procedural framework and the actionable insights it generates. Together, these contributions establish a comprehensive approach to addressing the challenges of biogas optimization, making this work a valuable reference for both academic research and practical applications in MWTP operations.

Conclusion and future work

This study aimed to optimize biogas production from the anaerobic digestion of municipal wastewater at a MWTP, using data-driven modelling and optimization methods. Biogas production is a sustainable and cost-effective way of converting waste into energy, and optimizing its production can enhance the efficiency and performance of MWTPs. Three different models were developed and compared in this study: DBN, DBN-OOA, and DBN-BOOA. Developing the Deep Belief Network with input variables processed via correlations and PCA yielded poor results, so the DBN trained on raw inputs was coupled with OOA and BOOA for optimization purposes; the absence of pre-processing also makes these models simpler for potential MWTP end-users to apply. The DBN-BOOA model outperformed the other models in terms of accuracy and optimization, achieving a correlation coefficient (R) of 0.98, a root mean square error (RMSE) of 0.41 m3/min, and an index of agreement (IA) of 0.99 for biogas production, while the DBN and DBN-OOA models had R values of 0.89 and 0.95, RMSEs of 0.46 and 0.44, and IA values of 0.92 and 0.96, respectively. The DBN-BOOA model also found the optimal operating parameter values that maximized biogas production at 31.35 m3/min, which was higher than the values obtained by the DBN and DBN-OOA models (24.38 m3/min and 26.5 m3/min, respectively). The optimal values indicated that the DBN-BOOA method provided the most favorable conditions for biogas production, such as a high and balanced substrate concentration and availability, a neutral pH, and a high recycling rate of WFS and TWAS. The DBN-BOOA model performed better than the other models because it used a more powerful and flexible optimization algorithm that could explore a larger and more diverse search space and find the global optimum more efficiently and reliably. It also did not require any pre-processing of the input variables, which made it simpler and more convenient for use by potential MWTP end-users. Based on the criteria and data used in this study, the DBN-BOOA method can be considered the best method for optimizing biogas production from the anaerobic digestion of municipal wastewater; it can help MWTP operators to adjust the operating parameters in real time and achieve higher biogas yields and lower sludge production.

However, the DBN-BOOA method has some limitations and challenges that need to be addressed in future work. For instance, the method requires a large and reliable data set to train and validate the DBN model, which may not be available or accessible for some MWTPs; access to larger and more reliable data would therefore increase the accuracy of the predictions, whereas limited data access mainly reduces prediction accuracy rather than introducing more fundamental limitations. The method also assumes that the input and output variables are independent and identically distributed, which may not hold true in complex and dynamic systems such as MWTPs. Moreover, the method does not account for the uncertainty and variability of the input and output variables, which may affect the robustness and reliability of the optimization results. Therefore, future work should focus on overcoming these limitations and challenges, and on extending the applicability and generality of the DBN-BOOA method to other MWTPs and anaerobic digestion processes.
Future work should also evaluate the impact and value of the DBN-BOOA method for MWTP operators and society at large, in terms of energy savings, environmental benefits, and economic returns. In this paper, the proposed models were chosen because of their potential performance; however, it is recommended that more models be compared in future work.