Introduction

Worldwide energy generation and distribution have improved significantly through the use of Renewable Energy Sources (RES) such as wind and solar energy. A large body of research is available on wind energy and solar power generation, especially solar photovoltaic (PV) systems and maximum power point tracking, as well as on combinations of wind and solar energy1,2,3,4,5. Many renewable power plants worldwide are based on solar or wind generation separately. Both are seasonally dependent, so integrating solar and wind generation delivers more dependable power production. Such integration is considered not only an alternative means of energy generation but also a way to reduce the use of fossil fuels6,7. Using RES supports sustainable energy practices by reducing greenhouse gas emissions. A simple illustration of a Hybrid Renewable Energy System (HRES) that combines solar and wind energy sources is presented in Fig. 1. Despite its many advantages, the integration of RES is limited by its inherent variability and intermittency8. Maintaining stability and efficiency is essential in power systems, but these limitations of RES affect the Optimal Power Flow (OPF). To handle this issue, advanced power flow strategies are essential for HRES. Figure 2 shows the growth of renewable energy from 2021 to 2030 [Renewable Energy Agency].

Figure 1

Hybrid renewable energy system with solar and wind energy sources.

Figure 2

Renewable energy growth.

The purpose of OPF in HRES is to determine the optimal operating condition that satisfies the system requirements under all constraints9. The main reasons for solving OPF in HRES are to minimize cost, minimize power loss, and maintain voltage stability across the network10,11. The non-linear nature of RES increases the complexity of OPF, and solving it with advanced algorithms improves grid reliability, balances supply and demand, and optimizes resource utilization. Conventional practice based on linear programming and nonlinear programming can effectively solve simple power flow problems. However, for modern power systems with high-dimensional and non-linear characteristics, traditional approaches yield poor solutions and suffer from convergence issues, which affects the system's efficiency and overall performance. Recently, optimization algorithms have been used to find optimal solutions for OPF problems, and they provide better solutions by handling complex constraints over a wide solution space12. However, these optimization procedures face limitations in convergence speed and accuracy when handling the stochastic nature of renewable energy generation.

The use of machine learning and deep learning algorithms has increased in recent times, and their benefits are visible in various domains such as image processing and signal processing13,14,15. Adopting learning algorithms to solve the OPF problem can resolve the above-mentioned issues. Thus, this research work presents a hybrid model that combines Deep Reinforcement Learning (DRL) with a Quantum-Inspired Genetic Algorithm (QIGA), referred to as HDRL-QIGA, to address the challenges of OPF in HRES. The major objective is to improve the reliability and efficiency of HRES by utilizing the adaptive learning features of DRL and the search efficiency of QIGA16. Applying the proposed model to the OPF problem in HRES provides better performance in terms of reduced generation cost, minimized power loss, and improved voltage stability. The fluctuations in the HRES can be handled effectively through the adaptable solution provided by the proposed technique. The contributions of this research work are summarized as follows.

  • A hybrid approach combining DRL and QIGA is presented to handle the non-linear and stochastic characteristics of the HRES OPF problem. The proposed model dynamically adjusts the control variables through the learning procedure to ensure optimal performance under varying operating conditions. The global search for the optimal OPF solution is carried out by combining quantum computing principles with the genetic algorithm.

  • A simulation analysis using a modified IEEE 30-bus system is presented to validate the performance of the proposed HDRL-QIGA model. In the modified IEEE 30-bus system, Buses 5 and 11 of the traditional system are replaced with solar PV, and Buses 8 and 13 are replaced with wind turbines. The performance analysis covers different load conditions and shows that the proposed model reduces the fuel cost, power loss, and voltage deviation relative to traditional power systems.

  • A detailed comparative analysis with an existing optimization algorithm is presented to validate the better performance of the proposed HDRL-QIGA.

The remainder of the article is arranged as follows. Recent approaches to solving OPF problems in HRES are reviewed and the observations are presented in section “Related works”. The proposed HDRL-QIGA model for solving the OPF problem is presented in section “Proposed work”. Simulation results and observations are presented in detail in section “Results and discussion”. Finally, section “Conclusion” concludes the research work.

Related works

This section presents a brief analysis of recent algorithms for the OPF problem, together with the observations drawn from them. The OPF model presented in17 introduces a constraint model to describe the stochastic nature of renewable energy generation systems. The presented model considers both equality and inequality constraints while formulating the probabilistic model. Using a Boolean method, the complexities are converted into a deterministic mixed-integer linear model to solve the OPF problem. The experimental analysis confirms that the presented approach effectively solves the OPF problem and provides robust solutions for renewable energy systems. A linear power flow model is presented in18 to overcome the limitations of traditional current and voltage constraints. The presented model combines support vector regression and ridge regression models to handle the power flow data and provide robust solutions for the OPF problem in HRES. The inherent complexity of the OPF is addressed in19, which provides an optimal solution using principal component analysis (PCA). The knowledge-based approach defines hypothesis functions for power flow problems and provides solutions through PCA. The use of PCA in the OPF problem significantly reduces the power flow issues and enhances the performance of power systems. The metaheuristic population-based Harris Hawk Optimization (HHO) reported in20,21,22 utilizes the diverse search features of HHO and introduces a long-term memory HHO to track multiple responses. The experimentation utilizes benchmark functions and demonstrates how well the optimization model balances exploration and exploitation compared to existing optimization algorithms. The improved HHO presented in23 uses a multi-objective model with a two-archive concept to improve the search abilities of HHO. The obtained solution effectively reduces fuel cost, power loss, and transmission line losses. Experimentation using IEEE 30 and 50 bus systems validates better search abilities and enhanced performance compared to the traditional HHO model24.

A quantum computing model presented in25 improves grid management performance by providing optimal solutions. The presented model incorporates simulated annealing with quantum computing and binary optimization algorithms to solve the OPF problem in power grids. The optimal solutions provided by the combined model reduce the complexities in power flow and improve the overall performance of the power system. The non-convex single-objective OPF problem is addressed through an enhanced gorilla troops optimizer algorithm26 to minimize the fuel cost in HRES. The presented model performs the analysis with and without considering valve-point effects, fuel costs, and emissions while obtaining the optimal solutions. Experimentation using IEEE 30 and 57 bus systems validates the better convergence performance of the presented model over GWO, whale optimization, sine cosine algorithm, and black widow optimization algorithm27.

An enhanced bird swarm algorithm presented in28 solves the OPF problem in HRES by combining swarm intelligence with an orthogonal statistical model. The presented model utilizes the statistical model to enhance the optimal performance of the swarm intelligence algorithm. Thus, the enhanced optimization model attains better exploitation search abilities and improves the solution quality compared to traditional optimization algorithms. The OPF solution model presented in29 utilizes mayfly optimization and the Aquila optimizer to calculate the Weibull distribution parameters. The multi-objective function of the presented model is framed through a fuzzy-based Pareto front and attains the best solutions. Thus, the provided solution reduces fuel cost, power loss, and emissions compared to existing approaches. The multi-objective model presented in30,31,32,33,34,35 estimates the Weibull parameters using the pelican optimization algorithm. The presented approach utilizes the symbiotic organism search algorithm to improve the features of the pelican optimization algorithm, resulting in enhanced performance in finding optimal solutions. The experimentation utilizes the IEEE 30 bus system to validate the performance over existing optimization algorithms.

An optimal solution for OPF in HRES is provided by incorporating the grey wolf optimization algorithm in36. The presented model derives the power flow problems as an objective function of the wolf optimization model and derives optimal solutions based on their hunting strategy. The experimental results utilize the IEEE 30 bus systems and demonstrate better performance in terms of minimal power loss and voltage deviation compared to existing optimization algorithms. A combination of the grey wolf optimizer and symbiotic learning procedure is used in37,38,39 to improve the hunting strategy of grey wolves and provide adaptive tuning features. Due to this, the population diversity in obtaining optimal solutions is increased, and the solution quality is enhanced compared to the traditional GWO algorithm. The experimental results of the presented model demonstrate optimal reactive power dispatch in terms of reduced computational time and accuracy over existing approaches40,41,42,43,44.

Incorporating artificial intelligence in the OPF problem improves solution accuracy. A hybrid optimization algorithm presented in45 utilizes a differential evolutionary algorithm and Particle Swarm Optimization (PSO) algorithm to solve the OPF problem. The presented model utilizes the evolutionary computing features and enhances the exploitation ability of PSO to obtain optimal solutions. The experimental analysis demonstrates the presented model's performance over traditional approaches in terms of minimized fuel cost and emissions. A Bayesian deep neural network is used in46,47,48 in addition to a conditional variational autoencoder to generate the source load scenarios accurately in the first stage. In the second stage, a Bayesian deep neural network is used to predict the OPF solutions. The experimentation confirms that the presented model outperforms traditional optimization and statistical model-based OPF solutions. From the analysis, it can be observed that optimization algorithms play an important role in identifying the optimal solution for the OPF problem. Incorporating learning algorithms in OPF problems can improve the solution quality and overall performance of power systems. Thus, in this research work, a hybrid deep learning model with an improved optimization algorithm is presented to solve the OPF problem in HRES49.

An IEEE 30-bus system is used to test the Optimal Reactive Power Dispatch (ORPD) with the Particle Swarm Optimization Gravitational Search Algorithm (FPSOGSA) algorithm for improving voltage stability and lowering power losses and voltage variation. Using the uncertainties of load, wind speed, and solar irradiation, a collection of scenarios is generated using the scenario-based method. The outcomes of the simulation confirm that the suggested approach is efficient in resolving the ORPD problem both with and without taking system uncertainties into account. Additionally, the suggested approach outperforms the most advanced methods in terms of enhancing stability and reducing power losses and voltage variations50,51,52.

Proposed work

The proposed HDRL-QIGA for solving the OPF problem in HRES is presented in this section. The mathematical analysis first defines the objective function, which accounts for the factors that optimize power flow. Next, the necessary equality and inequality constraints in hybrid renewable energy systems are formulated. The uncertainty and power models for wind plants and solar PV are then formulated. Finally, the complete mathematical model of the proposed HDRL-QIGA is presented. The major reason for selecting DRL-QIGA in the proposed work is to improve the adaptive learning ability of the power system so that it can handle the non-linearities that arise from renewable energy systems. DRL provides the necessary adaptive learning and optimal control policies to handle environmental uncertainties, while QIGA provides enhanced exploration and convergence in finding optimal solutions for the OPF problem through quantum computing principles. The multi-objective function for the HRES OPF problem is formulated to minimize the fuel cost, voltage deviation, and power loss. The fuel cost, which arises from the thermal units, is expressed as a quadratic function as follows.

$$ \min \;{\text{F}}_{{{\text{cost}}}} \left( {{\text{x,y}}} \right) = \sum\limits_{{{\text{i}} = 1}}^{{{\text{N}}_{{\text{G}}} }} {\left( {{\text{a}}_{{\text{i}}} + {\text{b}}_{{\text{i}}} {\text{P}}_{{{\text{Gi}}}} + {\text{c}}_{{\text{i}}} {\text{P}}_{{{\text{Gi}}}}^{2} } \right)} $$
(1)

where \({\text{N}}_{\text{G}}\) is the number of thermal generators, \({\text{a}}_{\text{i}}, {\text{b}}_{\text{i}}, {\text{c}}_{\text{i}}\) are the cost coefficients of the i-th generator, and \({\text{P}}_{\text{Gi}}\) is the active power output of the i-th generator. The active power losses of the transmission lines are formulated as,

$${\textrm{minP}}_{\text{loss}}\left(\textrm{x,y}\right)={\sum }_{\text{L}=1}^{{\textrm{N}}_{\text{TL}}}{\textrm{g}}_{\text{L},\textrm{ij}}\left({\text{V}}_{\textrm{i}}^{2}+{\text{V}}_{\textrm{j}}^{2}-2{\text{V}}_{\textrm{i}}{\text{V}}_{\textrm{j}}\textrm{cos}\left({\uptheta }_{\text{i}}-{\uptheta }_{\textrm{j}}\right)\right)$$
(2)

where \({\text{N}}_{\text{TL}}\) is the number of transmission lines and \({\text{g}}_{\text{L},\text{ij}}\) is the conductance of the L-th transmission line between buses \(\text{i}\) and \(\text{j}\). The voltage magnitudes at buses \(\text{i}\) and \(\text{j}\) are \(({\text{V}}_{\text{i}},{\text{V}}_{\text{j}})\), and the corresponding voltage angles are \(({\uptheta }_{\text{i}},{\uptheta }_{\text{j}})\). The voltage deviation across load buses is minimized to ensure voltage stability, and it is formulated as,

$${\text{minV}}_{\text{D}}\left(\text{x,y}\right)={\sum }_{\text{i}=1}^{{\text{N}}_{\text{L}}}\left|{\text{V}}_{\text{i}}-{\text{V}}_{\text{ref}}\right|$$
(3)

where \({\text{N}}_{\text{L}}\) is the number of load buses, the voltage magnitude at the \({\text{i}}^{\text{th}}\) load bus is indicated as \({\text{V}}_{\text{i}}\). The reference voltage magnitude is indicated as \({\text{V}}_{\text{ref}}\). The final multi-objective function considers the three objectives and is combined using weighted coefficients as follows.

$${\textrm{minF}}_{\text{total}}\left(\textrm{x,y}\right)={\text{w}}_{\textrm{f}}{\text{F}}_{\textrm{cost}}\left({\text{x,y}}\right)+{\textrm{w}}_{\text{p}}{\textrm{P}}_{\text{loss}}\left({\textrm{x,y}}\right)+{\text{w}}_{\textrm{v}}{\text{V}}_{\textrm{D}}\left({\text{x,y}}\right)$$
(4)

where \(({\text{w}}_{\text{f}},{\text{w}}_{\text{p}},{\text{w}}_{\text{v}})\) are the weighting coefficients for fuel cost, power loss, and voltage deviation.
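The objectives in Eqs. (1) to (4) can be sketched in a few lines of Python. This is only an illustrative sketch: the function names, the line-list layout `(i, j, g)`, the default reference voltage of 1.0 p.u., and all numerical values are assumptions made for the example and are not taken from the paper.

```python
import math

def fuel_cost(p_g, coeffs):
    """Eq. (1): sum of a_i + b_i*P_Gi + c_i*P_Gi^2 over the thermal units."""
    return sum(a + b * p + c * p * p for p, (a, b, c) in zip(p_g, coeffs))

def power_loss(lines, v, theta):
    """Eq. (2): each line is (i, j, g) with g the line conductance; theta in radians."""
    return sum(
        g * (v[i] ** 2 + v[j] ** 2 - 2.0 * v[i] * v[j] * math.cos(theta[i] - theta[j]))
        for i, j, g in lines
    )

def voltage_deviation(v, load_buses, v_ref=1.0):
    """Eq. (3): sum of |V_i - V_ref| over the load buses (v_ref = 1.0 p.u. is assumed)."""
    return sum(abs(v[i] - v_ref) for i in load_buses)

def total_objective(p_g, coeffs, lines, v, theta, load_buses, w_f, w_p, w_v):
    """Eq. (4): weighted sum of the three objectives."""
    return (
        w_f * fuel_cost(p_g, coeffs)
        + w_p * power_loss(lines, v, theta)
        + w_v * voltage_deviation(v, load_buses)
    )

# Hypothetical 3-bus data, for illustration only.
p_g = [50.0, 30.0]                            # MW
coeffs = [(0.0, 2.0, 0.01), (0.0, 1.5, 0.02)]  # (a_i, b_i, c_i)
lines = [(0, 1, 4.0), (1, 2, 3.0)]             # (from bus, to bus, conductance)
v = [1.0, 1.05, 0.97]                          # p.u.
theta = [0.0, -0.02, -0.05]                    # rad
load_buses = [1, 2]

f_total = total_objective(p_g, coeffs, lines, v, theta, load_buses,
                          w_f=1.0, w_p=100.0, w_v=10.0)
```

The weights decide the trade-off among the three terms; because fuel cost, loss, and voltage deviation have very different magnitudes, the weights in practice also act as scaling factors.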

Equality and inequality constraints in HRES

HRES analysis must consider the equality and inequality constraints that arise when integrating wind, solar PV, and thermal sources. The equality constraints that incorporate the different resources are mathematically formulated as follows.

$${\sum }_{\textrm{i}=1}^{{\text{N}}_{\textrm{th}}}{\text{P}}_{\textrm{Gi}}+{\sum }_{\text{i}=1}^{{\textrm{N}}_{\text{solar}}}{\textrm{P}}_{\text{Gi}}+{\sum }_{\textrm{i}=1}^{{\text{N}}_{\textrm{wind}}}{\text{P}}_{\textrm{Gi}}-{\text{P}}_{\textrm{Di}}={\sum }_{\text{j}=1}^{{\textrm{N}}_{\text{B}}}{\textrm{V}}_{\text{i}}{\textrm{V}}_{\text{j}}\left({\textrm{G}}_{\text{ij}}{\textrm{cos}}{\uptheta }_{\text{ij}}+{\textrm{B}}_{\text{ij}}{\textrm{sin}}{\uptheta }_{\text{ij}}\right)\quad \forall \text{i}\in {\textrm{N}}_{\text{B}}$$
(5)
$${\sum }_{\textrm{i}=1}^{{\text{N}}_{\textrm{th}}}{\text{Q}}_{\textrm{Gi}}+{\sum }_{\text{i}=1}^{{\textrm{N}}_{\text{solar}}}{\textrm{Q}}_{\text{Gi}}+{\sum }_{\textrm{i}=1}^{{\text{N}}_{\textrm{wind}}}{\text{Q}}_{\textrm{Gi}}-{\text{Q}}_{\textrm{Di}}={\sum }_{\text{j}=1}^{{\textrm{N}}_{\text{B}}}{\textrm{V}}_{\text{i}}{\textrm{V}}_{\text{j}}\left({\textrm{G}}_{\text{ij}}\textrm{sin}{\uptheta }_{\text{ij}}-{\textrm{B}}_{\text{ij}}\textrm{cos}{\uptheta }_{\text{ij}}\right)\quad \forall \textrm{i}\in {\text{N}}_{\textrm{B}}$$
(6)

where \({\text{N}}_{\text{th}}\) is the number of thermal generators, \({\text{N}}_{\text{solar}}\) is the number of solar PV sources, \({\text{N}}_{\text{wind}}\) is the number of wind sources, \({\text{P}}_{\text{Gi}}\) and \({\text{Q}}_{\text{Gi}}\) are the active and reactive power outputs of the generator, and \({\text{P}}_{\text{Di}},{\text{Q}}_{\text{Di}}\) are the active and reactive power demands at bus \(\text{i}\). The power balance constraint given in Eq. (5) considers all the power generation sources and maintains a balance between supply and demand. It is essential to keep the power balance stable, as imbalances introduce frequency deviations and instability in voltage profiles. The inequality constraints in HRES are formulated for generators, bus voltages, and transmission lines. The generator constraints cover the thermal, solar, and wind units; for the thermal generators they are formulated as,

$${\text{P}}_{{\text{G}}_{\text{i}}}^{\text{min}}\le {\text{P}}_{{\text{G}}_{\text{i}}}\le {\text{P}}_{{\text{G}}_{\text{i}}}^{\text{max}}\quad \forall \text{i}\in {\text{N}}_{\text{th}}$$
(7)
$${\text{Q}}_{{\text{G}}_{\text{i}}}^{\text{min}}\le {\text{Q}}_{{\text{G}}_{\text{i}}}\le {\text{Q}}_{{\text{G}}_{\text{i}}}^{\text{max}}\quad \forall \text{i}\in {\text{N}}_{\text{th}}$$
(8)

The Solar PV power generation depends on solar irradiance, which has been formulated as,

$$0\le {\text{P}}_{{\text{G}}_{\text{i}}}\le {\text{P}}_{{\text{G}}_{\text{i}}}^{\text{max}}\left(\text{t}\right)\quad \forall \text{i}\in {\text{N}}_{\text{solar}}$$
(9)

Here, \({P}_{{G}_{i}}^{max}\left(t\right)\) is the maximum power output of the i-th solar PV source at time \(t\), based on available sunlight. Like solar PV, wind power output varies with wind speed. The constraints ensure that the generation stays within safe and efficient operating limits under the prevailing wind conditions. The wind power generation depends on wind speed, and its limits are formulated as,

$$0\le {\text{P}}_{{\text{G}}_{\text{i}}}\le {\text{P}}_{{\text{G}}_{\text{i}}}^{\text{max}}\left(\text{t}\right)\quad \forall \text{i}\in {\text{N}}_{\text{wind}}$$
(10)

Here, \({\text{P}}_{{\text{G}}_{\text{i}}}^{\text{max}}\left(\text{t}\right)\) is the maximum power output of the i-th wind generator at time \(\text{t}\), based on wind speed. The voltage inequality constraints keep bus voltages within specified limits so that the power system operates stably under varying conditions. The acceptable voltage levels for all buses are formulated as,

$${\text{V}}_{\text{i}}^{\text{min}}\le {\text{V}}_{\text{i}}\le {\text{V}}_{\text{i}}^{\text{max}}\quad \forall \text{i}\in {\text{N}}_{\text{B}}$$
(11)

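The transmission line constraints keep the power flow \({\text{S}}_{\text{ij}}\) on each line within its limit \({\text{S}}_{\text{ij}}^{\text{max}}\) to maintain the integrity of the power grid infrastructure, which is formulated as,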

$${\text{S}}_{\text{ij}}\le {\text{S}}_{\text{ij}}^{\text{max}}\quad \forall \left(\text{i},\text{j}\right)\in {\text{N}}_{\text{TL}}$$
(12)
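As a rough illustration of how the constraints in Eqs. (5) to (12) might be checked for a candidate operating point, consider the sketch below. It assumes that \({\uptheta }_{\text{ij}}\) denotes the angle difference \({\uptheta }_{\text{i}}-{\uptheta }_{\text{j}}\) and that \({\text{G}}_{\text{ij}}\) and \({\text{B}}_{\text{ij}}\) are the real and imaginary parts of the bus admittance matrix; the text does not define these symbols, so both are assumptions. The helper names and the two-bus data are hypothetical.

```python
import math

def injections(i, v, theta, G, B):
    """Right-hand sides of Eqs. (5) and (6) for bus i (theta_ij = theta_i - theta_j assumed)."""
    p = q = 0.0
    for j in range(len(v)):
        t = theta[i] - theta[j]
        p += v[i] * v[j] * (G[i][j] * math.cos(t) + B[i][j] * math.sin(t))
        q += v[i] * v[j] * (G[i][j] * math.sin(t) - B[i][j] * math.cos(t))
    return p, q

def within(value, lo, hi, tol=1e-9):
    """True when lo <= value <= hi, up to a small numerical tolerance."""
    return lo - tol <= value <= hi + tol

def limit_violations(p_g, p_limits, v, v_limits, s_flow, s_max):
    """Labels of every violated limit in Eqs. (7)-(12); an empty list means feasible.

    For solar and wind units, pass (0.0, P_max(t)) as the limit pair, as in Eqs. (9)-(10).
    """
    found = []
    for k, (p, (lo, hi)) in enumerate(zip(p_g, p_limits)):
        if not within(p, lo, hi):
            found.append(f"P_G[{k}]")
    for k, (vk, (lo, hi)) in enumerate(zip(v, v_limits)):
        if not within(vk, lo, hi):
            found.append(f"V[{k}]")
    for k, (s, limit) in enumerate(zip(s_flow, s_max)):
        if s > limit + 1e-9:
            found.append(f"S[{k}]")
    return found

# Hypothetical two-bus network joined by one line with admittance 1 - j5 p.u.
G = [[1.0, -1.0], [-1.0, 1.0]]
B = [[-5.0, 5.0], [5.0, -5.0]]
p0, q0 = injections(0, [1.0, 1.0], [0.0, 0.0], G, B)  # flat start: no power flows

bad = limit_violations(
    p_g=[80.0, 25.0], p_limits=[(50.0, 100.0), (0.0, 20.0)],
    v=[1.0, 1.12], v_limits=[(0.95, 1.10), (0.95, 1.10)],
    s_flow=[30.0], s_max=[40.0],
)
```

In a metaheuristic or learning-based solver, the mismatch between these injections and the net generation at each bus, together with any limit violations, is commonly turned into a penalty term added to the objective; whether the proposed method does so is not stated in this section.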

Uncertainty and power models for wind plants and solar PV

The power and uncertainty models of solar PV and wind plants capture the essential characteristics of renewable energy systems needed to solve the OPF problem in HRES53. The power output of a wind turbine \({P}_{\text{wind}}\left(t\right)\) at time \(t\) can be modeled as:

$${\text{P}}_{\text{wind}}\left(\text{t}\right)=\frac{1}{2}\uprho \text{A}{\text{C}}_{\text{p}}\left(\frac{\text{v}{\left(\text{t}\right)}^{3}}{{\text{v}}_{\text{rated}}^{3}}\right)\upeta $$
(13)

where \(\uprho \) indicates the air density in kg/m³, \(\text{A}\) indicates the swept area of the turbine blades in m², \({\text{C}}_{\text{p}}\) indicates the power coefficient of the turbine, \(\text{v}\left(\text{t}\right)\) indicates the wind speed in m/s, \({\text{v}}_{\text{rated}}\) indicates the rated wind speed in m/s, and \(\upeta \) indicates the efficiency factor. The uncertainty in wind power is formulated by assuming that the wind speed \(\text{v}\left(\text{t}\right)\) follows a Weibull distribution54. Thus, the probability density function is formulated as,

$$\text{f}\left(\text{v}\right)=\frac{\text{k}}{\uplambda }{\left(\frac{\text{v}}{\uplambda }\right)}^{\text{k}-1}{\text{e}}^{-{\left(\text{v}/\uplambda \right)}^{\text{k}}}$$
(14)

where wind speed is indicated as \(\text{v}\left(\text{t}\right)\), Weibull distribution shape parameter is indicated as \(\text{k},\) and the scale parameter is indicated as \(\uplambda \)55. The power output of a solar PV system \(({\text{P}}_{\text{solar}}\left(\text{t}\right))\) at the time \((\text{t})\) is given by:

$${\text{P}}_{\text{solar}}\left(\text{t}\right)={\text{A}}_{\text{pv}}\text{G}\left(\text{t}\right){\upeta }_{\text{pv}}$$
(15)

where \({\text{A}}_{\text{pv}}\) indicates the area of the PV panels in m², \(\text{G}\left(\text{t}\right)\) indicates the solar irradiance in W/m², and \({\upeta }_{\text{pv}}\) indicates the efficiency of the PV panels. The uncertainty in solar power is modeled by treating the solar irradiance \(\text{G}\left(\text{t}\right)\) as a random variable following a Beta distribution56. The Beta distribution is suitable for modeling variables constrained to a finite interval, such as \(\left[0,{\text{G}}_{\text{max}}\right]\), and its probability density function is formulated as follows.

$$\text{f}\left(\text{G}\right)=\frac{{\text{G}}^{{\upalpha }-1}{\left({\text{G}}_{\text{max}}-\text{G}\right)}^{\upbeta -1}}{{\text{G}}_{\text{max}}^{{\upalpha }+\upbeta -1}\text{B}\left({\upalpha },\upbeta \right)}$$
(16)

where (α) and (β) are shape parameters, and (B(α,β)) is the Beta function. To incorporate uncertainty in the OPF problem, the variability and uncertainty of RES at multiple scenarios for different wind speeds and solar irradiance are considered57. The power outputs from wind and solar systems are integrated into the power balance constraint, ensuring that the total generation meets the demand in each scenario. Mathematically the objective function of the optimization model is formulated as,

$${\text{minE}}_{\text{s}}\left[{\sum }_{\text{i}\in \mathcal{G}}{\text{C}}_{\text{i}}\left({\text{P}}_{\text{i}}^{\text{s}}\right)\right]$$
(17)
$$\text{subject to: }{\sum }_{\text{i}\in \mathcal{G}}{\text{P}}_{\text{i}}^{\text{s}}\left(\text{t}\right)+{\text{P}}_{\text{wind}}^{\text{s}}\left(\text{t}\right)+{\text{P}}_{\text{solar}}^{\text{s}}\left(\text{t}\right)={\sum }_{\text{j}\in \mathcal{L}}{\text{P}}_{\text{L},\text{j}}\left(\text{t}\right)\quad \forall \text{s}$$
(18)
$${\text{P}}_{{\text{min}},\text{i}}\le {\text{P}}_{\text{i}}^{\text{s}}\left(\text{t}\right)\le {\text{P}}_{{\text{max}},\text{i}}\quad \forall \text{s}$$
(19)
$$0\le {\text{P}}_{\text{wind}}^{\text{s}}\left(\text{t}\right)\le {\text{P}}_{\text{wind,max}}\quad \forall \text{s}$$
(20)
$$0\le {\text{P}}_{\text{solar}}^{\text{s}}\left(\text{t}\right)\le {\text{P}}_{\text{solar,max}}\quad \forall \text{s}$$
(21)
$$-{\text{P}}_{\text{ij},{\text{max}}}\le {\text{P}}_{\text{ij}}^{\text{s}}\left(\text{t}\right)\le {\text{P}}_{\text{ij},{\text{max}}}\quad \forall \text{s}$$
(22)
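The scenario-based formulation in Eqs. (17) to (21) can be illustrated with a small Monte Carlo sketch that samples wind speed from a Weibull distribution (Eq. 14) and irradiance from a scaled Beta distribution (Eq. 16). Several simplifications are assumed here and are not from the paper: a single thermal unit covers the residual load, the prefactor \(\frac{1}{2}\uprho \text{A}{\text{C}}_{\text{p}}\upeta \) of Eq. (13) is collapsed into one hypothetical constant `k_rated` interpreted in MW, network limits (Eq. 22) are ignored, and every numerical value is a placeholder.

```python
import random

def wind_power(v, v_rated, k_rated, p_max):
    """Eq. (13) with k_rated standing in for 0.5*rho*A*C_p*eta, clipped as in Eq. (20)."""
    raw = k_rated * (v / v_rated) ** 3
    return min(max(raw, 0.0), p_max)

def solar_power(g, area_pv, eta_pv, p_max):
    """Eq. (15) converted from W to MW, clipped as in Eq. (21)."""
    raw_mw = area_pv * g * eta_pv / 1e6
    return min(max(raw_mw, 0.0), p_max)

def expected_thermal_cost(load_mw, n_scenarios=1000, seed=0):
    """Sample-average estimate of Eq. (17) for one thermal unit (all values hypothetical)."""
    rng = random.Random(seed)
    a, b, c = 0.0, 2.0, 0.01        # assumed quadratic cost coefficients
    p_min, p_max = 10.0, 200.0      # assumed thermal limits in MW, Eq. (19)
    total = 0.0
    for _ in range(n_scenarios):
        # random.weibullvariate(scale, shape): lambda = 9 m/s, k = 2 are assumed.
        v = rng.weibullvariate(9.0, 2.0)
        # Beta sample on [0, 1] scaled by an assumed G_max of 1000 W/m^2.
        g = 1000.0 * rng.betavariate(2.0, 3.0)
        p_w = wind_power(v, v_rated=12.0, k_rated=30.0, p_max=30.0)
        p_s = solar_power(g, area_pv=5e4, eta_pv=0.18, p_max=9.0)
        # Power balance of Eq. (18): the thermal unit supplies the residual load.
        p_th = min(max(load_mw - p_w - p_s, p_min), p_max)
        total += a + b * p_th + c * p_th ** 2
    return total / n_scenarios

cost = expected_thermal_cost(100.0)
```

Fixing the seed makes the estimate reproducible; increasing `n_scenarios` reduces the sampling error of the expectation at the price of more cost evaluations.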

The proposed model utilizes a modified IEEE 30-bus system in which Buses 5 and 11 are replaced with solar PV systems. The modified IEEE 30-bus test system58 used in the proposed work is presented in Fig. 3. The log-normal curves of solar irradiance data at Buses 5 and 11 are presented in Fig. 4. The histograms represent the probability density of observed solar irradiance values, while the black lines depict the fitted log-normal probability density functions (PDFs).

Figure 3

Modified IEEE 30 Bus system with renewable energy sources at 5, 8, 11 and 13.

Figure 4

Log-normal curves of solar irradiance at bus 5 and 11.

The histogram on the left illustrates the distribution of solar irradiance values measured at Bus 5. It can be observed that the histogram peaks close to 300 W/m². The fitted log-normal PDF closely matches the empirical distribution, indicating that the log-normal distribution is appropriate for describing the solar irradiance variations59. Similarly, the histogram on the right depicts the solar irradiance data at Bus 11, which also peaks close to 300 W/m². The fitted log-normal PDF ensures that the solar PV power output at Bus 11 is accurately represented. Such accurate bus models are essential for incorporating the variability and uncertainty of solar power into the HRES60, and they support the proposed HDRL-QIGA in predicting and managing the RES in the OPF problem.
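A log-normal fit of the kind described above can be sketched as follows. Since the measured irradiance data are not available here, the example generates synthetic samples whose log-normal mode is placed near 300 W/m²; the sample size, the spread `sigma_true`, and the function names are assumptions made purely for illustration.

```python
import math
import random

def fit_lognormal(samples):
    """Maximum-likelihood fit: mean and standard deviation of log(x)."""
    logs = [math.log(x) for x in samples]
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((value - mu) ** 2 for value in logs) / len(logs))
    return mu, sigma

def lognormal_pdf(x, mu, sigma):
    """Log-normal probability density at x > 0."""
    z = (math.log(x) - mu) / sigma
    return math.exp(-0.5 * z * z) / (x * sigma * math.sqrt(2.0 * math.pi))

def lognormal_mode(mu, sigma):
    """Location of the PDF peak: exp(mu - sigma^2)."""
    return math.exp(mu - sigma ** 2)

# Synthetic stand-in for measured irradiance (W/m^2); not the paper's data.
rng = random.Random(1)
sigma_true = 0.5
mu_true = math.log(300.0) + sigma_true ** 2   # places the true mode at 300 W/m^2
samples = [rng.lognormvariate(mu_true, sigma_true) for _ in range(5000)]

mu, sigma = fit_lognormal(samples)
mode = lognormal_mode(mu, sigma)              # peak of the fitted PDF
```

Comparing `lognormal_pdf` evaluated on a grid with a normalized histogram of the samples gives the kind of visual check shown in Fig. 4.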

In the modified IEEE 30-bus system, buses 8 and 13 are replaced with wind turbines. The wind power distribution at buses 8 and 13 is depicted in Fig. 5. The power distribution is modeled with the Weibull probability density function. The distribution curve for bus 8, shown at the top, has its highest density in the range of 15 to 20 MW, and the curve for bus 13 has its highest density in the range of 10 to 15 MW. In both cases the Weibull distribution fits the data well, which indicates that the wind model can be used for OPF analysis in HRES61.

Figure 5

Wind power distributions at buses 8 and 13.

Hybrid deep reinforcement learning with quantum-inspired genetic algorithm

The proposed hybrid DRL-QIGA62 model addresses the hybrid nature of the system, which introduces significant complexity due to the intermittent and uncertain nature of renewable energy outputs. To initialize the DRL-QIGA model for solving the OPF problem, the state and action spaces of DRL are defined. The state space \({\text{s}}_{\text{t}}\) is a vector that gathers all relevant information about the power system at a given time \(\text{t}\). This includes the power outputs of generators, voltage magnitudes, and voltage angles at different buses in the system. Mathematically, the state space vector is formulated as,

$${\text{s}}_{\text{t}}=\left({\text{P}}_{\text{G}1}\left(\text{t}\right),{\text{P}}_{\text{G}2}\left(\text{t}\right),\dots ,{\text{P}}_{\text{Gn}}\left(\text{t}\right),{\text{V}}_{1}\left(\text{t}\right),{\text{V}}_{2}\left(\text{t}\right),\dots ,{\text{V}}_{\text{m}}\left(\text{t}\right),{\uptheta }_{1}\left(\text{t}\right),{\uptheta }_{2}\left(\text{t}\right),\dots ,{\uptheta }_{\text{m}}\left(\text{t}\right)\right)$$
(23)

where \({P}_{Gi}\left(t\right)\) indicates the active power output of the generator, \({V}_{i}\left(t\right)\) represents the voltage magnitude at the bus, \({\uptheta }_{i}\left(t\right)\) represents the voltage angle. The action space \(({a}_{t})\) consists of the control actions such as adjustments to generator outputs, voltage settings, and other control parameters63. Mathematically, the action vector is formulated as,

$${\text{a}}_{\text{t}}=\left[\Delta {\text{P}}_{\text{G}1}\left(\text{t}\right),\Delta {\text{P}}_{\text{G}2}\left(\text{t}\right),\dots ,\Delta {\text{P}}_{\text{Gn}}\left(\text{t}\right),\Delta {\text{V}}_{1}\left(\text{t}\right),\Delta {\text{V}}_{2}\left(\text{t}\right),\dots ,\Delta {\text{V}}_{\text{m}}\left(\text{t}\right),\Delta {\uptheta }_{1}\left(\text{t}\right),\Delta {\uptheta }_{2}\left(\text{t}\right),\dots ,\Delta {\uptheta }_{\text{m}}\left(\text{t}\right)\right]$$
(24)

where \(\Delta {\text{P}}_{\text{Gi}}\left(\text{t}\right)\) represents the change in the active power output of the generator, \(\Delta {\text{V}}_{\text{i}}\left(\text{t}\right)\) represents the change in voltage magnitude at the bus, and \(\Delta {\uptheta }_{\text{i}}\left(\text{t}\right)\) represents the change in voltage angle. Furthermore, to initialize the DRL64, a neural network is used to approximate the policy that maps states to actions. Mathematically, the policy network is represented as \(\uppi \left(\text{a}|\text{s};\uptheta \right)\), in which \(\uptheta \) represents the neural network weights. While initializing the network, parameters such as generator capacities, load demands, and network topology are considered. These parameters are essential for defining the state space and evaluating the performance of different actions. Mathematically, the policy network is formulated as:

$$\uppi \left(a|s;\uptheta \right)=P\left({a}_{t}|{s}_{t};\uptheta \right)$$
(25)

where \(\uppi \) indicates the policy function, \({\text{a}}_{\text{t}}\) indicates the action taken at time \(\text{t}\), \({\text{s}}_{\text{t}}\) indicates the state, and \(\uptheta \) indicates the neural network parameters. Figure 6 depicts a simple illustration of the policy network. Further, a reward function \({\text{R}}_{\text{t}}\) is used to evaluate the immediate feedback from the environment for action \({\text{a}}_{\text{t}}\) taken in state \({\text{s}}_{\text{t}}\). This reward function is formulated considering the objectives of minimizing fuel costs, reducing power losses, and maintaining voltage stability. Mathematically, the reward function is formulated as,

$${\textrm{R}}_{\text{t}}=-\left({\textrm{w}}_{\text{f}}{\textrm{F}}_{\text{cost}}\left({\textrm{s}}_{\text{t}},{\textrm{a}}_{\text{t}}\right)+{\textrm{w}}_{\text{p}}{\textrm{P}}_{\text{loss}}\left({\textrm{s}}_{\text{t}},{\textrm{a}}_{\text{t}}\right)+{\textrm{w}}_{\text{v}}{\textrm{V}}_{\text{D}}\left({\textrm{s}}_{\text{t}},{\textrm{a}}_{\text{t}}\right)\right)$$
(26)

where \({\text{F}}_{\text{cost}}\left({\text{s}}_{\text{t}},{\text{a}}_{\text{t}}\right)\) is the fuel cost after taking the action, \({\text{P}}_{\text{loss}}\left({\text{s}}_{\text{t}},{\text{a}}_{\text{t}}\right)\) indicates the power losses, \({\text{V}}_{\text{D}}\left({\text{s}}_{\text{t}},{\text{a}}_{\text{t}}\right)\) represents the voltage deviation, and \(({\text{w}}_{\text{f}},{\text{w}}_{\text{p}},{\text{w}}_{\text{v}})\) are the weighting coefficients for each objective65. The negative sign indicates that the goal is to minimize these costs and losses while maintaining voltage stability. To stabilize and enhance the training performance of reinforcement learning, a replay buffer is incorporated. This replay buffer \(\mathcal{D}\) stores the experiences, each consisting of a state, an action, a reward, and the next state. Mathematically, the replay buffer is formulated as:

$$\mathcal{D}=\{\left({\text{s}}_{\text{i}},{\text{a}}_{\text{i}},{\text{R}}_{\text{i}},{\text{s}}_{\text{i}+1}\right){\}}_{\text{i}=1}^{\text{N}}$$
(27)

where \(\text{N}\) indicates the buffer size, and each tuple \(\left({\text{s}}_{\text{i}},{\text{a}}_{\text{i}},{\text{R}}_{\text{i}},{\text{s}}_{\text{i}+1}\right)\) holds a state, the action taken, the reward received, and the next state. In the training process of deep Q-learning, a function approximator is used to approximate the optimal action-value function \(\text{Q}\left(\text{s},\text{a}\right)\)66. This action value is the expected cumulative reward for taking an action in a given state, and it is updated using the Bellman equation as follows.

$$\text{Q}\left({\text{s}}_{\text{t}},{\text{a}}_{\text{t}}\right)={\text{R}}_{\text{t}}+\upgamma \underset{{\text{a}}_{\text{t}+1}}{\text{max}}\text{Q}\left({\text{s}}_{\text{t}+1},{\text{a}}_{\text{t}+1}\right)$$
(28)

where \({\text{R}}_{\text{t}}\) is the reward received after taking the action, and \(\upgamma \) indicates the discount factor, which defines the importance of future rewards67. \(\text{Q}\left({\text{s}}_{\text{t}+1},{\text{a}}_{\text{t}+1}\right)\) is the estimated action value of the next state \({\text{s}}_{\text{t}+1}\) and action \({\text{a}}_{\text{t}+1}\). The loss function measures the difference between the predicted Q-values and the target Q-values obtained from the Bellman equation. Mathematically, the loss function is formulated as,

$$\text{L}\left(\uptheta \right)={\text{E}}_{\left(\text{s},\text{a},\text{R},{\text{s}}^{{{\prime}}}\right)\sim \mathcal{D}}\left[{\left(\text{R}+\upgamma \underset{{\text{a}}^{{{\prime}}}}{\text{max}}\text{Q}\left({\text{s}}^{{{\prime}}},{\text{a}}^{{{\prime}}};{\uptheta }^{-}\right)-\text{Q}\left(\text{s},\text{a};\uptheta \right)\right)}^{2}\right]$$
(29)

where \(\uptheta \) denotes the current network parameters, and \({\uptheta }^{-}\) denotes the target network parameters. Further, the network parameters are updated using gradient descent as follows,

$$\uptheta \leftarrow\uptheta -{\upalpha }{\nabla }_{\uptheta }L\left(\uptheta \right)$$
(30)

where the learning rate is indicated as \({\upalpha }\), and the gradient of the loss function with respect to the network parameters is indicated as \({\nabla }_{\uptheta }L\left(\uptheta \right)\).
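As a rough illustration of Eqs. (27) to (30), the following Python sketch draws a mini-batch from a replay buffer, evaluates the squared temporal-difference loss against a fixed target network, and applies one gradient-descent step. It is a minimal sketch under stated assumptions: a linear function stands in for the deep Q-network, and the dimensions, the discount factor, the learning rate, and the placeholder reward are illustrative values that do not come from this work.

```python
import random
from collections import deque

import numpy as np

# Hypothetical sizes and hyperparameters, chosen only for illustration.
STATE_DIM, N_ACTIONS = 4, 3
GAMMA, ALPHA = 0.9, 0.01  # discount factor and learning rate (assumed values)

rng = np.random.default_rng(0)
theta = rng.normal(scale=0.1, size=(N_ACTIONS, STATE_DIM))  # online parameters
theta_target = theta.copy()                                  # target parameters (theta^-)


def q_values(params, state):
    """Linear stand-in for the Q-network: one Q-value per action."""
    return params @ state


def td_loss_and_grad(params, params_target, batch):
    """Mean squared TD error of Eq. (29) and its gradient w.r.t. params."""
    loss, grad = 0.0, np.zeros_like(params)
    for s, a, r, s_next in batch:
        # Bellman target of Eq. (28), evaluated with the target parameters.
        target = r + GAMMA * np.max(q_values(params_target, s_next))
        td_error = target - q_values(params, s)[a]
        loss += td_error ** 2
        grad[a] += -2.0 * td_error * s  # the target is treated as a constant
    return loss / len(batch), grad / len(batch)


# Replay buffer D of Eq. (27), filled with random transitions for the demo.
buffer = deque(maxlen=1000)
for _ in range(200):
    s = rng.normal(size=STATE_DIM)
    a = int(rng.integers(N_ACTIONS))
    r = -float(np.sum(s ** 2))  # placeholder reward, not the reward of Eq. (26)
    s_next = rng.normal(size=STATE_DIM)
    buffer.append((s, a, r, s_next))

random.seed(0)
batch = random.sample(list(buffer), 32)
loss_before, grad = td_loss_and_grad(theta, theta_target, batch)
theta = theta - ALPHA * grad  # gradient-descent update of Eq. (30)
loss_after, _ = td_loss_and_grad(theta, theta_target, batch)
print(loss_before, loss_after)
```

Holding \({\uptheta }^{-}\) fixed while \(\uptheta \) is updated keeps the regression target stable within a step, which is the usual motivation for a separate target network.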

Figure 6
figure 6

Policy network.
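To make the mapping \(\uppi \left(\text{a}|\text{s};\uptheta \right)\) of Eq. (25) concrete, the sketch below implements a small feed-forward network in Python whose softmax output is a probability distribution over a discrete set of actions. The layer sizes, the tanh activation, and the discrete action set are assumptions made for illustration; the architecture actually used in this work is the one shown in Fig. 6.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 6 state features, 16 hidden units, 5 discrete actions.
STATE_DIM, HIDDEN_DIM, N_ACTIONS = 6, 16, 5

# theta collects all weights and biases of the network.
theta = {
    "W1": rng.normal(scale=0.1, size=(HIDDEN_DIM, STATE_DIM)),
    "b1": np.zeros(HIDDEN_DIM),
    "W2": rng.normal(scale=0.1, size=(N_ACTIONS, HIDDEN_DIM)),
    "b2": np.zeros(N_ACTIONS),
}


def policy(state, params):
    """Return pi(a | s; theta) as a probability vector over the actions."""
    hidden = np.tanh(params["W1"] @ state + params["b1"])
    logits = params["W2"] @ hidden + params["b2"]
    logits = logits - logits.max()  # shift for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()


state = rng.normal(size=STATE_DIM)  # stand-in for a measured system state
probs = policy(state, theta)
print(probs, probs.sum())
```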

In the proposed model, in addition to DRL, a quantum-inspired genetic algorithm (QIGA) is incorporated to provide global optimization. Incorporating QIGA allows broader exploration of the solution space and ensures better convergence with high-quality solutions. The features of the genetic algorithm are enhanced by incorporating quantum principles68. The quantum bit is mathematically represented as follows.

$${\text{q}}_{\text{i}}={{\upalpha }}_{\text{i}}\left|0\right.\rangle +{\upbeta }_{\text{i}}\left|1\right.\rangle $$
(31)

where \({{\upalpha }}_{i}\) and \({\upbeta }_{i}\) are the probability amplitudes, which satisfy \({\left|{{\upalpha }}_{i}\right|}^{2}+{\left|{\upbeta }_{i}\right|}^{2}=1\). The states \(\left|0\right.\rangle \) and \(\left|1\right.\rangle \) are the basis states. The quantum bit can be in a superposition of the basis states, which allows parallel exploration of multiple solutions. An initial population of solutions is created, with each solution represented by a set of qubits. Mathematically, the initial population \(\mathcal{P}\left(0\right)\) is formulated as,

$$\mathcal{P}\left(0\right)=\{{\text{q}}_{1},{\text{q}}_{2},\dots ,{\text{q}}_{\text{N}}\}$$
(32)

where \(\text{N}\) indicates the population size, and \({\text{q}}_{\text{i}}\) indicates the qubit vector of the i-th solution. In the next step, the quantum operations of rotation and mutation are performed. The quantum rotation modifies the probability amplitudes of the qubit and guides the search process towards the optimal solution69. These rotations are performed based on the solution fitness values, and the rotation operation of a qubit \({\text{q}}_{\text{i}}\) is mathematically formulated as,

$$\left(\begin{array}{c}{\upalpha }_{\text{i}}^{\prime}\\ {\upbeta }_{\text{i}}^{\prime}\end{array}\right)=\left(\begin{array}{cc}\cos\left(\Delta \uptheta \right)& -\sin\left(\Delta \uptheta \right)\\ \sin\left(\Delta \uptheta \right)& \cos\left(\Delta \uptheta \right)\end{array}\right)\left(\begin{array}{c}{\upalpha }_{\text{i}}\\ {\upbeta }_{\text{i}}\end{array}\right)$$
(33)

where \(\Delta \uptheta \) is the rotation angle, and \(({\upalpha }_{i}^{\prime},{\upbeta }_{i}^{\prime})\) indicates the probability amplitudes after rotation. After quantum rotation, mutation is performed, in which variability is introduced by modifying the qubit states of a solution. The mutation helps to maintain population diversity and avoids premature convergence. The mutation operation is mathematically formulated as

$$\left(\begin{array}{c}{\upalpha }_{\text{i}}^{\prime}\\ {\upbeta }_{\text{i}}^{\prime}\end{array}\right)=\left(\begin{array}{cc}0& 1\\ 1& 0\end{array}\right)\left(\begin{array}{c}{\upalpha }_{\text{i}}\\ {\upbeta }_{\text{i}}\end{array}\right)$$
(34)

where \({\upalpha }_{i}\) and \({\upbeta }_{i}\) are the probability amplitudes before mutation, and \({\upalpha }_{i}^{\prime}\) and \({\upbeta }_{i}^{\prime}\) are the amplitudes after mutation, so the operation swaps the two amplitudes. In the next step, the best individuals from the current population are selected as parents for the next generation. This selection operation ensures that better-performing solutions have a higher chance of propagating their features70. Mathematically, the probability of selecting an individual \({x}_{i}\) based on its fitness \(f\left({x}_{i}\right)\) is formulated as,

$$\text{P}\left({\text{x}}_{\text{i}}\right)=\frac{\text{f}\left({\text{x}}_{\text{i}}\right)}{{\sum }_{\text{j}=1}^{\text{N}}\text{f}\left({\text{x}}_{\text{j}}\right)}$$
(35)

where \(\text{N}\) indicates the total number of individuals in the population. The probability function given in Eq. (35) ensures that individuals with higher fitness values have a higher chance of being selected. The crossover operation provides genetic diversity and allows new regions of the solution space to be explored. In the proposed work, a random crossover point is selected, and the parts of the parents are swapped to create offspring. Mathematically, the single-point crossover is formulated as,

$${\text{Offspring}}_{\text{i}}=\uplambda \cdot {\text{Parent}}_{1}+\left(1-\uplambda \right)\cdot {\text{Parent}}_{2}$$
(36)

where \({\text{Parent}}_{1}\) and \({\text{Parent}}_{2}\) indicate the selected parent solutions, and \(\uplambda \) indicates the crossover coefficient, whose range is [0,1].
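The QIGA operators of Eqs. (31) to (36) can be sketched in a few lines of Python. This is a hedged illustration rather than the implementation used in this work: the equal-superposition initialization, the measurement step that collapses qubits to bits, the fixed rotation angle, and the requirement of strictly positive fitness values are assumptions. The crossover follows Eq. (36) as written, which is a weighted blend of the two parents.

```python
import numpy as np

rng = np.random.default_rng(1)


def init_population(n_individuals, n_qubits):
    """Start every qubit in the equal superposition alpha = beta = 1/sqrt(2)."""
    # Last axis holds the pair (alpha, beta) of Eq. (31).
    return np.full((n_individuals, n_qubits, 2), 1.0 / np.sqrt(2.0))


def rotate(qubits, delta_theta):
    """Rotation gate of Eq. (33), applied to every (alpha, beta) pair."""
    c, s = np.cos(delta_theta), np.sin(delta_theta)
    gate = np.array([[c, -s], [s, c]])
    return qubits @ gate.T


def mutate(qubits):
    """Mutation of Eq. (34): swap alpha and beta of each qubit."""
    return qubits[..., ::-1].copy()


def measure(qubits):
    """Collapse each qubit to a classical bit with P(bit = 1) = |beta|^2."""
    return (rng.random(qubits.shape[:-1]) < qubits[..., 1] ** 2).astype(int)


def selection_probabilities(fitness):
    """Fitness-proportional probabilities of Eq. (35); assumes positive fitness."""
    fitness = np.asarray(fitness, dtype=float)
    return fitness / fitness.sum()


def crossover(parent1, parent2, lam):
    """Weighted blend of Eq. (36) with lam in [0, 1]."""
    return lam * parent1 + (1.0 - lam) * parent2


population = init_population(n_individuals=4, n_qubits=3)
rotated = rotate(population, delta_theta=0.05 * np.pi)
mutated = mutate(rotated)
bits = measure(rotated)

probs = selection_probabilities([1.0, 2.0, 3.0, 4.0])  # illustrative fitness values
parents = rng.choice(len(probs), size=2, replace=False, p=probs)
child = crossover(bits[parents[0]].astype(float), bits[parents[1]].astype(float), lam=0.5)
print(bits, probs, child)
```

Because the rotation matrix is orthogonal, the normalization \({\left|{{\upalpha }}_{i}\right|}^{2}+{\left|{\upbeta }_{i}\right|}^{2}=1\) is preserved after every rotation, so no renormalization step is needed. Since OPF is a minimization problem, a fitness that grows as the cost falls would have to be defined before Eq. (35) is applied; the positive values above only stand in for such a fitness.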

Hybrid optimization process

The hybrid technique integrates DRL with QIGA to provide optimal solutions for the OPF problem. Hybridization ensures that the system adapts to real-time changes and optimizes global performance over time. In the hybridization process, the DRL components are dynamically adjusted to make real-time control decisions based on the system's current state71. The policy network selects the actions and dynamically adjusts the system's operational parameters. Mathematically, the selection process after hybridization is formulated as,

$${\text{a}}_{\text{t}}=\text{arg}\underset{\text{a}}{\text{max}}\uppi \left(\text{a}|{\text{s}}_{\text{t}};\uptheta \right)$$
(37)

where \(\uppi \left(\text{a}|{\text{s}}_{\text{t}};\uptheta \right)\) indicates the policy network output for the state \({\text{s}}_{\text{t}}\) at time \(\text{t}\), and \({\text{a}}_{\text{t}}\) indicates the action selected to be applied to the system. Based on the environment feedback, the policy network continuously learns and updates its parameters \(\uptheta \) to ensure optimality under varying conditions. The reward function provides immediate feedback to the DRL and aligns the learning process with the optimization objectives. Mathematically, the reward at time \(\text{t}\) is formulated as,

$${\textrm{R}}_{\text{t}}=-\left({\textrm{w}}_{\text{f}}{\textrm{F}}_{\text{cost}}\left({\textrm{s}}_{\text{t}},{\textrm{a}}_{\text{t}}\right)+{\textrm{w}}_{\text{p}}{\textrm{P}}_{\text{loss}}\left({\textrm{s}}_{\text{t}},{\textrm{a}}_{\text{t}}\right)+{\textrm{w}}_{\text{v}}{\textrm{V}}_{\text{D}}\left({\textrm{s}}_{\text{t}},{\textrm{a}}_{\text{t}}\right)\right)$$
(38)
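The greedy action selection of Eq. (37) and the weighted reward of Eq. (38) reduce to two short functions, sketched below in Python. The policy output, the objective values, and the weights \({\text{w}}_{\text{f}}\), \({\text{w}}_{\text{p}}\), \({\text{w}}_{\text{v}}\) used in the example are hypothetical numbers chosen for illustration and are not taken from the experiments.

```python
import numpy as np


def select_action(action_probs):
    """Greedy choice of Eq. (37): index of the most probable action."""
    return int(np.argmax(action_probs))


def reward(fuel_cost, power_loss, voltage_dev, w_f=1.0, w_p=1.0, w_v=1.0):
    """Negative weighted sum of the three objectives, Eq. (38)."""
    return -(w_f * fuel_cost + w_p * power_loss + w_v * voltage_dev)


# Hypothetical policy output for one state.
pi_s = np.array([0.10, 0.65, 0.25])
a_t = select_action(pi_s)

# Hypothetical objective values and weights; the weights put the three terms on a similar scale.
r_t = reward(fuel_cost=800.0, power_loss=5.0, voltage_dev=0.5, w_f=1.0, w_p=10.0, w_v=100.0)
print(a_t, r_t)
```

The three objectives have very different magnitudes (hundreds of dollars, a few MW, and fractions of a per-unit voltage), so the choice of weights determines how strongly each objective influences learning.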

The policy network is updated using this feedback to improve its future performance. Global optimization with the QIGA provides better exploration of the solution space. Thus, the hybrid model effectively avoids local optima and achieves globally optimal solutions for power flow in HRES72,73,74,75. This process is repeated until the required optimal solution is obtained. The convergence criteria are mathematically formulated as,

$$\Delta {\text{F}}_{\text{total}} < \upepsilon $$
(39)

where \(\Delta {\text{F}}_{\text{total}}\) indicates the change in the objective function value between iterations, and \(\upepsilon \) is the predefined threshold for convergence. The process also stops when the maximum number of iterations, \(\text{max iter}\), is reached. The summarized pseudocode for the proposed hybrid DRL-QIGA is presented as follows. The process flow of the proposed model is given in Fig. 7.

Figure 7
figure 7

Flowchart of proposed HDRL-QIGA model.
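As a rough illustration of how the stopping rule of Eq. (39) and the iteration cap could drive a hybrid loop, the following Python sketch alternates a local improvement step with a global search step on a toy objective. Everything inside the loop is a placeholder: `total_objective`, `local_step`, and `global_step` are hypothetical stand-ins for the power-flow evaluation, the DRL decision, and the QIGA search, and the way the two steps are interleaved is an assumption rather than the procedure of Fig. 7.

```python
import numpy as np

rng = np.random.default_rng(2)


def total_objective(x):
    """Toy stand-in for F_total; a real study would evaluate the power flow here."""
    return float(np.sum((x - 1.0) ** 2))


def local_step(x, step=0.1):
    """Placeholder for the DRL decision: a small move that lowers the objective."""
    return x - step * 2.0 * (x - 1.0)


def global_step(x, n_candidates=20, spread=0.5):
    """Placeholder for the QIGA search: keep the best of several random candidates."""
    candidates = x + rng.normal(scale=spread, size=(n_candidates, x.size))
    best = min(candidates, key=total_objective)
    return best if total_objective(best) < total_objective(x) else x


def hybrid_optimize(x0, epsilon=1e-6, max_iter=500):
    """Iterate until the change in the objective falls below epsilon, Eq. (39)."""
    x, f_prev = x0, total_objective(x0)
    for iteration in range(1, max_iter + 1):
        x = global_step(local_step(x))
        f_curr = total_objective(x)
        if abs(f_prev - f_curr) < epsilon:
            break
        f_prev = f_curr
    return x, f_curr, iteration


x_opt, f_opt, n_iter = hybrid_optimize(np.zeros(3))
print(f_opt, n_iter)
```

The global step only accepts a candidate that improves the objective, so the objective never increases between iterations and the threshold test is eventually met on this toy problem.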

Results and discussion

The proposed OPF solution model is evaluated using the MATLAB Simulink tool. The optimization results are validated by incorporating energy sources such as wind turbines, solar PV, and thermal generators, along with power system models. The proposed HDRL-QIGA model is applied in the experimental analysis, and different load profiles and renewable energy generation levels are tested to evaluate the overall system performance. The parameters considered to validate the performance of the proposed model are fuel cost, power loss, voltage deviation, and convergence rate. The simulation hyperparameters used in the proposed model are listed in Table 1.

Table 1 Simulation hyperparameters.

The proposed model utilizes a modified IEEE 30-bus system in which thermal generators of the traditional system are replaced with wind turbines and solar PV panels. Specifically, the thermal generators at buses 5 and 11 of the traditional IEEE 30-bus system were replaced with solar PV systems, and those at buses 8 and 13 were replaced with wind turbines. The experiments measured the performance of the proposed model in two scenarios: without RES and with RES. In the analysis without RES, the baseline is compared with four cases, which are defined based on the objective function. The first case aims to minimize the fuel cost, the second case aims to minimize the power loss, the third case aims to minimize the voltage deviation, and the final case combines all the objectives.

Performance of the proposed model without renewable energy sources

The performance of the proposed model for the OPF problem without RES is presented in Table 2. The base case indicates the performance of the system without an optimization model. In case 1, the proposed optimization strategy attains a lower fuel cost of $780.0 with a voltage deviation of 0.91. In case 2, which targets power loss minimization, the power loss is 3.1 MW, which is lower than in the base case. However, case 2 shows a fuel cost of $965.0, which is higher than in the base case and case 1. For case 3, the lowest voltage deviation of 0.09 is obtained with a fuel cost of $850.0, which is lower than in case 2. Finally, case 4 combines all the objectives and provides balanced results, with a fuel cost of $820.0, a power loss of 6.1 MW, and a voltage deviation of 0.11. To evaluate the performance of the proposed model further, the maximum, minimum, and mean values are compared with those of existing optimization algorithms such as PSO, grasshopper optimization, moth flame optimization, cuckoo search optimization, Firefly algorithm, spotted hyena optimization, ant colony optimization, grey wolf optimization, genetic algorithm, and hybrid spotted hyena optimization. The comparative analysis presented in Table 3 highlights the better performance of the proposed HDRL-QIGA model even without integrating renewable energy sources.

Table 2 OPF results by proposed HDRL-QIGA model (without renewable integration).
Table 3 OPF results for case 4 using different optimization algorithms.

Performance of proposed model with renewable energy sources

In the second scenario, the performance of the proposed model is evaluated with RES. The four cases are compared in Table 4. Due to the integration of RES, case 1 attains the lowest fuel cost of $620.45, with a power loss of 5.0 MW and a voltage deviation of 0.67. In case 2, the objective of minimizing the power loss is attained with a minimum power loss of 1.8 MW, and in case 3, the minimum voltage deviation is obtained as 0.08, which is lower than in cases 1 and 2. For case 4, the combined objective function provides a fuel cost of $650.0, a power loss of 3.5 MW, and a voltage deviation of 0.07, which improves the OPF in the system. The fuel costs obtained for all four cases are compared in Fig. 8. The results clearly show that case 1 attained the minimum fuel cost among all the cases. While case 2 aims to attain minimum power loss, its fuel cost increases due to the tradeoff between power loss and fuel expense. In case 3, the voltage profile is optimized, and the fuel cost is lower than in case 2. Case 4, which combines all the objectives, provides a balanced reduction of fuel cost, which indicates that the HDRL-QIGA performs better in maintaining stability and performance.

Table 4 OPF results by proposed HDRL-QIGA model (with renewable integration).
Figure 8
figure 8

Fuel cost analysis for all four cases.
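The effect of renewable integration on the combined objective can be quantified from the case 4 values quoted in the text for the two scenarios. The short Python calculation below computes the relative reduction of each metric; it assumes that the two scenarios are directly comparable, that is, evaluated on the same network and load.

```python
# Case 4 values quoted in the text (fuel cost in $, power loss in MW, voltage deviation).
without_res = {"fuel_cost": 820.0, "power_loss": 6.1, "voltage_deviation": 0.11}
with_res = {"fuel_cost": 650.0, "power_loss": 3.5, "voltage_deviation": 0.07}


def percent_reduction(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


reductions = {
    name: round(percent_reduction(without_res[name], with_res[name]), 2)
    for name in without_res
}
print(reductions)
```

Under this assumption, the fuel cost falls by roughly 21%, the power loss by roughly 43%, and the voltage deviation by roughly 36% when RES are integrated.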

The comparative analysis of power loss for all the cases is presented in Fig. 9, and it can be observed that case 2 attained the minimum power loss among all cases. Case 4 shows a moderate reduction in power loss, which indicates the proposed optimization model's effectiveness in combining all objectives.

Figure 9
figure 9

Power loss analysis for all four cases.

The voltage deviation analysis for all four cases using the proposed HDRL-QIGA model is presented in Fig. 10. Results depict that the system exhibited poor voltage deviation when the optimization model was not incorporated. With renewable energy sources integrated and the optimization algorithm applied, the proposed model exhibited a minimum voltage deviation for case 3 and a moderate voltage deviation for case 4, which combines all the objectives. The balanced reduction in voltage deviation obtained in case 4 validates the optimization model's importance in the HRES OPF problem.

Figure 10
figure 10

Voltage deviation analysis for all four cases.

The power generation analysis given in Fig. 11 depicts the distribution of power generation among different generators in the system for all four cases. Case 1, which focuses on minimizing fuel cost, shows a clear shift in power generation towards more cost-effective generators. Case 2, which targets power loss minimization, shows a different distribution with minimized transmission losses. Case 3, which optimizes the voltage profile, obtains a generation distribution that provides better voltage stability. Case 4, which combines all objectives, depicts a balanced power generation distribution that utilizes the strengths of each generator and attains the desired objectives.

Figure 11
figure 11

Power generation analysis for all four cases.

The voltage profiles for the different cases are compared in Fig. 12, and it can be observed that case 4 provides a balanced voltage profile among all the cases. In case 1, large deviations are observed because case 1 aims to minimize fuel cost and does not optimize the other factors. For case 2, a better voltage profile is obtained compared to case 1, but it is poorer than in cases 3 and 4 because case 2 mainly focuses on power loss minimization. Case 3 mainly focuses on the voltage profile and attains a better voltage profile than cases 1 and 2.

Figure 12
figure 12

Voltage profile analysis for all four cases.

The reactive power analysis given in Fig. 13 for all four cases indicates that the better stability of case 3 is due to the optimized voltage profile. Case 4 exhibits balanced reactive power compensation, which ensures better voltage stability compared to the other cases. Figure 14 depicts the convergence analysis of the proposed model for all four cases. The convergence plot is obtained for the objective function values. It can be observed that case 1, which focuses on fuel cost minimization, converges quickly to the optimal value. Case 2, which aims to minimize the power loss, shows quick convergence with a higher objective value. Case 3, which aims to minimize the voltage deviation, shows steady convergence to the optimal solution with a moderate objective function value. Case 4, which combines all the objectives, shows balanced convergence with a lower objective value than cases 2 and 3. This analysis shows the efficiency and effectiveness of the HDRL-QIGA model in obtaining optimal solutions for OPF problems across different scenarios.

Figure 13
figure 13

Reactive power analysis for all four cases.

Figure 14
figure 14

Convergence plot for all four cases.

Performance of the proposed model with renewable energy sources under different load conditions

The proposed model’s performance is further evaluated under different load conditions. Three load conditions, namely low, medium, and high load, are considered for analysis. The system is first integrated with renewable energy resources under low load conditions, and the results of the OPF problem are presented in Table 5. From the results, it can be observed that under the low load condition, the proposed model exhibits a low fuel cost of $610.0 with moderate power loss (2.9 MW) and voltage deviation (0.06) for case 1. For case 2, the proposed model exhibits a power loss of 2.8 MW with a slightly higher fuel cost ($620.0) and voltage deviation (0.07) than case 1. For case 3, the proposed model exhibits a power loss of 2.8 MW with a slightly lower fuel cost ($615.0) than case 2 and a voltage deviation (0.05) that is lower than in cases 1 and 2. For case 4, the proposed model exhibits a fuel cost of $620.4, a power loss of 3.2 MW, and a voltage deviation of 0.07, which indicates effective optimization of the power flow under low load conditions.

Table 5 OPF results by proposed HDRL-QIGA model (with renewable integration) under low load condition.

The system is integrated with renewable energy resources under medium load conditions, and the results of the OPF problem are presented in Table 6. From the results, it can be observed that under the medium load condition, the proposed model exhibits a low fuel cost of $620.0 with moderate power loss (3.2 MW) and voltage deviation (0.07) for case 1. For case 2, the proposed model exhibits a power loss of 3.0 MW with a slightly higher fuel cost ($630.0) and a voltage deviation (0.08 p.u.) that is slightly higher than in case 1. For case 3, the proposed model exhibits a power loss of 3.0 MW with a slightly lower fuel cost ($625.0) than case 2 and a voltage deviation (0.06) that is lower than in cases 1 and 2. For case 4, the proposed model exhibits a fuel cost of $620.5, a power loss of 3.8 MW, and a voltage deviation of 0.08, which indicates effective optimization of the power flow under medium load conditions.

Table 6 OPF results by proposed HDRL-QIGA model (with renewable integration) under medium load condition.

The system is integrated with renewable energy resources under high load conditions, and the results of the OPF problem are presented in Table 7. From the results, it can be observed that under the high load condition, the proposed model exhibits a low fuel cost of $705.0 with moderate power loss (3.6 MW) and voltage deviation (0.10) for case 1. For case 2, the proposed model exhibits a power loss of 3.5 MW with a slightly higher fuel cost ($715.0) and a voltage deviation (0.11) that is slightly higher than in case 1. For case 3, the proposed model exhibits a power loss of 3.5 MW with a slightly lower fuel cost ($710.0) than case 2 and a voltage deviation (0.09) that is lower than in cases 1 and 2. For case 4, the proposed model exhibits a fuel cost of $705.0, a power loss of 3.5 MW, and a voltage deviation of 0.10, which indicates effective optimization of the power flow under high load conditions.

Table 7 OPF results by proposed HDRL-QIGA model (with renewable integration) under high load condition.

Table 8 presents the OPF results of the HDRL-QIGA model for the system with RES under different load conditions for case 4. Under low load, the model achieves a fuel cost of $620.4 with a power loss of 3.2 MW and a voltage deviation of 0.07. Under medium load, the fuel cost increases slightly to $620.5, with a power loss of 3.8 MW and a voltage deviation of 0.08. Under high load, the fuel cost rises to $705.0, with a power loss of 3.5 MW and a voltage deviation of 0.10. This analysis demonstrates the HDRL-QIGA model's ability to maintain optimal performance across different load conditions, ensuring efficient power flow optimization while balancing multiple objectives.

Table 8 Case 4 results for different load conditions.

A comparative analysis of power generation across different load conditions is presented in Fig. 15. The results depict reduced power generation under low load conditions. When the load increases to medium and high conditions, the power generation from all the sources increases to meet the necessary demand. The adaptive learning strategy of the proposed model provides more flexibility in responding to different load conditions. The reactive power across different load conditions is compared in Fig. 16. The reactive power is minimal under low load conditions and gradually increases as the load increases.

Figure 15
figure 15

Power generation categories across different load conditions.

Figure 16
figure 16

Reactive power compensation across different load conditions.

The power generation across different load conditions is comparatively analyzed in Fig. 17. It can be observed from the results that the power generation is kept minimal under low load conditions to minimize the cost factors. However, when the load increases from medium to high level, the generation from RES increases to meet the necessary demand. This balancing strategy ensures that the system operates efficiently under all load conditions.

Figure 17
figure 17

Power generation across different load conditions.

The voltage deviation under different load conditions is compared in Fig. 18. The results depict minimal voltage deviation under low load conditions. When the load increases, the voltage deviation also increases proportionally, but the proposed learning model maintains the voltage deviation within the acceptable level. Finally, the convergence performance of the proposed model for different load conditions is presented in Fig. 19 for case 4, which considers all the parameters in its objective function. The convergence plot depicts the proposed model's consistency in obtaining optimal solutions across all load conditions. In the comparative analysis, the convergence under low and medium loads is quick and requires fewer iterations. In the case of high load, the convergence is stable, which validates the adaptability and effectiveness of the proposed model.

Figure 18
figure 18

Voltage deviation across different load conditions.

Figure 19
figure 19

Convergence plot for different load conditions (Case 4).

Comparative analysis with different optimization algorithms

To further validate the proposed model's performance, some familiar optimization algorithms are considered for comparative analysis: PSO, grasshopper optimization, moth flame optimization, cuckoo search optimization, Firefly algorithm, spotted hyena optimization, ant colony optimization, grey wolf optimization, genetic algorithm, and hybrid spotted hyena optimization. Table 9 depicts the comparative results of case 4 for the various optimization algorithms in terms of fuel cost. The analysis depicts that the proposed HDRL-QIGA model has the lowest fuel cost compared to the existing optimization algorithms. The mean fuel cost of the proposed model, $620.50, is $38 less than that of the existing HSHOA, $39 less than that of ACO, CSO, MFO, GA, and SHO, and $40 less than that of the GWO, GOA, and PSO algorithms. The analysis demonstrates the better performance of the proposed model in minimizing the fuel cost compared to other optimization algorithms.

Table 9 OPF results for case 4 using different optimization algorithms.
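The stated gaps in mean fuel cost can also be expressed as relative savings. The Python calculation below takes the reported mean of $620.50 and the stated differences at face value, derives each competing algorithm's implied mean cost as the sum of the two, and reports the saving as a percentage of that implied cost. The implied costs are derived quantities and may differ from the tabulated values by rounding.

```python
PROPOSED_MEAN_COST = 620.50  # $, mean fuel cost reported for HDRL-QIGA (case 4)

# Differences in $ as stated in the text, listed per algorithm.
stated_gap = {
    "HSHOA": 38.0,
    "ACO": 39.0, "CSO": 39.0, "MFO": 39.0, "GA": 39.0, "SHO": 39.0,
    "GWO": 40.0, "GOA": 40.0, "PSO": 40.0,
}


def relative_saving(gap, proposed=PROPOSED_MEAN_COST):
    """Saving as a percentage of the competing algorithm's implied mean cost."""
    return 100.0 * gap / (proposed + gap)


savings = {name: round(relative_saving(gap), 2) for name, gap in stated_gap.items()}
print(savings)
```

On these figures, the proposed model lowers the mean fuel cost by about 5.8% to 6.1% relative to the compared algorithms.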

The convergence characteristics of all the optimization algorithms are presented in Fig. 20. The better convergence of the proposed HDRL-QIGA can be observed from the figure. The convergence of the proposed model is fast with a low objective value, and this indicates the efficiency in finding the optimal solution for the OPF problem. The existing algorithms show slow convergence with higher objective values indicating their lower efficiency and solution quality. From the analysis, it can be confirmed that the proposed HDRL-QIGA model performance is superior to existing algorithms in terms of solution quality and convergence speed which improves the overall performance of HRES.

Figure 20
figure 20

Convergence plot for all optimization algorithms.

Conclusion

A novel hybrid DRL-QIGA (HDRL-QIGA) model is presented in this research work to solve the OPF problem in HRES. The presented hybrid model utilizes the adaptive learning abilities of DRL to handle the uncertainties in HRES. The quantum-inspired genetic algorithm provides better exploration of the search space and a global solution search for the OPF problem. This combination effectively balances the power flow and provides better decisions in uncertain situations. The experimental analysis of the proposed model utilizes a modified IEEE 30-bus system integrated with wind turbines and solar PV panels. The performance is measured under four cases: minimizing the fuel cost, the power loss, the voltage deviation, and a combination of all three. The performance is also measured under different load conditions and compared with existing optimization algorithms. The comparative analysis validates the better performance of the proposed model over existing optimization algorithms.
The proposed HDRL-QIGA model shows a mean fuel cost of $620.50, which is significantly lower than that of the existing optimization algorithms. Specifically, it is $38 less than the Hybrid Spotted Hyena Optimization Algorithm (HSHOA), $39 less than the Ant Colony Optimization (ACO), Cuckoo Search Algorithm (CSA), Moth Flame Optimization (MFO), Genetic Algorithm (GA), and Spotted Hyena Optimization (SHO), and $40 less than the Grey Wolf Optimization (GWO), Grasshopper Optimization Algorithm (GOA), and Particle Swarm Optimization (PSO). This demonstrates the HDRL-QIGA model's superior efficiency in minimizing fuel costs compared to other optimization algorithms in power systems. However, the proposed model has limitations when applied in a real-time power system due to its high computational demands. In environments with volatile energy demands, the algorithm's adaptability might be limited, which could reduce decision-making efficiency and robustness. Additionally, fine-tuning the hyperparameters requires significant time and resources. Once implemented, however, the proposed model is expected to provide better solutions than the conventional procedures. Further, the research work can be extended by incorporating a hybrid deep learning algorithm with optimization algorithms to improve the overall performance of a hybrid renewable energy system. An experimental implementation of this work can also be carried out in the future.