Table 1 Pseudocode of the TD3-based dynamic low-carbon economic scheduling approach.

From: Intelligent scheduling for distributed-level island integrated energy systems considering multi-energy utilization and incentive-penalty stepped carbon trading mechanism

Training stage

1. Initialization: weights of the critic networks (\(\theta _{Q_1}\), \(\theta _{Q_2}\)), the actor network (\(\theta _\mu \)), the target critic networks (\(\theta _{Q_1'}\leftarrow \theta _{Q_1}\), \(\theta _{Q_2'}\leftarrow \theta _{Q_2}\)), the target actor network (\(\theta _{\mu '}\leftarrow \theta _{\mu }\)), the experience replay buffer, the mini-batch size, and the hyperparameters
2. for episode = 1 : N do
3.    The agent observes the environment to obtain the initial state \(s_1\) of the island IES scheduling model, \(s_1 = [p_{\text {load},1}, h_{\text {load},1}, w_{\text {load},1}, p_{\text {wt},1}, soc_{1}, 1]\)
4.    Initialize a random exploration noise process \(\mathcal {N}\) for action exploration
5.    for time step t = 1 : T do
6.       The agent selects the current action \(a_t\) under the exploration noise process \(\mathcal {N}\), \(a_t = [p_{\text {chp},t}, p_{\text {hsm},t}, h_{\text {gb},t}]\)
7.       Execute action \(a_t\) and transition to the new state \(s_{t+1}\)
8.       Obtain the reward \(r_t\)
9.       Store the transition \((s_t, a_t, r_t, s_{t+1})\) in the experience replay buffer
10.      Modify the target action using Eq. (45)
11.      Update the weights of the critic networks by minimizing the loss function (44)
12.      if (t mod update interval) = 0 or \(t = T\) then
13.         Update the weights of the actor network using the sampled policy gradient (46)
14.         Update the weights of the target networks using the soft update strategy (47)
15.      end if
16.    end for
17. end for
18. Save the weights of the actor network
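Since Eqs. (44)–(47) are not reproduced in this table, the inner-loop updates (steps 10, 11, and 14) can be sketched with the standard TD3 forms: target policy smoothing, the clipped double-Q target, and the Polyak soft update. This is a minimal sketch, not the paper's implementation; the hyperparameter values, array shapes, and action bounds below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def smoothed_target_action(mu_target, s_next, sigma=0.2, noise_clip=0.5,
                           a_low=-1.0, a_high=1.0):
    # Step 10 (standard TD3 target policy smoothing): perturb the target
    # actor's action with clipped Gaussian noise, then clip the result to
    # the feasible dispatch range [a_low, a_high].
    a = mu_target(s_next)
    eps = np.clip(rng.normal(0.0, sigma, size=np.shape(a)), -noise_clip, noise_clip)
    return np.clip(a + eps, a_low, a_high)

def td_target(r, q1_next, q2_next, gamma=0.99, done=0.0):
    # Step 11 (clipped double-Q target): both critics regress toward the
    # same target built from the smaller of the two target-critic estimates,
    # which curbs overestimation bias.
    return r + gamma * (1.0 - done) * np.minimum(q1_next, q2_next)

def soft_update(theta_target, theta, tau=0.005):
    # Step 14 (Polyak averaging): theta' <- tau * theta + (1 - tau) * theta'.
    return tau * theta + (1.0 - tau) * theta_target
```

The actor update (step 13) follows the deterministic policy gradient through the first critic and, like the soft update, runs only every `update interval` steps (delayed policy updates); a full implementation would apply these functions to neural-network parameters rather than raw arrays.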

Scheduling stage

19. Load the weights of the actor network saved during the training stage
20. for time step t = 1 : T do
21.    Input: the real-time operational data of the island IES at time step t
22.    Output: the scheduling strategy at time step t
23.    Execute the scheduling strategy and input the real-time operational data of the next time step \(t+1\)
24.    Obtain the reward \(r_t\) and calculate the total operation cost
25. end for
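The scheduling stage (steps 19–25) is a plain deterministic rollout of the saved actor, with no exploration noise. A minimal sketch, where `actor` and `env_step` are hypothetical stand-ins for the loaded policy network and the island-IES environment:

```python
def run_scheduling(actor, env_step, s1, T=24):
    # Roll the trained policy forward over the scheduling horizon,
    # accumulating the per-step reward from which the total operation
    # cost is then calculated (step 24).
    s, total_reward = s1, 0.0
    for t in range(T):
        a = actor(s)            # step 22: dispatch strategy for time step t
        s, r = env_step(s, a)   # step 23: apply dispatch, observe next state
        total_reward += r       # step 24: accumulate the reward
    return total_reward
```

Because the actor is queried once per time step on the observed state, the policy reacts to real-time data online without re-solving an optimization problem at each step.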