Table 1 Pseudocode of the TD3-based dynamic low-carbon economic scheduling approach.
| Step | Description |
|---|---|
| Training stage | |
| 1 | Initialization: weights of the critic networks (\(\theta_{Q_1}\), \(\theta_{Q_2}\)), actor network (\(\theta_\mu\)), target critic networks (\(\theta_{Q_1'} \leftarrow \theta_{Q_1}\), \(\theta_{Q_2'} \leftarrow \theta_{Q_2}\)), target actor network (\(\theta_{\mu'} \leftarrow \theta_\mu\)), experience replay buffer, mini-batch size, and hyperparameters |
| 2 | for episode = 1 : N do |
| 3 |    The agent observes the environment to obtain the initial state \(s_1\) of the island IES scheduling model, \(s_1 = [p_{\text{load},1}, h_{\text{load},1}, w_{\text{load},1}, p_{\text{wt},1}, soc_{1}, 1]\) |
| 4 |    Initialize a random exploration noise process \(\mathcal{N}\) for action exploration |
| 5 |    for time step \(t = 1 : T\) do |
| 6 |       The agent selects the current action \(a_t\) according to the current policy and the exploration noise \(\mathcal{N}\), \(a_t = [p_{\text{chp},t}, p_{\text{hsm},t}, h_{\text{gb},t}]\) |
| 7 |       Execute action \(a_t\) and transition to the new state \(s_{t+1}\) |
| 8 |       Obtain the reward \(r_t\) |
| 9 |       Store the transition \((s_t, a_t, r_t, s_{t+1})\) in the experience replay buffer |
| 10 |       Compute the smoothed target action using Eq. (45) |
| 11 |       Update the weights of the critic networks by minimizing the loss function (44) |
| 12 |       if \(t\) mod update interval = 0 or \(t = T\) then |
| 13 |          Update the weights of the actor network using the sampled policy gradient (46) |
| 14 |          Update the weights of the target networks using the soft update strategy (47) |
| 15 |       end if |
| 16 |    end for |
| 17 | end for |
| 18 | Save the weights of the actor network |
| Scheduling stage | |
| 19 | Load the actor network weights saved during the training stage |
| 20 | for time step \(t = 1 : T\) do |
| 21 |    Input: real-time operational data of the island IES at time step \(t\) |
| 22 |    Output: the scheduling strategy at time step \(t\) |
| 23 |    Execute the scheduling strategy and read in the real-time operational data for the next time step \(t+1\) |
| 24 |    Obtain the reward \(r_t\) and calculate the total operation cost |
| 25 | end for |
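To make the training updates in Table 1 (steps 10-14) concrete, the following is a minimal PyTorch sketch of one TD3 update step. Since Eqs. (44)-(47) are not reproduced here, the sketch uses the standard TD3 forms they correspond to: target policy smoothing for Eq. (45), the clipped double-Q critic loss for Eq. (44), the deterministic policy gradient for Eq. (46), and Polyak soft updates for Eq. (47). The network widths, learning rates, noise scales, and the assumption that actions are tanh-squashed to \([-1, 1]\) are illustrative choices, not taken from the paper.

```python
# Sketch of the TD3 update in steps 10-14 of Table 1. Hyperparameter values
# and network sizes are assumptions; only the update rules follow standard TD3.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, ACTION_DIM = 6, 3          # [p_load, h_load, w_load, p_wt, soc, t] / [p_chp, p_hsm, h_gb]
GAMMA, TAU = 0.99, 0.005              # assumed discount factor and soft-update rate
POLICY_NOISE, NOISE_CLIP = 0.2, 0.5   # target-policy-smoothing parameters (Eq. (45))

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                         nn.Linear(256, 256), nn.ReLU(),
                         nn.Linear(256, out_dim))

actor = mlp(STATE_DIM, ACTION_DIM)
critic1 = mlp(STATE_DIM + ACTION_DIM, 1)
critic2 = mlp(STATE_DIM + ACTION_DIM, 1)
# Target networks start as copies of the online networks (step 1).
actor_t, critic1_t, critic2_t = map(copy.deepcopy, (actor, critic1, critic2))
actor_opt = torch.optim.Adam(actor.parameters(), lr=3e-4)
critic_opt = torch.optim.Adam(list(critic1.parameters()) + list(critic2.parameters()), lr=3e-4)

def soft_update(online, target):
    """Polyak averaging: the soft update strategy of step 14 / Eq. (47)."""
    for p, p_t in zip(online.parameters(), target.parameters()):
        p_t.data.mul_(1 - TAU).add_(TAU * p.data)

def td3_update(batch, step, policy_delay=2):
    # Mini-batch sampled from the replay buffer (step 9); r is shaped (B, 1).
    s, a, r, s2 = batch
    with torch.no_grad():
        # Step 10 / Eq. (45): smooth the target action with clipped noise.
        noise = (torch.randn_like(a) * POLICY_NOISE).clamp(-NOISE_CLIP, NOISE_CLIP)
        a2 = (torch.tanh(actor_t(s2)) + noise).clamp(-1.0, 1.0)
        # Clipped double-Q target: the smaller of the two target critics.
        # No terminal mask: the scheduling horizon is a fixed T steps.
        sa2 = torch.cat([s2, a2], dim=1)
        q_target = r + GAMMA * torch.min(critic1_t(sa2), critic2_t(sa2))
    # Step 11 / Eq. (44): update both critics toward the shared target.
    sa = torch.cat([s, a], dim=1)
    critic_loss = F.mse_loss(critic1(sa), q_target) + F.mse_loss(critic2(sa), q_target)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
    # Steps 12-14: delayed actor and target-network updates.
    if step % policy_delay == 0:
        # Eq. (46): deterministic policy gradient through the first critic.
        actor_loss = -critic1(torch.cat([s, torch.tanh(actor(s))], dim=1)).mean()
        actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
        for online, target in ((actor, actor_t), (critic1, critic1_t), (critic2, critic2_t)):
            soft_update(online, target)
```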
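The scheduling stage (steps 19-25) then runs the trained actor greedily, with no exploration noise. The sketch below reuses the `actor` network defined above; the weight file name, the environment interface (`env.get_state`, `env.step`), the horizon `T`, and the convention that the reward is the negative operation cost are hypothetical placeholders.

```python
# Sketch of the scheduling stage: load the saved actor (step 19) and roll it
# out over the horizon. File name and env interface are assumed placeholders.
import torch

# Step 18 would be: torch.save(actor.state_dict(), "td3_actor.pt")
actor.load_state_dict(torch.load("td3_actor.pt"))  # step 19 (assumed file name)
actor.eval()

total_cost = 0.0
state = env.get_state(t=1)  # real-time operational data at t = 1 (step 21)
for t in range(1, T + 1):
    with torch.no_grad():   # step 22: the actor outputs the scheduling strategy
        action = torch.tanh(actor(torch.as_tensor(state, dtype=torch.float32)))
    # Step 23: execute the strategy and read the data for time step t + 1.
    state, reward = env.step(action.numpy())
    # Step 24: accumulate cost, assuming reward = -(operation cost at t).
    total_cost += -reward
print(f"total operation cost: {total_cost:.2f}")
```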