Table 1 Pseudocode of the TD3-based dynamic low-carbon economic scheduling approach.
| Step | Description |
|---|---|
| Training stage | |
| 1 | Initialization: weights of the critic networks (\(\theta_{Q_1}\), \(\theta_{Q_2}\)), actor network (\(\theta_\mu\)), target critic networks (\(\theta_{Q_1'} \leftarrow \theta_{Q_1}\), \(\theta_{Q_2'} \leftarrow \theta_{Q_2}\)), target actor network (\(\theta_{\mu'} \leftarrow \theta_\mu\)), experience replay buffer, mini-batch size, and hyperparameters |
| 2 | for episode = 1 : N do |
| 3 |    The agent observes the environment to obtain the initial state \(s_1\) of the island IES scheduling model, \(s_1 = [p_{\text{load},1}, h_{\text{load},1}, w_{\text{load},1}, p_{\text{wt},1}, soc_{1}, 1]\) |
| 4 |    Initialize a random exploration noise process \(\mathcal{N}\) for action exploration |
| 5 |    for time step \(t = 1 : T\) do |
| 6 |       The agent selects the current action \(a_t\) according to the current policy and the exploration noise \(\mathcal{N}\), \(a_t = [p_{\text{chp},t}, p_{\text{hsm},t}, h_{\text{gb},t}]\) |
| 7 |       Execute action \(a_t\) and transition to the new state \(s_{t+1}\) |
| 8 |       Obtain the reward \(r_t\) |
| 9 |       Store the transition \((s_t, a_t, r_t, s_{t+1})\) in the experience replay buffer |
| 10 |       Compute the smoothed target action using Eq. (45) |
| 11 |       Update the weights of the critic networks by minimizing the loss function (44) |
| 12 |       if \(t\) mod update interval = 0 or \(t = T\) then |
| 13 |          Update the weights of the actor network using the sampled policy gradient (46) |
| 14 |          Update the weights of the target networks using the soft update strategy (47) |
| 15 |       end if |
| 16 |    end for |
| 17 | end for |
| 18 | Save the weights of the actor network |
| Scheduling stage | |
| 19 | Load the actor network weights saved during the training stage |
| 20 | for time step \(t = 1 : T\) do |
| 21 |    Input: real-time operational data of the island IES at time step \(t\) |
| 22 |    Output: the scheduling strategy at time step \(t\) |
| 23 |    Execute the scheduling strategy and read in the real-time operational data for the next time step \(t+1\) |
| 24 |    Obtain the reward \(r_t\) and calculate the total operation cost |
| 25 | end for |
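To make the training updates in Table 1 (steps 10-14) concrete, the following is a minimal PyTorch sketch of one TD3 update step. Since Eqs. (44)-(47) are not reproduced here, the sketch uses the standard TD3 forms they correspond to: target policy smoothing for Eq. (45), the clipped double-Q critic loss for Eq. (44), the deterministic policy gradient for Eq. (46), and Polyak soft updates for Eq. (47). The network widths, learning rates, noise scales, and the assumption that actions are tanh-squashed to \([-1, 1]\) are illustrative choices, not taken from the paper.

```python
# Sketch of the TD3 update in steps 10-14 of Table 1. Hyperparameter values
# and network sizes are assumptions; only the update rules follow standard TD3.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

STATE_DIM, ACTION_DIM = 6, 3          # [p_load, h_load, w_load, p_wt, soc, t] / [p_chp, p_hsm, h_gb]
GAMMA, TAU = 0.99, 0.005              # assumed discount factor and soft-update rate
POLICY_NOISE, NOISE_CLIP = 0.2, 0.5   # target-policy-smoothing parameters (Eq. (45))

def mlp(in_dim, out_dim):
    return nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU(),
                         nn.Linear(256, 256), nn.ReLU(),
                         nn.Linear(256, out_dim))

actor = mlp(STATE_DIM, ACTION_DIM)
critic1 = mlp(STATE_DIM + ACTION_DIM, 1)
critic2 = mlp(STATE_DIM + ACTION_DIM, 1)
# Target networks start as copies of the online networks (step 1).
actor_t, critic1_t, critic2_t = map(copy.deepcopy, (actor, critic1, critic2))
actor_opt = torch.optim.Adam(actor.parameters(), lr=3e-4)
critic_opt = torch.optim.Adam(list(critic1.parameters()) + list(critic2.parameters()), lr=3e-4)

def soft_update(online, target):
    """Polyak averaging: the soft update strategy of step 14 / Eq. (47)."""
    for p, p_t in zip(online.parameters(), target.parameters()):
        p_t.data.mul_(1 - TAU).add_(TAU * p.data)

def td3_update(batch, step, policy_delay=2):
    # Mini-batch sampled from the replay buffer (step 9); r is shaped (B, 1).
    s, a, r, s2 = batch
    with torch.no_grad():
        # Step 10 / Eq. (45): smooth the target action with clipped noise.
        noise = (torch.randn_like(a) * POLICY_NOISE).clamp(-NOISE_CLIP, NOISE_CLIP)
        a2 = (torch.tanh(actor_t(s2)) + noise).clamp(-1.0, 1.0)
        # Clipped double-Q target: the smaller of the two target critics.
        # No terminal mask: the scheduling horizon is a fixed T steps.
        sa2 = torch.cat([s2, a2], dim=1)
        q_target = r + GAMMA * torch.min(critic1_t(sa2), critic2_t(sa2))
    # Step 11 / Eq. (44): update both critics toward the shared target.
    sa = torch.cat([s, a], dim=1)
    critic_loss = F.mse_loss(critic1(sa), q_target) + F.mse_loss(critic2(sa), q_target)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()
    # Steps 12-14: delayed actor and target-network updates.
    if step % policy_delay == 0:
        # Eq. (46): deterministic policy gradient through the first critic.
        actor_loss = -critic1(torch.cat([s, torch.tanh(actor(s))], dim=1)).mean()
        actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()
        for online, target in ((actor, actor_t), (critic1, critic1_t), (critic2, critic2_t)):
            soft_update(online, target)
```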
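The scheduling stage (steps 19-25) then runs the trained actor greedily, with no exploration noise. The sketch below reuses the `actor` network defined above; the weight file name, the environment interface (`env.get_state`, `env.step`), the horizon `T`, and the convention that the reward is the negative operation cost are hypothetical placeholders.

```python
# Sketch of the scheduling stage: load the saved actor (step 19) and roll it
# out over the horizon. File name and env interface are assumed placeholders.
import torch

# Step 18 would be: torch.save(actor.state_dict(), "td3_actor.pt")
actor.load_state_dict(torch.load("td3_actor.pt"))  # step 19 (assumed file name)
actor.eval()

total_cost = 0.0
state = env.get_state(t=1)  # real-time operational data at t = 1 (step 21)
for t in range(1, T + 1):
    with torch.no_grad():   # step 22: the actor outputs the scheduling strategy
        action = torch.tanh(actor(torch.as_tensor(state, dtype=torch.float32)))
    # Step 23: execute the strategy and read the data for time step t + 1.
    state, reward = env.step(action.numpy())
    # Step 24: accumulate cost, assuming reward = -(operation cost at t).
    total_cost += -reward
print(f"total operation cost: {total_cost:.2f}")
```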