A learning-driven algorithm for maintenance team and UAV collaboration in restoring power network

Pan, Tiejun; Zheng, Leina; Xu, Ying; Zhang, Xuefeng; Zhong, Caiming; Wang, Zhang

doi:10.1038/s41598-025-06512-w

Published: 02 July 2025

Scientific Reports volume 15, Article number: 23359 (2025)

Abstract

Power networks are highly vulnerable to disruptions caused by natural and man-made disasters, necessitating prompt restoration of damaged power supply. This research addresses the challenge of efficiently restoring large-scale power networks, which often involve numerous unknown or uninspected faulty nodes. Leveraging advancements in unmanned aerial vehicles (UAVs) technology, this study facilitates the inspection of these nodes and subsequent manual maintenance. However, coordinating maintenance teams and UAVs is complex due to the intricate network structure and scheduling correlations. We propose a learning-driven (LD) algorithm to enhance human-UAV collaboration for effective power network restoration. The algorithm includes an initialization method to generate promising initial solutions, followed by the use of search operators as basic action elements and a learning engine to guide search directions based on state assessments. Comprehensive experiments validate the algorithm’s effectiveness in improving the restoration process.

Introduction

Power networks are crucial for social production and daily life but are vulnerable to man-made or natural disasters. In December 2022, the intentional destruction of four substations in Washington State disrupted the power supply to tens of thousands of users. Similarly, on June 17, 2024, Meizhou Typhoon experienced a severe rainstorm, causing power outages for over 130,000 households.

Power network interruptions can be modelled as simultaneous removals of network nodes or edges¹. While occasional faulty nodes can be easily identified through traditional measurement methods, accurately estimating the condition of a network with numerous faulty nodes is challenging due to its complexity. Consequently, several studies have explored evolutionary algorithms to develop maintenance strategies for these nodes, aiming to find near-optimal solutions within limited timeframes²,³,⁴,⁵.

Genetic algorithm (GA) and particle swarm optimisation (PSO), as classic evolutionary algorithms (EA), have been widely applied to combinatorial optimization problems, and several studies have employed them in power network restoration scenarios. For example, Zhang et al.⁶ combined a chromosome test operator with GA to obtain feasible solutions for restoring damaged interdependent transportation-electric power networks, while Volkova et al.⁷ constructed a multi-objective optimization model of the restoration process of distributed power networks, using GA to calculate appropriate subgraphs based on practical scenarios. For regional network fault restoration and to verify the recovery system and algorithm in regional power grid operations, Cheng et al.⁸ improved GA, whereas Molaali and Abedi⁹ combined heuristics and GA to optimise distribution network load restoration. Guamán and Valenzuela¹⁰ investigated the spanning tree technique to generate initial solutions and applied GA to obtain optimal solutions that satisfied all the constraints, and Zhang et al.¹¹ addressed dynamic emergency restoration scheduling, incorporating an early-termination test method into GA to improve computational efficiency. The PSO algorithm, initially designed for path planning, was combined with other learning methods to determine maintenance task sequences for power networks. Combining fuzzy simulation with PSO to find the optimal network skeleton and restoration sequence, Liang¹² developed a post-outage network reconfiguration model. To enhance the restoration efficiency of an electricity distribution network, Kayal and Basumatary¹³ established a fuzzy-based model according to the actual engineering situation and used PSO and grasshopper algorithms to optimize multiple objectives, while ElDesouky et al.¹⁴ used Boolean algebra in binary PSO to solve service restoration problems in distribution systems. To calculate service restoration schemes for distributed networks, Ayalew et al.¹⁵ applied a teaching-learning evolutionary algorithm, which outperformed PSO and differential evolution (DE) algorithms under various protection constraints.

As shown in above studies, specific challenges of power network maintenance, have called for constructing multi-objective models and designing related algorithms to achieve satisfactory solutions. Augugliaro et al.¹⁶ formulated one such service restoration model with fuzzy objectives for electrical distribution networks, developing an evolutionary algorithm for solving the fuzzy multi-objective problem within limited response time. Huang¹⁷ addressed distribution system service restoration by establishing a multi-objective optimization model and applying a weighted sum strategy to convert objectives into a single goal by setting appropriate weighting factors. Huang also built a fuzzy cause-effect network to investigate task restoration strategies. Sanches et al.¹⁸ combined an improved NSGA-II, a subpopulation table-based multi-objective evolutionary algorithm, and a new heuristic to evolve solutions toward the Pareto frontier for large-scale distribution system service restoration. Wang and Chiang¹⁹ constructed a multi-objective model for restoring services in large-scale distribution system post-faults, integrating the K-means method into group-based PSO to enhance exploration and designing local heuristics to improve exploitation. Carrano et al.²⁰ addressed load restoration in power distribution networks by establishing a bi-objective model considering both restored loads and presented a multi-objective evolutionary algorithm with a new encoding strategy and effective evolutionary mechanism. Wang et al.²¹ used DE and estimation of distribution algorithms to solve a multi-objective optimization model, aiming to minimize power network restoration costs and maximize power capacity, showing good convergence and distribution in obtaining Pareto frontier.

The advancement of UAV technology supports the detection of power network fault nodes, with research focusing on determining UAV detection paths. Zhou et al.²² studied UAV applications for inspecting power networks in smart grids, improving dynamic programming to obtain solutions based on different timescales. Lim et al.²³ developed a two-phase stochastic model to determine UAV locations and movement directions for timely and effective power network damage assessment, while Hoang et al.²⁴ provided a system architecture for real-time object inspection using multiple UAVs, employing an angle-encoded PSO algorithm to generate initial inspection paths and adjusting directions based on communication links. To evolve the maintenance team and UAV scheduling solutions simultaneously for transmission network restoration, Zheng et al.²⁵ designed a cooperative evolutionary algorithm and successfully applied it to the 2017 Jiuzhaigou earthquake. Fu et al.²⁶ proposed a UAV routing strategy to monitor power networks in real time, designing a two-layer architecture for large- and small-scale problems.

As discussed above, most related studies apply classic evolutionary algorithms (EA) to the deployment of either maintenance teams or UAVs, optimizing single or multiple objectives for given scenarios, but few support cooperation between human teams and UAVs in carrying out restoration work. In addition, the development of reinforcement learning has injected new vitality into solving combinatorial optimization problems. Integrating it with EA and exploiting their complementary advantages to handle complex scheduling issues has clear research value.

In this article, we address the collaborative scheduling of maintenance teams and UAVs for power network restoration, where accurate UAV detection of unknown faulty nodes facilitates subsequent manual maintenance. A learning-driven algorithm is proposed to solve this problem, initially generating a high-quality solution and applying local search methods to improve its quality further. Searching operators, as basic elements of the learning engine, balance global and local search capabilities. The algorithm continuously learns from current operator feedback, guiding the search towards better directions. The effectiveness of the algorithm is verified through comprehensive experiments. The main contributions of this study are twofold:

1)
Establishing a mathematical model to describe maintenance-team and UAV cooperation scheduling, aiming to minimize power supply restoration time. The model accounts for maintenance team fatigue and treats UAV detection results as random events consistent with actual power network rescue scenarios.
2)
Proposing a learning-driven algorithm for the problem, which designs initialization method, search operators, and evolutionary strategies based on problem properties, achieving good performance across various instances. The algorithm provides relevant guidance for solving other human-machine collaboration problems.

Problem description

The challenge is to devise UAV and maintenance team schedules that minimize the maximum completion time for the maintenance teams. Table 1 presents the notations employed in the mathematical model.

Table 1 Notations of a mathematical model.

Based on the input variables, the detection completion time of an unknown faulty node $f_{j'}$ is calculated by Eq. (1).

$$\begin{aligned} c_{f_{j'}}^{dec}= \left\{ \begin{array}{ll} t_{u_{i}, 0, f_{j'}}+\tau _{u_{i}, f_{j'}}, & f_{j'}=\pi _{u_{i}}(1)\\ c_{\pi _{u_{i}}(j-1)}^{dec}+t_{u_{i}, \pi _{u_{i}}(j-1), f_{j'}}+\tau _{u_{i}, f_{j'}}, & f_{j'}=\pi _{u_{i}}(j),\ j>1 \end{array}\right. \end{aligned}$$

(1)

During the solution evolution process, there are four scenarios regarding the detection and repair times of the faulty node f, as depicted in Fig. 1. The blue rectangle indicates detection by a UAV, while the orange rectangle represents repair by a maintenance team, with the periods noted on the rectangles.

In Case 1, node f is first detected by the UAV from 7:31 to 7:45, then repaired by the maintenance team from 8:03 to 9:15.
In Case 2, UAV detection of node f starts at 7:31. During this detection process, the maintenance team arrives at node f at 7:38. However, in practice, UAV detection and maintenance team repair do not occur simultaneously. In this scenario, repair work is performed after the UAV completes detection, i.e., starting at 7:45 instead of 7:38, causing subsequent node repairs to be postponed.
In Cases 3 and 4, the maintenance team begins working on node f without UAV detection. Here, the maintenance team handles both detection and repair, rendering UAV operation on node f unnecessary. Consequently, UAV detection for subsequent nodes is advanced.

Here, we suppose that $\pi _{h_{i}}(1) \in F(i=1,2, \ldots , m_{1})$, thus the repair completion time of $\pi _{h_{i}}(1)$ could be calculated as Eq. (2).

$$\begin{aligned} c_{\pi _{h_i}(1)}^{rep}=t_{h_{i}, 0, \pi _{h_{i}}(1)}+\tau _{h_{i}, \pi _{h_{i}}(1)} \end{aligned}$$

(2)

For the $\pi _{h_{i}}(j)(j>1)$, the repair completion time of $\pi _{h_{i}}(j)$ is computed based on the above four situations, as shown in Eqs. (3)–(5).

In Case 1, $\pi _{h_{i}}(j)$ is detected by $u_{i}$ before being repaired by the maintenance team, then

$$\begin{aligned} \begin{aligned} c_{\pi _{h_{i}}(j)}^{rep}&=c_{\pi _{h_{i}}{(j-1)}}^{rep}+t_{h_{i}, \pi _{h_{i}}(j-1), \pi _{h_{i}}(j)}\\&+\,(p_{u_{i}, \pi _{h_{i}}(j)} \times \eta ^{-} \times \tau _{h_{i}, \pi _{h_{i}}(j)}\\&+\,(1-p_{u_{i}, \pi _{h_{i}}(j)})\times \eta ^{+} \times \tau _{h_{i}, \pi _{h_{i}}(j)}) \times (1+\delta )^{j-1} \end{aligned} \end{aligned}$$

(3)

In Case 2, $\pi _{h_{i}}(j)$ is repaired by $h_{i}$ immediately after being detected by $u_{i}$, then

$$\begin{aligned} \begin{aligned} c_{\pi _{h_{i}}(j)}^{rep}&=c_{\pi _{h_{i}}(j)}^{dec}+(p_{u_{i}, \pi _{h_{i}}(j)} \times \eta ^{-} \times \tau _{h_{i}, \pi _{h_{i}}(j)}\\&+\,(1-p_{u_{i}, \pi _{h_{i}}(j)}) \times \eta ^{+} \times \tau _{h_{i}, \pi _{h_{i}}(j)}) \times (1+\delta )^{j-1} \end{aligned} \end{aligned}$$

(4)

Cases 3 and 4 indicate that $\pi _{h_{i}}(j)$ is directly repaired by the maintenance team without being detected by a UAV, then

$$\begin{aligned} c_{\pi _{h_{i}}(j)}^{rep}=c_{\pi _{h_{i}}(j-1)}^{rep}+t_{h_{i}, \pi _{h_{i}}(j-1), \pi _{h_{i}}(j)}+\tau _{h_{i}, \pi _{h_{i}}(j)} \times (1+\delta )^{j-1} \end{aligned}$$

(5)

Eqs. (3)–(5) account for the fatigue of the maintenance team through the factor $(1+\delta )^{j-1}$: as the maintenance work progresses, the crew tires and the processing time of each subsequent task is extended.

We provide a simple scenario to describe transmission network restoration, as shown in Fig. 2. The network topology involves six faulty nodes $p_1 - p_6$, where $p_3$, $p_4$ and $p_6$ are unknown faulty nodes. A UAV is employed to detect unknown faulty nodes, while a team is responsible for repairing all faulty nodes. In Fig. 2, numbers on edges denote travel times, and numbers near faulty nodes represent detection/repair times. Suppose that the scheduling solution is $\pi _{u_1}=\{p_3,p_4,p_6\}$, $\pi _{h_1}=\{p_1,p_3,p_5,p_6,p_4,p_2\}$. The completion times of the UAV detecting the unknown faulty nodes are computed as follows:

$$\begin{aligned} c_{p_3}^{dec}&=1 + 3=4\\ c_{p_4}^{dec}&=c_{p_3}^{dec}+2 + 4 = 10\\ c_{p_6}^{dec}&=c_{p_4}^{dec}+2 + 4 = 16 \end{aligned}$$

The repair completion times of the maintenance team are obtained as follows, starting with the first node $p_1$:

$$\begin{aligned} c_{p_1}^{rep}&=8 + 20=28 \end{aligned}$$

Assuming $p_3$ is accurately detected by the UAV, with $\eta ^{-}$ set to 0.6 and $\delta$ set to 0.05, then

$$\begin{aligned} c_{p_3}^{rep}&=c_{p_1}^{rep}+5 + 12\times 0.6\times (1 + 0.05)=40.56\\ c_{p_5}^{rep}&=c_{p_3}^{rep}+7 + 25\times (1 + 0.05)^2=75.12 \end{aligned}$$

Assuming $p_6$ is accurately detected by UAV, then

$$\begin{aligned} c_{p_6}^{rep}&=c_{p_5}^{rep}+11 + 14\times 0.6\times (1 + 0.05)^3=95.85 \end{aligned}$$

Assuming $p_4$ is not accurately detected by UAV, and $\eta ^{+}$ is set to 1.5, then

$$\begin{aligned} c_{p_4}^{rep}&=c_{p_6}^{rep}+7 + 20\times 1.5\times (1 + 0.05)^4=139.31\\ c_{p_2}^{rep}&=c_{p_4}^{rep}+5 + 10\times (1 + 0.05)^5=157.07 \end{aligned}$$
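The arithmetic of this example can be checked with a short Python sketch. It is only an illustration under stated assumptions: the travel, detection and repair times are those used in the worked equations, $\delta = 0.05$ is the value that appears in the arithmetic, detection outcomes are fixed (probability 1 for an accurate detection, 0 otherwise), and the helper names `detection_windows` and `repair_times` are hypothetical. The sketch does not model the advancement of later UAV detections in Cases 3 and 4.

```python
ETA_MINUS, ETA_PLUS, DELTA = 0.6, 1.5, 0.05  # DELTA as used in the worked arithmetic

# UAV route: (node, travel time from previous stop, detection time)
uav_route = [("p3", 1, 3), ("p4", 2, 4), ("p6", 2, 4)]
# Team route: (node, travel time from previous stop, nominal repair time)
team_route = [("p1", 8, 20), ("p3", 5, 12), ("p5", 7, 25),
              ("p6", 11, 14), ("p4", 7, 20), ("p2", 5, 10)]
# Assumed detection outcomes: 1.0 = accurate, 0.0 = not accurate
p_accurate = {"p3": 1.0, "p6": 1.0, "p4": 0.0}


def detection_windows(route):
    """Eq. (1): return {node: (detection start, detection completion)}."""
    t, windows = 0.0, {}
    for node, travel, detect in route:
        start = t + travel
        t = start + detect
        windows[node] = (start, t)
    return windows


def repair_times(route, windows, p_acc):
    """Eqs. (2)-(5): repair completion time of each node on one team route."""
    t, done = 0.0, {}
    for j, (node, travel, tau) in enumerate(route, start=1):
        arrival = t + travel
        fatigue = (1 + DELTA) ** (j - 1)
        window = windows.get(node)
        if window is not None and arrival >= window[0]:
            # Cases 1 and 2: repair starts once the team has arrived
            # and the UAV has finished detecting the node.
            p = p_acc[node]
            expected = (p * ETA_MINUS + (1 - p) * ETA_PLUS) * tau
            t = max(arrival, window[1]) + expected * fatigue
        else:
            # Known faulty node, or Cases 3 and 4 (no UAV detection used).
            t = arrival + tau * fatigue
        done[node] = t
    return done


windows = detection_windows(uav_route)
completion = repair_times(team_route, windows, p_accurate)
for node, value in completion.items():
    print(node, round(value, 2))
```

The printed values can be compared against the completion times listed in the equations above; the last entry is the makespan of the single team.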

The objective function is formulated as Eq. (6).

$$\begin{aligned} \min \max _{1 \le i \le m_{1}} c_{\pi _{h_{i}}(end)}^{rep} \end{aligned}$$

(6)

s.t. (1)–(5)

$$\begin{aligned}&\bigcup _{i=1}^{m_{1}} \pi _{h_{i}}=F \end{aligned}$$

(7)

$$\begin{aligned}&\pi _{h_{i}} \cap \pi _{h_{i'}}=\varnothing , i \ne i' \end{aligned}$$

(8)

$$\begin{aligned}&\pi _{u_{i}} \cap \pi _{u_{i'}}=\varnothing , i \ne i' \end{aligned}$$

(9)

Here $c_{\pi _{h_{i}}(end)}^{rep}$ denotes the task completion time of $h_{i}$. The objective function minimizes the maximum task completion time over the maintenance teams. Eq. (7) requires the maintenance teams to repair all faulty nodes. Eq. (8) states that the faulty node sets repaired by any two maintenance teams do not intersect. Eq. (9) states that the unknown faulty node sets detected by any two UAVs do not intersect.
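As a minimal sketch, the objective of Eq. (6) and the set constraints of Eqs. (7)–(9) can be expressed in Python as follows. The function names and the list-of-routes representation are assumptions made for illustration; they are not taken from the paper.

```python
from itertools import combinations


def objective(final_completion_times):
    """Eq. (6): the makespan, i.e. the latest completion time over all teams."""
    return max(final_completion_times)


def is_feasible(team_routes, uav_routes, faulty_nodes):
    """Check Eqs. (7)-(9) for a candidate schedule."""
    # Eq. (7): the teams together repair every faulty node.
    covers_all = set().union(*team_routes) == set(faulty_nodes)
    # Eq. (8): no node is repaired by two different teams.
    teams_disjoint = all(set(a).isdisjoint(b)
                         for a, b in combinations(team_routes, 2))
    # Eq. (9): no unknown node is detected by two different UAVs.
    uavs_disjoint = all(set(a).isdisjoint(b)
                        for a, b in combinations(uav_routes, 2))
    return covers_all and teams_disjoint and uavs_disjoint


faulty = ["p1", "p2", "p3", "p4", "p5", "p6"]
teams = [["p1", "p3", "p5"], ["p6", "p4", "p2"]]
uavs = [["p3", "p4"], ["p6"]]
print(is_feasible(teams, uavs, faulty), objective([75.1, 60.4]))
```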

Q-learning algorithm

Q-learning, an efficient reinforcement learning algorithm, excels at learning valuable knowledge from a Markov environment to enhance its performance. In this algorithm, the reward value evaluates the feedback for actions taken in the current state, while the Q-table records the expected long-term returns from specific actions in each state. Table 2 shows a Q-table with three states and four actions. The core idea of this algorithm is to determine the optimal action selection strategy during the search process. The Q-table is updated by evaluating the expected utility of actions in the given state, and the appropriate action is selected by implementing a policy based on the Q-table. The detailed procedure of Q-learning is outlined in Algorithm 1.
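For orientation, the standard tabular Q-learning update, $Q(s,a) \leftarrow Q(s,a) + \alpha \,(r + \gamma \max _{a'} Q(s',a') - Q(s,a))$, can be sketched in Python as below. This is the textbook rule rather than a transcription of Algorithm 1; the table size of three states by four actions mirrors the example of Table 2, and the function name and the numeric values in the usage lines are illustrative assumptions.

```python
def q_update(q_table, state, action, reward, next_state, alpha, gamma):
    """Apply one standard Q-learning update in place and return the new value.

    alpha is the learning rate and gamma the discount factor.
    """
    best_next = max(q_table[next_state])            # greedy estimate of future return
    td_error = reward + gamma * best_next - q_table[state][action]
    q_table[state][action] += alpha * td_error
    return q_table[state][action]


# A Q-table with three states and four actions, initialised to zero.
q_table = [[0.0] * 4 for _ in range(3)]
print(q_update(q_table, state=0, action=1, reward=1.0,
               next_state=2, alpha=0.5, gamma=0.9))
```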

Table 2 A Q-table example.

Proposed algorithm

Initialization

The initialization procedure consists of two steps. The first step determines the initial tasks for each maintenance team and UAV. The second step assigns the remaining tasks to the maintenance teams and UAVs.

1.
First Step:
- Select the $m_{1}$ known faulty nodes $\in F-F^{\prime }$ with the largest processing times and assign one to each maintenance team.
- Select the $m_{2}$ unknown faulty nodes $\in F^{\prime }$ with the largest detection times and assign one to each UAV.
2.
Second Step:
- Select the earliest available rescue unit (either a team or a UAV).
- If the available unit is a maintenance team $h_{i}$, then among the faulty nodes that have not been restored and the unknown faulty nodes that have been detected by a UAV, identify the $k_{0}$ nearest neighbors $\Omega ^{*}$ of team $h_{i}$ and assign the faulty node with the largest processing time in $\Omega ^{*}$ to $h_{i}$;
- If the available unit is a UAV $u_{i}$, then among the unknown faulty nodes that have not been detected, identify the $k_{0}$ nearest neighbors $\Omega ^{\prime }$ of UAV $u_{i}$ and assign the unknown faulty node with the largest detection time in $\Omega ^{\prime }$ to $u_{i}$.
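The selection rule shared by both branches of the second step (restrict to the $k_{0}$ nearest candidates, then take the one with the longest duration) can be sketched as follows. This is a simplified illustration: distances are Euclidean on hypothetical coordinates, whereas the paper works with travel times on the network, and `pick_next` is an assumed helper name.

```python
import math


def pick_next(unit_position, candidates, k0):
    """Choose the next node for the earliest available team or UAV.

    candidates maps node -> (x, y, duration), where duration is the
    processing time (team) or detection time (UAV).
    """
    # Keep only the k0 candidates closest to the unit.
    nearest = sorted(
        candidates,
        key=lambda n: math.dist(unit_position, candidates[n][:2]),
    )[:k0]
    # Among those, prefer the node with the largest duration.
    return max(nearest, key=lambda n: candidates[n][2])


candidates = {"a": (1, 0, 5), "b": (2, 0, 9), "c": (10, 0, 50)}
print(pick_next((0, 0), candidates, k0=2))  # "c" is outside the 2-nearest set
```

A small $k_{0}$ keeps travel short, while a large $k_{0}$ lets long tasks far away be scheduled early.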

Searching operators

The proposed algorithm incorporates two types of action operators: perturbation operators and local search operators. The perturbation operators perturb the current maintenance team and UAV solutions, facilitating the exploration of a wider search area and enhancing exploration capability. The local search operators conduct a deep search of the current maintenance team solution, improving exploitation capability. The detailed descriptions of these operators are as follows:

Perturbation_insert: Randomly select an unknown faulty node from the current UAV solution, insert it into all possible positions, and retain the position yielding the best objective value. The same operation is performed on the current maintenance team solution. These operations are executed ds times, each time selecting a different faulty node.

Perturbation_swap: Randomly select an unknown faulty node from the current UAV solution, swap it with other faulty nodes, and retain the swap yielding the best objective value. This operation is also performed on the current maintenance team solution. These operations are executed ds times, each time selecting a different faulty node.

Localsearch_insert: Sequentially extract each faulty node from the maintenance team sequence with the latest task completion time and insert it into its $k_{1}$-nearest neighbor positions. If a better solution is found, the process is restarted until no further improvement is achieved.

Localsearch_swap: Sequentially extract each faulty node from the maintenance team sequence with the latest task completion time and swap it with its $k_{1}$-nearest neighbor nodes. If a better solution is found, the process is restarted until no further improvement is achieved.
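A minimal sketch of the best-position insertion move that underlies Perturbation_insert is given below. It operates on a single sequence with a caller-supplied objective; in the paper the objective depends jointly on the UAV and team solutions, so the toy objective and the function names here are assumptions for illustration only.

```python
import random


def best_insertion(seq, node, objective):
    """Remove node from seq, try every position, keep the best one."""
    base = [x for x in seq if x != node]
    best_seq, best_val = None, float("inf")
    for pos in range(len(base) + 1):
        candidate = base[:pos] + [node] + base[pos:]
        value = objective(candidate)
        if value < best_val:
            best_seq, best_val = candidate, value
    return best_seq, best_val


def perturbation_insert(seq, objective, ds, rng):
    """Reinsert ds distinct, randomly chosen nodes at their best positions."""
    seq = list(seq)
    for node in rng.sample(seq, min(ds, len(seq))):
        seq, _ = best_insertion(seq, node, objective)
    return seq


# Toy objective (assumption): position-weighted sum of node weights.
weights = {"a": 1, "b": 2, "c": 3}
toy_objective = lambda s: sum((i + 1) * weights[n] for i, n in enumerate(s))

print(best_insertion(["a", "b", "c"], "c", toy_objective))
print(perturbation_insert(["a", "b", "c"], toy_objective, ds=2, rng=random.Random(0)))
```

The swap variants follow the same pattern, with the loop exchanging two positions instead of reinserting one node.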

Local search method

The proposed algorithm is based on a single solution evolution strategy, making the quality of the initial solution crucial for search efficiency and effectiveness. A local search method is implemented to conduct a deep search on the initial solution, as outlined in Algorithm 3. In this method, the search operators Localsearch_insert and Localsearch_swap are executed at least once in a random order, then alternately executed until no further improvement is detected.

Learning-driven algorithm

As previously discussed, the search operators are designed to balance exploration and exploitation. Guiding these operators toward better directions is crucial for obtaining an optimal solution. The Q-learning algorithm, which learns from environmental feedback on its actions, is used to steer the whole search process. According to the properties of the considered problem, the state, reward, and action set are defined as follows.

State: State $=\{0,1\}$. If the current solution is better than the best one found so far, then $s=1$, otherwise $s=0$.

Reward: Based on the literature²⁷, the reward is proportional to performance improvement. Consequently, the performance gap between the incumbent solution and the best one is denoted as the reward, that is $reward =(obj(\pi _{best})-obj(\pi )) / obj(\pi _{best})+0.2$, where $obj(\pi )$ is the objective value of the current solution $\pi$. The definition of reward reflects the degree of improvement of incumbent solution relative to the best solution and 0.2 is an empirical parameter value.

Action: An action comprises a perturbation operator and a local search operator. Consequently, four actions are designed based on the function of operators, i.e. Action $=\{a_{1}, a_{2}, a_{3}, a_{4}\}$, $a_{1}=\{Perturbation\_insert, Localsearch\_insert\}$, $a_{2}=\{Perturbation\_insert, Localsearch\_swap\}$, $a_{3}=\{Perturbation\_swap, Localsearch\_insert\}$, $a_{4}=\{Perturbation\_swap, Localsearch\_swap\}$. For a given state s, we employ the roulette wheel method to determine the action to be taken, which can balance the probability of each action being adopted.
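The state, reward and action-selection definitions above can be sketched in Python as follows. The state and reward follow the stated formulas; how Q-values are turned into roulette-wheel weights is not specified in the text, so shifting them to be strictly positive is an assumption of this sketch, as are the function names.

```python
import random

ACTIONS = [
    ("Perturbation_insert", "Localsearch_insert"),  # a1
    ("Perturbation_insert", "Localsearch_swap"),    # a2
    ("Perturbation_swap", "Localsearch_insert"),    # a3
    ("Perturbation_swap", "Localsearch_swap"),      # a4
]


def state_of(obj_current, obj_best):
    """s = 1 if the current solution beats the best found so far, else 0."""
    return 1 if obj_current < obj_best else 0


def reward_of(obj_current, obj_best, offset=0.2):
    """reward = (obj(best) - obj(current)) / obj(best) + 0.2."""
    return (obj_best - obj_current) / obj_best + offset


def roulette_select(q_row, rng):
    """Pick an action index with probability proportional to shifted Q-values.

    Assumption: weights are Q-values shifted so that all are positive.
    """
    low = min(q_row)
    weights = [q - low + 1e-6 for q in q_row]
    return rng.choices(range(len(q_row)), weights=weights, k=1)[0]


rng = random.Random(42)
index = roulette_select([0.1, 0.4, 0.2, 0.3], rng)
print(ACTIONS[index], state_of(90.0, 100.0), reward_of(90.0, 100.0))
```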

The procedure of the LD algorithm

A high-quality solution is first generated, and the maintenance team task scheduling sequence is refined using a local search method to enhance its performance. The solution then evolves under the learning-driven mechanism until a termination condition is met. During this process, the algorithm retains the better solution, and when the number of iterations reaches a specified value, the incumbent solution is replaced by the best one to continue the evolution. The main procedure of the LD algorithm is detailed in Algorithm 4.

Complexity analysis of LD algorithm

Suppose that there are $n$ faulty nodes, $n'$ unknown faulty nodes, $m_1$ maintenance teams and $m_2$ UAVs. According to the definition of the objective function, the complexity of computing it is $O(n + n')$. In the initialization procedure, the computational complexities of the two steps are $O(m_1(n - n')+m_2n')$ and $O(k_0((n - m_1)^2+(n' - m_2)^2))$, respectively. For each perturbation operator, the computational complexity is $O(ds\times n'\times (n + n'))$. In the Localsearch_insert method, assuming that the insertion operation is performed $c_1$ times, the computational complexity is $O(c_1\times k_1\times (n + n'))$. Similarly, if the swap operation is executed $c_2$ times, the corresponding complexity is $O(c_2\times k_1\times (n + n'))$. For the local search method with $c_3$ iterations, the computational complexity is $O(c_3\times (c_1\times k_1\times (n + n')+c_2\times k_1\times (n + n')))$. In summary, over $c_4$ iterations, the computational complexity of LD is calculated as follows:

$$\begin{aligned}&O(n,n',m_1,m_2,k_1,ds)=O(m_1(n - n')+m_2n')+O(k_0((n - m_1)^2+(n' - m_2)^2))\\&+\,O(c_3\times (c_1\times k_1\times (n + n')+c_2\times k_1\times (n + n')))\\&+\,O(c_4\times (ds\times n'\times (n + n')+c_1\times k_1\times (n + n')/2+c_2\times k_1\times (n + n')/2)). \end{aligned}$$

The space complexity of LD is $O(m_1\times n + m_2\times n')$. Since the multi-start iterated greedy algorithm (MSIG), the evolution strategy (ES) and the improved iterated greedy algorithm (IIG) are single solution-based evolutionary algorithms, their space complexities are also $O(m_1\times n + m_2\times n')$. For the cooperative evolutionary algorithm (CEA), the hybrid enhanced discrete fruit fly optimization algorithm (HEDFOA), and the discrete fruit fly optimization algorithm based on a differential flight method (DFFODF), the space complexities are $N_1\times O(m_2\times n')+N_2\times O(m_1\times n)$, $SN\times NP\times O(m_1\times n + m_2\times n')$ and $PSize\times O(m_1\times n + m_2\times n')$, respectively.

Computational experiments

A test suite of 15 instances ranging from small to large scale is constructed from public statistical data and related research²⁵. The details of each instance are listed in Table 3, where |V| denotes the number of nodes, |E| denotes the number of edges, vol represents the cable voltage, and MRT is the running-time limit used as the termination condition. The current per square millimeter of high-voltage cable is set to 3 A.

Table 3 Details of test instances.

All compared algorithms are implemented in MATLAB R2023b and run on a personal computer with an Intel(R) Core(TM) i7-1165G7 CPU @ 2.80 GHz and 16.0 GB RAM. The computational experiments consist of two parts. The first part presents the parameter calibration. In the second part, the proposed algorithm is compared with other popular algorithms in terms of objective function value and the power supply restoration process under deterministic UAV detection scenarios.

Calibration of LD algorithm

A suitable parameter setting is critical for the efficiency of the algorithm. The LD algorithm involves six parameters with the following candidate levels: $k_0 \in \{4,6,8\}$, $ds \in \{1,2,3\}$, $k_1 \in \{6,9,12\}$, $\alpha \in \{0.1,0.2,0.3\}$, $\gamma \in \{0.1,0.2,0.3\}$, $it \in \{4,6,8\}$, giving a total of $3^{6}=729$ parameter combinations. For each combination, instances #1, #6, and #13 are each run for five replications, and the objective function value $obj=\max _{1 \le i \le m_{1}} c_{\pi _{h_{i}}(end)}^{rep}$ is transformed to $\Delta _{trans}=(obj-min(obj))/(max(obj)-min(obj))$ for each instance, which yields $729\times 3\times 5=10935$ transformed objective values. The average $\Delta _{trans}$ of each parameter at each level is plotted in the main effect plots of Fig. 3, which show how the level of each parameter affects the transformed objective value. Based on these results, multi-factor analysis of variance (ANOVA) is used to analyze the influence of each parameter, and the results are given in Table 4. Since the parameter interactions have no significant impact on the performance of LD, that part of the table is omitted. $k_{1}$ obtains the largest F-ratio, showing that it has the most significant impact on performance. A larger $k_{1}$ means that more neighborhood solutions are investigated, which improves the exploitation capability and leads to better results. The impact of $ds$ ranks second. A larger $ds$ acts as a strong perturbation of the current solution and can move the search into areas with poor performance. $k_0$ ranks third; offering more neighboring faulty nodes for selection during initial solution generation does not directly reduce the task completion time. A larger $it$ reduces the frequency of solution restarts and makes the search prone to falling into a local optimum, whereas a smaller value increases randomness and makes convergence harder, so a medium value of $it$ is appropriate. The levels of $\alpha$ and $\gamma$ can be read directly from the main effect plots. Accordingly, the following values are set: $k_0=4$, $ds=1$, $k_1=12$, $it=6$, $\alpha =0.2$, and $\gamma =0.2$.
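The bookkeeping behind this calibration (the full factorial grid, the min-max transformation, and the per-level averages shown in a main effect plot) can be sketched in Python. The parameter levels are those listed above; the function names and the toy records in the usage lines are assumptions, and the ANOVA step itself is not reproduced.

```python
from itertools import product

LEVELS = {
    "k0": [4, 6, 8], "ds": [1, 2, 3], "k1": [6, 9, 12],
    "alpha": [0.1, 0.2, 0.3], "gamma": [0.1, 0.2, 0.3], "it": [4, 6, 8],
}

# Full factorial design: one dict per parameter combination.
GRID = [dict(zip(LEVELS, combo)) for combo in product(*LEVELS.values())]


def min_max_transform(objs):
    """(obj - min) / (max - min), applied to the results of one instance."""
    lo, hi = min(objs), max(objs)
    if hi == lo:
        return [0.0] * len(objs)
    return [(o - lo) / (hi - lo) for o in objs]


def main_effect(records, param):
    """Average transformed value per level of one parameter.

    records is a list of (settings dict, transformed objective value).
    """
    sums = {}
    for settings, value in records:
        total, count = sums.get(settings[param], (0.0, 0))
        sums[settings[param]] = (total + value, count + 1)
    return {level: total / count for level, (total, count) in sums.items()}


print(len(GRID), len(GRID) * 3 * 5)
print(min_max_transform([10, 20, 30]))
```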

Table 4 ANOVA results.

Table 5 Comparisons between LD and variants.

Table 6 Comparison of other algorithms.

Table 7 The 95% confidence interval of compared algorithms.

Ablation experiment

In this part, we validate the contribution of the initialization method, the local search operators and the Q-learning strategy. Four variants of LD are constructed, namely ranIni, LD1, LD2, and LD3, each of which modifies one component of LD. Specifically, ranIni generates the initial solution randomly, LD1 and LD2 remove the Localsearch_swap and Localsearch_insert operators from LD, respectively, and LD3 selects actions randomly. All variants are run for 30 replications on each instance within the MRT time frame. The median and best metric indices of the results are recorded in Table 5, where the minimum values of these two indices among the variants are marked in boldface. From Table 5, it can be observed that the LD algorithm yields the minimum values of the best-metric index except on No. 3, No. 5, and No. 6, where LD3 is better, and competitive performance in the median-metric index except on No. 1, No. 3, No. 5 and No. 6, where LD3 is better, and No. 4, where LD1 is better. The italic p-values show that, compared with its variants, the LD algorithm has statistically significant advantages on medium- and large-scale instances. ranIni shows the worst result and stability on every instance, indicating the importance of an effective initialization method for this complicated scheduling problem. A good initialization lets the search start from more promising regions of the solution space. Moreover, LD1 outperforms LD2 on all instances, further demonstrating that the insertion operation is more capable of conducting a thorough and in-depth search of the current solution than the swap operation²⁸. LD3 exhibits superior performance on small-scale instances, indicating that randomly selecting actions can be adequate at small scale. However, as the instance scale increases, selecting an appropriate action for the incumbent solution becomes increasingly important. The Q-learning strategy continuously adjusts the search direction during solution evolution, thereby avoiding blind search and accelerating convergence. Meanwhile, the superior performance of the LD algorithm confirms that combining the two operators enhances exploration and exploitation capability by continuously transforming neighbourhood structures.

Comparisons with other algorithms

The comparison experiments are conducted from two aspects: one is to compare LD algorithm with other algorithms according to the objective function value, the other is to analyze the power supply restoration process under deterministic UAV inspection scenarios. The two parts are detailed in the following subsections.

In this part, LD algorithm is compared with seven popular algorithms referring to the objective function value: (1) The cooperative evolutionary algorithm (CEA)²⁵. (2) The hybrid enhanced discrete fruit fly optimization algorithm (HEDFOA)²⁹. (3) The iterated greedy algorithm with restart scheme (IGR)³⁰. (4) The multi-start iterated greedy algorithm (MSIG)³¹. (5) The discrete fruit fly optimization algorithm based on a differential flight method (DFFODF)³². (6) The evolution strategy (ES)³³. (7) The improved iterated greedy algorithm (IIG)³⁴. We run each algorithm 30 replications on each instance, and set MRT as the termination condition.

Table 6 reports the median and best objective values over the 30 replications. In addition, the nonparametric Wilcoxon rank sum test is used to compare the results on each instance, and the p-values are also listed in this table. LD obtains the minimum values of both metrics on all instances, and the p-values indicate that the performance of LD is statistically superior to that of the other algorithms. Specifically, across all instances the objective values of LD are approximately 66% of those of CEA, 85% of those of IGR, DFFODF, and IIG, and 91% of those of HEDFOA, MSIG, and ES, which shows the significant advantage of LD over the compared algorithms. The box plots in Fig. 4 also reflect the performance of the compared algorithms; because of its inferior performance, the results of CEA are not displayed in this figure. From these plots, the following conclusions can be drawn.

(1) The performance ranking of the algorithms is stable across instances, and LD achieves the best results. The reasons why LD obtains satisfactory results are as follows. An effective initialization method guarantees a good starting point. Since our initialization is also applied to the compared algorithms except CEA, the search mechanism plays an important role in ensuring solution quality. Specifically, the perturbation operator lets the search proceed over a wide solution space, which improves the exploration capability. The local search operator conducts a deep search around the current solution, thereby enhancing the exploitation capability. Furthermore, the learning-driven mechanism learns from the performance of the current action and guides the search direction towards the optimal solution.

(2) CEA yields the worst result on each instance. This algorithm randomly generates initial solutions and uses migration and mutation operators to evolve them. Consequently, when solutions are trapped in poor areas, local search operators are not effective and it may take a long time to move solutions out of these areas. This search strategy ensures the diversity of solutions but leads to low exploitation for this NP-hard problem.

(3) The overall performance of HEDFOA, IGR, MSIG, DFFODF, ES, and IIG is superior to that of CEA, indicating the effectiveness of our proposed initialization method. The performance gap between HEDFOA and LD decreases as the instance scale increases, demonstrating the stable and robust search ability of HEDFOA. The reason is that HEDFOA implements a thorough local search strategy to improve the current solution, and information sharing between different swarms also benefits performance. Note that these algorithms include efficient local search strategies based on task movement; these strategies concentrate the search on promising areas and ensure the stability and effectiveness of evolution.
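To make the statistical comparison concrete, the following is a minimal sketch of a two-sided Wilcoxon rank sum test using the normal approximation with tie-averaged ranks. It is an illustration only, not the authors' code: the function name `rank_sum_test` and the sample values are assumptions, and the numbers are synthetic rather than taken from Table 6.

```python
import math

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank sum test (normal approximation).

    Ties receive average ranks and the variance is corrected for ties.
    No continuity correction is applied. Returns (z, p_value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    # Pool both samples, remembering which sample each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        t = j - i                      # size of this group of tied values
        avg_rank = (i + 1 + j) / 2.0   # average of ranks i+1 .. j
        rank_sum_x += avg_rank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        tie_term += t ** 3 - t
        i = j
    mean = n1 * (n + 1) / 2.0
    var = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var == 0.0:                     # all pooled values identical
        return 0.0, 1.0
    z = (rank_sum_x - mean) / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_value

# Synthetic objective values for two hypothetical algorithms (lower is better).
algo_a = [101.2, 99.8, 100.5, 98.7, 100.1, 99.3]
algo_b = [104.9, 103.2, 105.6, 102.8, 104.1, 103.7]
z, p = rank_sum_test(algo_a, algo_b)
print(f"z = {z:.3f}, p = {p:.4f}")
```

A negative `z` means the first sample tends to have smaller values. For very small samples an exact test is preferable to the normal approximation used here.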

This part investigates the power supply restoration process of the solutions obtained by the compared algorithms under a deterministic UAV detection scenario. Specifically, for a solution of an instance, the UAV detection results for the unknown faulty nodes are determined according to the probability distribution. As the faulty nodes are repaired, whenever a new restored power path is formed from a power source to an identified terminal, its transmission capacity is added to the restored power. For single-phase circuits, the transmission capacity is calculated as $\mathrm{kVA}=(\mathrm{voltage}\times \mathrm{current})/1000$, while for three-phase circuits, $\mathrm{kVA}=\sqrt{3}\times (\mathrm{voltage}\times \mathrm{current})/1000$, with voltage in volts and current in amperes. The power supply restoration process can then be calculated over the specified time period. We run each algorithm for 30 replications on each instance, obtain the simulation result of each solution, and record the average values. The results of all instances share common characteristics, so, because of limited space, Fig. 5 shows the power supply restoration process of instance No. 15 only. The 95% confidence intervals of the restored power supply obtained by the compared algorithms at different times are given in Table 7. The results have the following characteristics.

(1) CEA outperforms the other algorithms in the early stage of power supply restoration, especially during the period from 10:00 to 21:00, but its performance gradually declines to the worst among all compared algorithms in the middle and later stages.

(2) LD demonstrates excellent performance in the mid to late stages. Specifically, the restored power supply of LD in those stages is approximately 120–130% of that of CEA and 105–110% of that of the other algorithms. The algorithms other than CEA show a power restoration trend similar to that of LD, indicating that they can effectively control resource allocation and scheduling throughout the entire power supply restoration process.

(3) LD and most of the compared algorithms adopt greedy-search-based operators proposed for flowshop scheduling problems, and these operators exhibit good adaptability in solving this complicated scheduling problem.
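The capacity formulas and the accumulation of restored power described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names, the `(time, kVA)` event representation, and the example values are hypothetical and not part of the authors' simulation.

```python
import math

def apparent_power_kva(voltage_v, current_a, three_phase=False):
    """Transmission capacity in kVA from voltage (V) and current (A)."""
    kva = voltage_v * current_a / 1000.0
    return math.sqrt(3.0) * kva if three_phase else kva

def restored_power_curve(events):
    """Cumulative restored capacity over time.

    `events` is a list of (time_h, kva) pairs, an assumed record of when a
    new source-to-terminal path is formed and how much capacity it adds.
    Returns a list of (time_h, cumulative_kva) sorted by time.
    """
    curve, total = [], 0.0
    for time_h, kva in sorted(events):
        total += kva
        curve.append((time_h, total))
    return curve

# Illustrative values only: one single-phase and one three-phase path.
events = [
    (6.0, apparent_power_kva(10_000, 40, three_phase=True)),
    (2.5, apparent_power_kva(230, 100)),
]
for time_h, total_kva in restored_power_curve(events):
    print(f"t = {time_h:5.1f} h, restored = {total_kva:8.1f} kVA")
```

Averaging such curves over the replications of an algorithm gives the kind of restoration trajectory that is compared between algorithms.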

Conclusion

This paper investigates the collaborative scheduling of maintenance teams and UAVs for power network restoration and proposes a learning-driven algorithm to develop feasible scheduling schemes within limited timeframes. The initialization method starts the search from promising areas and improves the potential to obtain better solutions. The Q-learning mechanism guides the search direction and enhances search efficiency for both the rescue team and the UAV scheduling sequences, where perturbation operators provide the algorithm with broader exploration capability while local search operators enhance convergence speed. The proposed algorithm can be applied to other human-machine collaboration scenarios by designing a reasonable task scheduling coupling method and solution evolution mechanism. For example, in fire rescue operations, UAVs detect fire sources and rescuers carry out tasks based on the UAVs' feedback. Similarly, in pipeline maintenance scenarios, intelligent robots are used to identify leakage points, and efficient cooperation with the robots can largely reduce the workload of staff. Our proposed algorithm provides a reference for solving similar human-machine collaboration scheduling problems, both in designing the solution encoding and decoding method and in designing an evolutionary strategy that balances exploration and exploitation.
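As a generic illustration of how a Q-learning mechanism can choose between search operators, the sketch below uses a tabular Q-function with epsilon-greedy selection. It is not the authors' implementation: the state names, operator names, reward, and parameter values are all assumptions made for the example.

```python
import random

class OperatorSelector:
    """Tabular Q-learning over (state, operator) pairs, epsilon-greedy choice."""

    def __init__(self, states, operators, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
        self.operators = list(operators)
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor
        self.epsilon = epsilon  # probability of picking a random operator
        self.rng = random.Random(seed)
        self.q = {(s, a): 0.0 for s in states for a in self.operators}

    def choose(self, state):
        """Explore with probability epsilon, otherwise pick the best-valued operator."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.operators)
        return max(self.operators, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        """Standard Q-learning update towards reward + gamma * max_a Q(next_state, a)."""
        best_next = max(self.q[(next_state, a)] for a in self.operators)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])

# Hypothetical states and operators for a search loop.
selector = OperatorSelector(
    states=["improved", "stalled"],
    operators=["perturbation", "local_search"],
)
operator = selector.choose("stalled")
# Assumed reward: 1.0 if applying the operator improved the objective, else 0.0.
selector.update("stalled", operator, reward=1.0, next_state="improved")
```

In a search loop, the reward would typically be derived from the change in objective value after applying the chosen operator, so that operators that recently helped are selected more often.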

The limitations and future work can be summarized as follows. Although the human-UAV collaboration scenario was considered, the mathematical models are idealised. Multiple objectives, such as the restoration time of major faulty nodes and rescue costs, can be incorporated into the model. Developing an efficient multi-objective optimization algorithm for this complex issue is a future goal. Additionally, actual rescue scenarios involve numerous uncertainties, such as changes in the power network structure and adjustments in rescue personnel arrangements. These uncertainties necessitate dynamic scheduling strategies based on the current situation.

Regarding the proposed algorithm, the movement operations of faulty nodes in the local search operators rely on geographic distance rather than network topology. Future studies will focus on integrating dynamic network topology into the design of search operators. This integration will enhance the algorithm’s ability to adapt to real-time changes and improve its overall performance in dynamic environments.

Data availability

The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

References

1. Rocco, C. M., Nock, D. & Barker, K. A fairness-based approach to network restoration. IEEE Trans. Syst. Man Cybernetics-Syst. 53, 3890–3894 (2023).
2. Zheng, Y.-J., Zhang, Z.-Y., Yan, J.-Y. & Sheng, W.-G. Cooperative UAV scheduling for power grid deicing using fuzzy learning and evolutionary optimization. IEEE Open J. Indus. Appl. 6, 15–33 (2025).
3. Bosisio, A. et al. A tabu-search-based algorithm for distribution network restoration to improve reliability and resiliency. J. Modern Power Syst. Clean Energy 11, 302–311 (2023).
4. Wang, Z., Liao, W., Xia, X., Wang, Z. & Duan, Y. Routing and scheduling in time-sensitive networking by evolutionary algorithms. Biomimetics 10, 333 (2025).
5. Castillon, L. F. & Bedriñana, M. F. Transmission network reconfiguration in restoration process based on constructive heuristic algorithms. J. Control, Autom. Electr. Syst. 33, 929–938 (2022).
6. Zhang, W., Han, Q., Dong, H., Wen, J. & Xu, C. Resilience-based post-earthquake restoration scheduling for urban interdependent transportation-electric power network. Structure and Infrastructure Engineering (2024).
7. Volkova, A., Ghasemi, A. & de Meer, H. Towards forming optimal communication network for effective power system restoration. IEEE Trans. Netw. Serv. Manage. 21, 5250–5259 (2024).
8. Cheng, X. et al. Design and realization for fault restoration system based on the genetic algorithm. In 2016 IEEE International Conference on Information and Automation (ICIA), 905–909 (IEEE, 2016).
9. Molaali, M. & Abedi, M. A new heuristic method for distribution network restoration and load elimination using genetic algorithm. In 2018 Electrical Power Distribution Conference (EPDC), 46–51 (IEEE, 2018).
10. Guamán, A. & Valenzuela, A. Distribution network reconfiguration applied to multiple faulty branches based on spanning tree and genetic algorithms. Energies 14, 6699 (2021).
11. Zhang, Z., Ji, T. & Wei, H.-H. Dynamic emergency inspection routing and restoration scheduling to enhance the post-earthquake resilience of a highway-bridge network. Reliab. Eng. Syst. Saf. 220, 108282 (2022).
12. Liang, H. An improved optimization algorithm for network skeleton reconfiguration after power system blackout. Tehnički vjesnik 22, 1359–1363 (2015).
13. Kayal, P. & Basumatary, R. G. Fostering restoration and power delivery efficiency by distributed generators and line switches in electricity distribution network. Energy Sourc. Part A-Recov. Utiliz. Environ. Effects 45, 5864–5884 (2023).
14. ElDesouky, A. A., Reyad, E. M. & Mahmoud, G. A. Implementation of Boolean PSO for service restoration using distribution network reconfiguration simultaneously with distributed energy resources and capacitor banks. Int. J. Renew. Energy Res. 10, 354–365 (2020).
15. Ayalew, M., Khan, B. & Alaas, Z. M. Optimal service restoration scheme for radial distribution network using teaching learning based optimization. Energies 15, 2505 (2022).
16. Augugliaro, A., Dusonchet, L. & Sanseverino, E. R. Multiobjective service restoration in distribution networks using an evolutionary approach and fuzzy sets. Int. J. Electr. Power Energy Syst. 22, 103–110 (2000).
17. Huang, C. M. Multiobjective service restoration of distribution systems using fuzzy cause-effect networks. IEEE Trans. Power Syst. 18, 867–874 (2003).
18. Sanches, D. S., Junior, J. B. A. L. & Delbem, A. C. B. Multi-objective evolutionary algorithm for single and multiple fault service restoration in large-scale distribution systems. Electric Power Syst. Res. 110, 144–153 (2014).
19. Wang, S. & Chiang, H.-D. Multi-objective service restoration of distribution systems using user-centered methodology. Int. J. Electr. Power Energy Syst. 80, 140–149 (2016).
20. Carrano, E. G., da Silva, G. P., Cardoso, E. P. & Takahashi, R. H. Subpermutation-based evolutionary multiobjective algorithm for load restoration in power distribution networks. IEEE Trans. Evol. Comput. 20, 546–562 (2015).
21. Wang, T. et al. Multi-objective optimization of unit restoration during network reconstruction based on DE-EDA. In 2020 IEEE 3rd International Conference on Electronics and Communication Engineering (ICECE), 102–106 (IEEE, 2020).
22. Zhou, Z. et al. Energy-efficient industrial internet of UAVs for power line inspection in smart grid. IEEE Trans. Industr. Inf. 14, 2705–2714 (2018).
23. Lim, G. J., Kim, S., Cho, J., Gong, Y. & Khodaei, A. Multi-UAV pre-positioning and routing for power network damage assessment. IEEE Trans. Smart Grid 9, 3643–3651 (2016).
24. Hoang, V. T. et al. System architecture for real-time surface inspection using multiple UAVs. IEEE Syst. J. 14, 2925–2936 (2019).
25. Zheng, Y. J. et al. Evolutionary human-UAV cooperation for transmission network restoration. IEEE Trans. Industr. Inf. 17, 1648–1657 (2020).
26. Fu, J., Nunez, A. & De Schutter, B. Real-time UAV routing strategy for monitoring and inspection for postdisaster restoration of distribution networks. IEEE Trans. Industr. Inf. 18, 2582–2592 (2021).
27. Karimi-Mamaghan, M., Mohammadi, M., Pasdeloup, B. & Meyer, P. Learning to select operators in meta-heuristics: An integration of Q-learning into the iterated greedy algorithm for the permutation flowshop scheduling problem. Eur. J. Oper. Res. 304, 1296–1330 (2023).
28. Shao, Z., Shao, W., Chen, J. & Pi, D. A feedback learning-based selection hyper-heuristic for distributed heterogeneous hybrid blocking flow-shop scheduling problem with flexible assembly and setup time. Eng. Appl. Artif. Intell. 131, 107818 (2024).
29. Shao, Z., Pi, D. & Shao, W. Hybrid enhanced discrete fruit fly optimization algorithm for scheduling blocking flow-shop in distributed environment. Expert Syst. Appl. 145, 113147 (2020).
30. Huang, J., Pan, Q. K. & Gao, L. An effective iterated greedy method for the distributed permutation flowshop scheduling problem with sequence-dependent setup times. Swarm Evol. Comput. 59, 100742 (2020).
31. Mao, J., Pan, Q., Miao, Z. & Gao, L. An effective multi-start iterated greedy algorithm to minimize makespan for the distributed permutation flowshop scheduling problem with preventive maintenance. Expert Syst. Appl. 169, 114495 (2021).
32. Guo, H., Sang, H., Zhang, B., Meng, L. & Liu, L. An effective metaheuristic with a differential flight strategy for the distributed permutation flowshop scheduling problem with sequence-dependent setup times. Knowl.-Based Syst. 242, 108328 (2022).
33. Karabulut, K., Kizilay, D., Tasgetiren, M. F., Gao, L. & Kandiller, L. An evolution strategy approach for the distributed blocking flowshop scheduling problem. Comput. Indus. Eng. 163, 107832 (2022).
34. Li, W. et al. An improved iterated greedy algorithm for distributed robotic flowshop scheduling with order constraints. Comput. Indus. Eng. 164, 107907 (2022).

Acknowledgements

The paper is supported by the National Social Science Fund of China (Grant No. 24BGL226).

Author information

Authors and Affiliations

School of Information Engineering, College of Science & Technology Ningbo University, No. 521 Wenwei Rd. Baisha Road St., Cixi, Zhejiang Province, 325060, China
Tiejun Pan, Xuefeng Zhang, Caiming Zhong & Zhang Wang
College of Business, Zhejiang Wanli University, No.8, Qianhu South Road, Ningbo, Zhejiang Province, 315100, China
Leina Zheng
College of Digital Technology and Engineer, Ningbo University of Finance & Economics, Ningbo, 315175, Zhejiang, China
Ying Xu

Contributions

Tiejun Pan: Conceptualization, Software, Writing-original draft. Leina Zheng: Software, Formal analysis. Ying Xu: Formal analysis, Supervision. Xuefeng Zhang: Supervision, Project administration. Caiming Zhong: Methodology, Validation. Zhang Wang: Validation.

Corresponding author

Correspondence to Ying Xu.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Pan, T., Zheng, L., Xu, Y. et al. A learning-driven algorithm for maintenance team and UAV collaboration in restoring power network. Sci Rep 15, 23359 (2025). https://doi.org/10.1038/s41598-025-06512-w

Received: 25 December 2024
Accepted: 09 June 2025
Published: 02 July 2025
DOI: https://doi.org/10.1038/s41598-025-06512-w