Explainable machine learning model of disorganisation in swarms of drones

Gackowska-Kątek, Marta; Cofta, Piotr

doi:10.1038/s41598-024-73220-2

Open access
Published: 28 September 2024

Scientific Reports volume 14, Article number: 22519 (2024)

Abstract

The main challenges when managing a fleet of unmanned aerial vehicles are to ensure the relative stability of its formation and to minimise disorganisation, specifically when undergoing an intrusion. When planning the mission it is beneficial for the operator to set the parameters of the formation to balance the needs of the mission with the disorganisation that an intruder may cause. The model developed in this research predicts the anticipated disturbance as a function of the parameters of the formation. The effectiveness of six machine learning methods is compared with a previously established baseline, using data obtained from simulations. CatBoost (categorical boosting) delivered the best results, with an $R^2$ (coefficient of determination) value of 83.3%, representing an improvement of 80% over the baseline. The SHAP (Shapley Additive Explanations) method was used to extend the model beyond predictions for particular combinations of parameter values, towards generalised recommendations for the operator of the formation.

Introduction

Maintaining a static formation of a swarm of drones and avoiding collisions are two of the main challenges in the field of unmanned aerial vehicles, particularly if they are engaged in observation or sensing missions or are covering a fixed area. The correct execution of the mission is contingent upon satisfying both objectives: maintaining the formation in position, and avoiding intruders while preventing collisions within the formation. Of these, the primary challenge is the need to deal with intruders and unexpected random events, as described by Kallinikos¹. For unmanned aerial vehicles, this issue is routinely resolved by fitting drones with a collision avoidance algorithm, as noted by Wei et al.².

The appearance of an intruder introduces an undesired disturbance to the formation. Guided by their collision-avoidance algorithms, drones move from their positions to let the intruder pass, and then need to return to their positions while avoiding collisions with other drones in the swarm. As this disturbance can result in additional energy expenditure, delays or reductions in mission readiness, it is essential to plan for it. As highlighted by Atyabi et al.³, as long as the operator of the formation knows the extent of the disorganisation that an anticipated intruder will cause, the parameters of the formation can be altered to balance this against the mission of the formation. As Jiang⁴ observed, the ability to predict the level of disturbance to a given formation is therefore essential.

As an outcome of our research, this paper presents two tools that can aid an operator undertaking this type of planning. The first is a predictive model developed using selected machine learning methods. The model predicts the level of disturbance that the average intruder will impose, expressed as the level of cross-entropy, on the basis of selected parameters of the formation, including the parameters of the collision avoidance algorithm. Our research indicated that the most effective method was based on CatBoost, which delivered an $R^{2}$ of 83.3%. CatBoost was developed by Dorogush et al.⁵ and employs gradient boosting, a powerful ensemble learning technique that builds models sequentially. By combining multiple weak learners, typically decision trees, each of which corrects the errors of the ensemble built so far, it can create a strong predictive model. The algorithm reduces overfitting through techniques such as ordered boosting and oblivious trees, which help it to deliver robust and accurate performance even on smaller datasets. The coefficient of determination, denoted as $R^2$, is a key metric in regression analysis, as it represents the proportion of the variance in the dependent variable that is predictable from the independent variables. Moreover, in predictive modelling, a higher $R^2$ on data not used for training suggests better predictive accuracy, meaning that the model can more reliably predict outcomes from the input variables.
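To make the idea of sequential boosting concrete, here is a minimal sketch of gradient boosting for regression with squared loss. It is written from scratch in Python, uses single-feature decision stumps as the weak learners, and is not CatBoost itself: it omits ordered boosting, oblivious trees and categorical-feature handling. The function names (`fit_stump`, `predict_stump`, `boost`) and the learning-rate and round counts are illustrative assumptions.

```python
from statistics import mean


def fit_stump(xs, residuals):
    """Fit a one-split regression stump to (x, residual) pairs.

    Every midpoint between consecutive distinct sorted x values is tried as a
    threshold, and the split with the smallest squared error is kept.
    Returns (threshold, left_value, right_value).
    """
    pairs = sorted(zip(xs, residuals))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal x values cannot be separated by a threshold
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [r for _, r in pairs[:i]]
        right = [r for _, r in pairs[i:]]
        left_value, right_value = mean(left), mean(right)
        sse = sum((r - left_value) ** 2 for r in left) + sum(
            (r - right_value) ** 2 for r in right
        )
        if best is None or sse < best[0]:
            best = (sse, threshold, left_value, right_value)
    if best is None:  # all x identical: fall back to a constant correction
        m = mean(residuals)
        return (float("inf"), m, m)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def boost(xs, ys, n_rounds=100, learning_rate=0.1):
    """Gradient boosting with squared loss on a single feature.

    Starts from the mean of ys. In each round a stump is fitted to the current
    residuals (proportional to the negative gradient of squared loss) and
    added to the ensemble, scaled by the learning rate (shrinkage).
    """
    base = mean(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x) for p, x in zip(preds, xs)]

    def model(x):
        return base + learning_rate * sum(predict_stump(s, x) for s in stumps)

    return model
```

For example, `boost([1, 2, 3, 4], [1.0, 1.0, 5.0, 5.0])` returns a function whose predictions approach 1.0 for x ≤ 2 and 5.0 for x ≥ 3 as the number of rounds grows. The small learning rate trades more rounds for a smoother, less overfitted ensemble.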

However, the use of machine learning tools can make the model opaque, preventing its interpretation by the operator. That is, although the operator may obtain a prediction of the disturbance arising from a given set of parameters, there is no indication of how increasing or decreasing individual parameters affects the disturbance. The second tool addresses this: the SHAP method is applied to the best model to provide concrete guidelines for altering the parameters.

The novelty of this work lies in the following aspects:

A new model to predict the disturbance caused by an intruder was developed by exploring several methods from the machine learning domain. Our model employs the CatBoost method and delivers an $R^2$ of 83.3%, an improvement of 80% over the baseline drawn from work reported by Gackowska et al.⁶ for the same dataset, obtained through simulations.
The SHAP method is applied to the model, and the outcome is used to formulate guidelines for the operator regarding alterations to the parameters. This approach helps to overcome a limitation of machine learning models, i.e. the lack of an explainable relationship between the values of the parameters and the output of the model.

This paper is organised as follows: Section 2 presents a literature review in the areas of disorganisation, machine learning models, and explainability. Section 3 describes the proposed methodology. Section 4 presents some results and a discussion. Section 5 summarises our work and draws some conclusions from this research.

Literature review

As observed by Wu et al.⁷, disorganisation is related to the state of the system structure. It concerns the occurrence of anomalies, i.e. events or patterns that deviate from a well-defined notion of normal and expected behaviour, as noted by Chandola et al.⁸. Kelso⁹ points out that disorganisation also involves disturbances in coordination, i.e. in the spatial, temporal and functional order. As defined by Chaudhury et al.¹⁰, coordination can be considered a cycle consisting of four phases: definition, conflict resolution, action and adaptation. The effects of interactions such as chaos effects, interference effects or one-time anomalies, as identified on the basis of Luhmann's theory¹¹, can negatively affect the organisation and hence the performance and security of the system.

Initially, the concept of entropy was closely related to thermodynamics, where it was considered a measure of the disorder or randomness in a thermodynamic system, as noted by Zorich¹². Later, with the development of information theory, Shannon's entropy¹³ was introduced as a measure of information, allowing certain global information to be assessed on the basis of a probability distribution. In general terms, cross-entropy quantifies the mismatch between an expected (reference) probability distribution and the actual one.
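For reference, the standard relationship between entropy, Kullback–Leibler divergence and cross-entropy is given below. It uses distributions p and q over the same set of outcomes X, in the same notation as Eq. (4) later in the paper:

$$\begin{aligned} H(p) &= -\sum \limits _{x\in X} p(x)\,\log p(x) \\ D_{KL}(p\,\|\,q) &= \sum \limits _{x\in X} p(x)\,\log \frac{p(x)}{q(x)} \\ H(p,q) &= -\sum \limits _{x\in X} p(x)\,\log q(x) = H(p) + D_{KL}(p\,\|\,q) \end{aligned}$$

Because the Kullback–Leibler divergence is non-negative, $H(p,q) \ge H(p)$, with equality only when $q = p$. The cross-entropy therefore grows as the actual distribution departs from the expected one.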

Work by Cofta et al.¹⁴ has shown that cross-entropy can be seen as a measure of the degree of organisation and disturbance in the formation of a drone swarm. Entropy is widely used as a measure in interdisciplinary research. Pincus¹⁵ indicated that approximate entropy is an appropriate measure for assessing complex systems involving both stochastic and deterministic chaotic processes. As mentioned by Niku¹⁶, entropy is a measure of the degree of organisation and randomness of a system: all natural systems tend towards increasing entropy, but when artificial order is imposed on a system, its entropy decreases. Chuprov¹⁷ used entropy as a measure of order in self-organising industrial economic systems; this research showed that an increase in the order of the system is characterised by a decrease in entropy. Canfora et al.¹⁸ focused on using entropy as a measure of changes, complexity, dependencies and disorganisation in source code.

As indicated by Gackowska et al.⁶, although a drone swarm formation may be on the verge of chaos, it is possible to develop a predictive model that estimates the average disruption caused by an intruder as a function of the formation parameter values. In that work, a multivariate linear regression model using cross-entropy as the measure of disruption predicted cross-entropy values with an accuracy ($R^2$) of approximately 54%.

Machine learning models are widely used for both regression and classification tasks. Their versatility allows them to be applied in many fields, including medicine, as noted by Kavitha et al.¹⁹, the natural environment, as studied by Mosavi et al.²⁰, mechanics, as described by Liu²¹, and robotics, as highlighted by Kim et al.²². In a literature review, Choi et al.⁸ noted that machine learning algorithms are also used for UAVs, mainly for collision avoidance, route planning, object recognition, parameter tuning and adaptive control. These authors also reported that it is important to take simulation data or real-time data into account when developing algorithms. The prediction of the formation shape of a swarm of three drones was investigated by Traboulsi²³, who used softmax regression as a machine learning classifier to predict the type of drone formation and the transformation of one formation into another. This research considered formations of drones and parameters such as the average distance between drones, the average speed of drones, and the average centre point between drones. Research carried out by Nebe et al.²⁴ focused on the use of a one-dimensional convolutional neural network (CNN) to detect chaos in a drone swarm formation and to respond to disorganisation. Based on data on the positions and speeds of the drones, drawn from the sensors and the planned trajectory, four states of the system were identified: normal, chaos, uncertain chaos and uncertain norm. A situation was also identified in which the sixth drone in the formation exceeded the period of 85 s needed to return to its initial position, and in this case chaotic behaviour was identified. In their research, this threshold was not universal but specific to the observed behaviour of the sixth drone during this particular test. According to the data, the sixth drone exhibited irregular responses to changes in the base direction at around 90 s and 75 s. Based on this behaviour, the chaos detection algorithm determined that the system could not be normal at 85 s, a finding corroborated by the fact that the sixth drone ultimately crashed during the flight test, suggesting that the observed chaotic behaviour was indeed critical. Bartak et al.²⁵ used supervised machine learning techniques based on sequences of data from sensors and the corresponding control signals to identify the activities of a single drone.

Researchers are increasingly taking advantage of the explainability of artificial intelligence models, i.e. the degree to which a human can understand and explain the internal mechanics of an artificial intelligence or machine learning system, as pointed out by Choras et al.²⁶.

Explainability and its importance were reviewed by Samek et al.²⁷, who indicate that it meets various requirements, such as the ability to assess trust, safety and security, and the ability to complement human expertise in the decision-making process.

As rightly noted by Roscher et al.²⁸, relying on an algorithmic interpretation of a model for individual data will not provide an explanation that allows the user to understand the model's decisions. As highlighted by von Rueden et al.²⁹, it is necessary to incorporate domain knowledge and integrate it with the artificial intelligence model. Zhou et al.³⁰ report that a popular library for achieving a posteriori explainability is SHAP (SHapley Additive exPlanations), developed by Lundberg et al.³¹. Hu et al.³² note that the values obtained using SHAP are better than other measures in the presence of outliers or feature dependencies.

Methodology

The primary goal of this research was to analyse and develop machine learning models that will aid decisions on the optimal settings of the parameters for a drone formation. We consider a stationary grid formation of a swarm of multi-rotor drones that move according to the implemented collision avoidance algorithm. The parameters relate both to the proprietary anti-collision algorithm ($R_1$, $R_2$, $\tau$, q) and to the formation itself (size, spacing), and details of these parameters are discussed in section 4.1.

The model was expected to predict the increase in entropy of a formation of drones caused by the passage of an intruder. Prior research by Gackowska et al.⁶ set the benchmark for predictability at $R^{2}$ = 54%. In addition, to support decision-making, a high level of explainability of the model was sought, following Samek et al.²⁷.

We first perform a preliminary analysis of the parameters in the dataset, and then develop machine learning models to predict cross-entropy values. We evaluate these using commonly accepted metrics. Finally, based on the developed artificial intelligence models, their explainability, and existing domain knowledge, we provide a number of guidelines for swarm operators and managers.

The inclusion of reasoning about explainability was driven by the observation that, in the context of decision-making for unmanned systems, it is important not only to obtain the best possible prediction results, but also to understand, as far as possible, the parameters on which the model bases its predictions. The rationale is that the model alone provides point predictions, i.e. it gives the expected entropy for a given set of parameters, but it neither guarantees the robustness of the formation nor helps the operator to determine which parameter should be moved in which direction. This is particularly true in situations where the model contains local extrema that may discourage the operator from exploring beyond them.

Our selection of models was limited to machine learning models, mostly from the decision tree family, as it was expected that decision trees would capture the near-chaotic relationship between the parameters of the formation and cross-entropy well. Six models were chosen: Decision Tree, XGBoost (eXtreme Gradient Boosting), CatBoost (Categorical Boosting), Random Forest, SVR (Support Vector Regression) and K-Neighbours Regressor. This selection intentionally included models that are known for their explainability, and excluded the family of artificial neural networks, where explainability is less assured. We note that in our prior research⁶, linear regression models were used.

The evaluation used four metrics: $R^2$ (coefficient of determination), MAE (mean absolute error), MSE (mean squared error), and RMSE (root mean squared error). This allowed for a comparison with prior research and identification of the best model. The best results were achieved by CatBoost (categorical boosting), with an $R^2$ value reaching 83%.

To assess explainability, the SHAP (SHapley Additive exPlanations) method was used to establish the importance and impact of the various parameters and their combinations. The outcome was then re-interpreted in light of domain knowledge to draw up specific recommendations for the operator.

Results and discussion

Dataset and data points

For the purpose of this research, a dataset developed by the authors of this work was used, which is available from the repository³³. The methodology used for data creation and preparation, and a detailed description of the dataset, are given in⁶. In this section, we focus only on the key aspects and characteristics of the data that are necessary to understand our work. The dataset was created for a scenario in which an intruder passes through a static formation of a drone swarm whose aim is to maintain the allocated positions of the drones. The dataset consists of 3720 unique data samples regarding a swarm of unmanned aerial vehicles, where each sample represents the averaged results of about 250,000 flights of the intruder at various angles and starting points. The parameters of the formation were selected at random from within specific ranges, while observing the physical limitations of the modelled situation. Distances are expressed in metres, and the speed of the drones was 4 m/s.

Each individual data sample in the dataset consisted of the following data:

(a)
Input: parameters for the swarm and the anti-collision algorithm used: size, spacing, $R_1$, $R_2$, $\tau$, q

These are independent variables that are assumed to be set by the operator of the swarm. The size parameter represents the number of drones along the height or width of the swarm (the formation is always square), spacing is the distance between drones in metres, $R_1$ is the radius of the external safety zone, and $R_2$ is the radius of the internal safety zone, both in metres. The parameters $\tau$ and q control the agility of the collision avoidance algorithm: $\tau$ is the linear component and q is the non-linear component of the response to the proximity of other drones.
(b)
Derived variables: $D_{1}$, $D_{2}$, $D_{3}$

These variables were introduced based on the authors' domain knowledge, as it was expected that they would better capture the physical dependencies present in the system, specifically certain nonlinearities observed previously⁶. They are calculated as shown in Eqs. (1)–(3):
$$\begin{aligned} D_1= & spacing - R_2 \end{aligned}$$
(1)
$$\begin{aligned} D_2= & R_2 - R_1 \end{aligned}$$
(2)
$$\begin{aligned} D_3= & \tau ^q \end{aligned}$$
(3)
(c)
Results: estimation of the probability of: collision, oscillation, vibration, passage

Each flight may end up in one of four situations, as listed above. For a single data point, the results take the form of probability estimates. Collisions, oscillations, and vibrations are behaviours that negatively impact a swarm of unmanned aerial vehicles and represent situations where the calculation of cross-entropy is meaningless; it is only for a passage scenario that the cross-entropy can be meaningfully calculated.
(d)
Results: cross-entropy

For the purposes of this work, entropy is considered as a reference measure that can be used to assess the state of the drone swarm formation and the level of disorganisation of this system. The cross-entropy of a data point is calculated as the average cross-entropy of those simulations that ended with the passage of the intruder. Equation (4) is used to calculate the cross-entropy of an individual simulation, where p and q denote probability distributions (not the algorithm parameter q). If there was no passage for a given set of parameters, the cross-entropy is reported as zero and the data point is excluded from further analysis. It is assumed that the greater the cross-entropy, the greater the level of disorganisation of the drone swarm.
$$\begin{aligned} H(p,q) = - \sum \limits _{x\in X} p(x)\,\log q(x) \end{aligned}$$
(4)
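Eq. (4) and the averaging rule described above can be sketched in Python as follows. How p and q are constructed from drone positions in the simulator is described in⁶ and is not reproduced here. The natural logarithm is assumed (the base is not stated), and the `(outcome, p, q)` record layout and function names are hypothetical. The magnitudes in Table 1 depend on how p and q are defined in⁶, so the toy distributions used here will not reproduce them.

```python
import math


def cross_entropy(p, q):
    """Cross-entropy H(p, q) = -sum_x p(x) * log(q(x)), as in Eq. (4).

    p and q are sequences of probabilities over the same outcomes x; the
    natural logarithm is assumed. Terms with p(x) == 0 contribute nothing,
    and q(x) == 0 where p(x) > 0 makes the cross-entropy infinite.
    """
    if len(p) != len(q):
        raise ValueError("p and q must cover the same outcomes")
    total = 0.0
    for px, qx in zip(p, q):
        if px == 0:
            continue
        if qx == 0:
            return math.inf
        total -= px * math.log(qx)
    return total


def data_point_cross_entropy(simulations):
    """Average cross-entropy over the simulations that ended in a passage.

    `simulations` is a list of (outcome, p, q) tuples, where outcome is one of
    "collision", "oscillation", "vibration" or "passage" (hypothetical layout).
    Returns 0.0 if no simulation ended in a passage, mirroring the convention
    of reporting zero for such data points, which are then excluded.
    """
    values = [cross_entropy(p, q) for outcome, p, q in simulations if outcome == "passage"]
    return sum(values) / len(values) if values else 0.0
```

Non-passage outcomes are skipped rather than scored, because, as noted above, cross-entropy is only meaningful for a passage.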

Preliminary data analysis

Table 1 shows a statistical summary of the cross-entropy values. The mean is 22191.1, with a standard deviation of 28530.7. The minimum observed value is 1321.85, while the maximum reaches 527232. A quartile analysis indicates that 25% of the data lies below 7015.48, 50% below 13226, and 75% below 26825.3.

Table 1 Statistical summary of cross-entropy values in the dataset.

Figure 1 shows the frequency distribution of the cross-entropy values in the dataset. The distribution is strongly right-skewed: most values are concentrated at the lower end, with a long tail extending towards large values.

The aim of the preliminary data analysis was to find apparent relationships in the data that could help in determining the best model. The correlation between the cross-entropy and the other elements of each data point was analysed, and the results are shown in Fig. 2. Note that independent and dependent variables, as well as the probabilities, were considered. A positive correlation coefficient indicates that two variables tend to increase together, while a negative one indicates that one tends to decrease as the other increases.

The variables that are positively correlated with the cross-entropy values are $R_1$ and the probability of oscillation. A significant negative correlation can be observed for the probability of a normal passage, which is consistent with domain knowledge. A larger value of $R_1$ has a direct impact on the behaviour of the swarm, causing drones to respond faster to any violation of the space around them. Oscillations and near-oscillations lead to very high cross-entropy, as these are situations in which the swarm is unable to stabilise. Normal passage, on the other hand, implies that the swarm is able to contain the disturbance.
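The correlation analysis above can be sketched in Python as follows. The paper does not state which correlation coefficient was used, so the Pearson coefficient is an assumption here. The column names and values in the test example are made up for illustration and do not come from the dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least 2 items")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlations_with_target(columns, target):
    """Correlate every column with the target (e.g. cross-entropy).

    `columns` maps a column name to its values. Returns (name, r) pairs
    sorted by |r|, strongest first, so both strong positive and strong
    negative relationships appear at the top.
    """
    scores = {name: pearson(values, target) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)
```

Sorting by absolute value mirrors how the text treats a strong negative correlation (probability of passage) as just as informative as a strong positive one ($R_1$).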

Machine learning models

A preliminary analysis did not provide any definitive clues about the nature of cross-entropy as a function of the parameters of the swarm, apart from a clear understanding of the wide range of the values and some correlations. In view of this, we considered two groups of parameters that can be used as the input to the model:

(A)
Group 1 (basic parameters): $R_1$, $R_2$, $\tau$, q, size, spacing
(B)
Group 2 (basic parameters with derived variables): $R_1$, $R_2$, $\tau$, q, size, spacing, $D_1$, $D_2$, $D_3$
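Building the two input groups from a data sample amounts to applying Eqs. (1)–(3). The following is a minimal sketch: the dictionary keys, the helper names `add_derived` and `build_features`, and the example parameter values are hypothetical, chosen only to illustrate the construction.

```python
GROUP_1 = ["R1", "R2", "tau", "q", "size", "spacing"]
GROUP_2 = GROUP_1 + ["D1", "D2", "D3"]


def add_derived(sample):
    """Return a copy of `sample` extended with D1, D2 and D3 (Eqs. 1-3)."""
    out = dict(sample)
    out["D1"] = sample["spacing"] - sample["R2"]  # Eq. (1)
    out["D2"] = sample["R2"] - sample["R1"]       # Eq. (2)
    out["D3"] = sample["tau"] ** sample["q"]      # Eq. (3)
    return out


def build_features(sample, group):
    """Return the feature vector for Group 1 or Group 2 in a fixed order."""
    full = add_derived(sample)
    return [full[name] for name in group]
```

With the made-up sample `{"R1": 15, "R2": 20, "tau": 2.0, "q": 1.5, "size": 5, "spacing": 30}`, Group 2 adds $D_1 = 10$, $D_2 = 5$ and $D_3 = 2^{1.5} \approx 2.83$ to the six basic parameters.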

Figure 3 shows a flowchart of the steps in this research.

Prior work by Gackowska et al.⁶ provided a baseline: a multivariate linear regression model that predicted the cross-entropy value with an accuracy ($R^2$) of approximately 54%. This research focused on the decision tree family, in the expectation that it would capture the characteristics of the function while supporting explainability. The following machine learning models were developed, implemented and compared: CatBoost (proposed by Dorogush et al.⁵), XGBoost (developed by Chen³⁴), Decision Tree, Random Forest, SVR and K-Neighbors Regressor (as discussed by Pedregosa et al.³⁵).

Table 2 shows the hyperparameters used for the artificial intelligence models. Unless otherwise indicated, the default values of the remaining model hyperparameters were used. These parameters were found to yield the best results.

Table 2 Hyperparameters values for the machine learning models.

The four most frequently used metrics described by Chicco³⁶ were used to evaluate the models: $R^2$ (coefficient of determination)³⁷, MAE (mean absolute error), MSE (mean squared error) and RMSE (root mean squared error). The results are presented in Table 3. As $R^2$ indicates the proportion of the variance in cross-entropy that a model accounts for, it was used as the main criterion for comparing the models, and the other metrics were consulted only when needed. An analysis of the results showed that the remaining metrics followed the same pattern as $R^2$. All results are rounded to four decimal places. For reference, the baseline model, which achieved an $R^2$ value of 0.544, is also included in this table.
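The four metrics follow directly from their standard definitions. Here is a dependency-free sketch in Python; scikit-learn's `r2_score`, `mean_absolute_error` and `mean_squared_error` in `sklearn.metrics` provide equivalent functions. The function name `regression_metrics` is illustrative, and the results are rounded to four decimal places as in Table 3.

```python
import math


def regression_metrics(y_true, y_pred, digits=4):
    """Return R2, MAE, MSE and RMSE for two equally long sequences.

    R2 = 1 - SS_res / SS_tot, where SS_tot is the total sum of squares of
    y_true around its mean. Undefined (ZeroDivisionError) if y_true is constant.
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mean_t = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot
    return {k: round(v, digits) for k, v in
            {"R2": r2, "MAE": mae, "MSE": mse, "RMSE": rmse}.items()}
```

The cross-entropy values in Table 1 range from about 1.3 × 10³ to 5.3 × 10⁵. MAE and RMSE are therefore expressed in cross-entropy units and MSE in squared units, and the squared-error metrics are dominated by the long tail. $R^2$ is dimensionless, which makes it convenient for comparing models.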

Table 3 Evaluation of machine learning models.

In this analysis, we considered whether varying the parameters of the swarm and the anti-collision algorithm would enable us to develop models with a high level of predictability of the cross-entropy values. We also examined whether the introduction of derived variables would improve the effectiveness of the models. The results were compared with those of the base model. All models except Decision Tree improved cross-entropy predictions over the baseline model, and the highest improvements over the base model were achieved by the CatBoost and XGBoost models, both developed on the basis of the second group of parameters. It is worth noting that CatBoost achieved the highest percentage improvement among the models developed on the basis of both the first and the second groups of parameters. The prediction accuracy of the SVR, K-Neighbours Regressor and Random Forest models was similar regardless of the parameter group, with a difference of approximately 2% between the first and second groups, and all of them achieved higher $R^2$ values than the base model. Random Forest was the only model for which a better result was obtained with the first group of parameters (i.e. only the parameters of the swarm and the anti-collision algorithm, without derived variables).

Explainability

One of the objectives of this research was to use explainability to aid in decision-making. Explainability was analysed only for the CatBoost and SVR models. These models were selected to bracket the remaining methods, as CatBoost achieved the highest percentage improvement compared to the base model, while SVR achieved the lowest (apart from Decision Tree). SHAP was used as the explainability method. This approach is built on the mathematical concept of coalitional game theory and combines local interpretations with Shapley values³⁸. The Shapley value for the j-th feature is its contribution to the prediction relative to the average prediction, weighted and summed over all possible combinations of features³⁹. The goal is to determine the contribution of each variable or feature to the model's predictions. Figures 4 and 5 show the SHAP distribution plots for the features of our models. The x-axis shows SHAP values, representing the impact of a parameter on the output of the model, i.e. on cross-entropy. The y-axis lists the individual model parameters, and the colour of each point encodes the value of the parameter: blue indicates low values and red indicates high values. Points with negative SHAP values reduce the predicted cross-entropy, while those with positive values increase it. The low and high values to which the chart assigns colours refer to the range of each parameter in the analysed dataset: according to the statistical summary in Table 4, the lowest value on the chart corresponds to the minimum value of a given parameter, and the highest value to the maximum.
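The Shapley-value definition above can be made concrete with a brute-force computation for a model with only a few features. This sketch uses a simple baseline-replacement value function, one of several possible choices: features outside a coalition are set to fixed baseline values such as dataset means. It enumerates all coalitions, so it is only practical for small numbers of features. Libraries such as SHAP use much faster algorithms, e.g. Tree SHAP for tree ensembles. The toy model in the test is hypothetical and is not the paper's CatBoost model.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of model f at point x relative to `baseline`.

    The value of a coalition S is f evaluated at a point that takes x's
    values for the features in S and baseline values for all others. Each
    feature's marginal contribution is averaged over all coalitions of the
    other features with the classic |S|! (n - |S| - 1)! / n! weights.
    """
    n = len(x)

    def value(coalition):
        point = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(point)

    phi = [0.0] * n
    for j in range(n):
        others = [i for i in range(n) if i != j]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[j] += weight * (value(s | {j}) - value(s))
    return phi
```

For the toy model f(v) = 2·v₀ + v₁·v₂ at x = (1, 2, 3) with a zero baseline, the linear term gives feature 0 a value of 2, and the interaction term is split equally, 3 each, between features 1 and 2. The values sum to f(x) − f(baseline) = 8, which is the efficiency property that makes SHAP values additive.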

Table 4 Statistical summary of parameter values in the dataset.

Figure 4 shows the SHAP distribution plot of the feature parameters for the CatBoost model and the first group of parameters. Based on the above data, it is possible to identify the features that have the greatest impact on this model, which are $R_1$, size, spacing, q, $\tau$, and $R_2$.

Figure 5 shows the result for the CatBoost model developed on the basis of the second group of parameters. Again, the parameters $R_1$ and size have the strongest impact on the model's decision, followed by $D_2$ and $D_1$. The smaller the difference between $R_2$ and $R_1$ (low values of $D_2$), the higher the cross-entropy. The smaller the difference between the spacing and $R_2$ (low values of $D_1$), the higher the cross-entropy.

For the CatBoost model that achieved the highest $R^2$ value (83.3%), the parameters $D_1$ and $D_2$, which had been introduced into the model as additional variables, were selected to better illustrate the relationship, as the analysis showed that they closely reflected the changes in cross-entropy. The results are shown in Figs. 6 and 7, where the x-axis shows the values of $D_2$ and $D_1$, respectively, and the y-axis shows the SHAP value for the given feature. The colour of each data point is determined by the value of a feature, with low values marked in blue and high values in red.

As mentioned above, $D_2$ is a derived variable introduced by the authors, which denotes the difference between the internal safety zone $R_2$ and the external safety zone $R_1$. Figure 6 indicates that low values of $D_2$, in the range 0–5, significantly increase the cross-entropy value; above this range, the effect reverses and $D_2$ contributes to reducing cross-entropy.

Another parameter that had a significant impact on the prediction from the model was $D_1$. Figure 7 shows the relationship between this parameter and its SHAP values. A value of $D_1$ in the range 0–10 increases the cross-entropy, while values above this range contribute to reducing cross-entropy and, with it, the disorganisation of the swarm.
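A threshold such as "$D_2$ in the range 0–5" is read off a dependence plot. It can be approximated numerically by binning (feature value, SHAP value) pairs and finding where the mean SHAP value changes sign. This is a sketch with hypothetical helper names, assuming the SHAP values have already been computed (e.g. with a SHAP explainer). The example pairs in the test are made up; the bin width controls how coarse the estimated threshold is.

```python
def binned_mean_shap(pairs, bin_width):
    """Group (feature_value, shap_value) pairs into bins of `bin_width`,
    starting at the smallest feature value, and return a list of
    (bin_start, mean_shap) tuples sorted by bin_start."""
    lo = min(v for v, _ in pairs)
    bins = {}
    for v, s in pairs:
        k = int((v - lo) // bin_width)
        bins.setdefault(k, []).append(s)
    return [(lo + k * bin_width, sum(vals) / len(vals)) for k, vals in sorted(bins.items())]


def sign_change_threshold(pairs, bin_width):
    """Return the start of the first bin whose mean SHAP value is <= 0 and
    that follows a bin with a positive mean; None if no such change occurs."""
    binned = binned_mean_shap(pairs, bin_width)
    for (_, prev_mean), (start, cur_mean) in zip(binned, binned[1:]):
        if prev_mean > 0 and cur_mean <= 0:
            return start
    return None
```

A positive-to-non-positive change marks the point above which a feature stops increasing the predicted cross-entropy, matching the way the 0–5 and 0–10 ranges are described for $D_2$ and $D_1$.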

Tables 5 and 6 present an interpretation of the results of the SHAP analysis. Table 5 summarises the SHAP values and the impacts on the CatBoost and SVR models based on the first group of parameters, while Table 6 summarises the impact on the CatBoost and SVR models based on the second group of parameters.

Table 5 Summary of SHAP values and impact on CatBoost and SVR models based on the first group of parameters.

Table 6 Summary of SHAP values and impact on CatBoost and SVR models based on the second group of parameters.

Although the order and importance of the parameters in Tables 5 and 6 are very similar, a noticeable difference is that the $R_2$ parameter is more important for SVR than for CatBoost. In addition, $\tau$ and q are more important in the CatBoost model than in SVR. The parameters $\tau$ and $D_3$ are more important for CatBoost than for SVR, in which they have no visible significant impact. The remaining parameters, $R_1$, q, size, spacing, $D_1$ and $D_2$, have similar impacts in both models.

For both of the above models, the following general relationships influencing the prediction of cross-entropy values can be formulated.

1.
High $R_1$ values increase cross-entropy, while low values reduce it.
2.
Low values for the size reduce cross-entropy, while higher values increase it.
3.
High values for the spacing reduce cross-entropy, while low values increase it.

The outcome of this SHAP analysis leads to the following observations, which can aid in decision-making.

(1)
The response of the drones to a disturbance in the formation (via the collision avoidance algorithm) must not be too slow, as this may result in increased disorganisation. More specifically, a sub-linear response can lead to a rapid increase in disorganisation.
(2)
A safety zone $R_1$ of up to 20 (equivalent to a flight time of 5 s) should not have a negative impact on the cross-entropy values; however, when there are more than four drones in the formation, this zone should not exceed 25 (approximately equivalent to a flight time of 6 s).
(3)
The difference between the internal safety zone $R_2$ and the external safety zone $R_1$ should exceed a value of five (approximately equivalent to a flight time of 1 s) to prevent a significant increase in disorganisation.
(4)
It is important to note that a difference between the spacing and $R_2$ ($D_1$ parameter) of below 10 (approximately equivalent to a flight time of 2.5 s) increases the cross-entropy values; hence, care should be taken in regard to not only the safety zones around the drones but also appropriate distances between drones.
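Recommendations (2)–(4) can be encoded as a simple pre-mission check. This is a sketch with hypothetical function names: the thresholds are copied from the list above, and flight times are computed as distance divided by the 4 m/s drone speed used in the dataset. Only the explicit 25 m limit from recommendation (2) is checked. Recommendation (1) is qualitative and is not encoded. The caller supplies the number of drones, because how "more than four drones" maps onto the size parameter is left open here.

```python
DRONE_SPEED = 4.0  # m/s, the drone speed used in the dataset


def flight_time(distance_m):
    """Seconds needed to cover `distance_m` metres at DRONE_SPEED."""
    return distance_m / DRONE_SPEED


def check_formation(R1, R2, spacing, n_drones):
    """Return warnings for settings that recommendations (2)-(4) associate
    with higher cross-entropy. Distances are in metres; `n_drones` is the
    number of drones in the formation."""
    warnings = []
    if n_drones > 4 and R1 > 25:  # recommendation (2)
        warnings.append(
            f"R1 = {R1} m ({flight_time(R1):.2f} s) exceeds 25 m "
            "for a formation of more than four drones"
        )
    d2 = R2 - R1  # Eq. (2), recommendation (3)
    if d2 <= 5:
        warnings.append(f"D2 = {d2} m ({flight_time(d2):.2f} s) does not exceed 5 m")
    d1 = spacing - R2  # Eq. (1), recommendation (4)
    if d1 < 10:
        warnings.append(f"D1 = {d1} m ({flight_time(d1):.2f} s) is below 10 m")
    return warnings
```

For example, `check_formation(R1=30, R2=33, spacing=40, n_drones=16)` returns three warnings, whereas `check_formation(R1=15, R2=22, spacing=40, n_drones=9)` returns an empty list.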

Conclusion

In this research, we developed a series of machine learning models to predict the extent of disorganisation, measured as cross-entropy, of a formation of drones during the passage of an intruder. Six models were developed and analysed using data obtained from a simulator: XGBoost, Random Forest, Decision Tree, CatBoost, SVR and K Neighbours Regressor. Two sets of parameters were used, the second of which incorporated domain knowledge. The performance of these models was assessed using four metrics ($R^2$, MSE, MAE and RMSE) and compared with the multivariate linear regression model described in⁶, which was selected as the baseline. The best result was achieved by the CatBoost model developed using the second set of parameters. This model achieved an $R^2$ value of 83.3%, significantly outperforming the baseline value of 54%.
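For reference, the four metrics can be computed as follows. This is a plain-Python sketch of their standard definitions (the same ones implemented by scikit-learn's `r2_score`, `mean_squared_error` and `mean_absolute_error`), assuming the paper uses these standard forms; the sample values are arbitrary and unrelated to the drone data.

```python
from math import sqrt

def regression_metrics(y_true, y_pred):
    """Compute R^2, MSE, MAE and RMSE using their standard definitions."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)              # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    mse = ss_res / n
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "MSE": mse,
        "MAE": sum(abs(r) for r in residuals) / n,
        "RMSE": sqrt(mse),
    }

metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.5, 2.0, 2.5, 4.0])
print(metrics)
```

Unlike the three error metrics, $R^2$ is scale-free: it compares the model's squared error with that of always predicting the mean, which is why it is convenient for comparing against a baseline.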

In itself, however, a model is an unwieldy tool in the hands of a manager who needs to find appropriate parameters. Explainability addresses non-functional requirements that can determine the usefulness of a model. To analyse explainability, two of the developed models were considered: one based on CatBoost and one based on SVR. The former was selected because it delivered the best performance, and the latter because it delivered the worst. The SHAP method was applied to both, yielding explanations of the extent of entropy that can be both understood and applied by the operator of a formation of drones. It is worth noting that both models delivered almost identical explanations, despite the significant differences in their performance.
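SHAP attributes a prediction to the input parameters using Shapley values from cooperative game theory. The brute-force sketch below computes exact Shapley values for a toy three-parameter function by enumerating every coalition, with parameters outside a coalition set to a single baseline value. It only illustrates the definition: the `shap` library uses much faster algorithms (for example, for tree ensembles such as CatBoost) and typically averages over a background data set, so its numbers need not match this simplified baseline scheme. The function `toy_model` is hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; parameters outside a coalition take baseline values.

    Enumerates all 2^(n-1) coalitions per parameter, so it is only feasible for small n.
    """
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):  # coalition sizes 0 .. n-1
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))  # marginal contribution of i
    return phi

def toy_model(z):
    """Hypothetical stand-in for a trained model: additive terms plus one interaction."""
    return z[0] + 2.0 * z[1] + z[0] * z[2]

phi = shapley_values(toy_model, [1.0, 2.0, 3.0], [0.0, 0.0, 0.0])
print(phi)  # approximately [2.5, 4.0, 1.5]
```

The attributions sum to f(x) minus f(baseline), here 8, which is the additivity property that lets SHAP values be read as contributions; the interaction term z0·z2 is split equally between its two parameters.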

Interestingly, the importance of the parameters obtained from the explainability analysis differed from that obtained from a simple correlation analysis. The correlation analysis indicated that $R_1$, $R_2$ and the size (in this order) were correlated with cross-entropy; although SHAP also listed $R_1$ and the size as the most influential parameters, it ranked $R_2$ as less important. This indicates that the relationships are more complex than simple correlations.
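One way in which a parameter can be strongly correlated with the target yet receive little SHAP importance is redundancy: if the model obtains the same information from another parameter, the redundant one receives little or no attribution. The toy example below uses hypothetical data (not the drone data set) and a linear model, for which the SHAP value of parameter $i$ relative to the data mean is $\phi_i = w_i (x_i - \mathbb{E}[x_i])$ under the assumption of independent features. It is offered only as one possible mechanism, not as the explanation for the behaviour of $R_2$.

```python
from statistics import mean

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = sum((a - mx) ** 2 for a in xs) ** 0.5
    sy = sum((b - my) ** 2 for b in ys) ** 0.5
    return cov / (sx * sy)

def linear_shap(w, column):
    """SHAP values of one feature in a linear model, using the feature mean as reference."""
    m = mean(column)
    return [w * (v - m) for v in column]

# Toy data (hypothetical): x2 is a noisy copy of x1, but the model only uses x1.
x1 = [float(i) for i in range(8)]
x2 = [a + (0.5 if i % 2 == 0 else -0.5) for i, a in enumerate(x1)]
weights = [2.0, 0.0]  # f(x) = 2*x1 + 0*x2
y = [weights[0] * a + weights[1] * b for a, b in zip(x1, x2)]

corr = {"x1": pearson(x1, y), "x2": pearson(x2, y)}
importance = {
    "x1": mean(abs(v) for v in linear_shap(weights[0], x1)),
    "x2": mean(abs(v) for v in linear_shap(weights[1], x2)),
}
print(corr)        # both correlations are high
print(importance)  # x2 receives no attribution
```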

The outcome of this research is knowledge that can aid in the appropriate design and operation of a formation of drones. The key characteristics of the model and its explanations are likely to be applicable to several configurations of drones and intruders. The main limitations of this approach arise from the limitations of the simulator itself, for example its ability to model only two-dimensional formations, or from the choice of the collision avoidance algorithm. Future research will explore two avenues. The first is the development of a better prediction model for the same data set without losing its explainability. There are some indications that artificial neural networks can achieve even better results; however, their explainability involves additional considerations. The second direction involves using the same simulator to model situations in which the intruder behaves differently. Our goal is to create a comprehensive set of models for predicting cross-entropy and disorganisation in a swarm.

Data availability

The dataset generated and analysed during the current study is available in the Kaggle repository: https://www.kaggle.com/datasets/martapbs/drone-swarm-entropy³³

References

Kallinikos, J. The order of technology: Complexity and control in a connected world. Inf. Organ. 15, 185–202. https://doi.org/10.1016/j.infoandorg.2005.02.001 (2005).
Wei, Z. et al. Anti-collision technologies for unmanned aerial vehicles: Recent advances and future trends. IEEE Internet Things J. 9, 7619–7638. https://doi.org/10.1109/jiot.2021.3135578 (2022).
Atyabi, A., MahmoudZadeh, S. & Nefti-Meziani, S. Current advancements on autonomous mission planning and management systems: An AUV and UAV perspective. Annu. Rev. Control 46, 196–215. https://doi.org/10.1016/j.arcontrol.2018.07.002 (2018).
Jiang, Y., Gao, Y., Song, W., Li, Y. & Quan, Q. Bibliometric analysis of UAV swarms. J. Syst. Eng. Electron. 33, 406–425. https://doi.org/10.23919/jsee.2022.000042 (2022).
Dorogush, A. V., Ershov, V. & Gulin, A. CatBoost: Gradient boosting with categorical features support (2017).
Gackowska, M., Cofta, P., Śrutek, M. & Marciniak, B. Multivariate linear regression model based on cross-entropy for estimating disorganisation in drone formations. Sci. Rep. https://doi.org/10.1038/s41598-023-39926-5 (2023).
Wu, K., Zhang, S. & Wu, T. The complex relations between organization and disorganization and their conceptual system. Complexity 2021, 1–8. https://doi.org/10.1155/2021/2907061 (2021).
Chandola, V., Banerjee, A. & Kumar, V. Anomaly detection: A survey. ACM Comput. Surv. 41, 1–58. https://doi.org/10.1145/1541880.1541882 (2009).
Kelso, J. A. S. Coordination Dynamics, 1537–1565 (Springer, New York, NY, 2009).
Chaudhury, A., Deng, P.-S. & Rathnam, S. A computational model of coordination. IEEE Trans. Syst. Man Cybern. Part A Syst. Hum. 26, 132–141. https://doi.org/10.1109/3468.477868 (1996).
Luhmann, N. Risk: A Sociological Theory (AldineTransaction, Somerset, NJ, 1993).
Zorich, V. A. Entropy in thermodynamics and in information theory. Probl. Inf. Transm. 58, 103–110. https://doi.org/10.1134/s0032946022020016 (2022).
Shannon, C. A Mathematical Theory of Communication (1948), 121–134 (The MIT Press, 2021).
Cofta, P., Ledziński, D., Śmigiel, S. & Gackowska, M. Cross-entropy as a metric for the robustness of drone swarms. Entropy 22, 597. https://doi.org/10.3390/e22060597 (2020).
Pincus, S. M. Approximate entropy as a measure of system complexity. Proc. Natl. Acad. Sci. USA 88, 2297–2301. https://doi.org/10.1073/pnas.88.6.2297 (1991).
Niku, S. B. Entropy: Natural Orders, Thermodynamics, Friction, Hybrid Cars, and Energy, 1–21 (Springer International Publishing, Cham, 2016).
Chuprov, S. V. Stability and criteria for self-organization of industrial economic systems. In The European Proceedings of Social and Behavioural Sciences (European Publisher, 2022).
Canfora, G., Cerulo, L., Cimitile, M. & Di Penta, M. How changes affect software entropy: An empirical study. Empir. Softw. Eng. 19, 1–38. https://doi.org/10.1007/s10664-012-9214-z (2014).
Kavitha, M., Gnaneswar, G., Dinesh, R., Sai, Y. R. & Suraj, R. S. Heart disease prediction using hybrid machine learning model. In 2021 6th International Conference on Inventive Computation Technologies (ICICT) (IEEE, 2021).
Mosavi, A., Ozturk, P. & Chau, K.-W. Flood prediction using machine learning models: Literature review. Water 10, 1536. https://doi.org/10.3390/w10111536 (2018).
Liu, X., Athanasiou, C. E., Padture, N. P., Sheldon, B. W. & Gao, H. A machine learning approach to fracture mechanics problems. Acta Mater. 190, 105–112. https://doi.org/10.1016/j.actamat.2020.03.016 (2020).
Kim, D. et al. Review of machine learning methods in soft robotics. PLoS ONE 16, e0246102. https://doi.org/10.1371/journal.pone.0246102 (2021).
Traboulsi, A. & Barbeau, M. Recognition of drone formation intentions using supervised machine learning. In 2019 International Conference on Computational Science and Computational Intelligence (CSCI) (IEEE, 2019).
Nebe, E. et al. Chaos detection and mitigation in swarm of drones using machine learning techniques and chaotic attractors. Int. J. Adv. Comput. Sci. Appl. https://doi.org/10.14569/ijacsa.2022.0130656 (2022).
Barták, R. & Vomlelová, M. Using Machine Learning to Identify Activities of a Flying Drone from Sensor Readings (Proceedings of the Thirtieth International Florida Artificial Intelligence Research Society Conference, 2017).
Choraś, M., Pawlicki, M., Puchalski, D. & Kozik, R. Machine Learning–the Results are not the Only Thing that Matters! What About Security, Explainability and Fairness?, 615–628 (Springer International Publishing, Cham, 2020).
Samek, W., Montavon, G., Lapuschkin, S., Anders, C. J. & Müller, K.-R. Explaining deep neural networks and beyond: A review of methods and applications. Proc. IEEE 109, 247–278. https://doi.org/10.1109/jproc.2021.3060483 (2021).
Roscher, R., Bohn, B., Duarte, M. F. & Garcke, J. Explainable machine learning for scientific insights and discoveries. IEEE Access 8, 42200–42216. https://doi.org/10.1109/access.2020.2976199 (2020).
von Rueden, L. et al. Informed machine learning–a taxonomy and survey of integrating knowledge into learning systems. arXiv [stat.ML] (2019).
Zhou, R. & Hu, T. Evolutionary approaches to explainable machine learning. arXiv [cs.AI] (2023).
Lundberg, S. & Lee, S.-I. A unified approach to interpreting model predictions. arXiv [cs.AI] (2017).
Hu, S., Xiong, C., Chen, P. & Schonfeld, P. Examining nonlinearity in population inflow estimation using big data: An empirical comparison of explainable machine learning models. Transp. Res. Part A Policy Pract. 174, 103743. https://doi.org/10.1016/j.tra.2023.103743 (2023).
Drone swarm cross-entropy. Kaggle dataset. https://www.kaggle.com/datasets/martapbs/drone-swarm-entropy (2023).
Chen, T. & Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (ACM, New York, NY, USA, 2016).
Pedregosa, F. et al. Scikit-learn: Machine learning in Python. J. Mach. Learn. Res. 12, 2825–2830 (2011).
Chicco, D., Warrens, M. J. & Jurman, G. The coefficient of determination R-squared is more informative than SMAPE, MAE, MAPE, MSE and RMSE in regression analysis evaluation. PeerJ Comput. Sci. 7, e623. https://doi.org/10.7717/peerj-cs.623 (2021).
Jawlik, A. Statistics from A to Z: Confusing concepts clarified (Wiley, 2016).
Štrumbelj, E. & Kononenko, I. Explaining prediction models and individual predictions with feature contributions. Knowl. Inf. Syst. 41, 647–665. https://doi.org/10.1007/s10115-013-0679-x (2014).
Lipovetsky, S. & Conklin, M. Analysis of regression in game theory approach. Appl. Stoch. Models Bus. Ind. 17, 319–330 (2001).

Download references

Author information

Authors and Affiliations

Faculty of Telecommunications, Computer Science and Electrical Engineering, Bydgoszcz University of Science and Technology, Al. prof. S. Kaliskiego 7, 85-796, Bydgoszcz, Poland
Marta Gackowska-Kątek & Piotr Cofta


Contributions

Conceptualization: P.C. and M.G.; software and simulation: M.G. and P.C.; methodology: P.C.; investigation: P.C. and M.G.; writing (original draft preparation): M.G.; writing (review and editing): P.C.; project administration: M.G.; validation: P.C. All authors reviewed the manuscript.

Corresponding author

Correspondence to Marta Gackowska-Kątek.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article

Cite this article

Gackowska-Kątek, M., Cofta, P. Explainable machine learning model of disorganisation in swarms of drones. Sci Rep 14, 22519 (2024). https://doi.org/10.1038/s41598-024-73220-2


Received: 11 April 2024
Accepted: 16 September 2024
Published: 28 September 2024
Version of record: 28 September 2024
DOI: https://doi.org/10.1038/s41598-024-73220-2

This article is cited by

Machine learning for modeling intergranular stress corrosion cracking of stainless steels in light water reactors with uncertainty quantification and explainability
- Dayu Fajrul Falaakh
- Chi Bum Bahn
npj Materials Degradation (2025)