Introduction

In the era of industrial civilization, the generation of large amounts of greenhouse gases leading to climate change is a common challenge to the survival and development of all mankind1, and the world is highly concerned about the vision of the twenty-first session of the Conference of the Parties (COP21), which actively called on countries to take action in order to reduce carbon emissions and the emission of other greenhouse gases2. According to the European Union Emissions Database for Global Atmospheric Research (EDGAR), China will rank first in global carbon emissions in 2020. As the largest developing country, China has been actively taking practical actions in global climate governance, and the Chinese government has proposed a dual carbon target of “peak carbon and carbon neutrality” in the United Nations General Assembly in 2020. As a key emerging industry in China, Electric Vehicle (EV) plays an important role in the dual-carbon goal and optimizing the structure of energy supply3.

As shown in Fig. 1, as of May 2024, EVs and charging facilities in all regions of China continue to grow, with the total number of charging piles 3128795, and EV penetration climbing year by year, with the total penetration rate reaching 50.39\(\%\), and charging piles and penetration rate growing by 45.74\(\%\) and 59.46\(\%\), respectively. The climbing of charging facilities and EV penetration rate poses new challenges to the rational allocation of EV charging resources and charging experience, and at the same time, the random loads formed by disorderly access during large-scale EV charging will cause high fluctuation and high randomness of the load of the power grid in the region where it is located, and the safe operation of the distribution network will be affected4. In order to improve the rational use of charging resources and reduce the potential danger to the grid system, and to take advantage of the fast response characteristics of EVs to participate in vehicle-to-grid (V2G) services as a mobile power storage and load resource, the V2G scheduling center needs to predict the charging loads of the EVs within a short period of time, make full and rational use of charging resources of the charging stations, and improve the impact of the EV loads on the impact of grid load5. Therefore, improving the accuracy of EV charging load prediction is of great significance for the vigorous development of EV, a promising green transport mode, the orderly charging of EV and the safe operation of power grids6, the in-depth research and application of V2G technology, and the planning of charging station siting7.

Fig. 1
figure 1

Number of Electric Vehicles and Charging Facilities Distributed by Provinces in China, May 2024. Data sources: Traffic Management Bureau of the Ministry of Public Security of China, China Charging Union.

In contrast to electric load forecasting, EV charging loads are difficult and complex to predict due to the stochastic nature of charging behaviour resulting in the presence of numerous zero sampling points. To solve the above problems, we propose a combined model Crested Porcupine Optimizer-Variational Mode Decomposition-Bidirectional Gate Recurrent Unit (CPO-VMD-BiGRU) for short-term charging load prediction at EV charging stations based on K-Medoids clustering and multifactor optimization decomposition. The main contributions of this study are as follows.

  1. 1.

    In order to lay the foundation for improving the accuracy of EV charging load prediction, the relevant factors affecting EV charging load prediction are proposed, and the feature input matrix consisting of historical temperature, date type, holidays and EV historical charging load data is designed

  2. 2.

    To significantly improve the quality of the to-be-forecast dataset, the K-Medoids clustering algorithm was used to classify the similarity of the original historical charging load dataset. Unlike existing studies this study clusters the original load data and determines the classified dataset to be predicted based on the date to be predicted.

  3. 3.

    In order to effectively overcome the decomposition error caused by human experience in setting some parameters of the neural network. The CPO optimization algorithm, which is significantly different from other meta-heuristic algorithms and has been proposed with a 4-fold defense mechanism, is used to optimize the VMD. adaptive parameter optimization is carried out to set the parameters of the core parameters of the VMD, the modal component K and the penalty factor \(\alpha\). The VMD is then optimized by the CPO optimization algorithm.

  4. 4.

    In order to improve the final EV charging load prediction accuracy, BiGRU bi-directional gated cyclic unit model is proposed, which helps to excavate the relationship data features between the EV charging load forward-backward related influencing factors and the current charging load, and ultimately improves the EV short-term charging load prediction accuracy of charging stations.

  5. 5.

    The advantages and synergies of each part of clustering, optimization, decomposition, and prediction are combined to innovatively propose an optimal combination model that effectively improves the charging load prediction accuracy of EV charging stations. At the same time, the computational cost after combining each part is considered, and simulation verification is carried out by using foreign datasets, and verification is carried out by domestic actual charging station engineering examples. The accuracy and generalization of this research in EV charging load prediction and the advantages of synergy of each unit are fully demonstrated.

The rest of this research is structured as follows: Section “Related work” presents the work related to EV charging loads and load forecasting. Section “Method” provides a comprehensive description of the principles of K-Medoids, CPO, VMD and BiGRU as well as the modelling framework. Section “Example analysis and discussion” describes the data used and the analysis of the specific steps and results of the example simulation. Section “Industrial field validation” presents an engineering example validation to illustrate the validity and generalization of our proposed model. Section “Conclusion” gives the conclusion of the carried out work and future research directions.

Related work

At present, about EV charging load prediction is mainly divided into two categories: model-driven and data-driven, model-driven has achieved richer results8,9,10, and data-driven is still in the continuous exploration stage11. Model-driven many factors of high randomness and different factors of high complexity, while data-driven use of EV and charging station real data support, combined with machine learning can reduce the impact of many random factors, thus improving the authenticity and reliability of the prediction results. Based on traditional machine learning algorithms Random Forest (RF), Support Vector Machine (SVM)12,13, and load forecasting methods based on deep learning Feedforward Neural Network(FNN), Recurrent Neural Network(RNN), the latter deep learning is more adaptive and flexible14, and has received extensive attention from scholars . Among them, RNN has the feature of establishing the connection between local and global, which is more adapted to time series prediction. RNN had been widely used for short-term power load forecasting15, but phenomena such as gradient vanishing would occur when dealing with time series problems leading to training difficulties and unsatisfactory prediction results. Long Short-Term Memory (LSTM) neural network improves the recurrent unit in RNN, thus solving the gradient problem that exists in RNN16. Research scholars proposed the Quantile Regression LSTM (QRLSTM) model for predicting the charging pile load of EV charging stations, and used Adam stochastic gradient descent method to estimate the parameters of the LSTM network, comparing the Back Propagation (BP) and Quasi-RNN (QRNN) models for better prediction and higher reliability17. LSTM based on single-chain structure can only learn forward temporal relationships, Bidirectional Long Short-Term Memory (BiLSTM)18, which is a combination of forward LSTM and backward LSTM, optimizes the single-chain structure of LSTMs, and BiLSTM19 can learn the global information of the historical data more adequately, and takes into account the the before and after time correlation problem. Meanwhile, Siami-Namini S et al. carried out in time series prediction proved, the performance, accuracy of LSTM and its variant BiLSTM and analyzed and compared their behavioral training and proved that BiLSTM prediction accuracy is better than LSTM and captures more additional features related to the data20. The Gate Recurrent Unit (GRU), a variant of LSTM with fewer parameters and faster convergence, is not only a representative special recurrent neural network that can mitigate the gradient explosion, but also explore the necessary long-term information in short-term load forecasting, while the GRU solves the problem of the LSTM’s too slow convergence and can maintain its accuracy21. While BiGRU combines the forward and backward propagation mechanism with the gated loop unit GRU. In each training sequence, 2 GRU models are built in the forward and reverse directions, and the hidden layer nodes of the models are connected to the same output layer. The complete forward-backward information for each time point of the input layer is provided to the output layer, which can be mined for more data information22. And Zou et al. verified that the BiGRU has better results than the GRU in short-term load forecasting is better than GRU, and the combined model forecasting effect is better than the independent model forecasting23, neural network forecasting in different combinations is the current research hotspot24,25.

For EV short-term load forecasting, the non-stability of the original signal will affect its prediction accuracy, Empirical Modal Decomposition (EMD) can decompose the non-stationary original signal to obtain multiple relatively smooth signals with different scales, which can be used to improve the prediction accuracy of the late load26. However, EMD generally suffers from endpoint effects and mode aliasing, etc. VMD can solve this problem, and its modelling complexity is lower than that of EMD, but the setting of the core parameters of VMD is cumbersome and has the influence of human factors, which cannot bring out the best performance of the algorithm and affects the decomposition effect27. Wu Xiaomei et al. proposed an optimization algorithm for adaptive optimization of the core parameters of the VMD, which achieved high decomposition accuracy28. At the same time, the quality of the prediction dataset is the core of the prediction accuracy, and the use of clustering algorithm to improve the similarity of the dataset and thus improve the quality of the prediction dataset can help to improve the accuracy of the final EV charging load prediction. Haoliang Shan et al29 used Density-Based Spatial Clustering of Applications with Noise (DBSCAN) to cluster the densities of signal samples, and clusters are defined as regions of dense sample points with the ability to separate noise ability. This research is to cluster the charging load similar days to further improve the accuracy of the subsequent decomposition prediction, so density clustering is not suitable for this research. Chengwen Yao et al. used K-Means for similar daily load clustering30, but the K-Means algorithm adopts the mean of the samples in the clusters as the center of the clusters, which has the disadvantage of falling into local optima. Tiexing Wang et al.31 and Yuqi Ji et al.32 both proposed K-Medoids clustering, which improves on the shortcomings of K-Means by using the actual samples in each cluster that are most similar to the other samples as the center of the clustering. Table 1 summarizes the characteristics of the above related algorithm papers (the arrangement is independent of the algorithm’s merit). Based on the above problems of existing research, this paper clusters the EV charging station charging loads with similar dates to improve the quality of the dataset to be predicted, and then adopts the method of combining model prediction to propose a short-term charging load prediction at EV charging stations based on K-Medoids clustering and multifactor optimization decomposition.

Table 1 Literature review of related algorithms.

Method

EV short-term charging load prediction model

In this research, the overall architecture of the proposed combined model CPO-VMD-BiGRU based on K-Medoids clustering and multifactor optimization decomposition of short-term charging load prediction at EV charging stations is shown in Fig. 2.

Fig. 2
figure 2

Overall Architecture of EV Load Forecasting Hybrid Model.

The overall forecasting framework is divided into two main modules, one is the dataset selection, processing and decomposition module, such as the A, B, C module used in Fig. 2 to improve the quality of the initial EV charging load data, weaken its non-stationarity and capture its local features. Firstly, the three influencing factors and the historical charging load data of charging stations are selected as the model dataset, the historical load data in the dataset are missing or outliers due to the high stochasticity of charging behaviours, and the outliers are treated using the Median Absolute Deviation (MAD), the MAD method is to use a fixed window to find the MAD, using the median value to replace outliers in the data that are more than three times the MAD33. The missing value null value is due to the EV charging data has a certain periodicity, then the mean value of similar dates before and after is used to fill in, as shown in Eq. (1).

$$\begin{aligned} x_{k} =\left( x_{k-1}+x_{k+1} \right) /2 \end{aligned}$$
(1)

Where \(x_{k}\) is the missing value data to be populated. \(x_{k-1}\) and \(x_{k+1}\) are the EV charging load data at the same moment on similar dates in the previous and following week. The historical temperature data is preprocessed in the same way. Secondly the quality of the dataset directly affects the accuracy of subsequent load forecasting using the dataset, so the K-Medoids algorithm is used to cluster the similar time period sequences, and the clustered post-similar time sequences containing the time period to be forecasted are selected as the subsequent decomposition sequences, and in this way date sequences of lower relevance to the day to be forecasted are disregarded, and load similarity is utilized to construct the similar date dataset. After that VMD is employed to decompose the similar date dataset. The final dataset x after selection, processing and decomposition is obtained. The other is a time series prediction module with engineering example validation for short-term charging load prediction at charging stations, as shown in Fig. 2, D and E. The modal components obtained from the decomposition of the previous module together with the factors related to influencing the charging of electric vehicles constitute the dataset x. Then the bidirectional gated loop unit BiGRU is established respectively, and the forward and backward propagation mechanism is combined with the gated loop unit to mine more data information and obtain the component prediction values, and then the component prediction values are summed up to reconstruct the charging load prediction sequence. Finally the trained combined model is utilized to combine the advantages of each algorithm, further focusing on the prediction accuracy and generalization ability of this combined model while considering the computational cost. An engineering example is validated at a charging station in Northwest China to further confirm the accuracy and generalization of the model, and the flowchart is shown in Fig. 3.

Fig. 3
figure 3

EV load forecasting hybrid model flowchart.

K-Medoids similar time period clustering

Clustering is the process of dividing a dataset into multiple classes or clusters and analyzing them using a similarity metric that allows for greater similarity of data within each class or cluster and greater differences between different classes or clusters. Commonly used clustering algorithms based on center of mass are K-Means and K-Medoids, both of which can accomplish the clustering of similarity days required for this research, the difference lies in how the center of mass is determined, K-Means is the mean of the samples, rather K-Medoids selects the point with the smallest distance sum from the current classification samples to make it more robust to outliers and noise.

When the load curves are similar in magnitude but different in shape, the cosine resemblance can be used to better identify the morphological features of the load curves compared to the Euclidean distance34. Therefore, cosine similarity is selected as the similarity metric for EV load clustering. The cosine similarity between the EV load data series of day i and day j is given by Eq. (2).

$$\begin{aligned} S_{COS} \left( L_{i} , L_{j} \right) = \left( L_{i}L_{j} \right) /\left( \left| L_{i}\right| \left| L_{j} \right| \right) =\left( \sum _{n=1}^{M}l_{in}l_{jn} \right) /\left( \sqrt{\sum _{n=1}^{M} l_{in}^{2} } \sqrt{\sum _{n=1}^{M} l_{jn}^{2} } \right) \end{aligned}$$
(2)

Where \(L_{i}=[l_{i1},l_{i2},\cdots ,l_{iM}]\), \(L_{j}=[l_{j1},l_{j2},\cdots ,l_{jM}]\) are the EV load sequences on days i, j, \(S_{COS}\) takes a value in the range of [0, 1] , and M is the sampling points in the clustering time period of EV load sequences, and the closer it is to 1, which indicates the more similarity in the shape of the EV load curves represented by sequences \(L_{i}\) and \(L_{j}\). For the clustering results for clustering the effectiveness of the test to determine the optimal number of clusters K \(_{M}\) value, this research selects a representative silhouette coefficient31 method, based on the similarity of the clusters of the class clusters within the degree of tightness and separation, is divided into the following three steps.

Step 1 for any data point i in the EV load \(L_{i}\) sample, compute the average cosine similarity \(a_{i}\) from i to other data points in the like cluster in which it is located.

Step 2 for data point i, compute the average cosine similarity \(b_{i}\) from i to all data points in class clusters \(L_{j}\) other than the class cluster \(L_{i}\) in which it is located.

Step 3 Silhouette coefficient is shown by Eq. (3), and in practical use, the average silhouette coefficient \(S_{i}\) of all data points N in the clustering result is used to evaluate the merit of the final clustering, as shown in Eq. (4). The value of \(S_{i}\) takes the range of [-1,1], and the closer it is to 1, it represents a better clustering result and a better clustering effect.

$$\begin{aligned} s_{i}= \left( a_{i} -b_{i} \right) /max\left( a_{i},b_{i} \right) \end{aligned}$$
(3)
$$\begin{aligned} S_{i} =\left( 1/N \right) \sum _{i=1}^{N} s_{i} \end{aligned}$$
(4)

In this research, the similar time period EV load dataset is constructed based on the characteristics of the EV load curve containing the day to be predicted, different from the traditional similar day dataset constructed based on the temperature and other factors, the extraction of the similar time period dataset can be based on the unit of one day or one week, for example, the day to be predicted is composed of the weekly time period data with the load data for the previous six days, and the K-Medoids algorithm is used to the original dataset is clustered into weekly time periods to obtain a similar weekly dataset constructed using the strong periodicity of the EV load curve, and the similar weekly time period dataset construction process is shown in Fig. 4.

Fig. 4
figure 4

Similar weekly time period dataset construction process.

The clustering selection of similar time periods of EV historical load data fully exploits the intrinsic characteristics of similar data, discards data sets with lower similarity, reduces memory occupation and computation time, and improves the quality of prediction data sets, which directly affects the subsequent decomposition and load prediction accuracy.

Optimized variational modal decomposition model

CPO35 is a nature-inspired meta-heuristic optimizer that uses its global search capability to optimize the decomposition parameters of the VMD. Most of the existing meta-heuristic algorithms try to simulate the aggressive behaviour of animals, whereas the biggest difference between the CPO algorithm and other meta-heuristic algorithms is that it is inspired by the defensive behaviour of the Crowned porcupine was proposed, which simulates the four protective defense mechanisms of the Crowned porcupine, sight, sound, scent and attack.

  1. 1.

    Population initialization. Determine the population size and candidate solutions as shown in Eq. (5).

    $$\begin{aligned} \vec {X_{i} }=\vec {L}+\vec {r}\times \left( \vec {U }-\vec {L } \right) \mid i=1,2,\cdots ,N^{'} \end{aligned}$$
    (5)

    where \(\vec {X_{i} }\) is the ith candidate solution in the search space. \(\vec {L}\) is the upper boundary of the search range. \(\vec {U}\) is the lower boundary of the search range. \(N^{'}\) is the number of population-sized individuals, and \(\vec {r}\) is a vector randomly initialized between 0 and 1.

  2. 2.

    Cyclic population reduction techniques.CPO introduces a new strategy of cyclic population reduction technique, whereby only some crown porcupines activate their defense mechanisms when they perceive a threat, a strategy that accelerates convergence while maintaining population diversity. As shown in Eq. (6).

    $$\begin{aligned} N=N_{min}+\left( N^{'} - N_{min}\right) \times \left[ 1-\left( t\%T_{max}/T \right) \right] /\left( T_{max}/T \right) \end{aligned}$$
    (6)

    Where T determines the variable of the number of cycles. \(T_{max}\) the maximum number of times the function seeks a value. t is the current function seeks a value. \(\%\) denotes the residual. \(N_{min}\) is the minimum number of individuals in the newborn population, i.e., the minimum population size \(N_{min}\).

  3. 3.

    Exploration phase. The first and second defense mechanisms (sound and vision) represent the exploratory behaviour of the crown porcupine and appear when the predator is far away from the crown porcupine, as shown in Fig. 5 in areas A and B. The mathematical model is shown in Eq. (7).

    $$\begin{aligned} {\left\{ \begin{array}{ll} \overrightarrow{x_{i}^{t+1} } = \overrightarrow{x_{i}^{t} } +\tau _{1\times } \left| 2\times \tau _{2} \times \overrightarrow{x_{CP}^{t} } -\overrightarrow{y_{i}^{t} }\right| \\ \overrightarrow{x_{i}^{t+1} } =\left( 1-\overrightarrow{U_{1}}\right) \times \overrightarrow{x_{i}^{t} }+\overrightarrow{U_{1}}\times \left( \overrightarrow{y_{i}^{t}}+\tau _{3} \times \left( \overrightarrow{x_{r1}^{t} } -\overrightarrow{x_{r2}^{t} } \right) \right) \\ \overrightarrow{y_{i}^{t} } =\left( \overrightarrow{x_{i}^{t} } +\overrightarrow{x_{r}^{t} } \right) /2 \end{array}\right. } \end{aligned}$$
    (7)

    Where \(\overrightarrow{x_{CP}^{t}}\) the function evaluates the optimal solution for t. \(\overrightarrow{x_{i}^{t}}\) the position of the ith individual at iteration t. \(\overrightarrow{y_{i}^{t}}\) is the vector generated between the current crown porcupine and a randomly selected crown porcupine from the population, and the position of the predator at iteration t. \(\tau _{1}\) is a random number based on normal distribution. \(\tau _{2}\) , \(\tau _{3}\) and \(\tau _{4}\) , \(\tau _{5}\) , \(\tau _{6}\) in Eq. (8) and Eq. (9) are random values between 0 and 1. r, r1, r2, and r3 in Eq. (8) are random numbers between 1 and N. \(\overrightarrow{U_{1}}\) is a randomly generated binary vector containing 0s and 1s, covering as much of the entire search space as possible to avoid falling into local minima.

  4. 4.

    Development phase. Here its odors and attack strategy is exploited thus embodying the local exploitation mechanism of CPO as shown in Eq. (8).

    $$\begin{aligned} {\left\{ \begin{array}{ll} \overrightarrow{x_{i}^{t+1} }=\left( 1- \overrightarrow{U_{1} }\right) \times \overrightarrow{x_{i}^{t} } +\overrightarrow{U_{1} }\times \left( \overrightarrow{x_{r1}^{t} } +\overrightarrow{S_{i}^{t} }\times \left( \overrightarrow{x_{r2}^{t} }-\overrightarrow{x_{r3}^{t} } \right) -\tau _{3} \times \overrightarrow{\delta }\times \gamma _{t} \times S_{i}^{t} \right) \\ \overrightarrow{x_{i}^{t+1} }=\overrightarrow{x_{CP}^{t} }+\left( \mu \left( 1 - \tau _{4} \right) + \tau _{4} \right) \times \left( \delta \times \overrightarrow{x_{CP}^{t} }-\overrightarrow{x_{i}^{t} } \right) -\tau _{5} \times \overrightarrow{\delta }\times \gamma _{t} \times \overrightarrow{F_{i}^{t} } \end{array}\right. } \end{aligned}$$
    (8)

    Where \(\delta\) is a parameter controlling the direction of search. \(\gamma _{t}\) is a defense factor. \(S_{i}^{t}\) is an odors diffusion factor. \(\mu\) is a convergence speed factor, and \(\overrightarrow{F_{i}^{t} }\) is the average force of the crown porcupine affecting the ith predator. As shown in Eq. (9).

    $$\begin{aligned} {\left\{ \begin{array}{ll} \overrightarrow{F_{i}^{t} }=\overrightarrow{ \tau _{6}}\times \frac{m_{i}^{t}\times \left( \overrightarrow{v_{i}^{t+1} } -\overrightarrow{v_{i}^{t} }\right) }{\bigtriangleup t}\Rightarrow \overrightarrow{F_{i}^{t} }=\overrightarrow{ \tau _{6}}\times m_{i}^{t} \times \left( \overrightarrow{v_{i}^{t+1} } -\overrightarrow{v_{i}^{t} }\right) \\ m_{i}^{t}=\frac{f\left( \overrightarrow{x_{i}^{t}}\right) }{e {\textstyle \sum _{k=1}^{N}f\left( \overrightarrow{x_{k}^{t}} \right) +e} } \\ \overrightarrow{v_{i}^{t}}=\overrightarrow{x_{i}^{t}}\\ \overrightarrow{v_{i}^{t+1}}=\overrightarrow{x_{r}^{t}} \end{array}\right. } \end{aligned}$$
    (9)

    Where \(m_{i}^{t}\) is the mass at iteration t of the ith individual. \(\overrightarrow{F_{i}^{t} }\) is the objective function.\(\overrightarrow{v_{i}^{t}}/\overrightarrow{v_{i}^{t+1}}\) are the initial and final velocities of the ith individual at iteration t versus t+1, respectively. \(\bigtriangleup t\) is the current iteration number.The influence of \(\overrightarrow{F_{i}^{t} }\) gradually decreases during the optimization process, and to ameliorate this drawback, the \(\overrightarrow{F_{i}^{t} }\) formula is improved by removing the small value of \(\bigtriangleup t\).The improvement helps to take a wide range of values in the search space, comprehensively check the optimal solution neighborhood, and accelerate the convergence to the near-optimal solution by taking into account both near and far regions.

Fig. 5
figure 5

Schematic diagram of the Crown porcupine defense zone.

The adaptive signal processing method VMD uses a non-recursive, variational modal decomposition to decompose the original complex signal into K amplitude and frequency modulated sub-signals. The core parameter K of VMD determines the number of modal components, \(\alpha\) affects its decomposition process and ensures the reconstruction accuracy of the signal36. However, K and \(\alpha\) in the VMD decomposition process are mostly set by human experience, which cannot bring out the best performance of the algorithm. To solve this problem CPO is introduced to optimize the search of its core parameters, the choice of the fitness function determines the effect of optimizing the VMD parameters, and the envelope entropy minima is used as the fitness function as shown in Eq. (10) to realize the parameters adaptive optimization settings and determine the optimal parameters.

$$\begin{aligned} {\left\{ \begin{array}{ll} E_{p} =-\sum _{i=1}^{N} p\left( i \right) \log {p\left( i \right) } \\ p\left( i \right) =a\left( i \right) /\sum _{i=1}^{N}a\left( i \right) \end{array}\right. } \end{aligned}$$
(10)

Where \(a\left( i \right)\) is the envelope signal after the Hilbert transform conditioning of the original signal and \(p\left( i \right)\) is its normalized expression. The CPO optimization VMD flowchart is presented in Fig. 6.

Fig. 6
figure 6

Flowchart of VMD algorithm for CPO optimization.

Predictive model BiGRU

GRU network is a variant of LSTM neural network, which is suitable for dealing with and predicting the problem of long-term dependence before and after dynamic nonlinear sequences such as time series. The internal architecture of GRU network is shown in Fig. 7.

Fig. 7
figure 7

GRU model.

Compared to the three-gate structure of the LSTM network, GRU controls the retention and updating of timing data through a two-gate structure of update and reset gates. GRU is simplified based on the LSTM oblivious gates, and simplifies the three-gate control to two gates, which reduces the use of memory, improves the speed of execution, and outperforms the LSTM neural network in terms of performance37. The computation process is represented by Eq. (11).

$$\begin{aligned} {\left\{ \begin{array}{ll} R_{t} =\sigma \left( W_{r} \cdot \left[ h_{t-1}\cdot X_{t} \right] +b_{r} \right) \\ Z_{t} =\sigma \left( W_{z} \cdot \left[ h_{t-1}\cdot X_{t} \right] +b_{z} \right) \\ h_{t}^{'} =tanh\left( W_{h} \cdot \left[ R_{t}h_{t-1}\cdot X_{t} \right] +b_{h} \right) \\ h_{t} =\left( 1-Z_{t} \right) h_{t-1} +Z_{t} h_{t}^{'} \end{array}\right. } \end{aligned}$$
(11)

Where \(R_{t}\) and \(Z_{t}\) correspond to the reset and update gates in the GRU model structure, respectively, controlling the degree of integration with historical moments and the degree of retention of memory data. \(X_{t}\) represents the input sequence data. \(h_{t}\) is the current cell end-time memory at moment t, which is also the output information of the current time, and \(h_{t-1}\) is the cell output information of the previous moment t-1, which is also the hidden state of the corresponding moment. \(\sigma\) and tanh are bipolar sigmoid and hyperbolic tangent activation functions. \(W_{r}\) , \(W_{z}\) , \(W_{h}\) and \(b_{r}\) , \(b_{z}\) , \(b_{h}\) are reset gates, update gates, tanh neural layer weights and biases, respectively.

In order to obtain more data characteristics of the actual charging load of EV charging stations, this paper adopts the bidirectional gated recurrent unit neural network BiGRU, the structure of which is shown in Fig. 8.

Fig. 8
figure 8

BiGRU neural network architecture.

BiGRU is an optimization improvement by combining the forward and backward propagation mechanism with a one-way gated recurrent unit GRU, with one layer of forward GRU and one layer of backward GRU, both of which affect the output. In this research, BiGRU is utilized to co-learn the relationship between past and future load influencing factors and current charging loads, which helps to extract features of the actual charging load data of EV charging stations and further improves the EV charging load prediction accuracy.

Example analysis and discussion

Environmental parameter settings and data sources

The simulation environment of this research is Windows 10 operating system, 32GBRAM, CPU is Intel Core i7- 12400F @2.5G Hz, and GPU is NVIDIA GeForce GTX 3060ti. The framework construction is developed in MATLAB (R2023a) language. The core relies on Deep Learning Toolbox (to build BiGRU, BiLSTM, etc.), Statistics and Machine Learning Toolbox (data preprocessing and analysis), and the CPO optimization algorithms are third-party open source code.The experimental data were selected from the charging load data of EV charging stations located on the campus of California Polytechnic State University in the U.S. ANN-DATA public dataset. Historical charging load data from 54 charging post EVs throughout the year were selected with a sampling interval of 1 hour, and data preprocessing collated 8760 sampling points. Historical temperature data are obtained through publicly available data from the National Oceanic and Atmospheric Administration, and other relevant factors are set according to local conditions. The raw data is proportionally divided into training set, test set and validation set as shown in Fig. 9.

Fig. 9
figure 9

Historical charging load and temperature data. Data source: https://ev.caltech.edu/dataset, ANN-DATA. https://www.noaa.gov, National Oceanic and Atmospheric Administration.

The input data are normalized to eliminate the influence of the magnitude between different factors on the prediction results38. The normalization formula is shown in Eq. (12).

$$\begin{aligned} x_{i}^{*} =\left( x_{i}-x_{min} \right) /\left( x_{max} -x_{min} \right) \end{aligned}$$
(12)

Where \(x_{i}\) is the original data of the ith sampling point. \(x_{i}^{*}\) is its normalized value. \(x_{max}\) and \(x_{min}\) are the maximum and minimum values in the original data, respectively. Details of the hyperparameter settings for EV charging load prediction by the combined model of this research are given in Table 2.

Table 2 Model parameter setting.

Indicators for the evaluation of predictive models

In order to verify the effectiveness and accuracy of the model prediction, Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) are introduced as the evaluation indexes39, and the smaller the value is the more accurate the charging load prediction is. \(R^{2}\) is the coefficient of determination, whose value is not greater than 1. The more it tends to 1 means the better the linear correlation between the predicted and true values of the model.

$$\begin{aligned} E_{MAE}=\left( 1/n \right) \sum _{k=1}^{n} \left| \hat{y}_{k} -y_{k} \right| \end{aligned}$$
(13)
$$\begin{aligned} E_{RMSE}=\sqrt{\left( 1/n \right) \sum _{k=1}^{n} \left( \hat{y}_{k}-y_{k} \right) ^{2} } \end{aligned}$$
(14)
$$\begin{aligned} R^{2} =1-\left( \sum _{k=1}^{n} \left( \hat{y}_{k}-y_{k} \right) ^{2} \right) /\left( \sum _{k=1}^{n} \left( \bar{y}_{k}-y_{k} \right) ^{2} \right) \end{aligned}$$
(15)

Where \(\hat{y}_{k}\) is the predicted load data at moment k. \(y_{k}\) and \(\bar{y}_{k}\) are the real EV charging load data and its mean value at moment k, and n is the number of test sample points.

Results

K-Medoids clustering results

Load data from EV charging stations on the Cal Poly campus was analyzed on a weekly cluster basis. In order to verify the validity of the method in this paper, the K-Means algorithm was selected to perform cluster analysis with Euclidean Distance- K -Medoids (E-K-Medoids) and Cosine Similarity-K-Medoids (C-K-Medoids), respectively, and was evaluated by contour coefficients. Table 3 shows the contour coefficients of the K- Means algorithm with 2 metrics K-Medoids algorithm for different values of the number of K clusters.

Table 3 Silhouette coefficients of the K-Medoids algorithm for two metrics.

As shown in Table 3, the K-Means algorithm has the lowest profile coefficient at different K \(_{M}\) values, while the profile coefficient of C-K-Medoids is higher than that of E-K-Medoids, and the value of C-K-Medoids profile coefficient is closer to 1 under the two metrics, which indicates that the cosine similarity C-K-Medoids algorithm can better identify the morphological differences of the EV loading curves, and the clustering The effect is better (Bold data). Moreover, under the conditions of this sample, when the value of K \(_{M}\) of the number of clusters is greater than 2, the corresponding contour coefficient decreases significantly, in order to maximize the difference of EV load data in different class clusters, and to maximize the similarity of clusters of the same kind, the value of K \(_{M}\) was chosen to be 2, and the date of the similar weeks after clustering is shown in Fig. 10 and in Table 4 (partially).

Table 4 EV charging load profile clustering weekly similar time periods.
Fig. 10
figure 10

Comparison of loads at similar times of the week after clustering.

As shown in the figure, the two types of similar weekly load sequences have lower loads on the two days of the weekend and higher loads on other days of the week, which are strongly cyclical and in line with the actual situation of charging loads at EV charging stations, and the overall and local characteristics of each cluster clustering are more similar. A certain day is selected as the day to be predicted, and a total of six days before and after it constitute the weekly time period to be predicted, according to the time period to be predicted to determine the similar date load sequence is clustered 1 or clustered 2, so as to improve the quality of the data set to be predicted, and then carry out the optimal decomposition of the load sequence of the similar date, to weaken the non-smoothness of the load data, and to capture the local characteristics of the load data. In this paper, in order to verify the model accuracy and generalization, the following simulation experiments are discussed under three scenarios of clustering 1, clustering 2 and unclustered original loads.

CPO-VMD optimization decomposition

In order to verify the effectiveness and robustness of the CPO algorithm in solving optimization problems, the Grey Wolf Optimization (GWO)40, Whale Optimization Algorithm (WOA)41, Particle Swarm Optimization (PSO), Rime Optimization Algorithm(RIME)42 and CPO are compared in simulation tests. The parameters were set uniformly, where the comparison algorithm was determined according to the recommendations of the study authors. To ensure consistency and fairness, the population size was uniformly set to N=30 and the maximum number of iterations \(T_{max}\)=500. From the 23 benchmark test functions, four benchmark test functions, namely the unimodal test function and the multimodal test function, are selected for function optimization testing, as shown in Table 5, and the iterative curves of the different test functions and CPO optimization algorithms are shown in Fig.11.

Table 5 Selected benchmarking functions.
Fig. 11
figure 11

Comparison of iteration curves of different test functions and optimization algorithms.

As can be seen in Fig. 11, compared with the other four optimization algorithms, the CPO optimization algorithm has a strong global search capability, and the convergence speed has an obvious advantage in the single-peak test function, and the advantage of the increasing number of iterations in the multiple-peak test gradually emerges.

In order to reflect the superiority of CPO optimized VMD, the minimal value of envelope entropy shown in Eq. (9) is used as the fitness function to find the optimal solution of decomposition parameter of VMD [K, \(\alpha\)], and the five algorithms are set to have the initial parameter values as shown in Table 6. A comparison of the fitness value curves for the VMD optimization results under the conditions of clustered 1, clustered 2 and unclustered three sets of data is shown in Fig. 12.

Table 6 Optimization of VMD initial parameters.
Fig. 12
figure 12

Comparison plot of fitness value curves.

As shown in the chart, the VMD parameter optimization problem is actually a two-dimensional function optimization problem, and the initial parameters of the five algorithms are set consistently to ensure the fair accuracy of the experiment. As the number of iterations increases, the envelope entropy of the fitness function in the three cases gradually decreases until it is minimum, and compared with other algorithms, the CPO algorithm is fast and effective in finding the best. Therefore, CPO is used to optimize the key parameters of VMD to reduce the subjectivity of the empirical settings and improve the decomposition effect. The optimal solution of CPO optimized decomposition parameters [K, \(\alpha\)] is shown in Table 7.

Table 7 Decomposition parameter optimal solution.

The CPO algorithm optimizes the VMD decomposition parameters to obtain its optimal solution [K, \(\alpha\)] rounded to the nearest whole number, sets the number of different modal components K and the penalty factor \(\alpha\) , and sets the rest of the parameters to their default values (Bold data). The VMD after parameter optimization using the CPO algorithm is applied to the input variable EV historical charging load data series, and the clustered data are decomposed into K=4 sub-sequences, IMF1, IMF2, IMF3, IMF4, and the unclustered data are decomposed into K=5 sub-sequences, IMF1, IMF2, IMF3, IMF4, IMF5, and the decomposed sub-sequences and decomposition spectrogram are shown in Fig. 13. The three cases show that the decomposition results do not show obvious modal aliasing phenomenon, and the results are more satisfactory, laying a solid foundation for improving the accuracy of predicting charging loads.

Fig. 13
figure 13

VMD Decomposition and Spectrogram.

Combinatorial optimization decomposition forecasting

In order to verify the validity of the models proposed in this study and their accuracy, 12 neural network prediction models were established, where the prediction results of CPO-VMD-BiGRU for each subsequence in three cases are shown in Fig. 14.

Fig. 14
figure 14

CPO-VMD-BiGRU decomposition prediction results.

From the figure, it can be seen that the predicted trend of each IMF subsequence is basically consistent with the actual trend, and the overall predicted trend can be well laid out despite the slight deviation at some sharply changing points. To further illustrate the prediction accuracy of each IMF subsequence, the prediction evaluation index of each IMF subsequence is shown in Fig. 15.

Fig. 15
figure 15

Forecast evaluation metrics for IMF subseries.

As can be seen from the figure, the prediction accuracy of IMF1 is the highest in all three cases, and its RMSE, MAE and R2 are 0.16404, 0.12118 and 0.99959 on average, in which the clustering components IMF2 and IMF5 are next and close to each other with close prediction accuracy, and the prediction accuracy of IMF3 is relatively low, and the overall prediction accuracy of the unclustered component is lower than that of clustering component, in which the RMSE, MAE increased by 30.63\(\%\) and 21.47\(\%\) on average compared with the clustered components, and \(R^{2}\) decreased by 19.97\(\%\) on average, which proves that the similarity classification of the original historical charging load dataset using the K-Medoids clustering algorithm effectively improves the quality of the dataset to be predicted, and the prediction accuracy is improved.

Independent versus combined model prediction effects

In order to verify the rationality and performance advantages of the combined models proposed in this research, 12 neural network prediction models were established. They are independent prediction models LSTM, BiLSTM, GRU, BiGRU, mainstream prediction models such as Transformer with Attention Mechanism and CNN-BiGRU-Attention/Transformer, decomposition combination prediction model VMD-BiGRU and five optimized decomposition combination prediction models GWO-VMD- BiGRU, WOA-VMD-BiGRU, PSO-VMD-BiGRU, RIME-VMD-BiGRU and CPO-VMD-BiGRU. The comparison results of independent prediction model, decomposed prediction combination model and optimized decomposed combination prediction model for EV charging load prediction are shown in Figs. 16 and 17, and the comparison results of MAE and RMSE of each model are shown in Table 8.

Fig. 16
figure 16

Comparison of prediction results and errors of different models.

Fig. 17
figure 17

Comparison of errors in prediction results for all models.

Table 8 Comparison of model prediction accuracy.

The combined graphical results show that most of the models predict better with clustering. The prediction results of RMSE and MAS are reduced by 36.22\(\%\) and 35.81\(\%\) on average respectively compared to the unclustered prediction. It indicates that the quality of the dataset is improved after clustering by K-Medoids, which improves the final charging load prediction results. The better prediction accuracy of BiGRU in the independent model with less data after clustering indicates that bi-directional is better than uni-directional to mine the EV charging load data features, while both bi-directional and uni-directional GRUs are better compared to LSTMs because of their more streamlined internal structure and better performance. Meanwhile, compared to CNN-BiGRU-Attention /Transformer with attention mechanism and other models, the overall performance of BiGRU is slightly higher with less data volume after clustering, because BiGRU has the advantage of making the time series preserved in the time series data. For the combined model VMD-BiGRU prediction model compared to the independent prediction model, the RMSE and MAS were reduced by 59.20\(\%\) and 32.39\(\%\), respectively, indicating that the prediction accuracy of the combined model was higher than that of the independent model. Five optimized decomposition prediction models were compared and all of them outperformed the unoptimised decomposition prediction model VMD-BiGRU prediction, with the combined model CPO-VMD-BiGRU having the best prediction accuracy (Bold data). The RMSE and MAS were reduced by an average of 44.66\(\%\) and 38.20\(\%\), respectively, when compared with the unoptimised combined model, and by an average of 56.95\(\%\) and 41.60\(\%\) when compared with the remaining 11 prediction models RMSE and MAS. Fig. 17 shows that the boxes corresponding to the clustered error prediction results are flatter than the unclustered ones, and the boxes of the proposed combinatorial model in this study are relatively the flattest in the three data cases and the error median line is the closest to 0, which indicates that the error volatility of its prediction results is the smallest. Once again, the effectiveness and superiority of the model is proved. It has significant advantages in solving the problem of short-term charging load prediction of EV charging stations taking into account the multifactor optimization decomposition, and the CPO-optimized VMD-BiGRU improves the prediction accuracy, and the prediction results are closer to the real charging load curve.

Modelling the size of the problem and calculating the cost

In order to demonstrate the scale of the problem and the computational cost that the model can handle, different data volumes (1 and 3 consecutive years of data) were chosen to compare the performance, and the algorithm’s computation time was used as the horizontal axis, and the evaluation metrics RMSE, MAE, and \(R^{2}\) were used as the vertical axis to draw a scatter plot, and the results are shown in Fig. 18.

Fig. 18
figure 18

Comparison of problem size and computational complexity of different models.

As can be seen from the figure, firstly comparing the 1-year data volume and the 3-year data volume, the running time of the model increases by 42.08\(\%\) on average as the size of the problem it deals with becomes larger, and the accuracy of the model also decreases slightly, which indicates that the present model works better than the large-scale data in dealing with smaller-scale problems. Secondly, the figure shows that the running time of the independent model is faster, but the model prediction accuracy is lower, and the difference in computing time between the unoptimised combined model and the optimized combined model is not significant, which indicates that the optimization algorithm has a small impact on the overall running time. The optimized combination model CPO-VMD-BiGRU proposed in this research reduces the optimization runtime by an average of 8.72\(\%\) compared to other optimized combination models, proving the superior performance of the CPO optimization algorithm. In the 1-year and 3-year datasets, this research proposes that the model running time increases by 2.41 times on average, but the prediction accuracy improves by 2.83 times. The optimal combination of the prediction model increases the running time and computational cost due to the increase in the number of modules, but the improvement in prediction accuracy is more significant compared to the running time and computational cost.

Industrial field validation

The combined clustering optimization decomposition prediction model proposed in this paper is used to carry out a short-term analysis and prediction study on the charging load of an EV charging station in Qilihe District, Lanzhou City, Gansu Province, China. Fig. 19 shows the actual location of the charging station, which has a voltage level of AC 10 kV, and a total of 43 charging piles at the charging station, of which the type is DC charging piles.

Fig. 19
figure 19

Location map of industrial field validation charging stations.

The charging load of this charging station is statistically analysed, and the model of this paper is used for the short-term load prediction of this charging station for 3 days, with a sampling interval of 15 min. The final prediction results and errors are shown in Fig. 20.

Fig. 20
figure 20

Prediction results and errors of different models.

From the figure, it can be seen that the model proposed in this paper predicts the best results. The prediction result has the smallest error, and the error range is between (− 0.045,0.065). The error prediction result corresponds to a flatter box, the error median line is closest to 0, and the error volatility is the smallest. Once again, the validity and superiority of the model is demonstrated.

Conclusion

    Short-term EV charging load is affected by historical temperature, date type and other related factors, and EV charging is strongly cyclical, so this research proposes to decompose the short-term charging load prediction study of EV charging station based on K-Medoids clustering and multi-factor optimization. The specific conclusions are:

  1. 1.

    In this research, the proposed combined prediction model combines the features of K-Medoids clustering algorithm, CPO algorithm, VMD and BiGRU neural network to give full play to their respective advantages. Based on the results of simulation experiments and validation of engineering examples, it is shown that the prediction accuracy of EV charging load can be significantly improved by the combination of CPO-VMD-BiGRU prediction model after K-Medoids clustering. Compared with the remaining 11 prediction models RMSE and MAS are reduced by 56.95\(\%\) and 41.60\(\%\) on average, and compared with other optimized combination prediction models the calculation time is shortened by 8.72\(\%\). The effectiveness of the research model in regional short-term EV charging load prediction is demonstrated.

  2. 2.

    Compared with the unclustered original EV charging load dataset decomposition prediction results, using K-Medoids clustering algorithm to cluster the EV load data to improve the quality of the dataset, laying the foundation for the subsequent prediction, experimental comparisons can be seen that the clustered prediction evaluation indexes, RMSE and MAE, compared with the unclustered original dataset prediction is reduced by an average of 36.22\(\%\) and 35.81\(\%\), which verifies the clustering effectiveness on the quality improvement of the dataset.

  3. 3.

    Compared with the VMD, the two core parameters K and \(\alpha\) of the VMD were optimized using GWO, WOA, PSO, RIME and CPO, and the data decomposition and re-prediction using this optimal solution, and the prediction results were all better than the unoptimized VMD. Among them, the CPO optimization algorithm predicts the optimal results, comparing with the unoptimised combined model RMSE and MAS are reduced by 44.66\(\%\) and 38.20\(\%\) on average respectively, which greatly reduces the error brought by human experience settings, improves the decomposition effect, and illustrates the necessity of the optimization of VMD parameters. The optimized combined models of four optimization algorithms, GWO, WOA, PSO and RIME, were compared and the RMSE and MAS were reduced by an average of 34.38\(\%\) and 20.97\(\%\), which further demonstrates the superiority of the CPO-VMD over other optimization algorithms for optimizing VMD.

  4. 4.

    The combined K-Medoids clustering and multifactor CPO-VMD-BiGRU based prediction model proposed in this study is applied to historical charging load data from real EV charging stations in the US and China in example simulation and engineering example validation, respectively. The final RMSE, MAE, and \(R^{2}\) were averaged under the three datasets: 1.3076, 0.5208, and 0.97214, respectively. The generalizability of the research for regional short-term EV charging load forecasting is demonstrated, which is key to increasing the utilization of charging posts at charging stations and improving the charging experience.

The proposed combined model uses K-Medoids clustering algorithm and hybrid deep learning algorithm to predict EV charging loads at charging stations, which effectively improves the prediction accuracy. However, this research still suffers from the limitations of fewer experimental cases and the time cost of running the combined model. As the EV penetration rate and charging station planning and construction continue to increase, we will further collect the actual charging load data of each charging station and explore the improvement of the model, so as to make it balanced in terms of calculation cost and prediction accuracy. The effectiveness and superiority of the model in short-term EV charging load prediction will be verified with several engineering practice cases.