A novel SVD-UKFNN algorithm for predicting current efficiency of aluminum electrolysis

Fang, Xiaoyan; Fei, Xihong; Wang, Kang; Fang, Tian; Chen, Rui

doi:10.1038/s41598-025-94210-y


Published: 17 March 2025


Scientific Reports volume 15, Article number: 9173 (2025)


Abstract

The prediction of current efficiency in the aluminum electrolysis production process (AEPP) is critical for improving industrial production efficiency and product quality. However, the inherent dynamic nonlinearity and multivariable complexity of AEPP hinder the development of accurate current efficiency prediction models. To address these challenges, a novel singular value decomposition unscented Kalman filtering neural network (NSVD-UKFNN) is proposed to improve the prediction accuracy of current efficiency in the AEPP. First, a dynamic prediction model is constructed within the framework of the unscented Kalman filtering neural network (UKFNN), employing an artificial neural network (ANN) to capture the complex characteristics of the system. Second, singular value decomposition (SVD) is integrated with the UKFNN to compute the square root of the prior covariance matrix, thereby improving the model’s numerical stability. Finally, the prediction variance of the state variables is redefined as a cost function and optimized using the gradient descent method to reduce error accumulation during computation, enhancing the prediction robustness of the proposed method. The experimental results show that the proposed NSVD-UKFNN reduces the mean absolute error (MAE) by a factor of 2.08 and the sum of squared errors (SSE) by a factor of approximately 22.23 compared to the baseline model.


Introduction

The aluminum electrolysis production process (AEPP) represents a critical high-energy-consumption stage in modern industry, where current efficiency directly impacts production costs and energy utilization efficiency. Current efficiency is defined as the ratio of the mass of aluminum produced per unit time to the theoretical output calculated based on Faraday’s law. It serves as a key metric for evaluating the economic performance of aluminum electrolysis production. However, the aluminum electrolytic cell, as a multivariable and nonlinear complex industrial process, is influenced by various factors, such as electrolyte composition, temperature, and current density, which significantly affect its current efficiency.

Currently, the adjustment of electrolytic cell parameters is primarily carried out by engineers based on operating conditions and personal experience to achieve control objectives^1,2. This experience-based approach inevitably introduces human subjectivity, making it difficult to meet the demand for continuous high-efficiency production. Recent studies have explored optimal control strategies using intelligent decision-making and optimization technologies for real-time parameter tuning. Yi et al. proposed the application of the bacterial foraging optimization algorithm³ and the particle swarm optimization algorithm⁴ to optimize control parameters in real time. Yao et al.⁵ proposed leveraging historical parameter data to refine adjustment ranges for newly installed equipment. These studies underscore the importance of intelligent decision-making and optimization strategies in enhancing the efficiency of the AEPP. However, the effective implementation of such strategies relies on the development of accurate and reliable system models to describe the production process. Consequently, the study of AEPP system modeling remains of significant importance.

Mechanistic models are a widely used approach for constructing system models in process industries, typically based on simplifying assumptions. However, the AEPP represents a highly complex system, characterized by dynamic nonlinearity and ambiguous mechanistic relationships. As a result, the information captured by simplified and hypothesized models is often insufficient, leading to significant deviations from actual system behavior and limiting the broader application of mechanistic models in AEPP^6,7,8. In contrast, AEPP operations generate and accumulate vast amounts of both online and offline data, which encapsulate valuable insights into system dynamics and present a promising avenue for modeling complex production processes⁹. Shi et al.¹⁰ employed the k-nearest neighbor (kNN) algorithm, combined with expert knowledge, to predict the material-energy balance in electrolyzers, thereby improving productivity. Notably, artificial neural networks (ANNs), as part of data-driven approaches, possess robust nonlinear approximation capabilities, enabling them to produce accurate models even in the presence of mechanistic uncertainty. Consequently, interest in employing ANNs for AEPP modeling has grown. Yang et al.¹¹ developed a convolutional neural network (CNN)-based model for superheat detection in electrolyzers. Lundby et al.¹² proposed a sparse neural network to model the aluminum electrolysis process. Yue et al.¹³ utilized a CNN to identify the occurrence of the anode effect in aluminum electrolysis. These studies collectively demonstrate the effectiveness of ANNs in modeling AEPP systems.

Despite their widespread use, ANNs typically rely on the error backpropagation (BP) algorithm to obtain an optimal model¹⁴. However, the AEPP involves complex physicochemical reactions within the electrolyzer, which occur in high-temperature environments, alongside external operations such as crust breaking and discharge. Consequently, AEPP systems exhibit pronounced time-varying behavior and are subject to significant disturbances^15,16,17. When a single fixed optimal model is used to describe the production characteristics of the AEPP, its accuracy tends to degrade under system variations or external perturbations. To overcome this limitation, researchers have developed the filtering neural network (FNN). The core concept involves treating the weights and thresholds of the ANN as state variables for filtering, while the output is considered as the measurement variable. By continuously adjusting the state variables in real time using field data, the FNN adapts to environmental changes better than traditional ANNs^{18,19,20,21,22}.

Current filtering techniques used to construct FNN frameworks typically include the Kalman filter (KF)²³, extended Kalman filter (EKF)²⁴, unscented Kalman filter (UKF)^25,26,27, and particle filter (PF)²⁸. It is well established that the KF is the optimal recursive state estimator for linear and Gaussian systems. However, since FNNs generally exhibit highly nonlinear characteristics, nonlinear estimators such as the EKF, UKF, and PF are required. The EKF approximates the state and covariance updates by linearizing the nonlinear model. While effective in many scenarios, its reliance on a first-order Taylor expansion can result in significant estimation errors for highly nonlinear systems due to the omission of higher-order terms²⁹. The PF, although capable of handling both nonlinearity and non-Gaussianity, suffers from challenges such as the selection of an appropriate importance function and particle degeneracy, leading to unstable estimation results. In contrast, the UKF offers substantial improvements in estimation accuracy and stability over the EKF. Compared to the PF, the UKF deterministically samples from the prior covariance matrix and prior mean based on predefined rules, thereby avoiding particle degeneracy and providing accurate, robust state estimates for a wide range of nonlinear systems^30,31. Julier posits that approximating a system’s probability distribution is often simpler than approximating the system itself²⁹. To accomplish this, a symmetric sigma-point sampling method, based on the prior mean and the prior covariance matrix, is introduced to facilitate an accurate approximation of the system. In this approach, sigma points are propagated through a nonlinear function, a process known as the unscented transformation (UT), which is a key step in the UKF. A critical element of the UT involves calculating the square root of the prior covariance matrix. In current research, this is typically achieved using Cholesky decomposition³².

The Cholesky decomposition requires the matrix to remain positive definite. However, factors such as computer rounding errors, measurement imprecision, and other disturbances can cause the matrix to lose positive definiteness, resulting in the failure of the model. To address this issue, SVD is introduced for calculating the square root of matrices, as it offers a more stable decomposition method^33,34. Moreover, the predicted covariance matrix in the UKF is derived based on the principle of minimizing the mean squared error. In theory, as the number of iterations increases, the variance of the state variables should converge to a small value. However, in real-world operations, errors arising from model inaccuracies and coarse observations accumulate, preventing the variance from remaining low. To mitigate this, Li et al.³⁵ proposed utilizing graph networks to reconstruct the Kalman gain, thereby reducing the number of computational steps and minimizing error accumulation. Although this method is effective, it involves a complex implementation process. A more recent approach, a robust UKF based on the principle of maximum error entropy, theoretically guarantees optimal estimation. However, its application to complex systems remains challenging³⁶. In this article, an optimization strategy inspired by the gradient descent method is proposed to ensure that the prediction variance of state variables remains consistently low. This approach improves the predictive accuracy of SVD-UKFNN in AEPP. As shown in Fig. 1, the main contributions of this article are as follows:

1. This article employs a state-space approach to construct the unscented Kalman filtering neural network (UKFNN), enhancing the self-learning capability of the neural network. To improve the numerical stability of the model, SVD is used to compute the square root of the prior covariance matrix within the UKFNN framework, resulting in the SVD-UKFNN algorithm.
2. The state variable prediction variance in the SVD-UKFNN framework is reformulated as a cost function. Gradient descent is applied to minimize the value of the cost function, ensuring that the model consistently achieves optimal prediction performance during iterative computations.
3. Based on this approach, an NSVD-UKFNN algorithm is developed and applied to predict current efficiency in aluminum electrolysis. After in-depth theoretical analysis and comprehensive evaluation, comparison results confirm the significant performance advantages of the proposed method for enhancing optimization control in the AEPP.

Problem analysis of ANN in AEPP

ANN based on BP algorithm

The architecture of the three-layer feedforward ANN is illustrated in Fig. 2. In this structure, $w_{ij,k}^1,b_{j,k}^1$ represent the weights and biases of the hidden layer at time k, while $w_{j,k}^2,b_k^2$ denote the weights and biases of the output layer at time k. The predicted value generated by the model at time k, denoted as ${\widetilde{z}_k}$, is specifically expressed as:

$$\begin{aligned}&{\widetilde{z}_k} = \sum \nolimits _{j = 1}^9 {\frac{{w_{j,k}^2}}{{1 + \exp [ - (\sum \nolimits _{i = 1}^9 {w_{ij,k}^1} {u_{i,k}} + b_{j,k}^1)]}}} + b_k^2&\end{aligned}$$

(1)

where $u_{i,k}$ denotes the i-th input of the ANN at time k, i indexes the neurons in the input layer, j indexes the neurons in the hidden layer, and the superscripts 1 and 2 distinguish the hidden layer from the output layer. Both the input layer and the hidden layer contain 9 neurons.
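As a minimal sketch of Eq. 1, the forward pass can be written in a few lines of Python. The function name `ann_predict` and all numerical values below are illustrative assumptions, not taken from the article.

```python
import math

def ann_predict(u, w1, b1, w2, b2):
    """Eq. 1: sigmoid hidden layer followed by a linear output neuron.

    u  : list of inputs u_i
    w1 : w1[i][j] is the weight from input i to hidden neuron j
    b1 : hidden-layer biases b_j
    w2 : output-layer weights w_j
    b2 : scalar output bias
    """
    z = b2
    for j in range(len(b1)):
        pre = sum(w1[i][j] * u[i] for i in range(len(u))) + b1[j]
        z += w2[j] / (1.0 + math.exp(-pre))
    return z

# Illustrative values only: with zero hidden weights and biases,
# every sigmoid evaluates to 0.5.
n_in, n_hid = 9, 9
u = [1.0] * n_in
w1 = [[0.0] * n_hid for _ in range(n_in)]
b1 = [0.0] * n_hid
w2 = [0.2] * n_hid
b2 = 0.1
z = ann_predict(u, w1, b1, w2, b2)
```

With these toy values the output is 0.1 + 9 × 0.2 × 0.5 = 1.0, which makes the structure of the double sum easy to check by hand.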

The optimal ANN model is derived using the BP algorithm based on the training dataset. Specifically, a set of weights and thresholds is obtained through iterative optimization aimed at minimizing the model’s prediction error (for detailed computational procedures, readers are referred to reference³⁷). The actual AEPP system is characterized as illustrated in Fig. 3.

ANN based on UKF

Currently, the state-space-based FNN method has achieved online updates of the ANN weights and thresholds. The UKFNN state space is defined as

$$\begin{aligned}&\left\{ \begin{array}{l} {x_k} = {x_{k - 1}} + {q_{k - 1}}\\ {z_k} = f({u_k},{x_k}) + {r_k} \end{array} \right. \ &\end{aligned}$$

(2)

where ${q_{k - 1}}$ and ${r_k}$ represent the process noise and observation noise, respectively. They are mutually uncorrelated, zero-mean Gaussian noises with covariances Q and R, respectively. The state ${x_k}$ collects the weights and thresholds of the ANN, and $f( \cdot )$ represents the feedforward propagation of the ANN. For specific implementation steps of the UKFNN, readers are referred to reference⁶. Equation 2 shows how the state-space equations of the UKFNN are constructed: compared with the UKF, the UKFNN changes only the nonlinear observation function. Therefore, the performance of the UKFNN is determined by the UKF.
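The state-space model of Eq. 2 can be sketched with NumPy as follows. The packing order of the weights inside the state vector, the helper names (`unpack`, `simulate_step`), and the noise levels are assumptions made for illustration; the article does not specify them.

```python
import numpy as np

N_IN, N_HID = 9, 9  # 9-9-1 network, as in Eq. 1

def unpack(x):
    """Split the flat state vector into (w1, b1, w2, b2); the order is an assumption."""
    sizes = [N_IN * N_HID, N_HID, N_HID]
    w1, b1, w2, b2 = np.split(x, np.cumsum(sizes))
    return w1.reshape(N_IN, N_HID), b1, w2, b2[0]

def f(u, x):
    """Observation function of Eq. 2: the ANN forward pass of Eq. 1."""
    w1, b1, w2, b2 = unpack(x)
    hidden = 1.0 / (1.0 + np.exp(-(u @ w1 + b1)))
    return float(hidden @ w2 + b2)

def simulate_step(x_prev, u, Q, R, rng):
    """One step of Eq. 2: random-walk state plus a noisy ANN output."""
    x = x_prev + rng.multivariate_normal(np.zeros(x_prev.size), Q)
    z = f(u, x) + rng.normal(0.0, np.sqrt(R))
    return x, z

n_state = N_IN * N_HID + N_HID + N_HID + 1  # 81 + 9 + 9 + 1 weights and thresholds
rng = np.random.default_rng(0)
x0 = np.zeros(n_state)
x1, z1 = simulate_step(x0, np.ones(N_IN), 1e-6 * np.eye(n_state), 1e-4, rng)
```

The point of the sketch is that the state transition is a plain random walk, so all nonlinearity enters through the observation function `f`.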

The UKF typically uses Cholesky decomposition to obtain the square root of the prior covariance matrix ${\widetilde{p}_{k - 1}}$ and symmetric sigma-point sampling to complete the UT. If ${\widetilde{p}_{k - 1}} \in {\mathbb {R}^{n \times n}}$, Cholesky decomposition expresses ${\widetilde{p}_{k - 1}}$ as the product of a lower triangular matrix and its transpose, denoted as ${\widetilde{p}_{k - 1}} = L{L^T}$. Here, $L \in {\mathbb {R}^{n \times n}}$ is a lower triangular matrix with all positive diagonal elements, which serves as the Cholesky factor of ${\widetilde{p}_{k - 1}}$, and the square root is written as $L = chol({\widetilde{p}_{k - 1}})$. This method requires the decomposed matrix to be positive definite. However, rounding errors in computer calculations, system errors, and measurement outliers may cause the matrix to lose positive definiteness. When the matrix ${\widetilde{p}_{k - 1}}$ is no longer positive definite, the decomposition fails, which causes the program to terminate and results in poor stability.
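The failure mode described above can be reproduced with NumPy, whose `np.linalg.cholesky` raises `LinAlgError` when the input is not positive definite. The two small matrices are illustrative and do not come from the article.

```python
import numpy as np

# Illustrative symmetric matrix that is not positive definite
# (its eigenvalues are 3 and -1).
p_bad = np.array([[1.0, 2.0],
                  [2.0, 1.0]])

try:
    np.linalg.cholesky(p_bad)
    cholesky_failed = False
except np.linalg.LinAlgError:
    cholesky_failed = True

# A positive definite matrix factors as L @ L.T with lower-triangular L.
p_ok = np.array([[4.0, 2.0],
                 [2.0, 3.0]])
L = np.linalg.cholesky(p_ok)
```

In a filter loop, an uncaught exception of this kind would stop the recursion, which is the stability problem the text refers to.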

It is observed that the prediction variance in the UKF is solely determined by the model’s prediction process, without any external adjustments. However, errors such as modeling inaccuracies and observation outliers can accumulate over iterations, leading to significant deviation. The gradient descent method, commonly used to optimize cost functions in artificial neural networks, aims to minimize the cost function. Importantly, this objective aligns with the need to update the prediction variance of state variables in the UKF. Drawing inspiration from this, this article explores the use of gradient descent to optimize the state variable variance, ensuring that the model consistently maintains high performance.

Design of the NSVD-UKFNN algorithm

Design of sigma sampling using SVD decomposition techniques

SVD is one of the most stable and accurate matrix decomposition methods in numerical algebra. For a matrix $B \in {\mathbb {R}^{m \times n}}\ (m \ge n)$, the SVD of B is expressed as

$$\begin{aligned}&B = U\Lambda {V^T}\ &\end{aligned}$$

(3)

where the columns of $U \in {\mathbb {R}^{m \times m}}$ and $V \in {\mathbb {R}^{n \times n}}$ are the left and right singular vectors of B, respectively, $\Lambda = \begin{pmatrix} S & 0 \\ 0 & 0 \end{pmatrix} \in {\mathbb {R}^{m \times n}}$, and $S = \mathrm{diag}\{ {s_1},{s_2},\ldots ,{s_r}\}$ with ${s_1} \ge {s_2} \ge \ldots \ge {s_r} > 0$ contains the nonzero singular values of B, where r is the rank of B. Therefore, the SVD of the prior covariance matrix ${\widetilde{p}_{k - 1}}$ of the UKFNN is expressed as

$$\begin{aligned}&{\widetilde{p}_{k - 1}} = {\widetilde{U}_{k - 1}}{\widetilde{S}_{k - 1}}\widetilde{V}_{k - 1}^T&\end{aligned}$$

(4)

In the UKFNN, since ${\widetilde{p}_{k - 1}}$ is a covariance matrix and hence symmetric and positive semidefinite, it follows that ${\widetilde{U}_{k - 1}} = {\widetilde{V}_{k - 1}}$. The computation of the square root using SVD can therefore be simplified to

$$\begin{aligned}&\sqrt{{{\widetilde{p}}_{k - 1}}} = {\widetilde{U}_{k - 1}}\sqrt{{{\widetilde{S}}_{k - 1}}}&\end{aligned}$$

(5)

where $\sqrt{{{\widetilde{p}}_{k - 1}}}\in {\mathbb {R}^{n \times n}}$. Therefore, the symmetric sigma-point sampling based on SVD is expressed as

$$\begin{aligned}&\left\{ \begin{array}{l} {x_{0,k - 1}} = {\widetilde{x}_{k - 1}}\\ {x_{i,k - 1}} = {\widetilde{x}_{k - 1}} + {(\sqrt{n + \lambda } {\widetilde{U}_{k - 1}}\sqrt{{{\widetilde{S}}_{k - 1}}} )_i}, \quad i = 1,2,...,n\\ {x_{i,k - 1}} = {\widetilde{x}_{k - 1}} - {(\sqrt{n + \lambda } {\widetilde{U}_{k - 1}}\sqrt{{{\widetilde{S}}_{k - 1}}} )_{i - n}}, \quad i = n + 1,n + 2,...,2n \end{array} \right. \ &\end{aligned}$$

(6)

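Here, the subscript on the parenthesized term selects a column of the matrix, n is the dimension of the state vector, and $\lambda$ is the scaling parameter of the UT. Computing the matrix square root with SVD and then performing symmetric sigma-point sampling avoids the limitation of Cholesky decomposition, which may fail when positive definiteness is lost.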
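A minimal NumPy sketch of Eqs. 5 and 6 is given below. The function names, the value of `lam`, and the test matrix are illustrative assumptions; the singular matrix is chosen to show that the SVD route still returns a usable factor when the covariance is rank deficient.

```python
import numpy as np

def svd_sqrt(p):
    """Eq. 5: square-root factor U * sqrt(S) of a symmetric PSD matrix."""
    u, s, _ = np.linalg.svd(p)
    return u * np.sqrt(s)  # scales column j of U by sqrt(s_j)

def sigma_points(x_mean, p, lam):
    """Eq. 6: 2n + 1 symmetric sigma points around x_mean."""
    n = x_mean.size
    root = np.sqrt(n + lam) * svd_sqrt(p)
    pts = [x_mean]
    pts += [x_mean + root[:, i] for i in range(n)]
    pts += [x_mean - root[:, i] for i in range(n)]
    return np.array(pts)

# Illustrative singular (rank-1) covariance; values are not from the article.
p = np.array([[1.0, 1.0],
              [1.0, 1.0]])
x = np.array([0.5, -0.5])
root = svd_sqrt(p)
pts = sigma_points(x, p, lam=1.0)
```

Because the points are placed symmetrically, their unweighted average equals the prior mean, and `root @ root.T` recovers the covariance up to rounding.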

Gradient descent-based optimization method for state variables variance

Given a cost function $f({\nu _k})$ and an initial value ${\nu _0}$, the iterative process begins from ${\nu _0}$. By selecting an appropriate learning rate $lr$, the cost function is reduced step by step along the direction of the negative gradient $- \nabla f({\nu _k})$, leading to convergence towards its minimum. This iterative process is represented as

$$\begin{aligned}&{\nu _k} = {\nu _{k - 1}} - lr \times \nabla f({\nu _{k - 1}})\ &\end{aligned}$$

(7)
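The iteration in Eq. 7 can be illustrated on a one-dimensional quadratic. The cost function, learning rate, and step count below are hypothetical choices made only to show the update rule.

```python
def gradient_descent(grad, nu0, lr, steps):
    """Eq. 7: nu_k = nu_{k-1} - lr * grad(nu_{k-1})."""
    nu = nu0
    for _ in range(steps):
        nu = nu - lr * grad(nu)
    return nu

# Hypothetical cost f(nu) = (nu - 3)^2, whose gradient is 2 * (nu - 3).
nu_min = gradient_descent(lambda nu: 2.0 * (nu - 3.0), nu0=0.0, lr=0.1, steps=200)
```

For this cost, each step shrinks the distance to the minimizer by a factor of 0.8, so the iterate approaches 3.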

The predicted variance of the SVD-UKFNN state variable is reconstructed as the cost function $J({a_k},{b_k})$, and the gradient descent method is used to optimize the parameters $a_k$ and $b_k$. The cost function is constructed as follows:

$$\begin{aligned}&J({a_k},{b_k}) = a_k^2 \times tr({\widetilde{p}_{k|k - 1}}) + b_k^2 \times tr( - {K_k}{\widetilde{p}_{z,k}}K_k^T)&\end{aligned}$$

(8)

where tr denotes the trace of a matrix. In SVD-UKFNN, the trace of the predicted covariance matrix is the sum of the predicted variances of the state variables. ${\widetilde{p}_{k|k - 1}}$ and ${K_k}{\widetilde{p}_{z,k}}K_k^T$ represent the propagated covariance and the measurement-update covariance in the prediction covariance matrix, respectively. According to Eq. 7, $a$ and $b$ in Eq. 8 are updated at time k:

$$\begin{aligned}\begin{bmatrix} {a_k}\\ {b_k} \end{bmatrix}&= \begin{bmatrix} {a_{k - 1}}\\ {b_{k - 1}} \end{bmatrix} - lr \times \nabla J({a_{k - 1}},{b_{k - 1}}) \\&= \begin{bmatrix} {a_{k - 1}}\\ {b_{k - 1}} \end{bmatrix} - lr \times \begin{bmatrix} \frac{{\partial J({a_{k - 1}},{b_{k - 1}})}}{{\partial {a_{k - 1}}}}\\ \frac{{\partial J({a_{k - 1}},{b_{k - 1}})}}{{\partial {b_{k - 1}}}} \end{bmatrix}\end{aligned}$$

(9)

Then the prediction covariance matrix ${\widetilde{p}_k}$ of SVD-UKFNN is updated according to Eqs. 10 and 11:

$$\begin{aligned}&{\widetilde{p}_k} = a_k^2 \times {\widetilde{p}_{k|k - 1}} - b_k^2 \times {K_k}{\widetilde{p}_{z,k}}K_k^T,\quad \text{if } tr({\widetilde{p}_k}) > loss&\end{aligned}$$

(10)

$$\begin{aligned}&{\widetilde{p}_k} = {\widetilde{p}_{k|k - 1}} - {K_k}{\widetilde{p}_{z,k}}K_k^T,\quad \text{if } tr({\widetilde{p}_k}) \le loss&\end{aligned}$$

(11)

where loss represents the predicted variance threshold. Equations 10 and 11 are designed to ensure that the predicted variance is kept to a minimum.
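One possible reading of Eqs. 8 to 11 is sketched below with NumPy. Several details are assumptions rather than statements of the article: the threshold test is applied to the trace of the standard update (Eq. 11), the gradient is evaluated with the traces available at the current step, and all numerical values are illustrative.

```python
import numpy as np

def update_covariance(p_pred, k_gain, p_z, a, b, lr, loss):
    """Sketch of Eqs. 8-11 for one time step; returns (p_new, a_new, b_new)."""
    gain_term = k_gain @ p_z @ k_gain.T
    t1 = np.trace(p_pred)        # tr(p_{k|k-1})
    t2 = np.trace(-gain_term)    # tr(-K p_z K^T)

    # Eq. 9: the gradient of J(a, b) = a^2 * t1 + b^2 * t2 is (2*a*t1, 2*b*t2).
    a_new = a - lr * 2.0 * a * t1
    b_new = b - lr * 2.0 * b * t2

    p_std = p_pred - gain_term                               # Eq. 11
    if np.trace(p_std) > loss:                               # assumed reading of the condition
        p_new = a_new ** 2 * p_pred - b_new ** 2 * gain_term  # Eq. 10
    else:
        p_new = p_std
    return p_new, a_new, b_new

# Illustrative numbers only.
p_pred = np.diag([2.0, 1.0])
k_gain = np.array([[0.5], [0.25]])
p_z = np.array([[2.0]])
p_std = p_pred - k_gain @ p_z @ k_gain.T
p_new, a1, b1 = update_covariance(p_pred, k_gain, p_z, a=1.0, b=1.0, lr=0.01, loss=0.5)
p_same, _, _ = update_covariance(p_pred, k_gain, p_z, a=1.0, b=1.0, lr=0.01, loss=10.0)
```

In this toy case the gradient step lowers a and raises b, so the trace of the scaled covariance is smaller than that of the standard update. The sketch does not check that the scaled matrix stays positive semidefinite, which a practical implementation would need to consider.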

A NSVD-UKFNN algorithm

In this article, SVD is used to calculate the square root of the prior covariance matrix ${\widetilde{p}_{k - 1}}$ in UKFNN. Compared to other matrix decomposition methods, SVD exhibits greater stability, making it particularly suitable for handling rank-deficient matrices. This approach enhances the numerical stability of the model. In addition, the prediction variance is reconstructed as a cost function, and the gradient descent method is employed to optimize it to enhance the prediction performance of the model. Based on the state-space equation in Eq. 2, Algorithm 1 provides the complete implementation of the NSVD-UKFNN. Figure 4 illustrates the implementation process of the NSVD-UKFNN method.

Computational complexity analysis

The differences between the UKFNN and the NSVD-UKFNN are as follows:

1. The method for computing the matrix square root is changed from Cholesky decomposition to the more stable SVD. Cholesky decomposition may encounter numerical stability issues when the covariance matrix loses positive definiteness. In contrast, SVD, owing to its singular value-based characteristics, is better equipped to handle such cases, thereby providing more reliable results. The computational complexity of Cholesky decomposition is $O({n^3})$, and that of SVD is also $O({n^3})$. Therefore, using SVD to obtain the matrix square root does not increase the order of computation.
2. This article integrates gradient descent into the update of the prediction variance. Compared with the traditional method, it introduces the construction of the cost function (Eq. 8), the parameter optimization process (Eq. 9), and the prediction covariance matrix update (Eqs. 10 and 11). Among these four equations, Eq. 8 multiplies constants by matrix traces, with a computational complexity of O(n). Eq. 9 involves constant subtraction and differentiation operations, with a computational complexity of O(1). Equations 10 and 11 include an if condition and matrix multiplication operations. The complexity of the if condition is O(1), while the computational complexity of the matrix multiplication inside the if branches is $O({n^2})$. Therefore, the computational complexity of the prediction covariance matrix for NSVD-UKFNN is $O({n^2})$, which is the same as that for UKFNN. This indicates that, although gradient descent is introduced to optimize the prediction variance, it does not increase the order of the algorithm’s computational complexity, so NSVD-UKFNN retains the computational efficiency of the traditional UKFNN. This improvement not only enhances the optimization capability of the algorithm but also retains its computational efficiency, making NSVD-UKFNN more effective and practical in real-world applications.

Case validation

Experimental objectives

Aluminum electrolytic products are extensively used in industries such as aviation, construction, and automotive manufacturing, with production primarily carried out through AEPP systems. A key characteristic of these systems is their high energy consumption, with energy utilization rates below 50%³⁸. Current efficiency and energy consumption are critical economic indicators for AEPP, and improving current efficiency can significantly reduce energy consumption. To modernize AEPP systems and enhance production efficiency, researchers have proposed intelligent decision-making optimization technologies. A fundamental requirement for this approach is the availability of an accurate and reliable current efficiency prediction model. The experimental equipment used in this study is the aluminum electrolytic cell, the core component of the AEPP system, as depicted in Fig. 5. In this context, this article introduces a stable and reliable NSVD-UKFNN algorithm.

Experimental dataset

This article collects real-world daily data from the AEPP system of Chongqing Tiantai Aluminum Co., Ltd. in Southwest China as the experimental dataset to validate the effectiveness of the proposed method for predicting current efficiency. A total of 1505 data samples were collected from actual aluminum electrolysis production to construct the experimental dataset. Based on expert experience³⁸, nine control parameters were selected as inputs for the artificial neural network. During the data preprocessing stage, outliers with current efficiency exceeding 100%, caused by measurement errors, were removed. To ensure data integrity and accuracy, records with missing values were also eliminated. For the missing values in the fluoride salt data, an interpolation method was employed for imputation. The dataset description is provided in Table 1.

Table 1 Description of experimental dataset for current efficiency prediction.


Analysis and discussion of experimental results

To validate the effectiveness of the proposed NSVD-UKFNN algorithm, comparative experiments were conducted with the backpropagation neural network (BPNN), gated recurrent unit (GRU), UKFNN, particle filtering neural network (PFNN), and SVD-UKFNN. The learning rates for both BPNN and GRU were set to 0.001. All experiments were conducted using Python 3.9. The operating system was Windows 10 64-bit, running on hardware comprising an AMD Ryzen 7 5800H CPU (3.2 GHz), 8 GB of memory, and an NVIDIA GeForce RTX 2080 GPU. The error evaluation metrics used in this article are the mean absolute error (MAE), mean squared error (MSE), sum of squared errors (SSE), and coefficient of determination (${r^2}$), defined as follows:

$$\begin{aligned}&MAE = \frac{1}{N}\sum \nolimits _{i = 1}^N {\left| {{y_i} - {{\widehat{y}}_i}} \right| }&\end{aligned}$$

(12)

$$\begin{aligned}&MSE = \frac{1}{N}\sum \nolimits _{i = 1}^N {({y_i} - {{\widehat{y}}_i}} {)^2}&\end{aligned}$$

(13)

$$\begin{aligned}&SSE = \sum \nolimits _{i = 1}^N {{{({y_i} - {{\widehat{y}}_i})}^2}}&\end{aligned}$$

(14)

$$\begin{aligned}&{r^2} = 1 - \frac{{\sum \nolimits _{i = 1}^N {{{({y_i} - {{\widehat{y}}_i})}^2}} }}{{\sum \nolimits _{i = 1}^N {{{({y_i} - \overline{y} )}^2}} }}&\end{aligned}$$

(15)

where $y_i$ is the i-th true value, ${\widehat{y}_i}$ is the i-th predicted value, $\overline{y}$ is the mean of all true values, and N denotes the number of samples. BPNN and GRU require a training set to obtain their optimal weights and thresholds. Therefore, the first 1455 samples of the dataset described above were used as the training set, while the last 50 samples were used as the testing set.
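The four metrics can be computed in plain Python as sketched below. The `r2` function uses the standard definition of the coefficient of determination, 1 - SSE/SST, which is an assumption about the intended formula; the sample values are hypothetical and are not drawn from the experimental dataset.

```python
def mae(y, y_hat):
    """Mean absolute error (Eq. 12)."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)

def sse(y, y_hat):
    """Sum of squared errors (Eq. 14)."""
    return sum((a - b) ** 2 for a, b in zip(y, y_hat))

def mse(y, y_hat):
    """Mean squared error (Eq. 13)."""
    return sse(y, y_hat) / len(y)

def r2(y, y_hat):
    """Coefficient of determination, assumed here to be 1 - SSE / SST."""
    y_bar = sum(y) / len(y)
    sst = sum((a - y_bar) ** 2 for a in y)
    return 1.0 - sse(y, y_hat) / sst

# Hypothetical current-efficiency values in percent.
y_true = [92.0, 93.0, 94.0, 95.0]
y_pred = [92.5, 92.5, 94.5, 94.5]
scores = {
    "MAE": mae(y_true, y_pred),
    "MSE": mse(y_true, y_pred),
    "SSE": sse(y_true, y_pred),
    "r2": r2(y_true, y_pred),
}
```

Note that SSE equals N times MSE, so with a 50-sample test set the two metrics rank models identically and differ only in scale.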

The relationship between neuron number and model prediction performance

A common rule for choosing the number of neurons in the hidden layer of an ANN is

$$\begin{aligned}&h = \sqrt{l + s} + m&\end{aligned}$$

(16)

where h represents the number of neurons in the hidden layer, l represents the number of neurons in the input layer, s represents the number of neurons in the output layer, and m denotes a constant ($1 \le m \le 10$). Based on this rule, h is set to 6, 9, and 12, respectively. The feedforward neural network structure is $9-h-1$. In addition, the activation function from the input layer to the hidden layer is set to Sigmoid, and the activation function from the hidden layer to the output layer is set to ReLU.
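As a quick check of Eq. 16 for the $9-h-1$ structure, the admissible range of h can be evaluated directly. The helper name `hidden_range` is a hypothetical choice for this sketch.

```python
import math

def hidden_range(l, s, m_min=1, m_max=10):
    """Range of h = sqrt(l + s) + m from Eq. 16 for m in [m_min, m_max]."""
    base = math.sqrt(l + s)
    return base + m_min, base + m_max

low, high = hidden_range(l=9, s=1)
candidates = [h for h in (6, 9, 12) if low <= h <= high]
```

With l = 9 and s = 1, the rule gives roughly 4.16 to 13.16, so the three tested values 6, 9, and 12 all fall inside the range.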

Table 2 presents the prediction accuracy of BPNN, SVD-UKFNN, and NSVD-UKFNN when the number of hidden neurons is set to 6, 9, and 12, respectively. It reports seven error evaluation metrics: the maximum error (Max), minimum error (Min), mean error (Mean), MAE, MSE, SSE, and ${r^2}$. Table 2 shows that all three models gradually improve in accuracy as the number of hidden neurons increases. The BPNN method, relying on an optimal model derived from the training set, exhibits a strong dependence on the number of parameters for achieving high prediction accuracy. In contrast, SVD-UKFNN and NSVD-UKFNN adaptively update model parameters in response to environmental changes, so that optimal weights and thresholds are obtained in real time. Consequently, high prediction accuracy is maintained even with fewer parameters. Notably, NSVD-UKFNN consistently maintains an SSE of less than 2 across different numbers of hidden layer neurons, with MAE reduced by approximately a factor of two compared to SVD-UKFNN and ${r^2}$ much closer to 1, indicating very high prediction accuracy and stability. Furthermore, when h is set to 6, 9, and 12, the state variable dimensions for SVD-UKFNN and NSVD-UKFNN are 67, 100, and 133, respectively. The increase in dimensions inevitably requires additional computation time. According to the experimental results, NSVD-UKFNN demonstrates high precision at all tested values of h. Bold values in Table 2 indicate the best results. NSVD-UKFNN therefore offers potential for online applications in production systems.
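The state dimensions quoted above follow from counting the weights and thresholds of a $9-h-1$ network, which gives 11h + 1. A short sketch (the function name is hypothetical):

```python
def state_dimension(n_in, n_hidden, n_out=1):
    """Number of weights and thresholds in an n_in-n_hidden-n_out network."""
    hidden_params = n_in * n_hidden + n_hidden   # w^1 and b^1
    output_params = n_hidden * n_out + n_out     # w^2 and b^2
    return hidden_params + output_params

dims = {h: state_dimension(9, h) for h in (6, 9, 12)}
```

Since the covariance matrix has n × n entries, moving from h = 6 to h = 12 roughly quadruples its size, which is consistent with the remark that larger h costs more time.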

Table 2 The results of different methods with different numbers of neurons in the hidden layer.


Comparative analysis with different models

Figure 6 illustrates the AEPP current efficiency prediction models established in this study using BPNN, GRU, PFNN, UKFNN, SVD-UKFNN, and the proposed NSVD-UKFNN. The PFNN model was evaluated with particle quantities of 100, 500, and 1000 to observe the relationship between particle quantity and model predictive capability, denoted as PFNN(100), PFNN(500), and PFNN(1000), respectively. In this section, the feedforward ANN architectures for BPNN, PFNN, UKFNN, SVD-UKFNN, and NSVD-UKFNN are all configured as $9-9-1$. The learning rate is set to 0.001 for both BPNN and GRU.

BPNN and GRU utilize an optimal model to predict current efficiency. The weights and thresholds of the optimal model are obtained using the BP algorithm based on the training set, making them typical static prediction models. In this research, BPNN is constructed as a $9-9-1$ network structure. Due to the relatively small number of parameters, the optimal weights and thresholds obtained from the training set may not accurately capture the characteristics of the complex production system of aluminum electrolysis. Figure 6a shows that the fitting performance of BPNN on the training set is slightly better than that on the testing set, but the overall fitting performance is relatively poor. The gated structure of GRU effectively enhances the predictive capability of the model. Figure 6b indicates that GRU performs well on the training set. However, the fixed optimal weights and thresholds cannot adapt to changes in the system, resulting in poor fitting performance on the testing set.

Table 3 The prediction results of different methods.


Figure 6c–h illustrate the fitting performance of PFNN, UKFNN, SVD-UKFNN, and NSVD-UKFNN. Figure 6c–e show that the fitting capability of PFNN gradually improves with an increase in the number of particles. The choice of the importance function remains a challenging issue in PFNN research, as does the loss of particle diversity caused by particle degeneracy. As shown in Fig. 6c–e, increasing the number of particles can help alleviate this issue. However, the increase in particle quantity inevitably leads to a significant computational burden, which may not meet the requirements for online control of AEPP. Figure 6f–h indicate that the fitting performances of UKFNN and SVD-UKFNN are comparable. The SVD-UKFNN model achieves superior numerical stability compared to the UKFNN model by employing SVD to calculate the square root of matrices. Additionally, the SVD decomposition method reduces computational costs. Therefore, SVD-UKFNN is selected in this article as the baseline model for designing the AEPP current efficiency prediction framework. Furthermore, Fig. 6h demonstrates that reconstructing the MSE of the state variables in NSVD-UKFNN as a cost function and optimizing it using gradient descent significantly enhances the model’s predictive capability.

Figure 7 shows how the MAE evolves during the prediction process of each model. Figure 7a shows that BPNN exhibits a relatively large fitting error, that the testing error of GRU is significantly greater than its training error, and that UKFNN has a considerable error in the initial phase. Figure 7b indicates that the prediction errors of UKFNN and SVD-UKFNN are comparable. Figure 7c shows that NSVD-UKFNN converges faster and is more stable overall.

Table 3 shows the results of the different current efficiency prediction methods. The NSVD-UKFNN algorithm shows a significant advantage over the other methods in all error metrics. Furthermore, the prediction errors of UKFNN and SVD-UKFNN are comparable, indicating that SVD and Cholesky decomposition achieve the same level of accuracy when computing the square root of matrices. In summary, the NSVD-UKFNN proposed in this article exhibits the highest prediction accuracy and stability. Bold values in Table 3 mark the best result for each metric.
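For reference, the two error metrics quoted for the comparison, MAE and SSE, can be computed as in the sketch below. The function names and the three sample values are purely illustrative and are not taken from Table 3. The helper `reduction_factor` reflects one possible reading of statements such as "reduced by 2.08 times", namely the ratio of the baseline error to the proposed model's error; that reading is an assumption.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(np.asarray(y_true) - np.asarray(y_pred))))


def sse(y_true, y_pred):
    """Sum of squared errors."""
    return float(np.sum((np.asarray(y_true) - np.asarray(y_pred)) ** 2))


def reduction_factor(baseline_error, proposed_error):
    """Assumed reading of 'reduced by k times': baseline / proposed."""
    return baseline_error / proposed_error


# Illustrative values only (not data from the study).
y_true = [93.0, 94.0, 95.0]
y_pred = [93.5, 93.0, 95.5]
print(mae(y_true, y_pred))  # absolute errors 0.5, 1.0, 0.5 -> mean 2/3
print(sse(y_true, y_pred))  # 0.25 + 1.0 + 0.25 = 1.5
```

Because SSE squares each residual, a model that removes a few large errors can improve SSE far more than MAE, which is consistent with the two metrics improving by different factors.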

Table 4 The results of disturbance experiments with different methods.

Disturbance experiment

In practical applications, the AEPP involves multiple operations, such as feeding and discharging, during which complex physical and chemical reactions are prone to occur under high-temperature conditions, causing significant disturbances to the process. To evaluate the learning capability of NSVD-UKFNN in response to abrupt changes in the AEPP, disturbances were artificially introduced to the input data using the formula $X = X + g \times \mathrm{rand} \times X$, where $X$ represents the input data, $\mathrm{rand}$ denotes a uniformly distributed random number in the range $(-1,1)$, and $g$ is the disturbance factor. To investigate the effect of varying disturbance levels on prediction performance, four disturbance scenarios were designed, with the disturbance factor $g$ set to ± 0.1%, ± 0.5%, ± 1%, and ± 5%. As shown in Table 4, the error of all methods increases with the magnitude of the disturbance. However, compared to the other methods, NSVD-UKFNN demonstrates superior robustness in prediction performance under these disturbance scenarios. Bold values in Table 4 mark the best result for each scenario.
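The disturbance rule, read as $X \leftarrow X + g \times \mathrm{rand} \times X$, can be sketched as follows. This is a minimal illustration under stated assumptions: one random number is drawn independently for every element of the input (the text does not say whether a single value is shared), the function name `disturb` is hypothetical, and the sample input is random placeholder data rather than process measurements. NumPy's `uniform` samples the half-open interval $[-1, 1)$, which differs from $(-1, 1)$ only at a single endpoint.

```python
import numpy as np


def disturb(X, g, rng):
    """Apply X + g * rand * X with rand ~ Uniform(-1, 1), element-wise.

    Drawing one random number per element is an assumption of this sketch.
    """
    X = np.asarray(X, dtype=float)
    r = rng.uniform(-1.0, 1.0, size=X.shape)
    return X + g * r * X


rng = np.random.default_rng(0)
X = rng.uniform(1.0, 10.0, size=(5, 9))   # placeholder: 5 samples, 9 inputs

# Four disturbance levels: 0.1%, 0.5%, 1% and 5%
for g in (0.001, 0.005, 0.01, 0.05):
    Xd = disturb(X, g, rng)
    max_rel = np.max(np.abs(Xd - X) / np.abs(X))
    print(f"g={g}: max relative deviation {max_rel:.5f}")
```

By construction, each perturbed value stays within a relative band of ±g around the original value, so the four scenarios correspond to bands of ±0.1%, ±0.5%, ±1%, and ±5%.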

Conclusion

This article enhances the UKFNN algorithm by improving its numerical stability through the use of SVD to compute the square root of the prior covariance matrix. Additionally, a novel algorithm is developed that integrates gradient descent into the SVD-UKFNN framework. Specifically, the prediction variance of the state variables in the SVD-UKFNN is reformulated as a cost function, and gradient descent is utilized for optimization, ensuring that the prediction variance consistently remains at a low level. The proposed NSVD-UKFNN demonstrates significant advantages, reducing the MAE by 2.08 times and the SSE by approximately 22.23 times compared to the baseline model.

Current efficiency and energy consumption are two key technical and economic indicators in the AEPP, and they are crucial for evaluating and optimizing the overall performance of the process. The focus of this study is on the prediction of current efficiency. By establishing and verifying relevant models, this study provides a theoretical basis and technical support for improving current efficiency. In future research, the application of closed-loop optimization and prediction in AEPP can be further explored. These new ideas can adapt to the dynamic changes of process parameters, and provide a more comprehensive and effective solution for the performance improvement and optimization of AEPP.

Data availability

The datasets used and/or analyzed during the current study are available from the corresponding author Xihong Fei on reasonable request via e-mail fxhong@mail.ustc.edu.cn. Although the training model has not yet been publicly released, we are considering the following measures to enhance the reproducibility of our research and contribute to the community in future studies: Model sharing: we plan to publish the code and weights of the trained model in a public GitHub repository when appropriate, allowing other researchers to access and utilize them. Detailed documentation: if we choose not to release the model at this time, we will ensure that the GitHub repository includes clear instructions for other researchers on how to obtain the model weights or build the model, including necessary environment setup and code examples. Ongoing updates: we commit to continuously improving and updating the availability of data and the model based on feedback, ensuring it provides practical value to the broader scientific community.

Author information

Authors and Affiliations

School of Electronic and Electrical Engineering, Chongqing University of Science and Technology, Chongqing, 401331, China
Xiaoyan Fang
School of Electrical and Information Engineering, Anhui University of Technology, Ma’anshan, 243002, China
Xihong Fei, Kang Wang & Tian Fang
School of Electrical and Optoelectronic Engineering, West Anhui University, Lu’an, 237012, China
Rui Chen

Contributions

This manuscript was written by F.X. and X.F. R.C. evaluated the feasibility of the algorithm. T.F. and K.W. prepared the manuscript figures. X.F. assisted R.C. in making the experimental plans. F.X., X.F. and K.W. polished the language of the full manuscript.

Corresponding author

Correspondence to Xihong Fei.

Ethics declarations

Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this article.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Fang, X., Fei, X., Wang, K. et al. A novel SVD-UKFNN algorithm for predicting current efficiency of aluminum electrolysis. Sci Rep 15, 9173 (2025). https://doi.org/10.1038/s41598-025-94210-y

Received: 20 October 2024
Accepted: 12 March 2025
Published: 17 March 2025
DOI: https://doi.org/10.1038/s41598-025-94210-y