Efficient estimating and clustering lithium-ion batteries with a deep-learning approach

Wu, Jie; Sun, Zhongxian; Li, Dingquan; He, Weilin; Yang, Dongchen; Wu, Zhenguo; Geng, Xin; Yang, Hui; Wang, Hailong; Hu, Linyu; Tu, Haiyan; He, Xin

doi:10.1038/s44172-025-00488-1

Article
Open access
Published: 12 August 2025

Jie Wu¹^na1,
Zhongxian Sun¹^na1,
Dingquan Li²^na1,
Weilin He¹,
Dongchen Yang³,
Zhenguo Wu⁴,
Xin Geng⁴,
Hui Yang⁵,
Hailong Wang⁴,
Linyu Hu⁶,
Haiyan Tu¹ &
…
Xin He ORCID: orcid.org/0000-0002-0272-9079^1,3

Communications Engineering volume 4, Article number: 151 (2025)

Abstract

Growing energy storage demand has solidified the dominance of lithium-ion batteries (LIBs) in modern societies but intensifies recycling pressures. Precise state-of-health (SOH) assessment is crucial to grouping retired batteries from an unknown state for secondary utilization. However, batteries in the pack exhibit distinct capacity fading behaviors due to their service scenarios and working conditions. We develop a deep-learning framework for rapid, transferable SOH estimation and battery classification. This framework integrates deep neural networks with interconnected electrochemical, mechanical, and thermal features. Our model delivers optimal accuracy with a mean absolute error (MAE) of 0.822% and a root mean square error (RMSE) of 1.048% using combined features. It demonstrates robust performance across various conditions and enables SOH prediction with data from merely one previous cycle. Moreover, the well-trained model could adapt to other electrode systems with a minimal number of additional samples. This work highlights critical features for SOH estimation and enables efficient battery classification toward sustainable recycling.

Introduction

Rechargeable lithium-ion batteries (LIBs) are widely used in portable electronics¹, electric vehicles (EV)², and energy storage systems³. As the demand for clean and renewable energy grows, the diverse applications of LIBs in electrical energy storage will make a significant contribution to reducing carbon emissions and ultimately mitigating global warming⁴. Modern commercial LIBs have been highly optimized, from chemical composition to manufacturing technology, enabling a service lifespan ranging from months to decades. Energy is efficiently stored and utilized through reversible electrochemical reactions within the battery, while a certain level of irreversible degradation reactions also occurs with the cycling process, including active material loss⁵ and increased impedance⁶ over time. This degradation leads to performance issues like capacity fading, mechanical failure, and thermal instability⁷. To prevent unexpected failures and safety concerns, batteries in EVs should be replaced once their real-time capacity drops to 80% of the initial capacity⁸.

The rapidly growing battery industry faces an imminent increase in spent LIBs, creating a substantial waste management challenge for sustainable development. However, this also presents an opportunity to recover economic value by recycling these batteries into raw materials or repurposing them for secondary utilization⁹. Currently, secondary applications of LIBs are typically carried out at the module level¹⁰. However, there remains a latent risk posed by severely degraded individual cells within the module. To accurately cluster each individual cell based on its state of health (SOH), the most reliable approach involves disassembling the pack or module for precise identification. This assessment procedure maximizes the residual value of the cells, as opposed to opting for complete recycling when some cells are failing¹¹. By ensuring the optimal performance and safe operation of regrouped cells in their reconditioned second life, this method enhances the overall value and functionality of the battery system. However, several challenges arise, including the mixture of cells with unknown chemical compositions or varying types and sizes from different manufacturers^12,13. Specific application scenarios and dynamic operating conditions, such as current rates, voltage ranges, and temperature fluctuations, further complicate the situation¹⁴. Additionally, each cell within a pack or module displays a unique non-linear degradation pattern due to thermal diffusion and pressure distribution¹⁵. Therefore, accurately estimating the SOH of spent LIBs or unknown-condition LIBs is crucial for effective clustering, which in turn enhances recycling safety and sustainability¹⁶.

Previous studies on SOH estimation for LIBs using model-based approaches include the equivalent circuit model (ECM)¹⁷, Kalman filter (KF) model¹⁸, and pseudo-two-dimensional (P2D) model¹⁹. The accuracy of these models relies not only on physical insights and intricate calculations from a lengthy development process, but also on an understanding of the degradation mechanisms affecting electrochemical response and dynamic behaviors, such as ion diffusion²⁰. Additionally, model-based approaches face challenges in scaling across variations in chemistry, manufacturers, and operating conditions²¹. To address these challenges, some recent research has shifted towards data-driven approaches for estimating battery SOH^22,23. The data-driven approach applies statistical machine learning to experimental datasets to establish correlations across diverse operating conditions, including various cell chemistries and manufacturers, without modeling the physical mechanisms²⁴. Various methods have been used for battery SOH estimation, such as support vector machines (SVM)²⁵, random forests (RF)²⁶, long short-term memory (LSTM) networks²⁷, and convolutional neural networks (CNN)²⁸. Extracting correlated features, such as peak area, peak position, and peak width, from incremental capacity curves is considered a well-established technique for estimating battery SOH^29,30,31. However, most of these features are derived from complete charge and discharge curves, which require high-frequency data acquisition to obtain incremental capacity and differential voltage curves. This process involves handling large amounts of data, placing a significant burden on the battery management system³².
To overcome the limitation of voltage profiles, Michael Knapp et al.³³ developed a mathematical approach to convert the voltage profiles in the relaxation state into statistical features, which enables data within any voltage interval to be extracted and converted into statistical features. To realize rapid estimation, Darius Roman et al.³² employed a voltage thresholding approach to extract features within 15 minutes from the charging voltage and charging current curves. The limitation of this approach lies in the voltage threshold: an inappropriate setting may lead to the loss of information that is important for estimation. In practice, deep learning-based methods are able to derive appropriate data-driven models that accomplish accurate battery SOH estimation under frequently changing load, usage conditions, and electrode materials. A transfer learning framework allows retraining the model to correct its deviation between the source domain and the target domain^34,35. Ye Yuan et al.³⁶ developed a deep transfer learning approach to realize personalized battery health status prediction using the cycling knowledge from cells with completely different usage protocols, charge–discharge configurations, and battery chemistries. While these approaches have demonstrated satisfactory performance based solely on sufficient electrical features, the demand for large quantities of high-quality data remains a significant challenge. This requirement burdens the practical application of data-driven methods, impacting their speed, accuracy, and suitability for deployment.

An alternative is to reduce the scale of data processing while still obtaining reliable, relevant features for accurate estimation in practical applications. Compared to features extracted from current-voltage datasets, electrochemical impedance spectroscopy (EIS) provides representative information on material properties and interfacial conditions, which correlates with degradation inside the battery³⁷. Although it is debatable whether a non-unique equivalent circuit model fitted to the spectrum fully captures the physical, chemical, and material properties as well as the degradation process within the battery, the low-frequency region of the EIS spectra provides valuable features for accurate SOH estimation³⁸. In addition to changes in impedance spectra, cells also exhibit thermal and mechanical variations during the electrochemical process, which serve as alternative indicators for battery SOH. Correlating internal stress diagnostics with the voltage profile provides insights into chemo-mechanical processes at the interfaces and within electrodes, and helps distinguish the porosity behavior of Si-based electrodes³⁹. Decoupling temperature and pressure signals allows tracking of chemical events, such as solid electrolyte interphase (SEI) formation and structural evolution⁴⁰. The degradation reactions proceed in the form of irreversible volume expansion⁴¹ and thickening of the SEI layer⁴², associated with the accumulation of thermal and mechanical effects. Consequently, these thermal and mechanical features offer valuable insights for fundamental research and hold potential for enhancing model training.

In this study, we designed a deep neural network (DNN) model for efficiently estimating and clustering retired and unknown-state LIBs. The constructed framework integrates electrical features with thermal or mechanical features from a dataset of 46 pouch cells (including commercial and lab-assembled cells) with varied manufacturers, cell sizes, electrode materials, and usage conditions. As illustrated in Fig. 1, our approach begins with a base model using the LiFePO₄ (LFP) battery system under various operating conditions, including current densities, cycle numbers, and cell sizes. We developed a dynamic voltage thresholding method to extract critical features for SOH estimation in 269 seconds. The data fusion method, combining thermal or mechanical features, improves the performance of the model by reducing mean absolute error (MAE) from 1.786% to 0.822% and root mean square error (RMSE) from 1.859% to 1.048%. Notably, our model achieves rapid and accurate estimation using only one discharge profile. Validation at room temperature with varied current rates demonstrated strong conditional adaptability, with an MAE of 1.382% and an RMSE of 1.635%. Fine-tuning model parameters with 1109 and 2631 experimental samples from LiNi_0.6Co_0.2Mn_0.2O₂ (NCM622) and LiNi_0.92Co_0.05Al_0.03O₂ (NCA) batteries, respectively, achieved transfer model performance metrics of MAE 1.067% and 0.817%, and RMSE of 1.186% and 0.939%. These results highlight the efficiency, versatility, and accuracy of our model in estimating battery SOH, enabling reliable classification for sustainable recycling. Following SOH estimation, retired batteries are first clustered via K-means based on critical degradation indicators, then refined through multimodal metric analysis (e.g., applying an 80% SOH threshold) to align with specific secondary applications.
This two-tiered classification framework mitigates the intrinsic heterogeneity of retired batteries while supporting customizable, application-driven reuse strategies.

Results

Data generation

In this study, 46 LIBs from three suppliers (Supplementary Note 1. Datasets and Supplementary Table 1) were subjected to cyclic aging tests under controlled experimental conditions, generating a total of 71,559 data-cycles. The datasets are divided into four groups based on operating temperature, charging/discharging current rate, and electrode material types: group G-I, G-II, G-III and G-IV (Supplementary Table 1). Each cell and its corresponding operating conditions are detailed in Supplementary Table 2.

The electrical dataset commonly used in battery management systems (BMS) includes voltage, current, and capacity, recorded during battery charging and discharging at a frequency of 1 Hz. Though the energy conversion process is reflected by an electrical signal, the details of the underlying electrochemical reactions are obscured. To address this, our approach also collects a dataset of relevant physical features, including surface temperatures at three different points (T₁, T₂, and T₃), ambient temperature (T), and surface strain (S), recorded by a micro-control processor combined with sensors at 1 Hz. This provides indirect insights into structural degradation and internal resistance evolution. The sensor array placement on the cell surface is shown in Supplementary Fig. 1. The collected signals are pre-processed to enhance quality by removing outliers and filling missing values, as illustrated in Supplementary Note 5. The file formats, data names, and units of the dataset are also normalized to ensure data consistency.

Designing dynamic voltage thresholding to extract features

The estimation of battery SOH is primarily performed based on the extracted features correlated with the discharge profile⁴³. Figure 2a shows the voltage curve of cell G-I#2 during the discharge process. The voltage curve shifts slightly to the left as the cycle number increases, which is attributed to the aging of the battery. This pronounced trend can be captured as statistical features by mathematical calculation^32,33,44: each discharging voltage curve is transformed into four statistical features, namely the mean (V__mean), kurtosis (V__kur), skewness (V__sk), and standard deviation (V__std) of the voltage.

**Fig. 2: Feature extraction and analysis on cell G-I#2.**

Overlaying the discharge profiles reveals a knee compression after the long plateau region, which distinctly reflects capacity fading behaviors. To avoid information redundancy when using the entire voltage profile for model training, a dynamic voltage thresholding method with specific rules is designed (Fig. 2b) to effectively extract features from the discharging voltage curve. The voltage window is bounded by a lower limit of voltage (V_l) and an upper limit of voltage (V_u). The V_l is fixed at the discharging cutoff voltage, while the V_u is set either to the initial discharge voltage or to sliding values within a 2.60 V–3.20 V range with an interval of 0.05 V. The time region (TR) is determined by V_u, and the length of each TR for the experimental samples corresponding to LFP cells (1 Ah) in the G-I dataset is collected (Fig. 2c).
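
The windowing step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `candidate_upper_limits` and `extract_time_region` are hypothetical, the sketch simply keeps the samples lying between V_l and V_u, and the "specific rules" of Fig. 2b are not reproduced. It assumes a 1 Hz voltage trace, as stated for the dataset.

```python
from typing import List, Tuple


def candidate_upper_limits(start: float = 2.60, stop: float = 3.20,
                           step: float = 0.05) -> List[float]:
    """Sliding V_u values from 2.60 V to 3.20 V in 0.05 V steps (inclusive)."""
    count = int(round((stop - start) / step)) + 1
    return [round(start + step * k, 2) for k in range(count)]


def extract_time_region(voltage: List[float], v_upper: float, v_lower: float,
                        sample_rate_hz: float = 1.0) -> Tuple[List[float], float]:
    """Keep the discharge samples with v_lower <= v <= v_upper.

    Returns the retained samples and the time-region (TR) length in seconds,
    which at 1 Hz equals the number of retained samples.
    """
    segment = [v for v in voltage if v_lower <= v <= v_upper]
    return segment, len(segment) / sample_rate_hz


# Toy discharge curve (illustrative values only, not measured data).
curve = [3.30, 3.25, 3.20, 3.10, 3.05, 3.00, 2.80, 2.60, 2.50]
segment, tr_seconds = extract_time_region(curve, v_upper=3.05, v_lower=2.50)
print(segment, tr_seconds)
```

Lowering `v_upper` shortens the TR and therefore the acquisition time, at the cost of discarding part of the plateau region.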

The relative temperature (T1′ = T1 − T) profile of cell G-I#2 as cycling proceeds is shown in Fig. 2d. Supplementary Fig. 2 displays temperature variations at spatially distributed points and ambient temperature curves for different cells. The analog strain (filtered using Kalman filtering and calculated from raw data) of cell G-I#2 is presented in Fig. 2e, with surface strain curves for different cells shown in Supplementary Fig. 3. These thermal and mechanical curves exhibit quantifiable variations across cycles, suggesting a correlation between physical features and capacity fading. The mean relative temperature (T′__mean), which averages T1′, T2′, and T3′, and the mean of the strain (S__mean) are calculated as features for estimating the SOH of the cells in each TR (Supplementary Fig. 4). Supplementary Figs. 5 and 6 illustrate the correlation of voltage, temperature, and strain with capacity.

Capability of single features to estimate battery SOH in different TRs

The prediction capability (ρ) of individual features across different TRs is assessed, and the values for LFP cells (1 Ah) in the G-I dataset are listed in Fig. 3a. The ρ of the voltage feature increases gradually with TR length, reaches its optimal value with V_u in the range of 2.95 V to 3.10 V, and then declines with further increases in V_u. The ρ of the strain feature remains consistently around 0.73 across all TRs, while the ρ of the temperature feature initially decreases as V_u is lowered, but stabilizes between 3.10 V and 2.60 V. These differences in mechanical and thermal factors stabilize the model framework through their inherent physical correlations during the electrochemical process.

**Fig. 3: Predictive capability (ρ) and estimation performance with single feature.**

The performance of SOH estimation with single feature input is displayed in Fig. 3b, c, with the MAE and RMSE across different TRs. The performance varies among the representative features during the discharge process. V__sk displays high error values overall. In contrast, V__mean, V__kur, V__std, S__mean and T′__mean exhibit random variations at a lower MAE and RMSE level, reflecting their association with state of charge (SOC) related ionic de/intercalation and internal resistance. The averaged MAE and RMSE for all the LFP (1 Ah) cells in G-I are summarized in Fig. 3d–g according to the feature type and time region. Figure 3d, f illustrates the time-wise performance for each feature, revealing that the mean voltage, mean strain, and mean relative temperature consistently demonstrate superior performance. As a higher-order statistic, voltage skewness captures only specific asymmetries in the voltage profile, which may explain its limited predictive power. While it can offer some insights into battery degradation, LIBs undergo complex and non-linear aging processes, such as SEI layer formation, lithium plating, and electrolyte decomposition. These degradation mechanisms can manifest in ways that are not directly captured by voltage skewness alone. Figure 3e, g displays the MAE and RMSE for features across different TRs. Notably, a V_u of 3.05 V yields the lowest MAE and RMSE values. As indicated in Fig. 2c, the TR at 3.05 V is 269 seconds, which reduces the time cost by 90.52% compared to the 2,837 seconds required when V_u is set to the initial voltage of the discharging process. Details of the experimental setup are provided in Supplementary Table 3.
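
As a quick check, the quoted time saving follows directly from the two TR lengths (269 s at V_u = 3.05 V versus 2,837 s for the full discharge profile):

$$\frac{2837\,\mathrm{s}-269\,\mathrm{s}}{2837\,\mathrm{s}}=\frac{2568}{2837}\approx 0.9052=90.52\%$$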

Combined features to estimate battery SOH

To enhance the efficiency and accuracy of battery SOH estimation, the electrical, thermal and mechanical features are combined. This combination includes voltage features (V: V__mean, V__kur, V__sk, V__std), temperature features (T: T′__mean) and strain features (S: S__mean). Four combinations are tested: V, VT, VS and VTS. Analysis shows that the shortest extraction time occurs at the V_u of 2.60 V (31 seconds, as shown in Fig. 2c). The best prediction performance for a single feature is observed at V_u of 3.05 V. Setting V_u at the initial voltage covers the complete information of the discharge profile. Thus, three upper voltage limits (V_u) at 2.60 V, 3.05 V, and 3.20 V are selected for comparison. A K-fold cross-validation method (Supplementary Note 2) is employed to estimate the SOH for all the cells in the G-I dataset (details are provided in Supplementary Table 4).
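
The exact fold construction is given in Supplementary Note 2 and is not reproduced here. As a hedged sketch only, one common way to set up such a validation is to split at the cell level, so that every cell is held out exactly once and no cycles from a test cell leak into training. The helper `kfold_by_cell` and the choice of seven folds are assumptions made for illustration.

```python
from typing import List, Tuple


def kfold_by_cell(cell_ids: List[str], k: int) -> List[Tuple[List[str], List[str]]]:
    """Split cell IDs into k (train_cells, test_cells) pairs.

    Each cell appears in exactly one test fold, so all cycles of a held-out
    cell stay out of the training data for that fold.
    """
    if not 2 <= k <= len(cell_ids):
        raise ValueError("k must be between 2 and the number of cells")
    folds = [cell_ids[i::k] for i in range(k)]
    splits = []
    for i, test_cells in enumerate(folds):
        train_cells = [c for j, fold in enumerate(folds) if j != i for c in fold]
        splits.append((train_cells, test_cells))
    return splits


# Fourteen G-I cells, labelled in the style used in the text (e.g. G-I#2).
cells = [f"G-I#{i}" for i in range(1, 15)]
for train_cells, test_cells in kfold_by_cell(cells, k=7):
    print(len(train_cells), test_cells)
```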

Figure 4a presents the SOH estimation results for cell G-I#6 using the base features in the TRs corresponding to the V_u of 2.60 V, 3.05 V, and 3.20 V, based on data accumulated from the previous ten cycles. The V_u of 3.05 V yields the best estimation results compared to V_u values of 2.60 V and 3.20 V. The density distribution of capacity estimation errors shown in the inset reveals that errors are more tightly clustered around zero at a V_u of 3.05 V, in contrast to the broader error distributions observed at V_u values of 2.60 V and 3.20 V. Cross-validation results shown in Fig. 4b indicate that the predicted SOH closely matches the actual SOH. This suggests that the base features extracted from this TR for fourteen cells operated at 20 °C, 30 °C and 40 °C in the G-I dataset offer high-performance and stable inputs for SOH estimation.

**Fig. 4: K-fold cross-validation method for the SOH estimation of each cell in G-I Dataset.**

The averaged MAE and RMSE of the SOH estimation using various combined features at a V_u of 3.05 V are displayed in Fig. 4c. With the addition of auxiliary features, the estimation accuracy is enhanced compared to using a single feature alone. The best performance is achieved by incorporating both the thermal feature and the strain feature, resulting in an improvement of 53.98% in MAE and 43.63% in RMSE. These results provide solid evidence for the value of integrating relevant features, and reveal the contribution of combined features to substantially improving the accuracy of SOH estimation.
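
For reference, the two error metrics and the relative improvement can be computed as in the sketch below. The function names are illustrative; the before and after values are the MAE and RMSE figures reported in the Introduction (1.786% to 0.822% and 1.859% to 1.048%), and the toy SOH values are made up.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error between actual and predicted SOH."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error between actual and predicted SOH."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def relative_improvement(before: float, after: float) -> float:
    """Percentage reduction of an error metric."""
    return (before - after) / before * 100.0


# Toy SOH values in percent (illustrative only).
print(mae([100.0, 90.0], [99.0, 92.0]), rmse([100.0, 90.0], [99.0, 92.0]))

# Reported errors for voltage-only versus combined features.
print(round(relative_improvement(1.786, 0.822), 2))  # MAE improvement in %
print(round(relative_improvement(1.859, 1.048), 2))  # RMSE improvement in %
```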

Model deployment capability

In practical application, models should be able to rapidly adapt to various operating conditions, such as fluctuating temperature, current densities, and service lifetime. Unlike the constant operating conditions in G-I, the G-II dataset is collected under room conditions with temperature ranging from 20 °C to 40 °C, while the G-III dataset involves variation in both temperature and current rates (Supplementary Tables 1 and 2). To assess model robustness, the models are first trained using the G-I dataset at the V_u of 3.05 V with different combined features, and then tested on the G-II and G-III datasets. Detailed experimental information is provided in Supplementary Table 5.

The prediction accuracy for the G-II dataset is shown in Fig. 5a, with specific cell results detailed in Supplementary Fig. 7, demonstrating high prediction capability under variable temperatures. Despite the susceptibility of LIBs to thermal factors, the estimation errors for each cycle with combined features remain predominantly clustered around zero (Fig. 5b). However, the presence of multiple peaks suggests that the thermal and strain features are more noticeably affected by environmental temperature. It should be noted that both the G-I and G-II datasets are collected under relatively low current rates, with limited influence from Joule heating. Nevertheless, room temperature fluctuations still affect the signals collected from individual samples. In contrast, the combination of voltage, temperature, and strain features achieves lower error rates due to their complementary effects. When an additional variable, such as current density, is introduced, the estimated SOHs are still confined within a narrow range around the real SOH but exhibit a more disordered and discrete pattern (Fig. 5c). The introduction of variable current density, especially at high rates, alters the weights of the thermal and strain factors in the estimation framework. Consequently, as shown in Fig. 5d, the peaks of GII-VT, GII-VS, and GII-VTS become more dispersed. Figure 5e presents the averaged MAE and RMSE of the estimation results for the G-II and G-III datasets. The model utilizing the combined VTS features achieves the best estimation results across all the sample groups, but the difference between VT and VS becomes smaller when current rate is added as a variable. This is attributed to the role of different factors in complex cases. When an electrochemical process involves strong heat release from Joule heating, both electrical and thermal features become dominant in the model.
Therefore, developing models with more relevant features is a promising approach to address multi-dimensional variations in practical applications.

**Fig. 5: Battery SOH estimation results for G-II and G-III datasets.**

To highlight the efficiency of our model in rapidly estimating battery SOH from unknown states, we use data from just one previous cycle. As shown in Supplementary Fig. 8, our model with combined VTS features achieves low MAE (3.013–6.319%) and RMSE (3.093–6.471%) across these three groups. The experimental details are provided in Supplementary Note 3 and Supplementary Table 6.

Model validation by transfer learning

To assess the applicability of our models to other types of electrode materials, a test is conducted with the dataset from G-IV, which collects the electrochemical and physical features from NCM and NCA cells. Given the difference in voltage, temperature and strain profiles between LFP and NCM/NCA cells (Supplementary Fig. 9), the models are retrained to accommodate these variations (the V_u settings used to extract features are given in Supplementary Note 4 and Supplementary Fig. 10). The models were fine-tuned (FT) by adjusting the weights using three NCM samples and three NCA pouch samples from the G-IV dataset. Transfer learning was employed to improve the adaptability to the NCM and NCA cells. The estimation performance of the models before and after FT is shown in Fig. 6a, b, with cell-specific results shown in Supplementary Fig. 11. The experimental details are outlined in Supplementary Table 7.

**Fig. 6: The material domain transfer validation for SOH estimation of each cell in G-IV (LiNi_0.6Co_0.2Mn_0.2O₂ (NCM) and LiNi_0.92Co_0.05Al_0.03O₂ (NCA) battery).**

Compared to the models prior to FT, the models after FT exhibit a more concentrated distribution of estimation error around the ideal case (Fig. 6c), with lower MAE and RMSE values (Fig. 6d). This improvement indicates the effectiveness of the FT in mitigating the adverse effects caused by feature differences during the transfer learning process. It is observed that both the pre- and post-FT models exhibit higher errors in SOH estimation for NCM cells compared to NCA cells. To investigate this discrepancy, we analyzed the distribution of cycling numbers for LFP, NCM, and NCA cells under the same operating conditions. Notably, the NCA cells exhibit a sample size distribution pattern similar to that of LFP cells (Fig. 6e), suggesting that the cycling number weights in the model trained on the G-I dataset are more aligned with those of the NCA cells. Despite some level of estimation error across different electrode materials, these results provide strong evidence that our model can be effectively applied to different electrode material domains. This capability makes it well-suited for clustering mixed spent cells for sustainable recycling.

Classification for batteries after service

As demonstrated in previous sections, the model developed in this work excels in both efficiency and accuracy when assessing the SOH of LIBs with various electrode materials and pouch sizes. Given that safety is a top priority for battery operation, accurately determining the SOH of unknown cells is crucial for appropriately categorizing retired batteries for secondary use. To address this, a K-means clustering model was created based on the SOH and voltage standard deviation using aging data from 14 LFP batteries in the G-I dataset, as well as 6 NCM and 6 NCA batteries in the G-IV dataset.

As illustrated in Fig. 7, retired batteries are preliminarily categorized into distinct groups based on their capacity retention (SOH) and voltage stability (quantified by standard deviation); safety thresholds (e.g., SOH < 80%) are subsequently applied during post-clustering refinement to optimize battery allocation for secondary applications. This clustering serves as a priority-grading step to guide secondary applications (e.g., grid storage, low-power devices) rather than replacing safety-critical evaluations.
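
The two-tier procedure (cluster first, then apply a threshold) can be illustrated with a small self-contained sketch. It is not the authors' code: it uses a plain Lloyd's K-means on min-max scaled (SOH, voltage standard deviation) pairs, a deterministic initialisation chosen here for reproducibility, made-up sample values, and hypothetical helper names and labels. Only the 80% SOH threshold comes from the text.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]  # (SOH in %, voltage standard deviation in V)


def _min_max_scale(points: List[Point]) -> List[Tuple[float, ...]]:
    """Scale each feature to [0, 1] so SOH does not dominate the distance."""
    cols = list(zip(*points))
    lo = [min(c) for c in cols]
    span = [(max(c) - min(c)) or 1.0 for c in cols]
    return [tuple((v - l) / s for v, l, s in zip(p, lo, span)) for p in points]


def kmeans(points: List[Point], k: int, max_iter: int = 100) -> List[int]:
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    data = _min_max_scale(points)
    # Assumed initialisation: k seeds spread evenly along the SOH-sorted order.
    order = sorted(range(len(data)), key=lambda i: data[i][0])
    centroids = [data[order[(2 * j + 1) * len(data) // (2 * k)]] for j in range(k)]
    labels: Optional[List[int]] = None
    for _step in range(max_iter):
        new_labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in data
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(data, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels


def refine(points: List[Point], labels: List[int], soh_threshold: float = 80.0) -> List[str]:
    """Post-clustering refinement: cells below the SOH threshold are flagged."""
    return ["recycle" if soh < soh_threshold else f"reuse-cluster-{lab}"
            for (soh, _std), lab in zip(points, labels)]


# Made-up cells: three lightly aged, three moderately aged, three heavily aged.
cells = [(98.0, 0.010), (97.0, 0.012), (96.5, 0.011),
         (88.0, 0.030), (87.0, 0.032), (86.0, 0.031),
         (78.0, 0.060), (77.0, 0.062), (76.0, 0.061)]
labels = kmeans(cells, k=3)
print(refine(cells, labels))
```

Scaling matters here because SOH spans tens of percentage points while the voltage standard deviation spans hundredths of a volt; without it the second feature would have almost no influence on the grouping.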

**Fig. 7: Clustering results by K-means.**

LFP Cluster 1 encompasses cells exhibiting minimal degradation, directly recommended for secondary reuse in high-performance systems. Cluster 2 (80.00–91.90% SOH) includes cells with moderate capacity loss but stable voltage profiles, which are preliminarily flagged for non-critical applications like grid storage, while Cluster 3 (75.33–80.00% SOH) comprises severely aged cells marked for immediate recycling to recover raw materials. In contrast, the NCM dataset is segmented into five clusters due to its limited sample size and heterogeneous degradation patterns. Cells with a SOH below 76.1% are flagged for disassembly to recover raw materials, while the NCA dataset employs a separation boundary at 77.5% SOH for similar material recovery purposes. A detailed overview is provided in Supplementary Fig. 12. Critically, clustering is not the final decision-making step. For applications requiring stringent safety or performance thresholds (e.g., electric vehicles, high-power systems), post-clustering analysis requires the integration of additional parameters such as temperature trends, strain measurements, and voltage fluctuations to comprehensively assess the suitability of cells for secondary use. This aligns with industrial workflows, where broad initial sorting precedes more detailed evaluation. By combining SOH ranges with multi-modal degradation indicators, our methodology prioritizes efficiency in preliminary sorting. This approach addresses the inherent variability in retired batteries and supports tailored reuse strategies based on real-world requirements.

Conclusion

Current commercial battery SOH estimation relies heavily on electrical data to evaluate degradation levels over the lifetime, which consumes enormous time and resources for data collection, processing, and analysis by the BMS. Redundant datasets do little to further improve the prediction accuracy, and cannot meet the requirement of fast estimation and classification. Hence, in the present study, we design a deep-learning framework to achieve efficient and reliable battery SOH estimation with combined features. The proposed models can be applied to a wide variety of electrode materials and deployed in various application scenarios. With the addition of mechanical and thermal features as model input, the estimation accuracy and efficiency are markedly improved. This improvement is attributed to the nature of electrochemical processes, where current flow and localized chemical reactions occur simultaneously in the battery system. Thus, a model constructed from a combination of relevant features captures the overall situation with a compact set of inputs and reflects the real process more completely and precisely. However, this complex process results from multiple factors, including the contribution of inhomogeneous SEI formation and evolution, the impact of current density on heterogeneous reactions and heat release, and the level of irreversible chemical reactions and Li-ion loss, so the weight of each feature varies between conditions. Our validation in Fig. 5e merely provides a limited example case; the estimation accuracy of combined VS, VT, or VTS features may differ in other extreme situations. In regular circumstances or within a certain temperature range, the mechanical feature performs better because the weight of the strain factor is critical.
In addition, the temperature indicator is influenced by the location of the sensor and the uneven internal resistance of each individual sample, whereas the strain sensor provides general information that correlates highly with the electrochemical data. Under high-current conditions or operation with a large amount of heat release, the weight of the thermal factor increases, so models using VT show improved performance in G-III. The VTS features have clear advantages at higher current rates or in thermal risk scenarios, which often occur in retired batteries.

In summary, our work highlights the significant value of relevant physical features and demonstrates the effectiveness of a deep-learning approach for efficient and reliable battery classification across various scenarios. Enhanced SOH estimation offers valuable guidance for developing battery management systems and their practical deployment in sustainable recycling applications. We anticipate that machine learning, with expanded dimensions to address data redundancy and reduce time consumption, will become a standard technique in recycling economics for sustainable development.

Methods

Statistical feature

In this paper, we calculate six features from three types of curves (voltage, temperature, and strain) by performing mathematical operations on the data in the TRs corresponding to the voltage thresholds. The six features are V_mean, V_kur, V_sk, V_std, T′_mean, and S_mean, where V_mean is denoted as μ and V_std is denoted as σ. The calculations are performed as follows:

$${V}_{{{\rm{\_}}}{mean}}=\mu =\frac{1}{n}{\sum }_{i=1}^{n}{v}_{i}$$

(1)

$${V}_{{{\rm{\_}}}{std}}=\sigma =\sqrt{\frac{1}{n}{\sum }_{i=1}^{n}{\left({v}_{i}-\mu \right)}^{2}}$$

(2)

$${V}_{{{\rm{\_}}}{kur}}=\frac{1}{n}{\sum }_{i=1}^{n}\left[{\left(\frac{{v}_{i}-\mu }{\sigma }\right)}^{4}\right]$$

(3)

$${V}_{{{\rm{\_}}}{sk}}=\frac{1}{n}{\sum }_{i=1}^{n}\left[{\left(\frac{{v}_{i}-\mu }{\sigma }\right)}^{3}\right]$$

(4)

$${T}_{{{\rm{\_}}}{mean}}^{{\prime} }=\frac{1}{3n}{\sum }_{i=1}^{n}({{T1}^{{\prime} }}_{i}+{{T2}^{{\prime} }}_{i}+{{T3}^{{\prime} }}_{i})$$

(5)

$${S}_{{{\rm{\_}}}{mean}}=\frac{1}{n}{\sum }_{i=1}^{n}{S}_{i}$$

(6)

where n denotes the number of sample points in the TR, ${v}_{i}$ denotes the i-th voltage sample in the TR, (T1′, T2′, T3′) denote the three relative temperature signals in the TR, and S_i denotes the i-th analog strain sample in the TR.
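As an illustration of Eqs. (1) to (6), the six features of a single TR can be computed with a few lines of plain Python. This is a minimal sketch, not the authors' code: the function name `tr_features`, its argument names, and the dictionary keys are hypothetical, and the inputs are assumed to be equally long lists of samples already cut to one TR.

```python
import math


def tr_features(v, t1, t2, t3, s):
    """Six statistical features of one TR, following Eqs. (1)-(6).

    v          : voltage samples in the TR
    t1, t2, t3 : the three relative-temperature series in the TR
    s          : analog strain samples in the TR
    (All names here are illustrative assumptions.)
    """
    n = len(v)
    if n == 0 or not (len(t1) == len(t2) == len(t3) == len(s) == n):
        raise ValueError("all TR series must be non-empty and equally long")

    mu = sum(v) / n                                        # Eq. (1)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in v) / n)   # Eq. (2), 1/n form
    if sigma == 0.0:
        raise ValueError("constant voltage: Eqs. (3) and (4) are undefined")

    z = [(x - mu) / sigma for x in v]                      # standardized samples
    return {
        "V_mean": mu,
        "V_std": sigma,
        "V_kur": sum(zi ** 4 for zi in z) / n,             # Eq. (3)
        "V_sk": sum(zi ** 3 for zi in z) / n,              # Eq. (4)
        # Eq. (5): mean over all samples of the three temperature series
        "T_mean": sum(a + b + c for a, b, c in zip(t1, t2, t3)) / (3 * n),
        "S_mean": sum(s) / n,                              # Eq. (6)
    }
```

Note that Eq. (2) divides by n rather than n − 1, so the sketch uses the population standard deviation, and Eq. (3) is the plain (non-excess) kurtosis.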

Feature prediction capability

The following equation is used to quantify battery SOH:

$${SOH}=\frac{{C}_{i}}{C}$$

(7)

where C_i denotes the capacity released by a complete discharge in the present cycle, and C denotes the nominal capacity of the battery. The prediction capability of each feature is measured by the Pearson linear correlation coefficient (PLCC), ρ, between the feature and the capacity. We denote the six calculated features as F: ${F}_{1},{F}_{2},{F}_{3},{F}_{4},{F}_{5}\,{{\mathrm{and}}}\,{F}_{6}$. In Eq. (8), C_i and F_i denote the i-th capacity value and the corresponding value of a given feature F, and n is the total number of estimated capacity values.

$$\rho =\frac{{\sum }_{i=1}^{n}\left({C}_{i}-\bar{C}\right)\left({F}_{i}-\bar{F}\right)}{\sqrt{{\sum }_{i=1}^{n}{\left({C}_{i}-\bar{C}\right)}^{2}}\sqrt{{\sum }_{i=1}^{n}{\left({F}_{i}-\bar{F}\right)}^{2}}}$$

(8)

where $\bar{C}$ denotes the mean value of the capacity, and $\bar{F}$ denotes the mean value of the feature F. It should be noted that the Pearson linear correlation coefficient for T′_mean is the average of the Pearson linear correlation coefficients for the three temperatures (T1′, T2′, T3′).
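Eqs. (7) and (8) can be sketched in plain Python as follows. The function names `soh`, `pearson`, and `temperature_plcc` are hypothetical, and the inputs are assumed to be equally long lists with one capacity value and one feature value per sample; the last helper averages the three temperature correlations as described above.

```python
import math


def soh(discharge_capacity, nominal_capacity):
    """Eq. (7): capacity of the present cycle over the nominal capacity."""
    return discharge_capacity / nominal_capacity


def pearson(capacity, feature):
    """Eq. (8): Pearson linear correlation between capacity and one feature."""
    n = len(capacity)
    if n != len(feature) or n < 2:
        raise ValueError("need two equally long series with at least 2 values")
    c_bar = sum(capacity) / n
    f_bar = sum(feature) / n
    cov = sum((c - c_bar) * (f - f_bar) for c, f in zip(capacity, feature))
    c_ss = sum((c - c_bar) ** 2 for c in capacity)
    f_ss = sum((f - f_bar) ** 2 for f in feature)
    if c_ss == 0.0 or f_ss == 0.0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (math.sqrt(c_ss) * math.sqrt(f_ss))


def temperature_plcc(capacity, t1, t2, t3):
    """Average of the three per-sensor correlations, as stated for T'_mean."""
    return (pearson(capacity, t1) + pearson(capacity, t2) + pearson(capacity, t3)) / 3
```

A value of ρ close to +1 or −1 indicates a feature that tracks capacity almost linearly, while a value near 0 indicates little linear relationship.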

Model explanation

This study focuses on SOH estimation for LIBs, which is treated as a regression task. The model takes an input X containing L steps and K feature channels, so that X has the shape L × K, with L set to 10. The DNN model maps the input X to the output Y according to the following equation:

$$Y={{\rm{Linear}}}({{\rm{ReLU}}}({{\rm{LSTM}}}(X)))$$

(9)

The LSTM layer is used to obtain hidden vectors from the input sequence. The ReLU activation function introduces nonlinearity, which enhances the model’s ability to capture complex patterns. A linear layer maps the output of the last layer to the target SOH estimate.
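To unpack Eq. (9), the composition can be written out with the standard LSTM cell. This is a sketch under stated assumptions: the LSTM variant, hidden size, and number of layers are not specified here, so the single-layer gate equations below follow the common formulation (the one documented for PyTorch's `nn.LSTM`). The symbols $x_t$, $h_t$, $c_t$, $W$, $U$, $b$, $w_o$, and $b_o$ are introduced for illustration only; $x_t$ is the t-th row of X (K features), $\mathrm{sigm}$ is the logistic sigmoid (written this way to avoid confusion with σ in Eq. (2)), and ⊙ is the element-wise product.

$$\begin{aligned}{i}_{t}&=\mathrm{sigm}\left({W}_{i}{x}_{t}+{U}_{i}{h}_{t-1}+{b}_{i}\right)\\ {f}_{t}&=\mathrm{sigm}\left({W}_{f}{x}_{t}+{U}_{f}{h}_{t-1}+{b}_{f}\right)\\ {g}_{t}&=\tanh \left({W}_{g}{x}_{t}+{U}_{g}{h}_{t-1}+{b}_{g}\right)\\ {o}_{t}&=\mathrm{sigm}\left({W}_{o}{x}_{t}+{U}_{o}{h}_{t-1}+{b}_{o}\right)\\ {c}_{t}&={f}_{t}\odot {c}_{t-1}+{i}_{t}\odot {g}_{t}\\ {h}_{t}&={o}_{t}\odot \tanh \left({c}_{t}\right)\end{aligned}$$

Assuming that the hidden vector at the final step (t = L) is the one passed on, the remaining two operations in Eq. (9) reduce to

$$Y={w}_{\mathrm{out}}^{\top }\,{{\rm{ReLU}}}\left({h}_{L}\right)+{b}_{\mathrm{out}}$$

where ${{\rm{ReLU}}}(z)=\max (0,z)$ is applied element-wise and $({w}_{\mathrm{out}},{b}_{\mathrm{out}})$ are the weights and bias of the linear layer.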

Model training and evaluation

In the model training phase, the Adam optimizer and the MSE loss function are used to optimize the weights. Each training run uses a total of n training samples. The MSE loss function, which measures the difference between the estimated and the real values, is as follows:

$${{\rm{MSE}}}=\frac{1}{n}\mathop{\sum }_{i=1}^{n}{\left({Y}_{i}-{Y^{\prime} }_{i}\right)}^{2}$$

(10)

where ${Y}_{i}$ is the real capacity value, ${Y^{\prime} }_{i}$ is the estimated capacity value, and n is the total number of estimated capacity values.

The algorithm is initially implemented in the PyTorch framework, with NVIDIA GeForce RTX 4060 GPUs used to accelerate model training and Python 3.10.12, together with Pandas, NumPy, and scikit-learn, used to process and analyze the data. The model is finally deployed on Chengdu Intelligent Computing Center servers using the MindSpore framework.

The model’s ability to estimate individual battery SOH is evaluated based on MAE and RMSE.

$${{\rm{MAE}}}=\frac{1}{n}{\sum }_{i=1}^{n}\left|{Y}_{i}-{Y^{\prime} }_{i}\right|\times 100 \%$$

(11)

$${{\rm{RMSE}}}=\sqrt{\frac{1}{n}{\sum }_{i=1}^{n}{\left({Y}_{i}-{Y^{\prime} }_{i}\right)}^{2}}\,\times 100 \%$$

(12)

where n is the total number of estimated capacity values of an individual battery.
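The loss in Eq. (10) and the two evaluation metrics in Eqs. (11) and (12) can be sketched in plain Python. The function names are hypothetical, and the sketch assumes that the real and estimated values are expressed as fractions (for example, an SOH of 0.95), so that the factor of 100 turns the error into percentage points.

```python
import math


def _check(y_true, y_pred):
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("need two non-empty, equally long series")


def mse(y_true, y_pred):
    """Eq. (10): mean squared error between real and estimated values."""
    _check(y_true, y_pred)
    return sum((y - yp) ** 2 for y, yp in zip(y_true, y_pred)) / len(y_true)


def mae_percent(y_true, y_pred):
    """Eq. (11): mean absolute error, scaled by 100 to give a percentage."""
    _check(y_true, y_pred)
    return sum(abs(y - yp) for y, yp in zip(y_true, y_pred)) / len(y_true) * 100.0


def rmse_percent(y_true, y_pred):
    """Eq. (12): square root of the MSE, scaled by 100 to give a percentage."""
    return math.sqrt(mse(y_true, y_pred)) * 100.0
```

Because RMSE squares each error before averaging, it is never smaller than MAE on the same data and grows faster when a few estimates are far off.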

Data availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.

Code availability

Code for the modeling work is available from the corresponding author upon request.

Acknowledgements

We would like to express our sincere gratitude to the Chengdu Intelligent Computing Center in China for their computational support.

Author information

These authors contributed equally: Jie Wu, Zhongxian Sun, Dingquan Li.

Authors and Affiliations

College of Electrical Engineering, Sichuan University, Chengdu, China
Jie Wu, Zhongxian Sun, Weilin He, Haiyan Tu & Xin He
Pengcheng Laboratory, Shenzhen, China
Dingquan Li
Faculty of Engineering, Architecture and Information Technology (EAIT), University of Queensland, St Lucia, QLD, Australia
Dongchen Yang & Xin He
School of Chemical Engineering, Sichuan University, Chengdu, China
Zhenguo Wu, Xin Geng & Hailong Wang
The Key Laboratory of Renewable Energy, China Tower Corporation Limited, Beijing, China
Hui Yang
School of Microelectronics, Southern University of Science and Technology, Shenzhen, China
Linyu Hu

Contributions

X.H. designed the experiment and supervised the work. Z.X.S. collected and analyzed the experimental data. J.W. and Z.X.S. developed the model to realize the battery SOH estimation and classification. W.L.H., X.G., and Z.G.W. participated in data processing and analysis. D.Q.L. and H.Y.T. provided technical guidance for the development of the deep neural network model. J.W., D.C.Y., H.Y., L.Y.H., and X.H. revised the manuscript.

Corresponding author

Correspondence to Xin He.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Communications Engineering thanks Tobias Hofmann and the other anonymous reviewer(s) for their contribution to the peer review of this work. Primary Handling Editors: Jiangong Zhu, Miranda Vinay and Rosamund Daw.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Wu, J., Sun, Z., Li, D. et al. Efficient estimating and clustering lithium-ion batteries with a deep-learning approach. Commun Eng 4, 151 (2025). https://doi.org/10.1038/s44172-025-00488-1

Received: 24 November 2024
Accepted: 01 August 2025
Published: 12 August 2025
Version of record: 12 August 2025
DOI: https://doi.org/10.1038/s44172-025-00488-1