Introduction

Rechargeable lithium-ion batteries (LIBs) are widely used in portable electronics1, electric vehicles (EVs)2, and energy storage systems3. As the demand for clean and renewable energy grows, the diverse applications of LIBs in electrical energy storage will contribute significantly to reducing carbon emissions and, ultimately, to mitigating global warming4. Modern commercial LIBs have been highly optimized, from chemical composition to manufacturing technology, enabling service lifespans ranging from months to decades. Energy is efficiently stored and utilized through reversible electrochemical reactions within the battery, but irreversible degradation reactions also accompany cycling, leading over time to active material loss5 and increased impedance6. This degradation causes performance issues such as capacity fading, mechanical failure, and thermal instability7. To prevent unexpected failures and safety hazards, batteries in EVs should be replaced once their real-time capacity drops to 80% of the initial capacity8.

The rapidly growing battery industry faces an imminent surge in spent LIBs, creating a substantial waste management challenge for sustainable development. This also presents an opportunity to recover economic value by recycling these batteries into raw materials or repurposing them for secondary use9. Currently, secondary applications of LIBs are typically carried out at the module level10, which leaves a latent risk posed by severely degraded individual cells within the module. To accurately cluster each individual cell by its state of health (SOH), the most reliable approach is to disassemble the pack or module for precise identification. This assessment procedure maximizes the residual value of the cells, rather than sending the whole unit to recycling when only some of its cells are failing11. By ensuring the optimal performance and safe operation of regrouped cells in their reconditioned second life, this method enhances the overall value and functionality of the battery system. However, several challenges arise, including mixtures of cells with unknown chemical compositions or of varying types and sizes from different manufacturers12,13. Specific application scenarios and dynamic operating conditions, such as current rates, voltage ranges, and temperature fluctuations, further complicate the situation14. Additionally, each cell within a pack or module displays a unique non-linear degradation pattern due to thermal diffusion and pressure distribution15. Therefore, accurately estimating the SOH of spent LIBs or LIBs of unknown condition is crucial for effective clustering, which in turn enhances recycling safety and sustainability16.

Previous studies on SOH estimation for LIBs using model-based approaches include the equivalent circuit model (ECM)17, the Kalman filter (KF)18, and the pseudo-two-dimensional (P2D) model19. The accuracy of these models relies not only on physical insights and intricate calculations from a lengthy development process, but also on an understanding of the degradation mechanisms that affect electrochemical response and dynamic behaviors, such as ion diffusion20. Additionally, model-based approaches are difficult to scale across variations in chemistry, manufacturer, and operating conditions21. To address these challenges, recent research has shifted towards data-driven approaches for estimating battery SOH22,23. The data-driven approach applies statistical machine learning to experimental datasets to establish correlations across diverse operating conditions, including various cell chemistries and manufacturers, without modeling the physical mechanisms24. Various methods have been used for battery SOH estimation, such as support vector machines (SVM)25, random forests (RF)26, long short-term memory (LSTM) networks27, and convolutional neural networks (CNN)28. Extracting correlated features, such as peak area, peak position, and peak width, from incremental capacity curves is considered a well-established technique for estimating battery SOH29,30,31. However, most of these features are derived from complete charge and discharge curves, which require high-frequency data acquisition to obtain incremental capacity and differential voltage curves. This process involves handling large amounts of data, placing a significant burden on the battery management system32.
To overcome the reliance on complete voltage profiles, Michael Knapp et al.33 developed a mathematical approach that converts voltage profiles in the relaxation state into statistical features computed within any voltage interval. To realize rapid estimation, Darius Roman et al.32 employed a voltage thresholding approach to extract features within 15 minutes from the charging voltage and charging current curves. The main limitation of this approach is the choice of voltage threshold: an inappropriate setting may discard information that is important for estimation. In practice, deep learning-based methods can derive data-driven models that achieve accurate battery SOH estimation under frequently changing loads, usage conditions, and electrode materials. A transfer learning framework allows the model to be retrained to correct its deviation when moving from the source domain to the target domain34,35. Ye Yuan et al.36 developed a deep transfer learning approach for personalized battery health prediction that uses cycling knowledge from cells with completely different usage protocols, charge–discharge configurations, and battery chemistries. Although these approaches perform satisfactorily when sufficient electrical features are available, their demand for large, high-quality datasets remains a significant challenge. This requirement limits the speed, accuracy, and suitability of data-driven methods for practical deployment.

An alternative is to reduce the scale of data processing while still obtaining reliable, relevant features for accurate estimation in practical applications. Compared with features extracted from current-voltage datasets, electrochemical impedance spectroscopy (EIS) provides representative information on material properties and interfacial conditions, which correlates with degradation inside the battery37. Although it is debatable whether fitting the spectrum with a non-unique equivalent circuit model fully captures the physical, chemical, and material properties and the degradation process within the battery, the low-frequency region of the EIS spectrum provides valuable features for accurate SOH estimation38. In addition to changes in impedance spectra, cells also exhibit thermal and mechanical variations during electrochemical operation, which serve as alternative indicators of battery SOH. Correlating internal stress diagnostics with the voltage profile provides insights into chemo-mechanical processes at the interfaces and within electrodes, and helps distinguish the porosity behavior of Si-based electrodes39. Decoupling temperature and pressure signals allows chemical events, such as solid electrolyte interphase (SEI) formation and structural evolution, to be tracked40. Degradation manifests as irreversible volume expansion41 and thickening of the SEI layer42, both associated with the accumulation of thermal and mechanical effects. Consequently, thermal and mechanical features offer valuable insights for fundamental research and have the potential to enhance model training.

In this study, we designed a deep neural network (DNN) model for efficiently estimating the SOH of retired and unknown-state LIBs and clustering them. The framework integrates electrical features with thermal and/or mechanical features from a dataset of 46 pouch cells (including commercial and lab-assembled cells) with varied manufacturers, cell sizes, electrode materials, and usage conditions. As illustrated in Fig. 1, our approach begins with a base model for the LiFePO4 (LFP) battery system under various operating conditions, including current densities, cycle numbers, and cell sizes. We developed a dynamic voltage thresholding method that extracts the critical features for SOH estimation within 269 seconds. The data fusion method, which combines thermal or mechanical features with the electrical ones, improves model performance, reducing the mean absolute error (MAE) from 1.786% to 0.822% and the root mean square error (RMSE) from 1.859% to 1.048%. Notably, our model achieves rapid and accurate estimation using only one discharge profile. Validation at room temperature with varied current rates demonstrated strong adaptability to operating conditions, with an MAE of 1.382% and an RMSE of 1.635%. Fine-tuning the model parameters with 1109 and 2631 experimental samples from LiNi0.6Co0.2Mn0.2O2 (NCM622) and LiNi0.92Co0.05Al0.03O2 (NCA) batteries, respectively, yielded transfer-model MAEs of 1.067% and 0.817% and RMSEs of 1.186% and 0.939%. These results highlight the efficiency, versatility, and accuracy of our model in estimating battery SOH, enabling reliable classification for sustainable recycling. Following SOH estimation, retired batteries are first clustered via K-means based on critical degradation indicators, and the clusters are then refined through multimodal metric analysis (e.g., applying an 80% SOH threshold) to match specific secondary applications.
This two-tiered classification framework mitigates the intrinsic heterogeneity of retired batteries while supporting customizable, application-driven reuse strategies.

Fig. 1

Illustration of the machine learning approach with a deep neural network model for efficient battery estimation and classification in sustainable recycling.

Results

Data generation

In this study, 46 LIBs from three suppliers (Supplementary Note 1. Datasets and Supplementary Table 1) were subjected to cyclic aging tests under controlled experimental conditions, generating a total of 71,559 data cycles. The datasets are divided into four groups, G-I, G-II, G-III, and G-IV, based on operating temperature, charging/discharging current rate, and electrode material type (Supplementary Table 1). Each cell and its corresponding operating conditions are detailed in Supplementary Table 2.

The electrical dataset commonly used in battery management systems (BMS) includes voltage, current, and capacity, recorded at 1 Hz during battery charging and discharging. Although the energy conversion process is reflected by the electrical signal, details of the underlying electrochemical reactions remain hidden. To address this, our approach also collects a dataset of relevant physical features, including surface temperatures at three different points (T1, T2, and T3), ambient temperature (T), and surface strain (S), recorded at 1 Hz by a microcontroller combined with sensors. These signals provide indirect insights into structural degradation and internal resistance evolution. The sensor array placement on the cell surface is shown in Supplementary Fig. 1. The collected signals are pre-processed to enhance quality by removing outliers and filling missing values, as described in Supplementary Note 5. The file formats, data names, and units of the dataset are also normalized to ensure consistency.

Designing dynamic voltage thresholding to extract features

The estimation of battery SOH is primarily based on features extracted from, and correlated with, the discharge profile43. Figure 2a shows the voltage curve of cell G-I#2 during discharge. The voltage curve shifts slightly to the left as the cycle number increases, which is attributed to battery aging. Because this trend is pronounced, it can be captured by mathematically computed statistical features32,33,44: each discharge voltage curve is transformed into four statistical features, namely the mean (V_mean), kurtosis (V_kur), skewness (V_sk), and standard deviation (V_std) of the voltage.

Fig. 2: Feature extraction and analysis on cell G-I#2.

a The discharge voltage profile at different cycles. b Design of the dynamic voltage threshold in the discharge voltage profile. c Mean duration of each time region (TR) over all cycles of the LiFePO4 (LFP) cells (1 Ah) in the G-I dataset. d Variation of the relative temperature (T1′ = T1 − T) profile with cycling. e Variation of the analog strain profile with cycling.

Overlaying the discharge profiles reveals a compressed knee region after the long plateau, which distinctly reflects capacity fading. To avoid the information redundancy of using the entire voltage profile for model training, a dynamic voltage thresholding method with specific rules is designed (Fig. 2b) to extract features effectively from the discharge voltage curve. The voltage threshold is defined by a lower voltage limit (Vl) and an upper voltage limit (Vu). Vl is fixed at the discharge cutoff voltage, while Vu is either set to the initial discharge voltage or slid across the 2.60 V–3.20 V range in 0.05 V steps. Each Vu defines a time region (TR), and the length of each TR for the experimental samples of the LFP cells (1 Ah) in the G-I dataset is collected (Fig. 2c).
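The thresholding rule can be sketched in a few lines. This is a minimal illustration under our own assumptions, not the authors' implementation: the TR is taken to start at the first sample where the discharge voltage falls to Vu and to end at the first later sample at or below the cutoff Vl, sampling is 1 Hz, and the function names (`candidate_upper_limits`, `time_region`, `tr_durations`) and the toy profile are hypothetical.

```python
from typing import Dict, List, Optional, Sequence, Tuple


def candidate_upper_limits(v_min: float = 2.60, v_max: float = 3.20,
                           step: float = 0.05) -> List[float]:
    """Sliding Vu grid (2.60 V to 3.20 V in 0.05 V steps).

    Values are rounded to 2 decimals so that e.g. 3.05 compares cleanly.
    """
    n_steps = round((v_max - v_min) / step)
    return [round(v_min + k * step, 2) for k in range(n_steps + 1)]


def time_region(v: Sequence[float], vl: float,
                vu: Optional[float] = None) -> Optional[Tuple[int, int]]:
    """Return (start, end) sample indices of the TR in one discharge profile.

    start: first sample at or below Vu (sample 0 when vu is None, i.e. when Vu
    is the initial discharge voltage); end: first sample from start onwards at
    or below the cutoff Vl (last sample if Vl is never reached).
    Returns None if the profile never drops to Vu.
    """
    if vu is None:
        start = 0
    else:
        start = next((i for i, vi in enumerate(v) if vi <= vu), None)
        if start is None:
            return None
    end = next((i for i in range(start, len(v)) if v[i] <= vl), len(v) - 1)
    return start, end


def tr_durations(t: Sequence[float], v: Sequence[float], vl: float,
                 vus: Sequence[Optional[float]]) -> Dict[Optional[float], float]:
    """Duration (in the units of t, here seconds) of the TR for each candidate Vu."""
    durations: Dict[Optional[float], float] = {}
    for vu in vus:
        tr = time_region(v, vl, vu)
        if tr is not None:
            start, end = tr
            durations[vu] = t[end] - t[start]
    return durations


if __name__ == "__main__":
    # Toy 1 Hz discharge profile: plateau followed by a knee down to the cutoff.
    t = list(range(11))
    v = [3.30, 3.28, 3.25, 3.22, 3.18, 3.10, 3.02, 2.90, 2.75, 2.62, 2.50]
    print(tr_durations(t, v, vl=2.50, vus=[None, 3.20, 3.05, 2.60]))
    # {None: 10, 3.2: 6, 3.05: 4, 2.6: 0}
```

Averaging these durations over all cycles of a cell group would give a per-Vu summary comparable in spirit to Fig. 2c; the features of each TR are then computed only from the samples between `start` and `end`.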

As cycling proceeds, the relative temperature (T1′ = T1 − T) profile of cell G-I#2 evolves as shown in Fig. 2d. Supplementary Fig. 2 displays the temperature variations at the spatially distributed points and the ambient temperature curves for different cells. The analog strain of cell G-I#2 (calculated from raw data and filtered with a Kalman filter) is presented in Fig. 2e, and surface strain curves for different cells are shown in Supplementary Fig. 3. These thermal and mechanical curves exhibit quantifiable variations across cycles, suggesting a correlation between the physical features and capacity fading. In each TR, the mean relative temperatures (T_mean, comprising T1′_mean, T2′_mean, and T3′_mean) and the mean strain (S_mean) are calculated as features for estimating cell SOH (Supplementary Fig. 4). Supplementary Figs. 5 and 6 illustrate the correlation of voltage, temperature, and strain with capacity.
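The Kalman filtering step is not specified further here, so the following is only a minimal sketch of one common choice: a scalar Kalman filter with a random-walk state model applied to the strain signal. The model form and the noise variances `q` and `r` are illustrative assumptions, not values from the paper, and `kalman_smooth_1d` is a hypothetical name.

```python
from typing import List, Optional, Sequence


def kalman_smooth_1d(z: Sequence[float], q: float = 1e-5, r: float = 1e-2,
                     x0: Optional[float] = None, p0: float = 1.0) -> List[float]:
    """Filter a 1-D signal with a scalar Kalman filter.

    State model (random walk):  x_k = x_{k-1} + w_k,  w_k ~ N(0, q)
    Measurement model:          z_k = x_k + v_k,      v_k ~ N(0, r)
    q and r are illustrative tuning values, not values from the paper.
    """
    x = z[0] if x0 is None else x0   # initial state estimate
    p = p0                           # initial estimate variance
    filtered = []
    for zk in z:
        p = p + q                    # predict: variance grows by process noise
        k = p / (p + r)              # Kalman gain
        x = x + k * (zk - x)         # update with the innovation
        p = (1.0 - k) * p            # posterior variance
        filtered.append(x)
    return filtered
```

A larger ratio `q / r` makes the filter follow the raw strain more closely; a smaller ratio smooths more aggressively but lags genuine changes, so in practice the ratio would be tuned against the sensor's noise level.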

Capability of single features to estimate battery SOH in different TRs

The prediction capability (ρ) of individual features across different TRs is assessed, and the correlations for the LFP cells (1 Ah) in the G-I dataset are shown in Fig. 3a. The ρ of the voltage features increases gradually with the TRs, reaches its optimum for Vu between 3.10 V and 2.95 V, and then declines as Vu increases further. The ρ of the strain feature remains consistently around 0.73 across all TRs, whereas the ρ of the temperature feature initially decreases as Vu is lowered but stabilizes between 3.10 V and 2.60 V. Because their correlations are comparatively insensitive to the choice of TR, the mechanical and thermal features can stabilize the model framework through their inherent physical coupling to the electrochemical process.

Fig. 3: Predictive capability (ρ) and estimation performance with single features.

a Matrix of ρ for the relevant features, measured by the Pearson linear correlation coefficient. b MAE and c RMSE of battery SOH estimation in different TRs. d MAE and f RMSE of each feature (time-wise), and e MAE and g RMSE of each TR (feature-wise).

The performance of SOH estimation with a single feature as input is displayed in Fig. 3b, c, which show the MAE and RMSE across different TRs. Performance varies among the representative features during the discharge process. V_sk shows high error values throughout. In contrast, V_mean, V_kur, V_std, S_mean, and T_mean exhibit random variations at lower MAE and RMSE levels, reflecting their association with state-of-charge (SOC)-related ionic de/intercalation and internal resistance. The averaged MAE and RMSE for all the LFP (1 Ah) cells in G-I are summarized in Fig. 3d–g by feature type and time region. Figure 3d, f illustrates the time-wise performance of each feature, revealing that the mean voltage, mean strain, and mean relative temperature consistently perform best. As a higher-order statistic, voltage skewness may have limited predictive power because it captures only specific asymmetries in the voltage profile. While it can offer some insight into battery degradation, LIBs undergo complex, non-linear aging processes, such as solid electrolyte interphase (SEI) formation, lithium plating, and electrolyte decomposition, which can manifest in ways that voltage skewness alone does not capture. Figure 3e, g displays the MAE and RMSE of the features across different TRs. Notably, a Vu of 3.05 V yields the lowest MAE and RMSE. As indicated in Fig. 2c, the TR at 3.05 V lasts 269 seconds, reducing the time cost by 90.52% compared with the 2,837 seconds required when Vu is set to the initial voltage of the discharge process. Details of the experimental setup are provided in Supplementary Table 3.
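MAE and RMSE are used throughout but not defined in this section. Assuming their standard definitions applied to SOH expressed in percent (so both errors are in percentage points), they can be computed as below; the function names and sample values are illustrative.

```python
import math
from typing import Sequence


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error, in the same units as the inputs (here SOH in %)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error; penalizes large deviations more than MAE."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true))


if __name__ == "__main__":
    actual = [100.0, 95.0, 90.0]      # illustrative SOH values (%), not paper data
    predicted = [99.0, 96.0, 92.0]
    print(round(mae(actual, predicted), 3), round(rmse(actual, predicted), 3))
    # 1.333 1.414
```

Because RMSE squares each error before averaging, RMSE ≥ MAE always holds, and a large gap between the two signals a few large outlier errors rather than uniformly small ones.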

Combined features to estimate battery SOH

To enhance the efficiency and accuracy of battery SOH estimation, the electrical, thermal, and mechanical features are combined: voltage features (V: V_mean, V_kur, V_sk, V_std), temperature features (T: T′_mean), and strain features (S: S_mean). Four combinations are tested: V, VT, VS, and VTS. The shortest extraction time occurs at a Vu of 2.60 V (31 seconds, Fig. 2c), the best single-feature prediction performance is observed at a Vu of 3.05 V, and setting Vu at the initial voltage covers the complete information of the discharge profile. Thus, three upper voltage limits (Vu), 2.60 V, 3.05 V, and 3.20 V, are selected for comparison. A K-fold cross-validation method (Supplementary Note 2) is employed to estimate the SOH of all cells in the G-I dataset (details are provided in Supplementary Table 4).
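The exact fold construction is given in Supplementary Note 2 and is not reproduced here. As a hedged sketch, assuming folds are formed by holding out whole cells (so that no cell contributes samples to both training and testing within a fold), feature-set selection and cell-wise K-fold splitting could look like this; `FEATURE_SETS`, `select_features`, `cell_k_fold`, and the column names are hypothetical.

```python
from typing import Dict, Iterator, List, Sequence, Tuple

# Feature combinations named in the text; column names are illustrative.
FEATURE_SETS: Dict[str, List[str]] = {
    "V": ["V_mean", "V_kur", "V_sk", "V_std"],
    "VT": ["V_mean", "V_kur", "V_sk", "V_std", "T_mean"],
    "VS": ["V_mean", "V_kur", "V_sk", "V_std", "S_mean"],
    "VTS": ["V_mean", "V_kur", "V_sk", "V_std", "T_mean", "S_mean"],
}


def select_features(row: Dict[str, float], combo: str) -> List[float]:
    """Pick the input vector for one sample according to the chosen combination."""
    return [row[name] for name in FEATURE_SETS[combo]]


def cell_k_fold(cell_ids: Sequence[str], k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs in which whole cells are held out.

    Cells are assigned to folds round-robin after sorting, so no cell contributes
    samples to both the training and the test split of the same fold.
    """
    cells = sorted(set(cell_ids))
    if not 2 <= k <= len(cells):
        raise ValueError("k must be between 2 and the number of cells")
    for fold in range(k):
        test_cells = set(cells[fold::k])
        test_idx = [i for i, c in enumerate(cell_ids) if c in test_cells]
        train_idx = [i for i, c in enumerate(cell_ids) if c not in test_cells]
        yield train_idx, test_idx


if __name__ == "__main__":
    ids = ["G-I#1"] * 3 + ["G-I#2"] * 2 + ["G-I#3"] * 2 + ["G-I#4"]
    for train, test in cell_k_fold(ids, k=2):
        print(sorted({ids[i] for i in test}))
    # ['G-I#1', 'G-I#3']
    # ['G-I#2', 'G-I#4']
```

Splitting by cell rather than by cycle avoids leakage: adjacent cycles of the same cell are highly correlated, so a cycle-wise split would tend to overstate accuracy on unseen cells.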

Figure 4a presents the SOH estimation results for cell G-I#6 using the base features in the TRs corresponding to Vu values of 2.60 V, 3.05 V, and 3.20 V, based on data accumulated over the previous ten cycles. A Vu of 3.05 V yields better estimates than 2.60 V or 3.20 V. The density distribution of capacity estimation errors in the inset shows that the errors are more tightly clustered around zero at Vu = 3.05 V than at 2.60 V and 3.20 V, where the distributions are broader. The cross-validation results in Fig. 4b indicate that the predicted SOH closely matches the actual SOH. This suggests that the base features extracted from this TR provide stable, high-performance inputs for SOH estimation across the fourteen cells operated at 20 °C, 30 °C, and 40 °C in the G-I dataset.

Fig. 4: K-fold cross-validation method for the SOH estimation of each cell in the G-I dataset.

a The estimated capacity versus cycle number of cell G-I#6 when the model uses base features at Vu of 2.60 V, 3.05 V, and 3.20 V, respectively. The inset shows the density distribution of the capacity estimation error. b Model prediction accuracy with base features at Vu of 3.05 V as input. c The average MAE and RMSE for the model with combined features at an upper voltage limit (Vu) of 3.05 V.

The averaged MAE and RMSE of SOH estimation using the various combined features at a Vu of 3.05 V are displayed in Fig. 4c. Adding auxiliary features improves the estimation accuracy compared with using a single feature type (voltage) alone. The best performance is achieved by incorporating both the thermal and the strain features, which reduces the MAE by 53.98% and the RMSE by 43.63%. These results provide solid evidence that integrating relevant features substantially improves the accuracy of SOH estimation.
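The quoted percentages can be checked with simple arithmetic, assuming they are relative reductions from the voltage-only model (MAE 1.786%, RMSE 1.859%, as quoted in the Introduction) to the VTS model (0.822%, 1.048%); the numbers below are consistent with that reading. The same formula reproduces the 90.52% time saving of the 269 s TR relative to the 2,837 s full discharge. `relative_reduction` is a hypothetical helper.

```python
def relative_reduction(before: float, after: float) -> float:
    """Percentage reduction from `before` to `after`."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    # MAE and RMSE (in % SOH): voltage-only vs. VTS model, as quoted in the text.
    print(round(relative_reduction(1.786, 0.822), 2))   # 53.98
    print(round(relative_reduction(1.859, 1.048), 2))   # 43.63
    # TR length at Vu = 3.05 V vs. the full discharge from the initial voltage.
    print(round(relative_reduction(2837, 269), 2))      # 90.52
```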

Model deployment capability

In practical applications, models should adapt rapidly to various operating conditions, such as fluctuating temperature, current density, and service lifetime. Unlike the constant operating conditions in G-I, the G-II dataset is collected under room conditions with temperatures ranging from 20 °C to 40 °C, while the G-III dataset involves variations in both temperature and current rate (Supplementary Tables 1 and 2). To assess model robustness, the models are first trained on the G-I dataset at a Vu of 3.05 V with different combined features and then tested on the G-II and G-III datasets. Detailed experimental information is provided in Supplementary Table 5.

The prediction accuracy for the G-II dataset is shown in Fig. 5a, with results for individual cells detailed in Supplementary Fig. 7, demonstrating high prediction capability under variable temperatures. Despite the susceptibility of LIBs to thermal factors, the per-cycle estimation errors with combined features remain predominantly clustered around zero (Fig. 5b). However, the presence of multiple peaks suggests that the thermal and strain features are noticeably affected by the ambient temperature. Note that both the G-I and G-II datasets are collected at relatively low current rates, so Joule heating has limited influence. Nevertheless, room temperature fluctuations still affect the signals collected from individual samples. In contrast, the combination of voltage, temperature, and strain features achieves lower errors owing to their complementary effects. When an additional variable, current density, is introduced, the estimated SOHs still lie within a narrow range of the real SOH but exhibit a more disordered and scattered pattern (Fig. 5c). Introducing variable current density, especially at high rates, alters the weight of the thermal and strain factors in the estimation framework. Consequently, as shown in Fig. 5d, the peaks of GIII-VT, GIII-VS, and GIII-VTS become more dispersed. Figure 5e presents the averaged MAE and RMSE of the estimation results for the G-II and G-III datasets. The model using the combined VTS features achieves the best estimation results across all sample groups, but the difference between VT and VS narrows when the current rate also varies. This is attributed to the changing roles of different factors in complex cases: when the electrochemical process involves strong Joule heating, both electrical and thermal features become dominant in the model.
Therefore, developing models with more relevant features is a promising approach to address multi-dimensional variations in practical applications.

Fig. 5: Battery SOH estimation results for G-II and G-III datasets.

a Validation results (predicted vs. actual SOH) for the G-II dataset. b Corresponding estimation error (y − y′) distribution for G-II. c Validation results (predicted vs. actual SOH) for the G-III dataset. d Corresponding estimation error (y − y′) distribution for G-III. e Average MAE and RMSE values for the G-II and G-III datasets.

To demonstrate that our model can rapidly estimate the SOH of batteries in unknown states, we use data from only one previous cycle for estimation. As shown in Supplementary Fig. 8, the model with the combined VTS features achieves an MAE of 3.013–6.319% and an RMSE of 3.093–6.471% across the three groups (G-I, G-II, and G-III). The experimental details are provided in Supplementary Note 3 and Supplementary Table 6.

Model validation by transfer learning

To assess the applicability of our models to other electrode materials, a test is conducted with the G-IV dataset, which contains electrochemical and physical features from NCM and NCA cells. Given the differences in voltage, temperature, and strain profiles between LFP and NCM/NCA cells (Supplementary Fig. 9), the models are retrained to accommodate these variations (the Vu settings used for feature extraction are given in Supplementary Note 4 and Supplementary Fig. 10). Transfer learning was employed to improve adaptability to the NCM and NCA cells: the models were fine-tuned (FT) by adjusting their weights using three NCM and three NCA pouch samples from the G-IV dataset. The estimation performance of the models before and after FT is shown in Fig. 6a, b, with results for individual cells shown in Supplementary Fig. 11. The experimental details are outlined in Supplementary Table 7.
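The fine-tuning procedure is not detailed in this section. As a minimal sketch of the general idea only, assuming a linear regression head stands in for the DNN and that all of its weights are updated with plain full-batch gradient descent, fine-tuning amounts to initializing from the source-domain (LFP) weights and continuing training on the few target-domain (NCM/NCA) samples. The function names, learning rate, epoch count, and toy data are all hypothetical.

```python
from typing import List, Sequence, Tuple


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Linear regression head: SOH estimate from a feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def mse(w: Sequence[float], b: float, X: Sequence[Sequence[float]],
        y: Sequence[float]) -> float:
    """Mean squared error of the head on a dataset."""
    return sum((predict(w, b, x) - t) ** 2 for x, t in zip(X, y)) / len(y)


def fine_tune(w_src: Sequence[float], b_src: float, X: Sequence[Sequence[float]],
              y: Sequence[float], lr: float = 0.05,
              epochs: int = 200) -> Tuple[List[float], float]:
    """Start from source-domain weights and refine them on target-domain samples."""
    w, b = list(w_src), b_src       # copy so the source model is left unchanged
    n = len(y)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, t in zip(X, y):      # gradient of the MSE over the target batch
            err = predict(w, b, x) - t
            for j, xj in enumerate(x):
                grad_w[j] += 2.0 * err * xj / n
            grad_b += 2.0 * err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


if __name__ == "__main__":
    w_src, b_src = [1.0, 0.0], 0.0                            # pretend source-trained head
    X_tgt = [[0.2, 0.1], [0.4, 0.5], [0.6, 0.3], [0.9, 0.8]]  # toy target features
    y_tgt = [0.5 * a + 0.3 * c + 0.1 for a, c in X_tgt]
    w_ft, b_ft = fine_tune(w_src, b_src, X_tgt, y_tgt)
    print(mse(w_src, b_src, X_tgt, y_tgt) > mse(w_ft, b_ft, X_tgt, y_tgt))  # True
```

In a real DNN, a common variant is to freeze the earlier layers and update only the last ones, which limits overfitting when, as here, only a handful of target cells are available.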

Fig. 6: The material domain transfer validation for SOH estimation of each cell in G-IV (LiNi0.6Co0.2Mn0.2O2 (NCM) and LiNi0.92Co0.05Al0.03O2 (NCA) battery).

SOH estimation results of the model before and after fine-tuning (FT) for a NCM cells and b NCA cells. c SOH estimation errors for NCM and NCA cells. d MAE and RMSE of the model before and after fine-tuning. e Distribution of cycle numbers over capacity/initial capacity, in 1% intervals, for LiFePO4 (LFP), NCM, and NCA cells.

Compared with the models before FT, the models after FT exhibit a more concentrated distribution of estimation errors around zero (Fig. 6c), with lower MAE and RMSE values (Fig. 6d). This improvement indicates that FT effectively mitigates the adverse effects of feature differences during transfer learning. Both the pre- and post-FT models exhibit higher SOH estimation errors for NCM cells than for NCA cells. To investigate this discrepancy, we analyzed the distribution of cycle numbers for LFP, NCM, and NCA cells under the same operating conditions. Notably, the NCA cells exhibit a sample distribution similar to that of the LFP cells (Fig. 6e), suggesting that the cycle-number weighting learned by the model trained on the G-I dataset is better aligned with the NCA cells. Despite some estimation error across different electrode materials, these results indicate that our model can be effectively applied to other electrode material domains. This capability makes it well suited for clustering mixed spent cells for sustainable recycling.
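The comparison in Fig. 6e counts how many cycles each chemistry spends in each 1% band of capacity retention. A minimal sketch of that binning follows, assuming retention is capacity divided by initial capacity in percent and that bins are left-closed; `retention_histogram` is a hypothetical helper and the capacities are illustrative.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def retention_histogram(capacities: Sequence[float], initial_capacity: float,
                        bin_width: float = 1.0) -> Dict[float, int]:
    """Count cycles per capacity-retention bin (capacity/initial capacity, in %).

    Each cycle is assigned to the bin [lower, lower + bin_width) that contains
    its retention; keys are the lower bin edges.
    """
    counts: Counter = Counter()
    for c in capacities:
        retention = 100.0 * c / initial_capacity
        lower = math.floor(retention / bin_width) * bin_width
        counts[lower] += 1
    return dict(sorted(counts.items()))


if __name__ == "__main__":
    caps = [1.0, 0.995, 0.985, 0.982, 0.975]   # illustrative per-cycle capacities (Ah)
    print(retention_histogram(caps, initial_capacity=1.0))
```

Comparing such histograms between chemistries shows whether the training data (here LFP) covers the same retention bands with similar sample density as the target cells.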

Classification for batteries after service

As demonstrated in the previous sections, the model developed in this work assesses the SOH of LIBs with various electrode materials and pouch sizes both efficiently and accurately. Because safety is a top priority in battery operation, accurately determining the SOH of unknown cells is crucial for appropriately categorizing retired batteries for secondary use. To this end, a K-means clustering model was built on SOH and voltage standard deviation, using aging data from 14 LFP batteries in the G-I dataset and 6 NCM and 6 NCA batteries in the G-IV dataset.
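A minimal pure-Python sketch of the clustering step follows, assuming each retired cell is summarized by one (SOH, V_std) pair and that both dimensions are z-score standardized before clustering (the text does not state the scaling). It uses Lloyd's algorithm with a deterministic farthest-first initialization chosen here for reproducibility; libraries such as scikit-learn default to k-means++ instead, and the authors' settings (including k for each chemistry) may differ. All names and sample values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def standardize(points: Sequence[Sequence[float]]) -> List[Point]:
    """Z-score each dimension so SOH (%) and V_std (V) contribute on similar scales."""
    n, dims = len(points), len(points[0])
    means = [sum(p[d] for p in points) / n for d in range(dims)]
    stds = [math.sqrt(sum((p[d] - means[d]) ** 2 for p in points) / n) or 1.0
            for d in range(dims)]
    return [tuple((p[d] - means[d]) / stds[d] for d in range(dims)) for p in points]


def dist2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points: Sequence[Point], k: int) -> List[Point]:
    """Deterministic init: repeatedly add the point farthest from the chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(dist2(p, c) for c in centroids)))
    return centroids


def kmeans(points: Sequence[Point], k: int,
           max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Lloyd's algorithm: alternate nearest-centroid assignment and centroid update."""
    centroids = farthest_first(points, k)
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: dist2(p, centroids[j])) for p in points]
        if new_labels == labels:
            break                                   # assignments stable: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:                             # keep old centroid if a cluster empties
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


if __name__ == "__main__":
    # Illustrative (SOH %, V_std in V) pairs, not measured data.
    cells = [(98.0, 0.010), (97.0, 0.011), (96.5, 0.012),
             (86.0, 0.020), (85.0, 0.021), (87.0, 0.019),
             (77.0, 0.035), (76.0, 0.036), (78.0, 0.034)]
    labels, _ = kmeans(standardize(cells), k=3)
    for j in sorted(set(labels)):
        soh = [c[0] for c, lab in zip(cells, labels) if lab == j]
        print(f"cluster {j}: SOH {min(soh):.1f}-{max(soh):.1f}%")
```

Reporting the SOH range of each resulting cluster, as in the usage example, mirrors how the clusters are described in terms of SOH bands in the discussion of Fig. 7.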

As illustrated in Fig. 7, retired batteries are preliminarily categorized into distinct groups based on their capacity retention (SOH) and voltage stability (quantified by standard deviation); safety thresholds (e.g., SOH < 80%) are subsequently applied during post-clustering refinement to optimize battery allocation for secondary applications. This clustering serves as a priority-grading step to guide secondary applications (e.g., grid storage, low-power devices) rather than replacing safety-critical evaluations.

Fig. 7: Clustering results by K-means.

Clustering results for the a LiFePO4 (LFP), b LiNi0.6Co0.2Mn0.2O2 (NCM), and c LiNi0.92Co0.05Al0.03O2 (NCA) battery datasets. Distribution of categories by battery state of health (SOH) for the d LFP, e NCM, and f NCA battery datasets.

LFP Cluster 1 comprises cells with minimal degradation, which are recommended directly for secondary reuse in high-performance systems. Cluster 2 (80.00–91.90% SOH) includes cells with moderate capacity loss but stable voltage profiles, which are preliminarily flagged for non-critical applications such as grid storage, while Cluster 3 (75.33–80.00% SOH) comprises severely aged cells marked for immediate recycling to recover raw materials. In contrast, the NCM dataset is segmented into five clusters because of its limited sample size and heterogeneous degradation patterns; cells with an SOH below 76.1% are flagged for disassembly to recover raw materials. The NCA dataset uses a separation boundary at 77.5% SOH for the same material-recovery purpose. A detailed overview is provided in Supplementary Fig. 12. Critically, clustering is not the final decision-making step. For applications with stringent safety or performance requirements (e.g., electric vehicles, high-power systems), post-clustering analysis must integrate additional parameters, such as temperature trends, strain measurements, and voltage fluctuations, to comprehensively assess suitability for secondary use. This aligns with industrial workflows, in which broad initial sorting precedes detailed, application-specific evaluation. By combining SOH ranges with multimodal degradation indicators, our methodology prioritizes efficiency in preliminary sorting. This approach addresses the inherent variability of retired batteries and supports reuse strategies tailored to real-world requirements.
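The second, post-clustering tier can be sketched as a simple rule applied to each cell. This is a hypothetical illustration, not the authors' decision logic: it assumes the chemistry-specific recycling boundaries quoted above (LFP 80.0%, NCM 76.1%, NCA 77.5% SOH) are applied cell by cell, treats an SOH exactly at the boundary as not recycled, and represents the temperature, strain, and voltage-fluctuation assessment by a single boolean supplied from elsewhere. The decision labels are invented for the example.

```python
from typing import Dict

# Recycling boundaries (% SOH) quoted in the text for each chemistry.
RECYCLE_BOUNDARY: Dict[str, float] = {"LFP": 80.0, "NCM": 76.1, "NCA": 77.5}


def post_cluster_decision(soh: float, chemistry: str, safety_critical: bool = False,
                          multimodal_checks_passed: bool = False) -> str:
    """Second-tier decision for one cell after K-means pre-sorting.

    multimodal_checks_passed stands for the temperature-trend, strain and
    voltage-fluctuation assessment, which is outside the scope of this sketch.
    """
    if soh < RECYCLE_BOUNDARY[chemistry]:
        return "recycle"                          # recover raw materials
    if safety_critical and not multimodal_checks_passed:
        return "hold for multimodal assessment"   # e.g. EV or high-power reuse
    return "reuse"


if __name__ == "__main__":
    print(post_cluster_decision(79.0, "LFP"))
    print(post_cluster_decision(85.0, "LFP", safety_critical=True))
```

Keeping the thresholds in a table rather than hard-coding them makes it straightforward to adapt the rule to other chemistries or to stricter application-specific limits.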

Conclusion

Current commercial battery SOH estimation relies heavily on electrical data to evaluate degradation over the battery lifetime, which consumes enormous time and resources for data collection, processing, and analysis by the BMS. Such redundant datasets contribute little to further improving prediction accuracy and cannot meet the requirements of fast estimation and classification. Hence, in the present study, we designed a deep-learning framework that achieves efficient and reliable battery SOH estimation with combined features. The proposed models can be applied to a wide variety of electrode materials and deployed in various application scenarios. Adding mechanical and thermal features as model inputs substantially improves estimation accuracy and efficiency. This improvement is attributed to the nature of electrochemical processes, in which current flow and localized chemical reactions occur simultaneously in the battery system. Thus, a model built on a combination of relevant features captures the overall state of the battery from a compact set of inputs, reflecting the real process more completely and precisely. However, this complex process results from multiple factors, including inhomogeneous SEI formation and evolution, the impact of current density on heterogeneous reactions and heat release, and the extent of irreversible chemical reactions and Li-ion loss, so the relative weights of the features vary between conditions. Our validation in Fig. 5e provides only a limited example; the estimation accuracy of the combined VS, VT, or VTS features may differ in other, more extreme situations. Under regular conditions or within a certain temperature range, the mechanical feature performs better because the strain factor carries a critical weight.
In addition, the temperature indicator is influenced by sensor location and by the uneven internal resistance of individual samples, whereas the strain sensor provides more global information that correlates highly with the electrochemical data. Under high-current conditions or substantial heat release, the weight of the thermal factor increases, so models using VT perform better in G-III. The VTS combination has clear advantages at higher current rates and in thermal-risk scenarios, which often occur in retired batteries.

In summary, our work highlights the significant value of relevant physical features and demonstrates the effectiveness of a deep-learning approach for efficient and reliable battery classification across various scenarios. Enhanced SOH estimation offers valuable guidance for developing battery management systems and for their practical deployment in sustainable recycling applications. We anticipate that machine learning with expanded feature dimensions, which address data redundancy and reduce time consumption, will become a standard technique in recycling economics for sustainable development.

Methods

Statistical feature

In this paper, we calculate six features from three types of curves (voltage, temperature, and strain) by performing mathematical operations on the data in the TRs corresponding to the voltage thresholds. The six features are V_mean, V_kur, V_sk, V_std, T′_mean, and S_mean, where V_mean is denoted as μ and V_std as σ. They are calculated as follows:

$${V}_{{{\rm{\_}}}{mean}}=\mu =\frac{1}{n}{\sum }_{i=1}^{n}{v}_{i}$$
(1)
$${V}_{{{\rm{\_}}}{std}}=\sigma =\sqrt{\frac{1}{n}{\sum }_{i=1}^{n}{\left({v}_{i}-\mu \right)}^{2}}$$
(2)
$${V}_{{{\rm{\_}}}{kur}}=\frac{1}{n}{\sum }_{i=1}^{n}\left[{\left(\frac{{v}_{i}-\mu }{\sigma }\right)}^{4}\right]$$
(3)
$${V}_{{{\rm{\_}}}{sk}}=\frac{1}{n}{\sum }_{i=1}^{n}\left[{\left(\frac{{v}_{i}-\mu }{\sigma }\right)}^{3}\right]$$
(4)
$${T}_{{{\rm{\_}}}{mean}}^{{\prime} }=\frac{1}{3n}{\sum }_{i=1}^{n}({{T1}^{{\prime} }}_{i}+{{T2}^{{\prime} }}_{i}+{{T3}^{{\prime} }}_{i})$$
(5)
$${S}_{{{\rm{\_}}}{mean}}=\frac{1}{n}{\sum }_{i=1}^{n}{S}_{i}$$
(6)

where n denotes the number of sample points in the TR, \({v}_{i}\) denotes the i-th voltage sample in the TR, (T1′, T2′, T3′) denote the three relative temperature signals in the TR, and \({S}_{i}\) denotes the i-th analog strain sample in the TR.
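As an illustration of Eqs. (1) to (6), the following NumPy sketch computes the six features for a single TR. It assumes that the voltage, the three relative temperatures, and the strain within one TR are already available as equal-length arrays; the function name `statistical_features` and the dictionary keys are hypothetical. Note that Eq. (3) is the non-excess (Pearson) kurtosis, so normally distributed data give a value near 3, and σ uses the 1/n normalization of Eq. (2), which matches NumPy's default `ddof=0`.

```python
import numpy as np


def statistical_features(v, t1, t2, t3, s):
    """Hypothetical helper: the six TR features of Eqs. (1)-(6).

    v, t1, t2, t3, s: 1-D sequences of equal length n holding the voltage,
    the three relative temperatures and the strain samples inside one TR.
    """
    v = np.asarray(v, dtype=float)
    n = v.size
    mu = v.mean()                        # Eq. (1): V_mean
    sigma = v.std()                      # Eq. (2): 1/n normalization (ddof=0)
    if sigma == 0:
        raise ValueError("constant voltage in TR: skewness/kurtosis undefined")
    z = (v - mu) / sigma                 # standardized voltage samples
    v_kur = np.mean(z ** 4)              # Eq. (3): non-excess kurtosis
    v_sk = np.mean(z ** 3)               # Eq. (4): skewness
    # Eq. (5): mean over all n samples of the three relative temperatures
    t_mean = (np.sum(t1) + np.sum(t2) + np.sum(t3)) / (3 * n)
    s_mean = np.mean(s)                  # Eq. (6): S_mean
    return {"V_mean": mu, "V_std": sigma, "V_kur": v_kur,
            "V_sk": v_sk, "T'_mean": t_mean, "S_mean": s_mean}
```

Stacking these dictionaries over successive cycles gives one time series per feature, which is the form needed for the correlation analysis that follows.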

Feature prediction capability

The following equation is used to quantify battery SOH:

$${SOH}=\frac{{C}_{i}}{C}$$
(7)

where \({C}_{i}\) denotes the capacity released by a complete discharge in the present cycle, and C denotes the nominal capacity of the battery. The correlation ρ between each feature and the capacity is measured by the Pearson linear correlation coefficient (PLCC). We denote the six calculated features as \({F}_{1},{F}_{2},{F}_{3},{F}_{4},{F}_{5}\,{{\mathrm{and}}}\,{F}_{6}\); in Eq. (8), F stands for any one of them, and \({F}_{i}\) is its value corresponding to \({C}_{i}\). n is the total number of estimated capacity values.

$$\rho =\frac{{\sum }_{i=1}^{n}\left({C}_{i}-\bar{C}\right)\left({F}_{i}-\bar{F}\right)}{\sqrt{{\sum }_{i=1}^{n}{\left({C}_{i}-\bar{C}\right)}^{2}}\sqrt{{\sum }_{i=1}^{n}{\left({F}_{i}-\bar{F}\right)}^{2}}}$$
(8)

where \(\bar{C}\) denotes the mean capacity and \(\bar{F}\) denotes the mean value of F. Note that the PLCC for T′_mean is the average of the PLCCs of the three temperatures (T1′, T2′, T3′).
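A minimal NumPy sketch of Eqs. (7) and (8) follows. The helper names are hypothetical, and the sketch assumes that each feature has already been reduced to one value per cycle so that it can be paired with \({C}_{i}\); for T′_mean, each of the three temperature channels is correlated with capacity separately and the three coefficients are averaged, as described above.

```python
import numpy as np


def soh(discharge_capacity, nominal_capacity):
    """Eq. (7): SOH = C_i / C for every cycle (hypothetical helper)."""
    return np.asarray(discharge_capacity, dtype=float) / nominal_capacity


def plcc(c, f):
    """Eq. (8): Pearson linear correlation between capacity c and feature f."""
    c = np.asarray(c, dtype=float)
    f = np.asarray(f, dtype=float)
    dc = c - c.mean()                    # C_i - C_bar
    df = f - f.mean()                    # F_i - F_bar
    return np.sum(dc * df) / (np.sqrt(np.sum(dc ** 2)) * np.sqrt(np.sum(df ** 2)))


def plcc_temperature(c, t1, t2, t3):
    """rho for T'_mean: average of the PLCCs of the three temperature channels."""
    return float(np.mean([plcc(c, t) for t in (t1, t2, t3)]))
```

`np.corrcoef(c, f)[0, 1]` returns the same coefficient as `plcc` and can serve as a cross-check.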

Model explanation

This study treats LIB SOH estimation as a regression task and designs a model to estimate the SOH. Given an input X containing L steps and K feature channels, X has dimensions L × K, with L set to 10. The DNN model maps the input X to the output Y as follows:

$$Y={{\rm{Linear}}}({{\rm{ReLU}}}({{\rm{LSTM}}}(X)))$$
(9)

The LSTM layer produces the hidden vectors. The ReLU activation function introduces nonlinearity, enhancing the model's ability to capture complex patterns. Finally, a linear layer maps the output of the preceding layer to the target SOH estimate.
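Eq. (9) can be sketched in PyTorch, the framework used for the initial implementation. The hidden size, the number of LSTM layers, and the choice to pass only the hidden vector of the last step through ReLU and the linear layer are assumptions not stated in the text, and `SOHEstimator` is a hypothetical name.

```python
import torch
from torch import nn


class SOHEstimator(nn.Module):
    """Sketch of Y = Linear(ReLU(LSTM(X))) for inputs of shape (batch, L, K)."""

    def __init__(self, num_features: int, hidden_size: int = 64, num_layers: int = 1):
        super().__init__()
        # batch_first=True -> the LSTM expects (batch, L, K)
        self.lstm = nn.LSTM(input_size=num_features, hidden_size=hidden_size,
                            num_layers=num_layers, batch_first=True)
        self.relu = nn.ReLU()                  # nonlinearity after the LSTM
        self.head = nn.Linear(hidden_size, 1)  # maps hidden vector to one SOH value

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.lstm(x)                  # out: (batch, L, hidden_size)
        last = out[:, -1, :]                   # assumption: use the final step only
        return self.head(self.relu(last)).squeeze(-1)  # shape: (batch,)
```

For example, `SOHEstimator(num_features=6)(torch.randn(8, 10, 6))` returns a tensor of shape `(8,)`: one SOH estimate for each of eight input windows with L = 10 steps and K = 6 channels (for example, all six statistical features).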

Model training and evaluation

In the training phase, the model weights are optimized using the Adam optimizer and the MSE loss function. For a training run with n training samples, the MSE loss, which measures the difference between the estimated and real values, is calculated as follows:

$${{\rm{MSE}}}=\frac{1}{n}\mathop{\sum }_{i=1}^{n}{\left({Y}_{i}-{Y^{\prime} }_{i}\right)}^{2}$$
(10)

where \({Y}_{i}\) is the real capacity value, \({Y^{\prime} }_{i}\) is the estimated capacity value, and n is the total number of estimated capacity values.
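The training step described above (Adam optimizer, MSE loss of Eq. (10)) might look like the following PyTorch sketch. The learning rate, the number of epochs, and the use of full-batch updates are illustrative assumptions, and `fit_model` is a hypothetical helper. `nn.MSELoss` with its default `reduction="mean"` computes the average in Eq. (10). Any module whose output has the same shape as the targets can be passed in, including an LSTM model of the form in Eq. (9).

```python
import torch
from torch import nn


def fit_model(model: nn.Module, x: torch.Tensor, y: torch.Tensor,
              epochs: int = 200, lr: float = 1e-2) -> list[float]:
    """Full-batch training with Adam and the MSE loss of Eq. (10)."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()            # mean of (Y_i - Y'_i)^2 over the n samples
    history = []
    model.train()
    for _ in range(epochs):
        optimizer.zero_grad()         # clear gradients from the previous step
        loss = loss_fn(model(x), y)   # model(x) must have the same shape as y
        loss.backward()               # backpropagate the MSE
        optimizer.step()              # Adam weight update
        history.append(loss.item())
    return history
```

For larger datasets, mini-batching via `torch.utils.data.DataLoader` would be the usual choice; the full-batch loop is kept here only for brevity.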

The model was initially implemented in the PyTorch framework, with training accelerated on NVIDIA GeForce RTX 4060 GPUs; data processing and analysis were performed in Python 3.10.12 using Pandas, NumPy, and Scikit-Learn. The final model was deployed on servers at the Chengdu Intelligent Computing Center using the MindSpore framework.

The model's ability to estimate the SOH of individual batteries is evaluated using the mean absolute error (MAE) and the root-mean-square error (RMSE):

$${{\rm{MAE}}}=\frac{1}{n}{\sum }_{i=1}^{n}\left|{Y}_{i}-{Y^{\prime} }_{i}\right|\times 100 \%$$
(11)
$${{\rm{RMSE}}}=\sqrt{\frac{1}{n}{\sum }_{i=1}^{n}{\left({Y}_{i}-{Y^{\prime} }_{i}\right)}^{2}}\,\times 100 \%$$
(12)

where n is the total number of estimated capacity values of an individual battery.
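Eqs. (11) and (12) translate directly into NumPy. The sketch below assumes that SOH values are given as fractions (for example 0.95), so multiplying by 100 expresses the errors in percent; the function names are hypothetical.

```python
import numpy as np


def mae_percent(y_true, y_pred):
    """Eq. (11): mean absolute error of the SOH estimates, in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)) * 100.0)


def rmse_percent(y_true, y_pred):
    """Eq. (12): root-mean-square error of the SOH estimates, in percent."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)) * 100.0)
```

For measured SOH values [0.95, 0.90, 0.85] and estimates [0.94, 0.92, 0.85], the absolute errors are 0.01, 0.02, and 0, giving an MAE of 1.0% and an RMSE of about 1.29%. Because RMSE squares the errors, it weights the larger 0.02 deviation more heavily than MAE does.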