Domain knowledge-guided machine learning framework for state of health estimation in Lithium-ion batteries

Lanubile, Andrea; Bosoni, Pietro; Pozzato, Gabriele; Allam, Anirudh; Acquarone, Matteo; Onori, Simona

doi:10.1038/s44172-024-00304-2


Article
Open access
Published: 12 November 2024


Communications Engineering volume 3, Article number: 168 (2024)


Abstract

Accurate estimation of battery state of health is crucial for effective electric vehicle battery management. Here, we propose five health indicators that can be extracted online from real-world electric vehicle operation and develop a machine learning-based method to estimate the battery state of health. The proposed indicators provide physical insights into the energy and power fade of the battery and enable accurate capacity estimation even with partially missing data. Moreover, they can be computed for portions of the charging profile and real-world driving discharging conditions, facilitating real-time battery degradation estimation. The indicators are computed using experimental data from five cells aged under electric vehicle conditions, and a linear regression model is used to estimate the state of health. The results show that models trained with power autocorrelation and energy-based features achieve capacity estimation with maximum absolute percentage error within 1.5% to 2.5%.


Introduction

The pressing concern of global warming is driving a global shift towards electrified mobility. With the transportation sector contributing to approximately 12% of all global emissions¹, adjustments are required in order to transition to a zero-emissions energy sector. Studies by the Intergovernmental Panel on Climate Change¹ and the International Energy Agency² emphasize the critical need for clean transportation solutions to address the urgent issue of climate change. This has driven governments and policymakers to innovate and collaborate in advancing electric vehicle (EV) technologies.

Lithium-ion batteries (LIBs) are the preferred energy storage technology for EVs due to their superior power and energy density, which enables longer driving ranges compared to other battery technologies³. For a compelling and sustainable EV mass market, accurate state of health (SOH) estimation⁴ and remaining useful life (RUL)⁵ prediction of LIB systems are essential. Existing methods for SOH estimation and RUL prediction can be broadly divided into model-based and data-driven approaches. Model-based estimation approaches rely on empirical or equivalent circuit models (ECMs), or electrochemical models, and formulate estimation algorithms around them. Various ECM-based filters for SOH estimation have been proposed in the literature, including Extended Kalman Filter⁶, dual and joint Extended Kalman Filter⁷, Unscented Kalman Filter⁸, Adaptive Extended Kalman Filter⁹, Particle Filter¹⁰ and genetic algorithms¹¹.

Other methods for SOH estimation and RUL prediction, utilizing empirical degradation models, include Unscented Kalman Filters¹² and Particle Filters¹³. In a Bayesian Monte Carlo approach¹⁴, the parameters of an empirical capacity model are updated to compute the posterior probability density function for capacity fade prediction. Despite their simplicity, these methods lack explicit physical understanding and require significant calibration effort. Electrochemical battery models^15,16,17, which demand more computational power, have also been employed, and adaptive observers based on the enhanced single particle model^18,19 have been tested in a battery-in-the-loop setup.

With the advancement of cloud computing technologies and the Internet of Things, data-driven methods for battery SOH estimation, such as linear regression, Gaussian process regression, support vector machines, or artificial neural networks, have gained traction in recent years²⁰. For instance, multiple linear regression models have been trained using descriptive features of the voltage distribution²¹ or incremental capacity curves²² to predict capacity fade and resistance increase. Among the more sophisticated prediction models, Gaussian process regression models have been used for capacity estimation²³, taking as inputs different statistical features extracted from the charging curves²⁴. Furthermore, neural networks²⁵ are used to establish relationships between input features, such as equivalent circuit model parameters and state of charge (SOC), and battery capacity fade. Regarding RUL prediction, support vector machines⁴ and random forests²⁶ are utilized. These methods are effective in forecasting the remaining operational lifespan of batteries based on historical data and operational conditions.

Battery SOH estimation works can be classified into three primary categories based on the dataset used for development. The first category includes datasets acquired from field operations, which accurately reflect the aging phenomena affecting batteries in real-world EV driving²⁷. However, a challenge with these datasets is the absence of a baseline for evaluating SOH. To address this limitation, several studies^28,29,30 utilize internal resistance as a direct metric for assessing battery SOH. Alternatively, some studies³¹ suggest using the peak values derived from incremental capacity curves to overcome this challenge. However, these metrics can be challenging to evaluate due to their strong dependence on operating conditions, such as temperature. Capacity fade is often measured using Coulomb counting^32,33, which involves integrating battery management system (BMS) current over a limited SOC window. This method, however, may produce inaccurate results due to high sensor noise and quantization in in-vehicle sensors. Alternatively, tests are conducted in a temperature-controlled environment³⁴ to ensure consistent capacity measurements that serve as ground truth. Here, SOH is estimated using supervised learning models that directly utilize BMS signals (such as voltage, current, SOC, and pack temperature) as inputs.

The second category of SOH estimation works relies on datasets collected in laboratory settings^24,35,36,37,38. In these datasets, cells are cycled with current profiles that do not accurately represent actual EV battery operation. As a result, features computed using these datasets are, in general, neither transferable nor generalizable to real-world applications.

The third category of datasets utilizes data collected in laboratory settings designed to mimic real-use EV scenarios. Examples include ARTEMIS³⁹ and the Urban Dynamometer Driving Schedule (UDDS)⁴⁰. These datasets provide more realistic conditions for testing and developing battery SOH estimation algorithms, while still providing ground truth capacity through periodic reference performance tests (RPTs).

Using these datasets, different machine learning algorithms, such as support vector machine⁴, Gaussian process regression, or neural network⁴¹, have been used to estimate SOH for batteries undergoing EV driving cycles using statistical features from current, voltage, and temperature signals as input features. Other studies have proposed various physics-based health indicators to estimate battery capacity fade, e.g., features derived from the ECM⁴² or the time taken for voltage to rise from a low to a high level during the charging process⁴³. Another crucial physical quantity linked to battery aging is internal resistance, with several nuanced indicators proposed in the literature^44,45. Most of these indicators are computed using ECMs and algorithms such as recursive least squares. However, these methods typically come with increased computational requirements. Another approach^27,46 involves evaluating the indicators during the vehicle acceleration (discharge) and braking (charge), requiring less computational power and facilitating its real-time implementation and integration within the vehicle BMS. Another physics-based SOH indicator is charging impedance²⁷, which combines variations in electrolyte resistance, charge transfer resistance, and polarization due to aging. This feature can be extracted from the initial portion of the charging phase⁴⁷. Additionally, the energy during charging and discharging can offer valuable insights into battery degradation. Energy metrics are typically calculated over extended portions of full charging profiles to effectively estimate capacity fade^48,49,50.

The health features used in previous studies are typically based on idealized constant charging and discharging profiles. However, these profiles do not accurately reflect how electric vehicles are charged and discharged in real-world conditions. Most research has focused on extracting health indicators during complete, repetitive charging cycles, where the battery is charged from a low SOC to a high SOC. In reality, charging patterns are much more variable, and it’s uncommon for batteries to go through full cycles or always follow the same charging profile. This discrepancy makes it difficult to apply findings from controlled experiments directly to real-world EV use.

The contributions of this work are the following. First, our work systematically formulates various SOH indicators based on domain knowledge and proposes a framework for their integration into BMS. The proposed SOH indicators include: power autocorrelation, resistance, charging impedance, energy during charging, and energy during discharging. Second, unlike previous research⁵¹ that focused on voltage signal autocorrelation, here, the power autocorrelation is used to quantify the battery’s power-delivery capability over time. Additionally, the proposed SOH indicators are derived from an experimental dataset⁴⁰ that replicates real-world EV battery operation. Unlike most prior studies^48,49,50 that rely on constant current discharging profiles, in this work, energy consumption is evaluated during discharging under realistic EV driving cycles. Moreover, a windowed approach is proposed to assess energy consumption during charging, thereby improving the effectiveness of the energy as an indicator of battery health, especially in scenarios involving partial charging. Furthermore, we modify the formulation of the charging impedance indicator²⁷ by calculating it over an optimized voltage window and averaging the values within this range to improve accuracy and reliability in assessing battery health during partial charging. Through correlation analysis, power autocorrelation, energy during charging, and energy during discharging emerge as the most effective indicators for capacity estimation. It is worth noting that the proposed SOH indicators are agnostic to specific battery chemistries. Moreover, they operate independently of cumulative data such as total aging cycles or ampere-hour throughput. This design choice helps mitigate inaccuracies that could arise from sensor errors or insufficient data. These indicators can be easily evaluated during EV operation. This makes them suitable for real-time deployment and integration into existing BMS strategies.

The battery capacity is estimated through the machine learning pipeline shown in Supplementary Note 1 where 1) SOH indicators are first extracted from the experimental dataset⁴⁰, 2) the correlation between the indicators and SOH is analyzed through a regression analysis, and 3) a linear regression model (LRM) is trained to estimate capacity fade. The results show that models trained using power autocorrelation and energy-based features obtain capacity estimation with absolute percentage errors (APE) ranging between 1.5% and 2.5%.

Previous works that used linear regression models to estimate battery SOH^21,22 are based on features selected over simplistic charge/discharge profiles that are not representative of EV driving.

Results

The SOH indicators are extracted using data from five Nickel Manganese Cobalt (NMC)/Graphite cells⁴⁰ (reported in Table 1 and detailed in Sec. Cell cycling and experimental dataset). In this section, a thorough analysis of each indicator and regression analysis is carried out, and the estimation results obtained by the linear regression model are shown.

Table 1 Battery cells


SOH indicators analysis

Power autocorrelation function

The autocorrelation function of the battery’s power signal, evaluated during discharge, has been shown to offer valuable insights into the battery SOH⁵¹. Assuming that discharging occurs periodically with an identical current profile, the power autocorrelation indicator P_Autocorr varies as the battery ages, providing a method to monitor battery health.

Of particular interest is the change in the central peak of the power autocorrelation function (see Fig. 1a). The reduction in this peak, defined as P_{Autocorr,loss} = (P_{Autocorr,fresh} − P_{Autocorr,i})/P_{Autocorr,fresh} ⋅ 100, where P_{Autocorr,fresh} is the central peak value for the fresh cell and P_{Autocorr,i} is the central peak value during cycle i, correlates with a decrease in capacity. This relationship is illustrated in Fig. 1b, where the power autocorrelation loss shows a strong linear relation with capacity loss. Capacity loss is calculated as Q_{cell,loss} = (Q_{cell,fresh} − Q_{cell,i})/Q_{cell,fresh}, with Q_{cell,fresh} and Q_{cell,i} representing the fresh cell capacity and the cell capacity at cycle i, respectively.
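As a rough illustration of the loss metric above, the following Python sketch computes the central (zero-lag) peak of the power autocorrelation and its percentage reduction relative to a fresh cell. The function names, the unnormalized autocorrelation, and the synthetic signals are assumptions made for illustration only; the definition actually used in this work is the one given in Sec. Definition of SOH indicators.

```python
import numpy as np

def autocorr_central_peak(voltage, current):
    """Central (zero-lag) value of the autocorrelation of power = V * I."""
    power = np.asarray(voltage, dtype=float) * np.asarray(current, dtype=float)
    # mode="full" returns lags -(n-1)..(n-1); zero lag sits at index n-1.
    acf = np.correlate(power, power, mode="full")
    return acf[len(power) - 1]

def percent_loss(fresh_value, aged_value):
    """(fresh - aged) / fresh * 100, mirroring the P_Autocorr,loss expression."""
    return (fresh_value - aged_value) / fresh_value * 100.0

# Synthetic example (assumed values): identical current profile in both
# cycles, with a lower terminal voltage for the aged cell.
t = np.linspace(0.0, 10.0, 201)
current = 2.0 + np.sin(t)          # A
v_fresh = np.full_like(t, 3.70)    # V
v_aged = np.full_like(t, 3.60)     # V

peak_fresh = autocorr_central_peak(v_fresh, current)
peak_aged = autocorr_central_peak(v_aged, current)
loss_pct = percent_loss(peak_fresh, peak_aged)
print(f"P_Autocorr loss: {loss_pct:.2f} %")
```

Because the zero-lag value equals the sum of squared power samples, the comparison is only meaningful when both cycles follow the same current profile, which matches the periodicity assumption stated in the text.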

Despite the promising potential of this indicator for estimating capacity loss, it is important to highlight that the periodicity of the current profile may not hold true in real-driving conditions. Nevertheless, this study suggests that this indicator can be engineered in an offline setting, for example, as part of onboard diagnostics routines. In this work, the power autocorrelation function proves to be an effective SOH indicator given consistent usage of the UDDS discharge profile.

Resistance

Abrupt charge and discharge events, related to braking and acceleration maneuvers, respectively, offer the opportunity to evaluate the battery’s internal resistance²⁷. As the battery ages, various factors such as electrode degradation, electrolyte breakdown, and formation of passivation layers contribute to an increase in its internal resistance, R. This increase limits the flow of ions within the battery, reducing conductivity and affecting the battery’s power output capability. As resistance increases, less power can be delivered to the motors due to higher Joule losses.

Demanding acceleration and braking events lead to changes in the battery current, referred to as current peaks. The resistance is calculated at each discharge current peak corresponding to an acceleration event over the discharging phase of the aging cycle, as described in Sec. Definition of SOH indicators. It is important to note that the battery’s internal resistance is influenced by factors such as SOC (see Supplementary Note 3), C-rate, and temperature. For accurate aging assessments in real-world scenarios, resistance should be measured under consistent conditions throughout the battery’s lifespan. In this work, temperature effects on this indicator are not studied since the cells are maintained in a controlled temperature environment.

A single resistance value is computed for each discharge phase by averaging the resistances calculated at its current peaks; each discharge phase consists of multiple concatenated UDDS cycles between two charging phases, as further detailed in Sec. Cell cycling and experimental dataset. This averaging minimizes noise and variations in the resistance measurements, as shown in Fig. 2, offering a more consistent and representative value to assess battery health. Figure 2a highlights the importance of determining the average resistance. Despite the large standard deviation observed in the distribution of internal resistances for each discharge event, the average values, represented by green points, clearly exhibit an increasing trend as the battery ages.

Additionally, Fig. 2b shows the percentage increase in average internal resistance for all five cells, correlated with their corresponding capacity losses. This increase is calculated as R_increase = (R_i − R_fresh)/R_fresh ⋅ 100, where R_fresh represents the average internal resistance measured during the first discharging phase of the cell, and R_i denotes the average internal resistance determined during the discharging at cycle i.
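The averaging and percentage-increase steps can be sketched in Python as follows. The resistance formula itself is defined in Sec. Definition of SOH indicators and is not reproduced here, so this sketch assumes a common approximation, the ratio of voltage change to current change across an abrupt current step. The sign convention (discharge current positive), the threshold `min_current_step`, and all numeric values are likewise assumptions.

```python
import numpy as np

def mean_peak_resistance(voltage, current, min_current_step=1.0):
    """Average -dV/dI over abrupt discharge-current increases.

    Assumes discharge current is positive, so an acceleration event appears
    as a jump of at least `min_current_step` amperes between two samples.
    """
    v = np.asarray(voltage, dtype=float)
    i = np.asarray(current, dtype=float)
    dv, di = np.diff(v), np.diff(i)
    peaks = di >= min_current_step
    if not peaks.any():
        return float("nan")   # no qualifying event in this discharge phase
    # Voltage drops when discharge current rises, hence the minus sign.
    return float(np.mean(-dv[peaks] / di[peaks]))

def percent_increase(fresh_value, aged_value):
    """(aged - fresh) / fresh * 100, mirroring the R_increase expression."""
    return (aged_value - fresh_value) / fresh_value * 100.0

# Synthetic cell obeying V = OCV - R * I (assumed 30 mOhm fresh, 36 mOhm aged)
current = np.array([0.0, 0.0, 2.0, 2.0, 0.0, 3.0, 3.0])
r_fresh = mean_peak_resistance(3.7 - 0.030 * current, current)
r_aged = mean_peak_resistance(3.7 - 0.036 * current, current)
r_increase = percent_increase(r_fresh, r_aged)
print(f"R_fresh = {r_fresh:.3f} Ohm, increase = {r_increase:.1f} %")
```

Averaging over all qualifying events in a discharge phase is what suppresses the event-to-event scatter described above; with real data the threshold would need tuning to the sensor resolution.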

Charging impedance

The charging impedance²⁷, Z_CHG, represents the battery’s resistance to the flow of electrons during charging. Variations in Z_CHG reflect how this resistance evolves as the battery ages. The Z_CHG profiles for three cells (V4, W8, and W9), charged at different C-rates, are illustrated in Fig. 3 as a function of cell degradation and SOC.

The rising trend of Z_CHG over the cells’ lifetime aligns with the understanding that, as the battery ages, its overpotential increases due to factors such as the growth of the Solid-Electrolyte Interface, increased contact resistance, and changes in reaction kinetics and transport dynamics^6,52. Additionally, it is important to note that the Z_CHG profiles reach different SOC values at the end of charge (at 4 V). This phenomenon can be attributed to the varying polarization losses resulting from the different C-rates used during charging for cells V4, W8, and W9⁵³. The charging impedance indicator is computed by averaging the impedance within the specific voltage range [V_in = 3.8 V, V_fin = 3.9 V], which is selected through the analysis reported in Supplementary Note 4. As shown in Fig. 4b, the increase in charging impedance (Z_CHG,increase = (Z_CHG,i − Z_CHG,fresh)/Z_CHG,fresh ⋅ 100) is highly correlated with capacity loss across all the battery cells. Therefore, the charging impedance Z_CHG can be used directly as a feature to correlate with capacity loss.

Energy during charging

The energy during charging indicator, E_ch, quantifies the energy stored in the battery during charging. This is computed by integrating the battery power within a specific voltage range [V_in,ch, V_fin,ch] (as detailed in Sec. Definition of SOH indicators). Figure 5a illustrates E_ch in relation to the charging duration required to reach V_fin,ch from V_in,ch. Figure 5b shows the energy during charging over the voltage range [V_in,ch = 3.6 V, V_fin,ch = 3.9 V] as a function of capacity loss. The y-axis of Fig. 5b quantifies the percentage energy loss during charging for each cell. Energy loss during charging for each cell is computed as E_ch,loss = (E_ch,fresh − E_ch,i)/E_ch,fresh ⋅ 100, where E_ch,fresh is the energy for the fresh cell and E_ch,i is the energy charged during aging cycle i of the same cell. These results show that capacity loss is linearly correlated with energy loss during charging over the selected voltage range.
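The windowed integration described above can be sketched in Python as follows. This is a minimal illustration, assuming the voltage crosses the window only once (as in a constant-current charge) and using a trapezoidal rule; the function name, the Wh unit, and the synthetic profile are assumptions, not the authors' implementation.

```python
import numpy as np

def energy_in_voltage_window(time_s, voltage, current, v_start, v_end):
    """Energy (Wh) exchanged while the voltage lies between v_start and v_end.

    Assumes the voltage crosses the window once (e.g. a constant-current
    charge), so the selected samples are contiguous in time.
    """
    t = np.asarray(time_s, dtype=float)
    v = np.asarray(voltage, dtype=float)
    i = np.asarray(current, dtype=float)
    lo, hi = min(v_start, v_end), max(v_start, v_end)
    inside = (v >= lo) & (v <= hi)
    if inside.sum() < 2:
        return float("nan")   # window not covered by this (partial) cycle
    power = np.abs(v[inside] * i[inside])   # W
    dt = np.diff(t[inside])                 # s
    energy_j = np.sum(0.5 * (power[1:] + power[:-1]) * dt)  # trapezoidal rule
    return float(energy_j / 3600.0)

# Synthetic 1 A constant-current charge with a linear voltage ramp (assumed).
# Window edges are placed between samples to avoid floating-point edge cases.
t = np.arange(11) * 360.0               # s
v = np.linspace(3.5, 4.0, 11)           # V
e_ch = energy_in_voltage_window(t, v, np.ones(11), 3.575, 3.925)
print(f"E_ch = {e_ch:.3f} Wh")
```

Returning NaN when the window is not covered is one possible way to handle partial charges; a deployment would more likely skip such cycles and wait for the next charge that spans the window.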

Energy during discharging

The energy during discharging indicator, E_dis, quantifies the energy delivered by the battery during its discharge phase. This is computed by integrating the battery power over a specific voltage range [V_in,dis, V_fin,dis], as detailed in Sec. Definition of SOH indicators. Figure 5c illustrates E_dis in relation to the discharging duration needed to reach V_fin,dis from V_in,dis. Figure 5d displays the energy during discharging over the voltage range [V_in,dis = 3.85 V, V_fin,dis = 3.4 V] as a function of capacity loss. The y-axis of Fig. 5d quantifies the percentage energy loss during discharging for each cell. Energy loss during discharging for each cell is computed using E_dis,loss = (E_dis,fresh − E_dis,i)/E_dis,fresh ⋅ 100, where E_dis,fresh represents the energy of a fresh cell, and E_dis,i is the energy discharged during aging cycle i of the same cell. The results indicate a linear relationship between capacity loss and energy loss during discharging within the selected voltage range. In real EV scenarios, the variability in discharging rates complicates the consistent computation and monitoring of E_dis. A practical approach is to compare E_dis across driving scenarios with similar driving styles to account for this variability.

SOH indicators regression analysis

The health indicators are pre-processed according to the pipeline outlined in Supplementary Note 1. This process involves calculating incremental values for each feature: ΔP_Autocorr (power autocorrelation), ΔR_ch (resistance), ΔZ_CHG^NORM (normalized charging impedance), ΔE_ch (energy during charging), and ΔE_dis (energy during discharging). These incremental values are derived by subtracting the initial feature value, measured during the first aging cycle, from the value at each subsequent aging cycle i throughout the cell’s life cycle. Additional details are provided in Sec. Methods. In this work, we use features’ incremental values to simplify the detection of aging trends. For each cell, we assess the correlation between its capacity loss and feature variations using Pearson’s correlation coefficient r, defined as:

$$r=\frac{\sum_{i=1}^{N}(X_i-\overline{X})(Y_i-\overline{Y})}{\sqrt{\sum_{i=1}^{N}(X_i-\overline{X})^{2}\,\sum_{i=1}^{N}(Y_i-\overline{Y})^{2}}}, \tag{1}$$

where X_i represents the value of a specific incremental feature for a given cell at the i-th aging cycle, Y_i is the corresponding capacity loss value for the same cell at that cycle, $\overline{X}$ is the mean of the incremental feature values, $\overline{Y}$ is the mean of the capacity loss values across all cycles, and N is the total number of data points (aging cycles analyzed). The results are shown in the heatmap of Fig. 6a. Each cell shows a high Pearson’s correlation coefficient between capacity loss and each feature, underscoring that the variations in these features are consistent indicators of aging across all the cells. However, since feature trends can vary across different cells, an additional analysis was conducted to identify features with more generalizable trends. We performed a correlation analysis between the extracted incremental features and capacity fade across all cells. This approach helps identify features that consistently reflect cell aging, regardless of individual cell differences. Figure 6b shows that some indicators generalize better across different cells.
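The incremental-feature step and Eq. (1) can be written out directly. The Python sketch below is illustrative only: the function names and the per-cycle numbers are hypothetical and are not taken from the dataset.

```python
import numpy as np

def incremental(feature_values):
    """Subtract the first-aging-cycle value from every cycle's value."""
    f = np.asarray(feature_values, dtype=float)
    return f - f[0]

def pearson_r(x, y):
    """Pearson's correlation coefficient, term by term as in Eq. (1)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc, yc = x - x.mean(), y - y.mean()
    return float(np.sum(xc * yc) / np.sqrt(np.sum(xc**2) * np.sum(yc**2)))

# Hypothetical per-cycle values for a single cell
e_ch = np.array([10.0, 9.9, 9.7, 9.6, 9.4])           # Wh
capacity_loss = np.array([0.0, 1.1, 2.9, 4.2, 6.0])   # %

r = pearson_r(incremental(e_ch), capacity_loss)
print(f"r = {r:.4f}")
```

In this made-up example the energy falls as capacity loss grows, so r is close to −1. Subtracting a constant does not change r, so the incremental transform affects the feature's offset but not its correlation with capacity loss.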

We select features to train an estimation model according to two different cases. In the first case, power autocorrelation (P_Autocorr) is selected as the sole feature due to its superior overall performance. In the second case, we choose the best-performing feature for charging (energy during charging, E_ch) and the best-performing feature for discharging (energy during discharging, E_dis), excluding power autocorrelation.

The strong correlation between the extracted features and capacity fade can be attributed to the physical phenomena driving battery degradation. The linear relationship observed between charging impedance, resistance, and energy features with respect to charge throughput aligns with the linear trend of the capacity fade curve⁵⁴. Given that the cells are cycled within a linear SOC window of 80% to 20% at ambient temperature, Solid-Electrolyte Interface layer growth is considered the dominant aging mechanism, leading to a linear capacity decrease trajectory. However, to thoroughly assess the aging modes present in the cells, a post-mortem analysis would be necessary.

SOH estimation

In this paper, we use the capacity calculated at C/20 during RPTs as the SOH metric. Additionally, for the purpose of training the machine learning models, the experimental C/20 capacity points are augmented using a linear data augmentation method, as discussed in Sec. Data augmentation approach.
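The text above states only that the augmentation is linear; the details are in Sec. Data augmentation approach. One plausible reading, shown in the Python sketch below purely as an assumption, is piecewise-linear interpolation of the C/20 capacity between consecutive RPTs so that every aging cycle receives a capacity label. The function name and all numbers are hypothetical.

```python
import numpy as np

def augment_capacity(rpt_cycles, rpt_capacity_ah, target_cycles):
    """Linearly interpolate C/20 capacity between consecutive RPTs.

    `rpt_cycles` must be increasing, as required by np.interp.
    """
    return np.interp(target_cycles, rpt_cycles, rpt_capacity_ah)

# Hypothetical RPT measurements at cycles 0, 100 and 200
rpt_cycles = np.array([0, 100, 200])
rpt_capacity_ah = np.array([4.85, 4.70, 4.60])

all_cycles = np.arange(0, 201, 50)
capacity_ah = augment_capacity(rpt_cycles, rpt_capacity_ah, all_cycles)
print(capacity_ah)
```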

The features selected through the regression analysis are utilized to estimate capacity loss using a data-driven model. The performance of various models, namely, LRM, feed-forward neural networks, autoregressive moving average with extra input, and recurrent neural networks, is compared using the same training and testing datasets, as detailed in Supplementary Note 6. Despite its simplicity, the LRM achieves estimation performance comparable to that of more complex models, owing to the strong linear correlation between the SOH indicators and capacity degradation. Therefore, the LRM is chosen for capacity loss estimation due to its lower computational time. Additionally, the LRM has the advantage of requiring fewer parameters to tune and fewer training samples compared to neural network-based models⁵⁵. The LRM is trained using distinct sets of incremental SOH features: first with power autocorrelation, and then with energy during charging and energy during discharging (see Sec. SOH indicators regression analysis). Additionally, the estimation capabilities of the selected features are evaluated in two Scenarios. In Scenario 1 the LRM is trained exclusively on the data from cell W8 and tested on the other cells. In Scenario 2 the LRM is trained using data from all cells except the test cell. In the second Scenario, for cross-validation, the data is split into two subsets: one for the target cell and another for the remaining cells. The model is trained on the data from the remaining cells and tested on the data from the target cell.
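The two training scenarios can be sketched in Python with an ordinary least-squares fit. This is a minimal illustration, not the authors' code: the helper names, the data layout (a dictionary mapping each cell to a feature matrix and a capacity-loss vector), and the noise-free synthetic data are all assumptions.

```python
import numpy as np

def _as_2d(features):
    """Return features as an (n_cycles, n_features) float array."""
    f = np.asarray(features, dtype=float)
    return f.reshape(len(f), -1)

def fit_lrm(features, capacity_loss):
    """Least-squares fit of capacity_loss = b0 + b1*x1 + ... ; returns coefficients."""
    x = _as_2d(features)
    design = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(design, np.asarray(capacity_loss, dtype=float), rcond=None)
    return coef

def predict_lrm(coef, features):
    x = _as_2d(features)
    return np.column_stack([np.ones(len(x)), x]) @ coef

def scenario_1(data, train_cell):
    """Train on one cell, predict every other cell."""
    coef = fit_lrm(*data[train_cell])
    return {c: predict_lrm(coef, x) for c, (x, _) in data.items() if c != train_cell}

def scenario_2(data):
    """Leave-one-cell-out: train on all cells except the test cell."""
    out = {}
    for test_cell in data:
        x_train = np.vstack([_as_2d(x) for c, (x, _) in data.items() if c != test_cell])
        y_train = np.concatenate([np.asarray(y, dtype=float)
                                  for c, (_, y) in data.items() if c != test_cell])
        out[test_cell] = predict_lrm(fit_lrm(x_train, y_train), data[test_cell][0])
    return out

# Synthetic, noise-free data: columns stand in for the incremental E_ch and E_dis
rng = np.random.default_rng(0)
data = {}
for cell in ["V4", "W5", "W7", "W8", "W9"]:
    x = rng.uniform(-1.0, 0.0, size=(20, 2))
    data[cell] = (x, 0.5 - 8.0 * x[:, 0] - 3.0 * x[:, 1])

pred_s1 = scenario_1(data, "W8")
pred_s2 = scenario_2(data)
```

With noise-free synthetic data both scenarios recover the targets exactly, so the example only checks the mechanics of the split; it says nothing about the accuracy reported for the real cells.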

Since the autocorrelation function of the power signal ΔP_Autocorr exhibits the highest correlation with capacity fade, the data-driven model is initially trained using ΔP_Autocorr as input. Figure 7 displays the capacity estimation results for both ΔP_Autocorr and the energy-based features. In Scenario 1, the training dataset consists solely of data from cell W8, while in Scenario 2, it includes data from all cells except the test cell. The absolute percentage error (as defined in Sec. Methods) remains consistently below 1.5%, underscoring the relevant information provided by this individual feature. Moreover, using a more extensive set of training data from multiple cells (Scenario 2) does not improve estimation accuracy, leading us to conclude that ΔP_Autocorr is effective even with limited data. However, this feature has limitations in real-world scenarios and is better suited for offline diagnostics rather than online applications. It is also important to note that gaps in the observed capacity curves are due to voltage measurement anomalies, which resulted in unreliable feature values. This irregularity is attributed to unidentified equipment issues, as discussed in Sec. Cell cycling and experimental dataset and detailed further in Supplementary Note 5.
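The error metric quoted above is defined in Sec. Methods, which is not reproduced here. The Python sketch below assumes the conventional form, the absolute difference between measured and estimated capacity divided by the measured capacity, times 100. The function name and the capacity values are hypothetical.

```python
import numpy as np

def absolute_percentage_error(q_true, q_est):
    """|q_true - q_est| / q_true * 100, evaluated element-wise (assumed form)."""
    q_true = np.asarray(q_true, dtype=float)
    q_est = np.asarray(q_est, dtype=float)
    return np.abs(q_true - q_est) / q_true * 100.0

# Hypothetical measured and estimated capacities in Ah
q_true = np.array([4.85, 4.70, 4.60])
q_est = np.array([4.80, 4.75, 4.60])

ape = absolute_percentage_error(q_true, q_est)
max_ape = ape.max()
print(f"max APE = {max_ape:.2f} %")
```

Reporting the maximum over all cycles, rather than the mean, matches how the abstract and results state an upper bound on the error.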

Fig. 7: SOH estimation results from the linear regression model (LRM) using power autocorrelation (LRM_{P_Autocorr}), and energy during charging and discharging (LRM_{E_ch,E_dis}) as input features versus the aging cycle number (Cycle).

The LRM is subsequently trained using features that can be calculated during vehicle operation, specifically during driving and charging. The features selected for their high linear correlation with capacity during charging and discharging are energy during charging (ΔE_ch) and energy during discharging (ΔE_dis), respectively. As illustrated in Fig. 7, accurate capacity fade estimation is achieved with these features. Notably, when the LRM is trained using data from only cell W8 (Scenario 1), it achieves an absolute percentage error below 2.5% when tested on data from the other four cells. This result highlights the strong estimation capability of these features even with a limited dataset. For a more comprehensive analysis, the same estimation model is trained using data from multiple cells, leading to improved performance with the larger dataset. When using data from four cells for training (Scenario 2) and testing on the remaining cell, the absolute percentage error is below 1.6%. Notably, the estimation models perform well even for cells like W7 and W5, where some data is missing. This adaptability of the features and estimation models to partially available data is particularly advantageous in real-world scenarios, where acquiring complete EV battery data may not always be feasible. Moreover, to evaluate if adding extra features alongside the energy-based indicators could enhance model estimation capabilities, the LRM was also trained with incremental resistance and charging impedance included as additional inputs. However, the performance of the model with these additional features was worse than when using only energy during charging and discharging, as shown in Fig. 8. This indicates that the inclusion of resistance and charging impedance may introduce more noise than valuable information. 
It should be noted that for cell W7, only charging impedance is used as an additional feature, as the resistance data was compromised due to acquisition issues discussed in Sec. Methods. The superior performance of energy during charging and discharging as SOH indicators, compared to the increase in resistance or charging impedance, can be attributed to several factors. Energy loss reflects not only resistance increases but also other factors such as heat generation, electrode degradation, and Solid-Electrolyte Interface formation, which impact overall energy efficiency. Additionally, the integration of the power signal offers a comprehensive measure of battery energy dynamics throughout an entire cycle, whereas resistance and charging impedance are computed over shorter time periods, making them more sensitive to short-term fluctuations.

Fig. 8: SOH estimation results from the linear regression model (LRM) using charge and discharge energies, charging impedance, and resistance as features versus the aging cycle number (Cycle).

Conclusions

This work extracts and evaluates five knowledge-based SOH indicators, demonstrating their effectiveness as inputs to ML models for estimating capacity fade. The formulation of these indicators is guided by battery domain knowledge, allowing for the quantification of internal state variability due to battery degradation. Since none of the indicators rely on cumulative information (such as cycle number or Ah-throughput), they are suitable for real-world applications even with partial battery history. The high correlation between the indicators and capacity indicates that battery aging mechanisms leading to capacity fade are directly related to energy decrease and impedance rise. Two subsets of the engineered indicators, i.e., power autocorrelation on its own, and energy during charging together with energy during discharging, were utilized to train the estimation model for accurate cell capacity estimation. Due to their high correlation with capacity fade, combining energy during charging and energy during discharging as inputs results in accurate SOH estimation, with an absolute percentage error consistently below 2.5%. Power autocorrelation is the most informative feature, enabling precise capacity fade estimation with an absolute percentage error below 1.5%, even with limited training data. However, its effectiveness is influenced by the periodicity of discharging events. Consequently, power autocorrelation cannot be directly used as an SOH indicator in real-world driving scenarios but could be incorporated into a diagnostic tool by applying a periodic current signal to the battery when it is not in use. These findings suggest that domain knowledge-based features have the potential to be used as online tools for real-time capacity estimation. However, the model’s effectiveness may be limited in practical applications. The dataset used in this study does not account for temperature variations or practical discharge events typical in real-world battery usage.
Additionally, the current and voltage signals used to extract features have a high signal-to-noise ratio, which may not always be present in EV batteries. Having demonstrated the potential of these features on the studied dataset⁴⁰, further investigations will be conducted using field data as future work. While this study primarily focuses on capacity estimation, utilizing a larger dataset could allow for the application of these indicators in RUL prediction. Extending the method proposed in this paper, these indicators could be integrated into forecasting models, enabling the BMS to anticipate and effectively manage battery capacity degradation.

Methods

Cell cycling and experimental dataset

The experimental dataset⁴⁰ used in this work involves INR21700-M50T battery cells with graphite/silicon anode and nickel manganese cobalt oxides (NMC) cathode tested over a period of 30 months. For each cell, periodic RPTs, including C/20 capacity tests, Hybrid Pulse Power Characterization, and Electrochemical Impedance Spectroscopy, were conducted to assess the battery aging from fresh conditions. The cells underwent aging cycles as described in Supplementary Note 2. Each cycle includes a Constant Current-Constant Voltage (CC-CV) charge phase followed by a discharge phase. Specifically, there are two constant-current charge phases. Once the batteries reach 20% SOC (from the discharge phase), they are charged through the CC-a phase (at different C-rates) until reaching 4 V. They then continue charging at C/4 until 4.2 V, followed by the CV phase until the current drops below 50 mA. The discharge phase, using concatenated UDDS driving profiles, simulates EV battery discharging, reducing the cell’s SOC from 80% to 20%. Aging cycles conducted between the j^th and (j+1)^th RPTs for each cell are grouped into the j^th batch of aging cycles. Supplementary Note 2 details the number of aging cycles in each batch for all cells used in this study. Among the ten cells (G1, V4, V5, W3, W4, W5, W7, W8, W9, W10) in the dataset, five (V4, W5, W7, W8, W9) are used in this study, as detailed in Table 1. The remaining cells were excluded for the following reasons. Cells W3, W10, and G1 were charged using a fast-charging 3C current profile during the CC-a phase, resulting in a very short charging interval that hindered feature extraction. Cell V5 was excluded due to insufficient aging, having undergone only 59 cycles with less than a 3% capacity decrease from the beginning of life. Cells W4, W5, and W7 were reported to have voltage measurement anomalies due to experimental issues, as noted in the “README” file of the dataset⁴⁰ and detailed in Supplementary Note 5. Specifically, cell W4 was affected for 310 cycles out of the total 760.

Data augmentation approach

This work uses capacity to describe battery SOH. Given the limited number of RPTs, we have adopted an approach that uses data augmentation with linear interpolation for training purposes. For each cell, to assign a capacity value at every aging cycle i contained in batch j, we use the capacity values measured at the j-th and (j + 1)-th RPTs and estimate the capacity for cycle i, Q_i as follows:

$${Q}_{i}=\frac{i-{{{\rm{cycle}}}}_{j}^{{{\rm{RPT}}}}}{{{{\rm{cycle}}}}_{j+1}^{{{\rm{RPT}}}}-{{{\rm{cycle}}}}_{j}^{{{\rm{RPT}}}}}\times \left({Q}_{j+1}^{{{\rm{RPT}}}}-{Q}_{j}^{{{\rm{RPT}}}}\right)+{Q}_{j}^{{{\rm{RPT}}}}$$

(2)

where ${{{\rm{cycle}}}}_{j}^{{{\rm{RPT}}}}$ and ${{{\rm{cycle}}}}_{j+1}^{{{\rm{RPT}}}}$ denote the numbers of aging cycles completed before the j-th and (j + 1)-th RPTs, respectively, while ${Q}_{j}^{{{\rm{RPT}}}}$ and ${Q}_{j+1}^{{{\rm{RPT}}}}$ represent the capacity values measured during these tests for the considered cell. Index i ranges from 1 to the number of aging cycles a cell has undergone (Table 1, fourth column), while index j ranges from 1 to the number of times the cell has been tested (Table 1, third column).

For example, capacity for cell V4 at aging cycle #30, namely ${Q}_{30}^{{{\rm{V4}}}}$, is defined as:

$${Q}_{30}^{{{\rm{V4}}}}=\frac{30-{{{\rm{cycle}}}}_{2}^{{{\rm{RPT}}},{{\rm{V}}}4}}{{{{\rm{cycle}}}}_{3}^{{{\rm{RPT}}},{{\rm{V}}}4}-{{{\rm{cycle}}}}_{2}^{{{\rm{RPT}}},{{\rm{V}}}4}}\times \left({Q}_{3}^{{{\rm{RPT}}},{{\rm{V}}}4}-{Q}_{2}^{{{\rm{RPT}}},{{\rm{V}}}4}\right)+{Q}_{2}^{{{\rm{RPT}}},{{\rm{V}}}4}$$

(3)

where ${{{\rm{cycle}}}}_{2}^{{{\rm{RPT}}}}=20$ and ${{{\rm{cycle}}}}_{3}^{{{\rm{RPT}}}}=45$, since cell V4 has undergone 20 aging cycles before RPT #2 and 45 aging cycles before RPT #3.
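The interpolation in Equations (2) and (3) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `interpolate_capacity` is hypothetical, the capacity values are made up, and RPT #1 is assumed to take place at zero aging cycles. Only the cycle counts 20 and 45 for cell V4 come from the text.

```python
import numpy as np

def interpolate_capacity(cycle, rpt_cycles, rpt_capacities):
    """Linearly interpolate the capacity at one aging cycle (Eq. 2).

    rpt_cycles[j] is the number of aging cycles completed before RPT j,
    rpt_capacities[j] is the capacity measured at that RPT.
    """
    rpt_cycles = np.asarray(rpt_cycles, dtype=float)
    rpt_capacities = np.asarray(rpt_capacities, dtype=float)
    # Index of the RPT that precedes (or coincides with) the queried cycle
    j = np.searchsorted(rpt_cycles, cycle, side="right") - 1
    j = int(np.clip(j, 0, len(rpt_cycles) - 2))
    frac = (cycle - rpt_cycles[j]) / (rpt_cycles[j + 1] - rpt_cycles[j])
    return frac * (rpt_capacities[j + 1] - rpt_capacities[j]) + rpt_capacities[j]

# Cycle counts 20 and 45 are taken from the text (cell V4, RPT #2 and #3).
# The first entry (0 cycles) and all capacities in Ah are illustrative only.
rpt_cycles = [0, 20, 45]
rpt_capacities = [4.85, 4.80, 4.70]
q30 = interpolate_capacity(30, rpt_cycles, rpt_capacities)
```

For sorted RPT cycle counts, `np.interp(30, rpt_cycles, rpt_capacities)` gives the same value; the explicit form is shown only because it mirrors the terms of Equation (2).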

Definition of SOH indicators

V_ch and V_dis represent the voltage profiles during charging and discharging, respectively. I_ch and I_dis are the current profiles during charging and discharging, respectively. Voltage variations due to acceleration peaks during discharging are indicated with ΔV_acc, and the corresponding current variations with ΔI_acc. The autocorrelation function measures the linear relationship between a signal x(t) and its time-delayed version x(t + τ), where τ is the time delay. In this work, power autocorrelation during the discharge phase is quantified by correlating the power signal with its delayed copies. First, cell power is calculated from the voltage and current signals as follows:

$$P(t)={V}_{{{\rm{dis}}}}(t)\cdot {I}_{{{\rm{dis}}}}(t)$$

(4)

The autocorrelation function of the power signal ${\hat{\rho }}_{\tau }$ is computed with delays τ limited to a range [ − τ_max, τ_max]. In our study, τ_max is set to 3000 s. For each value within this range, ${\hat{\rho }}_{\tau }$ is computed as follows:

$${\hat{\rho }}_{\tau }=\mathop{\sum }_{t=\tau +1}^{T}(P(t)-\bar{P})(P(t-\tau )-\bar{P})$$

(5)

where T is the duration of the discharging phase, P(t) is the power at time t, $\bar{P}$ is the average of the power over the time window T, and P(t − τ) is the power at instant t − τ. The power autocorrelation indicator P_Autocorr is defined as the autocorrelation with null delay: ${P}_{{{\rm{Autocorr}}}}={\hat{\rho }}_{\tau = 0}$.
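As a rough sketch of Equations (4) and (5), the snippet below computes the unnormalized autocorrelation of the discharge power for non-negative delays. It assumes uniformly sampled signals, so that the delay τ is expressed in samples, and it relies on the fact that the autocorrelation of a real signal is symmetric in τ, which is why negative delays are not evaluated. The function name and the synthetic signals are illustrative assumptions.

```python
import numpy as np

def power_autocorrelation(v_dis, i_dis, max_lag):
    """Unnormalized autocorrelation of P = V * I for lags 0..max_lag (Eq. 5)."""
    p = np.asarray(v_dis, dtype=float) * np.asarray(i_dis, dtype=float)  # Eq. (4)
    dev = p - p.mean()            # deviation from the mean power over the window
    n = len(dev)
    # For each lag, sum dev[t] * dev[t - lag] over the overlapping samples
    return np.array([np.dot(dev[lag:], dev[:n - lag]) for lag in range(max_lag + 1)])

# Synthetic, periodic discharge profile sampled at 1 s (illustrative values only)
t = np.arange(600)
i_dis = 2.0 + np.sin(2 * np.pi * t / 100)   # current, A
v_dis = 3.7 - 0.03 * i_dis                   # voltage, V
rho = power_autocorrelation(v_dis, i_dis, max_lag=300)
p_autocorr = rho[0]                          # P_Autocorr: value at null delay
```

At null delay the sum reduces to the total squared deviation of the power from its mean, so `p_autocorr` is never smaller in magnitude than the value at any other lag.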

The resistance R indicator is extracted for each aging cycle during the discharging phase using the following procedure. First, acceleration peaks are identified during the discharge²⁷ as explained in Supplementary Note 7. Then, the resistance R_peak corresponding to the l^th current peak within the i^th aging cycle is computed as follows:

$${R}_{{{\rm{peak}}},l}^{i}=\frac{\Delta {V}_{{{\rm{acc}}},l}^{i}}{\Delta {I}_{{{\rm{acc}}},l}^{i}}$$

(6)

where $\Delta {V}_{{{\rm{acc}}},l}^{i}$ and $\Delta {I}_{{{\rm{acc}}},l}^{i}$ are the voltage and current variations at the occurrence of the l^th peak, respectively, as shown in Supplementary Note 7. Thus, P resistances ${R}_{{{\rm{peak}}},1}^{i},{R}_{{{\rm{peak}}},2}^{i},\ldots ,{R}_{{{\rm{peak}}},P}^{i}$ are computed for each i^th aging cycle, with i = 1, …, N, where N represents the number of aging cycles during the cell’s life and P is the total number of acceleration peaks within each cycle. Note that the total number of acceleration peaks, P, varies with the aging cycle. Subsequently, a single resistance value for each aging cycle is obtained by averaging the P resistances extracted from all acceleration peaks within that cycle:

$${R}^{i}=\frac{{\sum }_{l = 1}^{P}{R}_{{{\rm{peak}}},l}^{i}}{P}\quad i=1,2,\ldots ,N$$

(7)
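A minimal sketch of Equations (6) and (7) follows. It assumes the acceleration peaks have already been detected (the detection procedure is in Supplementary Note 7 and is not reproduced here), so the inputs are simply the voltage and current variations at each peak. The function name and the numerical values are hypothetical.

```python
import numpy as np

def cycle_resistance(delta_v_acc, delta_i_acc):
    """Average peak resistance for one aging cycle (Eqs. 6 and 7)."""
    delta_v_acc = np.asarray(delta_v_acc, dtype=float)
    delta_i_acc = np.asarray(delta_i_acc, dtype=float)
    r_peak = delta_v_acc / delta_i_acc   # Eq. (6): one resistance per peak
    return float(r_peak.mean())          # Eq. (7): average over the peaks

# Three hypothetical acceleration peaks within a single aging cycle
dv = [0.060, 0.090, 0.030]   # voltage variations, V
di = [2.0, 3.0, 1.0]         # current variations, A
r_i = cycle_resistance(dv, di)
```

Because the number of peaks can differ from cycle to cycle, the function takes whatever number of peaks it is given rather than a fixed count.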

The instantaneous battery charging impedance ${Z}_{{{{\rm{CHG}}}}_{{{\rm{ist}}}}}$ is computed over the CC-a phase²⁷ as follows:

$${Z}_{{{{\rm{CHG}}}}_{{{\rm{ist}}}}}({t}_{k})=-\frac{{V}_{{{\rm{ch}}}}({t}_{k})-{V}_{{{\rm{ch}}}}({t}_{k-1})}{{I}_{{{\rm{ch}}}}}$$

(8)

where V_ch(t_k) − V_ch(t_k−1) is the voltage difference over the interval Δt = t_k − t_k−1, and I_ch is the constant charging current during the CC-a phase.

The choice of the time window Δt is crucial. Increasing Δt helps filter out noise from the voltage difference in the numerator of Equation (8) and reduces current quantization effects. However, too large a window can excessively filter and result in information loss. Therefore, Δt is tuned to balance noise reduction while preserving the information content of ${Z}_{{{{\rm{CHG}}}}_{{{\rm{ist}}}}}$. The time intervals Δt are selected based on the C-rate: Δt = 60 s for C/4, Δt = 30 s for C/2, and Δt = 1 s for 1C charging events.

After extracting the instantaneous battery impedance for all the time intervals of the charging phase, the Z_CHG indicator is computed for each charging phase by averaging the ${Z}_{{{{\rm{CHG}}}}_{{{\rm{ist}}}}}$ within a specific voltage range [V_in, V_fin]:

$${Z}_{{{\rm{CHG}}}}=\frac{1}{M}\mathop{\sum }_{{t}_{k}={t}_{{{\rm{in}}}}}^{{t}_{{{\rm{fin}}}}}{Z}_{{{{\rm{CHG}}}}_{{{\rm{ist}}}}}({t}_{k})$$

(9)

where M is the number of ${Z}_{{{{\rm{CHG}}}}_{{{\rm{ist}}}}}$ measurements within the considered voltage range, and t_in and t_fin are the initial and final time instants such that V(t_in) = V_in and V(t_fin) = V_fin, respectively. The voltages V_in and V_fin were set to 3.8 V and 3.9 V, respectively, based on the sensitivity analysis presented in Supplementary Note 4.
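The two-step computation in Equations (8) and (9) can be sketched as follows, assuming uniformly sampled time and voltage vectors over the CC-a phase. The function name, the synthetic voltage ramp, and the negative sign chosen for the charging current are assumptions made for illustration; the sign convention of the actual dataset should be checked before reuse.

```python
import numpy as np

def charging_impedance(t, v_ch, i_ch, dt, v_in=3.8, v_fin=3.9):
    """Mean instantaneous charging impedance within [v_in, v_fin] (Eqs. 8 and 9).

    t, v_ch : uniformly sampled time (s) and voltage (V) over the CC-a phase
    i_ch    : constant charging current (A)
    dt      : time window in seconds (e.g. 60 s for C/4, as stated in the text)
    """
    t = np.asarray(t, dtype=float)
    v_ch = np.asarray(v_ch, dtype=float)
    step = int(round(dt / (t[1] - t[0])))     # samples per window
    v_k = v_ch[step::step]                     # V_ch(t_k)
    v_km1 = v_ch[:-step:step]                  # V_ch(t_{k-1})
    z_ist = -(v_k - v_km1) / i_ch              # Eq. (8)
    mask = (v_k >= v_in) & (v_k <= v_fin)      # keep the M values in the voltage range
    return float(z_ist[mask].mean())           # Eq. (9)

# Synthetic CC charge: voltage rising linearly at 0.1 mV/s, sampled at 1 s
t = np.arange(0.0, 4001.0)
v_ch = 3.6 + 1e-4 * t
z_chg = charging_impedance(t, v_ch, i_ch=-1.2, dt=60)
```

On this synthetic ramp every window sees the same 6 mV rise, so the averaged value equals the per-window value; on measured data, the averaging over the voltage range is what smooths the window-to-window scatter.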

An alternative formulation would be to compute the average of ${Z}_{{{{\rm{CHG}}}}_{{{\rm{ist}}}}}$ within a SOC range instead of a voltage range. However, we opted for the voltage-based formulation to avoid estimation errors affecting the SOC, which is a non-measurable quantity generally estimated by the BMS. Additionally, a different definition of charging impedance, discussed in Supplementary Note 8, has been excluded in the present work due to its lower correlation with capacity fade.

Finally, the energy during charging and discharging is computed on the CC-a charging segment (see Supplementary Note 2) and driving UDDS profile, respectively, by integrating the electrical power within a fixed voltage window, specifically [V_in,ch, V_fin,ch] and [V_in,dis, V_fin,dis]:

$${E}_{{{\rm{ch}}}} = {\int_{{t}_{{{\rm{in}}}}}^{{t}_{{{\rm{fin}}}}}}{V}_{{{\rm{ch}}}}(t)\cdot {I}_{{{\rm{ch}}}}(t)\,{{\rm{dt}}}$$

(10)

$${E}_{{{\rm{dis}}}} = {\int_{{t}_{{{\rm{in}}}}}^{{t}_{{{\rm{fin}}}}}}{V}_{{{\rm{dis}}}}(t)\cdot {I}_{{{\rm{dis}}}}(t)\,{{\rm{dt}}}$$

(11)

where V_ch is the cell voltage during charging, I_ch is the cell current during charging and t_in and t_fin are the initial and final time instants such that V_ch(t_in) = V_in,ch and V_ch(t_fin) = V_fin,ch. Similarly, V_dis is the cell voltage during discharging, I_dis is the cell current during discharging and t_in and t_fin are the initial and final time instants such that V_dis(t_in) = V_in,dis and V_dis(t_fin) = V_fin,dis. Thus, energy is not only a function of the C-rate but also depends on the voltage window over which it is calculated.

We selected the fixed voltage windows [V_in,ch = 3.6 V, V_fin,ch = 3.9 V] and [V_in,dis = 3.85 V, V_fin,dis = 3.4 V] for computing E_ch and E_dis, respectively, to bypass the initial and final stages of charging and discharging, which are potentially prone to noise.
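A possible numerical version of Equations (10) and (11) is sketched below using the trapezoidal rule. Two simplifications are assumed: the integration limits are taken as the first and last samples whose voltage lies inside the window, and the same helper is used for charging and discharging by passing the window bounds in ascending order. The function name and the synthetic profile are illustrative.

```python
import numpy as np

def energy_in_voltage_window(t, v, i, v_low, v_high):
    """Integrate V * I between the first and last samples with v in [v_low, v_high]."""
    t, v, i = (np.asarray(x, dtype=float) for x in (t, v, i))
    idx = np.flatnonzero((v >= v_low) & (v <= v_high))
    sl = slice(idx[0], idx[-1] + 1)            # t_in .. t_fin as sample indices
    p = v[sl] * i[sl]                          # electrical power, W
    # Trapezoidal rule written out explicitly
    return float(np.sum(0.5 * (p[1:] + p[:-1]) * np.diff(t[sl])))

# Synthetic CC charge: 1.2 A, voltage rising linearly from 3.5 V to 4.0 V in 5000 s
t = np.arange(0.0, 5001.0)
v_ch = 3.5 + 1e-4 * t
i_ch = np.full_like(t, 1.2)
e_ch = energy_in_voltage_window(t, v_ch, i_ch, v_low=3.6, v_high=3.9)  # joules
```

For a discharge under a driving profile the voltage is not monotonic, so the choice of first and last in-window samples is only one way to realize the conditions V_dis(t_in) = V_in,dis and V_dis(t_fin) = V_fin,dis.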

Sensitivity of charging energy to voltage window

To assess the feasibility of using energy during charging for partial charging profiles, the correlation between E_ch and capacity loss was quantified across different voltage ranges. First, the interval [V_in,ch, V_fin,ch] was divided into sub-intervals of 0.25 V amplitude, and the energy was computed for each sub-interval.

As shown in Fig. 9, there is a strong correlation between energy during charging and capacity loss across all voltage sub-intervals. These results show that energy can be effectively used to estimate the SOH for partial and narrow charging periods. The analysis indicates that the voltage interval with the highest correlation also depends on the charging rate. This insight facilitates straightforward integration into the BMS.
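The sensitivity analysis can be pictured as one correlation coefficient per voltage sub-interval. The sketch below assumes a Pearson correlation, which this section does not specify, and uses made-up numbers; the function name and array layout (one row per aging cycle, one column per sub-interval) are likewise assumptions.

```python
import numpy as np

def correlation_per_window(energy, capacity_loss):
    """Pearson correlation between capacity loss and E_ch in each sub-interval.

    energy        : array of shape (n_cycles, n_subintervals)
    capacity_loss : array of shape (n_cycles,)
    """
    energy = np.asarray(energy, dtype=float)
    capacity_loss = np.asarray(capacity_loss, dtype=float)
    return np.array([
        np.corrcoef(energy[:, k], capacity_loss)[0, 1]
        for k in range(energy.shape[1])
    ])

# Illustrative data: four aging cycles, two voltage sub-intervals
capacity_loss = np.array([0.0, 1.0, 2.0, 3.0])           # percent
energy = np.column_stack([
    100.0 - 2.0 * capacity_loss,                          # exactly linear in the loss
    [50.0, 49.5, 48.0, 47.9],                             # noisier decreasing trend
])
corr = correlation_per_window(energy, capacity_loss)
```

A coefficient close to -1 in a given column means the energy in that sub-interval decreases almost linearly as capacity is lost.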

**Fig. 9: Impact of voltage range, C-rate and charging cycle number on energy during charging.**

Pre-processing and incremental indicators

Data pre-processing is essential for effectively using SOH indicators in data-driven algorithms. A critical step is removing outliers—data points that deviate from the majority. Outliers can affect feature extraction and machine learning model performance. Therefore, a careful approach is used to remove outlier-containing data, ensuring more robust and reliable feature representation. A second step of the pre-processing phase is the computation of the incremental features, denoted by Δ. This subsection explains how to obtain these features, using incremental resistances as an example.

For each cell in the dataset, the vector of incremental resistances ΔR is calculated as follows:

1. For the i-th aging cycle, i = 1, …, N, the resistance during the discharge phase over acceleration peaks is calculated as a function of SOC. R^i represents the average resistance over the SOC range from 80% to 20% during discharge. The resulting resistance vector is:
$${{\bf{R}}}=[{R}^{1},{R}^{2},\ldots ,{R}^{N}]$$
(12)
where R¹ is the average fresh cell resistance and R^N is the average resistance at the last cycle.
2. Obtain the incremental resistance vector by subtracting R¹ from each value in R:
$$\Delta {{\bf{R}}}={{\bf{R}}}-{R}^{1}$$
(13)

This approach ensures that the first element of the incremental vector for each feature is zero, facilitating the comparison of aging trends across cells. Additionally, the charging impedance vector Z_CHG requires further pre-processing because it strongly depends on the C-rate at which it is computed (see Fig. 3). To standardize across different C-rates, the incremental vector ΔZ_CHG is normalized using the fresh cell impedance value:

$$\Delta {{{\bf{Z}}}}_{{{\rm{CHG}}}}^{{{\rm{NORM}}}}=\frac{\Delta {{{\bf{Z}}}}_{{{\rm{CHG}}}}}{{Z}_{{{\rm{CHG}}}}^{1}}$$

(14)

where ${Z}_{{{\rm{CHG}}}}^{1}$ is the charging impedance calculated over the first aging cycle in Batch #1 in the voltage range [3.8 V - 3.9 V] as described in Sec. Definition of SOH indicators. Normalization reduces variations from different charging rates, providing a consistent feature representation. This pre-processing step is crucial for evaluating the ML model across cells cycled at various rates, effectively excluding C-rate as a training feature. It ensures a more refined data representation for machine learning algorithms.
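Equations (13) and (14) amount to two short array operations. The sketch below shows them on hypothetical per-cycle values; the function names are not from the paper, and outlier removal is assumed to have been done beforehand.

```python
import numpy as np

def incremental(feature):
    """Subtract the first-cycle value so that the vector starts at zero (Eq. 13)."""
    feature = np.asarray(feature, dtype=float)
    return feature - feature[0]

def normalized_incremental(feature):
    """Incremental vector divided by the first-cycle value (Eq. 14)."""
    feature = np.asarray(feature, dtype=float)
    return (feature - feature[0]) / feature[0]

# Hypothetical per-cycle indicator values for one cell
r = [0.030, 0.031, 0.033]        # resistance, ohm
z = [0.0050, 0.0052, 0.0055]     # charging impedance, ohm
delta_r = incremental(r)
delta_z_norm = normalized_incremental(z)
```

The normalized vector is dimensionless and expresses the impedance growth as a fraction of the fresh-cell value, which is what makes cells cycled at different C-rates comparable.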

Estimation model

In this work, an LRM is used to estimate capacity fade because of the strong linear correlation between capacity and the SOH indicators. The LRM relates the response variable y to the input vector u as follows⁵⁶:

$$y(t)={\beta }_{0}+\beta {{\boldsymbol{u}}}(t)+\epsilon (t)$$

(15)

where ϵ represents model error, capturing deviations between the model and observed data. Coefficients β are determined using the least-squares method, which minimizes the model error on the training dataset. To evaluate the accuracy of the estimation models, the root mean square error (RMSE) is calculated as:

$${{\rm{RMSE}}}=\sqrt{\frac{\mathop{\sum }_{i = 1}^{N}{e}_{i}^{2}}{N}}$$

(16)

$${e}_{i}=\frac{{Q}_{{{\rm{cell}}},i}-{Q}_{{{\rm{est}}},i}}{{Q}_{{{\rm{cell}}},i}}$$

(17)

where e_i is the relative error, with Q_cell,i and Q_est,i representing the actual and estimated capacities at the cycle i, respectively. Additionally, the absolute percentage error is given by APE(%) = ∣e_i∣ ⋅ 100.
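For illustration, Equations (15) to (17) can be reproduced with an ordinary least-squares fit in NumPy. This is a sketch on synthetic, noise-free data with hypothetical feature values, not the training setup used in the study; all function names are assumptions.

```python
import numpy as np

def fit_lrm(U, y):
    """Least-squares fit of y = beta0 + U @ beta (Eq. 15); returns [beta0, beta...]."""
    U = np.asarray(U, dtype=float)
    A = np.column_stack([np.ones(len(U)), U])      # prepend the intercept column
    coef, *_ = np.linalg.lstsq(A, np.asarray(y, dtype=float), rcond=None)
    return coef

def predict_lrm(coef, U):
    """Evaluate the fitted model on a 2-D feature matrix U."""
    return coef[0] + np.asarray(U, dtype=float) @ coef[1:]

def relative_errors(q_true, q_est):
    """Relative error e_i of Eq. (17)."""
    q_true = np.asarray(q_true, dtype=float)
    return (q_true - np.asarray(q_est, dtype=float)) / q_true

def rmse(e):
    """Root mean square of the relative errors (Eq. 16)."""
    return float(np.sqrt(np.mean(np.square(e))))

# Synthetic example: two incremental features and a capacity that is exactly linear
U = np.array([[0.0, 0.00], [-0.1, 0.01], [-0.2, 0.03], [-0.3, 0.04], [-0.4, 0.07]])
q_cell = 4.8 + 0.5 * U[:, 0] - 2.0 * U[:, 1]       # Ah, made-up coefficients
coef = fit_lrm(U, q_cell)
q_est = predict_lrm(coef, U)
e = relative_errors(q_cell, q_est)
ape = np.abs(e) * 100.0                            # absolute percentage error
```

Because the synthetic capacity is an exact linear function of the features, the fit recovers the generating coefficients and the errors are at the level of floating-point round-off; with measured data the errors would be evaluated on cells held out from training.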

Data availability

All experimental data⁴⁰ are available online at the following Open Science Framework repository: OSF.

Code availability

The code supporting the findings of this study is available at the following Open Science Framework repository: OSF.

References

1. On Climate Change, I. P. Ipcc sixth assessment report https://www.ipcc.ch/report/ar6/wg1/ (accessed August 2021).
2. Agency, I. E. Global ev outlook 2022 https://www.iea.org/reports/global-ev-outlook-2022 (May 2022).
3. Li, M., Lu, J., Chen, Z. & Amine, K. 30 years of lithium‐ion batteries. Advanced Materials 30, 1800561 (2018).
4. Nuhic, A., Terzimehic, T., Soczka-Guth, T., Buchholz, M. & Dietmayer, K. Health diagnosis and remaining useful life prognostics of lithium-ion batteries using data-driven methods. J. Power Sources 239, 680–688 (2013).
5. Rezvanizaniani, S. M., Liu, Z., Chen, Y. & Lee, J. Review and recent advances in battery health monitoring and prognostics technologies for electric vehicle (EV) safety and mobility. J. Power Sources 256, 110–124 (2014).
6. Plett, G. L. Battery management systems, Volume II: Equivalent-circuit methods (Artech House, Boston, 2015).
7. Plett, G. L. Dual and joint EKF for simultaneous SOC and SOH estimation. 21st Electric Vehicle Symposium (EVS21) 1–12 (2005).
8. Zhang, F., Liu, G. & Fang, L. Battery state estimation using unscented kalman filter. In 2009 IEEE International Conference on Robotics and Automation, 1863–1868 (IEEE, Kobe, Japan, 2009).
9. Taborelli, C. et al. Advanced battery management system design for soc/soh estimation for e-bikes applications. Int. J. Powertrains 5, 325 (2016).
10. Chu, A., Allam, A., Cordoba Arenas, A., Rizzoni, G. & Onori, S. Stochastic capacity loss and remaining useful life models for lithium-ion batteries in plug-in hybrid electric vehicles. J. Power Sources 478, 228991 (2020).
11. Chen, Z., Mi, C. C., Fu, Y., Xu, J. & Gong, X. Online battery state of health estimation based on genetic algorithm for electric and hybrid vehicle applications. J. Power Sources 240, 184–192 (2013).
12. Miao, Q., Xie, L., Cui, H., Liang, W. & Pecht, M. Remaining useful life prediction of lithium-ion battery with unscented particle filter technique. Microelectron. Reliab. 53, 805–810 (2013).
13. Xing, Y., Ma, E. W., Tsui, K.-L. & Pecht, M. An ensemble model for predicting the remaining useful performance of lithium-ion batteries. Microelectron. Reliab. 53, 811–820 (2013).
14. He, W., Williard, N., Osterman, M. & Pecht, M. Prognostics of lithium-ion batteries based on dempster-shafer theory and the bayesian monte carlo method. J. Power Sources 196, 10314–10321 (2011).
15. Li, J., Adewuyi, K., Lofti, N., Landers, R. G. & Park, J. A single particle model with chemical/mechanical degradation physics for lithium ion battery state of health (soh) estimation. Appl. Energy 212, 1178–1190 (2018).
16. Moura, S. J., Chaturvedi, N. A. & Krstić, M. Adaptive partial differential equation observer for battery state-of-charge/state-of-health estimation via an electrochemical model. J. Dyn. Syst. Meas. Control 136, 011015 (2014).
17. Santhanagopalan, S., Zhang, Q., Kumaresan, K. & White, R. E. Parameter estimation and life modeling of lithium-ion cells. J. Electrochem. Soc. 155, A345 (2008).
18. Prada, E. et al. A simplified electrochemical and thermal aging model of lifepo4-graphite li-ion batteries: Power and capacity fade simulations. J. Electrochem. Soc. 160, A616 (2013).
19. Allam, A. & Onori, S. Online capacity estimation for lithium-ion battery cells via an electrochemical model-based adaptive interconnected observer. IEEE Trans. Control Syst. Technol. 29, 1636–1651 (2021).
20. Nagulapati, V. M. et al. Capacity estimation of batteries: Influence of training dataset size and diversity on data driven prognostic models. Reliab. Eng. Syst. Saf. 216, 108048 (2021).
21. Vilsen, S. B. & Stroe, D.-I. Battery state-of-health modelling by multiple linear regression. J. Clean. Prod. 290, 125700 (2021).
22. Lin, C. P., Cabrera, J., Yu, D. Y. W., Yang, F. & Tsui, K. L. Soh estimation and soc recalibration of lithium-ion battery with incremental capacity analysis; cubic smoothing spline. J. Electrochem. Soc. 167, 090537 (2020).
23. Liu, D., Pang, J., Zhou, J., Peng, Y. & Pecht, M. Prognostics for state of health estimation of lithium-ion batteries based on combination gaussian process functional regression. Microelectron. Reliab. 53, 832–839 (2013).
24. Yang, D., Zhang, X., Pan, R., Wang, Y. & Chen, Z. A novel gaussian process regression model for state-of-health estimation of lithium-ion battery using charging curve. J. Power Sources 384, 387–395 (2018).
25. Yang, D., Wang, Y., Pan, R., Chen, R. & Chen, Z. A neural network based state-of-health estimation of lithium-ion battery in electric vehicles. Energy Procedia 105, 2059–2064 (2017).
26. Mansouri, S. S., Karvelis, P., Georgoulas, G. & Nikolakopoulos, G. Remaining useful battery life prediction for uavs based on machine learning. IFAC-PapersOnLine 50, 4727–4732 (2017).
27. Pozzato, G. et al. Analysis and key findings from real-world electric vehicle field data. Joule 7, 1–19 (2023).
28. Yang, H., Hong, J., Liang, F. & Xu, X. Machine learning-based state of health prediction for battery systems in real-world electric vehicles. J. Energy Storage 66, 107426 (2023).
29. Hou, Y., Zhang, Z., Liu, P., Song, C. & Wang, Z. Research on a novel data-driven aging estimation method for battery systems in real-world electric vehicles. Adv. Mech. Eng. 13, 16878140211027735 (2021).
30. Hong, J. et al. Online accurate state of health estimation for battery systems on real-world electric vehicles with variable driving conditions considered. J. Clean. Prod. 294, 125814 (2021).
31. She, C., Wang, Z., Sun, F. & Zhang, L. Battery aging assessment for real-world electric buses based on incremental capacity analysis and radial basis function neural network. IEEE Trans. Ind. Inform. 16.5, 3345–3354 (2019).
32. Song, L., Zhang, K., Liang, T., Han, X. & Zhang, Y. Intelligent state of health estimation for lithium-ion battery pack based on big data analysis. J. Energy Storage 32, 101836 (2020).
33. He, Z. et al. State-of-health estimation based on real data of electric vehicles concerning user behavior. J. Energy Storage 41, 102867 (2021).
34. Huo, Q., Ma, Z., Zhao, X., Zhang, T. & Zhang, Y. Bayesian network based state-of-health estimation for battery on electric vehicle application and its validation through real-world data. IEEE Access 9, 11328–11341 (2021).
35. Zhou, Y., Huang, M., Chen, Y. & Tao, Y. A novel health indicator for on-line lithium-ion batteries remaining useful life prediction. J. Power Sources 321, 1–10 (2016).
36. Wu, J., Wang, Y., Zhang, X. & Chen, Z. A novel state of health estimation method of li-ion battery using group method of data handling. J. Power Sources 327, 457–464 (2016).
37. Severson, K. A. et al. Data-driven prediction of battery cycle life before capacity degradation. Nat. Energy 4.5, 383–391 (2019).
38. Bole, B., Kulkarni, C. S. & Daigle, M. Adaptation of an electrochemistry-based Li-ion battery model to account for deterioration observed under randomized use. Annu. Conf. PHM Soc. 6, 1 (2014).
39. Birkl, C. Oxford battery degradation dataset 1 (2017).
40. Pozzato, G., Allam, A. & Onori, S. Lithium-ion battery aging dataset based on electric vehicle real-driving profiles. Data Brief. 41, 107995 (2022).
41. Zhang, Y., Wik, T., Bergström, J., Pecht, M. & Zou, C. A machine learning-based framework for online prediction of battery ageing trajectory and lifetime using histogram data. J. Power Sources 526, 231110 (2022).
42. Lyu, Z., Wang, G. & Tan, C. A novel bayesian multivariate linear regression model for online state-of-health estimation of lithium-ion battery using multiple health indicators. Microelectron. Reliab. 131, 114500 (2022).
43. Shi, M., Xu, J., Lin, C. & Mei, X. A fast state-of-health estimation method using single linear feature for lithium-ion batteries. Energy 256, 124652 (2022).
44. Chen, L., Lü, Z., Lin, W., Li, J. & Pan, H. A new state-of-health estimation method for lithium-ion batteries through the intrinsic relationship between ohmic internal resistance and capacity. Measurement 116, 586–595 (2018).
45. Lin, M. et al. A data-driven approach for estimating state-of-health of lithium-ion batteries considering internal resistance. Energy 277, 127675 (2023).
46. Paxton, W. et al. Battery management system for determining a health of a power source based on driving events (2024). US patent filed under application number US17/975.
47. Cui, Y. et al. State of health diagnosis model for lithium ion batteries based on real-time impedance and open circuit voltage parameters identification method. Energy 144, 647–656 (2018).
48. Cai, L., Lin, J. & Liao, X. An estimation model for state of health of lithium-ion batteries using energy-based features. J. Energy Storage 46, 103846 (2022).
49. Gong, D., Gao, Y., Kou, Y. & Wang, Y. State of health estimation for lithium-ion battery based on energy features. Energy 257, 124812 (2022).
50. Peng, S. et al. State of health estimation of lithium-ion batteries based on multi-health features extraction and improved long short-term memory neural network. Energy 282, 128956 (2023).
51. Khaleghi, S., Firouz, Y., Van Mierlo, J. & Van den Bossche, P. Developing a real-time data-driven battery health diagnosis method, using time and frequency domain condition indicators. Appl. Energy 255, 113813 (2019).
52. Ovejas, V. J. & Cuadras, A. Effects of cycling on lithium-ion battery hysteresis and overvoltage. Sci. Rep. 9, 14875 (2019).
53. Fly, A. & Chen, R. Rate dependency of incremental capacity analysis (dq/dv) as a diagnostic tool for lithium-ion batteries. J. Energy Storage 29, 101329 (2020).
54. Yang, X.-G., Leng, Y., Zhang, G., Ge, S. & Wang, C.-Y. Modeling of lithium plating induced aging of lithium-ion batteries: Transition from linear to nonlinear aging. J. Power Sources 360, 28–40 (2017).
55. Jiao, S., Gao, Y., Feng, J., Lei, T. & Yuan, X. Does deep learning always outperform simple linear regression in optical imaging? Opt. Express 28, 3717 (2020).
56. Khuri, A. I. Introduction to linear regression analysis, fifth edition by douglas c. montgomery, elizabeth a. peck, g. geoffrey vining. Int. Stat. Rev. 81, 318–319 (2013).


Acknowledgements

The authors thank the members and alumni of the Stanford Energy Control Lab: Luca Pulvirenti, Sara Ha, Sai Thatipamula, Muhammad Aadil Khan and, in particular, Le Xu for their feedback on the manuscript. This work was partially funded by the Precourt Institute of Energy, Stanford University. This research is enabled in part through computational resources and support provided by Sherlock compute cluster of Stanford University.

Author information

These authors contributed equally: Andrea Lanubile, Pietro Bosoni.

Authors and Affiliations

Energy Science & Engineering, Stanford University, Stanford, CA, USA
Andrea Lanubile, Pietro Bosoni, Gabriele Pozzato, Anirudh Allam & Simona Onori
Energy Department, Politecnico di Torino, Torino, Italy
Matteo Acquarone


Contributions

Conceptualization: A.A., G.P., and S.O.; data curation: A.L., G.P., and P.B.; formal analysis: A.L., M.A., and P.B.; funding acquisition: S.O.; investigation: A.L., G.P., P.B., and S.O.; methodology: A.L., G.P., M.A., P.B., and S.O.; project administration: S.O.; resources: S.O.; supervision: S.O.; visualization: A.L., G.P., M.A., and P.B.; validation: A.L., M.A., and P.B.; writing (original draft): A.L., G.P., M.A., P.B., and S.O.; writing (review and editing): A.L., M.A., P.B., and S.O.

Corresponding author

Correspondence to Simona Onori.

Ethics declarations

Competing interests

The authors declare the following competing interests: A.A. is with Archer Aviation and G.P. is with Form Energy. They were both affiliated with Stanford University at the time of the research. All other authors declare no competing interests.

Peer review

Peer review information

Communications Engineering thanks Yongzhi Zhang, Zhibin Zhao, and the other, anonymous, reviewer for their contribution to the peer review of this work. Primary Handling Editors: [Jiangong Zhu] and [Rosamund Daw and Saleem Denholme].

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article

Cite this article

Lanubile, A., Bosoni, P., Pozzato, G. et al. Domain knowledge-guided machine learning framework for state of health estimation in Lithium-ion batteries. Commun Eng 3, 168 (2024). https://doi.org/10.1038/s44172-024-00304-2


Received: 02 December 2023
Accepted: 23 October 2024
Published: 12 November 2024
Version of record: 12 November 2024
DOI: https://doi.org/10.1038/s44172-024-00304-2
