Introduction

Lithium-ion batteries (LIBs) are a vital technology for advancing the transition from fossil fuels towards renewable energy solutions1. Their high energy density, long cycle life, and steadily decreasing costs2 have spurred rapid adoption in both electric vehicles (EVs) and grid-scale battery energy storage systems (BESS). Modeling the inherent degradation of LIBs is challenging for several reasons. First, due to the rapid development of LIB technology and the commercial value of battery data, operational or experimental datasets with batteries near their end-of-life are often either limited in size or lack relevant information such as temperature, voltage, or current3. Second, the degradation is driven by several internal non-linear chemical processes and depends strongly on the operating conditions. In particular, degradation, commonly quantified by the State of Health (SoH), is affected by the number of operating cycles, temperature, charge/discharge rate, and depth of discharge4. Battery degradation can be accurately parameterized by the combination of loss of \(\hbox {Li}^+\) inventory (LLI) and loss of active material (LAM)5,6.

Methods for modeling degradation can be broadly divided into two categories: physics-based and data-driven techniques. Physics-based techniques attempt to model the underlying physical and chemical mechanisms that cause degradation, such as lithium plating7 and solid-electrolyte interphase (SEI) growth8. On the other hand, data-driven methods predominantly use operational characteristics to predict the future capacity.

Physics-based modelling of lithium-ion batteries aims to describe the internal electrochemical, thermal, and mechanical processes governing battery behaviour using first-principles equations9,10. Unlike empirical or data-driven models, these approaches are grounded in the fundamental laws of physics such as mass conservation, charge conservation, and thermodynamics, and offer physically interpretable parameters that often generalize well across operating conditions and battery types11,12,13,14. The most widely used and foundational physics-based model is the pseudo-two-dimensional (P2D) model, also known as the Doyle-Fuller-Newman (DFN) model or the Porous Electrode Theory (PET) model. This model represents the battery cell as a one-dimensional domain in the through-plane direction (from anode to cathode), while resolving lithium diffusion in spherical particles in the electrode materials. It includes coupled partial differential equations (PDEs) for the underlying intra-cell electrochemical dynamics governing mass and charge transport, potential distributions, and chemical reactions across (and between) the electrolyte and solid-phases15,16,17,18,19.

One major benefit of physics-based models is that they can be extended to include additional physics. These include side reactions such as SEI layer growth and lithium plating20,21,22, and particle and binder fracture due to mechanical strain and stress23,24,25. The resulting models are typically much more complex than the P2D model and, although many are built from reduced-order models, typically the Single Particle Model (SPM)26,27,28, their parametrisation is challenging and often not possible for many commercial use-cases.

Despite their accuracy and rigorous formulation, P2D-based models are not appropriate for all use-cases. They require sophisticated numerical solution techniques, which can be sensitive to changes in parametrisation and are computationally expensive29. Models vary in accuracy across different use-cases, requiring extensive research and development, and model parametrisation typically has stringent data requirements and is difficult to automate. These limitations make it difficult to handle many practical use-cases where data drift and scaling are an issue, such as in grid energy storage or electric vehicle fleets.

Data-driven methods include recursive algorithms such as Kalman filters30 and Sequential Monte Carlo methods31. Whilst these techniques can yield useful predictions, they are model-dependent and struggle to handle measurement noise and inaccuracies effectively. Consequently, there has been a growing shift toward time-series machine learning models for forecasting LIB degradation, including recurrent neural networks (RNNs), long short-term memory (LSTM) networks, and convolutional neural networks (CNNs)32,33.

Although deep-learning models have achieved some success in forecasting LIB degradation, most studies primarily focus on estimating remaining useful life (RUL) or capacity curves. These approaches face two significant limitations: First, they generalize poorly to conditions not seen in the training set and often fail to predict knee-points, particularly in cases with unusual degradation paths34. Second, they make no attempt to diagnose the degradation by quantifying the underlying LLI and LAM. This work aims to address both of these challenges.

On the other hand, generating data purely from pre-parameterized physics-based models (either through direct simulation or training data-driven models on synthetic datasets) is often insufficient for accurate degradation prediction. Real-world factors such as manufacturing defects, subtle variations in operating conditions, or inaccuracies in model parameters can lead to large discrepancies between simulated and actual battery performance35. Whilst parametrizing the physics-based models can mitigate this, the process is computationally expensive, has strict data requirements, and depends heavily on knowledge of LIB design. Additionally, the accuracy of physics-based models relies on a detailed understanding of the degradation mechanisms for the specific LIBs in question – an ongoing area of research with no one-size-fits-all solution36,37,38,39,40. Therefore, in many real-world scenarios where data is incomplete and real-time solutions are required, purely physics-based approaches can be impractical.

An alternative approach is offered by Dubarry et al., who employ a modified equivalent-circuit model (ECM) to emulate LIB performance under various states of degradation41. The authors propose a broad set of potential degradation behaviours that the ECM can simulate, thus reproducing voltage curves and (dis)charging characteristics for a wide variety of degradation paths. By comparing these simulated curves with experimental data, Dubarry et al. demonstrate that it is possible to identify the state of degradation in LIBs and infer their physical state, including LAM and LLI. However, this method does not allow accurate degradation forecasting: different degradation pathways are often similar in the early stages, so extrapolating a simulated curve introduces prohibitively large uncertainties.

In this work, we introduce ACCEPT (Adaptive Contrastive Capacity Estimation Pre-Training), a novel model that forecasts degradation by combining the strengths of both data-driven and physics-based approaches. The model structure is inspired by OpenAI’s CLIP42, which demonstrated the power of contrastive learning for zero-shot prediction. ACCEPT learns to predict the most probable future degradation path from a large range of simulated curves by using a combination of historical capacity sequences and operational features, including temperature, current, and voltage.

This approach offers a significant advantage: by incorporating a physics-based battery model, we can quickly and cheaply simulate a wide range of degradation paths without the need for extensive experimentation. These simulated curves can be matched to capacity curves from real, operational data of batteries in open-source datasets to create a labeled dataset. Thus, information from both the operational and simulated data is leveraged to accurately forecast the degradation path.

Using the physics-based model, each simulated data point can be associated with a combination of LLI and LAM. The operational data and the simulated data are encoded separately using time-series models. These encoded representations are then projected into a shared embedding space, enabling the model to learn complex relationships between observed and simulated degradation patterns. During inference, an LIB’s operational features are encoded and the closest matching simulated degradation curve is retrieved. This simulated curve is taken as the prediction for its future degradation.

To embed the operational data, a modified Temporal Fusion Transformer (TFT)43 is used. This allows battery metadata, such as the chemistry of the LIB and the initial capacity, to be included in the model, allowing the model to be generalized across different LIB types. To embed the simulated degradation curves, a CNN-based architecture is used. This retrieval-based method allows the quantification of degradation modes, which is not feasible in existing data-driven methods.

This approach allows for the degradation pathway to be estimated from as few as 100 cycles. We show that this generalizes better than existing approaches to unknown operational scenarios, helping to mitigate the data availability problem encountered in battery research. Additionally, if the parameters of an LIB are not known and therefore curves cannot be generated using physics-based models, users can still obtain accurate results using curves generated for LIBs with similar chemistries.

Sec. 2 outlines the method for generating the simulated curves and then describes the model in detail. The model training and results are shown in Sec. 3.

Proposed approach

The proposed approach is to train a model to match a set of operational data associated with a specific degradation pathway with the corresponding simulated scenario. This allows both diagnosis of historic degradation and forecasting of the future capacity fade. The operational and simulated data are embedded separately to perform the comparison. Features such as temperature, voltage, and current are included in the operational embedding so that the model can generalize to unseen scenarios. Before training, a set of simulated curves is generated from a simple set of degradation equations, as described in Sec. 2.1. The model architecture is shown in Sec. 2.2.

Generating simulated degradation curves

The choice of degradation model is not fundamental to ACCEPT. In principle, any method capable of generating simulated degradation curves that sufficiently describe the full range of operational curves under study can be used. Below, we present a simple dimensionless set of equations as a proof of concept, but acknowledge that other methods, such as that of Ref. 44, may be more suitable for real-world applications.

A mathematical model, in the form of a system of ordinary differential equations, is proposed. It builds on the idea of Refs. 41,45 that capacity fade can be described as a combination of LLI and LAM. Using simplified physics, capacity is modeled as the product of the remaining active material and the remaining lithium inventory. The LIB’s active material is assumed to degrade exponentially over time, while lithium loss is described as a combination of SEI growth and lithium plating. SEI growth is assumed to increase linearly, whereas for plating a time delay is included to reflect that this mode of degradation may not begin at the start of life. This delay emulates the idea that plating often occurs only once local voltages within the LIB are large enough, an effect which occurs only during extreme operating conditions, e.g., high C-rates, low temperatures or unsafe voltage ranges, and once other degradation mechanisms have accumulated to cause such effects, e.g., low porosity due to SEI growth or high effective C-rates due to reduced capacity36,37,38,39,40,46.

During the inference process, any number of candidate simulated curves (limited only by available computing power) can be fed into the simulated profile encoder. The model then returns the similarity score of the operational data to each of these candidates. This allows for the creation of custom simulated curves for specific use-cases, or the use of curves from pre-existing data. Two curves can be compared to see how likely an LIB’s trajectory is to follow each of them (e.g., linear projection vs. expected knee), or a large number can be used for accurate curve estimation.

Degradation Equations

The proposed dimensionless model that describes the decline of LIB capacity due to SEI growth and lithium plating for \(t\in [0,\infty )\) is given by (1)-(4). Here, t can be interpreted as either time or cycle number, which are equivalent up to a rescaling of the model parameters. LAM degradation is modeled by,

$$\begin{aligned} {\dot{M}} = -k M, \end{aligned}$$
(1)

where M is the material-capacity of the LIB, with initial condition \(M(0)=1\), and k is the rate of material degradation. LLI degradation is modeled by,

$$\begin{aligned} {\dot{S}} = a, \quad {\dot{P}} = {\left\{ \begin{array}{ll}0 & t\le t_p,\\ 0.5 b \left( 1+\tanh \left( c\left( t-t_p\right) \right) \right) & t>t_p,\end{array}\right. } \end{aligned}$$
(2)

where S and P are the lithium loss due to SEI and plating, respectively, which have initial conditions \(S(0)=P(0)=0\), a and b are growth rates of SEI and plating, respectively, c determines the sharpness of the knee, and \(t_p\) is the point in time at which plating begins. Capacity is modeled by,

$$\begin{aligned} L = S + P, \qquad C = (1-L)M, \end{aligned}$$
(3)

where L is the total loss of lithium inventory and C is the LIB capacity. \(L\in [0,1]\), where \(L=1\) corresponds to the complete loss of lithium inventory. The SEI and plating growth rates, a and b, are defined as functions of time by,

$$\begin{aligned} \begin{aligned} a(t)&= 0.5 a_0 \left( 1+\tanh \left( \kappa \left( 1-L\right) \right) \right) , \qquad \\ b(t)&= 0.5 b_0 \left( 1+\tanh \left( \kappa \left( 1-L\right) \right) \right) , \end{aligned} \end{aligned}$$
(4)

where \(a_0\) and \(b_0\) are the typical rate parameters for a and b, respectively. To simulate a sharp gradient we take \(\kappa =100\). We could instead have taken a and b to be step functions with discontinuities at \(L=1\); however, discontinuous functions often cause instabilities in numerical methods, so a continuous function with a sharp gradient was used instead.

In equations (2) and (4), the hyperbolic tangent function, \(\tanh\), is used as a way to continuously activate or deactivate processes as the battery state evolves. In equation (2), plating does not begin until \(t\approx t_p\), which simulates a knee in the capacity curve, where the sharpness of the knee is determined by c. In equation (4), LLI growth is switched off as \(L\rightarrow 1\), accounting for scenarios where all lithium inventory is lost.
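As a worked check on Eqs. (1)-(4), the system can be integrated in closed form over the range that is actually simulated. This is a sketch under one stated approximation, not part of the model specification: for \(k\ge 0\) we have \(M\le 1\), so any state with \(C=(1-L)M\ge 0.7\) satisfies \(1-L\ge 0.7\), and with \(\kappa =100\) the switching factor \(0.5(1+\tanh (\kappa (1-L)))\) equals 1 to machine precision. Setting \(a=a_0\) and \(b=b_0\) then gives

$$\begin{aligned} M(t)&= e^{-kt}, \qquad S(t) = a_0 t, \\ P(t)&= {\left\{ \begin{array}{ll}0 & t\le t_p,\\ \dfrac{b_0}{2}\left[ (t-t_p) + \dfrac{1}{c}\ln \cosh \left( c(t-t_p)\right) \right] & t>t_p,\end{array}\right. } \\ C(t)&= \left( 1 - a_0 t - P(t)\right) e^{-kt}. \end{aligned}$$

Differentiating \(P\) for \(t>t_p\) recovers the plating rate in Eq. (2), since \(\frac{d}{dt}\,\frac{1}{c}\ln \cosh (c(t-t_p)) = \tanh (c(t-t_p))\). For \(c(t-t_p)\gg 1\), \(\ln \cosh x\approx x-\ln 2\), so the lithium-loss rate approaches \(a_0+b_0\) after the knee, compared with \(a_0\) before it.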

Numerical Simulation

The system of ordinary differential equations (1)-(4) is solved numerically using the classic fourth-order Runge-Kutta (RK4) method47, with a step size of \(h = 0.01\). An early stop criterion is applied when \(C<0.7\) (70% SoH), since the experimental data does not include degradation beyond this point. The total accumulated error of the RK4 method is \(\mathcal {O}(h^4)\), i.e. of order \(10^{-8}\) for \(h=0.01\) up to a problem-dependent constant, and hence, given the method’s stability, the numerical error is negligible.
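The procedure above can be sketched in a few lines of Python. This is a minimal illustration rather than the implementation used in this work: the function names (`rhs`, `rk4_step`, `simulate`), the cap `t_max`, and the parameter values in the example call are all assumptions chosen for demonstration. Only the equations, \(\kappa =100\), \(h=0.01\), and the stop at \(C<0.7\) come from the text.

```python
import math

def rhs(t, y, k, a0, b0, c, t_p, kappa=100.0):
    """Right-hand side of Eqs. (1)-(4) for the state y = (M, S, P)."""
    M, S, P = y
    L = S + P                                           # Eq. (3): total lithium loss
    gate = 0.5 * (1.0 + math.tanh(kappa * (1.0 - L)))   # Eq. (4): switches off as L -> 1
    dM = -k * M                                         # Eq. (1): active-material loss
    dS = a0 * gate                                      # Eq. (2): SEI growth
    # Eq. (2): plating is zero until t_p, then switched on by the tanh term
    dP = 0.5 * b0 * gate * (1.0 + math.tanh(c * (t - t_p))) if t > t_p else 0.0
    return (dM, dS, dP)

def rk4_step(f, t, y, h, *args):
    """One classic fourth-order Runge-Kutta step for a tuple-valued state."""
    k1 = f(t, y, *args)
    k2 = f(t + h / 2, tuple(yi + h / 2 * ki for yi, ki in zip(y, k1)), *args)
    k3 = f(t + h / 2, tuple(yi + h / 2 * ki for yi, ki in zip(y, k2)), *args)
    k4 = f(t + h, tuple(yi + h * ki for yi, ki in zip(y, k3)), *args)
    return tuple(
        yi + h / 6 * (p + 2 * q + 2 * r + s)
        for yi, p, q, r, s in zip(y, k1, k2, k3, k4)
    )

def simulate(k, a0, b0, c, t_p, h=0.01, t_max=50.0, c_stop=0.7):
    """Integrate from M=1, S=P=0 until C < c_stop (t_max is an assumed safety cap)."""
    y = (1.0, 0.0, 0.0)
    ts, caps = [0.0], [1.0]
    n = 0
    while n * h < t_max:
        y = rk4_step(rhs, n * h, y, h, k, a0, b0, c, t_p)
        n += 1
        M, S, P = y
        cap = (1.0 - (S + P)) * M                       # Eq. (3): capacity
        ts.append(n * h)
        caps.append(cap)
        if cap < c_stop:                                # early stop at 70% SoH
            break
    return ts, caps, y

# Illustrative (assumed) parameter values, not fitted to any dataset.
ts, caps, (M, S, P) = simulate(k=0.02, a0=0.03, b0=0.10, c=2.0, t_p=3.0)
print(f"stopped at t={ts[-1]:.2f}, C={caps[-1]:.4f}, LLI={S + P:.4f}, LAM={1 - M:.4f}")
```

Because the plating rate in Eq. (2) jumps at \(t=t_p\), the single step that straddles \(t_p\) is less accurate than the \(\mathcal {O}(h^4)\) estimate suggests; with \(h=0.01\) the effect is small compared with the capacity changes of interest.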

Each curve in the operational dataset is parametrized by finding optimal values for the parameters k, \(a_0\), \(b_0\), and \(t_p\). This is achieved using Python’s SciPy48 minimize function which minimizes the L2-norm (approximated by the midpoint rule) between the data and simulation.
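A minimal sketch of this fitting step is given below. Several choices are assumptions made for illustration and are not stated in the text: the Nelder-Mead method, the fixed knee sharpness `C_KNEE`, the way the midpoint rule is applied (residuals averaged between adjacent grid points), the iteration cap, and the synthetic target curve that stands in for an operational curve. All function names are hypothetical.

```python
import math
import numpy as np
from scipy.optimize import minimize

H, KAPPA, C_KNEE = 0.01, 100.0, 2.0  # h and kappa from the text; fixing c = 2 is an assumption

def rhs(t, y, k, a0, b0, t_p):
    """Eqs. (1)-(4) for the state y = (M, S, P)."""
    M, S, P = y
    gate = 0.5 * (1.0 + math.tanh(KAPPA * (1.0 - S - P)))
    dP = 0.5 * b0 * gate * (1.0 + math.tanh(C_KNEE * (t - t_p))) if t > t_p else 0.0
    return (-k * M, a0 * gate, dP)

def shift(y, slope, h):
    """Return y + h * slope for tuple-valued states."""
    return tuple(v + h * s for v, s in zip(y, slope))

def simulate_capacity(params, n_steps):
    """RK4 solution of the model, returned as capacity C on the grid t_n = n * H."""
    p = tuple(float(v) for v in params)
    y, caps = (1.0, 0.0, 0.0), [1.0]
    for n in range(n_steps):
        t = n * H
        k1 = rhs(t, y, *p)
        k2 = rhs(t + H / 2, shift(y, k1, H / 2), *p)
        k3 = rhs(t + H / 2, shift(y, k2, H / 2), *p)
        k4 = rhs(t + H, shift(y, k3, H), *p)
        y = tuple(v + H / 6 * (a + 2 * b + 2 * c + d)
                  for v, a, b, c, d in zip(y, k1, k2, k3, k4))
        caps.append((1.0 - y[1] - y[2]) * y[0])
    return np.array(caps)

def l2_midpoint(params, target):
    """L2-norm of (simulation - data), with the integral approximated at interval midpoints."""
    resid = simulate_capacity(params, len(target) - 1) - target
    mid = 0.5 * (resid[:-1] + resid[1:])      # residual at each interval midpoint
    return math.sqrt(H * float(np.sum(mid ** 2)))

# Synthetic stand-in for an operational curve (assumed parameters k, a0, b0, t_p).
true_params = (0.02, 0.03, 0.10, 3.0)
target = simulate_capacity(true_params, 400)

x0 = np.array([0.03, 0.02, 0.15, 2.5])        # assumed initial guess
loss0 = l2_midpoint(x0, target)
res = minimize(l2_midpoint, x0, args=(target,), method="Nelder-Mead",
               options={"maxiter": 200})
print("initial loss:", loss0, "fitted loss:", res.fun, "fitted params:", res.x)
```

A derivative-free method is used here because the objective has a kink in \(t_p\). Note that k and \(a_0\) both produce near-linear early fade, so they may be only weakly identifiable from the pre-knee part of a curve.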

To create a comprehensive training set, simulations are generated by solving the model for a large set of parameter values sampled uniformly within the ranges identified from the parameterized data curves. Fig. 1 shows an example of an operational curve and the best matching simulated curve.
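The sampling step can be illustrated as follows. The fitted parameter rows are hypothetical placeholder values, and the sample count and seed are assumptions; only the idea of drawing uniformly within the ranges spanned by the fitted curves comes from the text.

```python
import numpy as np

# Hypothetical fitted parameters (k, a0, b0, t_p), one row per operational curve.
fitted = np.array([
    [0.018, 0.028, 0.09, 2.6],
    [0.022, 0.031, 0.12, 3.1],
    [0.020, 0.035, 0.10, 3.4],
])

# Per-parameter range identified from the fitted curves.
lo, hi = fitted.min(axis=0), fitted.max(axis=0)

# Draw parameter sets uniformly within those ranges (sample count is an assumption).
rng = np.random.default_rng(seed=0)
samples = rng.uniform(lo, hi, size=(1000, fitted.shape[1]))
print(samples.shape)
```

Each row of `samples` would then be passed to the degradation model to produce one simulated curve for the training set.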

Fig. 1 Example of an operational degradation curve with the corresponding matched simulated curve.

Model architecture

Fig. 2 Overview of the contrastive pre-training approach. The two encoders are trained jointly using a contrastive loss. Operational data (including current, temperature, voltage, and static metadata such as cell chemistry) is mapped into a shared latent space with simulated degradation curves. The model learns to align operational data with its corresponding simulated behaviour by maximising similarity for matched pairs and minimising it for mismatched ones.

Fig. 3 Schematic showing inference (a) and diagnosis (b) using the trained model. At inference time, the encoder maps new operational data (observed mid-life measurements) into the latent space and retrieves the most similar simulated degradation curve from a database. This provides a degradation prediction and enables diagnosis by retrieving the underlying simulation parameters associated with the predicted trajectory.

An overview of the training and inference procedure for ACCEPT is shown in Figs. 2 and 3. A detailed description of the model architecture is given below.

Let \(\mathbb {S}\) and \(\mathbb {O}\) represent the vector spaces for the simulated and operational data, respectively, where \(\mathbb {S}\) consists solely of the capacity (degradation) curves, while \(\mathbb {O}\) includes both the degradation curves and associated operational features such as temperature, voltage, and current. The architecture consists of two distinct encoders: (i) a simulation encoder \(E_s: \mathbb {S} \rightarrow \mathbb {R}^{d}\), and (ii) an operational encoder \(E_o: \mathbb {O} \rightarrow \mathbb {R}^{d}\), where d is the dimension of the shared latent space (\(d=512\) is used in this case).

The simulation encoder, \(E_s\), employs a CNN architecture:

$$\begin{aligned} z_s = \text {Pool}(\text {CNN}(s; \theta _s)), \end{aligned}$$

where CNN consists of L layers of 1D convolutions with ReLU activations. Each layer l applies:

$$\begin{aligned} x^{(l)} = \text {ReLU}(\text {Conv1D}(x^{(l-1)})) \in \mathbb {R}^{C_l \times T_l}, \end{aligned}$$

where \(C_l\) represents the number of channels in layer l, and \(T_l\) represents the temporal length of the sequence at that layer. This model showed a good trade-off between extracting long-term trends and computational efficiency, although it is possible to replace this block with other architectures well suited for time-series processing.
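To make the shapes \(C_l \times T_l\) concrete, the forward pass of such an encoder can be sketched in NumPy. This is an untrained illustration under several assumptions not given in the text: three layers, a kernel size of 5, a stride of 2, channel counts of 32, 64 and 512, random weights, and global average pooling for \(\text {Pool}\). All function names are hypothetical.

```python
import numpy as np

def conv1d(x, w, b, stride):
    """Valid 1D convolution. x: (C_in, T), w: (C_out, C_in, K), b: (C_out,)."""
    k = w.shape[2]
    t_out = (x.shape[1] - k) // stride + 1
    # Stack the sliding windows: shape (T_out, C_in, K)
    windows = np.stack([x[:, i * stride:i * stride + k] for i in range(t_out)])
    return np.einsum("tck,ock->ot", windows, w) + b[:, None]

def init_layers(channels, kernel, rng):
    """Random (untrained) weights with He-style scaling, zero biases."""
    return [
        (rng.normal(0.0, np.sqrt(2.0 / (c_in * kernel)), size=(c_out, c_in, kernel)),
         np.zeros(c_out))
        for c_in, c_out in zip(channels[:-1], channels[1:])
    ]

def encode_simulated(curve, layers, stride=2):
    """z_s = Pool(CNN(s)): stacked Conv1D + ReLU layers, then average over time."""
    x = np.asarray(curve, dtype=float)[None, :]           # (1, T): one input channel
    for w, b in layers:
        x = np.maximum(conv1d(x, w, b, stride), 0.0)      # x^(l) in R^{C_l x T_l}
    return x.mean(axis=1)                                 # global average pooling (assumed)

rng = np.random.default_rng(0)
layers = init_layers(channels=[1, 32, 64, 512], kernel=5, rng=rng)
curve = np.linspace(1.0, 0.7, 256)                        # toy capacity curve
z_s = encode_simulated(curve, layers)
print(z_s.shape)
```

With these settings the temporal length shrinks from 256 to 126, 61 and 29 across the three layers, and the pooled output has the latent dimension \(d=512\).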

Operational Encoding

We adopt the TFT43 as the basis of the operational encoder due to its state-of-the-art performance in time-series forecasting. The TFT effectively captures temporal dependencies at multiple scales and includes specialized components such as the Variable Selection Network for identifying the most relevant features at each time step. Moreover, it naturally integrates static real-valued and categorical data, enabling us to incorporate crucial information like LIB type and chemistry. This flexibility is vital for the zero-shot setting, as we aim to learn general representations that transfer effectively to a wide range of battery configurations and operational conditions. In theory, any model that captures this information could be used as the operational encoder. When passing the data to the operational encoder, multiple input sequences of varying lengths are created from each operational curve, as shown in Fig. 4. This enhances the model’s ability to make predictions at different points in the LIB lifecycle. Note that since different cells are used for training and testing, there is no possibility of data leakage from future observations of the same cell during inference.

Fig. 4 On the left of the image, a degradation curve from the same battery at different points in its lifecycle is shown. In the training process, each of these curves would form a positive and negative pair with the simulated curves on the upper right and lower right, respectively.
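The construction of variable-length inputs from a single operational curve can be illustrated with a short sketch. It assumes that each input is a prefix of the curve starting at the first cycle, with a minimum length of 100 cycles and a spacing of 50 cycles; these values and the function name are assumptions for illustration, not details given in the text.

```python
import numpy as np

def make_prefix_sequences(features, min_len=100, step=50):
    """Return the prefixes features[:n] for n = min_len, min_len + step, ..."""
    features = np.asarray(features)
    return [features[:n] for n in range(min_len, len(features) + 1, step)]

# Toy cell with 260 cycles and 4 per-cycle features
# (e.g. capacity, temperature, voltage, current).
features = np.zeros((260, 4))
prefixes = make_prefix_sequences(features)
lengths = [len(p) for p in prefixes]
print(lengths)
```

Every prefix from the same cell is paired with the same matched simulated curve, so the encoder sees the same target at several points in the lifecycle.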

The operational encoder \(E_o\) modifies the TFT by replacing the forecasting head with a dense embedding layer:

$$\begin{aligned} h = \text {TFT}_{\text {base}}(o; \theta _o), \\ z_o = W_e h + b_e, \end{aligned}$$

where:

  • \(\text {TFT}_{\text {base}}\) is the standard TFT encoder-decoder architecture until the final forecasting layer

  • \(h \in \mathbb {R}^{d_h}\) is the final hidden state of the TFT

  • \(W_e \in \mathbb {R}^{d \times d_h}\) and \(b_e \in \mathbb {R}^{d}\) are learnable parameters of the embedding layer

  • \(z_o \in \mathbb {R}^{d}\) is the final operational embedding

Specifically, \(\text {TFT}_{\text {base}}\) processes the input through:

$$\begin{aligned} v_t = \text {VSN}(o_t), \\ c_t = \text {MHA}(\{v_1,\dots ,v_T\}), \\ h = \text {Pool}(\{c_1,\dots ,c_T\}), \end{aligned}$$

where VSN is the variable selection network and MHA is multi-head temporal attention. Unlike the original TFT, we omit the quantile forecast layer and instead map the pooled representations directly to the embedding space.

Contrastive Learning

Contrastive learning is a self-supervised learning technique that has gained significant attention in machine learning, especially in the domains of computer vision and natural language processing. It learns useful representations by comparing pairs of examples, pulling similar ones together while pushing dissimilar ones apart in a high-dimensional latent space49.

For a batch of paired samples \(\{(s_i, o_i)\}_{i=1}^N\), the contrastive loss function can be constructed as:

$$\begin{aligned} {\mathcal {L}} = -\sum _i \log \frac{\exp (\text {sim}(p_{s_i}, p_{o_i})/\tau )}{\sum _j \exp (\text {sim}(p_{s_i}, p_{o_j})/\tau )} \;, \end{aligned}$$
(5)

where the normalized embeddings are:

$$\begin{aligned} \begin{aligned} p_{s_i}&= \displaystyle \frac{E_s(s_i)}{\bigl \Vert E_s(s_i)\bigr \Vert _2} \;, \\ p_{o_i}&= \displaystyle \frac{E_o(o_i)}{\bigl \Vert E_o(o_i)\bigr \Vert _2} \;, \end{aligned} \end{aligned}$$
(6)

Here \(\text {sim}(a, b) = a^\top b\) is the cosine similarity (the embeddings are unit-normalized), and the index j runs over all positive and negative matches in the batch. The temperature \(\tau\) is a learnable parameter that controls how strongly negative matches are penalized and is determined as part of the training process.
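Equations (5) and (6) translate directly into a few lines of NumPy. The sketch below is illustrative only: it treats \(\tau\) as a fixed input rather than a learned parameter, and the function names are hypothetical.

```python
import numpy as np

def l2_normalize(z):
    """Eq. (6): scale each row to unit Euclidean norm."""
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def contrastive_loss(z_s, z_o, tau):
    """Eq. (5) for paired rows of z_s (simulated) and z_o (operational)."""
    p_s, p_o = l2_normalize(z_s), l2_normalize(z_o)
    logits = p_s @ p_o.T / tau                      # [i, j] = sim(p_s_i, p_o_j) / tau
    logits -= logits.max(axis=1, keepdims=True)     # shift for numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.trace(log_prob))               # -sum_i log-softmax at the matched pair

# Perfectly aligned pairs give a loss near zero; mismatched pairs are penalized.
eye = np.eye(4)
aligned = contrastive_loss(eye, eye, tau=0.05)
shuffled = contrastive_loss(eye, np.roll(eye, 1, axis=0), tau=0.05)
print(aligned, shuffled)
```

If all embeddings coincide, every term in the denominator is equal and the loss reduces to \(N\log N\), which is a convenient sanity check for an implementation.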

Additionally, we employ a queue of negative pairs to enhance the in-batch negatives and diversify the examples learned by the model. Let \({\textbf{Q}}\) be a queue that stores all the additional negative embeddings \(\{p_{o_i}^{-}\}_{i=1}^M\). During training, for each batch, \(K = 2048\) random elements are selected from \({\textbf{Q}}\). For each positive pair \((p_{s_i}, p_{o_i})\) in a mini-batch, the denominator in Eq. (5) is replaced by this term:

$$\begin{aligned} \begin{aligned} \alpha (p_{s_i}, p_{o_j}, \tau )&= \sum _j \exp \bigl (\textrm{sim}(p_{s_i}, p_{o_j})/\tau \bigr ) \\&\quad + \sum _{k=1}^{K} \exp \bigl (\textrm{sim}(p_{s_i}, p_{o_k}^{-})/\tau \bigr )\,, \end{aligned} \end{aligned}$$
(7)

where \(p_{o_k}^{-}\) represents a negative match from the queue. The full loss function becomes:

$$\begin{aligned} {\mathcal {L}} = -\sum _i \log \frac{\exp (\text {sim}(p_{s_i}, p_{o_i})/\tau )}{\alpha (p_{s_i}, p_{o_j}, \tau )} \;. \end{aligned}$$
(8)

This design ensures that each sample is contrasted against a large and continually refreshed set of negatives, improving the robustness of the learned embeddings. The queue is dynamic: a different sample of negatives is randomly selected for each batch.
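The queue-augmented loss of Eqs. (7) and (8) can be sketched as follows. The example is a simplified illustration: it assumes the inputs are already unit-normalized, draws the K negatives without replacement, uses a fixed \(\tau\), and uses small toy sizes instead of \(K=2048\). How the queue is filled and refreshed is not shown, and the function names are hypothetical.

```python
import numpy as np

def unit(z):
    """Scale each row to unit Euclidean norm."""
    return z / np.linalg.norm(z, axis=1, keepdims=True)

def queue_contrastive_loss(p_s, p_o, queue, tau, k, rng):
    """Eq. (8): in-batch terms plus k negatives drawn at random from the queue."""
    idx = rng.choice(len(queue), size=k, replace=False)  # without replacement (assumption)
    in_batch = p_s @ p_o.T / tau                         # (N, N): first sum in Eq. (7)
    from_queue = p_s @ queue[idx].T / tau                # (N, k): second sum in Eq. (7)
    logits = np.concatenate([in_batch, from_queue], axis=1)
    m = logits.max(axis=1, keepdims=True)
    log_alpha = m[:, 0] + np.log(np.exp(logits - m).sum(axis=1))  # log of Eq. (7)
    return float(-(np.diag(in_batch) - log_alpha).sum())

rng = np.random.default_rng(0)
p_s = unit(rng.normal(size=(8, 16)))                     # toy simulated embeddings
p_o = unit(p_s + 0.1 * rng.normal(size=(8, 16)))         # matching operational embeddings
queue = unit(rng.normal(size=(256, 16)))                 # toy queue of stored negatives
loss = queue_contrastive_loss(p_s, p_o, queue, tau=0.1, k=64, rng=rng)
print(loss)
```

Because the queue only adds positive terms to the denominator, this loss is never smaller than the in-batch loss of Eq. (5) for the same batch.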

Zero-shot

When creating data-driven models for the estimation of battery degradation, a major problem faced by researchers is poor transferability to downstream tasks. This is due to the inability of most data-driven models to generalize to unseen operating conditions. By matching operational data to degradation curves using embedding models, ACCEPT is better able to handle unforeseen circumstances. Zero-shot learning refers to the ability of a model to generalize to new classes without being specifically trained on them50. Due to the long testing times of Li-ion batteries under development, as well as the importance of early detection of LIBs set to experience accelerated degradation, models with this capability are of great interest to the industry.

Algorithm 1 ACCEPT Training

Experiments

Dataset

Severson et al.51 generated a comprehensive dataset consisting of 124 lithium iron phosphate/graphite LIBs that were cycled under fast-charging conditions. The experiments were stopped when the batteries reached end-of-life (EOL) criteria. The EOL cycle number ranged from 150 to 2300. For zero-shot estimation, as described in Sec. 3.6, we used a separate dataset of LIBs with EOL cycle numbers ranging from 450 to 1200, which was not included in the training process52.

Training procedure

A pseudocode outline of the model training procedure is shown in Algorithm 1. ACCEPT was trained using an Adam optimizer53 and a learning rate of \(1\times 10^{-5}\). The model was trained until convergence, which typically happened around the 7th epoch. The model was relatively quick to train, taking around two hours on a single Nvidia A100 GPU. Eight LIBs were reserved from the total dataset as validation LIBs during the training process, and 16 LIBs were used as the test and evaluation set. Disjoint groups of cells were used for the training, validation, and test sets to ensure that there is no accidental data leakage. Hyperparameter optimisation was performed using a grid search, and the configuration with the lowest validation loss was selected. Full search ranges and optimal values for hyperparameters are shown in Tab. 1.

Table 1 Search space and final results for hyperparameter optimization.

Accuracy comparison with state-of-the-art results

The model was compared to a variety of conventional data-driven techniques for estimation of future degradation. Using as little as 100 cycles of input data, the model was capable of producing accurate degradation curves. The prediction for several test cells is shown in Fig. 5. An additional benefit of this approach is that we are not limited by the output sequence length of the model, meaning that we can return a degradation pathway of any length, rather than just the model output dimension n, as with other data-driven techniques.

In Tab. 2 it can be seen that ACCEPT achieves state-of-the-art results compared to existing methods for accuracy.

Table 2 Comparison of state-of-the-art results for degradation models, where the proposed model uses inputs from 100 cycles. Results from the proposed method are averaged across all 16 test LIBs. The results show our method is the best-performing across all evaluation metrics.

Fig. 5 Future capacity degradation curves as predicted by the model against test LIBs no. 1-4 from 100 cycles of input data. Unlike current techniques, the model is not limited to a fixed output (context) length.

Quantifying degradation modes

Once the future degradation curve has been returned, the corresponding degradation modes that led to capacity fade can be deduced, as the parameters used to generate them are known. The results for test LIB 1 can be seen in Fig. 6, and values are given according to the quantification scheme in Sec. 2.1.
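As a worked illustration of how the two modes combine, suppose LLI is read off as \(L=S+P\) and LAM as \(1-M\). This reading of Eqs. (1)-(3) is assumed here for illustration and is not necessarily the exact scheme behind Fig. 6. Equation (3) then gives the capacity loss as

$$\begin{aligned} 1 - C = 1 - (1-L)M = \underbrace{L}_{\text {LLI}} + \underbrace{(1-M)}_{\text {LAM}} - L(1-M). \end{aligned}$$

The cross term \(L(1-M)\) is second order in the losses, so for modest degradation the capacity loss is approximately the sum of the two modes. For example, \(L=0.2\) and \(M=0.9\) give \(1-C = 0.2 + 0.1 - 0.02 = 0.28\), i.e. \(C=0.72\).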

Fig. 6 Quantitative results of degradation modes for capacity loss of test LIB 1.

Estimated uncertainty

Fig. 7 Top five most likely degradation paths for test LIB 2, where “Prediction k” represents the \(k^{\text {th}}\) most likely scenario. They all have relatively similar shapes, indicating that the model has learned the degradation behaviour well.

Fig. 8 Prediction for two test cells with calculated uncertainty. In both cases, the real degradation curve is consistent with the prediction.

As the model returns a similarity score for each simulated scenario, it is possible to return a number of most likely degradation paths for each input. By returning more than one simulated scenario, it is possible to assess the consistency of the model’s predictions. An example of the best five predictions for a single LIB is shown in Fig. 7. This process can also be used to estimate how likely a particular LIB is to experience a knee-point.
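Retrieving the k most similar simulated curves amounts to a cosine-similarity ranking in the shared latent space. The sketch below uses random vectors as stand-ins for the encoder outputs; the function name, the library size, and the noise level are assumptions for illustration.

```python
import numpy as np

def top_k_matches(z_o, z_s, k=5):
    """Rank simulated embeddings z_s (n, d) by cosine similarity to z_o (d,)."""
    p_o = z_o / np.linalg.norm(z_o)
    p_s = z_s / np.linalg.norm(z_s, axis=1, keepdims=True)
    scores = p_s @ p_o                      # cosine similarity to every simulated curve
    order = np.argsort(-scores)[:k]         # indices of the k highest scores
    return order, scores[order]

# Stand-ins for encoder outputs: a library of simulated-curve embeddings and
# one operational embedding that lies close to library entry 42.
rng = np.random.default_rng(0)
library = rng.normal(size=(1000, 512))
query = library[42] + 0.1 * rng.normal(size=512)

indices, scores = top_k_matches(query, library)
print(indices, scores)
```

The returned indices point back to the simulation parameters of each retrieved curve, so the same lookup supports both forecasting and diagnosis.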

Furthermore, when the model makes a prediction, additional degradation simulations can be generated by introducing small variations to the simulation parameters of the best-matching curve. These new curves are passed to the model and a new prediction is made. This process can be repeated many times, giving a distribution of selected curves from the many iterations of alternative simulations. This distribution reflects the prediction uncertainty resulting from the granularity of the simulations. Two examples are shown in Fig. 8.
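The resampling loop described above can be sketched generically. The callables `simulate` and `score` are hypothetical stand-ins for the degradation model and for the trained model's similarity score. The uniform relative perturbation, its size, and the number of rounds and candidates are assumptions, since the text does not specify how the variations are drawn.

```python
import numpy as np

def perturbation_spread(best_params, simulate, score, rel=0.05,
                        n_rounds=100, n_candidates=20, seed=0):
    """Repeatedly perturb the best-matching parameters and keep the top-scoring candidate."""
    rng = np.random.default_rng(seed)
    best_params = np.asarray(best_params, dtype=float)
    selected = np.empty((n_rounds, best_params.size))
    for r in range(n_rounds):
        # Small relative variations around the best-matching parameters
        factors = 1.0 + rng.uniform(-rel, rel, size=(n_candidates, best_params.size))
        candidates = best_params * factors
        scores = np.array([score(simulate(p)) for p in candidates])
        selected[r] = candidates[np.argmax(scores)]      # new prediction for this round
    return selected

# Toy stand-ins: a one-parameter exponential fade and a distance-based score.
t = np.linspace(0.0, 1.0, 50)
observed = np.exp(-0.21 * t)
toy_simulate = lambda p: np.exp(-p[0] * t)
toy_score = lambda curve: -np.linalg.norm(curve - observed)

best = np.array([0.20])
selected = perturbation_spread(best, toy_simulate, toy_score)
low, high = np.percentile(selected, [5, 95], axis=0)     # spread of the selected parameters
print(low, high)
```

The percentiles of the selected parameters, or of the corresponding curves, give an uncertainty band of the kind shown in Fig. 8.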

Zero-shot

Zero-shot estimation for LIB degradation is difficult due to inherent cross-domain variability between LIBs of different sizes and chemistries. Enabling zero-shot transfer to downstream tasks presents the possibility of reducing testing times in battery development cycles. To test the zero-shot capabilities of ACCEPT, we took two LIBs from the dataset described in Ref. 58: one with a standard degradation profile and one with accelerated degradation. The purpose of this was to test whether our method could correctly identify LIBs likely to experience the critical scenario of accelerated degradation. The corresponding simulated curves, one with accelerated degradation and one with standard degradation, were fed to the simulated curve encoder.

Despite the similarity in input between the two LIBs, ACCEPT was able to differentiate between the two profiles and chose the correct degradation pathway, as shown in Fig. 9. This is a significant result given that these two cells were taken from a completely independent dataset to the one used to train ACCEPT.

Fig. 9 (A) Zero-shot predictions against test LIB with accelerated degradation and (B) zero-shot predictions against test LIB with standard degradation. Both LIBs were taken from a dataset from a different source than the training set. The model was able to detect whether a particular LIB will experience accelerated degradation, a critical scenario to detect in the operation of LIBs.

Ablations

To assess the significance of individual model components, an ablation study was performed. A transformer architecture was evaluated as a substitute for the original one-dimensional convolutional embedding model for simulated degradation pathways; however, it yielded lower performance and imposed greater hardware constraints due to an increased memory footprint. The negative queue was omitted in one training configuration to evaluate its impact, resulting in decreased test performance, indicating a reduced contrastive effect. In a separate experiment, the queue was initialized with hard negatives exhibiting high dissimilarity to the positive match, which also led to diminished model effectiveness. The results of this study are summarized in Tab. 3.

Table 3 Ablation study evaluating the impact of encoder type, contrastive strategy, and training constraints on ACCEPT performance. \(^*\)Compare queue sampling using random negatives and hard negatives (lowest cosine similarity) to evaluate how sensitive the contrastive alignment is to negative sample selection.

Conclusion

We have aimed to address the challenge of using data-driven techniques to accurately forecast degradation of Li-ion batteries and quantify the underlying electrochemical causes. Our method (ACCEPT) showed state-of-the-art accuracy for degradation modeling. Our approach differs from purely data-driven time-series models by utilizing known physics to generate a number of simulated curves. ACCEPT then matched operational data to these simulations. As the underlying physical parameters of the model used to generate the simulated profiles are known, our framework made it possible to quantify the underlying degradation modes causing the capacity fade. Our experiments showed that ACCEPT was able to generalize well to unseen scenarios, and could correctly anticipate if an LIB from an unseen dataset was likely to experience accelerated degradation, a critical phenomenon in the operation of EVs and BESSs.

Future work

This preliminary work aimed to demonstrate how a new class of machine learning models, paired with information from known physics about Li-ion batteries, can be used to accurately model degradation, whilst also quantifying the underlying degradation mode, something previous data-driven techniques have been unable to achieve. Currently, the model is trained purely on the Severson dataset; however, the model can be fine-tuned on any dataset. Further open-source data on battery degradation could also be used to broaden the model’s applicability and increase its generalization to unseen LIB types. In this study, a TFT was used as the embedding model for the LIB’s operational data. This model has the additional benefit of providing interpretable results through its variable selection network. This was not explored in this work; however, future studies could use this to quantify the impact of different stress factors in the operational data on the overall degradation pathway.

Impact statement

This paper presents work whose goal is to advance the application of machine learning to battery modeling. There are many potential societal consequences of this work. By improving the accuracy and efficiency of degradation forecasting, this work contributes to the development of more reliable and sustainable battery technologies, which are essential for reducing greenhouse gas emissions and enabling widespread adoption of clean energy solutions. We recognize the need for further research in this direction before practical use in production settings.