Introduction

Stability studies in biologics development are simple yet time-consuming. Their impact is significant, as stability data guide formulation and process development, primary packaging selection, comparability, and shelf-life setting, among other factors. To expedite development, the majority of decisions are based on stress or accelerated stability data. The only modelling done is linear regression, which is used to assess stability profiles and shelf-life at a given temperature. Extrapolations based on a linear regression model are also accepted by health authorities and described in ICH Q1 guidelines to support the clinical development phase of drugs1. This simple approach is possible because, in most cases, changes in protein quality attributes such as purity, fragments, aggregates (dimers, trimers), and charge variants at storage conditions (2–8 °C) are relatively small. Therefore, the experimental data follow a straight line.

It was, and to some extent still is, believed that long-term stability predictions for biologics based on short-term (stress, accelerated) stability data are not possible due to their complex behaviour2,3. In particular, predicting concentration-dependent modifications (such as aggregates observed through size exclusion chromatography) seemed impossible or required complex models that were impractical for routine use in development4,5,6,7,8,9,10. Another concern was the shift from well-known and characterized monoclonal antibodies to more sophisticated biologics, such as fusion proteins, bispecific mAbs, and trispecific mAbs. However, it was recently shown that long-term stability predictions for monoclonal antibodies in solution can be achieved using simple first-order kinetics combined with the Arrhenius equation8,11. These studies predicted long-term changes of various stability-indicating attributes, including protein purity, fragments, aggregates, charge variants, and even potency, based on short-term stability data. This was possible when stability studies were designed so that only one degradation pathway, the one relevant at storage conditions, was present across all temperature conditions. Since then, other companies have published similar examples with similar conclusions12,13, leading to a joint effort among various companies to revise ICH Q1 guidelines14. At the time of this publication, the revision is in an advanced stage, introducing the general approach of Accelerated Predictive Stability (APS). APS follows the general principles of enhanced stability modelling and can be applied when there is limited real-time stability data at the recommended storage condition. APS involves the use of Arrhenius-based Advanced Kinetic Modelling (AKM)15,16,17 to predict the long-term stability of non-frozen drug substances or drug products based on results from short-term accelerated stability studies.
In addition to AKM, APS utilizes an intensive Failure Mode and Effects Analysis (FMEA) to evaluate the risk of out-of-specification events related to critical quality attributes that could not be modelled using AKM. Appropriate risk mitigation actions are implemented as needed to minimize associated risks. APS thus holistically supports the proposal of assigning retest periods or shelf life for various biologics, including, but not limited to, monoclonal antibodies, fusion proteins, bispecific mAbs, trispecific mAbs, and therapeutic proteins, both in the clinical and commercial phases.

In this article, the focus will primarily be on AKM modelling of the critical long-term predictions of aggregates for different protein modalities. It will be demonstrated that this concentration-dependent quality attribute can be effectively modelled using a first-order kinetic model. This model characterizes the stability profiles of quality attributes through exponential functions, providing robustness and high precision in stability predictions. The use of a first-order kinetic model emphasizes the vital role of temperature selection in stability studies. By carefully choosing the appropriate temperature conditions, it becomes possible to identify the dominant degradation process and accurately describe it using a simple first-order kinetic model. This approach helps to prevent the activation of additional degradation mechanisms that are not relevant for storage conditions, allowing for the design of a study focused on a single mechanism. The simplicity of the kinetic model obtained from this approach reduces the number of parameters that need to be fitted and minimizes the number of samples that need to be measured. This enhances the robustness and reliability of predictions. Simple models also help prevent overfitting: because they are less sensitive to minor changes in the input data, they are less likely to fit the training data well yet perform poorly on new data. Furthermore, by utilizing proper temperature conditions, the majority of quality attributes in biologics, including aggregation, can be successfully modelled using a first-order kinetic framework. This highlights the crucial importance of temperature selection in studying the stability and degradation of biologics.

Material and methods

Materials

All proteins investigated in this work are part of Novartis development projects (Table 1). Their specific formulations represent Novartis’ and Molecular Partners’ intellectual property and cannot be disclosed. However, we stress that, while the formulation can indeed affect protein stability, the modelling framework presented herein is formulation-independent. Proteins 1 (P1) and 2 (P2) are two different immunoglobulins G1 (IgG1) formulated at 50 and 80 mg/mL, respectively. Protein 3 (P3) is an IgG2 formulated at 150 mg/mL. Protein 4 (P4) is a bispecific IgG formulated at 150 mg/mL. Protein 5 (P5) is a fragment crystallizable region (Fc)-fusion protein formulated at 50 mg/mL. Protein 6 (P6) is a single-chain variable fragment (scFv) formulated at 120 mg/mL. Protein 7 (P7) is a bivalent nanobody formulated at 150 mg/mL. Protein 8 (P8) is a designed ankyrin repeat protein (DARPin), ensovibep, targeting SARS-CoV-2, as provided by Molecular Partners and formulated at 110 mg/mL. Formulation reagents were acquired at pharmaceutical grade, while analytical reagents were purchased at HPLC grade.

Table 1 Model summaries for the tested proteins. Proteins are categorized according to the traditionally perceived format complexity. Final model timepoint denotes the total duration of collection of data used to fit the model, and the validation timepoint is the time of the last available long-term measurement, which was compared to the predicted value. The prediction is deemed correct if the measured value is within the 95% prediction confidence interval. Additional formulation specific model parameters are available in Table S1.

Quiescent storage stability

The fully formulated drug substances (DS) were filtered through a 0.22 µm PES membrane filter (Millex GP, Merck) and filled aseptically into glass vials. The protein concentrations were derived through absorbance at 280 nm using a UV–Vis spectrometer (NanoDrop One, Thermo Fisher). The vials were incubated upright at 5 °C (all proteins), 15 °C (P8), 25 °C (all proteins), 30 °C (P1/3/6/7/8), 33 °C (P2), 35 °C (P5/7), 40 °C (P2/4/5), 45 °C (P5) or 50 °C (P5) for 12 (P2), 18 (P4/P6) or 36 (P1/3/5/7/8) months in a stability chamber. At pre-defined intervals (pull points), the samples were subjected to size exclusion chromatography (SEC) to determine the level of high-molecular-weight species.

Size exclusion chromatography

SEC was performed on an Agilent 1290 HPLC equipped with an Acquity UHPLC protein BEH SEC column 450 Å (Waters), a 210 nm UV detector, quaternary pump, auto sampler, column thermostat and photo-diode detector with a micro flow cell. The protein solution was diluted to 1 mg/mL, and 1.5 µL of the diluted solution was injected for a 12 min run at 40 °C (for better separation of the fragments from the monomer) with a flow rate of 0.4 mL/min. The mobile phase, at pH 6.0, consisted of 50 mM sodium phosphate and 400 mM sodium perchlorate, the latter added to reduce secondary interactions of the analyte with the column. The purity of the main peak and the amount of high-molecular-weight species (aggregates) were determined as a percentage of the total area obtained for the sample in each chromatogram. Prior to each measurement series, the column was conditioned by saturation with a bovine serum albumin/thyroglobulin/NaCl solution and by injection of a blank. System suitability was established after each column conditioning by evaluating molecular-weight markers (pattern and peak resolution) and limit of quantification (relative standard deviation).

Results and discussion

Model optimization

The models underlying Arrhenius-based stability predictions have been previously described in several works8,11,12. While the most recent method was proven to accurately predict the shelf-life stability of biotherapeutics12, preliminary reports from regulatory agencies raised concerns about the complexity of the model and thus the high risk of overfitting. In order to address these concerns, we initiated this investigation by revising and optimizing the method (Fig. 1). The reaction rate \(\frac{d\alpha }{dt}\) can be calculated by a competitive kinetic model with two parallel reactions using Eq. (1):

$$\begin{aligned} \frac{d\alpha }{{dt}} = & v \times A_{1} \times \exp \left( { - \frac{Ea1}{{RT}}} \right) \times \left( {1 - \alpha_{1} } \right)^{n1} \times \alpha_{1}^{m1} \times C^{p1} + \left( {1 - v} \right) \times A_{2} \\ & \quad \times \exp \left( { - \frac{Ea2}{{RT}}} \right) \times \left( {1 - \alpha_{2} } \right)^{n2} \times \alpha_{2}^{m2} \times C^{p2} \\ \end{aligned}$$
(1)

where \(\alpha = {\alpha }_{1}+{\alpha }_{2}\) is the sum of the fractions of degradation products 1 and 2, \(A\) is the pre-exponential factor, \(Ea\) is the activation energy (kcal/mol), \(n\) is the reaction order, \(m\) is the autocatalytic-type contribution, \(v\) is the ratio between the first and second reactions, \(R\) is the universal gas constant (kcal/(mol·K)), \(T\) is the temperature (K), \(C\) is the measured value of the initial concentration (mg/mL), and \(p\) is the fitted exponent associated with \(C\).

Fig. 1

Advanced kinetic modelling (AKM) and its optimization. (A) Schematic representation of AKM. The protein modalities included in this study (left) were analyzed in stability studies monitoring several quality attributes (QAs) (center), and AKM was employed to predict shelf-life stability (right). (B) Model optimization. The comprehensive model (Eq. 1) is shown on the top-left and visually unpacked into its base components: low-temperature pathway (cyan), high-temperature pathway (red), autocatalytic and concentration-dependent terms (dashed boxes). The comprehensive model can be compared with the optimized model presented in this work (Eq. 3). The biphasic aggregation behavior is exemplified on the right through an Arrhenius plot, where theoretical aggregation rate constants at various temperatures are linearly fitted in a low- (cyan) or high-temperature (red) pathway.

While this model includes a comprehensive description of even the most complex degradation processes encountered by biotherapeutics, a total of twelve fitting parameters are necessary for its use18. The design can be greatly simplified by decoupling the effect of high- and low-temperature aggregation pathways. Protein temperature-dependent aggregation rates relevant for biotherapeutics can be generally divided into two competitive regimes: low-temperature and high-temperature8,19. In order to account for both regimes, Eq. (1) includes two duplicated terms that describe each pathway. However, the high-temperature pathway includes contribution from partial unfolding of the protein, which indicates that the protein is affected by a harsh temperature stress not representative of the intended storage conditions. Therefore, by introducing the assumption that only the low-temperature pathway is relevant for the shelf-life stability of modern biotherapeutics, the model can be simplified to:

$$\frac{d\alpha }{dt}=A \times \text{exp}\left(-\frac{Ea}{RT}\right) \times {\left(1- \alpha \right)}^{n} \times {\alpha }^{m} \times {C}^{p}$$
(2)

Additionally, the contributions of the auto-catalytic term \({\alpha }^{m}\) and the concentration-dependent term \({C}^{p}\) can be neglected. The degradation mechanisms of biologics in solution under relevant biopharmaceutical conditions are rarely observed to have a significant auto-catalytic contribution, where the degradation products participate in catalyzing the reaction13. Furthermore, the reaction order \(n\) can be set to a value of 1 when changes are small, as detailed in the SI. As a rule, this is the case for the relevant degradation of biopharmaceuticals, where high purity (e.g. > 95%) is required for administration. Therefore, the optimized model can be ultimately expressed as:

$$\frac{d\alpha }{dt}=A \times \text{exp}\left(-\frac{Ea}{RT}\right) \times (1- \alpha )$$
(3)
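
For reference, Eq. (3) can be integrated in closed form at constant temperature. The following is a sketch in which the rate constant \(k(T)\) and the initial degraded fraction \({\alpha }_{0}\) are notation introduced here for illustration and do not appear in the original equations:

$$k\left(T\right)=A \times \text{exp}\left(-\frac{Ea}{RT}\right), \qquad \alpha \left(t\right)=1-\left(1-{\alpha }_{0}\right) \times \text{exp}\left(-k\left(T\right) \times t\right)$$

$$\alpha \left(t\right)\approx {\alpha }_{0}+\left(1-{\alpha }_{0}\right) \times k\left(T\right) \times t \quad \text{for} \quad k\left(T\right) \times t \ll 1$$

The first line is the exponential stability profile implied by Eq. (3). The second line is its small-change limit, which suggests why data at 2–8 °C often appear to follow a straight line.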

Notably, the original twelve parameters required to fit the model have been reduced to two. This substantially reduces the risk of overfitting and the amount of data necessary to obtain a reliable fit. Examples showing that the prediction accuracy remains unaffected when training the reduced model (Eq. 3) on a smaller dataset, compared to the comprehensive model (Eq. 1) trained on a larger dataset, have been published by Huelsmeyer et al.12. As advanced kinetic modelling to determine shelf-life stability has been applied to incrementally shorter timeframes to enable acceleration to first-in-human clinical studies, limiting the reliance on extensive datasets is an optimization that brings considerable advantages. Finally, we would like to remark that a variant of Eq. (3) can be used to model charge variants, as described in the SI, although it is not applied herein as this study focuses on the dimer-mediated aggregation pathway.
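
To illustrate how a two-parameter model of the form of Eq. (3) can be fitted and extrapolated, the following minimal sketch generates noise-free synthetic data at three accelerated temperatures and recovers \(A\) and \(Ea\) by linear regression of \(\ln k\) on \(1/T\). All numerical values (rate at 25 °C, activation energy, starting aggregate level, pull points, time unit of months) are hypothetical, and the linearized fit is an assumption chosen for simplicity; it is not the fitting procedure used in this work.

```python
import math

R_KCAL = 1.987204e-3  # universal gas constant in kcal/(mol*K)


def rate_constant(a_factor, ea, temp_c):
    """Arrhenius rate constant k = A * exp(-Ea / (R*T)), with T in kelvin."""
    return a_factor * math.exp(-ea / (R_KCAL * (temp_c + 273.15)))


def alpha(t, temp_c, a_factor, ea, alpha0=0.0):
    """Closed-form solution of the first-order model at constant temperature."""
    k = rate_constant(a_factor, ea, temp_c)
    return 1.0 - (1.0 - alpha0) * math.exp(-k * t)


def fit_arrhenius(data, alpha0=0.0):
    """Estimate (A, Ea) from (time, temp_c, alpha) triples.

    Each point yields an apparent k = -ln((1 - alpha) / (1 - alpha0)) / t.
    ln k is then regressed on 1/T by ordinary least squares, so that
    slope = -Ea/R and intercept = ln A.
    """
    xs, ys = [], []
    for t, temp_c, a in data:
        k = -math.log((1.0 - a) / (1.0 - alpha0)) / t
        xs.append(1.0 / (temp_c + 273.15))
        ys.append(math.log(k))
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return math.exp(intercept), -slope * R_KCAL


# Hypothetical "true" parameters: k = 0.002 per month at 25 C, Ea = 20 kcal/mol.
EA_TRUE = 20.0
A_TRUE = 0.002 * math.exp(EA_TRUE / (R_KCAL * 298.15))
ALPHA0 = 0.005  # hypothetical 0.5% aggregates at time zero

# Synthetic accelerated data: 25, 30 and 40 C at 1, 3 and 6 months.
data = [(t, temp, alpha(t, temp, A_TRUE, EA_TRUE, ALPHA0))
        for temp in (25.0, 30.0, 40.0) for t in (1.0, 3.0, 6.0)]

a_fit, ea_fit = fit_arrhenius(data, ALPHA0)
pred_5c_36m = alpha(36.0, 5.0, a_fit, ea_fit, ALPHA0)
print(f"fitted Ea = {ea_fit:.2f} kcal/mol")
print(f"predicted aggregates at 5 C, 36 months = {100 * pred_5c_36m:.2f} %")
```

With real, noisy measurements a point at or below the starting level would make the logarithm undefined, so a nonlinear least-squares fit of the closed-form curve to all points at once would be the more robust choice.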

Applying the model to complex molecules

In order to validate the above optimization, Eq. (3) was fitted to eight proteins of various formats as shown in Table 1. The formats include IgGs and IgG-like (bispecific Abs, Fc-fusion proteins, nanobodies), as well as completely non-IgG-like (DARPin) proteins. We used data collected over up to 9 months under all stability conditions between 5 °C and 40 °C (depending on the threshold temperature and data availability) to build a model, and validated the long-term prediction with a measurement after 18/36 months of storage at 5 °C. Validation was only performed at 5 °C, as that is the actual long-term storage temperature and the model’s primary intended use. One model was built independently for each formulation. Individual prediction/validation plots, together with the table of model parameters, are shown in Fig. S1. In this section, we focus on aggregation as the most complex degradation pathway; other available attributes (chemical variants, fragmentation) were also correctly fitted with the proposed model and are shown in Fig. S2. Figure 2 shows the comparison of predicted and measured aggregation values at the timepoint of validation. Using only ~ 10 datapoints for their construction, the models yield correct predictions for 7 out of 8 proteins (validation measurement within the 95% prediction confidence interval).

Fig. 2

Model performance on the tested proteins, with the color scheme corresponding to the traditionally perceived format complexity (red = complex). (a) Predicted versus true measured aggregate content values at the validation timepoint for all the tested proteins under all formulation conditions. The solid line denotes unity. If the error bars and the solid line intersect, the prediction is deemed correct. Predictions and error bars are schematically described in inset (b): every point in (a) represents a value predicted by a corresponding model (one example shown), and the error bar corresponds to the 95% prediction confidence interval of the model. The example model in (b) was fitted to data collected up to 6 months (solid markers); the remaining data are used for validation (empty markers).

The applicability of the described model is further shown in Fig. 3, where kinetic modeling is directly compared to a linear regression model for aggregation prediction in one of the DARPin formulations. After 9 months, linear regression modeling still lacks sufficient data for precise predictions, resulting in wide prediction intervals. In contrast, kinetic modeling provides a reasonable stability estimate using just 10 weeks of stability data. The prediction improves further when 9 months of data are used, and it then precisely predicts the measured value at 36 months. Similarly, AKM outperforms the linear regression model in predicting the stability profiles of other proteins beyond DARPin (Fig. S4). This observation aligns with recent publications indicating that kinetic modeling generally offers higher precision and accuracy than linear regression models for various quality attributes, including aggregation11,13.

Fig. 3

Comparison of the kinetic model (blue) against the linear regression model (red). Prediction intervals (dashed lines) denote the range within which true values are likely to fall at the 95% confidence level. Models were trained on 10 weeks (left) and 9 months (right) of experimental stability data (closed circles). Data points used for model validation are shown as empty circles and were not used for model training. Data shown are for a DARPin formulation stored at 5 °C. For kinetic model training, data points from the 15, 20, 25 and 30 °C conditions up to the respective time point were used (data points not shown for clarity).

Failed predictions

One notable discrepancy is observed in Fig. 2. A closer look at Protein 2 shows that this is not a statistical anomaly but a true failed prediction, which cannot be avoided with the described approach. The case is shown in more detail in Fig. 4. As shown in panel a, data from 25 °C, 33 °C, and 40 °C are used to build the model. At 6 months at 5 °C (panel b), the aggregate content increases, but the increase can still be explained by the method variability. At 12 months, the measured aggregation rate is conclusively different from the predicted one. A detailed investigation in an Arrhenius plot (panel c) shows that the aggregation rate at 5 °C is predicted to be orders of magnitude lower than the one measured at 6 months. However, since the measured increase is still within the measurement method variability (the dashed line), the true aggregation rate cannot be determined yet. Therefore, even at 6 months, it is impossible to say whether aggregation conforms to the model. At 12 months, the increase is well outside the method variability, and it becomes apparent that a different aggregation pathway takes place at 5 °C, and that the threshold temperature lies somewhere between the two lowest temperatures (5 °C and 25 °C).

Fig. 4

Analysis of Protein 2 aggregation: (a) the constructed model, (b) the model prediction overlaid with long-term measurements, and (c) the Arrhenius plot showing a linear fit to the three accelerated conditions with 1- and 2-sigma intervals. The datapoint at 5 °C (rightmost) represents the long-term stability measurements up to 6 months and is not used for the fit. The dashed line represents an aggregation rate corresponding to an increase equal to 2 × SD (0.2%/6 months). Any rate at or below the dashed line cannot be reliably determined in this timeframe, as schematically shown by the embedded subplots.

The activation energies reported in the literature8 for mAb and Fc-fusion protein aggregation are 10–25 kcal/mol for the low-temperature (LT) aggregation pathway (relevant at 5 °C) and 50–150 kcal/mol for the high-temperature aggregation pathway (not relevant at 5 °C). The activation energies fitted to our data (Table 1) are mostly in agreement with the reported values. Indeed, the long-term data showed that the highest fitted Ea of 76.8 kcal/mol (Protein 2) does not correspond to the LT pathway, so a high activation energy (i.e. above the literature value of 50 kcal/mol) can be considered a red flag, potentially indicating that the threshold temperature has not been detected. This criterion is not absolute, however: the activation energy fitted to the Protein 6 (scFv) data is around 60 kcal/mol, yet the long-term data conform to the model. Further data collection of activation energies for aggregation of different protein formats in liquid is needed to generalize the above criterion, and we see this as an important part of our future work.
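
The practical meaning of these activation energies can be illustrated by how strongly the Arrhenius rate constant drops on cooling from 25 °C to 5 °C. The sketch below is illustrative only: the function name and the chosen temperature pair are assumptions, and only the Ea values are taken from the ranges quoted above.

```python
import math

R_KCAL = 1.987204e-3  # universal gas constant in kcal/(mol*K)


def arrhenius_ratio(ea, t_high_c, t_low_c):
    """Return k(t_high) / k(t_low) for a single Arrhenius pathway.

    The pre-exponential factor cancels, so only Ea (kcal/mol) and the
    two temperatures (given in Celsius, converted to kelvin) are needed.
    """
    t_high = t_high_c + 273.15
    t_low = t_low_c + 273.15
    return math.exp(ea / R_KCAL * (1.0 / t_low - 1.0 / t_high))


# Bounds of the literature ranges plus the value fitted for Protein 2.
for ea in (10.0, 25.0, 50.0, 76.8):
    ratio = arrhenius_ratio(ea, 25.0, 5.0)
    print(f"Ea = {ea:5.1f} kcal/mol: k(25 C) / k(5 C) = {ratio:.3g}")
```

For the low-temperature range the rate slows by roughly one order of magnitude or less, whereas an Ea of 76.8 kcal/mol implies a slowdown of about four orders of magnitude. A model fitted with such a high Ea therefore predicts almost no change at 5 °C, which is why an unexpectedly high fitted Ea deserves scrutiny.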

Notably, cases like Protein 2 cannot be avoided even by using more complex models, since the method variability does not allow for the selection of a one-step or a two-step model; the threshold temperature cannot be detected in the model construction time-frame. Therefore, continuous validation with the long-term data is crucial in cases where the measured change at the lowest temperature is comparable to the method variability.
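
A rough way to judge whether the change at the lowest temperature can be resolved at all is to estimate how long a first-order process needs before its predicted increase exceeds the method variability. The sketch below assumes the 2 × SD value of 0.2% quoted in the caption of Fig. 4 as the detection threshold; the rate constants are hypothetical and the function name is introduced here for illustration.

```python
import math


def months_until_detectable(k_per_month, two_sd=0.002, alpha0=0.0):
    """Time for the first-order increase in alpha to reach 2 x SD.

    Solves (1 - alpha0) * (1 - exp(-k * t)) = two_sd for t.
    two_sd and alpha0 are fractions (0.002 corresponds to 0.2%).
    """
    threshold = two_sd / (1.0 - alpha0)
    if not 0.0 < threshold < 1.0:
        raise ValueError("two_sd must be positive and smaller than 1 - alpha0")
    if k_per_month <= 0.0:
        raise ValueError("rate constant must be positive")
    return -math.log(1.0 - threshold) / k_per_month


# Hypothetical rate constants at the storage temperature, per month.
for k in (1e-3, 1e-4, 1e-5):
    t = months_until_detectable(k)
    print(f"k = {k:.0e} per month: detectable after about {t:.0f} months")
```

If the estimated time is longer than the data collection period used to build the model, neither a one-step nor a two-step model can be distinguished from the data, and the prediction should be re-checked as long-term results arrive.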

In general, with the described simplified kinetic model fitted to measurements collected in 3–9 months, we successfully predict aggregation over a 4–12 fold larger timeframe for 7 out of 8 tested proteins. These proteins belong to a variety of IgG-like, as well as completely non-IgG-like formats. In one of the eight proteins, notably an IgG1, the threshold temperature proved too low to be detected in the timeframe used to construct the model, and was only detected after the long-term data were available.

Table 1 also summarizes a practical observation for aggregation of biopharmaceutically relevant proteins. We show that in rare cases, kinetic modeling predictions are unsuitable due to method limitations, and that this is not a function of the traditionally perceived format complexity. To ensure the detection of failures before utilizing AKM modeling to determine shelf life, it is crucial to conduct model validation, especially in cases where the measured change at the lowest temperature is comparable to the method variability. This validation should encompass verifying Arrhenius behavior (Arrhenius plot) with a single slope (activation energy) in the range of fitted temperatures, evaluation of model parameters such as activation energy, and confirming the stability predictions by using prior data.

Conclusions

In this study, we demonstrated that long-term stability predictions for biologics, specifically protein aggregates, can be effectively modeled using a first-order kinetic model. This model provides robust and precise stability predictions by characterizing the stability profiles of quality attributes through exponential functions. The key findings and conclusions are as follows.

The selection of appropriate temperature conditions is crucial in stability studies. By carefully choosing the temperature, it is possible to identify the dominant degradation process and accurately describe it using a first-order kinetic model. This approach prevents the activation of additional degradation mechanisms that are not relevant for storage conditions, allowing for a study focused on a single mechanism.

The simplicity of the first-order kinetic model reduces the number of parameters that need to be fitted and minimizes the number of samples that need to be measured. This enhances the robustness and reliability of predictions.

The majority of quality attributes in biologics, including aggregation, can be successfully modeled using a first-order kinetic framework. This highlights the importance of temperature selection in studying the stability and degradation of biologics.

The optimized model was validated using data collected from various protein formats, including IgGs, bispecific antibodies, Fc-fusion proteins, nanobodies, and DARPins. The model yielded correct predictions for 7 out of 8 proteins, demonstrating its applicability and reliability. The one failed prediction highlights the importance of conducting thorough model validation before using the model, for example, to support a shelf-life claim.

The kinetic model provided more accurate and reliable stability estimates compared to linear extrapolation, even with limited data points. This further emphasizes the advantages of using a kinetic modeling approach for stability predictions.

Overall, the findings of this study support the use of first-order kinetic modeling for long-term stability predictions of biologics, providing a robust and reliable framework for stability studies.