Abstract
Longitudinal data comprised of neutralizing antibody (NAb) activity measurements from subjects who received COVID-19 vaccinations were collected between November 2020 and April 2022. To detect differences between convalesced and naïve groups with respect to the evolution of NAb activity since the subject’s first COVID-19 vaccine, we initially fit a linear mixed effects model to only the decay section of NAb evolution. We conclude that NAb activity, when restricted to this region, behaves differently between these two groups, with the convalesced group generally having higher neutralizing antibody levels than the naïve group. We then fit a nonlinear mixed effects model over the entire NAb progression using a system of ordinary differential equations described by De Pillis et al. as our structural component to the model. This analysis not only supports the claim that over the entire progression, NAb activity behaves differently for convalesced and naïve groups, but aligns with the linear analysis in confirming that NAb decay is slower in the convalesced group than the naïve group. Finally, we use the estimated parameters from the nonlinear mixed effects model to predict NAb progression for each subject from their last observed measurement to 100 days past this measurement.
An important question asked during the COVID-19 pandemic is how long an individual’s immune response to vaccination or viral exposure would provide protection against future infections1. One of the primary mechanisms by which the adaptive immune response to SARS-CoV-2 provides protection against re-infection is the production of neutralizing antibodies (NAb), which target the receptor binding domain (RBD) on the S1 subunit of the spike protein (S1)2,3,4. The critical role of the spike protein mediating SARS-CoV-2’s entry into cells was the rationale behind the development of most current COVID-19 vaccines, which are potent stimulators of neutralizing antibodies5. Importantly, the ability of vaccines to elicit NAb responses has been correlated with their reduction in incident cases3. Thus, assessment of circulating NAbs can serve as a “surrogate of protection” to estimate one’s likelihood of becoming infected by a future exposure.
The strength of protection provided by increased NAb levels induced by mRNA vaccines in individuals who had not yet been exposed to SARS-CoV-2 was explored in1. In that work, the authors showed that there is individual variability in NAb responses to mRNA vaccination in the COVID-19-naïve cohort analyzed. They observed that the initial strength of response to vaccine as measured by NAb-level increase did not always predict how long the response would persist. While the vast majority of people will seroconvert within a few weeks of completing a two-dose regimen, the authors saw that the length of time an individual sustains protective NAb levels can vary substantially.
Certain studies have explored the differences in immune responses to vaccine between those who had previously been infected with SARS-CoV-2 (the convalesced population), and those who had not (the COVID-naïve population). These studies indicate one mRNA vaccine dose is sufficient to confer appropriate protection on convalesced individuals, and that there is not much benefit to be gained from a second vaccination6,7,8,9,10. There is evidence that convalesced and naïve groups that received two mRNA vaccine doses show equivalent antibody responses when measured about one week after the second dose6,7,8. The influence of COVID status on the long-term persistence of vaccine-induced NAb responses has not been as extensively reported. We hypothesize that those who have recovered from a COVID-19 infection will sustain higher NAb levels in response to mRNA vaccine when compared to those who have not been sick with COVID-19. The earlier studies pool longitudinal data to apply common statistical comparison techniques for their analysis. The authors of11 warn, however, that these standard statistical approaches, while useful, can sometimes lead to inaccurate conclusions. They argue that mixed-effects modeling can yield better estimates of the duration of the immune response in individuals.
For that reason, we have developed two mixed effects models that capture the dynamics of NAb levels over time. The first model utilizes a linear mixed-effects analysis to describe exclusively the decay region of NAb evolution, that is, from around the time NAb response is at its peak until the subject received their first booster shot or experienced a breakthrough event. The second makes use of nonlinear mixed-effects (NLME) analysis, capturing NAb progression beginning at the time of a subject’s initial vaccine dose and ending when the subject received their first booster shot or experienced a breakthrough event. For the structural component of the NLME, we use a mathematical model of NAb dynamics proposed by de Pillis et al.1. This mathematical model describes the evolution of NAb activity as a solution to a simple two-population system of ordinary differential equations with four subject-specific parameters that captures the high-level mechanistic dynamics of a vaccine-triggered immune response within an individual. While the linear mixed effects model is simpler than the nonlinear one and focuses on the most prominent feature of the NAb progression, namely, the decay region, the nonlinear mixed effects model analyzes the entire NAb evolution. It is mechanistic rather than phenomenological, allowing us to gain insight into connections between the data and model parameters that are directly connected with biological processes. Moreover, this model can be used to predict how long an individual’s NAb levels may stay above a clinical cutpoint based on a few post-vaccination blood assessments, potentially guiding decisions about appropriateness and timing of booster shots.
Here we describe the performance of these models in describing NAb levels following vaccination in subjects with or without prior COVID-19 history.
Spaghetti plot of NAb activity against days since first dose. Dots represent individual NAb activity measurements, while a line denotes a subject’s NAb level over the course of time. The red and blue colors denote the convalesced and naïve groups, respectively. The black, vertical, dashed line signifies 37 days since the first vaccine dose, which indicates our starting point of where we analyze the NAb activity decay.
Results
Given the data pictured in Fig. 1, we carried out an analysis using both linear and nonlinear mixed effects modeling. We confirmed our hypothesis that in a population of convalesced then vaccinated individuals, the measured neutralizing antibody levels decayed more slowly than did those of naïve then vaccinated individuals. This pattern was observed over a period of more than 400 days.
A spaghetti plot of NAb activity, in units of % neutralization, against days since first vaccine dose for this data set is shown in Fig. 1. NAb activity tends to increase quickly up until around 30 days after first dose; this is especially noticeable in the naïve group. Also, NAb activity for the convalesced group typically starts off at higher values than the naïve group. Then for both groups, NAb activity generally decays over time.
When we focus on the decay region of NAb progression and model it using linear mixed effects (LME) techniques, we find our data are better explained when we take into account convalesced and naïve groups than not, indicating a difference between these two groups (Fig. 2(i)). Furthermore, a clear separation between both groups remains even when considering vaccine type (Fig. 2(ii) and Fig. 2(iii)). Next, when we analyze the entire NAb evolution utilizing nonlinear mixed effects (NLME) modeling, beginning with the first vaccine dose and following the initial increase then decay of NAb activity, the parameter tied to NAb rate of decay is significantly lower in the convalesced group than it is in the naïve group, confirming that NAb levels in the naïve group decay more rapidly than they do in the convalesced group (Fig. 3(v)).
Linear mixed effects analysis of neutralizing antibody decay
We are interested in examining the decay region of NAb activity, and we provide evidence that within this area, mean NAb activity is significantly different between convalesced and naïve groups. Examining the data empirically, the decrease in NAb level in our data set begins around 37 days since the first vaccine dose (Fig. 1). Despite an imbalance in the number of subjects per group and in the number of observations per subject, Fig. 1 suggests that a difference between groups does exist. A visual inspection indicates that most of the observations in the naïve group are clustered below those of the convalesced group.
Now, we fit an LME model with subject acting as the grouping variable, days since first vaccine dose as our time variable, subject’s convalesced/naïve status as a covariate, and the logit transform of NAb activity as the response variable. While carrying out this analysis, we center our time variable t at the mean number of days since the first dose (here, \(\bar{t} \approx 178\) days) and then rescale it from days to years since first dose. In other words, we transform our time variable from t to \(\frac{t - \bar{t}}{365}\) simply for mathematical convenience. We model the fixed effects structure as a polynomial that depends on the time variable and the subject’s convalesced/naïve status, and using a step-up procedure (detailed further in Sect. "Linear mixed effects modeling details"), we find that a third-order polynomial best describes the downward trend (p-value: 0.054). Our analysis finds that a correlated random intercept, random slope model is sufficient to describe the random effects structure. We are limited to examining no more than two random effects since we do not have enough observations. This implies that our beyond optimal model describing the decay is mathematically given by
with
where \(t_{ij}\) denotes the number of days when the jth NAb observation for subject i occurred since first vaccine dose (for example, \(t_{4,2} = 49\) means the second NAb activity measurement for subject 4 happened 49 days after this subject received their first vaccine dose.), \(\bar{t}\) represents the mean number of days since the first vaccine dose, \(C_i\) is an indicator variable denoting whether subject i is in the convalesced group (1) or not (0), and \(y_{ij}\) denotes the NAb activity measurement for subject i at the jth time point. In addition, \(\epsilon _{ij}\) denotes the error term for the jth time point for subject i along with \(b_{0i}\) and \(b_{1i}\) representing the random effects of subject i; we assume the error term and random effects are normally distributed with mean 0, each with standard deviations \(\omega _{\epsilon }\), \(\omega _{b_0}\), and \(\omega _{b_1}\), respectively.
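As a small illustration of the time transformation \(\frac{t - \bar{t}}{365}\) and the third-order polynomial in time described above, the following Python sketch computes the centered, rescaled time and the corresponding cubic terms. The function names are hypothetical, and \(\bar{t} = 178\) is the approximate value quoted in the text rather than the exact mean.

```python
MEAN_DAYS = 178.0      # approximate mean days since first dose quoted in the text
DAYS_PER_YEAR = 365.0


def centered_years(t_days, mean_days=MEAN_DAYS):
    """Map days since first dose to (t - t_bar) / 365."""
    return (t_days - mean_days) / DAYS_PER_YEAR


def cubic_time_terms(t_days, mean_days=MEAN_DAYS):
    """Return [1, s, s^2, s^3] for the transformed time s (hypothetical helper)."""
    s = centered_years(t_days, mean_days)
    return [1.0, s, s ** 2, s ** 3]


# Example from the text: the measurement taken 49 days after the first dose.
s = centered_years(49)
print(round(s, 4))  # -0.3534
```

Centering at the mean keeps the polynomial terms on comparable scales, which is presumably part of the "mathematical convenience" mentioned above.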
To determine if convalesced and naïve groups are different, we test the hypothesis that
and test this using a likelihood ratio test, which results in rejecting the null hypothesis at the 5% level (\(\chi ^2(5) = 74.999\), \(p\text {-value} = 9.31 \times 10^{-15}\)). Thus, we conclude that there is a statistically significant difference between mean decay behavior over time between the naïve and convalesced groups.
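The p-value attached to the likelihood ratio statistic can be checked from the chi-square survival function. The sketch below uses only the Python standard library and the standard recurrence for the regularized upper incomplete gamma function at integer and half-integer arguments; the function name `chi2_sf` is a hypothetical helper, not part of the authors' R workflow.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with integer df >= 1.

    Uses Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1), with z = x / 2,
    starting from Q(1/2, z) = erfc(sqrt(z)) for odd df and Q(1, z) = exp(-z) for even df.
    """
    if df < 1 or x < 0:
        raise ValueError("df must be >= 1 and x must be non-negative")
    z = x / 2.0
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(z))
        n_terms = (df - 1) // 2
    else:
        a, q = 1.0, math.exp(-z)
        n_terms = (df - 2) // 2
    for _ in range(n_terms):
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


# Statistic and degrees of freedom reported in the text.
p_value = chi2_sf(74.999, 5)
print(p_value)  # about 9.3e-15, in line with the reported p-value
```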
Using the top-down strategy mentioned in Zuur12, we obtain the optimal model
with
Comparing this to the subject and population level equations in (1) and (2), both models share the same subject level dynamics, but at the population level, indicator variables associated with \(\beta ^*_1\) and \(\beta ^*_3\) vanished, suggesting convalesced/naïve status primarily affects the y-intercept and second-order terms of the polynomial model. Checking model validity, our optimal model appears to be linear, it has constant variance (Breusch-Pagan test: \(p\text {-value} = 0.925\)), and it has normally distributed residuals (Shapiro-Wilk test: \(p\text {-value} = 0.669\)) as well as normally distributed random effects. Parameter estimates for the fixed effects and random effects terms are given in Table 2. A plot of the fitted mean NAb activity against days since first dose for both convalesced and naïve groups along with their 95% bootstrap confidence band is given in Fig. 2(i).
Mean curves for convalesced and naïve groups using NAb activity as the response over time. Panel (i) displays the mean curves when only considering convalesced/naïve status while panels (ii) and (iii) consider both vaccine type and convalesced/naïve status. Panel (ii) displays the mean curves when controlling for vaccine type while panel (iii) controls for convalesced/naïve status. Also plotted are the corresponding 95% bootstrap confidence bands of the mean for each combination. Note the wide-ranging confidence band for the mean curve corresponding to the convalesced-Pfizer cohort; this is due to the cohort’s small sample size.
Fig. 2(i) shows an increase in mean NAb activity level in the convalesced group at approximately 250 days since the first dose. Note we are not claiming that NAb activity for the convalesced group increases; we attribute this increase in part to the sparse number of convalesced observations one year past the first dose. Three of those observations are quite large and appear to drive the mean curve upward. Note that the data themselves also exhibit counterintuitive behavior. For example, there is at least one individual in the naïve population, and one in the convalesced population, whose NAb levels increase with no apparent trigger, whether new infection or vaccination. We have double-checked these data to confirm that there was no transcription error. Although these observations are counterintuitive, we did not exclude them from our analysis. Even with the inclusion of such data, we still see the overall trend that convalesced individuals appear to have a more robust persistent response to vaccine.
In Fig. 2(ii) we compare the vaccine type (Moderna or Pfizer), and within each vaccine type, identify convalesced/naïve groups. In Fig. 2(iii), we compare convalesced to naïve populations, and within each population, identify the Moderna/Pfizer groups. In both Fig. 2(ii) and Fig. 2(iii), we see that the convalesced/naïve effect persists. Similarly, when stratifying for sex (figure not shown), the convalesced/naïve effect also persists. See Sect. "Model selection and parameter fitting details" for details exploring the effects of vaccine type and sex as covariates.
Non-linear mixed effects analysis of neutralizing antibody progression
While we applied an LME analysis to the decay region of NAb activity progression, now we use nonlinear mixed effects (NLME) modeling to analyze the entire NAb evolution, from the day of the first vaccine dose to the first booster shot or breakthrough event. As in the linear case, subject and days since first vaccine dose will act as the grouping variable and time variable, respectively. For our structural component, we use a system of ordinary differential equations (ODE) described by de Pillis et al.1. Model states are:
-
A: Neutralizing antibody (NAb) activity. Units: % Neutralization
-
V: Proxy for transfected cells in response to mRNA vaccine. Units: mL
The dynamics of NAb activity over time are described by two nonlinear ordinary differential equations. The time scale is a 24-hour day:
In the equation for antibody dynamics, the term \(r_1 V\) represents initiation of antibody activity in response to vaccine. The term \(r_2 A V\) represents an antibody boost in response to vaccine and will not be triggered until there are some antibodies in the system. The term \(A \left( r_3 - r_4 A \right)\) models intrinsic antibody dynamics as logistic. In the equation representing vaccination kinetics, u(t) represents a vaccine dose that drives the presence of transfected cells. The vaccine dose is modeled by a discrete unit pulse on the days vaccine is administered, and has value 0 otherwise. Elimination is modeled by \(\frac{k_1 V}{(k_2 + V)}\) which is a Michaelis-Menten type decrease in transfected cells over time. The values of PK parameters \(k_1\) and \(k_2\) are fixed for the entire population. The model has a total of four parameters \(r_1\), \(r_2\), \(r_3\), \(r_4\) and an initial NAb level \(A_0\), each of which is subject-specific and fit to individual subject data. Our aim is to determine how the covariates convalesced/naïve status, vaccine type, and sex influence each of the four parameters \(r_1,\dotsc , r_4\). As expected, \(A_0\), the initial NAb level, is higher in the convalesced group than in the naïve group. This is biologically reasonable since the convalesced group has been exposed to SARS-CoV-2 and would therefore be expected to have higher initial antibody levels.
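To make the term-by-term description above concrete, the following Python sketch integrates a two-state system assembled from those terms with a classical fourth-order Runge-Kutta scheme. Several things here are assumptions rather than the authors' implementation: the signs used to combine the terms, the one-day duration of the unit dose pulse, the initial value \(V(0) = 0\), the dose days (0 and 28), and every numerical parameter value, including \(k_1\) and \(k_2\).

```python
def simulate_nab(r1, r2, r3, r4, a0, dose_days, t_end, k1=0.5, k2=1.0, dt=0.01):
    """Integrate the assumed system
        dA/dt = r1*V + r2*A*V + A*(r3 - r4*A)
        dV/dt = u(t) - k1*V / (k2 + V)
    with classical RK4 and return a list of (t, A, V) tuples.
    All default values are illustrative placeholders, not fitted estimates."""

    def dose_input(t):
        # Assumed shape of u(t): value 1 for one day starting at each dose day.
        return 1.0 if any(d <= t < d + 1.0 for d in dose_days) else 0.0

    def rhs(t, a, v):
        da = r1 * v + r2 * a * v + a * (r3 - r4 * a)
        dv = dose_input(t) - k1 * v / (k2 + v)
        return da, dv

    steps = int(round(t_end / dt))
    a, v = a0, 0.0
    trajectory = [(0.0, a, v)]
    for i in range(steps):
        t = i * dt
        a1, v1 = rhs(t, a, v)
        a2, v2 = rhs(t + dt / 2, a + dt / 2 * a1, v + dt / 2 * v1)
        a3, v3 = rhs(t + dt / 2, a + dt / 2 * a2, v + dt / 2 * v2)
        a4, v4 = rhs(t + dt, a + dt * a3, v + dt * v3)
        a += dt / 6 * (a1 + 2 * a2 + 2 * a3 + a4)
        v = max(0.0, v + dt / 6 * (v1 + 2 * v2 + 2 * v3 + v4))
        trajectory.append(((i + 1) * dt, a, v))
    return trajectory


# Hypothetical subject: two doses 28 days apart, followed for 200 days.
traj = simulate_nab(r1=0.1, r2=0.05, r3=0.005, r4=0.02, a0=0.02,
                    dose_days=[0, 28], t_end=200)
peak_a = max(a for _, a, _ in traj)
final_a = traj[-1][1]
print(peak_a, final_a)
```

With these placeholder values the trajectory rises after each dose and then relaxes toward the logistic equilibrium \(r_3 / r_4\), which mirrors the rise-then-decay shape described for the data.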
Following the methods detailed in Sect. "Non-linear mixed effects modeling details", we apply the logitnormal transformation to both our observed NAb values and the NAb state A in the structural model. We use Monolix’s combined 1 residual error model, which takes the form \((a + bf(A))\epsilon\), to capture our residual errors, since it provides the lowest AIC value among Monolix’s error models; here, \(\epsilon\) is a standard normal random variable, f is a function depending on NAb activity A, and a and b are constants to be determined. We model the individual parameter values \(r_1\), \(r_2\), \(r_3\), \(r_4\), as well as \(A_0\), as functions of random effects and our categorical covariates. Moreover, we transform individual parameter values and population parameters with a logitnormal transformation. Finally, we specify the four structural model parameters to be pair-wise correlated with each other and not with initial condition \(A_0\).
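The logitnormal transformation with boundary values can be sketched as a bounded logit. The snippet below assumes the common form \(\log \left( \frac{y - m}{M - y} \right)\) and uses the boundary values m and M that this section quotes for Monolix; whether Monolix applies exactly this expression internally is an assumption, and the function names are hypothetical.

```python
import math

LOWER = -0.00954434   # boundary value m quoted in this section
UPPER = 1.01970       # boundary value M quoted in this section


def bounded_logit(y, lower=LOWER, upper=UPPER):
    """Map y in (lower, upper) to the real line: log((y - lower) / (upper - y))."""
    if not lower < y < upper:
        raise ValueError("y must lie strictly between the bounds")
    return math.log((y - lower) / (upper - y))


def inverse_bounded_logit(z, lower=LOWER, upper=UPPER):
    """Inverse map from the real line back to (lower, upper)."""
    return lower + (upper - lower) / (1.0 + math.exp(-z))


# Observations exactly at 0 or 1 remain finite because the bounds sit slightly outside [0, 1].
for y in (0.0, 0.5, 1.0):
    z = bounded_logit(y)
    print(y, z, inverse_bounded_logit(z))
```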
We obtain a best fitting model whose covariates impact ODE parameters in the following manner:
-
\(r_1\) depends on convalesced/naïve status, and vaccine type.
-
\(r_2\) depends on convalesced/naïve status, and vaccine type.
-
\(r_3\) depends on convalesced/naïve status, and vaccine type.
-
\(r_4\) depends only on convalesced/naïve status.
More specifically, the corresponding subject level equation with the logitnormal transformation is
where \(y_{ij}\) and \(A_{ij}\) denote, respectively, the observed and predicted jth NAb observation for subject i. The logitnormal transformation is appropriate when the observed values are between 0 and 1. Monolix13 sets \(m = -0.00954434\) and \(M = 1.01970\) as the default boundary values when using the logitnormal transformation. This selection provides numerical stability if any observations are equal to 0 or to 1. Here \(\epsilon _{ij}\) is a standard normal random variable. The population level equations are given by
where \(C_i\) is an indicator variable denoting whether subject i is in the convalesced group (1) or not (0) and \(D_i\) is an indicator variable denoting whether subject i received the Pfizer vaccine (1) or not (0). Variables \(\eta _{0i}\), \(\eta _{1i}\), \(\eta _{2i}\), \(\eta _{3i}\), and \(\eta _{4i}\) are correlated, normally distributed random variables with mean 0 and standard deviations \(\omega _0\), \(\omega _1\), \(\omega _2\), \(\omega _3\), and \(\omega _4\), respectively. Parameter estimates can be found in Table 3. In the work of de Pillis et al.1, this structural model was trained on data from naïve individuals only. As a check that the same structural model is appropriate to use for data from convalesced individuals, we ran a root mean squared error (RMSE) analysis. Fig. 4 highlights the results: our structural model is able to achieve a fit to data from convalesced individuals that is at least as good as the fit to data from naïve individuals. In fact, the median RMSE for the fit to the naïve population data is 0.026, with quartiles [0.017, 0.037]. The median RMSE fit to the convalesced population data is even lower at 0.016, with quartiles [0.005, 0.024].
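The RMSE comparison described above amounts to one RMSE per subject followed by a median and quartiles per cohort. A minimal Python sketch follows; the function names are hypothetical, and the "inclusive" quantile rule is an assumption, since the text does not state which quartile convention was used.

```python
import math
import statistics


def rmse(observed, predicted):
    """Root mean squared error between paired observed and predicted values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def median_and_quartiles(values):
    """Return (median, (Q1, Q3)) using the inclusive quantile method (an assumption)."""
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return med, (q1, q3)


# Hypothetical per-subject fits: each pair is (observed NAb, model-predicted NAb).
subject_fits = [
    ([0.90, 0.80, 0.70], [0.88, 0.81, 0.72]),
    ([0.95, 0.90], [0.93, 0.91]),
    ([0.60, 0.50, 0.45], [0.62, 0.52, 0.44]),
]
per_subject = [rmse(obs, pred) for obs, pred in subject_fits]
print(median_and_quartiles(per_subject))
```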
In addition to this RMSE analysis, we used Monolix to generate an observation vs. individual prediction plot, a residual plot, as well as visual predictive check to determine whether modeling assumptions were satisfied. These plots are given in Sect. "Model selection and parameter fitting details", Figs. 6, 7, and 8, and confirm that our model fits the data from both the naïve and convalesced populations reasonably well; there are no exceptional misspecifications in our structural, variability, or covariate models.
Boxplots of logit transformed individual parameter values \(\log \left( \frac{\text {parameter value}}{1 - \text {parameter value}} \right)\) against convalesced/naïve status covariate. In boxplot (i), \(A_0\) represents initial NAb levels. The boxplots indicate that convalesced individuals have higher antibody levels at the time of getting a first vaccine dose than do naïve individuals. In boxplots (ii), (iii), and (iv), parameters \(r_1\), \(r_2\), and \(r_3\) are generally higher in the naïve individuals, indicating that NAb levels in naïve individuals will fluctuate more than in convalesced individuals. In (v), parameter \(r_4\), which is connected to decay rates of NAb levels in our mechanistic model, is lower in the convalesced than in naïve group, indicating that convalesced individuals maintain higher NAb levels over time after vaccination than do naïve individuals.
The structural model of1 was trained on data from naïve subjects, but is appropriate to use for data from convalesced individuals as well. The RMSE of the model fit to data is no larger than the RMSE for the fit to data from naïve individuals. Median RMSE for the fit to the naïve population data is 0.026, with quartiles [0.017, 0.037]. Median RMSE fit to the convalesced population data is even lower at 0.016, with quartiles [0.005, 0.024].
Another advantage of using NLME analysis is that through the mechanistic structural model, we can connect a biological interpretation to each of the model parameters. In the boxplots shown in Fig. 3, we compare the fitted individual model parameters of the convalesced and the naïve cohorts. Boxplot 3 (i) shows that \(A_0\) values are generally higher in the convalesced group than in the naïve group. Variable \(A_0\) denotes the initial NAb level in the structural model, indicating that convalesced individuals start with higher initial NAb level before being vaccinated. This is consistent with what we would expect, since convalesced subjects should already have antibodies in their systems. Boxplots 3 (ii) to 3 (iv), respectively, show that \(r_1\), \(r_2\), and \(r_3\) values for the convalesced cohort are generally lower than the naïve group’s \(r_1\), \(r_2\), and \(r_3\) values, suggesting that these parameters act as a type of control on the variation of NAb levels. In particular, each act as a cap to the amount of NAb growth allowed: the convalesced individuals have lower \(r_1\), \(r_2\), and \(r_3\) values because NAb activity starts off at higher levels (as indicated by \(A_0\)), so there is only so much variation in NAb response that can occur when levels are already close to maximal.
The interpretation of parameter \(r_4\) is of particular interest, since it aligns with the conclusions of our linear analysis. Observe that boxplot 3 (v) suggests that the median \(r_4\) value for the convalesced group is smaller than that of the naïve group. Since \(r_4\) controls the model’s decay rate, we conclude that the decay rate for the naïve group is faster than that of the convalesced group. This interpretation is consistent with the plotted mean curves for both convalesced and naïve groups shown in Fig. 2. These curves are based on our linear mixed effects model, and this interpretation would explain why the decay rates of the convalesced and naïve groups inferred from the figure differ from each other.
Predictions using NLME
With the system of differential equations used as the structural component throughout our nonlinear mixed effects analysis, we can go beyond an analysis of the data and actually generate subject level predictions for how NAb activity will continue to evolve, even beyond the subject’s last observed measurement. As noted in Sect. "Prediction with NLME model", we take Monolix parameter estimates for \(r_1, r_2, r_3\), and \(r_4\) as well as initial NAb activity \(A_0\), substitute these values into the structural model, then numerically solve the system. For each subject in our data set, we predict the evolution of NAb levels from the day of the subject’s last observed measurement to 100 days past this measurement. Results are shown in Fig. 5, where red and blue trajectories indicate whether the subject is in the convalesced cohort or naïve cohort, respectively. This visualization of predicted decay trends indicates that we expect individuals in the convalesced group to maintain higher NAb levels than those in the naïve group for at least three months following the last collected sample.
Discussion
The preceding analyses support the claim that the evolution of NAb activity generally differs between convalesced and naïve cohorts. Specifically, the rate of decay of NAb activity is slower in convalesced individuals than naïve individuals. The initial linear mixed effects analysis establishes that, with respect to convalesced/naïve status, NAb behavior within the decay region is different and this difference still persists even when we control for the type of vaccine dose. While it appears the decay rates vary between groups under this linear framework, the nonlinear mixed effects analysis confirms this observation. It asserts the entire progression, not just the decay region, varies based on convalesced/naïve status since all structural model parameters are affected by this covariate, particularly the parameter responsible for controlling the rate of decay. Smaller values of this parameter correspond to a slower rate of decay, and when fitting our model to each subject, estimates of this parameter in the convalesced cohort are in general lower than estimates in the naïve cohort.
Note our investigation only examines NAb activity from the day of the subject’s first vaccine dose to either the subject’s first booster shot or COVID-19 breakthrough event. Of course, it is now common for an individual to obtain multiple booster shots or to experience more than one breakthrough event. Future work should explore how these models can be modified to handle these types of cases and whether they, too, support the hypothesis that NAb activity progression differs between convalesced and naïve subjects, and that convalesced subjects’ NAb decay rates are lower than naïve subjects’ decay rates.
Bear in mind that the convalesced group contains roughly one third as many individuals as the naïve group, and observations become sparse as the number of days since first dose increases. This may have caused the predicted NAb curve for the convalesced group to increase slightly around day 250. In addition, in the convalesced group, days since symptom onset and days since first dose were not controlled for in this study. Both of these factors may be contributing to the large standard errors associated with parameters found in Table 3.
An important practical consideration affecting the viability of attempts to model durability of neutralizing antibody levels as a surrogate of protection is the degree to which new SARS-CoV-2 variants have evolved the ability to escape neutralization. Since the end of 2021, newer variants of the SARS-CoV-2 virus have become the predominant forms of the virus in circulation in the U.S. Studies have shown that it takes higher levels of antibodies formed in response to vaccination or a previous exposure to an earlier form of the virus to neutralize newer compared to older variants14,15,16. However, exposure to a newer variant of the virus (or one of the updated targeted vaccines) will increase neutralizing antibodies specific to the newer variants17,18,19. This has been the rationale for the development of new commercial variant-specific neutralizing antibody tests, e.g.,20. For example, in 2022, subvariants of Omicron, e.g. XBB.1.5, became dominant in the U.S., and showed an even greater ability to evade neutralizing antibodies, including those elicited by earlier forms of Omicron or the bivalent vaccine boosters available since the fall of 202221,22. This continued evolution of SARS-CoV-2 requires that caution be taken when estimating the degree of protection afforded by neutralizing antibodies established against earlier forms of the virus.
Regardless, the models described here aid attempts to understand individual differences in vaccine responses, and the differential equations model used as the structural component in the nonlinear mixed effects analysis could lead to novel tools for informing individualized strategies about the optimal cadence of vaccine boosters.
Methods
We implement all data manipulation, analyses, and visualization either in R version 4.2.023, Monolix 2024R13, or MATLAB 202224.
Study participants
Here we use a dataset comprised of NAb activity measurements from post-vaccine assessments in cohorts of convalesced and COVID-naïve subjects, which were collected by Aditxt Inc. This dataset consists of 324 observations from \(n = 92\) subjects, 24 subjects from the convalesced group (80 observations) and 68 subjects from the naïve group (244 observations), collected between November 2020 and April 2022. Subjects included in this analysis were a subset of study participants enrolled in a prospective cohort study evaluating SARS-CoV-2 immune responses (NCT05379478). All methods and protocols were conducted in accordance with relevant guidelines and regulations and were approved by WCG IRB (IRB #20202768), and all participating subjects provided written informed consent. More information about the approving body WCG can be found at https://www.wcgclinical.com/solutions/irb-review/. At successive study visits, subjects with a history of COVID-19 infection or vaccination provided blood samples, as well as a limited set of clinical and demographic information. This analysis included all samples from subjects who had received a two-dose regimen of one of the mRNA COVID-19 vaccines, i.e., Moderna or Pfizer. Any samples from these subjects collected after a subsequent booster shot or breakthrough infection were excluded. Subjects who were not aware of a COVID-19 infection before their vaccination but demonstrated the presence of nucleocapsid protein (NP) antibodies in their pre-vaccine samples were also excluded. Characteristics of participants included in this analysis are presented in Table 1. Note that the number of study visits depended on the willingness of the participant, ranging from a single visit to as many as 13 visits. Visit dates also varied by subject and at the subject’s convenience, so measurements were not necessarily collected on the same number of days since first vaccine dose or at the same intervals. This resulted in 243 total visits or observations.
Furthermore, after an initial visual examination of the data set, a total of five observations coming from two subjects were worth investigating, so Aditxt Inc. reanalyzed these observations resulting in five new observations. For this analysis, we replaced the old values with the average of the old and new observations.
Data filtering conditions
We use a series of data filtering steps to transform our raw data set into a useful state. First, we filter for subjects who had one or two doses of the Moderna or Pfizer vaccine since we are interested only in mRNA type vaccines. Then, for each of these subjects, we select NAb measurements that fall between the day of their first vaccine dose and end either (a) the day of their first booster shot or (b) the day of their first breakthrough event, whichever event came earlier. Note that we include observations that are taken from the day of a subject’s first vaccination up to but not including the day of a booster injection or a breakthrough event. If a subject neither obtained a booster nor experienced a breakthrough event, we include all of the subject’s NAb measurements on and after the first vaccine dose.
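The filtering steps above can be sketched in a few lines of Python. The record layout (field names such as `booster_day` and `breakthrough_day`, with days counted from the first vaccine dose) is hypothetical and chosen only for illustration; it is not the structure of the actual data set.

```python
MRNA_VACCINES = {"Moderna", "Pfizer"}


def censor_day(subject):
    """Earliest of first booster or first breakthrough day; infinity if neither occurred."""
    events = [d for d in (subject.get("booster_day"), subject.get("breakthrough_day"))
              if d is not None]
    return min(events) if events else float("inf")


def filter_observations(subjects):
    """Keep (id, day, nab) for mRNA-vaccinated subjects, from the day of the first dose
    (day 0) up to but not including the booster or breakthrough day."""
    kept = []
    for subject in subjects:
        if subject["vaccine"] not in MRNA_VACCINES:
            continue
        cutoff = censor_day(subject)
        for day, nab in subject["observations"]:
            if 0 <= day < cutoff:
                kept.append((subject["id"], day, nab))
    return kept


# Hypothetical records; days are measured since the first vaccine dose.
subjects = [
    {"id": 1, "vaccine": "Pfizer", "booster_day": 200, "breakthrough_day": None,
     "observations": [(-5, 0.10), (0, 0.12), (40, 0.95), (200, 0.97)]},
    {"id": 2, "vaccine": "Moderna", "booster_day": None, "breakthrough_day": None,
     "observations": [(30, 0.90), (400, 0.60)]},
    {"id": 3, "vaccine": "Other", "booster_day": None, "breakthrough_day": None,
     "observations": [(30, 0.80)]},
]
print(filter_observations(subjects))
```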
The individual trajectories of NAb activity over time that satisfy the above filtering requirements are shown in Fig. 1. Because we are dealing with hierarchical as well as longitudinal data, we choose to analyze the data using both linear and nonlinear mixed effects modeling techniques with each subject acting as its own grouping variable. Such models were also used in refs. 25 and 11 to analyze longitudinal data associated with SARS-CoV-2 investigations. For the sake of model simplicity, we model just the decay region observed in most of these trajectories using linear mixed analysis, and this required additional filtering. Specifically, instead of considering observations starting on the day of their first vaccine dose, we start on the day NAb activity generally begins to decrease; for our particular data set, this happens 37 days after the first vaccine dose. When performing a nonlinear mixed effects modeling analysis, there is no need to apply this additional filter, and we are able to use the subject’s first vaccine dose as our starting point.
Linear mixed effects modeling details
We make a few general remarks before describing the linear mixed effects (LME) modeling techniques in detail. First, we use the lme4 package26 in R to perform linear mixed effects modeling analysis. Second, we follow essentially the procedure described by Long27 to analyze NAb activity decay; this requires us to (a) select the number of time predictors to use by using successive likelihood ratio tests (LRT), (b) determine an appropriate random effects structure, and (c) perform hypothesis testing and model diagnostics. Third, we model the logit transformed NAb activity values \(\log \left( \frac{\text {NAb value}}{1 - \text {NAb value}}\right)\) instead of the NAb activity itself because NAb values in the raw data set were given as percentages, and we observed that modeling without the logit transformation violated the assumption of constant variance for the residual errors. All NAb values were strictly between 0 and 1 (0% and 100%) except one observation, where \(\text {NAb} = 1\). Having a value exactly on the boundary is problematic since the domain of the logit function is (0, 1). One way to resolve this issue is to subtract a small amount from 1, in other words to winsorize this value, and then apply the logit transformation. Here, we subtract 0.001. Several other values were also examined, and it appeared that values greater than 0.0005 would yield essentially the same results as those presented here.
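As a minimal sketch of the transformation just described, the following Python function applies the logit after pulling an observation equal to 1 inward by 0.001. The function name `logit_nab` and the parameter name `eps` are illustrative choices and do not come from the original analysis code.

```python
import math

def logit_nab(nab, eps=0.001):
    """Logit transform of NAb activity expressed as a fraction in (0, 1).

    A value exactly equal to 1 is first reduced by eps (0.001 in the text),
    because the logit function is undefined on the boundary."""
    if nab == 1.0:
        nab = 1.0 - eps
    if not 0.0 < nab < 1.0:
        raise ValueError("NAb activity must lie strictly between 0 and 1")
    return math.log(nab / (1.0 - nab))

# The single boundary observation maps to log(0.999 / 0.001) = log(999).
boundary_value = logit_nab(1.0)
midpoint_value = logit_nab(0.5)
```

A NAb activity of 50% maps to 0 on the logit scale, and the adjusted boundary observation maps to roughly 6.9.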
Regarding the selection of time predictors, we model the decay of logit transformed NAb activity as a polynomial. To determine which polynomial is appropriate, we use a step-up approach, starting with a first-order polynomial and comparing successive, higher-order polynomials with the likelihood ratio test (LRT) at a significance level of 5%. Note that we scale our time variable from units of days to years and then center this new, scaled time variable when constructing and comparing polynomial models. Moreover, we use “beyond optimal models” when comparing; here, this means our models include the subject’s convalesced/naïve status as a covariate along with all their interactions with the time variable. Each comparison needed the model’s random effects structure to be specified, so, for the time being, we set this to the correlated random intercept, random slope model. Once we obtained results, we also compared models with respect to model AIC values; results were similar to the LRT results. Finally, we estimate model parameters using maximum likelihood estimation.
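The rescaling and centering of the time variable can be illustrated as follows. This is a sketch under stated assumptions: the text does not say which divisor converts days to years or which value is used for centering, so the 365.25 days per year and the centering at the sample mean used here are assumptions, as are the function names and the example visit days.

```python
def center_time_in_years(days, days_per_year=365.25):
    """Convert days since first dose to years, then center at the sample mean.

    Both the divisor and the choice of the mean as the centering value are
    assumptions made for illustration."""
    years = [d / days_per_year for d in days]
    mean_years = sum(years) / len(years)
    return [y - mean_years for y in years]

def polynomial_terms(t, degree):
    """Time predictors t, t^2, ..., t^degree for a polynomial decay model."""
    return [t ** k for k in range(1, degree + 1)]

# Hypothetical visit days within the decay region (day 37 onward).
centered = center_time_in_years([37, 100, 200, 400])
cubic_predictors = [polynomial_terms(t, 3) for t in centered]
```

Centering reduces the correlation between the polynomial terms, which generally makes higher-order models easier to fit and compare.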
Next, we compare different random effects structures, with the polynomial model chosen from the procedure above as our fixed effects component. We chose the appropriate model by comparing model variances of the error terms as well as AIC values. Of note, we obtain model parameter estimates using restricted maximum likelihood (REML) estimation.
Since we are interested in knowing if a subject’s convalesced/naïve status is needed to best explain our data set, we use a likelihood ratio test to compare the beyond optimal model formed with this covariate along with the fixed and random effects structures chosen above to the null model that excludes convalesced/naïve status at the 5% level of significance. If we reject the null model, we perform the top-down strategy outlined in Zuur et al.12 to select the optimal model that best explains how the convalesced/naïve status covariate alters the polynomial governing NAb activity trajectories. This strategy requires us to compare p-values associated with model parameters, something that the lme4 package does not provide. Fortunately, the lmerTest package28 supplies these quantities, and we take advantage of them. Similar to our method in selecting time predictors, we obtain model parameter estimates using (unrestricted) maximum likelihood estimation and use these and their associated quantities when performing model selection, but when reporting the optimal model’s parameter estimates, we use REML. Lastly, we determine if our optimal model satisfies the linear mixed effects modeling assumptions using the performance package29.
Including sex and vaccine type as covariates
Until now, our linear mixed effects modeling analysis dealt with a single covariate, namely convalesced/naïve status. We also examine how a subject’s sex and vaccine type received contribute to the NAb activity decay region. The method of analysis is similar to that described above. More specifically, we conduct two analyses: one with convalesced/naïve status and sex as covariates and another with convalesced/naïve status and vaccine type. We refrain from examining a linear mixed effects model that includes time and all three covariates since the results would be difficult to interpret. See Sect. "Model selection and parameter fitting details" for details.
Non-linear mixed effects modeling details
One advantage to employing nonlinear mixed effects (NLME) to understand patterns in our data is that NLME allows us to extend our analysis to include data over a larger time interval: we now track NAb progression starting on the day of first vaccine dose, which captures both the increase and decrease in NAb levels over time. We use the system of differential equations proposed by de Pillis et al.1 as our structural component to our NLME model, and we use Monolix version 2021R1 (ref. 13) to perform this part of the analysis.
Whether a model is identifiable may be of interest in certain cases, but the question of identifiability was not addressed in de Pillis et al.1. Independent of whether the structural model is identifiable, this mechanistic model is useful for implementing the nonlinear mixed effects analysis. As stated, with the chosen structural model, we can investigate the behavior of the entire data set for an individual, including both the increase and the decrease of NAb levels, and the effect of a vaccine dose. We cannot analyze the entire nonlinear data set with the linear mixed effects model. Our focus is on examining the differences between convalesced and naïve individuals, so whether the model is identifiable does not impact the data analysis. Nonetheless, we carried out our own numerical exploration of identifiability. With respect to structural identifiability, we analyzed the structural model with two different freely available software packages with web interfaces, COMBOS30 and SIAN31. Both analyses indicate that the NAb model is structurally identifiable (parameters \(r_1, r_2, r_3, r_4\) as well as A(0) and V(0) are identifiable). With respect to practical identifiability, the NLME modeling approach used in Monolix13 that fits population-level data using a combination of the underlying structural model for NAb and a statistical model for parameter fitting indicates that the model is also practically identifiable.
Our goal is to understand how all four structural model parameters \(r_1, r_2, r_3\) and \(r_4\) are influenced by a subject’s convalesced/naïve status, sex, and vaccine type. Here, our fixed and random effects represent covariates and subject differences, respectively. Similar to the LME model selection approach, we begin with a beyond optimal model, namely, the model where all parameters plus initial NAb value \(A_0\) (1) each depend on all three covariates and (2) have their random effects all correlated, as our starting point.
Before investigating how each covariate affects structural model parameters, we first determine an appropriate residual error model and appropriate distribution for the four parameters and initial NAb activity value. We accomplish this by merging various residual error model and transformation combinations to our structural model and then examining the resulting fit information. Monolix allows us to analyze these combinations effortlessly. For any transformation requiring certain parameters to be specified, we simply use Monolix’s preferred choices. In particular, when applying a logit transformation, instead of winsorizing as we did in the linear mixed effects case, Monolix will automatically adjust lower and upper bounds when it encounters NAb observations equal to 0 or 1. Fixed effects and random effects are estimated using the stochastic approximation expectation-maximization (SAEM) algorithm. Combinations that reasonably fit the data set become candidate models, and from this collection of candidate models, we choose the best one by comparing AIC values.
Now, to determine how the covariates influence model parameters, we consider three strategies. First, Monolix output for each model analysis contains, among other things, results of a correlation test that tests which model coefficient is significantly different from 0. Starting with the best candidate model, we construct a new model by dropping the coefficient whose test resulted in the largest p-value and then analyze the new model. We continue the process until all correlation tests are significant at the 5% significance level. Second, model output also provides results of a Wald test for each coefficient, and we use the same strategy above to determine an optimal model. Third, Monolix will propose an improved model based on BIC criteria and simulated individual parameters, so the strategy here is to use the proposed model as the new one and iterate until Monolix no longer proposes new changes. Although we examine the results of all three strategies, we use predominantly the results of the third strategy. Finally, we assess the optimal model’s fit and the associated modeling assumptions by examining the model’s residual errors, its associated observations vs. predictions plot, and visual predictive check plot. These plots are presented in Sect. "Model selection and parameter fitting details".
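The first two strategies share the same backward-elimination loop, which can be sketched generically in Python. This is not Monolix code: the `fit` callable stands in for refitting the model and reading off one p-value per covariate coefficient, and the coefficient names and p-values in the toy example are invented purely for illustration.

```python
def backward_eliminate(terms, fit, alpha=0.05):
    """Repeatedly drop the coefficient with the largest p-value and refit,
    stopping once every remaining coefficient is significant at level alpha."""
    current = list(terms)
    while current:
        p_values = fit(current)  # refit using only the remaining coefficients
        worst = max(current, key=lambda name: p_values[name])
        if p_values[worst] < alpha:
            break  # all remaining tests are significant
        current.remove(worst)
    return current

# Toy stand-in for a model fit. In a real analysis the p-values would change
# after each refit; here they are held fixed to keep the example short.
toy_p = {"status_on_r4": 0.001, "sex_on_r2": 0.40, "vaccine_on_r1": 0.03, "sex_on_A0": 0.75}

def toy_fit(terms):
    return {name: toy_p[name] for name in terms}

selected = backward_eliminate(list(toy_p), toy_fit)
```

In the toy run, the two coefficients with p-values above 0.05 are removed one at a time, largest first, and the loop stops with the two significant coefficients.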
Model selection and parameter fitting details
In this section we investigate the impact of the vaccine type and sex covariates on NAb progression. First, when controlling for vaccine type, a difference in decay rates between convalesced and naïve groups still persists, as shown in panel (ii) of Fig. 2. Furthermore, panel (iii) of Fig. 2 shows that when controlling for convalesced/naïve status, we find subjects receiving the Moderna vaccine tend to have higher NAb activity than those receiving Pfizer and that the NAb decay rate for Pfizer appears faster than that of Moderna.
To determine if vaccine type received is statistically significant, we compare full and reduced models, where the reduced model is the one described in Equations (1) and (2), and the full model has identical subject level equations as the reduced model but population level equations are modified to be
Here, \(D_{i}\) is an indicator variable denoting whether subject i received the Moderna (1) or Pfizer (0) vaccine. The likelihood ratio test rejects the reduced model at the 5% significance level (\(\chi ^2(8) = 18.718\), \(p\text {-value} = 0.016\)), so including vaccine type is statistically significant in explaining the decay region of NAb activity progression. Again, we apply the top-down strategy outlined in Zuur12 to obtain an optimal model, but here, plotting the resulting model produces mean response curves where the convalesced mean curves eventually cross and fall below naïve curves. This would suggest at some point after receiving the first vaccine dose, mean convalesced NAb activity is lower than mean naïve NAb activity, an outcome that disagrees with our understanding of the convalesced/naïve relationship. To resolve this, we examine models close to the optimal model with respect to AIC values that are consistent with the biological constraint that convalesced NAb levels are never lower than naïve NAb levels. This more biologically consistent model is the one plotted in Fig. 2(ii) and Fig. 2(iii) and has an AIC value of 514.06. The additional biological constraint only increases the AIC value from 511.61 to 514.06, so we consider the model close to mathematically optimal.
We perform a similar analysis to determine if the sex covariate has a statistically significant impact on the evolution of NAb activity. Using a likelihood ratio test with the same reduced and full models, except that the indicator variable \(D_{i}\) now denotes if the subject is female (1) or male (0), we reject the reduced model at the 5% significance level (\(\chi ^2(8) = 16.718\), \(p\text {-value} = 0.033\)) and conclude that including sex information in the model also improves model fit. However, AIC values for reduced and full models are 517.09 and 516.37, respectively, so the improvement is small and not plotted here.
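The p-values reported for the two likelihood ratio tests can be checked directly from the test statistics. The sketch below uses the closed-form upper tail of the chi-square distribution, which is available when the degrees of freedom are even; the function name `chi2_sf_even_df` is an illustrative choice, and in practice a statistics library would normally be used instead.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df.

    For df = 2m the tail has the closed form
    exp(-x/2) * sum_{k=0}^{m-1} (x/2)^k / k!."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0  # k = 0 term
    for k in range(1, df // 2):
        term *= half / k    # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total

# Test statistics reported in the text, each with 8 degrees of freedom.
p_vaccine = chi2_sf_even_df(18.718, 8)  # vaccine type
p_sex = chi2_sf_even_df(16.718, 8)      # sex
```

Both values round to the reported 0.016 and 0.033, so each reduced model is rejected at the 5% level.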
To check our NLME model fit, we used Monolix to generate an observation vs. individual prediction plot, a residual plot, and a visual predictive check for the best fitting model. Figures 6, 7, and 8 confirm that our model fits the data from both the naïve and convalesced populations. There are no exceptional misspecifications in our structural, variability, or covariate models.
Observations versus Predictions. This figure shows observed data versus the corresponding ODE model predictions computed using the individual parameters. The 90% prediction interval, which depends on the residual error model, is overlaid. The residual error model shown above is the combined 1 error model. Predictions that are outside of the interval are denoted as outliers. A high proportion of outliers suggests misspecifications in the structural model, which we do not see here. Note also that the distribution of the observations is fairly symmetrical around the corresponding predicted values, as it should be.
Scatterplot of the residuals. These plots display the PWRES (population weighted residuals) and IWRES (individual weighted residuals) as scatter plots with respect to prediction. The PWRES is computed using the population parameters while the IWRES are computed using the individual parameters. These plots are useful to detect misspecifications in the structural and residual error models: if the model is true, residuals should be randomly scattered around the horizontal zero-line.
Visual Predictive Check - Corrected Prediction. This figure provides an intuitive assessment of misspecification in structural, variability, and covariate models. The aim is to assess graphically whether simulations from a model of interest are able to reproduce both the central trend and variability in the observed data, when plotted versus an independent variable (typically time). It summarizes the structural and statistical models by computing several quantiles of the empirical distribution of the data after having regrouped them into bins over successive intervals. VPCs can be misleading if applied to data that include a large variability in dose and/or influential covariates, or that follow adaptive designs such as dose adjustments. The prediction-corrected VPC (pcVPC) was developed to maintain the diagnostic value of a VPC in these cases. In each bin, the observed and simulated data are normalized based on the typical population prediction for the median time in the bin. This removes the variability coming from binning across independent variables.
Note the large standard error associated with \(\beta _{01}\) in Table 3. Examining system of equations (8), this parameter denotes the difference of the quantity \(\log \left( A_0 / (1 - A_0) \right)\) between convalesced and naïve populations, with \(A_0\) representing the initial NAb response. The estimate of \(\beta _{01}\) shows that there is a clear distinction between initial NAb levels in the convalesced and naïve groups, which is what we would expect to see biologically. The standard error for \(\beta _{01}\) is computed in Monolix and is extracted from the Fisher information matrix. This standard error is about an order of magnitude larger than the standard errors of other parameters in the table and indicates that there is a higher degree of uncertainty as to the actual value of \(\beta _{01}\). However, this does not obviate the practical significance of the difference between the naïve and convalesced populations. We believe this reflects both the smaller sample size of the convalesced group and the wide range of initial NAb values in this group, which is possibly due to variability in the number of days between the beginning of COVID symptoms and the first vaccine dose for subjects within this group. Biologically, it is not unexpected that there is a wide range of initial NAb values in the convalesced population. Setting the model aside and looking at the raw data alone, we see that the standard deviation for the convalesced group is about 19, and the sample size is 24 with 80 data points; for the naïve group, the standard deviation is 7 and the sample size is 68 with 244 data points (see Table 1). Based on the data alone, we expect the standard error related to the convalesced group to be an order of magnitude larger than that of the naïve group.
Prediction with NLME model
Once we have established an optimal NLME model, we can use it to make subject-level predictions of NAb levels1. In particular, for each subject, we take Monolix parameter estimates for \(r_1, r_2, r_3\), and \(r_4\) as well as initial value \(A_0\), substitute these values into our ODE model, and then numerically solve the system. Here, we predict a subject’s NAb progression for 100 days, starting on the day of their last measured NAb value. We numerically solve the system of ordinary differential equations using MATLAB’s ode45.
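The prediction step amounts to integrating an initial value problem forward over a 100-day window. The Python sketch below illustrates only that mechanism and departs from the analysis in two clearly labeled ways: it uses a fixed-step classical Runge-Kutta scheme instead of the adaptive solver behind ode45, and its right-hand side is a toy single-equation exponential decay, not the de Pillis et al. system, whose equations are not reproduced in this section. The names `rk4_solve` and `placeholder_rhs`, the decay rate, and the starting value are all hypothetical.

```python
import math

def rk4_solve(rhs, t0, y0, t_end, n_steps):
    """Integrate y' = rhs(t, y) from t0 to t_end with fixed-step classical RK4."""
    h = (t_end - t0) / n_steps
    t, y = t0, list(y0)
    ts, ys = [t], [list(y)]
    for _ in range(n_steps):
        k1 = rhs(t, y)
        k2 = rhs(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k1)])
        k3 = rhs(t + h / 2, [yi + h / 2 * ki for yi, ki in zip(y, k2)])
        k4 = rhs(t + h, [yi + h * ki for yi, ki in zip(y, k3)])
        y = [yi + h / 6 * (a + 2 * b + 2 * c + d)
             for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
        t += h
        ts.append(t)
        ys.append(list(y))
    return ts, ys

def placeholder_rhs(t, y, decay_rate=0.01):
    """Toy stand-in for the structural model: simple exponential decay.
    This is NOT the ODE system used in the paper."""
    return [-decay_rate * y[0]]

# Predict 100 days ahead from a hypothetical last measured value of 0.9.
ts, ys = rk4_solve(placeholder_rhs, t0=0.0, y0=[0.9], t_end=100.0, n_steps=1000)
predicted_day_100 = ys[-1][0]
```

To reproduce the predictions described above, the placeholder right-hand side would be replaced by the structural model with each subject's estimated \(r_1, r_2, r_3, r_4\) and \(A_0\).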
Data availability
All data and codes used for this project are available at https://github.com/depillis/Covid_AntibodyResponse_Analysis.git.
References
de Pillis, L. et al. A mathematical model of the within-host kinetics of sars-cov-2 neutralizing antibodies following covid-19 vaccination. Journal of Theoretical Biology (2023) .
Addetia, A. et al. Neutralizing antibodies correlate with protection from sars-cov-2 in humans during a fishery vessel outbreak with a high attack rate. Journal of clinical microbiology 58(11), e02107-20 (2020).
Khoury, D. S. et al. Neutralizing antibody levels are highly predictive of immune protection from symptomatic sars-cov-2 infection. Nature medicine 27(7), 1205–1211 (2021).
McMahan, K. et al. Correlates of protection against sars-cov-2 in rhesus macaques. Nature 590(7847), 630–634 (2021).
Liu, Z. et al. Rbd-fc-based covid-19 vaccine candidate induces highly potent sars-cov-2 neutralizing antibody response. Signal Transduction and Targeted Therapy 5(1), 1–10 (2020).
Demonbreun, A. R. et al. Comparison of IgG and neutralizing antibody responses after one or two doses of COVID-19 mRNA vaccine in previously infected and uninfected individuals. EClinicalMedicine 38, 101018 (2021).
Ebinger, J. E. et al. Antibody responses to the BNT162b2 mRNA vaccine in individuals previously infected with SARS-CoV-2. Nature medicine 27(6), 981–984 (2021).
Goel, R. R. et al. Distinct antibody and memory B cell responses in SARS-CoV-2 naïve and recovered individuals after mRNA vaccination. Science immunology 6 (58), eabi6950 (2021) .
Hwang, J.-Y. et al. Humoral and Cellular Responses to COVID-19 Vaccines in SARS-CoV-2 Infection-Naïve and-Recovered Korean Individuals. Vaccines 10(2), 332 (2022).
Muena, N. A. et al. Induction of SARS-CoV-2 neutralizing antibodies by CoronaVac and BNT162b2 vaccines in naïve and previously infected individuals. EBioMedicine 78, 103972 (2022).
Bottino, D. et al. Using mixed-effects modeling to estimate decay kinetics of response to sars-cov-2 infection. Antibody therapeutics 4(3), 144–148 (2021).
Zuur, A. F. et al. Mixed effects models and extensions in ecology with R Vol. 574 (Springer, 2009).
Lixoft SAS. Monolix version 2024R1 (Antony, France, 2024). http://lixoft.com/products/monolix.
Cameroni, E. et al. Broadly neutralizing antibodies overcome sars-cov-2 omicron antigenic shift. Nature 602(7898), 664–670 (2022).
Planas, D. et al. Considerable escape of sars-cov-2 omicron to antibody neutralization. Nature 602(7898), 671–675 (2022).
Garcia-Beltran, W. F. et al. mrna-based covid-19 vaccine boosters induce neutralizing immunity against sars-cov-2 omicron variant. Cell 185(3), 457–466 (2022).
Hong, Q. et al. Molecular basis of receptor binding and antibody neutralization of omicron. Nature 604(7906), 546–552 (2022).
Richardson, S. I. et al. Sars-cov-2 omicron triggers cross-reactive neutralization and fc effector functions in previously vaccinated, but not unvaccinated, individuals. Cell host & microbe 30(6), 880–886 (2022).
Evans, JP. et al. Neutralization of sars-cov-2 omicron sub-lineages ba. 1, ba. 1.1, and ba. 2. Cell Host & Microbe 30 (8), 1093–1102 (2022) .
Varvel, SA. et al. Simultaneous measurement of multiple variant-specific sars-cov-2 neutralizing antibodies with a multiplexed flow cytometric assay. Frontiers in Immunology 6978 (2022) .
Qu, P. et al. Enhanced evasion of neutralizing antibody response by omicron xbb. 1.5, ch. 1.1 and ca. 3.1 variants. Cell Reports (2023) .
Ao, D., He, X., Hong, W. & Wei, X. The rapid rise of sars-cov-2 omicron subvariants with immune evasion properties: Xbb. 1.5 and bq. 1.1 subvariants. MedComm 4 (2), e239 (2023) .
R Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2022). https://www.R-project.org/.
Inc., T. M. Matlab version: 9.13.0 (r2022b) (2022). https://www.mathworks.com.
Wheatley, A. K. et al. Evolution of immune responses to sars-cov-2 in mild-moderate covid-19. Nature communications 12(1), 1162 (2021).
Bates, D., Mächler, M., Bolker, B. & Walker, S. Fitting linear mixed-effects models using lme4. Journal of Statistical Software 67 (1), 1–48 (2015). https://doi.org/10.18637/jss.v067.i01 .
Long, J. D. Longitudinal data analysis for the behavioral sciences using R (Sage, 2012).
Kuznetsova, A., Brockhoff, P. B. & Christensen, R. H. B. lmerTest package: Tests in linear mixed effects models. Journal of Statistical Software 82 (13), 1–26 (2017). https://doi.org/10.18637/jss.v082.i13 .
Lüdecke, D., Ben-Shachar, M. S., Patil, I., Waggoner, P. & Makowski, D. performance: An R package for assessment, comparison and testing of statistical models. Journal of Open Source Software 6(60), 3139 (2021). https://doi.org/10.21105/joss.03139
Meshkat, N., Kuo, C. E.-Z. & DiStefano, J. III. On finding and using identifiable parameter combinations in nonlinear dynamic systems biology models and COMBOS: A novel web implementation. PLOS ONE 9(10), 1–14 (2014). https://doi.org/10.1371/journal.pone.0110261
Hong, H., Ovchinnikov, A., Pogudin, G. & Yap, C. SIAN: software for structural identifiability analysis of ODE models. Bioinformatics 35(16), 2873–2874 (2019). https://doi.org/10.1093/bioinformatics/bty1069
Acknowledgements
We thank the reviewers for their thoughtful feedback on our work, and believe that their input has made this a better paper.
Author information
Contributions
Conceptualization: LdeP, MD, SV, SS, LE. Methodology: LdeP, MD. Mechanistic model creation: LdeP. Data cleaning and preparation: LdeP, MD, SV. Numerical computations: MD, LdeP. Statistical analysis: MD. Manuscript writing and creation of figures: MD, LdeP. Manuscript reviewing and editing: LdeP, MD, JPM, SS, SV. Validation: SS, SV, RC, JPM. NAb test development: SS, GC, HL.
Ethics declarations
Competing interests
The authors declare no competing interests.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Pillis, L.d., Caffrey, R., Chen, G. et al. Comparing neutralizing antibody activity over time between naïve and convalesced COVID-19 vaccinated individuals. Sci Rep 15, 34800 (2025). https://doi.org/10.1038/s41598-025-18673-9