SIMPLICITY is an agent-based, multi-scale mathematical model to study SARS-CoV-2 intra- and between-host evolution

Gerletti, Pietro; Gubela, Nils; Escudié, Jean-Baptiste; Kühnert, Denise; Von Kleist, Max

doi:10.1038/s42003-025-09403-y


Article
Open access
Published: 09 January 2026


Communications Biology volume 9, Article number: 124 (2026)


Abstract

Computational tools are frequently used to describe pathogen evolutionary dynamics either within infected hosts or at the population level. However, there is a lack of models that capture the complex interplay between within-host and between-host evolutionary dynamics, leaving a knowledge gap with regard to realistic evolutionary dynamics. We present SIMPLICITY, a multi-scale mathematical model that combines within-host disease progression and viral evolution with a population-level model of virus transmission and immune evasion. We parameterize SIMPLICITY based on SARS-CoV-2 within-host viral dynamics, observed evolutionary rates, and dynamics of immune waning. We then apply it to study the dynamics and mechanisms driving SARS-CoV-2 evolution at the population level. We compare a baseline toy model of gradually increasing transmission fitness with an adaptive fitness landscape model that accounts for infection history and immune waning. Our simulations demonstrate that escape from population immunity generates evolutionary dynamics encompassing selective sweeps, which resemble SARS-CoV-2 evolution.


Introduction

Over the last decades, mathematical modeling of infectious diseases has become an important tool in public health for understanding the dynamics of disease transmission, assessing the impact of interventions, and proposing public health policies^1,2. During the SARS-CoV-2 pandemic, the importance of mathematical models that can describe intra-host viral dynamics, as well as transmission patterns and evolutionary dynamics, became ever more evident. Such models were used to predict transmission dynamics^1,3, develop containment strategies⁴, help forecast case numbers and vaccine efficacy^5,6, and quantify dominant selection forces⁷, to name a few.

Mathematical models of infectious diseases can take many forms, depending on their scope, granularity, and scale^1,2,8. Susceptible-Infected-Recovered (SIR) models divide the population into compartments and describe transitions between compartments through differential equations⁹. They are commonly used to model outbreaks that can be described by assuming homogeneous contact networks. These models can be extended to more complex scenarios, for example, by accounting for acquired immunity through infection or vaccination, or by considering more complex contact networks and stochastic dynamics¹⁰. For example, in 2020, Chinazzi et al. used a disease transmission model to understand the impact of travel restrictions on the spread of SARS-CoV-2 in China and globally⁵. Moore et al. used a compartmental, age-structured model to identify optimal vaccination strategies in the UK⁶. At the finest granularity, agent-based models simulate individual-level interactions within a population, giving more insight into how mobility or individual behavior affects the dynamics of an outbreak^11,12.

Mathematical modeling can also be used to study within-host pathogen dynamics^4,13, e.g., how the pathogen responds to antiviral drugs^14,15,16 or interacts with the immune system, or to investigate the pathogen’s within-host evolutionary dynamics^17,18. For example, van der Toorn et al. developed a statistical model of SARS-CoV-2 intra-host viral dynamics to help policymakers develop recommendations for non-pharmaceutical interventions at the individual level⁴.

Lastly, a number of approaches aim at predicting population-level evolutionary dynamics by combining compartment or agent-based models with simple models of viral evolution^8,19,20. However, these models rarely take within-host viral dynamics into account. Notably, within-host viral dynamics may have profound implications for population-level evolution: for SARS-CoV-2, we observed near-identical viral genomes in typical outbreak scenarios²¹, which may explain gradual or slow population-level evolution over extended periods of time²². However, many variants of concern, which are highly divergent from circulating lineages at their time of emergence, are believed to have arisen in chronically infected, immune-suppressed individuals²³. Such variants may subsequently cause selective sweeps at the population level by circumventing prevalent immunity at the time of their emergence⁷. Aside from SARS-CoV-2, intra-host dynamics are important for other viral pathogens, such as HIV, as well as some bacterial infections^24,25. These observations highlight a need to better understand the impact of within-host processes on between-host evolutionary dynamics.

Phylodynamic approaches address the inverse problem by learning evolutionary models from molecular surveillance data²⁶. These approaches, ranging from genomic epidemiology to phylogenetics, provide a quantitative description of population-level evolutionary parameters, averaging over potentially heterogeneous within-host dynamics. While these approaches generate meaningful interpretations of actual outbreaks or evolutionary trajectories, there is a lack of ground-truth data to challenge interpretations derived from phylodynamic approaches or to define their scope and limitations.

Multi-scale models that can capture the relationship between within-host and between-host viral dynamics may provide a tool to generate ground truth data, while delivering mechanistic insight into evolutionary dynamics^27,28,29,30. Having a framework to simulate the virus spread in a population coupled to its evolutionary dynamics both within and between hosts can be a powerful tool to study the intricate mechanisms that couple virus evolution and transmission.

Here, we present an agent-based model of population-level and intra-host processes, enabling us to generate realistic simulations of virus evolution. The model, StochastIc siMulation of sars-cov-2 sPreading and evoLutIon aCcountIng for wiThin-host dYnamics (SIMPLICITY), was developed for SARS-CoV-2 and includes four components: (i) an agent-based SIRD model that simulates the spread of the virus at the population level, (ii) a stochastic intra-host model of disease progression within individuals, (iii) an evolutionary model of SARS-CoV-2, which introduces mutations into the within-host dominant lineages, and (iv) a phenotype model that relates the viral genome to a transmission fitness. By integrating these components into a single model, we aim to gain a deeper understanding of the mechanisms driving SARS-CoV-2 evolution, which could lead to more effective public health interventions and future pandemic response strategies. At the same time, we want to provide the modeling framework for other scientists to be able to run their own simulations on SARS-CoV-2 epidemics, to produce synthetic data, and to adapt the model to other viruses. In this paper, we present the theoretical framework and provide SIMPLICITY as a software package specifically within the context of SARS-CoV-2.

Results

Overview

In this paper, we present SIMPLICITY, a multi-scale mathematical model combining models of intra- and between-host virus evolution with population-level infection spreading dynamics. Figure 1 highlights the four core parts of SIMPLICITY. The population modeling layer is a SIRD compartment model, which simulates the population dynamics of a viral outbreak in a human population. Each individual who enters the infected compartment progresses through the clinical stages described by the intra-host model of van der Toorn et al.⁴ (Fig. 2A). We model evolutionary processes using a nucleotide substitution model that can be fine-tuned to investigate different viral evolution scenarios. The genotypes of simulated lineages that evolve during a SIMPLICITY simulation influence the spreading dynamics of the virus through a relative transmission fitness score, calculated with the phenotype model. We implemented two different phenotype models: a baseline phenotype model and an immune-waning phenotype model (see Methods). We set up and ran two experiments. The first aimed to fine-tune the nucleotide substitution rate (NSR) model parameter so that simulations reproduce the observed nucleotide substitution rate (OSR) of the SARS-CoV-2 pandemic. In the second, we explored the effect of the population immunity landscape on SARS-CoV-2 lineage emergence and showed that immune waning dynamics generate selective sweeps.

Reproducing Observed Substitution Rates

In the first experiment, we (i) identified the range of NSR values that reproduces real-world SARS-CoV-2 substitution rates, and (ii) developed a model fine-tuning pipeline that allows simulations to reproduce any desired OSR. Figure 2B shows how an estimate of the OSR can be obtained by linear regression on the simulated sequencing dataset produced by a single simulation run (the OSR is the slope of the regression line), with parameters given in Supplementary Table S1. Because the OSR does not exactly match the NSR, owing to the influence of population-level transmission events and intra-host dynamics, we explored their relationship by estimating the OSR for a range of NSR values. Based on the AIC, the best fit for the NSR/OSR relationship among five candidate functions (linear, logarithmic, exponential, tangent, and a polynomial spline) was an exponential function (Supplementary Table S2). By inverting the fitted curve, we obtained the NSR value needed to run simulations that produce the target OSR for the given set of parameters (Fig. 2C). As the figure shows, the exponential curve fits the data very well, indicating that we can reproduce any OSR value within the explored range, including the values observed during the SARS-CoV-2 pandemic. We then fixed the NSR to 0.00008759 (OSR = 1 × 10⁻³)²² to run the second experiment, with the goal of investigating the importance of the population immunity landscape in the evolution of new lineages with high spreading potential.

Immune waning dynamics generate selective sweeps

To assess how selection may shape the evolutionary dynamics of SARS-CoV-2, we simulated SARS-CoV-2 outbreaks using two phenotype models (see Methods). In simulations with the baseline phenotype model, the virus gained fitness as it diverged from the founder (‘wild-type’) sequence.

Interestingly, this tended to produce evolutionary dynamics in which no single virus lineage dominated; instead, the population of circulating lineages diversified (Fig. 3A). This tendency is also reflected in Fig. 4A, which shows the trajectory of the system entropy (calculated from lineage frequencies) over a single simulation run. In this scenario, the entropy increases steadily as the evolutionary landscape becomes more fragmented. Clustering single-substitution lineages into ‘super-lineages’ (i.e., lineages that share at least 5 substitutions) did not affect this tendency (Fig. S3).
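
System entropy of this kind can be computed as the Shannon entropy of lineage frequencies at a given time point. A minimal sketch, assuming that each lineage's frequency is its share of current infections and that the natural logarithm is used (the text does not state the base); the function name is illustrative:

```python
import math


def shannon_entropy(lineage_counts):
    """Shannon entropy (natural log) of lineage frequencies, given {lineage: number of infections}."""
    total = sum(lineage_counts.values())
    entropy = 0.0
    for count in lineage_counts.values():
        if count > 0:
            freq = count / total
            entropy -= freq * math.log(freq)
    return entropy


# One dominant lineage gives low entropy; an even split over k lineages gives the maximum, ln(k)
print(shannon_entropy({"A": 97, "B": 2, "C": 1}))
print(shannon_entropy({"A": 10, "B": 10, "C": 10}))
```

A steadily rising value corresponds to the fragmenting landscape of the baseline model, while a selective sweep shows up as a drop in entropy when one lineage takes over.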

**Fig. 3: Single simulation of a SARS-CoV-2 outbreak using different phenotype models.**

**Fig. 4: System entropy and selective sweeps violin plot.**

We then ran SIMPLICITY with the ‘immune waning model’, in which a lineage has a transmission advantage if it is distinct from the lineages that appeared within the previous months (weighted by immune-waning pharmacokinetics). Interestingly, accounting for immune waning dynamics produced selective sweeps, in which dominant lineages are replaced by waves of lineages that escape predominant immunity (Fig. 3C). Figure 4C shows how the entropy trajectory in these simulations tends to oscillate in waves instead of increasing continuously. In this case, lineage clustering becomes more pronounced: minority lineages are more frequently grouped together, as they stem from the same branch of the phylogenetic tree as the dominant lineage driving the current wave (Fig. S3). This implies that the immune-waning model changes the evolutionary trajectory from a simple drift away from the founder sequence to a more selective, directed evolutionary pathway.
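
To make the immune-escape idea concrete, here is a deliberately simple, hypothetical fitness function; it is not the phenotype model defined in the Methods. It assumes that protection from each past infection decays exponentially with time since infection (hypothetical parameter `half_life`, in days) and with Hamming distance to the candidate genome (hypothetical parameter `cross_scale`), and scores a candidate by the fraction of the population left unprotected. Under these assumptions, a lineage that differs from recently circulating genomes scores higher.

```python
import math


def hamming(a, b):
    """Number of differing sites between two aligned genomes of equal length."""
    return sum(x != y for x, y in zip(a, b))


def immune_escape_fitness(candidate, past_infections, t_now, population_size,
                          half_life=90.0, cross_scale=2.0):
    """Hypothetical relative transmission fitness in [0, 1].

    past_infections: list of (genome, infection_time) pairs.
    Protection from each past infection wanes exponentially with time since infection
    and decays with Hamming distance to the candidate genome. The result is the
    fraction of the population left unprotected against the candidate.
    """
    decay = math.log(2) / half_life
    protected = 0.0
    for genome, t_inf in past_infections:
        waning = math.exp(-decay * (t_now - t_inf))                        # immune waning over time
        cross_protection = math.exp(-hamming(candidate, genome) / cross_scale)  # weaker against distant genomes
        protected += waning * cross_protection
    return max(0.0, 1.0 - protected / population_size)


past = [("AAAA", 0.0)] * 50   # 50 past infections with the same genome at day 0
print(immune_escape_fitness("AAAT", past, 30.0, 100))   # close to past genomes: lower fitness
print(immune_escape_fitness("TTTT", past, 30.0, 100))   # distant from past genomes: higher fitness
```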

Figure 3 shows a representative example (random seed = 7) of a simulation outcome using the baseline phenotype model (upper panel) and the immune-waning model (lower panel). While the trajectories of the infected compartments (A, C subplot 1) are similar between models, marked differences emerge in the average fitness score (A, C subplot 2), lineage frequencies (A, C subplot 3), and lineage-specific effective reproduction numbers (A, C subplot 4). In the baseline phenotype model, the average fitness score increases gradually, plateaus, and then increases again. Over time, the baseline model produces an increasingly diverse evolutionary landscape, with no single lineage infecting more than 30% of the population by the end of the simulation. In contrast, the immune-waning phenotype model shows markedly different evolutionary dynamics. After an initial linear increase in average fitness, while population immunity is still negligible, the system enters a wave-like phase driven by immune escape: new dominant lineages periodically replace earlier ones (selective sweeps) as population immunity builds and wanes. We quantified the number of selective sweeps in each simulation batch (n = 206), after filtering out simulations shorter than 300 days, and plotted the distributions of the two groups as violin plots (Fig. 4B). A two-sided Mann-Whitney U test confirmed that the difference was statistically significant (p = 10⁻⁴). Subplots 4 (A, C) show the effective reproduction number for the population (average) and for each lineage over the course of the simulations. In the baseline phenotype model simulation, one can observe frequent R_lineage peaks, which correspond to new lineage emergence events; in comparison, the immune-waning model displays fewer but more pronounced spikes in R_lineage, aligning with the wave-like immune-escape dynamics.
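
For reference, a two-sided Mann-Whitney U test of this kind can be sketched with the standard normal approximation, which is reasonable for samples of a few hundred simulations. This is a minimal illustration without tie or continuity correction, not the authors' analysis code; in practice a library routine such as SciPy's `scipy.stats.mannwhitneyu` would typically be used. The sweep counts below are made-up illustrative numbers.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test, normal approximation (no tie or continuity correction).

    Returns (U, p), where U is the smaller of the two U statistics.
    """
    n, m = len(x), len(y)
    # U_x counts pairs (xi, yj) with xi > yj; ties count one half
    u_x = sum((xi > yj) + 0.5 * (xi == yj) for xi in x for yj in y)
    u = min(u_x, n * m - u_x)
    mean = n * m / 2
    sd = math.sqrt(n * m * (n + m + 1) / 12)
    z = (u - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value from the standard normal
    return u, p


# Hypothetical per-simulation sweep counts for two batches (illustrative numbers only)
baseline_sweeps = [0, 1, 0, 1, 2, 0, 1, 0]
waning_sweeps = [3, 4, 2, 5, 3, 4, 3, 6]
print(mann_whitney_u(baseline_sweeps, waning_sweeps))
```

The pairwise loop is O(n·m), which is fine for a few hundred simulations per group; rank-based implementations scale better.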

Finally, the phylogenetic tree of the simulations, shown in Fig. 3A, B gives a complete overview of the evolutionary trajectory of simulated lineages. The baseline model tree shows a more scattered time of emergence (indicated by coloring), implying that temporally concurrent lineages populate distant branches on the phylogenetic tree. The immune waning model tree instead shows more visually homogeneous color clusters, indicating how lineages circulating at the same time tend to belong to the same branch of the tree and are thus more closely related.

Discussion

Our model (SIMPLICITY) combines classical epidemiological models, intra-host disease dynamic modeling, and models of viral evolution into a cohesive framework that we parameterized for SARS-CoV-2. The experimental results demonstrate that SIMPLICITY captures the dynamics of SARS-CoV-2 at multiple biological scales, such as intra-host clinical states, population spreading, and evolutionary events (selective sweeps). SIMPLICITY employs the Extrande algorithm for the exact simulation of inhomogeneous Poisson processes, which saves compute time compared with state-of-the-art exact methods (e.g., Gillespie’s algorithm). The framework is memory-efficient and runs on standard hardware. Validation tests confirmed the mathematical robustness and consistency of SIMPLICITY. The adapted intra-host model reproduced the same average infection and infectious periods reported in the original publication⁴, indicating that the modifications preserved the host disease dynamics. As expected, individuals in the diagnosed compartment exhibited shorter residence times in the infected phase, consistent with the model assumptions of detection and isolation. At the population level, the effective reproduction number R_effective matched the R parameter value specified for the simulations (within 10% variability), confirming that the model behaves as intended.

Users can adjust internal model parameters (in particular, the infection rate) through accessible high-level parameters such as the reproduction number R. This makes it easy to set up and customize simulation scenarios. Likewise, we provide a method to fine-tune the model to any desired evolutionary rate. While simple, the root-to-tip regression performed on the time-stamped simulated sequencing data to estimate the OSR works well and allows SIMPLICITY simulations to be tuned to accessible, real-world pandemic data. SIMPLICITY provides both infection and phylogenetic trees (Fig. S2).

Agent-based models such as nosoi³¹ simulate transmission chains, but do not model viral evolution or intra-host dynamics. Tools like VGsim³² and FAVITES³³ incorporate viral sequence evolution and can generate phylogenetic trees, but they either simplify within-host processes or treat them independently from epidemiological dynamics. The Opqua framework³⁴ models genome-level viral evolution with intra-host selection, but it does not include a compartmental between-host model, and simulations with more than 10³ individuals rapidly become computationally intractable, while SIMPLICITY can handle simulations that are two orders of magnitude larger. Another model, e3SIM, integrates epidemiological, ecological, and evolutionary processes in a scalable agent-based framework with explicit contact networks, while using discrete-time forward simulation (which may result in numerical errors)³⁵. SIMPLICITY bridges the gap between these approaches, with a focus on SARS-CoV-2, a virus that has impacted public health in unprecedented ways: it integrates an SIRD model, a within-host clinical dynamics model, and mutation-driven evolutionary processes within a stochastic, agent-based framework. This allows thorough investigation of how different mechanisms shape the spread of infection, transmission lineage dynamics, and evolutionary dynamics. Beyond its core functionality, SIMPLICITY enables investigation of public health interventions. By incorporating diagnosis and isolation dynamics into the intra-host model, the model reproduces reduced infectious periods for diagnosed (and isolated) individuals. This feature can be used to explore how different levels of detection impact both outbreak trajectory and evolutionary paths for the virus. This provides a tool for exploring outbreak scenarios and public health strategies, such as identifying detection thresholds that may slow or halt transmission or result in reduced lineage emergence.

Experiments using different phenotype models highlight the importance of the population immune landscape in shaping SARS-CoV-2 evolution. The baseline phenotype model produces gradually diversifying evolutionary dynamics, which may branch into multiple evolutionary directions at the same time. Over long time periods, no single lineage dominates, while Shannon entropy increases (Fig. 4). While comparing model-generated entropy predictions to real-world observational data would be valuable, the currently utilized evolutionary model may be too simplistic, and many factors, including the impact of long-shedders on viral evolution, remain to be included in SIMPLICITY. The immune-waning model assumes that immunity from past infections protects against re-infection by similar variants until immunity wanes. However, immunity against a dissimilar variant wanes much faster⁷, such that re-infection can happen after a shorter time since the last infection. In contrast to the baseline model, we observed that the immune-waning model induces oscillatory lineage dynamics with periodic hard selective sweeps, driven by immune escape. These dynamics more closely reflect actual SARS-CoV-2 variant dynamics, and our results also align with recent findings by Raharinirina et al. (2025)⁷: currently prevalent, antibody-mediated immunity and its ability to cross-neutralize emerging variants constitute the major selective force driving SARS-CoV-2 evolution. Together, these findings emphasize the need to simultaneously account for infection and immunity dynamics in evolutionary modeling of SARS-CoV-2.

Despite its versatility, the current implementation of SIMPLICITY has a number of limitations. Although exact event simulation using Extrande reduces computational overhead compared with other exact methods, the model currently does not scale to very large population sizes. Simulations remain tractable for populations with up to 100,000 cumulative infections, but larger scenarios would benefit from abstraction strategies, such as clustering individuals into super-agents, to reduce computational load and runtime. At the population level, assuming that individuals become susceptible quickly after recovery is not biologically realistic, as both variant-specific and cross-variant immunity following infection are well documented. However, in the immune-waning phenotype model, the duration of immunity is implicitly represented through the gradual loss of immune protection at the population level. This formulation therefore accounts for transient immunity without requiring explicit tracking of individual immune states. Regarding the evolutionary model, in the applications presented here, viral evolution is limited to substitutions (under a strict clock model and without rate variation across genome sites) and does not include recombination. Nevertheless, the model can accommodate site-specific rate categories. As for the clock model, one could define groups of lineages that evolve at different, user-specified rates to simulate evolution under a relaxed clock assumption. Appropriate model parameterization would be crucial to ensure results in line with SARS-CoV-2 empirical data. Moreover, by representing the dominant lineages within each host, the SIMPLICITY framework lays the groundwork for future recombination modeling.

Finally, we need to consider the abstract nature of the fitness function used in the phenotype models. We simplified highly complex evolutionary processes, which emerge from a constellation of factors that affect viral evolution, into relatively simple conceptual models aimed at exploring specific evolutionary scenarios. In particular, the model presented here does not explicitly model intra-host competition and lineage emergence; it is limited to the random emergence of intra-host lineages, on which inter-host selective pressure then acts. This modeling choice is based on the assumption that, for acute respiratory viruses like SARS-CoV-2, the selective pressures at the intra-host and inter-host levels are largely independent of one another^36,37. Nevertheless, for different modeling scenarios or for using SIMPLICITY with different pathogens, it would be important to adapt the intra-host evolutionary model to better reflect the underlying biology.

Future development of SIMPLICITY may focus on extending its biological realism and range of applications. One direction would be modeling long-shedders or immunocompromised individuals, i.e., hosts with prolonged infection durations who may provide an environment that facilitates the accumulation of multiple mutations and enables large evolutionary jumps. This hypothesis, which has gained traction in the literature^23,38,39, can be tested in future simulations by adapting the intra-host dynamics to include a subpopulation with persistent infection. Another possible direction is the application of SIMPLICITY to other respiratory viruses that show infection dynamics similar to those of SARS-CoV-2. By refitting the model parameters, SIMPLICITY could be employed to investigate the viral dynamics of, e.g., influenza virus, expanding its relevance for pandemic preparedness.

In this study, we presented SIMPLICITY, an open-source Python software that implements an agent-based, multi-layer, stochastic compartment model to simulate SARS-CoV-2 spread and evolution through integration of intra-host and population dynamics with genomic evolution. The model can generate ground truth data in the form of the true infection and phylogenetic trees of a simulation, together with synthetic, time-stamped sequencing data in aligned FASTA format. These model outputs can be used to test evolutionary hypotheses on the mechanisms underlying new lineage emergence, and to potentially benchmark existing phylogenetic pipelines. We were able to reproduce qualitative characteristics of SARS-CoV-2 lineage evolution, such as the selective sweep events that we observed during the SARS-CoV-2 pandemic, and to obtain simulations with observed evolutionary rates in line with real-world data, through appropriate model parametrization. Our experimental results show the importance of population immunity as a driver of SARS-CoV-2 evolution, and we invite other modelers to consider such processes in future studies.

Methods

We developed SIMPLICITY, which consists of an epidemiological model and an intra-host model of disease progression, combined with a within-host evolution model and a model relating transmission fitness to the viral genome of infected individuals. To enable efficient simulation of this multiscale model, we adapt a rejection-based exact stochastic simulation method⁴⁰, which operates at the time-scale of the epidemiological process (akin to the work from Gubela et al.¹²), thus avoiding numerical errors, while enabling efficient computation. Rejection-based exact stochastic simulation methods are algorithmic frameworks designed to efficiently simulate systems where stochastic events occur with dynamically changing rates. The Gillespie stochastic simulation algorithm (SSA) is the foundational exact method that ensures correct sampling of both event times and reaction channels based on fixed propensities between events. However, in systems with state-dependent propensities that vary continuously, the SSA assumption of constant propensities between events no longer holds. Rejection-based methods, such as Extrande, address this by drawing candidate event times from exponential distributions using propensity upper bounds and applying a rejection step to maintain numerical exactness. In the following sections, we will go over the details and parametrization of each part of the SIMPLICITY model.
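
The rejection idea behind Extrande can be illustrated in its simplest setting: a single event channel whose intensity depends on time but not on the system state. Within a look-ahead horizon, candidate times are drawn from an exponential distribution using an upper bound B on the intensity, and each candidate is accepted with probability rate(t)/B; rejected candidates are the "extra" events that leave the state unchanged. This is a minimal sketch under those assumptions: the function and parameter names are ours, not SIMPLICITY's API, and a full implementation would sum bounds over all reaction channels and recompute them after every state change.

```python
import math
import random


def extrande_poisson(rate, rate_bound, t_end, lookahead, rng):
    """Sample event times of an inhomogeneous Poisson process with intensity rate(t) on [0, t_end].

    rate_bound(t, h) must return B >= rate(s) for every s in [t, t + h].
    """
    t, events = 0.0, []
    while t < t_end:
        horizon = min(lookahead, t_end - t)
        bound = rate_bound(t, horizon)
        tau = rng.expovariate(bound) if bound > 0 else math.inf
        if tau >= horizon:
            t += horizon          # no candidate inside the horizon: advance and re-bound
            continue
        t += tau
        if rng.random() * bound < rate(t):
            events.append(t)      # accepted: a real event fires at time t
        # otherwise: an "extra" (thinning) event that leaves the state unchanged
    return events


rng = random.Random(1)
# Oscillating intensity 2 + 2 sin(t), bounded above by 4
times = extrande_poisson(lambda t: 2.0 + 2.0 * math.sin(t), lambda t, h: 4.0, 100.0, 1.0, rng)
print(len(times))
```

Tighter bounds mean fewer rejected candidates; the look-ahead horizon trades bound tightness against how often the bound must be recomputed.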

Intra-host model

The intra-host model is a semi-mechanistic stochastic transit compartment model that was previously fitted to clinical data and reflects variability in SARS-CoV-2 infection dynamics⁴. Because it is a linear compartment model, it can be solved analytically. In brief, the SARS-CoV-2 infection time course is represented by discrete compartments x = (x₀, …, x_n) with exponentially distributed waiting times. Disease progression is partitioned into five sequential phases j, each consisting of m_j sub-compartments $(x_i, \ldots, x_{i+m_j-1})$ that capture variability in the duration of that phase across individuals. These phases correspond to biologically distinct stages of infection: (i) pre-detection (virus not yet detectable, non-infectious, m = 5 sub-compartments, mean duration τ = 2.86 days), (ii) pre-symptomatic (virus detectable and infectious, but no symptoms, m = 1, τ = 3.91 days), (iii) infectious/symptomatic (detectable, infectious, symptomatic, m = 13, τ = 7.5 days), (iv) post-infectious (detectable but no longer infectious, m = 1, τ = 8 days), and (v) recovered. We use a total of twenty compartments, x = (x₀, …, x₁₉), plus a final absorbing state (x₂₀) that corresponds to the recovered state, to represent the infection course with sufficient granularity to reproduce the empirically observed distributions of incubation time, infectious period, and test sensitivity profiles.
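
The compartment bookkeeping can be made explicit with a short sketch that lays the phases out in order: pre-detection occupies x₀–x₄, pre-symptomatic x₅, infectious/symptomatic x₆–x₁₈, post-infectious x₁₉, and the recovered state is x₂₀. The phase values come from the text; the data layout and function name are illustrative assumptions, not SIMPLICITY's internal representation.

```python
# Transient phases from the text: (name, sub-compartments m_j, mean duration tau_j in days)
PHASES = [
    ("pre-detection", 5, 2.86),
    ("pre-symptomatic", 1, 3.91),
    ("infectious/symptomatic", 13, 7.5),
    ("post-infectious", 1, 8.0),
]


def compartment_layout(phases=PHASES):
    """Return [(phase name, compartment indices, rate r_j = m_j / tau_j)] and the recovered index."""
    layout, start = [], 0
    for name, m, tau in phases:
        layout.append((name, range(start, start + m), m / tau))
        start += m
    return layout, start   # x0..x(start-1) are transient; x_start is the absorbing recovered state


layout, recovered = compartment_layout()
for name, idx, rate in layout:
    print(f"{name:24s} x{idx.start}..x{idx.stop - 1}  r = {rate:.3f}/day")
print("recovered: x", recovered, sep="")
```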

The probabilistic evolution of the model can be written in matrix form, yielding the master equation:

$$\frac{d}{dt}{{{{\boldsymbol{p}}}}}_{t}\left({{{\boldsymbol{x}}}}\right)={{{\boldsymbol{A}}}}\cdot {{{{\boldsymbol{p}}}}}_{t}\left({{{\boldsymbol{x}}}}\right),$$

where p_t(x) denotes the vector of probabilities that an individual is in each infection compartment of ${{{\boldsymbol{x}}}}={({x}_{0},\ldots ,{x}_{20})}^{T}$ at time t, and A is the transition rate matrix (the transpose of the generator of the underlying continuous-time Markov process).

$$\begin{array}{l}\begin{array}{ccccccc} \,\,\,\,\,\,\,{{\mathsf{x}}}_{{\mathsf{0}}}& {{\mathsf{x}}}_{{\mathsf{1}}}& \,\,\cdots \,& \,\,\,\,& \,\,& \,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\cdots & \,{{\mathsf{x}}}_{{\mathsf{20}}}\end{array}\\ {{{\boldsymbol{A}}}}=\left(\begin{array}{ccccccc}-{r}_{1}&0& \,\cdots \,&&& \cdots \,&0\\ {r}_{1}&-{r}_{1}&&&&&\vdots \\ \vdots &\ddots &\ddots &&&&\\ &&{r}_{1}&-{r}_{2}&&&\\ &&&\ddots &\ddots &&\\ \vdots &&&&{r}_{3}&-{r}_{4}&\vdots \\ 0&\cdots \,&&\cdots \,&0&{r}_{4}&0\end{array}\right)\begin{array}{l}{{\mathsf{x}}}_{{\mathsf{0}}}\\ {{\mathsf{x}}}_{{\mathsf{1}}}\\ \vdots \\ \\ \\ \vdots \\ {{\mathsf{x}}}_{{\mathsf{20}}}\\ \end{array}\end{array}$$

Given an initial distribution ${{{{\boldsymbol{p}}}}}_{{t}_{0}}({{{\boldsymbol{x}}}})$, the analytical solution is given by

$${{{{\boldsymbol{p}}}}}_{t}({{{\boldsymbol{x}}}})=\exp ({{{\boldsymbol{A}}}}\Delta t)\cdot {{{{\boldsymbol{p}}}}}_{{t}_{0}}({{{\boldsymbol{x}}}})$$

for any time t = t₀ + Δt.
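
Because A describes a linear chain (each compartment drains into the next at its phase rate, and x₂₀ is absorbing), exp(AΔt)·p_{t₀} can be evaluated without forming the matrix exponential explicitly. The sketch below uses uniformization, a standard exact method for Markov chains that we substitute here for a general-purpose matrix exponential; we do not know which solver SIMPLICITY itself uses. Phase values come from the text; function names are illustrative.

```python
import math

# (m_j, tau_j in days) for the four transient phases, taken from the text
PHASES = [(5, 2.86), (1, 3.91), (13, 7.5), (1, 8.0)]


def exit_rates(phases=PHASES):
    """Per-compartment exit rates r_j = m_j / tau_j for x0..x19, plus 0 for the absorbing x20."""
    rates = []
    for m, tau in phases:
        rates += [m / tau] * m
    rates.append(0.0)
    return rates


def apply_generator(rates, p):
    """Compute A·p for the linear chain: mass leaves x_i at rate r_i and enters x_{i+1}."""
    out = [-r * pi for r, pi in zip(rates, p)]
    for i in range(1, len(p)):
        out[i] += rates[i - 1] * p[i - 1]
    return out


def propagate(p0, dt, phases=PHASES):
    """Return exp(A dt)·p0 by uniformization: sum_k Poisson(k; q dt) · P^k p0, with P = I + A/q."""
    if dt == 0:
        return list(p0)
    rates = exit_rates(phases)
    q = max(rates)
    qt = q * dt
    k_max = int(qt + 10 * math.sqrt(qt) + 20)   # truncate far in the Poisson tail
    v, result = list(p0), [0.0] * len(p0)
    for k in range(k_max + 1):
        w = math.exp(-qt + k * math.log(qt) - math.lgamma(k + 1))  # Poisson weight, computed in log space
        result = [acc + w * vi for acc, vi in zip(result, v)]
        v = [vi + avi / q for vi, avi in zip(v, apply_generator(rates, v))]
    return result


p0 = [1.0] + [0.0] * 20            # an individual who has just been infected (in x0)
p10 = propagate(p0, 10.0)
print(sum(p10[5:19]))              # probability of being infectious 10 days after infection
```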

In A, the parameters r_j ∈ {r₁, r₂, r₃, r₄} correspond to micro-state transition rates r_j = m_j/τ_j, where m_j denotes the number of sub-compartments in phase j and τ_j denotes the mean transition time between consecutive major phases of infection: from pre-detection to pre-symptomatic (τ₁), pre-symptomatic to infectious (τ₂), infectious to post-infectious (τ₃), and post-infectious to recovery (τ₄). Within each phase, all sub-compartments share the same rate constant r_j, so that the total residence time in a phase follows a gamma (Erlang) distribution with shape m_j and rate r_j, and hence mean τ_j.
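
The Erlang residence times can be checked by simulating individual infection courses: each phase duration is the sum of m_j exponential waiting times with rate r_j = m_j/τ_j, so its mean is τ_j. A minimal sketch with the phase values from the text (the function name is illustrative):

```python
import random

# (m_j, tau_j in days) for the four transient phases, taken from the text
PHASES = [(5, 2.86), (1, 3.91), (13, 7.5), (1, 8.0)]


def sample_phase_durations(rng, phases=PHASES):
    """Draw one individual's phase durations as sums of m_j exponentials with rate r_j = m_j / tau_j."""
    durations = []
    for m, tau in phases:
        r = m / tau
        durations.append(sum(rng.expovariate(r) for _ in range(m)))
    return durations


rng = random.Random(42)
pre_detection, pre_symptomatic, symptomatic, post_infectious = sample_phase_durations(rng)
print(pre_symptomatic + symptomatic)   # this individual's infectious period in days
```

Summing exponentials mirrors the sub-compartment structure; `rng.gammavariate(m, tau / m)` would draw the same distribution in a single call.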

Biologically, each compartment x_i represents a micro-state along the infection trajectory. The assignment of transmission potential and detectability follows directly from the experimental viral load data used to calibrate the model (see ref. ⁴). Individuals are infectious in the pre-symptomatic and symptomatic stages (compartments x₅ through x₁₈), and viral RNA remains detectable until the end of the post-infectious stage (x₁₉). This mapping ensures that model-derived quantities, such as the timing of infectiousness and PCR positivity, reproduce empirical data from clinical and virological studies.

SIRD model

The SIRD (population) model embeds the intra-host model in a population model by adding two propensities that regulate the rate of infection of new individuals (a₁) and the rate of diagnosis of individuals with detectable virus (a₂). Upon a positive COVID-19 diagnosis, individuals are removed from the system for the remainder of their infection (due to self-isolation). The propensities are formalized as follows:

$$a_1=\beta(t)\cdot |S(t)|\cdot \sum_{i=5}^{18} |x_i(t)|, \tag{1}$$

where β(t) is the infection rate (related to the virus reproduction number R), ∣S(t)∣ is the number of susceptible individuals at time t and ∣x_i(t)∣ is the number of infected agents in compartment x_i. Note that we only account for agents in the infectious compartments.

Similarly, we define the diagnosis propensity

$$a_2=k_d\cdot \sum_{i=5}^{19} |x_i(t)|, \tag{2}$$

where k_d is the rate of diagnosis.
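A minimal sketch of the two propensities in Eqs. (1) and (2), assuming the compartment occupancies are stored in a list `x` indexed from x₀ to x₂₀; the function names and the numbers in the usage lines are hypothetical.

```python
from typing import Sequence

# Compartment index ranges taken from Eqs. (1) and (2); range() excludes its end.
INFECTIOUS = range(5, 19)   # x_5 ... x_18 drive new infections
DIAGNOSABLE = range(5, 20)  # x_5 ... x_19 can be diagnosed


def infection_propensity(beta: float, n_susceptible: int, x: Sequence[int]) -> float:
    """a_1 = beta(t) * |S(t)| * sum_{i=5}^{18} |x_i(t)|  (Eq. 1)."""
    return beta * n_susceptible * sum(x[i] for i in INFECTIOUS)


def diagnosis_propensity(k_d: float, x: Sequence[int]) -> float:
    """a_2 = k_d * sum_{i=5}^{19} |x_i(t)|  (Eq. 2)."""
    return k_d * sum(x[i] for i in DIAGNOSABLE)


# Hypothetical snapshot: x[i] = number of agents in compartment x_i (i = 0..20).
x = [1] * 21
a1 = infection_propensity(beta=1e-4, n_susceptible=1_000, x=x)  # 1e-4 * 1000 * 14
a2 = diagnosis_propensity(k_d=0.05, x=x)                         # 0.05 * 15
print(a1, a2)
```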

We adjust the transition rate matrix and add the diagnosis compartment to obtain:

$$\begin{array}{l}\begin{array}{ccccccc} \,\,\,\,\,{{\mathsf{x}}}_{{\mathsf{0}}}& \,\,\,{{\mathsf{x}}}_{{\mathsf{1}}}& \,\cdots \,& \,\,\,\,& \,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\cdots \,& \,\,\,\,\,\,\,\,\,\,\,{{\mathsf{x}}}_{{\mathsf{20}}}& \,\,\,\,\,\,\,\,{\mathsf{D}}\end{array}\\ {{{\boldsymbol{B}}}}=\left(\begin{array}{ccccccc}-{r}_{1}&0&\cdots \,&&&\cdots \,&0\\ {r}_{1}&-{r}_{1}&&&&&\vdots \\ \vdots &\ddots &\ddots &&&&\\ &&{r}_{1}&-({r}_{2}+{k}_{d})&&&\\ &&&\ddots &\ddots &&\\ \vdots &&&&{r}_{3}&-({r}_{4}+{k}_{d})&\vdots \\ 0&\cdots \,&&\cdots \,&0&{r}_{4}&0\\ 0&\cdots \,&&{k}_{d}&\cdots \,&{k}_{d}&0\\ \end{array}\right)\begin{array}{l}{{\mathsf{x}}}_{{\mathsf{0}}}\\ {{\mathsf{x}}}_{{\mathsf{1}}}\\ \vdots \\ \\ \\ \vdots \\ {{\mathsf{x}}}_{{\mathsf{20}}}\\ {\mathsf{D}}\end{array}\end{array}$$

After an individual recovers, a new susceptible individual is introduced into the population pool (reinfection), keeping the sum of infected and susceptible individuals constant during a simulation.

Evolutionary model

In SIMPLICITY, we only consider substitution events in within-host lineages that are relevant to transmission (i.e., we do not model entire quasi-species⁴¹). We model only the evolution of the SARS-CoV-2 Spike coding sequence, since substitutions in the Spike protein account for the vast majority of SARS-CoV-2 evolution and drive SARS-CoV-2 immune escape^7,42,43. In our model, we break down viral evolution into four essential steps: (i) determining the number of nucleotide substitutions occurring in a time step; (ii) distributing the substitutions among lineages within infected individuals; (iii) assigning the genome position (nucleotide site) at which each substitution takes place; and (iv) choosing the substituting nucleotide (which nucleotide replaces the previous one, e.g., A → T).

We model the total number of substitution events as a homogeneous Poisson process with expectation $\lambda(t)=NSR\cdot\Delta t\cdot L\cdot\sum_{i=1}^{|I(t)|}|l(t)|_i$, where Δt denotes the time step to the next epidemiological event, NSR is the nucleotide substitution rate, |l(t)|_i is the number of lineages hosted by individual i, L is the genome length, and |I(t)| is the number of infected individuals. We assume that all infected individuals are equally likely to host a substitution in a given time step, and we therefore assign substitutions to infected individuals uniformly at random. While our model can use site-specific substitution rates (which can be derived from phylodynamic analysis) and any nucleotide substitution model, in the experiments presented here we assumed uniform rates across the genome and used the Jukes-Cantor model⁴⁴, which assigns equal probabilities to all transitions and transversions.

The substitution model can be described by a transition matrix P(t), which gives the probability that a specific substitution occurs at a site:

$$P(t)=\left(\begin{array}{cccc}P(A\to A)&P(A\to G)&P(A\to C)&P(A\to T)\\ P(G\to A)&P(G\to G)&P(G\to C)&P(G\to T)\\ P(C\to A)&P(C\to G)&P(C\to C)&P(C\to T)\\ P(T\to A)&P(T\to G)&P(T\to C)&P(T\to T)\\ \end{array}\right)$$
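The four steps listed above can be sketched as follows. This is an illustrative reading of the text, not the SIMPLICITY implementation: lineages are stored as hypothetical `{site: nucleotide}` dictionaries of substitutions relative to the founder, the Poisson draw uses Knuth's method, and choosing the lineage uniformly within a host (step ii) is our assumption. The sketch updates lineages in place and does not track lineage identities or phylogeny.

```python
import math
import random
from typing import Dict, List

NUCLEOTIDES = "ACGT"
Lineage = Dict[int, str]  # substitutions relative to the founder: {site: nucleotide}


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for the small expectations of one step."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def jukes_cantor_substitute(current: str, rng: random.Random) -> str:
    """(iv) Jukes-Cantor: each of the three other nucleotides with probability 1/3."""
    return rng.choice([nt for nt in NUCLEOTIDES if nt != current])


def mutation_step(hosts: List[List[Lineage]], founder: str, nsr: float,
                  dt: float, rng: random.Random) -> int:
    """Apply one mutation step in place and return the number of substitutions."""
    genome_length = len(founder)
    n_lineages = sum(len(lineages) for lineages in hosts)
    lam = nsr * dt * genome_length * n_lineages          # (i) Poisson expectation lambda(t)
    n_subs = sample_poisson(lam, rng)
    for _ in range(n_subs):
        lineages = rng.choice(hosts)                     # uniform over infected hosts
        lineage = rng.choice(lineages)                   # (ii) uniform within host (assumption)
        site = rng.randrange(genome_length)              # (iii) uniform site
        current = lineage.get(site, founder[site])
        lineage[site] = jukes_cantor_substitute(current, rng)
    return n_subs


rng = random.Random(1)
founder = "".join(rng.choice(NUCLEOTIDES) for _ in range(1_000))  # hypothetical genome
hosts: List[List[Lineage]] = [[{}], [{}, {}]]                    # 2 hosts, 3 lineages
n = mutation_step(hosts, founder, nsr=1e-3, dt=1.0, rng=rng)
print(n, hosts)
```

With an empty host list the expectation λ is zero and no substitution is drawn, so the step is safe to call at any point of a simulation.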

Intra-host viral diversification

There is evidence that, during acute SARS-CoV-2 infections, minor intra-host lineages sometimes emerge at frequencies sufficient for transmission⁴⁵. Depending on the duration of infection, the virus can establish distinct intra-host lineages that may cross this threshold and become transmissible. This process is modeled by a third propensity:

$$a_3=k_v\cdot\sum_{i=0}^{19}|x_i(t)| \tag{3}$$

where k_v is the intra-host lineage emergence rate and the sum counts all infected individuals (compartments x₀ to x₁₉). When this reaction fires, a new lineage is introduced in a randomly selected individual as a copy of a lineage already present in that host; this models intra-host lineage emergence, a process that is independent of inter-host selective pressures. The lineages then evolve independently under the evolutionary model described above. To capture the observed heterogeneity of within-host diversification without imposing strong assumptions about its underlying biology, we assume that the rate of intra-host diversification is constant until a maximum number of dominant intra-host lineages is reached within an individual. For each host, this maximum is drawn from a uniform distribution with an upper limit of five lineages, chosen so as not to constrain the model with conservative assumptions. Users can adjust this intra-host lineage limit (and its distribution) for specific modeling scenarios.

Once an individual has reached the maximum number of dominant intra-host lineages, subsequent lineage emergence events trigger the replacement of one existing lineage by a newly introduced one. This mechanism represents the competitive de-selection of intra-host lineages, driven by within-host competition, and operates independently of the lineages’ transmission fitness. A sensitivity analysis with regard to the intra-host lineage emergence rate k_v is shown in Supplementary Fig. S3.
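The emergence and replacement rules can be sketched as below. Assumptions not stated in the text: the per-host cap is drawn uniformly from the integers 1 to 5, and the lineage replaced once the cap is reached is chosen uniformly at random. `Host`, `new_host`, and `lineage_emergence` are hypothetical names.

```python
import random
from typing import Dict, List

Lineage = Dict[int, str]  # substitutions relative to the founder: {site: nucleotide}


class Host:
    """Minimal host record: current intra-host lineages and a per-host lineage cap."""

    def __init__(self, lineage: Lineage, max_lineages: int):
        self.lineages: List[Lineage] = [dict(lineage)]
        self.max_lineages = max_lineages


def new_host(lineage: Lineage, rng: random.Random, upper_limit: int = 5) -> Host:
    # Assumption: the cap is drawn uniformly from the integers 1..upper_limit.
    return Host(lineage, rng.randint(1, upper_limit))


def emergence_propensity(k_v: float, n_infected: int) -> float:
    """a_3 = k_v * (number of infected individuals), Eq. (3)."""
    return k_v * n_infected


def lineage_emergence(hosts: List[Host], rng: random.Random) -> None:
    """Fire one a_3 event: copy an existing lineage inside a randomly chosen host."""
    host = rng.choice(hosts)
    newcomer = dict(rng.choice(host.lineages))
    if len(host.lineages) < host.max_lineages:
        host.lineages.append(newcomer)
    else:
        # Cap reached: the newcomer replaces a random lineage, irrespective of fitness.
        host.lineages[rng.randrange(len(host.lineages))] = newcomer


rng = random.Random(7)
hosts = [Host({}, max_lineages=3), new_host({12: "T"}, rng)]
for _ in range(50):
    lineage_emergence(hosts, rng)
print([len(h.lineages) for h in hosts])
```

Because replacement ignores transmission fitness, the cap acts purely as within-host de-selection, matching the description above.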

Phenotypic model

We assign a relative transmission fitness score to each virus lineage present in the population at time t. In the model, the infecting virus variant is chosen according to its relative transmission fitness p(l, t) = f(l, t)/∑_l f(l, t). The transmission fitness of a variant changes over time and comprises a competition term and a non-competition term. The competition term represents a variant's ability to re-infect individuals, while the non-competition term relates to the infection of virus-naive individuals, for which we assume that all virus lineages are equally capable of infection. By contrast, lineages need to be distinct from past lineages in order to re-infect individuals who have already been exposed to the virus.

$$f(l,t)=\underbrace{\pi_{\text{non-inf}}(t)\cdot|l(t)|^{-1}}_{\text{first infection}}+\underbrace{\pi_{\text{inf}}(t)\cdot d(l(t),c)}_{\text{re-infection}} \tag{4}$$

where $\pi_{\text{inf}}(t)=\min\left(\frac{|D(t)|+|R(t)|}{|S(t)|},1\right)$ denotes the proportion of the population that has exited the infected compartments by time t and $\pi_{\text{non-inf}}(t)=\max\left(\frac{|S(t)|-(|D(t)|+|R(t)|)}{|S(t)|},0\right)$ denotes the proportion of infection-naive individuals. The term |l(t)| in Eq. (4) denotes the number of viral lineages prevalent at time t. The function d(l(t), c) is a measure of antigenic distance to some past, immunity-inducing viral population. In the experiments, we test two scenarios for d(l(t), c):
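Eq. (4) and the selection probability p(l, t) can be written directly as code. The sketch below takes precomputed distances d(l(t), c) for each circulating lineage (whichever scenario supplies them) and returns f(l, t) and p(l, t); the uniform fallback when all scores are zero is our own assumption, and the population counts are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def exposure_fractions(n_s: int, n_d: int, n_r: int) -> Tuple[float, float]:
    """pi_inf(t) and pi_non-inf(t) as defined below Eq. (4)."""
    pi_inf = min((n_d + n_r) / n_s, 1.0)
    pi_non_inf = max((n_s - (n_d + n_r)) / n_s, 0.0)
    return pi_inf, pi_non_inf


def transmission_fitness(distances: Sequence[float], n_s: int, n_d: int, n_r: int) -> List[float]:
    """f(l, t) = pi_non-inf / |l(t)| + pi_inf * d(l(t), c) for every circulating lineage."""
    pi_inf, pi_non_inf = exposure_fractions(n_s, n_d, n_r)
    n_lineages = len(distances)
    return [pi_non_inf / n_lineages + pi_inf * d for d in distances]


def selection_probabilities(fitness: Sequence[float]) -> List[float]:
    """p(l, t) = f(l, t) / sum_l f(l, t); uniform fallback if all scores are zero (assumption)."""
    total = sum(fitness)
    if total == 0:
        return [1.0 / len(fitness)] * len(fitness)
    return [f / total for f in fitness]


# Hypothetical state: three lineages at Hamming distances 0, 1 and 3 from the reference.
f = transmission_fitness([0, 1, 3], n_s=1_000, n_d=100, n_r=150)
p = selection_probabilities(f)
infecting = random.Random(0).choices(range(len(p)), weights=p, k=1)[0]
print(f, p, infecting)
```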

Baseline model (linear)

In the baseline model, the re-infection term d is computed as the Hamming distance to the founder virus (wild type). This assumes that, as the virus evolves, new lineages emerge that are better at re-infecting individuals. While this is not biologically realistic for SARS-CoV-2, we use it as a baseline scenario that does not consider infection history. Therefore, we define

$$d(l(t),{c}_{0})=\,{\mbox{Hamming}}({\mbox{lineage}},{\mbox{consensus}}\,({t}_{0}))$$

i.e. the Hamming distance between the lineage within an individual and the founder sequence for the simulation.
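For the baseline scenario, a minimal sketch of d(l(t), c₀), assuming lineages are represented as substitutions relative to a hypothetical founder sequence:

```python
from typing import Dict


def apply_substitutions(founder: str, substitutions: Dict[int, str]) -> str:
    """Materialise a lineage sequence from its substitutions relative to the founder."""
    seq = list(founder)
    for site, nt in substitutions.items():
        seq[site] = nt
    return "".join(seq)


def hamming(seq_a: str, seq_b: str) -> int:
    """Number of sites at which two equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


founder = "ACGTACGTAC"                                  # hypothetical founder c_0
lineage = apply_substitutions(founder, {1: "T", 7: "A"})
d_baseline = hamming(lineage, founder)                  # d(l(t), c_0) = 2
print(lineage, d_baseline)
```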

Immune waning model

The second model accounts for acquired lineage-specific neutralizing immunity in the population and is related to mechanistic approaches for estimating relative lineage fitness in heavily immunized populations based on infection (and vaccination) history^7,43. In this model, the relative fitness of viral lineages is assigned as the Hamming distance to the temporally weighted consensus sequence of all virus lineages that circulated in the population in the previous months. Past consensus sequences are weighted by a pharmacokinetic function describing the waning of antibodies in previously exposed individuals⁷. The weights for a consensus sequence that was prevalent s ∈ [0, 180] days ago are calculated as

$$w(t,s)=\frac{e^{-k_e(t-s)}-e^{-k_a(t-s)}}{e^{-k_e(t_{\max}-s)}-e^{-k_a(t_{\max}-s)}} \tag{5}$$

which denotes a normalized pharmacokinetic Bateman function where t is the current time point and k_e and k_a are the antibody elimination and absorption rates (1/day), respectively. In the immune waning model, we define

$$d(l(t),\bar{c})=\text{Hamming}(\text{lineage},\text{consensus}_{\text{weighted}}) \tag{6}$$

In this model, we thus introduce a mechanism of interaction between emerging lineages and waning immunity in the population, which adds a time dimension to the phenotypic model and assumes that SARS-CoV-2 evolves to maximize its ability to infect an immunologically experienced population⁷.
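One possible implementation of the immune waning distance is sketched below. Two points are assumptions rather than statements from the text: the weight is computed as a Bateman curve in the time elapsed since a consensus was prevalent, normalized to 1 at t_max (our reading of Eq. (5)), and the weighted consensus is taken as the per-site nucleotide with the largest summed weight. The value of k_a is illustrative, and the history of past consensus sequences is hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple


def bateman_weight(elapsed: float, k_e: float, k_a: float, t_max: float) -> float:
    """Bateman curve in the elapsed time, normalised to 1 at elapsed == t_max."""
    num = math.exp(-k_e * elapsed) - math.exp(-k_a * elapsed)
    den = math.exp(-k_e * t_max) - math.exp(-k_a * t_max)
    return num / den


def weighted_consensus(history: List[Tuple[float, str]], k_e: float, k_a: float,
                       t_max: float, window: float = 180.0) -> str:
    """history holds (days_ago, consensus_sequence) pairs; needs >= 1 entry in the window."""
    length = len(history[0][1])
    scores: List[Dict[str, float]] = [defaultdict(float) for _ in range(length)]
    for days_ago, seq in history:
        if 0.0 <= days_ago <= window:
            w = bateman_weight(days_ago, k_e, k_a, t_max)
            for site, nt in enumerate(seq):
                scores[site][nt] += w
    # Ties are broken by first-seen nucleotide (an arbitrary choice for this sketch).
    return "".join(max(site_scores, key=site_scores.get) for site_scores in scores)


def immune_waning_distance(lineage: str, history: List[Tuple[float, str]],
                           k_e: float, k_a: float, t_max: float) -> int:
    """d(l(t), c_bar): Hamming distance to the weighted consensus, Eq. (6)."""
    consensus = weighted_consensus(history, k_e, k_a, t_max)
    return sum(a != b for a, b in zip(lineage, consensus))


k_e, k_a, t_max = math.log(2) / 30.0, 0.085, 21.0   # k_a is illustrative here
history = [(21.0, "AAAC"), (170.0, "CCCC")]          # hypothetical past consensus sequences
print(weighted_consensus(history, k_e, k_a, t_max))
print(immune_waning_distance("CCCC", history, k_e, k_a, t_max))
```

A consensus seen about t_max days ago dominates, while one seen 170 days ago contributes only a small residual weight, which is what lets re-infection fitness track recently circulating lineages.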

Numerical implementation

To run SIMPLICITY simulations, we adapted the Extrande algorithm⁴⁰ to our reaction network with time-varying propensities. Extrande enables exact stochastic simulation of such networks, overcoming a limitation of the stochastic simulation algorithm (SSA), which assumes that reaction propensities are constant between events. Extrande introduces a virtual reaction (a reaction that does not change the state of the system) whose time-dependent propensity is chosen such that the total propensity of the augmented system is piecewise constant and equal to the UPPERBOUND, an upper bound on the sum of all the system’s reaction propensities over the look-ahead horizon. Events assigned to the virtual reaction are discarded (the ‘thinning’ step). This permits exact sampling of waiting times that are not exponentially distributed. We extended the Extrande algorithm by adding steps for intra-host state evolution, mutation, and fitness updates. Below is the pseudocode of the core SIMPLICITY simulation algorithm. Note that the LOOK_AHEAD time horizon in this implementation is simply the final time of the simulation.

Algorithm 1

Extrande Core Loop in SIMPLICITY

1: Initialize time t ← t₀, accumulator Δt_acc ← 0

2: while t < t_final do

3: L ← LOOK_AHEAD(t, t_final)

4: B ← COMPUTE_UPPERBOUND(population)

5: Δt ~ Exp(B) ⊳ exponential waiting time with rate B (mean 1/B)

6: if Δt > L then ⊳ Reject (leap) step

7: t ← t + L

8: UPDATE_TIME(population, t)

9: Δt_acc ← Δt_acc + L

10: if $\Delta {t}_{{{{\rm{acc}}}}}\ge {\delta }_{\min }$ then

11: UPDATE_STEP(population, Δt_acc)

12: MUTATION_STEP(population, Δt_acc)

13: Δt_acc ← 0

14: end if

15: else ⊳ Accepted (reaction) step

16: t ← t + Δt

17: UPDATE_TIME(population, t)

18: Δt_acc ← Δt_acc + Δt

19: if $\Delta {t}_{{{{\rm{acc}}}}}\ge {\delta }_{\min }$ then

20: UPDATE_STEP(population, Δt_acc)

21: MUTATION_STEP(population, Δt_acc)

22: Δt_acc ← 0

23: end if

24: UPDATE_FITNESS_STEP(population)

25: reaction_id ← REACTION_STEP(population, B)

26: end if

27: REPORTER.UPDATE(population, Δt, reaction_id, event_type)

28: POPULATION.UPDATE_TRAJECTORY

29: if day has advanced then

30: POPULATION.UPDATE_LINEAGE_FREQUENCY_T(t)

31: end if

32: if CHECK_STOP_CONDITIONS(population, t) then

33: break

34: end if

35: end while

36: REPORTER.CLOSE
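To show the thinning logic of Algorithm 1 in isolation, the sketch below implements a generic Extrande-style loop in Python, omitting SIMPLICITY's UPDATE_STEP, MUTATION_STEP, and fitness hooks. As in the text, the look-ahead horizon is the remaining time to t_final, so the user-supplied bound must dominate the total propensity for the current state over that whole interval (the state does not change between events). The SIS-type example with a weekly-oscillating infection rate is hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

State = List[int]
Propensity = Callable[[float, State], float]
Event = Tuple[float, int, State]  # (time, reaction index, state after the reaction)


def extrande(x0: State, propensities: Sequence[Propensity],
             stoichiometry: Sequence[Sequence[int]],
             upper_bound: Callable[[float, State, float], float],
             t_final: float, rng: random.Random) -> List[Event]:
    """Extrande-style exact simulation with time-varying propensities via thinning."""
    t, x = 0.0, list(x0)
    events: List[Event] = [(t, -1, list(x))]
    while t < t_final:
        look_ahead = t_final - t                    # horizon L: here simply up to t_final
        bound = upper_bound(t, x, look_ahead)       # B >= sum_j a_j(s, x) for s in [t, t + L]
        dt = rng.expovariate(bound) if bound > 0 else math.inf  # rate B, mean 1/B
        if dt > look_ahead:                         # leap: no candidate event in the horizon
            t = t_final
            continue
        t += dt                                     # candidate event time
        rates = [a(t, x) for a in propensities]
        threshold = rng.random() * bound
        cumulative = 0.0
        for j, a_j in enumerate(rates):
            cumulative += a_j
            if threshold < cumulative:              # real reaction j fires
                x = [xi + s for xi, s in zip(x, stoichiometry[j])]
                events.append((t, j, list(x)))
                break
        # Otherwise the virtual reaction fired (thinning): the state is unchanged.
    return events


# Hypothetical example: S + I -> 2I with a weekly-oscillating beta(t), and I -> S.
beta0, gamma, period = 0.0002, 0.1, 7.0


def beta(t: float) -> float:
    return beta0 * (1.0 + 0.5 * math.sin(2.0 * math.pi * t / period))


def bound(t: float, x: State, horizon: float) -> float:
    # beta(t) <= 1.5 * beta0 and the state is frozen between events, so this dominates a_1 + a_2.
    return 1.5 * beta0 * x[0] * x[1] + gamma * x[1]


props = [lambda t, x: beta(t) * x[0] * x[1],    # infection
         lambda t, x: gamma * x[1]]             # recovery
stoich = [[-1, +1], [+1, -1]]
trajectory = extrande([990, 10], props, stoich, bound, t_final=60.0, rng=random.Random(3))
print(len(trajectory) - 1, "reactions; final state", trajectory[-1][2])
```

Because the bound here spans the whole remaining simulation time, it can be loose when propensities vary strongly; shorter look-ahead horizons give tighter bounds and fewer virtual (rejected) events.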

Simulated trees

SIMPLICITY can provide the full infection and phylogenetic trees of a simulation. Infection trees are time-oriented binary trees (branch length is the time from an individual's infection to their infecting a new individual) in which leaves are individuals and internal nodes represent infection events. Phylogenetic trees are binary trees (branch length is either genetic distance (Hamming) or time of emergence) in which each leaf is a lineage and internal nodes are substitution events. Lineages are defined by a unique set of substitutions, meaning that every sequence differing from all others at one or more genome positions is defined as a separate lineage.

Model parametrization

Each model was parameterized to ensure that the simulations could reproduce the population and viral evolutionary dynamics observed during the SARS-CoV-2 pandemic. Some of the parameters were derived from data collected during the pandemic, while others were taken from the literature or manually fine-tuned for the purposes of this paper.

Intra-host model

Model parameters were taken from the already fitted published model⁴.

SIRD model

The infection rate β(t) is related to the virus reproduction number (R) as follows:

$$R=\beta(t)\cdot|S(t)|\cdot\tau_{\text{infectious}} \tag{7}$$

The reproduction number of the virus, i.e., the average number of new cases generated by an infected individual during their infectious period, is given by the product of the infection rate β(t), the number of susceptibles in the system, and the expected duration of infectiousness τ_infectious. Solving this relation for β, with τ_infectious = 11.41 days⁴, leaves R as a free parameter that can be specified for each simulation.
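Solving Eq. (7) for β is a one-line computation; the function name and the inputs below are hypothetical.

```python
def infection_rate_from_R(R: float, n_susceptible: int, tau_infectious: float = 11.41) -> float:
    """Solve Eq. (7), R = beta * |S| * tau_infectious, for beta (tau in days)."""
    if n_susceptible <= 0:
        raise ValueError("need at least one susceptible individual")
    return R / (n_susceptible * tau_infectious)


# Hypothetical inputs: R = 1.3 and |S| = 10,000 susceptibles.
beta = infection_rate_from_R(R=1.3, n_susceptible=10_000)
print(f"beta = {beta:.3e} per susceptible-infectious pair per day")
```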

The diagnosis rate parameter k_d can be estimated from a given diagnosis probability P(diag) by calculating the limit distribution

$${{{{\boldsymbol{p}}}}}_{\infty }({{{\boldsymbol{x}}}})={\lim }_{t\to \infty }\exp ({{{\boldsymbol{B}}}}t){{{{\boldsymbol{p}}}}}_{0}({{{\boldsymbol{x}}}}).$$

The last entry of p_∞(x) is the probability mass in the diagnosed compartment D and therefore gives the diagnosis probability P(diag). We set the default diagnosis probability to 0.1. In the simulations, diagnosed individuals are sequenced with a user-defined probability: the sequencing rate sets the proportion of diagnosed cases stored as sequences (default = 0.05).
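For a chain with the structure of B, and assuming every individual starts in x₀, the last entry of p_∞ has a closed form: an agent leaves each diagnosable compartment towards the next one with probability r/(r + k_d), so P(diag) is one minus the product of these escape probabilities. Because P(diag) increases monotonically with k_d, k_d can then be found by bisection instead of evaluating exp(Bt) numerically. The per-compartment exit rates below are made up for illustration.

```python
from typing import Sequence


def diagnosis_probability(k_d: float, exit_rates: Sequence[float],
                          diagnosable: Sequence[bool]) -> float:
    """P(diag) for a chain x_0 -> x_1 -> ... where diagnosable compartments also leak to D."""
    p_escape = 1.0
    for r, can_diagnose in zip(exit_rates, diagnosable):
        if can_diagnose:
            p_escape *= r / (r + k_d)     # probability of advancing before diagnosis
    return 1.0 - p_escape


def solve_k_d(p_target: float, exit_rates: Sequence[float], diagnosable: Sequence[bool],
              tol: float = 1e-12) -> float:
    """Bisection on k_d; P(diag) increases monotonically from 0 (k_d = 0) towards 1."""
    if not 0.0 <= p_target < 1.0 or not any(diagnosable):
        raise ValueError("need 0 <= p_target < 1 and at least one diagnosable compartment")
    lo, hi = 0.0, 1.0
    while diagnosis_probability(hi, exit_rates, diagnosable) < p_target:
        hi *= 2.0                         # expand until the target is bracketed
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if diagnosis_probability(mid, exit_rates, diagnosable) < p_target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical chain x_0..x_19 with made-up exit rates (1/day);
# compartments x_5..x_19 are diagnosable, matching the range of Eq. (2).
rates = [2.5] * 5 + [1.5] * 15
diagnosable = [i >= 5 for i in range(20)]
k_d = solve_k_d(0.1, rates, diagnosable)
print(f"k_d = {k_d:.5f} per day gives P(diag) = {diagnosis_probability(k_d, rates, diagnosable):.4f}")
```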

Evolutionary model

The nucleotide substitution rate (NSR) is estimated empirically using a regression of phylogenetic distance (the genetic distance from the phylogenetic tree root to each leaf annotated with a genetic sequence) on the sampling time of the sequence²⁶. Our model follows the molecular clock hypothesis, such that genetic differences accumulate at a constant rate over time. Appropriate parameterization would also allow the model to reproduce inhomogeneous evolutionary rates across the tree. After each simulation, we use the simulated sequencing data to estimate the observed substitution rate (OSR).
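A sketch of the root-to-tip regression used to estimate substitution rates from sampled sequences. We assume ordinary least squares with an intercept and distances expressed as substitutions per site (Hamming distance to the root divided by genome length); the data are hypothetical.

```python
from typing import Sequence, Tuple


def root_to_tip_rate(times: Sequence[float], distances: Sequence[float]) -> Tuple[float, float]:
    """OLS fit of distance = rate * time + intercept; returns (rate, intercept)."""
    n = len(times)
    if n < 2:
        raise ValueError("need at least two sampled sequences")
    mean_t = sum(times) / n
    mean_d = sum(distances) / n
    s_tt = sum((t - mean_t) ** 2 for t in times)   # must be > 0: distinct sampling times
    s_td = sum((t - mean_t) * (d - mean_d) for t, d in zip(times, distances))
    rate = s_td / s_tt
    return rate, mean_d - rate * mean_t


# Hypothetical sampled sequences: sampling day and Hamming distance to the root.
genome_length = 3_000
sampling_days = [10.0, 40.0, 70.0, 100.0, 130.0]
hamming_to_root = [0, 1, 2, 3, 4]
distances = [h / genome_length for h in hamming_to_root]   # substitutions per site
osr, intercept = root_to_tip_rate(sampling_days, distances)
print(f"OSR ~ {osr:.2e} substitutions/site/day ({osr * 365:.2e} per year)")
```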

SIMPLICITY allows the user to specify any transition matrix P(t). For the work presented here, we use a constant value (0.33, i.e., approximately 1/3) for every transition and transversion. Because the phenotype model we employ uses only the Hamming distance between sequences to assign a fitness value to a lineage, genome composition has no impact on transmission fitness, so these simplified assumptions suffice for the scope of this paper.

Phenotypic model

The baseline phenotype model has no parameters. The parameters of the immune waning model are set to $k_e=\frac{\ln(2)}{t_{\text{half}}}$, with t_half = 30 days and $t_{\max}=21$ days, according to previous work⁷. The antibody generation rate constant k_a is obtained numerically by solving

$${t}_{\max }=\frac{\ln ({k}_{a}/{k}_{e})}{{k}_{a}-{k}_{e}},$$

for k_a.
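The equation for t_max can be solved by bisection, since for k_a > k_e the peak time ln(k_a/k_e)/(k_a − k_e) decreases monotonically from 1/k_e towards 0. The sketch below uses the stated values t_half = 30 days and t_max = 21 days; the choice of bisection is ours and not necessarily the numerical method used in SIMPLICITY.

```python
import math


def bateman_peak_time(k_a: float, k_e: float) -> float:
    """Time of the Bateman-curve maximum: ln(k_a / k_e) / (k_a - k_e)."""
    return math.log(k_a / k_e) / (k_a - k_e)


def solve_k_a(k_e: float, t_max: float, tol: float = 1e-12) -> float:
    """Bisection for the root k_a > k_e of bateman_peak_time(k_a, k_e) = t_max.

    A root with k_a > k_e exists exactly when 0 < t_max < 1 / k_e.
    """
    if not 0.0 < t_max < 1.0 / k_e:
        raise ValueError("no solution with k_a > k_e for this t_max")
    lo, hi = k_e * (1.0 + 1e-9), 2.0 * k_e
    while bateman_peak_time(hi, k_e) > t_max:   # expand until the root is bracketed
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if bateman_peak_time(mid, k_e) > t_max:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


t_half, t_max = 30.0, 21.0          # days, as stated above
k_e = math.log(2) / t_half
k_a = solve_k_a(k_e, t_max)
print(f"k_e = {k_e:.5f} /day, k_a = {k_a:.5f} /day")
```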

Experiments

The experiments presented in this paper showcase the use of SIMPLICITY to investigate SARS-CoV-2 intra-host and population-level dynamics and their interplay. Table 1 summarizes the parameter values used for each experiment. We define an experiment as a set of n simulations with a set of fixed and varying parameters. The goal of the first experiment (Fig. 2) is to tune the model to a range of observed substitution rates (OSR) that correspond to real-world SARS-CoV-2 data. We modeled the SARS-CoV-2 Spike gene, as it is the main evolutionary driver. We ran 100 simulations per parameter set, fixing all parameters except the nucleotide substitution rate (NSR), for which we used 15 logarithmically spaced values in [10⁻⁶, 3 × 10⁻⁴], and used the resulting data to fit different functions (lin, log, exp, tan, spline) to the NSR/OSR relationship with the Levenberg-Marquardt algorithm. We selected the best-fitting curve by comparing the resulting Akaike Information Criterion (AIC) values.
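The model-selection step can be illustrated with least-squares fits and the Gaussian AIC, n·ln(RSS/n) + 2k. For brevity, the sketch only includes candidate forms that are linear in their parameters (here lin and log), which ordinary least squares fits exactly and for which Levenberg-Marquardt should reach the same optimum; the exp, tan, and spline fits are omitted. The OSR values are synthetic.

```python
import math
from typing import Callable, Dict, Sequence, Tuple


def ols_fit(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit y = a + b * x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


def aic_least_squares(rss: float, n: int, k: int) -> float:
    """AIC for Gaussian least squares: n * ln(RSS / n) + 2k."""
    return n * math.log(rss / n) + 2 * k


def select_model(nsr: Sequence[float], osr: Sequence[float],
                 transforms: Dict[str, Callable[[float], float]]) -> Tuple[str, Dict[str, float]]:
    """Fit OSR = a + b * g(NSR) for each transform g and return the lowest-AIC name."""
    scores: Dict[str, float] = {}
    for name, g in transforms.items():
        xs = [g(v) for v in nsr]
        a, b = ols_fit(xs, osr)
        rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, osr))
        scores[name] = aic_least_squares(rss, len(osr), k=3)  # a, b and the error variance
    return min(scores, key=scores.get), scores


# Synthetic NSR grid (15 log-spaced values in [1e-6, 3e-4]) and made-up OSR values.
nsr = [1e-6 * 300 ** (i / 14) for i in range(15)]
osr = [2e-3 + 1e-4 * math.log(v) + 1e-6 * (-1) ** i for i, v in enumerate(nsr)]
best, aic = select_model(nsr, osr, {"lin": lambda v: v, "log": math.log})
print(best, {name: round(value, 1) for name, value in aic.items()})
```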

Table 1 Parameters used in OSR tuning (Fig. 2) and phenotype model comparison experiments (Figs. 3, 4)


In the second experiment (Figs. 3 and 4), we ran 300 simulations for each phenotype model (baseline, immune waning) and compared the simulation results (system trajectories, lineage frequency dynamics, number of selective sweeps, R effective, infection and phylogenetic trees, and entropy). The plots shown in Fig. 3 use the raw simulation output, R effective, and the true phylogenetic tree reconstructed from the evolutionary histories of the lineages. R effective was computed as the ratio of birth to death events over a sliding time window of 21 days. For the population-average R effective, we define a birth event as an individual becoming infectious and a death event as an individual becoming non-infectious (entry into the post-infectious phase, or diagnosis and isolation). For lineage-specific R effective, we define births as a lineage becoming capable of infection (either when an infected individual becomes infectious or when a new lineage emerges in an infectious individual). Lineage death events are defined as the removal of a lineage from the pool of infectious individuals, either because individuals become non-infectious or because the lineage evolves into a new lineage. The phylogenetic tree is reconstructed as described in the respective section above and plotted on a circular axis, colored by lineage: all lineages in a single simulation are mapped to a rainbow gradient color map that assigns a unique color to each lineage, ordered by time of emergence. To count the number of selective sweeps in a simulation, we use the lineage frequency data and define a sweep event as a lineage crossing the 50% frequency threshold and staying above it for at least 21 days. In Fig. 4, we show a violin plot comparing the distributions of selective sweep counts in simulations using the baseline vs the immune waning phenotype model. We compared the two groups using a two-sided Mann-Whitney test.
Finally, we compare the lineage frequency entropy trajectories of each group to investigate how the phenotype model affects lineage diversification. In Fig. 4, we show the entropy data for a single simulation on the raw lineage data. In Fig. S3, we also show the same analysis run on frequency data of clustered lineages (defining a lineage as a cluster of single-substitution lineages that share at least 5 mutations).
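The post-processing quantities described above (selective sweep counts, R effective, and lineage frequency entropy) can be sketched as follows. Assumptions: lineage frequencies are sampled daily, "above the threshold" means strictly greater than 0.5, R effective uses a trailing 21-day window over precomputed birth and death event times, and entropy is the Shannon entropy with natural logarithm. The function names and input data are hypothetical.

```python
import math
from typing import Dict, Sequence


def count_selective_sweeps(freqs: Dict[str, Sequence[float]],
                           threshold: float = 0.5, min_days: int = 21) -> int:
    """Count runs in which a lineage stays above `threshold` for >= `min_days` daily samples."""
    sweeps = 0
    for series in freqs.values():
        run = 0
        for f in series:
            run = run + 1 if f > threshold else 0
            if run == min_days:        # count each qualifying run exactly once
                sweeps += 1
    return sweeps


def r_effective(birth_times: Sequence[float], death_times: Sequence[float],
                t: float, window: float = 21.0) -> float:
    """Ratio of births to deaths in the trailing window (t - window, t]."""
    births = sum(1 for b in birth_times if t - window < b <= t)
    deaths = sum(1 for d in death_times if t - window < d <= t)
    return births / deaths if deaths else math.nan


def lineage_entropy(frequencies: Sequence[float]) -> float:
    """Shannon entropy (natural log) of the lineage frequency distribution on one day."""
    total = sum(frequencies)
    return -sum((f / total) * math.log(f / total) for f in frequencies if f > 0)


# Hypothetical per-lineage daily frequency series.
freqs = {
    "A": [0.6] * 25,                       # one sweep
    "B": [0.6] * 10 + [0.4] + [0.6] * 21,  # first run too short, second run counts
    "C": [0.3] * 40,                       # never crosses 50%
}
print(count_selective_sweeps(freqs))                 # 2
print(r_effective([1, 2, 3, 4], [2, 3], t=5.0))      # 2.0
print(lineage_entropy([0.5, 0.5]))                   # ln 2
```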

Statistics and reproducibility

We used a Mann-Whitney two-sided statistical test in Fig. 4 with N=206. To reproduce the experiments presented here, the user can simply install SIMPLICITY 1.1.4⁴⁶ from the provided repository and then run scripts/experiments/00_generate_data_OSR_fit.py for the OSR fitting experiment (Fig. 2) and scripts/experiments/01_generate_data_SIMPLICITY_paper.py for the main experiment (Figs. 3, 4). To reproduce the figures, run the scripts/paper_figures scripts. For Fig. S3 (sensitivity with regard to virus emergence rate k_v), run scripts/experiments/generate_data_IH_lineages.py and scripts/plots/01_plot_IH_lineages.py.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The datasets generated and analyzed during the presented study are available in a publicly available Zenodo repository⁴⁷ (https://doi.org/10.5281/zenodo.17368796), together with the figures’ source data.

Code availability

The SIMPLICITY model user interface was implemented using Python and is available on GitHub under the GNU GPL v3 licence: https://github.com/PietroGx/SIMPLICITY. The version used for the work presented here is SIMPLICITY v.1.1.4⁴⁶. The description of variables in the SIMPLICITY core algorithm is outlined in Supplementary Table S3.

References

Malhotra, I. & Goel, N. Infectious disease modeling: from traditional to evolutionary algorithms. Arch. Comput. Methods Eng. 31, 663–699 (2024).
White, P. J. Mathematical models in infectious disease epidemiology. Infect. Dis. 49–53.e1 (2017).
Sabherwal, A. K., Sood, A. & Shah, M. A. Evaluating mathematical models for predicting the transmission of COVID-19 and its variants towards sustainable health and well-being. Discov. Sustain. 5, 38 (2024).
Van Der Toorn, W. et al. An intra-host SARS-CoV-2 dynamics model to assess testing and quarantine strategies for incoming travelers, contact management, and de-isolation. Patterns 2, 100262 (2021).
Chinazzi, M. et al. The effect of travel restrictions on the spread of the 2019 novel coronavirus (COVID-19) outbreak. Science 368, 395–400 (2020).
Moore, S., Hill, E. M., Dyson, L., Tildesley, M. J. & Keeling, M. J. Modelling optimal vaccination strategy for SARS-CoV-2 in the UK. PLOS Comput. Biol. 17, e1008849 (2021).
Raharinirina, N. A. et al. SARS-CoV-2 evolution on a dynamic immune landscape. Nature 639, 196–204 (2025).
Gandon, S., Day, T., Metcalf, C. J. E. & Grenfell, B. T. Forecasting Epidemiological and Evolutionary Dynamics of Infectious Diseases. Trends Ecol. Evol. 31, 776–788 (2016).
Milgroom, M. G. Epidemiology and SIR Models. In Biology of Infectious Disease: From Molecules to Ecosystems, 253–268 (Cham, 2023).
Allen, L. J. S. An Introduction to Stochastic Epidemic Models. In Mathematical Epidemiology, 81–130 (Berlin, Heidelberg, 2008).
Epstein, J. M. Modelling to contain pandemics. Nature 460, 687–687 (2009).
Gubela, N. & von Kleist, M. Efficient and accurate simulation of infectious diseases on adaptive networks. PLOS Complex Syst. 2, e0000049 (2025).
Ciupe, S. M. & Heffernan, J. M. In-host modeling. Infect. Dis. Model. 2, 188–202 (2017).
Perelson, A. S. et al. Decay characteristics of HIV-1-infected compartments during combination therapy. Nature 387, 188–191 (1997).
Perelson, A. S., Essunger, P. & Ho, D. D. Dynamics of HIV-1 and CD4+ lymphocytes in vivo. AIDS 11 Suppl A, S17–24 (1997).
Zhang, L. et al. Model-based predictions of protective HIV pre-exposure prophylaxis adherence levels in cisgender women. Nat. Med. 29, 2753–2762 (2023).
Rosenbloom, D. I. S., Hill, A. L., Rabi, S. A., Siliciano, R. F. & Nowak, M. A. Antiretroviral dynamics determines HIV evolution and predicts therapy outcome. Nat. Med. 18, 1378–1385 (2012).
Feder, A. F., Harper, K. N., Brumme, C. J. & Pennings, P. S. Understanding patterns of HIV multi-drug resistance through models of temporal and spatial drug heterogeneity. eLife 10, e69032 (2021).
Xu, Z., Song, J., Liu, W. & Wei, D. An agent-based model with antibody dynamics information in COVID-19 epidemic simulation. Infect. Dis. Model. 8, 1151–1168 (2023).
Xu, Z. et al. More or less deadly? A mathematical model that predicts SARS-CoV-2 evolutionary direction. Comput. Biol. Med. 153, 106510 (2023).
Baumgarte, S. et al. Investigation of a Limited but Explosive COVID-19 Outbreak in a German Secondary School. Viruses 14, 87 (2022).
Markov, P. V. et al. The evolution of SARS-CoV-2. Nat. Rev. Microbiol. 21, 361–379 (2023).
Kemp, S. A. et al. SARS-CoV-2 evolution during treatment of chronic infection. Nature 592, 277–282 (2021).
Dekker, J. P. Within-host evolution of bacterial pathogens in acute and chronic infection. Annu. Rev. Pathol.: Mech. Dis. 19, 203–226 (2024).
Theys, K. et al. The impact of HIV-1 within-host evolution on transmission dynamics. Curr. Opin. Virol. 28, 92–101 (2018).
Pybus, O. G. & Rambaut, A. Evolutionary analysis of the dynamics of viral infectious disease. Nat. Rev. Genet. 10, 540–550 (2009).
Peck, K. M., Chan, C. H. S. & Tanaka, M. M. Connecting within-host dynamics to the rate of viral molecular evolution. Virus Evol. 1, vev013 (2015).
Mideo, N., Alizon, S. & Day, T. Linking within- and between-host dynamics in the evolutionary epidemiology of infectious diseases. Trends Ecol. Evol. 23, 511–517 (2008).
Feng, Z., Cen, X., Zhao, Y. & Velasco-Hernandez, J. X. Coupled within-host and between-host dynamics and evolution of virulence. Math. Biosci. 270, 204–212 (2015).
Martcheva, M., Tuncer, N. & Mary, C. S. Coupling Within-Host and Between-Host Infectious Diseases Models. BIOMATH 4, 1510091 (2015).
Lequime, S., Bastide, P., Dellicour, S., Lemey, P. & Baele, G. nosoi: A stochastic agent-based transmission chain simulation framework in R. Methods Ecol. Evol. 11, 1002–1007 (2020).
Shchur, V. et al. VGsim: Scalable viral genealogy simulator for global pandemic. PLOS Comput. Biol. 18, e1010409 (2022).
Moshiri, N., Ragonnet-Cronin, M., Wertheim, J. O. & Mirarab, S. FAVITES: simultaneous simulation of transmission networks, phylogenetic trees and sequences. Bioinformatics 35, 1852–1861 (2019).
Cárdenas, P., Corredor, V. & Santos-Vega, M. Genomic epidemiological models describe pathogen evolution across fitness valleys. Sci. Adv. 8, eabo0173 (2022).
Xu, P. et al. e3sim: epidemiological-ecological-evolutionary simulation framework for genomic epidemiology. bioRxiv https://doi.org/10.1101/2024.06.29.601123 (2024).
Hou, M. et al. Intra- vs. interhost evolution of SARS-CoV-2 driven by uncorrelated selection—the evolution thwarted. Mol. Biol. Evol. 40, msad204 (2023).
Morris, D. H. et al. Asynchrony between virus diversity and antibody selection limits influenza virus evolution. eLife 9, e62105 (2020).
Riddell, A. C. & Cutino-Moguel, T. The origins of new SARS-COV-2 variants in immunocompromised individuals. Curr. Opin. HIV AIDS (2023).
Raglow, Z. et al. SARS-CoV-2 shedding and evolution in patients who were immunocompromised during the omicron period: a multicentre, prospective analysis. Lancet Microbe 5, e235–e246 (2024).
Voliotis, M., Thomas, P., Grima, R. & Bowsher, C. G. Stochastic Simulation of Biomolecular Networks in Dynamic Environments. PLOS Comput. Biol. 12, e1004923 (2016).
von Kleist, M. et al. HIV quasispecies dynamics during pro-active treatment switching: impact on multi-drug resistance and resistance archiving in latent reservoirs. PloS One 6, e18204 (2011).
Jian, F. & Cao, Y. Deciphering SARS-CoV-2 evolution under antibody immune pressure. Trends Immunol. 46, 263–265 (2025).
Meijers, M., Ruchnewitz, D., Eberhardt, J., Łuksza, M. & Lässig, M. Population immunity predicts evolutionary trajectories of SARS-CoV-2. Cell 186, 5151–5164.e13 (2023).
Jukes, T. H. & Cantor, C. R. Evolution of Protein Molecules. In Mammalian Protein Metabolism, 21–132 (1969).
Farjo, M. et al. Within-host evolutionary dynamics and tissue compartmentalization during acute SARS-CoV-2 infection. J. Virol. 98, e01618–23 (2024).
Gerletti Pietro, Escudié Jean-Baptiste. SIMPLICITY https://doi.org/10.5281/zenodo.17338653 (2025).
Gerletti Pietro. SIMPLICITY paper supplementary and figures source data https://doi.org/10.5281/zenodo.17368796 (2025).
Walls, A. C. et al. Structure of the SARS-CoV-2 spike glycoprotein (closed state). RCSB Protein Data Bank (2020). PDB ID: 6VXX. Primary publication: Cell 181: 281 (2020).


Acknowledgements

The authors thank Ariane Weber and Sanni Översti for insightful discussions on SARS-CoV-2 evolution and phylogenetics and Silvan Wehrli for coding advice. We thank Wiep van der Toorn for feedback regarding the SARS-CoV-2 intra-host model and Nadezhda Malysheva for discussion on the numeric implementation. MvK and NG acknowledge funding by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s Excellence Strategy - The Berlin Mathematics Research Center MATH+ (EXC-2046/1, project ID: 390685689). This project was supported by Germany’s Federal Ministry of Health (BMG) under grant no. 2523DAT400 (project “AI-assisted analysis and visualization of pandemic situations,” AI-DAVis-PANDEMICS).

Funding

Open Access funding enabled and organized by Projekt DEAL.

Author information

These authors contributed equally: Nils Gubela, Jean-Baptiste Escudié.
These authors jointly supervised this work: Denise Kühnert, Max Von Kleist.

Authors and Affiliations

Center for Artificial Intelligence in Public Health, Robert Koch Institute, Berlin, Germany
Pietro Gerletti, Jean-Baptiste Escudié & Denise Kühnert
Department of Mathematics & Computer Science, Freie Universität Berlin, Berlin, Germany
Pietro Gerletti, Nils Gubela & Max Von Kleist
International Max-Planck Research School “Biology and Computation” (IMPRS-BAC), Max-Planck Institute for Molecular Genetics, Berlin, Germany
Nils Gubela
Project groups, Robert Koch Institute, Berlin, Germany
Max Von Kleist


Contributions

Pietro Gerletti: software development, theoretical work and model development, experimental design, experiment realization, manuscript writing; Nils Gubela: theoretical work, oversight of mathematical formulation, manuscript writing; Jean-Baptiste Escudié: software development; Denise Kühnert: supervision, theoretical work, manuscript writing; Max von Kleist: supervision, theoretical work, model development, manuscript writing.

Corresponding author

Correspondence to Pietro Gerletti.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Communications Biology thanks Pablo Cárdenas and the other anonymous reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Laura Rodriguez Perez. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Transparent Peer Review file

Supplementary Tables and Figures

Reporting Summary

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Gerletti, P., Gubela, N., Escudié, JB. et al. SIMPLICITY is an agent-based, multi-scale mathematical model to study SARS-CoV-2 intra- and between-host evolution. Commun Biol 9, 124 (2026). https://doi.org/10.1038/s42003-025-09403-y


Received: 29 July 2025
Accepted: 10 December 2025
Published: 09 January 2026
Version of record: 29 January 2026
DOI: https://doi.org/10.1038/s42003-025-09403-y
