Introduction

SARS-CoV-2 (severe acute respiratory syndrome coronavirus type 2) is a coronavirus (genus: Betacoronavirus, subgenus: Sarbecovirus) identified in early 2020 as the causative agent of COVID-191. Coronaviruses are widely distributed among mammals and birds. They are classified in the Coronaviridae family of RNA viruses (realm: Riboviria, order: Nidovirales, suborder: Cornidovirineae), in which the large subfamily Orthocoronavirinae includes four genera: Alpha-, Beta-, Gamma-, and Deltacoronavirus. Because of their capacity for homologous recombination, coronaviruses can relatively easily expand their host range and overcome cross-species boundaries2. The seven known human pathogenic coronaviruses (HCoV) fall into two genera: Alphacoronavirus (HCoV-229E, HCoV-NL63) and Betacoronavirus (HCoV-HKU1, HCoV-OC43, MERS-CoV, SARS-CoV, SARS-CoV-2). The viruses HCoV-229E, HCoV-NL63, HCoV-HKU1, and HCoV-OC43 are known as seasonal or endemic coronaviruses and primarily cause mild colds. However, in early childhood, the elderly, and immunocompromised individuals, severe cases of pneumonia can occur3. SARS-CoV, MERS-CoV, and SARS-CoV-2 have only recently spilt over from animal reservoirs to humans4. Infections with these “emerging pathogens” can cause severe disease with fatal outcomes.

Betacoronaviruses are membrane-enveloped RNA viruses and form virions approximately 60-140 nm in diameter with large (20-25 nm long) surface glycoproteins (spikes)5 (Fig. 1). They have a single-stranded RNA genome of positive polarity that is about 30 kilobases long, making it one of the largest genomes of all known RNA viruses. The genome encodes 29 proteins, including 16 non-structural, four structural, and nine accessory proteins which are involved in various steps of the virus’ life cycle6. The non-structural proteins are responsible for RNA replication. The structural proteins are the spike glycoprotein (S), the envelope small membrane protein (E), the membrane protein (M), and the nucleoprotein (N). The S, E, and M proteins are incorporated into the viral membrane that envelops the nucleocapsid, which is composed of the N protein and the viral genome3. The S (or spike) protein is responsible for entering the host cell and consists of two subunits. The S1 subunit contains the Receptor Binding Domain (RBD), which binds to the host cell receptor, and an amino-terminal (N-terminal) domain (NTD). The RBD contains the receptor-binding motif (RBM), which, together with the NTD, is the main target of neutralizing antibodies7. After host cell receptor binding, the S2 subunit mediates the fusion of the viral envelope and host cell membrane and the subsequent viral RNA release into the cytoplasm7. Neutralizing antibodies directed against epitopes on the RBD and the NTD inhibit cell entry of the virus and are one of the strongest available correlates of vaccine-induced protection against SARS-CoV-2 infection8,9,10. Thus, many vaccines utilize exclusively the spike protein as antigenic component11,12,13. 
SARS-CoV-2, like SARS-CoV and HCoV-NL63, engages the transmembrane angiotensin-converting enzyme 2 (ACE2) of host cells as a receptor, enabling subsequent entry into the host cell7: The spike protein of the virus attaches to the ACE2 receptor on the host cell and undergoes proteolytic activation. This activation involves cleavage at the S2’ site, facilitated by TMPRSS2 on the cell surface or by endosomal cathepsins within the cell. This process transitions the spike protein into a metastable state, allowing the viral and host cell membranes to fuse and initiate infection14,15. ACE2 and TMPRSS2 are co-expressed at high levels in the nasal epithelium, which may explain the efficient spread and shedding of SARS-CoV-2 in the upper respiratory tract16. High ACE2 density has been reported not only in the respiratory tract but also, for example, on enterocytes, vascular endothelial cells, renal epithelium, and myocardial cells17,18,19,20,21. Histopathological studies have demonstrated SARS-CoV-2 organ tropism for the lung, intestine, kidney, heart, and the central nervous system (CNS) (see22,23,24,25,26).

Fig. 1: Structure of SARS-CoV-2 virion and genome organization.

A Schematic of the SARS-CoV-2 virion showing the lipid bilayer envelope containing membrane (M) and envelope (E) proteins, spike (S) proteins with subunits S1 and S2, and the nucleocapsid (N) protein bound to the positive-sense single-stranded RNA (ssRNA (+)) genome. The spike protein engages the host ACE2 receptor and is primed by TMPRSS2. B Genome organization of SARS-CoV-2 (~29,800 bp), including open reading frames ORF1a and ORF1b encoding nonstructural proteins (NSP1–16), structural proteins (S, E, M, N), and accessory proteins. The spike (S) protein is further divided into domains, including signal peptide (SP), N-terminal domain (NTD), receptor-binding domain (RBD) with receptor-binding motif (RBM), subdomains (SD1, SD2), fusion peptide (FP), heptad repeats (HR1, HR2), transmembrane domain (TM), and cytoplasmic tail (CT). Cleavage sites for furin and TMPRSS2 are indicated. The figure combines published information and was adapted accordingly from refs. 253,254,255,256. Figure elements in A were created with BioRender.com.

In contrast to pre-Omicron variants, SARS-CoV-2 variants of the Omicron complex tend to use the endosomal entry pathway mediated by cathepsins L/B, rather than the cell-surface entry pathway mediated by TMPRSS227,28. Cathepsins are located in the endosome, and cathepsin-mediated cleavage of the spike protein occurs there once the virus-ACE2 complex has been internalized via clathrin-mediated endocytosis into the late endolysosomes15.

The emergence of the COVID-19 pandemic coincided with the advent of extensive global sequencing and data-sharing capabilities. This created an unprecedented opportunity to monitor the adaptive evolution of a respiratory RNA virus in near real-time on a global scale. It allowed us to monitor the virus while it developed the capacity to circulate in the human population and transmit among individuals with prior immunological exposure29,30.

A comprehensive understanding of the challenges posed by SARS-CoV-2, the virus responsible for the COVID-19 pandemic, requires close collaboration among virologists, bioinformaticians, clinicians, epidemiologists, and public health experts. At Robert Koch Institute, the national public health institute of Germany, an interdisciplinary team has continuously reviewed evidence on several critical aspects of COVID-19, including diagnostic and clinical features as well as viral evolution throughout the first years of the pandemic. This review outlines data obtained from 2020 to 2023 as a result of these concerted efforts, emphasizing evolutionary aspects, especially the rise of virus variants such as the Omicron complex and their impact from a public health perspective. It provides data to illustrate the ongoing need for robust surveillance systems to detect changes in respiratory virus circulation, guiding clinical and public health decisions.

SARS-CoV-2 variant nomenclature

Based on their mutation profiles, SARS-CoV-2 genetic variants are classified into clades as provided by efforts such as Nextclade31, or into lineages by systems such as Pangolin32, which also considers epidemiological criteria including localized and temporally clustered occurrences. The terms “lineage” and “variant” are used interchangeably and both refer to viruses which differ in genome sequence by several mutations. A “sublineage” is a lineage descendant, i.e., it harbors the same mutations as the “parental lineage” in addition to its distinct mutations. Eventually, all lineages (or variants) with their sublineages are placed in a large genealogy of SARS-CoV-2, which is maintained by the Pangolin community32.
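The parent/sublineage relationship described above can be illustrated with a short Python sketch. It assumes dot-separated lineage names and a one-entry alias table in which "BA" stands for B.1.1.529; the table and the function names `expand` and `is_sublineage` are illustrative and are not part of the Pangolin software.

```python
# Illustrative excerpt only: the real alias list is far longer.
ALIASES = {"BA": "B.1.1.529"}


def expand(lineage: str) -> str:
    """Replace a known alias prefix by its full name, e.g. BA.2 -> B.1.1.529.2."""
    prefix, _, rest = lineage.partition(".")
    full = ALIASES.get(prefix, prefix)
    return f"{full}.{rest}" if rest else full


def is_sublineage(child: str, parent: str) -> bool:
    """Return True if `child` is a strict descendant of `parent`."""
    c = expand(child).split(".")
    p = expand(parent).split(".")
    # A descendant has a longer name that starts with the full parent name.
    return len(c) > len(p) and c[: len(p)] == p


print(is_sublineage("BA.2.75", "BA.2"))    # BA.2.75 descends from BA.2
print(is_sublineage("BA.2", "B.1.1.529"))  # BA.2 descends from B.1.1.529
print(is_sublineage("BA.1", "BA.2"))       # siblings, not parent and child
```

Comparing expanded names level by level avoids false matches that a plain string-prefix test would produce (for example, "B.1.1.52" is not a parent of "B.1.1.529").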

The effects of specific mutations on the phenotypic properties of the virus, such as transmissibility, virulence, or immunogenicity, are subject to intense investigations, as a firm understanding will allow more reliable estimates of the risks posed by emerging variants [summarized in refs. 33,34]. If these properties of the virus change, then the term “strain” is used to differentiate this new variant from the others.

WHO classifies variants of SARS-CoV-2 based on their potential impact on global health and, thus, the degree of surveillance efforts required for each variant35,36. A Variant Under Monitoring (VUM) is a SARS-CoV-2 variant with genetic changes potentially affecting viral characteristics that shows early signs of growth (transmission) advantage, necessitating increased monitoring. A Variant of Interest (VOI) features genetic alterations that affect viral properties, demonstrating a growth advantage in multiple WHO regions or significant epidemiologic impact, signaling a threat to global health. The highest risk is assigned to a Variant of Concern (VOC), which not only meets VOI criteria but additionally leads to higher disease severity, affects COVID-19 epidemiology significantly, or markedly reduces vaccine protection against severe disease requiring major public health interventions.
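The tiered logic of these working definitions can be summarized as a small decision function. The following Python sketch is a deliberate simplification: the boolean fields and the `classify` function are hypothetical names chosen for illustration, and actual WHO designations rest on expert assessment rather than on a fixed rule set.

```python
from dataclasses import dataclass


@dataclass
class VariantEvidence:
    """Hypothetical evidence flags, paraphrasing the criteria in the text."""
    relevant_genetic_changes: bool        # changes that may affect viral characteristics
    early_growth_signal: bool             # early signs of a growth advantage
    growth_advantage_multi_region: bool   # growth advantage in multiple WHO regions
    significant_epi_impact: bool          # significant epidemiologic impact
    higher_severity: bool                 # leads to higher disease severity
    major_epi_change: bool                # affects COVID-19 epidemiology significantly
    reduced_protection_severe: bool       # markedly reduced vaccine protection against severe disease


def classify(e: VariantEvidence) -> str:
    """Return 'VOC', 'VOI', 'VUM', or 'unclassified' (simplified sketch)."""
    if not e.relevant_genetic_changes:
        return "unclassified"
    meets_voi = e.growth_advantage_multi_region or e.significant_epi_impact
    escalates = e.higher_severity or e.major_epi_change or e.reduced_protection_severe
    if meets_voi and escalates:
        return "VOC"  # VOI criteria plus at least one additional criterion
    if meets_voi:
        return "VOI"
    if e.early_growth_signal:
        return "VUM"
    return "unclassified"
```

The ordering of the checks mirrors the nesting of the definitions: a VOC must first satisfy the VOI criteria, so the VOI test is evaluated once and reused.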

Throughout the pandemic, the WHO has categorized five SARS-CoV-2 variants as VOCs: B.1.1.7, B.1.351, P.1, B.1.617.2, and B.1.1.52937,38. In line with the simplified WHO supplementary nomenclature, these are also called Alpha, Beta, Gamma, Delta, and Omicron according to Greek letters in the order of discovery35. The characteristics of the pre-Omicron VOCs and the Omicron complex are described in more detail in Boxes 1–4.

Until March 2023, all Omicron sublineages within the Omicron complex were classified as VOCs. However, this automatic inheritance of the parental VOC designation limited the ability to categorize specific Omicron sublineages and sub-sublineages as VOC/VOI/VUMs themselves. Therefore, this classification system did not provide the resolution needed to differentiate new and phenotypically distinct sublineages from other variants, including the parental Omicron lineages (e.g. BA.1, BA.2). To better track the evolution of SARS-CoV-2 variants, the WHO announced on March 15, 2023, that notable Omicron sublineages could be classified as VUM or VOI, rather than being grouped under the Omicron VOC label36. The original Omicron parent lineage (B.1.1.529) was de-escalated and is now classified as a “formerly circulating VOC” along with Alpha, Beta, Gamma, and Delta. Consequently, current Omicron variants lack the VOC label, even though they are not considered less dangerous than the B.1.1.529 parent lineage. The purpose of “resetting” the naming system was to enable appropriate naming and tracking of future sublineages that may be even more dangerous than the currently circulating ones. The declaration of a variant as VOC is still reserved for significant SARS-CoV-2 variants that are given a Greek letter name (e.g., Alpha, Delta, Omicron).

Mechanisms of SARS-CoV-2 evolution

Several factors, including the intrinsic mutation rate, virus and host biology, infection rates and selection pressures that have changed (and continue to vary) over the course of the pandemic, contribute to the evolution of the virus. Changes in the viral genome occur during the replication of RNA viruses in an infected host and can be transferred to the viral progeny, which are then subject to the immune selection pressure of their respective hosts. Although the RNA polymerase of coronaviruses has a rudimentary proofreading function that reduces such replication errors, SARS-CoV-2 can still accumulate a significant number of nucleotide polymorphisms. Additionally, recombination may occur when two genetically distinct viruses infect the same cell. However, within-host evolution of the virus does not directly translate into an observable between-host evolution39. Due to the short infectious period of SARS-CoV-2 (a few days)40, immunocompetent individuals typically transmit virus particles with minimal genetic variation compared to the initial infection. Outbreak genetic analyses have revealed that virus genomes obtained from transmission pairs (i.e., SARS-CoV-2 cases with an epidemiological link suggesting direct transmission) are identical for most pairs41,42,43. This attribute of SARS-CoV-2 biology contributes to the fact that the evolutionary rate of the virus is orders of magnitude lower than would be expected given its RNA polymerase error rate. However, high infection rates, or a prolonged course of infection may accelerate the overall rate of evolution despite the low rate of within-host evolution44. Importantly, infections that persist for months, occasionally observed in the context of immunocompromised individuals (so-called: long-shedders), may result in a vast array of mutant viruses with new capabilities45,46,47. 
While some of these highly evolved variants may be transmitted47, ongoing debate surrounds the extent to which long-term infections, particularly in immunocompromised patients, contribute to directed virus evolution46,48,49,50,51,52. This aspect remains a critical area of study in understanding the ability of the virus to adapt53.

Evolutionary rates and mutations

The evolutionary rate, or substitution rate, measures how fast detectable mutations accumulate in a viral population. Unlike the mutation rate, which includes all new mutations, the evolutionary rate focuses on those reaching significant frequencies44. Evolutionary rates vary among genes, in part due to heterogeneous selection pressures. Indeed, for SARS-CoV-2, evolutionary rates vary not only by genomic region but also by phase of the pandemic44. Fig. 2A provides insight into the evolutionary rates over different SARS-CoV-2 genomic regions, based on over 8.5 million viral genomes obtained between January 2020 and June 202354,55,56,57 (for more details see Supplementary Text T1). During the first year of the pandemic, only a few amino acid substitutions became fixed, indicating an overall low evolutionary rate. The two amino acid substitutions that did gain rapid predominance in early 2020, S:D614G and, as part of the same haplotype, RdRp P323L (encoded by ORF1a/b), were associated with more transmissions, potentially reflecting adaptation to the human host58. In subsequent years, with rising case and transmission numbers, many more amino acid substitutions have accumulated.

Fig. 2: Mutation frequencies in the SARS-CoV-2 genome throughout the pandemic and independent of lineage assignments, based on published sequence data.

Amino acid (aa) substitutions shown here have exceeded a frequency of 70% in at least one calendar week (see Supplementary Text T1 for details). A The global view is based on more than 8.5 million genomes from GISAID and illustrates the relative occurrence of mutations within selected viral genes over time. Note that genes, signified by the vertical bar to the left, are not depicted to scale; i.e., the density of amino acid substitutions accumulated over the spike gene (~3600 nt) is considerably higher than that accumulated over ORF1a/b (>15,000 nt). B Mutation frequencies within the spike gene, resulting in amino acid replacements during any given calendar week, are depicted for the German genomic surveillance data set; to avoid sampling bias, only those 360,800 genomes were included that had been marked as “randomly sampled” by the submitter (see Supplementary Methods for detailed explanation). The abundance of selected major lineages circulating in Germany between the end of 2020 and April 2023 is visualized at the top of the heatmap. Data used for this visualization is part of the German SARS-CoV-2 genomic surveillance data set collected from the German Electronic Sequence Data Hub (DESH)59, published at ref. 60.
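The selection criterion used for Fig. 2 (a substitution exceeding 70% frequency in at least one calendar week) can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline described in Supplementary Text T1: the input format, the function name `high_frequency_substitutions`, and the toy records are assumptions made for the example.

```python
from collections import defaultdict


def high_frequency_substitutions(records, threshold=0.7):
    """Return substitutions whose weekly frequency exceeds `threshold` at least once.

    `records` is an iterable of (calendar_week, substitutions) pairs,
    one pair per sequenced genome (assumed input format).
    """
    genomes_per_week = defaultdict(int)
    counts = defaultdict(lambda: defaultdict(int))
    for week, substitutions in records:
        genomes_per_week[week] += 1
        for sub in set(substitutions):  # count each substitution once per genome
            counts[week][sub] += 1

    selected = set()
    for week, per_sub in counts.items():
        for sub, n in per_sub.items():
            if n / genomes_per_week[week] > threshold:
                selected.add(sub)
    return selected


# Toy records invented for illustration; not real surveillance data.
toy_records = [
    ("2020-W10", {"S:D614G"}),
    ("2020-W10", set()),
    ("2020-W10", set()),
    ("2020-W20", {"S:D614G"}),
    ("2020-W20", {"S:D614G", "S:A222V"}),
    ("2020-W20", {"S:D614G"}),
    ("2020-W20", set()),
]
print(high_frequency_substitutions(toy_records))  # {'S:D614G'}
```

In the toy data, S:D614G reaches 3 of 4 genomes (75%) in the second week and is retained, whereas S:A222V stays at 25% and is filtered out.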

The viral gene accumulating the highest number of nonsynonymous mutations relative to its length was the spike gene. This implies that the evolutionary rate increased over the course of the pandemic and was highest over the spike gene. In other words, this SARS-CoV-2 gene evolved the fastest, particularly once widespread infections and vaccination campaigns led to the development of population immunity. A more detailed view into the rapid accumulation of spike amino acid substitutions is provided in Fig. 2B, which visualizes a portion of the German SARS-CoV-2 genomic surveillance dataset59,60. The observation that the spike gene has the highest evolutionary rate comes as no surprise: the spike protein is the main target of neutralizing antibodies induced by vaccines and infection. Mutations in the spike gene can confer resistance to neutralization by vaccine- or infection-induced immune responses (“immune evasion”, “immune escape”), facilitating breakthrough infections and transmissions from immunized individuals61. Higher immune selection pressure (due to the global increase in population immunity) promotes antigenic drift in the spike gene, enhancing immune escape. The high structural plasticity of the spike RBD is particularly conducive to the emergence of escape variants62. Moreover, spike amino acid changes can influence viral infectivity and replication capacity, e.g., by increasing ACE2 receptor affinity and the ability for ACE2-independent cell entry63,64. This contrasts with the loss of structural stability of the spike trimer associated with some amino acid exchanges65,66. The balance of these factors determines the evolution of the spike protein, a dynamic and complex process in which transmissibility and immune escape are the ultimate determinants of positive selection.

The Omicron variant, which reached global predominance shortly after its emergence in late November 202167, has continuously evolved since and is divided into many sublineages68. Despite their genetic diversity, these sublineages display convergent evolution in the spike protein (mutations independently acquired at the same positions), promoting the emergence of variants that are capable of evading population immunity69,70,71. Figure 3 depicts the spike proteins of selected Omicron sublineages, the index virus, and the pre-Omicron VOCs, highlighting the variant-specific spike amino acid changes.

Fig. 3: Variant-specific amino acid changes on the 3D structure of the spike (S) protein for the Index (Wuhan), Alpha, Beta, Gamma, Delta, and selected Omicron variants.

The receptor-binding domain (RBD) and N-terminal domain (NTD) locations are indicated. Red spots indicate non-synonymous mutations calculated with a mutation frequency of over 70% in genomes of the corresponding lineage based on 8.6 million GISAID samples (see Supplementary Text T1 for details). Here, the 3D structure of the S protein (PDB: 7SBO) was plotted using the PyMOL Molecular Graphics System, Version 2.0 Schrödinger, LLC.

By the end of 2022, the spreading dynamics of sublineages with new escape properties varied from region to region, reflecting the increasing complexity of the global immunity landscape. For example, the closely related Omicron-BA.4/BA.5 variants initially displayed slower spread in many European countries than in South Africa. A plausible explanation is that, unlike in South Africa, many people in Europe were infected with Omicron-BA.2 before the arrival of BA.4/5. Omicron-BA.2 displays high antigenic similarity with Omicron-BA.4/5. Therefore, after a pronounced Omicron-BA.2 wave, it can be assumed that population immunity against Omicron-BA.4/BA.5 reaches high levels. Thus, the course of waves with new (sub-)variants depends on which variants have dominated the preceding waves at a given time in a given region61.

While the Spike protein is arguably most relevant to the adaptive evolution of the virus in humans, mutations in other genomic regions are also selected (Fig. 2A and Supplementary Fig. S1). They may affect, for example, non-structural proteins such as NSP6, potentially counteracting the host’s innate immune response. However, the functional effects of these mutations remain to be fully characterized29,72,73,74.

Recombination

Recombination is the exchange of genetic material between genomes, for example, of different SARS-CoV-2 variants. This process is crucial to the evolution of the virus as it can increase genetic variation and thus lead to novel selection advantages. Recombination events occur quite frequently in betacoronaviruses75,76,77,78,79. A prerequisite is a simultaneous infection event of a host cell with two different viral variants. During replication, portions of the parental genetic material of the two variants combine with each other so that the progeny viruses carry hybrid genomes. Betacoronavirus recombinants may differ in phenotype from their parental lineages and may outperform them in terms of replicative fitness80. Therefore, detection of recombinant variants requires close monitoring in SARS-CoV-2 genomic surveillance. At the same time, it also poses technical challenges: samples with a true recombinant virus must be distinguished from patient material containing coinfecting viruses that have not recombined, from contaminations, and from sequencing or genome reconstruction errors81.

Because recombination requires coinfection of the same cell with two genetically distinct viruses, it is most likely to occur when multiple viral lineages co-circulate and when viral prevalence is high78. Given the ongoing genetic diversification of SARS-CoV-2 and its wide transmission in the population, the discovery of recombinant viruses has become increasingly common79,82. This is consistent with the co-circulation of different viral lineages (e.g., leading to recombination between Delta and Omicron, or different Omicron-BA.2 sublineages) and the intensity of genomic surveillance that enables such detections. Pangolin lineages starting with the letter “X” have been established for several recombinants and their offspring, for example, XD (Delta x Omicron-BA.1), XE (BA.1 x Omicron-BA.2), XF (Delta x Omicron-BA.1), and XBB (Omicron-BJ.1 x Omicron-BA.2.75*). However, in the Pangolin system each letter combination is limited to three virus generations, after which a new letter combination is assigned to subsequent descendant lineages32. Because letter selection is random and based on availability, tracing the relationships between lineages and sublineages can be complicated.
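One simple way to screen a genome for a recombinant signature is to label each lineage-defining site by the parent it matches and to look for switches along the genome. The Python sketch below illustrates this idea only; it is not the method used by any surveillance program, and the function name `parental_switches`, the marker positions, and the nucleotides are invented for the example.

```python
def parental_switches(sample, parent_a, parent_b):
    """Return genome intervals in which the apparent parent of `sample` changes.

    All arguments map genome positions to nucleotides; `parent_a` and
    `parent_b` cover the same sites, at which the two parents differ.
    """
    labels = []
    for pos in sorted(parent_a):
        base = sample.get(pos)
        if base == parent_a[pos]:
            labels.append((pos, "A"))
        elif base == parent_b[pos]:
            labels.append((pos, "B"))
        # Sites matching neither parent (or missing) are skipped as uninformative.

    # A switch is a pair of neighboring informative sites with different labels.
    return [
        (pos1, pos2)
        for (pos1, lab1), (pos2, lab2) in zip(labels, labels[1:])
        if lab1 != lab2
    ]


# Hypothetical marker sites, for illustration only.
parent_a = {1000: "A", 5000: "C", 12000: "G", 22000: "T", 27000: "A"}
parent_b = {1000: "G", 5000: "T", 12000: "A", 22000: "C", 27000: "G"}
sample = {1000: "A", 5000: "C", 12000: "G", 22000: "C", 27000: "G"}

print(parental_switches(sample, parent_a, parent_b))  # [(12000, 22000)]
```

A single switch is compatible with one breakpoint between the two flanking sites, and no switch indicates a non-recombinant genome. Such a consensus-level check cannot by itself exclude coinfection, contamination, or assembly errors, which typically require inspection of the underlying sequencing reads.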

Chronology of variant evolution

Low diversity and one adaptive change

The first mutation with a clear impact on the epidemiology of SARS-CoV-2 was an amino acid substitution in the spike protein (S) [S:D614G; i.e., aspartic acid (D) in position 614 is substituted by glycine (G)]. Viral variants harboring the D614G substitution were rare at the onset of the pandemic but expanded rapidly in early 2020, becoming globally dominant by March 202058 (Fig. 2A and Supplementary Figs. S1, S2). It was initially debated whether this sharp increase was due to a fitness advantage of 614G variants, leading to their natural selection, or to a founder effect, where 614G variants initiated, by sheer chance, most transmissions in multiple locations44,83. Subsequent comprehensive studies demonstrated an intrinsic transmission advantage, i.e., that the D614G substitution indeed represented an adaptive change that was naturally selected (see Box 1 for details and references). In January 2025, S:D614G was present in 99.2% of sequenced viruses in the international GISAID data.
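The substitution notation introduced above (gene, reference amino acid, position, new amino acid) is straightforward to parse programmatically. The following Python sketch shows one possible parser; the names `Substitution` and `parse_substitution` are chosen for illustration, and the pattern covers simple substitutions only, not deletions or insertions.

```python
import re
from typing import NamedTuple, Optional

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # one-letter codes of the 20 standard amino acids

# Optional "GENE:" prefix, then reference residue, position, and new residue.
_PATTERN = re.compile(
    rf"^(?:(?P<gene>[A-Za-z0-9]+):)?"
    rf"(?P<ref>[{AMINO_ACIDS}])(?P<pos>\d+)(?P<alt>[{AMINO_ACIDS}])$"
)


class Substitution(NamedTuple):
    gene: Optional[str]  # None if no gene prefix was given
    ref: str             # amino acid in the reference sequence
    position: int        # 1-based residue number
    alt: str             # amino acid in the variant


def parse_substitution(text: str) -> Substitution:
    """Parse strings such as 'S:D614G' or 'N501Y'."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a simple amino acid substitution: {text!r}")
    return Substitution(match["gene"], match["ref"], int(match["pos"]), match["alt"])


print(parse_substitution("S:D614G"))
# Substitution(gene='S', ref='D', position=614, alt='G')
```

Returning a named tuple keeps the parsed fields immutable and hashable, so parsed substitutions can be collected in sets and compared between lineages.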

Rise of the VOCs

Apart from the expansion of D614G, little sequence diversity and evolution were observed during the first year of the pandemic (Fig. 3). This was partly due to limited sequencing efforts, resulting in undersampling (Supplementary Fig. S3), but also because at that time a broad range of public health measures was in place to prevent the spread of SARS-CoV-2 through unvaccinated populations worldwide, resulting in overall low case numbers40,44,84,85,86. However, as cases rose and genomic surveillance efforts were increased at the end of 2020, several virus variants displaying high sequence divergence and clear signals of an epidemiologic growth advantage emerged. The WHO coined the term Variant of Concern (VOC) in 2020, then defined as a virus variant with altered phenotype characteristics demonstrated to adversely affect epidemiology (especially transmissibility), clinical presentation (especially pathogenicity), or the effectiveness of countermeasures such as diagnostics, vaccines, or therapeutics87.

The first VOCs to emerge were B.1.1.7 (later designated as Alpha), B.1.351 (Beta), and P.1 (Gamma). These variants were initially termed 501Y.V1, 501Y.V2, and 501Y.V3 because they shared the S:N501Y substitution, which increases affinity to the ACE2 receptor, thereby enhancing viral fitness88,89,90,91. Alpha displayed intrinsically higher transmissibility, whereas Beta and Gamma were immune evasive. The spread of Beta and Gamma remained largely constrained to world regions that had seen high transmission levels during the first year of the pandemic92,93,94. There, the ability to evade the broad population immunity established through rampant infections turned out to be an evolutionary advantage for SARS-CoV-2, opening the evolutionary niche of reinfection44. In many other world regions, Alpha, with a reproduction number exceeding that of the index virus by 50-100%95, quickly became the predominant variant, driving a major surge of infections, severe illness, and death96,97,98,99,100.

In 2021, Alpha was displaced by Delta, the fourth variant declared a VOC, which emerged in India. With an R0 of 6-7101, Delta was intrinsically more transmissible than Alpha. The S:L452R, T478K, and P681R substitutions, which influence the affinity and cleavability of the spike, contributed to Delta’s increased transmissibility (see Box 2). In addition, this VOC was moderately immune evasive, transmitting efficiently through hosts that had been previously infected. Delta’s higher replication rates and more pronounced capacity to induce syncytium formation in airway cells translated to greater severity, increasing hospitalization and ICU admission rates. Delta became the dominant variant worldwide in June 2021 (Fig. 2A). Notably, the national vaccination campaigns in some countries, such as Germany, contributed substantially to decelerating the nascent Delta wave emerging in the summer of 202159.

Emergence and dominance of Omicron marked a watershed moment in the pandemic

At the end of 2021, the emergence of a new lineage prompted a swift, global response, thereby demonstrating the effectiveness of the genomic surveillance systems that had been implemented during the pandemic. The variant B.1.1.529, subsequently designated as Omicron, rapidly increased in the South African genome surveillance, a piece of information immediately shared with public health agencies worldwide67. Due to the new variant’s concerning constellation of mutations and the pronounced growth (transmission) advantage observed in a country with a high level of population immunity, it was quickly declared as a new VOC102, prompting increased surveillance measures and risk assessment studies worldwide. This facilitated the rapid accumulation and synthesis of epidemiologic, clinical, and laboratory experimental data that enabled a comprehensive assessment of the growth advantage of Omicron and the mechanisms underpinning it. Neutralization assays showed that the immune evasion displayed by Omicron was profound (albeit not complete), and its magnitude resembled an antigenic shift in the risk assessment framework of influenza viruses103,104,105,106,107. Consistent with these wet-lab data, integrated genomic and epidemiologic surveillance data demonstrated a propensity for infection of immunologically experienced individuals who had either been vaccinated or previously infected108,109. Moreover, high-validity tissue culture models indicated that Omicron had altered its host cell tropism to infect cells in the upper respiratory tract preferentially, enabling intrinsically more efficient person-to-person transmission63,110. Over the next months, it became gradually apparent that clinical, epidemiological, and genomic data lined up with these ex vivo data, at least for immunologically healthy and vaccinated adults. Omicron infections were associated with lower severity than infections with the previous VOCs. 
Although case counts reached record levels due to the numerous transmissions, ICU admissions and deaths did not exceed the levels that many countries experienced during the preceding Delta waves. While efficient suppression of Omicron transmissions is possible and has been demonstrated, e.g., in health care settings111,112, the reports on milder illness, the increasing vaccination rates, the overall pandemic fatigue, and the perception that containment of Omicron was not feasible led several countries to ease public health restrictions, often at the peak of the first Omicron wave. Healthcare systems in those countries were strained severely but were not overwhelmed and many countries transitioned to “living with COVID” mitigation policies. Thus, the emergence of Omicron may be viewed as a watershed moment. While mitigation policies may imply the absence of any transmission control, several measures aimed to curb the spread of COVID-19 were maintained (i.e., “vaccine plus” strategies), based on the long-term morbidity associated with COVID-19, the unknown impact of repeated SARS-CoV-2 infections, and the fact that elderly and immunologically compromised individuals remain vulnerable to severe acute illness.

The Omicron complex

Omicron became globally dominant (frequency > 50% on GISAID) in January 2022 and has since then effectively displaced all other lineages. BA.2 was the first Omicron sublineage to predominate worldwide. BA.4 and BA.5 have identical spike proteins that, relative to BA.2, contain three amino acid changes (L452R, F486V, and R493Q) and one deletion (del 69-70). L452R, F486V, and del 69-70 enhance infectivity, while L452R and F486V increase resistance to neutralizing antibodies. R493Q, a reversion, is thought to restore receptor affinity, which is compromised by F486V in some experimental models. The combination of these mutations led to an effective transmission advantage for BA.5, which became globally dominant in summer 2022113,114,115.

The BA.5 wave was succeeded by a “variant soup” of co-circulating Omicron sublineages, with regional differences in prevalence. These sublineages convergently acquired mutations at key residues in the spike receptor-binding domain (RBD), including R346, L452, N460, and F486. Mutations like L452R and F486S enhanced immune evasion, including resistance to monoclonal antibodies such as Bebtelovimab and Tixagevimab, while R346T and N460K improved ACE2 binding and infectivity113,115,116,117,118,119. Thus, by altering the antigenicity and ACE2 affinity of the spike, these adaptations increased the effective fitness (R) of Omicron sublineages. Among them, BQ.1.1 and XBB emerged as the fittest, co-circulating globally until the end of 2022, when BQ.1.1 was outcompeted by XBB.1.5. Compared to its parental XBB lineage, XBB.1.5 carries the F486P substitution, which confers similar immune evasion and increased ACE2 affinity82,115,120. Further information on Omicron sublineages that emerged later is provided in Supplementary Table S1.

Over 56% of SARS-CoV-2 lineages and sublineages with sequence data available on GISAID (accessed January 29, 2025) descend from the parental Omicron lineages BA.1, BA.2, BA.3, BA.4, and BA.5 (including their sublineages), constituting the Omicron complex. Lineages within the Omicron complex continue to genetically diversify in an incremental adaptive process marked by convergent evolution of advantageous mutations. Many of these mutations are located on the spike and enable escape from the humoral immunity induced by previous variants, including earlier Omicron variants70. This contrasts with the “saltatory” evolution observed with the historic emergence of VOCs. On the other hand, SARS-CoV-2 is known to persistently replicate in individuals with chronic infection46,50,121,122. Moreover, Delta and other de-escalated VOCs continue to circulate in animal reservoirs123. Thus, highly divergent lineages may continue to evolve and may remain undetected by genomic surveillance for some time. A conceivable scenario is that one such lineage acquires characteristics that enhance human-to-human transmission, giving rise to a new VOC.

Genomic surveillance in the context of public health activities

Genomic surveillance has played an essential role in tracking SARS-CoV-2 since the onset of the pandemic. It has enabled a deeper understanding of the virus’s spread and evolution and has thus been fundamental in shaping effective public health measures and interventions43,59,67,92,93,95,124,125,126,127,128. Over the course of the pandemic, many countries implemented extensive testing and sequencing programs that facilitated the identification of cases and genomic surveillance at high temporal and spatial resolution. However, since WHO declared on 5 May 2023 that COVID-19 is an established and ongoing health issue which no longer constitutes a public health emergency of international concern (PHEIC)129, these activities have declined noticeably. Medium-scale genomic surveillance systems often operate at a lower resolution while still capturing essential aspects of the viral population dynamics and infection patterns. This can result in delayed detection of virus variants with unique phenotypes, primarily due to the limitations imposed by smaller sample sizes. Thus, SARS-CoV-2 genomic surveillance will in future be scaled to a less comprehensive but flexible setup, aiming to balance the needs and costs of genomic surveillance against the burden of the disease. It is important to note that the virus continues to evolve in a population with widespread immunity, often without causing severe illness130. However, accurately assessing the disease burden is challenging due to the evolution of SARS-CoV-2 and the factors influencing its spread44,62. The most effective surveillance integrates epidemiological and clinical patient data with viral genetic information in a timely manner.
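The effect of smaller sample sizes on detection can be illustrated with a simple calculation. The following minimal Python sketch assumes that sequenced genomes are an independent random sample of all infections, which real surveillance data only approximate. The function names and the 1% example prevalence are illustrative and are not taken from any surveillance system.

```python
import math


def detection_probability(prevalence: float, n_sequences: int) -> float:
    """Probability that at least one of n randomly sampled genomes
    belongs to a variant circulating at the given prevalence (0..1)."""
    if not 0.0 <= prevalence <= 1.0:
        raise ValueError("prevalence must be between 0 and 1")
    return 1.0 - (1.0 - prevalence) ** n_sequences


def sequences_needed(prevalence: float, confidence: float = 0.95) -> int:
    """Smallest number of randomly sampled genomes for which the variant
    is detected at least once with the requested probability."""
    if not 0.0 < prevalence < 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("prevalence and confidence must lie strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - prevalence))


if __name__ == "__main__":
    # Hypothetical scenario: a variant accounts for 1% of infections.
    for n in (50, 100, 500):
        print(n, round(detection_probability(0.01, n), 3))
    print(sequences_needed(0.01, 0.95))
```

Under these assumptions, a variant at 1% prevalence is found with about 63% probability among 100 sequenced genomes, and roughly 300 genomes per sampling period are needed to detect it with 95% probability.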
The genomic surveillance system still in place in Germany, operational since 2020, utilizes a nationwide network of laboratories (IMSSC2) and centralized sequencing at Robert Koch Institute; virus genomic data is to be complemented with clinical-epidemiological data provided by local health authorities59. In addition, SARS-CoV-2 is being integrated within the existing surveillance systems for acute respiratory infections with high public health impact (influenza, RSV): syndromic and virological surveillance is conducted through Germany’s national sentinel system, which covers both Severe Acute Respiratory Infection (SARI) and ambulatory Acute Respiratory Infections (ARI), providing a comprehensive view of acute respiratory illnesses85,131,132,133. These crucial surveillance instruments provide opportunities to correlate clinical symptoms or severity with different virus types and to identify and potentially isolate viruses with unusual phenotypic characteristics134. Thus, they are key in identifying and characterizing virus variants with distinctive traits, such as pronounced immune evasion, increased virulence, or reduced vaccine effectiveness. Facilitating prompt and focused research into the virulence and public health implications of newly emerging variants using these tools will be pivotal in guiding data-driven public health strategies. This includes making informed decisions about vaccine formulations and other intervention measures to effectively combat emerging viral threats, an area in which such surveillance systems can make a strong contribution. These surveillance tools typically focus on a representative selection of symptomatic individuals. In addition, the COVID-19 pandemic has provided an opportunity to enhance these well-established monitoring systems with wastewater surveillance.
Despite several experimental and bioinformatic challenges, wastewater surveillance is considered an overall promising complementary surveillance tool, as it provides virus information on asymptomatic infections and may enable early trend detection and population-scale monitoring135. However, this instrument usually does not allow for virus isolation or assessment of viral phenotypes. As a complementary system, wastewater surveillance represents a broad approach to track viral prevalence and trends across communities.

Variant risk assessment by integration of virological, clinical-epidemiological, and genomic data

For public health purposes, assessing the risk posed by a new variant is of the utmost importance.

Whenever a new variant shows evidence of faster spread, i.e., an epidemiological growth advantage, this may be due to viral characteristics that favor transmission, i.e., a true transmission advantage, or to chance136. The public health risk emanating from the variant requires careful evaluation, the extent of which depends on the certainty of the observed growth advantage36. In the first stage, risk assessment involves confirmation of faster spread in different geographic regions as well as bioinformatic analysis, especially of the spike region, for mutations that might affect transmission efficiency61. Sequence analyses should be complemented by laboratory experimental characterization of the new variant’s virological phenotype. These experiments, performed in selected in vitro and in vivo models, assess the new variant’s growth characteristics, immunoevasive properties, and pathogenicity (virulence) under tightly controlled conditions. In addition, high-throughput assays have been developed to define epitopes of neutralizing antibodies more precisely and to quantify the phenotypic effects of virtually any mutation of the spike protein, including those not yet observed in circulating variants137,138. Epidemiological analyses of the clinical severity of the associated illness are another essential part of variant risk assessment; combined with the growth advantage estimate, they help to evaluate the risk of healthcare systems being strained or even overwhelmed by the new variant. Risk assessment analyses are most accurate when they are based on integrated clinical and genomic data126,127,139,140.
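As an illustration of how an epidemiological growth advantage might be quantified from sequencing data, the following minimal Python sketch fits a straight line to the log-odds of the variant share over time. This is one simple approach among several and is not the method of any study cited here. The function names are hypothetical, the ordinary least-squares fit ignores sampling uncertainty (which is exactly where chance effects at low counts enter), and the conversion to a relative reproduction number assumes a fixed generation time that must be supplied by the user.

```python
import math
from typing import Sequence


def growth_advantage(days: Sequence[float],
                     variant_counts: Sequence[float],
                     total_counts: Sequence[float]) -> float:
    """Least-squares slope (per day) of logit(variant share) versus time.

    Time points at which the share is 0 or 1 are skipped because the
    logit is undefined there.
    """
    xs, ys = [], []
    for day, variant, total in zip(days, variant_counts, total_counts):
        if 0 < variant < total:
            share = variant / total
            xs.append(day)
            ys.append(math.log(share / (1.0 - share)))
    if len(xs) < 2:
        raise ValueError("need at least two time points with 0 < share < 1")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("time points must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def relative_reproduction_number(slope_per_day: float,
                                 generation_time_days: float) -> float:
    """Approximate ratio of reproduction numbers (variant vs. resident),
    assuming a fixed generation time (a simplifying assumption)."""
    return math.exp(slope_per_day * generation_time_days)


if __name__ == "__main__":
    # Hypothetical weekly counts: variant genomes among all sequenced genomes.
    slope = growth_advantage([0, 7, 14, 21], [12, 25, 48, 90], [400, 410, 395, 405])
    print(round(slope, 3), round(relative_reproduction_number(slope, 5.0), 2))
```

For example, a slope of 0.1 per day combined with an assumed generation time of 5 days would correspond to a roughly 1.65-fold relative reproduction number. Confirming a similar slope in several geographic regions, as described above, helps to separate a true transmission advantage from chance.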

Clinical presentation

Acute illness

Symptomatic acute SARS-CoV-2 infection manifests with nonspecific symptoms after an incubation period of 4.2 days on average for the Omicron variant, which is shorter than for infections with the Delta variant141,142. While the phase of active virus replication and shedding in the respiratory tract is very short in most patients, severely immunocompromised patients may exhibit prolonged viral persistence143,144.

The clinical presentation varies widely, ranging from asymptomatic or mild upper respiratory symptoms (predominantly) to interstitial pneumonia, which may take a severe clinical course and be complicated by Acute Respiratory Distress Syndrome (ARDS)145. Symptoms of a mild SARS-CoV-2 infection are non-specific (e.g., headache, fever and myalgia, sore throat, rhinitis) and are similar to those of other respiratory viral infections146,147. However, the leading symptoms appear to be somewhat related to the virus variant. For example, a sore throat is more likely to be observed in Omicron infection than in Delta infection141. A relatively specific COVID-19 symptom is an impairment of smell or taste148,149. However, the data on the association of this symptom with virus variants are often contradictory between studies. While some studies report that changes in the sense of smell and taste occurred more frequently with the Delta variant than with the Alpha variant, other studies suggest that this was a more typical symptom of the Alpha variant150,151,152. It seems that, in addition to the increase in the frequency of sore throats, a decrease in smell or taste impairment is also typical in those infected with Omicron150,153. The association of headache with infection also appeared to weaken as the virus evolved and was no longer significant with Omicron154,155,156. The proportion of asymptomatic infections and of non-severe COVID-19 cases is also reported to be significantly higher for Omicron than for Delta157; in addition to lower intrinsic virulence of Omicron, this trend may also be influenced by the generally higher population immunity present during the Omicron era.

However, all the aforementioned differences reported in some studies do not allow a reliable diagnosis of COVID-19 or even differentiation of virus variants based on the symptom constellation. The only way to identify a case with certainty is to test for specific antigens or RNA. To date, all variants have been detectable with the established diagnostic assays.

Severe COVID-19 manifests initially by cough and hypoxaemia due to interstitial pneumonia158 and may result in sepsis-like symptoms and organ failure. This is consistent with the widespread expression of the ACE2 receptor in numerous human tissues. SARS-CoV-2 can infect cells in various organ systems beyond the respiratory tract, resulting in a broad spectrum of sometimes severe extrapulmonary manifestations and complications159,160,161,162. Underlying pathomechanisms include: (i) cytolysis, i.e., direct damage to host cells by the replicating virus, (ii) a dysregulated, exuberant immune response that can lead to a life-threatening cytokine storm163, (iii) organ-specific inflammatory responses159,164,165,166,167, and (iv) endothelial damage which may be associated with dysregulation of the renin-angiotensin system and may cause, e.g., thrombo-embolic complications168,169. In addition to thromboembolic complications, other cardiovascular manifestations such as myocarditis, arrhythmias or myocardial infarction can also occur as a result of SARS-CoV-2 infection170,171,172. Acute kidney failure and several neurological manifestations such as stroke are also observed173,174,175,176,177. The involvement of other organ systems can lead to a complete picture of multi-organ failure and determine the outcome178,179.

During the initial phase of the pandemic, severe courses of COVID-19 were observed in a significant number of cases, even among young and previously healthy individuals, providing a rationale for implementing society-level measures to curb the spread of SARS-CoV-2. A meta-analysis showed that infections with the Delta variant had the highest severity. However, the hospitalization rate (but not the need for intensive care or the fatality rate) was higher for the Beta variant180. Currently, severe illness primarily affects individuals who are immunocompromised or have other predisposing conditions that put them at risk for severe illness, including advanced age181. This shift can be attributed to at least two factors: (i) most individuals have acquired immunity against the virus, protecting them not necessarily against asymptomatic or symptomatic infection but against a severe clinical course182,183,184, and (ii) practically all infections are now caused by Omicron variants, which are generally considered to be of lower intrinsic pathogenicity than previously circulating variants27,110,185,186,187, even though this has been controversially discussed188,189,190,191. Thus, although widespread vaccination has successfully reduced the burden of severe COVID-19 disease in the general population, additional efforts will continuously be needed to protect the vulnerable.

A rare clinical manifestation observed in children following COVID-19 is Pediatric Inflammatory Multisystem Syndrome (PIMS), also known as Multisystem Inflammatory Syndrome in Children (MIS-C). This post-acute complication is associated with a dysregulated immune response after acute SARS-CoV-2 infection. It is hypothesized that this condition may be due to viral persistence leading to excessive activation of T-cells192,193. More recent data suggest that during the Omicron wave, both the frequency and severity of MIS-C have decreased, which may be due to the properties of Omicron or a potential protective effect of vaccination194,195,196,197.

Late complications

Long COVID and post-COVID condition

Most COVID-19 patients recover within several days to weeks after infection. However, a significant number of individuals report various persistent or new physical or neurocognitive symptoms, even after an initial recovery from the acute SARS-CoV-2 infection. These include fatigue, exercise intolerance, malaise, dyspnea, orthostatic dysregulation, and neurocognitive dysfunction198,199,200,201. These sequelae can be prolonged, experienced as severely debilitating, and negatively impact daily functioning and quality of life. If such symptoms persist or recur and cannot be otherwise explained, they are referred to as Long COVID (beginning four weeks after acute infection) and post-COVID-19 condition (beginning 12 weeks after acute infection)202.

During or after COVID-19, neurocognitive symptoms may develop, including “brain fog”, memory loss, impaired consciousness, and confusion (so-called “neuro-COVID”)203,204,205,206. Whether these clinical sequelae are related to the structural brain changes observed radiographically after COVID-19207,208,209 has not yet been conclusively determined.

The available data may indicate that infection with the Omicron variant leads to fewer long-lasting COVID symptoms compared to earlier variants. Several hypotheses have been proposed to explain the decline in the incidence of long-term effects of COVID-19 in subsequent SARS-CoV-2 variants, including the increase in pre-existing immunity in the population over the course of the pandemic or changes in the pathogenicity of the virus210,211. The influence of reinfection as well as the vaccination status and timing must also be considered when interpreting the data210,211. However, the risk of post-COVID syndrome remains significant even in vaccinated individuals who have contracted SARS-CoV-2 in the Omicron era. Fatigue appears to be the most common post-COVID-19 symptom regardless of the SARS-CoV-2 variant212.

Other long-term sequelae

In addition, there is evidence that organ-specific complications213 and new-onset chronic non-communicable diseases214 may occur as long-term consequences of SARS-CoV-2 infection, even in individuals who were vaccinated and/or did not experience severe illness214,215,216,217. Emerging data suggest an increased risk for cardiovascular events, such as stroke, heart attack, arrhythmia, myocarditis, and heart failure218, as well as diabetes219,220, renal failure221, and psychiatric disorders222,223. The risk of post-acute COVID-19 sequelae has been shown to be substantial even among vaccinated individuals who were infected during the period of Omicron predominance211.

Relationship of viral mutations to clinical presentation and severity

Pathogen genetic variation can impact virulence, which is an important determinant of clinical severity. Due to the significant influence of many other factors, including host immunity / vaccination status, age, and individual predisposition including the vascular system, it is challenging to assess pathogen virulence (or changes thereof) based on clinical signs and severity even when disease severity may be compared for SARS-CoV-2 variants co-circulating in a given population during the same time period72. It is similarly difficult, if not more so, to assess the impact of specific mutations on SARS-CoV-2 virulence based on severity data alone. Nevertheless, several amino acid substitutions that may affect acute disease severity have been identified based on clinical-epidemiological evidence.

A significant correlation with a higher viral load was found for the substitutions R203K and G204R in the N protein and S:D614G. However, although higher viral load is considered predictive of morbidity and mortality224,225, a statistically significant association could not be proven for these mutations226. A 382-nucleotide deletion that truncates open reading frame 7b (ORF7b) and eliminates ORF8 correlates with reduced clinical severity227. Notably, any ORF8 knockout appears to be associated with milder illness228. The P25L substitution in ORF3a has the potential to contribute to immune evasion and enhanced virulence, and has been linked to higher case fatality rates226. For the nt14408 mutation in RdRp, a higher single-nucleotide variant frequency was observed in severe cases226. The 11,083 G > U mutation has been associated with asymptomatic cases226,227. For mutations in ORF1ab and in the N gene, an association with asymptomatic outcomes has been described, found to be particularly strong for the co-occurring ORF1ab substitutions R6997P and V30L226. For mutations in NSP6, as well as other nonstructural proteins, an association with adverse clinical outcomes has been described229.

The differences between the variants in terms of acute clinical presentation and severity have prompted numerous investigations using in vitro and animal models to elucidate the role of specific mutations. The severe disease outcomes of the Delta variant are attributed to its enhanced replicative capacity and increased syncytium formation (fusogenicity) in the lung. Key mutations driving these features have been identified in the spike protein and include the L452R and P681R substitutions230 (see Box 2). In contrast, the reduced pathogenicity of Omicron variants is explained by their propensity for upper (rather than lower) respiratory tract replication and diminished syncytium formation; these features have been linked primarily to the Omicron spike. Mutations near the furin cleavage site (P681H, H655Y, and N679K) and the S1 C-terminus (T547K and H655Y) render the Omicron spike less fusogenic and less efficiently cleaved, suggesting it is a key determinant of the attenuated phenotype28,110,231,232. While most investigations have focused on spike protein mutations, a growing body of evidence suggests that non-spike mutations also impact SARS-CoV-2 virulence226,227,229. This includes animal experimental data indicating that mutations in NSP14, envelope, and membrane proteins reduce the neuropathogenicity of Omicron233, that NSP6 mutations modulate Omicron virulence74,231, and that ORF8 mutations modulate lung inflammation as well as disease severity231,234.

With respect to the symptomatology of Long COVID, no single SARS-CoV-2 mutation has been definitively tied to specific Long COVID manifestations. However, some but not all studies indicate that variant-specific trends exist (reviewed in ref. 235): Omicron variants have been associated more frequently with gastrointestinal and musculoskeletal complaints and less frequently with cardiopulmonary and neuropsychiatric symptoms than pre-Omicron variants, although these findings are not entirely consistent in the literature235,236,237,238,239,240.

Antiviral therapeutic options and resistance

Protease and polymerase inhibitors are employed as early direct antiviral therapy. Monoclonal antibodies, widely used for antiviral therapy during the first two years of the pandemic, currently have a limited role in the treatment and prevention of COVID-19. This is because all viruses in circulation are Omicron variants, which are known for their significant immune evasion, leading to a decrease in and even complete loss of the effectiveness of the licensed monoclonal antibodies (all of which were developed based on pre-Omicron variants)104,241,242,243,244,245. For pre-exposure prophylaxis, a new antibody has been approved, which is effective against XBB.1.5, XBB.1.16, XBB.2.3 and BA.2.86, but not against VOCs carrying the F456L substitution246. In addition to antivirals, anti-inflammatory and immunomodulatory agents, such as corticosteroids and IL-6 receptor antagonists, are available for the treatment of moderate to severe COVID-19 pneumonia requiring respiratory support.

Direct antiviral therapy inhibits viral replication and should be administered in the early phase of infection, also known as the “replicative phase”. Unlike monoclonal antibodies, which target the viral spike protein, these antivirals are not impacted by the presence of spike gene mutations. However, antiviral resistance may evolve during treatment: random transcription errors during viral RNA synthesis may result in inhibitor-specific point mutations in the therapeutic target proteins, which are selected if the virus replicates under continuous therapeutic pressure, thus giving rise to antiviral-resistant variants. Notably, a single amino acid substitution may suffice to render the virus less sensitive or resistant to an antiviral. As of today, four years after the first antiviral compounds were approved for COVID-19 therapy, resistance mutations are rare, occur mostly in laboratory settings, and have a worldwide prevalence of less than 1%247,248,249. Box 5 provides detailed information on the direct antiviral therapeutics currently approved for treating COVID-19.

Conclusions and perspectives

This article summarizes relevant data accumulated during the first four years (2020 – 2023) of the COVID-19 pandemic in the context of a “Working Group for Diagnostic and Evolution” on the causative agent, SARS-CoV-2, at Robert Koch Institute, the German Public Health Institute. The aim is to underline the importance of virologic surveillance from a public health perspective and with respect to future pandemic preparedness. The insights and data presented, while rooted in the German context, reflect broader virological and public health principles that are applicable across countries. The knowledge gained from these experiences can inform global strategies for surveillance and pandemic preparedness.

Thanks to substantial advances in sequencing of infective agents and the expansion of global data sharing initiatives, the scientific community has been able to detect and track the evolution of a newly introduced respiratory virus at unprecedented resolution and speed. However, it is still not possible to make reliable predictions about the long-term evolution of a new agent like SARS-CoV-2, which may continue to follow drift-like changes within the Omicron complex, but which could also be impacted by shift-like events, leading to new Variants of Concern (VOCs) that emerge, for example, from persistently infected humans or animal reservoirs. The genetic diversity of viruses, and hence also of SARS-CoV-2, increases with the number of infections in both human and animal populations. This heightened genetic diversity benefits the virus, enabling rapid adaptation and the emergence of VOCs. Significant shifts resulting in VOCs have historically originated from chronically infected, immunocompromised human hosts44,49,121,250,251. Whether future SARS-CoV-2 VOCs will be derived primarily from Omicron or from phylogenetically divergent lineages, which may continue to evolve not only in animal reservoirs but also, potentially, in chronically infected humans, is currently uncertain44,62. The virulence of such future VOCs also cannot be predicted62. Notably, virulence does not necessarily correlate with transmissibility, which is the key driver of positive selection for the virus.

As the virus continues to evolve, ongoing surveillance and research are crucial to understanding the impact of both spike and non-spike mutations on disease severity and clinical presentation. Research is ongoing to determine the impact of non-spike mutations on viral phenotype and pathogenicity. These mutations may affect viral replication, evasion of the innate immune response, and tissue tropism, potentially influencing disease outcomes. The specific mechanisms by which many of these mutations affect pathogenicity are still being investigated. Identification of key mutations linked to virologic properties will facilitate a more comprehensive understanding of the mechanisms of infection and aid in the development of antiviral treatments252.

Currently, SARS-CoV-2 circulation appears to trigger epidemiological surges at a higher frequency than that of influenza waves. This is likely due to rapid viral evolution and antigenic drift, indicating that SARS-CoV-2 has not yet evolved a clear pattern of seasonality. It is uncertain if and when the frequency of these waves will decrease in the future as population immunity becomes broader and more robust. It is similarly unclear whether or when the S protein of the virus will reach its evolutionary limits and constraints.

Therefore, it remains challenging to predict the evolutionary trajectory of SARS-CoV-2, including its potential for rapid adaptation and the emergence of new VOCs. Continuous in-depth surveillance that integrates genomic, clinical, epidemiologic, and virologic data obtained from various sources, including humans, animals, and wastewater samples, enables fact-based risk assessment. Such surveillance is a public health imperative and may inform timely interventions and the adaptation of vaccine formulations.