Abstract
As genomic surveillance is key to detecting novel respiratory viruses or variants, the highly unequal global distribution of respiratory virus sequencing infrastructure raises concerns about preparedness for future threats. Using mathematical models and global epidemic simulations, we demonstrate that attaining a global minimum sequencing capacity of two sequences per million people per week at fortnightly sequencing regularity could reduce the time to first detection of novel respiratory (variant) viruses by weeks to months compared to global sequencing efforts during the COVID-19 pandemic, even with a substantially reduced number of viruses sequenced globally. Establishing this minimum global capacity could increase the time between the virus’ first global detection and the first domestic case in all countries, universally improving prospects for mitigation of potential public health impacts. Importantly, these benefits cannot be attained by siloed expansion in countries that already possess strong capacity. Our results show that operationalizing global health solidarity is key to guiding investment in health security.
Introduction
Genomic surveillance of respiratory viruses is a critical component of public health preparedness and response, particularly for identifying and monitoring the spread of new viruses and their variants1,2,3. The sooner respiratory viruses or variants (such as potential zoonotic influenza viruses or genetically divergent SARS-CoV-2 lineages) are detected, the more time is available to characterize the threat posed and design and implement interventions and mitigation strategies such as vaccines, therapeutics and diagnostics1,4,5,6. The COVID-19 pandemic represented the zenith of global respiratory virus sequencing output so far, with ~7 million SARS-CoV-2 genomes submitted to GISAID7 in 2022 alone. However, this output was highly unequally distributed8. In 2022, countries comprising only 4.4% of the global population accounted for over half of all publicly shared genomes. In contrast, the bottom half of the global population by sequencing rate accounted for only 0.7% of genomes (Fig. 1a, Supplementary Fig. 1). Given that novel respiratory viruses or variants can emerge in any country, the unequal distribution of sequencing infrastructure may strongly limit the global capacity to rapidly detect novel respiratory virus threats.
Fig. 1. a Distribution of country-specific weekly sequencing rates per million people by continent estimated from GISAID metadata (n = 199) (AF Africa, AS Asia, EU Europe, NA North America, OC Oceania, SA South America). b Distribution of median country-specific time from sample collection to sequence deposition in GISAID (n = 199). c The distribution of days to variant detection for different values of variant Re in global metapopulation model simulations, each with a distinct scenario of variant emergence (n = 10,000 for each variant Re). Vertical lines correspond to the median and 95% CI. d The simulated distribution of the number of global variant infections by the day of first variant detection. e The simulated probability that the variant is first detected in its origin continent, by origin continent. f The simulated time to variant detection by variant origin continent (AF: n = 1793, AS: n = 5946, EU: n = 934, NA: n = 739, OC: n = 54, SA: n = 534). Thin and thick lines correspond to 95% and 50% CIs, respectively. Points correspond to means. g The simulated number of global variant infections by the day of detection by variant origin continent, analogous to f. h The relationship between a country’s sequencing rate and the mean time to first global detection of a variant emerging in that country in metapopulation simulations (n = 160 for each variant Re). Lines correspond to LOESS fits by variant Re.
To guide efforts toward improved health security, it is important to understand how the global landscape of genomic surveillance capabilities impacts the ability to swiftly identify new respiratory viruses and their variants. Furthermore, strategic planning for enhanced surveillance requires meaningful minimum sequencing targets as well as functional upper bounds for effective and efficient detection of new (variant) viruses2,3,8,9,10,11. Using large-scale epidemic simulations, we investigated the performance of different, varyingly balanced, global distributions of genomic surveillance infrastructure for detection of novel respiratory viruses or variants. We used 2022 SARS-CoV-2 sequencing output as baseline, representing an empirical pandemic scenario with unprecedentedly high but very unequally distributed levels of virus genomic sequencing.
Results
Global variation in pandemic-period variant detection capacity
To investigate how global variation in genomic surveillance capacity impacts the speed of new variant detection, we first investigated the performance of global genomic surveillance efforts for SARS-CoV-2 in 2022, representing an empirical baseline expectation for a potential future pandemic scenario. Sequencing output in 2022 was highly unequally distributed: country-specific sequencing rates estimated from submissions to GISAID7 ranged from <0.01 sequences per million people per week (S/M/wk) in some countries to >1000 S/M/wk in others (Fig. 1a). The median sequencing rate across European countries amounted to 64.3 S/M/wk, compared to 0.18 S/M/wk for countries in Africa. Similarly, the median time from sample collection to deposition in GISAID (henceforth, turnaround time) ranged across countries from less than a week to hundreds of days (Fig. 1b). To understand how this variation impacts potential global detection capacity, we used a global metapopulation model, validated against GLEAM12,13 (Supplementary Fig. 2), to simulate hypothetical scenarios of global variant spread and subsequent detection. We performed 10,000 independent simulations for values of variant Re ranging from 1.2 to 2. We assumed a distinct archetypal scenario of variant emergence, characterized by initial Re and prevalence of wildtype virus, for each value of variant Re (Supplementary Fig. 3). In each simulation, we assumed the country where the variant emerged was randomly selected based on a population size-weighted probability. We then simulated the time to first variant detection for each metapopulation epidemic simulation, given empirical country-specific SARS-CoV-2 sequencing rates and turnaround times in 2022 (Fig. 1a, b).
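The population-weighted choice of origin country described above can be sketched in a few lines. This is only an illustration: the country names, the population figures and the function name `sample_origin_countries` are hypothetical and are not taken from the study's data or code.

```python
import random

# Hypothetical populations in millions; placeholders, not the study's inputs.
POPULATIONS = {
    "Country A": 1400.0,
    "Country B": 330.0,
    "Country C": 60.0,
    "Country D": 5.0,
}

def sample_origin_countries(populations, n_simulations, seed=0):
    """Draw one origin country per simulation with probability proportional to population."""
    rng = random.Random(seed)
    names = list(populations)
    weights = [populations[name] for name in names]
    return rng.choices(names, weights=weights, k=n_simulations)

origins = sample_origin_countries(POPULATIONS, n_simulations=10_000)
share_a = origins.count("Country A") / len(origins)  # close to 1400 / 1795
```

Under this scheme, populous countries seed most simulated emergence events, which is the behavior the population size-weighted assumption implies.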
Averaged across simulated variant Re values, the mean time to first variant detection globally based on genomic surveillance was 83.0 days (95% CI 18–193), with substantial variability especially at lower values of variant Re (Fig. 1c). The simulated global number of variant infections by the day of first global detection varied widely (mean 652,172 infections, 95% CI 79–5,991,409), spanning up to five orders of magnitude for all values of variant Re (Fig. 1d). In many simulations, new variants were first detected outside of their continent of origin, driven especially by variants first emerging in Africa (first detected outside origin continent in 74.4% of simulations), Asia (24.2%) and South America (19.4%) (Fig. 1e). This means that the variant would have frequently spread widely within and between continents prior to initial detection. The continent in which the variant first emerged strongly shaped the time to variant detection (Fig. 1f) and the number of global variant infections by the day of first detection (Fig. 1g), the latter ranging from a mean of 23,988 infections (95% CI 26–254,787) when emerging in Europe to 1,841,440 infections (95% CI 1246–14,949,651) in case of emergence in Africa across simulated values of variant Re. Differences in time to variant detection were strongly and highly nonlinearly associated with the sequencing capacity in the country of origin of the novel virus, with low sequencing rates being associated with longer times to variant detection (Fig. 1h).
A target global sequencing capacity
Globally, there is a shared risk of the emergence of pandemic viruses or their variants. In contrast, the results above indicate the capacity to rapidly detect novel respiratory viruses or variants is highly asymmetrically distributed. We sought to investigate how the effectiveness and efficiency of global respiratory virus genomic surveillance depend on how surveillance infrastructure is distributed globally. To do so, we first sought to identify a minimum sequencing capacity at the national level that could serve as a target toward improving global capacity for rapid global (variant) virus detection. Ideally, this target would ensure timely information for public health action and efficient use of potentially limited resources and be realistically attainable and sustainable in pandemic and inter-pandemic periods.
To identify a target capacity, we explored the relationship between sequencing rate, turnaround time, and time to variant detection in any single country in more detail. Representing a scenario of emergence of a potential future pandemic respiratory virus or variant, we simulated the emergence of a variant virus in the background of circulating wildtype virus and computed the expected time to variant detection based on binomial sampling for different sequencing rates. We then derived a new mathematical model characterizing the relationship between sequencing rates and time to detection of the new virus variant.
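A minimal simulation sketch of the binomial sampling step, assuming daily sequencing batches, a logistic rise in variant frequency and no turnaround delay. All names and parameter values (`f0`, `s`, `n_per_day`) are illustrative assumptions rather than the study's settings.

```python
import math
import random

def variant_frequency(t, f0, s):
    """Logistic variant frequency on day t, starting at f0 with growth rate s per day."""
    return 1.0 / (1.0 + (1.0 - f0) / f0 * math.exp(-s * t))

def simulate_detection_day(n_per_day, f0, s, rng, max_days=3650):
    """First day on which at least one of n_per_day sequenced samples is the variant."""
    for day in range(max_days):
        f = variant_frequency(day, f0, s)
        # P(at least one variant among n binomial draws) = 1 - (1 - f)^n
        if rng.random() < 1.0 - (1.0 - f) ** n_per_day:
            return day
    return max_days  # not detected within the simulated horizon

def detection_day_at_confidence(n_per_day, f0, s, confidence=0.95, reps=2000, seed=1):
    """Day by which the variant was detected in a `confidence` share of replicates."""
    rng = random.Random(seed)
    days = sorted(simulate_detection_day(n_per_day, f0, s, rng) for _ in range(reps))
    return days[math.ceil(confidence * reps) - 1]

# Hypothetical scenario: compare a low and a high daily sequencing volume.
low = detection_day_at_confidence(n_per_day=1, f0=1e-4, s=0.1)
high = detection_day_at_confidence(n_per_day=100, f0=1e-4, s=0.1)
```

Comparing `low` and `high` shows how the detection day at 95% confidence shifts with sequencing volume for one assumed emergence scenario.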
For a variant virus, introduced in a population at an initial frequency \(f_0\), where the change in variant proportion through time can be described by a logistic growth rate s, the time since variant introduction when the variant virus is expected to have been detected with confidence level 1-q when sequencing n samples per unit time is equal to \(\ln\left[(q^{-s/n} - 1)/f_0 + 1\right]/s\) (Supplementary Fig. 4). Because in this equation the epidemiological dynamics of wildtype and variant are captured only by \(f_0\), the initial prevalence of variant relative to wildtype, and s, which reflects the difference in epidemic growth rate between variant and wildtype, it is broadly applicable to respiratory viruses that can be described by SIR dynamics14. By varying \(f_0\) and s, the model can be tailored to the relevant epidemiological background. For example, the model indicates that when background prevalence is low (higher \(f_0\)), fewer sequences are necessary for the same detection speed. On the other hand, when wildtype incidence is increasing, more sequences might be necessary to rapidly detect a simultaneously emerging variant (because a higher wildtype growth rate results in a smaller growth rate difference between wildtype and variant, i.e., a smaller s, which leads to slower detection). Because our study models passive surveillance, our results apply to, for example, the potential emergence of a pandemic influenza virus during the course of a seasonal influenza epidemic or the emergence of a novel SARS-CoV-2 variant, but not necessarily to new spillover pathogens for which the detection process could be fundamentally different.
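The closed-form expression above translates directly into a short function. The sketch below assumes that n and s share the same time unit (for example, sequences per day and growth rate per day); the function name and the overflow guard are additions for illustration.

```python
import math

def time_to_detection(n, f0, s, q=0.05):
    """Time since introduction at which the variant is detected with confidence 1 - q.

    n: sequences per unit time, f0: initial variant frequency,
    s: logistic growth rate per unit time (same unit as n).
    """
    if n <= 0 or s <= 0 or not 0 < f0 < 1 or not 0 < q < 1:
        raise ValueError("expected n > 0, s > 0, 0 < f0 < 1 and 0 < q < 1")
    a = -(s / n) * math.log(q)  # equals ln(q^(-s/n)); always positive
    if a > 50:
        # exp(a) dwarfs 1 here, so ln[(e^a - 1)/f0 + 1] ~= a - ln(f0); avoids overflow
        return (a - math.log(f0)) / s
    return math.log1p(math.expm1(a) / f0) / s

# Hypothetical inputs: 10 sequences per day, f0 = 1e-4, s = 0.1 per day.
t_95 = time_to_detection(n=10, f0=1e-4, s=0.1)
```

Substituting the result back gives (n/s)·ln(1 + f0·(e^{st} − 1)) = −ln q, which is a convenient check that the transcription is consistent with the formula.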
For all modeled scenarios of variant emergence (Supplementary Fig. 3), the returns on increases in sequencing rate were rapidly diminishing: time to variant detection rapidly decreased as sequencing rate increased up to ∼10 S/M/wk while the benefits of increases in sequencing rate beyond 10 S/M/wk were much smaller (Fig. 2a, Supplementary Fig. 5a). In 2022, many high-income countries sequenced SARS-CoV-2 genomes at rates well in excess of 10 S/M/wk. In contrast, many lower- and middle-income countries sequenced at the relatively low rates where small absolute increases would substantially speed up variant detection (Fig. 2b, Supplementary Fig. 5a). For example, in a country of 100 million people sequencing at the median 2022 SARS-CoV-2 sequencing rate in low-income countries (0.035 S/M/wk), increasing the sequencing rate by 1 S/M/wk would reduce the time to detection of a variant with Re = 1.6 at 95% confidence by ∼28 days, given a wildtype prevalence of 0.5% and a wildtype Re of 1.1 at time of variant emergence. In contrast, if the same country was sequencing at the 2022 median high-income country rate (58.9 S/M/wk), the reduction in time to detection resulting from the same 1 S/M/wk increase in sequencing rate would be only 3.5 hours (Fig. 2b, Supplementary Fig. 5b).
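The diminishing returns can be illustrated numerically by converting S/M/wk into sequences per day and evaluating the detection-time expression before and after adding 1 S/M/wk. The values of `F0` and `S` below are hypothetical placeholders, not the study's scenario parameters, so the output will not reproduce the figures quoted above; only the contrast between low and high baseline rates is the point.

```python
import math

def days_to_detection(seqs_per_day, f0, s, q=0.05):
    """Detection time (days) at confidence 1 - q; assumes s / seqs_per_day is moderate."""
    a = -(s / seqs_per_day) * math.log(q)
    return math.log1p(math.expm1(a) / f0) / s

def to_seqs_per_day(rate_s_m_wk, population_millions):
    """Convert sequences per million people per week into sequences per day."""
    return rate_s_m_wk * population_millions / 7.0

def gain_from_one_more(rate_s_m_wk, population_millions, f0, s):
    """Days of detection time saved by adding 1 S/M/wk to the given rate."""
    before = days_to_detection(to_seqs_per_day(rate_s_m_wk, population_millions), f0, s)
    after = days_to_detection(to_seqs_per_day(rate_s_m_wk + 1.0, population_millions), f0, s)
    return before - after

F0, S = 1e-4, 0.1  # hypothetical initial frequency and daily logistic growth rate
gain_low = gain_from_one_more(0.035, 100.0, F0, S)   # median low-income rate from the text
gain_high = gain_from_one_more(58.9, 100.0, F0, S)   # median high-income rate from the text
```

With these placeholder parameters, `gain_low` comes out in days to weeks while `gain_high` is a fraction of a day, mirroring the qualitative pattern described in the paragraph.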
Fig. 2. a Relationship between sequencing rate and the expected number of days until the variant will have been detected with 95% confidence. The small black tick marks on the x-axes in this plot and in b and c show country-specific SARS-CoV-2 sequencing rates for 2022. Vertical dotted lines correspond to the median SARS-CoV-2 sequencing rates for high-income (HIC) and low-income (LIC) countries in 2022. In all panels, lines are colored by values of variant Re, with a distinct scenario of variant emergence for each value of variant Re; sequencing turnaround time was assumed to be 14 days. Vertical grey lines indicate 2 S/M/wk and 30 S/M/wk, respectively. b Relationship between sequencing rate and the reduction in the expected number of days until the variant will have been detected with 95% confidence that results from increasing the sequencing rate on the x-axis by 1 S/M/wk. c Relationship between sequencing rate and the expected number of variant infections by the day the variant will have been detected with 95% confidence. d Relationship between a reduction in turnaround time (in days) and the fold increase in sequencing rate that would be required to effect the same reduction in time to detection if turnaround time was kept constant.
The vast disparities between low- and high-income countries in the return on increases in sequencing rate are particularly prominent when looking at the expected number of variant infections by the variant’s day of detection (Fig. 2c, Supplementary Fig. 6a). Assuming a 14-day turnaround time, increasing the sequencing rate in a country sequencing at the median low-income country rate by 1 S/M/wk would reduce the expected number of variant infections by the time of detection with 95% confidence by ∼4.5 million infections for the scenario of variant emergence described above; in a country sequencing at the median high-income rate, the same 1 S/M/wk increase would only reduce the expected number of variant infections by the day of first detection by ∼60 infections (Fig. 2c, Supplementary Fig. 6b).
In addition to sequencing rate, turnaround time is an essential component of effective genomic surveillance2,8,15,16. For reducing time to variant detection, any reduction in turnaround time is functionally equivalent to a fold increase in sequencing rate (Fig. 2d). Reductions in turnaround time are especially valuable for the detection of variant viruses that are highly transmissible. For example, for the archetypal variant with Re = 2, a three-week reduction in turnaround time is equivalent to an 89.0-fold increase in sequencing rate (Fig. 2d). Hence, the benefits of increasing sequencing output should be carefully weighed against the gains from strengthening the ancillary infrastructure necessary for timely availability of sequencing results.
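One way to see why a turnaround reduction maps onto a fold increase in sequencing rate is to invert the detection-time expression for n. The sketch below does this under the assumption that turnaround time simply adds to detection time; the parameter values are hypothetical and the function names are illustrative.

```python
import math

def days_to_detection(n, f0, s, q=0.05):
    """Detection time (days) at confidence 1 - q for n sequences per day."""
    a = -(s / n) * math.log(q)
    return math.log1p(math.expm1(a) / f0) / s

def equivalent_fold_increase(n, f0, s, delta_days, q=0.05):
    """Fold increase in n that shortens detection by delta_days at fixed turnaround."""
    t_target = days_to_detection(n, f0, s, q) - delta_days
    if t_target <= 0:
        raise ValueError("delta_days exceeds the detection time at the current rate")
    # Invert for n: q^(-s/n') = 1 + f0 * (exp(s * t_target) - 1)
    n_new = -s * math.log(q) / math.log1p(f0 * math.expm1(s * t_target))
    return n_new / n

# Hypothetical scenario: 100 sequences/day, f0 = 1e-6, s = 0.2/day, 21 days saved.
fold = equivalent_fold_increase(n=100, f0=1e-6, s=0.2, delta_days=21)
approx = math.exp(0.2 * 21)  # small-f0, large-n approximation: fold ~= exp(s * delta)
```

When f0 and s/n are small, the required fold increase is approximately exp(s·Δ), so it grows exponentially with both the variant's growth advantage and the days saved. Read through that approximation, an 89.0-fold equivalence for a 21-day reduction would correspond to s of roughly 0.21 per day; this is a back-of-envelope reading, not a value reported in the text.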
Given the identified relationship between sequencing rate, turnaround time, and time to detection, we propose a sequencing capacity of 2 S/M/wk with a fortnightly sequencing regularity (i.e., samples are collated in order to be sequenced once every fourteen days) as a potential global target (Fig. 2a, vertical gray line). Its position at the elbow of the relationship between sequencing rate and time to detection (Fig. 2a) suggests that 2 S/M/wk is efficient, and its rapid variant detection even when a highly transmissible variant emerges in the background of high wildtype prevalence suggests that it results in strong performance. We chose a relatively rapid sequencing regularity of fourteen days given the vital importance of turnaround time in shaping time to detection.
Global sequencing capacity improves global surveillance effectiveness and efficiency
Next, we re-simulated the global (variant) virus detection process in the scenario where all countries possessed the proposed global target capacity of at least 2 S/M/wk with fortnightly sequencing regularity. Establishing the target capacity globally while keeping sequencing output unchanged for countries that already satisfied this capacity in 2022 (henceforth, strategy A) reduced mean time to global variant detection by 31.2 days to 51.7 days (95% CI 15–113) relative to the simulated 2022 baseline (red bar, Fig. 3a). The mean number of global variant infections by the day of detection decreased from 652,172 infections (95% CI 79–5,991,409) to 12,697 infections (95% CI 56–77,162) (red bar, Fig. 3b), and the probability that the variant was first detected in its origin continent increased from 70.7 to 97.5% (red cross, Fig. 3c). This demonstrates that advancing global genomic surveillance capacity would yield substantial improvements to the ability to rapidly detect new respiratory viruses or variants. A sequencing rate of 2 S/M/wk corresponds to 0.18% of the maximum country-specific SARS-CoV-2 sequencing rate in 2022. If all countries sequencing at rates lower than 2 S/M/wk in 2022 were to attain this minimum capacity, the de novo generated sequencing capacity would represent 6.0% of global sequencing output in 2022.
Fig. 3. a Comparison of time to variant detection for different global strategies for the global distribution of genomic surveillance infrastructure. Each value of variant Re corresponds to a distinct scenario of variant emergence (n = 10,000 replicate simulations for each). Thin and thick lines correspond to 95% and 50% CIs, respectively. Points correspond to means. b The cumulative number of global variant infections by the day of variant detection by strategy, analogous to a. c The probability that the variant is first detected in its origin continent, by strategy. d Comparison of the mean time between the first detection of the variant globally, and the first local within-country infection, by strategy, for individual countries, averaged across values of variant Re (n = 195 for each strategy). Each point corresponds to a country, colored by continent (AF Africa, EU Europe, OC Oceania, AS Asia, NA North America, SA South America). Boxplots show the median, first and third quartiles, and minimum and maximum values.
In contrast, since reductions in time to detection resulting from increases in sequencing rate beyond ∼10 S/M/wk (Fig. 2b) are limited, simulations suggested that reducing sequencing rates in the countries with the highest sequencing rates had little detrimental effect on time to variant detection relative to the 2022 baseline. Setting a 30 S/M/wk upper limit in all countries relative to the 2022 baseline but no minimum requirement (henceforth, strategy B) left the expected time to first global variant detection (green bar, Fig. 3a) and the expected number of variant infections by the day of detection (green bar, Fig. 3b) largely unchanged: mean time to variant detection increased by only 4.5 days, from mean 83.0 days to 87.5 days (95% CI 25–202) (green bar, Fig. 3a), while global sequencing output was reduced by 67.0%. We chose 30 S/M/wk because this sequencing rate lies firmly in the region of diminishing returns on increases in sequencing rate (Fig. 2a, vertical grey line).
Together, these results suggest that the greatest potential for improvements in the capacity to rapidly detect new viruses, by far, lies in those countries where existing capacities are most limited. In our simulations, ensuring a minimum global capacity of 2 S/M/wk, while also setting a 30 S/M/wk upper limit (henceforth, strategy C), improved time to variant detection by weeks while still reducing sequencing output by 61.0% relative to the 2022 pandemic baseline (blue bar, Fig. 3a, Supplementary Fig. 7). Advancing global surveillance capacity could allow for substantial improvements in the ability to rapidly detect new viruses globally, even with considerably fewer total viruses sequenced.
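Strategies A, B and C amount to applying a floor, a cap, or both to each country's sequencing rate. The sketch below shows that bookkeeping on a made-up set of four countries; the rates and populations are hypothetical, so the resulting output changes are not comparable to the percentages reported above.

```python
def apply_strategy(rate, floor=None, cap=None):
    """Weekly sequencing rate (S/M/wk) after applying an optional floor and cap."""
    if floor is not None:
        rate = max(rate, floor)
    if cap is not None:
        rate = min(rate, cap)
    return rate

# Hypothetical countries: (baseline rate in S/M/wk, population in millions).
COUNTRIES = {
    "Country A": (0.035, 100.0),
    "Country B": (1.5, 50.0),
    "Country C": (12.0, 20.0),
    "Country D": (400.0, 10.0),
}

STRATEGIES = {
    "baseline": {},
    "A": {"floor": 2.0},               # minimum capacity only
    "B": {"cap": 30.0},                # upper limit only
    "C": {"floor": 2.0, "cap": 30.0},  # minimum capacity plus upper limit
}

def weekly_output(countries, **limits):
    """Total sequences per week across all countries under the given limits."""
    return sum(apply_strategy(rate, **limits) * pop for rate, pop in countries.values())

outputs = {name: weekly_output(COUNTRIES, **limits) for name, limits in STRATEGIES.items()}
relative_change = {name: out / outputs["baseline"] - 1.0 for name, out in outputs.items()}
```

In this toy example the floor adds relatively few sequences while the cap removes many, which is why a combined strategy can raise minimum capacity everywhere and still lower total output.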
Global genomic surveillance coverage improves opportunities for mitigation globally
To investigate how establishing a global minimum capacity could affect public health preparedness, we computed the mean lead time between the (variant) virus’ first global detection and the first domestic case for all countries under the different strategies. As the first global detection represents a potential starting point for the design and implementation of local public health responses, this lead time between global detection and local arrival provides a measure for individual countries of the time horizon for public health measures that aim to mitigate potential impacts. In all countries, the lead time increased when the global target capacity was implemented (Fig. 3d), potentially allowing for more time to implement public health measures in preparation for variant outbreaks or nascent pandemics.
Strikingly, under more balanced surveillance strategies some countries’ lead time would increase even if their local sequencing rate decreased. For example, for the archetypal variant virus with Re = 1.6, the mean time between first global detection and arrival in the United States (US) was −8.4 days under the 2022 baseline, suggesting that on average, the variant would already be present in the US by the time it was first detected globally. Under strategy C (global capacity + 30 S/M/wk limit), the lead time in the US increased by three weeks to 12.2 days, despite the 30 S/M/wk limit leading to a reduction in the local sequencing rate in the US. This example indicates how advancing global surveillance capacity could yield public health benefits that are fundamentally unattainable through expansions of local surveillance in countries that already possess strong capacity. The increases in lead time were greater for lower values of Re; for example, for the archetypal variant with Re = 1.3, the mean lead time increased from +115.0 to +150.3 days in Rwanda, +82.3 to +117.7 days in Kazakhstan, +47.1 to +83.5 days in Indonesia, and +21.9 to +57.3 days in the United Kingdom, for strategy C relative to the 2022 baseline.
Local genomic surveillance outperforms travel-based genomic surveillance strategies
Our analyses suggest a sequencing capacity of 2 S/M/wk at fortnightly regularity could yield strong performance for the detection of novel viruses and variants. However, in some countries substantial hurdles might impede the establishment of local capacity. This raises questions about the possibility of alternative modalities of global surveillance, such as configurations in which countries with higher sequencing rates act as ‘sentinels’ for countries that lack sufficient local capacity. Motivated by recent interest in travel hub-based surveillance strategies17,18,19,20, we simulated a scenario of airport surveillance, where regular testing and sequencing at international transport hubs could facilitate detection of variant viruses emerging in locations with highly limited local capacity. We derived a mathematical model for the time until the variant would be detected after a variant-infected person travels to and is sampled in a sentinel country, given the variant’s daily epidemic growth rate r, the daily rate p of travel from the country of emergence to the sentinel country, and the ascertainment rate \(\alpha\) that reflects which proportion of infections would be detected in the sentinel country. We used this model to compute the time until a variant emerging in each country with a sequencing rate <0.1 S/M/wk in 2022 would be detected if the country with a 2022 sequencing rate >10 S/M/wk to which the former country was best connected acted as a sentinel by performing routine surveillance of incoming travelers. We used the travel rates from the mobility data to inform p for each emergence-sentinel country pair.
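The closed form of the travel-based model is not given in this section, so the following is only one plausible formulation built from the same three parameters, not the authors' derivation. It assumes that variant infections grow as exp(r·t) from a single infection, that each infected person travels to the sentinel country at daily rate p, that a proportion α of arriving infections is ascertained, and that ascertained arrivals follow a Poisson process.

```python
import math

def travel_detection_time(r, p, alpha, q=0.05):
    """Days until at least one exported infection is ascertained with confidence 1 - q.

    Sketch assumptions (not from the source): exponential growth exp(r * t) from one
    infection, per-person daily travel rate p, ascertainment proportion alpha,
    Poisson-distributed ascertained arrivals.
    """
    if r <= 0 or p <= 0 or not 0 < alpha <= 1 or not 0 < q < 1:
        raise ValueError("expected r > 0, p > 0, 0 < alpha <= 1 and 0 < q < 1")
    # Expected ascertained arrivals by time t: p * alpha * (exp(r * t) - 1) / r.
    # Setting exp(-expected) = q and solving for t gives:
    return math.log1p(-r * math.log(q) / (p * alpha)) / r

# Hypothetical inputs: r = 0.17/day, p = 1e-5 per person per day.
t_alpha_10 = travel_detection_time(r=0.17, p=1e-5, alpha=0.1)
t_alpha_01 = travel_detection_time(r=0.17, p=1e-5, alpha=0.01)
```

In this sketch, a tenfold drop in α delays detection by close to ln(10)/r days whenever p·α is small, so the penalty for poor ascertainment is largest for slowly growing variants.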
For the archetypal scenario with variant Re = 2, the expected time until the variant would have been detected in the sentinel country with 95% confidence was 71.9 days (IQR 61.8–82.0) for \(\alpha\) = 0.1 and 85.2 days (IQR 75.1–95.3) for \(\alpha\) = 0.01 across the countries with 2022 sequencing rate <0.1 S/M/wk. In contrast, under local sequencing with fortnightly regularity in the country of origin at 0.2 S/M/wk and 0.5 S/M/wk, respectively, the expected time to detection with 95% confidence was 64 and 59 days (Supplementary Fig. 9). This indicates that for this scenario of variant emergence, local capacity with rapid turnaround even at a relatively low sequencing rate would result in faster detection than a travel-based surveillance strategy. The differences are especially pronounced at lower values of variant Re: for the archetypal variant with Re = 1.3, the expected time to detection through travel surveillance was 207.4 days (IQR 178.3–236.6) for \(\alpha\) = 0.1 and 246.0 days (IQR 216.8–275.1) for \(\alpha\) = 0.01. In contrast, expected detection times at 95% confidence under local surveillance in the origin country were only 164 and 150 days, respectively, for 0.2 and 0.5 S/M/wk. Across simulated scenarios of variant emergence, we found that the variant viruses would generally be detected through local surveillance sooner than or close in time to detection at the travel hub, even if local surveillance was performed at rates much lower than the proposed capacity of 2 S/M/wk (Supplementary Fig. 9). This indicates that travel-based surveillance strategies come with substantial penalties to surveillance effectiveness if implemented in lieu of local sequencing capacity when local capacity, even at highly limited sequencing rates, is itself realistically attainable.
Initial detection is a necessary starting point for responses to newly emerging respiratory viruses or their variants. However, additional information beyond simple detection is often necessary to characterize the public health risk that a (variant) virus poses. For example, the SARS-CoV-2 Alpha variant was first detected in the UK in a sample collected on 20 September 2020, likely within days of its initial emergence21. However, it was not until December 2020 that epidemiological evidence of the variant’s transmission advantage relative to pre-existing viruses began to accumulate21,22. To that end, we also investigated how the global distribution of sequencing output affects the time elapsed until the variant would have been estimated to account for a substantial proportion of circulating virus, suggestive of a potential transmission advantage. In our simulations, the time until estimated variant frequencies, in at least one country, provided evidence with 95% confidence that the variant had reached 1% circulating frequency, decreased from 112.5 days (95% CI 38–247) for the 2022 baseline to 95.5 days (95% CI 29–197) for strategy C (Supplementary Fig. 10a). Correspondingly, the mean number of global infections by that day decreased from 1,567,276 (95% CI 2992–16,229,752) to 64,704 (95% CI 3085–420,032) (Supplementary Fig. 10b). In contrast, capping sequencing rates at 30 S/M/wk (strategy B) increased the mean time until the variant was established to have reached 1% circulating frequency somewhere globally by only 0.15 days relative to the 2022 baseline (Supplementary Fig. 10a). Mathematical models indicate that a sequencing capacity of 2 S/M/wk would ensure robust ascertainment of variant prevalence (Supplementary Fig. 11).
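The criterion of having 95% confidence that the variant has reached 1% circulating frequency can be illustrated with an exact binomial tail test. The estimator used in the study is not specified in this section, so the sketch below is an assumed stand-in: it finds the smallest number of variant sequences among n that would be unlikely (probability at most q) if the true frequency were exactly at the threshold.

```python
import math

def upper_tail(n, k, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return 1.0 - sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k))

def min_variant_count(n, threshold=0.01, q=0.05):
    """Smallest variant count among n sequences giving evidence, at confidence 1 - q,
    that the circulating frequency exceeds `threshold`. Returns None if unattainable."""
    for k in range(n + 1):
        if upper_tail(n, k, threshold) <= q:
            return k
    return None

# Hypothetical batch of 100 sequences.
needed = min_variant_count(100)  # → 4
```

With 100 sequences, three variant genomes would still be compatible with a 1% frequency at this confidence level, while four would not.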
Discussion
Our results indicate that advancing global genomic surveillance by instituting a sustainable global sequencing capacity could strongly improve preparedness for potential future respiratory virus threats, reducing the expected time to detection of new viruses by weeks to months. Initial detection and sequencing are necessary first steps in assessing and responding to the threat posed by novel viruses and underlie the design and deployment of countermeasures such as vaccines, diagnostics, and therapeutics5,23. By detecting novel viruses sooner, thereby increasing the time horizon for global and local public health measures that aim to mitigate viral threats’ potential impacts, robust global surveillance capacity could yield improvements to health security for all countries globally. Importantly, siloed surveillance efforts in countries that already possess strong capacity cannot yield the same benefits; rather, enhanced basic global capacity is the key to improving respiratory virus outbreak preparedness through genomic surveillance prior to and during potential future pandemic scenarios.
Our analyses are primarily focused on detecting novel (variant) viruses and tracking their spread. Hence, our arguments weighing high-intensity local surveillance capacity against the development of basic global capacity do not consider ancillary benefits of high-intensity genomic surveillance in high-income settings, such as characterization of local transmission dynamics. However, we stress the fundamental immediate importance of basic global capacity for initial detection, such as in the context of vaccine development and deployment, where speed-ups of a few weeks could have substantial public health impacts globally24. Nevertheless, further information beyond simple detection is generally required to characterize the threat posed by a novel virus. The need for epidemiological signal of enhanced transmission to accumulate likely imposes a further limit on the utility of sequencing at extremely high rates for the purpose of detecting novel viruses. This is in addition to the mathematically quantified diminishing returns on increases in sequencing rate, where the reductions in time to detection per S/M/wk added become very small at higher sequencing rates.
We note that given the proposed capacity, the optimal sequencing rate and its balance with turnaround time depend on the characteristics of the pathogen, the epidemiological background in which the variant emerges, and the required timeliness of sequencing data for public health action; for example, when levels of wildtype respiratory virus circulation are very low, relatively fewer sequences might yield a better balance between surveillance performance and resource use. The proposed target capacity aims to balance the resources necessary for surveillance in periods of seasonal circulation of respiratory pathogens with the capacity to rapidly detect and scale up capacity during potential pandemic scenarios. The model presented in this study simulates passive genomic surveillance for detection of novel viruses, and we use this detection process to generate a proposed capacity. As such, our results do not necessarily apply to alternative detection processes. For example, novel viruses or variants could also be detected through reactionary sequencing, e.g., upon clinical or epidemiological signal of a novel threat. While such detection processes are not explicitly modeled in our study, the fundamental principle holds that robust global genomic sequencing capacity is necessary to ensure rapid detection. While our study is focused on respiratory viruses and their variants, efficient implementation of genomics will likely leverage multi-pathogen approaches25. Similarly, while our modeling results provide a principled target that balances resource use and performance, the optimal design, including the balance of sequencing rate and turnaround time, will likely differ from country to country depending on local constraints and priorities25,26. Our study informs effective and efficient passive surveillance of respiratory pathogens, but this provides only part of the information required to design robust surveillance systems tailored to national needs.
The crucial role of air travel in driving the global dissemination of respiratory pathogens has led to the proposition of alternative modalities of genomic surveillance that leverage hubs of movement in the global transportation network to detect novel threats, e.g., through wastewater surveillance at airports17,18,19. Our results suggest that even at low sequencing rates (e.g., 0.2 S/M/wk), local genomic surveillance with rapid turnaround will generally outperform or have similar performance to travel-based surveillance strategies where well-connected and well-resourced localities act as surveillance proxies for countries with limited local capacity, sampling incoming travelers with high intensity. Crucially, the need to wait for exported cases represents an insuperable limit to the performance of travel-based strategies. A further advantage of local capacity over travel hub-based surveillance strategies is that local genomic surveillance capacity can be integrated with further epidemiological and clinical signal available locally to characterize the threat posed. Even if a novel virus was detected at a travel hub, surveillance infrastructure in the virus’ country of origin would likely still be necessary to acquire sufficient information to inform public health responses. We note that if substantial barriers preclude the development of endogenous sequencing capacity, referral of samples to an external sequencing partner is a potential alternative16. Overall, we argue that establishing sustainable local surveillance capacity, even at sequencing rates much lower than the proposed global capacity, would spur improvements to local public health infrastructure that yield better prospects for responses to novel threats than reliance on genomic surveillance strategies centered around global transportation hubs.
As such, while travel-based surveillance strategies are appealing due to their ability to leverage global connectivity for detection, an exclusive focus on such strategies could incur significant opportunity costs if it comes at the expense of development of local capacity globally.
In our model we assumed representative sampling in the genomic surveillance process, including the ready availability and access to diagnostic tools, which does not always hold in reality27,28. As the departure from this assumption is especially strong in resource-constrained settings15,27, the reported reductions in time to variant detection resulting from the establishment of a global minimum sequencing capacity are likely underestimated. Furthermore, our global simulations assumed the probability that a variant emerges in a particular country is proportional to its population size. We note that in reality, this simplifying assumption will not necessarily hold; in fact, novel viruses are likely most prone to emerge in those regions that are most poorly surveilled29. Crucially, this suggests that the benefits of advancing global capacity are even larger than estimated in our study. We did not model the spatial distribution of the minimum capacity within each country. The spatial distribution of sequencing capacity and the structure of sample referral networks interact with turnaround time to shape surveillance performance and affect the optimal country-level implementation of surveillance networks25. Our results underscore the importance of turnaround time in shaping the effectiveness and public health utility of surveillance efforts15,16, and particularly its balance with sequencing rate. Our model is not applicable to (variant) viruses with no effect or a detrimental effect on transmissibility, such as those that only result in increased disease severity or reduced sensitivity of diagnostics.
Our study provides quantitative evidence of how advancing global surveillance capacity provides a rational basis for improving preparedness but does not prescribe how this should be effectuated. Importantly, international solidarity represents a potential guiding principle. Solidarity gives guidance to human action in the face of interdependency related to the shared risks of communicable disease, and underlies the obligations reflected in the International Health Regulations (IHR)30,31; as a principle, solidarity specifically underlies institutionalized forms of sharing as a result of mutual dependence32,33. Internationally solidaristic efforts at the multilateral or regional level could play an essential role if barriers preclude countries’ independent establishment of sequencing capacity. Specifically, key areas where enhanced international cooperation could play a role in addressing barriers to the implementation of genomic capacity include the training and retention of skilled workforce and procurement and supply chains for sequencing equipment and reagents15,25,26,34,35,36,37,38,39. To achieve the long-term advancement of global genomics capacity, coherent capacity-building is necessary, which requires sustainable, diversified financing which minimizes dependence on a single funding source while aligning well with national needs35,36.
While solidarity is a helpful principle to guide policy in the case of an interdependence of risks given a particular context, reaching health equity or health justice will require overcoming substantial challenges40,41,42,43,44,45. Even with the capacity to detect new viruses sooner, the capacity to respond is also distributed asymmetrically. Even if the proposed models help undergird global health solidarity, the benefits of more rapidly available vaccines or better-matched vaccine updates due to timelier detection will only extend to countries with access to these benefits40,41,42,46. Furthermore, rapidly detecting and sharing information concerning new (variant) viruses must not paradoxically disadvantage countries that do so. Open sharing of pathogen genomic data must operate within a system of fair access and benefit-sharing to achieve its intended public health purpose without exacerbating global health inequity43,46,47. We note that critically considering these issues is essential, too, for the design of intrinsically international surveillance systems, such as a travel hub-based surveillance network.
The COVID-19 pandemic led to an unprecedented expansion of sequencing capacity globally. Some of the most consequential gains were made in resource-limited settings15,16, and it is essential that such gains are maintained and where necessary expanded to maximize preparedness for future threats48. Our results suggest that enhancing global genomic surveillance infrastructure offers a path toward improved responses to respiratory virus threats, even for countries with strong existing national surveillance capacities. Given that pandemic risk is globally shared, a global outlook on pandemic preparedness is essential to improving global and local health security.
Methods
Sequence metadata analysis
We downloaded metadata corresponding to all SARS-CoV-2 genomes in the GISAID7 database with collection date from January 1st to December 31st 2022 and submission date before July 1st 2023 (n = 6,914,601) (Supplementary Data 1). For each country with at least one sequence in the dataset, we computed the weekly sequencing rate by dividing the number of viruses sampled in that country by 52 and by the country’s population size in millions, yielding a sequencing rate in units of sequences per million people per week (S/M/wk). Population sizes for July 1st 2022 were extracted from World Population Review (https://worldpopulationreview.com/). For each sequence, we computed the turnaround time as the number of days between sample collection and submission to GISAID. We extracted income classifications for each country for fiscal year 2024 from the World Bank (https://datahelpdesk.worldbank.org/knowledgebase/articles/906519-world-bank-country-and-lending-groups).
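The rate and turnaround calculations described above can be sketched as follows. This is a minimal illustration in Python (the study's own analyses were performed in R); the function names and the example country are hypothetical and not taken from the study's code.

```python
from datetime import date

WEEKS_PER_YEAR = 52


def weekly_sequencing_rate(n_sequences, population):
    """Sequences per million people per week (S/M/wk) over one calendar year."""
    return n_sequences / WEEKS_PER_YEAR / (population / 1e6)


def turnaround_days(collected, submitted):
    """Days between sample collection and submission to the database."""
    return (submitted - collected).days


# Hypothetical country: 10,400 genomes collected in 2022, population 50 million
rate = weekly_sequencing_rate(10_400, 50_000_000)
tat = turnaround_days(date(2022, 3, 1), date(2022, 3, 19))
print(rate, tat)
```

For the hypothetical inputs shown, the rate is 4 S/M/wk and the turnaround time is 18 days.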
Variant epidemic simulations
In all analyses, we assumed that a variant virus emerges in the context of circulating wildtype virus. In our simulations, both variant and wildtype epidemiological dynamics are described by a susceptible-infected-recovered (SIR) compartmental model with infectious period 1/γ equal to 5 days for both viruses, with no interactions between genotypes. We simulated variant epidemics under a range of values of variant Re at time of introduction (variant Re = 1.2, 1.3, 1.6, and 2). In the main text, we assumed a different scenario of variant emergence for each value of variant Re, characterized by a wildtype (wt) Re at time of variant introduction and a wildtype prevalence at time of variant introduction (variant Re = 1.2: wt Re = 1, wt prevalence = 0.1%; variant Re = 1.3: wt Re = 1.05, wt prevalence = 0.2%; variant Re = 1.6: wt Re = 1.1, wt prevalence = 0.5%; variant Re = 2: wt Re = 1, wt prevalence = 2%). These scenarios were chosen such that circulation dynamics of wildtype and variant were comparable (e.g., the emergence of a highly transmissible variant in the background of high wildtype prevalence). In the Supplementary Figures, we show the same analyses for all combinations of variant Re and scenario of variant emergence (e.g., a variant with Re = 2 with wildtype dynamics corresponding to the scenario for variant Re = 1.2 (wt Re = 1, wt prevalence = 0.1%)). Epidemic dynamics for each scenario in the main text are shown in Supplementary Fig. 3. While we report results based on variant Re, we note that results are primarily dependent on the logistic growth rate of the variant proportion and the initial frequency of variant relative to wildtype (see section ‘Mathematical model’ below). In this context, the scenario of variant Re = 2 with wt Re = 1 is, for example, functionally equivalent to a scenario with variant Re = 2.5 and wt Re = 1.5. 
Similarly, the first scenario of variant Re = 1.2 and wt Re = 1 is functionally equivalent to a scenario of variant Re = 1.6 and wt Re = 1.3, approximating a scenario of emergence of an A/H1N1pdm09 pandemic-like virus early in a seasonal influenza epidemic.
Metapopulation model
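We used a metapopulation model that couples local SIR dynamics within each index country with global migration to simulate the global spread of a variant. Given a rate of movement \({w}_{{nm}}\) from population m to n, the expected number of variant-infected (\({I}_{n}\)) and variant-susceptible (\({S}_{n}\)) people in population n with population size \({N}_{n}\) for a variant with transmission rate β and recovery rate γ, is described by
$$\frac{d{I}_{n}}{dt}=\beta \frac{{S}_{n}{I}_{n}}{{N}_{n}}-\gamma {I}_{n}+\sum_{m\ne n}\left({w}_{{nm}}{I}_{m}-{w}_{{mn}}{I}_{n}\right)$$
$$\frac{d{S}_{n}}{dt}=-\beta \frac{{S}_{n}{I}_{n}}{{N}_{n}}+\sum_{m\ne n}\left({w}_{{nm}}{S}_{m}-{w}_{{mn}}{S}_{n}\right)$$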
This model is the basis of the model used by Brockmann et al.49 to fit empirical arrival times for multiple respiratory viruses to global air transportation data. We used the estimated pairwise number of trips between all countries from the Global Transnational Mobility (GTM) Dataset50 to inform \({w}_{{nm}}\). This dataset combines a tourism dataset from the World Tourism Organization and an origin-destination dataset corresponding to global air travel data. Previous work has validated the GTM dataset against the world airline network51, which was shown to reproduce observed dynamics of global pathogen spread49. Specifically, for any two countries n and m we computed \({w}_{{nm}}\) by dividing the number of trips from country m to n in the year 2016 by the population size of country m and by 365. For each value of variant Re, we performed 10,000 independent simulations of the metapopulation model, assuming that the probability a variant virus would emerge in a particular country is proportional to the country’s relative population size (simulations initialized in Africa: n = 1793; Asia: n = 5946; Europe: n = 934; North America: n = 739; Oceania: n = 54; South America: n = 534). We integrated the model forward in time at a daily timescale using a tau-leap algorithm, which also introduces stochasticity into the epidemic dynamics and global spread. Each simulation was initialized with an infected population of 10 individuals. Analyses were performed in R v4.1.0.
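A daily tau-leap step for a metapopulation SIR model of this kind might look like the sketch below. It is written in Python for illustration only (the study used R), covers two hypothetical populations, and makes simplifying assumptions not stated in the text: event counts are Poisson draws capped at the available individuals, only susceptible and infected individuals move, and the Poisson sampler switches to a normal approximation for large means. All function names and numbers are hypothetical.

```python
import math
import random


def poisson(rng, lam):
    """Poisson draw: Knuth's method for small means, normal approximation above 30."""
    if lam <= 0:
        return 0
    if lam > 30:
        return max(0, round(rng.gauss(lam, math.sqrt(lam))))
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def mobility_rate(trips_per_year, origin_population):
    """Per-capita daily rate of movement out of the origin population."""
    return trips_per_year / origin_population / 365


def tau_leap_day(S, I, N, w, beta, gamma, rng):
    """Advance all populations by one day; w[n][m] is the rate of movement from m to n."""
    k = len(S)
    # Local infection and recovery events, capped so compartments stay non-negative
    for n in range(k):
        new_inf = min(S[n], poisson(rng, beta * S[n] * I[n] / N[n]))
        new_rec = min(I[n], poisson(rng, gamma * I[n]))
        S[n] -= new_inf
        I[n] += new_inf - new_rec
    # Movement between populations, applied after all draws have been made
    dS, dI = [0] * k, [0] * k
    for m in range(k):
        avail_s, avail_i = S[m], I[m]
        for n in range(k):
            if n == m:
                continue
            mov_s = min(avail_s, poisson(rng, w[n][m] * S[m]))
            mov_i = min(avail_i, poisson(rng, w[n][m] * I[m]))
            avail_s -= mov_s
            avail_i -= mov_i
            dS[m] -= mov_s
            dS[n] += mov_s
            dI[m] -= mov_i
            dI[n] += mov_i
    for n in range(k):
        S[n] += dS[n]
        I[n] += dI[n]


# Hypothetical two-population example, seeded with 10 infections in population 0
rng = random.Random(1)
N = [1_000_000, 500_000]
w = [[0.0, mobility_rate(18_250, N[1])],
     [mobility_rate(36_500, N[0]), 0.0]]
S, I = [N[0] - 10, N[1]], [10, 0]
gamma = 1 / 5          # infectious period of 5 days
beta = 2.0 * gamma     # variant Re = 2 at introduction
for day in range(60):
    tau_leap_day(S, I, N, w, beta, gamma, rng)
print(S, I)
```

Because early dynamics are stochastic, individual runs can end in extinction of the seeded outbreak; repeating the simulation with different seeds gives a distribution of arrival times.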
We validated the metapopulation model by comparing arrival times against those that were independently estimated using GLEAM13, a separate metapopulation model that incorporates commuting but which relies on different underlying data. Given an epidemic origin location, we simulated ten epidemic instances using the metapopulation model, each initialized with ten infected individuals, and we simulated ten instances using GLEAM, where we implemented the same SIR model. In the GLEAM simulations, we assumed 100% of airline traffic, no seasonality, and a gravity commuting model with 8 hours spent at the commuting destinations. For each country, we computed the first day on which median cumulative incidence across simulations exceeded 0.01 per 1000 individuals for both model implementations. We performed these simulations for ten countries (Cameroon, Ecuador, France, Jamaica, Malaysia, Mali, Nepal, Nicaragua, Oman, and Uzbekistan) with the GLEAM model initialized in each country’s capital city. For all ten origin locations, we found a strong concordance between arrival times (r = 0.89 overall) estimated using the metapopulation model and GLEAM (Supplementary Fig. 2). This provides support for the use of the metapopulation model. Simulations were performed using GLEAMviz 7.2.
Global genomic surveillance simulations
We performed the genomic surveillance simulations using empirical turnaround times and sampling rates for each country, using data for 2022. For each sequence in GISAID, we computed the time T between the sample’s collection date and submission date. For each country c, the turnaround time-specific sequencing rate \({n}_{x,c}\) in units of sequences per day, for each value of turnaround time x in days, was equal to the country’s total sequencing rate in sequences per day multiplied by the proportion of sequences from that country with T = x.
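As a minimal sketch of this step (Python for illustration; the function name and the example data are hypothetical), the turnaround time-specific rates can be obtained by splitting a country's total daily rate according to the empirical distribution of turnaround times:

```python
from collections import Counter


def turnaround_specific_rates(turnaround_days, total_rate_per_day):
    """Map each turnaround time x (days) to its share of the total daily sequencing rate."""
    counts = Counter(turnaround_days)
    total = len(turnaround_days)
    return {x: total_rate_per_day * c / total for x, c in sorted(counts.items())}


# Hypothetical country: four sequences with turnaround times of 7, 7, 14 and 21 days,
# and a total sequencing rate of 2 sequences per day
rates = turnaround_specific_rates([7, 7, 14, 21], 2.0)
print(rates)
```

By construction, the rates sum to the country's total daily sequencing rate.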
For each country in each simulation, starting from the first day on which the number of new variant infections exceeded 10 onwards, we deterministically simulated the wildtype epidemic dynamics. For each value of variant Re, we assumed a scenario of variant emergence (characterized by a wildtype prevalence and wildtype Re) as described above in the main text. In the Supplementary Figures, we show the same analyses for all combinations of variant Re and scenario of variant emergence. Until the first day on which the number of variant infections exceeded 10, wildtype incidence was assumed to be equal to wildtype incidence on the first day of the simulated wildtype epidemic, to account for the stochasticity observed when the number of infections was small and the potential for stochastic variant extinction.
Using the simulated variant and wildtype incidence on each day, we computed the variant proportion through time f(t). For each country c, on each day t, we used the simulated country-specific variant proportion \({f}_{c}(t)\) to simulate genomic surveillance: for each value of turnaround time x, we assumed that the total sample count \({\tilde{n}}_{x,c}\sim \mathrm{Poisson}({n}_{x,c})\) and simulated the total number of variant samples \({V}_{c}(t)=\,{\sum }_{x=0}^{t}{v}_{x,c}\), with \({v}_{x,c}\sim \mathrm{Binomial}({\tilde{n}}_{x,c},\,{f}_{c}(t-x))\). In each of 10,000 replicate simulations, and for each strategy for the global distribution of surveillance infrastructure (see next section), we computed the detection day as the first day t on which \({V}_{c}(t)\) was at least one in at least one country c. We defined the detection country as the first country for which this held.
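The sampling process for a single country can be sketched as below. This is an illustrative Python sketch, not the study's R implementation: the samplers are simple textbook versions suited to small daily rates, and the logistic variant proportion and rates used in the example are hypothetical.

```python
import math
import random


def poisson(rng, lam):
    """Poisson draw using Knuth's method (adequate for small daily rates)."""
    if lam <= 0:
        return 0
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def binomial(rng, n, p):
    """Binomial draw as a sum of n Bernoulli trials."""
    return sum(1 for _ in range(n) if rng.random() < p)


def first_detection_day(f, rates, rng):
    """First day t on which at least one variant sequence becomes available.

    f[t] is the variant proportion among infections on day t; rates maps a
    turnaround time x (days) to a sequencing rate in sequences per day.
    Samples reported on day t with turnaround x were collected on day t - x.
    """
    for t in range(len(f)):
        variant_samples = 0
        for x, rate in rates.items():
            if x > t:
                continue  # nothing collected during the simulation is available yet
            n_samples = poisson(rng, rate)
            variant_samples += binomial(rng, n_samples, f[t - x])
        if variant_samples >= 1:
            return t
    return None


# Hypothetical example: logistic variant growth (f0 = 1e-4, s = 0.2 per day)
rng = random.Random(0)
f = [1 / (1 + (1 - 1e-4) / 1e-4 * math.exp(-0.2 * t)) for t in range(200)]
rates = {7: 1.0, 14: 2.0}
day = first_detection_day(f, rates, rng)
print(day)
```

In a multi-country setting, the global detection day would be the minimum of the per-country detection days, with the detection country being the one that attains it.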
To investigate the time until the variant could be said to account for a substantial proportion of circulating virus in at least one country, we used the simulated weekly sequence counts to compute, for each country, whether there was any week in the past in which the variant accounted for at least a proportion π of all samples collected that week with 95% confidence given a one-tailed binomial test for proportions. We performed this analysis on a weekly basis for each country, and the day on which the p-value for this binomial test declined below 0.05 in at least one country, for any week in the past, was defined as the day the variant was established to account for a substantial proportion of circulating virus in at least one country globally. We chose π to be 1% for all countries with a population of 100 million individuals or fewer. We used a more flexible threshold for countries with a population larger than 100 million. For these countries, the threshold decreased proportionally as the population size increased, e.g., using a threshold of 0.5% for a population of 200 million and a threshold of 0.1% for a population of 1 billion.
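The population-dependent threshold and the one-tailed test can be sketched as follows. This Python sketch is illustrative only: the function names are hypothetical, the threshold is assumed to scale as 1% × (100 million / population) above 100 million (which reproduces the two examples given), and the exact upper-tail binomial probability is used as the p-value.

```python
import math


def prevalence_threshold(population, base=0.01, pivot=100e6):
    """Threshold proportion: 1% up to 100 million people, inversely scaled above."""
    return base * min(1.0, pivot / population)


def binomial_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), with terms computed in log space."""
    if k <= 0:
        return 1.0
    lower = 0.0
    for i in range(k):
        log_term = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
                    + i * math.log(p) + (n - i) * math.log1p(-p))
        lower += math.exp(log_term)
    return max(0.0, 1.0 - lower)


def variant_established(variant_count, total_count, population, alpha=0.05):
    """True if the weekly variant share exceeds the threshold at the given significance."""
    pi = prevalence_threshold(population)
    return binomial_upper_tail(variant_count, total_count, pi) < alpha


# Hypothetical weekly counts in a country of 50 million people
print(variant_established(5, 100, 50e6))   # 5 of 100 sequences are variant
print(variant_established(2, 100, 50e6))   # 2 of 100 sequences are variant
```

With 100 sequences in a week and a 1% threshold, five variant sequences are sufficient to reject the null hypothesis at the 5% level, whereas two are not.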
Global surveillance strategies
We investigated four strategies for the global distribution of sequencing infrastructure:
Strategy 2022: the 2022 baseline. For each country, turnaround time-specific sequencing rates were extracted from GISAID metadata.
Strategy A: the 2022 baseline + a global minimum sequencing capacity of 2 S/M/wk with fortnightly regularity in each country. If a country already satisfied this requirement (i.e., the sum of turnaround time-specific sequencing rates with turnaround time ≤14 days was equal to or greater than 2 S/M/wk), its sequencing rates were unchanged relative to the 2022 baseline. If a country did not satisfy this requirement, the deficit in S/M/wk with turnaround time ≤14 days was addressed by additionally simulating daily sampling with sequencing rate given by the deficit. For the sequencing capacity that was added, we simulated fortnightly sequencing such that once every two weeks, the presence of variant samples in the samples collected in the two weeks prior was assessed. On these days the test for the variant proportion exceeding 1%, as described above, was also performed.
Strategy B: equivalent to the 2022 baseline, but individual countries’ sequencing output capped at 30 S/M/wk. Countries that sequenced at rates exceeding 30 S/M/wk had their sequencing output capped by dividing sequencing rate uniformly across all values of turnaround time such that total output across all values of turnaround time was equal to 30 S/M/wk.
Strategy C: a combination of strategies A and B. In countries that, after capping according to strategy B, did not satisfy the minimum sequencing rate of 2 S/M/wk with fortnightly regularity, this minimum was ensured analogous to Strategy A.
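The adjustments defining strategies A to C can be sketched for a single country as below. This Python sketch is illustrative: names are hypothetical, rates are expressed in S/M/wk keyed by turnaround time in days, and the cap in Strategy B is assumed to scale all turnaround time-specific rates by the same factor.

```python
MIN_RATE = 2.0    # S/M/wk required with turnaround time <= 14 days
MAX_TAT = 14      # days
CAP = 30.0        # S/M/wk cap on total output


def timely_rate(rates):
    """Total sequencing rate delivered within the fortnightly window."""
    return sum(r for x, r in rates.items() if x <= MAX_TAT)


def minimum_capacity_deficit(rates):
    """Strategy A: additional fortnightly capacity (S/M/wk) needed to reach the minimum."""
    return max(0.0, MIN_RATE - timely_rate(rates))


def cap_output(rates):
    """Strategy B: scale all rates down so that total output does not exceed the cap."""
    total = sum(rates.values())
    if total <= CAP:
        return dict(rates)
    scale = CAP / total
    return {x: r * scale for x, r in rates.items()}


def strategy_c(rates):
    """Strategy C: cap first, then compute any remaining deficit against the minimum."""
    capped = cap_output(rates)
    return capped, minimum_capacity_deficit(capped)


# Hypothetical countries
low = {10: 0.5, 30: 1.0}      # below the minimum
high = {7: 40.0, 21: 20.0}    # above the cap
slow = {30: 60.0}             # high output but slow turnaround

print(minimum_capacity_deficit(low))
print(cap_output(high))
print(strategy_c(slow))
```

The third example shows why the order of operations matters in Strategy C: a country with high but slow output is first capped and then still receives the full minimum fortnightly capacity.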
Mathematical model
For the single-country analyses presented in Fig. 2, we assumed a population of 100 million and turnaround time of two weeks. We deterministically simulated variant and wildtype epidemics, starting with one variant-infected individual, and computed the variant proportion f(t) through time. For each sequencing rate and given f(t), we computed the expected day of detection with 95% confidence as the day on which the probability that zero variant sequences would have been binomially sampled up to and including that day declined below 0.05. On each day, the total number of samples to sequence was assumed to be a Poisson-valued random variable with rate given by the sequencing rate. For each sequencing rate, the day of detection was computed as the median across 100 replicates. To compute the equivalent fold increase in sequencing rate for each reduction in turnaround time, we computed the slope of a linear model that relates the logarithm of the sequencing rate to the simulated day of detection for 1 < n < 100 S/M/wk. In Supplementary Figs. 5 and 6 we show the analyses of time to detection for all combinations of variant Re and scenario of variant emergence (e.g., a variant with Re = 2 with wildtype dynamics corresponding to the scenario for variant Re = 1.2 (wt Re = 1, wt prevalence = 0.1%)).
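A simplified version of this single-country calculation is sketched below in Python (for illustration; the study used R). It departs from the procedure described above in two labeled ways: the daily number of sequences is fixed at its expected value rather than drawn from a Poisson distribution across replicates, and the SIR models are integrated with one-day Euler steps. Function names are hypothetical.

```python
import math


def sir_incidence(pop, i0, re, gamma, days):
    """Daily new infections from a deterministic SIR model (one-day Euler steps)."""
    s, i = pop - i0, float(i0)
    beta = re * gamma
    incidence = []
    for _ in range(days):
        new_inf = beta * s * i / pop
        s -= new_inf
        i += new_inf - gamma * i
        incidence.append(new_inf)
    return incidence


def detection_day(f, seqs_per_day, turnaround, conf=0.95):
    """First day on which P(no variant sequence sampled so far) < 1 - conf."""
    log_p_none = 0.0
    for t, ft in enumerate(f):
        log_p_none += seqs_per_day * math.log1p(-ft)
        if log_p_none < math.log(1 - conf):
            return t + turnaround
    return None


pop, gamma = 100e6, 1 / 5
variant = sir_incidence(pop, 1, 1.2, gamma, 400)            # variant Re = 1.2
wildtype = sir_incidence(pop, 0.001 * pop, 1.0, gamma, 400)  # wt Re = 1, prevalence 0.1%
f = [v / (v + w) for v, w in zip(variant, wildtype)]

days = []
for rate in (0.2, 2.0, 20.0):                 # S/M/wk
    per_day = rate * (pop / 1e6) / 7          # sequences per day in this population
    days.append(detection_day(f, per_day, turnaround=14))
print(days)
```

The detection day decreases roughly linearly with the logarithm of the sequencing rate, which is the relationship used to express reductions in turnaround time as equivalent fold increases in sequencing rate.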
To mathematically model time to variant detection, we assumed that the variant frequency follows a logistic growth function, where the proportion f(t) of all new infections at time t that is attributable to the variant follows:
$$f\left(t\right)=\frac{1}{1+\frac{1-{f}_{0}}{{f}_{0}}{e}^{-{st}}}$$
Here, s is the logistic growth rate that defines the speed at which the variant displaces the wildtype and f0 represents the initial variant frequency. The dynamics of logistic growth of variant proportion characterized the sequential replacement of variants during the COVID-19 pandemic. Assuming no interactions between genotypes, the value of s is equal to the difference of variant and wildtype exponential growth rates22. In reality, s is governed by factors, such as pre-existing immunity in the population and differences in epidemiological characteristics of variant and wildtype, such as their generation interval. Nevertheless, the derived relationship relies solely on the value of s, and hence is agnostic to the precise epidemiological characteristics of wildtype and variant. Given these dynamics, we derived a relationship between the number of viruses to sequence per unit time n and the expected time until the variant is detected. Beginning with the binomial probability that variant is detected at or before time step \(\tau\):
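$$P\left(t\le \tau \right)=1-\prod_{t=0}^{\tau }{\left(1-f\left(t\right)\right)}^{n}$$
we derived an expression for \(\tau\):
Using the Volterra product integral:
$$P\left(t\le \tau \right)=1-\exp \left(-n{\int }_{0}^{\tau }f\left(t\right){dt}\right)$$
Integrating \(f\left(t\right)\):
$${\int }_{0}^{\tau }f\left(t\right){dt}=\frac{1}{s}\mathrm{ln}\left(1-{f}_{0}+{f}_{0}{e}^{s\tau }\right)$$
We can then rewrite:
$$P\left(t\le \tau \right)=1-{\left(1-{f}_{0}+{f}_{0}{e}^{s\tau }\right)}^{-n/s}$$
Let \(q=1 - P\left(t \le \tau \right)\) which is the probability that the variant will not be detected before or during time step \(\tau\). Solving for \(\tau\):
$$\tau =\frac{1}{s}\mathrm{ln}\left(\frac{{q}^{-s/n}-1}{{f}_{0}}+1\right)\qquad (1)$$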
This equation yields, given s, n, f0, and q, the day τ on which the variant will have been detected at least once with confidence level 1 – q. This equation is valid when the timescales of detection are smaller than the timescales at which the logistic growth dynamics do not hold. For example, in extreme scenarios of a very high wildtype Re, a small variant transmission advantage and a low sequencing rate, the timescale of variant detection is beyond that of depletion of the susceptible population and the assumptions of the equation are not satisfied. We compared the predicted time to detection at 95% confidence for sequencing rates n ranging from 0.1 to 1000 S/M/wk as computed using epidemic simulations (Fig. 2a in main text) to predicted time to detection using only Eq. 1. In computing time to detection using the equation, we used the empirical value of f0 from the epidemic simulations as input, with q = 0.05. We used the theoretical value of s, computed as (variant Re - wildtype Re) / 5. We performed this simulation for all four scenarios of variant emergence (each corresponding to a different initial wildtype Re and wildtype prevalence) and all four values of variant Re. As seen in Supplementary Fig. 4, there was high correspondence between the time to detection from the explicit epidemic simulations and Eq. 1 when the variant Re was high and/or wildtype prevalence was low. In contrast, when variant Re was low and wildtype prevalence was high, susceptible depletion would occur before the timescale at which the variant would be detected, and the time to detection as predicted using the equation would deviate from the simulated time to detection. We note that, for combinations of initial variant proportion and variant proportion logistic growth rate not explicitly discussed in this study, the mathematical model can be used to compute the expected time to variant detection.
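As a hedged illustration of how such a relationship can be evaluated, the Python sketch below assumes the closed form \(\tau =\frac{1}{s}\mathrm{ln}\left(\frac{{q}^{-s/n}-1}{{f}_{0}}+1\right)\), which follows from logistic growth of the variant proportion and the product-integral approximation described above, with n in sequences per day and s per day. The function names and the example value of f0 are hypothetical.

```python
import math


def detection_time(s, n, f0, q=0.05):
    """Day tau by which the variant is detected with probability 1 - q.

    Assumed form: tau = ln((q**(-s/n) - 1) / f0 + 1) / s, with s the logistic
    growth rate per day, n the number of sequences per day and f0 the initial
    variant proportion.
    """
    return math.log((q ** (-s / n) - 1.0) / f0 + 1.0) / s


def prob_undetected(tau, s, n, f0):
    """Probability of no detection up to tau under the same assumptions."""
    return (1.0 - f0 + f0 * math.exp(s * tau)) ** (-n / s)


# Variant Re = 1.6 against wildtype Re = 1.1 with a 5-day infectious period
s = (1.6 - 1.1) / 5
f0 = 1e-5                      # hypothetical initial variant proportion
for rate in (0.2, 2.0, 20.0):  # S/M/wk in a population of 100 million
    n = rate * 100 / 7
    tau = detection_time(s, n, f0)
    print(rate, round(tau, 1), round(prob_undetected(tau, s, n, f0), 3))
```

Substituting the returned τ back into the expression for the probability of non-detection recovers q, which provides a simple internal consistency check.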
Variant prevalence estimation
In addition to variant detection, we investigated the relationship between sequencing rates and the accuracy with which the spread dynamics of the variant can be tracked following its detection. Specifically, we investigated the accuracy with which the weekly proportion of new infections that is attributable to the variant can be estimated, and how this accuracy depends on sequencing rate. Mathematically, assuming a small, finite population \(N\) was infected at prevalence \(\rho\) and samples were collected from fraction \(s\) of infected individuals during each week, the potential (finite) number of samples that could be sampled from for sequencing is \(N\rho s\).
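Suppose the true circulating variant proportion is \(p\) and \(n\) samples are sequenced (with \(n < N\rho s\)). The number of variant sequences (\(X\)) then follows a hypergeometric distribution with mean and variance:
$$E\left[X\right]={np},\qquad \mathrm{Var}\left(X\right)={np}\left(1-p\right)\left(\frac{\rho {sN}-n}{\rho {sN}-1}\right)$$
The variance of the variant proportion \(\hat{p}\) (\(=X/n\)) showing up in the sequences is:
$$\mathrm{Var}\left(\hat{p}\right)=\frac{p\left(1-p\right)}{n}\left(\frac{\rho {sN}-n}{\rho {sN}-1}\right)$$
By the central limit theorem, \(\frac{\hat{p}-p}{\sqrt{\frac{p(1-p)}{n}\left(\frac{\rho {sN}-n}{\rho {sN}-1}\right)}}\) follows an approximate Normal distribution.
As such, at 95% (\(\alpha=5\%\)) confidence, the error (\(\epsilon\)) around the true variant proportion is:
$$\epsilon ={z}_{1-\alpha /2}\sqrt{\frac{p\left(1-p\right)}{n}\left(\frac{\rho {sN}-n}{\rho {sN}-1}\right)}$$
where \({z}_{1-\alpha /2}\) is the corresponding quantile of the standard Normal distribution. For a sequencing rate of \(r\) sequences per million persons per week (hence \(n=\frac{{rN}}{c}\) where \(c={10}^{6}\)):
$$\epsilon ={z}_{1-\alpha /2}\sqrt{\frac{{cp}\left(1-p\right)}{{rN}}\left(\frac{\rho {sN}-\frac{{rN}}{c}}{\rho {sN}-1}\right)}$$
If \(N\rho s\) is sufficiently large (i.e. \(N\rho s\gg n=\frac{{rN}}{c}\to c\rho s\gg r\)),
$$\epsilon \approx {z}_{1-\alpha /2}\sqrt{\frac{{cp}\left(1-p\right)}{{rN}}}$$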
In Supplementary Fig. 11, we visualized the relationship between sequencing rate and the error in the estimated variant proportion, for different population sizes, true variant proportions, and values of \(\alpha\).
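The error calculation can be sketched numerically as below. This Python sketch assumes a two-sided Normal quantile \({z}_{1-\alpha /2}\) and the finite-population correction implied by the hypergeometric variance; function names and example values are hypothetical.

```python
import math
from statistics import NormalDist


def proportion_error(p, r, N, rho, s, alpha=0.05, c=1e6):
    """Error around the true variant proportion with finite-population correction.

    p: true variant proportion; r: sequencing rate (S/M/wk); N: population size;
    rho: infection prevalence; s: fraction of infected individuals sampled per week.
    """
    n = r * N / c                 # sequences per week
    pool = rho * s * N            # samples available for sequencing
    if n > pool:
        raise ValueError("sequencing rate exceeds the number of available samples")
    z = NormalDist().inv_cdf(1 - alpha / 2)
    correction = (pool - n) / (pool - 1)
    return z * math.sqrt(p * (1 - p) / n * correction)


def proportion_error_large_pool(p, r, N, alpha=0.05, c=1e6):
    """Limiting error when the pool of available samples is much larger than n."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * math.sqrt(c * p * (1 - p) / (r * N))


# Hypothetical example: 10% variant share, 2 S/M/wk, population of 10 million,
# 1% prevalence, 10% of infections sampled each week
finite = proportion_error(0.1, 2.0, 10e6, 0.01, 0.1)
large = proportion_error_large_pool(0.1, 2.0, 10e6)
print(round(finite, 4), round(large, 4))
```

In this example only 20 sequences per week are available, so the error is of the same order as the proportion being estimated; the finite-population correction is small because the pool of available samples is far larger than the number sequenced.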
Travel surveillance
To compute the expected time to detection based on travel surveillance, we assumed that the population prevalence of the variant P(t) follows \(P\left(t\right)=\,{P}_{0}{e}^{{rt}}\), where \({P}_{0}\) is the initial number of variant infections and r is the daily exponential growth rate. The probability that no variant-infected individuals from an origin country of population size N would travel to the sentinel country on day t and be sampled for sequencing during the emergence of the variant (i.e., P(t)/N is small) is approximated as:
$${e}^{-\alpha {pP}\left(t\right)}$$
where p is the per capita daily rate of travel from the origin country to the sentinel country. Here, α is an ascertainment rate that reflects the probability that an infected traveler will be sampled and sequenced. This accounts for possible reduced propensity to travel due to illness, asymptomatic illness precluding detection, infected travelers not seeking medical care or testing, and limits on testing and sequencing capacity resulting in only a proportion of infected travelers being sampled and sequenced. This model is also applicable to wastewater surveillance: then, α depends on airplane toilet use20, viral shedding, and the per-flight probability of sampling. The probability that there has been at least one sequenced variant-infected traveler by day T is equal to \(Q\left(t\le T\right)=\,1-{e}^{-\alpha p{P}_{0}{\sum }_{t=0}^{T}{e}^{{rt}}}\). Let \(q=1 - Q\left(t \le T\right)\) be the probability that the variant will not be detected before or during time step \(T\). We solve for T in
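$$q={e}^{-\alpha p{P}_{0}{\sum }_{t=0}^{T}{e}^{{rt}}}$$
Approximating \(\sum\limits_{t=0}^{T}{e}^{{rt}}\approx \frac{{e}^{{rT}}-1}{{e}^{r}-1}\) (which is exact for the sum over days 0 to T − 1), yields
$$T=\frac{1}{r}\mathrm{ln}\left(1-\frac{\left({e}^{r}-1\right)\mathrm{ln}\,q}{\alpha p{P}_{0}}\right)$$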
This equation yields the expected day on which an individual infected with the variant will have traveled from the origin country to the sentinel country and been detected with confidence level given by q.
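A numerical sketch of the travel-based detection time is given below in Python. It assumes the closed form \(T=\frac{1}{r}\mathrm{ln}\left(1-\frac{\left({e}^{r}-1\right)\mathrm{ln}\,q}{\alpha p{P}_{0}}\right)\), obtained from the geometric-sum expression \(\frac{{e}^{{rT}}-1}{{e}^{r}-1}\) stated above; the function names and parameter values are hypothetical.

```python
import math


def travel_detection_day(alpha, p, P0, r, q=0.05):
    """Day T by which an infected traveler has been sequenced with probability 1 - q.

    alpha: ascertainment rate; p: per-capita daily travel rate from origin to
    sentinel country; P0: initial number of variant infections; r: daily
    exponential growth rate of variant infections.
    """
    return math.log(1.0 - math.log(q) * (math.exp(r) - 1.0) / (alpha * p * P0)) / r


def prob_no_detection(T, alpha, p, P0, r):
    """Probability of no detection by day T under the same geometric-sum form."""
    total = (math.exp(r * T) - 1.0) / (math.exp(r) - 1.0)
    return math.exp(-alpha * p * P0 * total)


# Hypothetical parameters: 10% ascertainment, 1 in 100,000 people traveling per day,
# 10 initial infections, growth rate of 0.1 per day
T = travel_detection_day(alpha=0.1, p=1e-5, P0=10, r=0.1)
print(round(T, 1), round(prob_no_detection(T, 0.1, 1e-5, 10, 0.1), 3))
```

Because T depends on the product αp only through a logarithm, a ten-fold increase in sampling intensity or travel volume shortens the detection time by roughly ln(10)/r days, which illustrates why waiting for exported cases limits the performance of travel-based strategies.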
Sensitivity analyses
In our analyses, we defined a sequence’s turnaround time as the time between the sequence’s collection date and its submission date on GISAID. This represents the most accurate measure of turnaround time available and has been used in previous analyses of global sequencing output8. Nevertheless, a potential issue with this definition of turnaround time is a lag between acquiring the sequence and its submission to GISAID, which is not reflected in these estimates52; in some cases, sequence analysis might have been performed but the sequence would only later be deposited in GISAID. To establish the sensitivity of our global simulation results to such delays in upload to GISAID, we re-ran our global metapopulation genomic surveillance simulations, assuming that the day the sequence was acquired was somewhere between the sample’s collection date and date of submission to GISAID. Specifically, for each sequence, we computed the modified turnaround time as \(\phi ({t}_{\mathrm{submission}}-{t}_{\mathrm{collection}})\), for 0 < ϕ < 1. We re-simulated the genomic surveillance simulation results as presented in Fig. 3 for ϕ = 0.25 and 0.5. Varying ϕ modifies the (country-specific) turnaround time-specific sequencing rates used in the global genomic surveillance system. Results for different values of ϕ are presented in Supplementary Fig. 8a. For all values of ϕ tested, we find that the conclusion holds that more solidaristic strategies for the global distribution of respiratory virus surveillance infrastructure (strategies A and C) offer strongly reduced time to variant detection. Hence, our results are robust to biases resulting from deviations from the assumption that the submission date represents the date on which the sample is available.
Importantly, the observed consistency between a country’s sequencing rate and its median turnaround time (Spearman’s ρ = −0.60, P < 0.001) suggests that a country’s distribution of turnaround times as computed from GISAID yields a representative picture of a country’s true capacity to rapidly sequence a virus after sample collection.
In our model, we estimated the mobility rate \({w}_{{nm}}\) for countries m and n by dividing the number of trips from m to n in 2016 in the GTM by the population in m. This assumes that all members of the population participate in disease-relevant spread. In reality, this will not be the case. However, for the results of our study, these differences are likely to be of little consequence. Specifically, a lower effective mobility rate would further increase the reduction in time to detection that would result from the establishment of minimum sequencing infrastructure globally, as the time until a variant that emerges in a low-sequencing rate environment is exported to a high-sequencing rate environment would increase. We explicitly investigated the potential effects of misspecification of the mobility matrix on our results by multiplying and dividing the mobility rate matrix by three, representing substantially increased and reduced spread, respectively (Supplementary Fig. 8b). A reduced rate would further increase the gains to be effected by the establishment of sequencing infrastructure globally. Even if the mobility rate was increased three-fold, the strategies with increased solidarity in the global distribution of genomic surveillance infrastructure yield strongly improved performance compared to the 2022 baseline. Hence, our results are robust to specifics of the mobility dynamics.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Data availability
Country population sizes are available from World Population Review (https://worldpopulationreview.com/). Data on income classification is available from the World Bank (https://datahelpdesk.worldbank.org/knowledgebase/articles/906519-world-bank-country-and-lending-groups). The Global Transnational Mobility Dataset is available from the Global Mobilities Project (https://migrationpolicycentre.eu/global-mobility/global-transnational-mobility-dataset/). Metadata on global SARS-CoV-2 and seasonal influenza virus sequencing rates were extracted from GISAID. GISAID Acknowledgement Table can be found in Supplementary Data 1 and via https://doi.org/10.55876/gis8.250501ot. The global epidemic simulation output generated in this study has been deposited at Zenodo (https://zenodo.org/records/10051237). Processed simulation output can be found at https://github.com/AMC-LAEB/genomic_surveillance_solidarity and https://zenodo.org/records/17535147.
Code availability
Custom code used to generate the results in this study is available at https://github.com/AMC-LAEB/genomic_surveillance_solidarity and https://zenodo.org/records/17535147.
References
Viana, R. et al. Rapid epidemic expansion of the SARS-CoV-2 Omicron variant in southern Africa. Nature 603, 679–686 (2022).
Ladner, J. T. & Sahl, J. W. Towards a post-pandemic future for global pathogen genome sequencing. PLoS Biol. 21, e3002225 (2023).
Hill, V. et al. Toward a global virus genomic surveillance network. Cell Host Microbe 31, 861–873 (2023).
Wu, F. et al. A new coronavirus associated with human respiratory disease in China. Nature 579, 265–269 (2020).
Krammer, F. SARS-CoV-2 vaccines in development. Nature 586, 516–527 (2020).
Dawood, F. S. et al. Emergence of a novel swine-origin influenza A (H1N1) virus in humans. N. Engl. J. Med. 360, 2605–2615 (2009).
Shu, Y. & McCauley, J. GISAID: global initiative on sharing all influenza data — from vision to reality. Euro Surveill. 22, 30494 (2017).
Brito, A. F. et al. Global disparities in SARS-CoV-2 genomic surveillance. Nat. Commun. 13, 7003 (2022).
Wohl, S., Lee, E. C., DiPrete, B. L. & Lessler, J. Sample size calculations for pathogen variant surveillance in the presence of biological and systematic biases. Cell Rep. Med. 4, 101022 (2023).
Hill, V., Ruis, C., Bajaj, S., Pybus, O. G. & Kraemer, M. U. G. Progress and challenges in virus genomic epidemiology. Trends Parasitol. 37, 1038–1049 (2021).
Méder, Z. Z. & Somogyi, R. Optimal capacity sharing for global genomic surveillance. Epidemics 43, 100690 (2023).
Balcan, D. et al. Seasonal transmission potential and activity peaks of the new influenza A(H1N1): a Monte Carlo likelihood analysis based on human mobility. BMC Med. 7, 45 (2009).
Broeck, W. et al. The GLEaMviz computational tool, a publicly available software to explore realistic epidemic spreading scenarios at the global scale. BMC Infect. Dis. 11, 37 (2011).
Boyle, L. et al. Selective sweeps in SARS-CoV-2 variant competition. Proc. Natl. Acad. Sci. USA 119, e2213879119 (2022).
Wilkinson, E. et al. A year of genomic surveillance reveals how the SARS-CoV-2 pandemic unfolded in Africa. Science 374, 423–431 (2021).
Tegally, H. et al. The evolving SARS-CoV-2 epidemic in Africa: insights from rapidly expanding genomic surveillance. Science 378, eabq5358 (2023).
St-Onge, G. et al. Pandemic monitoring with global aircraft-based wastewater surveillance networks. Nat. Med. 31, 788–796 (2025).
Li, J. et al. A global aircraft-based wastewater genomic surveillance network for early warning of future pandemics. Lancet Glob. Health 11, e791–e795 (2023).
Goldberg, Z., Linder, A. G., Miller, L. N. & Sorrell, E. M. Wastewater collection and sequencing as a proactive approach to utilizing threat agnostic biological defense. Health Security 22, 11–15 (2023).
Jones, D. L. et al. Suitability of aircraft wastewater for pathogen detection and public health surveillance. Sci. Total Environ. 856, 159162 (2023).
Hill, V. et al. The origins and molecular evolution of SARS-CoV-2 lineage B.1.1.7 in the UK. Virus Evol. 8, veac080 (2022).
Davies, N. G. et al. Estimated transmissibility and impact of SARS-CoV-2 lineage B.1.1.7 in England. Science 372, eabg3055 (2021).
Anderson, A. S. A lightspeed approach to pandemic drug development. Nat. Med. 28, 1538 (2022).
Newland, M. et al. Improving pandemic preparedness through better, faster influenza vaccines. Expert Rev. Vaccines 20, 235–242 (2021).
Khoo, Y. K. et al. National investment case development for pathogen genomics. Cell Genomics 5, 100781 (2025).
Pronyk, P. M. et al. Advancing pathogen genomics in resource-limited settings. Cell Genomics 3, 100443 (2023).
Han, A. X. et al. SARS-CoV-2 diagnostic testing rates determine the sensitivity of genomic surveillance programs. Nat. Genet. 55, 26–33 (2023).
Salyer, S. J. et al. The first and second waves of the COVID-19 pandemic in Africa: a cross-sectional study. Lancet 397, 1265–1275 (2021).
Jones, K. E. et al. Global trends in emerging infectious diseases. Nature 451, 990–993 (2008).
Taylor, A. L. et al. Solidarity in the wake of COVID-19: reimagining the International Health Regulations. Lancet 396, 82–83 (2020).
Toebes, B., Forman, L. & Bartolini, G. Toward human rights-consistent responses to health emergencies: what is the overlap between core right to health obligations and core international health regulation capacities? Health Hum. Rights 22, 99–111 (2020).
Sangiovanni, A. & Viehoff, J. Solidarity in social and political philosophy. In The Stanford Encyclopedia of Philosophy (Stanford University, 2023).
Prainsack, B. & Buyx, A. Solidarity: reflections on an emerging concept in bioethics. Nuffield Council Bioethics 17, 331 (2011).
Onywera, H. et al. Boosting pathogen genomics and bioinformatics workforce in Africa. Lancet Infect. Dis. 24, e106–e112 (2024).
Omotoso, O. E. et al. Bridging the genomic data gap in Africa: implications for global disease burdens. Glob. Health 18, 103 (2022).
Olono, A. et al. Building genomic capacity for precision health in Africa. Nat. Med. 30, 1856–1864 (2024).
Sahadeo, N. S. D. et al. Implementation of genomic surveillance of SARS-CoV-2 in the Caribbean: lessons learned for sustainability in resource-limited settings. PLoS Glob. Public Health 3, e0001455 (2023).
Getchell, M. et al. Pathogen genomic surveillance status among lower resource settings in Asia. Nat. Microbiol. 9, 2738–2747 (2024).
de Oliveira, T. & Baxter, C. Investing in Africa’s scientific future. Science 383, eadn4168 (2025).
Lavery, J. V., Porter, R. M. & Addiss, D. G. Cascading failures in COVID-19 vaccine equity. Science 380, 460–462 (2023).
Wouters, O. J. et al. Challenges in ensuring global access to COVID-19 vaccines: production, affordability, allocation, and deployment. Lancet 397, 1023–1034 (2021).
Ortiz, J. R. & Neuzil, K. M. Influenza immunization in low- and middle-income countries: preparing for next-generation influenza vaccines. J. Infect. Dis. 219, S97–S106 (2019).
Moodley, K. et al. Ethics and governance challenges related to genomic data sharing in southern Africa: the case of SARS-CoV-2. Lancet Glob. Health 10, e1855–e1859 (2022).
Ramsay, M. African genomic data sharing and the struggle for equitable benefit. Patterns 3, 100412 (2022).
Preiser, W., Engelbrecht, S. & Maponga, T. No point in travel bans if countries with poor surveillance are ignored. Lancet 399, 1224 (2022).
Carlson, C. et al. Save lives in the next pandemic: ensure vaccine equity now. Nature 626, 952–953 (2024).
The Lancet. The pandemic treaty: shameful and unjust. Lancet 403, 781 (2024).
Akande, O. W. et al. Investing in health preparedness, response and resilience: a genomics costing tool focused on next generation sequencing. Front. Public Health 12, 1404243 (2024).
Brockmann, D. & Helbing, D. The hidden geometry of complex, network-driven contagion phenomena. Science 342, 1337–1342 (2013).
Deutschmann, E., Recchi, E. & Vespe, M. Assessing transnational human mobility on a global scale. In Migration Research in a Digitized World: Using Innovative Technology to Tackle Methodological Challenges (eds. Pötzschke, S. & Rinken, S.) 169–192 (Springer International Publishing, 2022). https://doi.org/10.1007/978-3-031-01319-5_9.
Klamser, P. P. et al. Inferring country-specific import risk of diseases from the world air transportation network. PLoS Comput. Biol. 20, e1011775 (2024).
Kalia, K., Saberwal, G. & Sharma, G. The lag in SARS-CoV-2 genome submissions to GISAID. Nat. Biotechnol. 39, 1058–1060 (2021).
Acknowledgements
We gratefully acknowledge all data contributors, i.e., the Authors and their Originating laboratories responsible for obtaining the specimens, and their Submitting laboratories for generating the genetic sequence and metadata and sharing via the GISAID Initiative, on which this research is based. This work was supported by European Research Council grant no. 818353 and by ZonMw grant no. 10710022210003 (C.A.R.).
Author information
Contributions
Conceptualization: S.P.J.d.J., B.E.N., A.X.H., and C.A.R. Methodology: S.P.J.d.J., B.E.N., A.X.H., and C.A.R. Formal analysis: S.P.J.d.J. and A.X.H. Writing – Original Draft: S.P.J.d.J., A.X.H., and C.A.R. Visualization: S.P.J.d.J. Writing – Review & Editing: S.P.J.d.J., B.E.N., A.d.R., E.P., V.M., C.H., M.D.d.J., A.X.H., and C.A.R.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Communications thanks Maciej F. Boni and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
de Jong, S.P.J., Nichols, B.E., de Ruijter, A. et al. Global solidarity in genomic surveillance improves early detection of acute respiratory virus threats. Nat Commun 17, 765 (2026). https://doi.org/10.1038/s41467-025-67442-9
DOI: https://doi.org/10.1038/s41467-025-67442-9