Abstract
This study presents the development and validation of a genomic surveillance strategy using Whole Genome Sequencing (WGS) on normalized pooled samples to detect and monitor SARS-CoV-2 variants. A bioinformatics pipeline was designed specifically for analyzing pooled WGS data and was validated using simulated datasets, pooled samples of reference materials, and pooled clinical samples collected during key periods of the Delta and Omicron variant emergence. The approach was evaluated for its accuracy in estimating variant abundance at both the Phylogenetic Assignment of Named Global Outbreak (PANGO) lineage level and the World Health Organization (WHO) variant level. From the simulation datasets, the method achieved an overall sensitivity of 99.1% and a positive predictive value (PPV) of 99.9% for detecting SARS-CoV-2 variants at the WHO variant level. At the PANGO lineage level, it achieved an overall sensitivity of 82.8% and a PPV of 77.4% when a predicted lineage was considered accurate if it shared more than 90% of markers with any true lineage present in the pooled sample. The accuracy of variant abundance estimation was further validated using pooled samples of reference materials. Analysis of pooled clinical samples showed results consistent with national epidemiological trends, particularly during the emergence of the Delta and Omicron variants in Korea. This pooled WGS-based genomic surveillance strategy offers a scalable and economical solution for monitoring SARS-CoV-2 variants, providing public health authorities with a valuable tool for tracking pandemic dynamics and enabling timely responses.
Similar content being viewed by others
Introduction
Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), the causative agent of COVID-19, was first identified in Wuhan City, China, prompting the World Health Organization (WHO) to declare a global pandemic on March 11, 2020. As of December 2022, according to the WHO (https://covid19.who.int/), there have been more than 647 million confirmed cases of COVID-19 worldwide, with more than 6.6 million deaths. The rapid emergence of novel SARS-CoV-2 variants has displaced earlier strains, underscoring the importance of continuous surveillance to monitor their evolution and spread. Genomic surveillance plays a crucial role in understanding the virus’s genetic diversity and informing public health responses aimed at mitigating transmission and reducing the clinical impact of COVID-191.
Whole genome sequencing (WGS) has proven to be an essential tool in monitoring the genetic evolution of SARS-CoV-2 and identifying emerging variants2,3. WGS allows for the detection of mutations across the entire viral genome, particularly in key structural proteins like spike (S), membrane (M), and envelope (E), which can affect viral transmission and immune escape. In addition to WGS, tools such as Pangolin (Phylogenetic Assignment of Named Global Outbreak Lineages) are widely used to classify SARS-CoV-2 lineages by analyzing mutation patterns in viral genomes4,5. GISAID6 and Nextstrain7 facilitate global data sharing and real-time variant tracking, enabling timely surveillance.
Sample pooling is a promising strategy for genomic surveillance, with the potential to significantly reduce both the time and costs associated with WGS by combining multiple individual specimens into a single sequencing library. While several studies have investigated pooling strategies for SARS-CoV-2 detection, most have focused on PCR-based methods8,9. However, the application of pooled WGS for SARS-CoV-2 variants surveillance remains underexplored.
In this study, we present a bioinformatics pipeline for SARS-CoV-2 genomic surveillance using pooled WGS. Our approach accurately estimates the abundance of SARS-CoV-2 variants in pooled samples, eliminating the need for individual sample barcoding. We first validated the bioinformatics pipeline using simulated NGS datasets derived from GISAID genome sequences. We then applied the complete workflow, encompassing both wet-lab experiments and bioinformatics analyses, to real NGS datasets from pooled samples containing multiple reference materials. Finally, we applied our method to pooled clinical samples to estimate the abundance of SARS-CoV-2 variant, comparing the results with national epidemiological data collected during the same period of clinical sample collection. This approach also enables the identification of emerging novel SARS-CoV-2 mutations circulating in communities, providing a valuable resource for public health authorities to monitor and address the pandemic effectively.
Materials and methods
Identification of mutations associated with SARS-CoV-2 variants
To identify mutations associated with SARS-CoV-2 variants, we analyzed COVID-19 genome sequences and corresponding metadata obtained from GISAID. Lineage information for each genome sequence was extracted from the metadata, focusing on lineages with at least 10 high-quality genome sequences available. High-quality genome sequences were defined as those longer than 29,000 bases and containing more than 99% non-ambiguous bases (i.e., A/C/G/T).
For each lineage, a lineage-specific FASTA file containing the relevant genome sequences was created. These FASTA files were compared against the reference genome (NC_045512.2) to identify mutations associated with each lineage. In cases where a lineage contained more than 500 genome sequences, we randomly subsampled up to 500 genome sequences. Minimap210 was used to align the lineage-specific FASTA files to the reference genome, followed by mutation identification using VarScan211. Detected mutations were annotated using the Variant Effect Predictor (VEP)12. Mutations with a Variant Allele Fraction (VAF) greater than 85% were considered lineage-associated mutations, as detailed in Supplementary Table 1.
Generation of simulated pooled NGS datasets
To benchmark the pipeline under controlled conditions, we generated simulated pooled datasets with predefined lineage compositions and known abundance ratios. All genome sequences used to create these simulated datasets were obtained from GISAID. The simulated pooled NGS datasets were generated with the following systematic procedure:
-
1.
Selection of Genome Sequences: Ten genome sequences were randomly selected from a dataset corresponding to a specific GISAID submission date.
-
2.
Assignment of Abundance Ratios: Abundance ratios for each genome sequence were randomly assigned within the range of 0.1 and 0.2, derived from a uniform distribution.
-
3.
Normalization of Abundance Ratios: The assigned abundance ratios were normalized to ensure their sum equaled 1.
-
4.
Read Generation using InSilicoSeq (ISS): Utilizing ISS v1.6.013, one million paired-end reads were simulated based on the selected genome sequences and their normalized abundance ratios.
This simulation focused on two critical periods: May to July 2021 (Delta variant spread) and November 2021 to January 2022 (Omicron variant emergence). From each selected submission date, 35 pooled NGS datasets were simulated, resulting in a total of 1,120 datasets across 32 submission dates.
Bioinformatic pipeline for estimating the abundance of SARS-CoV-2 lineages in pooled NGS data
To estimate the abundance of SARS-CoV-2 lineages in pooled NGS data, a dedicated bioinformatics pipeline was developed, consisting of the following key steps:
-
1.
Reads Alignment: The pooled NGS data were aligned to the SARS-CoV-2 reference genome (NC_045512.2) using the bwa-mem algorithm14.
-
2.
Post-Processing: Picard tools were employed to mark duplicated reads and sort the resulting BAM files.
-
3.
Mutation Calling: Mutations were identified using VarScan2 and subsequently annotated with the Variant Effect Predictor (VEP).
-
4.
Lineage Candidate Identification: To identify potential lineages, a lineage-specific marker coverage ratio was calculated by dividing the number of detected markers by the total markers associated with each lineage. Lineages with a coverage ratio exceeding a predefined threshold (default set to 0.7), and at least five detected markers were selected for further analysis.
-
5.
Initial Lineage Grouping: Lineages sharing identical sets of detected markers were grouped. For each group, the lineage with the highest marker coverage ratio was chosen. In cases where more specific sub-lineages (e.g., B.1.1. versus B.1.1.411) shared the same marker coverage ratio, only the broader, more general lineage was retained. However, in certain cases, multiple lineages may still be retained within the group.
-
6.
Clustering of Lineage Groups Based on Marker Similarity:
-
A.
Pairwise comparisons were performed to calculate the percentage of shared detected markers between lineage groups, based on the intersection-over-union (IoU) of detected markers.
-
B.
These relationships were represented in a graph, connecting nodes (representing lineage groups) that shared at least 90% of detected markers.
-
C.
Connected components in the graph were identified, with each representing a cluster of lineage groups sharing similar detected markers.
-
A.
-
7.
Creation of Cluster-Marker Association Matrix:
-
A.
Markers were aggregated for each cluster, representing all member lineage groups within the cluster.
-
B.
An association matrix was constructed with clusters as rows and selected markers as columns, indicating the presence1 or absence (0) of markers within each cluster.
-
A.
-
8.
Iterative Estimation of Cluster Fractions:
-
A.
Initial clusters were identified if they contained more than a threshold number of private markers (default set to five) associated exclusively with other clusters.
-
B.
An optimization algorithm (from the scipy.optimize package) was applied to estimate the fraction of each selected cluster in the sample. Detailed procedures are provided in the Supplementary Text.
-
C.
Clusters were iteratively added until the inclusion of an additional cluster no longer significantly improved the squared error sum, with the improvement ratio threshold set at 2%.
-
A.
-
9.
Lineage Annotation and Result Compilation: Each cluster identified through the above steps was annotated with the corresponding lineages, and the final results were compiled for presentation.
The results of this workflow are illustrated in Fig. 1.
Example of Result for COVID-19 Variant Abundance Estimation and Mutation Detection through Pooled NGS Data Analysis. This figure shows the outcomes of analyzing pooled NGS data. It provides the estimated abundance for each cluster of lineages and lists all detected mutations, each with its corresponding lineage associations.
Reference panels: pooled samples of reference materials
We prepared pooled samples (panels) of reference materials with a constant SARS-CoV-2 RNA load, using the AMPLIRUN® SARS-CoV-2 RNA Control (Vircell SA, Granada, Spain) as reference material. To construct panels, we utilized five RNA controls (P.1, B.1.351, B.1.1.7, B.1.617.2, and Omicron; Vircell SA, Granada, Spain), which were serially diluted to obtain 2,000 copies per 1 \(\:\mu\:L\). A total of six panels were created, each consisting of 10 spiked reference material samples with a concentration of 2,000 copies/\(\:\mu\:L\) (Supplementary Table 2). The absolute copies/\(\:\mu\:L\) of diluted samples were measured using digital droplet PCR (ddPCR) with open reading frame 1 (ORF1) gene primer/probe15.
Validation panels: pooled samples of clinical samples
Upper respiratory tract samples were collected from COVID-19 patients (n = 100) from December 2021 to January 2022, coinciding with the introduction of the Omicron variant. These samples were collected to reflect the epidemic trend and to evaluate the surveillance ability for new emerging variants in Korea. The clinical samples were stored at − 80 °C using a universal transport medium (Becton Dickinson, Sparks, MD, USA). Viral RNA was isolated from the clinical samples using the QIAamp viral RNA extraction kit (Qiagen, Hilden, Germany).
Ten validation panels were created by combining 10 individual samples from the collection of 100 patients (Supplementary Table 3). The SARS-CoV-2 concentration in each patient sample was adjusted to achieve a dilution of 2,000 copies per microliter (\(\:\mu\:L)\), based on quantitative calculations performed with droplet digital PCR (ddPCR).
Normalization of samples for pooling
Clinical samples with a Ct value of less than 20, the median for positive clinical samples, were expected to contain more than 10,000 viral copies15. Therefore, the samples were diluted 100-fold prior to performing ddPCR (Supplementary Table 3). For sample pooling, the SARS-CoV-2 copy number in each component sample was adjusted to approximately 2,000 copies per microliter. Additionally, we compared the results from real-time RT-PCR and ddPCR and applied a linear regression equation (Supplementary Fig. 1) to derive dilution factors. This enabled the direct normalization of samples for pooling based on Ct values from real-time RT-PCR.
Droplet digital PCR
Digital PCR reactions were performed with QX200 Droplet Digital PCR System and Custom ddPCR assay (Bio-Rad Laboratories, Hercules, CA, USA). We used specific primers and probes for ORF1 gene primer/probe15. The 20 \(\:\mu\:L\) PCR mix was composed of 5 \(\:\mu\:L\) Bio-Rad ddPCR Supermix, 2 \(\:\mu\:L\) of reverse transcriptase, 1 \(\:\mu\:L\) of 300mM DTT, 1 \(\:\mu\:L\) of each amplification primer/probe mix, 1 \(\:\mu\:L\) of nucleic acids templets and 10 \(\:\mu\:L\) of nuclease-free water. Droplets were then thermally cycled as follows: one cycle of 50 °C for 60 min and 95 °C for 10 min, followed by 40 cycles of 95 °C for 30 s, 60 °C for 1 min, followed by 98 °C for 10 min and 4 °C for 30 min. A temperature ramp of 1.5 °C was fixed on all PCR steps, and the lid was heated at 105 °C. Following PCR, the 96-well plate was read in the QX200 Droplet Reader (Bio-Rad). The absolute copy number of ORF1a gene of SARS-CoV-2 was calculated using the QuantaSoft analysis software (Bio-Rad).
Real-time reverse-transcription PCR
Extracted nucleic acid samples were tested for comparative analysis of SARS-CoV-2 by PowerChek™ SARS-CoV-2 Real-time PCR Kit (Kogene biotech, Seoul, Korea). The qRT-PCR was performed with a CFX 96 touch real-time PCR detection system (Bio-rad, Hercules, CA, USA), applying the conditions according to the manufacturer’s instructions. The median Ct value of ORF1gene of clinical sample for validation panels is approximately 19.9 (range:15.1 ~ 24.9) (Supplementary Table 3).
NGS assay
The panels of 10 pooled samples for SARS-CoV-2 variants surveillance were evaluated using the Ion AmpliSeq SARS-CoV-2 Research Panel (ThermoFisher Scientific) in accordance with the manufacturer’s instructions. Ion AmpliSeq SARS-CoV-2 Research Panel was designed for sequencing the whole genomes of SARS-CoV-2. cDNA was synthesized using the Ion Torrent NGS Reverse Transcription Kit-cDNA synthesis kit (ThermoFisher Scientific). Sequencing-ready libraries were created from the prepared cDNA using the Ion AmpliSeq Library Kit Plus (ThermoFisher Scientific). Sequencing was carried out on the ThermoFisher Ion S5 XL Sequencer with an Ion 530 Chip (ThermoFisher Scientific).
Results
Evaluation of performance at the PANGO lineage level using simulated NGS datasets
The comparison between true lineages and predicted lineages from the simulated pooled NGS dataset presents a significant challenge due to the extensive diversity and phylogeny of SARS-CoV-2 variants5,16. To assess sensitivity at the lineage level, we employed two strategies.
The first strategy involved assessing whether a true lineage was included among the inferred lineages from a pooled sample. This approach yielded a sensitivity of 69.7% (4,471 out of 6,417). Alternatively, we employed a second strategy where a predicted lineage was considered accurate if it shared considerable markers with the true lineage. For this alternative approach, we defined two lineages A and B as sharing considerable markers if the ratio between the number of intersection of markers of A and B and the number of union of markers of A and B is greater than 90%. This alternative approach resulted in a sensitivity of 82.8% (5,314 out of 6,417).
Furthermore, for the inclusion of a predicted lineage among the true lineages of a pooled sample, the PPV was 54.1% (4,471 out of 8,258). From the approximate match, the PPV was improved to 77.4% (6,390/8,258). The overall performance is summarized in Fig. 2.
Lineage-Level Performance in COVID-19 Variant Detection. This figure shows the detection performance for key PANGO lineages associated with WHO COVID-19 variants, indicating both exact (superscript a) and approximate (superscript b) match outcomes for Detected Number, Sensitivity, and PPV (Positive Predictive Value). An approximate match is identified by a shared marker ratio exceeding 90%. The figure highlights the efficiency of detecting lineages such as Alpha (B.1.1.7), Delta (AY.4), and Omicron (BA.1) variants, demonstrating the system’s adaptability to both precise and nearly matching genetic sequences.
Evaluation of performance at the WHO variant level using simulated NGS datasets
We then aggregated the true abundance of PANGO lineages into WHO Variant levels. For WHO-designated Variants of Concern (VOC) and Variants of Interest (VOI), the overall sensitivity was 99.1% (2,184 out of 2,204), and the PPV was 99.9% (2,184 out of 2,185). Additionally, we examined the correlation between the true abundance and estimated abundance for the major WHO Variants. The correlation exceeded 0.95 for all major WHO variants. The detailed performance and correlation are depicted in Fig. 3. Additionally, we explored the trends in the estimated abundance of pooled samples over the study period, as shown in Fig. 4. This analysis provided insights into the prevalence and dynamics of different WHO variants within the study population, reflecting the effectiveness of our surveillance strategy in tracking the evolution and spread of SARS-CoV-2 variants.
Performance Evaluation of WHO Variant Level COVID-19 Variation Detection. This figure presents a comprehensive analysis of the sensitivity and positive predictive value (PPV) for the detection of various COVID-19 variants, including Alpha, Beta, Delta, Epsilon, Eta, GH/490R, Gamma, Iota, Kappa, Lambda, Mu, Omicron, and Zeta. The assessment was based on a total of 1120 tests, with the count of True Positives (TP), False Negatives (FN), and False Positives (FP) summarized for each variant. Sensitivity and PPV are depicted for each variant, highlighting the high accuracy and reliability of the detection method across the spectrum of COVID-19 variants, with most variants showing near-perfect sensitivity and PPV rates.
Temporal Trends in WHO COVID-19 Variant Prevalence Through Simulated Pooled NGS Data. This figure shows estimated fractions of WHO-recognized COVID-19 variants from simulated pooled NGS, including Lambda, Mu, Omicron, GH/490R, Iota, Kappa, Epsilon, Eta, Gamma, Alpha, Beta, Delta, and Zeta. The simulation spans several months, illustrating the dynamic shifts in variant prevalence. Each variant’s estimated fraction is plotted over time, highlighting the changing landscape of COVID-19 variants.
Reference panel and clinical validation
We evaluated the whole workflow by using six reference panels mixed with different WHO variants at desired ratios, as shown in the left bar (REAL) of Fig. 5. The right bar (Estimated) represents the predicted abundance obtained by analyzing each pooled NGS dataset from the reference panel.
Reference Panel Evaluation for COVID-19 Variant Detection. This figure illustrates the performance of six reference panels in accurately estimating the fraction of WHO-designated COVID-19 variants, including Alpha, Beta, Delta, Gamma, and Omicron. For each panel, the figure compares real versus estimated values, showing the precision of detection across six different panels. It highlights the ability of the reference panels to closely estimate the actual fraction of variants.
To evaluate the monitoring capability of our method for detecting new variants and the domestic spread of SARS-CoV-2, we collected specimens from COVID-19 patients (n = 100) from December 2021 to January 2022, reflecting the recent trends of the epidemic. This period was selected as it coincided with the emergence of the Omicron variant in Korea. The left pane of Fig. 6 displays National prevalence in Korea for the Delta and Omicron variants during December 2021 to January 2022. The right pane of Fig. 3 illustrates the estimated prevalence of Delta and Omicron from the 100 clinical samples. For the former 4 weeks, we sequenced 20 samples with two pools for each week, for the latter 2 weeks, we sequenced 10 samples with one pool for each week.
Analysis of Clinical Samples for COVID-19 Variant Prevalence. This figure demonstrates the prevalence of COVID-19 variants Omicron and Delta over a series of weeks, contrasting pooled clinical samples with official South Korean national prevalence data. It visually represents the proportion of each variant detected, emphasizing the shifting dominance of variants over the observe period. The side-by-side comparison with national statistics highlights the dynamic nature of the pandemic’s evolution within South Korea, reinforcing the essential role of ongoing surveillance in guiding public health strategies and interventions.
Discussion
The global public health threat posed by emerging and re-emerging infectious diseases highlights the urgent need for advanced diagnostic and surveillance technologies that facilitate timely detection, monitoring, and outbreak control17,18,19,20,21.
In this study, we implemented a pooling strategy that allows up to ten individual samples to be combined into a single sequencing library. This approach facilitates the collection of unbiased samples and significantly reduces the labor, time, and resource requirements associated with processing numerous individual samples, which is particularly beneficial in large-scale surveillance settings. Furthermore, by enabling more efficient and cost-effective genomic surveillance, this approach can assist in identifying and monitoring new variants of the virus, thereby informing public health strategies to mitigate the transmission and clinical impact of COVID-19. Given the continuous evolution of SARS-CoV-2, regular updates to the lineage markers remain essential.
Compared to other pooled testing strategies, particularly those using PCR, our approach offers several distinct advantages. Primarily, by utilizing WGS, it enables the detection of a comprehensive range of mutations and provides more detailed genetic information. Because WGS captures the full mutation profile of pooled samples, it is more effective in identifying diverse and novel SARS-CoV-2 variants. This capability is especially valuable for epidemiological analyses and for guiding public health responses to newly emerging variants.
WGS-based surveillance has demonstrated substantial public health benefits in real-world applications. Notably, the U.S. CDC’s Traveler-based Genomic Surveillance program (TGS) at international airports effectively employs WGS to rapidly detect and monitor emerging SARS-CoV-2 variants, highlighting its practical value in facilitating timely public health interventions22. Large-scale genomic surveillance programs require highly efficient and streamlined workflows. Although several high-capacity multiplexed WGS-based approaches, such as COVseq23, have been developed, our pooling method eliminates the need for individual sample barcoding, thereby simplifying the workflow and further reducing resource requirements. Pooling strategies potentially offer an additional axis of optimization to enhance efficiency and scalability in surveillance programs.
Nevertheless, this study has several limitations. First, the reliance on simulated datasets might not fully represent the complexity and heterogeneity inherent in real-world clinical samples. Second, although pooling enhances surveillance efficiency, it may inadvertently obscure the detection of variants present at low frequencies. Third, the pipeline’s performance could vary depending on sequencing depth, viral load, and sample quality. Future research should assess the robustness of this method across diverse epidemiological contexts and laboratory conditions to confirm its broader applicability.
Data availability
The sequencing data from the clinical samples used in this study are publicly available in the Sequence Read Archive (SRA) under the accession number PRJNA1175420.
References
Ling-Hu, T., Rios-Guzman, E., Lorenzo-Redondo, R., Ozer, E. A. & Hultquist, J. F. Challenges and opportunities for global genomic surveillance strategies in the COVID-19 era. Viruses 14(11). (2022).
Wolf, J. M., Wolf, L. M., Bello, G. L., Maccari, J. G. & Nasi, L. A. Molecular evolution of SARS-CoV-2 from December 2019 to August 2022. J. Med. Virol. 95 (1), e28366 (2023).
Ma, K. C. et al. Genomic surveillance for SARS-CoV-2 variants: circulation of Omicron Lineages - United States, January 2022-May 2023. MMWR Morb Mortal. Wkly. Rep. 72 (24), 651–656 (2023).
O’Toole, Á. et al. Assignment of epidemiological lineages in an emerging pandemic using the Pangolin tool. Virus Evol. 7 (2), veab064 (2021).
Rambaut, A. et al. A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology. Nat. Microbiol. 5 (11), 1403–1407 (2020).
Shu, Y. & McCauley, J. GISAID: global initiative on sharing all influenza data - from vision to reality. Euro. Surveill 22(13). (2017).
Hadfield, J. et al. Nextstrain: real-time tracking of pathogen evolution. Bioinformatics 34 (23), 4121–4123 (2018).
Mahmoud, S. A. et al. Evaluation of pooling of samples for testing SARS-CoV- 2 for mass screening of COVID-19. BMC Infect. Dis. 21 (1), 360 (2021).
Barak, N. et al. Lessons from applied large-scale pooling of 133,816 SARS-CoV-2 RT-PCR tests. Sci. Transl Med. 13, 589 (2021).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34 (18), 3094–3100 (2018).
Koboldt, D. C. et al. VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res. 22 (3), 568–576 (2012).
McLaren, W. et al. The ensembl variant effect predictor. Genome Biol. 17. (2016).
Gourlé, H., Karlsson-Lindsjö, O., Hayer, J. & Bongcam-Rudloff, E. Simulating illumina metagenomic data with insilicoseq. Bioinformatics 35 (3), 521–522 (2019).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25 (14), 1754–1760 (2009).
Jung, Y. et al. Comparative analysis of Primer-Probe sets for RT-qPCR of COVID-19 causative virus (SARS-CoV-2). ACS Infect. Dis. 6 (9), 2513–2523 (2020).
O’Toole, Á., Pybus, O. G., Abram, M. E., Kelly, E. J. & Rambaut, A. Pango lineage designation and assignment using SARS-CoV-2 Spike gene nucleotide sequences. BMC Genom. 23 (1), 121 (2022).
Charostad, J. et al. A comprehensive review of highly pathogenic avian influenza (HPAI) H5N1: an imminent threat at doorstep. Travel Med. Infect. Dis. 55, 102638 (2023).
Karami, H. & Salehi-Vaziri, M. Dengue fever in Iran: an emerging threat requiring urgent public health interventions. New. Microbes New. Infect. 62, 101535 (2024).
Kumar, S., Guruparan, D., Karuppanan, K. & Kumar, K. J. S. Comprehensive insights into Monkeypox (mpox): recent advances in epidemiology, diagnostic approaches and therapeutic strategies. Pathogens 14(1). (2024).
Letafati, A., Salahi Ardekani, O., Karami, H. & Soleimani, M. Ebola virus disease: A narrative review. Microb. Pathog. 181, 106213 (2023).
Masmejan, S. et al. Zika Virus Pathogens 9(11). (2020).
Friedman, C., Morfino, R. & Ernst, E. Leveraging a strategic Public–Private partnership to launch an Airport-Based pathogen monitoring program to detect emerging health threats. Emerg. Infect. Disease J. ;31(13). (2025).
Simonetti, M. et al. COVseq is a cost-effective workflow for mass-scale SARS-CoV-2 genomic surveillance. Nat. Commun. 12 (1), 3903 (2021).
Acknowledgements
This research was supported by faculty research grants from Yonsei University College of Medicine [grant numbers: 6-2020-0129].
Author information
Authors and Affiliations
Contributions
IP, YK, KL wrote a manuscript. IP implements BI pipeline for the analysis. IP, YK curate and analyze the data. MK, YK, KL designed the study.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethical statement
The study was approved by the Institutional Review Board of Gangnam Severance Hospital (IRB number 3-2022-0459), and adhered to the clinical practice guidelines of the Declaration of Helsinki (2013 amendment).
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Electronic supplementary material
Below is the link to the electronic supplementary material.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Park, I., Kim, Y., Choi, M.H. et al. Genomic surveillance of SARS-CoV-2 variants using pooled WGS. Sci Rep 15, 13948 (2025). https://doi.org/10.1038/s41598-025-99201-7
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41598-025-99201-7