Genomic surveillance of SARS-CoV-2 variants using pooled WGS

Park, Inho; Kim, Yoonjung; Choi, Min Hyuk; Lee, Kyung-A

doi:10.1038/s41598-025-99201-7

Download PDF

Article
Open access
Published: 22 April 2025

Genomic surveillance of SARS-CoV-2 variants using pooled WGS

Inho Park^1,2^na1,
Yoonjung Kim³^na1,
Min Hyuk Choi³ &
…
Kyung-A Lee^3,4

Scientific Reports volume 15, Article number: 13948 (2025) Cite this article

1815 Accesses
1 Altmetric
Metrics details

Subjects

Abstract

This study presents the development and validation of a genomic surveillance strategy using Whole Genome Sequencing (WGS) on normalized pooled samples to detect and monitor SARS-CoV-2 variants. A bioinformatics pipeline was designed specifically for analyzing pooled WGS data and was validated using simulated datasets, pooled samples of reference materials, and pooled clinical samples collected during key periods of the Delta and Omicron variant emergence. The approach was evaluated for its accuracy in estimating variant abundance at both the Phylogenetic Assignment of Named Global Outbreak (PANGO) lineage level and the World Health Organization (WHO) variant level. From the simulation datasets, the method achieved an overall sensitivity of 99.1% and a positive predictive value (PPV) of 99.9% for detecting SARS-CoV-2 variants at the WHO variant level. At the PANGO lineage level, it achieved an overall sensitivity of 82.8% and a PPV of 77.4% when a predicted lineage was considered accurate if it shared more than 90% of markers with any true lineage present in the pooled sample. The accuracy of variant abundance estimation was further validated using pooled samples of reference materials. Analysis of pooled clinical samples showed results consistent with national epidemiological trends, particularly during the emergence of the Delta and Omicron variants in Korea. This pooled WGS-based genomic surveillance strategy offers a scalable and economical solution for monitoring SARS-CoV-2 variants, providing public health authorities with a valuable tool for tracking pandemic dynamics and enabling timely responses.

Phylogenomics reveals viral sources, transmission, and potential superinfection in early-stage COVID-19 patients in Ontario, Canada

Article Open access 12 February 2021

Rapid monitoring of SARS-CoV-2 variants of concern through high-resolution melt analysis

Article Open access 07 December 2023

Host genomics of SARS-CoV-2 infection

Article Open access 30 June 2022

Introduction

Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), the causative agent of COVID-19, was first identified in Wuhan City, China, prompting the World Health Organization (WHO) to declare a global pandemic on March 11, 2020. As of December 2022, according to the WHO (https://covid19.who.int/), there have been more than 647 million confirmed cases of COVID-19 worldwide, with more than 6.6 million deaths. The rapid emergence of novel SARS-CoV-2 variants has displaced earlier strains, underscoring the importance of continuous surveillance to monitor their evolution and spread. Genomic surveillance plays a crucial role in understanding the virus’s genetic diversity and informing public health responses aimed at mitigating transmission and reducing the clinical impact of COVID-19¹.

Whole genome sequencing (WGS) has proven to be an essential tool in monitoring the genetic evolution of SARS-CoV-2 and identifying emerging variants^2,3. WGS allows for the detection of mutations across the entire viral genome, particularly in key structural proteins like spike (S), membrane (M), and envelope (E), which can affect viral transmission and immune escape. In addition to WGS, tools such as Pangolin (Phylogenetic Assignment of Named Global Outbreak Lineages) are widely used to classify SARS-CoV-2 lineages by analyzing mutation patterns in viral genomes^4,5. GISAID⁶ and Nextstrain⁷ facilitate global data sharing and real-time variant tracking, enabling timely surveillance.

Sample pooling is a promising strategy for genomic surveillance, with the potential to significantly reduce both the time and costs associated with WGS by combining multiple individual specimens into a single sequencing library. While several studies have investigated pooling strategies for SARS-CoV-2 detection, most have focused on PCR-based methods^8,9. However, the application of pooled WGS for SARS-CoV-2 variants surveillance remains underexplored.

In this study, we present a bioinformatics pipeline for SARS-CoV-2 genomic surveillance using pooled WGS. Our approach accurately estimates the abundance of SARS-CoV-2 variants in pooled samples, eliminating the need for individual sample barcoding. We first validated the bioinformatics pipeline using simulated NGS datasets derived from GISAID genome sequences. We then applied the complete workflow, encompassing both wet-lab experiments and bioinformatics analyses, to real NGS datasets from pooled samples containing multiple reference materials. Finally, we applied our method to pooled clinical samples to estimate the abundance of SARS-CoV-2 variant, comparing the results with national epidemiological data collected during the same period of clinical sample collection. This approach also enables the identification of emerging novel SARS-CoV-2 mutations circulating in communities, providing a valuable resource for public health authorities to monitor and address the pandemic effectively.

Materials and methods

Identification of mutations associated with SARS-CoV-2 variants

To identify mutations associated with SARS-CoV-2 variants, we analyzed COVID-19 genome sequences and corresponding metadata obtained from GISAID. Lineage information for each genome sequence was extracted from the metadata, focusing on lineages with at least 10 high-quality genome sequences available. High-quality genome sequences were defined as those longer than 29,000 bases and containing more than 99% non-ambiguous bases (i.e., A/C/G/T).

For each lineage, a lineage-specific FASTA file containing the relevant genome sequences was created. These FASTA files were compared against the reference genome (NC_045512.2) to identify mutations associated with each lineage. In cases where a lineage contained more than 500 genome sequences, we randomly subsampled up to 500 genome sequences. Minimap2¹⁰ was used to align the lineage-specific FASTA files to the reference genome, followed by mutation identification using VarScan2¹¹. Detected mutations were annotated using the Variant Effect Predictor (VEP)¹². Mutations with a Variant Allele Fraction (VAF) greater than 85% were considered lineage-associated mutations, as detailed in Supplementary Table 1.

Generation of simulated pooled NGS datasets

To benchmark the pipeline under controlled conditions, we generated simulated pooled datasets with predefined lineage compositions and known abundance ratios. All genome sequences used to create these simulated datasets were obtained from GISAID. The simulated pooled NGS datasets were generated with the following systematic procedure:

1.
Selection of Genome Sequences: Ten genome sequences were randomly selected from a dataset corresponding to a specific GISAID submission date.
2.
Assignment of Abundance Ratios: Abundance ratios for each genome sequence were randomly assigned within the range of 0.1 and 0.2, derived from a uniform distribution.
3.
Normalization of Abundance Ratios: The assigned abundance ratios were normalized to ensure their sum equaled 1.
4.
Read Generation using InSilicoSeq (ISS): Utilizing ISS v1.6.0¹³, one million paired-end reads were simulated based on the selected genome sequences and their normalized abundance ratios.

This simulation focused on two critical periods: May to July 2021 (Delta variant spread) and November 2021 to January 2022 (Omicron variant emergence). From each selected submission date, 35 pooled NGS datasets were simulated, resulting in a total of 1,120 datasets across 32 submission dates.

Bioinformatic pipeline for estimating the abundance of SARS-CoV-2 lineages in pooled NGS data

To estimate the abundance of SARS-CoV-2 lineages in pooled NGS data, a dedicated bioinformatics pipeline was developed, consisting of the following key steps:

1.
Reads Alignment: The pooled NGS data were aligned to the SARS-CoV-2 reference genome (NC_045512.2) using the bwa-mem algorithm¹⁴.
2.
Post-Processing: Picard tools were employed to mark duplicated reads and sort the resulting BAM files.
3.
Mutation Calling: Mutations were identified using VarScan2 and subsequently annotated with the Variant Effect Predictor (VEP).
4.
Lineage Candidate Identification: To identify potential lineages, a lineage-specific marker coverage ratio was calculated by dividing the number of detected markers by the total markers associated with each lineage. Lineages with a coverage ratio exceeding a predefined threshold (default set to 0.7), and at least five detected markers were selected for further analysis.
5.
Initial Lineage Grouping: Lineages sharing identical sets of detected markers were grouped. For each group, the lineage with the highest marker coverage ratio was chosen. In cases where more specific sub-lineages (e.g., B.1.1. versus B.1.1.411) shared the same marker coverage ratio, only the broader, more general lineage was retained. However, in certain cases, multiple lineages may still be retained within the group.
6.
Clustering of Lineage Groups Based on Marker Similarity:
1. A.
  Pairwise comparisons were performed to calculate the percentage of shared detected markers between lineage groups, based on the intersection-over-union (IoU) of detected markers.
2. B.
  These relationships were represented in a graph, connecting nodes (representing lineage groups) that shared at least 90% of detected markers.
3. C.
  Connected components in the graph were identified, with each representing a cluster of lineage groups sharing similar detected markers.
7.
Creation of Cluster-Marker Association Matrix:
1. A.
  Markers were aggregated for each cluster, representing all member lineage groups within the cluster.
2. B.
  An association matrix was constructed with clusters as rows and selected markers as columns, indicating the presence¹ or absence (0) of markers within each cluster.
8.
Iterative Estimation of Cluster Fractions:
1. A.
  Initial clusters were identified if they contained more than a threshold number of private markers (default set to five) associated exclusively with other clusters.
2. B.
  An optimization algorithm (from the scipy.optimize package) was applied to estimate the fraction of each selected cluster in the sample. Detailed procedures are provided in the Supplementary Text.
3. C.
  Clusters were iteratively added until the inclusion of an additional cluster no longer significantly improved the squared error sum, with the improvement ratio threshold set at 2%.
9.
Lineage Annotation and Result Compilation: Each cluster identified through the above steps was annotated with the corresponding lineages, and the final results were compiled for presentation.

The results of this workflow are illustrated in Fig. 1.

Reference panels: pooled samples of reference materials

We prepared pooled samples (panels) of reference materials with a constant SARS-CoV-2 RNA load, using the AMPLIRUN^® SARS-CoV-2 RNA Control (Vircell SA, Granada, Spain) as reference material. To construct panels, we utilized five RNA controls (P.1, B.1.351, B.1.1.7, B.1.617.2, and Omicron; Vircell SA, Granada, Spain), which were serially diluted to obtain 2,000 copies per 1 \(\:\mu\:L\). A total of six panels were created, each consisting of 10 spiked reference material samples with a concentration of 2,000 copies/\(\:\mu\:L\) (Supplementary Table 2). The absolute copies/\(\:\mu\:L\) of diluted samples were measured using digital droplet PCR (ddPCR) with open reading frame 1 (ORF1) gene primer/probe¹⁵.

Validation panels: pooled samples of clinical samples

Upper respiratory tract samples were collected from COVID-19 patients (n = 100) from December 2021 to January 2022, coinciding with the introduction of the Omicron variant. These samples were collected to reflect the epidemic trend and to evaluate the surveillance ability for new emerging variants in Korea. The clinical samples were stored at − 80 °C using a universal transport medium (Becton Dickinson, Sparks, MD, USA). Viral RNA was isolated from the clinical samples using the QIAamp viral RNA extraction kit (Qiagen, Hilden, Germany).

Ten validation panels were created by combining 10 individual samples from the collection of 100 patients (Supplementary Table 3). The SARS-CoV-2 concentration in each patient sample was adjusted to achieve a dilution of 2,000 copies per microliter (\(\:\mu\:L)\), based on quantitative calculations performed with droplet digital PCR (ddPCR).

Normalization of samples for pooling

Clinical samples with a Ct value of less than 20, the median for positive clinical samples, were expected to contain more than 10,000 viral copies¹⁵. Therefore, the samples were diluted 100-fold prior to performing ddPCR (Supplementary Table 3). For sample pooling, the SARS-CoV-2 copy number in each component sample was adjusted to approximately 2,000 copies per microliter. Additionally, we compared the results from real-time RT-PCR and ddPCR and applied a linear regression equation (Supplementary Fig. 1) to derive dilution factors. This enabled the direct normalization of samples for pooling based on Ct values from real-time RT-PCR.

Droplet digital PCR

Digital PCR reactions were performed with QX200 Droplet Digital PCR System and Custom ddPCR assay (Bio-Rad Laboratories, Hercules, CA, USA). We used specific primers and probes for ORF1 gene primer/probe¹⁵. The 20 \(\:\mu\:L\) PCR mix was composed of 5 \(\:\mu\:L\) Bio-Rad ddPCR Supermix, 2 \(\:\mu\:L\) of reverse transcriptase, 1 \(\:\mu\:L\) of 300mM DTT, 1 \(\:\mu\:L\) of each amplification primer/probe mix, 1 \(\:\mu\:L\) of nucleic acids templets and 10 \(\:\mu\:L\) of nuclease-free water. Droplets were then thermally cycled as follows: one cycle of 50 °C for 60 min and 95 °C for 10 min, followed by 40 cycles of 95 °C for 30 s, 60 °C for 1 min, followed by 98 °C for 10 min and 4 °C for 30 min. A temperature ramp of 1.5 °C was fixed on all PCR steps, and the lid was heated at 105 °C. Following PCR, the 96-well plate was read in the QX200 Droplet Reader (Bio-Rad). The absolute copy number of ORF1a gene of SARS-CoV-2 was calculated using the QuantaSoft analysis software (Bio-Rad).

Real-time reverse-transcription PCR

Extracted nucleic acid samples were tested for comparative analysis of SARS-CoV-2 by PowerChek™ SARS-CoV-2 Real-time PCR Kit (Kogene biotech, Seoul, Korea). The qRT-PCR was performed with a CFX 96 touch real-time PCR detection system (Bio-rad, Hercules, CA, USA), applying the conditions according to the manufacturer’s instructions. The median Ct value of ORF1gene of clinical sample for validation panels is approximately 19.9 (range:15.1 ~ 24.9) (Supplementary Table 3).

NGS assay

The panels of 10 pooled samples for SARS-CoV-2 variants surveillance were evaluated using the Ion AmpliSeq SARS-CoV-2 Research Panel (ThermoFisher Scientific) in accordance with the manufacturer’s instructions. Ion AmpliSeq SARS-CoV-2 Research Panel was designed for sequencing the whole genomes of SARS-CoV-2. cDNA was synthesized using the Ion Torrent NGS Reverse Transcription Kit-cDNA synthesis kit (ThermoFisher Scientific). Sequencing-ready libraries were created from the prepared cDNA using the Ion AmpliSeq Library Kit Plus (ThermoFisher Scientific). Sequencing was carried out on the ThermoFisher Ion S5 XL Sequencer with an Ion 530 Chip (ThermoFisher Scientific).

Results

Evaluation of performance at the PANGO lineage level using simulated NGS datasets

The comparison between true lineages and predicted lineages from the simulated pooled NGS dataset presents a significant challenge due to the extensive diversity and phylogeny of SARS-CoV-2 variants^5,16. To assess sensitivity at the lineage level, we employed two strategies.

The first strategy involved assessing whether a true lineage was included among the inferred lineages from a pooled sample. This approach yielded a sensitivity of 69.7% (4,471 out of 6,417). Alternatively, we employed a second strategy where a predicted lineage was considered accurate if it shared considerable markers with the true lineage. For this alternative approach, we defined two lineages A and B as sharing considerable markers if the ratio between the number of intersection of markers of A and B and the number of union of markers of A and B is greater than 90%. This alternative approach resulted in a sensitivity of 82.8% (5,314 out of 6,417).

Furthermore, for the inclusion of a predicted lineage among the true lineages of a pooled sample, the PPV was 54.1% (4,471 out of 8,258). From the approximate match, the PPV was improved to 77.4% (6,390/8,258). The overall performance is summarized in Fig. 2.

Evaluation of performance at the WHO variant level using simulated NGS datasets

We then aggregated the true abundance of PANGO lineages into WHO Variant levels. For WHO-designated Variants of Concern (VOC) and Variants of Interest (VOI), the overall sensitivity was 99.1% (2,184 out of 2,204), and the PPV was 99.9% (2,184 out of 2,185). Additionally, we examined the correlation between the true abundance and estimated abundance for the major WHO Variants. The correlation exceeded 0.95 for all major WHO variants. The detailed performance and correlation are depicted in Fig. 3. Additionally, we explored the trends in the estimated abundance of pooled samples over the study period, as shown in Fig. 4. This analysis provided insights into the prevalence and dynamics of different WHO variants within the study population, reflecting the effectiveness of our surveillance strategy in tracking the evolution and spread of SARS-CoV-2 variants.

Reference panel and clinical validation

We evaluated the whole workflow by using six reference panels mixed with different WHO variants at desired ratios, as shown in the left bar (REAL) of Fig. 5. The right bar (Estimated) represents the predicted abundance obtained by analyzing each pooled NGS dataset from the reference panel.

To evaluate the monitoring capability of our method for detecting new variants and the domestic spread of SARS-CoV-2, we collected specimens from COVID-19 patients (n = 100) from December 2021 to January 2022, reflecting the recent trends of the epidemic. This period was selected as it coincided with the emergence of the Omicron variant in Korea. The left pane of Fig. 6 displays National prevalence in Korea for the Delta and Omicron variants during December 2021 to January 2022. The right pane of Fig. 3 illustrates the estimated prevalence of Delta and Omicron from the 100 clinical samples. For the former 4 weeks, we sequenced 20 samples with two pools for each week, for the latter 2 weeks, we sequenced 10 samples with one pool for each week.

Discussion

The global public health threat posed by emerging and re-emerging infectious diseases highlights the urgent need for advanced diagnostic and surveillance technologies that facilitate timely detection, monitoring, and outbreak control^{17,18,19,20,21}.

In this study, we implemented a pooling strategy that allows up to ten individual samples to be combined into a single sequencing library. This approach facilitates the collection of unbiased samples and significantly reduces the labor, time, and resource requirements associated with processing numerous individual samples, which is particularly beneficial in large-scale surveillance settings. Furthermore, by enabling more efficient and cost-effective genomic surveillance, this approach can assist in identifying and monitoring new variants of the virus, thereby informing public health strategies to mitigate the transmission and clinical impact of COVID-19. Given the continuous evolution of SARS-CoV-2, regular updates to the lineage markers remain essential.

Compared to other pooled testing strategies, particularly those using PCR, our approach offers several distinct advantages. Primarily, by utilizing WGS, it enables the detection of a comprehensive range of mutations and provides more detailed genetic information. Because WGS captures the full mutation profile of pooled samples, it is more effective in identifying diverse and novel SARS-CoV-2 variants. This capability is especially valuable for epidemiological analyses and for guiding public health responses to newly emerging variants.

WGS-based surveillance has demonstrated substantial public health benefits in real-world applications. Notably, the U.S. CDC’s Traveler-based Genomic Surveillance program (TGS) at international airports effectively employs WGS to rapidly detect and monitor emerging SARS-CoV-2 variants, highlighting its practical value in facilitating timely public health interventions²². Large-scale genomic surveillance programs require highly efficient and streamlined workflows. Although several high-capacity multiplexed WGS-based approaches, such as COVseq²³, have been developed, our pooling method eliminates the need for individual sample barcoding, thereby simplifying the workflow and further reducing resource requirements. Pooling strategies potentially offer an additional axis of optimization to enhance efficiency and scalability in surveillance programs.

Nevertheless, this study has several limitations. First, the reliance on simulated datasets might not fully represent the complexity and heterogeneity inherent in real-world clinical samples. Second, although pooling enhances surveillance efficiency, it may inadvertently obscure the detection of variants present at low frequencies. Third, the pipeline’s performance could vary depending on sequencing depth, viral load, and sample quality. Future research should assess the robustness of this method across diverse epidemiological contexts and laboratory conditions to confirm its broader applicability.

Data availability

The sequencing data from the clinical samples used in this study are publicly available in the Sequence Read Archive (SRA) under the accession number PRJNA1175420.

References

Ling-Hu, T., Rios-Guzman, E., Lorenzo-Redondo, R., Ozer, E. A. & Hultquist, J. F. Challenges and opportunities for global genomic surveillance strategies in the COVID-19 era. Viruses 14(11). (2022).
Wolf, J. M., Wolf, L. M., Bello, G. L., Maccari, J. G. & Nasi, L. A. Molecular evolution of SARS-CoV-2 from December 2019 to August 2022. J. Med. Virol. 95 (1), e28366 (2023).
Article CAS PubMed Google Scholar
Ma, K. C. et al. Genomic surveillance for SARS-CoV-2 variants: circulation of Omicron Lineages - United States, January 2022-May 2023. MMWR Morb Mortal. Wkly. Rep. 72 (24), 651–656 (2023).
Article PubMed PubMed Central Google Scholar
O’Toole, Á. et al. Assignment of epidemiological lineages in an emerging pandemic using the Pangolin tool. Virus Evol. 7 (2), veab064 (2021).
Article PubMed PubMed Central Google Scholar
Rambaut, A. et al. A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist genomic epidemiology. Nat. Microbiol. 5 (11), 1403–1407 (2020).
Article CAS PubMed PubMed Central Google Scholar
Shu, Y. & McCauley, J. GISAID: global initiative on sharing all influenza data - from vision to reality. Euro. Surveill 22(13). (2017).
Hadfield, J. et al. Nextstrain: real-time tracking of pathogen evolution. Bioinformatics 34 (23), 4121–4123 (2018).
Article CAS PubMed PubMed Central Google Scholar
Mahmoud, S. A. et al. Evaluation of pooling of samples for testing SARS-CoV- 2 for mass screening of COVID-19. BMC Infect. Dis. 21 (1), 360 (2021).
Article CAS PubMed PubMed Central Google Scholar
Barak, N. et al. Lessons from applied large-scale pooling of 133,816 SARS-CoV-2 RT-PCR tests. Sci. Transl Med. 13, 589 (2021).
Article Google Scholar
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34 (18), 3094–3100 (2018).
Article CAS PubMed PubMed Central Google Scholar
Koboldt, D. C. et al. VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res. 22 (3), 568–576 (2012).
Article CAS PubMed PubMed Central Google Scholar
McLaren, W. et al. The ensembl variant effect predictor. Genome Biol. 17. (2016).
Gourlé, H., Karlsson-Lindsjö, O., Hayer, J. & Bongcam-Rudloff, E. Simulating illumina metagenomic data with insilicoseq. Bioinformatics 35 (3), 521–522 (2019).
Article PubMed Google Scholar
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25 (14), 1754–1760 (2009).
Article CAS PubMed PubMed Central Google Scholar
Jung, Y. et al. Comparative analysis of Primer-Probe sets for RT-qPCR of COVID-19 causative virus (SARS-CoV-2). ACS Infect. Dis. 6 (9), 2513–2523 (2020).
Article CAS PubMed Google Scholar
O’Toole, Á., Pybus, O. G., Abram, M. E., Kelly, E. J. & Rambaut, A. Pango lineage designation and assignment using SARS-CoV-2 Spike gene nucleotide sequences. BMC Genom. 23 (1), 121 (2022).
Article Google Scholar
Charostad, J. et al. A comprehensive review of highly pathogenic avian influenza (HPAI) H5N1: an imminent threat at doorstep. Travel Med. Infect. Dis. 55, 102638 (2023).
Article CAS PubMed Google Scholar
Karami, H. & Salehi-Vaziri, M. Dengue fever in Iran: an emerging threat requiring urgent public health interventions. New. Microbes New. Infect. 62, 101535 (2024).
Article PubMed PubMed Central Google Scholar
Kumar, S., Guruparan, D., Karuppanan, K. & Kumar, K. J. S. Comprehensive insights into Monkeypox (mpox): recent advances in epidemiology, diagnostic approaches and therapeutic strategies. Pathogens 14(1). (2024).
Letafati, A., Salahi Ardekani, O., Karami, H. & Soleimani, M. Ebola virus disease: A narrative review. Microb. Pathog. 181, 106213 (2023).
Article CAS PubMed Google Scholar
Masmejan, S. et al. Zika Virus Pathogens 9(11). (2020).
Friedman, C., Morfino, R. & Ernst, E. Leveraging a strategic Public–Private partnership to launch an Airport-Based pathogen monitoring program to detect emerging health threats. Emerg. Infect. Disease J. ;31(13). (2025).
Simonetti, M. et al. COVseq is a cost-effective workflow for mass-scale SARS-CoV-2 genomic surveillance. Nat. Commun. 12 (1), 3903 (2021).
Article ADS CAS PubMed PubMed Central Google Scholar

Download references

Acknowledgements

This research was supported by faculty research grants from Yonsei University College of Medicine [grant numbers: 6-2020-0129].

Author information

Inho Park and Yoonjung Kim contributed equally to this work.

Authors and Affiliations

Center for Precision Medicine, Gangnam Severance Hospital, Yonsei University College of Medicine, Seoul, Korea
Inho Park
Department of Pathology, Gangnam Severance Hospital, Yonsei University College of Medicine, Seoul, Korea
Inho Park
Department of Laboratory Medicine, Yonsei University College of Medicine, Seoul, Korea
Yoonjung Kim, Min Hyuk Choi & Kyung-A Lee
Department of Laboratory Medicine, Gangnam Severance Hospital, Yonsei University College of Medicine, 211 Eonju-Ro, Gangnam-Gu, Seoul, Republic of Korea
Kyung-A Lee

Authors

Inho Park
View author publications
Search author on:PubMed Google Scholar
Yoonjung Kim
View author publications
Search author on:PubMed Google Scholar
Min Hyuk Choi
View author publications
Search author on:PubMed Google Scholar
Kyung-A Lee
View author publications
Search author on:PubMed Google Scholar

Contributions

IP, YK, KL wrote a manuscript. IP implements BI pipeline for the analysis. IP, YK curate and analyze the data. MK, YK, KL designed the study.

Corresponding author

Correspondence to Kyung-A Lee.

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethical statement

The study was approved by the Institutional Review Board of Gangnam Severance Hospital (IRB number 3-2022-0459), and adhered to the clinical practice guidelines of the Declaration of Helsinki (2013 amendment).

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Supplementary Material 2

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Reprints and permissions

About this article

Cite this article

Park, I., Kim, Y., Choi, M.H. et al. Genomic surveillance of SARS-CoV-2 variants using pooled WGS. Sci Rep 15, 13948 (2025). https://doi.org/10.1038/s41598-025-99201-7

Download citation

Received: 27 February 2025
Accepted: 17 April 2025
Published: 22 April 2025
DOI: https://doi.org/10.1038/s41598-025-99201-7