Introduction

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a single-stranded, positive-sense RNA virus classified within the Betacoronavirus genus of the Coronaviridae family1. As the causative agent of Coronavirus Disease 2019 (COVID-19), this virus is notorious for its high transmissibility and profound impact on global public health2. The SARS-CoV-2 genome is approximately 29,870 nucleotides long and features a single-stranded RNA structure that includes at least 13 identified open reading frames (ORFs). These ORFs are primarily arranged in a linear sequence from the 5′ to the 3′ end and are crucial for the virus’s biological functions3. Key structural proteins encoded by the SARS-CoV-2 genome include the spike (S), envelope (E), membrane (M), and nucleocapsid (N) proteins, all of which are essential for facilitating viral entry into host cells and for the assembly of new virions. Additionally, SARS-CoV-2 harbors multiple accessory and nonstructural proteins that help the virus evade the immune system, disrupt host cellular processes, and enhance both viral replication and transcription3,4,5,6.

As SARS-CoV-2 continues to evolve, new variants have emerged featuring mutations in the spike protein. Notably, sublineages of the Omicron variant have become the predominant strains of the virus. These sublineages are characterized by a wide range of spike protein mutations that significantly increase their transmissibility and ability to evade immune responses7. The first Omicron subvariant, BA.1, was detected in mid-November 2021 in South Africa and has been associated with enhanced viral replication and an elevated risk of reinfection8,9,10. After the emergence of BA.1, subsequent mutations led to the development of several significant subvariants, including BA.2, BA.2.75, and BA.4/5. By 2023, these subvariants, along with their descendants, were co-circulating globally, which has raised ongoing concerns among public health officials regarding the potential for increased transmission and impacts on vaccine efficacy11,12,13.

The increasing genetic diversity of SARS-CoV-2 has resulted in the concurrent emergence of multiple lineages, facilitating the identification of various recombinant forms. The Pango-designation project, established and released in early 2020, has meticulously cataloged over 200 distinct recombinant lineages of SARS-CoV-214,15. By late 2022, the Omicron XBB lineages, derived from the recombination of two co-circulating BA.2 sub-lineages (BJ.1 and BM.1.1.1), had become the dominant circulating variants16. However, by late 2023, the emergence of the JN.1 lineage in the United States rapidly supplanted the XBB lineages, establishing itself as the most prevalent Omicron variant globally by 202417. This transition highlights JN.1’s enhanced transmissibility and adaptability18,19. Ongoing genetic evolution has resulted in several JN.1 descendants, including JN.1.16, KP.2, and KP.3.1.117. Notably, in November 2024, the prevalence of a recombinant variant, XEC, a fusion of JN.1 lineages KS.1.1 and KP.3.3, significantly increased, further illustrating the virus’s ongoing genetic diversification17.

Since its emergence in late 2019, SARS-CoV-2 has been responsible for over 7 million confirmed deaths and nearly 700 million confirmed infections globally over a span of four years20. In Thailand, six distinct lineages of SARS-CoV-2 were identified in the first quarter of 2020, namely A, A.6, B, B.1, B.1.8, and B.5821. The B.1.36.16 variant was the primary driver of the major outbreak during the second wave from December 2020 to January 2021. Following this, the Alpha variant emerged as the leading strain in the third wave from April to June 2021. The Delta variant characterized the fourth wave, which lasted from July to December 2021, while the Omicron variant significantly impacted the fifth wave from January to March 202222. In early 2023, the BA.2.75 variant was the most prevalent among COVID-19 cases; however, it was soon supplanted by the rising XBB.1.15 and XBB.1.16 variants, which gained dominance later in the year23.

Thailand’s commitment to genomic surveillance is exemplified by the submission of 25,178 SARS-CoV-2 sequences to GISAID in 202224. However, this number declined substantially, falling by 78% to 5,550 sequences in 2023 and by a further 61% to just 2,131 sequences in 202424. Concurrently, scientists are examining whether COVID-19 will come to circulate in the same seasonal period as influenza and other respiratory diseases, spreading during the winter season in countries of the Northern and Southern Hemispheres25. In tropical countries such as Thailand, outbreaks can occur almost year-round, with peak transmission during the rainy season, from June to September each year26. As population immunity increases through vaccination and natural infection, and as the virus may mutate into less severe forms, the pattern of outbreaks could become more predictable.

Despite the decreased submission rates, maintaining robust genomic monitoring remains essential for effectively tracking the virus’s evolution and its variants. Continuous surveillance and timely updates on vaccinations are vital for controlling and managing the virus’s impact as it may transition to a seasonal pattern. This study aims to examine the seasonal patterns and genetic diversity of SARS-CoV-2 among patients with acute respiratory illness in Thailand during the fifth year of the COVID-19 pandemic, offering crucial insights into the virus’s evolving dynamics and informing future public health strategies.

Results

Trends in COVID-19 incidence and demographics of infected patients in 2024

According to the Ministry of Public Health (MoPH), a total of 46,079 COVID-19 cases and 220 fatalities were reported between January and December 2024, reflecting a 6% reduction in cases and a 20% decrease in deaths compared to the previous year (Fig. 1a and b)27. Regional distribution of cases revealed that the Bangkok Metropolitan Region (BMR) accounted for 30.8% of the total cases, followed by the Northeastern region at 20.0% and the Southern region at 13.8%. The Western region recorded the lowest proportion, contributing only 3.4% to the overall total. The disproportionately high number of cases in the BMR may reflect its dense population, greater mobility, and broader access to testing services, which could have contributed to both increased transmission and case detection. Notably, the highest incidence of deaths occurred in May and June, accounting for 41% of total fatalities (Fig. 1b).

Fig. 1

(a) Weekly number of COVID-19 cases by region27 and (b) number of COVID-19-related deaths in Thailand, 202427. (c) Percent positivity of SARS-CoV-2 infection in Acute Respiratory Illness (ARI) samples in this study, as determined by real-time RT-PCR.

In this study, 8,096 specimens from Thai residents with acute respiratory illness (ARI) presenting to partnering hospitals were analyzed. Of these, 1,152 samples tested positive for SARS-CoV-2 via real-time RT-PCR, resulting in a positivity rate of 14.2% (Fig. 1c). Data collection was conducted across Thailand’s seasonal divisions: summer (February to mid-May), rainy season (mid-May to October), and winter (November to February). The findings revealed an initial SARS-CoV-2 outbreak wave in January, accounting for 13.3% of total positive samples (153/1,152), followed by a significant second wave from April to June, spanning late summer and the early rainy season, which represented 48.9% of total positives (563/1,152). Notably, positivity rates for SARS-CoV-2 declined below 10% from July to December, indicating a reduction in transmission during the latter part of the year.
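The percentages in this paragraph follow directly from the reported counts. The short Python sketch below reproduces that arithmetic; the helper name `percent` is illustrative and not part of the study's analysis code.

```python
def percent(part: int, whole: int, digits: int = 1) -> float:
    """Return part/whole as a percentage rounded to `digits` decimals."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return round(100.0 * part / whole, digits)


# Counts reported in the text above.
TOTAL_ARI = 8096            # ARI specimens tested
POSITIVE = 1152             # SARS-CoV-2 positive by real-time RT-PCR
JANUARY_POSITIVE = 153      # positives in the January wave
APRIL_JUNE_POSITIVE = 563   # positives in the April to June wave

overall_positivity = percent(POSITIVE, TOTAL_ARI)
january_share = percent(JANUARY_POSITIVE, POSITIVE)
april_june_share = percent(APRIL_JUNE_POSITIVE, POSITIVE)

print(overall_positivity, january_share, april_june_share)
```

The same helper can be applied to the monthly lineage counts reported later (for example 55/61 for JN.1 in February).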

Among the 1,152 SARS-CoV-2 infected patients, 539 (46.8%) were male and 613 (53.2%) were female (Table 1). A Chi-square test showed no significant association between gender and infection status, χ²(1, N = 8,096) = 0.30, p = 0.583. The highest proportion of SARS-CoV-2 infections occurred in individuals aged 31–40 years (18.3%), followed by those aged 41–50 (17.7%) and 1 month–10 years (16.9%). The lowest was in the 11–20 years group (5.9%). Pairwise proportion tests with Bonferroni correction showed significantly higher infection rates in the 31–40 age group compared to the 0–10 (p < 0.001), 11–20 (p = 0.0011), and 51–60 (p < 0.001) groups. No significant differences were observed when compared to the 21–30, 41–50, or > 60 age groups (p > 0.05). These findings highlight a greater burden of infection among middle-aged adults.

Table 1 Number and percentage of infected and non-infected individuals by gender and age group (n = 8,096).

SARS-CoV-2 variants distribution in Thailand

To investigate the distribution of SARS-CoV-2 lineages in Thailand from January to December 2024, a comprehensive analysis was conducted on 689 SARS-CoV-2 positive samples. Among these, 60 samples were randomly selected for complete genome sequencing, while the remaining 629 underwent partial spike sequencing. The study identified more than seven distinct lineages, each accompanied by its descendant variants (Fig. 2). Notable lineages included JN.1, JN.1.16, KP.1 through KP.3, and LB.1, in addition to several recombinant lineages including XCU, XDQ, XDV, XDY, XEC, XEF, XEL, and XEN. The findings revealed that the frequency and distribution of these viral lineages varied continuously over time.

Fig. 2

(a) The distribution of SARS-CoV-2 lineages in Thailand from January to December 2024 includes each lineage along with its descendant lineages. (b) The variation frequencies over time are shown, where the size of each point indicates the weekly collection of sequences. These data points are assessed through logistic regression, with the shaded areas denoting the 95% confidence interval. Only the variants with the highest prevalence during the study period are displayed, emphasizing the most significant trends in variant distribution and their statistical reliability.
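The caption describes a logistic regression of weekly variant frequency with a 95% confidence band. A minimal Python sketch of that idea is given below, assuming a two-parameter model (intercept and slope on week number) fitted to binomial counts by Newton-Raphson, with a Wald interval on the log-odds scale. The weekly counts are hypothetical and the function names are illustrative; the software and exact model specification used for Fig. 2b are not stated in the text.

```python
import math


def expit(z: float) -> float:
    """Logistic function mapping a log-odds value to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(weeks, hits, totals, max_iter=50, tol=1e-10):
    """Fit logit(p) = a + b * week to binomial counts by Newton-Raphson.

    Returns (a, b, (var_a, cov_ab, var_b)), where the covariance terms come
    from the inverse of the information matrix at the last iteration.
    """
    a = b = 0.0
    for _ in range(max_iter):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for x, k, n in zip(weeks, hits, totals):
            p = expit(a + b * x)
            w = n * p * (1.0 - p)      # binomial variance weight
            g_a += k - n * p           # score for the intercept
            g_b += (k - n * p) * x     # score for the slope
            h_aa += w
            h_ab += w * x
            h_bb += w * x * x
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: solve (information matrix) * step = score.
        step_a = (h_bb * g_a - h_ab * g_b) / det
        step_b = (h_aa * g_b - h_ab * g_a) / det
        a += step_a
        b += step_b
        if abs(step_a) + abs(step_b) < tol:
            break
    cov = (h_bb / det, -h_ab / det, h_aa / det)
    return a, b, cov


def predict_with_ci(week, a, b, cov, z=1.96):
    """Return (frequency, lower, upper) using a Wald interval on the logit scale."""
    var_a, cov_ab, var_b = cov
    eta = a + b * week
    se = math.sqrt(var_a + 2.0 * week * cov_ab + week * week * var_b)
    return expit(eta), expit(eta - z * se), expit(eta + z * se)


# Hypothetical weekly data: sequences assigned to one lineage out of all
# sequences collected that week (not taken from the study).
weeks = [0, 1, 2, 3, 4, 5, 6, 7]
lineage_counts = [1, 2, 3, 5, 8, 11, 14, 16]
sequenced = [20, 20, 20, 20, 20, 20, 20, 20]

a, b, cov = fit_logistic(weeks, lineage_counts, sequenced)
for week in (0, 7):
    freq, lower, upper = predict_with_ci(week, a, b, cov)
    print(f"week {week}: {freq:.3f} ({lower:.3f} to {upper:.3f})")
```

Weighting each week by the number of sequences collected is what makes sparsely sampled weeks contribute less to the fitted curve, mirroring the point sizes described in the caption.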

In alignment with the observed variant frequencies (Fig. 2a and b), the JN.1 lineage maintained the highest prevalence from January to April, peaking at 90.2% (55/61) in February, followed by JN.1.16 at 11.1% (4/36) in March and KP.2 at 23.2% (16/69) in April. Following this period, the highest prevalence transitioned to KP.2 during May and June, reaching a peak of 28.6% (32/112) in June, while JN.1.16 emerged as the second most prevalent variant, accounting for 23.4% (32/137) and 25.9% (29/112) in those respective months. Additionally, LB.1 was first detected in April with a prevalence of 4.3% (3/69) and subsequently peaked at 35.0% (21/60) in July, establishing itself as a dominant variant before declining and ultimately disappearing by November.

From August to September, the predominant variant shifted once again to KP.3, which accounted for approximately 29% of cases in both months (11/38 in August and 9/31 in September), primarily represented by sublineages KP.3.1.1, KP.3.2, and KP.3.3.1 (Supplement Fig. 1). KP.3 then remained the second most prevalent variant through the end of December.

The recombinant lineages were first identified in May, beginning with an initial prevalence of 8.8% (12/137). They subsequently became predominant during the last three months of the year, reaching 38.2% (13/34) in October and peaking at 57.1% (4/7) in December. For example, XDV.1 began to significantly contribute in mid-year, first detected in May at 5.1% (7/137) and increasing to 10.5% (4/38) in August (Supplement Fig. 1). Meanwhile, XDY peaked in September at 9.7% (3/31). The variant XEC emerged as the most prevalent within the recombinant lineages, first identified in September at 3.2% (1/31), and then rising to 15.8% (3/19) in November and 14.3% (1/7) in December. Additionally, in October, other recombinant variants detected included XEF at 2.9% (1/34), and both XEL and XEN at 8.8% each (3/34). This dynamic landscape of variant prevalence highlights the ongoing evolution and transmission patterns of SARS-CoV-2 in Thailand, underscoring the necessity for continuous genomic surveillance to adapt public health responses effectively.

Maximum likelihood and time-scaled phylogenetic reconstruction of SARS-CoV-2 variants

To establish initial trees for Bayesian phylodynamic analyses, time-scaled phylogenies were constructed based on the maximum likelihood (ML) tree topologies and their corresponding collection dates (Fig. 3a and b). The ML phylogenetic tree, derived from complete genome sequences, illustrates the molecular evolution of SARS-CoV-2 isolates both from Thailand and globally. Molecular clock analysis of the dataset provided robust evidence for the time-stable, clock-like evolution of these lineages, demonstrating an R² value of 0.5 and a correlation coefficient of 0.7, along with a substitution rate of 1.38 × 10⁻³ substitutions per site per year (Fig. 3c). This analysis employed the optimal rooting approach, which minimizes the mean squared residuals, to effectively investigate the relationships between genetic divergence and sampling dates.
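The clock signal described here rests on an ordinary least-squares regression of root-to-tip distance on sampling date: the slope estimates the substitution rate, the x-intercept estimates the date of the root, and R² is the square of the correlation coefficient (0.7² ≈ 0.5, matching the values reported). The Python sketch below illustrates the calculation on hypothetical data points; it does not reproduce TempEst's rooting search, and the function name is illustrative.

```python
import math


def root_to_tip_regression(dates, distances):
    """Least-squares fit of root-to-tip distance against sampling date.

    dates      -- sampling dates as decimal years
    distances  -- root-to-tip genetic distances (substitutions per site)
    Returns (slope, intercept, r, r_squared, root_date), where the slope is
    the rate in substitutions/site/year and root_date is the x-intercept.
    """
    n = len(dates)
    mean_x = sum(dates) / n
    mean_y = sum(distances) / n
    sxx = sum((x - mean_x) ** 2 for x in dates)
    syy = sum((y - mean_y) ** 2 for y in distances)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dates, distances))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    root_date = -intercept / slope
    return slope, intercept, r, r * r, root_date


# Hypothetical tips (decimal year, distance); not taken from the study dataset.
example_dates = [2023.95, 2024.10, 2024.30, 2024.45, 2024.60, 2024.90]
example_distances = [0.0019, 0.0020, 0.0025, 0.0024, 0.0029, 0.0031]

slope, intercept, r, r_squared, root_date = root_to_tip_regression(
    example_dates, example_distances
)
print(f"rate = {slope:.2e} sub/site/year, r = {r:.2f}, R^2 = {r_squared:.2f}")
```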

Fig. 3

(a) A global maximum likelihood (ML) tree was constructed from a dataset of complete genome sequences (N = 423) using TreeTime, employing the oldest method for phylogenetic analysis. (b) A time-scaled ML tree was generated by integrating collection dates into the ML tree from panel (a), thus serving as the primary reference for further analysis. (c) A regression analysis of root-to-tip genetic distances against sampling dates for this comprehensive dataset, estimated using TempEst, revealed a significant positive molecular clock signal.

The Bayesian phylogenetic tree, constructed from an additional set of 60 complete SARS-CoV-2 genomes (Supplement Table 1) alongside global sequences collected between December 2023 and December 2024 (Fig. 4), indicates that the most recent common ancestor of this dataset is estimated to have emerged in August 2022. Overall, the Thai and global SARS-CoV-2 samples did not exhibit distinct phylogenetic clustering, suggesting similar patterns of viral circulation and shared lineages during the study period. However, after September 2024, a divergence was observed, with Thai samples not clustering within the sublineage clusters KP.3.1.1, KP.3.2, and KP.3.3.1. The absence of these newly emerging KP.3 sublineages in Thailand during the study period may reflect differences in transmission dynamics, introduction events, or the temporal scope of sampling. The molecular evolutionary rate for the dataset was estimated at 0.63 × 10⁻³ nucleotide substitutions per site per year (sub/site/year), with a 95% highest posterior density interval (HPDI) ranging from 0.57 × 10⁻³ to 0.70 × 10⁻³ sub/site/year.

Fig. 4

Maximum Clade Credibility (MCC) tree of SARS-CoV-2 variant populations circulating in Thailand (n = 176) and globally (n = 247), inferred using molecular clock analysis of samples collected between December 2023 and December 2024. The scale bar indicates nucleotide substitutions per site.

Table 2 details the sequence change rates and the time to the most recent common ancestor (tMRCA) as determined by Markov chain Monte Carlo (MCMC) methods. Among all variants, XDV showed the highest substitution rate (0.89 × 10⁻³) and the greatest nucleotide divergence (4.56 × 10⁻⁴), with a tMRCA estimated around late November 2023. In contrast, JN.1 had the lowest substitution rate (0.45 × 10⁻³) and an earlier tMRCA in early March 2023. Variants KP.1 to KP.3 and LB.1 shared comparable nucleotide divergence values (3.66 × 10⁻⁴ and 4.09 × 10⁻⁴, respectively), with substitution rates of 0.56 × 10⁻³ and 0.75 × 10⁻³. However, KP.1–3 emerged later, with a tMRCA in mid-November 2023, while LB.1 diverged slightly after, in early January 2024. XEC and XDY both emerged more recently, with tMRCAs in May 2024. While their substitution rates were similar (0.55 × 10⁻³ and 0.56 × 10⁻³), XDY showed notably lower nucleotide divergence (1.72 × 10⁻⁴) compared to XEC (3.33 × 10⁻⁴), possibly reflecting differences in evolutionary dynamics or sampling time frames.

Table 2 Sequence change rates and time to the most recent common ancestor (tMRCA) using Markov chain Monte Carlo (MCMC) methods.

Discussion

The COVID-19 outbreak, first identified in late 2019, arrived in Thailand in early 2020, leading to a series of ongoing transmission events that prompted national concern28. Following nearly three years of extensive outbreaks and significant public health challenges, the Thai Ministry of Public Health reclassified COVID-19 from a ‘dangerous infectious disease’ to a ‘disease under surveillance’ and announced a revised reporting protocol in October 2022 that focuses exclusively on case counts among hospitalized patients29. By 2023, the severity of COVID-19 had significantly diminished, evidenced by a marked reduction in fatalities27. In 2024, the reported death toll decreased to just 220, representing a stark contrast to the high mortality observed in earlier years of the pandemic, which included 21,614 deaths in 2021, 11,971 in 2022, and 848 in 202327. This reduction in severity can be attributed to several factors, including increased vaccination coverage, the emergence of less virulent variants, and improved treatment modalities. As a result, COVID-19 is now recognized as a seasonal respiratory illness, similar to other common respiratory infections.

This study investigated the seasonal trends and genetic diversity of SARS-CoV-2 among patients with acute respiratory illness in Thailand during the fifth year of the COVID-19 pandemic, providing a comprehensive analysis of acute respiratory infection cases throughout 2024, which includes both inpatient and outpatient data across all seasons. The results demonstrate that COVID-19 transmission follows a distinct seasonal pattern, marked by a significant increase in cases from late summer to the early rainy season, particularly between April and June.

In the early years of COVID-19’s seasonal transition, its incidence peaked earlier than other respiratory infections, such as influenza and respiratory syncytial virus30,31. Although Thailand’s official rainy season begins in mid-May, the interplay of school reopenings and the Songkran festival in mid-April, characterized by extensive travel and gatherings, contributes significantly to an early rise in infection rates.

Following this peak, case numbers gradually declined from September to the end of the year, before experiencing another surge in early winter (January to mid-March), primarily due to lower temperatures. Unlike temperate regions, where COVID-19 surges predominantly occur during winter32,33,34, Thailand’s tropical climate lacks a distinct cold season. While there is evidence of increased cases during cooler months, the most pronounced peak reliably occurs at the onset of the rainy season each year. This highlights the unique seasonal dynamics of COVID-19 in Thailand, characterized by year-round transmission with a marked surge during early monsoon months. Such insights are crucial for informing public health strategies tailored to local seasonal patterns.

This study observed the highest prevalence of COVID-19 infection among individuals aged 31 to 40 years, with statistically significant differences compared to several other age groups, including children (0–10 years), adolescents (11–20 years), and older adults aged 51–60 years. These findings are consistent with previous research reporting a greater burden of SARS-CoV-2 infection among middle-aged adults, possibly due to increased social mobility and occupational exposure during the post-pandemic reopening phase35. Although a slightly higher proportion of female cases was observed, statistical analysis showed no significant association between gender and infection status. This aligns with earlier studies that reported comparable infection rates between males and females35. However, consistent with earlier findings, male patients have been reported to experience more severe disease outcomes and increased mortality36.

The current study revealed that the predominance of variants shifted throughout the year, with JN.1 dominating in the earlier part of the year, followed by transitions to KP.2 and LB.1. After August, KP.3 became the predominant variant, primarily represented by the sublineages KP.3.1.1 and KP.3.3.1. The present study aligns with a report from South Korea, which found that the proportions of various sub-lineages, including JN.1, KP.2, LB.1, and KP.3, exhibited the highest prevalence during the period from April to August 202437. That report also indicated that the KP.3 sub-lineages with the highest proportions were KP.3.3.1, KP.3.3, and KP.3.1.1. The ongoing genetic evolution of SARS-CoV-2 poses significant challenges for vaccine development, as newly emerging variants may diminish the efficacy of existing vaccines. The variants identified in Thailand closely align with those circulating globally, highlighting the interconnected nature of SARS-CoV-2 evolution and underscoring the need for continuous genomic surveillance and vaccine adaptation to effectively combat the evolving pandemic. Notably, while the overall variant dynamics in Thailand closely mirrored global trends, the absence of the newly emerging KP.3 sublineages (KP.3.1.1, KP.3.2, and KP.3.3.1) in Thai samples after September 2024 suggests potential regional differences in viral evolution and spread. This highlights the importance of timely genomic surveillance to detect emerging sublineages that may initially appear in global datasets but remain undetected locally.

The current study estimated a nucleotide evolutionary rate of approximately 0.63 × 10⁻³ substitutions per site per year (sub/site/year), consistent with rates reported during the XBB wave in the Republic of Korea (late 2022 to late 2023), where substitution rates ranged from 0.56 × 10⁻³ to 0.91 × 10⁻³ sub/site/year38. Similarly, a study conducted in the United States reported a whole-genome evolutionary rate of 0.67 × 10⁻³ sub/site/year, further supporting the comparability of our findings39. When compared to other respiratory viruses, this rate is similar to that of human rhinovirus (0.66 × 10⁻³) and human metapneumovirus (0.71 × 10⁻³), while it is lower than that of respiratory syncytial virus (0.76 × 10⁻³) and influenza A (2.21 × 10⁻³ to 3.37 × 10⁻³)40,41,42,43.

The present study provides important insights into the molecular epidemiology and evolutionary dynamics of SARS-CoV-2 circulating in Thailand and globally. By integrating genomic surveillance with phylogenetic analysis, the findings contribute to the early detection of emerging variants and the identification of transmission patterns. The ability to monitor viral evolution in near real-time supports early warning systems and enables the timely adjustment of public health measures. Moreover, the phylogenetic similarities observed between Thai and global samples emphasize the importance of international surveillance and data sharing to anticipate local outbreaks influenced by global viral movements. These insights are instrumental for guiding vaccine policy, resource allocation, and national preparedness planning, particularly in the context of rapidly evolving variants such as JN.1 and KP.1-KP.3.

During the study period, vaccination strategies globally were updated to address the Omicron JN.1 subvariant, with reformulated vaccines being deployed. In Thailand, however, these JN.1-targeted vaccines were introduced later, toward the end of 2024, and their uptake was limited due to high cost and voluntary administration. This may have influenced the transmission dynamics observed in the region.

This study provides a comprehensive year-long analysis of the seasonal distribution of COVID-19 in Thailand, utilizing a large number of samples collected over the entire year. However, several limitations must be acknowledged. Since the analysis focuses exclusively on a single year, multi-year investigations are necessary to fully understand the seasonal fluctuations of COVID-19 outbreaks. Regarding variant analysis, despite efforts to conduct sequencing throughout the year, resource constraints, including budget limitations, limited the ability to sequence all cases. The limited number and temporal distribution of fully sequenced genomes may affect the representativeness of lineage dynamics, so caution is warranted when interpreting results based on this subset. Additionally, in instances where patients exhibited low viral loads (as indicated by high Ct values), variant identification was not feasible. For evolutionary analysis, the inclusion of more robust metadata would enhance the accuracy and effectiveness of datasets used to estimate the nucleotide substitution rate and the time to the most recent common ancestor (tMRCA).

In conclusion, this study provides a comprehensive analysis of COVID-19 incidence and variant dynamics in Thailand throughout 2024. It highlights a notable seasonal trend in infection rates and emphasizes the predominance of specific SARS-CoV-2 lineages, including KP.3 and its sublineages, particularly in the latter part of the year. The findings underscore the importance of continuous genomic surveillance to monitor the evolving landscape of variants and inform public health strategies effectively. Additionally, the estimated molecular evolutionary rate and the emergence timing of key lineages contribute valuable insights to the understanding of SARS-CoV-2 evolution and its impact on the pandemic landscape in Thailand.

Materials and methods

Institutional review board statement

The research was carried out following the Good Clinical Practice (GCP) under the principles outlined in the Declaration of Helsinki and received approval from the Institutional Review Board of the Faculty of Medicine, Chulalongkorn University, Thailand (approval number IRB0933/67). To ensure patient confidentiality, all data and identifiers were anonymized. Since this was a retrospective study, the Institutional Review Board waived the requirement for informed consent.

Specimen collection

In this study, 8,096 specimens were collected using throat swabs or nasopharyngeal swabs with Flock swabs (Copan Diagnostics, Murrieta, CA) or Modono sterile swabs (Modono, New Delhi, India), and were subsequently placed in viral transport media (VTM) or universal transport media (UTM). The samples were obtained from patients presenting with acute respiratory illness characterized by symptoms such as fever, sore throat, rhinorrhea, cough, or dyspnea within seven days of symptom onset. The study included both outpatients and inpatients seeking medical care at various hospitals in Bangkok as well as at Chum Phae Hospital in Khon Kaen province. Collected samples were tested for multiple respiratory viruses, including influenza viruses (A and B), SARS-CoV-2, respiratory syncytial virus (RSV), adenovirus, parainfluenza viruses, human metapneumovirus, seasonal coronaviruses, and rhinovirus. Residual specimens were then sent to the Center of Excellence in Clinical Virology at the Faculty of Medicine, Chulalongkorn University, where they were stored at −20 °C until further analysis for this study.

RNA extraction and molecular testing

RNA extraction and molecular testing were conducted as previously outlined23. In brief, RNA was extracted from a 200-µL aliquot of the supernatant using the MagLEAD 12gC instrument (Precision System Science, Chiba, Japan), following the manufacturer’s protocol. Real-time reverse transcription-PCR (RT-PCR) was performed using the Roche LightCycler® 480 (LC480) instrument, employing specific primers and probes targeting the nucleocapsid gene (N1 and N2) in accordance with the guidelines established by the Centers for Disease Control and Prevention (CDC)44. The amplification of the housekeeping gene glyceraldehyde 3-phosphate dehydrogenase (GAPDH) was included as an endogenous control for each sample, utilizing primers and probes as outlined in prior studies45. The thermocycling conditions included an initial 10-minute incubation at 45 °C, followed by a 5-minute denaturation step at 95 °C, and then 45 amplification cycles at 95 °C for 15 s and 60 °C for 30 s. Each testing run included positive and negative (nontemplate) controls, with a sample classified as positive for SARS-CoV-2 if the cycle threshold (Ct) value was ≤ 38. Additionally, a selection of positive SARS-CoV-2 samples was randomly chosen for variant classification via partial spike sequencing and complete genome sequencing.
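The Ct-based decision rules can be written as a small Python sketch. The positivity cutoff (Ct ≤ 38) is stated above, and the whole-genome sequencing cutoff (Ct ≤ 25) is the one given in the Whole-genome sequencing subsection. Treating a missing Ct value (no amplification) as negative is an assumption made for illustration, as are the function and constant names; handling of the GAPDH control is not modelled.

```python
from typing import Optional

POSITIVE_CT_CUTOFF = 38.0   # sample is positive if Ct <= 38 (from the text)
WGS_CT_CUTOFF = 25.0        # eligible for whole-genome sequencing if Ct <= 25


def classify_sample(ct: Optional[float]) -> str:
    """Classify a real-time RT-PCR result from its Ct value.

    Assumption: ct is None when no amplification was observed, which is
    treated here as a negative result.
    """
    if ct is None:
        return "negative"
    return "positive" if ct <= POSITIVE_CT_CUTOFF else "negative"


def eligible_for_wgs(ct: Optional[float]) -> bool:
    """Return True if a sample meets the Ct cutoff for whole-genome sequencing."""
    return ct is not None and ct <= WGS_CT_CUTOFF


# Hypothetical Ct values for illustration.
for ct in (18.4, 31.0, 38.0, 39.5, None):
    print(ct, classify_sample(ct), eligible_for_wgs(ct))
```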

Amplification and sequencing of the partial Spike gene

From January to December 2024, the partial spike gene was amplified from a total of 629 SARS-CoV-2 RNA samples obtained from individuals in Thailand with confirmed COVID-19. The amplification utilized primers detailed in Supplementary Table 2. Briefly, RT-PCR was performed in a total volume of 25 µL, consisting of 2–3 µL of total RNA (ranging from 100 ng to 1 µg), 0.5 µM of each primer, 12.5 µL of 2X Reaction Mix (which contains 0.4 mM of each dNTP and 3.2 mM MgSO4), 1 µL of the SSIII RT/Platinum Taq Mix, and nuclease-free water. The Superscript III One-Step RT-PCR system with Platinum Taq High Fidelity was utilized according to the manufacturer’s guidelines (Invitrogen, Carlsbad, CA, USA). The PCR protocol involved an initial incubation at 45 °C for 30 min, followed by 40 cycles that included denaturation at 95 °C for 30 s, annealing at 50 °C for 30 s, and extension at 68 °C for 1 min and 45 s. A final extension step was carried out at 68 °C for 5 min. Both forward and reverse primers were used for product amplification and for sequencing, which was conducted at First BASE Laboratories Sdn Bhd (Selangor Darul Ehsan, Malaysia).

Whole-genome sequencing

Residual SARS-CoV-2 PCR-positive respiratory specimens (N = 60) were utilized for viral sequencing. Samples with a Ct value of 25 or lower were specifically selected for whole-genome sequencing, which was carried out by adapting a previously established protocol23. The sequencing and identification of complete SARS-CoV-2 genomes were conducted using the Celemics Comprehensive Respiratory Virus Panel (Celemics Inc., Incheon, Republic of Korea). Briefly, library preparation began by combining 25 ng of isolated RNA with an RNA fragmentation buffer, followed by first-strand cDNA synthesis using a specialized master mix. The cDNA was converted into double-stranded form through incubation with a second-strand synthesis mix, after which it was cleaned, repaired, and modified with poly(A) tail oligomers. The A-tailed DNA was then ligated to adapters and purified using CeleMag cleanup beads before amplification to create an adapter-ligated library with CLM polymerase and unique dual-index primers. The quality of the library was assessed using automated capillary gel electrophoresis, ensuring DNA fragments were in the 200 to 400 bp range. Next-Generation Sequencing (NGS) was conducted on the Illumina NextSeq 500 system, and the resulting FASTQ data were trimmed, assembled, and analyzed through the Celemics Virus Verifier pipeline to generate consensus sequences.

Maximum likelihood phylogenetic analysis and molecular clock assessment

In this study, a comprehensive dataset of 423 genomic sequences was compiled, which included 60 newly collected sequences from this research and 116 sequences from other provinces in Thailand (Supplement Table 3). This dataset was further enriched by integrating 247 globally representative SARS-CoV-2 genomes retrieved from the GISAID database within the timeframe of the study, from December 2023 to December 2024, thereby enhancing its robustness. The dataset was aligned utilizing MAFFT v.746, and the alignments were subsequently partitioned by codon position. TreeTime47 was used to construct a maximum likelihood tree, which provided the framework for the Bayesian phylodynamic analyses based on the tree topologies and corresponding collection dates. The resulting maximum likelihood phylogeny and time-scaled tree were visualized using the ggtree package48 in R v.4.4.249. Additionally, a regression analysis of root-to-tip genetic distance against sampling time was conducted using TempEst v1.5.350.

Bayesian phylogenetic analysis and genetic distance estimation

A complete genome dataset was utilized to reconstruct time-scaled phylogenies through Bayesian inference, employing Markov Chain Monte Carlo (MCMC) techniques via the BEAST software (v.2.4.8)51. To determine the most appropriate evolutionary model, combinations of three coalescent tree priors (constant population size, exponential growth, and Bayesian skyline) and two molecular clock models (strict and uncorrelated lognormal relaxed clocks) were independently evaluated. Model fit was assessed through marginal likelihood estimation (MLE) via path sampling and stepping-stone sampling52. Log Bayes Factors (logBF), calculated as the difference in MLE between competing models, were interpreted following Kass and Raftery’s criteria53, with logBF > 5 indicating very strong support. Among all tested models, the strict molecular clock with a constant population size prior exhibited the highest marginal likelihood and was decisively supported over alternative models (Supplement Table 4). Final phylogenetic inference was therefore conducted under the selected best-fit model. Two independent MCMC chains, each consisting of 200 million steps, were run with the BEAGLE library54 to enhance computational performance, and the resulting chains were combined. Parameters and trees were sampled every 20,000 steps, with the initial 20% discarded as burn-in. Convergence and adequate sampling were confirmed in Tracer v1.7.155, with effective sample sizes (ESS) exceeding 200 for all key parameters. The resulting posterior tree distributions were summarized using TreeAnnotator v1.8.4 to generate a maximum clade credibility (MCC) tree, and phylogenies were visualized with FigTree (https://github.com/rambaut/figtree/releases).
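The model comparison described above reduces to subtracting log marginal likelihoods. The Python sketch below ranks the six clock and tree-prior combinations and reports the logBF of the best model against each alternative, using the logBF > 5 criterion from the text. All marginal likelihood values are hypothetical placeholders (the estimates themselves are in Supplement Table 4), and the function name is illustrative.

```python
from itertools import product


def rank_models(log_marginal_likelihoods):
    """Return the best model and its log Bayes factor against every other model.

    log_marginal_likelihoods -- dict mapping a model label to its log marginal
    likelihood (for example from path sampling or stepping-stone sampling).
    """
    best = max(log_marginal_likelihoods, key=log_marginal_likelihoods.get)
    best_value = log_marginal_likelihoods[best]
    log_bf = {
        model: best_value - value
        for model, value in log_marginal_likelihoods.items()
        if model != best
    }
    return best, log_bf


clocks = ("strict", "relaxed_lognormal")
tree_priors = ("constant", "exponential", "skyline")

# Hypothetical log marginal likelihoods, listed in product(clocks, tree_priors) order.
hypothetical_values = [-52010.0, -52031.5, -52024.2, -52018.7, -52040.1, -52029.9]
marginal = dict(zip(product(clocks, tree_priors), hypothetical_values))

best_model, log_bayes_factors = rank_models(marginal)
print("best model:", best_model)
for model, log_bf in sorted(log_bayes_factors.items(), key=lambda item: item[1]):
    support = "very strong" if log_bf > 5 else "not decisive"
    print(model, round(log_bf, 1), support)
```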

Average genetic distances were calculated using MEGA X version 10.2.6, employing the Kimura 2-parameter model with variance estimation enabled56.
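For reference, the Kimura 2-parameter distance between two aligned sequences is d = −½ ln[(1 − 2P − Q)·√(1 − 2Q)], where P and Q are the proportions of sites showing transitions and transversions. The Python sketch below implements this formula for a single pair of sequences. Skipping sites with gaps or ambiguous bases is an assumption made for illustration and may differ from the gap treatment selected in MEGA; the variance estimation mentioned above is not included.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura 2-parameter distance between two aligned nucleotide sequences.

    Sites where either sequence has a gap or a non-ACGT character are skipped
    (an assumption of this sketch).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue
        compared += 1
        if a == b:
            continue
        same_class = {a, b} <= PURINES or {a, b} <= PYRIMIDINES
        if same_class:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable sites")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1.0 - 2.0 * p - q) * math.sqrt(1.0 - 2.0 * q))


# Toy example: one transition in ten aligned sites.
print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))
```

An average distance for a lineage would then be the mean of this quantity over all sequence pairs in that group.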

Statistical analysis

Associations between the categorical variable gender and infection status were examined using Pearson’s chi-square test of independence. Pairwise comparisons of infection proportions among age groups were conducted with Bonferroni correction to adjust for multiple testing. Statistical significance was set at p < 0.05. All analyses were performed using R version 4.4.249.
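The two tests can be sketched in plain Python as follows. The chi-square function is Pearson's statistic for a 2 × 2 table without continuity correction, and the pairwise comparison uses a pooled two-proportion z-test with Bonferroni adjustment; both are stand-ins chosen for illustration, since the exact R functions and correction settings are not stated. The infected counts by gender come from the Results, whereas the non-infected split and the age-group counts are hypothetical placeholders (Table 1 is not reproduced here).

```python
import math
from itertools import combinations


def chi_square_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]].

    Returns (chi2, p_value) with 1 degree of freedom.
    """
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    chi2 = n * (a * d - b * c) ** 2 / denominator
    # For 1 degree of freedom the upper-tail probability is erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


def two_proportion_p(x1: int, n1: int, x2: int, n2: int) -> float:
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))
    z = (x1 / n1 - x2 / n2) / se
    return math.erfc(abs(z) / math.sqrt(2.0))


def pairwise_bonferroni(groups):
    """Bonferroni-adjusted p-values for all pairs of groups.

    groups -- dict mapping a group label to (positives, tested).
    """
    pairs = list(combinations(groups, 2))
    m = len(pairs)  # number of comparisons used for the adjustment
    return {
        (g1, g2): min(1.0, m * two_proportion_p(*groups[g1], *groups[g2]))
        for g1, g2 in pairs
    }


# Gender by infection status: infected counts (539 male, 613 female) are from
# the text; the non-infected split (3310 / 3634) is HYPOTHETICAL.
chi2, p_value = chi_square_2x2(539, 613, 3310, 3634)
print(f"chi2 = {chi2:.2f}, p = {p_value:.3f}")

# HYPOTHETICAL age groups as (positives, tested).
age_groups = {"group_a": (60, 700), "group_b": (210, 1100), "group_c": (200, 1150)}
for pair, adjusted in pairwise_bonferroni(age_groups).items():
    print(pair, f"{adjusted:.4f}", "significant" if adjusted < 0.05 else "ns")
```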