Introduction

Orthohantaviruses, members of the order Bunyavirales and family Hantaviridae, are enveloped, single-stranded, negative-sense RNA viruses1. Their genome comprises large (L), medium (M), and small (S) segments, which encode an RNA-dependent RNA polymerase (RdRp), two surface glycoproteins (Gn and Gc), and a nucleocapsid protein (NP), respectively. These pathogenic viruses cause hemorrhagic fever with renal syndrome (HFRS) in Eurasia and hantavirus cardiopulmonary syndrome in the Americas2. Approximately 150,000 HFRS cases are documented annually in East Asia, with fatality rates ranging from <1% to 15%3. HFRS is primarily associated with infections caused by Orthohantavirus hantanense (HTNV), O. seoulense (SEOV), O. puumalaense (PUUV), and O. dobravaense (DOBV)4. Transmission to humans occurs predominantly through the inhalation of aerosolized viral particles shed in the saliva, urine, or feces of infected rodents5.

Phylogeographic inference methods, which incorporate discrete and continuous spatial data, are becoming increasingly prominent in viral phylodynamics. These methods provide crucial insights into viral evolution and the geographic spread of infectious diseases6. Phylodynamic analyses of HTNV have revealed significant phylogeographic and epidemiological correlations between patients with HFRS and their rodent reservoir hosts, thereby facilitating the identification of potential viral exposure sites7. Further genomic epidemiological studies analyzing nearly full-length HTNV genome sequences from military personnel in the Republic of Korea (ROK) and United States between 2013 and 2015 strengthened these phylogeographic connections8. High-resolution phylogeographic associations of HTNV within a 5 km radius were established through active surveillance involving targeted rodent trapping in areas suspected of HFRS outbreaks in 2020, consequently facilitating the identification of infectious agents and potential exposure sites9. These high-resolution phylogeographic inference approaches, underpinned by comprehensive viral genome databases, have reconstructed epidemiological linkages between patients and viral sources. Thus, they have substantially contributed to the development of public health interventions for hantavirus-associated diseases.

In segmented RNA viruses, reassortment, the exchange of entire genome segments during co-infection of a host cell by distinct viral strains, can substantially influence molecular evolution10. This process promotes genome shuffling, increases genetic diversity, and often results in distinct phylogenetic relationships among segments. Reassortment alters critical viral features such as immune evasion, transmissibility, and virulence in humans11,12. In viruses such as influenza A and rotavirus A, the reassortment of genetic segments facilitates immune escape and contributes to epidemic outbreaks12,13,14. Similarly, ongoing recombination in human immunodeficiency virus, which is not segmented, results in variants with modified transmissibility and virulence in human populations15. Furthermore, the genotype-specific pathogenic potential of the Dabie bandavirus is observed across six pure genotypes and nine reassortants, with varying mortality rates reported in the ROK, China, and Japan16,17. Molecular evidence suggests that genetic exchange in HTNV occurs naturally; this contributes to the phylogeographical diversity of the virus in the ROK18.

A hybrid zone is a geographical area where two distinct taxa that have undergone substantial evolutionary divergence come into contact and interbreed19. These regions persist over evolutionary timescales and are characterized by fluctuations in genotype frequencies, manifesting as clines that reflect variations in genetic or phenotypic traits20. Their stability is maintained by the delicate interplay between the homogenizing influence of dispersal and the diversifying forces of natural selection21,22. Previous studies have documented the existence of hybrid regions between pathogens and their reservoir hosts in natural environments. These zones facilitate genetic diversity and speciation processes, as observed in beak and feather disease viruses and their host species23. Hybrid zones are maintained by spatial contact and natural selection pressures in murine cytomegalovirus and its hosts24. Genomic surveillance of hybrid zones has revealed noteworthy dynamics, such as the co-circulation of two distinct lineages of O. tulaense within a geographical region where different evolutionary clades of the common vole (Microtus arvalis) interact and interbreed25. More recently, the identification of an HTNV hybrid zone in the ROK suggested that the convergence of divergent genotypes within this area may promote genome reassortment, thereby contributing to enhanced genetic diversity and evolutionary divergence26.

In this study, we conducted comprehensive genomic surveillance and phylogeographic analyses of HTNV strains collected across the ROK over several decades. We integrated complete genome sequencing, evolutionary analyses, and investigation of genomic reassortment and hybrid zone dynamics to elucidate the mechanisms underlying the genetic diversity and evolution of HTNV. These analyses provide a foundation for the improvement of phylogeographic inference methods, ultimately supporting the development of more effective public health strategies against endemic HFRS outbreaks.

Methods

Establishment of genome database for HTNV

In this study, we obtained all available complete genome sequences of HTNV (N = 123) reported in the ROK, together with their annotation metadata, from the National Center for Biotechnology Information (NCBI) GenBank, with data collected until February 28, 2025. When accessible, epidemiological information was gathered to construct detailed datasets, including strain names and origins, lineages, collection dates, reservoir hosts, candidates for genome exchange, accession numbers, and precise geographical locations of trapping sites, encompassing global positioning system (GPS) coordinates, towns, cities, and provinces. To ensure metadata accuracy, duplicate viral sequences from similar strains were excluded; similar strains were defined as sequences derived from the same viral isolate that differed only by sequencing method or virus propagation in cell culture. Sequences with ambiguous collection dates, uncertain host species, imprecise geographic information, or evident sequencing errors were also excluded.
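The curation rules above can be sketched in a few lines of Python. This is only an illustrative sketch, not the procedure actually used: the `Record` fields, the `isolate` key used to detect duplicates, and the convention that missing metadata is stored as `None` are all assumptions introduced here.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Record:
    """One GenBank entry with the metadata needed for curation (hypothetical schema)."""
    accession: str
    isolate: str                              # assumed key identifying the viral isolate
    collection_date: Optional[str]            # None when the date is ambiguous
    host: Optional[str]                       # None when the host species is uncertain
    gps: Optional[Tuple[float, float]]        # None when the location is imprecise


def curate(records):
    """Drop entries with missing metadata, then keep the first record per isolate."""
    seen_isolates = set()
    kept = []
    for rec in records:
        # Exclusion rule: ambiguous date, uncertain host, or imprecise location.
        if rec.collection_date is None or rec.host is None or rec.gps is None:
            continue
        # Exclusion rule: duplicate sequence derived from the same isolate.
        if rec.isolate in seen_isolates:
            continue
        seen_isolates.add(rec.isolate)
        kept.append(rec)
    return kept
```

Screening for evident sequencing errors is left out of the sketch because the text does not state how such errors were detected.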

Localized Nextstrain build for HTNV in the ROK

A Snakemake pipeline for HTNV was used as the workflow management system to enable rapid deployment and ensure the reproducibility of genomic surveillance27,28. Geographical data were linked to the metadata through GPS coordinates and organized into three resolution levels: 36 towns (denoted as -ri), 11 cities (denoted as -si or -gun), and three provinces. The genomic sequences of each segment were aligned using MAFFT29, and time-scaled phylogenies were generated using IQ-TREE and TreeTime30,31. These phylogenetic trees were then visualized alongside geographic maps, genomic entropy and variations, and frequency panels using the Augur and Auspice toolkits32.
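To make the entropy panel concrete, the following sketch computes per-site Shannon entropy from an alignment. It is a generic illustration and not the Augur or Auspice implementation; the use of the natural logarithm, the decision to ignore gaps and ambiguous bases, and the function names are assumptions made here.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (natural log) of one alignment column, ignoring non-ACGT symbols."""
    counts = Counter(base for base in column.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for n in counts.values():
        p = n / total
        entropy -= p * math.log(p)
    return entropy


def alignment_entropy(sequences):
    """Entropy at every position of a list of equal-length aligned sequences."""
    return [column_entropy("".join(col)) for col in zip(*sequences)]
```

A fully conserved site scores 0, and a site split evenly between two bases scores ln 2 (about 0.69).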

Evolutionary rate analysis

The HTNV dataset (N = 123), comprising genome sequences reported from 1976 to 2023, was used for evolutionary rate estimation. It covers the L segment encoding the RdRp (6,456 bp), the M segment encoding glycoproteins Gn (1,944 bp) and Gc (1,464 bp), and the S segment encoding the NP (1,290 bp). Multiple sequence alignments were generated using MAFFT (v7.511)29. Evolutionary rates (substitutions per site per year) were inferred using Bayesian Markov chain Monte Carlo (MCMC) analysis implemented in BEAST (v1.10)33. The most appropriate nucleotide substitution model was selected using jModelTest2 (v2.1.10), which identified GTR + G + I as the best-fitting model for all datasets. Temporal calibration was based on sampling years, and coalescent analyses were run until all parameters reached convergence, with effective sample sizes exceeding 200 for all key parameters to ensure statistical robustness. Uncertainty was assessed using 95% highest posterior density intervals. Molecular clock analyses were performed under both strict and relaxed (uncorrelated lognormal) clock models. A range of substitution rate priors was evaluated, and demographic histories were inferred under two models: constant population size and exponential growth.
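For readers unfamiliar with highest posterior density (HPD) intervals, the sketch below shows one common way to compute such an interval from posterior samples: take the narrowest window that contains the requested share of the sorted samples. It is a generic illustration and not BEAST's own code; the function name and the small tolerance used when rounding the window size are choices made here.

```python
import math


def hpd_interval(samples, mass=0.95):
    """Narrowest interval containing at least `mass` of the posterior samples."""
    xs = sorted(samples)
    n = len(xs)
    if n == 0:
        raise ValueError("at least one sample is required")
    # Number of samples the interval must contain (tolerance guards float rounding).
    k = max(1, math.ceil(mass * n - 1e-9))
    # Slide a window of k consecutive sorted samples and keep the narrowest one.
    best_start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[best_start], xs[best_start + k - 1]
```

Unlike an equal-tailed interval, this window shifts toward the denser part of a skewed posterior, which is typical for rate parameters bounded below by zero.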

Natural selection analysis

The evolutionary pressure acting on the genomes was evaluated by estimating the rate of nonsynonymous (dN) and synonymous (dS) nucleotide substitutions per site per year using the HyPhy software package34. Comprehensive natural selection profiles were generated for all genomic segments across complete coding genome datasets to elucidate potential residues under positive or negative selective pressures. Three complementary analytical methods—single-likelihood ancestor counting (SLAC), fixed effects likelihood (FEL), and fast unconstrained Bayesian approximation (FUBAR)—were utilized to determine statistically significant selection sites. These analyses were performed using the Datamonkey Adaptive Evolution Server (https://www.datamonkey.org/; accessed on 18 March 2025)35. Statistically robust evidence for positive and negative selection was defined as amino acid positions having p-values < 0.05 (SLAC and FEL) or posterior probability values > 0.9 (FUBAR).
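The significance criteria stated above can be expressed as a small helper. This is a minimal sketch assuming per-codon results have already been exported from Datamonkey; the function name and the input values in the usage note are hypothetical and are not taken from the study's output.

```python
def supporting_methods(slac_p, fel_p, fubar_pp, p_cutoff=0.05, pp_cutoff=0.9):
    """Return the methods whose result passes the stated threshold at one codon."""
    support = []
    if slac_p < p_cutoff:        # SLAC: p-value below 0.05
        support.append("SLAC")
    if fel_p < p_cutoff:         # FEL: p-value below 0.05
        support.append("FEL")
    if fubar_pp > pp_cutoff:     # FUBAR: posterior probability above 0.9
        support.append("FUBAR")
    return support
```

For example, `supporting_methods(0.20, 0.08, 0.95)` returns `["FUBAR"]`, the situation in which a site is supported by a single method only.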

Genetic reassortment analysis

Genetic reassortment was evaluated using the graph incompatibility-based reassortment finder (GiRaF) software36. Alignments of the tri-segmented genomes of HTNV were used as inputs for Bayesian inference. One thousand unrooted candidate phylogenetic trees were constructed using the GTR + G + I substitution model, with a burn-in period of 25% (50,000 iterations), and sampling was conducted every 200 iterations. These phylogenies were used to account for evolutionary variability across each genomic segment, adhering to the default parameters set by GiRaF. The confidence threshold was set to 0.7. This procedure was repeated 10 times, resulting in 10 independent MrBayes tree files for each segment.
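The sampling settings above are mutually consistent, as the following arithmetic sketch shows. It assumes that the 1,000 trees refer to all sampled trees including the burn-in portion; the text does not state this explicitly, and the function name is introduced here for illustration.

```python
def mcmc_sampling_summary(burn_in_iterations=50_000, burn_in_fraction=0.25, sample_every=200):
    """Derive chain length and tree counts from the stated burn-in and sampling interval."""
    total_iterations = round(burn_in_iterations / burn_in_fraction)   # 50,000 / 0.25
    sampled_trees = total_iterations // sample_every                  # trees written to file
    burn_in_trees = burn_in_iterations // sample_every                # trees discarded
    return {
        "total_iterations": total_iterations,
        "sampled_trees": sampled_trees,
        "burn_in_trees": burn_in_trees,
        "retained_trees": sampled_trees - burn_in_trees,
    }
```

Under that assumption, a 25% burn-in of 50,000 iterations implies a chain of 200,000 iterations, and sampling every 200 iterations yields the 1,000 candidate trees mentioned in the text.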

Cline analysis

Geographic clines were analyzed along one-dimensional transects spanning the contact zones between two distinct HTNV lineages using the HZAR package in R37. This software implements functions for the modeling of molecular, genetic or morphological data from hybrid zones, based on classical equilibrium cline models, and utilizes the Metropolis-Hastings MCMC algorithm. The MCMC method was run for one million generations with an initial burn-in of 100,000 iterations for HTNV tripartite genome analysis. The input for cline analysis included information on the spatial separation between localities, genotype distributions, and sample sizes. Hybridization frequencies at the collection sites were visualized, starting from the westernmost point in Paju, Gyeonggi Province, ROK. Genotype frequency data objects were generated using the hzar.doMolecularData1DPops function, and the hzar.plot.obsData function was used to create graphical representations showing both average molecular cline frequencies and typical morphological cline values.
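As an illustration of the classical equilibrium cline model, the sketch below evaluates a sigmoid cline without exponential tails, in which the expected lineage frequency depends on the distance along the transect, the cline center, and the cline width. It is written in Python for illustration and is not the HZAR code; the function name and the center and width values in the usage note are hypothetical, not fitted estimates from this study.

```python
import math


def cline_frequency(x_km, center_km, width_km, p_min=0.0, p_max=1.0):
    """Expected frequency at distance x under a sigmoid cline with no tails.

    The width is defined as the inverse of the maximum slope, so the slope at
    the center equals (p_max - p_min) / width_km.
    """
    if width_km <= 0:
        raise ValueError("cline width must be positive")
    z = 4.0 * (x_km - center_km) / width_km
    return p_min + (p_max - p_min) / (1.0 + math.exp(-z))
```

With a hypothetical center of 60 km and width of 20 km, the expected frequency rises monotonically across the sites located 45.3, 64.3, and 89.4 km from Paju and equals 0.5 at the center. A narrower width gives a steeper transition.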

Results

Spatiotemporal genomic surveillance of HTNV in the ROK

High-resolution spatiotemporal surveillance was conducted using complete genomic sequences and epidemiological data from 123 HTNV strains collected in the ROK between 1976 and 2023 (Figs. 1, 2 and 3). Data visualization was filtered based on factors such as strain name, reservoir host, phylogenetic lineage, reassortant candidate, hybrid zone, accession number, collection date, and trapping location. In addition, the resolution was adjusted from the village (36 towns, marked as -ri) to the city (11 cities, marked as -si or -gun) and provincial (three provinces) levels, enabling detailed analysis alongside the phylogenetic data. Lineage 1 of the HTNV tripartite genomes was identified in regions such as Pocheon, Paju, and Yeoncheon in Gyeonggi Province, as well as Chuncheon, Cheorwon, and Hwacheon in Gangwon Province. Lineage 2 spans areas including Yeoncheon in Gyeonggi Province and Chuncheon, Cheorwon, Inje, Hwacheon, and Yanggu in Gangwon Province. Lineage 3 is primarily associated with the southern regions of Gyeonggi Province, including Pocheon, Dongducheon, and Pyeongtaek, and extends to Inje in Gangwon Province. Lineage 4 corresponds to a unique HTNV genetic variant found solely on Jeju Island.

Fig. 1

Spatiotemporal genomic surveillance of the Orthohantavirus hantanense (HTNV) L segment in the Republic of Korea (ROK). (A–D) Comprehensive phylodynamic analysis of the HTNV L segment across the ROK between 1976 and 2023. (A) A time-scaled phylogenetic tree reconstructed using TreeTime, focusing on the HTNV L segment (positions 1–6,530 nt). (B) The geographic distribution of HTNV genomes, with colored circles indicating the population size at each collection site (village level). (C) The genomic diversity and entropy within both nucleotide and amino acid sequences of the HTNV L segment. (D) The frequency patterns of HTNV lineages circulating in the ROK over the study period, with each lineage in the study denoted by a unique colored symbol.

Fig. 2

Spatiotemporal genomic surveillance of the Orthohantavirus hantanense (HTNV) M segment in the Republic of Korea (ROK). (A–D) Comprehensive phylodynamic analysis of the HTNV M segment across the ROK between 1976 and 2023. (A) A time-scaled phylogenetic tree reconstructed using TreeTime, focusing on the HTNV M segment (positions 1–3,616 nt). (B) The geographic distribution of HTNV genomes, with colored circles indicating the population size at each collection site (village level). (C) The genomic diversity and entropy within both nucleotide and amino acid sequences of the HTNV M segment. (D) The frequency patterns of HTNV lineages circulating in the ROK over the study period, with each lineage in the study denoted by a unique colored symbol.

Fig. 3

Spatiotemporal genomic surveillance of the Orthohantavirus hantanense (HTNV) S segment in the Republic of Korea (ROK). (A–D) Comprehensive phylodynamic analysis of the HTNV S segment across the ROK between 1976 and 2023. (A) A time-scaled phylogenetic tree reconstructed using TreeTime, focusing on the HTNV S segment (positions 1–1,696 nt). (B) The geographic distribution of HTNV genomes, with colored circles indicating the population size at each collection site (village level). (C) The genomic diversity and entropy within both nucleotide and amino acid sequences of the HTNV S segment. (D) The frequency patterns of HTNV lineages circulating in the ROK over the study period, with each lineage in the study denoted by a unique colored symbol.

Evolutionary dynamics of HTNV in the ROK

The mean rates of molecular evolution for HTNV, inferred from Bayesian coalescent analyses across multiple molecular clock and demographic models, ranged from 1.6 to 5.1 × 10⁻⁴ substitutions per nucleotide per year for RdRp (L segment), 3.3 to 6.6 × 10⁻⁴ for Gn, 2.4 to 3.8 × 10⁻⁴ for Gc (both M segment), and 2.6 to 3.1 × 10⁻⁴ for NP (S segment) (Table 1). Selection pressure analyses indicated that HTNV predominantly underwent purifying selection throughout its evolutionary history, although one positively selected residue (position 547) within the Gn protein was identified using FUBAR (Supplementary Table 1). The estimated dN/dS (ω) ratios were 0.0323, 0.0574, 0.0384, and 0.0322 for RdRp, Gn, Gc, and NP, respectively; the Gn gene exhibited a comparatively higher ω ratio. Additionally, reassortment analyses based on established datasets revealed distinct evolutionary patterns, identifying 33 reassortment events between segments L and M, 44 between L and S, and 18 between M and S (Supplementary Table 2).
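The per-gene figures reported above can be summarized with a short script, which makes the overall rate range and the ranking of ω values explicit. The numbers are copied from this paragraph; the variable names are introduced here for illustration only.

```python
# Mean substitution rate ranges (x 10^-4 substitutions/site/year), as reported above.
rate_ranges = {
    "RdRp": (1.6, 5.1),
    "Gn": (3.3, 6.6),
    "Gc": (2.4, 3.8),
    "NP": (2.6, 3.1),
}

# Estimated dN/dS (omega) ratios, as reported above.
omega = {"RdRp": 0.0323, "Gn": 0.0574, "Gc": 0.0384, "NP": 0.0322}

# Overall range across all genes: lowest lower bound to highest upper bound.
overall_range = (
    min(low for low, _ in rate_ranges.values()),
    max(high for _, high in rate_ranges.values()),
)

# Gene with the highest omega, and a check that every gene is far below 1.
highest_omega_gene = max(omega, key=omega.get)
all_strongly_purifying = all(value < 0.1 for value in omega.values())
```

Here `overall_range` evaluates to `(1.6, 6.6)`, `highest_omega_gene` to `"Gn"`, and `all_strongly_purifying` to `True`. An ω well below 1 indicates that nonsynonymous changes are removed far more often than synonymous ones, which is the signature of purifying selection.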

Table 1 Bayesian estimates of the rate of nucleotide substitution in Orthohantavirus hantanense collected in the Republic of Korea, 1976–2023.

Analysis of hybrid zones for HTNV in the ROK

Cline analysis revealed shifts in the population frequencies of divergent HTNV lineages 1 and 2 along the geographic transects for each segment. The transition shapes in the L and S segments exhibited spatial homogeneity, whereas the M segment displayed a steeper transition pattern than the others (Fig. 4). Additionally, distinct phylogenetic groups of HTNV in the ROK have specific geographical contact points. A genetic contact zone extending from Yeoncheon to Hwacheon (45.3–89.4 km from Paju) was identified for the L segment. In contrast, the contact area of the M segment was confined to Hwacheon (64.3–89.4 km from Paju). The S segment presented a contact zone stretching from Yeoncheon to Cheorwon (45.3–64.3 km from Paju). The hybrid zone of HTNV spans approximately 45 km (45.3–89.4 km from Paju) and includes areas in Yeoncheon, Gyeonggi Province, as well as Chuncheon, Cheorwon, and Hwacheon in Gangwon Province, ROK (Fig. 5).
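The overall extent of the hybrid zone follows from merging the segment-specific contact intervals, as the sketch below shows. The interval endpoints are the distances from Paju given above; the merging helper is a generic routine introduced here and is not part of the HZAR workflow.

```python
def merge_intervals(intervals):
    """Merge overlapping or touching (start, end) intervals into disjoint ones."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps or touches the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Contact zones per segment, in km from Paju, as reported above.
contact_zones = {"L": (45.3, 89.4), "M": (64.3, 89.4), "S": (45.3, 64.3)}

hybrid_zone = merge_intervals(contact_zones.values())
zone_width_km = round(hybrid_zone[0][1] - hybrid_zone[0][0], 1)
```

The three intervals merge into a single zone from 45.3 to 89.4 km, giving a width of 44.1 km, which the text rounds to approximately 45 km.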

Fig. 4

Geographic clines and hybrid zone depicting the transition between distinct lineages of Orthohantavirus hantanense (HTNV) in the Republic of Korea (ROK). This figure illustrates the geographic clines and hybrid zone, capturing the transition between different phylogenetic lineages of HTNV across the ROK between 1976 and 2023. The geographic clines reflect estimated shifts in the frequency of genetic traits along a transect extending from the westernmost location in Paju to the easternmost location in Yanggu. For the L segment, a genetic crossbreed area was observed extending from Yeoncheon to Hwacheon (45.3–89.4 km from Paju). The contact region was restricted to Hwacheon (64.3–89.4 km from Paju) in the M segment. The S segment displayed an inter-lineage hybridization zone spanning from Yeoncheon to Cheorwon (45.3–64.3 km from Paju). Symbol sizes represent the number of samples collected, whereas the colors indicate the genotype frequency at each location (red for lineage 1, orange for the hybrid zone, and green for lineage 2). The grey-shaded areas represent regions of 95% credible clines, with dotted lines indicating the boundaries of the hybrid zone.

Fig. 5

Geographical locations of the Orthohantavirus hantanense (HTNV) hybrid zone in the Republic of Korea (ROK). Two distinct phylogenetic lineages of HTNV intersect within Gyeonggi and Gangwon Provinces, ROK. The hybrid zones are located in four regions: Yeoncheon in Gyeonggi Province, as well as Chuncheon, Cheorwon, and Hwacheon in Gangwon Province. The orange circle highlights the area of geographic overlap between lineage 1 (represented by a red symbol) and lineage 2 (represented by a green symbol) of HTNV. The map was originally generated using Quantum Geographical Information System 3.10 for Mac and later modified using Adobe Illustrator CC 2019.

Discussion

In this study, we analyzed 123 whole-genome sequences of HTNV collected in the ROK between 1976 and 2023, representing the most comprehensive dataset of full-length HTNV genomes from this region to date. Phylogenetic analyses identified four genetic lineages (lineages 1 to 4), each associated with specific geographic clusters within the ROK. Bayesian coalescent analyses, conducted under various molecular clock and demographic models, estimated substitution rates ranging from 1.6 to 6.6 × 10⁻⁴ substitutions/site/year across the segments. These values fall within the short-term substitution range of 10⁻² to 10⁻⁴ previously reported for rodent-borne hantaviruses and are consistent with patterns observed in other RNA viruses38,39.

Selective pressure analyses indicated that HTNV strains circulating in the ROK are predominantly subject to purifying selection. All segments exhibited dN/dS ratios below 0.1, reflecting strong evolutionary constraint. The Gn gene displayed the highest dN/dS value (0.0574) and the greatest substitution rate (3.3 to 6.6 × 10⁻⁴ substitutions/site/year), indicating a relatively elevated rate of evolution. These findings broadly align with those of Demirev et al., who likewise reported predominant purifying selection in HTNV40. However, our analysis consistently revealed higher dN/dS ratios across all segments compared to their study, suggesting that HTNV strains in the ROK may be under slightly stronger evolutionary pressures, potentially shaped by region-specific factors. Additionally, we identified a single codon under positive selection (residue 547) in the Gn gene. This site is located within the short C-terminal cytoplasmic tail (CT), a functionally important region involved in several critical steps of the viral life cycle41. The CT interacts with the NP and ribonucleocapsid complexes, likely serving as a structural bridge between internal viral components and the glycoprotein lattice42. It also contains a conserved ββα-type zinc finger domain essential for NP binding, and its distal region is implicated in nonspecific RNA binding43. Although the identification of a positively selected site in this functionally relevant region is noteworthy, the signal was detected using a single method and should be interpreted with caution. Further investigation is warranted to assess whether this substitution affects virion morphogenesis, intracellular trafficking, or host-specific adaptation.

Previous studies have shown that genome exchange among hantaviruses typically occurs within or between closely related lineages44,45. Among the three segments, the M segment has most frequently been implicated in reassortment, likely due to its functional plasticity and potential to confer adaptive advantages46. For instance, reassortment involving the M segment has been documented for O. sinnombreense and O. andesense in both natural and experimental settings, often resulting in geographically structured clades47,48,49. Experimental reassortment has also been observed in DOBV, again primarily affecting the M segment50. In contrast, reassortment involving the L or S segments has been reported in HTNV, SEOV, and PUUV8,40,51. In our analysis, which focused exclusively on HTNV genomes sampled within the ROK, we identified a different pattern: reassortment was more frequently observed in the L segment than in the M or S segments. Although L segment reassortment has been previously reported in HTNV52,53, its relative prominence in our dataset merits further attention. Reassortment events involving the L segment were especially common in strains from Paju, Gyeonggi Province, suggesting possible regional variation in segment exchange dynamics. However, we cannot exclude the possibility that uneven sampling intensity contributed to this observed pattern. As with other segmented RNA viruses, successful reassortment in HTNV is likely constrained by the need to preserve packaging signal compatibility and RNA–protein interactions10. Whether L segment exchange confers functional advantages, such as enhanced replication, transmission efficiency, or host adaptability, remains unresolved.

Approximately 400 cases of HFRS are reported annually in the ROK, with a mortality rate of 1–4% (accessible online at https://dportal.kdca.go.kr/pot/is/rginEDW.do). HTNV is the primary etiological agent and is particularly prevalent in the northern regions of the Gyeonggi and Gangwon Provinces, affecting both military personnel and civilians18,53,54,55. In the present study, we identified a hybrid zone spanning approximately 45 km, covering four key locations: Yeoncheon in Gyeonggi Province and Chuncheon, Cheorwon, and Hwacheon in Gangwon Province. Compared to previous findings from 2020, the expansion of contact areas in Yeoncheon and Chuncheon effectively doubles the previously documented extent of the hybrid zone26, potentially reflecting increased opportunities for genome exchange. We also observed distinct cline patterns among the HTNV L, M, and S segments, suggesting varying tendencies for reassortment and hybrid zone expansion. The L segment exhibited the broadest cross-lineage distribution, consistent with its higher exchange frequency, whereas the M segment appeared more compatible with lineage 1 within the hybrid zone. These patterns may be shaped by differences in local viral fitness or segment-specific functional constraints. According to evolutionary theory, contact zones are expected to favor genotypes with higher relative fitness46,56, which may contribute to the observed segment dynamics.

Although ecological and demographic variables were not directly examined in this study, previous research has demonstrated that urbanization, human migration, rodent population dynamics, and environmental or climatic changes can influence the spatial and temporal patterns of hantavirus transmission57,58. For instance, long-term surveillance in China has shown associations between rural-to-urban migration and sustained hantavirus epidemics, highlighting the importance of socio-environmental drivers in shaping virus–host interactions59. While such external factors may not fully account for the hybrid zone expansion observed in our dataset, they provide a meaningful context for interpreting the ecological and epidemiological mechanisms underlying our findings. Future research integrating viral genomic data with ecological and demographic information will be essential to evaluate their roles in hybrid zone dynamics and viral emergence.

This report highlights several limitations that warrant further exploration: (1) the insufficient availability of genomic sequences and epidemiological data from other endemic areas in the ROK, especially in the southern regions; (2) the uncertain biological advantages associated with HTNV L segment exchange, specifically regarding its impact on viral replication and pathogenicity; and (3) the need for more in-depth research on the implications of hybrid zone expansion, which may lead to alterations that could influence pathogenicity in humans.

In conclusion, this study provides the most comprehensive whole-genome analysis of HTNV in the ROK to date, revealing four distinct genetic lineages with clear regional clustering and a notable concentration of reassortment events in the L segment. Purifying selection was the dominant evolutionary force across all segments, with relatively elevated rates and a single positively selected site detected in the Gn gene. The identification of an expanded hybrid zone and segment-specific cline patterns suggests active genome exchange dynamics in northern regions. These insights not only enhance our understanding of HTNV evolution but may also inform regionally tailored genomic surveillance strategies, support the development of more sensitive molecular diagnostics, and assist in prioritizing public health responses in areas with heightened reassortment potential.