Abstract
Comparatively few lineages of influenza A virus (IAV) have evolved long-term sustained transmission in mammals. The reasons remain largely unknown, and the possibility of avian IAVs evolving sustained mammalian transmission is an ongoing concern. Here we measured the GC content and frequency of GC dinucleotides in 115,520 whole genomes of IAVs using bioinformatic analyses. We found that persistent mammalian lineages showed declining trends in GC-related content and could be reliably separated from IAVs circulating only in birds and those sporadically infecting mammals. Similarly, the earliest viruses of persistent mammalian lineages showed reduced GC-related content, suggesting that this trait might in part contribute to their eventual persistence. Recent highly pathogenic 2.3.4.4b H5 viruses that spread in mink, foxes and humans were also characterized by reduced GC-related content. While not sufficient, reduced GC-related content may be a necessary condition for sustained mammalian transmission and should be included in risk assessment tools for pandemic influenza.
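As an illustration of the two quantities named in the abstract, the following minimal Python sketch computes GC content and overlapping dinucleotide frequencies for a single nucleotide sequence. It is not the authors' pipeline (their scripts are in the GitHub repository listed under Code availability). The function names, the handling of ambiguous bases and the use of overlapping windows are assumptions made for this example, and this excerpt does not specify which dinucleotides the study counts as "GC dinucleotides".

```python
from collections import Counter


def gc_content(seq: str) -> float:
    """Fraction of unambiguous bases (A, C, G, T/U) that are G or C."""
    seq = seq.upper().replace("U", "T")
    # Assumption: ambiguous symbols such as N are ignored rather than counted.
    counts = Counter(base for base in seq if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return (counts["G"] + counts["C"]) / total


def dinucleotide_frequencies(seq: str) -> dict:
    """Relative frequency of each overlapping dinucleotide in the sequence.

    Pairs containing an ambiguous base are skipped (an assumption of this
    sketch). Only dinucleotides that actually occur appear in the result.
    """
    seq = seq.upper().replace("U", "T")
    pairs = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in "ACGT" and seq[i + 1] in "ACGT"
    )
    total = sum(pairs.values())
    if total == 0:
        raise ValueError("sequence contains no unambiguous dinucleotides")
    return {pair: n / total for pair, n in pairs.items()}


if __name__ == "__main__":
    toy = "ATGC"  # toy sequence, not real IAV data
    print(gc_content(toy))                # 0.5
    print(dinucleotide_frequencies(toy))  # AT, TG and GC, each one third
```

For a whole genome, one option is to apply these functions to each segment or coding region separately and then combine the counts; how the study aggregates across segments is not stated in this excerpt.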
Data availability
The IAV genomic data used in this study are available in GISAID with accession numbers provided in EPI_SET_20220531ye and EPI_SET_250116bq at https://www.gisaid.org/. The phylogenetic trees of the eight protein coding regions used to classify persistent mammalian lineages and of the more recent H5Nx genomes, the summary of statistics for genomic GC content, GC dinucleotide frequencies, codon adaptation index and free energy of RNA folding, and SVM model files are publicly accessible via GitHub at https://github.com/id-bioinfo/IAV_GCContent.
Code availability
The SVM model is available at https://iav-transmission.org/. Scripts for the analysis are accessible via GitHub at https://github.com/id-bioinfo/IAV_GCContent. Tools, packages and software used in this study are publicly available.
References
Neumann, G., Eisfeld, A. J. & Kawaoka, Y. Viral factors underlying the pandemic potential of influenza viruses. Microbiol. Mol. Biol. Rev. 89, e0006624 (2025).
Taubenberger, J. K. & Kash, J. C. Influenza virus evolution, host adaptation, and pandemic formation. Cell Host Microbe 7, 440–451 (2010).
Ma, W. Swine influenza virus: current status and challenge. Virus Res. 288, 198118 (2020).
Veldhuis Kroeze, E. J. B. & Kuiken, T. in Animal Influenza 2nd edn (ed Swayne, D. E.) Ch. 23 (John Wiley & Sons, Inc., 2016).
Long, J. S., Mistry, B., Haslam, S. M. & Barclay, W. S. Host and viral determinants of influenza A virus species specificity. Nat. Rev. Microbiol. 17, 67–81 (2019).
Peacock, T. P. et al. The global H5N1 influenza panzootic in mammals. Nature 637, 304–313 (2025).
Lam, T. T.-Y. et al. Dissemination, divergence and establishment of H7N9 influenza viruses in China. Nature 522, 102–105 (2015).
Tan, X. et al. A case of human infection by H3N8 influenza virus. Emerg. Microbes Infect. 11, 2214–2217 (2022).
Nidra, F. Y., Monir, M. B. & Dewan, S. M. R. Avian influenza A (H5N1) outbreak 2024 in Cambodia: worries over the possible spread of the virus to other Asian nations and the strategic outlook for its control. Environ. Health Insights 18, 11786302241246453 (2024).
Cox, N. J., Trock, S. C. & Burke, S. A. in Influenza Pathogenesis and Control Vol. I (eds Compans, R. & Oldstone, M.) Ch. 5 (Springer, 2014).
Tool for Influenza Pandemic Risk Assessment (TIPRA) (World Health Organization, 2016).
Honce, R. & Schultz-Cherry, S. Recipe for zoonosis: how influenza virus leaps into human circulation. Cell Host Microbe 28, 506–508 (2020).
Greenbaum, B. D., Levine, A. J., Bhanot, G. & Rabadan, R. Patterns of evolution and host gene mimicry in influenza and other RNA viruses. PLoS Pathog. 4, e1000079 (2008).
Gu, H., Fan, R. L. Y., Wang, D. & Poon, L. L. M. Dinucleotide evolutionary dynamics in influenza A virus. Virus Evol. 5, vez038 (2019).
Gaunt, E. et al. Elevation of CpG frequencies in influenza A genome attenuates pathogenicity but enhances host response to infection. eLife 5, e12735 (2016).
Chang, C.-C. & Lin, C.-J. LIBSVM: a library for support vector machines. ACM Trans. Intell. Syst. Technol. 2, 27 (2011).
Anderson, T. K. et al. A phylogeny-based global nomenclature system and automated annotation tool for H1 hemagglutinin genes from swine influenza A viruses. mSphere https://doi.org/10.1128/msphere.00275-16 (2016).
Voorhees, I. E. H. et al. Spread of canine influenza A (H3N2) virus, United States. Emerg. Infect. Dis. 23, 1950–1957 (2017).
Sharp, P. M. & Li, W.-H. The codon adaptation index—a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res. 15, 1281–1295 (1987).
Brower-Sinning, R. et al. The role of RNA folding free energy in the evolution of the polymerase genes of the influenza A virus. Genome Biol. 10, R18 (2009).
Shimizu, S., Hoyer, P. O., Hyvärinen, A., Kerminen, A. & Jordan, M. A linear non-Gaussian acyclic model for causal discovery. J. Mach. Learn. Res. 7, 2003–2030 (2006).
Cheng, X. et al. CpG usage in RNA viruses: data and hypotheses. PLoS ONE 8, e74109 (2013).
Vieira, V. C. & Soares, M. A. The role of cytidine deaminases on innate immune responses against human viral infections. BioMed Res. Int. 2013, 683095 (2013).
Su, W. et al. Ancestral sequence reconstruction pinpoints adaptations that enable avian influenza virus transmission in pigs. Nat. Microbiol. 6, 1455–1465 (2021).
Caserta, L. C. et al. Spillover of highly pathogenic avian influenza H5N1 virus to dairy cattle. Nature 634, 669–676 (2024).
Puryear, W. et al. Highly pathogenic avian influenza A (H5N1) virus outbreak in New England seals, United States. Emerg. Infect. Dis. 29, 786–791 (2023).
Uhart, M. M. et al. Epidemiological data of an influenza A/H5N1 outbreak in elephant seals in Argentina indicates mammal-to-mammal transmission. Nat. Commun. 15, 9516 (2024).
Agüero, M. et al. Highly pathogenic avian influenza A (H5N1) virus infection in farmed minks, Spain, October 2022. Euro Surveill. 28, 2300001 (2023).
Fusaro, A. et al. High pathogenic avian influenza A (H5) viruses of clade 2.3.4.4b in Europe—why trends of virus evolution are more difficult to predict. Virus Evol. 10, veae027 (2024).
Kareinen, L. et al. Highly pathogenic avian influenza A (H5N1) virus infections on fur farms connected to mass mortalities of black-headed gulls, Finland, July to October 2023. Euro Surveill. 29, 2400063 (2024).
Finkelstein, D. B. et al. Persistent host markers in pandemic and H5N1 influenza viruses. J. Virol. 81, 10292–10299 (2007).
Chun, J. Influenza including its infection among pigs. Natl Med. J. China 5, 34–44 (1919).
Jones, J. C. et al. Risk assessment of H2N2 influenza viruses from the avian reservoir. J. Virol. 88, 1175–1188 (2014).
Harkness, W., Schild, G. C., Lamont, P. H. & Brand, C. M. Studies on relationships between human and porcine influenza. 1. Serological evidence of infection in swine in Great Britain with an influenza A virus antigenically like human Hong Kong-68 virus. Bull. World Health Organ. 46, 709–719 (1972).
Furmanski, M. & Murcia, P. R. Did horses act as intermediate hosts that facilitated the emergence of 1918 pandemic influenza? J. Infect. Dis. 232, 521–524 (2025).
Meneu, L. et al. Sequence-dependent activity and compartmentalization of foreign DNA in a eukaryotic nucleus. Science 387, eadm9466 (2025).
Galtier, N. & Lobry, J. R. Relationships between genomic G+C content, RNA secondary structures, and optimal growth temperature in prokaryotes. J. Mol. Evol. 44, 632–636 (1997).
Bisht, K. & te Velthuis, A. J. W. Decoding the role of temperature in RNA virus infections. mBio 13, e0202122 (2022).
Li, Y. et al. Low RNA stability signifies increased post-transcriptional regulation of cell identity genes. Nucleic Acids Res. 51, 6020–6038 (2023).
Le Sage, V., Campbell, A. J., Reed, D. S., Duprex, W. P. & Lakdawala, S. S. Persistence of influenza H5N1 and H1N1 viruses in unpasteurized milk on milking unit surfaces. Emerg. Infect. Dis. 30, 1721–1723 (2024).
Nguyen, T.-Q. et al. Emergence and interstate spread of highly pathogenic avian influenza A (H5N1) in dairy cattle in the United States. Science 388, eadq0900 (2025).
Plaza, P. I. et al. Pacific and Atlantic sea lion mortality caused by highly pathogenic avian influenza A (H5N1) in South America. Travel Med. Infect. Dis. 59, 102712 (2024).
Sun, H. et al. Mink is a highly susceptible host species to circulating human and avian influenza viruses. Emerg. Microbes Infect. 10, 472–480 (2021).
Restori, K. H. et al. Risk assessment of a highly pathogenic H5N1 influenza virus from mink. Nat. Commun. 15, 4112 (2024).
Lindh, E. et al. Highly pathogenic avian influenza A (H5N1) virus infection on multiple fur farms in the South and Central Ostrobothnia regions of Finland, July 2023. Euro Surveill. 28, 2300400 (2023).
Pepin, K. M., Lass, S., Pulliam, J. R. C., Read, A. F. & Lloyd-Smith, J. O. Identifying genetic markers of adaptation for surveillance of viral host jumps. Nat. Rev. Microbiol. 8, 802–813 (2010).
Gu, C. et al. A human isolate of bovine H5N1 is transmissible and lethal in animal models. Nature 636, 711–718 (2024).
Eisfeld, A. J. et al. Pathogenicity and transmissibility of bovine H5N1 influenza virus. Nature 633, 426–432 (2024).
Shu, Y. & McCauley, J. GISAID: global initiative on sharing all influenza data—from vision to reality. Euro Surveill. 22, 30494 (2017).
Heiny, A. T. et al. Evolutionarily conserved protein sequences of influenza A viruses, avian and human, as vaccine targets. PLoS ONE 2, e1190 (2007).
Ye, Y. et al. GLProbs: aligning multiple sequences adaptively. IEEE/ACM Trans. Comput. Biol. Bioinform. 12, 67–78 (2015).
Sievers, F. & Higgins, D. G. Clustal Omega for making accurate alignments of many protein sequences. Protein Sci. 27, 135–145 (2018).
Suyama, M., Torrents, D. & Bork, P. PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Res. 34, W609–W612 (2006).
Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree 2—approximately maximum-likelihood trees for large alignments. PLoS ONE 5, e9490 (2010).
Paradis, E. & Schliep, K. ape 5.0: an environment for modern phylogenetics and evolutionary analyses in R. Bioinformatics 35, 526–528 (2019).
Ishikawa, S. A., Zhukova, A., Iwasaki, W. & Gascuel, O. A fast likelihood method to reconstruct and visualize ancestral scenarios. Mol. Biol. Evol. 36, 2069–2085 (2019).
Ye, Y. et al. F1ALA: ultrafast and memory-efficient ancestral lineage annotation applied to the huge SARS-CoV-2 phylogeny. Virus Evol. 10, veae056 (2024).
Ye, Y. et al. Robust expansion of phylogeny for fast-growing genome sequence data. PLoS Comput. Biol. 20, e1011871 (2024).
Nakamura, Y., Gojobori, T. & Ikemura, T. Codon usage tabulated from international DNA sequence databases: status for the year 2000. Nucleic Acids Res. 28, 292 (2000).
Lorenz, R. et al. ViennaRNA package 2.0. Algorithms Mol. Biol. 6, 26 (2011).
Chen, Y.-W. & Lin, C.-J. Combining SVMs with various feature selection strategies. in Feature Extraction. Studies in Fuzziness and Soft Computing Vol. 207 (eds Guyon, I. et al.) 315–324 (Springer, 2006).
Lycett, S. et al. Highly pathogenic avian influenza and its complex patterns of reassortment. In Epidemics 9: 9th International Conference on Infectious Disease Dynamics P3.092 (SSRN, 2024).
Youk, S. et al. H5N1 highly pathogenic avian influenza clade 2.3.4.4b in wild and domestic birds: introductions into the United States and reassortments, December 2021–April 2022. Virology 587, 109860 (2023).
Yu, G., Smith, D. K., Zhu, H., Guan, Y. & Lam, T. T.-Y. ggtree: an R package for visualization and annotation of phylogenetic trees with their covariates and other associated data. Methods Ecol. Evol. 8, 28–36 (2017).
Minh, B. Q. et al. IQ-TREE 2: new models and efficient methods for phylogenetic inference in the genomic era. Mol. Biol. Evol. 37, 1530–1534 (2020).
Suttie, A. et al. Inventory of molecular markers affecting biological characteristics of avian influenza A viruses. Virus Genes 55, 739–768 (2019).
Mertens, E. et al. Evaluation of phenotypic markers in full genome sequences of avian influenza isolates from California. Comp. Immunol. Microbiol. Infect. Dis. 36, 521–536 (2013).
WHO H5N1 Genetic Changes Inventory: A Tool for International Surveillance (CDC, 2015).
Suchard, M. A. et al. Bayesian phylogenetic and phylodynamic data integration using BEAST 1.10. Virus Evol. 4, vey016 (2018).
Gill, M. S. et al. Improving Bayesian population dynamics inference: a coalescent-based model for multiple loci. Mol. Biol. Evol. 30, 713–724 (2013).
Rambaut, A., Drummond, A. J., Xie, D., Baele, G. & Suchard, M. A. Posterior summarization in Bayesian phylogenetics using Tracer 1.7. Syst. Biol. 67, 901–904 (2018).
Bielejec, F. et al. SpreaD3: interactive visualization of spatiotemporal history and trait evolutionary processes. Mol. Biol. Evol. 33, 2167–2169 (2016).
Acknowledgements
This project is supported by the National Natural Science Foundation of China’s Excellent Young Scientists Fund (Hong Kong and Macau) (31922087), the Hong Kong Research Grants Council’s General Research Fund (17150816), the Health and Medical Research Fund (COVID1903011-WP1), the Innovation and Technology Commission’s InnoHK funding (D24H), the Theme Based Research Scheme (T11-705/21-N), the Hong Kong Jockey Club Global Health Institute and the Guangdong Government (2019B121205009, HZQB-KCZYZ-2021014, 200109155890863, 190830095586328 and 190824215544727). The funders had no role in the study design, data collection and analysis, decision to publish or preparation of the paper. We gratefully acknowledge the authors from the laboratories responsible for obtaining the specimens and those who generated the sequence data and shared them via GISAID. The acknowledgement tables for IAVs can be found with EPI_SET_20220531ye and EPI_SET_250116bq. We thank S. Lycett and W. Harvey for sharing the genotype assignment of H5 IAVs in Europe.
Author information
Contributions
Y.Y., Y.G. and T.T.-Y.L. conceived the study. Y.Y. conducted data collection and analyses. Y.S. performed the phylogeographic analysis. T.T.-Y.L. supervised the study. Y.Y., D.K.S. and E.C.H. developed the first draft of the paper. Y.Y., H.S., Y.S., M.H.-H.S., D.K.S., H.G., H.Z., J.T.W., N.S.L., F.B., E.C.H., M.P., L.L.-M.P., Y.G. and T.T.-Y.L. critically reviewed the analysis results, revised the paper and approved it for submission.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature Microbiology thanks Theo Sanderson and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 Longitudinal evolution of codon usage bias in persistent mammalian lineages.
Scatter plots show the evolution of codon adaptation index (CAI) of the combined eight CDSs (PB2, PB1, PA, HA, NP, NA, M1 and NS1) for the 12 persistent mammalian IAV lineages over sample collection time.
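For readers unfamiliar with the codon adaptation index, the sketch below follows the general definition of Sharp and Li (cited in the reference list): each codon receives a relative adaptiveness equal to its count in a reference set divided by the count of the most used synonymous codon, and the CAI of a coding sequence is the geometric mean of those values. This is an illustrative sketch only, not the authors' implementation. The function names and the toy reference counts are hypothetical, and the choice of reference codon usage table is an assumption left to the user. Codons missing from the reference counts are given a floor of 0.5, a common convention that avoids a zero geometric mean.

```python
import math
from collections import defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def relative_adaptiveness(ref_counts: dict) -> dict:
    """Weight of each codon relative to the most used synonymous codon.

    Stop codons and the single-codon amino acids Met and Trp are excluded.
    """
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa not in "*MW":
            families[aa].append(codon)
    weights = {}
    for codons in families.values():
        # Floor of 0.5 for unseen codons (assumption stated in the lead-in).
        counts = {c: max(ref_counts.get(c, 0), 0.5) for c in codons}
        top = max(counts.values())
        for codon, n in counts.items():
            weights[codon] = n / top
    return weights


def cai(cds: str, weights: dict) -> float:
    """Geometric mean of codon weights over an in-frame coding sequence."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # drop any trailing partial codon
    logs = [
        math.log(weights[cds[i:i + 3]])
        for i in range(0, usable, 3)
        if cds[i:i + 3] in weights  # skips Met, Trp, stops, ambiguous codons
    ]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))


if __name__ == "__main__":
    # Hypothetical reference counts for the four alanine codons only.
    toy_reference = {"GCT": 10, "GCC": 40, "GCA": 10, "GCG": 20}
    w = relative_adaptiveness(toy_reference)
    print(cai("ATGGCCGCG", w))  # geometric mean of 1.0 and 0.5
```

In practice the reference counts would cover all 59 scorable codons, for example from a host codon usage table.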
Extended Data Fig. 2 Correlation coefficient between the free energy of RNA folding and GC/GC3/GC12 content.
Heat map shows the Pearson correlation coefficients between the minimum free energy (MFE) of RNA secondary structure and GC content at all codon sites (GC), the third codon site (GC3), and the first and second codon sites (GC12) in the eight CDSs for the 12 persistent mammalian lineages. A cross indicates that a correlation is not statistically significant (P ≥ 0.05). Blue indicates negative correlation and red indicates positive correlation.
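The site-specific GC measures and the correlation coefficient in this caption can be sketched as follows. This is a minimal standard-library illustration, not the authors' code. The function names are hypothetical, ambiguous bases are simply counted as non-GC (an assumption), and the MFE values are expected to come from an external RNA folding tool such as the ViennaRNA package cited in the reference list. The sketch does not compute the significance test.

```python
import math


def codon_position_gc(cds: str) -> tuple:
    """Return (GC, GC12, GC3) for an in-frame coding sequence.

    GC   : G+C fraction over all codon sites
    GC12 : G+C fraction over first and second codon sites
    GC3  : G+C fraction over third codon sites
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # drop any trailing partial codon
    is_gc = [base in "GC" for base in cds[:usable]]
    if not is_gc:
        raise ValueError("sequence is shorter than one codon")
    third = is_gc[2::3]
    first_second = [flag for i, flag in enumerate(is_gc) if i % 3 != 2]
    return (
        sum(is_gc) / len(is_gc),
        sum(first_second) / len(first_second),
        sum(third) / len(third),
    )


def pearson_r(xs, ys) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two samples of equal length, at least 2 points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    gc, gc12, gc3 = codon_position_gc("ATGGCC")  # toy CDS, not real data
    print(gc, gc12, gc3)
```

To reproduce one cell of such a heat map, one would collect GC3 (or GC, GC12) and MFE for every sequence of a given CDS in a given lineage and pass the two lists to pearson_r.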
Extended Data Fig. 3 Genomic GC content of avian IAVs.
Avian IAVs are shown by host family, restricted to families with more than 5 sequences and sorted by 25th percentile in descending order from left to right. Genomic GC content of the 12 persistent mammalian lineages is shown as a reference. The whiskers represent the minimum and maximum values, while the box shows the lower and upper quartiles with the median crossing the box.
Extended Data Fig. 4 Distribution of the combined avian and sporadic mammalian IAVs and the persistent mammalian lineages based on features from individual CDS.
Linear discriminant analysis (LDA) projection using features from each individual CDS. The linear discriminant coordinates 1 (LD1) and 2 (LD2) are the x and y axes. Explained variance is given in parentheses. Distributions of the combined avian and sporadic mammalian IAVs (gray regions) and the persistent mammalian lineages (red regions) were computed using kernel density estimation. Positions of the earliest sequences in the 12 persistent mammalian lineages are shown: Hu1 (red), Sw2 (blue), Eq (green) and Ca2 (yellow), which were wholly derived from avian IAVs, are highlighted, while the others are colored light blue.
Extended Data Fig. 5 Correlation of receptor binding, polymerase activity and viral transmissibility against genomic GC content.
Linear regressions relating genomic GC content to a, viral receptor-binding preference for α-2,3-linked sialosides and α-2,6-linked sialosides (n = 6), b, viral polymerase activity in NPTr and 293T cells (n = 3), and c, viral loads in the donor nasal swabs and onward transmissibility measured by seroconversion (n = 4). R², p-value and 95% confidence intervals are shown around the regression line. Each data point represents the mean value of the following tested viruses: DK/77, Sw2/81, Sw2/92, RG-EA1, RG-EA2, RG-EA3 and RG-EA4, and is arranged by genomic GC content. Source data were derived from Su, W. et al. (2021). SIA, sialoside; r.f.u., relative fluorescence units.
Extended Data Fig. 6 Proportion of complete genome sequences of clade 2.3.4.4b viruses with molecular markers.
Heat map of selected molecular markers (protein mutations) associated with mammalian adaptation, shown (a) by genotype and (b) for mammalian IAVs of genotype G1 or not G1. Mutations unique to the G1 genotype are highlighted by red boxes. Two sub-genotypes, B3.2 and B3.13, are shown for pinniped and cattle H5 viruses, respectively.
Supplementary information
Supplementary Information
Supplementary Figs. 1–24.
Supplementary Tables
Supplementary Tables 1–16.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ye, Y., Shuai, H., Song, Y. et al. Genomic features associated with sustained mammalian transmission of avian influenza A viruses. Nat Microbiol 11, 802–814 (2026). https://doi.org/10.1038/s41564-025-02257-4
DOI: https://doi.org/10.1038/s41564-025-02257-4


