Introduction

Bats are subject to epidemiological surveillance because they host viruses such as coronaviruses, paramyxoviruses, filoviruses1, Venezuelan equine encephalitis virus2, and dengue virus3,4. Although bats have not yet been established as reservoirs of influenza viruses, a new genomic sequence of influenza A virus (IAV), designated H17N10, was detected in fruit bats in Guatemala5. Later, another genome classified as H18N11 was characterized in bats in Peru6, Bolivia, and Brazil7. In addition, a virus similar to avian strains was detected in Egyptian bats8. This evidence suggests new hypotheses about the origins of influenza A viruses and their potential public health impact, and it highlights the need for further research on bats as reservoirs and on the implications for influenza control strategies5,6.

The origin of bat IAVs remains poorly understood, and only a few studies have addressed it. Phylogenetic and evolutionary analyses have established that they share a common ancestor with avian IAV subtypes9. However, their divergence has drawn the attention of researchers. IAVs probably split into two branches due to geographic separation and multiple early spread events, or IAVs may have undergone drastic changes to adapt to bats10.

Crystallographic structures of the hemagglutinin (H) and neuraminidase (N) proteins have been generated to characterize bat IAVs6,11,12. The structures were similar to those of avian IAV strains but carried specific molecular modifications as a mechanism of adaptation to bats. Bat IAVs do not use sialic acid as a cell entry receptor; instead, they use molecules of the major histocompatibility complex class II (MHC-II)13,14. This mechanism suggests adaptability and possible inter-species jumping, since MHC-II is expressed in immune system cells and epithelial tissue in many animals, including pigs, mice, and chickens10.

In contrast, the N protein of bat IAV appears to have no catalytic activity, and its function is an enigma for researchers. These results indicate that the N protein could induce low expression of MHC-II molecules by an unknown mechanism10. Confirmation of these data would demonstrate that the surface glycoproteins of bat IAVs have receptor binding and destruction activities. Therefore, bat IAVs would carry out the same infection and release processes of viral particles as avian IAVs9. The search for IAVs in bats has motivated researchers to investigate the possible role of bats as hosts and to delve deeper into the evolution of these viruses. Current knowledge does not precisely establish the adaptability of IAVs.

The present study aimed to characterize the phylogenetic, evolutionary, and antigenic relationships of an influenza A virus detected in the fishing bat Noctilio albiventris.

Results

Bat capture, sample, and sequencing analysis

A total of 159 bats were randomly captured in four municipalities in the Colombian Caribbean: Talaigua Nuevo (Bolívar), Santa Ana (Magdalena), Moñitos (Córdoba), and Colosó (Sucre). All samples (26 pools) processed and sequenced by RNA-Seq belonged to individuals of the Phyllostomidae, Molossidae, Noctilionidae, and Emballonuridae families. Only a pool of four individuals of the fishing bat Noctilio albiventris captured in Talaigua Nuevo, Bolívar (Fig. 1) yielded contigs associated with Orthomyxoviridae.

Fig. 1

Geographic location of the department of Bolívar, Colombia. The red triangle shows the sampling point (9°18′28″N, 74°36′56″W). The map was generated with QGIS 3.32.2 (QGIS.org, 2021. QGIS Geographic Information System. QGIS Association. http://www.qgis.org).

Genomic, phylogenetic and evolutionary analysis of influenza A virus detected in N. albiventris

Initially, three contigs with a similarity of 90% were assigned to the H18N11 subtype. Mapping to the reference genome (H18N11 Peru, CY125942-CY125949) yielded seven segments corresponding to the PB1, PB2, PA, HA, NS, NP, and M genes. The search effort for the NA gene was increased by relaxing the alignment and similarity criteria at the seed level. Eight segments were obtained (Supplementary Fig. S1) that corresponded to 0.12% (22,256/18,129,755) of the total reads with a depth of 100X, and this virus was designated A/fishing bat/Colombia/2023 (A/bat/Colombia/23) (SRA: PRJNA1162262). The genome comparison analysis shows the greatest similarity with the H18N11 subtype sequence recorded in Peru (93%). The percentage of similarity of each segment varied between 92 and 98%, and the N protein gene of A/bat/Colombia/23 showed the greatest divergence (Table 1). Figure 2 shows the similarity patterns between A/bat/Colombia/23 and the H1N1, H2N2, H3N3, H17N10, and H18N11 subtypes.

Table 1 Comparison and percentage similarity of A/bat/Colombia/23 segments relative to the H18N11 subtypes obtained in Peru and Bolivia.
Fig. 2

Similarity patterns between A/bat/Colombia/23 and the H1N1, H2N2, H3N3, H17N10, and H18N11 subtypes. The comparison was made with the amino acid sequences for each segment. A 500 bp visualization window with 50 bp steps and Kimura as a distance model was established.
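The sliding-window comparison described in this caption can be sketched as follows. This is a minimal illustration under stated assumptions and not the SimPlot++ implementation: it reports raw per-window identity instead of a Kimura-corrected distance, and the function name `window_identity` and the toy sequences are hypothetical.

```python
from typing import List, Optional, Tuple


def window_identity(
    seq_a: str, seq_b: str, window: int = 500, step: int = 50
) -> List[Tuple[int, Optional[float]]]:
    """Return (window midpoint, identity) pairs for two aligned sequences.

    Columns where either sequence has a gap ('-') are ignored.
    Identity is None when a window contains no comparable columns.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")

    profile = []
    for start in range(0, len(seq_a) - window + 1, step):
        pairs = [
            (a, b)
            for a, b in zip(seq_a[start:start + window], seq_b[start:start + window])
            if a != "-" and b != "-"
        ]
        if pairs:
            identity = sum(a == b for a, b in pairs) / len(pairs)
        else:
            identity = None
        profile.append((start + window // 2, identity))
    return profile


if __name__ == "__main__":
    # Toy alignment (made up): identical first half, fully divergent second half.
    query = "A" * 1000
    reference = "A" * 500 + "C" * 500
    for midpoint, identity in window_identity(query, reference):
        print(midpoint, identity)
```

With the default 500-position window and 50-position step, each point on the profile summarizes one window, so a drop in the curve localizes the region where the query diverges from a given reference.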

Phylogenetic analyses indicate that the A/bat/Colombia/23 segments are related to IAVs detected in bats and form a clade divergent from avian IAVs (Supplementary Fig. S2). The HA segment is closely related to the H18 node from Peru, Brazil, and Bolivia (Fig. 3A). However, the PB2, PA, and M segments form a basal branch (Supplementary Fig. S2). In addition, the NA gene is the most phylogenetically distant (Fig. 3B). Based on the phylogenetic analysis and the NA gene divergence, a time to the most recent common ancestor (TMRCA) analysis was conducted. This analysis, with a chain length of 100,000,000 iterations, indicated that the most recent common ancestor that gave rise to the bat NA segment dates to 1420. The A/bat/Colombia/23 NA segment came from a node in 1940, and the node that gave rise to the N11 subtype dates to around 1990 (Fig. 3C). This indicates that the A/bat/Colombia/23 NA segment is phylogenetically related to the previously described N11 segment, but its node is estimated to be approximately 50 years older than the node that gave rise to the reported N11 sequences.

Fig. 3

Phylogenetic tree of A/bat/Colombia/23 HA (A) and NA (B) genes. Bat influenza A branches and the new sequence are presented in red, which signifies their unique evolutionary path. The phylogenetic tree was constructed using the maximum likelihood method. Bootstrap values (1000 replicates) are given at the relevant nodes. The branch scale represents the number of amino acid substitutions along the tree branches. Reference sequences from all reported IAV subtypes (HA1-16 and NA1-9) were included in all trees. (C) Molecular clock to determine the TMRCA of the NA segment. The branch of the NA segment of A/bat/Colombia/23 is represented in red and arises from a node dating back to 1940. The scale represents the estimated time from the common ancestor to the current sequences. Posterior probability values are given at the nodes of interest. The phylogenetic trees were constructed using complete gene sequences.

Because of the evolutionary implications associated with the phylogenetic and evolutionary divergence of some segments of A/bat/Colombia/23, possible genetic reassortment events with HA and NA segments were inferred. The antigenic variability and selective pressure to which these genes are subjected allow us to demonstrate the processes involved in adaptability to new hosts and genetic diversity15. Interestingly, the analysis suggested that the HA segment of bat IAVs was acquired from canonical IAV strains, indicating a possible evolutionary interaction between the two groups (Fig. 4). The NA segment originates from the interaction between strains circulating in neotropical bats. On average, there was one reassortment event every 31 years in the HA and NA segments. The low reassortment rate may reflect differences in genetic compatibility between segments or ecological restrictions that limit co-infections. However, bats also influence the evolution of IAVs.

Fig. 4

Estimates of the genetic reassortment networks of the HA and NA segments of canonical IAV and bat IAV. The analysis was carried out with 300,000,000 iterations. The reassortment rate was 0.0321 (95% HPD: 0.0206–0.0445). The purple and red dashed lines indicate reassortment events for the NA and HA segments, respectively. The red marker indicates the HA and NA segments of A/bat/Colombia/23. Posterior probability values are given at the nodes of interest. The phylogenetic tree was constructed using complete gene sequences.
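The rate of 0.0321 reported in this caption and the figure of one reassortment event every 31 years are linked by a simple inversion. The sketch below makes that arithmetic explicit. It assumes the rate is expressed in events per lineage per year, which the text implies but does not state, and the function name is hypothetical. Inverting the HPD bounds gives only an approximate interval on the waiting time, not a formal HPD of the inverse.

```python
def mean_interval_years(rate_per_year: float) -> float:
    """Mean waiting time between events for a constant rate (assumed per year)."""
    if rate_per_year <= 0:
        raise ValueError("rate must be positive")
    return 1.0 / rate_per_year


if __name__ == "__main__":
    rate = 0.0321                        # posterior estimate from the Fig. 4 caption
    hpd_low, hpd_high = 0.0206, 0.0445   # 95% HPD bounds from the same caption

    print(f"mean interval: {mean_interval_years(rate):.1f} years")
    # A higher rate means a shorter interval, so the bounds swap places.
    print(
        f"approximate range: {mean_interval_years(hpd_high):.1f} "
        f"to {mean_interval_years(hpd_low):.1f} years"
    )
```

Under these assumptions the point estimate corresponds to roughly 31 years between events, with an approximate range of about 22 to 49 years.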

Antigenic structure and molecular docking of the N protein

Amino acid sequence analysis shows a domain conserved at the level of the Alphainfluenzavirus genus and another of sialidase enzymes (Supplementary Fig. S3). The 3D structure for the N protein was generated from the crystallographic structure of the N11 protein of the H18N11 subtype from Peru (4K3Y) (Supplementary Table S1 and Fig. S4). Computational modeling generated a tetramer structurally similar to the N protein of avian strains that contains six antiparallel β-sheets in a helix-like arrangement. It also maintains six residues (R118, W178, S179, R224, E276, and E425) conserved in the active site. The alignment with the H18N11 subtype from Peru and Bolivia yielded points of divergence at the level of the transmembrane and distal domains of the protein (Fig. 5A). The most significant mutations are highlighted in the hypothetical active site region (Fig. 5B). At the conformational level, our comparison of the crystal structure of the N11 protein from Peru (Fig. 6A) and the A/bat/Colombia/23 N protein (Fig. 6B) revealed critical structural differences. The hypothetical active site pocket of the N11 protein is broader than that of the IAV and influenza B virus N proteins due to the movements of loops 150 and 430 (ref. 6) (Fig. 5A). Conversely, the hypothetical active site pocket of the A/bat/Colombia/23 N protein is narrow due to the plasticity of loop 150 and the K363R mutation (Fig. 6B)6. This structural feature led to a statistically significant interaction between the A/bat/Colombia/23 N protein and HLA-DR (MHC-II) of bats (8JRJ) (Fig. 6C), with nine residues involved in hydrogen bond formation being identified (Fig. 6D). Notably, three of the five mutations (Ser361, Arg363, and Lys242) were found to increase the binding with the S1 subunit (110–165) of the α2 chain of bat HLA-DR, forming bonds with high specificity and strength (> 1500 Å²). The movement of loop 150, which reduces the space of the active site, allows for a larger contact and interaction surface. These findings may inform future research on viral protein interactions, including the design of antiviral drugs targeting the N protein.

Fig. 5

Homology modeling of the A/bat/Colombia/23 N protein. (A) Alignment of the amino acid sequence of the A/bat/Colombia/23 NA protein with the N11 subtypes from Peru and Bolivia. (B) 3D structure of the tetramer from a bottom transmembrane view and a top view showing the putative active site of the protein with five mutations (K363R, T242K, M138F, G361S, and I139V) relative to the N11 sequence from Peru and Bolivia. Conserved regions are shown in purple, and divergent amino acids in white. The image of the 3D model was generated with ChimeraX 1.7.1 (ref. 42) (https://www.cgl.ucsf.edu/chimerax/).

Fig. 6

Molecular analysis and docking of the N protein. (A) A/bat/Peru N11 protein (4K3Y). (B) N protein of A/bat/Colombia/23. (C) Monomer of the N protein of A/bat/Colombia/23 bound to the bat HLA-DR. The contact region (2935.1 Å²) between the two proteins is indicated. The van der Waals energy was 67.8. (D) Binding details between residues of the N protein of A/bat/Colombia/23 and bat HLA-DR: the three mutated residues Ser361, Arg363, and Lys242 of the hypothetical active site of the N protein bind Lys111, Pro87, and Val97/Glu98 of the bat HLA-DR, respectively. The image of the 3D model was generated with ChimeraX 1.7.1 (ref. 42) (https://www.cgl.ucsf.edu/chimerax/).

Discussion

This study contributes to the current pool of known IAV genomes with a sequence detected for the first time in Noctilio albiventris from Colombia. Phylogenetic analysis reveals a divergent clade of IAVs from bats in seven of the eight segments, a finding that aligns with previous studies5,6. Notably, the HA gene is the only segment related to all 16 known avian subtypes and is found within group one. Likewise, the results indicated that four segments (PB1, PA, M, and NA) of A/bat/Colombia/23 originated from a reassortment event between viral strains circulating in bat populations. This contributes to the understanding of viral adaptation to different hosts.

The clade of bat IAVs diverged about 300–500 years ago5. Evolutionary analyses indicate that HA was acquired from canonical IAV strains and subsequently changed to bind to bat MHC-II13,14, while retaining characteristics of primitive nodes. This finding has implications for the evolution and transmission of IAV. The hemagglutinin (H) of avian strains has tropism for the enteric tract due to the abundance of α-2,3 sialic acid, which has a “linear” structure that improves fusion with the cell, whereas the strains that infect humans bind to α-2,6 sialic acid, which has a “bent” structure and is mainly found in the respiratory tract16. However, the hemagglutinin of bat IAVs is highly divergent and binds to MHC-II5,6. Accordingly, the binding capacity of bat IAVs should not be analyzed based on the knowledge generated from avian IAVs12; instead, it should be analyzed in terms of adaptability and the ability to infect new hosts15.

The N protein of A/bat/Colombia/23 is the most divergent among the eight segments compared to the bat and bird IAV sequences. In addition, the new sequence does not share a TMRCA with the N11 subtype. The molecular divergence of the NA segment creates uncertainty and prompts hypotheses about the origin of this sequence. Discrepancy analysis with the NA and HA segments suggests that there were reassortment events between circulating strains in neotropical bats. However, the NA segment was acquired from a bat IAV ancestor that has not yet been described. This divergence suggests a new subtype called H18N12 (ref. 17).

A comparison of the antigenic architecture of the N11 protein in Peru showed critical structural differences. The movement of loop 150 affected the general morphology of the protein, suggesting a structural plasticity that could influence the efficiency and specificity of neuraminidase to perform its function during viral release18. In addition, a reduction in the size of the hypothetical active site implies changes in the stability of the structure5,6.

Three of the five mutations (K363R, T242K, and G361S) near the hypothetical active site increased the possibility of binding to HLA-DR in bats. This finding generates uncertainty regarding the functions of this protein and the implications that such changes could have on viral replication or transmission. The changes generated in the protein strengthen the interactions with cellular receptors and support the theory of downregulation of MHC-II by bat IAV N protein10. The change in Lys to Arg favors new interactions by increasing the number of hydrogen bonds due to the presence of a guanidino group in its side chain. Meanwhile, Met for Phe allows binding with the hydrophobic regions of the HLA-DR of bats19,20. It was shown that there is an interaction of loop 150 (Phe146), a crucial area for the enzymatic activity of this viral protein since it participates in the recognition and cleavage of receptors on cell surfaces. However, these data must be corroborated by in vitro assays.

Another aspect is the biological implications and ability of IAV to bind to MHC-II in other mammals. A recent study showed that catalytic activity can be acquired again with significant mutations in the amino acid sequence21. F144C and T342A changes in the N11 protein increased viral particles in MDCK II cells compared to cells infected with the wild-type virus. Furthermore, IAV has broader tissue tropism in the airways of mice (BALB/c) and ferrets (Mustela putorius furo)19. This is because, in the three-dimensional structure of N11, residue 144 is located at the outer edge of the hypothetical active site, whereas residue 342 is close to the binding site. Both sites are critical for sialidase activity of the N protein and are present in avian IAVs20,22,23. Therefore, the zoonotic potential and interspecies jump of bat IAVs cannot be ruled out.

Studies indicate that H18N11 subtypes have inefficient transmission and infection in non-bat hosts without generating critical structural changes10. For A/bat/Colombia/23, hemagglutinin was 100% identical to the H18 subtype. However, mutations in neuraminidase could improve the infection process and transmission to new species. The genetic plasticity of IAVs allows them to adapt to different species; for example, the bat H9N2 subtype has been shown to replicate and be transmitted between ferrets. In addition, it can efficiently infect human lung cell cultures, evade antiviral inhibition by MxA in B6 transgenic mice, and cross-react with N2-specific antibodies in human sera24. Bat IAVs demonstrate the ability of influenza viruses to transmit and adapt to new hosts24. Therefore, new sequences must be studied to understand whether they can generate significant epidemiological outbreaks.

In conclusion, the characterization of viruses with zoonotic potential, such as IAV in bats, is crucial for understanding the role of these wild animals in the evolution of IAVs. The virus identified in this study has sequences that are new to science, with genetic and structural changes in the N protein that may constitute a new subtype (H18N12). Experimental trials are needed to provide more information on influenza virus adaptation in bat hosts. The detection of IAVs in members of Noctilionidae expands the range of hosts in which these viruses can be detected. The fishing bat Noctilio albiventris is associated with aquatic ecosystems, and bodies of water can be a means of interaction between bats and birds. These findings underscore the need for further research to understand the potential epidemiological implications of these new sequences.

Methods

Type of study, location, and ethical aspects

A prospective, cross-sectional descriptive study was conducted with rectal swab samples collected from bats captured between January and December 2023 in the Colombian Caribbean. The collected samples are part of an epidemiological surveillance study of emerging viruses in bats and mosquitoes developed by the Universidad de Córdoba, Colombia.

Ethical statement

The Instituto de Investigaciones Biológicas del Trópico and the Ethics Committee of the Universidad de Córdoba approved the project with permission from the National Environmental Licensing Authority (ANLA) of Colombia (Resolution 00914 of August 4, 2017). The animals captured in the study were released. Samples were taken in accordance with CDC Guidelines for Safe Work Practices in Human and Animal Medical Diagnostic Laboratories25. All methods were performed in accordance with the relevant guidelines and regulations.

Capture of bats and sample collection

Bats were captured with mist nets (6 × 2 m). Animals were taxonomically identified using dichotomous keys based on morphometry26. Then, two to three rectal swab samples were taken from each animal, placed in viral transport medium, stored in liquid nitrogen (N2), and transported to the Instituto de Investigaciones Biológicas del Trópico of the Universidad de Córdoba, where they were kept at −80 °C until processing.

RNA extraction, purification, and sequencing

Rectal samples were vortexed for 30 s, and then a pool was made by locality and species with the supernatant of each sample. The pool suspension was filtered through a 0.45 μm filter by centrifugation at 2000 × g in a microcentrifuge. RNA was extracted from 200 µl of the supernatant using the GeneJET Viral DNA/RNA Purification Kit (Thermo Fisher Scientific™). Contaminating DNA in the RNA was degraded with DNase I (Promega™). RNA from the host was not removed. Then, the RNA was purified and concentrated with the GeneJET RNA Cleanup and Concentration Kit (Thermo Fisher Scientific™). The concentration and integrity of the RNA were determined by fluorometry with Qubit® (Thermo Fisher Scientific™). Finally, the samples were processed with the Paired-End FCL 150 MGIEasy Fast RNA Library Prep Set™ under the high-throughput sequencing methodology based on DNA nanobeads (DNB) from MGI Tech™. Metatranscriptomic sequencing was performed on the MGI-G50® (Shenzhen, China) (Supplementary material).

Bioinformatics analysis

The sequences were subjected to quality assessment and removal of low-quality reads (< Q20) using Fastp27. A de novo assembly with a minimum contig length of 300 nucleotides was performed using MEGAHIT28. The contigs were compared with the non-redundant (nr) protein database of the National Center for Biotechnology Information (NCBI) using DIAMOND29 to optimize the search for CDS regions encoding viral proteins. The maximum expected value (E-value) for an alignment was set to 0.001. The files obtained were processed and analyzed using MEGAN6 (ref. 29). Sequences of interest with more than 80% similarity were compared using BLASTn and BLASTx. To confirm the viral contigs, reads were mapped to the reference sequence of the H18N11 subtype from Peru (CY125942-CY125949) with Bowtie2 (ref. 30). The confirmed segments were used as reference sequences to obtain the segment coverage and depth using Bowtie2 and SAMTools31. The aligned reads were visualized and analyzed using UGENE32. The genome was annotated using Prokka33.
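The hit-filtering step in this paragraph can be illustrated with a short sketch. It assumes the alignments were exported in the standard 12-column BLAST tabular layout (query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, E-value, bit score), which the text does not specify. Applying the 0.001 E-value cutoff and the 80% threshold in a single pass is an illustrative simplification, since the text applies them at different steps. The function name, the `Hit` type, and the example rows are hypothetical.

```python
import csv
import io
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    percent_identity: float
    evalue: float
    bit_score: float


def filter_hits(
    lines: Iterable[str], max_evalue: float = 1e-3, min_identity: float = 80.0
) -> List[Hit]:
    """Keep tabular alignment hits that pass the E-value and identity cutoffs."""
    kept = []
    for row in csv.reader(lines, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment headers
        hit = Hit(row[0], row[1], float(row[2]), float(row[10]), float(row[11]))
        if hit.evalue <= max_evalue and hit.percent_identity > min_identity:
            kept.append(hit)
    return kept


# Made-up rows for illustration only; they are not results from this study.
EXAMPLE_ROWS = [
    ["contig_1", "protein_A", "90.0", "300", "30", "0", "1", "900", "1", "300", "1e-50", "500"],
    ["contig_2", "protein_B", "95.0", "40", "2", "0", "1", "120", "1", "40", "0.5", "30"],
    ["contig_3", "protein_C", "60.0", "200", "80", "0", "1", "600", "1", "200", "1e-10", "90"],
]
EXAMPLE_TSV = "\n".join("\t".join(row) for row in EXAMPLE_ROWS) + "\n"

if __name__ == "__main__":
    for hit in filter_hits(io.StringIO(EXAMPLE_TSV)):
        print(hit.query, hit.subject, hit.percent_identity, hit.evalue)
```

In this toy input only `contig_1` passes both cutoffs: `contig_2` fails the E-value threshold and `contig_3` falls below the identity threshold.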

Phylogenetic and evolutionary analysis

Reference genomes were obtained from GenBank (Supplementary Table S2). The amino acid sequences of each segment were aligned with MAFFT34 and manually edited in UGENE. A maximum likelihood tree was then built with 1000 bootstrap replicates using IQ-TREE35 and rooted, taking into account the substitution rate measured in time, with TreeTime36. Identity patterns with reference genomes were analyzed with SimPlot++37. Evolutionary tracking in time was executed in Bayesian Evolutionary Analysis Sampling Trees (BEAST)38. The substitution rate was estimated using TreeTime from nucleotide sequences aligned in MAFFT. The evolutionary model and options for the MCMC analysis were built using the BEAUti tool, where tests were carried out with different priors.

A strict molecular clock with uniform distribution was established. The analysis was executed with a GTR substitution model with Γ4 distribution, partitioning at the codon positions (3 partitions: positions 1, 2, 3), and tree construction under a coalescent prior with constant population size. The results were visualized and analyzed in Tracer39, where an Effective Sample Size (ESS) > 200 was required in all statistical analyses, together with convergence of the Markov chains. Then, the tree information was summarized with TreeAnnotator38 and visualized in FigTree (http://tree.bio.ed.ac.uk/software/figtree/). Finally, the reassortment networks were inferred using a coalescent model with CoalRe in BEAST2 (ref. 40). The evolutionary model was built using the BEAUti tool, where the priors were established as described above. The tree was constructed under a Coalescent with Reassortment Constant Population model. The summary of the distribution of networks maximizing the clade credibility was made with networktree40. A burn-in of 10% was applied in all cases.
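The two MCMC diagnostics mentioned here, the 10% burn-in and the ESS > 200 criterion, can be sketched as follows. This is a simple autocorrelation-based estimate that stops summing at the first non-positive autocorrelation; it is an assumption-laden illustration, and Tracer's own estimator may give different values. The function names are hypothetical.

```python
from typing import List, Sequence


def discard_burn_in(samples: Sequence[float], fraction: float = 0.10) -> List[float]:
    """Drop the first `fraction` of an MCMC trace."""
    start = int(len(samples) * fraction)
    return list(samples[start:])


def effective_sample_size(samples: Sequence[float]) -> float:
    """Estimate ESS as n / (1 + 2 * sum of positive-lag autocorrelations).

    The sum is truncated at the first lag whose autocorrelation is <= 0.
    """
    n = len(samples)
    mean = sum(samples) / n
    centered = [x - mean for x in samples]
    variance = sum(c * c for c in centered) / n
    if variance == 0:
        return float(n)  # a constant trace carries no autocorrelation signal

    rho_sum = 0.0
    for lag in range(1, n):
        autocov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = autocov / variance
        if rho <= 0:
            break
        rho_sum += rho
    return n / (1 + 2 * rho_sum)


if __name__ == "__main__":
    # Made-up trace: a steady drift, which is strongly autocorrelated.
    trace = [float(i) for i in range(1000)]
    kept = discard_burn_in(trace)
    ess = effective_sample_size(kept)
    print(f"{len(kept)} samples kept, ESS = {ess:.1f}, passes ESS > 200: {ess > 200}")
```

A strongly autocorrelated trace yields an ESS far below the number of retained samples, which is why the ESS threshold, and not the raw chain length, is used to judge whether a parameter has been sampled adequately.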

Analysis of the antigenic structure and molecular docking of the N protein

The 3D model of the N protein of A/bat/Colombia/23 was generated by homology with the SWISS-MODEL server41. The template was selected according to its identity, coverage, Global Model Quality Estimation (GMQE), and Quaternary Structure Quality Estimation (QSQE). The model was visualized using ChimeraX42. Subsequently, the mutations relative to the N11 subtype of Peru and Bolivia were identified and mapped onto the 3D structure of the N protein of the virus detected in N. albiventris. The antigenic architecture of the N protein was then compared to the N11 crystallographic structure from Peru (4K3Y). For docking analysis, charge addition of the N protein of A/bat/Colombia/23 and bat HLA-DR (8JRJ) was performed in ChimeraX. Then, an information-driven flexible docking approach was used for modeling biomolecular complexes using HADDOCK43. The models were evaluated mainly by van der Waals energy, buried surface area, and Z score. Finally, the contact points were visualized and predicted using ChimeraX.
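The mutation-identification step can be illustrated with a small sketch that lists substitutions between two aligned protein sequences in the notation used in this paper (for example, K363R). It is not the authors' procedure, which is not detailed in the text. It assumes a pairwise alignment with `-` as the gap character and numbers positions along the reference sequence; the function name and the toy sequences are hypothetical.

```python
from typing import List


def list_substitutions(reference_aligned: str, query_aligned: str) -> List[str]:
    """List substitutions such as 'K363R', numbered by reference position.

    Alignment columns with a gap in either sequence are not reported.
    """
    if len(reference_aligned) != len(query_aligned):
        raise ValueError("sequences must come from the same alignment")

    substitutions = []
    position = 0  # 1-based position along the ungapped reference
    for ref_residue, query_residue in zip(reference_aligned, query_aligned):
        if ref_residue != "-":
            position += 1
        if ref_residue == "-" or query_residue == "-":
            continue
        if ref_residue != query_residue:
            substitutions.append(f"{ref_residue}{position}{query_residue}")
    return substitutions


if __name__ == "__main__":
    # Toy alignment (made up); the query has one insertion relative to the reference.
    reference = "MK-TG"
    query = "MRATS"
    print(list_substitutions(reference, query))
```

Numbering along the reference keeps the labels comparable with a published structure, so a substitution found this way can be located directly on the template coordinates.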