Introduction

Currently, the global number of individuals infected with chronic hepatitis B virus (HBV) is estimated to exceed 257 million, with approximately 80–100 million cases in China alone. Approximately 40% of these cases are expected to progress to cirrhosis or liver cancer1,2,3. HBV continues to pose a significant challenge to global public health. Hepadnaviruses, classified within the Hepadnaviridae family, are categorized into five genera: orthohepadnaviruses infecting mammals, avihepadnaviruses infecting birds, metahepadnaviruses and parahepadnaviruses infecting fishes, and herpetohepadnaviruses infecting reptiles and amphibians. In 2017, Lauber et al. identified a group of non-enveloped fish viruses, distantly related to hepadnaviruses, which may represent a viral family called the nackednaviruses, or Nudnaviridae4,5,6. The evolutionary lineage of hepadnaviruses is believed to date back as far as 400 million years4.

Next-generation sequencing analyses have been instrumental in uncovering viral sequences within hosts, particularly for novel viruses without reference genomes. With recent advancements in NGS technologies, successive discoveries of new hepadnaviruses and nackednaviruses have been reported in a wide range of animal species, including fish7,8, frogs9,10, bats11,12, cats13, dogs14, duikers15, raccoons16, equids17, and shrews18,19,20. These findings have significantly contributed to our understanding of the origin and evolution of hepadnaviruses21,22 alongside studies on ancient HBV23 and endogenous hepatitis B viruses (eHBVs) found in avian24,25 or saurian26 genomes.

The absence of practical and widely accessible animal models for hepadnaviral infection poses a significant constraint on HBV research27,28. Human HBV exhibits strict host specificity, infecting only humans and chimpanzees, and cannot infect commonly used animal model species such as mice, rats, rabbits, or monkeys. Following the identification of the human sodium taurocholate co-transporting polypeptide (hNTCP) as an essential HBV receptor29,30 it was observed that murine hepatocytes expressing hNTCP were still resistant to HBV infection, suggesting the involvement of additional host restriction factors limiting infection in mice. Currently, HBV research in mice primarily relies on HBV transgenic mice, AAV/AdV vector-mediated HBV genome delivery, and humanized liver chimeric mouse models31,32,33 thereby limiting progress in anti-HBV drug development. The discovery of novel hepadnaviruses in previously unexplored species holds promise for establishing practical animal models for hepadnaviral infection.

With the continued advancement of NGS technologies, the total volume of nucleic acid sequences in public databases has exceeded 20 PB and is growing exponentially. In 2022, Edgar et al. introduced the Serratus cloud computing platform34 leveraging existing commercial cloud computing resources and storage space for the analysis and reuse of NGS data. This effort resulted in the discovery of thousands of vertebrate viruses, including hepadnaviruses.

In this study, we utilized the Serratus and NCBI SRA databases to re-examine and reanalyze public NGS datasets to identify hepadnaviruses and nackednaviruses. We recovered multiple complete genomes of these viruses from previously unreported host species, including putative hepadnaviruses from hamsters and buffaloes, as well as diverse hepadnaviruses and nackednaviruses from African cichlid fishes. In addition, replication-competent plasmids derived from the identified rice rat and frog hepadnaviruses were constructed, and their viral replication capacity was evaluated in vitro.

Materials and methods

Search for viral genomes

To identify novel hepadnaviruses and nackednaviruses, publicly available data from the Sequence Read Archive (SRA) database hosted by the National Center for Biotechnology Information (NCBI) was screened using BLASTN or TBLASTN35. Nucleotide or HBp protein sequences from a comprehensive set of representative known and novel hepadnaviruses and nackednaviruses were used as query bait. A BLAST hit with an E-value of ≤ 10⁻²⁰ (BLASTN) or ≤ 10⁻⁴ (TBLASTN) was generally considered significant, and all potential hits were manually verified by inspecting the BLAST output. In selecting SRAs for BLAST analysis, we prioritized mammalian datasets due to our interest in potential animal models for hepadnavirus infection; SRA datasets from other potential host species were also sampled based on the NCBI taxonomy for targeted analysis. Additionally, using the Serratus platform (https://serratus.io/explorer/nt)34 we explored the Serratus database for the Hepadnaviridae family and identified numerous SRAs with hepadnavirus-like sequences.
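The E-value screen described above can be illustrated with a short Python sketch. It assumes that hits were exported in the standard 12-column BLAST tabular format (outfmt 6), in which the 11th column holds the E-value; the function name and the example rows are hypothetical and are not taken from the pipeline used in this study.

```python
import csv
import io

# Cutoffs as stated in the text; the dictionary keys are illustrative labels.
EVALUE_CUTOFFS = {"blastn": 1e-20, "tblastn": 1e-4}


def significant_hits(tabular_text, program):
    """Return rows of BLAST tabular output (outfmt 6) that pass the E-value cutoff."""
    cutoff = EVALUE_CUTOFFS[program]
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        evalue = float(row[10])  # column 11 of outfmt 6 is the E-value
        if evalue <= cutoff:
            hits.append(row)
    return hits


# Hypothetical rows: query, subject, %identity, length, mismatches, gap opens,
# qstart, qend, sstart, send, evalue, bitscore.
example = (
    "HBp_query\tread_1\t45.2\t120\t60\t2\t300\t419\t1\t360\t3e-12\t68.9\n"
    "HBp_query\tread_2\t30.1\t50\t33\t1\t10\t59\t5\t154\t0.5\t25.0\n"
)
kept = significant_hits(example, "tblastn")
```

Such a filter only pre-selects candidates; as stated above, all potential hits were still verified manually.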

Assembly of viral genomes

SRA files containing hepadnavirus or nackednavirus sequences were retrieved from NCBI or the European Nucleotide Archive. Fastp was used for quality control, and Cutadapt was used to trim adapter sequences and low-quality bases36. MEGAHIT37 was utilized for de novo genome assembly, and MMseqs238 was used to search nucleotide sequence sets. Two methods were used to assemble viral genomes: (1) De novo assembly using MEGAHIT. Sequencing reads were assembled into longer fragments using MEGAHIT, and then analyzed with MMseqs2 using all known and novel representative hepadnavirus and nackednavirus genomes as bait to obtain newly assembled hepadnavirus-related fragments. (2) Reference-guided method using MMseqs2 and Geneious Prime software (version 2025.0.3)39. MMseqs2 was used to search NGS data against known and novel hepadnavirus/nackednavirus genomes via BLAST, and the resulting viral reads were assembled to the closest genome template using Geneious software. For samples with limited viral reads but high biological relevance, we pooled and enriched reads from related samples prior to reassembly. All novel hepadnaviruses and nackednaviruses identified in the study were ultimately confirmed using method 2, with coverage depth and completeness evaluated.

Genome circularization and viral integration were assessed as follows. To complete circular genomes, both ends of the linear assemblies were manually joined. The integration of hepadnavirus sequences within the host genome was assessed based on the following two criteria: (1) Related sequence fragments could not be assembled into a circular hepadnavirus genome. (2) The assembled long linear fragments contained flanking non-hepadnaviral host sequences.
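As an illustration of the circularization step, the following Python sketch trims a redundant terminal overlap from a linear contig. In this study the ends were joined manually; the automated approach shown here is an assumption made for illustration, and it presumes that the linear assembly repeats the same bases at both ends. The function name and the minimum overlap length are hypothetical.

```python
def circularize(contig, min_overlap=10):
    """Trim a redundant terminal overlap so the contig represents one circular unit.

    Returns (sequence, overlap_length). overlap_length is 0 when no identical
    prefix/suffix of at least min_overlap bases is found.
    """
    max_len = len(contig) // 2
    # Try the longest candidate overlap first.
    for k in range(max_len, min_overlap - 1, -1):
        if contig[:k] == contig[-k:]:
            return contig[:-k], k
    return contig, 0


# Hypothetical 24-nt "genome" assembled with a 12-nt terminal repeat.
unit = "ACGTTGCAAGGCTTAACCGGTATC"
linear_contig = unit + unit[:12]
circular_unit, overlap = circularize(linear_contig)
```

A contig without such an overlap, or one flanked by non-viral sequence, would be returned unchanged and would need to be examined against the two integration criteria listed above.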

Annotation of viral genomes

In this study, the initiation sites for hepadnavirus or nackednavirus genomes were defined as the starting position of the HBc gene, diverging from the EcoRI restriction enzyme cleavage site conventionally used for the HBV genome40. Open reading frames (ORFs) for each newly identified viral genome were predicted using Geneious software, and gene annotations were based on related reference viral genomes.
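Re-indexing a circular genome to the HBc-origin amounts to a rotation of the sequence. The Python sketch below shows the idea under the assumption that the start position of the HBc gene is already known from annotation; the function names and the toy sequences are hypothetical.

```python
def rotate_to_origin(genome, origin_index):
    """Rotate a circular genome so that origin_index (0-based) becomes position 0."""
    origin_index %= len(genome)
    return genome[origin_index:] + genome[:origin_index]


def find_on_circle(genome, motif):
    """Return the 0-based start of motif on a circular genome, or -1 if absent.

    The sequence is extended by len(motif) - 1 bases so that a motif spanning
    the junction of the linear representation is still found.
    """
    doubled = genome + genome[: len(motif) - 1]
    return doubled.find(motif)


# Toy example: the motif "CAT" spans the end and start of the linear string.
toy = "TGAAACCA"
start = find_on_circle(toy, "CAT")
rotated = rotate_to_origin(toy, start)
```

Because the rotation preserves the circular order of bases, downstream ORF prediction is unaffected apart from the coordinate shift.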

Sequence alignments

Multiple sequence alignments were performed in Geneious software using the MAFFT41 plugin with default parameters, including automatic algorithm selection and a 200PAM/k = 2 scoring matrix. The gap-open penalty and offset values were set to 1.53 and 0.123, respectively.

Phylogenetic analysis

Phylogenetic trees were constructed using the Geneious Tree Builder module. The Tamura-Nei model was employed for genetic distance estimation, and the neighbor-joining method was utilized for tree construction to analyze the relationships among sequences, with the insect hepadnavirus-like retroelement (HEART)42 or the newly discovered Pliego virus5 serving as the outgroup. Bootstrap support values were estimated using 1,000 replicates.

Phylogenetic analysis of avian eHBV HBp protein sequences spanning the reverse transcriptase (RT) domain region corresponding to amino acids 336–679 of the human HBV reference (GenBank: U95551) was conducted using MEGA12 software43. Amino acid sequences were initially aligned with MAFFT using default settings. A phylogenetic tree was constructed using the Maximum Likelihood (ML) method with the Jones-Taylor-Thornton (JTT) amino acid substitution model. The reliability of the phylogenetic tree topology was assessed by bootstrap analysis with 1,000 replicates.

Metagenomic classification analysis

Metagenomic sequence classification was conducted using the Centrifuge taxonomic classifier44 which indexed the entire NCBI non-redundant sequence database. Visualization of classification results was performed using Pavian45 a tool designed for analyzing and visualizing metagenomics data.

In vitro virological assays

Virus particles from cell culture supernatants were harvested and ultracentrifuged according to previously described methods46. Standard laboratory biosafety protocols (BSL-2) were followed for all in vitro work involving plasmid construction and replication assays described in this manuscript. Based on complete genome sequences of frog (Mouse-mus490-Frog, GenBase: C_AA050321) and rice rat (RiceRat-olig706, GenBase: C_AA050303) hepadnaviruses, 1.1-fold genome length replicating plasmids were constructed to investigate hepadnavirus replication. These plasmids were transfected into Huh7 cells using PEI reagent, and cell supernatants were collected at 3 and 6 days post-transfection. After concentration by sucrose cushion ultracentrifugation, potential virions were further separated using CsCl density gradient ultracentrifugation47.

Real-time PCR with specific primers, including HBV-Forward (5′-GTTATCGCTGGATGTGTCTGC-3′) and HBV-Reverse (5′-GACAAACGGGCAACATACCTT-3′), RiceRat-Forward (5′-CTACCACAAGCACAACAC-3′) and RiceRat-Reverse (5′-AGGAGCCACATAACTGAG-3′), as well as Frog-Forward (5′-TCAATCAACACCGAAGTCTT-3′) and Frog-Reverse (5′-GTTCAATACATATCTCAGGTCAGT-3′), was performed to quantify hepadnaviral DNA across the collected gradient fractions48.

Transmission electron microscopy with negative staining was performed by Wuhan Servicebio Technology, China. Briefly, a 20 µL virus suspension was placed on a copper grid with a carbon film for 3–5 min, after which excess liquid was absorbed using filter paper. For staining, 2% phosphotungstic acid was applied to the copper grid for 1–2 min, followed by the removal of excess liquid with filter paper. The grids were then allowed to dry at room temperature. Grids were subsequently imaged using a transmission electron microscope (HT7800/HT7700, Hitachi), and representative micrographs were recorded.

Results

Multiple novel hepadnavirus and nackednavirus genomes were obtained

This study identified numerous novel hepadnaviruses and nackednaviruses with complete genomes (Supplementary Table S1). The five genera within the Hepadnaviridae family and the unclassified nackednaviruses were analyzed in phylogenetic trees (Fig. 1A, B). To enhance the analysis and visualization of their genomic characteristics, these genomes were presented relative to the origin of the HBc gene (HBc-origin), distinct from the traditional EcoRI cleavage site (EcoRI-origin) (Fig. 1C)40. Utilizing HBc-origin offers several advantages: (1) Some virus genomes lack a corresponding traditional EcoRI cleavage site. (2) This novel presentation ensures that neither the preS gene nor the HBp gene is truncated at the start position or spans both sides of it (Fig. 1D). (3) Given the conservation of the HBc gene, the HBc-origin site can be reliably identified under most circumstances. While the HBp-origin display presents an alternative, it may lead to HBc truncation. Overall, the chosen HBc-origin method for new hepadnaviruses and nackednaviruses provides a clear depiction of the HBc-HBp gene structure and aligns with the pgRNA gene structure of hepadnaviruses and nackednaviruses.

Fig. 1

Classification and novel display method for hepadnaviruses and nackednaviruses. (A) Phylogenetic trees depicting reference hepadnaviruses and nackednaviruses, with representative viruses and their hosts highlighted in red. Nacked = nackednavirus, Herpeto = herpetohepadnavirus, Avi = avihepadnavirus, Para = parahepadnavirus, Meta = metahepadnavirus, Ortho = orthohepadnavirus. Scale bar indicates substitutions per site. (B) An unrooted tree layout illustrating novel hepadnaviruses and nackednaviruses identified in this study. (C) A schematic comparison of the conventional EcoRI-origin numbering system for the HBV genome, which starts at the EcoRI restriction site (G/AATTC, where the first T is nucleotide 1, green pentagram), and the HBc-origin (red pentagram). (D) A representation of the viral genome based on the HBc-origin for human HBV (GenBank: U95551), an orthohepadnavirus; frog hepadnavirus (GenBank: KX058435), a herpetohepadnavirus; and rockfish nackednavirus (GenBank: MH158726), a nackednavirus.

Identification of exogenous avian hepadnaviruses and eHBVs

With regard to avian hepadnaviruses, this study identified white-rumped munia (Lonchura striata), American songbird (Piranga ludoviciana), pigeon (Columba livia), and maned duck (Chenonetta jubata) hepadnaviruses (Fig. 2A, B). Interestingly, the hepadnavirus porgy-acan628 (GenBase: C_AA050335) found in porgy fish (Acanthopagrus latus) belongs to the Avihepadnavirus genus, though the reason remains unknown.

However, the majority of avian hepadnaviruses identified in this study were found to have integrated into the host genome as eHBVs (Fig. 2C). These eHBVs exhibit the following characteristics: (1) Most HBs genes appeared to be integrated and transcriptionally active. (2) The 5’ terminus within the HBp and preS genes often contained stop codons. (3) The 3’ terminus of most HBp genes remained intact, while the 5’ terminus was missing. (4) Very few HBc genes remained intact, as observed in the case of QuailThrush-cinc826-i. Phylogenetic analysis of our identified avian eHBV sequences (using the HBp RT domain) with the dataset from Lytras et al.26 revealed that most avian eHBVs formed distinct phylogenetic clades separate from exogenous avian hepadnaviruses (Fig. 2D).

Fig. 2

Novel avian hepadnaviruses. (A) Newly discovered avian hepadnaviruses. The scale bar represents substitutions per site. (B) Genomic profiles of representative novel avian hepadnaviruses. (C) Avian eHBVs. The schematic diagram illustrates the avian eHBVs, such as Blackbird-agel788-i, with flanking host genome sequences omitted. The ‘i’ indicates integration, and ‘*’ denotes a stop codon. Most eHBVs exhibit intact HBs genes and partially complete HBp genes. (D) Evolutionary relationships inferred from the HBp RT domain sequences of novel avian eHBVs. The tree was generated with MEGA12 using the Maximum Likelihood method with the Jones-Taylor-Thornton (JTT) model. Bootstrap support values are based on 1,000 replicates. Scale bar represents substitutions per site.

African cichlids harbored a diverse range of hepadnaviruses and nackednaviruses

Diverse novel hepadnaviruses and nackednaviruses were found in fish, encompassing various species such as the stickleback (Gasterosteus aculeatus) and elephantfish (Brienomyrus brachyistius) nackednaviruses, porgy (Pagrus auratus) and grinner (Aulopiformes) parahepadnaviruses, and common carp (Cyprinus carpio) and icefish (Chionodraco myersi) metahepadnaviruses (Fig. 3A).

Particularly surprising is the abundance of novel hepadnaviruses and nackednaviruses discovered in African cichlids, with sequences exhibiting extensive diversity, including both nackednaviruses and metahepadnaviruses (Fig. 3A, B). In the PRJNA552202 project, researchers sequenced 2,242 samples from African cichlids encompassing five tissues across 76 cichlid species from Lake Tanganyika49, and these data revealed a significant presence of hepadnaviruses and nackednaviruses in the specimens50.

Additionally, in the PRJNA845781 project, RNA sequencing conducted on 140 elephantfish samples demonstrated the widespread presence of hepadnaviruses in 12 out of 13 elephantfish specimens (Fig. 3C). Specifically, sequencing of 13 tissues or organs from elephantfish BB366 (Fig. 3D) showed that the skin and electric organs, rather than the liver, harbored the highest abundance of hepadnavirus, indicating a lack of liver tropism in elephantfish hepadnaviruses. These elephantfish hepadnaviruses, lacking envelope proteins, exhibit characteristics typical of nackednaviruses; furthermore, their distribution in tissues such as the skin and electric organs, rather than the liver or blood, suggests a transmission route distinct from traditional bloodborne transmission. Moreover, the detection of multiple distinct nackednavirus sequences in individual host species, such as elephantfish with intra-species nucleotide identities ranging from 82.0% to 98.4% and stickleback with identities ranging from 95.5% to 98.7%, points towards considerable viral diversity or the presence of different genotypes within these populations.

Furthermore, potential linear hepadnaviruses were detected in dragonfish (Akarotaxis nudiceps) and elephantfish, designated as dragonfish-akar681-l and elephantfish-brie291-l. These viruses exhibit entirely different non-viral flanking nucleic acid sequences compared to traditional circular hepadnavirus genomes. While it is possible that these differences in the flanking nucleic acid sequences result from integration into the host genome, further investigation is required to confirm this possibility.

Fig. 3

Novel fish hepadnaviruses and nackednaviruses. (A) Newly discovered fish hepadnaviruses and nackednaviruses. Scale bar represents substitutions per site. (B) Genomic profiles of representative novel fish hepadnaviruses or nackednaviruses. (C) A depiction of an elephantfish (Brienomyrus brachyistius) generated by Baidu Yige-AI (https://yige.baidu.com). (D) Distribution of nackednavirus fragments in elephantfish tissues. The brain tissue exhibited the highest number of fragments, with up to 104 hits.

Herpetohepadnaviruses were found in samples from mammals

Regarding herpetohepadnaviruses, this study identified several lizard and toad hepadnaviruses (Fig. 4). Herpetohepadnaviruses have been previously documented in amphibians and reptiles, including frogs9,10 and lizards4. Notably, phylogenetic analysis revealed that the hepadnavirus lizard-sapr381, found in the Australian lizard (Saproscincus basiliscus), does not cluster within either the herpetohepadnavirus or avihepadnavirus genera. Instead, it forms a broader clade encompassing both, suggesting a close evolutionary relationship between the two. Frog hepadnaviruses were unexpectedly detected in mice (Mus musculus) and dogs (Canis lupus familiaris), with sequences closely resembling the known frog hepadnavirus (GenBank: KX058435) (Fig. 4A, B). Further investigation revealed frog hepadnavirus fragments in other animals, such as snub-nosed monkeys (Rhinopithecus bieti), giant pandas (Ailuropoda melanoleuca), and pigs (Sus scrofa) (Fig. 4C). In the PRJNA248058 project, NGS analyses of 12 tissue samples from a cancer-free 32-year-old black snub-nosed monkey (Fig. 4D) indicated that frog hepadnavirus fragments were predominantly present in the kidneys and small intestine, and scarcely detected in the liver or blood (Fig. 4E). The presence of frog hepadnavirus in kidney or digestive tract samples from the black snub-nosed monkey may suggest ingestion of an infected host; however, contamination or true infection cannot be ruled out.

Fig. 4

Novel herpetohepadnaviruses. (A) Newly discovered herpetohepadnaviruses. Representative hepadnaviruses and their hosts are highlighted in red, while newly identified hepadnaviruses are shown in blue. The nomenclature for newly discovered hepadnaviruses follows the convention of using the first four letters of the species name followed by the last three digits of the SRA accession number. Scale bar represents substitutions per site. (B) Genomic profiles of representative novel herpetohepadnaviruses. (C) Comparative analysis of the whole genomes of frog hepadnaviruses found in mammal samples. Frog hepadnaviruses were identified in mice, dogs, snub-nosed monkeys, pigs, giant pandas, and sea cucumbers, with their sequences exhibiting significant similarity. White indicates identical sequences, black signifies differences, and light green denotes missing data. (D) A depiction of a black snub-nosed monkey (Rhinopithecus bieti) generated by Baidu Yige-AI. (E) Tissue distribution of hepadnavirus in a snub-nosed monkey. In a snub-nosed monkey from Yunnan Province, China, fragments of the virus were predominantly found in the kidneys and small intestine, with smaller numbers detected in the liver and oral cavity, and none in other tissues such as blood.
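The naming convention described in panel (A) can be expressed as a small Python helper. This is a sketch only: the function name is hypothetical, the host common-name prefix is inferred from labels used in this article (for example, Porgy-pagr765), and the run accession SRR7527765 is assumed from the final digits of that label.

```python
def virus_label(host_common_name, host_scientific_name, sra_accession):
    """Build a label such as 'Porgy-pagr765'.

    The label combines the host common name, the first four letters of the
    host scientific name (lower case), and the last three digits of the SRA
    run accession.
    """
    prefix = host_scientific_name.strip().lower()[:4]
    return f"{host_common_name}-{prefix}{sra_accession.strip()[-3:]}"


# Example reconstructed from a label used in this article (accession assumed).
label = virus_label("Porgy", "Pagrus auratus", "SRR7527765")
```

Suffixes such as "-i" (integrated) or "-l" (linear), which appear on some labels in this article, are not handled by the helper and would be appended separately.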

Discovery of novel orthohepadnaviruses in buffaloes and hamsters and replication assessment of the rice rat hepadnavirus

This study identified two novel orthohepadnaviruses in African buffaloes (Syncerus caffer) and long-tailed pygmy rice rats (Oligoryzomys longicaudatus) (Fig. 5A, C). The buffalo hepadnavirus (Buffalo-buff360, GenBase: C_AA050234) showed high similarity to the previously reported duiker hepadnavirus15. On the other hand, the rice rat hepadnavirus (RiceRat-olig706, GenBase: C_AA050303) shared similarities with shrew hepadnaviruses18,19,20 evidenced by comparable genomic structures (Fig. 5B). It is noteworthy that the rice rat hepadnavirus was HBeAg-negative, similar to shrew hepadnaviruses, due to a stop codon mutation in the preC open reading frame at the same site (Fig. 5F)20.

Whole genomic sequences of the rice rat hepadnavirus RiceRat-olig706 and frog hepadnavirus Mouse-mus490-Frog (identified in mouse samples) were synthesized, and 1.1-fold over-length genomes were constructed with an upstream CMV-promoter, akin to the well-known HBV replicating construct46. These constructs were transfected into the human hepatoma cell line Huh7, with HBV 1.1-fold construct used as a control. In the cell supernatant of Huh7 cells, a peak similar to the Dane particle peak was detected at a density of approximately 1.18–1.22 g/mL (Fig. 5D, E). Remarkably, approximately 3 × 10⁶ copies/mL of rice rat hepadnavirus were detected in the CsCl gradient, which is five times higher than that of HBV, suggesting active replication and secretion of this virus. However, the frog hepadnavirus did not exhibit a corresponding naked particle peak observed for human HBV and rice rat hepadnavirus at a density of 1.28–1.32 g/mL when expressed in Huh7 cells, which might reflect an intrinsic property of the virus or, potentially, an artifact resulting from the use of a non-native host cell line. Negative-stain electron microscopy revealed that rice rat hepadnavirus and frog hepadnavirus particles were slightly larger than human HBV, spherical in shape, and approximately 40–60 nm in diameter.

Fig. 5

Novel orthohepadnaviruses. (A) Newly discovered orthohepadnaviruses. Representative hepadnaviruses and their hosts are highlighted in red, while newly identified hepadnaviruses are shown in blue. Scale bar represents substitutions per site. (B) Genomic profiles of representative novel orthohepadnaviruses. (C) A depiction of a long-tailed pygmy rice rat (Oligoryzomys longicaudatus) generated by Baidu Yige-AI. (D) Identification of hepadnaviruses through ultracentrifugation. Two distinct hepadnavirus DNA peaks were observed at 1.18–1.22 and 1.28–1.32 g/mL for HBV and rice rat hepadnavirus, respectively, whereas only the peak at approximately 1.18–1.22 g/mL was detected for the frog hepadnavirus. (E) Identification of hepadnaviruses via negative stain electron microscopy. (F) The rice rat hepadnavirus was HBeAg-negative, similar to shrew hepadnaviruses, as a result of a stop codon mutation in the preC open reading frame.

Discussion

Discovery of multiple novel hepadnaviruses and nackednaviruses

In this study, bioinformatic analyses of NGS data led to the identification of multiple novel hepadnaviruses and nackednaviruses, most of which possess complete genomes. These efforts included not only the de novo assembly of entirely new viral genomes but also, in some instances, the reanalysis and completion of genomes for viruses previously reported with only partial sequences. For example, a complete genome for the porgy hepadnavirus (Porgy-pagr765) was assembled from SRA dataset SRR752776551, building upon a prior partial polymerase sequence (GenBank: MH716821). Similarly, complete genome characterization or re-assembly was achieved for the toad hepadnavirus Toad-bufo524 (GenBank: MZ244216; SRA SRR14214524) and the carp hepadnavirus Carp-schi040 (GenBank: GHYL01057506.1; SRA SRR8618040)52. With the exception of a few instances (dragonfish-akar681-l and elephantfish-brie291-l), the newly discovered hepadnaviruses in this study could be assembled into complete circular genomes.

Novel orthohepadnaviruses were identified in rice rats and buffaloes. Additionally, novel metahepadnaviruses were found in cichlids, herrings, common carp, eels, whales, icefish, Caranginae, and Japanese halfbeaks, while new parahepadnaviruses were identified in porgy and Aulopiformes. Furthermore, novel nackednaviruses were detected in dragonfish, sturgeon, and sticklebacks. Among herpetohepadnaviruses, lizard and toad hepadnaviruses were newly discovered. Novel avian hepadnaviruses were also identified in white-rumped munias, American songbirds, and pigeons. Notably, among all these, the rice rat hepadnavirus, the African cichlid hepadnavirus, and the African cichlid nackednavirus stand out as particularly significant.

Multiple avian eHBVs

The hepadnaviruses identified in munia and American songbird samples were found to belong to the same evolutionary lineage. White-rumped munias are indigenous to South Asia and southern China, whereas American songbirds inhabit Central America and North America. This suggests a broad historical distribution of the ancestral lineage of these munia and American songbird hepadnaviruses across distinct geographic regions. Phylogenetic analysis further revealed that the pigeon hepadnavirus is closely related to duck hepatitis B virus (DHBV), with both clustering within the DHBV lineage.

However, a more significant finding is the integration of numerous avian hepadnaviruses into host genomes, consistent with previous literature findings53,54,55. Viral fragment assemblies from metagenomic reads could represent either exogenous viruses or eHBVs, and we differentiate between them based on the following criteria: (1) Genome structure and coding integrity: Exogenous virus reads can be assembled into a complete circular genome using MEGAHIT without a reference template, and the major viral gene ORFs lack internal stop codons. In contrast, eHBVs often exhibit multiple stop codons within their reading frames. For example, in QuailThrush-cinc826-i, while the genome can be almost fully assembled with substantial read support, stop codons appear in the HBs and HBp ORFs. (2) Chimeric host-virus reads: The presence of viral-host chimeric reads is a strong indicator of eHBVs. While most of the novel viruses identified in this study did not present such chimeric evidence, Dragonfish-akar681-l was an exception, exhibiting hybrid “unknown” sequences at its genomic ends. Despite this, its genome was relatively complete and the main viral ORFs lacked stop codons, suggesting a potentially recent integration or a distinct type of endogenous element. (3) Distribution patterns and host specificity: eHBVs are more likely to be found in avian or amphibian species and are often detected consistently across most samples of a given species in the same fragmented form due to their genomic integration, such as Sparrow-zono391-i and Tit-aegi626-i. In contrast, exogenous viruses may be detected in some samples of the same species, while others show no detection at all. Our phylogenetic comparison with the comprehensive avian eHBV dataset from Lytras et al.26, while broadly consistent, showed some topological discrepancies, potentially reflecting differences in the analyzed regions or datasets. Nonetheless, the identification of additional avian eHBVs contributes to a deeper understanding of their ancient diversity and evolution.
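The coding-integrity criterion, criterion (1) above, can be sketched in Python as a scan for in-frame stop codons that occur before the final codon of a predicted ORF. The sketch assumes the standard genetic code and an ORF sequence that is already in frame; the function name and the toy sequences are hypothetical and do not reproduce the annotation workflow used in this study.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def internal_stop_positions(orf_nt):
    """Return 0-based codon indices of in-frame stop codons before the final codon."""
    seq = orf_nt.upper()
    n_codons = len(seq) // 3
    # The final codon is excluded because an intact ORF ends with a stop codon.
    return [
        i
        for i in range(n_codons - 1)
        if seq[3 * i : 3 * i + 3] in STOP_CODONS
    ]


# Toy ORFs: the first is intact, the second carries two premature stops.
intact = internal_stop_positions("ATGGCCAAATAA")
disrupted = internal_stop_positions("ATGTAGGCCTGAAAATAA")
```

An empty result is compatible with an exogenous virus but does not by itself exclude integration, as the case of Dragonfish-akar681-l above shows; the other two criteria remain necessary.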

Diverse fish hepadnaviruses and nackednaviruses

Fish hepadnaviruses and nackednaviruses are regarded as among the most primitive, diverse, and widely distributed members of these viral families4. Various fish species harbor distinct hepadnaviruses and nackednaviruses exhibiting significant sequence variations, exemplified by those found in African cichlids, elephantfish, eels, and killifish, highlighting their deep evolutionary origins50.

Among these, African cichlids yielded particularly notable findings, with diverse genera found to be widely infected by both hepadnaviruses and nackednaviruses. Examples include Cichlid-cypr446, Cichlid-lamp675, Cichlid-ACNDV, and Cichlid-ANDV, which appear to represent some of the most primitive non-enveloped nackednaviruses. In contrast, traditional African cichlid hepadnaviruses, which resemble mammalian hepadnaviruses, are primarily classified within the genus Metahepadnavirus. African cichlids inhabit the three major lakes in East Africa, namely Lake Malawi, Lake Tanganyika, and Lake Victoria, all located near the East African Great Rift Valley. Given their ecological overlap and their importance as a food source, the potential for zoonotic transmission through the food chain warrants further attention. Further exploration is needed to discern the full significance of these viruses in the evolutionary narrative of hepadnaviruses and nackednaviruses.

The pressing need for animal models of orthohepadnavirus infection

As noted above, hepadnaviruses exhibit strict host specificity, which makes suitable animal models of hepadnavirus infection, particularly for human HBV, a persistent challenge in the field27,28. Although highly abundant hepadnaviruses from cichlids or elephantfish offer extensive sequence data, their piscine hosts limit their utility in mammalian systems.

Attention thus turns to novel hepadnaviruses discovered in mammals, such as the orthohepadnavirus identified in the long-tailed pygmy rice rat (Oligoryzomys longicaudatus), a member of the Cricetidae family. Our successful in vitro assembly and secretion of virus-like particles using constructs from this rice rat hepadnavirus (RiceRat-olig706) supports its potential as a candidate for further in vivo model development. While initial thoughts for an in vivo model of the rice rat-derived hepadnavirus might turn to other Cricetidae (e.g., Syrian/Chinese striped hamsters) or common laboratory mice, the cotton rat (Sigmodon hispidus) warrants specific attention. As a fellow Sigmodontinae rodent phylogenetically closer to Oligoryzomys than these alternatives, and an established model for various human viral pathogens like Respiratory Syncytial Virus56 it offers a potentially more permissive system for this novel hepadnavirus. However, there remains a substantial gap between demonstrating in vitro replication and establishing productive in vivo infection models. This challenge is compounded by factors such as host-specific entry restrictions, immune-mediated clearance, and the lack of suitable permissive animal species.

Study limitations and shortcomings

For ease of description and discussion, species names from NGS sample annotations were used to refer to the newly identified hepadnavirus and nackednavirus species in this study. However, such nomenclature does not necessarily imply that the viruses detected in a given species are truly capable of infecting that host species, as they may also originate from contaminated samples, mislabeling, or sequencing artifacts. Since the findings were derived solely from computational analysis of publicly available NGS data and lacked in vivo or functional validation, they should be interpreted with caution. Further experimental validation in the respective host species is needed to confirm infectivity, as demonstrated in previous studies on novel orthohepadnaviruses in shrews20 and equids17. In addition, our current screening strategy has limitations, and future use of more advanced, comprehensive, and sensitive bioinformatic approaches, such as PebbleScout57 may reveal additional hepadnaviruses and nackednaviruses in public databases.

Conclusions

This study identified numerous novel hepadnaviruses through the comprehensive analysis of extensive NGS datasets from public repositories. These findings not only provide fresh insights into the origins and evolutionary trajectories of hepadnaviruses and nackednaviruses, but also lay the groundwork for establishing new animal models of hepadnavirus infection. Such models will be instrumental in advancing our understanding of HBV pathogenesis and in developing potential therapeutic interventions for HBV infection58,59.