Abstract
The rapid expansion of next-generation sequencing (NGS) databases over the past decade has significantly advanced the identification of novel viruses across a wide range of host species. The Serratus platform and the NCBI Sequence Read Archive (SRA) database were utilized to reassess and analyze publicly available NGS datasets, aiming to identify novel hepadnaviruses and nackednaviruses. Our analysis uncovered multiple complete genomes of previously unrecognized hepadnaviruses and nackednaviruses, including those putatively infecting animals such as hamsters and buffaloes. Additionally, we identified the presence and distribution of various hepadnaviruses and nackednaviruses in African cichlid fishes. In vitro assays employing replication-competent plasmids derived from the identified rice rat and frog hepadnaviruses demonstrated their capacity to support viral replication. The identification of these novel hepadnavirus and nackednavirus species provides valuable insights into the origin and evolutionary history of hepadnaviruses. Moreover, these findings open new avenues for investigating potential animal models to study hepadnavirus replication and infection.
Introduction
Currently, the global number of individuals infected with chronic hepatitis B virus (HBV) is estimated to exceed 257 million, with approximately 80–100 million cases in China alone. Approximately 40% of these cases are expected to progress to cirrhosis or liver cancer1,2,3. HBV continues to pose a significant challenge to global public health. Hepadnaviruses, classified within the Hepadnaviridae family, are categorized into five genera: orthohepadnaviruses infecting mammals, avihepadnaviruses infecting birds, metahepadnaviruses and parahepadnaviruses infecting fishes, and herpetohepadnaviruses infecting reptiles and amphibians. In 2017, Lauber et al. identified a group of non-enveloped fish viruses, distantly related to hepadnaviruses, which may represent a viral family called the nackednaviruses, or Nudnaviridae4,5,6. The evolutionary lineage of hepadnaviruses is believed to date back as far as 400 million years4.
Next-generation sequencing analyses have been instrumental in uncovering viral sequences within hosts, particularly for novel viruses without reference genomes. With recent advancements in NGS technologies, successive discoveries of new hepadnaviruses and nackednaviruses have been reported in a wide range of animal species, including fish7,8, frogs9,10, bats11,12, cats13, dogs14, duikers15, raccoons16, equids17, and shrews18,19,20. These findings have significantly contributed to our understanding of the origin and evolution of hepadnaviruses21,22 alongside studies on ancient HBV23 and endogenous hepatitis B viruses (eHBVs) found in avian24,25 or saurian26 genomes.
The absence of practical and widely accessible animal models for hepadnaviral infection poses a significant constraint on HBV research27,28. Human HBV exhibits strict host specificity, infecting only humans and chimpanzees, and cannot infect commonly used animal model species such as mice, rats, rabbits, or monkeys. Following the identification of the human sodium taurocholate co-transporting polypeptide (hNTCP) as an essential HBV receptor29,30 it was observed that murine hepatocytes expressing hNTCP were still resistant to HBV infection, suggesting the involvement of additional host restriction factors limiting infection in mice. Currently, HBV research in mice primarily relies on HBV transgenic mice, AAV/AdV vector-mediated HBV genome delivery, and humanized liver chimeric mouse models31,32,33 thereby limiting progress in anti-HBV drug development. The discovery of novel hepadnaviruses in previously unexplored species holds promise for establishing practical animal models for hepadnaviral infection.
With the continued advancement of NGS technologies, the total volume of nucleic acid sequences in public databases has exceeded 20 PB and is growing exponentially. In 2022, Edgar et al. introduced the Serratus cloud computing platform34 leveraging existing commercial cloud computing resources and storage space for the analysis and reuse of NGS data. This effort resulted in the discovery of thousands of vertebrate viruses, including hepadnaviruses.
In this study, we utilized the Serratus and NCBI SRA databases to re-examine and reanalyze public NGS datasets to identify hepadnaviruses and nackednaviruses. We recovered multiple complete genomes of these viruses from previously unreported host species, including putative hepadnaviruses from hamsters and buffaloes, as well as diverse hepadnaviruses and nackednaviruses from African cichlid fishes. In addition, replication-competent plasmids derived from the identified rice rat and frog hepadnaviruses were constructed, and their viral replication capacity was evaluated in vitro.
Materials and methods
Search for viral genomes
To identify novel hepadnaviruses and nackednaviruses, publicly available data from the Sequence Read Archive (SRA) database hosted by the National Center for Biotechnology Information (NCBI) was screened using BLASTN or TBLASTN35. Nucleotide or HBp protein sequences from a comprehensive set of representative known and novel hepadnaviruses and nackednaviruses were used as query bait. A BLAST hit with an E-value of ≤ 10⁻²⁰ (BLASTN) or ≤ 10⁻⁴ (TBLASTN) was generally considered significant, and all potential hits were manually verified by inspecting the BLAST output. In selecting SRAs for BLAST analysis, we prioritized mammalian datasets due to our interest in potential animal models for hepadnavirus infection; SRA datasets from other potential host species were also sampled based on the NCBI taxonomy for targeted analysis. Additionally, using the Serratus platform (https://serratus.io/explorer/nt)34 we explored the Serratus database for the Hepadnaviridae family and identified numerous SRAs with hepadnavirus-like sequences.
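The E-value screen described above can be illustrated with a minimal sketch. It assumes that hits are available in the default BLAST+ tabular layout (`-outfmt 6`), which is an assumption about the workflow rather than something stated in the text; the function name `significant_hits` and the example rows are hypothetical, and the manual verification step is not reproduced.

```python
import csv
import io

# E-value cutoffs stated in the text, keyed by BLAST program.
EVALUE_CUTOFF = {"blastn": 1e-20, "tblastn": 1e-4}

# Default column order of BLAST+ tabular output (-outfmt 6).
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def significant_hits(tabular_text, program):
    """Return hits whose E-value is at or below the cutoff for `program`."""
    cutoff = EVALUE_CUTOFF[program]
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(COLUMNS, row))
        if float(hit["evalue"]) <= cutoff:
            hits.append(hit)
    return hits


# Made-up rows for illustration only (not data from the study).
example = (
    "query1\tSRR0000001.1\t92.5\t150\t11\t0\t1\t150\t1\t150\t3e-25\t120\n"
    "query1\tSRR0000001.2\t80.0\t60\t12\t0\t1\t60\t1\t60\t2e-08\t45\n"
)
print(len(significant_hits(example, "blastn")))   # only the first row passes
print(len(significant_hits(example, "tblastn")))  # both rows pass
```

The stricter nucleotide cutoff keeps only close matches, while the looser protein-level cutoff retains more divergent candidates for manual inspection.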
Assembly of viral genomes
SRA files containing hepadnavirus or nackednavirus sequences were retrieved from NCBI or the European Nucleotide Archive. Fastp was used for quality control, and Cutadapt was used to trim adapter sequences and low-quality bases36. MEGAHIT37 was utilized for de novo genome assembly, and MMseqs238 was used to search nucleotide sequence sets. Two methods were used to assemble viral genomes: (1) De novo assembly using MEGAHIT. Sequencing reads were assembled into longer fragments using MEGAHIT, and then analyzed with MMseqs2 using all known and novel representative hepadnavirus and nackednavirus genomes as bait to obtain newly assembled hepadnavirus-related fragments. (2) Reference-guided method using MMseqs2 and Geneious Prime software (version 2025.0.3)39. MMseqs2 was used to search NGS data against known and novel hepadnavirus/nackednavirus genomes via BLAST, and the resulting viral reads were assembled to the closest genome template using Geneious software. For samples with limited viral reads but high biological relevance, we pooled and enriched reads from related samples prior to reassembly. All novel hepadnaviruses and nackednaviruses identified in the study were ultimately confirmed using method 2, with coverage depth and completeness evaluated.
Genome circularization and viral integration were assessed as follows. To complete circular genomes, both ends of the linear assemblies were manually joined. The integration of hepadnavirus sequences within the host genome was assessed based on the following two criteria: (1) Related sequence fragments could not be assembled into a circular hepadnavirus genome. (2) The assembled long linear fragments contained flanking non-hepadnaviral host sequences.
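The end-joining step can be sketched as follows. In the study the ends were joined manually; this hypothetical helper only shows one way a terminal repeat could be detected and trimmed, assuming a linear contig of a circular genome carries the same sequence at both ends. The `min_overlap` default of 10 nt is an arbitrary illustrative value.

```python
def terminal_overlap(contig, min_overlap=10):
    """Length of the longest suffix of `contig` that is identical to its prefix."""
    for k in range(len(contig) // 2, min_overlap - 1, -1):
        if contig[-k:] == contig[:k]:
            return k
    return 0


def circularize(contig, min_overlap=10):
    """Trim the duplicated end so each position of the circle appears once.

    Returns None when no terminal overlap is found, which fits criterion (1)
    above (the fragment could not be closed into a circular genome).
    """
    k = terminal_overlap(contig, min_overlap)
    if k == 0:
        return None
    return contig[:-k]


# Toy example: a 24-nt "genome" whose first 12 nt are repeated at the end.
unit = "ACGTTGCAAGGCTTAACCGGTATC"
linear = unit + unit[:12]
print(circularize(linear) == unit)  # True
print(circularize(unit))            # None (no terminal repeat)
```

Exact string matching is used for simplicity; real assemblies may need mismatch-tolerant overlap detection.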
Annotation of viral genomes
In this study, the initiation sites for hepadnavirus or nackednavirus genomes were defined as the starting position of the HBc gene, diverging from the EcoRI restriction enzyme cleavage site conventionally used for the HBV genome40. Open reading frames (ORFs) for each newly identified viral genome were predicted using Geneious software, and gene annotations were based on related reference viral genomes.
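Re-indexing a circular genome to a new origin is a simple rotation. The sketch below assumes the 0-based index of the first HBc nucleotide is already known from annotation; the function names are hypothetical and the toy sequence is not a real genome.

```python
def rotate_to_origin(genome, origin):
    """Rotate a circular genome so that 0-based index `origin` becomes position 0."""
    if not 0 <= origin < len(genome):
        raise ValueError("origin must lie within the genome")
    return genome[origin:] + genome[:origin]


def remap_position(pos, origin, genome_length):
    """Convert a 0-based coordinate from the old numbering to the rotated numbering."""
    return (pos - origin) % genome_length


# Toy example: pretend the HBc gene starts at index 3.
genome = "AAACCCGGGTTT"
print(rotate_to_origin(genome, 3))          # CCCGGGTTTAAA
print(remap_position(0, 3, len(genome)))    # 9
```

Because the genome is circular, the rotation changes only the numbering, not the sequence content.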
Sequence alignments
Multiple sequence alignments were performed in Geneious software using the MAFFT41 plugin with default parameters, including automatic algorithm selection and a 200PAM/k = 2 scoring matrix. The gap-open penalty and offset values were set to 1.53 and 0.123, respectively.
Phylogenetic analysis
Phylogenetic trees were constructed using the Geneious Tree Builder module. The Tamura-Nei model was employed for genetic distance estimation, and the neighbor-joining method was utilized for tree construction to analyze the relationships among sequences, with the insect hepadnavirus-like retroelement (HEART)42 or the newly discovered Pliego virus5 serving as the outgroup. Bootstrap support values were estimated using 1,000 replicates.
Phylogenetic analysis of avian eHBV HBp protein sequences spanning the reverse transcriptase (RT) domain region corresponding to amino acids 336–679 of the human HBV reference (GenBank: U95551) was conducted using MEGA12 software43. Amino acid sequences were initially aligned with MAFFT using default settings. A phylogenetic tree was constructed using the Maximum Likelihood (ML) method with the Jones-Taylor-Thornton (JTT) amino acid substitution model. The reliability of the phylogenetic tree topology was assessed by bootstrap analysis with 1,000 replicates.
Metagenomic classification analysis
Metagenomic sequence classification was conducted using the Centrifuge taxonomic classifier44 which indexed the entire NCBI non-redundant sequence database. Visualization of classification results was performed using Pavian45 a tool designed for analyzing and visualizing metagenomics data.
In vitro virological assays
Virus particles from cell culture supernatants were harvested and ultracentrifuged according to previously described methods46. Standard laboratory biosafety protocols (BSL-2) were followed for all in vitro work involving plasmid construction and replication assays described in this manuscript. Based on complete genome sequences of frog (Mouse-mus490-Frog, GenBase: C_AA050321) and rice rat (RiceRat-olig706, GenBase: C_AA050303) hepadnaviruses, 1.1-fold genome length replicating plasmids were constructed to investigate hepadnavirus replication. These plasmids were transfected into Huh7 cells using PEI reagent, and cell supernatants were collected at 3 and 6 days post-transfection. After concentration by sucrose cushion ultracentrifugation, potential virions were further separated using CsCl density gradient ultracentrifugation47.
Real-time PCR with specific primers, including HBV-Forward (5′-GTTATCGCTGGATGTGTCTGC-3′) and HBV-Reverse (5′-GACAAACGGGCAACATACCTT-3′), RiceRat-Forward (5′-CTACCACAAGCACAACAC-3′) and RiceRat-Reverse (5′-AGGAGCCACATAACTGAG-3′), as well as Frog-Forward (5′-TCAATCAACACCGAAGTCTT-3′) and Frog-Reverse (5′-GTTCAATACATATCTCAGGTCAGT-3′), was performed to quantify hepadnaviral DNA across the collected gradient fractions48.
Transmission electron microscopy with negative staining was performed by Wuhan Servicebio Technology, China. Briefly, a 20 µL virus suspension was placed on a copper grid with a carbon film for 3–5 min, after which excess liquid was absorbed using filter paper. For staining, 2% phosphotungstic acid was applied to the copper grid for 1–2 min, followed by the removal of excess liquid with filter paper. The grids were then allowed to dry at room temperature. Grids were subsequently imaged using a transmission electron microscope (HT7800/HT7700, Hitachi), and representative micrographs were recorded.
Results
Multiple novel hepadnavirus and nackednavirus genomes were obtained
This study identified numerous novel hepadnaviruses and nackednaviruses with complete genomes (Supplementary Table S1). The five genera within the Hepadnaviridae family and the unclassified nackednaviruses were analyzed in phylogenetic trees (Fig. 1A, B). To enhance the analysis and visualization of their genomic characteristics, these genomes were presented relative to the origin of the HBc gene (HBc-origin), distinct from the traditional EcoRI cleavage site (EcoRI-origin) (Fig. 1C)40. Utilizing HBc-origin offers several advantages: (1) Some virus genomes lack a corresponding traditional EcoRI cleavage site. (2) This novel presentation ensures that neither the preS gene nor the HBp gene is truncated at the start position or spans both sides of it (Fig. 1D). (3) Given the conservation of the HBc gene, the HBc-origin site can be reliably identified under most circumstances. While the HBp-origin display presents an alternative, it may lead to HBc truncation. Overall, the chosen HBc-origin method for new hepadnaviruses and nackednaviruses provides a clear depiction of the HBc-HBp gene structure and aligns with the pgRNA gene structure of hepadnaviruses and nackednaviruses.
Classification and novel display method for hepadnaviruses and nackednaviruses. (A) Phylogenetic trees depicting reference hepadnaviruses and nackednaviruses, with representative viruses and their hosts highlighted in red. Nacked = nackednavirus, Herpeto = herpetohepadnavirus, Avi = avihepadnavirus, Para = parahepadnavirus, Meta = metahepadnavirus, Ortho = orthohepadnavirus. Scale bar indicates substitutions per site. (B) An unrooted tree layout illustrating novel hepadnaviruses and nackednaviruses identified in this study. (C) A schematic comparison of the conventional EcoRI-origin numbering system for the HBV genome, which starts at the EcoRI restriction site (G/AATTC, where the first T is nucleotide 1, green pentagram), and the HBc-origin (red pentagram). (D) A representation of the viral genome based on the HBc-origin for human HBV (GenBank: U95551), an orthohepadnavirus; frog hepadnavirus (GenBank: KX058435), a herpetohepadnavirus; and rockfish nackednavirus (GenBank: MH158726), a nackednavirus.
Identification of exogenous avian hepadnaviruses and eHBVs
With regard to avian hepadnaviruses, this study identified white-rumped munia (Lonchura striata), American songbird (Piranga ludoviciana), pigeon (Columba livia), and Maned duck (Chenonetta jubata) hepadnaviruses (Fig. 2A, B). Interestingly, the hepadnavirus porgy-acan628 (GenBase: C_AA050335) found in porgy fish (Acanthopagrus latus) belongs to the Avihepadnavirus genus, though the reason remains unknown.
However, the majority of avian hepadnaviruses identified in this study were found to have integrated into the host genome as eHBVs (Fig. 2C). These eHBVs exhibit the following characteristics: 1. Most HBs genes appeared to be integrated and transcriptionally active. 2. The 5’ terminus within the HBp and preS genes often contained stop codons. 3. The 3’ terminus of most HBp genes remained intact, while the 5’ terminus was missing. 4. Very few HBc genes remained intact, as observed in the case of QuailThrush-cinc826-i. Phylogenetic analysis of our identified avian eHBV sequences (using the HBp RT domain) with the dataset from Lytras et al.26 revealed that most avian eHBVs formed distinct phylogenetic clades separate from exogenous avian hepadnaviruses (Fig. 2D).
Novel avian hepadnaviruses. (A) Newly discovered avian hepadnaviruses. The scale bar represents substitutions per site. (B) Genomic profiles of representative novel avian hepadnaviruses. (C) Avian eHBVs. The schematic diagram illustrates the avian eHBVs, such as Blackbird-agel788-i, with flanking host genome sequences omitted. The ‘i’ indicates integration, and ‘*’ denotes a stop codon. Most eHBVs exhibit intact HBs genes and partially complete HBp genes. (D) Evolutionary relationships inferred from the HBp RT domain sequences of novel avian eHBVs. The tree was generated with MEGA12 using the Maximum Likelihood method with the Jones-Taylor-Thornton (JTT) model. Bootstrap support values are based on 1,000 replicates. Scale bar represents substitutions per site.
African cichlids harbored a diverse range of hepadnaviruses and nackednaviruses
Diverse novel hepadnaviruses and nackednaviruses were found in fish, encompassing various species such as the stickleback (Gasterosteus aculeatus) and elephantfish (Brienomyrus brachyistius) nackednaviruses, porgy (Pagrus auratus) and grinner (Aulopiformes) parahepadnaviruses, and common carp (Cyprinus carpio) and icefish (Chionodraco myersi) metahepadnaviruses (Fig. 3A).
Particularly surprising is the abundance of novel hepadnaviruses and nackednaviruses discovered in African cichlids, with sequences exhibiting extensive diversity, including both nackednaviruses and metahepadnaviruses (Fig. 3A, B). In the PRJNA552202 project, researchers sequenced 2,242 samples from African cichlids encompassing five tissues across 76 cichlid species from Lake Tanganyika49, revealing a significant presence of hepadnaviruses and nackednaviruses in these specimens50.
Additionally, in the PRJNA845781 project, RNA sequencing conducted on 140 elephantfish samples demonstrated the widespread presence of hepadnaviruses in 12 out of 13 elephantfish specimens (Fig. 3C). Specifically, sequencing of 13 tissues or organs from elephantfish BB366 (Fig. 3D) showed that the skin and electric organs, rather than the liver, harbored the highest abundance of hepadnavirus, indicating a lack of liver tropism in elephantfish hepadnaviruses. These elephantfish hepadnaviruses, lacking envelope proteins, exhibit characteristics typical of nackednaviruses; furthermore, their distribution in tissues such as the skin and electric organs, rather than the liver or blood, suggests a transmission route distinct from traditional bloodborne transmission. Moreover, the detection of multiple distinct nackednavirus sequences in individual host species, such as elephantfish (intra-species nucleotide identities of 82.0–98.4%) and stickleback (95.5–98.7%), points towards considerable viral diversity or the presence of different genotypes within these populations.
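Nucleotide identity between two genomes is usually computed from a pairwise alignment. The text does not specify how gaps were treated, so the sketch below makes an explicit assumption: columns containing a gap in either sequence are excluded from the comparison. The function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # assumption: gapped columns are ignored
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Toy alignment: 9 ungapped columns, 8 of them identical.
print(round(percent_identity("ACGT-ACGTA", "ACGTTACGAA"), 1))  # 88.9
```

Other conventions (for example, counting gaps as mismatches) would give lower values, so reported identity ranges are only comparable when the same convention is used.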
Furthermore, potential linear hepadnaviruses were detected in dragonfish (Akarotaxis nudiceps) and elephantfish, designated as dragonfish-akar681-l and elephantfish-brie291-l. These viruses exhibit entirely different non-viral flanking nucleic acid sequences compared to traditional circular hepadnavirus genomes. While it is possible that these differences in the flanking nucleic acid sequences result from integration into the host genome, further investigation is required to confirm this possibility.
Novel fish hepadnaviruses and nackednaviruses. (A) Newly discovered fish hepadnaviruses and nackednaviruses. Scale bar represents substitutions per site. (B) Genomic profiles of representative novel fish hepadnaviruses or nackednaviruses. (C) A depiction of an elephantfish (Brienomyrus brachyistius) generated by Baidu Yige-AI (https://yige.baidu.com). (D) Distribution of nackednavirus fragments in elephantfish tissues. The brain tissue exhibited the highest number of fragments, with up to 104 hits.
Herpetohepadnaviruses were found in samples from mammals
Regarding herpetohepadnaviruses, this study identified several lizard and toad hepadnaviruses (Fig. 4). Herpetohepadnaviruses have been previously documented in amphibians and reptiles, including frogs9,10 and lizards4. Notably, phylogenetic analysis revealed that the hepadnavirus lizard-sapr381, found in the Australian lizard (Saproscincus basiliscus), does not cluster within either the herpetohepadnavirus or avihepadnavirus genera. Instead, it forms a broader clade encompassing both, suggesting a close evolutionary relationship between the two. Frog hepadnaviruses were unexpectedly detected in mice (Mus musculus) and dogs (Canis lupus familiaris), with sequences closely resembling the known frog hepadnavirus (GenBank: KX058435) (Fig. 4A, B). Further investigation revealed frog hepadnavirus fragments in other animals, such as snub-nosed monkeys (Rhinopithecus bieti), giant pandas (Ailuropoda melanoleuca), and pigs (Sus scrofa) (Fig. 4C). In the PRJNA248058 project, NGS analyses of 12 tissue samples from a cancer-free 32-year-old black snub-nosed monkey (Fig. 4D) indicated that frog hepadnavirus fragments were predominantly present in the kidneys and small intestine, and scarcely detected in the liver or blood (Fig. 4E). The presence of frog hepadnavirus in kidney or digestive tract samples from the black snub-nosed monkey may suggest ingestion of an infected host; however, contamination or true infection cannot be ruled out.
Novel herpetohepadnaviruses. (A) Newly discovered herpetohepadnaviruses. Representative hepadnaviruses and their hosts are highlighted in red, while newly identified hepadnaviruses are shown in blue. The nomenclature for newly discovered hepadnaviruses follows the convention of using the first four letters of the species name followed by the last three digits of the SRA accession number. Scale bar represents substitutions per site. (B) Genomic profiles of representative novel herpetohepadnaviruses. (C) Comparative analysis of the whole genomes of frog hepadnaviruses found in mammal samples. Frog hepadnaviruses were identified in mice, dogs, snub-nosed monkeys, pigs, giant pandas, and sea cucumbers, with their sequences exhibiting significant similarity. White indicates identical sequences, black signifies differences, and light green denotes missing data. (D) A depiction of a black snub-nosed monkey (Rhinopithecus bieti) generated by Baidu Yige-AI. (E) Tissue distribution of hepadnavirus in a snub-nosed monkey. In a snub-nosed monkey from Yunnan Province, China, fragments of the virus were predominantly found in the kidneys and small intestine, with smaller numbers detected in the liver and oral cavity, and none in other tissues such as blood.
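The naming convention described in panel (A) can be expressed as a short helper. This is a sketch under assumptions: the four letters are taken from the start of the scientific name, the common-name prefix is supplied by the user, and the accession for the porgy example is read as SRR7527765. Not every label in the study necessarily follows this rule exactly (for example, labels carrying the suffixes "-i" or "-l"), and the function name is hypothetical.

```python
def virus_label(common_name, scientific_name, sra_accession):
    """Build a label such as 'Porgy-pagr765' from host names and an SRA accession."""
    prefix = scientific_name.strip()[:4].lower()  # first four letters of the species name
    suffix = sra_accession.strip()[-3:]           # last three digits of the SRA accession
    return f"{common_name}-{prefix}{suffix}"


print(virus_label("Porgy", "Pagrus auratus", "SRR7527765"))  # Porgy-pagr765
```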
Discovery of novel orthohepadnaviruses in buffaloes and hamsters and replication assessment of the rice rat hepadnavirus
This study identified two novel orthohepadnaviruses in African buffaloes (Syncerus caffer) and long-tailed pygmy rice rats (Oligoryzomys longicaudatus) (Fig. 5A, C). The buffalo hepadnavirus (Buffalo-buff360, GenBase: C_AA050234) showed high similarity to the previously reported duiker hepadnavirus15. On the other hand, the rice rat hepadnavirus (RiceRat-olig706, GenBase: C_AA050303) shared similarities with shrew hepadnaviruses18,19,20 evidenced by comparable genomic structures (Fig. 5B). It is noteworthy that the rice rat hepadnavirus was HBeAg-negative, similar to shrew hepadnaviruses, due to a stop codon mutation in the preC open reading frame at the same site (Fig. 5F)20.
Whole genomic sequences of the rice rat hepadnavirus RiceRat-olig706 and frog hepadnavirus Mouse-mus490-Frog (identified in mouse samples) were synthesized, and 1.1-fold over-length genomes were constructed with an upstream CMV-promoter, akin to the well-known HBV replicating construct46. These constructs were transfected into the human hepatoma cell line Huh7, with the HBV 1.1-fold construct used as a control. In the cell supernatant of Huh7 cells, a peak similar to the Dane particle peak was detected at a density of approximately 1.18–1.22 g/mL (Fig. 5D, E). Remarkably, approximately 3 × 10⁶ copies/mL of rice rat hepadnavirus were detected in the CsCl gradient, which is five times higher than that of HBV, suggesting active replication and secretion of this virus. However, the frog hepadnavirus did not exhibit a corresponding naked particle peak observed for human HBV and rice rat hepadnavirus at a density of 1.28–1.32 g/mL when expressed in Huh7 cells, which might reflect an intrinsic property of the virus or, potentially, an artifact resulting from the use of a non-native host cell line. Negative-stain electron microscopy revealed that rice rat hepadnavirus and frog hepadnavirus particles were slightly larger than human HBV, spherical in shape, and approximately 40–60 nm in diameter.
Novel orthohepadnaviruses. (A) Newly discovered orthohepadnaviruses. Representative hepadnaviruses and their hosts are highlighted in red, while newly identified hepadnaviruses are shown in blue. Scale bar represents substitutions per site. (B) Genomic profiles of representative novel orthohepadnaviruses. (C) A depiction of a long-tailed pygmy rice rat (Oligoryzomys longicaudatus) generated by Baidu Yige-AI. (D) Identification of hepadnaviruses through ultracentrifugation. Two distinct hepadnavirus DNA peaks were observed at 1.18–1.22 and 1.28–1.32 g/mL for HBV and rice rat hepadnavirus, respectively, whereas only the peak at approximately 1.18–1.22 g/mL was detected for the frog hepadnavirus. (E) Identification of hepadnaviruses via negative stain electron microscopy. (F) The rice rat hepadnavirus was HBeAg-negative, similar to shrew hepadnaviruses, as a result of a stop codon mutation in the preC open reading frame.
Discussion
Discovery of multiple novel hepadnaviruses and nackednaviruses
In this study, bioinformatic analyses of NGS data led to the identification of multiple novel hepadnaviruses and nackednaviruses, most of which possess complete genomes. These efforts included not only the de novo assembly of entirely new viral genomes but also, in some instances, the reanalysis and completion of genomes for viruses previously reported with only partial sequences. For example, a complete genome for the porgy hepadnavirus (Porgy-pagr765) was assembled from a previously published SRA dataset51 (SRR7527765), building upon a prior partial polymerase sequence (GenBank: MH716821). Similarly, complete genome characterization or re-assembly was achieved for the toad hepadnavirus Toad-bufo524 (GenBank: MZ244216; SRA SRR14214524) and the carp hepadnavirus Carp-schi040 (GenBank: GHYL01057506.1; SRA SRR8618040)52. With the exception of a few instances (dragonfish-akar681-l and elephantfish-brie291-l), the newly discovered hepadnaviruses in this study could be assembled into complete circular genomes.
Novel orthohepadnaviruses were identified in rice rats and buffaloes. Additionally, novel metahepadnaviruses were found in cichlids, herrings, common carp, eels, whales, icefish, Caranginae, and Japanese halfbeaks, while new parahepadnaviruses were identified in porgy and Aulopiformes. Furthermore, novel nackednaviruses were detected in dragonfish, sturgeon, and sticklebacks. Among herpetohepadnaviruses, lizard and toad hepadnaviruses were newly discovered. Novel avian hepadnaviruses were also identified in white-rumped munias, American songbirds, and pigeons. Notably, among all these, the rice rat hepadnavirus, the African cichlid hepadnavirus, and the African cichlid nackednavirus stand out as particularly significant.
Multiple avian eHBVs
The hepadnaviruses identified in munia and American songbird samples were found to belong to the same evolutionary lineage. White-rumped munias are indigenous to South Asia and southern China, whereas American songbirds inhabit Central America and North America. This suggests a broad historical distribution of the ancestral lineage of these munia and American songbird hepadnaviruses across distinct geographic regions. Phylogenetic analysis further revealed that the pigeon hepadnavirus is closely related to duck hepatitis B virus (DHBV), with both clustering within the DHBV lineage.
However, a more significant finding is the integration of numerous avian hepadnaviruses into host genomes, consistent with previous literature findings53,54,55. Viral fragment assemblies from metagenomic reads could represent either exogenous viruses or eHBVs, and we differentiate between them based on the following criteria: (1) Genome structure and coding integrity: Exogenous virus reads can be assembled into a complete circular genome using MEGAHIT without a reference template, and the major viral gene ORFs lack internal stop codons. In contrast, eHBVs often exhibit multiple stop codons within their reading frames. For example, in QuailThrush-cinc826-i, while the genome can be almost fully assembled with substantial read support, stop codons appear in the HBs and HBp ORFs. (2) Chimeric host-virus reads: The presence of viral-host chimeric reads is a strong indicator of eHBVs. While most of the novel viruses identified in this study did not present such chimeric evidence, Dragonfish-akar681-l was an exception, exhibiting hybrid “unknown” sequences at its genomic ends. Despite this, its genome was relatively complete and the main viral ORFs lacked stop codons, suggesting a potentially recent integration or a distinct type of endogenous element. (3) Distribution patterns and host specificity: eHBVs are more likely to be found in avian or amphibian species and are often detected consistently across most samples of a given species in the same fragmented form due to their genomic integration, such as Sparrow-zono391-i and Tit-aegi626-i. In contrast, exogenous viruses may be detected in some samples of the same species, while others show no detection at all. Our phylogenetic comparison with the comprehensive avian eHBV dataset from Lytras et al.26, while broadly consistent, showed some topological discrepancies, potentially reflecting differences in the analyzed regions or datasets.
Nonetheless, the identification of additional avian eHBVs contributes to a deeper understanding of their ancient diversity and evolution.
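The coding-integrity check in criterion (1) can be sketched as a scan for in-frame stop codons. The snippet assumes the standard genetic code and an ORF sequence that is already in frame from its first nucleotide; the function name and toy sequences are hypothetical, and the other two criteria (chimeric reads, distribution across samples) are not covered.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def internal_stop_positions(orf):
    """Return 0-based codon indices of in-frame stop codons before the final codon."""
    seq = orf.upper()
    n_codons = len(seq) // 3
    return [i for i in range(n_codons - 1)
            if seq[3 * i:3 * i + 3] in STOP_CODONS]


# Intact toy ORF: the only in-frame stop is the terminal codon.
print(internal_stop_positions("ATGAAATGA"))     # []
# Disrupted toy ORF: a premature TAG at codon index 1.
print(internal_stop_positions("ATGTAGAAATGA"))  # [1]
```

An empty result is compatible with an exogenous virus but does not prove it, since recently integrated elements may also retain intact reading frames, as noted for Dragonfish-akar681-l.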
Diverse fish hepadnaviruses and nackednaviruses
Fish hepadnaviruses and nackednaviruses are regarded as among the most primitive, diverse, and widely distributed members of these viral families4. Various fish species harbor distinct hepadnaviruses and nackednaviruses exhibiting significant sequence variations, exemplified by those found in African cichlids, elephantfish, eels, and killifish, highlighting their deep evolutionary origins50.
Among these, African cichlids yielded particularly notable findings, with diverse genera found to be widely infected by both hepadnaviruses and nackednaviruses. Examples include Cichlid-cypr446, Cichlid-lamp675, Cichlid-ACNDV, and Cichlid-ANDV, which appear to represent some of the most primitive non-enveloped nackednaviruses. In contrast, traditional African cichlid hepadnaviruses, which resemble mammalian hepadnaviruses, are primarily classified within the genus Metahepadnavirus. African cichlids inhabit the three major lakes in East Africa, namely Lake Malawi, Lake Tanganyika, and Lake Victoria, all located near the East African Great Rift Valley. Given their ecological overlap and their importance as a food source, the potential for zoonotic transmission through the food chain warrants further attention. Further exploration is needed to discern the full significance of these viruses in the evolutionary narrative of hepadnaviruses and nackednaviruses.
The pressing need for animal models of orthohepadnavirus infection
As previously observed, hepadnaviruses exhibit strict host specificity, underscoring the challenge in finding suitable animal models for studying hepadnavirus infections27,28. Animal models of hepadnavirus infection remain a persistent challenge in the field, particularly concerning human HBV. Although highly abundant hepadnaviruses from cichlids or elephantfish offer extensive sequence data, their piscine hosts limit their utility in mammalian systems.
Attention thus turns to novel hepadnaviruses discovered in mammals, such as the orthohepadnavirus identified in the long-tailed pygmy rice rat (Oligoryzomys longicaudatus), a member of the Cricetidae family. Our successful in vitro assembly and secretion of virus-like particles using constructs from this rice rat hepadnavirus (RiceRat-olig706) supports its potential as a candidate for further in vivo model development. While initial thoughts for an in vivo model of the rice rat-derived hepadnavirus might turn to other Cricetidae (e.g., Syrian/Chinese striped hamsters) or common laboratory mice, the cotton rat (Sigmodon hispidus) warrants specific attention. As a fellow Sigmodontinae rodent phylogenetically closer to Oligoryzomys than these alternatives, and an established model for various human viral pathogens like Respiratory Syncytial Virus56 it offers a potentially more permissive system for this novel hepadnavirus. However, there remains a substantial gap between demonstrating in vitro replication and establishing productive in vivo infection models. This challenge is compounded by factors such as host-specific entry restrictions, immune-mediated clearance, and the lack of suitable permissive animal species.
Study limitations and shortcomings
For ease of description and discussion, species names from NGS sample annotations were used to refer to the newly identified hepadnavirus and nackednavirus species in this study. However, such nomenclature does not necessarily imply that a virus detected in a given species is truly capable of infecting that host, as the sequences may also originate from contaminated samples, mislabeling, or sequencing artifacts. Because the host assignments were derived solely from computational analysis of publicly available NGS data and lacked in vivo or functional validation, they should be interpreted with caution. Further experimental validation in the respective host species is needed to confirm infectivity, as demonstrated in previous studies on novel orthohepadnaviruses in shrews20 and equids17. In addition, our current screening strategy has limitations, and future use of more advanced, comprehensive, and sensitive bioinformatic approaches, such as PebbleScout57, may reveal additional hepadnaviruses and nackednaviruses in public databases.
Conclusions
This study identified numerous novel hepadnaviruses and nackednaviruses through the comprehensive analysis of extensive NGS datasets from public repositories. These findings not only provide fresh insights into the origins and evolutionary trajectories of hepadnaviruses and nackednaviruses, but also lay the groundwork for establishing new animal models of hepadnavirus infection. Such models will be instrumental in advancing our understanding of HBV pathogenesis and in developing potential therapeutic interventions for HBV infection58,59.
Data availability
The genome sequences of all novel hepadnaviruses and nackednaviruses presented in this study are openly available in the GenBase repository at the National Genomics Data Center (NGDC) of China60 (https://ngdc.cncb.ac.cn/genbase/) under accession numbers C_AA050214 – C_AA050361 (e.g., the sequence for rice rat hepadnavirus RiceRat-olig706, C_AA050303, can be found at https://ngdc.cncb.ac.cn/genbase/search/gb/C_AA050303).
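For readers who want to retrieve the deposited records in bulk, the accession range and the example link above can be expanded into a list of record URLs. The following is a minimal sketch only. It assumes (this is not stated in the article) that the accessions between C_AA050214 and C_AA050361 are numbered consecutively with a fixed six-digit suffix, and that every record follows the same URL pattern as the C_AA050303 example. The names `accession_range` and `record_url` are hypothetical helpers introduced for illustration.

```python
# Expand the GenBase accession range quoted in the Data availability statement.
# Assumptions (not confirmed by the article): accessions in the range are
# consecutive, share the "C_AA" prefix and a zero-padded numeric suffix, and
# each record uses the same URL pattern as the C_AA050303 example.

BASE_URL = "https://ngdc.cncb.ac.cn/genbase/search/gb/"
PREFIX = "C_AA"


def accession_range(first: str, last: str) -> list[str]:
    """Return every accession from `first` to `last`, inclusive."""
    if not (first.startswith(PREFIX) and last.startswith(PREFIX)):
        raise ValueError(f"accessions must start with {PREFIX!r}")
    first_digits = first[len(PREFIX):]
    last_digits = last[len(PREFIX):]
    width = len(first_digits)  # keep the zero padding of the original IDs
    start, end = int(first_digits), int(last_digits)
    if end < start:
        raise ValueError("last accession precedes first accession")
    return [f"{PREFIX}{n:0{width}d}" for n in range(start, end + 1)]


def record_url(accession: str) -> str:
    """Build the record URL using the pattern shown for C_AA050303."""
    return BASE_URL + accession


accessions = accession_range("C_AA050214", "C_AA050361")
urls = [record_url(acc) for acc in accessions]
print(len(accessions), "accessions; first record:", urls[0])
```

If some identifiers in the range turn out to be unassigned, the corresponding URLs would simply fail to resolve, so any download script built on this list should check each response rather than assume success.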
References
Glebe, D., Goldmann, N., Lauber, C. & Seitz, S. HBV evolution and genetic variability: impact on prevention, treatment and development of antivirals. Antiviral Res. 186, 104973 (2021).
Tong, S., Li, J., Wands, J. R. & Wen, Y. Hepatitis B virus genetic variants: biological properties and clinical implications. Emerg. Microbes Infect. 2, 1–11 (2013).
Guo, X. et al. Trends in hepatitis B virus resistance to nucleoside/nucleotide analogues in North China from 2009–2016: A retrospective study. Int. J. Antimicrob. Agents. 52, 201–209 (2018).
Lauber, C. et al. Deciphering the origin and evolution of hepatitis B viruses by means of a family of non-enveloped fish viruses. Cell Host Microbe 22, 387–399.e6 (2017).
Buigues, J. et al. Phylogenetic evidence supporting the nonenveloped nature of hepadnavirus ancestors. Proc. Natl. Acad. Sci. U.S.A. 121, e2415631121 (2024).
Magnius, L. et al. ICTV virus taxonomy profile: Hepadnaviridae. J. Gen. Virol. 101, 571–572 (2020).
Hahn, C. M. et al. Characterization of a novel hepadnavirus in the white sucker (Catostomus commersonii) from the Great Lakes region of the United States. J. Virol. 89, 11801–11811 (2015).
Adams, C. R., Blazer, V. S., Sherry, J., Cornman, R. S. & Iwanowicz, L. R. Phylogeographic genetic diversity in the white sucker hepatitis B virus across the Great Lakes region and Alberta, Canada. Viruses 13, 285 (2021).
Dill, J. A. et al. Distinct viral lineages from fish and amphibians reveal the complex evolutionary history of hepadnaviruses. J. Virol. 90, 7920–7933 (2016).
Debat, H. J. & Ng, T. F. F. Complete genome sequence of a divergent strain of Tibetan frog hepatitis B virus associated with a concave-eared torrent frog (Odorrana tormota). Arch. Virol. 164, 1727–1732 (2019).
He, B. et al. Hepatitis virus in long-fingered bats, Myanmar. Emerg. Infect. Dis. 19, 638–640 (2013).
De Souza, A. J. S. et al. Orthohepadnavirus infection in a Neotropical Bat (Platyrrhinus lineatus). Comp. Immunol. Microbiol. Infect. Dis. 79, 101713 (2021).
Pesavento, P. A. et al. A novel hepadnavirus is associated with chronic hepatitis and hepatocellular carcinoma in cats. Viruses 11, 969 (2019).
Diakoudi, G. et al. A novel hepadnavirus in domestic dogs. Sci. Rep. 12, 2864 (2022).
Gogarten, J. et al. A novel orthohepadnavirus identified in a dead Maxwell’s duiker (Philantomba maxwellii) in Taï National Park, Côte d’Ivoire. Viruses 11, 279 (2019).
Jo, W. K. et al. Natural co-infection of divergent hepatitis B and C virus homologues in carnivores. Transbound. Emerg. Dis. 69, 195–203 (2022).
Rasche, A. et al. A hepatitis B virus causes chronic infections in equids worldwide. Proc. Natl. Acad. Sci. U.S.A. 118, e2013982118 (2021).
He, W. Q. et al. Detection of hepatitis B virus-like nucleotide sequences in liver samples from murine rodents and Asian house shrews. Vector-Borne Zoonotic Dis. 19, 781–783 (2019).
Nie, F. Y. et al. Discovery of a highly divergent hepadnavirus in shrews from China. Virology 531, 162–170 (2019).
Rasche, A. et al. Highly diversified shrew hepatitis B viruses corroborate ancient origins and divergent infection patterns of mammalian hepadnaviruses. Proc. Natl. Acad. Sci. U.S.A. 116, 17007–17012 (2019).
Locarnini, S. A., Littlejohn, M. & Yuen, L. K. W. Origins and evolution of the primate hepatitis B virus. Front. Microbiol. 12, 653684 (2021).
Locarnini, S., Littlejohn, M., Aziz, M. N. & Yuen, L. Possible origins and evolution of the hepatitis B virus (HBV). Semin Cancer Biol. 23, 561–575 (2013).
Kocher, A. et al. Ten millennia of hepatitis B virus evolution. Science 374, 182–188 (2021).
Liu, W. et al. The first full-length endogenous hepadnaviruses: identification and analysis. J. Virol. 86, 9510–9513 (2012).
Suh, A. et al. Early Mesozoic coexistence of amniotes and Hepadnaviridae. PLOS Genet. 10, e1004559 (2014).
Lytras, S., Arriagada, G. & Gifford, R. J. Ancient evolution of hepadnaviral paleoviruses and their impact on host genomes. Virus Evol. 7, veab012 (2021).
Ploss, A., Strick-Marchand, H. & Li, W. Animal models for hepatitis B: does the supply meet the demand? Gastroenterology 160, 1437–1442 (2021).
Zhang, X., Wang, X., Wu, M., Ghildyal, R. & Yuan, Z. Animal models for the study of hepatitis B virus pathobiology and immunity: past, present, and future. Front. Microbiol. 12, 715450 (2021).
Yan, H. et al. Sodium taurocholate cotransporting polypeptide is a functional receptor for human hepatitis B and D virus. eLife 1, e00049 (2012).
Ni, Y. et al. Hepatitis B and D viruses exploit sodium taurocholate co-transporting polypeptide for species-specific entry into hepatocytes. Gastroenterology 146, 1070–1083.e6 (2014).
Burwitz, B. J., Zhou, Z. & Li, W. Animal models for the study of human hepatitis B and D virus infection: new insights and progress. Antiviral Res. 182, 104898 (2020).
Li, D. et al. A potent human neutralizing antibody Fc-dependently reduces established HBV infections. eLife 6, e26738 (2017).
Zhang, T. Y. et al. Prolonged suppression of HBV in mice by a novel antibody that targets a unique epitope on hepatitis B surface antigen. Gut 65, 658–671 (2016).
Edgar, R. C. et al. Petabase-scale sequence alignment catalyses viral discovery. Nature 602, 142–147 (2022).
Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).
Chen, S., Zhou, Y., Chen, Y. & Gu, J. Fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018).
Li, D., Liu, C. M., Luo, R., Sadakane, K. & Lam, T. W. MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics 31, 1674–1676 (2015).
Steinegger, M. & Söding, J. MMseqs2 enables sensitive protein sequence searching for the analysis of massive data sets. Nat. Biotechnol. 35, 1026–1028 (2017).
Kearse, M. et al. Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics 28, 1647–1649 (2012).
Galibert, F., Mandart, E., Fitoussi, F., Tiollais, P. & Charnay, P. Nucleotide sequence of the hepatitis B virus genome (subtype ayw) cloned in E. coli. Nature 281, 646–650 (1979).
Katoh, K. & Standley, D. M. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol. Biol. Evol. 30, 772–780 (2013).
Gong, Z. & Han, G. Z. Insect retroelements provide novel insights into the origin of hepatitis B viruses. Mol. Biol. Evol. 35, 2254–2259 (2018).
Kumar, S. et al. MEGA12: molecular evolutionary genetic analysis version 12 for adaptive and green computing. Mol. Biol. Evol. 41, msae263 (2024).
Kim, D., Song, L., Breitwieser, F. P. & Salzberg, S. L. Centrifuge: rapid and sensitive classification of metagenomic sequences. Genome Res. 26, 1721–1729 (2016).
Breitwieser, F. P. & Salzberg, S. L. Pavian: interactive analysis of metagenomics data for microbiome studies and pathogen identification. Bioinformatics 36, 1303–1304 (2020).
Wang, X., Zhang, X., Hu, W., Zhang, T. & Wang, S. A simple and efficient strategy for the de novo construction of greater-than-genome-length hepatitis B virus replicons. J. Virol. Methods 207, 158–162 (2014).
Glebe, D. & Gerlich, W. H. Study of the endocytosis and intracellular localization of subviral particles of hepatitis B virus in primary hepatocytes. Methods Mol. Med. 96, 143–151 (2004).
Pei, Y. et al. Discovery of new hepatitis B virus capsid assembly modulators by an optimal high-throughput cell-based assay. ACS Infect. Dis. 5, 778–787 (2019).
Taher, A. E. et al. Gene expression dynamics during rapid organismal diversification in African cichlid fishes. Nat. Ecol. Evol. 5, 243 (2021).
Costa, V. A. et al. Host adaptive radiation is associated with rapid virus diversification and cross-species transmission in African cichlid fishes. Curr. Biol. 34, 1247–1257.e3 (2024).
Geoghegan, J. L. et al. Hidden diversity and evolution of viruses in market fish. Virus Evol. 4, vey031 (2018).
Chen, X. X., Wu, W. C. & Shi, M. Discovery and characterization of actively replicating DNA and retro-transcribing viruses in lower vertebrate hosts based on RNA sequencing. Viruses 13, 1042 (2021).
Suh, A., Brosius, J., Schmitz, J. & Kriegs, J. O. The genome of a Mesozoic paleovirus reveals the evolution of hepatitis B viruses. Nat. Commun. 4, 1–7 (2013).
Gilbert, C. & Feschotte, C. Genomic fossils calibrate the long-term evolution of hepadnaviruses. PLoS Biol. 8, e1000495 (2010).
Tu, T., Budzinska, M., Shackel, N. & Urban, S. HBV DNA integration: molecular mechanisms and clinical implications. Viruses 9, 75 (2017).
Niewiesk, S. & Prince, G. Diversifying animal models: the use of hispid cotton rats (Sigmodon hispidus) in infectious diseases. Lab. Anim. 36, 357–372 (2002).
Shiryev, S. A. & Agarwala, R. Indexing and searching petabase-scale nucleotide resources. Nat. Methods. 21, 994–1002 (2024).
Revill, P. A. et al. A global scientific strategy to cure hepatitis B. Lancet Gastroenterol. Hepatol. 4, 545–558 (2019).
Wang, X. Y. & Wen, Y. M. A ‘sandwich’ strategy for functional cure of chronic hepatitis B. Emerg. Microbes Infect. 7, 1–3 (2018).
CNCB-NGDC Members and Partners. Database resources of the National Genomics Data Center, China National Center for Bioinformation in 2024. Nucleic Acids Res. 52, D18–D32 (2024).
Acknowledgements
We extend our sincere appreciation to Dr. Yi Ni from the University of Heidelberg for his invaluable advice and expert editing assistance with this article. We also thank Dr. Wei Liu from the University of Pennsylvania for his valuable insights and constructive comments. Additionally, we acknowledge LetPub for its linguistic assistance during the preparation of this manuscript.
Funding
This work was supported by Beijing YouAn Hospital Youth Innovation Foundation [grant number YNKTQN2021020], the Beijing Postdoctoral Research Foundation [grant number 2022-ZZ-039], Beijing Nova Program [grant number Z171100001117119], the Scientific Research Project of Beijing YouAn Hospital [grant number BJYAYY-YN2022-19], the National Natural Science Foundation of China [grant number 82073676], the Key Programs of Beijing Municipal Education Commission of China [grant number KZ202010025037], the Chinesisch-Deutsche Zentrum für Wissenschaftsförderung [grant number C-0012], and the Pilot Project of Public Welfare Development and Reform of Beijing-affiliated Medical Research Institutes [grant numbers JingYiYan2021-10 and 2019-6].
Author information
Contributions
Conceptualization, H.B. and D.C.; methodology, H.B. and L.L.; software, H.B.; validation, H.B., X.W., P.Y., P.L., Y.G., Y.W., Y.L., and C.H.; formal analysis, H.B.; investigation, H.B.; resources, H.B.; data curation, H.B.; writing—original draft preparation, H.B.; writing—review and editing, H.B., X.W., and D.C.; visualization, H.B.; supervision, D.C.; project administration, D.C.; funding acquisition, H.B., X.W., Y.W., and D.C. This study is part of the postdoctoral research of H.B. All authors have read and agreed to the published version of the manuscript.
Ethics declarations
Competing interests
The authors declare no competing interests.
Ethics statement
This study did not involve any experiments on live animals. All analyses involving animal-derived sequence data were performed on information retrieved from publicly accessible databases, including the NCBI Sequence Read Archive (SRA).
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Ben, H., Wang, X., Yang, P. et al. Metagenomic analysis uncovers novel hepadnaviruses and nackednaviruses. Sci Rep 15, 24699 (2025). https://doi.org/10.1038/s41598-025-05993-z







