Background & Summary

Electric fishes have independently evolved electric discharge capabilities across multiple teleost lineages, most prominently within the order Gymnotiformes (South America)1. This unique adaptation has positioned electric fishes as valuable models for investigating convergent evolution, electroreception, and the diversification of ion channel functions1,2,3,4. Among them, the electric eel (genus: Electrophorus, family: Gymnotidae) represents one of the most prominent cases of high-voltage electrogenesis in vertebrates. The genus Electrophorus was long considered to consist of a single species, E. electricus. However, a recent comprehensive taxonomic revision based on molecular and morphological analyses has divided the genus into three distinct species: E. electricus, E. voltai, and E. varii5. Of these, E. voltai is distinguished by its ability to generate electric discharges of up to 860 volts, the highest voltage recorded in electric fishes. These high-voltage discharges fulfill multiple ecological functions, including predation, defense, navigation, and communication1,6. As a member of the order Gymnotiformes, E. voltai serves as an important model for unraveling the evolution and functional mechanisms of electric organs. Its unparalleled capacity for high-voltage discharge offers valuable insights into the genetic, physiological, and developmental foundations of biological electricity generation, shedding light on the evolutionary origins of bioelectrogenesis in vertebrates.

Despite its significance, no high-quality reference genome has been reported for E. voltai. This absence has hindered functional and comparative genomic studies in this species, limiting the identification of structural variants, regulatory elements, and conserved synteny critical to elucidating the molecular basis of electric organ development. Although existing genomic resources for electric fishes have improved, they remain relatively limited in taxonomic scope and quality. Many are restricted to draft assemblies or transcriptomic datasets from distantly related taxa, which lack the resolution needed to detect lineage-specific adaptations or interpret regulatory complexity3,7. The recent telomere-to-telomere assembly of E. electricus represents a significant advance8, yet additional high-quality genomes are still needed to enable robust comparative and evolutionary analyses. Furthermore, the evolutionary convergence of electrogenesis in distantly related lineages, including South American Gymnotiformes and African Mormyridae, highlights the necessity for species-specific genomic resources to dissect the distinct genetic architectures underlying similar electrogenic functions1,2,9. A high-quality genome for E. voltai is therefore indispensable, not only for elucidating species-specific adaptations in electrogenesis, but also for enabling robust cross-lineage comparisons and informing a broader understanding of vertebrate organ evolution and physiological innovation.

To address this gap, we report a chromosome-level genome assembly of E. voltai, constructed using PacBio HiFi long-read and Hi-C sequencing. The assembled genome spans 666.91 Mb, with a scaffold N50 of 28.42 Mb and contig N50 of 1.54 Mb, anchored onto 26 chromosomes. BUSCO assessment revealed that 95.80% of the expected conserved orthologs were complete in the genome assembly, and gene prediction identified 23,221 protein-coding genes, of which 22,306 (96.06%) were functionally annotated. This high-quality genomic resource provides a robust foundation for future studies of the molecular basis of electric signal production, electrocyte development, and functional adaptations in electric fishes. It also enables comprehensive investigations into genome architecture, gene family evolution, and regulatory networks underpinning vertebrate electrogenesis, offering new avenues to study the evolutionary innovation of bioelectricity.

Methods

Ethics statement

The experimental protocols involving animals were reviewed and approved by the Laboratory Animal Ethics Committee of the Pearl River Fisheries Research Institute (PRFRI), Chinese Academy of Fishery Sciences (CAFS), China (License No. LAEC-PRFRI-2020-11-17).

Sample collection and identification

An electric eel specimen was obtained from Guangzhou Lanhai Marine Technology Co., Ltd (Guangzhou City, Guangdong Province, China) and identified as E. voltai through COI barcoding. The obtained COI sequence was compared against all available COI sequences of the genus Electrophorus in NCBI GenBank. Phylogenetic analysis showed that our sequence clustered unambiguously with E. voltai reference sequences, clearly distinguishing it from E. electricus and E. varii (Figure S1). The sampled E. voltai individual was confirmed to be male by anatomical examination and had a total length of 1.15 meters (Fig. 1a). E. voltai possesses three distinct electric organs: the main electric organ (main EO), Hunter’s organ, and Sachs’ organ. The main EO is primarily responsible for generating high-voltage discharges3 and was the tissue sampled for both genome sequencing and RNA sequencing in this study.

Fig. 1

Specimen photograph and Hi-C chromatin interaction heatmap of Electrophorus voltai. (a) Photograph of the E. voltai specimen. (b) Heatmap of chromatin interactions across the E. voltai genome based on Hi-C data. Genomic coordinates (in megabases) are displayed along both axes, while interaction intensity is represented by color shading. Darker reds indicate higher interaction frequencies. The color bar on the right denotes interaction strength.

Genome sequencing

Genomic DNA was extracted from the main electric organ (EO) of E. voltai using the TIANamp Genomic DNA Kit (TIANGEN, Beijing, China). DNA integrity was verified by 1.0% agarose gel electrophoresis, and concentration was measured using a NanoDrop 2000 spectrophotometer (Thermo Scientific, Waltham, MA, USA). Qualified genomic DNA was fragmented to 300–500 bp using a Covaris ultrasonicator (Covaris Inc., Woburn, MA, USA), followed by end-repair, A-tailing, adapter ligation and PCR amplification. A short-insert (350 bp) paired-end library was constructed and sequenced on the MGISEQ-2000 platform (MGI Tech Co. Ltd., Shenzhen, Guangdong, China), yielding approximately 40.1 Gb of 150 bp paired-end reads (Table 1). For long-read sequencing, high-fidelity (HiFi) libraries were constructed following PacBio’s standard protocol (Pacific Biosciences, Menlo Park, CA, USA) and sequenced on the PacBio Sequel II system, generating approximately 13.8 Gb of HiFi reads. For chromosome-level scaffolding, Hi-C libraries were prepared from EO-derived genomic DNA that was cross-linked with formaldehyde and digested using the MboI restriction enzyme. The Hi-C libraries were sequenced on the MGISEQ-2000 platform, producing approximately 46.3 Gb of data (~81X coverage).

Table 1 Sequencing data summary for the E. voltai genome.

Transcriptome sequencing

Total RNA was extracted from the main EO with a TRIzol kit (Invitrogen, Carlsbad, CA, USA), and mRNA was then isolated and purified with an Oligotex mRNA Midi Kit (Qiagen GmbH, Hilden, Germany). RNA concentration and purity were measured with a NanoDrop spectrophotometer, and RNA integrity was verified with an Agilent 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA). A transcriptome library was constructed using the NEBNext Ultra RNA Library Prep Kit (NEB, Ipswich, MA, USA) according to the manufacturer’s protocol and sequenced on the MGISEQ-2000 platform following the 150-bp paired-end protocol.

Genome survey and assembly

Prior to genome assembly, k-mer frequency analysis was performed on the short-read data generated on the MGISEQ-2000 platform to estimate the genome characteristics of E. voltai, including genome size, heterozygosity, GC content, and repeat content. K-mer counting was conducted with Jellyfish v2.010, and the 17-mer frequency distribution was analyzed (Figure S2). The genome size was estimated to be approximately 773.91 Mb, with a heterozygosity rate of 0.89%, a GC content of 39.9%, and repetitive sequences accounting for 51.13% of the genome. The genome of E. voltai was initially assembled into contigs from PacBio HiFi reads using hifiasm v0.14-r31211, generating 1,183 contigs with a total length of 692.97 Mb and a contig N50 of 1.47 Mb. We then integrated Hi-C data to obtain a high-quality de novo assembly at the chromosome level. Quality control of raw Hi-C reads and the generation of valid paired-end interactions were performed using Juicer v1.5 (RRID:SCR_017226)12. Scaffolding was conducted using the 3D-DNA pipeline v18092213, anchoring 87.01% of the assembled contigs onto 26 chromosomes based on Hi-C interaction data (Fig. 1b, Table 2). The resulting chromosome number is consistent with the karyotypic data previously reported for Electrophorus electricus14,15. The final genome assembly spanned 666.91 Mb, with a scaffold N50 of 28.42 Mb and a contig N50 of 1.54 Mb (Fig. 2, Table 3).
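The two summary statistics used in this subsection, the N50 and the k-mer-based genome size estimate, can be illustrated with a minimal Python sketch. It is not the pipeline used in this study: the function names, the error cutoff `min_depth`, and the toy inputs are assumptions made for illustration, and the estimator is the simple "total k-mers divided by main peak depth" formula rather than a full model that accounts for heterozygosity and repeats.

```python
from typing import Dict, Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50: the length L such that sequences of length >= L
    together cover at least half of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    return 0  # empty input


def kmer_genome_size(histogram: Dict[int, int], min_depth: int = 3) -> float:
    """Estimate genome size as (total k-mers) / (depth of the main peak).

    `histogram` maps k-mer depth -> number of distinct k-mers at that depth,
    i.e. the two columns of a k-mer count histogram. Depths below `min_depth`
    are treated as sequencing errors and ignored; the cutoff of 3 is a
    hypothetical default, not a value taken from the study.
    """
    kept = {depth: count for depth, count in histogram.items() if depth >= min_depth}
    peak_depth = max(kept, key=kept.get)
    total_kmers = sum(depth * count for depth, count in kept.items())
    return total_kmers / peak_depth


if __name__ == "__main__":
    # Toy values only; these are not E. voltai data.
    print(n50([50, 40, 30, 20, 10]))
    print(kmer_genome_size({1: 1000, 2: 100, 9: 50, 10: 200, 11: 50}))
```

Applied to real data, `n50` would take the contig or scaffold lengths of the assembly, and `kmer_genome_size` the 17-mer histogram from the k-mer counter.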

Table 2 Chromosome-level assembly statistics of the E. voltai genome based on Hi-C scaffolding.
Fig. 2

Circos plot illustrating the genomic landscape of E. voltai. From the outermost to innermost rings: (a) chromosome layout (Chr1-Chr26), (b) distribution of gene density (0–100), (c) repeat element coverage (0–55), (d) localization of non-coding RNAs, including rRNAs (black), snRNAs (red), tRNAs (green), and miRNAs (blue), and (e) GC ratio (30%–55%).

Table 3 Summary of the assembled genome and annotation features of E. voltai.

Repeat identification

Repetitive elements in the E. voltai genome were identified using a combination of ab initio and homology-based approaches. For ab initio prediction, RepeatModeler v1.0.1116 and LTR-FINDER v1.0.517 were used to construct a de novo repeat library, which was subsequently applied to the genome using RepeatMasker v4.0.918. Tandem repeats were detected with Tandem Repeats Finder (TRF) v4.0919. For homology-based detection, genome sequences were aligned to the RepBase database (http://www.girinst.org/repbase) using both RepeatMasker v4.0.918 and RepeatProteinMask v4.0.918. By integrating the results from both ab initio and homology-based methods, a total of 298.83 Mb of repetitive sequences were identified, accounting for 43.09% of the genome (Table 4). Among these, transposable elements (TEs) constituted 37.35% of the genome.
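A repeat fraction of this kind can be checked directly from a masked assembly. The sketch below assumes a soft-masked FASTA file, in which repeats are written in lowercase (as produced, for example, by RepeatMasker's `-xsmall` option); the source does not state which masking mode was used, and the function name is hypothetical. Hard-masked bases written as uppercase `N` would not be counted.

```python
from typing import Iterable, Tuple


def masked_fraction(fasta_lines: Iterable[str]) -> Tuple[int, int, float]:
    """Count soft-masked (lowercase) bases in FASTA text.

    Returns (masked_bases, total_bases, masked_percent).
    Assumes repeats are marked in lowercase; header lines start with '>'.
    """
    masked = 0
    total = 0
    for line in fasta_lines:
        line = line.strip()
        if not line or line.startswith(">"):
            continue  # skip blank lines and sequence headers
        total += len(line)
        masked += sum(1 for base in line if base.islower())
    percent = 100.0 * masked / total if total else 0.0
    return masked, total, percent


if __name__ == "__main__":
    # Toy record only; not E. voltai sequence.
    toy = [">chr_toy", "ACGTacgt", "GGCCaatt"]
    print(masked_fraction(toy))
```

For a real assembly, the lines would come from an open file handle, for example `with open(path) as handle: masked_fraction(handle)`, where `path` points to the masked FASTA.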

Table 4 Repetitive sequences identified in the E. voltai genome.

Gene annotation

Gene prediction for E. voltai was conducted using an integrative approach combining homology-based, transcriptome-based, and ab initio methods. For homology-based annotation, protein sequences from four related species, E. electricus1, Ictalurus punctatus20, Silurus meridionalis21, and Danio rerio22, were retrieved from NCBI. These sequences were aligned to the E. voltai genome using BLAST v2.11.0+23 with an e-value cutoff of 1e–5, and gene structures were further refined using Exonerate v2.4.024. For transcriptome-based prediction, raw RNA-seq reads were filtered using SOAPnuke v2.1.025 with parameters -lowQual = 20 -nRate = 0.005 -qualRate = 0.5. Clean reads were assembled de novo with Trinity v2.8.526; they were also aligned to the assembled genome using HISAT2 v2.2.127, and the alignments were assembled into transcripts with StringTie v2.1.728. Ab initio gene prediction was performed with Augustus v3.429 and Genscan v1.030. Gene models derived from the three strategies were integrated into a non-redundant reference gene set using MAKER v3.01.0331, followed by refinement with PASA v2.4.132. In total, we predicted 23,221 protein-coding genes (Table 5). Functional annotation was performed using DIAMOND v2.0.733 by aligning the predicted proteins against the InterPro34, SwissProt, TrEMBL35, NR, GO36, and KEGG37 databases. As a result, 22,306 (96.06%) of the predicted genes received functional annotations (Table 6).
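The e-value cutoff used in the homology-based step can be illustrated with a short Python sketch that filters BLAST tabular output. It assumes the default 12-column tabular layout (`-outfmt 6`), in which `evalue` is the eleventh column; the source does not state which output format was used, and the function name and the toy hits are hypothetical.

```python
from typing import Iterable, Iterator, List

# 0-based index of `evalue` in the default 12-column tabular layout:
# qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore
EVALUE_COLUMN = 10


def filter_hits(lines: Iterable[str], max_evalue: float = 1e-5) -> Iterator[List[str]]:
    """Yield tab-split hits whose e-value is at or below `max_evalue`."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        if float(fields[EVALUE_COLUMN]) <= max_evalue:
            yield fields


if __name__ == "__main__":
    # Toy hits only; identifiers and scores are made up for illustration.
    toy = [
        "protA\tchr1\t92.5\t200\t15\t0\t1\t200\t1000\t1600\t1e-30\t350",
        "protB\tchr2\t35.0\t60\t39\t0\t5\t64\t500\t680\t0.01\t40.1",
    ]
    for hit in filter_hits(toy):
        print(hit[0], hit[1], hit[EVALUE_COLUMN])
```

In practice the same threshold is usually applied at search time through BLAST's `-evalue` option, so a post hoc filter like this one is mainly useful when re-screening existing results with a stricter cutoff.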

Table 5 Gene prediction results for the E. voltai genome.
Table 6 Annotation results of the E. voltai genome.

Non-coding RNA annotation

Non-coding RNAs (ncRNAs) were annotated using tools appropriate for each ncRNA class. Transfer RNAs (tRNAs) were identified using tRNAscan-SE v2.0.938, while ribosomal RNAs (rRNAs) were predicted with RNAmmer v1.239. MicroRNAs (miRNAs) and small nuclear RNAs (snRNAs) were identified by searching the genome against the Rfam database41 with INFERNAL v1.1.440. In total, 7,279 ncRNA genes were annotated in the E. voltai genome, including 225 miRNAs, 5,409 tRNAs, 576 rRNAs and 1,069 snRNAs (Table 7).

Table 7 Annotation of non-coding RNAs in the E. voltai genome.

Data Records

All raw sequencing data generated in this study have been deposited in the NCBI SRA database under the BioProject accession number PRJNA946944. Specifically, short-read data (SRR33386860)42, PacBio HiFi reads (SRR33386859)43, Hi-C reads (SRR33386858)44, and RNA-seq reads (SRR33386857)45 are publicly available. The final chromosome-level genome assembly has been deposited in the GenBank database under accession number JAROKS00000000046, and the corresponding genome annotation files are available on Figshare47.

Technical Validation

The E. voltai genome assembly was assessed for completeness and annotation quality through four independent methods. First, assessment of completeness using BUSCO v5.2.248 demonstrated high completeness levels for both the genome assembly (95.80%) and the predicted gene set (95.50%) (Table 3). Second, 99.94% of the PacBio HiFi clean reads were successfully mapped back to the assembly using minimap2 v2.2149, indicating high assembly accuracy. Third, to assess the accuracy and reliability of gene prediction, we compared the distributions of gene length, coding sequence (CDS) length, exon length, and intron length between E. voltai and four phylogenetically related species (E. electricus, I. punctatus, S. meridionalis, and D. rerio). The consistent distribution patterns across these species further validated the quality of the annotated gene dataset for E. voltai (Fig. 3). Fourth, we conducted a genomic collinearity analysis between E. voltai and E. electricus using the jcvi toolkit (Python version of MCScan50; https://github.com/tanghaibao/jcvi/wiki/MCscan-(Python-version)). The analysis revealed a high degree of one-to-one correspondence at the chromosomal level, with only minor structural differences detected on chromosomes 4, 5, 8, 9, 14, 16, and 17 (Fig. 4). We also confirmed that key genes associated with electrogenesis, including voltage-gated sodium channel genes (scn4aa, scn4ab), acetylcholinesterase (ache), ATPase subunit genes (atp1a2, atp1b1b), and other related genes, are accurately annotated in the assembly. Among these, scn4aa is reported to play a key role in high-voltage generation in electric fishes and is particularly highly expressed in the electric organ1,2,3,4,9. Together, these results support the quality and reliability of the genome and its utility for future comparative and evolutionary studies.

Fig. 3

Comparative analysis of genomic features across E. voltai and four closely related species. (a) Distribution of gene lengths, highlighting interspecific differences in overall gene architecture. (b) Comparison of coding sequence (CDS) lengths, illustrating variation in protein-coding region sizes. (c) Exon length distributions across species, showing differences in exon structural composition. (d) Intron length comparisons, indicating species-specific variation in non-coding genomic regions.

Fig. 4

Chromosomal synteny between E. voltai and E. electricus. Each colored block represents a chromosome.