Chromosome-scale genome assembly and gene annotation of the Alligator Gar (Atractosteus spatula)

Wang, Qing; Yu, Qianqian; Dong, Xiangqian; Chen, Hengjin; Tian, Xue; Qi, Peng; Wu, Haitao; Yuan, Yuxiang

doi:10.1038/s41597-024-04256-2

Data Descriptor
Open access
Published: 18 December 2024

Qing Wang (ORCID: orcid.org/0000-0003-4744-8729)^1,2^, Qianqian Yu^1,3^, Xiangqian Dong¹, Hengjin Chen¹, Xue Tian¹, Peng Qi¹, Haitao Wu¹ & Yuxiang Yuan¹

Scientific Data volume 11, Article number: 1337 (2024)

Abstract

Given the aggressive nature and robust survival capabilities of the alligator gar (Atractosteus spatula), its establishment as an invasive species in a new environment could severely disrupt the invaded ecosystem. Because the continuity and completeness of the existing draft genome are suboptimal, this study presents an updated, high-quality chromosome-level genome of the alligator gar, assembled using Oxford Nanopore Technology (ONT) long reads and chromatin interaction mapping (Hi-C) sequencing. The assembled genome is 1.05 Gb in size, with a contig N50 of 15.7 Mb and a scaffold N50 of 56.8 Mb, and 98.26% of the assembled bases are anchored to 28 pseudochromosomes. The BUSCO completeness of the final chromosome-level genome reaches 96.7%. A total of 19,103 protein-coding genes were predicted, of which 99.83% were functionally annotated. This high-quality chromosome-level genome provides a valuable resource for functional genomics, studies of chromosome evolution, and population management of this species.

Background & Summary

The alligator gar (Atractosteus spatula, Lacepède 1803) is one of seven extant species of the ancient family Lepisosteidae, which comprises two genera: Atractosteus, with three species (tropical, Cuban, and alligator gars), and Lepisosteus, with four species (spotted, Florida, longnose, and shortnose gars)¹. The slowly evolving genomes of gars have attracted increasing attention from researchers in recent years. Studies of the spotted gar (Lepisosteus oculatus) genome have revealed the value of holostean genomes in comparative research, offering significant insights into the evolution of vertebrate immunity, development, and the roles of regulatory sequences². The draft genome of the alligator gar was used to examine the vertebrate transition from aquatic to terrestrial environments³. More recently, the genome of the longnose gar (Lepisosteus osseus) highlighted the potential of holostean genomes for understanding the evolution of vertebrate repetitive elements and provided a critical reference for comparative genomic studies using ray-finned fish models⁴.

Recent reports of alligator gar invasions underscore the urgency of prioritizing its management. The alligator gar is native to the southern United States and eastern Mexico^5,6, but it has been distributed to numerous countries through the aquarium trade. It is considered invasive in China, Singapore, Indonesia, Turkmenistan, and several other nations, and has been recorded at 47,287 locations worldwide^7,8. In China, it was initially detected in Baiyun Lake, Guangzhou, Guangdong Province, in February 2019, and subsequently spread to provinces including Hunan, Guangxi, Shandong, Sichuan, Qinghai, Jiangsu, and Yunnan⁷.

Three inherent biological traits give alligator gars a strong survival advantage in freshwater ecosystems. First, they have a large body size and overlapping ganoid scales. As the largest species in the gar family, adult alligator gars typically reach about 2 m (6.5 feet) in length and over 45 kg (100 pounds) in weight⁹. Their tough, bone-like scales are covered by an enamel-like substance, making them nearly impenetrable^10,11. Second, alligator gars are highly fecund and produce toxic eggs. Compared with the spotted gar and longnose gar, the alligator gar lays the greatest number of eggs per gram of body weight¹², and its eggs and yolk sacs are extremely toxic to crustaceans and to vertebrates other than teleosts¹³. Finally, alligator gars are voracious predators that feed on blue crabs, waterfowl, turtles, small mammals, carrion, and discarded waste around docks and piers¹⁴.

Modern high-quality sequencing technologies offer great potential for unraveling the genetic basis of biological traits at the genome-wide level in many species^15,16. Although a draft genome of the alligator gar based on second-generation (short-read) sequencing is available, its fragmentation largely limits studies of speciation and chromosome evolution³. In this study, we generated a highly contiguous, complete, and accurate chromosome-level genome assembly of the alligator gar using Oxford Nanopore Technology and Hi-C sequencing. The assembled genome was 1.05 Gb, with a contig N50 of 15.7 Mb and a scaffold N50 of 56.8 Mb. Using the Hi-C data, the sequences were further clustered and ordered into 28 pseudochromosomes (2n = 56; lengths from 10.2 Mb to 76.3 Mb). Approximately 323 Mb of sequence, constituting 30.91% of the genome, was annotated as repetitive elements. We predicted 19,103 protein-coding genes, of which 99.83% were functionally annotated. The genomic resources presented here will deepen our understanding of the genomic basis of the ecology, evolution, and invasiveness of alligator gars.

Methods

Sample collection and ethics statement

A female alligator gar aged approximately six months to one year, with a body weight of 2.8 kg and a body length of 41.3 cm, was collected by the Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Jilin Province, China (Supplementary Fig. 1). The fish was captured in Dehui, Jilin Province, China, using standard mini-fyke nets (0.6 m × 1.2 m frame, 4.6-m-long lead, and 3-mm mesh¹⁷) and electrofishing, and was then held in a live well for further processing. The otolith was extracted to determine the fish's age¹⁸. All experiments on the alligator gar were approved by the Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, under project ID “DLS20220131-001”.

Nucleic acid extraction, library construction and sequencing

For the Nanopore library, a total of 8–10 µg of high-quality genomic DNA was extracted from a muscle sample, and DNA molecules longer than 50 kb were size-selected with BluePippin (Sage Science, Beverly, MA, USA). A standard library was constructed with the Ligation Sequencing Kit 1D following the Nanopore library construction protocol, and ONT long reads were sequenced on a PromethION P48 sequencer (Grandomics, Beijing, China). RNA was extracted from three tissues (blood, muscle, and skin) of the same individual using TRIzol reagent (Invitrogen, USA) according to the manufacturer’s instructions. cDNA libraries were constructed by reverse transcription of 200–400 bp RNA fragments and sequenced on the Illumina NovaSeq 6000 platform (Grandomics, Beijing, China). For short-insert whole-genome sequencing (WGS), total genomic DNA was isolated from muscle (~2 g) using a phenol-chloroform protocol with ethanol precipitation¹⁹; DNBSEQ libraries were prepared following the manufacturer’s instructions and sequenced on a DNBSEQ-T1 sequencer (MGI Tech, Guangdong, China) to generate 100 bp paired-end reads. For Hi-C library construction, freshly collected liver samples (~2 g) were crosslinked with formaldehyde to fix the chromatin conformation, and the crosslinked DNA was digested with the DpnII restriction endonuclease. The Hi-C library, with a 350 bp insert size, was sequenced on a DNBSEQ-T7 sequencer (Grandomics, Beijing, China).

To obtain a high-quality reference genome of the alligator gar, we generated 65.43 Gb of raw ONT long reads (60.68 Gb of pass reads; 57.8-fold coverage) for de novo assembly (Table 1). The long reads had an average length of 18.0 kb and an N50 of 30.5 kb. We also generated 206.46 Gb of whole-genome sequencing (WGS) short-read data (196.6-fold coverage) for the genome survey and polishing, and 8.82 Gb of transcriptomic data for genome annotation (Table 1).
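The fold-coverage figures above are the sequenced bases divided by the genome size. A minimal Python sketch reproducing them, assuming the 1.05 Gb assembly size (rather than the 1.19 Gb k-mer estimate) is the denominator, which is what the reported values imply; the function name is illustrative:

```python
def fold_coverage(data_gb: float, genome_size_gb: float) -> float:
    """Sequencing depth (x-fold) = total sequenced bases / genome size."""
    if genome_size_gb <= 0:
        raise ValueError("genome size must be positive")
    return data_gb / genome_size_gb


# Assumption: the 1.05 Gb assembled size is used as the genome size.
ASSEMBLY_SIZE_GB = 1.05

if __name__ == "__main__":
    for label, gb in [("ONT pass reads", 60.68), ("DNBSEQ WGS reads", 206.46)]:
        print(f"{label}: {fold_coverage(gb, ASSEMBLY_SIZE_GB):.1f}x")
```

This prints 57.8x and 196.6x, matching the depths quoted in the text.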

Table 1 Summary of genome assemblies and gene annotations in the alligator gar genome.

Genome size estimation and de novo assembly

Before de novo assembly, we performed a genome survey to estimate the genome size from the DNBSEQ short reads using KmerFreq v1.0²⁰ with a k-mer size of 17. De novo assembly was performed with NextDenovo v2.5.0 (https://github.com/Nextomics/NextDenovo) using default parameters; its two core modules, NextCorrect and NextGraph, were used to correct the raw Nanopore reads into consensus sequences and to construct the initial assembly, respectively. The single-base accuracy of the draft assembly was then improved by six rounds of polishing with NextPolish v1.4.0²¹ using both ONT long reads and DNBSEQ short reads. For chromosome-level scaffolding, Hi-C reads were aligned to the polished assembly using the Burrows-Wheeler Aligner (BWA, v0.7.17)²². Juicer v1.5²³ was used for Hi-C data quality control, the 3D-DNA pipeline v190716²⁴ was applied to anchor the scaffolds into chromosome-level sequences, and Juicebox v1.11.08²⁵ was used for final manual correction.
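The genome survey rests on a standard k-mer argument: for error-free reads from a largely homozygous genome, the genome size is roughly the total k-mer count divided by the depth of the main peak of the k-mer frequency histogram. The Python sketch below illustrates that estimate on toy data. It is not KmerFreq, it ignores heterozygosity and repeat corrections, and the function names and the `min_depth` error cutoff are illustrative assumptions.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_VALID = frozenset("ACGT")


def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    revcomp = kmer.translate(_COMPLEMENT)[::-1]
    return min(kmer, revcomp)


def kmer_histogram(reads, k: int) -> Counter:
    """Map depth -> number of distinct canonical k-mers observed at that depth."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= _VALID:  # skip k-mers containing N or other symbols
                counts[canonical(kmer)] += 1
    return Counter(counts.values())


def estimate_genome_size(hist: Counter, min_depth: int = 2) -> float:
    """Genome size ~ total k-mers / depth of the main histogram peak.

    Depths below `min_depth` are dropped as likely sequencing errors
    (an illustrative cutoff, not a KmerFreq parameter).
    """
    usable = {depth: n for depth, n in hist.items() if depth >= min_depth}
    if not usable:
        raise ValueError("no k-mers above the error cutoff")
    peak_depth = max(usable, key=usable.get)
    total_kmers = sum(depth * n for depth, n in usable.items())
    return total_kmers / peak_depth
```

On real data the histogram would come from a dedicated k-mer counter rather than a Python dictionary; the same total/peak ratio applied to the 17-mer histogram is what yields an estimate such as the one in Fig. 1a.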

The genome size was estimated at 1.19 Gb from the 17-mer frequency distribution (Fig. 1a). A total of 107.81 Gb of Hi-C reads was generated to scaffold the primary contigs into a chromosome-level assembly. Hi-C-based clustering then anchored 98.26% of the assembled bases to 28 pseudochromosomes (2n = 56; lengths from 10.2 Mb to 76.3 Mb; Table 2 and Supplementary Fig. 2), consistent with the karyotype reported by Echelle and Grande²⁶. The remaining 11 unplaced scaffolds had a total length of 18.2 Mb (1.74%; lengths from 5 kb to 5.7 Mb). In summary, the final alligator gar genome was 1.05 Gb in length, with a contig N50 of 15.7 Mb and a scaffold N50 of 56.8 Mb. BUSCO analysis showed that the completeness of the final chromosome-level genome reached 96.7% (96% complete and single-copy; Fig. 1b).
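Contig N50 and scaffold N50 are the same statistic applied to different sequence sets: sort the lengths in decreasing order and report the length at which the running total first reaches half of the assembly size. A small Python sketch of this definition (the helper name `nx` is our own, and the toy lengths are illustrative, not data from this assembly):

```python
def nx(lengths, fraction: float = 0.5) -> int:
    """Return the Nx statistic (N50 by default).

    Sequences are sorted longest first; the result is the length of the
    sequence at which the cumulative length first reaches `fraction`
    of the total assembly length.
    """
    if not lengths:
        raise ValueError("no sequence lengths given")
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    target = fraction * sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= target:
            return length


if __name__ == "__main__":
    toy = [2, 3, 4, 5, 6, 7, 8, 9, 10]
    print("N50:", nx(toy), "N90:", nx(toy, 0.9))
```

Applied to contig lengths this gives the contig N50; applied to scaffold lengths (pseudochromosomes plus unplaced scaffolds), the scaffold N50.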

Table 2 Statistical results of the 28 pseudochromosomes of the alligator gar genome.

By combining third-generation ONT long reads with large-scale Hi-C data, the chromosome-level genome assembled in this study shows substantial improvements over the previously released assembly (GCA_016984175.1; Supplementary Table 1)³: (1) the number of scaffolds was reduced from 81,747 to 39; (2) the contig and scaffold N50 values increased 785-fold and 41-fold, respectively; (3) the proportion of gap bases (Ns) fell from 5.967% to 0.013%, a 459-fold reduction; and (4) the BUSCO score increased by 2.1%, indicating a more complete assembly. These results demonstrate the reliability of our chromosome-level genome assembly. Furthermore, combining homology-based protein alignment, de novo prediction, and transcriptome mapping yielded more protein-coding genes (PCGs) than were reported for the previously published alligator gar genome (GCA_016984175.1; 18,839 PCGs)³, suggesting that the improved contiguity and integrity of our assembly allowed more precise gene prediction.
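The gap-content comparison in point (3) is simple arithmetic: the gap fraction is the share of N bases in the assembly, and the fold reduction is the ratio of the two percentages. A minimal Python sketch, assuming gaps are encoded as runs of `N`/`n` as is usual in FASTA assemblies; the function names are hypothetical:

```python
def gap_percent(sequences) -> float:
    """Percentage of assembly bases that are gap characters (N or n)."""
    total = sum(len(seq) for seq in sequences)
    if total == 0:
        raise ValueError("empty assembly")
    gaps = sum(seq.upper().count("N") for seq in sequences)
    return 100.0 * gaps / total


def fold_change(before: float, after: float) -> float:
    """Ratio of an old value to a new one (e.g. old vs new gap percentage)."""
    return before / after


if __name__ == "__main__":
    print(f"{round(fold_change(5.967, 0.013))}-fold reduction in gap content")
```

With the percentages reported above, `round(fold_change(5.967, 0.013))` gives 459.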

Genome annotation

Prior to gene prediction and annotation, repetitive elements were annotated by integrating homology-based and de novo strategies. For the de novo strategy, RepeatModeler v2.0²⁷ and LTR_retriever²⁸ were used to build a repeat library, which was then combined with the known repeat database Repbase v21.01²⁹. The genome was then searched at the DNA and protein levels using RepeatMasker v4.0.5³⁰ and RepeatProteinMask, and tandem repeats were identified with Tandem Repeats Finder (TRF) v4.07b³¹. The results were merged into a non-redundant repeat set. Gene density, GC content, repeat number, and LTR number along the 28 chromosomes were calculated in non-overlapping 500 kb windows and visualized with Circos v0.69-8³².
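The per-window summaries behind the Circos plot can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' pipeline: it assumes non-overlapping windows starting at position 0, excludes N bases from the GC denominator, and counts a feature (for example a gene or an LTR) in the window that contains its 0-based start coordinate; all of these conventions are our assumptions.

```python
def gc_per_window(seq: str, window: int = 500_000):
    """GC fraction in non-overlapping windows of `seq`.

    Returns (start, end, gc_fraction) tuples with 0-based, half-open
    coordinates. N bases are excluded from the denominator; a window
    with no A/C/G/T bases gets NaN. The last window may be shorter.
    """
    results = []
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window].upper()
        gc = chunk.count("G") + chunk.count("C")
        acgt = gc + chunk.count("A") + chunk.count("T")
        results.append((start, start + len(chunk), gc / acgt if acgt else float("nan")))
    return results


def features_per_window(starts, seq_len: int, window: int = 500_000):
    """Count features (e.g. genes or LTRs) by the window holding their 0-based start."""
    counts = [0] * ((seq_len + window - 1) // window)
    for pos in starts:
        if 0 <= pos < seq_len:
            counts[pos // window] += 1
    return counts
```

Both functions default to the 500 kb window size used in the text; the resulting per-window values are the kind of track a Circos configuration would then plot.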

After the repeat elements were masked, three strategies were used to predict protein-coding genes. First, de novo prediction was performed with Augustus v3.0.3³³. Second, for the transcriptome-based strategy, transcripts were assembled from the RNA-seq data using StringTie v1.3.3b³⁴. Third, for the homology-based strategy, protein sequences of the spotted gar (LepOcu1, RefSeq accession GCF_000242695.1)², coelacanth (Latimeria chalumnae, LatCha1, GCF_000225785.1)³⁵, bichir (Polypterus senegalus, ASM1683550v1, GCF_016835505.1)³, and paddlefish (Polyodon spathula, ASM1765450v1, GCF_017654505.1)³⁶ were aligned to the alligator gar genome using TBLASTN v2.9.0³⁷, and gene structures were predicted with GeneWise v2.4.1³⁸ (E-value cutoff of 1e-5; Supplementary Table 2). The final protein-coding gene set was generated by integrating the results of these three strategies with the MAKER pipeline v3.01.03³⁹. For functional annotation, the gene set was searched against five public resources (Swiss-Prot, TrEMBL, InterPro, GO, and KEGG) using BLAST v2.2.26³⁷ and InterProScan v5.52-86.0⁴⁰ (E-value cutoff of 1e-5). For ncRNA prediction, miRNAs and snRNAs were identified by searching the Rfam database (Release 12.0)⁴¹, tRNA genes were predicted with tRNAscan-SE v1.3.1⁴², and rRNA genes were identified by aligning human rRNA sequences to the genome with BLAST.
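The E-value cutoff used for functional annotation amounts to a filter on BLAST tabular output. The Python sketch below assumes the standard 12-column tabular format (`-m 8` in legacy blastall, `-outfmt 6` in BLAST+) and keeps, for each query gene, the hit with the highest bit score among those with E ≤ 1e-5. The best-hit-by-bit-score rule is a common convention we assume here, since the text states only the cutoff, and `best_hits` is a hypothetical helper.

```python
import csv
import io


def best_hits(tabular_text: str, max_evalue: float = 1e-5):
    """Best BLAST hit per query from 12-column tabular output.

    Columns: qseqid sseqid pident length mismatch gapopen
             qstart qend sstart send evalue bitscore
    Hits with E-value above `max_evalue` are discarded; among the rest,
    the hit with the highest bit score is kept for each query.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best
```

A gene with at least one surviving hit in any database would count as functionally annotated; genes absent from every result dictionary would not.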

Combining the de novo and homology-based predictions, we identified 323 Mb of repetitive sequence, accounting for 30.91% of the alligator gar genome (Supplementary Table 2). The predominant repeat types were long terminal repeat (LTR) retrotransposons (11.73%), long interspersed nuclear elements (LINEs, 5.69%), and DNA transposons (4.96%) (Table 3, Supplementary Table 3, and Supplementary Fig. 3).
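A non-redundant repeat total requires merging overlapping annotations before summing; otherwise a region flagged by several tools would be counted more than once. A minimal Python sketch of that merge, assuming half-open [start, end) coordinates on a single sequence; the function names, tool labels, and coordinates in the example are hypothetical:

```python
def merge_intervals(intervals):
    """Merge overlapping or touching half-open [start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(start, end) for start, end in merged]


def nonredundant_repeat_length(intervals_by_tool, genome_length: int):
    """Bases covered by any tool's repeat call (overlaps counted once),
    and that total as a fraction of `genome_length`."""
    pooled = [iv for ivs in intervals_by_tool.values() for iv in ivs]
    covered = sum(end - start for start, end in merge_intervals(pooled))
    return covered, covered / genome_length


if __name__ == "__main__":
    calls = {
        "RepeatMasker": [(0, 100), (200, 300)],
        "TRF": [(50, 150)],
        "RepeatProteinMask": [(290, 400)],
    }
    print(nonredundant_repeat_length(calls, 1000))
```

Here the four raw calls cover 410 bp in total, but only 350 bp once overlaps are merged.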

Table 3 Statistics of repeat elements identified by the de novo method.

We predicted 19,103 protein-coding genes (PCGs) by combining homology-based protein alignment, de novo prediction, and transcriptome mapping. The average lengths of PCGs, exons, and introns were 2,120.55 bp, 166.71 bp, and 2,231.85 bp, respectively (Table 1 and Fig. 2). Of these PCGs, 19,070 (99.83%) received at least one functional annotation (Table 4 and Fig. 3). We further plotted the distribution of gene density, GC content, and repeat density across the 28 pseudochromosomes (Fig. 1c). We also predicted 22,559 non-coding RNAs (ncRNAs), including 191 microRNAs (miRNAs), 10,015 transfer RNAs (tRNAs), 9,524 ribosomal RNAs (rRNAs), and 2,829 small nuclear RNAs (snRNAs) (Table 5).
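Average gene, exon, and intron lengths are derived from the exon coordinates of each gene model. The Python sketch below assumes 1-based inclusive coordinates (the GFF convention) and one representative transcript per gene, and it reports both the genomic span and the summed exon length, since "gene length" refers to either quantity in different pipelines. The input dictionary is a hypothetical simplification of a GFF file.

```python
from statistics import mean


def structure_stats(transcripts):
    """Mean gene span, summed-exon length, exon length and intron length.

    `transcripts` maps a transcript ID to its exons as (start, end) pairs,
    1-based and inclusive as in GFF; exons need not be pre-sorted.
    """
    spans, exonic, exon_lens, intron_lens = [], [], [], []
    for exons in transcripts.values():
        exons = sorted(exons)
        spans.append(exons[-1][1] - exons[0][0] + 1)
        lengths = [end - start + 1 for start, end in exons]
        exonic.append(sum(lengths))
        exon_lens.extend(lengths)
        # Intron length = bases strictly between consecutive exons.
        intron_lens.extend(
            next_start - prev_end - 1
            for (_, prev_end), (next_start, _) in zip(exons, exons[1:])
        )
    return {
        "mean_span": mean(spans),
        "mean_exonic_length": mean(exonic),
        "mean_exon": mean(exon_lens),
        "mean_intron": mean(intron_lens) if intron_lens else 0.0,
    }
```

Single-exon genes contribute to the exon averages but add no introns, which is one reason intron and gene averages are not simply related.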

Table 4 Statistics on functional annotation of the alligator gar gene set.

Table 5 Statistics of non-coding RNA annotation.

Data Records

The raw sequencing data for this study have been deposited in NCBI under BioProject PRJNA1161041⁴³. The short-read, transcriptome, and Nanopore long-read sequencing data are available in the Sequence Read Archive (SRA) under accession number SRP537046⁴⁴. The assembled genome has been deposited in GenBank under accession number GCA_043380575.1⁴⁵. The assembled genome and annotations can also be downloaded from Figshare⁴⁶ (https://doi.org/10.6084/m9.figshare.27193392).

All assemblies and raw sequencing data generated in this study have also been deposited in the CNGB Sequence Archive (CNSA)⁴⁷ (https://db.cngb.org/cnsa/) of the China National GeneBank DataBase (CNGBdb)⁴⁸ under accession number CNP0003816.

Technical Validation

Benchmarking Universal Single-Copy Orthologs (BUSCO) v3.1.0⁴⁹ was run in genome mode (-m genome) with the 3,354 core vertebrate genes of the vertebrata_odb10 dataset to evaluate the completeness of the draft, polished, and final chromosome-level genomes. The completeness of the final chromosome-level genome reached 96.7% (96% complete and single-copy). The completeness of the predicted gene set was also evaluated against vertebrata_odb10 using the protein mode (-m protein) of BUSCO: 94.5% of the core vertebrate genes were complete, 2.5% were fragmented, and 3.0% were missing. Furthermore, the total size of the assembled genome (1.05 Gb) is broadly consistent with the size estimated by k-mer analysis (1.19 Gb).
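BUSCO percentages are category counts divided by the number of lineage genes (n = 3,354 for vertebrata_odb10), with "complete" being single-copy plus duplicated. The Python sketch below formats counts into a one-line summary in the style BUSCO reports; the function is a hypothetical helper rather than part of BUSCO, and the counts in the example are illustrative, not the values from this assembly.

```python
def busco_summary(single: int, duplicated: int, fragmented: int, missing: int) -> str:
    """One-line BUSCO-style summary; complete = single-copy + duplicated."""
    n = single + duplicated + fragmented + missing
    if n == 0:
        raise ValueError("no BUSCO genes counted")

    def pct(count: int) -> str:
        return f"{100.0 * count / n:.1f}%"

    return (
        f"C:{pct(single + duplicated)}[S:{pct(single)},D:{pct(duplicated)}],"
        f"F:{pct(fragmented)},M:{pct(missing)},n:{n}"
    )


if __name__ == "__main__":
    print(busco_summary(80, 10, 5, 5))
```

Because the four categories partition the lineage gene set, the complete, fragmented, and missing percentages always sum to 100% (up to rounding), as with the 94.5%, 2.5%, and 3.0% reported for the gene set.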

Code availability

No custom scripts were used in this work. All data processing was performed following the manuals and protocols of the corresponding bioinformatics software; the software versions are given in the Methods.

References

1. Wright, J. J., David, S. R. & Near, T. J. Gene trees, species trees, and morphology converge on a similar phylogeny of living gars (Actinopterygii: Holostei: Lepisosteidae), an ancient clade of ray-finned fishes. Molecular Phylogenetics and Evolution 63, 848–856 (2012).
2. Braasch, I. et al. The spotted gar genome illuminates vertebrate evolution and facilitates human-teleost comparisons. Nature Genetics 48, 427–437 (2016).
3. Bi, X. et al. Tracing the genetic footprints of vertebrate landing in non-teleost ray-finned fishes. Cell 184, 1377–1391.e14 (2021).
4. Mallik, R. et al. A chromosome-level genome assembly of longnose gar, Lepisosteus osseus. G3 (Bethesda) 13, https://doi.org/10.1093/g3journal/jkad095 (2023).
5. Raz-Guzmán, A., Huidobro, L. & Padilla, V. An updated checklist and characterisation of the ichthyofauna (Elasmobranchii and Actinopterygii) of the Laguna de Tamiahua, Veracruz, Mexico. Acta Ichthyologica et Piscatoria 48 (2018).
6. Warren, M. L. Jr et al. Diversity, distribution, and conservation status of the native freshwater fishes of the southern United States. Fisheries 25, 7–31 (2000).
7. Li, M. & Zhang, H. Predicting the Distribution of the Invasive Species Atractosteus spatula, the Alligator Gar, in China. Water 15, https://doi.org/10.3390/w15244291 (2023).
8. Kumar, A. B., Raj, S., Arjun, C., Katwate, U. & Raghavan, R. Jurassic invaders: flood-associated occurrence of arapaima and alligator gar in the rivers of Kerala. Current Science 116, 1628–1630 (2019).
9. Region, S., Sager, C. & Routledge, D. Lake Texoma Fisheries Management Plan.
10. Sherman, V. R., Yaraghi, N. A., Kisailus, D. & Meyers, M. A. Microstructural and geometric influences in the protective scales of Atractosteus spatula. Journal of the Royal Society Interface 13, 20160595 (2016).
11. Yang, W. et al. Structure and fracture resistance of alligator gar (Atractosteus spatula) armored fish scales. Acta Biomaterialia 9, 5876–5889 (2013).
12. DiBenedetto, K. C. Life history characteristics of alligator gar, Atractosteus spatula, in the Bayou DuLarge area of southcentral Louisiana (2009).
13. Goodger, W. P. & Burns, T. A. The cardiotoxic effects of alligator gar (Lepisosteus spatula) roe on the isolated turtle heart. Toxicon 18, 489–494 (1980).
14. O’Connell, M. T., Shepherd, T. D., O’Connell, A. M. & Myers, R. A. Long-term declines in two apex predators, bull sharks (Carcharhinus leucas) and alligator gar (Atractosteus spatula), in Lake Pontchartrain, an oligohaline estuary in southeastern Louisiana. Estuaries and Coasts 30, 567–574 (2007).
15. Lan, T. et al. The Chromosome-Scale Genome of the Raccoon Dog: Insights into the Genomic Basis of Invasiveness.
16. Li, H. et al. Chromosome-level Genome of the Muskrat (Ondatra zibethicus). Genome Biology and Evolution 14, evac138 (2022).
17. Eggleton, M. A., Jackson, J. R. & Lubinski, B. J. Comparison of Gears for Sampling Littoral-Zone Fishes in Floodplain Lakes of the Lower White River, Arkansas. North American Journal of Fisheries Management 30, 928–939, https://doi.org/10.1577/m09-127.1 (2011).
18. Smith, N. G. et al. Hydrologic Correlates of Reproductive Success in the Alligator Gar. North American Journal of Fisheries Management 40, 595–606, https://doi.org/10.1002/nafm.10442 (2020).
19. Sambrook, J., Fritsch, E. F. & Maniatis, T. Molecular Cloning: A Laboratory Manual (Cold Spring Harbor Laboratory Press, 1989).
20. Liu, B. et al. Estimation of genomic characteristics by analyzing k-mer frequency in de novo genome projects. arXiv preprint arXiv:1308.2012 (2013).
21. Hu, J., Fan, J., Sun, Z. & Liu, S. NextPolish: a fast and efficient genome polishing tool for long-read assembly. Bioinformatics (2020).
22. Li, H. & Durbin, R. Fast and accurate long-read alignment with Burrows–Wheeler transform. Bioinformatics 26, 589–595 (2010).
23. Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Systems 3, 95–98 (2016).
24. Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356, 92–95, https://doi.org/10.1126/science.aal3327 (2017).
25. Durand, N. C. et al. Juicebox provides a visualization system for Hi-C contact maps with unlimited zoom. Cell Systems 3, 99–101 (2016).
26. Echelle, A. A. & Grande, L. Lepisosteidae: gars. Freshwater Fishes of North America 1, 243–278 (2014).
27. Flynn, J. M. et al. RepeatModeler2 for automated genomic discovery of transposable element families. Proceedings of the National Academy of Sciences 117, 9451–9457 (2020).
28. Ou, S. & Jiang, N. LTR_retriever: a highly accurate and sensitive program for identification of long terminal repeat retrotransposons. Plant Physiology 176, 1410–1422 (2018).
29. Bao, W., Kojima, K. K. & Kohany, O. Repbase Update, a database of repetitive elements in eukaryotic genomes. Mobile DNA 6, 1–6 (2015).
30. Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Current Protocols in Bioinformatics 5, 4.10.1–4.10.14 (2004).
31. Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Research 27, 573–580 (1999).
32. Krzywinski, M. et al. Circos: an information aesthetic for comparative genomics. Genome Research 19, 1639–1645 (2009).
33. Stanke, M., Steinkamp, R., Waack, S. & Morgenstern, B. AUGUSTUS: a web server for gene finding in eukaryotes. Nucleic Acids Research 32, W309–W312 (2004).
34. Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nature Biotechnology 33, 290–295 (2015).
35. Amemiya, C. T. et al. The African coelacanth genome provides insights into tetrapod evolution. Nature 496, 311–316, https://doi.org/10.1038/nature12027 (2013).
36. Cheng, P. et al. The American Paddlefish Genome Provides Novel Insights into Chromosomal Evolution and Bone Mineralization in Early Vertebrates. Molecular Biology and Evolution 38, 1595–1607, https://doi.org/10.1093/molbev/msaa326 (2021).
37. Mount, D. W. Using the basic local alignment search tool (BLAST). Cold Spring Harbor Protocols 2007, pdb.top17 (2007).
38. Birney, E., Clamp, M. & Durbin, R. GeneWise and genomewise. Genome Research 14, 988–995 (2004).
39. Campbell, M. S., Holt, C., Moore, B. & Yandell, M. Genome annotation and curation using MAKER and MAKER-P. Current Protocols in Bioinformatics 48, 4.11.1–4.11.39 (2014).
40. Jones, P. et al. InterProScan 5: genome-scale protein function classification. Bioinformatics 30, 1236–1240 (2014).
41. Griffiths-Jones, S., Bateman, A., Marshall, M., Khanna, A. & Eddy, S. R. Rfam: an RNA family database. Nucleic Acids Research 31, 439–441 (2003).
42. Lowe, T. M. & Eddy, S. R. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Research 25, 955–964 (1997).
43. NCBI BioProject https://www.ncbi.nlm.nih.gov/bioproject/PRJNA1161041 (2024).
44. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP537046 (2024).
45. NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_043380575.1 (2024).
46. Wang, Q. & Yuan, Y. Chromosome-scale genome assembly and gene annotation of the Alligator Gar (Atractosteus spatula). figshare https://doi.org/10.6084/m9.figshare.27193392.v1 (2024).
47. Guo, X. et al. CNSA: a data repository for archiving omics data. Database 2020 (2020).
48. Chen, F. Z. et al. CNGBdb: China National GeneBank DataBase. Yi Chuan = Hereditas 42, 799–809 (2020).
49. Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).

Acknowledgements

This work was supported by the National Key Research and Development Program of China (2022YFF1300900), National Natural Science Foundation of China (42101071), and International Wetlands Research League, Alliance of International Science Organizations (ANSO-PA-2020-14).

Author information

These authors contributed equally: Qing Wang, Qianqian Yu.

Authors and Affiliations

Key Laboratory of Wetland Ecology and Environment & Heilongjiang Xingkai Lake Wetland Ecosystem National Observation and Research Station, Northeast Institute of Geography and Agroecology, Chinese Academy of Sciences, Changchun, China
Qing Wang, Qianqian Yu, Xiangqian Dong, Hengjin Chen, Xue Tian, Peng Qi, Haitao Wu & Yuxiang Yuan
School of Medicine, Nanjing University of Chinese Medicine, Nanjing, China
Qing Wang
College of Resources and Environment, University of Chinese Academy of Sciences, Beijing, China
Qianqian Yu

Contributions

Y.Y. conceived and designed the project. P.Q. and X.T. collected the samples. Q.Y. and H.W. performed the DNA and RNA extraction, library preparation, and genome sequencing. Q.W., Q.Y., X.D. and H.C. performed the bioinformatics analysis and visualized the results. Q.W. wrote the manuscript. P.Q., X.T. and Y.Y. revised and edited the manuscript. All authors have read and approved the final version of the manuscript.

Corresponding author

Correspondence to Yuxiang Yuan.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Wang, Q., Yu, Q., Dong, X. et al. Chromosome-scale genome assembly and gene annotation of the Alligator Gar (Atractosteus spatula). Sci Data 11, 1337 (2024). https://doi.org/10.1038/s41597-024-04256-2

Received: 03 November 2024
Accepted: 06 December 2024
Published: 18 December 2024
Version of record: 18 December 2024
DOI: https://doi.org/10.1038/s41597-024-04256-2