Chromosome-level reference genome and annotation of the Arctic fish Anisarchus medius

Liu, Ruoyu; Meng, Ziyu; Mu, Yinan; Zhang, Ran; Ma, Hanhui; Hu, Jingjing; Wang, Yanan; Shi, Yuxin; Li, Yanan; Wang, Chaofeng; Zhang, Weini; Lin, Longshan; Zheng, Ping; Chen, Xinhua

doi:10.1038/s41597-025-04419-9

Data Descriptor
Open access
Published: 14 January 2025

Ruoyu Liu ORCID: orcid.org/0000-0003-0853-9863¹,
Ziyu Meng¹,
Yinan Mu¹,
Ran Zhang²,
Hanhui Ma¹,
Jingjing Hu¹,
Yanan Wang¹,
Yuxin Shi¹,
Yanan Li³,
Chaofeng Wang¹,
Weini Zhang¹,
Longshan Lin²,
Ping Zheng ORCID: orcid.org/0000-0002-1236-9038¹ &
Xinhua Chen¹

Scientific Data volume 12, Article number: 68 (2025)

Abstract

Anisarchus medius (Reinhardt, 1837) is a widely distributed Arctic fish that serves as an indicator of climate change impacts on coastal Arctic ecosystems. This study presents a chromosome-level genome assembly for A. medius generated with PacBio HiFi sequencing and Hi-C technology. The PacBio assembly totaled 739.07 Mb across 697 contigs, with a contig N50 of 10.004 Mb. Hi-C scaffolding anchored 90.53% of the assembly onto 23 chromosomes, with a scaffold N50 of 30.20 Mb. The BUSCO completeness of the genome assembly was 97.05%. Repetitive sequences accounted for 240.83 Mb (32.58%) of the genome. Non-coding RNA annotation identified 4,928 rRNAs, 9,663 tRNAs, 347 snRNAs, and 21 snoRNAs. A total of 30,345 protein-coding genes were predicted, encoding 46,603 proteins, with a BUSCO completeness of 94.98%. Molecular pathways related to the endocrine system, carbohydrate metabolism, folding, sorting and degradation, signal transduction, and transport and catabolism may contribute to the adaptation of A. medius to extreme Arctic environments. This high-quality genome provides a valuable genetic resource for understanding Arctic adaptation and for supporting polar ecological conservation and management.

Background & Summary

Anisarchus medius, commonly known as the stout eelblenny, is a ray-finned fish that belongs to the eel pricklebacks and is taxonomically classified under Order Perciformes, Suborder Cottioidei, Infraorder Zoarcales, and Family Stichaeidae (NCBI: txid980385, Fig. 1). A. medius is commonly found in the Arctic Ocean, with a nearly circumpolar distribution and particular abundance in the Pacific Arctic¹. This benthic fish thrives on muddy substrates at depths of 30–100 m¹, favors water temperatures below 0 °C^2,3, and feeds primarily on polychaetes and other small benthic organisms^1,4. It therefore acts as a critical intermediary, transferring energy from lower benthic trophic levels to higher predators⁴, and plays an essential role in maintaining the stability of Arctic marine food webs^1,4. Because the Arctic ecosystem is highly sensitive to climate change, with rising temperatures and shifting ice dynamics threatening its biodiversity^5,6,7, A. medius has emerged as an important indicator species for assessing the effects of climate change on population dynamics in coastal Arctic ecosystems². A high-quality genome of such an ecosystem indicator species is therefore critical for informing future conservation and management strategies in a rapidly warming climate.

Despite its ecological importance, research on A. medius remains limited and has focused predominantly on fundamental aspects, including morphological classification, growth and development, geographical distribution, phylogeny, nutrient composition, and ecological and climatic studies. Morphological studies address characteristics such as body size and age^1,8,9 and skeletal development during ontogenesis¹⁰. Growth and development studies have examined various life stages, including the egg, larval, post-larval, and juvenile phases^1,10,11. Geographical distribution surveys^1,9 and phylogenetic studies based on mitochondrial genomes and conserved sequences^2,12 have also been conducted. Physiological studies are scarce and mainly address aspects such as fatty acid composition^13,14. Key biological questions, such as the reproductive habits and life history of A. medius, remain unanswered. Ecological and climatic studies of A. medius are comparatively more numerous and primarily address temperature-related phenomena, such as osmotic and ionic regulation at different temperatures¹⁵ and population migrations driven by global warming^16,17,18,19. Genomic resources for A. medius are currently limited to one mitochondrial genome² and two Illumina genomic sequencing datasets²⁰. These critical gaps in our understanding of the origin, evolution, and adaptive mechanisms of A. medius in the Arctic environment, as well as of its conservation and management, highlight the urgent need for a high-quality genome sequence.

This study combines PacBio long-read sequencing with high-throughput chromosome conformation capture (Hi-C) technology to construct a high-quality, chromosome-level reference genome for A. medius. Comprehensive genome annotation covers repetitive sequences, non-coding RNAs, and protein-coding genes, and functional annotation with Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses further characterizes the A. medius gene repertoire. These resources provide insights into the adaptive evolution of Arctic organisms and establish a foundation for polar ecological conservation and management efforts.

Methods

Ethics statement

All animal handling and experimental procedures were approved by the Fujian Agriculture and Forestry University Ethics Committee (approval number: PZCASFAFU24069).

Sample collection and sequencing

During the 2014 Chinese Arctic Research Expedition, a stout eelblenny (A. medius) was collected by trawling aboard the icebreaker R/V Xuelong (Snow Dragon) in the Chukchi Sea (169.02°W, 67.68°N). The specimen was flash-frozen in liquid nitrogen on deck and then stored in a −80 °C freezer^21,22,23,24,25. After return to the laboratory, dorsal muscle tissue was dissected for genomic sequencing and analysis²¹. High-quality genomic DNA was extracted from the dorsal muscle tissue using the QIAGEN Genomic-tip 100/G kit (Qiagen, Hilden, Germany)²². DNA purity was assessed with a NanoDrop 2000 spectrophotometer (NanoDrop Technologies, DE, USA) by measuring the OD260/280 and OD260/230 ratios^22,24, and DNA concentration was measured accurately with a Qubit 3.0 Fluorometer (Life Technologies, CA, USA). DNA integrity was verified by 0.8% agarose gel electrophoresis^21,22,24.

The SMRTbell library was prepared using the SMRTbell Express Template Prep Kit 2.0 (Pacific Biosciences)^21,26. Briefly, 15 μg of genomic DNA was first treated enzymatically to remove single-stranded overhangs and then with repair enzymes to fix potential damage to the DNA backbone. After damage repair, the ends of the double-stranded fragments were polished and A-tailed, and the fragments were ligated to T-overhang SMRTbell adapters at 20 °C for 60 min. The ligated library was purified with 1X AMPure PB beads, and its size distribution and concentration were evaluated using a FEMTO Pulse automated pulsed-field capillary electrophoresis instrument (Agilent Technologies, DE, USA) and a Qubit 3.0 Fluorometer (Life Technologies, CA, USA). Next, 3 μg of the library was size-selected on a BluePippin system (Sage Science, MA, USA) to remove SMRTbells smaller than 15 kb, and the size-selected library was purified again with 1X AMPure PB beads. The final library size and quantity were verified using the FEMTO Pulse and the Qubit dsDNA HS Assay Kit. The sequencing primer and Sequel II DNA Polymerase were then annealed and bound to the SMRTbell library, which was loaded at an on-plate concentration of 120 pM by diffusion loading. SMRT sequencing was performed by Frasergen Bioinformatics Co., Ltd. (Wuhan, China) on a single 8M SMRT Cell using the Sequel II System and the Sequel II Sequencing Kit with 1,800-min movies, and the resulting data were processed using SMRT Link 5.0²⁶.
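Loading a library at a defined molar concentration, such as the 120 pM on-plate concentration above, requires converting a mass concentration (e.g., from the Qubit) into molarity using the mean library size. The following minimal sketch shows that generic dsDNA conversion and the C1V1 = C2V2 dilution step. It assumes an average mass of about 660 g/mol per base pair; the function names and input values are hypothetical, and it is not PacBio's own sample-setup procedure.

```python
AVG_BP_MASS_G_PER_MOL = 660.0  # approximate molar mass of one double-stranded base pair


def dsdna_nanomolar(conc_ng_per_ul: float, mean_size_bp: float) -> float:
    """Convert a dsDNA mass concentration (ng/uL) to molarity (nM).

    1 ng/uL equals 1 mg/L; dividing by the molar mass (bp * 660 g/mol)
    and rescaling to nM gives nM = conc * 1e6 / (bp * 660).
    """
    return conc_ng_per_ul * 1e6 / (mean_size_bp * AVG_BP_MASS_G_PER_MOL)


def dilution_volume_ul(stock_nm: float, target_pm: float, final_volume_ul: float) -> float:
    """Stock volume (uL) giving target_pm in final_volume_ul, from C1*V1 = C2*V2."""
    return (target_pm / 1000.0) * final_volume_ul / stock_nm  # pM -> nM


if __name__ == "__main__":
    # Hypothetical library: 20 ng/uL with a mean size of 15 kb.
    stock = dsdna_nanomolar(20.0, 15_000)
    print(f"stock: {stock:.2f} nM")  # stock: 2.02 nM
    print(f"{dilution_volume_ul(stock, 120.0, 100.0):.2f} uL of stock per 100 uL at 120 pM")
```

A 15-kb library is long, so even a modest mass concentration corresponds to only a few nanomolar, which is why size selection and accurate sizing on the FEMTO Pulse matter for loading.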

Genome assembly and analysis

SMRTbell libraries were sequenced on a PacBio Revio system, and consensus HiFi reads were generated using the CCS software (https://github.com/pacificbiosciences/unanimity) with a minimum of three passes (‘-minPasses 3’). In total, 35.75 Gb of PacBio HiFi reads were generated for this sample (Table 1); their length distribution is shown in Fig. 2. These long (~15 kb), highly accurate (>99%) HiFi reads were assembled using Hifiasm (https://github.com/chhylp123/hifiasm)²⁷, and the initial assembly was polished with Illumina short reads using Pilon (v1.23)²⁸ to correct residual errors. The final A. medius genome assembly had a total length of approximately 739.07 Mb, comprising 697 contigs with a contig N50 of 10.004 Mb (Table 2). Assembly completeness was assessed using BUSCO (v3.1.0) with the OrthoDB Actinopterygii_odb9 dataset^29,30; 97.05% of the BUSCO genes were identified (Fig. 3), indicating a highly complete genome assembly.
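Contig N50 is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembly; scaffold N50 is the same statistic computed on scaffolds. The minimal sketch below (the helper names `n50` and `coverage_depth` are hypothetical, not part of Hifiasm or BUSCO) computes N50 from a list of lengths and the mean HiFi depth implied by the figures above, 35.75 Gb of reads over a 739.07 Mb assembly, which comes to roughly 48×.

```python
def n50(lengths):
    """Return the N50 of a list of sequence lengths in bp.

    Contigs are sorted from longest to shortest and their lengths summed;
    N50 is the length of the contig at which the running total first
    reaches at least half of the total assembly size.
    """
    if not lengths or min(lengths) < 0:
        raise ValueError("need a non-empty list of non-negative lengths")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:  # integer-safe form of running >= total / 2
            return length


def coverage_depth(read_bases, genome_size):
    """Mean sequencing depth: total sequenced bases divided by genome size."""
    return read_bases / genome_size


if __name__ == "__main__":
    toy_contigs = [100, 80, 60, 40, 20]  # toy lengths, total = 300 bp
    print(n50(toy_contigs))  # 100 + 80 = 180 >= 150, so N50 = 80
    print(round(coverage_depth(35.75e9, 739.07e6), 1))  # 48.4
```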

Table 1 Summary statistics of the HiFi subreads of A. medius.

Table 2 Assembly statistics of A. medius.

To anchor the contigs, 7.39 Gb of clean Hi-C read pairs were generated and mapped to the polished A. medius genome using Bowtie 2 (v2.4.1) with default parameters³¹. Read pairs whose mates mapped to different contigs were used for Hi-C-assisted scaffolding. Invalid read pairs, including self-ligation and non-ligation products and other artifacts (start-near-restriction-site reads, PCR amplification artifacts, random breaks, large/small fragments, and extreme fragments), were filtered out using HiCUP³². We then clustered 390 contigs into 23 groups by agglomerative hierarchical clustering within 3D-DNA (Fig. 4), and 3D-DNA was also used to order and orient the clustered contigs^33,34. In total, 285 contigs spanning 669,270,049 bp were ordered and oriented. This yielded the first chromosome-level assembly of A. medius, with chromosome lengths ranging from 17.25 Mb to 46.97 Mb and the chromosomes together accounting for 90.53% of the total genome sequence (Table 2). Chromosome length statistics are summarized in Table 3.
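Grouping contigs into chromosome-scale clusters relies on Hi-C contacts being far more frequent within a chromosome than between chromosomes. The toy sketch below illustrates that idea only; it is not 3D-DNA's implementation. It assumes read pairs have already been filtered and reduced to (contig_a, contig_b) tuples, counts links between different contigs, normalizes them by the product of contig lengths (an assumed normalization), and merges contigs by average-linkage agglomerative clustering until a target number of groups remains (2 for the toy data here, 23 for the 390 A. medius contigs). All names and data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def link_counts(pairs):
    """Count Hi-C read pairs whose mates map to two different contigs."""
    counts = Counter()
    for a, b in pairs:
        if a != b:  # intra-contig pairs carry no scaffolding signal
            counts[frozenset((a, b))] += 1
    return counts


def cluster_contigs(pairs, lengths, n_groups):
    """Average-linkage agglomerative clustering on length-normalized link density."""
    counts = link_counts(pairs)

    def density(a, b):
        # links per Mb^2 of the two contigs (assumed normalization)
        return counts[frozenset((a, b))] / (lengths[a] * lengths[b] / 1e12)

    clusters = [frozenset([c]) for c in lengths]
    while len(clusters) > n_groups:
        best_pair, best_score = None, -1.0
        for x, y in combinations(clusters, 2):
            # mean link density over all contig pairs spanning the two clusters
            score = sum(density(a, b) for a in x for b in y) / (len(x) * len(y))
            if score > best_score:
                best_pair, best_score = (x, y), score
        x, y = best_pair
        clusters = [c for c in clusters if c not in (x, y)] + [x | y]
    return clusters


if __name__ == "__main__":
    lengths = {"ctg1": 2_000_000, "ctg2": 1_500_000, "ctg3": 1_000_000, "ctg4": 3_000_000}
    pairs = ([("ctg1", "ctg2")] * 50 + [("ctg3", "ctg4")] * 40
             + [("ctg1", "ctg3")] * 2 + [("ctg2", "ctg2")] * 100)
    print(sorted(sorted(g) for g in cluster_contigs(pairs, lengths, n_groups=2)))
```

Length normalization matters because long contigs accumulate more links simply by being long; without it, large contigs would tend to absorb unrelated small ones.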

Table 3 Summary of chromosome length statistics for A. medius.

Repetitive sequence annotation and analysis

To identify repeat elements in the A. medius genome, we applied two complementary approaches: homology-based and de novo prediction. For the homology-based approach, we used RepeatMasker (v4.0.9)³⁵ with the Repbase TE library to identify known transposable elements (TEs). In parallel, we constructed a de novo repeat library for the A. medius genome using RepeatModeler (http://www.repeatmasker.org/RepeatModeler/)³⁶, which integrates two de novo repeat-finding tools, RECON (v1.08) and RepeatScout (v1.0.5), to identify, refine, and classify consensus models of putative interspersed repeats³⁷. Additionally, long terminal repeat (LTR) retrotransposons were searched de novo using LTR_FINDER (v1.0.7), LTRharvest (v1.5.11), and LTR_retriever (v2.7)^38,39,40. Tandem repeats were identified using Tandem Repeats Finder (TRF), and simple sequence repeats (SSRs) using MISA (v1.0)⁴¹. Finally, we merged the libraries generated by both approaches and used RepeatMasker³⁵ to identify the complete repeat content of the genome. In total, 240.83 Mb of repeat sequences (32.58% of the genome) were identified in the A. medius genome; the repeat statistics are summarized in Table 4. The Circos program⁴² was used to generate a circular genome map displaying genomic features in concentric rings (Fig. 5). From the outermost to the innermost ring, the map shows gene density, transposon density, the density of repetitive sequences annotated as SimpleRepeat by RepeatMasker, and GC content (red indicates regions with GC content above the genome average and green indicates regions below it), each computed in 200-kb sliding windows; the innermost ring shows collinear blocks.
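The GC-content ring compares each 200-kb window with the genome-wide average. A minimal pure-Python sketch of that windowing step is shown below, assuming chromosome sequences are already loaded as strings. The function names and the 'high'/'low' labels (standing in for the red/green colouring) are illustrative; using non-overlapping windows (step equal to window size) and excluding N and other ambiguous bases from both numerator and denominator are assumptions, not details stated in the study.

```python
def base_counts(seq):
    """Return (GC count, AT count) for a sequence, ignoring N and other symbols."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return gc, at


def gc_windows(chroms, window=200_000):
    """Yield (chrom, start, end, gc_fraction, label) for consecutive windows.

    label is 'high' when the window GC exceeds the genome-wide GC fraction
    and 'low' otherwise. Windows made entirely of gaps are skipped.
    """
    total_gc = total_at = 0
    for seq in chroms.values():
        gc, at = base_counts(seq)
        total_gc += gc
        total_at += at
    if total_gc + total_at == 0:
        raise ValueError("genome contains no unambiguous bases")
    genome_gc = total_gc / (total_gc + total_at)

    for name, seq in chroms.items():
        for start in range(0, len(seq), window):
            end = min(start + window, len(seq))
            gc, at = base_counts(seq[start:end])
            if gc + at == 0:
                continue
            frac = gc / (gc + at)
            yield name, start, end, frac, "high" if frac > genome_gc else "low"


if __name__ == "__main__":
    toy = {"chr1": "GGCC" * 5 + "ATAT" * 5, "chr2": "N" * 40}
    for row in gc_windows(toy, window=20):
        print(row)
```

The same windowing loop, with feature counts in place of base counts, would produce the gene-density and transposon-density tracks.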

Table 4 Summary of repeat sequence statistics for the A. medius genome.

Non-coding annotation and analysis

tRNA genes were identified using tRNAscan-SE (v1.3.1)⁴³ with default parameters. tRNAs are adaptor RNA molecules that translate the three-nucleotide codons of messenger RNA (mRNA) into the corresponding amino acids of proteins. rRNA genes were identified using RNAmmer (v1.2)⁴⁴ with the parameters ‘-S euk -m lsu,ssu,tsu’. In addition, snoRNAs, a class of small RNA molecules that guide the chemical modification of other RNAs (primarily ribosomal RNAs, transfer RNAs, and small nuclear RNAs), were identified using cmscan (v1.1.2)⁴⁵ against the Rfam (v14.0)⁴⁶ database with default parameters. In total, 4,928 rRNA sequences were identified, with lengths ranging from 9,028 bp to 491,254 bp (Table 5). For tRNAs, 9,663 sequences were identified, with a total length of 728,112 bp and an average length of 75 bp, accounting for approximately 0.11% of the genome. We also identified 347 snRNAs and 21 snoRNAs, with total lengths of 45,796 bp and 2,661 bp, respectively.
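Summary figures such as the tRNA count, total length, mean length, and genomic fraction follow directly from tRNAscan-SE's tabular output. The sketch below assumes the default tab-delimited output, in which the third and fourth columns hold the tRNA begin and end coordinates (begin > end for minus-strand hits) and the first lines are headers; header lines are skipped simply because those columns do not parse as integers. The function name and the toy lines are hypothetical, and the length is the genomic span, which includes any intron.

```python
def summarize_trnas(lines, genome_size):
    """Summarize tab-delimited tRNAscan-SE output.

    Returns (count, total_bp, mean_bp, fraction_of_genome). Coordinates are
    1-based and inclusive, so the span is abs(end - begin) + 1 on either strand.
    """
    lengths = []
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 4:
            continue
        try:
            begin, end = int(cols[2]), int(cols[3])
        except ValueError:
            continue  # header or separator line
        lengths.append(abs(end - begin) + 1)
    total = sum(lengths)
    mean = total / len(lengths) if lengths else 0.0
    return len(lengths), total, mean, total / genome_size


if __name__ == "__main__":
    toy = [
        "Sequence\t\ttRNA\tBounds\ttRNA\tAnti\tIntron Bounds\tCove",
        "Name\ttRNA #\tBegin\tEnd\tType\tCodon\tBegin\tEnd\tScore",
        "--------\t------\t-----\t---\t----\t-----\t-----\t---\t-----",
        "chr1\t1\t1000\t1071\tAla\tAGC\t0\t0\t70.1",  # plus strand, 72 bp
        "chr1\t2\t5079\t5000\tGly\tGCC\t0\t0\t68.4",  # minus strand, 80 bp
    ]
    print(summarize_trnas(toy, genome_size=10_000))  # (2, 152, 76.0, 0.0152)
```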

Table 5 Summary of non-coding RNA statistics for the A. medius genome.

Protein coding gene prediction and analysis

Protein-coding genes in the A. medius genome were predicted using three complementary methods: ab initio, homology-based, and RNA-Seq-guided prediction. Before gene prediction, the assembled A. medius genome was both hard- and soft-masked with RepeatMasker³⁵ to mask repetitive elements. For ab initio gene prediction, we used Augustus (v3.3.3)^47,48, trained with high-quality protein models derived from RNA-Seq data. For homology-based gene prediction, we used MAKER (v2.31.10)⁴⁹: protein and transcript sequences were aligned to the genome assembly, and coding genes were predicted using MAKER's default parameters. For RNA-Seq-guided prediction, clean RNA-Seq reads were aligned to the genome using HISAT2 (v2.0.0)⁵⁰, transcripts were reconstructed using Trinity (v2.3.2)⁵¹ with the parameter ‘--genome_guided_bam XXX_aligned.bam’, and the gene models were further refined with TransDecoder (v2.01) and MAKER⁴⁹. Finally, the predictions from the three methods were integrated using EVidenceModeler (EVM, v1.1.1)⁵² into a consensus set of non-overlapping gene models. In total, 30,345 protein-coding genes were predicted, with an average gene length of 8,563.51 bp, encoding 46,603 proteins with a total length of 23,509,360 amino acids (aa) and an average length of 504.46 aa. Figure 6A shows the protein length distribution for the A. medius genome. The completeness of the predicted protein-coding gene set was assessed using BUSCO with the Actinopterygii_odb9 dataset^29,30, yielding a completeness score of 94.98% (Fig. 6B).
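The protein statistics above (46,603 proteins, 23,509,360 aa in total, mean 504.46 aa) and a length distribution like Fig. 6A can be derived from a protein FASTA file. This is a minimal sketch assuming the predicted proteins are available as FASTA lines; the helper names and the 100-aa histogram bin width are hypothetical choices, not taken from the study. Trailing `*` stop symbols are not counted as residues.

```python
from collections import Counter


def protein_lengths(fasta_lines):
    """Return {protein_id: length_in_aa} from FASTA-formatted lines."""
    lengths, current = {}, None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            current = line[1:].split()[0]  # ID = first word of the header
            lengths[current] = 0
        elif current is not None:
            lengths[current] += len(line.rstrip("*"))  # drop a trailing stop symbol
    return lengths


def length_summary(lengths, bin_width=100):
    """Return (count, total aa, mean aa, histogram keyed by bin start)."""
    values = list(lengths.values())
    total = sum(values)
    mean = total / len(values) if values else 0.0
    histogram = Counter((v // bin_width) * bin_width for v in values)
    return len(values), total, mean, dict(sorted(histogram.items()))


if __name__ == "__main__":
    toy = [">prot1 gene=g1", "MKV", "LLA*", ">prot2", "M" * 150]
    print(length_summary(protein_lengths(toy)))  # (2, 156, 78.0, {0: 1, 100: 1})
    print(round(23_509_360 / 46_603, 2))  # reported mean protein length: 504.46
```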

Protein coding gene annotation and analysis

Gene functions were predicted by aligning protein sequences against several major protein databases, including the National Center for Biotechnology Information (NCBI) Non-Redundant (NR), TrEMBL, InterPro⁵³, and Swiss-Prot⁵⁴ databases, using BLASTP (ncbi-blast v2.6.0+)⁵⁵ with an E-value threshold of 1e−5. Functional annotation was also performed against the Kyoto Encyclopedia of Genes and Genomes (KEGG) database⁵⁶. Protein domains were identified using PfamScan⁵⁷ and InterProScan (v5.35-74.0)⁵⁸ with reference to the InterPro protein database⁵³, and motifs and domains within the gene models were identified using the Pfam database⁵⁹. Gene Ontology (GO) terms⁶⁰ were assigned to each gene using Blast2GO⁶¹. In total, 45,380 of the 46,603 predicted proteins (97.38%) were annotated with known gene functions, conserved domains, or GO terms. Table 6 summarizes the annotation statistics across the seven databases.
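An overall annotation rate such as 97.38% (45,380 of 46,603 proteins) is typically the union of proteins with at least one significant hit across databases. The sketch below assumes BLASTP results in the standard 12-column tabular format (`-outfmt 6`: qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore), keeps the highest-scoring hit per query at E ≤ 1e−5, and takes the union across databases. The database labels, accessions, and hits are hypothetical; non-BLAST sources such as InterProScan or KEGG assignments would simply contribute their own sets of protein IDs.

```python
def best_hits(lines, max_evalue=1e-5):
    """Best hit (highest bit score) per query from BLAST -outfmt 6 lines."""
    best = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 12:
            continue  # not a standard 12-column record
        query, subject = cols[0], cols[1]
        evalue, bitscore = float(cols[10]), float(cols[11])
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


def annotation_rate(hits_by_db, all_proteins):
    """Fraction of proteins annotated in at least one database, plus per-db counts."""
    proteins = set(all_proteins)
    annotated, per_db = set(), {}
    for db, hits in hits_by_db.items():
        ids = set(hits) & proteins
        per_db[db] = len(ids)
        annotated |= ids
    return len(annotated) / len(proteins), per_db


if __name__ == "__main__":
    nr = [
        "p1\tXP_1\t80.0\t300\t60\t0\t1\t300\t1\t300\t1e-50\t500",
        "p1\tXP_2\t70.0\t300\t90\t0\t1\t300\t1\t300\t1e-40\t400",
        "p2\tXP_3\t30.0\t50\t35\t0\t1\t50\t1\t50\t0.01\t30",  # fails the E-value cut-off
    ]
    swissprot = ["p3\tQ9XYZ1\t60.0\t200\t80\t0\t1\t200\t1\t200\t1e-20\t200"]
    hits = {"NR": best_hits(nr), "Swiss-Prot": best_hits(swissprot)}
    print(annotation_rate(hits, ["p1", "p2", "p3", "p4"]))
    print(round(100 * 45_380 / 46_603, 2))  # reported overall rate: 97.38
```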

Table 6 Summary of protein-coding gene annotation statistics for the A. medius genome.

In the GO annotation, binding (57.75%) and catalytic activity (28.28%) were the most frequently annotated terms in the molecular function category. In the cellular component category, membrane part had the most annotations (42.82%), followed by cell part (21.12%). In the biological process category, metabolic process (31.10%) and cellular process (20.59%) had the highest annotation rates (Fig. 7). In the KEGG annotation, within the organismal systems category, the endocrine system had the largest number of annotated genes (4,368), followed by the immune system (3,291) and the nervous system (3,075). Carbohydrate metabolism, lipid metabolism, and amino acid metabolism were also associated with substantial numbers of genes (1,510, 1,300, and 980, respectively). In the genetic information processing category, folding, sorting and degradation was associated with 959 genes and translation with 870 genes. In the environmental information processing category, 10,047 genes were annotated to signal transduction and 1,912 to signaling molecules and interaction. In the cellular processes category, cellular community (eukaryotes) and transport and catabolism had 2,238 and 1,593 annotated genes, respectively. Figures 8 and 9 illustrate the KEGG and KOG annotations of the protein-coding genes.

Data Records

All data from the A. medius whole-genome project have been deposited at NCBI under BioProject accession PRJNA1176308 (https://www.ncbi.nlm.nih.gov/bioproject/PRJNA1176308/). The genomic PacBio sequencing data were deposited in the NCBI Sequence Read Archive (SRA) under accession SRR31143456⁶², the genomic Hi-C sequencing data under SRR31715051⁶³, the RNA PacBio sequencing data under SRR31715464⁶⁴, and the RNA BGI sequencing data under SRR31715465⁶⁵. This Whole Genome Shotgun project has been deposited at GenBank under accession JBJXUX000000000⁶⁶, and the A. medius assembly is available in GenBank under accession GCA_046127435.1 (https://www.ncbi.nlm.nih.gov/datasets/genome/GCA_046127435.1/). The final assembly and annotation files of this study are also publicly available on figshare (https://doi.org/10.6084/m9.figshare.27368472)⁶⁷.

Technical Validation

To ensure the accuracy of the sequencing data, we performed a series of quality control checks on the samples before library construction. First, the purity of the genomic DNA was assessed using a NanoDrop spectrophotometer (NanoDrop Technologies, DE, USA) by measuring the OD260/280 and OD260/230 ratios and confirming normal nucleic acid absorption peaks. Next, the genomic DNA concentration was measured accurately using a Qubit fluorometer (Life Technologies, CA, USA), and the Qubit and NanoDrop concentrations were compared to further evaluate sample purity. Finally, DNA integrity was verified by agarose gel electrophoresis. Only samples that met all quality standards were used for library preparation. The completeness of the assembled genome was evaluated by BUSCO (v3.1) analysis against the OrthoDB Actinopterygii_odb9 dataset: 97.05% of the BUSCO genes were identified, indicating a high level of completeness in the assembled genome (Fig. 3).
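The QC gate described above can be expressed as a simple pass/fail check. The thresholds in this sketch are assumptions for illustration only (OD260/280 between 1.8 and 2.0, OD260/230 of at least 1.8, and a Qubit-to-NanoDrop concentration ratio of at least 0.5); the study does not state the cut-offs it used, and facilities differ. The class, constant, and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DnaQc:
    od260_280: float
    od260_230: float
    nanodrop_ng_ul: float
    qubit_ng_ul: float
    intact_on_gel: bool  # judged visually from the agarose gel


# Hypothetical thresholds for illustration; not taken from the study.
OD260_280_RANGE = (1.8, 2.0)
MIN_OD260_230 = 1.8
MIN_QUBIT_TO_NANODROP = 0.5


def qc_failures(sample: DnaQc) -> list:
    """Return the failed checks; an empty list means the sample passes."""
    failures = []
    low, high = OD260_280_RANGE
    if not low <= sample.od260_280 <= high:
        failures.append("OD260/280 out of range (low: protein; high: RNA)")
    if sample.od260_230 < MIN_OD260_230:
        failures.append("OD260/230 too low (salt, phenol or carbohydrate carry-over)")
    # NanoDrop absorbance counts any UV-absorbing contaminant, whereas the Qubit
    # dye binds dsDNA specifically, so a low ratio points to impurities.
    if sample.qubit_ng_ul / sample.nanodrop_ng_ul < MIN_QUBIT_TO_NANODROP:
        failures.append("Qubit/NanoDrop concentration ratio too low")
    if not sample.intact_on_gel:
        failures.append("DNA degraded on agarose gel")
    return failures


if __name__ == "__main__":
    print(qc_failures(DnaQc(1.85, 2.1, 120.0, 95.0, True)))  # []
    print(len(qc_failures(DnaQc(1.6, 1.2, 200.0, 60.0, True))))  # 3
```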

Code availability

No custom scripts or code were used to manage or verify the datasets in this study. All software and pipelines were run according to the manuals and protocols of the bioinformatics tools cited above, and parameters are specified in the text wherever defaults were not used.

References

Mecklenburg, C. W., Lynghammar, A., Johannesen, E., Byrkjedal, I. & Wienerroither, R. M. 2018. Marine Fishes of the Arctic Region. Conservation of Arctic Flora and Fauna. 372–373 (2018).
Liu, K., Lin, H., Wang, J. & Sun, J. Complete mitochondrial genome sequence and phylogenetic analysis of Anisarchus medius (Reinhardt, 1837). Mitochondrial DNA B. 4(2), 3973–3974 (2019).
Article MATH Google Scholar
Whitehead, P., Bauchot, M., Hureau, J., Nielsen, J. & Tortonese, E. Fishes of the North-eastern Atlantic and the Mediterranean. Vol. 3 1126–1129 (Springer, 1986).
McMeans, B. C., Rooney, N., Arts, M. T. & Fisk, A. T. Food web structure of a coastal Arctic marine ecosystem and implications for stability. Mar. Ecol. Prog. Ser. 482, 17–28 (2013).
Article ADS CAS Google Scholar
Rantanen, M. et al. The Arctic has warmed nearly four times faster than the globe since 1979. Commun Earth Environ. 3, 168 (2022).
Article ADS Google Scholar
Post, E. et al. Ecological dynamics across the Arctic associated with recent climate change. Science. 325(5946), 1355–1358 (2009).
Article ADS PubMed MATH CAS Google Scholar
Johannessen, O. M. & Miles, M. W. Critical vulnerabilities of marine and sea ice–based ecosystems in the high Arctic. Reg. Environ. Change. 11(Suppl 1), 239–248 (2011).
Article MATH Google Scholar
Forster, C. E., Norcross, B. L. & Spies, I. Documenting growth parameters and age in Arctic fish species in the Chukchi and Beaufort seas. Deep Sea Research Part II: Topical Studies in Oceanography 177 (2020).
Looby, A. et al. Global inventory of species categorized by known underwater sonifery. Sci Data 10, 892 (2023).
Article PubMed PubMed Central MATH Google Scholar
Voskoboinikova, O. Skeleton development in ontogenesis of the butterfish Pholis gunnellus, Anisarchus Anisarchus medius, and slender eel blenny Lumpenus fabricii (Zoarcoidei, Perciformes). J Ichthyol 45, 503–512 (2005).
Google Scholar
Chiperzak, D., Hopky, G., Lawrence, M., Schmid, D. & Reist, J. Larval and post-larval fish data from the Canadian Beaufort Sea shelf, July to September, 1987. (Fisheries and Oceans Canada Winnipeg MB, 2003).
Turanov, S., Kartavtsev, Y. P. & Zemnukhov, V. Molecular phylogenetic study of several eelpout fishes (Perciformes, Zoarcoidei) from Far Eastern seas on the basis of the nucleotide sequences of the mitochondrial cytochrome oxidase 1 gene (CO1). Russ. J. Genet. 48, 208–223 (2012).
Article CAS Google Scholar
Dahl, T. M. et al. Fatty acid composition of the blubber in white whales (Delphinapterus leucas). Polar Biol. 23, 401–409 (2000).
Article MATH Google Scholar
Murzina, S. A. et al. Lipid status of the two high latitude fish species, Leptoclinus maculatus and Lumpenus fabricii. Int J Mol Sci. 14, 7048–7060 (2013).
Article PubMed PubMed Central MATH CAS Google Scholar
Prosser, C. L., Mackay, W. & Kato, K. Osmotic and ionic concentrations in some Alaskan fish and goldfish from different temperatures. Physiol. Zool. 43, 81–89 (1970).
Article MATH CAS Google Scholar
Lin, L. et al. Composition and distribution of fish species collected during the fourth Chinese National Arctic Research Expedition in 2010. Arctic Research 23, 116 (2012).
MATH Google Scholar
Norcross, B. L., Holladay, B. A., Busby, M. S. & Mier, K. L. Demersal and larval fish assemblages in the Chukchi Sea. Deep Sea Research Part II: Topical Studies in Oceanography 57, 57–70 (2010).
Article ADS Google Scholar
Frainer, A. et al. Climate-driven changes in functional biogeography of Arctic marine fish communities. Proceedings of the National Academy of Sciences 114, 12202–12207 (2017).
Article ADS MATH CAS Google Scholar
Fu, S. et al. Trophic levels and trophic niches of fish from the Bering Sea and Chukchi Sea. Biodiversity Science 31, 22521 (2023).
Article Google Scholar
Ghezelayagh, A. et al. Prolonged morphological expansion of spiny-rayed fishes following the end-Cretaceous. Nat Ecol Evol 6, 1211–1220 (2022).
Article PubMed MATH Google Scholar
Mu, Y. et al. Whole genome sequencing of a snailfish from the Yap Trench (~7,000 m) clarifies the molecular mechanisms underlying adaptation to the deep sea. PLoS Genet. 17, e1009530 (2021).
Article PubMed PubMed Central CAS Google Scholar
Liu, R. et al. De novo genome assembly of limpet Bathyacmaea lactea (Gastropoda: Pectinodontidae), the first reference genome of a deep-sea gastropod endemic to cold seeps. Genome Biol. Evol. 12, evaa100 (2021).
Google Scholar
Liu, R., Liu, J. & Zhang, H. Positive selection analysis reveals the deep-sea adaptation of a hadal sea cucumber (Paelopatides sp.) to the Mariana Trench. J. Oceanol. Limnol. 39, 266–281 (2021).
Article ADS MATH Google Scholar
Feng, C. et al. The genome of a new anemone species (Actiniaria: Hormathiidae) provides insights into deep-sea adaptation. Deep-Sea Res. I: Oceanogr. Res. Pap. 170, 103492 (2021).
Article MATH Google Scholar
Ao, J. et al. Genome sequencing of the perciform fish Larimichthys crocea provides insights into molecular and genetic mechanisms of stress adaptation. PLoS Genet. 11, e1005118 (2015).
Article PubMed PubMed Central Google Scholar
Flusberg, B. et al. Direct detection of DNA methylation during single-molecule, real-time sequencing. Nat Methods 7, 461–465 (2010).
Article PubMed PubMed Central CAS Google Scholar
Kolmogorov, M., Yuan, J., Lin, Y. & Pevzner, P. A. Assembly of long, error-prone reads using repeat graphs. Nat. Biotechnol. 37, 540–546 (2019).
Article PubMed MATH CAS Google Scholar
Walker, B. J. et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PloS One. 9(11), e112963 (2014).
Article ADS PubMed PubMed Central Google Scholar
Manni, M., Berkeley, M. R., Seppey, M. & Zdobnov, E. M. BUSCO: assessing genomic data quality and beyond. Curr. Protoc. 1(12), e323 (2021).
Article PubMed Google Scholar
Kuznetsov, D. et al. OrthoDB v11: annotation of orthologs in the widest sampling of organismal diversity. Nucleic Acids Res. 51(D1), D445–51 (2023).
Article PubMed CAS Google Scholar
Li, H. & Durbin, R. Fast and accurate long-read alignment with burrows–wheeler transform. Bioinformatics 26, 589–595 (2010).
Article MATH Google Scholar
Wingett, S. et al. HiCUP: pipeline for mapping and processing Hi-C data. F1000Res. 4, 1310 (2015).
Article PubMed PubMed Central MATH Google Scholar
Servant, N. et al. HiC-Pro: an optimized and flexible pipeline for Hi-C data processing. Genome Biol. 16, 1–11 (2015).
Article MATH Google Scholar
Durand, N. C. et al. Juicer Provides a One-Click System for Analyzing Loop-Resolution Hi-C Experiments. Cell Syst. 3, 95–98 (2016).
Article PubMed PubMed Central MATH CAS Google Scholar
Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr Protoc Bioinform. 25(1), 4–10 (2009).
Article Google Scholar
Flynn, J. M. et al. RepeatModeler 2 for automated genomic discovery of transposable element families. Proc. Natl. Acad. Sci. USA 117, 9451–9457 (2020).
Article ADS PubMed PubMed Central MATH CAS Google Scholar
Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580 (1999).
Article PubMed PubMed Central MATH CAS Google Scholar
Xu, Z. & Wang, H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 35, W265–W268 (2007).
Article PubMed PubMed Central Google Scholar
Ellinghaus, D., Kurtz, S. & Willhoeft, U. LTRharvest, an efficient and flexible software for de novo detection of LTR retrotransposons. BMC Bioinform. 9, 18 (2008).
Article Google Scholar
Ou, S. & Jiang, N. LTR_retriever: A highly accurate and sensitive program for identification of long terminal repeat retrotransposons. Plant Physiol. 176, 1410–1422 (2018).
Article PubMed MATH CAS Google Scholar
Beier, S., Thiel, T., Münch, T., Scholz, U. & Mascher, M. MISA-web: a web server for microsatellite prediction. Bioinformatics. 33(16), 2583–2585 (2017).
Article PubMed PubMed Central CAS Google Scholar
Krzywinski, M. et al. Circos: an information aesthetic for comparative genomics. Genome Res. 19(9), 1639–1645 (2009).
Article PubMed PubMed Central MATH CAS Google Scholar
Lowe, T. M. & Eddy, S. R. tRNAscan-SE: A Program for Improved Detection of Transfer RNA Genes in Genomic Sequence. Nucleic Acids Res. 25, 955–964 (1997).
Article PubMed PubMed Central MATH CAS Google Scholar
Lagesen, K. et al. RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res. 35, 3100–8 (2007).
Article ADS PubMed PubMed Central MATH CAS Google Scholar
Nawrocki, E. P. & Eddy, S. C. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics 29, 2933–2935 (2013).
Article PubMed PubMed Central CAS Google Scholar
Kalvari, I. et al. Rfam 14: expanded coverage of metagenomic, viral and microRNA families. Nucleic Acids Res. 49(D1), D192–200 (2021).
Article MathSciNet PubMed MATH CAS Google Scholar
Stanke, M., Diekhans, M., Baertsch, R. & Haussler, D. Using native and syntenically mapped cDNA alignments to improve de novo gene finding. Bioinformatics 24(5), 637–644 (2008).
Article PubMed CAS Google Scholar
Stanke, M., Schöffmann, O., Morgenstern, B. & Waack, S. Gene prediction in eukaryotes with a generalized hidden Markov model that uses hints from external sources. BMC Bioinformatics. 7, 1–11 (2006).
Article MATH Google Scholar
Holt, C. & Yandell, M. MAKER2: an annotation pipeline and genome-database management tool for second-generation genome projects. BMC Bioinformatic. 12, 491 (2011).
Article Google Scholar
Kim, D., Langmead, B. & Salzberg, S. L. HISAT: a fast spliced aligner with low memory requirements. Nat. Methods. 12(4), 357–60 (2015).
Article PubMed PubMed Central CAS Google Scholar
Haas, B. J. et al. De novo transcript sequence reconstruction from RNA-seq using the Trinity platform for reference generation and analysis. Nat. Protocols. 8(8), 1494–1512 (2013).
Article PubMed MATH CAS Google Scholar
Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the program to assemble spliced alignments. Genome Biol. 9, R7 (2008).
Article PubMed PubMed Central MATH Google Scholar
Mitchell, A. et al. The InterPro protein families database: the classification resource after 15 years. Nucleic Acids Res. 43(D1), D213–21 (2015).
Article PubMed MATH Google Scholar
Boeckmann, B. et al. The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res. 31(1), 365–370 (2003).
Article PubMed PubMed Central MATH CAS Google Scholar
Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25(17), 3389–3402 (1997).
Article PubMed PubMed Central MATH CAS Google Scholar
Kanehisa, M., Goto, S., Sato, Y., Furumichi, M. & Tanabe, M. KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res. 40(D1), D109–114 (2012).
Mistry, J., Bateman, A. & Finn, R. D. Predicting active site residue annotations in the Pfam database. BMC Bioinformatics 8, 1–4 (2007).
Jones, P. et al. InterProScan 5: genome-scale protein function classification. Bioinformatics 30(9), 1236–1240 (2014).
Finn, R. D. et al. Pfam: the protein families database. Nucleic Acids Res. 42(D1), D222–D230 (2014).
Ashburner, M. et al. Gene Ontology: tool for the unification of biology. Nat. Genet. 25(1), 25–29 (2000).
Conesa, A. & Götz, S. Blast2GO: a comprehensive suite for functional analysis in plant genomics. Int. J. Plant Genomics 1, 619832 (2008).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR31143456 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR31715051 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR31715464 (2024).
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRR31715465 (2024).
Liu, R. Y. Anisarchus medius, whole genome shotgun sequencing project. GenBank https://identifiers.org/ncbi/insdc:JBJXUX000000000 (2024).
Liu, R. Y. Genome annotations of Anisarchus medius. Figshare dataset https://doi.org/10.6084/m9.figshare.27368472 (2024).

Acknowledgements

This research was supported by the National Natural Science Foundation of China (grants 42276247 and 32202995), the National Key Research and Development Program of China (2022YFD2401001), and the Natural Science Foundation of Fujian Province (2022J01135). We would like to express our sincere gratitude to the captain, crew, and scientific expedition members of the icebreaker R/V Xuelong (Snow Dragon), especially Pengfei Cui from the Third Institute of Oceanography, Ministry of Natural Resources, for their support and cooperation during the 2014 Chinese Arctic Research Expedition. We also thank Ke Xing and Jianan Chen for their assistance with sample preservation, the genome survey, and manuscript revision. Additionally, we are indebted to the Chinese Scientific Committee on Oceanic Research for providing an invaluable platform for academic discussion and exchange.

Author information

Authors and Affiliations

State Key Laboratory of Mariculture Breeding, Key Laboratory of Marine Biotechnology of Fujian Province, Institute of Oceanology, College of Marine Sciences, Haixia Institute of Science and Technology, Fujian Agriculture and Forestry University, Fuzhou, 350002, China
Ruoyu Liu, Ziyu Meng, Yinan Mu, Hanhui Ma, Jingjing Hu, Yanan Wang, Yuxin Shi, Chaofeng Wang, Weini Zhang, Ping Zheng & Xinhua Chen
Laboratory of Marine Biodiversity Research, Third Institute of Oceanography, Ministry of Natural Resources, Xiamen, 361005, China
Ran Zhang & Longshan Lin
Guangzhou Key Laboratory of Aquatic Animal Diseases and Waterfowl Breeding, College of Animal Sciences and Technology, Zhongkai University of Agriculture and Engineering, Guangzhou, 510222, China
Yanan Li

Contributions

Xinhua Chen, Longshan Lin, and Ruoyu Liu conceived the project; scientific expedition members aboard the icebreaker R/V Xuelong (Snow Dragon), especially Pengfei Cui, collected the samples; Ruoyu Liu and Yinan Mu preserved the samples; Longshan Lin and Ran Zhang collected photographs of this species; Ruoyu Liu and Ziyu Meng extracted the genomic DNA; Ruoyu Liu and Ping Zheng performed the data analysis and wrote the manuscript; Ruoyu Liu revised the manuscript. All authors read and approved the final version of the manuscript.

Corresponding authors

Correspondence to Longshan Lin, Ping Zheng or Xinhua Chen.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Liu, R., Meng, Z., Mu, Y. et al. Chromosome-level reference genome and annotation of the Arctic fish Anisarchus medius. Sci Data 12, 68 (2025). https://doi.org/10.1038/s41597-025-04419-9

Received: 25 November 2024
Accepted: 03 January 2025
Published: 14 January 2025
Version of record: 14 January 2025
DOI: https://doi.org/10.1038/s41597-025-04419-9