Chromosome-level genome assembly of scalloped spiny lobster Panulirus homarus homarus

Sun, Dongfang; Lv, Jianjian; Gao, Baoquan; Jia, Shaoting; Liu, Ping; Li, Jian; Li, Jitao; Ren, Xianyun

doi:10.1038/s41597-025-05253-9

Download PDF

Data Descriptor
Open access
Published: 28 May 2025

Chromosome-level genome assembly of scalloped spiny lobster Panulirus homarus homarus

Dongfang Sun¹,
Jianjian Lv^1,2,
Baoquan Gao^1,2,
Shaoting Jia¹,
Ping Liu^1,2,
Jian Li^1,2,
Jitao Li^1,2 &
…
Xianyun Ren^1,2

Scientific Data volume 12, Article number: 900 (2025) Cite this article

1869 Accesses
Metrics details

Subjects

Abstract

Lobsters, aquatic organisms of significant economic value, hold an important position in the global aquaculture and fisheries industries. However, due to overfishing and ecological change, the populations of certain lobster species have declined dramatically, prompting conservation efforts in various countries. However, limited genomics research has restricted our capacity to conserve and exploit lobster germplasm resources. Here, we present a chromosome-level reference genome for Panulirus homarus homarus constructed using PacBio long-read sequencing and Hi-C data. The genome assembly size was 2.61 Gb, with a contig N50 of 5.43 Mb, and a scaffold N50 of 36.69 Mb. The assembled sequences were anchored to 73 chromosomes, covering 96.05% of the total genome. A total of 25,580 protein-coding genes were predicted, and 99.98% of the genes were functionally annotated using various protein databases. The high-quality genome assembly provides a valuable resource for studying the biology and evolutionary history of P. h. homarus, and could facilitate sustainable resource management, aquaculture, and conservation of the species.

Chromosome-level genome of the long-tailed marine-living ornate spiny lobster, Panulirus ornatus

Article Open access 22 June 2024

Chromosome-level genome assembly and annotation of the gynogenetic large-scale loach (Paramisgurnus dabryanus)

Article Open access 26 January 2025

Assembling chromosome-level genomes of male and female Chanodichthys mongolicus using PacBio HiFi reads and Hi-C technologies

Article Open access 06 June 2025

Background & Summary

The scalloped spiny lobster, Panulirus homarus, belongs to Crustacea, Decapoda, Palinuridae, and Panulirus, and it consists of three economically valuable subspecies, including Panulirus h. homarus (Fig. 1a), Panulirus h. megasculptus, and Panulirus h. rubellus^1,2. P. h. rubellus has also been reported to be a distinct species^3,4. The species is an “engineer species” in coral reefs and rocky ecosystems, controlling population size by preying on benthic invertebrates (e.g., sea urchins, shellfish), maintaining ecological balance, and preventing overpopulation of certain species, which could destroy habitat structure⁵. In addition, P. homarus is distributed widely in the Indo-West Pacific (Fig. 1b), providing a good model for genetic comparisons of different geographic populations and for resolving gene flow, species differentiation, and ecological adaptation mechanisms^4,6.

With increasing pressure on marine ecosystems globally, aquaculture is increasingly recognized as an important strategy for mitigating the depletion of wild fisheries resources⁷. P. homarus exhibits more rapid growth than other lobsters and it has been successfully farmed on a large scale in Vietnam and Indonesia, making it the optimal candidate for intensive large-scale lobster aquaculture^8,9. Currently, the artificial nursery technology for scalloped spiny lobster remains valuable, with larvae mainly captured from the wild and then farmed in cage or industrial recirculating aquaculture systems^9,10. Aquaculture research on P. homarus covers a range of topics, including resource assessment⁹ and the effects of nutrition^11,12, salinity¹³, light¹⁴, and temperature¹⁵ on growth and reproduction. In addition, high-throughput sequencing technologies have advanced the study of lobster species genomes.

Currently, only the genomes of Homarus americanus¹⁶ and P. ornatus¹⁷ have been reported successfully. Genome assembly data for 13 Panulirus species are available in the NCBI database (Table S1). However, the average genome size of the species is only 1.5 Gb, which is significantly smaller than the anticipated genome size for lobsters. For instance, the genome size of P. h. homarus is merely 1.3 Gb, with a mere scaffold N50 of 2.9 kb, rendering the assembly quality inadequate to meet the demands of further research (Table S1). A comprehensive understanding of P. homarus is essential for effective management of its resources and the development of sustainable aquaculture practices.

In the present study, the authors adopted a comprehensive multi-platform sequencing approach, combining Illumina short-read sequencing, PacBio long-read sequencing, and Hi-C chromosome conformation capture technologies to generate a chromosome-level genome assembly for P. h. homarus (Fig. 2). The project generated 140.56 Gb of Illumina short-read data, 341.51 Gb of PacBio long-read data, and 364.32 Gb of Hi-C data, culminating in a final assembled genome with a size of 2.61 Gb, a contig N50 of 5.43 Mb, and a scaffold N50 of 36.69 Mb (Tables 1, 2). The chromosome-level assembly enhances the genomic resources available for lobsters substantially and provides a crucial reference genome.

Table 1 Statistics of the sequencing data.

Full size table

Table 2 Assembly statistics of the P. h. homarus.

Full size table

Methods

Sample collection

An adult male P. h. homarus specimen was sourced from Hainan Yonghe Biotechnology Co., Ltd. (Qionghai, Hainan, China). Muscle tissue was collected after the lobster was anesthetized using cryogenic methods. The surface of the tissue was washed thoroughly several times with sterile phosphate buffered saline to effectively remove bacteria and impurities. The extraction of genomic DNA (gDNA) from muscle tissue for genome survey and library construction was carried out using the AMPure bead cleanup kit (Beckman Coulter, High Wycombe, UK) in strict accordance with the manufacturer’s instructions.

Total RNA was isolated from eye stalk, hemocyte, liver, muscle, intestine, and gills of the same specimen using TRIzol reagent, according to the manufacturer’s protocol. The integrity and quality of the RNA were evaluated by 1.5% agarose gel electrophoresis, whereas the concentrations were quantified precisely using a Qubit fluorometer (Thermo Fisher Scientific, Waltham, MA, USA).

Genome sequencing

A short-read library with an insert size of 350 bp was constructed and sequenced on the Illumina Novaseq-6000 (Illumina Inc., San Diego, CA, USA) platform, generating 2 × 150 bp paired-end reads. In total, 0.08 μg gDNA per sample was used as input material for the DNA library preparations. Library preparation was performed using the NEBNext® Ultra™ DNA Library Prep Kit (New England Biolabs, Ipswich, MA, USA), in strict accordance with the Illumina second-generation sequencing protocol, resulting in 140.56 Gb of raw data (Table 1).

For PacBio sequencing, gDNA was employed to construct SMRTbell libraries and sequenced on the PacBio Sequel (PacBio, Menlo Park, CA, USA) platform, leveraging single molecule real-time (SMRT) technology. In brief, the genomic DNA was first sheared into 6–20-kb fragments using g-TUBE. Subsequently, ExoVII (New England Biolabs, Beverly, MA, USA) was used to remove single-strand overhangs, followed by DNA damage repair with the SMRTbell Express Template Preparation Kit 2.0 (PacBio). T4 DNA polymerase and T4 PNK (New England Biolabs) were used to repair the ends, making them suitable for ligating SMRTbell hairpin adapters. After ligation, EXOIII (New England Biolabs) and ExoVII (New England Biolabs) enzymes were used to remove imperfect templates, and AMPure PB beads were used for purification. Subsequently, sequencing primers were annealed to the SMRTbell templates, and polymerase was bound to the template ends using the Binding Kit (PacBio). Finally, the library was loaded onto SMRT Cells for sequencing. A total of 341.51 Gb of continuous long reads was obtained, resulting in an extensive 131-fold coverage of the P. h. homarus genome (Table 1).

For Hi-C sequencing, high molecular weight gDNA was first cross-linked and then digested using the MboI restriction enzyme (New England Biolabs). The DNA was mechanically sheared into 300–500-bp fragments following 5′ overhang biotinylation and blunt-end ligation. Finally, the Hi-C library was sequenced on the Illumina NovaSeq 6000 platform (lllumina Inc., San Diego, CA, USA) in a 2 × 150-bp paired-end strategy, yielding 364.32 Gb raw reads, with a sequencing depth of 140× (Table 1).

For RNA sequencing, libraries of six tissues were prepared using the NEBNext® UltraTM RNA Library Prep Kit (New England Biolabs) for Illumina®, with all procedures following the manufacturer’s protocol rigorously. The RNA-seq libraries were subsequently sequenced on the Illumina NovaSeq 6000 platform in a 2 × 150-bp paired-end strategy and producing 55.7 Gb of clean reads.

Genome survey and assembly strategy

Prior to genome assembly, adapter sequences and low-quality reads from short-read sequencing data were filtered using Fastp software (v0.23.1)¹⁸ with default parameters, ensuring that only high-quality clean reads were retained for downstream processes. A comprehensive genome survey was performed to ascertain essential genomic characteristics, including overall size, heterozygosity, and repeat content. This was achieved by analyzing 17 distinct K-mer frequencies using SOAPec (v2.01)¹⁹ and GenomeScope (v2.0)²⁰. Based on the analyses, the genome size of P. h. homarus was estimated to be 3,127.74 Mb, with 1.04% heterozygosity and 66.75% repetitive sequences at the dominant peak depth of 26 (Table S2, Fig. S1).

A dual-strategy utilizing two independent assembly software—Wtdbg2 (v2.5)²¹ and Flye (v2.9)²² with default parameters was employed for P. h. homarus genome assembly. The assembled genome drafts were subsequently improved using the Arrow (v8.0) polishing process²³. The assembly results generated by Wtdbg2 and Flye after initial polish were merged using Quickmerge (v0.3)²⁴. The merged assembly underwent further refinement through a comprehensive polishing process, involving two rounds of Arrow polishing followed by two rounds of Pilon polishing (v1.22)²⁵, both utilizing default parameters. PacBio subreads were employed for Arrow polishing, whereas Illumina short reads were used for Pilon polishing to ensure high sequence accuracy. The rigorous and iterative assembly process culminated in the generation of 7,135 contigs, and a total assembly length of 2,720,451,987 bp (Table 2), representing a high-quality and robust genome assembly for P. h. homarus.

Chromosome-level assembly refinement

During the Hi-C scaffolding phase, the Juicer pipeline²⁶ was used to align Fastp (v0.23.1)¹⁸ processed high-quality reads with draft genome assembly. The alignment was followed by the assembly of contigs into chromosomes, as well as to orient and sort contigs within each chromosome using the 3D-DNA pipeline²⁷. Further refinement of the assembly was achieved through manual error correction using Juicebox Assembly Tools (v2.13.06)²⁶. The rigorous scaffolding process successfully anchored 2,613.14 Mb of the genome to 73 chromosomes (Fig. 3), representing 96.05% of the total genome assembly (Table S3). The final assembly achieved a scaffold N50 of 36.69 Mb, reflecting a high level of continuity (Table 2). Remarkably, this assembly demonstrates exceptional contiguity, with 37 chromosomes containing no more than 30 gaps each (Table 3).

Table 3 Assembly statistics for chromosomes.

Full size table

Comprehensive annotation of repetitive elements and noncoding RNAs

Repetitive sequences in the P. h. homarus genome were predicted using two strategies, including de novo assembly and homology matching²⁸. Initially, the de novo-predicted repetitive sequence database was integrated with the Repbase homologous repetitive sequence database²⁹. A suite of tools—including RepeatScout (v1.0.5)²⁸, RepeatModeler (v2.0.1)³⁰, Piler (v1.0)³¹, and LTR-FINDER (v1.0.6)³²—was employed to identify transposable element (TE) families. Subsequently, classification of distinct repetitive elements was performed using RepeatMasker (v4.1.0)³⁰, RepeatProteinMask (v4.1.0), and TRF (v4.0.9)³³, aligning the P. h. homarus genome sequences against the integrated repetitive sequence database. After removing redundant entries from the various methods, repetitive sequences were observed to constitute 69.67% of the P. h. homarus genome (Table S4). Additionally, the Kimura divergence values of TEs were calculated using the ‘calcDivergenceFromalign.pl’ script³⁴, and TE landscapes were visualized with ‘createRepeatLandscape.pl’³⁵. The repeat elements identified included DNA transposons, which comprised 9.82% of the genome, long interspersed nuclear elements accounting for 37.44%, short interspersed nuclear elements representing 0.02%, and long terminal repeats making up 30.06% (Table 4 and Fig. 4).

Table 4 Classification of repetitive sequences in the P. h. homarus genome.

Full size table

Tools specialized for non-coding RNAs (ncRNAs) were used to annotate ncRNA in P. h. homarus genome. tRNA were identified using tRNAScan (v1.4)³⁶, whereas rRNA were predicated via BLAST (v2.2.26)³⁷. Additional ncRNAs, including miRNAs and snRNAs, were annotated by aligning sequences with the Rfam database³⁸ using the INFERNAL tool (v1.0)³⁹. Ultimately, four distinct classes of ncRNAs were annotated successfully, comprising 20,765 miRNAs, 3,608 tRNAs, 1,421 rRNAs, and 3,066 snRNAs (Table 5), providing valuable insight into the non-coding RNA landscape of P. h. homarus.

Table 5 Classification of non-coding RNAs in the P. h. homarus genome.

Full size table

Integrated gene structure prediction and functional annotation

A comprehensive, integrated approach combining ab initio prediction, homology prediction, and transcriptome sequencing-based prediction was used to predict the gene structure of the P. h. homarus genome. For de novo gene predication, a robust suite of tools—AUGUSTUS (v3.2.3)⁴⁰, GlimmerHMM (v3.02)⁴¹, SNAP (v2013.11.29)⁴², Geneid (v1.4)⁴³, and Genscan (v1.0)⁴⁴—was employed to predict gene structures directly from the genome sequence. For homology-based annotation, protein sequences from Drosophila melanogaster (fruit fly), Penaeus chinensis (Chinese shrimp), Eriocheir sinensis (Chinese mitten crab), Litopenaeus vannamei (Pacific white shrimp), Marsupenaeus japonicus (Kuruma shrimp), P. ornatus (ornate spiny lobster), Portunus trituberculatus (swimming crab), and H. americanus (American lobster) were retrieved from NCBI’s GenBank database and aligned to the P. h. homarus genome using BLAST (v2.2.26)³⁷ and Genewise (v2.4.1)⁴⁵. The integrated multifaceted strategy enabled comprehensive and accurate prediction of protein-coding genes, significantly advancing our understanding of the genetic architecture of P. h. homarus. A total of 8,545–178,660 homologous genes were identified for D. melanogaster, P. chinensis, E. sinensis, L. vannamei, M. japonicus, P. ornatus, P. trituberculatus, and H. americanus (Table 6). Gene length, along with CDS, exon, and intron lengths, was analyzed and compared to those of other species (Fig. 5). The mean transcript length in P. h. homarus was 31,472.77 bp, with CDS, exon, and intron lengths averaging 1,613.73 bp, 279.37 bp, and 6,251.44 bp, respectively (Table S5).

Table 6 Statistical analyses of gene structure annotation of the P. h. homarus genome.

Full size table

To further refine transcriptome assembly, two distinct methods were employed: genome-guided transcript assembly and de novo assembly using Trinity software (v2.11.0)⁴⁶. Gene structures were identified using PASA (v2.1.0)⁴⁷, and gene sets predicted through various methods were integrated into a non-redundant comprehensive gene set of 25,580 protein-coding genes using Evidence Modeler (v1.1.1)⁴⁸ (Table 7 and Fig. 6a).

Table 7 Summary of functional gene annotation in the P. h. homarus genome.

Full size table

Functional annotation of these protein-coding genes was performed using BLASTp (v2.2.26)³⁷ and Diamond (v0.8.22)⁴⁹ to align sequences against several key protein databases, including SwissProt⁵⁰, NCBI Nonredundant protein (NR), KEGG⁵¹, InterPro⁵², Gene Ontology (GO)⁵³, and Pfam⁵⁴, with an E-value cutoff of 1E-5. Protein domains and motifs were annotated using InterProScan (v5.52–86.0)⁵⁵. Among the 25,580 predicted genes, 25,575 (99.98%) were annotated to at least one database (Table 7) and 16,526 proteins (64.61%) received annotation support from across all four databases (Fig. 6b).

Data Records

We have deposited the Hi-C sequencing data (SRR30872734), Illumina sequencing data (SRR30872735), PacBio sequencing data (SRR30872736), and transcriptomic sequencing data (SRR3105790260 - SRR3105790765) in the SRA at NCBI⁵⁶.

The genome-wide shotgun project has been deposited in DDBJ/ENA/GenBank under accession number GCA_043589495.1⁵⁷, and the genome assembly along with its annotation information has been made available on Figshare⁵⁸.

Technical Validation

The quality of the P. h. homarus genome assembly was technically verified rigorously through a multifaceted evaluation. First, the genomic quality was analyzed using Benchmarking Universal Single-Copy Orthologs (BUSCO) (v5.8.0)⁵⁹, with the arthropoda_odb12 BUSCO database, to assess the presence of single-copy orthologous genes. Using tools such as tBLASTn (v2.2.26)³⁷, AUGUSTUS (v3.2.3)⁴⁰, and HMMER⁶⁰, 98.2% of gene orthologs were detected, of which 97.2% were complete and 1.0% fragmented, indicating a highly comprehensive assembly (Table S6). Second, using the Core Eukaryotic Genes Mapping Approach (CEGMA) (v2.5)⁶¹, we identified homologs for 226 highly conserved core genes in P. h. homarus, representing 92.34% (229) of the total, further supporting the completeness of the assembly (Table S7). Third, the consensus quality value and k-mer (k = 21) completeness of the assembly evaluated using Merqury software⁶² were 31.78 and 87.59%, respectively (Table S8). In addition, alignment of Illumina sequencing reads to the nuclear genome using BWA (v0.7.8)⁶³ yielded a high read mapping rate of 98.60% and a coverage rate of 94.85%, underscoring the robust integrity of the assembled genome and the consistency of the sequencing data (Table S9). Finally, to conduct genome-wide homology analysis, we used MCScanX within the JCVI toolkit (v1.1.12) (https://github.com/tanghaibao/jcvi) to perform a synteny comparison between the genomes of the P. h. homarus and P. ornatus, and visualized the macro-syntenic relationships using Circos (v0.69)⁶⁴. The results showed that 73 chromosome-level scaffolds of P. h. homarus exhibited significant synteny with the corresponding chromosomes of P. ornatus (Fig. S2). These combined results affirm the exceptional quality and completeness of the P. h. homarus genome assembly.

Code availability

In the present study, no custom code was developed. All commands and pipelines used for data processing are detailed comprehensively in the methods section. For software where specific parameters are not explicitly mentioned, we adhered to the default settings as recommended by the software developers. The core code is available at https://github.com/sundongfang/Genome-Assembly-of-Panulirus-homarus.

References

George, R. W. Tethys sea fragmentation and speciation of Panulirus spiny lobsters. Crustaceana 78, 1281–1309 (2005).
Article Google Scholar
Pancoro, A., Indriatmoko, Wijaya, D. & Karima, E. Sea surface temperature and current-related parameters affecting local adaptation of scalloped spiny lobster population in Indonesia’s archipelagic system. Sci. Rep. 14, 18403 (2024).
Article CAS PubMed PubMed Central Google Scholar
Sohana, P., Groeneveld, J. C., Al-Marzouqi, A. & Willows-Munro, S. A molecular phylogeny of the spiny lobster Panulirus homarus highlights a separately evolving lineage from the Southwest Indian Ocean. PeerJ 5, e3356 (2017).
Article Google Scholar
Lavery, S. D. et al. Evolutionary divergence of geographic subspecies within the scalloped spiny lobster Panulirus homarus (Linnaeus 1758). PloS ONE 9, e97247 (2014).
Article ADS PubMed PubMed Central Google Scholar
Phillips, B. F., Wahle, R. A., & Ward, T. J. Lobsters: Biology, Management, Aquaculture and Fisheries. pp. 1–35 (2013).
Farhadi, A., Jeffs, A. G. & Lavery, S. D. Genome-wide SNPs in the spiny lobster Panulirus homarus reveal a hybrid origin for its subspecies. BMC Genomics 23, 750 (2022).
Article CAS PubMed PubMed Central Google Scholar
Jones, C. M., Le Anh, T. & Priyambodo, B. Lobsters: Biology, Fisheries and Aquaculture. pp. 541–570 (2019).
Hung, L. V., Tuan, L. A. & Williams, K. C. Lobster seacage culture in Vietnam. Australian Centre for International Agricultural Research (ACIAR), 10–17 (2009).
Priyambodo, B., Jones, C. M. & Sammut, J. Assessment of the lobster puerulus (Panulirus homarus and Panulirus ornatus, Decapoda: Palinuridae) resource of Indonesia and its potential for sustainable harvest for aquaculture. Aquaculture 528, 735563 (2020).
Article Google Scholar
Huu, H. D. & Jones, C. M. Effects of dietary mannan oligosaccharide supplementation on juvenile spiny lobster Panulirus homarus (Palinuridae). Aquaculture 432, 258–264 (2014).
Article Google Scholar
Kulmiye, A. J. Reproductive seasonality and fecundity of the spiny lobster Panulirus homarus homarus (Linnaeus, 1758) (Decapoda: Achelata: Palinuridae) in Kenya: implications for sustainable management. J. Crust. Biol. 42, ruac028 (2022).
Article Google Scholar
Spencer, E. L., Fitzgibbon, Q. P., Day, R. D., Trotter, A. J. & Smith, G. G. Effects of acute salinity stress on the survival and haemolymph biochemistry of juvenile tropical rock lobster, Panulirus ornatus, at different moult stages. Aquaculture 573, 739597 (2023).
Article CAS Google Scholar
Chen, H. et al. Growth, health status and gut microbiota of the scalloped spiny lobster (Panulirus homarus) at different salinities. Aquaculture 562, 738779 (2023).
Article CAS Google Scholar
Ju, Z. M. et al. Cloning, expression analysis, and the photoperiodic response of CHH gene in Panulirus homarus. Aquacult. Rep. 40, 102552 (2025).
Google Scholar
Kemp, J. O. G. & Britz, P. J. The effect of temperature on the growth, survival and food consumption of the east coast rock lobster Panulirus homarus rubellus. Aquaculture 280, 227–231 (2008).
Article Google Scholar
Polinski, J. M. et al. The American lobster genome reveals insights on longevity, neural, and immune adaptations. Sci, Adv 7, eabe8290 (2021).
Article ADS CAS PubMed Google Scholar
Ren, X. et al. Chromosome-level genome of the long-tailed marine-living ornate spiny lobster, Panulirus ornatus. Sci. Data 11, 662 (2024).
Article CAS PubMed PubMed Central Google Scholar
Chen, S., Zhou, Y., Chen, Y. & Gu, J. fastp: an ultra-fast all-in-one FASTQ preprocessor. Bioinformatics 34, i884–i890 (2018).
Article PubMed PubMed Central Google Scholar
Li, R. et al. De novo assembly of human genomes with massively parallel short read sequencing. Genome Res. 20, 265–272 (2010).
Article CAS PubMed PubMed Central Google Scholar
Ranallo-Benavidez, T. R., Jaron, K. S. & Schatz, M. C. GenomeScope 2.0 and Smudgeplot for reference-free profiling of polyploid genomes. Nat. Commun. 11, 1432 (2020).
Article ADS CAS PubMed PubMed Central Google Scholar
Ruan, J. & Li, H. Fast and accurate long-read assembly with wtdbg2. Nat. Methods 17, 155–158 (2020).
Article CAS PubMed Google Scholar
Kolmogorov, M., Yuan, J., Lin, Y. & Pevzner, P. A. Assembly of long, error-prone reads using repeat graphs. Nat. Biotechnol. 37, 540–546 (2019).
Article CAS PubMed Google Scholar
Wenger, A. et al. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nat. Biotechnol. 37, 1155–1162 (2019).
Article CAS PubMed PubMed Central Google Scholar
Chakraborty, M., Baldwin-Brown, J. G., Long, A. D. & Emerson, J. J. Contiguous and accurate de novo assembly of metazoan genomes with modest long read coverage. Nucleic Acids Res. 44, e147 (2016).
PubMed PubMed Central Google Scholar
Walker, B. J. et al. Pilon: an integrated tool for comprehensive microbial variant detection and genome assembly improvement. PLoS One 9, e112963 (2014).
Article ADS PubMed PubMed Central Google Scholar
Durand, N. C. et al. Juicer provides a one-click system for analyzing loop-resolution Hi-C experiments. Cell Syst. 3, 95–98 (2016).
Article CAS PubMed PubMed Central Google Scholar
Dudchenko, O. et al. De novo assembly of the Aedes aegypti genome using Hi-C yields chromosome-length scaffolds. Science 356, 92–95 (2017).
Article ADS CAS PubMed PubMed Central Google Scholar
Price, A. L., Jones, N. C. & Pevzner, P. A. De novo identification of repeat families in large genomes. Bioinformatics 21(Suppl 1), i351–358 (2005).
Article CAS PubMed Google Scholar
Jurka, J. et al. Repbase Update, a database of eukaryotic repetitive elements. Cytogenet. Genome Res. 110, 462–467 (2005).
Article CAS PubMed Google Scholar
Tarailo-Graovac, M. & Chen, N. Using RepeatMasker to identify repetitive elements in genomic sequences. Curr. Protoc. Bioinformatics Chapter 4(1), 4.10.1–4.10.14 (2009).
PubMed Google Scholar
Edgar, R. C. & Myers, E. W. PILER: identification and classification of genomic repeats. Bioinformatics 21(Suppl 1), i152–158 (2005).
Article CAS PubMed Google Scholar
Xu, Z. & Wang, H. LTR_FINDER: an efficient tool for the prediction of full-length LTR retrotransposons. Nucleic Acids Res. 35, W265–W268 (2007).
Article PubMed PubMed Central Google Scholar
Benson, G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 27, 573–580 (1999).
Article CAS PubMed PubMed Central Google Scholar
Hubley, R. GitHub repository https://github.com/rmhubley/RepeatMasker/blob/master/util/createRepeatLandscape.pl (2023).
Rosen, J. GitHub repository https://github.com/rmhubley/RepeatMasker/blob/master/util/calcDivergenceFromAlign.pl (2020).
Chan, P. P., Lin, B. Y., Mak, A. J. & Lowe, T. M. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 49, 9077–9096 (2021).
Article CAS PubMed PubMed Central Google Scholar
Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. Basic local alignment search tool. J. Mol. Biol. 215, 403–410 (1990).
Article CAS PubMed Google Scholar
Griffiths-Jones, S. et al. Rfam: annotating non-coding RNAs in complete genomes. Nucleic Acids Res. 33, D121–D124 (2005).
Article CAS PubMed Google Scholar
Nawrocki, E. P., Kolbe, D. L. & Eddy, S. R. Infernal 1.0: inference of RNA alignments. Bioinformatics 25, 1335–1337 (2009).
Article CAS PubMed PubMed Central Google Scholar
Stanke, M. et al. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res 34, W435–W439 (2006).
Article CAS PubMed PubMed Central Google Scholar
Majoros, W. H., Pertea, M. & Salzberg, S. L. TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics 20, 2878–2879 (2004).
Article CAS PubMed Google Scholar
Korf, I. Gene finding in novel genomes. BMC Bioinformatics 5, 59 (2004).
Article PubMed PubMed Central Google Scholar
Alioto, T., Blanco, E., Parra, G. & Guigo, R. Using geneid to identify genes. Curr. Protoc. Bioinformatics 64, e56 (2018).
Article PubMed Google Scholar
Burge, C. & Karlin, S. Prediction of complete gene structures in human genomic DNA. J. Mol. Biol. 268, 78–94 (1997).
Article CAS PubMed Google Scholar
Birney, E., Clamp, M. & Durbin, R. GeneWise and Genomewise. Genome Res. 14, 988–995 (2004).
Article CAS PubMed PubMed Central Google Scholar
Grabherr, M. G. et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol. 29, 644–652 (2011).
Article CAS PubMed PubMed Central Google Scholar
Haas, B. J. et al. Improving the arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Res. 31, 5654–5666 (2003).
Article CAS PubMed PubMed Central Google Scholar
Haas, B. J. et al. Automated eukaryotic gene structure annotation using EVidenceModeler and the Program to Assemble Spliced Alignments. Genome. Biol. 9, R7 (2008).
Article PubMed PubMed Central Google Scholar
Buchfink, B., Xie, C. & Huson, D. H. Fast and sensitive protein alignment using DIAMOND. Nat. Methods 12, 59–60 (2015).
Article CAS PubMed Google Scholar
Bairoch, A. & Apweiler, R. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res. 28, 45–48 (2000).
Article CAS PubMed PubMed Central Google Scholar
Kanehisa, M., Sato, Y., Kawashima, M., Furumichi, M. & Tanabe, M. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 44, D457–D462 (2016).
Article CAS PubMed Google Scholar
Finn, R. D. et al. InterPro in 2017-beyond protein family and domain annotations. Nucleic Acids Res. 45, D190–D199 (2017).
Article CAS PubMed Google Scholar
Ashburner, M. et al. Gene ontology: tool for the unification of biology. Nat. Genet. 25, 25–29 (2000).
Article CAS PubMed PubMed Central Google Scholar
Mistry, J. et al. Pfam: the protein families database in 2021. Nucleic Acids Res. 49, D412–D419 (2014).
Article Google Scholar
Mulder, N. & Apweiler, R. InterPro and InterProScan: tools for protein sequence classification and comparison. Methods Mol. Biol. 396, 59–70, https://doi.org/10.1007/978-1-59745-515-2_5 (2007).
Article CAS PubMed Google Scholar
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP536394 (2024).
NCBI GenBank https://identifiers.org/ncbi/insdc.gca:GCA_043589495.1.
Sun, D. F. et al. Chromosome-level genome assembly of the scalloped spiny lobster Panulirus homarus. figshare. https://doi.org/10.6084/m9.figshare.28465208.v1 (2025).
Simao, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).
Article CAS PubMed Google Scholar
Potter, S. et al. HMMER web server: 2018 update. Nucleic Acids Res. 46, W200–W204 (2018).
Article CAS PubMed PubMed Central Google Scholar
Parra, G., Bradnam, K. & Korf, I. CEGMA: a pipeline to accurately annotate core genes in eukaryotic genomes. Bioinformatics 23, 1061–1067 (2007).
Article CAS PubMed Google Scholar
Rhie, A. et al. Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies. Genome Biol. 21, 245 (2020).
Article CAS PubMed PubMed Central Google Scholar
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Article CAS PubMed PubMed Central Google Scholar
Martin, K. et al. Circos: an information aesthetic for comparative genomics. Genome Res. 19, 1639–1645 (2009).
Article Google Scholar

Download references

Acknowledgements

This work was supported by grants from the Key R&D Program of Shandong Province, China [grant number 2024CXPT071-3]; the Project First National Survey of Aquaculture Germplasm Resources in the Yellow and Bohai Seas [grant number17210247]; the China Agriculture Research System of MOF and MARA [grant number CARS-48) and the Central Public-interest Scientific Institution Basal Research Fund, CAFS [grant number 2023TD50].

Statement of ethics

Animal experiments were conducted in accordance with relevant national and international guidelines and were approved by the Yellow Sea Fisheries Research Institute, Shandong, China. No specific permits were required for the collection of wild lobster from seawater in China. This study did not involve endangered or protected species.

Author information

Authors and Affiliations

National Key Laboratory of Mariculture Biobreeding and Sustainable Goods, Yellow Sea Fisheries Research Institute, Chinese Academy of Fishery Sciences, Qingdao, Shandong, 266071, China
Dongfang Sun, Jianjian Lv, Baoquan Gao, Shaoting Jia, Ping Liu, Jian Li, Jitao Li & Xianyun Ren
Laboratory for Marine Fisheries Science and Food Production Processes, Laoshan Laboratory, Qingdao, Shandong, 266237, China
Jianjian Lv, Baoquan Gao, Ping Liu, Jian Li, Jitao Li & Xianyun Ren

Authors

Dongfang Sun
View author publications
Search author on:PubMed Google Scholar
Jianjian Lv
View author publications
Search author on:PubMed Google Scholar
Baoquan Gao
View author publications
Search author on:PubMed Google Scholar
Shaoting Jia
View author publications
Search author on:PubMed Google Scholar
Ping Liu
View author publications
Search author on:PubMed Google Scholar
Jian Li
View author publications
Search author on:PubMed Google Scholar
Jitao Li
View author publications
Search author on:PubMed Google Scholar
Xianyun Ren
View author publications
Search author on:PubMed Google Scholar

Contributions

X.Y.R. and J.T.L. conceived the study and supervised the project. D.F.S. collected the sample and wrote the manuscript. J.J.L. performed the data analysis and data uploading. B.Q.G. and J.L. supervised this work and assisted in data analysis. S.T.J., P.L. collected the samples. All authors contributed to the final manuscript editing.

Corresponding authors

Correspondence to Jitao Li or Xianyun Ren.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary figure and table Legends

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Sun, D., Lv, J., Gao, B. et al. Chromosome-level genome assembly of scalloped spiny lobster Panulirus homarus homarus. Sci Data 12, 900 (2025). https://doi.org/10.1038/s41597-025-05253-9

Download citation

Received: 07 January 2025
Accepted: 21 May 2025
Published: 28 May 2025
Version of record: 28 May 2025
DOI: https://doi.org/10.1038/s41597-025-05253-9