Background & Summary

The genus Meretrix is a commercially significant marine bivalve widely distributed across the warm coastal waters of East and Southeast Asia [1]. It is particularly abundant along the southern Taiwan coastline, where it has become one of the most economically valuable species in aquaculture [2]. Meretrix thrives at water temperatures of 25 °C to 33 °C; growth slows significantly below 20 °C, and mass mortality occurs when temperatures exceed 45 °C. It also prefers salinities between 16 and 35 ppt, and extreme salinity fluctuations adversely affect its survival and development [3]. Because of this environmental sensitivity, Meretrix aquaculture has recently suffered from slowed growth and mass mortality linked to climate change, which has contributed directly to the sharp decline in production in Taiwan. Historically, a single hectare of culture area could yield up to 18 metric tons, but current yields have fallen to as little as 0.6 metric tons per hectare [4]. Beyond environmental degradation and climate change, other factors contributing to this decline include disease outbreaks, improper aquaculture management, and genetic deterioration due to inbreeding [5].

Despite the economic and ecological significance of Meretrix, genomic resources for this genus remain scarce. To date, only the genome of M. petechialis has been published [6], and the morphological similarity among Meretrix species complicates accurate classification and genetic studies. A high-quality reference genome is essential for understanding the genetic basis of adaptive evolution, population dynamics, and potential genetic vulnerabilities within Meretrix. Genomic data could also shed light on the mechanisms underlying disease resistance, stress tolerance, and reproductive strategies, all of which are critical for the sustainable management and conservation of these species. Meretrix species are commonly found in the coastal and estuarine areas of Taiwan, but these two habitats differ markedly: coastal waters typically maintain high salinity (32 to 35 psu), whereas estuarine salinity fluctuates widely, potentially from 0.5 to 35 psu. We therefore collected Meretrix samples from both environments: Meretrix sp. MF1 from open coastal waters (Anping, Tainan) and Meretrix sp. MT1 from an estuary (Cigu, Tainan). Our previous study showed that the Meretrix lamarckii clade comprises two distinct groups, one containing samples collected in Japan and the other containing samples from Taiwan, suggesting that M. lamarckii from Taiwan and M. lamarckii from Japan are distinct species [7]. We therefore selected Meretrix sp. MF1, a potentially novel species, for high-quality chromosome-level genome assembly. Because no reference genome is currently available for M. lamarckii, M. lamarckii JML1 from Japan was also selected for genome assembly. In addition, Meretrix sp. MT1, MT2, and MT3, collected in Taiwan, formed a distinct clade most closely related to M. lusoria from China, and Meretrix sp. MT1 was selected for genome assembly.

Here, we present a chromosome-level genome assembly of Meretrix sp. MF1 generated by combining Illumina short-read sequencing, Nanopore long-read sequencing, and Hi-C chromatin conformation capture. For Meretrix sp. MF1, we generated 51.1 Gb of Illumina data, 80.02 Gb of Nanopore data, and 46.48 Gb of Hi-C data. The final assembly comprises 19 chromosomes with a total length of approximately 883.3 Mb and a scaffold N50 of 46.87 Mb. Using reference-guided scaffolding, we also assembled the genomes of two additional Meretrix taxa, Meretrix sp. MT1 and M. lamarckii JML1. For Meretrix sp. MT1, we obtained 56.6 Gb of Illumina data and 66.79 Gb of Nanopore data, yielding 19 chromosomes with a total length of 944.74 Mb. For M. lamarckii JML1, we obtained 42.6 Gb of Illumina data and 88.91 Gb of Nanopore data, yielding 19 chromosomes with a total length of 883.07 Mb. Meretrix sp. MF1 has historically been regarded as conspecific with M. lamarckii because their external morphology is indistinguishable. However, our preliminary analyses of the mitochondrial COI gene revealed distinct genetic differences between the two. To further explore the relationships among these taxa, we conducted comparative genomic analyses and calculated average nucleotide identity (ANI). Meretrix sp. MF1 and M. lamarckii JML1 show clear genomic divergence, with an ANI of 94.33%, and divergence times estimated from metazoan orthologous genes also support their separation. These lines of evidence consistently indicate that Meretrix sp. MF1 is a cryptic species within the genus Meretrix and should not be considered conspecific with M. lamarckii. We therefore consider Meretrix sp. MF1 to be a novel species, distinct from M. lamarckii; however, its formal taxonomic status awaits further morphological and taxonomic investigation.

The high-quality reference genome presented here provides a foundation for future research on Meretrix population genomics, adaptive evolution, and genetic diversity, and will facilitate studies of gene function and efforts to improve aquaculture. Our findings also highlight the value of genomic resources for identifying cryptic species and understanding evolutionary processes. These genomic data will help researchers and aquaculture practitioners develop targeted breeding programs and genetic management strategies, ultimately enhancing the resilience and productivity of Meretrix populations in the face of environmental challenges.

Methods

Sampling and nucleic acid extraction

Samples of Meretrix sp. MF1 were collected from the coastal waters of southern Taiwan (Anping, Tainan), and Meretrix sp. MT1 was obtained from the estuarine region of southern Taiwan (Cigu, Tainan). M. lamarckii JML1, originating from an aquaculture farm in Chiba, Japan, was purchased from GOURMET HUNTER CO., LTD., a Taiwan-based international trading company specializing in aquatic products. Genomic DNA was extracted from 25 mg of muscle tissue using the Nanobind® PanDNA Kit (PacBio, USA) following the ‘Extracting DNA from animal tissue using the Nanobind® PanDNA kit’ protocol and stored at −80 °C to preserve its integrity. DNA quality was assessed by 1.0% agarose gel electrophoresis, fluorometric quantification on a Qubit™ 4 Fluorometer (Thermo Fisher Scientific, USA) with Qubit™ dsDNA BR Assay Kits (Thermo Fisher Scientific, USA), and spectrophotometric analysis on a NanoDrop™ One Microvolume UV-Vis Spectrophotometer (Thermo Fisher Scientific, USA).

Phylogenetic analysis of Meretrix species

A total of 33 cytochrome c oxidase subunit I (COXI) sequences from Meretrix species were selected for phylogenetic analysis: 27 retrieved from the NCBI database (M. lamarckii, M. lusoria, M. lyrata, M. meretrix, and M. petechialis) and six generated in this study (M. lamarckii JML1 and JML2, and Meretrix sp. MF1, MT1, MT2, and MT3). A neighbor-joining tree was constructed in MEGA version 11.0.13 [8] using the Tamura-Nei model with 1,000 bootstrap replicates.
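For readers unfamiliar with the substitution model, the Tamura-Nei (TN93) distance underlying the neighbor-joining tree is computed from the proportions of A/G transitions (P1), C/T transitions (P2), and transversions (Q) between two aligned sequences, weighted by base frequencies. The Python sketch below is illustrative only and does not reproduce how MEGA computes distances internally. Pooling base frequencies from the two sequences and skipping gaps and ambiguous sites are simplifying assumptions.

```python
import math
from collections import Counter

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def tn93_distance(p1, p2, q, freqs):
    """Tamura-Nei (1993) distance from observed site proportions.

    p1: proportion of sites differing by an A/G transition
    p2: proportion of sites differing by a C/T transition
    q:  proportion of sites differing by a transversion
    freqs: base frequencies {"A": .., "C": .., "G": .., "T": ..}, all > 0
    """
    pa, pc, pg, pt = (freqs[b] for b in "ACGT")
    pr, py = pa + pg, pc + pt  # purine and pyrimidine frequencies
    purine_term = -(2 * pa * pg / pr) * math.log(
        1 - pr * p1 / (2 * pa * pg) - q / (2 * pr))
    pyrimidine_term = -(2 * pc * pt / py) * math.log(
        1 - py * p2 / (2 * pc * pt) - q / (2 * py))
    coef = 2 * (pr * py - pa * pg * py / pr - pc * pt * pr / py)
    transversion_term = -coef * math.log(1 - q / (2 * pr * py))
    return purine_term + pyrimidine_term + transversion_term


def tn93_from_alignment(seq1, seq2):
    """TN93 distance between two aligned sequences (gaps/ambiguous bases skipped)."""
    pairs = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]
    n = len(pairs)
    # Assumption: base frequencies pooled from both sequences at compared sites.
    counts = Counter(base for pair in pairs for base in pair)
    freqs = {b: counts[b] / (2 * n) for b in "ACGT"}
    ag = sum(1 for a, b in pairs if a != b and {a, b} == PURINES)
    ct = sum(1 for a, b in pairs if a != b and {a, b} == PYRIMIDINES)
    tv = sum(1 for a, b in pairs if (a in PURINES) != (b in PURINES))
    return tn93_distance(ag / n, ct / n, tv / n, freqs)
```

With equal base frequencies and P1 = P2, the formula reduces to the Kimura two-parameter distance, which provides a convenient sanity check.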

Library preparation and sequencing

Genomic DNA was purified using AMPure XP Reagent (Beckman Coulter, USA) following the manufacturer’s protocol, and each purified sample was quantified using the Qubit™ 4 Fluorometer with Qubit™ dsDNA BR Assay Kits. Nanopore sequencing libraries were prepared using the SQK-LSK110 Ligation Sequencing Kit (Oxford Nanopore Technologies, UK) according to the manufacturer’s protocol. A 150 µL aliquot of each library was loaded onto FLO-PRO002 (R9.4.1) flow cells (Oxford Nanopore Technologies, UK) on a PromethION 2 Solo (Oxford Nanopore Technologies, UK) and sequenced for approximately 120 h. Reads were basecalled using Dorado version 0.7.0 (https://github.com/nanoporetech/dorado) with the super-accurate (SUP) model, yielding 80.02 Gb of data comprising 6.76 M high-quality reads for Meretrix sp. MF1 (Table 1). The corresponding data for Meretrix sp. MT1 and M. lamarckii JML1 are also summarized in Table 1. Illumina sequencing libraries were constructed using the TruSeq® Nano DNA Library Prep Kit (Illumina, USA) following the manufacturer’s guidelines. Genomic DNA was fragmented to approximately 350 bp by sonication, purified with Sample Purification Beads (Illumina, USA), and sequenced on the NovaSeq X Plus System (Illumina, USA) to produce 150 bp paired-end reads. The raw Illumina reads, averaging 50.1 Gb per sample, were quality-controlled using fastp version 0.23.4 [9] (Table 1). For chromosome-level assembly, the Hi-C library was constructed using the Dovetail® Omni-C® Kit (Cantata Bio, USA) following the manufacturer’s protocol. Library quality was assessed using a Qsep 100 Bio-Fragment Analyzer (BiOptic, Taiwan) with an S2 Standard Cartridge Kit (BiOptic, Taiwan) and a Qubit™ 4 Fluorometer with Qubit™ dsDNA HS Assay Kits (Thermo Fisher Scientific, USA). The library was then sequenced on the NovaSeq X Plus System, generating 150 bp paired-end reads and yielding 46.48 Gb of data from 309.86 M reads (Table 1).
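As a rough check on sequencing effort, nominal depth is total yield divided by genome size. The sketch below assumes that the 883.3 Mb Meretrix sp. MF1 assembly length can stand in for the genome size (an assumption, since no independent genome-size estimate is given here). It uses the MF1 yields reported in this article.

```python
def mean_depth(yield_gb, genome_mb):
    """Nominal depth = total sequenced bases / genome size."""
    return (yield_gb * 1e9) / (genome_mb * 1e6)


# Assumption: the 883.3 Mb MF1 assembly length stands in for the genome size.
MF1_GENOME_MB = 883.3
mf1_yields_gb = {"Illumina": 51.1, "Nanopore": 80.02}

depths = {platform: mean_depth(gb, MF1_GENOME_MB) for platform, gb in mf1_yields_gb.items()}
for platform, depth in depths.items():
    print(f"{platform}: ~{depth:.1f}x")

# Mean Nanopore read length implied by 80.02 Gb over 6.76 M reads.
mean_read_kb = 80.02e9 / 6.76e6 / 1e3
print(f"Mean Nanopore read length: ~{mean_read_kb:.1f} kb")
```

With these inputs, the sketch gives roughly 58x Illumina and 91x Nanopore depth, and a mean Nanopore read length of about 11.8 kb.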

Table 1 Statistics for the sequencing data of the Meretrix genome.

Genome assembly and scaffolding

The general workflow of this study is illustrated in Fig. 1. Draft genomes of Meretrix sp. MF1 and Meretrix sp. MT1 were assembled from Nanopore data using NextDenovo version 2.5.2 [10]. Because the Nanopore reads of M. lamarckii JML1 were shorter, its genome was assembled using MaSuRCA version 4.1.2 [11]. The Nanopore data were then quality-filtered with NanoFilt version 2.8.0 [12] at a minimum quality of Q12. Next, the draft assemblies were polished with NextPolish version 1.4.1 [13] using both Nanopore and Illumina data, followed by Purge_Dups version 1.2.6 [14] to remove redundant sequences. Hi-C data were used to construct the chromosome-level assembly of Meretrix sp. MF1. The reads were quality-controlled with fastp version 0.23.4 [9], aligned and preprocessed with Chromap version 0.2.7 [15], and scaffolded into chromosome-level scaffolds with YaHS version 1.2.2 [16]. Juicer tools version 2.20.00 [17] was then used to build the Hi-C contact matrix and contact map. The resulting chromosome-level assembly of Meretrix sp. MF1 has a total length of 883.3 Mb, a longest scaffold of 59.29 Mb, an N50 of 46.87 Mb, and an L90 of 17 (Table 2). The Hi-C map (Fig. 2A) revealed 19 chromosome-scale scaffolds, which together account for 99.54% of the total assembly length. Chromosome sizes range from 28.62 Mb to 59.29 Mb, with an average of 46.27 Mb (Table 3). The genome was further visualized using TBtools-II version 2.156 [18] (Fig. 2B). The genomes of Meretrix sp. MT1 and M. lamarckii JML1 were scaffolded with RagTag version 2.1.0 [19], using M. petechialis (GCA_046203225.1) as the reference for Meretrix sp. MT1 and Meretrix sp. MF1 as the reference for M. lamarckii JML1. Redundant sequences were then removed with Purge_Dups version 1.2.6 [14], and NextPolish version 1.4.1 [13] was applied for a final round of polishing.
The final assembly details for all three species are summarized in Table 3.
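N50 and L90, as reported in Table 2, are both derived from the same cumulative sum over scaffolds sorted from longest to shortest. The sketch below uses hypothetical toy lengths rather than the Meretrix assemblies. It also reproduces the arithmetic linking the 19-chromosome mean length to the 99.54% anchoring rate.

```python
def nx_lx(lengths, x):
    """Return (Nx, Lx) for sequence lengths.

    Sequences are sorted longest first; Nx is the length of the sequence at which the
    cumulative length first reaches x% of the total, and Lx is how many sequences that takes.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no sequence lengths given")
    target = sum(ordered) * x / 100
    running = 0
    for count, length in enumerate(ordered, start=1):
        running += length
        if running >= target:
            return length, count
    return ordered[-1], len(ordered)  # only reached through floating-point rounding


# Hypothetical scaffold lengths in Mb (not the Meretrix assemblies).
toy_scaffolds = [60, 50, 40, 30, 10, 5, 3, 2]
n50, l50 = nx_lx(toy_scaffolds, 50)
n90, l90 = nx_lx(toy_scaffolds, 90)
print(f"N50 = {n50} Mb (L50 = {l50}); N90 = {n90} Mb (L90 = {l90})")

# Arithmetic check on the reported MF1 values: 19 chromosomes averaging 46.27 Mb
# out of an 883.3 Mb assembly.
anchored_fraction = 19 * 46.27 / 883.3
print(f"Fraction in chromosomes: {anchored_fraction:.2%}")
```

The chromosome-fraction line gives about 99.53%, which matches the reported 99.54% to within the rounding of the 46.27 Mb mean.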

Fig. 1
figure 1

Schematic overview of the general workflow.

Table 2 The assembly statistics of Meretrix genome.
Fig. 2
figure 2

Characteristics of Meretrix sp. MF1 genome assembly. (A) Hi-C heatmap of chromosomal interactions in the Meretrix sp. MF1 genome. (B) A circos plot of the Meretrix sp. MF1 genome, with tracks from innermost to outermost as follows: (a) Numbers and sizes of Meretrix sp. MF1 chromosomes; (b) Scatter plot of N ratio; (c) Line plot of GC skew; (d) Heatmap of gene density; (e) Bar plot of GC ratio.

Table 3 Lengths (bp) of the 19 chromosomes in the Meretrix genome assemblies.

Mitochondrial genome assembly

The mitochondrial genome was assembled from Illumina data with MitoZ version 3.6 [20], which was also used for mitochondrial annotation. To ensure accuracy, the assembled mitochondrial genome was compared against the nuclear genome using BLAST+ version 2.16.0 [21], and the verified mitochondrial sequence was incorporated into the final genome assembly. Notably, the mitogenomes of Meretrix sp. MF1 and M. lamarckii JML1 both matched most closely to M. lamarckii, but to different GenBank records. Meretrix sp. MF1 showed the highest similarity to NC_016174.1, whereas M. lamarckii JML1 showed the highest similarity to KP244451.1. In addition, the mitogenome of Meretrix sp. MF1 contains an extra tRNA-Leu relative to M. lamarckii JML1, which may further indicate distinct species status. Meretrix sp. MT1 was most closely related to Meretrix lusoria (NC_014809.1). All assembled mitochondrial genomes are summarized in Table 4.
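Identifying the closest GenBank match, as described above, amounts to taking the top-scoring hit per query from BLAST+ tabular output. The following sketch assumes output written with `-outfmt 6` and its default 12 columns, ranks hits by bit score, and uses invented example rows. It is not the authors' script.

```python
import csv
import io

# Default column order of BLAST+ -outfmt 6 (names here are descriptive labels).
OUTFMT6_COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                   "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle):
    """Return the highest-bitscore hit per query from BLAST+ tabular output."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        hit = dict(zip(OUTFMT6_COLUMNS, row))
        hit["bitscore"] = float(hit["bitscore"])
        hit["pident"] = float(hit["pident"])
        current = best.get(hit["qseqid"])
        if current is None or hit["bitscore"] > current["bitscore"]:
            best[hit["qseqid"]] = hit
    return best


# Invented rows for illustration only (not results from this study).
toy = ("q1\tsubjectA\t99.1\t16000\t140\t2\t1\t16000\t1\t16000\t0.0\t28000\n"
       "q1\tsubjectB\t97.5\t15800\t390\t5\t1\t15800\t1\t15800\t0.0\t26500\n")
best = best_hits(io.StringIO(toy))
print(best["q1"]["sseqid"], best["q1"]["pident"])
```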

Table 4 Summary statistics of the Meretrix mitochondrial genome.

Repetitive sequence identification

RepeatModeler version 2.0.5 [22] and RepeatMasker version 4.1.5 [23] were used to identify transposable elements (TEs) de novo and to classify repetitive and low-complexity sequences in the Meretrix genome assemblies (Table 5). Repetitive elements made up 41.57%, 41.75%, and 40.35% of the Meretrix sp. MF1, Meretrix sp. MT1, and M. lamarckii JML1 genomes, respectively, with unclassified repeats accounting for 32.30%, 32.40%, and 31.36%. Retroelements (Class I) constituted 6.99%, 6.55%, and 6.64% of the genomes, respectively, and DNA transposons (Class II) 1.98%, 1.59%, and 1.77%. The similar repeat content and composition across the three Meretrix genomes suggest conserved genome organization and repetitive element dynamics within the genus.
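The genome fraction reported for each repeat class is essentially the number of bases covered by the union of annotated intervals divided by the assembly length, with overlapping annotations counted once. The following minimal sketch assumes that repeat coordinates have already been parsed (for example, from the RepeatMasker .out table) into half-open (start, end) intervals per sequence. The parsing step and the toy coordinates are hypothetical.

```python
def merged_coverage(intervals):
    """Total bases covered by half-open intervals [start, end), counting overlaps once."""
    total = 0
    cur_start = cur_end = None
    for start, end in sorted(intervals):
        if cur_end is None or start > cur_end:
            # Disjoint from the running block: close it and start a new one.
            if cur_end is not None:
                total += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the running block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start
    return total


def repeat_fraction(annotations, genome_length):
    """Fraction of the genome covered, given {sequence_id: [(start, end), ...]}."""
    masked = sum(merged_coverage(iv) for iv in annotations.values())
    return masked / genome_length


# Hypothetical repeat intervals on a toy 1,000 bp genome.
toy_repeats = {"chr1": [(0, 100), (50, 150), (300, 400)], "chr2": [(10, 20)]}
print(f"{repeat_fraction(toy_repeats, 1000):.1%}")
```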

Table 5 Repetitive element composition of the Meretrix genome assemblies.

Gene prediction and functional annotation

Gene prediction was performed on the repeat-soft-masked genome produced with RepeatMasker version 4.1.5 [23], using BRAKER version 3.0.8 [24] with protein evidence from the Metazoa dataset of OrthoDB version 12 [25]. For Meretrix sp. MF1, BRAKER initially predicted 45,263 genes and 49,050 transcripts. To reduce over-prediction, we used the selectSupportedSubsets.py script from the BRAKER package. This script classifies predicted genes into three confidence categories: fully supported by hints (highest confidence), partially supported by hints, and not supported by hints (lowest confidence, purely computational). Filtering transcripts by hint support in this way retained 32,329 transcripts. TE-related transcripts were then identified with TEsorter version 1.2.7 [26] and excluded, yielding a final set of 30,417 transcripts. Functional annotation was performed with EggNOG-mapper version 2.1.12 [27] and InterProScan version 5.73–104.0 [28,29] to identify protein homologs, drawing on six resources: eggNOG, Gene Ontology (GO), Kyoto Encyclopedia of Genes and Genomes (KEGG), InterPro, Protein ANalysis THrough Evolutionary Relationships (PANTHER), and Pfam. In total, 25,531 genes were annotated with functional information from at least one of these databases. Comprehensive gene annotation statistics for the Meretrix genomes are provided in Supplementary Table 1.
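The three-tier classification described above can be illustrated as follows. This is a deliberately simplified, hypothetical example that does not reproduce selectSupportedSubsets.py. The fields `n_checked` and `n_supported`, as well as the choice to retain both fully and partially supported transcripts, are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class PredictedTranscript:
    transcript_id: str
    n_checked: int    # hypothetical: gene-structure features checked for evidence
    n_supported: int  # hypothetical: how many of those features overlap a hint


def support_category(tx: PredictedTranscript) -> str:
    """Assign one of the three confidence tiers described in the text."""
    if tx.n_supported == 0:
        return "not_supported"
    if tx.n_supported >= tx.n_checked:
        return "fully_supported"
    return "partially_supported"


def filter_by_support(transcripts, keep=("fully_supported", "partially_supported")):
    """Keep transcripts whose tier is in `keep` (the retention policy is an assumption)."""
    return [tx for tx in transcripts if support_category(tx) in keep]


toy = [
    PredictedTranscript("g1.t1", n_checked=6, n_supported=6),
    PredictedTranscript("g2.t1", n_checked=4, n_supported=1),
    PredictedTranscript("g3.t1", n_checked=3, n_supported=0),
]
kept = filter_by_support(toy)
print([tx.transcript_id for tx in kept])
```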

Genomic similarity comparison and evolutionary analysis

FastANI version 1.34 [30] was used to calculate ANI among the genomes of Meretrix sp. MF1, Meretrix sp. MT1, M. lamarckii JML1, and M. petechialis. The ANI between Meretrix sp. MF1 and M. lamarckii JML1 was 94.33% (other comparisons are provided in Supplementary Table 2), suggesting that Meretrix sp. MF1 may represent a novel species in Taiwan. We propose the name M. formosana. To explore evolutionary relationships, BUSCO version 5.8.3 [31] was used to extract conserved metazoan orthologous genes from 11 Veneridae genomes: Callista chione, Cyclina sinensis [32], Mercenaria mercenaria [33], M. lamarckii JML1, M. petechialis [6], Meretrix sp. MF1, Meretrix sp. MT1, Mysia undata, Ruditapes philippinarum [32], Saxidomus purpurata [34], and Venus verrucosa (Supplementary Table 3). Multiple sequence alignment was performed with MUSCLE version 5.3 [35], and the alignments were trimmed with trimAl version 1.5.0 [36] to generate the supermatrix. A phylogenetic tree was constructed from the concatenated alignment using IQ-TREE version 1.6.12 [37], incorporating divergence-time estimates from the TimeTree database [38] (accessed on Feb. 10, 2025). These estimates included 194 million years for the split between Mercenaria mercenaria and V. verrucosa and 171 million years for the split between V. verrucosa and R. philippinarum. The final tree was visualized with MEGA version 11.0.13 [8], with Mercenaria mercenaria as the outgroup (Fig. 3). Genome-wide collinearity among M. lamarckii JML1, Meretrix sp. MF1, Meretrix sp. MT1, and M. petechialis was analyzed with MCScanX version 1.0.0 [39] and visualized with the ChiPlot website (https://www.chiplot.online) (Fig. 4).
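FastANI writes one tab-separated line per comparison with five columns: query genome, reference genome, ANI, number of bidirectional fragment mappings, and total query fragments. Because ANI is not strictly symmetric, a pairwise table such as Supplementary Table 2 can be built by averaging the two directions. The sketch below shows that step. The averaging choice, the genome labels, and their values are hypothetical.

```python
import io
from collections import defaultdict


def ani_table(handle):
    """Symmetric pairwise ANI from FastANI tabular output.

    Expected columns per line: query, reference, ANI (%), bidirectional fragment
    mappings, total query fragments. Assumption: when both directions are present,
    the two values are averaged.
    """
    values = defaultdict(list)
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue
        query, reference, ani = fields[0], fields[1], float(fields[2])
        if query == reference:
            continue  # skip self-comparisons
        values[frozenset((query, reference))].append(ani)
    return {pair: sum(v) / len(v) for pair, v in values.items()}


# Hypothetical FastANI output for genomes A, B and C (values invented for illustration).
example = io.StringIO(
    "A.fa\tB.fa\t94.0\t250000\t290000\n"
    "B.fa\tA.fa\t94.6\t248000\t288000\n"
    "A.fa\tC.fa\t88.2\t120000\t290000\n"
)
table = ani_table(example)
for pair, ani in sorted(table.items(), key=lambda item: -item[1]):
    print(" vs ".join(sorted(pair)), f"{ani:.2f}%")
```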

Fig. 3
figure 3

Estimated divergence times among Meretrix species inferred from metazoan orthologous genes. Phylogenetic tree of 11 mollusk species, rooted with Mercenaria mercenaria as the outgroup. Bootstrap values are shown in red next to each node. Divergence-time estimates from the TimeTree database are shown in blue. Estimated divergence times between species pairs are listed next to each node. Mya: million years ago.

Fig. 4
figure 4

Whole genome synteny and collinearity among Meretrix species. This figure displays the genome-wide collinearity among M. lamarckii JML1, Meretrix sp. MF1, Meretrix sp. MT1, and M. petechialis. Each block represents a distinct chromosome, and lines of the same color connect and highlight regions of collinearity between species.

Data Records

All raw sequencing data have been deposited in NCBI under BioProject accession number PRJNA1227740 [40].

The Illumina data were deposited in the NCBI Sequence Read Archive under accession numbers SRR32575144, SRR32575146, and SRR32575149 [41].

The Nanopore data were deposited in the NCBI Sequence Read Archive under accession numbers SRR32575145, SRR32575147, and SRR32575150 [41].

The Hi-C data were deposited in the NCBI Sequence Read Archive under accession number SRR32575148 [41].

The assembled genomes were deposited in GenBank under accession numbers GCA_049244355 [42], GCA_049244365 [43], and GCA_049244375 [44].

The mitochondrial genome assemblies are available under accession numbers PV383170 [45], PV383171 [46], and PV383172 [47].

Genome annotation files are available on Figshare [48].

Technical Validation

Genome assembly and annotation completeness evaluation

To assess the completeness and accuracy of the assembled genomes, we used multiple quality assessment tools. First, BUSCO version 5.8.3 [31] with the mollusca_odb12 lineage dataset was used to evaluate genome completeness. The Meretrix sp. MF1 genome contained 4,264 (96.4%) complete single-copy orthologs, compared with 4,116 (93.1%) in Meretrix sp. MT1 and 4,095 (92.6%) in M. lamarckii JML1. Completeness was therefore at least 92.6% for all three assemblies, indicating high-quality, near-complete genomes (Table 6). BUSCO was then run with the mollusca_odb12 dataset on the predicted proteins to assess annotation completeness. For the Meretrix sp. MF1 predicted proteins, 4,017 (90.9%) single-copy orthologs were complete, compared with 3,752 (84.9%) for Meretrix sp. MT1 and 3,266 (73.9%) for M. lamarckii JML1 (Supplementary Table 4).
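BUSCO percentages are calculated as counts divided by the number of BUSCO groups in the lineage dataset. All reported counts and percentages are consistent with a mollusca_odb12 set of 4,421 groups. That total is back-calculated here and should be treated as an assumption, since it is not stated in the text.

```python
def busco_percent(count, n_groups):
    """Percentage of BUSCO groups, rounded to one decimal as in BUSCO summaries."""
    return round(100 * count / n_groups, 1)


# Assumption: 4,421 groups in mollusca_odb12, back-calculated from the reported values.
N_GROUPS = 4421
reported = {
    "MF1 genome": (4264, 96.4),
    "MT1 genome": (4116, 93.1),
    "JML1 genome": (4095, 92.6),
    "MF1 proteins": (4017, 90.9),
    "MT1 proteins": (3752, 84.9),
    "JML1 proteins": (3266, 73.9),
}

computed = {label: busco_percent(count, N_GROUPS) for label, (count, _) in reported.items()}
for label, (count, pct) in reported.items():
    print(f"{label}: {count}/{N_GROUPS} = {computed[label]}% (reported {pct}%)")
```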

Table 6 Results of BUSCO completeness assessment for the Meretrix genome assembly.

Next, Merqury version 1.3 [49] was used to evaluate genome completeness with a k-mer-based approach. K-mers derived from the Nanopore data were used to calculate k-mer completeness and the consensus quality value (QV), yielding 97.62% k-mer completeness and a QV of 49.74 for Meretrix sp. MF1 (Supplementary Table 5). The corresponding results for Meretrix sp. MT1 and M. lamarckii JML1 are also presented in Supplementary Table 5. To further assess assembly accuracy, Illumina reads were aligned to the genome using BWA version 0.7.18 [50]. Statistics computed with SAMtools version 1.21 [51] showed that 99.72% of the Illumina reads mapped to the genome and covered 98.25% of it, confirming the high accuracy of the assembly (Supplementary Table 6). The results for Meretrix sp. MT1 and M. lamarckii JML1 are also presented in Supplementary Table 6. Omni-C library quality control followed the official Cantata Bio protocol (https://omni-c.readthedocs.io/en/latest/) and yielded 151,321,804 total read pairs, 58.36% mapped read pairs, and 86.83% non-duplicate valid read pairs (cis ≥ 1 kb + trans). More detailed statistics are presented in Supplementary Table 7. Finally, Juicebox version 1.11.08 [52] was used to visualize the assembled scaffolds and detect potential misassemblies. Manual inspection revealed no contact patterns indicative of misjoins, translocations, or inversions.
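A consensus QV is a Phred-scaled per-base error estimate, so it can be converted back into an expected error rate. The sketch below assumes the formulation described for Merqury, in which per-base accuracy is estimated as (1 − K_asm/K_total)^(1/k). Here, K_asm is the number of assembly-only k-mers, K_total is the number of all assembly k-mers, and k is the k-mer length. Any k-mer counts passed to the function are hypothetical.

```python
import math


def merqury_style_qv(asm_only_kmers, total_asm_kmers, k):
    """Phred-scaled consensus QV from k-mer counts.

    Assumption: per-base accuracy is estimated as (1 - asm_only/total) ** (1/k),
    where asm_only counts assembly k-mers absent from the read set.
    """
    if asm_only_kmers == 0:
        return math.inf
    error_rate = 1 - (1 - asm_only_kmers / total_asm_kmers) ** (1 / k)
    return -10 * math.log10(error_rate)


def qv_to_error_rate(qv):
    """Invert the Phred scale: QV = -10 * log10(error rate)."""
    return 10 ** (-qv / 10)


# Reported consensus QV for Meretrix sp. MF1.
mf1_error = qv_to_error_rate(49.74)
print(f"Error rate ~{mf1_error:.2e}, about one error per {1 / mf1_error:,.0f} bp")
```

For the reported QV of 49.74, this corresponds to an error rate of about 1.06 × 10^-5, or roughly one error per 94 kb.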