Background & Summary

Semisulcospiridae Morrison, 1952 is a family of gastropods that forms a major component of the freshwater benthic fauna1,2,3,4 and plays an important role in freshwater ecology5. It contains over 90 valid species belonging to four genera: Hua S.-F. Chen, 1943, Juga H. Adams & A. Adams, 1854, Koreoleptoxis J. B. Burch & Y. Jung, 1988, and Semisulcospira O. Boettger, 18863,6,7,8. Species of Semisulcospiridae are found predominantly in East Asia and North America, with the majority (over 60 species from three genera) documented in China2,7,9,10. Species of Semisulcospira, Koreoleptoxis, and Hua can be used as environmental indicators because they can live only in relatively clean water10,11,12,13. Moreover, some species of Semisulcospira and Koreoleptoxis have been shown to be intermediate hosts of the human and animal parasite Paragonimus westermani Kerbert, 187814,15,16.

The evolutionary relationships within Semisulcospiridae remain unclear. Most previous studies relied on phylogenetic analyses of single genes (e.g., COI, 16S, 28S) or combinations of them, which are insufficient to resolve the phylogeny of this family5,17. Semisulcospira was recovered as paraphyletic in phylogenetic trees based on COI and 16S, even though its reproductive mode (viviparous) differs from that of the other three genera (oviparous) of Semisulcospiridae5,9,10,17. Moreover, a small number of genes is not sufficient to distinguish some congeners7,10. Omics data may help to solve these problems5,17. However, only a few omics studies have addressed this family18,19,20: Lee18 published a de novo assembled transcriptome of Semisulcospira coreana (E. von Martens, 1886); Gim19 published a draft genome of Semisulcospira libertina (A. Gould, 1859); and Miura20 published a chromosome-scale genome of Semisulcospira habei G. M. Davis, 1969.

Chen21 first described Hua based on specimens collected from Southwest China and designated Melania telonaria Heude, 1889 as its type species. Hua can be distinguished from the other genera of Semisulcospiridae by its reproductive organs: females have either an egg-laying groove or an ovipositor under the left tentacle, whereas Koreoleptoxis has both and Semisulcospira has neither2. There are 29 species in Hua, most of them endemic to Southwest China2,7,10. Shell morphology, radulae, reproductive organs, and DNA barcoding are commonly used to distinguish species in the genus. Species of Hua vary widely in shell morphology, including size, color, shape, and sculpture2,7,10. Phylogenetic analyses recovered Hua as a monophyletic group in several studies2,7,10, but not in He7. Species of Hua are of great research value because of their endemism and high diversity. However, they are threatened by human activities10.

With the rapid advancement of high-throughput sequencing technologies, numerous whole-genome sequencing projects are now under way, including the Darwin Tree of Life Project and the Earth BioGenome Project22,23. However, whole-genome sequencing remains prohibitively expensive and faces significant technical challenges for small specimens. RNA sequencing (RNA-seq) is a cost-effective alternative that can yield high-quality coding sequences and has been widely adopted across diverse research fields. For instance, it has been used to study the physiological ecology of aquatic invertebrates24,25 and has contributed to the conservation of endangered species26,27. Gene expression profiles generated from RNA-seq have provided new insights into physiological acclimatization, metabolic trade-offs, resilience, high-mortality events, and niche partitioning between similar species27. RNA-seq has also been used to provide a robust framework for the phylogeny of mollusks28,29,30,31. Additionally, high-quality transcriptomes will facilitate genome annotation and promote future studies of evolution, systematics, and functional genomics.

Although some genomic resources for Semisulcospiridae have been published, transcriptome studies remain scarce, and no transcriptomic data are available for Hua18,19,20. In this study, we report de novo assembled transcriptomes for six Hua species collected from China, which will facilitate multifaceted research on Semisulcospiridae.

Materials and Methods

Sample collection

Specimens of Hua were collected from streams in Yunnan and Guizhou, China (Table 1, Fig. 1). They were transported alive to the laboratory, where the foot was excised and stored in RNAlater Stabilization Reagent (Coolaber, Beijing, China) for RNA extraction. Morphological and molecular information (COI and 16S rRNA; Table 2) was integrated for species identification. Live individuals were identified by Yuanzheng Meng, a taxonomic specialist in Semisulcospiridae, using the morphological criteria (Fig. 1) of Du10 and Du2.

Table 1 Collection information of six Hua species and SRA Acc. No. of the raw data generated from RNA-Seq.
Fig. 1 Specimens used in this study. (A) Hua textrix (Heude, 1888), SEM-A1; (B) Hua yangi L.-N. Du, J.-X. Yang & Chen, 2023, SEM-B1; (C) Hua sp. 1, SEM-C1; (D) Hua sp. 2, SEM-D1; (E) Hua sp. 3, GZA-001A; (F) Hua wujiangensis L.-N. Du, J.-X. Yang & Chen, 2023, GZB-001A.

Table 2 BLAST results for COI and 16S (the top five hits are shown).

Species identification

Hua textrix (Heude, 1888), SEM-A1 (Fig. 1A): Shell medium-sized, solid, conical, with seven whorls. Surface brown, with a dark brown band on the bottom of each whorl, with spiral lines and axial ribs crossing each other, forming checkerboard patterns. Apex pointed. Aperture ovate. The blast results show high identity values (over 99%) with H. textrix2,9.

Hua yangi L.-N. Du, J.-X. Yang & Chen, 2023, SEM-B1 (Fig. 1B): Shell small, solid, ovate, with four whorls. Surface brown, smooth. Body whorl inflated. Apex blunt. Aperture round. The blast results show high identity values (over 99%) with H. yangi2,9.

Hua sp. 1, SEM-C1 (Fig. 1C): Shell medium-sized, solid, conical, with seven whorls. Surface brown, with spiral lines and axial ribs crossing each other, forming checkerboard patterns. Apex pointed. Aperture ovate. According to blast results, sequences from Hua aubryana (Heude, 1888) and Hua tchangsii Du et al., 2019 both show high identity values (over 99%), but morphological characters differ from both species: H. aubryana has two lines of nodules on the body whorl, and H. tchangsii has smooth shells2,9. Therefore, it is marked as an undetermined species (Hua sp. 1) here.

Hua sp. 2, SEM-D1 (Fig. 1D): Shell small, solid, ovate, with four whorls. Surface yellowish-brown, smooth. Body whorl inflated. Aperture round. The blast results show low identity values (less than 87.35% for COI and 92.39% for 16S), and the morphological features do not match any described species either. It may therefore represent a new species and is marked as an undetermined species (Hua sp. 2) here.

Hua sp. 3, GZA-001A (Fig. 1E): Shell small, solid, ovate, with four whorls. Surface dark brown, smooth except for growth lines. Body whorl inflated. Apex blunt. Aperture ovate. The blast results for COI show low identity values (less than 89.04%), whereas 16S shows high identity (over 99%) with sequences from unpublished species. Morphological features and molecular information together cannot assign it to a species. It may represent a new species and is therefore marked as an undetermined species (Hua sp. 3) here.

Hua wujiangensis L.-N. Du, J.-X. Yang & Chen, 2023, GZB-001A (Fig. 1F): Shell small, thin, conical, with five whorls. Surface brown, with 3–4 dark brown bands on each whorl, smooth except for growth lines. Apex pointed. Aperture ovate. The blast results show high identity values (99.48% for COI and 98.49% for 16S) with H. wujiangensis2,10.
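The identity values quoted in the species accounts are percent identities between a query sequence and a database hit. As a minimal sketch of how such a value is conventionally defined (identical aligned positions divided by alignment length, gaps included in the length), the function below works on two already-aligned strings. The function name and the toy sequences are illustrative assumptions, not data or code from this study, and the sketch does not reproduce the BLAST search itself.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of two pre-aligned sequences of equal length.

    Gap columns ('-') count toward the alignment length but never as matches.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    if not aligned_a:
        raise ValueError("alignment is empty")
    matches = sum(
        1 for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if a == b and a != "-"
    )
    return 100.0 * matches / len(aligned_a)


# Toy alignments (hypothetical, for illustration only).
no_gap = percent_identity("ACGTACGTAC", "ACGTACGTAT")   # 9 of 10 columns match
with_gap = percent_identity("ACG-T", "ACGAT")           # 4 of 5 columns match
print(f"{no_gap:.1f}% {with_gap:.1f}%")
```

Under this definition, a 99% identity over a roughly 1,500 bp COI alignment would correspond to about 15 mismatched or gapped columns.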

Workflow

A diagram of the workflow used in this study is presented in Fig. 2. Transcriptomes were de novo assembled from short-read sequences by Trinity. Each assembly was assessed using multiple indicators, and transcripts were annotated based on sequence similarity using Trinotate.

Fig. 2 Flow chart of de novo assembly and annotation. The flow chart was created with the online Mermaid Live Editor (https://mermaid-js.github.io/mermaid-live-editor).

Total RNA extraction, library construction, RNA sequencing, and quality control

Total RNA extraction and sequencing were performed by Novogene Company (Beijing, China). RNA quality was assessed by electrophoresis on a BioAnalyzer (Agilent, Santa Clara, CA). Libraries were prepared using the NEBNext Ultra RNA Library Prep Kit for Illumina (NEB, USA) following the manufacturer’s recommendations. Paired-end sequencing was performed on the DNBSEQ-T7 platform with a read length of 150 bp. The quality of the raw data was assessed using FastQC version 0.11.8 (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/). Low-quality reads (i.e., reads shorter than 50 bp, reads with an average Phred score <30, or reads containing more than three N bases) and sequencing adaptors were removed using fastp (version 0.23.4)32.
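The three read-level criteria stated above (minimum length, minimum average Phred score, maximum number of N bases) can be expressed as a small predicate. The following is a minimal sketch of that logic only, not fastp's implementation: it assumes Phred+33 quality encoding and an arithmetic mean of per-base scores, and the function names and toy reads are hypothetical.

```python
def mean_phred(quality: str, offset: int = 33) -> float:
    """Arithmetic mean of per-base Phred scores (Phred+33 encoding assumed)."""
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def passes_filter(seq: str, quality: str, min_len: int = 50,
                  min_mean_q: float = 30.0, max_n: int = 3) -> bool:
    """Return True if a read survives the three criteria stated in the text."""
    if len(seq) < min_len:                 # reads shorter than 50 bp are dropped
        return False
    if seq.upper().count("N") > max_n:     # more than three N bases
        return False
    return mean_phred(quality) >= min_mean_q   # average Phred score below 30


# Toy reads (hypothetical): 'I' encodes Q40 and '5' encodes Q20 in Phred+33.
reads = {
    "good":   ("ACGT" * 15, "I" * 60),
    "short":  ("ACGT" * 10, "I" * 40),
    "many_n": ("NNNN" + "ACGT" * 14, "I" * 60),
    "low_q":  ("ACGT" * 15, "5" * 60),
}
kept = [name for name, (seq, qual) in reads.items() if passes_filter(seq, qual)]
print(kept)
```

In a paired-end setting, a filter of this kind would normally be applied so that both mates of a pair are kept or discarded together; that pairing logic is omitted here for brevity.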

De novo assembly and redundancy removal

De novo assembly was performed using Trinity (version 2.8.5)24,25,33 with default parameters. The assembled sequences were then clustered and de-duplicated at a 95% sequence similarity threshold using CD-HIT (version 4.8.1)34. The completeness of each transcriptome was evaluated using Benchmarking Universal Single-Copy Orthologs (BUSCO) software35 against the Mollusca database (BUSCO version 5.7.1, dataset: mollusca_odb10, 2024-01-08). Additionally, mitochondrial genes (13 protein-coding genes and two ribosomal RNAs) were annotated and extracted from the transcripts using MitoFinder (version 1.4).

Open reading frame prediction and transcriptome annotation

Functional annotation was performed using Trinotate (version 3.3.0). Open reading frames (ORFs) were identified using TransDecoder (version 5.7.1) (https://transdecoder.github.io/). The predicted ORFs were annotated against the UniProtKB/Swiss-Prot (a manually annotated and reviewed protein sequence database)36 and Pfam (a classification of protein families)37 databases using the blastp mode of DIAMOND (version 2.1.8.162)38,39 and hmmscan from HMMER (version 3.1b2)40, respectively. Additionally, annotations were conducted against the KEGG (Kyoto Encyclopedia of Genes and Genomes)41, Gene Ontology (GO)42, and eggNOG (Evolutionary Genealogy of Genes: Non-supervised Orthologous Groups)43 databases. The eggNOG database used in this study is version 4.5, which includes COG (Cluster of Orthologous Groups)44 annotation information. Finally, all results were integrated into a summary spreadsheet.

Data Records

The raw data were deposited in NCBI under the BioProject accession number PRJNA1160414: BioSample accessions SAMN47530241, SAMN47526927, SAMN47526927, SAMN47523056, SAMN47521559, SAMN47521559; Sequence Read Archive (SRA) accessions SRR3281490445, SRR3281497246, SRR3283517147, SRR3283520248, SRR3283483949, SRR3283365850. The remaining information (i.e. 01.fastp&FastQC_results, 02.Trinity_results, 03.BUSCO_results, 04.Transcriptome_annotation_results, and 05.MitoFinder_results) was uploaded to Figshare (https://doi.org/10.6084/m9.figshare.28637714.v2)51.

Technical Validation

Accuracy of species identification

Species identifications were verified through an integrative taxonomic approach combining morphological data with Sanger sequencing of partial mitochondrial genes (COI and 16S rRNA). The three undetermined species were recognized by the second author, Yuanzheng Meng, a taxonomic specialist in Semisulcospiridae. We are therefore confident that the species identifications of all specimens included in this study are reliable.

Quality control, de novo assembly, and transcript statistics

The RIN scores of all RNA samples ranged from 7.40 to 8.50 (Table 3). FastQC analysis of the cleaned reads from all six species indicated high quality, with the majority of reads (more than 95%) having Phred quality scores above 35. The GC content per sequence ranged from 45% to 47%, and no adapter contamination was detected. Sequencing yields, assembly statistics, and transcriptome completeness for the six datasets are summarized in Table 4. More than 90% of reads were retained after cleaning, leaving 36.7 to 59.3 million clean reads per dataset. The number of transcripts ranged from 147,147 to 268,741, and the average contig length from 716.6 to 883.3 bp. Except for Hua sp. 2, all species had transcript N50 values exceeding 1,000 bp. In the BUSCO analysis, the proportion of complete BUSCOs ranged from 59.1% to 84.0%, with two species below 70%, and that of fragmented BUSCOs from 3.1% to 6.6%. A total of 4,666 BUSCOs were identified across the six species, 978 of which were present in all species. Most of the mitochondrial genes (except tRNAs) were successfully assembled in all species (Table 5), and COI (1,515–1,536 bp) and 16S (1,322–1,338 bp) were used for species identification (Table 2).
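Two of the statistics reported here, N50 and the counts of BUSCOs found in at least one species versus in all species, follow simple definitions that a short sketch can make concrete. The code below is an illustration under stated assumptions: it uses the common definition of N50 (the length of the contig at which the cumulative length of contigs, sorted from longest to shortest, first reaches half of the total), treats the two BUSCO counts as a set union and a set intersection, and runs on toy lengths and hypothetical BUSCO identifiers rather than the study's data. It is not the code used by Trinity or BUSCO.

```python
from typing import Dict, Iterable, Set, Tuple


def n50(lengths: Iterable[int]) -> int:
    """Contig length at which the cumulative sum (longest first) reaches half the total."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no contig lengths given")
    total = sum(ordered)
    running = 0
    for length in ordered:
        running += length
        if 2 * running >= total:
            return length
    return ordered[-1]  # not reached for positive lengths; kept for completeness


def busco_union_and_core(
    per_species: Dict[str, Iterable[str]]
) -> Tuple[Set[str], Set[str]]:
    """Return (BUSCOs found in any species, BUSCOs found in every species)."""
    sets = [set(ids) for ids in per_species.values()]
    if not sets:
        raise ValueError("no species given")
    return set().union(*sets), set.intersection(*sets)


# Toy contig lengths in bp (hypothetical).
print(n50([500, 800, 1200, 1500]))

# Hypothetical BUSCO identifiers for three made-up samples.
found = {
    "sample_1": ["b1", "b2", "b3"],
    "sample_2": ["b2", "b3", "b4"],
    "sample_3": ["b3", "b2", "b5"],
}
any_species, all_species = busco_union_and_core(found)
print(len(any_species), len(all_species))
```

Because N50 is weighted by length, a few long transcripts can raise it well above the average contig length, which is consistent with N50 values above 1,000 bp alongside mean lengths of roughly 700 to 900 bp.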

Table 3 Summary of RNA concentration (ng/µL), total RNA amount (ng), and RIN scores for all six Hua species (SEM-A1, SEM-B1, SEM-C1, SEM-D1, GZA-001A, GZB-001A).
Table 4 Summary of de novo assembly and annotation statistics for transcriptomes from six Hua species.
Table 5 Length of mitochondrial genes from six Hua species.

Transcriptome annotation

Annotation of the transcripts against the UniProtKB/Swiss-Prot database identified over 18,000 significant hits for each species (Table 4), with a maximum of over 29,000 hits. Most of the transcripts with BLAST hits (>95%) were also annotated with GO terms, and fewer with KEGG pathways and COG functional categories (Table 4, Fig. 3). Annotation against the Pfam database identified over 8,000 Pfam accessions for each species, with the highest number (10,700) in H. wujiangensis (Table 4).

Fig. 3 Distribution of Gene Ontology (GO) terms. The stacked bars show the number of transcripts assigned to each GO term for the six species.

Usage Notes

Given that Hua species are endemic to Southwest China and threatened by human activities, these transcriptomic resources support conservation efforts by providing genetic markers for phylogenetics, enabling population studies that assess genetic diversity, and helping to identify evolutionarily significant units. The functional annotations will serve as valuable references for future single- and multi-species analyses. The homologous genes across the transcriptomes of these six species will enhance our understanding of evolutionary and functional relationships in Hua. Furthermore, the sequence assemblies can be used to design species-specific primers for biomarkers. The dataset complements the limited genomic resources for Semisulcospiridae, creating a more comprehensive molecular framework for family-wide comparative studies and phylogenetic analyses. With its potential for extensive analysis, this dataset is a valuable resource for research on the genomic diversity of Semisulcospiridae.