Background & Summary

Sea cucumbers are high-value aquaculture species in Asian countries (especially in China)1. Owing to sustained interest in their nutritional and medicinal value2, increasing consumer demand has driven the rapid expansion of the sea cucumber industry in recent decades3. Papillae are fleshy, conical ambulacral appendages of sea cucumbers. Papilla appearance (size and arrangement) is a major metric affecting the pricing and grading of sea cucumbers and sea cucumber products (Fig. 1)4. Elucidating the molecular mechanisms underlying papilla phenotype formation and identifying biomarkers that determine papilla appearance have long been popular research topics in the sea cucumber aquaculture industry4,5,6.

Fig. 1

The schematic representation of experimental design and workflow of this study. SB: the small papillae from the back side; BB: the big papillae from the back side; BS: the big papillae from the lateral side. This figure was created with BioRender.com.

It is widely acknowledged that morphological or trait phenotype formation is a direct consequence of gene expression and regulation4. Although whole-genome resequencing7 and mRNA transcriptome sequencing6 have provided some candidate genes correlated with papilla number and papilla differentiation in sea cucumbers, knowledge of the molecular mechanisms underlying papilla appearance formation, in particular the epigenetic mechanisms (non-coding RNA regulation and DNA methylation), remains limited, as do specific biomarkers closely associated with papilla appearance.

In this study, big papillae and small papillae were sampled from the sea cucumber Apostichopus japonicus (the primary Asian commercial sea cucumber) and subsequently processed for whole-transcriptome sequencing and methylome sequencing. Quality control of samples was performed at each step using absorbance ratios (OD260/OD280 and OD260/OD230) as purity indicators and the RNA integrity number (RIN). Differentially expressed lncRNAs, circRNAs, miRNAs, mRNAs, and methylated DNA fragments were identified. Expression regulation networks were then generated based on target prediction and gene expression level analysis. The schematic representation of the experimental design and workflow is presented in Fig. 1. The datasets generated in this study not only provide a theoretical foundation (especially from the epigenetic aspect) for elucidating the expression regulation network determining papilla size in sea cucumbers but also serve as a valuable resource for biomarker mining in papilla appearance-based selective breeding of sea cucumbers.

Methods and Results

Ethical approval

This study was conducted in line with the Guidance of the operation of the Animals (Scientific Procedures) Act 1986 and associated guidelines, EU Directive 2010/63 for the protection of animals used for scientific purposes, and the National Research Council’s Guide for the Care and Use of Laboratory Animals. Approval was granted by the Ethics Committee of Dalian Ocean University (Date: 2025.01.09/No. DLOU20250001).

Sample collection

To avoid the influence of pigment on sample quality, white A. japonicus individuals were selected from the same lineage created by the Ministry of Agriculture Key Laboratory of Mariculture & Stock Enhancement at Dalian Ocean University, Dalian, China (121°33′47″ E, 38°51′55″ N) (Fig. S1). The schematic diagram of experimental design is summarized in Fig. 1. The BS group represents the big papillae from the lateral side of A. japonicus individuals, the BB group represents the big papillae from the back side of A. japonicus individuals, and the SB group represents the small papillae from the back side of A. japonicus individuals (Fig. 1).

Extraction

Total RNA and DNA isolation and qualification were performed following the methods described by Zhan et al.4 and Jiao et al.8. For the RNA samples, the OD260/OD280 values were 1.92 ± 0.11 (SB), 2.06 ± 0.65 (BB), and 2.18 ± 0.35 (BS); the OD260/OD230 values were 0.42 ± 0.04 (SB), 0.69 ± 0.04 (BB), and 0.71 ± 0.13 (BS); and the RNA integrity numbers were 8.80 ± 0.36 (SB), 9.20 ± 0.26 (BB), and 9.53 ± 0.25 (BS). Together, these values indicate that the RNA samples used in this study were of good quality.

Library preparation and sequencing

For whole-transcriptome sequencing, 3 μg of RNA per sample was used as input material for RNA sample preparation. Sequencing libraries were generated using the NEBNext® Ultra™ RNA Library Prep Kit for Illumina® (NEB, USA) following the manufacturer’s recommendations, and index codes were added to attribute sequences to each sample. Briefly, mRNA was purified from total RNA using poly-T oligo-attached magnetic beads. Fragmentation was carried out using divalent cations under elevated temperature in NEBNext First Strand Synthesis Reaction Buffer (5×). First-strand cDNA was synthesized using random hexamer primers and M-MuLV Reverse Transcriptase (RNase H−). Second-strand cDNA synthesis was subsequently performed using DNA Polymerase I and RNase H. Remaining overhangs were converted into blunt ends via exonuclease/polymerase activities. After adenylation of the 3′ ends of the DNA fragments, NEBNext Adaptor with a hairpin loop structure was ligated to prepare for hybridization. To select cDNA fragments preferentially 150–200 bp in length, the library fragments were purified with the AMPure XP system (Beckman Coulter, Beverly, USA). Then, 3 µl of USER Enzyme (NEB, USA) was incubated with the size-selected, adaptor-ligated cDNA at 37 °C for 15 min, followed by 5 min at 95 °C, before polymerase chain reaction (PCR). PCR was then performed with Phusion High-Fidelity DNA polymerase, Universal PCR primers, and Index (X) Primer. Finally, the PCR products were purified (AMPure XP system), and the library was initially quantified using a Qubit® 2.0 fluorometer (Life Technologies, CA, USA) and subsequently diluted to 1 ng/μL. The insert size distribution of the library was then analyzed using an Agilent 2100 Bioanalyzer system, confirming that fragments were predominantly distributed within the 250–300 bp (for lncRNA) and 18–40 bp (for small RNA) ranges9.
The effective library concentration (>3 nM) was determined by quantitative real-time PCR (qPCR) to ensure the quality of the library.

For methylome sequencing, genomic DNA degradation and contamination were assessed on agarose gels. DNA purity was checked using a NanoPhotometer® spectrophotometer (IMPLEN, CA, USA). DNA concentration was measured using the Qubit® DNA Assay Kit in a Qubit® 2.0 Fluorometer. A total of 100 ng of genomic DNA spiked with 0.5 ng of lambda DNA was fragmented by sonication to 200–300 bp with a Covaris S220. These DNA fragments were treated with bisulfite using the EZ DNA Methylation-Gold™ Kit (Zymo Research), and the library was constructed by Novogene Corporation (Beijing, China). The library was initially quantified using a Qubit® 2.0 fluorometer and subsequently diluted to 1 ng/μL. The insert size distribution of the library was then analyzed using an Agilent 2100 Bioanalyzer system, confirming that fragments were predominantly distributed within the 200–300 bp range10. The effective library concentration (>2 nM) was determined by qPCR to ensure the quality of the library. Subsequently, paired-end sequencing of the samples was performed on the Illumina platform (Illumina, CA, USA).

Quality control

For whole-transcriptome sequencing, raw data (raw reads) in FASTQ format were first processed with in-house Perl scripts. In this step, clean data (clean reads) were obtained by removing reads containing adapters, reads containing poly-N, and low-quality reads from the raw data. At the same time, the Q20, Q30, and GC content of the clean data were calculated. All downstream analyses were based on the high-quality clean data.

For methylome sequencing, FastQC (v0.11.5) (http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) was first used to compute basic statistics on the quality of the raw reads. Then, the read sequences produced by the Illumina pipeline in FASTQ format were pre-processed with fastp (v0.23.1) (https://github.com/OpenGene/fastp). The reads that passed all filtering steps were counted as clean reads, and all subsequent analyses were based on them. Finally, FastQC was used to compute basic statistics on the quality of the clean reads.
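
The Q20, Q30, and GC content statistics reported for clean reads can be illustrated with a short sketch. This is not the in-house Perl script or the fastp implementation used in this study; it is a minimal Python example that assumes Phred+33 quality encoding, and the function name read_quality_stats is hypothetical.

```python
def read_quality_stats(records, phred_offset=33):
    """Return Q20 %, Q30 %, and GC % over all bases.

    `records` is an iterable of (sequence, quality_string) pairs.
    Phred+33 encoding is an assumption of this sketch.
    """
    total = q20 = q30 = gc = 0
    for seq, qual in records:
        for base, ch in zip(seq, qual):
            score = ord(ch) - phred_offset  # decode the Phred quality score
            total += 1
            q20 += score >= 20              # base call accuracy of at least 99%
            q30 += score >= 30              # base call accuracy of at least 99.9%
            gc += base in "GCgc"
    if total == 0:
        raise ValueError("no bases supplied")
    return {
        "Q20": 100.0 * q20 / total,
        "Q30": 100.0 * q30 / total,
        "GC": 100.0 * gc / total,
    }
```

Each statistic is a percentage of bases, not of reads, which is why the counters run over every base in every read.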

Data processing and analyses

For whole-transcriptome sequencing, an average of 16.35 G clean bases and approximately 8.11–15.39 million raw reads were obtained, and 7.97–15.02 million clean reads were retained from each sample after trimming. The Q20 of the clean reads ranged from 96.57% to 98.32%, and the Q30 ranged from 91.2% to 94.96%. The GC content of all examined samples ranged from 41.6% to 42.52% (Table S1). Identification of mRNAs was performed using Cuffcompare (v2.1.1) (https://cole-trapnell-lab.github.io/cufflinks/cuffcompare/) software to align transcripts to public databases.

For methylome sequencing, an average of 28.92 G clean bases and approximately 9.17–12.82 million raw reads were obtained, and 9.04–12.51 million clean reads were retained from each sample after trimming. The Q20 of the clean reads ranged from 96.93% to 98.08%, and the Q30 ranged from 90.9% to 93.62%. The GC content of all examined samples ranged from 19.26% to 19.88% (Table S2).

Comparative whole-transcriptome and comparative methylome analyses

For whole-transcriptome sequencing, differential expression analysis between groups (three biological replicates per group) was performed using the DESeq2 R package (v1.10.1) (https://bioconductor.org/packages/release/bioc/html/DESeq2.html). DESeq2 provides statistical routines for determining differential expression in digital expression data using a model based on the negative binomial distribution. CircRNAs were identified using find_circ (v1.2) (https://github.com/marvin-jens/find_circ) and CIRI2 (v2.0.6) (https://ciri-cookbook.readthedocs.io/en/latest/CIRI2.html). LncRNAs were identified using Cuffcompare, and miRNAs were identified using miREvo (v1.2) (https://github.com/akahanaton/miREvo) and mirdeep2 (v2.0.05) (https://github.com/rajewsky-lab/mirdeep2). The target genes of miRNAs were predicted by miRanda (v3.3a) (http://www.microrna.org/microrna/home.do), and the target genes of lncRNAs were predicted by the co-expression method11,12,13. The P value was adjusted using Benjamini and Hochberg’s approach for controlling the false discovery rate. For lncRNAs, circRNAs, miRNAs, and mRNAs, log2FoldChange > 0 was defined as upregulated expression (UP), and log2FoldChange > 0 with P < 0.05 was defined as significant upregulation (sigUP) between groups; log2FoldChange < 0 was defined as downregulated expression (DOWN), and log2FoldChange < 0 with P < 0.05 was defined as significant downregulation (sigDOWN) between groups. A statistical summary of the numbers of identified lncRNAs, circRNAs, miRNAs, and mRNAs (including transcription factors, TFs) is presented in Fig. 2.
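
The Benjamini and Hochberg adjustment and the UP/sigUP/DOWN/sigDOWN labelling described above can be sketched as follows. This is an illustrative Python re-implementation, not the DESeq2 code used in this study; the function names are hypothetical, and the "unchanged" label for a log2FoldChange of exactly 0 is an assumption, because the text does not define that case. Whether the raw or the adjusted P value is passed to classify is left to the caller.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that the
    # adjusted values stay monotone in the rank.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def classify(log2_fold_change, p_value, alpha=0.05):
    """Label one transcript using the thresholds given in the text."""
    if log2_fold_change > 0:
        return "sigUP" if p_value < alpha else "UP"
    if log2_fold_change < 0:
        return "sigDOWN" if p_value < alpha else "DOWN"
    return "unchanged"  # hypothetical label; not defined in the text
```

For example, classify(1.3, 0.01) returns "sigUP", whereas classify(-0.4, 0.2) returns "DOWN".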

Fig. 2

Data profiles. (A) numbers of differentially expressed lncRNAs; (B) numbers of differentially expressed circRNAs; (C) numbers of differentially expressed microRNAs; (D) numbers of upregulated differentially expressed genes (DEGs); (E) numbers of downregulated DEGs. SB: the small papillae from the back side; BB: the big papillae from the back side; BS: the big papillae from the lateral side.

For methylome sequencing, differentially methylated regions (DMRs) were identified using the DSS software (v2.12.0) (https://bioconductor.org/packages/release/bioc/html/DSS.html)14,15,16. The core of DSS is a dispersion shrinkage method for estimating the dispersion parameter from Gamma-Poisson or Beta-Binomial distributions. Based on the distribution of DMRs across the genome, DMR-related genes were defined as genes whose gene body region [from the transcription start site (TSS) to the transcription end site (TES)] or promoter region (2 kb upstream of the TSS) overlapped with a DMR. DMRs with a differential methylation level > 0.05 between groups were considered high-level methylated DMRs, and those with a differential methylation level < −0.05 were considered low-level methylated DMRs. There were 3,679, 3,039, and 2,988 high-level methylated DMRs in BS vs. BB, BS vs. SB, and BB vs. SB, respectively, and 2,945, 3,244, and 3,841 low-level methylated DMRs in the same comparisons, respectively.
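
The DMR thresholds and the gene association rule described above can be expressed compactly. The following Python sketch is illustrative only and is not part of DSS; it assumes 1-based inclusive coordinates and strand-aware promoters, and all function names are hypothetical.

```python
def classify_dmr(diff_methylation, threshold=0.05):
    """Return 'high', 'low', or None for a DMR's methylation difference."""
    if diff_methylation > threshold:
        return "high"
    if diff_methylation < -threshold:
        return "low"
    return None


def overlaps(a_start, a_end, b_start, b_end):
    """True if two inclusive intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def dmr_gene_regions(dmr_start, dmr_end, tss, tes, strand="+", promoter_len=2000):
    """List the gene regions ('promoter', 'genebody') that a DMR overlaps."""
    body_start, body_end = min(tss, tes), max(tss, tes)
    if strand == "+":
        # Promoter lies upstream of the TSS, i.e. at smaller coordinates.
        prom_start, prom_end = max(1, tss - promoter_len), tss - 1
    else:
        # On the minus strand, upstream means larger coordinates.
        prom_start, prom_end = tss + 1, tss + promoter_len
    hits = []
    if prom_start <= prom_end and overlaps(dmr_start, dmr_end, prom_start, prom_end):
        hits.append("promoter")
    if overlaps(dmr_start, dmr_end, body_start, body_end):
        hits.append("genebody")
    return hits
```

A DMR that spans the TSS is reported for both regions, which matches the case in which a single gene shows methylation changes at both the promoter and gene body levels.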

Gene expression regulation network construction

Based on the whole-transcriptome data and the targeting relationships among lncRNAs, circRNAs, miRNAs, and mRNAs, we constructed four gene expression regulation pair models: lncRNA-miRNA pairs, circRNA-miRNA pairs, lncRNA-mRNA pairs, and miRNA-mRNA pairs. A statistical summary of the numbers of predicted lncRNA-miRNA pairs, circRNA-miRNA pairs, lncRNA-mRNA (including TFs) pairs, and miRNA-mRNA (including TFs) pairs is presented in Fig. 3.

Fig. 3

Regulation network of lncRNAs-microRNAs (A), circRNAs-microRNAs (B), lncRNAs-mRNAs (C) and microRNAs-mRNAs (D). mRNAs include differentially expressed genes (DEGs), non-DEGs, and transcription factors (TFs). SB: the small papillae from the back side; BB: the big papillae from the back side; BS: the big papillae from the lateral side.

Considering the complexity of ceRNA regulation17,18,19,20,21,22, we only focused on prediction and construction of conservative ceRNA networks (circRNAs/lncRNAs-miRNAs-mRNAs) with negative interaction in this study. Two conservative types of ceRNA networks were identified, including type I (TI) ceRNA networks (upregulated circRNAs/lncRNAs – downregulated microRNAs – upregulated DEGs) and type II (TII) ceRNA networks (downregulated circRNAs/lncRNAs – upregulated microRNAs – downregulated DEGs). The statistical summary of the numbers of predicted TI and TII ceRNA networks is shown in Fig. 4.
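
The direction rule that distinguishes TI from TII networks can be sketched as a filter over predicted pairs. The Python example below is a simplified illustration, not the pipeline used in this study: it checks only the up/down direction of each member and does not test expression correlation, and the function names and identifiers are hypothetical.

```python
def cerna_type(sponge_dir, mirna_dir, mrna_dir):
    """Return 'TI', 'TII', or None for one circRNA/lncRNA-miRNA-mRNA triplet."""
    pattern = (sponge_dir, mirna_dir, mrna_dir)
    if pattern == ("up", "down", "up"):
        return "TI"
    if pattern == ("down", "up", "down"):
        return "TII"
    return None


def build_cerna_networks(sponge_mirna_pairs, mirna_mrna_pairs, direction):
    """Join predicted pairs on the shared miRNA and keep TI/TII triplets.

    `direction` maps each identifier to 'up' or 'down'.
    """
    mirna_to_mrnas = {}
    for mirna, mrna in mirna_mrna_pairs:
        mirna_to_mrnas.setdefault(mirna, []).append(mrna)

    networks = {"TI": [], "TII": []}
    for sponge, mirna in sponge_mirna_pairs:
        for mrna in mirna_to_mrnas.get(mirna, []):
            kind = cerna_type(
                direction.get(sponge), direction.get(mirna), direction.get(mrna)
            )
            if kind is not None:
                networks[kind].append((sponge, mirna, mrna))
    return networks
```

Triplets whose members do not follow either pattern, or whose direction is unknown, are dropped, which is what makes the resulting networks conservative.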

Fig. 4

Regulation network of ceRNAs. (A) type I (TI) ceRNA networks (upregulated circRNAs/lncRNAs – downregulated microRNAs – upregulated DEGs) between groups; (B) type II (TII) ceRNA networks (downregulated circRNAs/lncRNAs – upregulated microRNAs – downregulated DEGs) between groups. SB: the small papillae from the back side; BB: the big papillae from the back side; BS: the big papillae from the lateral side.

For methylome sequencing, high-level methylation occurred predominantly in intron regions across all comparison groups, and the percentage of high-level methylation at the promoter level in the big papillae vs. small papillae comparisons (BS vs. SB: 18.23% and BB vs. SB: 18.84%) was higher than that in the BS vs. BB comparison (17.80%). Low-level methylation also occurred predominantly in intron regions across all comparison groups. The percentage of low-level methylation at the intron level in the BS vs. BB comparison (34.47%) was higher than those in both the BS vs. SB (32.77%) and BB vs. SB (31.14%) comparisons. The integrated methylome-transcriptome analysis is presented in Fig. 5. Gene-specific variations in methylated region and methylation level were obtained from these data. Specifically, high-level methylation of SLC6A9 at the gene body level was observed in both BS vs. BB and BS vs. SB, and high-level methylation of SLC5A8 at the promoter level was observed in both BS vs. BB and BS vs. SB. For POL, high-level methylation at the promoter level was observed in BS vs. BB, whereas low-level methylation at the gene body level was observed in BB vs. SB. CPN2 was low-level methylated at the gene body level in BS vs. BB but high-level methylated at the promoter level in BB vs. SB. For F54H12.3, low-level methylation was observed at both the promoter level (BB vs. SB) and the gene body level (BS vs. SB and BB vs. SB) (Fig. 5).

Fig. 5

Correlation between differentially expressed genes (DEGs) and differentially methylated regions (DMRs). TSS: transcription start site; TES: transcription end site. SB: the small papillae from the back side; BB: the big papillae from the back side; BS: the big papillae from the lateral side.

Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis

To investigate the GO enrichment and regulatory pathways of the target genes of differentially expressed lncRNAs, circRNAs, and miRNAs, GO and KEGG enrichment analyses were performed using the GO (http://www.geneontology.org/) and KEGG (https://www.genome.jp/kegg/) databases. The results are presented in Fig. 6.

Fig. 6

The enrichment analysis of Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) of the target genes of differentially expressed lncRNAs (A), circRNAs (B) and microRNAs (C). SB: the small papillae from the back side; BB: the big papillae from the back side; BS: the big papillae from the lateral side.

Data Records

The datasets have been deposited in the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA) (https://identifiers.org/ncbi/insdc.sra:SRP555266)23. The differentially expressed lncRNAs, circRNAs, miRNAs, and mRNAs, and the methylated DNA fragments, are provided in the file “differential expression analysis.xlsx”. Gene expression regulation network information for the identified lncRNA-miRNA pairs, circRNA-miRNA pairs, lncRNA-mRNA pairs, miRNA-mRNA pairs, and lncRNA(circRNA)-miRNA-mRNA networks, together with mRNAs with high-level methylation, is presented in Figs. 3–5.

Technical Validation

RNA quality and integrity assessment

To ensure the quality and integrity of the extracted total RNA, the A260/A280 ratio of each RNA sample was first measured, with an acceptable range of 1.86–2.56. In addition, a NanoDrop spectrophotometer and an Agilent 2100 system were used to assess the quality and integrity of each RNA sample.

Whole transcriptome and methylome sequencing data quality assessment

The raw fastq files were re-assessed using the fastp tool (v0.23.1; https://github.com/OpenGene/fastp) for detailed information including per-base sequence quality, Phred score, GC content, sequence duplication before and after filtering, and a general summary of adapters and insert size estimation. All of these indicators showed that the data were of good quality and suitable for analysis. The assessment report is provided in the file “whole-transcriptome and methylome sequencing data quality assessment.xlsx”.

Quantitative real-time reverse transcription-polymerase chain reaction (qRT-PCR) validation

Differentially expressed lncRNAs (three), circRNAs (three), miRNAs (eight), and mRNAs (seven) were randomly selected from the whole-transcriptome and methylome data for verification by qRT-PCR (Fig. 7). CYTB was used as an internal control for lncRNAs, circRNAs, and mRNAs24, and RNU6B was used as an internal control for miRNAs25. The relative expression level was determined using the 2^(−ΔΔCT) method26. The primers used for qRT-PCR are listed in Table S3.
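
For readers unfamiliar with the comparative CT calculation, the following Python sketch shows the arithmetic. It assumes approximately 100% amplification efficiency for both the target and the internal control, which is the standard assumption of the method; the function name and argument names are hypothetical.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_calibrator, ct_ref_calibrator):
    """Fold change of a target transcript relative to a calibrator group.

    ct_ref_* are the CT values of the internal control
    (e.g. CYTB or RNU6B in this study).
    """
    delta_ct_sample = ct_target_sample - ct_ref_sample
    delta_ct_calibrator = ct_target_calibrator - ct_ref_calibrator
    delta_delta_ct = delta_ct_sample - delta_ct_calibrator
    return 2.0 ** (-delta_delta_ct)
```

For instance, a target with CT 22 in the sample and CT 24 in the calibrator, with the internal control at CT 18 in both, gives ΔΔCT = −2 and a relative expression of 4.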

Fig. 7

Quantitative real-time reverse transcription-polymerase chain reaction (qRT-PCR) validation. SB: the small papillae from the back side; BB: the big papillae from the back side; BS: the big papillae from the lateral side. Different letters are used to mark significant differences between groups (P < 0.05).