Background & Summary

Cold seeps are regions where methane- and hydrogen sulfide-rich fluids ascend from the seafloor at rates varying from a few centimeters to several meters annually1,2,3. Despite the challenging environmental conditions of darkness, extreme pressure, and toxicity, these ecosystems act as significant biodiversity hotspots, offering unique ecological niches that differ markedly from the surrounding seafloor4,5. They support a large number of invertebrates that form symbiotic relationships with chemoautotrophic bacteria, enabling survival in toxic environments6,7. Research into cold-seep invertebrates has significantly enhanced our understanding of the deep-sea environment and of the origins of life under extreme conditions.

Among these invertebrates, vestimentiferan tubeworms dominate cold-seep communities and serve as model organisms for studying adaptations to chemosynthetic ecosystems8,9. Vestimentiferan tubeworms lack a digestive system but possess a trophosome hosting sulfur-oxidizing bacteria, which sustain them through chemosynthesis10. In turn, tubeworms supply the symbiotic bacteria with essential nutrients such as carbon dioxide, oxygen, and hydrogen sulfide, which are taken up via the plume and delivered to the trophosome11,12. The branchial plume, a highly vascularized organ analogous to gills, directly interfaces with seawater, mediating gas exchange and nutrient transport to the symbionts. The hemoglobin-rich blood vessels of the plume exhibit an extraordinary affinity for toxic sulfides, a key adaptation to cold-seep environments13,14,15,16.

However, this direct exposure to seawater subjects the plume to constant microbial invasion17, necessitating robust immune defenses. While bulk transcriptomic studies of tubeworms have revealed elevated immune gene expression in the plume, these approaches obscure cell-type-specific responses18. Notably, Toll-like receptors (TLRs) and pathogen-defense genes are upregulated in the plume, but their expression has not been mapped to discrete cell types. Single-cell RNA sequencing (scRNA-seq) is essential to resolve this complexity, enabling identification of specialized cell subpopulations and their molecular mechanisms.

Paraescarpia echinospica, a species of vestimentiferan, is commonly found in numerous cold seeps situated in the western Pacific Ocean19,20,21. Previous studies have successfully decoded the genome sequence of P. echinospica22,23. In this study, scRNA-seq was employed to comprehensively characterize the cell types in the plume of P. echinospica. We captured a total of 10,689 high-quality cells and identified six main cell types based on their gene expression. Additionally, given the crucial roles of the plume in transport and immune response, we further characterized the status of transporters and immune genes. This work will establish a foundational resource for deep-sea organism studies at cellular resolution.

Methods

Sample collection

In brief, individuals of the deep-sea tubeworm P. echinospica were collected from the Haima cold seep (16°72.90′ N, 110°47.26′ E, 1387.1 m) in the South China Sea by the human-occupied vehicle Shenhaiyongshi in February 2023. During the cruise, specimens were collected using a hand net and subsequently housed in a biobox. Immediately upon arrival on the main deck of the research vessel, one specimen was dissected, and its plume was rapidly transferred to petri dishes for single-cell preparation in the onboard laboratory. The remaining tissues were immediately frozen and stored at −80 °C.

RNA extraction, library construction, and sequencing

Total RNA was extracted from the plume using the RNeasy Mini Kit (Qiagen) according to the manufacturer’s instructions. RNA concentration was measured with a NanoDrop (Thermo Fisher Scientific), and RNA integrity was assessed with the RNA Nano 6000 Assay Kit on an Agilent 2100 Bioanalyzer (Agilent Technologies). Library preparation followed the standard protocol provided by Oxford Nanopore Technologies (ONT), comprising sample quality checks, library construction, and library quality assessment. The final library was sequenced on the Nanopore PromethION platform.

Transcriptome annotation

First, the raw FASTQ data were filtered to remove short fragments and low-quality reads (length < 500 bp and Qscore < 6), yielding a cleaned dataset. The complete sequences were then aligned to the previously published reference genome23 using minimap2. Reads were clustered according to the alignment information, and consensus sequences were generated using pinfish. To avoid duplicated transcript copies, a single consensus sequence was selected for each transcript. The consensus sequences were then merged and realigned to the reference genome using minimap2, and redundant alignments were removed. Finally, transcripts with an identity below 0.9 and coverage below 0.85 were filtered out, resulting in a final set of 22,994 non-redundant transcript sequences.
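
The read-level and transcript-level thresholds described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this study. The Phred+33 quality encoding, the averaging of quality in error-probability space, and the reading that a read or transcript is retained only when it passes both of its thresholds are assumptions, and all function names are hypothetical.

```python
import math

MIN_READ_LENGTH = 500   # bp, threshold stated in the text
MIN_QSCORE = 6.0        # mean read quality, threshold stated in the text
MIN_IDENTITY = 0.9      # alignment identity, threshold stated in the text
MIN_COVERAGE = 0.85     # alignment coverage, threshold stated in the text


def mean_qscore(quality: str, offset: int = 33) -> float:
    """Mean Phred score of one read.

    Assumption: qualities are Phred+33 encoded and averaged as error
    probabilities, a common convention for long-read data.
    """
    if not quality:
        return 0.0
    error_probs = [10 ** (-(ord(ch) - offset) / 10) for ch in quality]
    return -10 * math.log10(sum(error_probs) / len(error_probs))


def keep_read(sequence: str, quality: str) -> bool:
    """Keep a read only if it is long enough and of sufficient quality."""
    return len(sequence) >= MIN_READ_LENGTH and mean_qscore(quality) >= MIN_QSCORE


def keep_transcript(identity: float, coverage: float) -> bool:
    """Keep a consensus transcript only if it meets both alignment thresholds."""
    return identity >= MIN_IDENTITY and coverage >= MIN_COVERAGE
```

Under these assumptions, a 600 bp read with uniformly high quality is kept, whereas a 400 bp read, or a 600 bp read with a mean Qscore of 4, is discarded.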

Dissociation of fresh tissue into single cells

Prior to library construction, the dissected plume of P. echinospica was transferred to a petri dish and rinsed with ice-cold artificial seawater prepared according to previous reports24. The plume was then transferred to a new petri dish and minced with a scalpel. The minced tissue was collected into cell dissociation buffer containing 10 mg/ml Bacillus licheniformis protease (Sigma-Aldrich) and incubated for 30 minutes at 8 °C. The suspension was then filtered through a 30 μm cell strainer and centrifuged at 350 g for 5 min at 4 °C. The supernatant was carefully removed, and the pellet was washed with ice-cold artificial seawater and resuspended in 1 × PBS containing 0.04% bovine serum albumin. The cell concentration was determined with Trypan Blue staining before DNBelab C Series single-cell library construction.

Single cell RNA-seq with DNBelab C4 system

The DNBelab C Series Single-Cell Library Prep Set (MGI) was employed for the preparation of barcoded scRNA-seq libraries25. Indexed sequencing libraries were then constructed following the manufacturer’s instructions and sequenced using the high-throughput DNBSEQ sequencer at the China National GeneBank (CNGB). The sequencing reads were generated in a paired-end format, with read 1 containing 30 bases that encompassed a 10 bp cell barcode 1, 10 bp cell barcode 2, and a 10 bp unique molecular identifier (UMI), and read 2 containing 100 bases representing the transcript sequence, along with a 10 bp sample index.
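
The read 1 layout described above can be made concrete with a short Python sketch. It assumes that the three 10 bp fields are contiguous and appear in the order listed in the text (cell barcode 1, cell barcode 2, UMI). The exact offsets should be checked against the library kit documentation, and the type and function names are hypothetical.

```python
from typing import NamedTuple


class Read1Fields(NamedTuple):
    cell_barcode_1: str
    cell_barcode_2: str
    umi: str


def parse_read1(read1: str) -> Read1Fields:
    """Split a 30-base read 1 into its three 10 bp fields.

    Assumption: fields are contiguous, in the order
    cell barcode 1, cell barcode 2, UMI.
    """
    if len(read1) != 30:
        raise ValueError(f"expected 30 bases in read 1, got {len(read1)}")
    return Read1Fields(read1[0:10], read1[10:20], read1[20:30])


def full_cell_barcode(fields: Read1Fields) -> str:
    """Concatenate the two barcode segments into one 20 bp cell barcode."""
    return fields.cell_barcode_1 + fields.cell_barcode_2
```

For example, `parse_read1("A" * 10 + "C" * 10 + "G" * 10)` yields the two barcode segments `AAAAAAAAAA` and `CCCCCCCCCC` and the UMI `GGGGGGGGGG`.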

scRNA-seq data processing and cell clustering

Cell barcodes and UMIs from the raw data were merged into the FASTQ files using the parse function of PISA (v1.10.2), and FastQC (v0.11.3) was used for quality control. After quality control, the raw reads were aligned to the reference genome of P. echinospica. The plume transcriptome was resequenced to improve gene annotation, and UMIs were corrected for sequencing errors. Gene matrices were generated using valid barcodes identified by the EmptyDrops method. These matrices were then imported into Seurat (v4.3.0) for graph-based cell clustering. Cells with fewer than 200 or more than 5,000 detected genes in the gene-cell barcode matrix were filtered out, as were cells in which mitochondrial UMIs accounted for more than 20% of all UMIs. After PCA (30 PCs) and UMAP (default parameters) analysis of the integrated and batch-corrected data, the ‘FindAllMarkers’ function (with only.pos set to TRUE, and min.pct and logfc.threshold both set to 0.25) was used to find differentially expressed genes (DEGs) between clusters.
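
The cell-filtering thresholds above can be expressed as a small Python sketch. The study applied these filters in Seurat, so this is only an illustration of the stated criteria. The `CellQC` record and function names are hypothetical, and treating the boundary values (exactly 200 genes, 5,000 genes, or 20% mitochondrial UMIs) as passing follows the wording of the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CellQC:
    barcode: str
    n_genes: int      # number of genes detected in the cell
    total_umis: int   # total UMI count of the cell
    mito_umis: int    # UMIs assigned to mitochondrial genes


def passes_qc(cell: CellQC,
              min_genes: int = 200,
              max_genes: int = 5000,
              max_mito_fraction: float = 0.20) -> bool:
    """Apply the gene-count and mitochondrial-fraction filters from the text."""
    if cell.total_umis == 0:
        return False  # an empty barcode cannot be a valid cell
    mito_fraction = cell.mito_umis / cell.total_umis
    return (min_genes <= cell.n_genes <= max_genes
            and mito_fraction <= max_mito_fraction)


def filter_cells(cells: List[CellQC]) -> List[CellQC]:
    """Return only the cells that pass quality control."""
    return [cell for cell in cells if passes_qc(cell)]
```

With these defaults, a cell with 500 genes and 5% mitochondrial UMIs is kept, while cells with 150 genes, 6,000 genes, or 30% mitochondrial UMIs are removed.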

Data Records

The raw sequencing data (scRNA-seq and ONT transcriptomic sequencing data) reported in this paper are available at the China National GeneBank (CNGB) Sequence Archive (CNSA, https://db.cngb.org/cnsa/) under accession number CNP0005607 (ref. 26). All raw sequencing data were also deposited in the National Center for Biotechnology Information Gene Expression Omnibus database under accession numbers GSE281931 (ref. 27) and GSE283331 (ref. 28).

Technical Validation

On board the research vessel, the plume was dissected from the tubeworm and single-cell libraries were constructed from this tissue. The libraries were subsequently sequenced on the MGI DNBSEQ platform in the laboratory. The scRNA-seq data were processed with PISA and FastQC for quality control and then mapped to the reference genome of P. echinospica. Cells were filtered by gene count and mitochondrial UMI fraction and clustered in Seurat using a graph-based approach. Differential gene expression (DGE) analysis was then conducted with the ‘FindAllMarkers’ function in Seurat (Fig. 1).

Fig. 1 The workflow for collecting and analyzing scRNA-seq data of deep-sea tubeworms.

A total of 9,075,171,350 raw reads were generated across seven libraries (Table 1). After quality control and filtering of the raw data, the genome mapping rates of these libraries exceeded 90%, indicating adequate sequencing quality for further analysis. The median number of genes detected per cell was approximately 500 (Fig. 2a) and the median UMI count per cell was ~2,800 (Fig. 2b), with ~13,000 genes detected in total at an average sequencing depth of ~22,000 reads per cell. Sequencing saturation ranged from 37.14% to 71.52% (Table 2). In most cells, the percentage of mitochondrial genes was below 20% (Fig. 2c). Quality control of the seven libraries showed no significant batch-specific differences among them (Fig. 2d).

Table 1 Detailed quality control of FASTQ files.

Fig. 2 Overview of quality control and cell filtering in each single-cell RNA sequencing library. (a) Violin plot of the number of genes per cell in the seven libraries after data filtering. (b) Violin plot of UMI counts per cell in the seven libraries after data filtering. (c) Distribution of the mitochondrial gene percentage in each library after data filtering. (d) Distribution of the seven libraries in the UMAP.

Table 2 Detailed quality control of scRNA-seq libraries.

After quality control, a total of 10,689 high-quality cells were obtained. We annotated six cell types: hemocytes, proliferative cells, muscle cells, epithelial cells, nerve1 cells, and nerve2 cells (Fig. 3a). The genes that distinguish these clusters were visualized in a heatmap of the top 50 differentially expressed genes (Fig. 3b and Supplemental Table S1), and the expression levels of the marker genes were shown in a dot plot (Fig. 3c and Supplemental Table S2). Nerve cells (comprising nerve1 and nerve2 cells) and muscle cells accounted for 29% and 24% of the total cell population, respectively, making them the main components of the plume (Fig. 3d). We then examined the correlation among cell types and found a strong correlation among muscle cells, epithelial cells and nerve cells (Fig. 3e).

Fig. 3 Single-cell transcriptional landscape of tubeworm plume and cell cluster identification. (a) UMAP visualization of 10,689 cells from the tubeworm plume. Each dot represents one cell, and colors correspond to cell types. (b) Heatmap showing the expression of the top 50 DEGs of the main cell types. Colors range from yellow (low expression) to red (high expression). (c) Dot plot of marker genes in the six cell types. (d) Pie chart showing the proportion of each cell type among all cells. (e) Heatmap showing the similarity between cell types, with colors ranging from blue to red indicating high to low similarity, respectively.

To examine the expression patterns of transport proteins in different cell types, we downloaded 1,801 transport-related genes from the “transporters” catalog in the KEGG BRITE Database (https://www.kegg.jp/brite/ko02000). These genes were then screened against the DEGs of all cell types, and 48 transport-related genes were identified (Fig. 4a and Supplemental Table S3). Notably, globin (glob1)29 and neuroglobin (ngb)30, both involved in oxygen transport, were highly expressed in hemocytes (Fig. 4b). The creatine transporter gene slc6a8 was also highly expressed in hemocytes (Fig. 4b). Creatine, acting as a blood antioxidant, protects cells from oxidative damage and genotoxicity, and potentially extends the lifespan of hemocytes31. The creatine transporter may therefore help tubeworm hemocytes maintain their activity and life cycle in harsh environments. Furthermore, slc5a6 (ref. 32), slc6a5 (ref. 33) and dur3 (ref. 34) were highly expressed in epithelial cells (Fig. 4c). Overall, the elevated expression of solute carrier (SLC) transporters in these cell types likely enables efficient transport of molecules across epithelial barriers and supports specialized cellular functions. Hemocytes are known as immune cells in invertebrates35. Accordingly, several immune-related genes, including mfge8 (ref. 36), fut8 (ref. 37) and plg (ref. 38), were found among the DEGs of hemocytes (Fig. 4d). In summary, our study provides a valuable scRNA-seq dataset for investigating tubeworm gene expression profiles and a starting point for exploring the molecular mechanisms underlying their adaptation to extreme environments.
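
The transporter screening step amounts to intersecting a transporter catalog with the DEG list of each cell type, which the following Python sketch illustrates. The gene symbols are those named in the text, but the catalog here is a toy subset rather than the 1,801-gene KEGG list, matching by gene symbol is an assumption (the actual screening may have used KEGG orthology identifiers), and the function name is hypothetical.

```python
def screen_transporters(degs_by_cell_type: dict[str, set[str]],
                        transporter_catalog: set[str]) -> dict[str, list[str]]:
    """Return, for each cell type, the DEGs that are also in the catalog."""
    return {
        cell_type: sorted(degs & transporter_catalog)
        for cell_type, degs in degs_by_cell_type.items()
    }


# Toy inputs for illustration only (not the study's gene lists).
toy_catalog = {"glob1", "ngb", "slc6a8", "slc5a6", "slc6a5", "dur3"}
toy_degs = {
    "hemocytes": {"glob1", "ngb", "slc6a8", "mfge8", "fut8", "plg"},
    "epithelial cells": {"slc5a6", "slc6a5", "dur3"},
}

transporter_degs = screen_transporters(toy_degs, toy_catalog)
for cell_type, genes in transporter_degs.items():
    print(cell_type, genes)
```

In this toy example the immune-related genes drop out of the hemocyte list because they are absent from the catalog, leaving only the transport-related DEGs for each cell type.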

Fig. 4 scRNA-seq analysis of transport- and immune-related genes. (a) Heatmap of the expression levels of transport-related genes across cell types, with each column representing a specific cell type. Colors range from blue (low expression) to red (high expression). (b) Violin plots of transport genes specifically expressed in hemocytes. (c) Violin plots of transport genes specifically expressed in epithelial cells. (d) Violin plots of immune genes in each cell type.

Usage Notes

All sequencing data have been deposited in the CNGB and GEO databases, including the raw FASTQ files and the expression matrix files processed by DNBC4tools. Our scripts are available on GitHub (see the Code Availability section for details), allowing other researchers to reproduce our results or integrate the data with other datasets for further analysis.