Background & Summary

The ocean’s midwater, the region away from coasts, below the top 200 m of water, and above the ocean floor, is the largest contiguous ecosystem on Earth, yet it remains one of the least explored for biodiversity1. It is home to a large diversity of pelagic animals that live suspended in a volume of water exceeding 1 billion km³ globally. Many of those animals are deep-sea gelatinous zooplankton (e.g., jellyfish, ctenophores, salps, siphonophores) that are particularly difficult to approach, observe, sample, and preserve owing to their extremely soft tissues2.

Gelatinous animals of the midwater harbour novel biology that can be unveiled through 3D modelling and whole-genome and transcriptome sequencing3. They provide vital ecosystem services, linking the global food web, fisheries, and carbon sequestration in the deep sea, but estimates of these services from the deep sea are lacking4. Deep-sea gelatinous animals are also underrepresented in genomic and transcriptomic datasets2,5,6,7,8.

Combining whole-transcriptome data with high-quality qualitative and quantitative image data allows in-depth exploration of the biology and evolution of deep-sea animals3. Here we present image and whole-transcriptome data for seven deep-sea gelatinous animals (Fig. 1) that were collected as part of a research project that developed new technologies and an integrated workflow to capture in situ image and genetic data from delicate gelatinous zooplankton3. The animals include three siphonophores, Praya sp., Apolemia sp., and Halistemma sp.; one jellyfish, Atolla sp.; one larvacean, Bathochordaeus mcnutti; one tunicate, Pyrosoma sp.; and one ctenophore, Lampocteis sp. Each dataset either enriches the sequence and image data already available for the group or provides the first genetic data available for the species.

Fig. 1

Image captures of the seven animals for which whole transcriptome, video, and image data are presented. The images are in situ captures from the 4k camera on the ROV SuBastian. Scale bars cannot be provided for these image captures because zoom, distance, and angle from the vehicle differed for each animal. The accompanying video captures can be used to get a sense of scale for each animal.

Methods

Sample collection

In August 2021, a ten-day integrated expedition aboard the Schmidt Ocean Institute’s R/V Falkor was conducted in the Pacific Ocean off the coast of San Diego, CA3. The expedition involved seven dives of the remotely operated vehicle (ROV) SuBastian, each lasting approximately eight hours. Launches took place in the mid-afternoon local time, with recovery towards midnight, to capture the diel migration patterns of ocean animals.

The ROV SuBastian was equipped with three innovative systems for midwater exploration:

  1. The Deep Particle Image Velocimetry (DeepPIV) laser imaging system [ref], which allows for quantitative imaging of marine organisms and particles.

  2. The EyeRIS plenoptic camera system, which enables 3D imaging and tracking of marine life.

  3. The rotary actuated dodecahedron (RAD-2) for specimen encapsulation and tissue sampling3,9. RAD-2 is a 12-sided encapsulation device that uses a rotary actuator to open and close. It is constructed from anodized aluminum, stainless steel, and Hydex 301 for corrosion resistance. The central face is instrumented with a tissue sampling device driven by a thruster motor and an inlet port for pumping sampled tissue into a McLane Labs Suspended Particulate Rosette (SuPR) sampling system10.

These tools allowed the team to conduct visual identification, quantitative imaging, and targeted sampling of marine organisms during the dives. Among the specimens reported here, only one, the siphonophore Halistemma sp., had sufficient imaging data for 3D reconstruction, as indicated in Table 1. The companion publication reports four animals, each with advanced imaging and digital modeling3.

Table 1 Data types and locations for each specimen.

A shipboard-mounted echosounder system (EK60) was used to locate high-biodensity layers in the water column for focused ROV exploration. Once organisms of interest were identified and imaged using the ROV’s onboard 4k video camera and specialized imaging systems, targeted sampling was conducted using the RAD-2 sampler. The RAD-2 sampler collected tissue samples in conjunction with a McLane Labs SuPR Sampler10. The SuPR sampler uses a high-flow pump to draw a specified volume of water through 100 μm mesh filters, with in situ injection of a preservative (RNALater in this case).

Sample processing

Post-capture, tissue samples were preserved in situ using a custom RNA preservative (referred to here as RNALater) prepared according to the formulation of Malmstrom (2015)11 to stabilize RNA for downstream sequencing. The SuPR sampler was preloaded with 10 L of RNALater per dive. After encapsulation and maceration, 0.5–1 L of RNALater was pumped through the filters to replace seawater and preserve the sample, taking on average 3 min 17 s.

Samples not preserved in situ were transferred to RNALater upon recovery of the ROV on deck. Large tissue fragments were placed directly into cryotubes with RNALater. If tissue could not be easily separated from filters, the entire filter was placed in a 50 mL Falcon tube with RNALater. Samples were incubated at 4 °C overnight to allow the preservative to penetrate the tissue, then transferred to −80 °C. Samples in RNALater were transported to the Bigelow Laboratory for Ocean Sciences in East Boothbay, Maine for further processing. For RNA extraction, 100 mg tissue subsamples were lysed in TRIzol reagent. Total RNA was then extracted using spin columns following the manufacturer’s instructions for the TRIzol Plus RNA Purification Kit (Invitrogen, Waltham, MA). RNA integrity was checked using an Agilent TapeStation; RNA Integrity Number (RIN) values are reported in Table 2, and TapeStation data files are available in a Zenodo data repository12.

Table 2 RNA and sequence statistics for each specimen.

Sequencing

To generate reference transcriptomes, long-read transcriptome sequencing was conducted using the Pacific Biosciences long-read Isoform-Sequencing (Iso-Seq) method. Iso-Seq sequences full-length transcripts, facilitating isoform discovery and annotation, and Iso-Seq read files can function as reference assemblies13. Functional completeness of the transcriptomes was assessed using BUSCO v5.2.2 against the metazoan database14.

Short-read mRNA sequencing was conducted on the Illumina HiSeq platform for quantitative analysis of samples with high-quality RNA, and to recover whole-transcriptome information from samples where the RNA quantity was too low for long-read sequencing or the RIN indicated degradation such that long-read sequencing would likely be unsuccessful (Table 2). Sequencing was conducted at Genewiz (Azenta, South Plainfield, NJ) with the goal of generating high-integrity, functionally complete transcriptomic datasets.

Data Records

For each specimen we have uploaded raw sequence data to the NCBI Sequence Read Archive (SRA)15. Those data are whole-transcriptome data in the form of short-read paired-end .fastq files for Illumina paired-end (PE) libraries, and in the form of high-quality (hq) and low-quality (lq) long-read .fasta files for transcriptomes sequenced using the Pacific Biosciences Iso-Seq method. The choice of sequencing platform was guided by RNA quality metrics and the total quantity of RNA recovered from each specimen. The long-read hq files can be used as reference transcriptome assemblies for the specimen without further processing. The short-read data require de novo transcriptome assembly for full analyses. For the larvacean Bathochordaeus mcnutti (specimen ID RAD2_055) we created a long-read reference assembly as well as two short-read quantitative datasets, which are technical replicates, that is, sequence data generated from the same RNA pool. The short-read data can be used for quantitative assessment of highly and lowly expressed genes within the animal, and may provide insight into processes like mucus production for the larvacean “house”16. Also included are links to 4k video files for each animal and advanced imaging data where available17,18,19,20,21,22,23.
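As an illustration of how a long-read hq .fasta file might be inspected before it is used as a reference assembly, the following minimal Python sketch parses FASTA records and summarizes transcript lengths. The function names and the file name in the usage comment are hypothetical and are not part of the deposited data; N50 is used here only as a common summary statistic, not as a metric reported in this descriptor.

```python
def read_fasta(handle):
    """Yield (name, sequence) pairs from an open FASTA text handle."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Keep only the first whitespace-delimited token of the header.
            name = (line[1:].split() or [""])[0]
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def length_stats(records):
    """Summarize transcript lengths for an iterable of (name, sequence)."""
    lengths = sorted((len(seq) for _, seq in records), reverse=True)
    total = sum(lengths)
    n50, running = 0, 0
    for length in lengths:
        running += length
        # N50: length at which the running sum reaches half of all bases.
        if 2 * running >= total:
            n50 = length
            break
    return {
        "n_transcripts": len(lengths),
        "total_bases": total,
        "longest": lengths[0] if lengths else 0,
        "n50": n50,
    }


# Hypothetical usage (file name is an assumption):
# with open("RAD2_055.hq.fasta") as fh:
#     print(length_stats(read_fasta(fh)))
```

Streaming the records through a generator keeps memory use low for large transcript files; only the list of lengths is held in memory.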

Technical Validation

Prior to each sequencing run, the RNA was subjected to electrophoretic analysis on an Agilent TapeStation. Specimens that were not preserved in situ tended to have lower RNA integrity numbers (RINs), with evidence of RNA degradation. We pursued short-read sequencing of those samples so that a reference transcriptome could be built even from partially degraded RNA. Samples that were successfully preserved with RNALater at depth had higher RIN values and were subjected to long-read sequencing. Because of the high RNA quality, high RNA abundance, and uniqueness of the sample, we pursued both long- and short-read sequencing of specimen RAD2-055, from the larvacean B. mcnutti. For specimens RAD2-005, RAD2-026, and RAD2-027, the read depth is sufficient for de novo assembly of reference transcriptomes for those animals. For RAD2-054, RAD2-055, RAD2-059, and RAD2-061, the long-read transcriptome data can serve as a reference assembly.

Molecular identification

Additional validation included two steps to verify the organism identifications: manual assembly of marker gene sequences from the short-read datasets, by finding an exact match to a marker seed and extending that match forward and backward, and selection of marker gene sequences from the long-read data. Marker gene sequences can be found in the file MolecularBarcodeSeqs_forValidation.fasta12.
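The seed-and-extend idea described above can be sketched in Python as follows. This is a simplified greedy illustration under stated assumptions, not the procedure actually used for the manual assemblies: the function names, the anchor length `k`, and the majority-vote rule for choosing the next base are all hypothetical choices.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def _extend_right(contig, reads, k, max_steps):
    """Append one base at a time, voting over reads that contain the last k bases."""
    for _ in range(max_steps):
        anchor = contig[-k:]
        votes = Counter()
        for read in reads:
            pos = read.find(anchor)
            while pos != -1:
                nxt = pos + k
                if nxt < len(read):
                    votes[read[nxt]] += 1
                pos = read.find(anchor, pos + 1)
        if not votes:
            break  # no read extends past the current end
        contig += votes.most_common(1)[0][0]
    return contig


def extend_seed(seed, reads, k=16, max_steps=5000):
    """Extend a marker seed forward and backward through overlapping reads.

    Returns None if no read (in either orientation) contains the seed exactly.
    max_steps bounds the extension so repeats cannot loop forever.
    """
    if len(seed) < k:
        raise ValueError("seed must be at least k bases long")
    both = list(reads) + [revcomp(r) for r in reads]
    if not any(seed in r for r in both):
        return None
    contig = _extend_right(seed, both, k, max_steps)
    # Backward extension is forward extension of the reverse complement.
    return revcomp(_extend_right(revcomp(contig), both, k, max_steps))
```

A greedy extension like this can pull in sequence from co-sampled organisms, such as prey, so any resulting marker sequence would still need to be checked against a reference database.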

Short read file molecular identifications

The COX1 sequence from RAD2-005 has 99.83% sequence identity (587/588 matches, 0 gaps) to Atolla vanhoeffeni isolate V3709ss2 COX1 gene, GenBank ID: OM214497.1.

The COX1 sequence from RAD2-026 has 92% sequence identity (629/682 matches, 0 gaps) to Apolemia sp. BO-2009 voucher Hy100.2.1 COX1 gene, GenBank ID: GQ119954.1 (ref. 24).

The 18S sequence from RAD2-027 has 99.78% sequence identity (1783/1787 matches, 0 gaps) to an 18S barcode from Praya sp. AGC-2001, GenBank ID: AF358067.1 (ref. 25). By the seed-extension method with the short reads, we could not find a hydrozoan COX1 sequence in the RAD2-027 data. We did uncover a COX1 sequence with 98.98% sequence identity (676/683 matches, 0 gaps) to the COX1 gene from the krill Euphausia pacifica voucher Eu27.12.1, GenBank ID: MT826933.1 (ref. 26), which is presumed to be prey in this case. Upon full transcriptome assembly, the siphonophore and prey transcriptomes should be carefully disentangled prior to functional analyses.

Long read assembly molecular identifications

The closest match to the COX1 gene from specimen RAD2-054 in the NCBI nucleotide database is to a Physonectae sp. mitochondrial sequence, with 91% identity (1414/1555 matches, 6 gaps) to GenBank ID: OQ957211.1 (ref. 5). We have tentatively identified that individual as Halistemma sp. based on morphology, but more molecular data from the group are needed.

The closest match to the COX1 gene from specimen RAD2-055 in the NCBI nucleotide database is to Bathochordaeus mcnutti isolate V3647ss1 COX1 gene, with 99.75% identity (399/400 matches, 0 gaps), GenBank ID: KX599264.1 (ref. 27).

The closest match to the COX1 gene from specimen RAD2-059 in the NCBI nucleotide database is to Pyrosomatidae sp. USNM IZ 1449850 COX1 gene, with 100% identity (658/658 matches, 0 gaps), GenBank ID: OK209615.1.

The closest match to the 18S gene from specimen RAD2-061 in the NCBI nucleotide database is to the ctenophore Mnemiopsis leidyi 18S ribosomal RNA gene, with 98.31% identity (1921/1954 matches, 10 gaps), GenBank ID: AF293700.1 (ref. 28). Although we can identify putative COX1 genes from specimen RAD2-061 by searching the long-read assembly with a protein sequence query (tblastn), the resulting sequences have no close matches in the NCBI nucleotide database.
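The identity percentages quoted in this section follow from the match counts given in parentheses. As a small worked check, assuming that each percentage is simply matches divided by the reported alignment length, the Python sketch below recomputes them; the function name and dictionary labels are illustrative only.

```python
def percent_identity(matches, aligned_length):
    """Percent identity as matches over alignment length (assumed definition)."""
    if aligned_length <= 0:
        raise ValueError("aligned_length must be positive")
    return 100.0 * matches / aligned_length


# (matches, alignment length) pairs as reported in the text above.
reported = {
    "RAD2-005 COX1": (587, 588),
    "RAD2-026 COX1": (629, 682),
    "RAD2-027 18S": (1783, 1787),
    "RAD2-027 COX1 (krill)": (676, 683),
    "RAD2-054 COX1": (1414, 1555),
    "RAD2-055 COX1": (399, 400),
    "RAD2-059 COX1": (658, 658),
    "RAD2-061 18S": (1921, 1954),
}

for label, (matches, length) in reported.items():
    print(f"{label}: {percent_identity(matches, length):.2f}%")
```

The values for RAD2-026 and RAD2-054 come out at about 92.23% and 90.93%, consistent with the rounded 92% and 91% given in the text.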

Functional completeness

For the four long-read RNA sequencing datasets, we ran BUSCO analysis14,29 to assess functional completeness of the transcriptomic data. Average BUSCO completeness was 73% against the metazoan database (Fig. 2), which is similar to completeness scores of the transcriptomes of other deep-sea gelatinous zooplankton3.

Fig. 2

BUSCO completeness metrics for each long-read dataset. Each high-quality long-read sequence file can be treated as a transcriptome assembly. BUSCO was run on each high-quality long-read sequence file using the metazoan BUSCO set to produce these completeness assessments.
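For readers who want to recompute an average completeness from their own BUSCO runs, the following Python sketch parses the one-line score string that BUSCO writes in its short summary (of the form `C:..%[S:..%,D:..%],F:..%,M:..%,n:..`) and averages the complete (C) percentages. The function names are hypothetical, and the example strings in the usage comment are invented placeholders, not the values behind Fig. 2.

```python
import re

# Pattern for the BUSCO one-line score string.
_BUSCO_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_summary(line):
    """Return complete/single/duplicated/fragmented/missing percentages and n."""
    match = _BUSCO_RE.search(line)
    if match is None:
        raise ValueError(f"not a BUSCO score line: {line!r}")
    scores = {key: float(val) for key, val in match.groupdict().items() if key != "n"}
    scores["n"] = int(match["n"])
    return scores


def mean_completeness(lines):
    """Average the complete (C) percentage over several score lines."""
    values = [parse_busco_summary(line)["C"] for line in lines]
    if not values:
        raise ValueError("no BUSCO score lines given")
    return sum(values) / len(values)


# Hypothetical usage with placeholder strings (not the published values):
# mean_completeness([
#     "C:70.0%[S:60.0%,D:10.0%],F:10.0%,M:20.0%,n:954",
#     "C:76.0%[S:66.0%,D:10.0%],F:8.0%,M:16.0%,n:954",
# ])
```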