Abstract
We present 4k video and whole transcriptome data for seven deep-sea invertebrate animals collected in the Eastern Pacific Ocean during a research expedition onboard the Schmidt Ocean Institute’s R/V Falkor in August of 2021. The animals include one jellyfish (Atolla sp.), three siphonophores (Apolemia sp., Praya sp., and Halistemma sp.), one larvacean (Bathochordaeus mcnutti), one tunicate (Pyrosomatidae sp.), and one ctenophore (Lampocteis sp.). Four of the animals were sequenced with long-read RNA sequencing technology, such that the reads themselves define a reference assembly for those animals. The larvacean tissues were successfully preserved in situ, and that specimen has paired long-read reference data and short-read quantitative transcriptomic data for within-specimen analyses of gene expression. Additionally, we provide quantitative image data for three animals and a 3D model for one siphonophore. The paired image and transcriptomic data can be used for species identification and species description, and as reference genetic data for these deep-sea animals.
Background & Summary
The ocean’s midwater (away from coasts, below the top 200 m of water, and above the ocean floor) is the largest contiguous ecosystem on Earth, yet it remains one of the least explored for biodiversity1. It is home to a large diversity of pelagic animals that live suspended in a vast volume of water that encompasses over 1 billion km³ globally. Many of those animals are deep-sea gelatinous zooplankton (e.g., jellyfish, ctenophores, salps, siphonophores) that are particularly difficult to approach, observe, sample, and preserve owing to their extremely soft tissues2.
Gelatinous animals of the midwater harbour novel biology that can be unveiled through 3D modelling and whole genome and transcriptome sequencing3. They provide vital ecosystem services, with connections to the global food web, fisheries, and carbon sequestration, but estimates of these contributions from the deep sea are lacking4. Deep-sea gelatinous animals are also underrepresented in genomic and transcriptomic datasets2,5,6,7,8.
Whole transcriptome data coupled with high-quality qualitative and quantitative image data allow for in-depth exploration of the biology and evolution of deep-sea animals3. Here we present image and whole transcriptome data for seven deep-sea gelatinous animals (Fig. 1) that were collected as part of a research project that involved development of new technologies and elaboration of an integrated workflow to capture in situ image and genetic data from delicate gelatinous zooplankton3. The animals include three siphonophores: Praya sp., Apolemia sp., and Halistemma sp.; one jellyfish, Atolla sp.; one larvacean, Bathochordaeus mcnutti; one tunicate, Pyrosomatidae sp.; and one ctenophore, Lampocteis sp. Each dataset either enriches the sequence and image data available for the group or provides the first genetic data available for the species.
Fig. 1. Image captures of the seven animals for which whole transcriptome, video, and image data are presented. The images are in situ captures from the 4k camera on the ROV SuBastian. Scale bars are not provided because zoom, distance, and viewing angle relative to the vehicle differed for each animal. The accompanying video captures can be used to get a sense of scale for each animal.
Methods
Sample collection
In August 2021, a ten-day integrated expedition aboard the Schmidt Ocean Institute’s R/V Falkor was conducted in the Pacific Ocean off the coast of San Diego, CA3. The expedition involved seven dives of the remotely operated vehicle (ROV) SuBastian, each lasting approximately eight hours. Launches took place in the mid-afternoon local time, with recovery towards midnight, to capture the diel migration patterns of ocean animals.
The ROV SuBastian was equipped with three innovative systems for midwater exploration:
1. The Deep Particle Image Velocimetry (DeepPIV) laser imaging system [ref], which allows for quantitative imaging of marine organisms and particles.
2. The EyeRIS plenoptic camera system, which enables 3D imaging and tracking of marine life.
3. The rotary actuated dodecahedron (RAD-2) for specimen encapsulation and tissue sampling3,9. RAD-2 is a 12-sided encapsulation device that uses a rotary actuator to open and close. It is constructed from anodized aluminum, stainless steel, and Hydex 301 for corrosion resistance. The central face is instrumented with a tissue sampling device driven by a thruster motor and an inlet port for pumping sampled tissue into a McLane Labs Suspended Particulate Rosette (SuPR) sampling system10.
These tools allowed the team to conduct visual identification, quantitative imaging, and targeted sampling of marine organisms during the dives. Among the specimens reported here, only the siphonophore Halistemma sp. had sufficient imaging data for 3D reconstruction (Table 1). The companion publication reported imaging and digital modeling for four animals3.
A shipboard-mounted echosounder system (EK60) was used to locate layers of high biological density in the water column for focused ROV exploration. Once organisms of interest were identified and imaged using the ROV’s onboard 4k video camera and specialized imaging systems, targeted sampling was conducted using the RAD-2 sampler. The RAD-2 sampler collected tissue samples in conjunction with a McLane Labs SuPR Sampler10. The SuPR sampler uses a high-flow pump to draw a specified volume of water through 100 μm mesh filters, with in situ injection of a preservative (RNALater in this case).
Sample processing
After capture, tissue samples were preserved in situ using a custom RNA preservative (referred to here as RNALater) prepared according to the formulation of Malmstrom (2015; ref. 11) to stabilize RNA for downstream sequencing. The SuPR sampler was preloaded with 10 L of RNALater per dive. After encapsulation and maceration, 0.5–1 L of RNALater was pumped through the filters to replace seawater and preserve the sample, which took 3 min 17 s on average.
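The preservative figures above imply a rough per-dive sampling budget. The following is a minimal worked sketch of that arithmetic, assuming the whole 10 L reservoir is available for sample preservation and that the average pumping time applies across the 0.5–1 L range; the function names are illustrative and not part of the expedition software.

```python
# Figures from the text: 10 L loaded per dive, 0.5-1 L pumped per sample,
# average pumping time of 3 min 17 s.
RESERVOIR_L = 10.0
PER_SAMPLE_L = (0.5, 1.0)
PUMP_TIME_S = 3 * 60 + 17  # 197 s


def samples_per_dive(reservoir_l, per_sample_l):
    """Whole number of samples one reservoir can preserve (assumes no other use)."""
    return int(reservoir_l // per_sample_l)


def mean_flow_l_per_min(volume_l, seconds):
    """Average preservative flow rate in litres per minute."""
    return volume_l / (seconds / 60.0)


max_samples = samples_per_dive(RESERVOIR_L, PER_SAMPLE_L[0])
min_samples = samples_per_dive(RESERVOIR_L, PER_SAMPLE_L[1])
flow_low = mean_flow_l_per_min(PER_SAMPLE_L[0], PUMP_TIME_S)
flow_high = mean_flow_l_per_min(PER_SAMPLE_L[1], PUMP_TIME_S)

print(f"{min_samples}-{max_samples} preserved samples per dive")
print(f"{flow_low:.2f}-{flow_high:.2f} L/min average preservative flow")
```

Under these assumptions the reservoir supports roughly 10 to 20 in situ preservations per dive.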
Samples not preserved in situ were transferred to RNALater upon recovery of the ROV on deck. Large tissue fragments were placed directly into cryotubes with RNALater. If tissue could not be easily separated from filters, the entire filter was placed in a 50 mL Falcon tube with RNALater. Samples were incubated at 4 °C overnight to allow the preservative to penetrate the tissue, then transferred to −80 °C. Samples in RNALater were transported to the Bigelow Laboratory for Ocean Sciences in East Boothbay, Maine for further processing. For RNA extraction, 100 mg tissue subsamples were lysed in TRIzol reagent. Total RNA was then extracted using spin columns following the manufacturer’s instructions for the TRIzol Plus RNA Purification Kit (Invitrogen, Waltham, MA). RNA integrity was checked using an Agilent TapeStation; RNA Integrity Number (RIN) values are reported in Table 2, and TapeStation data files are available in a Zenodo data repository12.
Sequencing
To generate reference transcriptomes, long-read transcriptome sequencing was conducted using the Pacific Biosciences long read Isoform-Sequencing (Iso-Seq) method. Iso-Seq allows for the sequencing of full-length transcripts, facilitating isoform discovery and annotation. Iso-Seq read files can function as reference assemblies13. Functional completeness of the transcriptomes was assessed using BUSCO v5.2.2 against the metazoan database14.
Short-read mRNA sequencing was conducted on the Illumina HiSeq platform for quantitative analysis of samples with high-quality RNA, and to recover whole-transcriptome information from samples where the RNA quantity was too low for long-read sequencing or the RIN indicated degradation such that long-read sequencing would likely be unsuccessful (Table 2). Sequencing was conducted at Genewiz (Azenta, South Plainfield, NJ) with the goal of generating high-integrity, functionally complete transcriptomic datasets.
Data Records
For each specimen we have uploaded raw sequence data to the NCBI Sequence Read Archive (SRA)15. Those data are whole transcriptome data in the form of short-read paired-end .fastq files for Illumina paired-end (PE) libraries, and in the form of high-quality (hq) and low-quality (lq) long-read .fasta files for transcriptomes sequenced using the Pacific Biosciences Iso-Seq method. The choice of sequencing platform was guided by RNA quality metrics and the total quantity of RNA recovered from each specimen. The long-read hq files can be used as reference transcriptome assemblies for the specimen without further processing. The short-read data require de novo transcriptome assembly for full analyses. For the larvacean Bathochordaeus mcnutti (specimen ID RAD2-055) we created a long-read reference assembly as well as two short-read quantitative datasets, which are technical replicates: sequence data generated from the same RNA pool. The short-read data can be used for quantitative assessment of highly and lowly expressed genes within the animal, and may provide insight into processes like mucus production for the larvacean “house”16. Also included are links to 4k video files for each animal and advanced imaging data where available17,18,19,20,21,22,23.
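Because the long-read hq files are plain FASTA, a first look at a candidate reference needs only a few lines of code. The sketch below is one possible way to summarise transcript counts and lengths using the Python standard library; the in-memory demo records are invented for illustration, and any file name a reader substitutes (for example a downloaded hq .fasta file) is their own assumption.

```python
import io
from statistics import median


def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def length_summary(lengths):
    """Return count, total bases, median length, and N50 for sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    total = sum(ordered)
    running, n50 = 0, 0
    for length in ordered:
        running += length
        if running * 2 >= total:  # first length at which half the bases are covered
            n50 = length
            break
    return {"count": len(ordered), "total": total,
            "median": median(ordered), "n50": n50}


# Tiny made-up records standing in for a real long-read hq file.
demo = io.StringIO(">t1\nACGTACGT\n>t2\nACG\nTT\n>t3\nAC\n")
lengths = [len(seq) for _, seq in read_fasta(demo)]
summary = length_summary(lengths)
print(summary)
```

For a real dataset, the `io.StringIO` object would be replaced by an open file handle; the parser streams records, so it does not need to hold the whole file in memory.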
Technical Validation
Prior to each sequencing run, the RNA was subjected to electrophoretic analysis on an Agilent TapeStation. Specimens that were not preserved in situ tended to have lower RNA integrity numbers (RINs), with evidence of RNA degradation. We pursued short-read sequencing of those samples so that a reference transcriptome could still be built from partially degraded RNA. Samples that were successfully preserved with RNALater at depth had higher RIN values and were subjected to long-read sequencing. Because of the high RNA quality, high RNA abundance, and uniqueness of the sample, we pursued both long- and short-read sequencing of specimen RAD2-055, the larvacean B. mcnutti. For specimens RAD2-005, RAD2-026, and RAD2-027, the read depth is sufficient for de novo assembly of reference transcriptomes for those animals. For RAD2-054, RAD2-055, RAD2-059, and RAD2-061, the long-read transcriptome data can serve as a reference assembly.
Molecular identification
Additional validation was performed to verify the organism identifications. For short-read datasets, marker gene sequences were assembled manually by finding an exact match to a marker seed and extending that match forward and backward; for long-read datasets, marker gene sequences were selected directly from the data. Marker gene sequences can be found in the file MolecularBarcodeSeqs_forValidation.fasta (ref. 12).
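The seed-and-extend idea described above can be sketched in a few lines. This is a simplified greedy illustration, not the manual procedure the authors used: it assumes error-free reads, takes the majority next base among reads that share the last `k` bases of the growing contig, and stops on a tie or when no read extends further. The function names, the value of `k`, and the toy template are all hypothetical.

```python
from collections import Counter

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def extend_right(contig, reads, k, max_len):
    """Append the majority next base supported by reads sharing the last k bases."""
    while len(contig) < max_len:
        tail = contig[-k:]
        votes = Counter()
        for read in reads:
            pos = read.find(tail)
            while pos != -1:
                nxt = pos + len(tail)
                if nxt < len(read):
                    votes[read[nxt]] += 1
                pos = read.find(tail, pos + 1)
        if not votes:
            break  # no read extends past the current end
        (base, top), *rest = votes.most_common(2)
        if rest and rest[0][1] == top:
            break  # ambiguous extension: stop rather than guess
        contig += base
    return contig


def seed_extend(seed, reads, k=8, max_len=5000):
    """Extend a seed forward, then backward via the reverse complement."""
    both = list(reads) + [revcomp(r) for r in reads]
    forward = extend_right(seed, both, k, max_len)
    backward = extend_right(revcomp(forward), both, k, max_len)
    return revcomp(backward)


# Toy data: overlapping 12-base reads from a 24-base template, one reversed.
template = "ACCGTTGAGCATTCGGACTAAGTC"
reads = [template[i:i + 12] for i in range(0, 13, 4)]
reads[1] = revcomp(reads[1])
seed = template[9:16]
assembled = seed_extend(seed, reads, k=6)
print(assembled == template)
```

Real short-read data contain sequencing errors, repeats, and, as in RAD2-027, sequences from prey, so a production tool would need coverage thresholds and mismatch handling that this sketch omits.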
Short read file molecular identifications
The COX1 sequence from RAD2-005 has 99.83% sequence identity (587/588 matches, 0 gaps) to Atolla vanhoeffeni isolate V3709ss2 COX1 gene, GenBank ID: OM214497.1.
The COX1 sequence from RAD2-026 has 92% sequence identity (629/682 matches, 0 gaps) to Apolemia sp. BO-2009 voucher Hy100.2.1 COX1 gene, GenBank ID: GQ119954.1 (ref. 24).
The 18S sequence from RAD2-027 has 99.78% sequence identity (1783/1787 matches, 0 gaps) to an 18S barcode from Praya sp. AGC-2001, GenBank ID: AF358067.1 (ref. 25). Using the seed-extension method with the short reads, we could not find a hydrozoan COX1 sequence in the RAD2-027 data. We did recover a COX1 sequence with 98.98% sequence identity (676/683 matches, 0 gaps) to the COX1 gene from the krill Euphausia pacifica voucher Eu27.12.1, GenBank ID: MT826933.1 (ref. 26), which is presumed to be prey in this case. Upon full transcriptome assembly, the siphonophore and prey transcriptomes should be carefully disentangled prior to functional analyses.
Long read assembly molecular identifications
The closest match to the COX1 gene from specimen RAD2-054 in the NCBI nucleotide database is to a Physonectae sp. mitochondrial sequence, with 91% identity (1414/1555 matches, 6 gaps), GenBank ID: OQ957211.1 (ref. 5). We have tentatively identified that individual as Halistemma sp. based on morphology, but more molecular data from the group are needed.
The closest match to the COX1 gene from specimen RAD2-055 in the NCBI nucleotide database is to Bathochordaeus mcnutti isolate V3647ss1 COX1 gene, with 99.75% identity (399/400 matches, 0 gaps), GenBank ID: KX599264.1 (ref. 27).
The closest match to the COX1 gene from specimen RAD2-059 in the NCBI nucleotide database is to Pyrosomatidae sp. USNM IZ 1449850 COX1 gene, with 100% identity (658/658 matches, 0 gaps), GenBank ID: OK209615.1.
The closest match to the 18S gene from specimen RAD2-061 in the NCBI nucleotide database is to the ctenophore Mnemiopsis leidyi 18S ribosomal RNA gene, with 98.31% identity (1921/1954 matches, 10 gaps), GenBank ID: AF293700.1 (ref. 28). Although we can identify putative COX1 genes from specimen RAD2-061 by searching the long-read assembly with a protein sequence query (tblastn), there are no close matches in the NCBI nucleotide database to the resultant sequences from RAD2-061.
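The identity percentages quoted in the molecular identifications follow directly from the match counts. As a small worked check, assuming identity is computed as matching positions divided by alignment length (the convention used in BLAST reports), the figures can be recomputed as follows; the dictionary layout is illustrative only.

```python
def percent_identity(matches, alignment_length):
    """Percent identity as matches over aligned columns, rounded to 2 decimals."""
    return round(100.0 * matches / alignment_length, 2)


# (matches, alignment length) pairs as reported in the text.
reported = {
    "RAD2-005 COX1": (587, 588),
    "RAD2-026 COX1": (629, 682),
    "RAD2-027 18S": (1783, 1787),
    "RAD2-027 prey COX1": (676, 683),
    "RAD2-054 COX1": (1414, 1555),
    "RAD2-055 COX1": (399, 400),
    "RAD2-059 COX1": (658, 658),
    "RAD2-061 18S": (1921, 1954),
}

identities = {name: percent_identity(m, n) for name, (m, n) in reported.items()}
for name, value in identities.items():
    print(f"{name}: {value}%")
```

Two of the values in the text are given to the nearest whole percent: 629/682 evaluates to 92.23% and 1414/1555 to 90.93%, consistent with the reported 92% and 91%.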
Functional completeness
For the four long read RNA sequencing datasets, we completed BUSCO analysis14,29 to assess functional completeness of the transcriptomic data. We see an average BUSCO completeness of 73% against the metazoan database (Fig. 2), which is similar to completeness scores of the transcriptomes of other deep-sea gelatinous zooplankton3.
Fig. 2. BUSCO completeness metrics for each long-read dataset. Each high-quality long-read sequence file can be treated as a transcriptome assembly. BUSCO was run on each high-quality long-read sequence file using the metazoan BUSCO set to produce these completeness assessments.
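BUSCO reports its results as a one-line summary in which completeness (C) is the sum of single-copy (S) and duplicated (D) hits, alongside fragmented (F) and missing (M) percentages. The sketch below shows one way to parse such lines and average completeness across datasets. The two summary strings are invented placeholders, not the values behind Fig. 2, and the dataset labels are hypothetical.

```python
import re

SUMMARY_RE = re.compile(
    r"C:(?P<C>[\d.]+)%\[S:(?P<S>[\d.]+)%,D:(?P<D>[\d.]+)%\],"
    r"F:(?P<F>[\d.]+)%,M:(?P<M>[\d.]+)%,n:(?P<n>\d+)"
)


def parse_busco_line(line):
    """Extract C, S, D, F, M percentages and the BUSCO count n from a summary line."""
    match = SUMMARY_RE.search(line)
    if match is None:
        raise ValueError(f"not a BUSCO summary line: {line!r}")
    fields = {key: float(value) for key, value in match.groupdict().items()}
    fields["n"] = int(fields["n"])
    return fields


# Placeholder summaries (illustrative numbers only, not results from this study).
summaries = {
    "dataset_A": "C:80.0%[S:70.0%,D:10.0%],F:5.0%,M:15.0%,n:954",
    "dataset_B": "C:60.0%[S:55.0%,D:5.0%],F:10.0%,M:30.0%,n:954",
}

parsed = {name: parse_busco_line(line) for name, line in summaries.items()}
mean_complete = sum(p["C"] for p in parsed.values()) / len(parsed)
print(f"mean completeness: {mean_complete:.1f}%")
```

Duplicated BUSCOs are expected to be common in Iso-Seq data treated as an assembly, because multiple isoforms of the same gene are counted separately, so S and D are worth inspecting individually rather than relying on C alone.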
Code availability
No proprietary code was used to generate these data.
References
1. Webb, T. J., Berghe, E. V. & O’Dor, R. Biodiversity’s Big Wet Secret: The Global Distribution of Marine Biological Records Reveals Chronic Under-Exploration of the Deep Pelagic Ocean. PLOS ONE 5, e10223 (2010).
2. Haddock, S. H. A golden age of gelata: past and future research on planktonic ctenophores and cnidarians. Hydrobiologia 530, 549–556 (2004).
3. Burns, J. A. et al. An in situ digital synthesis strategy for the discovery and description of ocean life. Science Advances 10, eadj4960 (2024).
4. Luo, J. Y. et al. Gelatinous Zooplankton-Mediated Carbon Flows in the Global Oceans: A Data-Driven Modeling Study. Global Biogeochemical Cycles 34, e2020GB006704 (2020).
5. Ahuja, N. et al. Giants among Cnidaria: Large Nuclear Genomes and Rearranged Mitochondrial Genomes in Siphonophores. Genome Biol Evol 16, evae048 (2024).
6. Jue, N. K. et al. Rapid Evolutionary Rates and Unique Genomic Signatures Discovered in the First Reference Genome for the Southern Ocean Salp, Salpa thompsoni (Urochordata, Thaliacea). Genome Biology and Evolution 8, 3171–3186 (2016).
7. Haddock, S. H. D. & Choy, C. A. Life in the Midwater: The Ecology of Deep Pelagic Animals. Annual Review of Marine Science 16, 383–416 (2024).
8. Dunn, C. W., Leys, S. P. & Haddock, S. H. D. The hidden biology of sponges and ctenophores. Trends in Ecology & Evolution 30, 282–291 (2015).
9. Teoh, Z. E. et al. Rotary-actuated folding polyhedrons for midwater investigation of delicate marine organisms. Science Robotics 3, eaat5276 (2018).
10. Breier, J. A. et al. A large volume particulate and water multi-sampler with in situ preservation for microbial and biogeochemical studies. Deep Sea Research Part I: Oceanographic Research Papers 94, 195–206 (2014).
11. Malmstrom, R. RNAlater Recipe. (2015).
12. Burns, J. Molecular markers for taxonomic validation of 7 deep water invertebrate animals. Zenodo https://doi.org/10.5281/zenodo.11586941 (2024).
13. Minio, A. et al. Iso-Seq Allows Genome-Independent Transcriptome Profiling of Grape Berry Development. G3 (Bethesda) 9, 755–767 (2019).
14. Manni, M., Berkeley, M. R., Seppey, M., Simao, F. A. & Zdobnov, E. M. BUSCO update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Molecular Biology and Evolution 38, 4647–4654 (2021).
15. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP471527 (2024).
16. Katija, K., Sherlock, R. E., Sherman, A. D. & Robison, B. H. New technology reveals the role of giant larvaceans in oceanic carbon cycling. Science Advances 3, e1602374 (2017).
17. Burns, J. & Phillips, B. RAD2-005 Atolla sp. video and image data. Zenodo https://doi.org/10.5281/zenodo.10987660 (2024).
18. Burns, J. & Phillips, B. RAD2-026 Apolemia sp. video and image data. Zenodo https://doi.org/10.5281/zenodo.10987828 (2024).
19. Burns, J. & Phillips, B. RAD2-027 Praya sp. video and image data. Zenodo https://doi.org/10.5281/zenodo.10988012 (2024).
20. Burns, J. & Phillips, B. RAD2-054 Halistemma sp. siphonophore video and image data. Zenodo https://doi.org/10.5281/zenodo.10988194 (2024).
21. Burns, J. & Phillips, B. RAD2-055 Bathochordaeus mcnutti, Giant Larvacean, video and image data. Zenodo https://doi.org/10.5281/zenodo.10988949 (2024).
22. Burns, J. & Phillips, B. RAD2-059 Pyrosomatidae sp. video and image data. Zenodo https://doi.org/10.5281/zenodo.10998462 (2024).
23. Burns, J. & Phillips, B. RAD2-061 Lampocteis sp. video and image data. Zenodo https://doi.org/10.5281/zenodo.10998479 (2024).
24. Ortman, B. D., Bucklin, A., Pagès, F. & Youngbluth, M. DNA Barcoding the Medusozoa using mtCOI. Deep Sea Research Part II: Topical Studies in Oceanography 57, 2148–2156 (2010).
25. Collins, A. G. Phylogeny of Medusozoa and the evolution of cnidarian life cycles. Journal of Evolutionary Biology 15, 418–432 (2002).
26. Bucklin, A. et al. Toward a global reference database of COI barcodes for marine zooplankton. Mar Biol 168, 78 (2021).
27. Sherlock, R. E., Walz, K. R., Schlining, K. L. & Robison, B. H. Morphology, ecology, and molecular biology of a new species of giant larvacean in the eastern North Pacific: Bathochordaeus mcnutti sp. nov. Mar Biol 164, 20 (2017).
28. Podar, M., Haddock, S. H., Sogin, M. L. & Harbison, G. R. A molecular phylogenetic framework for the phylum Ctenophora using 18S rRNA genes. Mol Phylogenet Evol 21, 218–230 (2001).
29. Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).
Acknowledgements
This work was supported by the following: Schmidt Ocean Institute grant “ROV-based 3D reality capture, specimen encapsulation, and tissue voucher sampling to explore and describe midwater biodiversity in the deep sea” to B.T.P., K.K., R.J.W., and D.F.G. and its corresponding research cruise “Designing the Future 2” (https://schmidtocean.org/cruise/designing-the-future-2/); David and Lucile Packard Foundation (to K.K.); and Gordon and Betty Moore Foundation (no. 7583 to K.K.). J.A.B. was supported in part by NSF OIA-1826734. The National Geographic Society provided support (grant no. SP 12-14) to R.J.W. and D.F.G., and the NSF Instrument Development for Biological Research programme provided support through Award no. 1556164 to R.J.W. and no. 1556123 to D.F.G.
Author information
Contributions
Conceptualization: D.M.V., B.G., K.K., R.J.W., J.A.B., R.W., Z.E.T., K.P.B., D.F.G. and B.T.P. Methodology: D.M.V., P.R., A.H.Y., B.G., K.K., R.J.W., J.A.B., R.W., Z.E.T., K.P.B., D.F.G., B.T.P. and J.D. Resources: D.M.V., D.C., B.G., K.K., J.A.B., R.W., K.P.B., D.F.G. and B.T.P. Validation: J.A.B., B.G. and D.F.G. Project administration: D.M.V., D.C., K.K., R.J.W., J.A.B., Z.E.T., D.F.G. and B.T.P. Writing, original draft: J.A.B. Writing, review and editing: J.A.B., D.F.G. and B.T.P. Data curation: K.K., J.A.B., E.O., B.T.P. and J.D. Supervision: K.K., R.J.W., J.A.B., D.F.G. and B.T.P. Visualization: K.K., R.J.W., J.A.B., Z.E.T., E.O., D.F.G., B.T.P. and J.D.
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Burns, J.A., Daniels, J., Becker, K.P. et al. Transcriptome sequencing of seven deep marine invertebrates. Sci Data 11, 679 (2024). https://doi.org/10.1038/s41597-024-03533-4