Transcriptome sequencing of seven deep marine invertebrates

Burns, John A.; Daniels, Joost; Becker, Kaitlyn P.; Casagrande, David; Roberts, Paul; Orenstein, Eric; Vogt, Daniel M.; Teoh, Zhi Ern; Wood, Ryan; Yin, Alexander H.; Genot, Baptiste; Wood, Robert J.; Katija, Kakani; Phillips, Brennan T.; Gruber, David F.

doi:10.1038/s41597-024-03533-4

Data Descriptor
Open access
Published: 24 June 2024

John A. Burns ORCID: orcid.org/0000-0002-2348-8438¹,
Joost Daniels²,
Kaitlyn P. Becker³,
David Casagrande⁴,
Paul Roberts²,
Eric Orenstein²,
Daniel M. Vogt³,
Zhi Ern Teoh⁵,
Ryan Wood⁵,
Alexander H. Yin⁴,
Baptiste Genot¹,
Robert J. Wood ORCID: orcid.org/0000-0001-7969-038X³,
Kakani Katija ORCID: orcid.org/0000-0002-7249-0147²,
Brennan T. Phillips⁴ &
David F. Gruber ORCID: orcid.org/0000-0001-9041-2911⁶

Scientific Data volume 11, Article number: 679 (2024)

Abstract

We present 4k video and whole transcriptome data for seven deep-sea invertebrate animals collected in the Eastern Pacific Ocean during a research expedition onboard the Schmidt Ocean Institute’s R/V Falkor in August of 2021. The animals include one jellyfish (Atolla sp.), three siphonophores (Apolemia sp., Praya sp., and Halistemma sp.), one larvacean (Bathochordaeus mcnutti), one tunicate (Pyrosomatidae sp.), and one ctenophore (Lampocteis sp.). Four of the animals were sequenced with long-read RNA sequencing technology, such that the reads themselves define a reference assembly for those animals. The larvacean tissues were successfully preserved in situ and have paired long-read reference data and short-read quantitative transcriptomic data for within-specimen analyses of gene expression. Additionally, for three animals we provide quantitative image data, and for one siphonophore a 3D model. The paired image and transcriptomic data can be used for species identification and species description, and as reference genetic data for these deep-sea animals.

Background & Summary

The ocean’s midwater (away from coasts, below the top 200 m of water, and above the ocean floor) is the largest contiguous ecosystem on Earth, yet it remains one of the least explored for biodiversity¹. It is home to a large diversity of pelagic animals that live suspended in a vast volume of water that encompasses over 1 billion km³ globally. Many of those animals are deep-sea gelatinous zooplankton (e.g., jellyfish, ctenophores, salps, siphonophores) that are particularly difficult to approach, observe, sample, and preserve owing to their extremely soft tissues².

Gelatinous animals of the midwater harbour novel biology that can be unveiled through 3D modelling and whole genome and transcriptome sequencing³. They provide vital ecosystem services, linking the global food web, fisheries, and carbon sequestration in the deep sea, but quantitative estimates of these contributions from the deep sea are lacking⁴. Deep-sea gelatinous animals are underrepresented in genomic and transcriptomic datasets²,⁵,⁶,⁷,⁸.

Whole transcriptomic genetic data coupled with high-quality qualitative and quantitative image data allow for in-depth exploration of the biology and evolution of deep-sea animals³. Here we present image and whole transcriptome data for seven deep-sea gelatinous animals (Fig. 1) that were collected as part of a research project that involved development of new technologies and elaboration of an integrated workflow to capture in situ image and genetic data from delicate gelatinous zooplankton³. The animals include three siphonophores: Praya sp., Apolemia sp., and Halistemma sp.; one jellyfish, Atolla sp.; one larvacean, Bathochordaeus mcnutti; one tunicate, Pyrosoma sp.; and one ctenophore, Lampocteis sp. Each dataset either enriches the sequence and image data available for the group or provides the first genetic data available for the species.

Methods

Sample collection

In August 2021, a ten-day integrated expedition aboard the Schmidt Ocean Institute’s R/V Falkor was conducted in the Pacific Ocean off the coast of San Diego, CA³. The expedition involved seven dives of the remotely operated vehicle (ROV) SuBastian, each lasting approximately eight hours. Launches took place in the mid-afternoon local time, with recovery occurring towards midnight, to capture the diel migration patterns of ocean animals.

The ROV SuBastian was equipped with three innovative systems for midwater exploration:

1. The Deep Particle Image Velocimetry (DeepPIV) laser imaging system [ref], which allows for quantitative imaging of marine organisms and particles.
2. The EyeRIS plenoptic camera system, which enables 3D imaging and tracking of marine life.
3. The rotary actuated dodecahedron (RAD-2) for specimen encapsulation and tissue sampling³,⁹. RAD-2 is a 12-sided encapsulation device that uses a rotary actuator to open and close. It is constructed from anodized aluminum, stainless steel, and Hydex 301 for corrosion resistance. The central face is instrumented with a tissue sampling device driven by a thruster motor and an inlet port for pumping sampled tissue into a McLane Labs Suspended Particulate Rosette (SuPR) sampling system¹⁰.

These tools allowed the team to conduct visual identification, quantitative imaging, and targeted sampling of marine organisms during the dives. In the data reported here, only one specimen, the siphonophore Halistemma sp., had sufficient imaging data for 3D reconstruction (Table 1). The companion publication reports four animals, each with advanced imaging and digital modeling³.

Table 1 Data types and locations for each specimen.

A shipboard-mounted echosounder system (EK60) was used to locate high biodensity layers in the water column for focused ROV exploration. Once organisms of interest were identified and imaged using the ROV’s onboard 4k video camera and specialized imaging systems, targeted sampling was conducted using the RAD-2 sampler. The RAD-2 sampler collected tissue samples in conjunction with a McLane Labs SuPR Sampler¹⁰. The SuPR sampler uses a high-flow pump to sample a specified volume of water onto 100 μm mesh filters with injection of a preservative (RNALater in this case) in situ.

Sample processing

Post-capture, tissue samples were preserved in situ using a custom RNA preservative (referred to here as RNALater), prepared according to the formulation by Malmstrom, 2015¹¹, to stabilize RNA for downstream sequencing. The SuPR sampler was preloaded with 10 L of RNALater per dive. After encapsulation and maceration, 0.5–1 L of RNALater was pumped through the filters to replace seawater and preserve the sample, taking on average 3 min 17 s.
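
The preservative figures above imply a simple per-dive budget. The following is a back-of-envelope sketch only, not a calculation from the paper: it reads the reported average time as 3 min 17 s, assumes that time covers the whole 0.5–1 L flush, and ignores dead volume in the plumbing. The function name `preservative_budget` is hypothetical.

```python
def preservative_budget(reservoir_l, per_sample_l, flush_seconds):
    """Return (number of samples the reservoir supports, mean flow in L/min).

    Assumes each sample consumes exactly `per_sample_l` litres and that the
    whole volume is delivered within `flush_seconds` (both are assumptions).
    """
    if reservoir_l <= 0 or per_sample_l <= 0 or flush_seconds <= 0:
        raise ValueError("all inputs must be positive")
    samples = int(reservoir_l // per_sample_l)
    flow_l_per_min = per_sample_l / (flush_seconds / 60.0)
    return samples, flow_l_per_min


FLUSH_SECONDS = 3 * 60 + 17  # reported average, read as 3 min 17 s

# Lower and upper bound of the reported 0.5-1 L flush volume.
for volume_l in (0.5, 1.0):
    n_samples, flow = preservative_budget(10.0, volume_l, FLUSH_SECONDS)
    print(f"{volume_l} L per sample: up to {n_samples} samples per dive, "
          f"mean flow {flow:.2f} L/min")
```

Under these assumptions the 10 L reservoir bounds the number of in situ preserved samples per dive; the actual number depends on how the sampler was operated.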

Samples not preserved in situ were transferred to RNALater upon recovery of the ROV on deck. Large tissue fragments were placed directly into cryotubes with RNALater. If tissue could not be easily separated from filters, the entire filter was placed in a 50 mL Falcon tube with RNALater. Samples were incubated at 4 °C overnight to allow the preservative to penetrate the tissue, then transferred to −80 °C. Samples in RNALater were transported to the Bigelow Laboratory for Ocean Sciences in East Boothbay, Maine for further processing. For RNA extraction, 100 mg tissue subsamples were lysed in TRIzol reagent. Total RNA was then extracted using spin columns following the manufacturer’s instructions for the TRIzol Plus RNA Purification Kit (Invitrogen, Waltham, MA). RNA integrity was checked using an Agilent TapeStation; RNA Integrity Number (RIN) values are reported in Table 2, and TapeStation data files are available in a Zenodo data repository¹².

Table 2 RNA and sequence statistics for each specimen.

Sequencing

To generate reference transcriptomes, long-read transcriptome sequencing was conducted using the Pacific Biosciences long read Isoform-Sequencing (Iso-Seq) method. Iso-Seq allows for the sequencing of full-length transcripts, facilitating isoform discovery and annotation. Iso-Seq read files can function as reference assemblies¹³. Functional completeness of the transcriptomes was assessed using BUSCO v5.2.2 against the metazoan database¹⁴.

Short-read mRNA sequencing was conducted on the Illumina HiSeq platform for quantitative analysis of samples with high-quality RNA, and to recover whole-transcriptome information from samples where the RNA quantity was too low for long-read sequencing or the RIN value indicated degradation such that long-read sequencing would likely be unsuccessful (Table 2). Sequencing was conducted at Genewiz (Azenta, Genewiz, South Plainfield, NJ) with the goal of generating high-integrity, functionally complete transcriptomic datasets.

Data Records

For each specimen we have uploaded raw sequence data to the NCBI sequence read archive (SRA)¹⁵. Those data are whole transcriptome data in the form of short-read paired-end .fastq files for Illumina paired end (PE) libraries, and in the form of high quality (hq) and low quality (lq) long-read .fasta files for transcriptomes sequenced using the Pacific Biosciences Iso-Seq method. The choice of sequencing platform was guided by RNA quality metrics and the total quantity of RNA recovered from each specimen. The long read-hq files can be used as reference transcriptome assemblies for the specimen without further processing. The short-read data requires de novo transcriptome assembly for full analyses. For the larvacean Bathochordaeus mcnutti (specimen ID RAD2_055) we created a long-read reference assembly as well as two short-read quantitative datasets, which are technical replicates: sequence data generated from the same RNA pool. The short-read data can be used for quantitative assessment of highly and lowly expressed genes within the animal, and may provide insight into processes like mucus production for the larvacean “house”¹⁶. Also included are links to 4k video files for each animal and advanced imaging data where available¹⁷,¹⁸,¹⁹,²⁰,²¹,²²,²³.
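
Because the high-quality long-read FASTA files are described as usable directly as reference assemblies, a first sanity check after download is a length summary. The sketch below uses only the Python standard library. The function names are hypothetical, the input is any iterable of FASTA lines (for example an open file handle), and N50 is computed here as a common convenience statistic, not one reported in the text.

```python
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs from FASTA-formatted lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def length_summary(lines: Iterable[str]) -> Dict[str, int]:
    """Count transcripts, total bases, and N50 for a FASTA stream."""
    lengths = sorted((len(seq) for _, seq in read_fasta(lines)), reverse=True)
    total = sum(lengths)
    running, n50 = 0, 0
    for length in lengths:
        running += length
        if running * 2 >= total:  # first length at which half the bases are covered
            n50 = length
            break
    return {"transcripts": len(lengths), "total_bases": total, "n50": n50}
```

To use it on a downloaded file, pass an open text handle, for example `with open(path) as handle: length_summary(handle)`, where `path` points to one of the hq FASTA files.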

Technical Validation

Prior to each sequencing run, the RNA was subjected to electrophoretic analysis on an Agilent TapeStation. Specimens that were not preserved in situ tended to have lower RNA integrity numbers (RINs), with evidence of RNA degradation. We pursued short-read sequencing of those samples to build a reference transcriptome, even from partially degraded RNA. Samples that were successfully preserved with RNALater at depth had higher RIN values and were subjected to long-read sequencing. Because of the high RNA quality, high RNA abundance, and uniqueness of the sample, we pursued both long- and short-read sequencing of specimen RAD2-055, from the larvacean B. mcnutti. For specimens RAD2-005, RAD2-026, and RAD2-027, the read depth is sufficient for de novo assembly of reference transcriptomes for those animals. For RAD2-054, RAD2-055, RAD2-059, and RAD2-061, the long-read transcriptome data can serve as a reference assembly.

Molecular identification

Additional validation included manual assembly of marker gene sequences from short-read datasets, by finding an exact match to a marker seed and extending that match forward and backward, and selection of marker gene sequences from long-read data to verify the organism identifications. Marker gene sequences can be found in the file MolecularBarcodeSeqs_forValidation.fasta¹².
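
The seed-and-extend idea described above can be illustrated with a minimal greedy sketch. This is not the authors' procedure or code. It assumes error-free reads, uses a hypothetical overlap length `k`, stops at any ambiguous branch, and treats both read strands as equivalent. All function names are illustrative.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA string (non-ACGT characters are kept)."""
    return seq.translate(_COMPLEMENT)[::-1]


def extend_right(contig, reads, k, max_steps=10000):
    """Greedily append one base at a time while reads agree on the next base."""
    for _ in range(max_steps):
        suffix = contig[-k:]
        votes = Counter()
        for read in reads:
            start = read.find(suffix)
            while start != -1:
                nxt = start + len(suffix)
                if nxt < len(read):
                    votes[read[nxt]] += 1
                start = read.find(suffix, start + 1)
        if not votes:
            break  # no read extends past the current end
        top = votes.most_common(2)
        if len(top) == 2 and top[0][1] == top[1][1]:
            break  # ambiguous branch: stop instead of guessing
        contig += top[0][0]
    return contig


def seed_extend(seed, reads, k=8):
    """Extend an exact seed match forward and backward; None if seed is absent."""
    both = list(reads) + [revcomp(r) for r in reads]
    if not any(seed in read for read in both):
        return None
    contig = extend_right(seed, both, k)
    # Extending left is the same as extending the reverse complement to the right.
    return revcomp(extend_right(revcomp(contig), both, k))
```

A greedy extension like this stops at the first ambiguity, so it suits short, conserved markers such as COX1 or 18S better than whole-transcript assembly. Real reads would also need tolerance for sequencing errors.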

Short read file molecular identifications

The COX1 sequence from RAD2-005 has 99.83% sequence identity (587/588 matches, 0 gaps) to Atolla vanhoeffeni isolate V3709ss2 COX1 gene, GenBank ID: OM214497.1.

The COX1 sequence from RAD2-026 has 92% sequence identity (629/682 matches, 0 gaps) to Apolemia sp. BO-2009 voucher Hy100.2.1 COX1 gene, GenBank ID: GQ119954.1²⁴.

The 18S sequence from RAD2-027 has 99.78% sequence identity (1783/1787 matches, 0 gaps) to an 18S barcode from Praya sp. AGC-2001, GenBank ID: AF358067.1²⁵. By the seed-extension method with the short reads, we could not find a hydrozoan COX1 sequence in the RAD2-027 data. We did uncover a COX1 sequence with a 98.98% sequence identity (676/683 matches, 0 gaps) to the COX1 gene from the krill Euphausia pacifica voucher Eu27.12.1, GenBank ID: MT826933.1²⁶, which is a presumed prey in this case. Upon full transcriptome assembly, the siphonophore and prey transcriptomes should be carefully disentangled prior to functional analyses.

Long read assembly molecular identifications

The closest match to the COX1 gene from specimen RAD2-054 in the NCBI nucleotide database is to a Physonectae sp. mitochondrial sequence, with 91% identity (1414/1555 matches, 6 gaps) to GenBank ID: OQ957211.1⁵. We have tentatively identified that individual as Halistemma sp. based on morphology, but more molecular data from the group are needed.

The closest match to the COX1 gene from specimen RAD2-055 in the NCBI nucleotide database is to Bathochordaeus mcnutti isolate V3647ss1 COX1 gene, with 99.75% identity (399/400 matches, 0 gaps), GenBank ID: KX599264.1²⁷.

The closest match to the COX1 gene from specimen RAD2-059 in the NCBI nucleotide database is to Pyrosomatidae sp. USNM IZ 1449850 COX1 gene, with 100% identity (658/658 matches, 0 gaps), GenBank ID: OK209615.1.

The closest match to the 18S gene from specimen RAD2-061 in the NCBI nucleotide database is to the ctenophore Mnemiopsis leidyi 18S ribosomal RNA gene, with 98.31% identity (1921/1954 matches, 10 gaps), GenBank ID: AF293700.1²⁸. Although we can identify putative COX1 genes from specimen RAD2-061 by searching the long-read assembly with a protein sequence query (tblastn), there are no close matches in the NCBI nucleotide database to the resultant sequences from RAD2-061.
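
The identity percentages quoted in this section follow from the match counts and alignment lengths given in parentheses. The sketch below recomputes them, assuming the denominator is the alignment length as reported. The match counts are copied from the text above, while the dictionary labels and the function name are illustrative.

```python
def percent_identity(matches, alignment_length):
    """Percent identity = 100 * identical positions / alignment length."""
    if alignment_length <= 0 or not 0 <= matches <= alignment_length:
        raise ValueError("need 0 <= matches <= alignment_length and length > 0")
    return 100.0 * matches / alignment_length


# (matches, alignment length) pairs as stated in the text.
REPORTED = {
    "RAD2-005 COX1 vs Atolla vanhoeffeni": (587, 588),
    "RAD2-026 COX1 vs Apolemia sp.": (629, 682),
    "RAD2-027 18S vs Praya sp.": (1783, 1787),
    "RAD2-027 COX1 vs Euphausia pacifica": (676, 683),
    "RAD2-054 COX1 vs Physonectae sp.": (1414, 1555),
    "RAD2-055 COX1 vs Bathochordaeus mcnutti": (399, 400),
    "RAD2-059 COX1 vs Pyrosomatidae sp.": (658, 658),
    "RAD2-061 18S vs Mnemiopsis leidyi": (1921, 1954),
}

for label, (matches, length) in REPORTED.items():
    print(f"{label}: {percent_identity(matches, length):.2f}%")
```

Two of the values in the text (92% and 91%) are given to the nearest percent, so the recomputed figures for those rows carry more decimal places than the reported ones.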

Functional completeness

For the four long-read RNA sequencing datasets, we completed BUSCO analysis¹⁴,²⁹ to assess functional completeness of the transcriptomic data. We see an average BUSCO completeness of 73% against the metazoan database (Fig. 2), which is similar to completeness scores of the transcriptomes of other deep-sea gelatinous zooplankton³.
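
An average completeness like the one quoted above can be derived from the one-line score that BUSCO writes in its short summary, which has the form `C:..%[S:..%,D:..%],F:..%,M:..%,n:..`. The sketch below parses the complete (C) percentage from such lines and averages it. The function names are hypothetical, and the per-specimen scores are not reproduced here because they are not given in the text.

```python
import re

# Matches the "C:<number>%" field of a BUSCO one-line score.
_COMPLETE = re.compile(r"C:(\d+(?:\.\d+)?)%")


def completeness(summary_line):
    """Return the complete-BUSCO percentage from a one-line BUSCO score."""
    match = _COMPLETE.search(summary_line)
    if match is None:
        raise ValueError(f"no 'C:..%' field found in: {summary_line!r}")
    return float(match.group(1))


def mean_completeness(summary_lines):
    """Arithmetic mean of the complete-BUSCO percentages across datasets."""
    values = [completeness(line) for line in summary_lines]
    if not values:
        raise ValueError("no summary lines given")
    return sum(values) / len(values)
```

Averaging the C field counts single-copy and duplicated BUSCOs together, which is the usual reading of "completeness" for transcriptomes, where isoforms inflate the duplicated class.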

Code availability

No proprietary code was used to generate these data.

References

1. Webb, T. J., Berghe, E. V. & O’Dor, R. Biodiversity’s Big Wet Secret: The Global Distribution of Marine Biological Records Reveals Chronic Under-Exploration of the Deep Pelagic Ocean. PLOS ONE 5, e10223 (2010).
2. Haddock, S. H. A golden age of gelata: past and future research on planktonic ctenophores and cnidarians. Hydrobiologia 530, 549–556 (2004).
3. Burns, J. A. et al. An in situ digital synthesis strategy for the discovery and description of ocean life. Science Advances 10, eadj4960 (2024).
4. Luo, J. Y. et al. Gelatinous Zooplankton-Mediated Carbon Flows in the Global Oceans: A Data-Driven Modeling Study. Global Biogeochemical Cycles 34, e2020GB006704 (2020).
5. Ahuja, N. et al. Giants among Cnidaria: Large Nuclear Genomes and Rearranged Mitochondrial Genomes in Siphonophores. Genome Biol Evol 16, evae048 (2024).
6. Jue, N. K. et al. Rapid Evolutionary Rates and Unique Genomic Signatures Discovered in the First Reference Genome for the Southern Ocean Salp, Salpa thompsoni (Urochordata, Thaliacea). Genome Biology and Evolution 8, 3171–3186 (2016).
7. Haddock, S. H. D. & Choy, C. A. Life in the Midwater: The Ecology of Deep Pelagic Animals. Annual Review of Marine Science 16, 383–416 (2024).
8. Dunn, C. W., Leys, S. P. & Haddock, S. H. D. The hidden biology of sponges and ctenophores. Trends in Ecology & Evolution 30, 282–291 (2015).
9. Teoh, Z. E. et al. Rotary-actuated folding polyhedrons for midwater investigation of delicate marine organisms. Science Robotics 3, eaat5276 (2018).
10. Breier, J. A. et al. A large volume particulate and water multi-sampler with in situ preservation for microbial and biogeochemical studies. Deep Sea Research Part I: Oceanographic Research Papers 94, 195–206 (2014).
11. Malmstrom, R. RNAlater Recipe. (2015).
12. Burns, J. Molecular markers for taxonomic validation of 7 deep water invertebrate animals. Zenodo https://doi.org/10.5281/zenodo.11586941 (2024).
13. Minio, A. et al. Iso-Seq Allows Genome-Independent Transcriptome Profiling of Grape Berry Development. G3 (Bethesda) 9, 755–767 (2019).
14. Manni, M., Berkeley, M. R., Seppey, M., Simao, F. A. & Zdobnov, E. M. BUSCO update: novel and streamlined workflows along with broader and deeper phylogenetic coverage for scoring of eukaryotic, prokaryotic, and viral genomes. Molecular Biology and Evolution 38, 4647–4654 (2021).
15. NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP471527 (2024).
16. Katija, K., Sherlock, R. E., Sherman, A. D. & Robison, B. H. New technology reveals the role of giant larvaceans in oceanic carbon cycling. Science Advances 3, e1602374 (2017).
17. Burns, J. & Phillips, B. RAD2-005 Atolla sp. video and image data. Zenodo https://doi.org/10.5281/zenodo.10987660 (2024).
18. Burns, J. & Phillips, B. RAD2-026 Apolemia sp. video and image data. Zenodo https://doi.org/10.5281/zenodo.10987828 (2024).
19. Burns, J. & Phillips, B. RAD2-027 Praya sp. video and image data. Zenodo https://doi.org/10.5281/zenodo.10988012 (2024).
20. Burns, J. & Phillips, B. RAD2-054 Halistemma sp. siphonophore video and image data. Zenodo https://doi.org/10.5281/zenodo.10988194 (2024).
21. Burns, J. & Phillips, B. RAD2-055 Bathochordaeus mcnutti, Giant Larvacean, video and image data. Zenodo https://doi.org/10.5281/zenodo.10988949 (2024).
22. Burns, J. & Phillips, B. RAD2-059 Pyrosomatidae sp. video and image data. Zenodo https://doi.org/10.5281/zenodo.10998462 (2024).
23. Burns, J. & Phillips, B. RAD2-061 Lampocteis sp. video and image data. Zenodo https://doi.org/10.5281/zenodo.10998479 (2024).
24. Ortman, B. D., Bucklin, A., Pagès, F. & Youngbluth, M. DNA Barcoding the Medusozoa using mtCOI. Deep Sea Research Part II: Topical Studies in Oceanography 57, 2148–2156 (2010).
25. Collins, A. G. Phylogeny of Medusozoa and the evolution of cnidarian life cycles. Journal of Evolutionary Biology 15, 418–432 (2002).
26. Bucklin, A. et al. Toward a global reference database of COI barcodes for marine zooplankton. Mar Biol 168, 78 (2021).
27. Sherlock, R. E., Walz, K. R., Schlining, K. L. & Robison, B. H. Morphology, ecology, and molecular biology of a new species of giant larvacean in the eastern North Pacific: Bathochordaeus mcnutti sp. nov. Mar Biol 164, 20 (2017).
28. Podar, M., Haddock, S. H., Sogin, M. L. & Harbison, G. R. A molecular phylogenetic framework for the phylum Ctenophora using 18S rRNA genes. Mol Phylogenet Evol 21, 218–230 (2001).
29. Simão, F. A., Waterhouse, R. M., Ioannidis, P., Kriventseva, E. V. & Zdobnov, E. M. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics 31, 3210–3212 (2015).

Acknowledgements

This work was supported by the following: Schmidt Ocean Institute grant “ROV-based 3D reality capture, specimen encapsulation, and tissue voucher sampling to explore and describe midwater biodiversity in the deep sea” to B.T.P., K.K., R.J.W., and D.F.G. and its corresponding research cruise “Designing the Future 2” (https://schmidtocean.org/cruise/designing-the-future-2/); David and Lucile Packard Foundation (to K.K.); and Gordon and Betty Moore Foundation (no. 7583 to K.K.). J.A.B. was supported in part by NSF OIA-1826734. The National Geographic Society provided support (grant no. SP 12-14) to R.J.W. and D.F.G., and NSF Instrument Development for Biological Research Award no. 1556164 to R.J.W. and no. 1556123 to D.F.G.

Author information

Authors and Affiliations

Bigelow Laboratory for Ocean Sciences, East Boothbay, ME, 04544, USA
John A. Burns & Baptiste Genot
Monterey Bay Aquarium Research Institute, Research and Development, Moss Landing, 95039, USA
Joost Daniels, Paul Roberts, Eric Orenstein & Kakani Katija
School of Engineering and Applied Sciences, Harvard University, Cambridge, MA, 02138, USA
Kaitlyn P. Becker, Daniel M. Vogt & Robert J. Wood
Department of Ocean Engineering, University of Rhode Island, 215 South Ferry Road, Narragansett, RI, 02882, USA
David Casagrande, Alexander H. Yin & Brennan T. Phillips
PA Consulting, Concord, MA, 01742, USA
Zhi Ern Teoh & Ryan Wood
Department of Natural Sciences, Baruch College, City University of New York and PhD Program in Biology, CUNY Graduate Center, New York, NY, 10010, USA
David F. Gruber

Contributions

Conceptualization: D.M.V., B.G., K.K., R.J.W., J.A.B., R.W., Z.E.T., K.P.B., D.F.G. and B.T.P. Methodology: D.M.V., P.R., A.H.Y., B.G., K.K., R.J.W., J.A.B., R.W., Z.E.T., K.P.B., D.F.G., B.T.P. and J.D. Resources: D.M.V., D.C., B.G., K.K., J.A.B., R.W., K.P.B., D.F.G. and B.T.P. Validation: J.A.B., B.G. and D.F.G. Project administration: D.M.V., D.C., K.K., R.J.W., J.A.B., Z.E.T., D.F.G. and B.T.P. Writing – original draft: J.A.B. Writing – review and editing: J.A.B., D.F.G. and B.T.P. Data curation: K.K., J.A.B., E.O., B.T.P. and J.D. Supervision: K.K., R.J.W., J.A.B., D.F.G. and B.T.P. Visualization: K.K., R.J.W., J.A.B., Z.E.T., E.O., D.F.G., B.T.P. and J.D.

Corresponding authors

Correspondence to John A. Burns or David F. Gruber.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Burns, J.A., Daniels, J., Becker, K.P. et al. Transcriptome sequencing of seven deep marine invertebrates. Sci Data 11, 679 (2024). https://doi.org/10.1038/s41597-024-03533-4

Received: 23 April 2024
Accepted: 14 June 2024
Published: 24 June 2024
Version of record: 24 June 2024
DOI: https://doi.org/10.1038/s41597-024-03533-4