Full-length transcriptome analysis of a bloom-forming dinoflagellate Scrippsiella acuminata (Dinophyceae)

Li, Fengting; Yue, Caixia; Deng, Yunyan; Tang, Ying Zhong

doi:10.1038/s41597-025-04699-1

Download PDF

Data Descriptor
Open access
Published: 27 February 2025

Full-length transcriptome analysis of a bloom-forming dinoflagellate Scrippsiella acuminata (Dinophyceae)

Fengting Li^1,2,3,
Caixia Yue¹^nAff4,
Yunyan Deng^1,2,3 &
…
Ying Zhong Tang^1,2,3

Scientific Data volume 12, Article number: 352 (2025) Cite this article

3107 Accesses
4 Citations
6 Altmetric
Metrics details

Subjects

Abstract

The dinoflagellate Scrippsiella acuminata (formerly Scrippsiella trochoidea) is a cosmopolitan species commonly forms dense blooms worldwide. It also has been adopted as representative species for life history studies on dinoflagellates. The peculiar genetic features of its large nuclear genome make the whole-genome sequencing highly challenging. Herein, we employed single-molecule real-time (SMRT) sequencing technology to acquire the full-length transcriptome of S. acuminata at different life stages (vegetative cells at different growth stages and resting cysts). A total of 51.44 Gb clean SMRT sequencing reads were generated, which finally yielded 226,117 non-redundant full-length non-chimeric (FLNC) reads. Among them, 78,393 long non-coding RNA (lncRNA), 550 putative transcription factors (TF) members from 19 TF families, 67,212 simple sequence repeats (SSRs), and 116,417 protein coding sequences (CDSs) were identified. This is the first report on full-length transcriptome of Thoracosphaeraceae in Dinophyceae, which is valuable for future genome annotation in this order. Our dataset provides a reference transcriptomic background for life history studies on dinoflagellates and will contribute to understanding molecular mechanisms underlying S. acuminata blooms.

Full-length transcriptome analysis of a bloom-forming dinoflagellate Prorocentrum shikokuense (Dinophyceae)

Article Open access 25 April 2024

One high quality genome and two transcriptome datasets for new species of Mantamonas, a deep-branching eukaryote clade

Article Open access 09 September 2023

Chromosome-level genome assembly and annotation of the sharpsnout seabream (Diplodus puntazzo)

Article Open access 04 April 2025

Background & Summary

Dinoflagellates are an ancient¹ and diverse group of unicellular eukaryotes, belonging to the infrakingdom Alveolata that also comprises the phyla Ciliophora and Apicomplexa². As most important eukaryotic primary producers in the ocean other than diatoms, they are pivotal contributors to primary productivity and food webs of coastal marine ecosystems and involved in global carbon fixation, element cycles, and oxygen production³. Some members in this lineage have drawn great attention owing to their mutualism relationship with other living organisms, such as reef-building corals and invertebrates⁴. This group also gained increasing attentions due to being the most common causative agents of harmful algal blooms (HABs) worldwide, nearly 200 out of ~2300 dinoflagellate species account for ~75% of HABs events⁵, ~40% of all HABs-causing species⁶. Dinoflagellates also include the highest number of toxic species among marine phytoplankton⁷, which directly poison animals and indirectly threaten human health via consumption of contaminated seafood. The non-toxic dinoflagellate HABs also cause seawater discolorations, oxygen depletion and mucilage duo to high biomass, and thus negatively affect coastal ecosystems and human activities⁷. Resting cyst is well-known as the dormant stage in the life history of dinoflagellates^8,9, which have many structural, functional, and adaptive features analogous to the seeds of higher plants^8,10,11,12. Resting cysts play pivotal roles in the biology and ecology of dinoflagellates, particularly in the processes of HABs, as they are associated with genetic recombination, bloom initiation, bloom termination, bloom recurrence, and geographic expansion^9,13,14,15. The abundance and distribution of resting cysts in marine sediments have been used to empirically predict the intensity and extent of the forthcoming blooms^16,17.

Genome size varies considerably in dinoflagellates (~3–245 Gb)², which represent 1–80-fold of a human haploid genome and is larger than the largest genomes of animal (lungfish, 130 Gb)¹⁸ and plant (Paris japonica, 149 Gb)¹⁹. Over half a century of extensive research on dinoflagellate biology reveals that they defy many cytological and genomic characteristics commonly attributed to eukaryotic cell biology and function²⁰, such as high copy numbers of genes (e.g. rRNA gene) in a single cell, high methylated nucleotides, high portion (>35%) of thymine replaced by hydroxumethyluracil, and permanently condensed chromosomes throughout cell cycle²¹. Genomic approaches to study dinoflagellates are usually restricted by their extremely large nuclear genomes and peculiar genetic features^4,20. Until now, available high-quality dinoflagellate genome data are restricted to symbiotic or parasitic species with relatively small genome-sized (0.12–4.8 G bp)^4,20,22.

The thecate dinoflagellate Scrippsiella acuminata (formerly S. trochoidea²³) is a cosmopolitan species²⁴ that commonly forms dense blooms worldwide and has been shown to cause rapid lethal effects on shellfish (Crassostrea virginica and Mercenaria mercenaria) larvae²⁵. Due to easily produce resting cysts both in laboratory and field, S. acuminata has been adopted as representative species for studies on dinoflagellates life history^{11,26,27,28,29,30}. The RNA-Seq (high-throughput RNA-sequencing) has been employed to unravel the molecular mechanisms underpinning the life stage transitions of S. acuminata with aim to gain insights into how external environmental factors induce resting cysts formation and/or blooms initiation^11,26,30. Meanwhile, transcriptomic approach has been extensively used to explore the mechanisms of S. acuminata blooms^31,32. However, the fundamental limitation of these work is short read lengths (Table 1), which makes them difficult to accurately capture complete transcripts due to the complex genome structure. The full-length transcriptome sequencing offers a practical avenue and better opportunity to decipher the physiological mechanisms of these non-model but ecologically important species without the reference genome^33,34.

Table 1 Information on the current available S. acuminata transcriptomes.

Full size table

In this study, we obtained the full-length transcriptome of S. acuminata at different life stages (vegetative cells at different growth stages and resting cysts) by SMRT sequencing on the Pacific Biosciences Sequel Platform (Fig. 1). Totally, 51.44 Gb of clean SMRT sequencing reads were generated, resulting in 596,485 circular consensus (CCS) reads with an average length of 3,726 bp. Among these, 579,919 (97.2%) were identified as FLNC reads. After clustering, error correction, and redundancy removal, 226,117 (37.9%) non-redundant FLNC reads were obtained, with 130,134 (57.55%) successfully annotated against five public databases. In addition, a total of 116,417 open reading frames (ORFs) with protein coding sequences were identified. Furthermore, a total of 78,393 lncRNAs, 550 putative TF members from 19 TF families, and 67,212 SSRs were identified, respectively. Our work provided the first full-length transcripts set of Thoracosphaeraceae in Dinophyceae, which will contribute to future genome annotation in this order. The dataset is significant for advancing life history studies on dinoflagellates and valuable for elucidating mechanisms of S. acuminata HABs in the coastal environments.

Methods

Algal maintenance

Scrippsiella acuminata (strain IOCAS-St-1) used in this study was originally isolated from the Yellow Sea, China, and maintained in the Marine Biological Culture Collection Centre, Institute of Oceanology, Chinese Academy of Sciences. The culture was maintained in f/2-Si medium³⁵, prepared using pre-filtered (0.22 μm membrane filter; Millipore, Billerica, MA, USA) and autoclaved (121 °C for 30 minutes) natural seawater, with a salinity range of 32–33. To prevent bacterial contamination, the medium was supplemented with a penicillin-streptomycin solution (containing 10,000 I.U. of penicillin and 10,000 µg·mL⁻¹ of streptomycin, Solarbio, Beijing, China) to achieve final concentrations of 300 I.U./mL of penicillin and 300 µg·mL⁻¹ of streptomycin after inoculation. The antibiotic solution was added every 7 days throughout the experiment. Cultures were kept in an incubator at 20 ± 1 °C with a 12-hour light/dark cycle, receiving a light intensity of 100 µmol photons m⁻²·s⁻¹.

Preparation and collection of S. acuminata at different life stages

Three samples were used for library construction: vegetative cells at exponential growth stage, vegetative cells at stationary growth stage and resting cysts. For vegetative cells preparation, three replicate 500 mL flasks, each containing 300 mL of medium, were inoculated with S. acuminata cultures at an initial cell density of approximately 1 × 10³ cells per millilitre under consistent culture maintenance conditions. The experiments lasted 16 days, 0.5 mL of culture sample (3 replicates) was taken every day and fixed with Lugol’s solution (3% final concentration) for enumeration under an inverted microscope using a 1 mL Sedgewick Rafter counting chambers. The growth curve was determined via daily cell counting. The growth stages of vegetative cells were determined according to growth curve (Fig. 2). Vegetative cells at the exponential growth stage (Day 8; the day of inoculation was recorded as Day 0) and stationary growth stage (Day 14) were harvested for further experiments.

Resting cysts were produced following the method described by Yue et al.³⁶, and briefed as follows. Vegetative cells at the exponential growth stage were cultured in cell culture flasks (Nest, US; 72 cm² cell culture flasks) for resting cyst production. These were incubated under the previously mentioned maintenance conditions, with the addition of f/2-Si medium diluted to one-thousandth of the nitrogen (N) and phosphorus (P) nutrients³⁶. Resting cysts formation was monitored every day using an inverted microscope (Olympus IX73). There were clear distinctions in morphological features between vegetative cells and resting cysts of S. acuminata (Fig. 3), which allowed a swift judgement via light microscopy between them. Resting cysts were generally collected from cultures that had been inoculated for over 14 days. The obtained resting cysts were washed with sterile filtered seawater until no motile cells (vegetative cells or planozygotes) observed in the samples under microscope. All collected samples (vegetative cells at different growth stages and resting cysts) were concentrated via concentration, immediately frozen in liquid nitrogen, and then stored at −80 °C until RNA extraction.

RNA isolation and library construction

Total RNA was extracted by using RNeasy^® Plant Mini Kit (QIAGEN, Germany) and subsequently digested using the RNase-Free DNase Set (QIAGEN, Hilden, Germany) according to the manufacturer’s instructions. The quality and concentration of the RNA were assessed using 1% agarose gel electrophoresis and quantified with a NanoDrop™ 2000 spectrophotometer (Thermo Fisher Scientific, Waltham, MA, USA). The detailed information on cell quantity, RNA quality and quantity for each sample has been provided in Table 2.

Table 2 The detailed information on the number of cells, RNA quality and quantity of each sample.

Full size table

For library construction, a total of 12 μg of high-quality RNA from different samples were mixed in equal amounts. The sequencing libraries were then created according to PacBio’s iso-seq sequencing protocol. The initial step involved generating complete cDNA strands by using the NEBNext^® Single Cell/Low Input cDNA Synthesis & Amplification Module. This was succeeded by the PCR-based amplification and subsequent purification of the cDNA. The next phase entailed repairing any damage to the cDNA and refining the ends. The concluding stage saw the attachment of SMRT hairpin adapters to the terminal regions of the cDNA duplexes. To verify the quality of the library, evaluations of its purity, concentration, and the size of the inserts were conducted using the Agilent Bioanalyzer 2100 system from Agilent Technologies, based in California, USA. Qualified libraries were then subjected to full-length transcriptome sequencing on the PacBio sequencing platform at BGI Genomics Co. Ltd.

Single-molecule real-time sequencing

The raw sequencing data were processed using SMRT Link v10.0 software. The CCSs were obtained from subreads using the following parameters: min_length = 300, max_length = 15,000, and min passes = 1. Polished CCS subreads were generated using the ccs tool (v6.2.0), applied to the subread BAM files with a minimum quality threshold of 0.9 (-min-rq 0.9). Lima (v2.1.0) and IsoSeq. 3 refine were employed to remove the primers and poly (A) tails, respectively. The FLNCs were recognized by detecting the presence of cDNA ends (3′ and 5′) and poly (A) tail. The clustering algorithm Iterative Clustering for Error Correction (ICE) was used to identify consensus FLNCs (high-quality transcripts, accuracy > 99%). The integrity of the transcriptome data was evaluated by BUSCO (Benchmarking Universal Single-Copy Orthologs) software (v3.0.2), which is based on OrthoDB database.

The raw data comprising polymerase read sequences was 52.56 Gb, while the output data containing subreads sequences was 51.44 Gb, which includes 15,509,209 reads (Table 3). Following transcript merging, the N50 length was 3,696 bp. A total of 596,485 CCS reads were generated, with a mean insert read length of 3,726 bp. Subsequently, all reads were subsequently classified into 226,177 non-redundant FLNCs. The distribution of sequence length was a vital factor in assessing the quality of library construction, as depicted in Fig. 4a. The BUSCO analysis revealed that 160 out of 303 conserved eukaryotic orthologous genes (52.8%) were identified in the S. acuminata transcriptome, with 113 genes (37.3%) being perfectly matched (Fig. 4b).

Table 3 Statistics of SMRT sequencing data.

Full size table

Functional annotation of full-length transcriptome

All the FLNCs were annotated by conducting translated nucleotide BLAST (BLASTX) searches against public databases, including the National Center for Biotechnology Information (NCBI) non-redundant protein database (NR) database, NCBI non-redundant nucleotide database (NT) database, SwissProt database, Pfam database, Kyoto Encyclopedia of Genes and Genomes (KEGG) database, and clusters of euKaryotic Orthologous Groups (KOG) database, with a 10⁻⁵ E-value cutoff. The GO annotations were performed based on the best BLASTX hit from the NR database using the Blast2GO software (v 2.5.0).

A total of 226,117 non-redundant FLNCs were annotated using the NR, NT, SwissProt, KEGG, KOG, Pfam and GO databases and 130,134 FLNCs, accounting for 57.55% of the total, showed successful BLAST hits against known sequences in at least one of the above databases (Table 4). Most sequence similarities were against the NR database (108,570, 48.01%), followed by the NT database (26,460, 11.70%), the SwissProt database (43,963, 19.44%) and the KEGG database (53,997, 23.88%). The annotation results are summarized in Table 4 and represented in Fig. 5.

Table 4 Summary of annotation result.

Full size table

In the NR annotation, 34,949 (32.19%) of the homologous FLNCs were aligned to Polarella glacialis, followed by Symbiodinium natans (8,598, 7.92%), Symbiodinium microadriaticum (7,665, 7.06%), and Symbiodinium sp. CCMP2592 (7,274, 6.70%) (Fig. 6). In the KEGG database classification, non-redundant FLNCs were annotated across various functional groups, including cellular processes (2,190, 3.59%), environmental information processing (2,320, 3.80%), genetic information processing (19,420, 31.82%), metabolism (35,487, 58.14%) and organismal systems (1,619, 2.65%) (Fig. 7).

The putative functions of non-redundant FLNCs and their products were identified via searching the annotated genes in GO classifications. A total of 48,908 FLNCs were assigned to at least one of the 3 main GO categories: biological process, cellular component, and molecular function (Fig. 8a). These FLNCs were further classified into functional subcategories: 44,468 participated in 23 biological processes, including “cellular process” (41.40%) and “metabolic process” (36.12%) comprised the largest proportion; 34,048 FLNCs related to 2 cellular components, including “cellular anatomical entity” (88.41%) and “protein-containing complex” (11.59%); 60,701 FLNCs involved in 20 molecular functions, with “catalytic activity” (41.66%) being the most abundant, followed by “binding” (39.39%), “transporter activity” (6.65%), and “transcription regulator activity” (3.67%) (Fig. 8a).

The KOG annotation yielded 51,800 putative proteins in 25 categories (Fig. 8b). Among these categories, the cluster for “General function prediction” was the largest group (15,161, 29.27%), followed by “Signal transduction mechanisms” (14,605, 28.19%), and “RNA processing and modification” (13,565, 26.19%). Clusters for “Cellmotility” (208, 0.40%), “Nucleotide transport and metabolism” (671, 1.30%), and “Coenzyme transport and metabolism” (731, 1.41%) were the smallest 3 groups (Fig. 8b).

Structure analysis of simple sequence repeats (SSRs)

The SSRs within the full-length transcriptome was identified by using the microsatellite identification tool (MISA v1.0). The FLNCs exceeding 500 bp were screened and used for analysis.

A total of 226,117 non-redundant FLNCs with a total length of 149,572,208 bp were used for SSR prediction. The number of SSRs included mononucleotide repeats (42,038, 62.55%), dinucleotide repeats (5,174, 7.70%), tri-nucleotide repeats (17,724, 26.37%), tetra-nucleotide repeats (435, 0.65%), hexa-nucleotide repeats (1,000, 1.49%) and penta-nucleotide repeats (841, 1.25%) (Fig. 9).

Prediction of protein coding sequences (CDS), transcription factors (TFs), and long non-coding RNA (lncRNAs)

The TransDecode software (v3.0.1) was employed to predict CDS and their corresponding amino acid sequences in FLNCs. The prediction criteria included ORF length, log-likelihood score, and alignment against the Pfam protein domain database. The iTAK software was used for TFs prediction with PlantTFBD as a reference database. Four programs, including Coding Potential Calculator (CPC)³⁷, Coding-Non-Coding-Index (CNCI)³⁸, predictor of long non-coding RNAs and messenger RNAs based on an improved k-mer scheme (PLEK)³⁹, and Pfam⁴⁰ database were used to predict lncRNAs within the full-length transcriptome.

In total, 116,417 coding sequences were predicted using TransDecoder, with the total length of 75,895,596 bp. A total number of 550 putative TF members were obtained and categorized into 19 families. The top 11 TF families were as follows: Cysteine and Serine-rich Domain (CSD) with 213 members; C3H-type Zinc Finger (C3H) with 178 members; Forkhead-associated domain (FHA) with 69 members; Apetala2/Ethylene-responsive element-binding protein (AP2-EREBP) with 19 members; Myeloblastosis (MYB) with 14 members, MYB-related with 14 members; C2H2 Zinc Finger (C2H2) with 10 members; Tubulin (TUB) with 9 members; Lin-11, Isl-1, and Mec-3 (LIM) with 4 members; Bromodomain and SAMP domain-containing (BSD) with 4 members; and Bending-Binding Receptor (BBR)/BPC with 4 members (Fig. 10).

LncRNAs were predicted using four methods including CPC, CNCI, PLEK and Pfam structural domain analysis. The common non-coding hits/intersection of the four results were then filtered and considered as lncRNA. We obtained 166,855, 85235, 139,800 and 188,205 putative lncRNAs determined using CPC, CNCI, PLEK and Pfam database, respectively, and among these 78,393 (38.86%) were identified in all analyses (Fig. 11).

Data Records

The dataset is available in the NCBI Sequence Read Archive under BioProject accession number PRJNA1127753⁴¹.

Technical Validation

Obtaining high-quality RNA is a critical prerequisite for the success of full-length transcriptome sequencing. To validate the precision of the sequencing data, we conducted a thorough examination of the RNA samples. This process encompassed the evaluation of sample purity and concentration using the Nanodrop, and the integrity assessment was conducted through the Agilent Bioanalyzer 2100 system.

Code availability

All commands and pipelines used in data processing were executed according to the manual and protocols of the corresponding bioinformatics software. The version and code/parameters of software have been described in Methods. No custom code was used for the curation and/or validation of the dataset in this study.

References

Fensome, R. A. et al. The early Mesozoic radiation of dinoflagellates. Paleobiology 22, 329–338 (1996).
Article MATH Google Scholar
Lin, S. Genomic understanding of dinoflagellates. Res. Microbiol. 162, 551–569 (2011).
Article CAS PubMed MATH Google Scholar
Taylor, F. J. R. et al. Dinoflagellate diversity and distribution. Biodivers. Conserv. 17, 407–418 (2008).
Article MATH Google Scholar
Lin, S. et al. The Symbiodinium kawagutii genome illuminates dinoflagellate gene expression and coral symbiosis. Science 350, 691–694 (2015).
Article ADS CAS PubMed MATH Google Scholar
Smayda, T. J. Reflections on the ballast water dispersal—harmful algal bloom paradigm. Harmful Algae 6, 601–622 (2007).
Article Google Scholar
Jeong, H. J. et al. Feeding diverse prey as an excellent strategy of mixotrophic dinoflagellates for global dominance. Sci. Adv. 7, eabe4214 (2021).
Article ADS CAS PubMed PubMed Central Google Scholar
Hallegraeff, G. M. et al. Perceived global increase in algal blooms is attributable to intensified monitoring and emerging bloom impacts. Commun. Earth Environ. 2, 1–10 (2021).
Article Google Scholar
Tang, Y. Z. et al. Exploration of resting cysts (stages) and their relevance for possibly HABs-causing species in China. Harmful Algae 107, 102050 (2021).
Article CAS PubMed MATH Google Scholar
Bravo, I. & Figueroa, R. I. Towards an ecological understanding of dinoflagellate cyst functions. Microorganisms 2, 11–32 (2014).
Article PubMed PubMed Central MATH Google Scholar
Bewley, J. D. et al. Seeds: Physiology of Development, Germination and Dormancy 3rd edn (Springer: New York, Heidelberg, Dordrecht, London, 2013).
Deng, Y. et al. Transcriptomic analyses of Scrippsiella trochoidea reveals processes regulating encystment and dormancy in the life cycle of a dinoflagellate, with a particular attention to the role of abscisic acid. Front. Microbiol. 8, 2450 (2017).
Article PubMed PubMed Central MATH Google Scholar
Ellegaard, M. & Ribeiro, S. The long‐term persistence of phytoplankton resting stages in aquatic ‘seed banks’. Biol. Rev. 93, 166–183 (2018).
Article PubMed MATH Google Scholar
Dale, B. Dinoflagellate resting cysts: “Benthic plankton”, in Survival Strategies of the Algae (ed. Fryxell, G. A.) 69–136 (Cambridge Univ. Press,1983).
Tang, Y. Z. et al. Characteristic life history (resting cyst) provides a mechanism for recurrence and geographic expansion of harmful algal blooms of dinoflagellates: a review. Stud. Mar. Sin. 51, 132–154 (2016).
MATH Google Scholar
Anderson, D. M. et al. Evidence for massive and recurrent toxic blooms of Alexandrium catenella in the Alaskan Arctic. Proc. Natl. Acad. Sci. 118(41), e2107387118 (2021).
Article CAS PubMed PubMed Central Google Scholar
Hu, Z. et al. The notorious harmful algal blooms-forming dinoflagellate Prorocentrum donghaiense produces sexual resting cysts, which widely distribute along the coastal marine sediment of China. Front. Mar. Sci. 9, 826736 (2022).
Article Google Scholar
Wang, Z. et al. Distribution of dinoflagellate cysts in surface sediments from the Qingdao coast, the Yellow Sea, China: the potential risk of harmful algal blooms. Front. Mar. Sci. 9, 910327 (2022).
Article Google Scholar
Metcalfe, C. J. et al. Evolution of the australian lungfish (Neoceratodus forsteri) genome: a major role for CR1 and L2 LINE elements. Mol. Biol. Evol. 29, 3529–3539 (2012).
Article CAS PubMed MATH Google Scholar
Ruvindy, R. et al. Genomic copy number variability at the genus, species and population levels impacts in situ ecological analyses of dinoflagellates and harmful algal blooms. ISME Commun. 3, 70 (2023).
Article PubMed PubMed Central Google Scholar
Aranda, M. et al. Genomes of coral dinoflagellate symbionts highlight evolutionary adaptations conducive to a symbiotic lifestyle. Sci. Rep. 6, 39734 (2016).
Article ADS CAS PubMed PubMed Central Google Scholar
Liu, Y. et al. Dependence of genome size and copy number of rRNA gene on cell volume in dinoflagellates. Harmful Algae 109, 102108 (2021).
Article PubMed MATH Google Scholar
Stephens, T. G. et al. Genomes of the dinoflagellate Polarella glacialis encode tandemly repeated single-exon genes with adaptive functions. BMC Biol. 18, 56 (2020).
Article CAS PubMed PubMed Central Google Scholar
Kretschmann, J. et al. Taxonomic clarification of the dinophyte Peridinium acuminatum Ehrenb., ≡ Scrippsiella acuminata, comb. nov. (Thoracosphaeraceae, Peridiniales). Phytotaxa 220, 239–256 (2015).
Article Google Scholar
Steidinger, K. A. & Tangen, K. Dinoflagellates. In Identifying Marine Diatoms and Dinoflagellates (ed. Tomas, C. R.) 387–589 (Academic Press, San Diego, 1997).
Tang, Y. Z. & Gobler, C. J. Lethal effects of Northwest Atlantic Ocean isolates of the dinoflagellate, Scrippsiella trochoidea, on Eastern oyster (Crassostrea virginica) and Northern quahog (Mercenaria mercenaria) larvae. Mar. Biol. 159, 199–210 (2012).
Article Google Scholar
Guo, X. et al. Transcriptome and metabolome analyses of cold and darkness-induced pellicle cysts of Scrippsiella trochoidea. BMC Genomics 22, 526 (2021).
Article CAS PubMed PubMed Central MATH Google Scholar
Deng, Y. et al. Molecular cloning of heat shock protein 60 (Hsp60) and 10 (Hsp10) genes from the cosmopolitan and harmful dinoflagellate Scrippsiella trochoidea and their differential transcriptions responding to temperature stress and alteration of life cycle. Mar. Biol. 166, 7 (2018).
Article Google Scholar
Li, F. et al. Probing the energetic metabolism of resting cysts under different conditions from molecular and physiological perspectives in the harmful algal blooms-forming dinoflagellate Scrippsiella trochoidea. Int. J. Mol. Sci. 22, 7325 (2021).
Article CAS PubMed PubMed Central MATH Google Scholar
Li, F. et al. Characterizing the status of energetic metabolism of dinoflagellate resting cysts under mock conditions of marine sediments via physiological and transcriptional measurements. Int. J. Mol. Sci. 23, 15033 (2022).
Article CAS PubMed PubMed Central MATH Google Scholar
Wu, X. et al. Energy metabolism and genetic information processing mark major transitions in the life history of Scrippsiella acuminata (Dinophyceae). Harmful Algae 116, 102248 (2022).
Article CAS PubMed MATH Google Scholar
Cooper, J. T. et al. Transcriptome Analysis of Scrippsiella trochoidea CCMP 3099 Reveals Physiological Changes Related to Nitrate Depletion. Front. Microbiol. 7, 639 (2016).
Article PubMed PubMed Central MATH Google Scholar
Sirius, P. K. & Lo, S. C. Comparative proteomic studies of a Scrippsiella acuminata bloom with its laboratory-grown culture using a 15N-metabolic labeling approach. Harmful algae 67, 26–35 (2017).
Article Google Scholar
Pan, X. et al. Full-length transcriptome analysis of a bloom-forming dinoflagellate Prorocentrum shikokuense (Dinophyceae). Sci. Data 11, 430 (2024).
Article CAS PubMed PubMed Central MATH Google Scholar
Ji, N. et al. Full-Length Transcriptome analysis of the ichthyotoxic harmful alga Heterosigma akashiwo (Raphidophyceae) using single-molecule real-time sequencing. Microorganisms 11, 389 (2023).
Article CAS PubMed PubMed Central Google Scholar
Guillard, R. R. L. Culture of phytoplankton for feeding marine invertebrates. in Culture of marine invertebrate animals: Proceedings—1st Conference on Culture of Marine Invertebrate Animals Greenport (eds. Smith, W. L. & Chanley, M. H.) 29–60 (Springer US, Boston, MA, 1975).
Yue, C. et al. Deficiency of nitrogen but not phosphorus triggers the life cycle transition of the dinoflagellate Scrippsiella acuminata from vegetative growth to resting cyst formation. Harmful Algae 118, 102312 (2022).
Article CAS PubMed MATH Google Scholar
Kong, L. et al. CPC: assess the protein-coding potential of transcripts using sequence features and support vector machine. Nucleic Acids Res. 35, W345–W349 (2007).
Article PubMed PubMed Central MATH Google Scholar
Sun, L. et al. Utilizing sequence intrinsic composition to classify protein-coding and long non-coding transcripts. Nucleic Acids Res. 41, e166 (2013).
Article CAS PubMed PubMed Central Google Scholar
Li, A. et al. PLEK: a tool for predicting long non-coding RNAs and messenger RNAs based on an improved k-mer scheme. BMC bioinformatics 15, 1–10 (2014).
Article Google Scholar
Finn, R. D. et al. The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res. 44, D279–D285 (2016).
Article CAS PubMed MATH Google Scholar
NCBI Sequence Read Archive https://identifiers.org/ncbi/insdc.sra:SRP516046 (2024).

Download references

Acknowledgements

This research was financially supported by the National Science Foundation of China (Grant No. 42176207, 42406214), and the Qingdao Postdoctoral Applied Research Project (No. QDBSH20220202137).

Author information

Caixia Yue
Present address: Animal, Plant and Food Inspection Center of Nanjing Customs, Nanjing, 210000, China

Authors and Affiliations

CAS Key Laboratory of Marine Ecology and Environmental Sciences, Institute of Oceanology, Chinese Academy of Sciences Qingdao, Qingdao, 266071, China
Fengting Li, Caixia Yue, Yunyan Deng & Ying Zhong Tang
Laboratory for Marine Ecology and Environmental Science, Qingdao Marine Science and Technology Center, Qingdao, 266237, China
Fengting Li, Yunyan Deng & Ying Zhong Tang
Center for Ocean Mega-Science, Chinese Academy of Sciences, Qingdao, 266071, China
Fengting Li, Yunyan Deng & Ying Zhong Tang

Authors

Fengting Li
View author publications
Search author on:PubMed Google Scholar
Caixia Yue
View author publications
Search author on:PubMed Google Scholar
Yunyan Deng
View author publications
Search author on:PubMed Google Scholar
Ying Zhong Tang
View author publications
Search author on:PubMed Google Scholar

Contributions

Fengting Li carried out the study and drafted the manuscript; Caixia Yue participated the data processing; Yunyan Deng and Ying Zhong Tang directed the project, discussed and revised the manuscript.

Corresponding authors

Correspondence to Yunyan Deng or Ying Zhong Tang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

Reprints and permissions

About this article

Cite this article

Li, F., Yue, C., Deng, Y. et al. Full-length transcriptome analysis of a bloom-forming dinoflagellate Scrippsiella acuminata (Dinophyceae). Sci Data 12, 352 (2025). https://doi.org/10.1038/s41597-025-04699-1

Download citation

Received: 28 August 2024
Accepted: 21 February 2025
Published: 27 February 2025
Version of record: 27 February 2025
DOI: https://doi.org/10.1038/s41597-025-04699-1