Background & Summary

The rare minnow (Gobiocypris rarus) is a small cyprinid fish endemic to China, primarily distributed in Hanyuan County, Sichuan Province1. It was formally described as a new species in 1983. With its short life cycle, transparent chorion, and high sensitivity to chemicals, the rare minnow is well suited as an experimental model in basic biology, toxicology, and genetics1,2. It is now widely used as a model organism, for example in studies of grass carp hemorrhagic disease and of the toxicological effects of chemicals such as bisphenols3,4. Because its embryos are relatively large and have a transparent chorion, the rare minnow has also been used in research on genome editing and on the developmental toxicity of chemicals5,6. Furthermore, China has established a national standard, “Chemicals: Rare Minnow (Gobiocypris rarus) Embryo Acute Toxicity Test” (GB/T 44396-2024), to regulate the use of rare minnow embryos in acute toxicity testing.

The embryonic development of the rare minnow is similar to that of other teleosts and can be divided into the cleavage, blastula, gastrula, organogenesis, and hatching stages7,8. The transition from the blastula stage to the organogenesis stage is particularly critical. After the rapid and synchronous cleavage stage, the embryo enters the mid-blastula transition (MBT), during which developmental control shifts from maternal factors to the zygotic genome9. This process, known as the maternal-to-zygotic transition (MZT), involves the gradual clearance of maternal factors and the activation of the zygotic genome10. Analysis of the natural mortality rate of fertilized zebrafish embryos reveals that the highest mortality occurs at approximately 12 hours post-fertilization (hpf)11. This finding highlights zygotic genome activation (ZGA) as a pivotal event in the early development of zebrafish embryos. If mutations in maternal or paternal genes disrupt the transition from maternal control to zygotic genome regulation, embryonic development may arrest or result in mortality12. Transcriptomic sequencing of embryos at this stage can provide insights into the regulatory roles of maternal and zygotic genes in embryonic development13.

In this study, we utilized next-generation sequencing (NGS) to obtain gene expression profiles of rare minnow embryos at three developmental stages: the blastula stage, gastrula stage, and optic rudiment stage. Additionally, we assessed the quality of the data to ensure its reliability. These data can be used to investigate the mechanisms of zygotic genome activation and serve as a control group for comparative analysis when studying the effects of chemicals on embryonic development.

Methods

Animal sampling

Adult rare minnows from an artificial outbred strain were provided by the National Aquatic Biological Resource Center (NABRC). After artificial fertilization, the fertilized eggs were cultured in glass dishes at 26 °C. Embryos were sampled at 3, 7, and 13 hpf, corresponding to the blastula, gastrula, and optic rudiment stages, respectively. Three biological replicates were collected for each developmental stage, with 40 embryos per replicate. The samples were stored at −80 °C.

Transcriptome

Total RNA was extracted using Trizol Reagent (Ambion, USA), and its integrity was assessed using the Agilent 2100 Bioanalyzer (Agilent Technologies, USA). mRNA purification and library preparation followed the protocol of the NEBNext Ultra II RNA Library Prep Kit for Illumina (NEB #E7775, New England Biolabs, USA). Total RNA (0.5 μg) was subjected to poly(A) mRNA enrichment using the NEBNext Poly(A) mRNA Magnetic Isolation Module (NEB #E7490, New England Biolabs, USA). The enriched mRNA was eluted from the magnetic beads, mixed with first-strand synthesis buffer and random primer mix, and thermally fragmented at 94 °C for 10 minutes. The reaction was then rapidly cooled to 4 °C and transferred to a PCR tube for first-strand cDNA synthesis. Second-strand synthesis, end repair and A-tailing, adaptor ligation, USER enzyme digestion, and PCR amplification were then performed. Library size selection was conducted using dual-step SPRI bead purification (NEBNext Sample Purification Beads, NEB #E7775) to obtain an insert size of approximately 300 bp, resulting in a final library size of 450 bp. The fragment distribution of the constructed libraries was checked using the Agilent 2100 Bioanalyzer (Agilent Technologies, USA), and qPCR-based quantification with the KAPA Library Quantification Kit was used to confirm a final library concentration of ≥1.5 nM. Libraries with unique index sequences were pooled prior to sequencing to ensure balanced data representation across samples. The pooled libraries were diluted to 2 nM and sequenced on the Illumina NovaSeq 6000 platform using the S4 flow cell in paired-end mode (PE150), with a target data output of 6 G.
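The dilution of the pooled libraries to 2 nM follows the usual C1 × V1 = C2 × V2 relation. The sketch below is illustrative only: the function name, the 10 nM stock concentration, and the 20 μL final volume are hypothetical values and are not taken from the protocol described above.

```python
from typing import Tuple


def dilution_volumes(stock_nm: float, target_nm: float, final_ul: float) -> Tuple[float, float]:
    """Return (library_ul, diluent_ul) so that C1 * V1 = C2 * V2.

    stock_nm  : measured concentration of the pooled library (nM)
    target_nm : desired loading concentration (nM), e.g. 2 nM
    final_ul  : desired final volume (uL)
    """
    if stock_nm <= 0 or target_nm <= 0 or final_ul <= 0:
        raise ValueError("concentrations and volume must be positive")
    if stock_nm < target_nm:
        raise ValueError("stock is more dilute than the target concentration")
    library_ul = target_nm * final_ul / stock_nm
    return library_ul, final_ul - library_ul


# Hypothetical example: a 10 nM pool diluted to 2 nM in 20 uL.
library_ul, diluent_ul = dilution_volumes(stock_nm=10.0, target_nm=2.0, final_ul=20.0)
print(f"library: {library_ul:.1f} uL, diluent: {diluent_ul:.1f} uL")
```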

Data Records

The raw sequencing data have been deposited in the NCBI Sequence Read Archive (SRA) under accession number PRJNA1200252 (ref. 14).

Technical Validation

Raw sequencing reads underwent quality control using FastQC to assess GC content (Fig. 1), sequence duplication level (Fig. 2), adapter content (Fig. 3), and insert size distribution (Table 1). To improve data quality, reads containing 3′ adapter sequences were removed with fastp, and reads with an average quality score below Q20 were discarded. The high-quality reads remaining after these filtering steps were defined as clean data.
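The average-quality criterion can be illustrated with a minimal sketch. It assumes Phred+33 encoded quality strings and takes the arithmetic mean of the per-base scores; it is not the fastp implementation, and the read names and sequences are invented for illustration.

```python
from typing import Iterable, Iterator, Tuple

Read = Tuple[str, str, str]  # (name, sequence, quality string)


def mean_phred(quality: str, offset: int = 33) -> float:
    """Arithmetic mean of per-base Phred scores (Phred+33 encoding assumed)."""
    if not quality:
        return 0.0
    return sum(ord(ch) - offset for ch in quality) / len(quality)


def filter_by_mean_quality(reads: Iterable[Read], min_mean_q: float = 20.0) -> Iterator[Read]:
    """Yield only the reads whose mean quality is at least min_mean_q."""
    for name, seq, qual in reads:
        if mean_phred(qual) >= min_mean_q:
            yield name, seq, qual


# Hypothetical reads: 'I' encodes Q40, '5' encodes Q20, '#' encodes Q2.
reads = [
    ("read1", "ACGT", "IIII"),  # mean Q40, kept
    ("read2", "ACGT", "####"),  # mean Q2, discarded
    ("read3", "ACGT", "55I#"),  # mean Q20.5, kept
]
kept = [name for name, _, _ in filter_by_mean_quality(reads)]
print(kept)
```

Production tools may apply additional per-base or sliding-window criteria, so the read counts from such a sketch would not be expected to match those reported in Table 1.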

Fig. 1 Per sequence GC content.

Fig. 2 Per sequence duplication level.

Fig. 3 Per sequence adapter content.

Table 1 Sequencing statistics for rare minnow embryos at the blastula (B), gastrula (G), and optic rudiment (O) stages.

The number of Pass Filter reads per sample ranged from 39.47 million to 48.38 million, yielding a total of 58.90 GB of Pass Filter data. The proportion of Q30 bases exceeded 94.16% in all samples, indicating high sequencing accuracy. Filtered reads were mapped to the Gobiocypris rarus reference genome15,16 using HISAT2, with a mapping rate of 95.63%–96.92% and a uniquely mapped rate of 91.29%–94.46%. Reads aligned predominantly to gene regions (Mapped to Gene) and exon regions (Mapped to Exon): the proportion of reads mapped to genes ranged from 66.35% to 71.74%, and the proportion mapped to exons reached a maximum of 94.35%. In addition, the proportion of genes covered by more than one read ranged from 67.66% to 77.14%, indicating good data uniformity and sequencing depth (Table 1).

Principal component analysis (PCA) of the RNA-Seq data from the nine samples showed that replicates clustered together within each developmental stage, while samples from different stages were clearly separated (Fig. 4). This indicates substantial differences in gene transcription levels across the embryonic developmental stages.
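For readers who wish to reproduce this kind of check, the following is a minimal PCA sketch using a singular value decomposition of a log-transformed, gene-centered expression matrix. The log2(x + 1) transform, the function name, and the synthetic matrix are assumptions for illustration; the normalization and software used to produce Fig. 4 are not specified here, and the synthetic values are not the deposited data.

```python
import numpy as np


def pca_scores(expr, n_components=2):
    """Return (scores, explained_variance_ratio) for a samples x genes matrix."""
    x = np.log2(np.asarray(expr, dtype=float) + 1.0)  # dampen highly expressed genes
    x = x - x.mean(axis=0)                            # center each gene across samples
    u, s, _ = np.linalg.svd(x, full_matrices=False)   # PCA via SVD
    scores = u[:, :n_components] * s[:n_components]   # sample coordinates on the PCs
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]


# Synthetic stand-in data: 3 stages x 3 replicates, 200 hypothetical genes.
rng = np.random.default_rng(0)
stage_profiles = rng.uniform(10.0, 1000.0, size=(3, 200))
labels = [stage for stage in "BGO" for _ in range(3)]
expr = np.vstack([
    stage_profiles[i] * rng.uniform(0.95, 1.05, size=200)  # replicate-level noise
    for i in range(3) for _ in range(3)
])

scores, explained = pca_scores(expr)
for label, (pc1, pc2) in zip(labels, scores):
    print(f"{label}\tPC1={pc1:8.2f}\tPC2={pc2:8.2f}")
```

In this synthetic example the replicates of each stage fall close together on the first two components, which is the pattern described for the real samples.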

Fig. 4 PCA score plot of transcriptomic profiles for the blastula (B), gastrula (G), and optic rudiment (O) stages.

Differentially expressed genes (DEGs) were identified by comparing expression levels between sample groups using the criteria Q-value ≤ 0.01 and |log2(fold change)| ≥ 1. A total of 5,857 and 4,908 DEGs were identified between the blastula and gastrula stages (Fig. 5), 4,803 and 3,290 between the gastrula and optic rudiment stages (Fig. 6), and 6,826 and 4,623 between the blastula and optic rudiment stages (Fig. 7).
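The two thresholds can be applied to a table of per-gene statistics as in the sketch below. The gene identifiers, fold changes, and Q-values are invented for illustration, and the class and function names are hypothetical; only the cut-offs (Q-value ≤ 0.01 and |log2(fold change)| ≥ 1) come from the criteria stated above.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneResult:
    gene: str
    log2_fold_change: float
    q_value: float


def classify(result: GeneResult, q_max: float = 0.01, min_abs_lfc: float = 1.0) -> str:
    """Label a gene as 'up', 'down', or 'not_significant' under both thresholds."""
    significant = (
        result.q_value <= q_max and abs(result.log2_fold_change) >= min_abs_lfc
    )
    if not significant:
        return "not_significant"
    return "up" if result.log2_fold_change > 0 else "down"


# Hypothetical per-gene statistics (not values from this dataset).
results = [
    GeneResult("gene_a", 2.3, 0.0004),  # passes both thresholds, up
    GeneResult("gene_b", -1.0, 0.01),   # exactly on both boundaries, down
    GeneResult("gene_c", 0.8, 0.0001),  # fold change too small
    GeneResult("gene_d", -3.1, 0.2),    # Q-value too large
]
counts = Counter(classify(r) for r in results)
print(dict(counts))
```

Both comparisons are inclusive, so a gene lying exactly on a threshold is counted as a DEG.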

Fig. 5 Volcano plot of differentially expressed genes between blastula (B) and gastrula (G) stages.

Fig. 6 Volcano plot of differentially expressed genes between gastrula (G) and optic rudiment (O) stages.

Fig. 7 Volcano plot of differentially expressed genes between blastula (B) and optic rudiment (O) stages.

The volcano plots highlight the differential expression of known developmental marker genes. sox19b (SRY-box transcription factor 19b) is associated with zygotic genome activation and is significantly upregulated in the blastula stage compared to the gastrula and optic rudiment stages17. gsc (goosecoid) plays a crucial role in the Spemann organizer and is involved in gastrulation, with significantly higher expression in the gastrula stage than in the blastula and optic rudiment stages18. foxa2 (forkhead box A2) is a key regulator of endoderm development and participates in notochord formation, showing increased transcription in the gastrula and optic rudiment stages compared to the blastula stage19,20. irx1b (iroquois homeobox 1b) is associated with retinogenesis and is significantly upregulated after the optic rudiment stage21. bmp4 (bone morphogenetic protein 4) is a key mediator of dorsoventral patterning in vertebrates and is essential for lens development, exhibiting significantly higher expression in the gastrula and optic rudiment stages than in the blastula stage22,23. wnt10b (Wnt family member 10b) is involved in osteogenesis and shows significant upregulation after the optic rudiment stage24.