Zygotic activation of transposable elements during zebrafish early embryogenesis

Li, Bo; Li, Ting; Wang, Dingjie; Yang, Ying; Tan, Puwen; Wang, Yunhao; Yang, Yun-Gui; Jia, Shunji; Au, Kin Fai

doi:10.1038/s41467-025-58863-7


Published: 18 April 2025


Nature Communications volume 16, Article number: 3692 (2025)


Abstract

Although previous studies have shown that transposable elements (TEs) are conservatively activated to play key roles during early embryonic development, the details of zygotic TE activation (ZTA) remain poorly understood. Here, we employ long-read sequencing to precisely identify that only a small subset of TE loci are activated among numerous copies, allowing us to map their hierarchical transcriptional cascades at the single-locus and single-transcript level. Despite the heterogeneity of ZTA across family, subfamily, locus, and transcript levels, our findings reveal that ZTA follows a markedly different pattern from conventional zygotic gene activation (ZGA): ZTA occurs significantly later than ZGA and shows a pronounced bias for nuclear localization of TE transcripts. This study advances our understanding of TE activation by providing a high-resolution view of TE copies and creating a comprehensive catalog of thousands of previously unannotated transcripts and genes that are activated during early zebrafish embryogenesis. Among these genes, we highlight two that are essential for zebrafish development.


Introduction

Transposable elements (TEs) make up a significant portion of animal genomes (e.g., human, 46%; mouse, 37%; zebrafish, 51%)^1,2,3, by spreading across the genome in different formats, such as intact sequences with complete transcription units, and fragments inserted into genes or truncated copies within intergenic regions. These confounding formats result in diverse and complicated activities and functions of TEs. In addition to potential transposition activity following autonomous expression (e.g., retrotransposons), TEs are also known to contribute to gene transcription in many aspects, such as TE-derived regulatory elements (e.g., enhancers and promoters), and TE-derived chimeric transcripts with miscellaneous biological functions^4,5,6,7. While dysregulated TE expression has been strongly associated with developmental defects^8,9,10, there is growing urgency to achieve a more accurate and comprehensive understanding of TEs and their broader biological impact. The community is making progress towards resolving this problem. For instance, recent studies based on single-cell RNA-seq have indicated that cell type-specific activation of different TEs may contribute to tissue-specific developmental processes^11,12,13. Recently, the activation of a particular endogenous retrovirus has been implicated as critical for mesoderm development in zebrafish¹⁴.

Previous studies have shown that TE transcription can also be activated, like that of regular genes, during early embryonic development in mouse, zebrafish, and a few other organisms^7,10,11,15. Over this critical developmental course, the degradation of maternal transcripts is tightly coordinated with zygotic genome activation, and the activation dynamics of regular genes have been widely studied^16,17,18,19. In contrast, few details of TE activation and transcription during early embryonic development are known, owing to (1) the hierarchical sequence similarity within TE families and subfamilies, which hinders single-copy-resolved studies of each locus and transcript^20,21; and (2) the ambiguity between autonomously expressed TE copies and TE-gene chimeric transcripts^10,20. Therefore, there is a critical knowledge gap regarding the complicated roles of TE-derived transcripts in transposition activity, which threatens genome integrity, and in gene regulation during embryogenesis¹⁰.

Here we leverage high-quality long reads plus manual annotation to establish a high-resolution landscape of TE activation and transcription at the locus and transcript levels during zebrafish early embryonic development. Of note, in this study, TE activation is specifically defined as transcriptional activity, without implying transpositional activity. More importantly, we reveal a previously uncharacterized temporal trajectory and subcellular distribution of zygotic TE activation (ZTA) in zebrafish, where extensive variation exists among TE families, subfamilies, loci, and transcripts with respect to evolutionary age. While ZTA is an integral component of zygotic genome activation (ZGA) and may rely on zygotic gene activation, it shows unique features compared with regular genes in terms of transcription dynamics and transcriptional regulation during zebrafish early development. Additionally, this unambiguous transcriptome catalog lists thousands of previously unannotated transcripts and genes, two of which have been experimentally verified to be essential for zebrafish early embryogenesis. This comprehensive approach provides insights into the role of TEs in early developmental processes.

Results

Only a tiny subset of TE-alone loci are actively transcribed over zebrafish early embryogenesis

A total of 25,788 full-length TE-derived transcripts are identified from 11 featured stages (from fertilization to the shield stage), including 706 autonomously expressed TE transcripts (referred to as “TE-alone”) and 25,082 chimeric transcripts of TE and gene sequences (referred to as “TE-gene”) (Fig. 1a, b). The TE-alone transcripts undergo stringent manual curation that merges highly identical transcripts from the same TE loci, corrects TE annotation and transcript structures, and removes incorrectly annotated SINEs (short interspersed nuclear elements) (Fig. 1a, Supplementary Fig. 1 and Supplementary Note 1). The manual annotation confirms that 706 active TE-alone transcripts are expressed at 550 TE-alone loci, from 30 families and 210 subfamilies (Fig. 1c, d and Supplementary Fig. 2). Despite most TE-alone transcripts deriving from single TE loci, 48 chimeric TE-alone transcripts are likely generated from multiple adjacent loci by readthrough transcription and nested TE insertion (Fig. 1e, Supplementary Fig. 3, Supplementary Table 1 and Supplementary Note 2).

**Fig. 1: Characterization of TE-derived transcripts.**

Compared to the short read-based characterization of expressed TE families/subfamilies containing numerous loci in zebrafish and mouse^11,22, manual annotation with unambiguous long-read RNA-seq alignment constructs an accurate and high-resolution catalog of active TE-alone loci and transcripts during zebrafish early embryogenesis (Supplementary Fig. 4 and Supplementary Note 3). As a result, the expression dynamics of a small subset of TE loci are not mixed with or overwhelmed by the tremendous number of inactive copies across the whole genome. The high-resolution catalog reveals the distinct splicing patterns of DNA transposons, LINEs (long interspersed nuclear elements), and LTRs (long terminal repeats) (Fig. 1f and Supplementary Table 2). A total of 95.13% (215/226) of DNA transposon transcripts contain multiple exons, compared to 17.05% (15/88) of LINE transcripts. In between, 166 and 224 LTR transcripts are expressed with and without exon splicing, respectively (Fig. 1f and Supplementary Fig. 5a, b). Some introns in active LINE and LTR loci are acquired by insertion of other TEs (Supplementary Fig. 5c and Supplementary Note 4). A total of 85.39% of splice sites from the active TE-alone transcripts contain the canonical splicing signal GU-AG, and this bias is more pronounced in DNA transposon transcripts than in LTR and LINE transcripts (Fig. 1f and Supplementary Fig. 5a). A total of 31.50% of active DNA transposon loci express multiple transcript isoforms, significantly higher than the 8.86% for LINEs and 11.11% for LTRs (Fig. 1f and Supplementary Fig. 5a).

With the full-length sequences, we further investigate whether the expressed TE-alone transcripts encode essential proteins to fulfill the transposition process²³. Most LINE transcripts may maintain autonomous transposition potential, as only 13 (14.77%) lose one or two core domains (Fig. 1g). In contrast, 178 (78.76%) DNA transposon transcripts and 366 (93.85%) LTR transcripts lose core domains. Particularly, all 170 active ERV transcripts, accounting for 43.59% of total LTR transcripts, lack one to six core domains, suggesting completely or partially deficient autonomous transposition function (Fig. 1g and Supplementary Table 3).
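
The percentages quoted for exon structure and core-domain loss can be re-derived from the transcript counts given in the text. The following is a minimal arithmetic sketch; the counts are copied from the paragraphs above, while the dictionary layout and the helper name `pct` are illustrative assumptions and not part of the authors' pipeline.

```python
# Transcript counts per TE class, as stated in the text.
# The LTR total is the sum of spliced (166) and unspliced (224) transcripts.
totals = {"DNA": 226, "LINE": 88, "LTR": 166 + 224}


def pct(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


# Share of transcripts containing multiple exons.
multi_exon = {"DNA": pct(215, totals["DNA"]), "LINE": pct(15, totals["LINE"])}

# Share of transcripts that have lost at least one core domain.
lost_domains = {
    "LINE": pct(13, totals["LINE"]),
    "DNA": pct(178, totals["DNA"]),
    "LTR": pct(366, totals["LTR"]),
}

# Share of LTR transcripts that are ERV transcripts.
erv_share_of_ltr = pct(170, totals["LTR"])
```

Under these counts, the computed values match the reported 95.13%, 17.05%, 14.77%, 78.76%, 93.85% and 43.59%.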

ZTA shows a unique activation pattern during zebrafish early embryogenesis

Previous studies reported that TEs are activated alongside zygotic gene activation in mammals and zebrafish^7,11, yet the precise activation timing, trajectory and variation at the levels of locus and transcript are unknown. Among 706 active TE-alone transcripts, only 11 (including 9 DNA transposons) are maternally inherited (Supplementary Fig. 6a), and the 695 zygotic ones undergo considerably later activation than the ZGA waves in zebrafish (Fig. 2a). ZTA starts at the 1k-cell stage with 28 TE-alone transcripts (mean TPM = 2.07) and increases markedly at the oblong stage, with a greater number of transcripts (108) and higher abundance (mean TPM = 5.52) (Fig. 2a, Supplementary Fig. 6b and Supplementary Note 5), which is supported by the stage-by-stage expression correlation analysis (Fig. 2b and Supplementary Fig. 6c) and whole mount in situ hybridization (WISH) (Fig. 2c and Supplementary Fig. 6d). By contrast, previous studies have indicated that the minor wave of ZGA typically initiates between the 64-cell and 512-cell stages, while the major wave predominantly begins at the 1k-cell stage^24,25,26. Moreover, the expression levels of TE-alone transcripts differ significantly from those of TE-gene and regular gene transcripts during the developmental window spanning the 1k-cell to shield stages (Fig. 2a). In particular, before the oblong stage, TE-alone transcripts exhibit significantly lower expression levels than TE-gene and regular gene transcripts, but after the oblong stage, their expression levels increase sharply, surpassing those of both TE-gene and regular gene transcripts (Fig. 2a).

**Fig. 2: Characterization of ZTA during zebrafish early embryogenesis.**

The difference between the existing knowledge and our findings likely arises from a limitation of conventional short read-based analyses, in which TE-gene and TE-alone transcripts are not distinguished unambiguously: using short reads alone, we could only show a similar activation curve between TEs and genes (Supplementary Fig. 6e). In fact, 71.21% of TE-derived transcripts have a TE proportion of <20% and their activation pattern is similar to ZGA. When the short read-based analysis pooled TE-derived transcripts, the overall pattern was dominated by this subset of TE-derived transcripts and thus resembled ZGA.
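
To make the notion of a transcript's TE proportion concrete, the sketch below computes the fraction of exonic bases that overlap TE annotations and flags transcripts below the 20% threshold mentioned above. It is only an illustration under stated assumptions: intervals are half-open `(start, end)` pairs on one chromosome and strand, and the function names `merge`, `te_fraction` and `is_low_te` are hypothetical. The exact definition used in the study may differ.

```python
def merge(intervals):
    """Merge overlapping or touching half-open (start, end) intervals."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def te_fraction(exons, te_annotations):
    """Fraction of exonic bases that fall inside any TE annotation."""
    exons = merge(exons)
    tes = merge(te_annotations)
    exonic = sum(end - start for start, end in exons)
    covered = 0
    # Both lists are disjoint after merging, so no base is counted twice.
    for e_start, e_end in exons:
        for t_start, t_end in tes:
            covered += max(0, min(e_end, t_end) - max(e_start, t_start))
    return covered / exonic if exonic else 0.0


def is_low_te(exons, te_annotations, threshold=0.20):
    """True when the TE proportion is below the threshold (default 20%)."""
    return te_fraction(exons, te_annotations) < threshold
```

For example, a transcript with exons `[(100, 200), (300, 400)]` and a single TE annotation `(150, 160)` has 10 of 200 exonic bases in TE sequence, so it would fall into the low-TE group that behaves like ZGA.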

We find that ZTA in mouse shares a similar delayed onset by analyzing two publicly available datasets^27,28: while ZGA is detectable as early as 4 h post fertilization (hpf), the first TE-alone transcripts (i.e., from the subfamilies L1MdTf_II, L1MdTf_III and L1MdA_II) are activated at 6 hpf, followed by MERVL and MERVL-3A at 8 hpf and MT2_Mm at 10 hpf (“TE activation during early mouse embryonic development” subsection of the “Methods” and Supplementary Fig. 7). From 6 to 12 hpf, the overall TE-alone transcriptome remains at a stable level without remarkable change (Supplementary Fig. 7) while 1777 genes are upregulated²⁷. Because the abovementioned mouse TE quantification is based on a short-read RNA-seq dataset (from 0 to 12 hpf)²⁷, we could only investigate mouse ZTA at the subfamily level until 12 hpf. A more comprehensive long-read RNA-seq data collection at fine time points beyond 12 hpf until the 2-cell stage is required to further reveal the details of the species specificity.

Previous studies have underscored the pivotal role of the minor wave of ZGA in establishing the major wave of ZGA in both zebrafish and mouse^29,30,31. Therefore, the delayed onset of ZTA suggests that the preceding ZGA may also provide essential factors for the subsequent ZTA. We revisit a previous study and find transcriptomic evidence to support this possibility from several zebrafish mutants and embryos exposed to transcription/translation inhibitors²⁹: ZTA can be abolished through α-amanitin injection (Fig. 2d) or repressed upon triple loss of function of Nanog, Pou5f1, and SoxB1 (Fig. 2e and Supplementary Fig. 8), and ZTA also undergoes a global reduction upon translational blockade of the zygotic genes in the minor ZGA wave (Fig. 2f). Furthermore, we look into two transcription factors (TFs) in zebrafish, mxtx1/2, which are predicted to bind to the promoter regions (5’ LTRs) of eight ERV subfamilies (Supplementary Fig. 9a). They are orthologous to human DUX4 and mouse Dux, which have been well documented for their roles in activating HERVL and MERVL retrotransposons, respectively^32,33. Dux is activated at 6 hpf in mouse embryos, while Dux-responsive genes and TEs are expressed later (e.g., MERVL expressed at 8 hpf) (Supplementary Fig. 7)²⁷. In zebrafish, mxtx1/2 are strictly zygotic genes activated during the 64-cell to 1k-cell stages, preceding the defined ZTA period (i.e., the 1k-cell to oblong stages) (Supplementary Fig. 9b). Consequently, after blocking the translation of mxtx1/2 (Supplementary Fig. 9c) and other early zygotic genes, all 21 ERV subfamilies are downregulated as compared to the wild type at the sphere (Supplementary Fig. 9d) and shield stages (Fig. 2g). Interestingly, re-analysis of the publicly available ChIP-seq data for zebrafish Nanog and Mxtx2³⁴ suggests that Nanog may function as a general TF for various types of TEs, while Mxtx2 may specifically regulate ERVs (Supplementary Fig. 9e, f).

Programmed ZTA shows complex expression trajectories during the maternal-to-zygotic transition

From the maternal carryover in the fertilized egg to the zygotic transcripts at the shield stage, the total abundance of TE-alone transcripts increases 318-fold (Fig. 3a), and there is a dramatic shift in the composition of the TE-alone transcriptome: DNA transposons account for the majority of the maternally inherited transcripts until ZTA starts at the 1k-cell stage, after which LINEs and LTRs, especially ERVs, rapidly dominate TE-alone expression (Fig. 3a and Supplementary Fig. 10a). However, considerable variation in expression pattern, including abundance, activation time and peak stage, exists among their families/subfamilies (Fig. 3b, Supplementary Fig. 10a, b and Supplementary Note 6). For example, the WISH experiment validates the activation of two ERV1 subfamilies, BHIKHARI-3-I_DR and BHIKHARI-5-I_DR, at the sphere stage, whereas another subfamily, BHIKHARI_I, is activated at the dome stage (Fig. 2c).

**Fig. 3: Expression dynamics of TE-alone transcripts during zebrafish early embryonic development.**

In a higher-resolution view, these variations become more remarkable and extensive at both levels of locus and transcript within subfamilies (Fig. 3c, d and Supplementary Fig. 11a). Among 97 and 103 TE subfamilies with multiple active TE-alone loci and transcripts, respectively, 79 and 86 subfamilies have ≥2 activation time points for different loci and transcripts (Supplementary Fig. 11b). The activation trajectories of all 695 zygotic TE-alone transcripts are categorized into 8 types (Fig. 3d). The heterogeneity of ZTA trajectories within each subfamily, which is measured by entropy (“Expression trajectory analysis on the locus/transcript level” subsection of the “Methods”), varies greatly from being homogeneous to containing 7 of 8 types with comparable proportions (Fig. 3e). 72 and 80 subfamilies with multiple active TE-alone loci and transcripts, respectively, span ≥2 types of ZTA trajectories (Supplementary Fig. 11c). This intra-subfamily heterogeneity and the inter-subfamily variation exist extensively in TE-alone expression of DNA transposons, LTRs and LINEs (Supplementary Fig. 12a). For example, within the BHIKHARI_I subfamily, 24 active TE-alone transcripts belong to 6 trajectory types, including some activated as early as at the 1k-cell and high stages while some others as late as at the 30%-epiboly stage; some continuously increasing and some others saturating or even decreasing prior to the shield stage (Fig. 3e and Supplementary Fig. 12a). Although WISH could reveal the subfamily-level difference between BHIKHARI_I versus BHIKHARI-3-I_DR and BHIKHARI-5-I_DR (Fig. 2c), WISH and the other conventional approaches rarely disclose the detailed complexity among the activated transcripts within each subfamily (Fig. 3e). However, 19 active loci of these 24 TE-alone BHIKHARI_I transcripts contain nearly identical regulatory sequences that were reported previously³⁵, including the conserved CCAAT and TATA boxes at the promoter regions (Supplementary Fig. 12b). 
The intra-subfamily heterogeneity of their ZTA trajectories, in contrast to the high similarity of the annotated regulatory elements, points to additional epigenetic layers of regulation and varying genomic contexts that act in a locus- and transcript-specific manner during zebrafish embryogenesis.
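
The heterogeneity of trajectory types within a subfamily is described above as an entropy. A minimal sketch of one plausible reading, the Shannon entropy (in bits) over the trajectory-type labels of a subfamily's transcripts, is given below. The function name `trajectory_entropy` and the label strings are hypothetical, and the exact formulation in the Methods may differ (for example in the logarithm base or in normalization).

```python
import math
from collections import Counter


def trajectory_entropy(types):
    """Shannon entropy (bits) of trajectory-type labels within one subfamily."""
    counts = Counter(types)
    n = sum(counts.values())
    # Each term is p * log2(p) for the share p of one trajectory type.
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

Under this reading, a subfamily whose transcripts all follow one trajectory type has entropy 0, while a subfamily spread evenly over 7 of the 8 types reaches log2(7), about 2.81 bits.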

Evolutionary age influences ZTA

The activation pattern of TE-alone loci during embryogenesis varies with their evolutionary age. Kimura divergences (KDs) of the active LTR loci are significantly lower than those of the silent copies, indicating younger evolutionary ages (Fig. 3f). This significant difference also holds between the active and silent copies of the intact LTRs (Supplementary Fig. 13a), and is supported by an orthogonal measurement of evolutionary age, i.e., the estimated insertion years (Fig. 3g and “Evolutionary analysis on TE-alone loci” of the “Methods”). Similarly, the active DNA transposon and LINE loci are also significantly younger than the silent ones (Fig. 3f).
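
For readers unfamiliar with Kimura divergence, the sketch below implements the standard Kimura two-parameter (K2P) distance, K = -1/2 ln((1 - 2P - Q) sqrt(1 - 2Q)), where P and Q are the proportions of transition and transversion differences between a TE copy and its consensus. This is the textbook formula only; whether the authors applied additional corrections (such as the CpG adjustment used by some annotation tools) is not stated in this section, and the function name `kimura_2p` is an assumption made for illustration.

```python
import math


def kimura_2p(transitions, transversions, aligned_sites):
    """Kimura two-parameter distance from substitution counts.

    transitions, transversions: numbers of differing aligned sites of each kind
    aligned_sites: total number of compared (gap-free) sites
    """
    p = transitions / aligned_sites    # proportion of transitions
    q = transversions / aligned_sites  # proportion of transversions
    a = 1 - 2 * p - q
    b = 1 - 2 * q
    if a <= 0 or b <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    return -0.5 * math.log(a * math.sqrt(b))
```

A lower value means fewer substitutions have accumulated since insertion, which is why lower KDs are read as younger evolutionary age.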

Although there is no linear correlation between KD and the abundance of the active TE-alone loci (Supplementary Fig. 13b), there are critical differences in the activation trajectories between the relatively “young” and “old” subsets that are defined by comparative genomics (Supplementary Fig. 14a, b and Supplementary Note 7). The evolutionarily young TE-alone loci show higher and increasing abundance at the oblong stage, while the abundance of the old subset remains at a stable level after activation (Fig. 3h and Supplementary Fig. 14c, d). The majority (50.60%) of the young subset are activated at the oblong and sphere stages, while the activation time of the old ones spreads more evenly across eight stages (Supplementary Fig. 14e). In addition, the two subsets are distributed differently among the eight expression clusters (Supplementary Fig. 14f).

Imbalanced subcellular localization of TE-derived transcripts

RNA subcellular localization is closely associated with diverse biological functions³⁶, which can be exemplified by the comparison between long intergenic non-coding RNAs and protein-coding genes (Supplementary Fig. 15a). In the context of three stages over ZTA (1k-cell, dome and shield), TE-alone transcripts have the highest relative mRNA abundance between the nucleus and cytoplasm (N/C proportion) at the subfamily level, followed by TE-gene transcripts, and gene transcripts have the lowest ratio (Fig. 4a, b and “Subcellular localization analysis with nuclear and cytosolic RNA-seq data” subsection of the “Methods”). TE-gene transcripts display a mixed pattern, with a relative N/C proportion similar to that of genes at the 1k-cell stage but shifting towards the TE-alone transcripts’ ratio at the shield stage (Fig. 4b). Moreover, the relative N/C proportion of a transcript is positively correlated with its TE fraction (Fig. 4c), highlighting a possibility of TE sequence-mediated regulation of transcript subcellular localization^37,38.
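
As a rough illustration of a nuclear-versus-cytoplasmic comparison, the sketch below computes the share of a transcript's (or subfamily's) signal found in the nuclear fraction. This is an assumed definition for illustration only: the precise formula for the relative N/C proportion is given in the Methods and may differ, and the function name `nc_proportion`, the use of TPM values and the pseudocount are hypothetical choices.

```python
def nc_proportion(nuclear_tpm, cytosolic_tpm, pseudocount=1.0):
    """Share of signal in the nuclear fraction, between 0 and 1.

    The pseudocount (an assumption) keeps the value defined and stable
    when both fractions have very low abundance.
    """
    n = nuclear_tpm + pseudocount
    c = cytosolic_tpm + pseudocount
    return n / (n + c)
```

Values above 0.5 indicate a nuclear bias and values below 0.5 a cytoplasmic bias under this definition.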

**Fig. 4: Analysis of subcellular localization of TE-alone loci.**

At the resolution of TE subfamily, significant variations in the relative N/C proportion are observed across different developmental stages. Despite these variations, DNA transposons consistently have a lower relative N/C proportion compared to LTRs and LINEs (Fig. 4d), suggesting a milder nuclear bias for DNA transposons. Additionally, LTRs show significantly higher relative N/C proportion than LINEs at the dome stage, but this pattern reverses at the shield stage, indicating dynamic subcellular localization for different TE subfamilies during development.

To illustrate these observations, we applied fluorescence in situ hybridization (FISH) to investigate the dynamic subcellular localization of several TE subfamilies. Our analysis reveals that the DNA transposon subfamily hAT-N76_DR predominantly localizes in the cytoplasm, while the LINE subfamily L2-1_DR is localized in the nucleus, consistently from the shield stage to the 6-somite stage (Fig. 4e, f and Supplementary Fig. 15b, c). These findings align with a previous study in mouse embryonic stem cells, which demonstrated a nuclear bias of LINE1 transcripts, suggesting potential regulatory roles in early development³⁹. This parallel implies a similar function for LINE transcripts in zebrafish embryogenesis. The transposition mechanisms likely play a key role in influencing the subcellular localization of TEs. For instance, LINEs replicate via target-primed reverse transcription, requiring their mRNA template to be transported back into the nucleus. This process may contribute to the observed N/C proportion⁴⁰. In addition, the LTR subfamily BHIKHARI_I displays a more dynamic subcellular distribution across developmental stages. It shows a nuclear bias at the shield stage but shifts to clear cytoplasmic localization at the bud and 6-somite stages (Fig. 4g and Supplementary Fig. 15d). This dynamic localization pattern is consistent with the subfamily-level observations above (Fig. 4d), highlighting a more flexible subcellular positioning for LTRs during development.

Epigenetic dynamics over the full course of ZTA

Several types of ZTA trajectories (Type 4 to 8 in Fig. 3d) are saturated or even decrease by the shield stage, which may reflect control of their transposition potential to protect genome integrity. Indeed, the whole TE-alone transcriptome landscape over zebrafish embryogenesis includes not only activation but also a repression process at the gastrula and segmentation stages (Fig. 5a). Although the precise timing of repression may be as heterogeneous as the activation timing, the full course is coordinated with epigenetic changes at multiple layers (Fig. 5b).

**Fig. 5: Epigenetic regulation of TE transcriptional activity during zebrafish embryonic development.**

Previous studies have revealed that the establishment of an open/accessible chromatin status at gene promoters typically marks future transcriptional activation in many species²⁴. To investigate the epigenetic regulation of TEs, we analyzed the publicly available datasets of a few epigenetic marks^{41,42,43,44,45}. The chromatin accessibility around the TSSs of the 550 active TE-alone loci increases from the high and oblong stages and decreases at the shield and 80%-epiboly stages (Fig. 5c); these changes are immediately followed by the increase and decrease of their overall abundance, respectively (Fig. 5a). In contrast, the chromatin openness of regular genes increases continuously from the 64-cell to the 80%-epiboly stage (Supplementary Fig. 16a and Supplementary Note 8).

As previously reported⁴⁶, the active mark H3K4me3 exists in sperm yet is removed in the early embryo for both genes and TEs (Supplementary Fig. 16b and Fig. 5d, respectively). Compared to regular genes, which show elevated levels of H3K4me3 at the 128-cell stage, TE-alone loci have a subtle increase at the 1k-cell stage and a significant rise at the dome stage (Fig. 5d, e and Supplementary Fig. 16b, c), which is consistent with the delayed activation of TEs versus genes. Another active mark, H3K27ac, shows a similar regulatory pattern (Fig. 5f and Supplementary Fig. 16d). The repressive mark H3K9me3 is deposited across TE-alone loci at the sphere and shield stages (Fig. 5b, g), prior to their decrease in abundance. In addition, PIWI-interacting RNAs (piRNAs), another class of epigenetic and post-transcriptional silencing regulators, rise at 76.36% (420/550) of TE-alone loci from the oblong to 50%-epiboly stages (Fig. 5h, Supplementary Figs. 16e and 17 and Supplementary Note 9), suggesting that TE-alone transcript degradation might be mediated by PIWI-piRNA complexes⁴⁷. Moreover, several key factors of piRNA-mediated regulation, such as the piwil1 and piwil2 genes, are constantly expressed during the activation process and fade away during the repression process (Supplementary Fig. 16f). Neither the repressive mark H3K9me3 nor the small RNA profiles show obvious control at gene loci, which is distinct from TE-alone loci (Supplementary Fig. 16g, h). In sum, the activation/repression of TE-alone loci and transcripts could be a complex process involving multiple pre-/post-transcriptional regulatory layers.

Characterization of TE-gene chimeric transcripts

In addition to TE-alone transcripts, TE-gene chimeric transcripts are another critical form in which TE sequences contribute to the complexity of the embryogenesis transcriptome. Many TE-gene chimeric transcripts have been shown to have diverse and important functions in various biomedical contexts, such as developmental biology and stem cells^6,7,48,49, yet their repetitive fragments make them difficult to discover or study precisely by conventional approaches (Supplementary Fig. 18a). Over zebrafish early embryogenesis, we identify 25,082 TE-gene transcripts, including 4752 annotated and 20,330 novel transcripts that are supported by at least two high-quality reads (for 24,940 transcripts) or detected in at least two developmental stages (for 23,551 transcripts) (Fig. 1b and Supplementary Fig. 18a, b). Of note, DNA transposon sequences are found in the majority (21,187) of these chimeric transcripts (Supplementary Fig. 18c).

Loci identified with essential functions for zebrafish early embryogenesis

Our comprehensive transcriptome analysis of zebrafish early development not only provides high-resolution profiling of TE activation but also identifies 1674 novel gene loci (including TE-genes and regular genes) that were previously unannotated in the Ensembl reference annotation library⁵⁰ and may have critical, as-yet-unknown functions in zebrafish development. Among these, 1260 loci have protein-coding potential, and 698 exhibit known functions and/or conserved domains (Supplementary Fig. 18d, e). To further understand the biological functions of these genes, co-expression network analysis was applied to cluster them with other functionally annotated genes, and two co-expression modules (M1 and M2) were selected for further investigation (Fig. 6a).

**Fig. 6: Functional analysis of novel genes.**

Within M1, we identified a locus, zeat1 (Zebrafish Embryogenesis Associated Transcript 1), that serves as a hub gene. zeat1 is highly abundant at the fertilized egg and 1-cell stages, but is degraded from the 64-cell stage onward (Fig. 6a, b and Supplementary Note 10). M1 genes are maternally inherited, and they are functionally enriched in protein transport, cell cycle, and transcription regulation (Supplementary Fig. 19a), including important embryonic development genes, e.g., phf8, which is essential for the cell cycle⁵¹, and e2f4, a key transcription activator⁵². A Tc1N1_DR transposon is inserted at the 3’ UTR of zeat1 without interrupting the coding sequence, and four isoforms are identified over early embryogenesis (Fig. 6a). bmb, which is essential for mediating nuclear envelope fusion⁵³, is the only homolog of zeat1 despite their long-term divergence (Supplementary Fig. 19b). The zygotic zeat1 (zeat1^−/−) CRISPR-Cas9 knockout mutants (Supplementary Fig. 19c, d) show normal morphology during embryogenesis, while both the maternal zeat1 (Mzeat1) and maternal-zygotic zeat1 (MZzeat1) mutant embryos fail to undergo cell movements associated with epiboly and gastrulation, and arrest shortly after the mid-blastula transition (Fig. 6c). Therefore, zeat1 is a strictly maternal-effect gene and essential for early embryogenesis (Supplementary Note 10).

To elucidate the function of the zeat1 gene, we performed microinjections of H1m-GFP protein (a zebrafish H1 type linker histone that labels the chromatin) and wheat germ agglutinin (WGA) Alexa Fluor 647 conjugate (WGA-Alexa 647) into wild-type (WT) and Mzeat1 mutant embryos at the one-cell stage. These injections labeled the nuclei and nuclear envelope, respectively. Time-lapse confocal microscopy revealed that Mzeat1 mutant embryos showed severely abnormal nuclear morphology that was characterized by fragmented chromatin enclosed by the nuclear envelope (Fig. 6d).

Further investigation of the entire cell cycle suggested that karyomere fusion was disrupted in Mzeat1 mutants, resulting in multimicronucleated cells during the transition from telophase to interphase (Fig. 6e and Supplementary Movies 1, 2). Karyomeres, which are intermediate cleavage-stage structures with individual or groups of chromosomes enclosed by the nuclear envelope, failed to fuse properly, as seen in similar bmb maternal mutants⁵³. The localization of zeat1 to the nuclear envelope further supports its role in facilitating nuclear envelope fusion (Supplementary Fig. 19e). These results show that zeat1 is a critical nuclear envelope-associated protein and is essential for karyomere fusion during early zebrafish embryogenesis.

The other novel gene, zeat2, which was unannotated by Ensembl (version v103) at the time of data analysis and manuscript preparation but is currently reported as “fastkd5” in RefSeq release 227, was selected from the M2 co-expression module, where the genes have high abundance specifically at the 64-cell and 1k-cell stages and are functionally enriched in regulation of cell cycle, protein deubiquitination, and translation (Fig. 6a, b and Supplementary Fig. 19a). zeat2 is a homolog of the FASTKD5/Fastkd5 genes in humans and mice (Supplementary Fig. 19b), yet its function in zebrafish has not been reported. Zygotic CRISPR-Cas9 knockout mutants (zeat2^−/−) (Supplementary Fig. 19d) develop normally until 6 days post fertilization (dpf), after which they show a body curvature phenotype and apparent cell death (Fig. 6f). Ultimately, none of the mutants survive beyond 18 dpf, and they therefore fail to develop into fertile adults (Fig. 6g and Supplementary Note 10).

To determine whether zeat2 regulates non-canonical mitochondrial RNA processing like FASTKD5 in humans⁵⁴, we performed qRT-PCR to quantify the abundance of mitochondrial transcript precursors in WT and zeat2^−/− mutant embryos. The zebrafish mitochondrial genome encodes 13 protein-coding genes, some of which have non-canonical junctions, including the ones in the 5’UTR of co1, the 3’UTR of nd6, and the regions between atp6 and co3, and between nd5 and cyb^54,55 (Supplementary Fig. 19f). We observed a significant accumulation of non-canonical precursor mRNAs in the absence of zeat2, with a particularly strong, 27-fold enrichment of the 5’ UTR of co1 (Fig. 6h). In contrast, canonical cleavage sites flanked by tRNAs, such as nd1-nd2 and nd3-nd4l, showed no significant precursor accumulation in zeat2^−/− mutants compared to WT (Fig. 6h). These results indicate that zeat2 is an ortholog of FASTKD5 and plays a similar role in processing mitochondrial mRNAs at non-canonical junctions. The loss of zeat2 disrupts the processing of co1, co3, and other mitochondrial genes, which in turn probably affects the oxidative phosphorylation system and leads to developmental arrest in zebrafish embryos.
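
Fold enrichment from qRT-PCR is commonly computed with the 2^(-ΔΔCt) method, in which the cycle threshold (Ct) of the target is first normalized to a reference gene and then compared between conditions. The sketch below shows that calculation; it is an assumption that this is how the 27-fold value was obtained, since the quantification method is not described in this section, and the function name `fold_change` and its parameter names are hypothetical.

```python
def fold_change(ct_target_mut, ct_ref_mut, ct_target_wt, ct_ref_wt):
    """Relative abundance (mutant vs. WT) by the 2^(-ddCt) method.

    Each argument is a cycle-threshold value; lower Ct means more template.
    """
    d_ct_mut = ct_target_mut - ct_ref_mut  # normalize target to reference, mutant
    d_ct_wt = ct_target_wt - ct_ref_wt     # normalize target to reference, WT
    dd_ct = d_ct_mut - d_ct_wt
    return 2 ** (-dd_ct)
```

Under this method, a precursor whose normalized Ct is about 4.75 cycles lower in the mutant than in WT corresponds to roughly a 27-fold accumulation, since 2^4.75 is approximately 27.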

Discussion

Although transposable elements cover a large fraction of the genome and have long been known to be activated^7,10,11, few details of ZTA have been revealed in depth. Here we demonstrate that only a small proportion of TE copies are activated among their tremendous number. It is worth noting that 550 TE-alone loci is a conservative estimate, and several factors may influence the identification of active TE-alone loci, including stage-specific TE expression, low sequencing coverage, TE insertion polymorphisms among zebrafish strains, and the TE annotation methods. In particular, TE polymorphism between the experimental samples (AB and India strains) and the TU reference genome may lead to the misidentification of highly divergent TE loci and strain-specific copies, and thus to a biased estimate of TE-alone loci.

By precisely identifying and quantifying the TE-alone loci/transcripts, we reveal the hierarchical heterogeneity of activation patterns among families, subfamilies, loci, and transcripts, and between TE-alone and TE-gene loci, which could result from their diverse sequence contexts and local epigenetic states. These subtle but unique transcriptomic patterns of ZTA were largely mixed or masked in previous studies before long-read sequencing was applied, and thus the development and application of long read-based epigenetic assays will further benefit the understanding of ZTA^56,57,58,59,60. While various types of TEs undergo polyadenylation^61,62, our analysis might miss certain TE transcripts lacking poly(A) tails. Cytoplasmic polyadenylation, which is prevalent in zebrafish early development, may also bias the quantification of maternal TE transcripts^63,64, although no significant underestimation was observed for TE-alone transcripts between poly(A) enrichment-based and ribosomal RNA depletion-based RNA-seq (Supplementary Figs. 20 and 21 and Supplementary Note 11). Further investigation using a poly(A)-independent sequencing approach is therefore needed. To further reduce the impact on the characterization of regular genes, we suggest considering only the zygotically activated transcripts.

Although ZTA and ZGA are similar in several aspects, their differences are more remarkable, such as the highly heterogeneous trajectories and late activation time of ZTA. Indeed, limited resources in very early embryogenesis must be allocated to essential genes and proteins for cellular maintenance and zygotic genome activation. In parallel, previous studies showed that aberrant transposon accumulation at earlier stages can arrest ZGA and cause severe defects in embryogenesis^39,65. A few TFs were previously reported to be involved in TE activation (e.g., Dux, Klf5, Gata2)^32,66 and TE silencing (e.g., KRAB-ZFPs)⁶⁷. A recent study showed that a large ZNF gene family represses TE expression in zebrafish, underscoring a potentially conserved approach to TE silencing, as in mammals⁶⁸. In addition to mxtx1/2, we also analyzed 2,546 annotated zebrafish TFs⁶⁹ and found that 469 TFs were highly expressed with motifs identified at the activated TE-alone loci during ZTA (Supplementary Fig. 22 and Supplementary Note 12). This presumptive machinery may not be fully supplied maternally and may become available only after early ZGA is completed, because blocking the translation of earlier activated genes, such as mxtx1/2 and other potential TFs, can dramatically repress ZTA (Fig. 2f). In addition to the silent stage from zygote to cleavage and the activation stage from blastula to gastrula, we find an overall decreasing trend of TE-alone transcripts starting at gastrula and extending to pharyngula in zebrafish, with partial support from epigenetic data (Fig. 5), which is also distinct from ZGA. The bulk-level data in a previous study also indicated the same decrease in several mammals⁷, and prolonged TE transcription (i.e., LINE-1) persisting through the normal repression process in mice can lead to developmental arrest⁶⁵. Therefore, ZTA is likely under conserved control by multi-layer epigenetic regulation (Fig. 5).

TE-alone and TE-gene transcripts are more likely to be localized inside the nuclei based on both RNA-seq analysis and FISH experiments (Fig. 4). Although these TE-associated transcripts are significantly longer than regular gene transcripts (Supplementary Fig. 15e), which may lengthen the time needed for their transcription, the influence of nascent TE-associated transcripts on the detection of subcellular localization remains unclear and requires further study. Our catalog of the full-length sequences of TE transcripts delivers informative clues to the persistent paradox concerning the prevalence of TE activation and the rarity of TE transposition. Most active TE-alone transcripts, such as ERVs that are the most abundant in zebrafish early embryogenesis, lack conserved protein domains (Fig. 1g), suggesting impaired transposition ability. In contrast, although most LINE transcripts possess intact coding regions (Fig. 1g), they are among the most nuclear-localized TE-alone transcripts (Fig. 4f and Supplementary Fig. 15c), indicating tight host genome control over their translation in the cytoplasm and thus over autonomous transposition. Moreover, the PIWI-piRNA pathway likely adds another layer of post-transcriptional regulation on the TE transcripts already transported into the cytoplasm before they enter the translation machinery. However, transposition frequency, although low, is not zero, as most active TE-alone loci, especially LINEs and LTRs, more likely originated from active transposition than from segmental duplication (Supplementary Fig. 23a–d and Supplementary Note 13). This result suggests that some TEs can still be transposed without an intact domain structure. For example, an endogenous retrovirus in zebrafish, BHIKHARI, is likely actively transposed even though it only encodes a Gag protein¹⁴. Additionally, the active TE-alone loci are predominantly harbored in genomic regions with relatively low recombination rates, such as the sex chromosome⁷⁰ (Supplementary Fig. 23e), which may help them escape elimination by genomic recombination.

The full-length sequences are also informative for distinguishing TE-alone and TE-gene transcripts and enable reliable discovery of many unannotated loci/transcripts. While this study focuses on TE-alone transcripts, TE-gene chimeras represent a larger reservoir of TE-derived transcripts (total 25,082 versus 706; novel 20,330 versus 661) (Fig. 1b) activated during zebrafish early embryogenesis. The chimeric products, such as zeat1, may deliver the previously unknown functions and regulatory roles in zebrafish early embryogenesis (Fig. 6). The integration of different types of TEs with genes, along with the position and proportions of TE insertions, results in a more complicated stratification. Consequently, conducting another comprehensive, yet challenging, investigation of TE-gene transcripts is warranted to study how TEs cooperate with the host genome in embryogenesis.

Methods

Zebrafish embryo collection

Two zebrafish strains, AB and India WT, were used in this study. They were reared in a pH (7.2–7.6) controlled recirculation system at around 28 °C, with a 14-h light and 10-h dark cycle. Breeding involved placing males and females in a tank with a separator, followed by water replacement and separator removal for egg laying. Collected embryos, cultured in Holtfreter’s solution (pH 7.0–7.4), were used for further generations or experiments. Eleven developmental stages of embryos were defined at specific time points, and embryos were collected at the following stages: fertilized egg (immediately after fertilization), 1-cell at 0.5 h post fertilization (hpf), 64-cell at 2, 1k-cell at 3, high at 3.3, oblong at 3.7, sphere at 4, dome at 4.3, 30%-epiboly at 4.7, 50%-epiboly at 5.3 and shield at 6 hpf, respectively.

PacBio Iso-Seq library preparation and sequencing

Total RNA was isolated from each developmental stage of zebrafish embryos using TRIzol™ Reagent (Invitrogen, USA). RNA purity and concentration were assessed with the NanoPhotometer® Spectrophotometer (IMPLEN, CA, USA) and the Qubit® RNA Assay Kit in the Qubit® 3.0 Fluorometer (Life Technologies, CA, USA). The RNA integrity number (RIN) was determined using the RNA Nano 6000 Assay Kit and Agilent Bioanalyzer 2100 system (Agilent Technologies, CA, USA). RNA samples with a RIN ≥ 8 were used to synthesize cDNA with the SMARTer™ PCR cDNA Synthesis Kit (Takara Bio, CA, USA). PCR amplification was performed using a KAPA HiFi PCR Kit (Kapa Biosystems, MA, USA) with the optimized number of cycles. PCR products (cDNA) from each sample were size-selected into two fractions (<3 kb and >3 kb) using the BluePippin System (Sage Science, MA, USA). Subsequently, two cDNA libraries (<3 kb, >3 kb) were prepared using a SMRTbell™ Template Prep Kit 1.0 (Pacific Biosciences, CA, USA) and sequenced on the PacBio Sequel and Sequel II platforms.

Short read library preparation and sequencing

mRNA from five stages (fertilized egg, 1-cell, 64-cell, 1k-cell, and shield) was extracted with the Dynabeads™ mRNA Purification Kit (Invitrogen, USA) and subjected to TURBO™ DNase (Invitrogen, USA) treatment. RNA-Seq libraries were prepared using the NEBNext® Ultra™ RNA Library Prep Kit for Illumina® (NEB, USA) and then sequenced in 2 × 75 bp paired-end mode on an Illumina HiSeq X Ten system with 8 replicates for each stage. Sequencing data quality was checked with FastQC (v0.11.8)⁷¹, and low-quality reads and adapters were removed using Cutadapt (v1.8.1)⁷².

Identification of TE-alone, TE-gene and gene transcripts

A total of 1.84 million CCS (Circular Consensus Sequencing) reads were extracted from PacBio subreads BAM files using the ccs function embedded within the Iso-Seq3 pipeline with the “--all” parameter (https://github.com/PacificBiosciences/IsoSeq). The CCS reads were then divided into full-length (FL) and non-full-length (Non-FL) groups using BamTools (v2.5.1) (https://github.com/pezmaster31/bamtools) with the parameter “-tag rq = −1”. All FL CCS reads were processed through the standard Iso-Seq3 pipeline, including adapter removal with lima and polyA tail removal with refine, and the clean reads were then mapped to the zebrafish reference genome (GRCz11) with minimap2 (v2.24)⁷³ with the parameters “--MD -ax splice:hq -uf --secondary=no”. The output alignment files in SAM format from all 11 developmental stages were fed together to TALON (v5.0)⁷⁴ for a combined transcript annotation with the “--allowGenomic” parameter. We set the following criteria to filter out different types of unreliable transcripts annotated by TALON: (1) a transcript recorded in the zebrafish Ensembl reference annotation (v103) is required to have ≥1 supporting read in at least one stage; (2) a novel transcript derived from a known gene locus is required to have at least 3 supporting reads (equal to 1 TPM calculated based on our dataset) in at least one stage; (3) a novel transcript from an unannotated locus is required to have at least 2 supporting reads in at least one stage. We also excluded extremely lowly expressed transcripts by filtering out transcripts with <1 TPM at all 11 stages, except for autonomously expressed TE transcripts. We applied the same TALON analysis to Non-FL CCS reads for read annotation, but Non-FL CCS reads were only used to quantify transcripts identified by FL CCS reads.
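The three read-support rules above can be sketched as a small filter. This is only an illustration under stated assumptions, not the pipeline used in the study: the category labels and the function name are hypothetical, and the additional <1 TPM filter is not included.

```python
# Minimum supporting reads required in at least one stage, per transcript class.
# The category names are illustrative labels, not TALON output fields.
MIN_READS = {
    "known": 1,                    # transcript recorded in Ensembl v103
    "novel_known_locus": 3,        # novel transcript from an annotated gene locus
    "novel_unannotated_locus": 2,  # novel transcript from an unannotated locus
}


def passes_read_support(category, reads_per_stage):
    """Return True if any stage reaches the read threshold for this category."""
    threshold = MIN_READS[category]
    return any(reads >= threshold for reads in reads_per_stage)
```

For example, a novel transcript from a known locus with read counts of 2, 2 and 0 across three stages would be discarded, because no single stage reaches 3 reads.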

All transcripts were classified into Known-Known (annotated gene locus and annotated transcript), Known-Novel (annotated gene locus and unannotated transcript) and Novel-Novel (unannotated gene locus and unannotated transcript) transcripts according to zebrafish Ensembl annotation.

We implemented CD-HIT (v4.8.1)⁷⁵ to merge incomplete and highly identical transcripts generated by TALON. We established specific criteria for clustering different types of transcripts defined by TALON, including: (1) Ensembl annotated transcripts were treated as true transcripts and not allowed to cluster, (2) autonomously expressed TE transcripts were not merged at this step and retained for further manual annotation, (3) truncated transcripts, such as those annotated as ISM (Incomplete Splice Match) in known genes by TALON, were clustered at 98% identity, and (4) other types of transcripts, such as NIC (Novel In Catalog) and NNC (Novel Not in Catalog) were clustered at 100% identity to minimize the loss of potential novel transcripts.

For the final full-length transcript dataset, we quantified expression abundance by combining the FL and Non-FL CCS read counts for each transcript at each developmental stage. Transcripts per million reads (TPM) was then calculated to represent the normalized expression value for each transcript. Gene-level expression is the sum of the expression of all transcripts derived from the same locus.
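A minimal sketch of this quantification is given below. It assumes that TPM is computed directly from read counts without length normalization, on the reasoning that each long read represents one transcript molecule; the text does not state this explicitly, and the function names are hypothetical.

```python
from collections import defaultdict


def tpm(counts):
    """Convert read counts (FL + Non-FL) for one stage into TPM.

    counts: dict mapping transcript ID -> read count.
    Assumption: no transcript-length normalization is applied.
    """
    total = sum(counts.values())
    if total == 0:
        return {t: 0.0 for t in counts}
    return {t: c / total * 1e6 for t, c in counts.items()}


def locus_level(tpm_by_transcript, transcript_to_locus):
    """Sum transcript TPMs over all transcripts derived from the same locus."""
    out = defaultdict(float)
    for transcript, value in tpm_by_transcript.items():
        out[transcript_to_locus[transcript]] += value
    return dict(out)
```

The same summation applies at any grouping level (locus, subfamily or family) by changing the mapping passed as `transcript_to_locus`.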

Classification of TE-alone, TE-gene and gene transcripts

Zebrafish TE consensus sequences were obtained from Dfam (v3.2) (https://www.dfam.org/) and used to annotate the latest released zebrafish reference genome (GRCz11) with RepeatMasker (v4.1.0) (https://www.repeatmasker.org). The TE and transcript annotations were compared based on genome coordinates, and all transcripts were classified into three categories: autonomously expressed TE transcripts (TE-alone), if over 90% of the transcript was embedded within TE regions and the first exon was also covered by a TE (TE promoter-driven transcription); TE-gene chimeric transcripts (TE-gene), if the transcript overlapped with the TE annotation (≥10 bp, which is the smallest TE fragment in the zebrafish reference genome); and gene transcripts, which had no overlap with any TE-related sequences. TE-alone transcripts were then classified into different TE subfamilies, families, and types (DNA transposon, LTR, LINE, and SINE) according to the zebrafish TE classification system. TE-alone transcripts can be further separated into two subgroups based on their structure: transcripts from single-TE loci (singular) and chimeric TE-alone transcripts that are derived from multiple adjacent TE loci belonging to different subfamilies. Chimeric TE-alone transcripts can then be described by their major TE component (defined with the subfamily name) and minor TE components, based on the proportion of each TE within the transcript. The major TE components were used to classify chimeric TE-alone transcripts into different TE subfamilies, families, and types.
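The three-way classification can be expressed as a short decision rule. The sketch below is illustrative only: the function and argument names are hypothetical, the inputs are assumed to have been computed beforehand from the coordinate comparison, and treating overlaps shorter than 10 bp as "gene" is an assumption made here because the text does not address that case.

```python
def classify_transcript(te_fraction, first_exon_in_te, te_overlap_bp):
    """Classify a transcript as 'TE-alone', 'TE-gene' or 'gene'.

    te_fraction:      fraction of the transcript embedded in TE regions (0-1)
    first_exon_in_te: True if the first exon is covered by a TE
    te_overlap_bp:    total overlap with the TE annotation, in bp
    """
    if te_fraction > 0.9 and first_exon_in_te:
        return "TE-alone"   # TE promoter-driven transcription
    if te_overlap_bp >= 10:
        return "TE-gene"    # chimeric transcript
    return "gene"           # assumption: overlaps below 10 bp count as no overlap
```

Note that a transcript that is more than 90% TE-derived but whose first exon lies outside any TE falls into the TE-gene class under this rule.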

Manual annotation of TE transcripts and genomic loci

By using Integrated Genome Viewer (IGV) (v2.16.0)⁷⁶, we compared gene annotation, TE annotation, TALON TE transcript annotation and read alignment together to manually check if the transcripts are truly transcribed from corresponding TE loci. A curated GTF annotation file recording the genomic coordinate information about TE loci and TE transcripts was created for further use.

Comparison between short read and long read-based RNA-sequencing

Both Iso-Seq data (CCS reads) and Illumina RNA-seq data from mouse embryonic development were retrieved from NCBI (SRP225196), as reported in a previous study²⁸. We applied the same pipeline to the mouse long-read data as described above for zebrafish, except that manual annotation was omitted for this initial comparison.

Clean reads from Illumina RNA-seq data were aligned onto zebrafish (GRCz11) and mouse (GRCm39) reference genomes by HISAT2 (v2.1.0)⁷⁷ with “-k = 1”, respectively. Alignment results from each sample were then merged into a combined, sorted BAM file with SAMtools (v1.9)⁷⁸. A combined transcript annotation was generated by using StringTie (v1.3.5)⁷⁹ and based on this annotation, sample-specific quantification was conducted by StringTie with “-e” parameter.

Both the number and the length of expressed TE-alone transcripts were compared between two sequencing technologies. Two BED files with expressed TE-alone loci annotation by two approaches were prepared to detect the overlapped TE-alone loci (detected by both long-read and short-read data) by intersect function in BEDTools (v2.30.0)⁸⁰.

A conventional approach was applied to define ZGA and ZTA during zebrafish embryogenesis with short-read RNA-seq data spanning the following five developmental stages: fertilized egg, 1-cell, 64-cell, 1k-cell, and shield. We first aligned short reads onto zebrafish genome using HISAT2 with default parameters and then quantified the gene/TE expression using StringTie with a combined annotation (Ensembl gene annotation and RepeatMasker annotation for zebrafish reference genome). Finally, we calculated the total TPM for all zygotic genes and TEs, respectively at each stage.

LTR structure analysis

Based on our annotation, we defined an LTR locus as intact if it contains the 5’LTR, internal coding sequence, and 3’LTR. A locus lacking more than one of these portions was defined as truncated.

Conserved domain analysis on TE-alone transcripts

Conserved domains for DNA transposons, LTRs, LINEs and ERVs were defined according to the previous description⁸¹. For TE-alone transcripts, open reading frames (ORFs) were predicted with TransDecoder (https://github.com/TransDecoder/TransDecoder) and protein sequences were generated to annotate conserved domains by HMMER (v3.3.1) with Pfam database (version 35.0)⁸². TE-related domains were collected and compared within different types of TEs.

Define maternally deposited transcripts and zygotically activated transcripts

For TE-alone, TE-gene and gene transcripts, we defined a transcript as maternally deposited if it had TPM > 0 at either the fertilized egg or the 1-cell stage. Otherwise, the transcript was defined as zygotically activated.
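This rule is simple enough to state as code. The sketch below assumes expression values are held in a dictionary keyed by stage name; the stage labels and function name are hypothetical.

```python
EARLY_STAGES = ("fertilized egg", "1-cell")


def maternal_or_zygotic(tpm_by_stage):
    """Label a transcript by its expression at the two earliest stages.

    tpm_by_stage: dict mapping stage name -> TPM; missing stages count as 0.
    """
    if any(tpm_by_stage.get(stage, 0.0) > 0 for stage in EARLY_STAGES):
        return "maternal"
    return "zygotic"
```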

Whole mount in situ hybridization

For WISH experiment, we developed a specific probe design pipeline for TE subfamily. Simply, we extracted sequences for all transcripts from the target TE subfamily and aligned these sequences against all other transcripts (both TE-derived and gene-derived transcripts) according to our transcriptome annotation. Based on the alignment results, any overlapped sequences were detected and removed, and the remaining sequences were further processed with multiple sequence alignment by MAFFT (v7.455)⁸³ with default parameters. The consensus TE subfamily-specific sequences were extracted for WISH probe design.

The length of the probe for in situ hybridization was generally designed to be between 500–800 bp according to the length of unique regions, with the T7 promoter sequence directly added to the reverse primer of the probe. cDNA synthesized from reverse transcription was used as a template for PCR amplification of the target fragment. The PCR products were then checked by gel electrophoresis, and the desired bands were recovered from gel extraction. The recovered product is then used as a template for in vitro transcription to synthesize the probe, typically at 37 °C for 2 h.

Collect embryos at desired stages, and fix overnight in 4% paraformaldehyde at 4 °C. Next day, replace paraformaldehyde with PBS, then dehydrate using 50%, 70%, and 100% methanol for 5 min each at room temperature. Store at −20 °C for 2 h. For in situ hybridization, rehydrate embryos. Shift to RNase-free tubes and add DEPC + PBST gradually. Set hybridization temperature (55–65 °C). Pre-hybridization: 300 µl HYB−, then 300 µl HYB+ for 4 h + at the hybridization temperature. Add 1% probe diluted in HYB+, heat to 70 °C for 10 min, and add to the tube. Hybridize for 16 h at the hybridization temperature. Store retrieved probe at −20 °C. Clean by washing 50% formamide/2×SSCT (30 min), 2×SSCT (15 min), 0.2×SSCT (30 min) at hybridization temperature. Wash thrice with MABT for 5 min. Transfer to a 48-well plate, add substrate with levamisole, and stain at 28 °C. Monitor under a microscope, and stop reaction with 4% paraformaldehyde when desired.

RNA-seq data analysis with zebrafish mutants and treated samples

We downloaded RNA-seq data for zebrafish embryo samples at 4, 6, and 8 hpf, including WT, α-amanitin injection, cycloheximide treatment, triple LOF and mRNA rescue samples (PRJNA206070). Total RNA-seq data were first aligned onto zebrafish rRNA references to filter rRNA reads. Then, both poly(A) capture RNA-seq data and filtered total RNA-seq data were quantified using RSEM (v1.3.3)⁸⁴, with our long read-based sample-specific transcriptome reference. For each developmental stage and condition, the expression between WT and treatment was compared following log₂(TPM + 1) transformation. Genes with low expression levels (TPM < 1) in WT samples were excluded from this analysis.
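One way to carry out such a comparison is sketched below. The text specifies only the log₂(TPM + 1) transformation and the WT expression filter; taking the difference of the transformed values is an assumption made here for illustration, and the function name is hypothetical.

```python
import math


def log2_change(wt_tpm, treated_tpm, min_wt_tpm=1.0):
    """Return log2(treated + 1) - log2(WT + 1).

    Returns None when WT expression is below the filter (TPM < 1),
    mirroring the exclusion of lowly expressed genes.
    """
    if wt_tpm < min_wt_tpm:
        return None
    return math.log2(treated_tpm + 1) - math.log2(wt_tpm + 1)
```

A negative value indicates lower expression under treatment than in WT; the pseudocount of 1 keeps the transformation defined when the treated TPM is zero.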

Binding sites prediction of mxtx1/2 on zebrafish ERV promoters

Quantification of mxtx1/2 expression was performed using both short-read data (obtained from Expression Atlas on the EBI website) and long-read RNA-seq data from this study. The prediction of binding motifs of mxtx1/2 was conducted with CIS-BP⁸⁵ (http://cisbp.ccbr.utoronto.ca/). As direct ChIP-seq data for mxtx1/2 were not available in the database, binding motifs were indirectly inferred by considering closely related TFs in human and mouse. The ortholog relationship between Mxtx1/2 and DUX4 was established by OrthoFinder and PANTHER as the reciprocal best alignment (see https://www.alliancegenome.org/gene/ZFIN:ZDB-GENE-000710-7#orthology). For each ERV subfamily, the 5’LTR sequence was extracted, and potential binding TFs on these LTR sequences (promoter regions) were predicted using FIMO within the MEME suite⁸⁶. Predicted binding motifs of mxtx1/2 were subsequently compared with predicted TF motifs on ERV promoters to evaluate mxtx1/2’s binding potential on ERV elements.

Previously published ChIP-seq data for zebrafish Nanog and Mxtx2³⁴ were analyzed to identify genome-wide binding sites. The clean data were aligned onto the reference genome with Bowtie2 (v2.5.1)⁸⁷ and peak calling was performed using MACS2 (v2.2.9.1)⁸⁸ with default parameters. This resulted in 3534 and 33,111 peaks for Mxtx2 and Nanog, respectively. We further intersected these peaks with our TE annotation and found that 43 Mxtx2 and 164 Nanog peaks overlapped with TE-alone loci. Of the 43 Mxtx2 TE-related binding sites, 40 are ERVs and the other 3 are DNA transposons. In contrast, Nanog binds more diverse types of TEs, including 126 LTRs, 24 DNA transposons and 14 LINEs.

TE activation during early mouse embryonic development

We utilized public Iso-Seq data (see above) covering mouse oocytes and embryos at the following developmental stages (1-cell, 2-cell, 4-cell, 8-cell, and blastocyst) for TE identification, with the same standard applied in zebrafish. In total, 28,939 full-length transcripts were identified and further classified into 20,321 gene transcripts, 7170 TE-gene chimeras and 1448 TE-alone transcripts.

Transcript quantification was performed using short-read RNA-seq data from the aforementioned stages. RSEM was employed with the established long-read transcripts as a reference and with Bowtie 2 as the aligner with default parameters. Additionally, mouse single-cell embryo RNA-seq data spanning time points of 0, 2, 4, 6, 8, 10, and 12 hpf were downloaded from NCBI (PRJNA662943). Adapter sequences, along with six bases from the 5’ end and five bases from the 3’ end, were removed using Cutadapt (v1.18) (-u 6 -u -5). The clean data were used for transcript quantification using the same procedure. Transcripts per million (TPM) values were collected at each stage to create an expression matrix, and transcripts with TPM < 1 at all stages were filtered out as lowly expressed. To quantify the abundance of each TE subfamily, TPMs were summed across all TE transcripts within the same subfamily. Mean TPMs were calculated from replicates at each time point. TE subfamilies with TPM > 5 at the 1-cell or 0 hpf stage were classified as maternally inherited; all others were categorized as zygotic TEs. To determine the activation stage of a TE subfamily, we required the expression to be <5 TPM at the previous stage and ≥5 TPM at the tested stage, followed by a continuous increase in at least two following stages.
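The activation-stage rule can be sketched as follows. This reflects one reading of the criterion, in which "a continuous increase in at least two following stages" means that expression rises strictly across the two stage transitions after the tested stage; the function name is hypothetical, and maternally inherited subfamilies are assumed to have been set aside beforehand.

```python
def activation_stage(tpms, cutoff=5.0, n_following=2):
    """Return the index of the activation stage, or None if none qualifies.

    tpms: mean TPMs of one TE subfamily, ordered by developmental stage.
    A stage i qualifies if tpms[i-1] < cutoff <= tpms[i] and expression
    increases strictly over the next n_following stage transitions.
    """
    for i in range(1, len(tpms) - n_following):
        crosses = tpms[i - 1] < cutoff <= tpms[i]
        rising = all(tpms[j] < tpms[j + 1] for j in range(i, i + n_following))
        if crosses and rising:
            return i
    return None
```

Under this reading, a subfamily that first crosses the cutoff within the last two sampled stages cannot be assigned an activation stage, because the required follow-up stages are not available.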

Examine the regulatory sequences of TE-alone loci from BHIKHARI subfamily

We extracted 5’LTR sequences from all loci within the BHIKHARI-I subfamily and conducted multiple sequence alignment using MAFFT. The structure and conserved motifs of the 5’LTR sequences were identified based on a previous publication³⁵. We also aligned the 5’LTR from BHIKHARI-I_DR against the 5’LTRs from BHIKHARI-3-I_DR and BHIKHARI-5-I_DR, and detected very low sequence identity.

Expression analysis of TE-gene and gene transcripts

Maternal and newly activated transcripts were defined for TE-gene and gene transcripts with the same criteria as TE-alone. For each stage, the total TPM and total number of activated transcripts were calculated. For TE-gene transcripts, we further divided them into five subgroups (0–20%, 20–40%, 40–60%, 60–80%, and 80–100%) according to TE proportion in the sequences.
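The binning of TE-gene transcripts by TE proportion can be sketched as below. The handling of values that fall exactly on a boundary (for example 20%) is not specified in the text; assigning them to the upper bin is an assumption of this sketch, and the function name is hypothetical.

```python
BIN_LABELS = ["0-20%", "20-40%", "40-60%", "60-80%", "80-100%"]


def te_proportion_bin(te_fraction):
    """Map the TE fraction of a transcript (0-1) to one of five 20%-wide bins."""
    index = min(int(te_fraction * 5), 4)  # a fraction of 1.0 stays in the top bin
    return BIN_LABELS[index]
```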

Expression analysis on family/subfamily/locus-levels

Locus-level expression abundance was calculated by summing the expression values of all transcripts derived from the loci. For expression analysis on the family and subfamily level, we calculated the sum of abundance for all TE-alone transcripts belonging to a certain TE family or subfamily.

Expression trajectory analysis on the locus/transcript level

The R package “pheatmap” was used to perform unsupervised clustering and plot heatmaps. We used “Ward.D” as the clustering method and the clusters were identified using “cutree” method. Then, the median expression value of transcripts from each cluster was used to represent the average expression level. To estimate the expression complexity of TE-alone transcripts within each subfamily, we calculated the entropy of active TE-alone subfamilies. Specifically, we first calculated the number of TE-alone transcript expression patterns for each TE-alone subfamily. Then, the function “entropy” from the R package “entropy” was used to calculate the entropy for each TE-alone subfamily.
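For readers working outside R, the entropy calculation can be sketched in Python. This assumes the plug-in (maximum-likelihood) Shannon entropy in natural-log units over the counts of transcripts per expression cluster; which estimator and unit were used in the study is not stated, so both are assumptions here.

```python
import math
from collections import Counter


def shannon_entropy(cluster_labels):
    """Shannon entropy (nats) of the expression-cluster labels of one subfamily.

    cluster_labels: one cluster label per TE-alone transcript in the subfamily.
    """
    counts = Counter(cluster_labels)
    n = sum(counts.values())
    return -sum((c / n) * math.log(c / n) for c in counts.values())
```

A subfamily whose transcripts all share one expression pattern has entropy 0, while one whose transcripts are spread evenly over k patterns reaches the maximum of ln(k).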

For TE subfamilies with multiple transcripts, we summarized the number of activation stages and expression clusters covered. For each TE subfamily, we determined the activation stage of each transcript and recorded the number of different stages the subfamily covers. The same strategy was applied to count the number of expression clusters across which each subfamily is distributed.

Subcellular localization analysis with nuclear and cytosolic RNA-seq data

While direct comparison of RNA abundance between the nucleus and cytoplasm is not typically feasible⁸⁹, we can calculate the relative nucleus/cytoplasm (N/C) proportion to indirectly compare the subcellular localization of gene transcripts, TE-gene transcripts, and TE-alone transcripts. A publicly available nuclear and cytosolic RNA-seq dataset⁹⁰ covering the 1k-cell, dome, and shield stages was downloaded (PRJNA599208), processed, and mapped onto our combined transcriptome. Both gene and transcript expression were quantified with RSEM. To compare the subcellular localization among TE-alone, TE-gene, and gene transcripts, we calculated a relative N/C proportion for each group at each stage. We first calculated the total abundance of TE-alone, TE-gene, and gene transcripts in the nucleus and cytoplasm, respectively; we then calculated the proportion of the total abundance contributed by each group within the nucleus and within the cytoplasm. Finally, the relative N/C proportion was measured by dividing the proportion in the nucleus by the proportion in the cytoplasm for each group. The same approach was applied to calculate the relative N/C proportion for each TE subfamily. Boxplots were created, depicting Q1 (25th percentile), Q3 (75th percentile), the maximum (Q3 + 1.5 × IQR), IQR (interquartile range), and the minimum (Q1 − 1.5 × IQR), with medians represented by center lines unless otherwise specified.
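The relative N/C proportion described above can be sketched as follows. The function name is hypothetical and the numbers in the usage example are made up purely to illustrate the arithmetic.

```python
def relative_nc(nuclear, cytosolic):
    """Relative nucleus/cytoplasm proportion per transcript group.

    nuclear, cytosolic: dicts mapping group name -> total abundance (TPM)
    of that group in the respective compartment at one stage.
    """
    nuclear_total = sum(nuclear.values())
    cytosolic_total = sum(cytosolic.values())
    result = {}
    for group in nuclear:
        nuclear_prop = nuclear[group] / nuclear_total        # share within the nucleus
        cytosolic_prop = cytosolic[group] / cytosolic_total  # share within the cytoplasm
        result[group] = nuclear_prop / cytosolic_prop
    return result


# Toy values, not data from the study.
toy = relative_nc(
    {"TE-alone": 20.0, "TE-gene": 30.0, "gene": 50.0},
    {"TE-alone": 5.0, "TE-gene": 15.0, "gene": 80.0},
)
```

A value above 1 means the group makes up a larger share of the nuclear pool than of the cytosolic pool. Because each compartment is normalized to its own total, the ratio compares shares rather than absolute abundances across compartments.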

To test the prediction performance on subcellular localization with relative N/C proportion, we calculate relative N/C proportion with long intergenic non-coding RNAs (lincRNAs) and protein-coding genes at each stage, based on the prior knowledge that lincRNAs tend to localize inside the nuclei and protein-coding genes tend to localize in the cytoplasm. We first extracted all annotated lincRNAs and protein-coding genes according to zebrafish Ensembl annotation. Then, we removed maternally deposited lincRNAs and protein-coding genes and calculated the relative N/C proportion following the approach previously described.

Fluorescence in situ hybridization

Candidate TE subfamilies were selected for subcellular localization validation using FISH. Representative TE subfamilies with moderate to high expression levels and different relative N/C proportion from DNA transposons, LTR and LINEs were used for experimental validation. Specific probes for each candidate subfamily were designed with the same strategy as WISH probe design described above.

Embryos were prepared according to the WISH protocol. Sequentially, embryos underwent hybridization rounds: Anti-digoxigenin antibody (Digoxigenin-AP antibody, Catalog#: 11093274910, Roche) (POD-conjugated) was diluted 1:1000 in blocking solution and incubated overnight at 4 °C with rotation. Post-incubation, samples were switched to MABT solution and washed thrice for 30 min at room temperature. Subsequently, a single 5-min wash with PBST was performed. For signal enhancement, Cy3-labeled tyramide reagent (PerkinElmer, 1:50 in amplification dilution buffer) was applied to embryos, followed by a 20-min dark incubation and a 10-min PBST wash. Blocking involved 1% BSA and 10% inactivated goat serum in PBST at room temperature for 1 h. β-catenin (L54E2) mouse antibody (1:200) (Catalog#: 2677, Cell Signaling Technology) was applied and incubated overnight at 4 °C. Afterward, samples underwent three 15-min PBST washes. A secondary antibody (Alexa Fluor 488 AffiniPure Goat anti-Mouse IgG, Catalog#: 115-545-003, Jackson ImmunoResearch Labs) was applied and incubated at room temperature for 2 h. Post-secondary antibody incubation, DAPI (0.5 μg/ml) was used for a 10-min incubation at room temperature. The samples were then subjected to three 15-min PBST washes. Following PBST removal, samples were mounted with an anti-fluorescence quenching mounting agent suitable for confocal imaging.

Short-read RNA-seq quantification of TE-alone loci beyond the shield stage

A benchmark RNA-seq dataset over the early development of zebrafish was downloaded from European Nucleotide Archive (PRJEB7244, PRJEB12296 and PRJEB12982)¹⁹. For each developmental stage, we quantified the expression of TE-alone transcripts using RSEM (v1.3.3) based on our long read-established transcriptome.

Epigenetic regulation on activation and repression of TE-alone loci

ATAC-seq data were downloaded from NCBI (GSE130944)⁴² and aligned onto the genome with Bowtie2 (v2.5.1)⁸⁷ with parameters “--very-sensitive --no-mixed --no-unal -X 2000”. PCR duplicates were removed with the “MarkDuplicates” function in GATK4. Uniquely mapped reads were extracted with SAMtools with “-q 30”. BAM files were converted into “bw” files with the “bamCoverage” function in deepTools (v2.0)⁹¹.

H3K4me3 and H3K27ac data were downloaded from NCBI (PRJNA473799 and PRJNA434216) and analyzed with the same method used for ATAC-seq data.

H3K9me3 ChIP-seq data were downloaded from NCBI (PRJNA449956)⁴⁴ and mapped onto the genome with Bowtie (v1.3.1)⁹² with “-M 1 --best --strata”. PCR duplicates were marked and removed with MarkDuplicates. ChIP and control BAM files were converted into a “bw” file with the “bamCompare” function in deepTools.

Small RNA data were downloaded from NCBI (PRJNA215266)⁴⁵ and were aligned using the same strategy as ChIP-seq data. BAM files were converted into a “bw” file with “bamCoverage” function in deepTools. Besides mapping onto the genome, we also aligned the data onto our TE-alone transcripts.

Heatmap visualization for the above datasets was generated by “computeMatrix” and “plotHeatmap” functions in deepTools with default parameters. BED files recording 5’ TSS to 3’ TTS used for deepTools were generated according to our transcriptome annotation.

We downloaded the zebrafish piRNA collection from piRBase (release v3.0)⁹³. Small RNA reads mapped onto TE-alone regions were extracted and aligned against the zebrafish piRNA database using Bowtie (v1.3.1) with default parameters. The percentages of annotated and unannotated reads were summarized for each stage. We investigated several sequence features of piRNAs, including the total number of mapped reads, the read length distribution, and the frequencies of uridine at the 5’ end and adenine at the tenth base. Sequence logos were created using WebLogo 3 (ref. ⁹⁴).
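The sequence features listed above (read count, length distribution, 5’ uridine and adenine at the tenth base) can be summarized with a few lines of code. The sketch below is illustrative only: the function name and the input format (a list of read sequences) are assumptions, and DNA-encoded reads are converted so that T is counted as U.

```python
from collections import Counter


def pirna_features(reads):
    """Summarize read count, length distribution, 1U and 10A frequencies."""
    # Normalize case and treat sequenced T as U; skip empty strings.
    reads = [r.upper().replace("T", "U") for r in reads if r]
    n = len(reads)
    if n == 0:
        return {"n_reads": 0, "length_counts": {}, "frac_1U": 0.0, "frac_10A": 0.0}
    length_counts = Counter(len(r) for r in reads)
    first_u = sum(1 for r in reads if r[0] == "U")
    # Position 10 is index 9; reads shorter than 10 nt cannot contribute.
    tenth_a = sum(1 for r in reads if len(r) >= 10 and r[9] == "A")
    return {
        "n_reads": n,
        "length_counts": dict(length_counts),
        "frac_1U": first_u / n,
        "frac_10A": tenth_a / n,
    }


if __name__ == "__main__":
    example = ["T" + "A" * 9 + "GG", "U" + "C" * 9 + "GG", "A" + "C" * 8 + "A" + "GG"]
    print(pirna_features(example))
```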

“piRNA” as a keyword was used to search for the relevant gene names on The Zebrafish Information Network (ZFIN) (https://zfin.org). Genes involved with piRNA-dependent silencing, including piwil1 and piwil2, were recorded and quantified with short-read RNA-seq data previously described.

Functional analysis of the unannotated genes

We combined the transcripts from novel gene and TE-gene loci and prepared sequences for 2895 novel transcripts for several downstream analyses. First, we predicted the coding potential for these transcripts using CPC⁹⁵ (https://cpc.gao-lab.org/), which resulted in 1622 protein-coding and 1273 non-coding RNAs. Second, we conducted functional analysis on the 1622 protein-coding mRNAs. We extracted the protein sequences of the longest mRNA to represent each gene and ran InterProScan (v5.61-93.0)⁹⁶ to search the Pfam domains. In addition, we aligned protein sequences of all novel genes against all protein sequences of human and zebrafish and identified 367 genes with homologs in both or either species.

TE-related TF prediction

To computationally predict the TFs potentially regulating TE expression or repression, we first ran the FIMO function within the MEME Suite to search for known motifs in 550 TE loci and identified 486 TFs with potential binding sites (p ≤ 0.00001) in the TE promoter regions. We then investigated the expression pattern of 2546 annotated TFs (downloaded from AnimalTFDB 4.0)⁶⁹ during early embryonic development in zebrafish and found that 2120 TFs have expression ≥1 TPM at ≥1 stage. Of 486 potential TE-regulating TFs, 469 are highly expressed during early development in zebrafish. A heatmap was generated to illustrate the expression pattern of these TFs.
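The two filters described above, a motif p-value cutoff followed by an expression cutoff, can be sketched as follows. The input structures (tuples of motif hits and a dictionary of per-stage TPM values) and the function names are hypothetical. The sketch also assumes that "highly expressed" refers to the ≥1 TPM at ≥1 stage criterion, which the text does not state explicitly.

```python
def expressed_tfs(tpm_by_tf, min_tpm=1.0, min_stages=1):
    """TFs reaching min_tpm in at least min_stages developmental stages."""
    return {
        tf for tf, tpms in tpm_by_tf.items()
        if sum(1 for value in tpms if value >= min_tpm) >= min_stages
    }


def candidate_regulators(motif_hits, tpm_by_tf, p_cutoff=1e-5):
    """Return (TFs with a binding site, those that are also expressed).

    motif_hits: iterable of (tf, te_locus, p_value) tuples, e.g. parsed
    from a motif-scanning result table (format assumed here).
    """
    with_sites = {tf for tf, _locus, p in motif_hits if p <= p_cutoff}
    return with_sites, with_sites & expressed_tfs(tpm_by_tf)


if __name__ == "__main__":
    hits = [("tfA", "locus1", 1e-6), ("tfB", "locus1", 1e-4), ("tfC", "locus2", 1e-5)]
    tpm = {"tfA": [0.0, 2.5], "tfB": [10.0, 10.0], "tfC": [0.2, 0.9]}
    print(candidate_regulators(hits, tpm))
```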

The comparison of RNA-seq data generated by different library preparations

Two publicly available RNA-seq datasets (PRJNA624126 and PRJNA529241) covering the zebrafish early development generated by both poly(A) capture and ribosomal RNA depletion library preparation methods were downloaded and quantified using Salmon (v1.10.0)⁹⁷. The TPMs estimated by different library preparation methods were compared and displayed in scatter plots for each transcript. For the data with biological replicates (PRJNA529241), we further applied DESeq2 (Log₂FC > 2 & FDR < 0.01) to identify significantly differentially expressed genes between two methods.

Evolutionary analysis on TE-alone loci

Kimura divergence (KD) values for each copy were retrieved from RepeatMasker results. Among DNA transposons, LTRs and LINEs, we compared the KD distribution between active and silent copies. For LTRs, we also compared the KD distribution between active intact and silent intact copies. Truncated and intact TE copies were determined by sequence comparison with the consensus sequence of the corresponding subfamily. We required an intact copy to cover at least 80% of the consensus sequence, with a 5’ end loss of no more than 30 bp and a 3’ end loss of no more than 100 bp. For each stage, Pearson’s correlation coefficients were calculated between KD and TPM.
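The intact-copy criteria (at least 80% of the consensus, 5’ loss of at most 30 bp, 3’ loss of at most 100 bp) translate directly into a small predicate. In the sketch below, the function name and the input convention (1-based consensus coordinates covered by the copy) are assumptions, and coverage is taken as the covered span without accounting for internal gaps.

```python
def is_intact(cons_start, cons_end, consensus_len,
              min_coverage=0.80, max_5p_loss=30, max_3p_loss=100):
    """Decide whether a TE copy counts as intact.

    cons_start, cons_end: 1-based, inclusive positions on the subfamily
    consensus that the copy aligns to (assumed input convention).
    """
    covered = cons_end - cons_start + 1
    loss_5p = cons_start - 1              # consensus bases missing at the 5' end
    loss_3p = consensus_len - cons_end    # consensus bases missing at the 3' end
    return (
        covered / consensus_len >= min_coverage
        and loss_5p <= max_5p_loss
        and loss_3p <= max_3p_loss
    )


if __name__ == "__main__":
    print(is_intact(31, 4900, 5000))   # at both end-loss limits
    print(is_intact(32, 5000, 5000))   # 31 bp missing at the 5' end
```

With these thresholds, the 80% coverage requirement only becomes the limiting condition for short consensus sequences; for long elements the end-loss limits are stricter.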

A de novo LTR annotation was conducted with LTR_retriever⁹⁸. The insertion times (in years) of intact LTRs were calculated from the sequence divergence between the two LTRs of each element as T = K/(2μ), where K is the divergence and μ is the neutral mutation rate. K was estimated with the Jukes-Cantor model (1969) as K = −3/4 × ln(1 − 4d/3), where d is the proportion of sequence differences (d = 100% − identity%), and a neutral rate of 1.46 × 10⁻⁸ in fish genomic evolution was used⁹⁹. We compared this de novo annotated LTR dataset with our expressed LTR annotation to find the common copies between the two datasets. The density profiles of evolutionary ages for both activated LTR and silent LTR groups were drawn and compared with the Kolmogorov–Smirnov test.
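As a worked illustration of the dating formula, the sketch below computes the Jukes-Cantor distance and the insertion time from the percent identity between the two LTRs of one element. The function names are hypothetical, and the rate is assumed to be expressed per site per year.

```python
import math

# Neutral rate quoted in the text; assumed to be per site per year.
NEUTRAL_RATE = 1.46e-8


def jukes_cantor_distance(identity_pct):
    """K = -3/4 * ln(1 - 4d/3), with d = 1 - identity."""
    d = 1.0 - identity_pct / 100.0
    if d >= 0.75:
        raise ValueError("Jukes-Cantor distance is undefined for d >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * d / 3.0)


def insertion_time_years(identity_pct, mu=NEUTRAL_RATE):
    """T = K / (2 * mu); both LTRs accumulate substitutions independently."""
    return jukes_cantor_distance(identity_pct) / (2.0 * mu)


if __name__ == "__main__":
    for identity in (100.0, 99.0, 98.0, 95.0):
        print(identity, round(insertion_time_years(identity)))
```

The factor of two in the denominator reflects that the two LTRs were identical at insertion and have diverged along two lineages since then.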

We downloaded genome assemblies for additional three zebrafish strains, including AB (PRJEB38589), Nadia (PRJEB38577) and CB (PRJEB38573) from NCBI for comparative genomic analysis. To identify the potential orthologous regions of expressed TE-alone loci on TU reference from these three strains, we extracted 25 kb upstream and downstream of TE-alone loci and aligned the sequences (50 kb surrounding sequences plus TE-alone sequences) against AB, Nadia and CB genomes with BLASTN (v2.13.0)¹⁰⁰. The best-aligned regions were identified, extracted and then validated by “nucmer” function within MUMMER (v3.23)¹⁰¹. Dot plots were generated by “mummerplot” function within MUMMER. The presence and absence of TE-alone loci on each genome were assayed and assigned to corresponding evolutionary splits across the phylogeny of four zebrafish strains modified according to the previous study¹⁰². TE-alone loci were then classified into four major groups according to this comparative analysis and evolutionary relationship among strains: TU (TE-alone loci existing in only TU genome), TU-AB (presence in TU and AB genomes), TU-AB-Nadia (presence in TU, AB and Nadia genomes) and TU-AB-Nadia-CB (presence in all four genomes), from evolutionarily young to old. Further, we combined TU and TU-AB into the young group and TU-AB-Nadia and TU-AB-Nadia-CB into the old group for comparison on ZTA pattern.
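The presence/absence grouping described above can be expressed as a small lookup. In this sketch the function name is hypothetical, and loci whose presence pattern does not fit the nested TU, TU-AB, TU-AB-Nadia, TU-AB-Nadia-CB series are returned as unclassified, which is an assumption, since the text does not say how such loci were handled.

```python
STRAIN_ORDER = ["TU", "AB", "Nadia", "CB"]


def classify_locus(present_in):
    """Map the set of strains carrying a locus to (group, age class).

    Returns (None, None) for patterns outside the nested series
    (handling of such loci is assumed, not taken from the text).
    """
    present = set(present_in)
    for i in range(1, len(STRAIN_ORDER) + 1):
        if present == set(STRAIN_ORDER[:i]):
            group = "-".join(STRAIN_ORDER[:i])
            # TU and TU-AB form the young group; the other two the old group.
            age = "young" if i <= 2 else "old"
            return group, age
    return None, None


if __name__ == "__main__":
    print(classify_locus({"TU"}))
    print(classify_locus({"TU", "AB", "Nadia", "CB"}))
```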

For four subgroups formed based on the evolutionary relationship among TU, AB, Nadia and CB, we tested the relationship between TPM and evolutionary ages. The activation and expression patterns between evolutionarily old and young groups were also tested. We calculated the number of TE-loci activated in each of the eight developmental stages and distributed in each of the eight clusters we established based on expression.

To determine which active TE-alone loci originated from TE transposition or segmental duplication, we assumed that segmental duplication would copy both the TE loci and flanking regions, while transposition may only duplicate TE loci and perhaps a small number of flanking bases. Therefore, by comparing the sequences of both the TE-alone loci and flanking regions between the target TE-alone locus and the corresponding homologous region, we can differentiate these two mechanisms: if both TE-alone loci and flanking regions show high similarity, the TE locus more likely originated from a segmental duplication event; otherwise, it more likely originated from a transposition.

To do so, we first conducted genome-wide searching for homologous regions of each TE-alone locus, by employing minimap2 (N = 1000) to output all aligned regions. From this alignment result, the best alignment was evaluated and selected as the potential donor for a given TE-alone locus, which will be used to further infer the origin mechanism. For a pair of TE-alone locus and an identified donor region, we extracted the TE sequences plus extension into the flanking regions from 1 bp to 10 kb and conducted pair-wise alignment using MUMMER. If both the TE locus and its two flanking regions show high identity, the origin mechanism would be classified as segmental duplication; otherwise, it was categorized as TE transposition. We used 500 bp flanking regions (both 5’ and 3’ ends of TE-alone locus) as a cutoff to calculate the proportion of these two mechanisms in contributing to active TE-alone loci evolution. For DNA transposon, LTR and LINE, we employed the Chi-squared test to investigate if there was a different preference in either segmental duplication or transposition as an amplification approach.
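The decision rule and the subsequent test can be sketched as below. The identity cutoff of 0.90 is a hypothetical value, since the text only requires "high identity", and the function names are illustrative. The second function computes the Pearson chi-squared statistic for a table of counts (for example, TE class by origin mechanism) and assumes that no row or column total is zero; it returns the statistic and degrees of freedom, not a p-value.

```python
def infer_origin(te_identity, flank5_identity, flank3_identity, min_identity=0.90):
    """Classify one locus/donor pair; min_identity is a hypothetical cutoff."""
    if min(te_identity, flank5_identity, flank3_identity) >= min_identity:
        return "segmental duplication"
    return "transposition"


def chi_squared(table):
    """Pearson chi-squared statistic and degrees of freedom for a count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


if __name__ == "__main__":
    print(infer_origin(0.99, 0.98, 0.97))
    print(infer_origin(0.99, 0.40, 0.95))
    # Rows: three TE classes; columns: the two mechanisms (made-up counts).
    print(chi_squared([[10, 20], [20, 10], [15, 15]]))
```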

Weighted correlation network analysis

We utilized the WGCNA (v1.72.1) package¹⁰³ in R (v4.2.3) to perform co-expression network analysis on 40 short-read RNA-seq samples from five stages (fertilized egg, 1-cell, 64-cell, 1k-cell and shield). The short-read RNA-seq data were quantified with kallisto (v0.46.0)¹⁰⁴, using our long read-based transcriptome annotation established above. Differential expression analysis between two neighboring stages was conducted with DESeq2 (v1.38.3)¹⁰⁵ using the filtering criteria of log₂(Fold Change) ≥ 1 and FDR ≤ 0.01. The resulting differentially expressed genes were used to establish a co-expression network with the “blockwiseModules” function in the WGCNA package. A pair-wise correlation matrix was computed for each set of genes, and an adjacency matrix was calculated by raising the correlation matrix to a power of 12, which was selected based on the scale-free topology criterion for network construction. The co-expression gene networks and modules were visualized and established with Cytoscape (v3.5.1)¹⁰⁶.
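The soft-thresholding step, raising the correlation matrix to a power of 12, can be illustrated in a few lines. This is a sketch and not the WGCNA implementation: it assumes an unsigned network (absolute Pearson correlation), which the text does not specify, and the function names and input format are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant profiles")
    return cov / math.sqrt(var_x * var_y)


def adjacency(expr, power=12):
    """Unsigned adjacency a_ij = |cor(i, j)| ** power for every gene pair.

    expr: dict mapping gene name to its expression values across samples.
    """
    genes = sorted(expr)
    return {
        (g, h): abs(pearson(expr[g], expr[h])) ** power
        for g in genes for h in genes
    }


if __name__ == "__main__":
    profiles = {"a": [1, 2, 3, 4], "b": [2, 4, 6, 8], "c": [1, 3, 2, 4]}
    adj = adjacency(profiles)
    print(adj[("a", "b")], adj[("a", "c")])
```

A high power suppresses moderate correlations strongly: in the example, a correlation of 0.8 between `a` and `c` is reduced to an adjacency below 0.07, while the perfectly correlated pair stays at 1.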

Gene ontology enrichment analysis

Gene sets were assayed with DAVID¹⁰⁷ (https://david.ncifcrf.gov/tools.jsp) for functional enrichment analysis. The DAVID results were downloaded and processed for dot-plot display with R (v4.2.3).

Homologous gene analysis

Gene domains were predicted by NCBI Conserved Domain Search Tools (https://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi)¹⁰⁸. Homologous genes were retrieved from NCBI by BLASTP with protein sequences of query genes¹⁰⁰.

Constructing phylogenetic trees

Multiple sequence alignments were conducted with protein sequences from homologous genes using MAFFT. Phylogenetic trees were established using FastTree (v2.1.11)¹⁰⁹ and drawn using iTOL (https://itol.embl.de/upload.cgi)¹¹⁰.

Design gRNA for CRISPR-Cas9 KO

CRISPR-Cas9-mediated mutagenesis was performed as previously reported¹¹¹. The guide RNA (gRNA) target site was selected with CHOPCHOP (v3)¹¹². The gRNA sequences used in this study are listed as follows:

zeat1: 5’-TAATACGACTCACTATAGggacccgctgcagctccaggGTTTTAGAGCTAGAAATAGC-3’.

zeat2: 5’-TAATACGACTCACTATAGggaccatctcccatggttggGTTTTAGAGCTAGAAATAGC-3’.

Each gRNA sequence contains the T7 promoter (upper case), a specific DNA-binding sequence (lower case), and a constant 19-nt tail (upper case). The gene-specific sequence was used in combination with the reverse universal sequence (5’-AAAAAAAGCACCGACTCGGTGCCAC-3’) to amplify the target sequence from the corresponding DNA.
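Because the three parts of each oligo are distinguished by letter case, they can be separated programmatically. The sketch below is a hypothetical helper, not part of the authors' workflow, that splits an oligo written as upper-lower-upper case into promoter, target and tail, using the zeat1 and zeat2 sequences given above.

```python
def split_grna_oligo(oligo):
    """Split an UPPER-lower-UPPER oligo into (promoter, target, tail)."""
    lower_idx = [i for i, base in enumerate(oligo) if base.islower()]
    if not lower_idx:
        raise ValueError("no lower-case target sequence found")
    start, end = lower_idx[0], lower_idx[-1] + 1
    if len(lower_idx) != end - start:
        raise ValueError("lower-case target sequence is not contiguous")
    return oligo[:start], oligo[start:end], oligo[end:]


OLIGOS = {
    "zeat1": "TAATACGACTCACTATAGggacccgctgcagctccaggGTTTTAGAGCTAGAAATAGC",
    "zeat2": "TAATACGACTCACTATAGggaccatctcccatggttggGTTTTAGAGCTAGAAATAGC",
}

if __name__ == "__main__":
    for name, oligo in OLIGOS.items():
        promoter, target, tail = split_grna_oligo(oligo)
        print(name, promoter, target, tail)
```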

Target validity detection and F₀ preparation

gRNA and Cas9 protein were diluted in nuclease-free water, gRNA at 50–100 ng/μl and Cas9 at 50 ng/μl. The mixture was microinjected into single-cell stage zebrafish embryos (50 injected, 10 as non-injected controls). After 24 h, 5 developing embryos were lysed in 50 μl 50 mM NaOH at 95 °C for 30 min, neutralized with 5 μl 1 M Tris-HCl (pH 8.0), and used for PCR assay. Gel electrophoresis was used to confirm PCR product size. T7E1 enzyme cleavage was applied to assess target efficiency. Positive PCR products were then sequenced. Upon confirmation, large-scale injections (around 200 embryos) yielded F₀ fish.

Mutant screening and propagation

F₀ zebrafish were bred with WT adults. F₁ embryos were collected at 24 h post-fertilization, with eight embryos per pair tested for mutations. Detected mutations led to the rearing of the remaining F₁ embryos. Upon maturity, F₁ fish underwent caudal fin clipping for mutation genotyping. Selected heterozygous F₁ adults were crossed to produce F₂ embryos, nurtured for a month, and genotyped. F₂ homozygous mutants produced F₃ embryos. Maternal mutants arose from F₂ homozygous females crossed with WT males, while maternal-zygotic mutants arose from the self-crossing of F₂ homozygous mutants. This method generated the heterozygous or maternal mutants discussed in this study.

Live imaging

One-cell embryos were injected with h1m-GFP protein (green) and WGA-Alexa 647 (red) and manually dechorionated. Embryos at the 64- to 128-cell stages were mounted in 5% methylcellulose, and images were taken on a Leica TCS SP8 microscope in an environmentally controlled chamber (26 °C).

qRT-PCR

The precursor transcript abundance of mitochondrial genes and pre-processed junctions were measured on RNA isolated from 10-dpf zebrafish using the miRNeasy RNA extraction kit (Qiagen). cDNA was prepared using reverse transcriptase (Promega) with random hexamers and used as a template in the subsequent PCR, which was performed on a Roche LightCycler 480 machine and analyzed following the 2^−ΔΔCt method. 18S rRNA was used as the reference gene for qRT-PCR. An unpaired t-test was performed with Prism 10.

Statistics and reproducibility

No statistical method was used to predetermine sample size. No data were excluded from the analyses. For long-read RNA-seq data, no biological replicates were used, but for short-read RNA-seq data, eight biological replicates were used to generate the data. Unless specifically stated, a boxplot shows the center as the median, the lower bound of the box as the first quartile (Q1), the upper bound of the box as the third quartile (Q3), and the lower whisker (Q1 − 1.5 × IQR) and upper whisker (Q3 + 1.5 × IQR) as minima and maxima.
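The boxplot convention can be made concrete with a short calculation. The sketch below follows the definition literally, placing the whiskers at Q1 − 1.5 × IQR and Q3 + 1.5 × IQR. The function name is hypothetical, and the quartile interpolation method ("inclusive") is an assumption, as plotting libraries differ in how they estimate quartiles.

```python
import statistics


def box_stats(values):
    """Median, quartiles and 1.5 x IQR whisker limits for one boxplot."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "lower_whisker": q1 - 1.5 * iqr,
        "upper_whisker": q3 + 1.5 * iqr,
    }


if __name__ == "__main__":
    print(box_stats([1, 2, 3, 4, 5]))
```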

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

All original sequencing data generated in this study have been submitted to the NCBI Sequence Read Archive (SRA) and are accessible under the BioProject accession number PRJNA1028258. The previously published data used in this study are listed with detailed descriptions in Supplementary Data. A video presentation introducing this project is available on the website (https://youtu.be/Bb-qBZZNQo4). Source data are provided with this paper.

Code availability

The code of the major analysis modules in this study (including data quality control, alignment, transcript identification and quantification, and TE-derived transcript identification and classification) is packaged into a bioinformatics pipeline called aTEA. aTEA is publicly available at https://github.com/Augroup/aTEA.

References

Waterston, R. H. et al. Initial sequencing and comparative analysis of the mouse genome. Nature 420, 520–562 (2002).
Hoyt, S. J. et al. From telomere to telomere: the transcriptional and epigenetic state of human repeat elements. Science 376, eabk3112 (2022).
Howe, K. et al. The zebrafish reference genome sequence and its relationship to the human genome. Nature 496, 498–503 (2013).
Miao, B. et al. Tissue-specific usage of transposable element-derived promoters in mouse development. Genome Biol. 21, 255 (2020).
Judd, J., Sanderson, H. & Feschotte, C. Evolution of mouse circadian enhancers from transposable elements. Genome Biol. 22, 193 (2021).
Cosby, R. L. et al. Recurrent evolution of vertebrate transcription factors by transposase capture. Science 371, eabc6405 (2021).
Modzelewski, A. J. et al. A mouse-specific retrotransposon drives a conserved Cdk2ap1 isoform essential for development. Cell 184, 5541–5558.e22 (2021).
Pontis, J. et al. Primate-specific transposable elements shape transcriptional networks during human development. Nat. Commun. 13, 7178 (2022).
Garcia-Perez, J. L., Widmann, T. J. & Adams, I. R. The impact of transposable elements on mammalian development. Development 143, 4101–4114 (2016).
Senft, A. D. & Macfarlan, T. S. Transposable elements shape the evolution of mammalian development. Nat. Rev. Genet. 22, 691–711 (2021).
Chang, N. C., Rovira, Q., Wells, J., Feschotte, C. & Vaquerizas, J. M. Zebrafish transposable elements show extensive diversification in age, genomic distribution, and developmental expression. Genome Res. 32, 1408–1423 (2022).
He, J. et al. Identifying transposable element expression dynamics and heterogeneity during development at the single-cell level with a processing pipeline scTE. Nat. Commun. 12, 1456 (2021).
Lee, H. J. et al. Epigenomic analysis reveals prevalent contribution of transposable elements to. Genome Res. 32, 1424–1436 (2022).
Chang, N. C. et al. Gag proteins encoded by endogenous retroviruses are required for zebrafish development. bioRxiv https://doi.org/10.1101/2024.03.25.586437 (2024).
Macfarlan, T. S. et al. Embryonic stem cell potency fluctuates with endogenous retrovirus activity. Nature 487, 57–63 (2012).
Lee, M. T., Bonneau, A. R. & Giraldez, A. J. Zygotic genome activation during the maternal-to-zygotic transition. Annu. Rev. Cell Dev. Biol. 30, 581–613 (2014).
Pálfy, M., Joseph, S. R. & Vastenhouw, N. L. The timing of zygotic genome activation. Curr. Opin. Genet. Dev. 43, 53–60 (2017).
Vastenhouw, N. L., Cao, W. X. & Lipshitz, H. D. The maternal-to-zygotic transition revisited. Development 146, dev161471 (2019).
White, R. J. et al. A high-resolution mRNA expression time course of embryonic development in zebrafish. Elife 6, e30860 (2017).
Lanciano, S. & Cristofari, G. Measuring and interpreting transposable element expression. Nat. Rev. Genet. 21, 721–736 (2020).
Bourque, G. et al. Ten things you should know about transposable elements. Genome Biol. 19, 199 (2018).
Sakashita, A. et al. Endogenous retroviruses drive species-specific germline transcriptomes in mammals. Nat. Struct. Mol. Biol. 27, 967–977 (2020).
Wicker, T. et al. A unified classification system for eukaryotic transposable elements. Nat. Rev. Genet. 8, 973–982 (2007).
Schulz, K. N. & Harrison, M. M. Mechanisms regulating zygotic genome activation. Nat. Rev. Genet. 20, 221–234 (2019).
Heyn, P. et al. The earliest transcribed zygotic genes are short, newly evolved, and different across species. Cell Rep. 6, 285–292 (2014).
Hadzhiev, Y. et al. The miR-430 locus with extreme promoter density forms a transcription body during the minor wave of zygotic genome activation. Dev. Cell 58, 155–170.e8 (2023).
Asami, M. et al. A program of successive gene expression in mouse one-cell embryos. Cell Rep. 42, 112023 (2023).
Qiao, Y. et al. High-resolution annotation of the mouse preimplantation embryo transcriptome using long-read sequencing. Nat. Commun. 11, 2653 (2020).
Lee, M. T. et al. Nanog, Pou5f1 and SoxB1 activate zygotic gene expression during the maternal-to-zygotic transition. Nature 503, 360–364 (2013).
Abe, K. I. et al. Minor zygotic gene activation is essential for mouse preimplantation development. Proc. Natl Acad. Sci. USA 115, E6780–E6788 (2018).
Liu, B. et al. The landscape of RNA Pol II binding reveals a stepwise transition during ZGA. Nature 587, 139–144 (2020).
Hendrickson, P. G. et al. Conserved roles of mouse DUX and human DUX4 in activating cleavage-stage genes and MERVL/HERVL retrotransposons. Nat. Genet. 49, 925–934 (2017).
De Iaco, A. et al. DUX-family transcription factors regulate zygotic genome activation in placental mammals. Nat. Genet. 49, 941–945 (2017).
Xu, C. et al. Nanog-like regulates endoderm formation through the Mxtx2-Nodal pathway. Dev. Cell 22, 625–638 (2012).
Vogel, A. M. & Gerster, T. Promoter activity of the zebrafish bhikhari retroelement requires an intact activin signaling pathway. Mech. Dev. 85, 133–146 (1999).
Holt, C. E. & Bullock, S. L. Subcellular mRNA localization in animal cells and why it matters. Science 326, 1212–1216 (2009).
Carlevaro-Fita, J. et al. Ancient exapted transposable elements promote nuclear enrichment of human long noncoding RNAs. Genome Res. 29, 208–222 (2019).
Babarinde, I. A. et al. Transposable element sequence fragments incorporated into coding and noncoding transcripts modulate the transcriptome of human pluripotent stem cells. Nucleic Acids Res. 49, 9132–9153 (2021).
Percharde, M. et al. A LINE1-nucleolin partnership regulates early development and ESC identity. Cell 174, 391–405.e19 (2018).
Levin, H. L. & Moran, J. V. Dynamic interactions between transposable elements and their hosts. Nat. Rev. Genet. 12, 615–627 (2011).
Zhu, W., Xu, X., Wang, X. & Liu, J. Reprogramming histone modification patterns to coordinate gene expression in early zebrafish embryos. BMC Genomics 20, 248 (2019).
Pálfy, M., Schulze, G., Valen, E. & Vastenhouw, N. L. Chromatin accessibility established by Pou5f3, Sox19b and Nanog primes genes for activity during zebrafish genome activation. PLoS Genet. 16, e1008546 (2020).
Zhang, B. et al. Widespread enhancer dememorization and promoter priming during parental-to-zygotic transition. Mol. Cell 72, 673–686.e6 (2018).
Laue, K., Rajshekar, S., Courtney, A. J., Lewis, Z. A. & Goll, M. G. The maternal to zygotic transition regulates genome-wide heterochromatin establishment in the zebrafish embryo. Nat. Commun. 10, 1551 (2019).
Yao, Y. et al. Systematic characterization of small RNAome during zebrafish early developmental stages. BMC Genomics 15, 117 (2014).
Akdogan-Ozdilek, B., Duval, K. L. & Goll, M. G. Chromatin dynamics at the maternal to zygotic transition: recent advances from the zebrafish model. F1000Res 9, F1000 Faculty Rev-299 (2020).
Wang, X., Ramat, A., Simonelig, M. & Liu, M. F. Emerging roles and functional mechanisms of PIWI-interacting RNAs. Nat. Rev. Mol. Cell Biol. 24, 123–141 (2023).
Au, K. F. et al. Characterization of the human ESC transcriptome by hybrid sequencing. Proc. Natl Acad. Sci. USA 110, E4821–E4830 (2013).
Durruthy-Durruthy, J. et al. The primate-specific noncoding RNA HPAT5 regulates pluripotency during human preimplantation development and nuclear reprogramming. Nat. Genet. 48, 44–52 (2016).
Martin, F. J. et al. Ensembl 2023. Nucleic Acids Res. 51, D933–D941 (2023).
Liu, W. et al. PHF8 mediates histone H4 lysine 20 demethylation events involved in cell cycle progression. Nature 466, 508–512 (2010).
Chong, Y. L., Zhang, Y., Zhou, F. & Roy, S. Distinct requirements of E2f4 versus E2f5 activity for multiciliated cell development in the zebrafish embryo. Dev. Biol. 443, 165–172 (2018).
Abrams, E. W. et al. Dynamic assembly of brambleberry mediates nuclear envelope fusion during early development. Cell 150, 521–532 (2012).
Ohkubo, A. et al. The FASTK family proteins fine-tune mitochondrial RNA processing. PLoS Genet. 17, e1009873 (2021).
Broughton, R. E., Milam, J. E. & Roe, B. A. The complete sequence of the zebrafish (Danio rerio) mitochondrial genome and evolutionary patterns in vertebrate mitochondrial DNA. Genome Res. 11, 1958–1967 (2001).
Wang, Y. et al. Single-molecule long-read sequencing reveals the chromatin basis of gene expression. Genome Res. 29, 1329–1342 (2019).
Shipony, Z. et al. Long-range single-molecule mapping of chromatin accessibility in eukaryotes. Nat. Methods 17, 319–327 (2020).
Altemose, N. et al. DiMeLo-seq: a long-read, single-molecule method for mapping protein-DNA interactions genome wide. Nat. Methods 19, 711–723 (2022).
Stergachis, A. B., Debo, B. M., Haugen, E., Churchman, L. S. & Stamatoyannopoulos, J. A. Single-molecule regulatory architectures captured by chromatin fiber sequencing. Science 368, 1449–1454 (2020).
Liu, Y. et al. DNA methylation-calling tools for Oxford Nanopore sequencing: a survey and human epigenome-wide evaluation. Genome Biol. 22, 295 (2021).
Dewannieux, M. & Heidmann, T. Role of poly(A) tail length in Alu retrotransposition. Genomics 86, 378–381 (2005).
Moran, J. V. et al. High frequency retrotransposition in cultured mammalian cells. Cell 87, 917–927 (1996).
Aanes, H. et al. Zebrafish mRNA sequencing deciphers novelties in transcriptome dynamics during maternal to zygotic transition. Genome Res. 21, 1328–1338 (2011).
Winata, C. L. et al. Cytoplasmic polyadenylation-mediated translational control of maternal mRNAs directs maternal-to-zygotic transition. Development 145, dev159566 (2018).
Jachowicz, J. W. et al. LINE-1 activation after fertilization regulates global chromatin accessibility in the early mouse embryo. Nat. Genet. 49, 1502–1510 (2017).
Kinisu, M. et al. Klf5 establishes bi-potential cell fate by dual regulation of ICM and TE specification genes. Cell Rep. 37, 109982 (2021).
Yang, P., Wang, Y. & Macfarlan, T. S. The role of KRAB-ZFPs in transposable element repression and mammalian evolution. Trends Genet. 33, 871–881 (2017).
Wells, J. N. et al. Transposable elements drive the evolution of metazoan zinc finger genes. Genome Res. 33, 1325–1339 (2023).
Shen, W. K. et al. AnimalTFDB 4.0: a comprehensive animal transcription factor database updated with variation and expression annotations. Nucleic Acids Res. 51, D39–D45 (2023).
Anderson, J. L. et al. Multiple sex-associated regions and a putative sex chromosome in zebrafish revealed by RAD mapping and population genomics. PLoS ONE 7, e40701 (2012).
Andrews, S. FASTQC: a quality control tool for high throughput sequence data. https://www.bioinformatics.babraham.ac.uk/projects/fastqc/ (2018).
Martin, M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 17, 3 (2011).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
Wyman, D. et al. A technology-agnostic long-read analysis pipeline for transcriptome discovery and quantification. bioRxiv https://doi.org/10.1101/672931 (2019).
Fu, L., Niu, B., Zhu, Z., Wu, S. & Li, W. CD-HIT: accelerated for clustering the next-generation sequencing data. Bioinformatics 28, 3150–3152 (2012).
Robinson, J. T. et al. Integrative genomics viewer. Nat. Biotechnol. 29, 24–26 (2011).
Kim, D., Paggi, J. M., Park, C., Bennett, C. & Salzberg, S. L. Graph-based genome alignment and genotyping with HISAT2 and HISAT-genotype. Nat. Biotechnol. 37, 907–915 (2019).
Li, H. et al. The Sequence Alignment/Map format and SAMtools. Bioinformatics 25, 2078–2079 (2009).
Pertea, M. et al. StringTie enables improved reconstruction of a transcriptome from RNA-seq reads. Nat. Biotechnol. 33, 290–295 (2015).
Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
Seberg, O. & Petersen, G. A unified classification system for eukaryotic transposable elements should reflect their phylogeny. Nat. Rev. Genet. 10, 276 (2009).
Mistry, J. et al. Pfam: the protein families database in 2021. Nucleic Acids Res. 49, D412–D419 (2021).
Katoh, K., Misawa, K., Kuma, K. & Miyata, T. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 30, 3059–3066 (2002).
Li, B. & Dewey, C. N. RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome. BMC Bioinforma. 12, 323 (2011).
Weirauch, M. T. et al. Determination and inference of eukaryotic transcription factor sequence specificity. Cell 158, 1431–1443 (2014).
Bailey, T. L., Johnson, J., Grant, C. E. & Noble, W. S. The MEME Suite. Nucleic Acids Res. 43, W39–W49 (2015).
Langmead, B. & Salzberg, S. L. Fast gapped-read alignment with Bowtie 2. Nat. Methods 9, 357–359 (2012).
Zhang, Y. et al. Model-based analysis of ChIP-Seq (MACS). Genome Biol. 9, R137 (2008).
Zhao, S., Ye, Z. & Stanton, R. Misuse of RPKM or TPM normalization when comparing across samples and sequencing protocols. RNA 26, 903–909 (2020).
Pillay, S., Takahashi, H., Carninci, P. & Kanhere, A. Antisense RNAs during early vertebrate development are divided in groups with distinct features. Genome Res. 31, 995–1010 (2021).
Ramírez, F. et al. deepTools2: a next generation web server for deep-sequencing data analysis. Nucleic Acids Res. 44, W160–W165 (2016).
Langmead, B., Trapnell, C., Pop, M. & Salzberg, S. L. Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 10, R25 (2009).
Wang, J. et al. piRBase: integrating piRNA annotation in all aspects. Nucleic Acids Res. 50, D265–D272 (2022).
Crooks, G. E., Hon, G., Chandonia, J. M. & Brenner, S. E. WebLogo: a sequence logo generator. Genome Res. 14, 1188–1190 (2004).
Kong, L. et al. CPC: assess the protein-coding potential of transcripts using sequence features and support vector machine. Nucleic Acids Res. 35, W345–W349 (2007).
Jones, P. et al. InterProScan 5: genome-scale protein function classification. Bioinformatics 30, 1236–1240 (2014).
Patro, R., Duggal, G., Love, M. I., Irizarry, R. A. & Kingsford, C. Salmon provides fast and bias-aware quantification of transcript expression. Nat. Methods 14, 417–419 (2017).
Ou, S. & Jiang, N. LTR_retriever: a highly accurate and sensitive program for identification of long terminal repeat retrotransposons. Plant Physiol. 176, 1410–1422 (2018).
Aiewsakun, P. & Katzourakis, A. Marine origin of retroviruses in the early Palaeozoic Era. Nat. Commun. 8, 13954 (2017).
Altschul, S. F. et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402 (1997).
Kurtz, S. et al. Versatile and open software for comparing large genomes. Genome Biol. 5, R12 (2004).
Wilson, C. A. et al. Wild sex in zebrafish: loss of the natural sex determinant in domesticated strains. Genetics 198, 1291–1308 (2014).
Langfelder, P. & Horvath, S. WGCNA: an R package for weighted correlation network analysis. BMC Bioinforma. 9, 559 (2008).
Bray, N. L., Pimentel, H., Melsted, P. & Pachter, L. Near-optimal probabilistic RNA-seq quantification. Nat. Biotechnol. 34, 525–527 (2016).
Love, M. I., Huber, W. & Anders, S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 15, 550 (2014).
Shannon, P. et al. Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 13, 2498–2504 (2003).
Sherman, B. T. et al. DAVID: a web server for functional enrichment analysis and functional annotation of gene lists (2021 update). Nucleic Acids Res. 50, W216–W221 (2022).
Lu, S. et al. CDD/SPARCLE: the conserved domain database in 2020. Nucleic Acids Res. 48, D265–D268 (2020).
Price, M. N., Dehal, P. S. & Arkin, A. P. FastTree 2 – approximately maximum-likelihood trees for large alignments. PLoS ONE 5, e9490 (2010).
Letunic, I. & Bork, P. Interactive Tree Of Life (iTOL) v4: recent updates and new developments. Nucleic Acids Res. 47, W256–W259 (2019).
Yao, L., Chen, J., Wu, X., Jia, S. & Meng, A. Zebrafish cdc6 hypomorphic mutation causes Meier-Gorlin syndrome-like phenotype. Hum. Mol. Genet. 26, 4168–4180 (2017).
Montague, T. G., Cruz, J. M., Gagnon, J. A., Church, G. M. & Valen, E. CHOPCHOP: a CRISPR/Cas9 and TALEN web tool for genome editing. Nucleic Acids Res. 42, W401–W407 (2014).


Acknowledgements

This work is supported by an institutional fund from the Gilbert S. Omenn Department of Computational Medicine and Bioinformatics, University of Michigan (K.F.A., B.L., D.W., Y.W., and P.T.) and an institutional fund from the Department of Biomedical Informatics, The Ohio State University (K.F.A., B.L., D.W., and Y.W.). We thank the Ohio Supercomputer Center and Advanced Research Computing, University of Michigan, for providing computational services. S.J. acknowledges funding from the National Key Research and Development Program (2024YFA1803000 and 2024YFA1802200) and the National Natural Science Foundation of China (92254302 and 32293202), and Y.G.Y. acknowledges funding from the National Natural Science Foundation of China (32121001). We thank Dr. Aifu Li for helpful discussion and proofreading.

Author information

These authors contributed equally: Bo Li, Ting Li, Dingjie Wang.

Authors and Affiliations

Gilbert S. Omenn Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
Bo Li, Dingjie Wang, Puwen Tan, Yunhao Wang & Kin Fai Au
Department of Biomedical Informatics, The Ohio State University, Columbus, OH, USA
Bo Li, Dingjie Wang, Yunhao Wang & Kin Fai Au
School of Life Sciences, Fudan University, Shanghai, China
Ting Li
China National Center for Bioinformation, Beijing, China
Ying Yang & Yun-Gui Yang
Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing, China
Ying Yang & Yun-Gui Yang
Institute of Genetics and Developmental Biology, Chinese Academy of Sciences, Beijing, China
Shunji Jia

Authors

Bo Li
Ting Li
Dingjie Wang
Ying Yang
Puwen Tan
Yunhao Wang
Yun-Gui Yang
Shunji Jia
Kin Fai Au

Contributions

K.F.A. and S.J. designed and supervised the study. B.L. and D.W. analyzed the data. T.L. and Y.Y. performed experiments. B.L., T.L., P.T., and Y.W. visualized the results. B.L., K.F.A., and S.J. wrote the manuscript with assistance from T.L., D.W., Y.Y., P.T., Y.W., and Y.G.Y.

Corresponding authors

Correspondence to Yun-Gui Yang, Shunji Jia or Kin Fai Au.

Ethics declarations

Competing interests

The authors declare no competing interests.

Ethical approval

All fish experiments in this study were approved by the animal care and use committee of the Institute of Genetics and Developmental Biology, Chinese Academy of Sciences.

Peer review

Peer review information

Nature Communications thanks Jonathan (N) Wells, and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Description of Additional Supplementary Files

Supplementary Datasets 1–16

Supplementary Movie 1

Supplementary Movie 2

Reporting Summary

Transparent Peer Review file

Source data

Source data 1–6

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.


About this article

Cite this article

Li, B., Li, T., Wang, D. et al. Zygotic activation of transposable elements during zebrafish early embryogenesis. Nat Commun 16, 3692 (2025). https://doi.org/10.1038/s41467-025-58863-7


Received: 26 March 2024
Accepted: 31 March 2025
Published: 18 April 2025
Version of record: 18 April 2025
DOI: https://doi.org/10.1038/s41467-025-58863-7