Disruption of TAD hierarchy promotes LTR co-option in cancer

Wong, Elissa W. P.; Sahin, Merve; Yang, Rui; Lee, UkJin; Li, Dan; Zhan, Yingqian A.; Misra, Rohan; Tomas, Fanny; Alomran, Nawaf; Polyzos, Alexander; Lee, Cindy J.; Trieu, Tuan; Martinez-Fundichely, Alexander; Wiesner, Thomas; Rosowicz, Andrew; Cheng, Shuyuan; Liu, Christina; Lallo, Morgan; Shoushtari, Alexander N.; Merghoub, Taha; Hamard, Pierre-Jacques; Koche, Richard; Khurana, Ekta; Apostolou, Effie; Zheng, Deyou; Chen, Yu; Leslie, Christina S.; Chi, Ping

doi:10.1038/s41588-025-02239-6

Article
Open access
Published: 30 June 2025

Nature Genetics volume 57, pages 1754–1765 (2025)

Abstract

Transposable elements (TEs) are abundant in the human genome, and they provide the source for genetic and functional diversity. Previous studies have suggested that TEs are repressed by DNA methylation and chromatin modifications. Here through integrating transcriptome and 3D genome architecture studies, we showed that haploinsufficient loss of NIPBL selectively activates alternative promoters (altPs) at the long terminal repeats (LTRs) of the TE subclasses. This activation occurs through the reorganization of topologically associating domain (TAD) hierarchical structures and the recruitment of proximal enhancers. These observations indicate that TAD hierarchy restricts transcriptional activation of LTRs that already possess open chromatin features. Perturbation of hierarchical chromatin topology can lead to co-option of LTRs as functional altPs, driving aberrant transcriptional activation of oncogenes. These data uncovered a new layer of regulatory mechanisms of TE expression and posit TAD hierarchy dysregulation as a new mechanism for altP-mediated oncogene activation and transcriptional diversity in cancer.

Main

Approximately 50% of the mammalian genome consists of transposable elements (TEs), mainly retrotransposons, that once possessed the ability to move within the genome^1,2. TEs have actively shaped genomic functions through a multitude of mechanisms, including genome structural rearrangements, introduction of mutations and transcriptional regulation^1,3. In turn, the mammalian genome has evolved to take advantage of this abundant source of genetic material to regulate essential cellular functions^4,5,6. In mammals, several mechanisms have been described to tightly control the expression of TEs spatially and temporally, including noncoding RNAs, DNA methylation and histone modifications (for example, H3K9me3 and H3K27me3)^4,7,8,9,10,11. Derepression or insertion of TEs has been linked to cancer, where TEs were used either as alternative promoters (altPs) for oncogene activation or as disruptors of tumor suppressors^7,12,13. Furthermore, previous studies have shown that derepression of hypermethylated TEs by inhibiting DNA methylation caused a viral mimicry response in cancer^14,15, which in some cases can be evaded by cancer cells via an epigenetic state switch that restores TE repression^16,17. Recent studies suggest that TE-driven oncogene expression, a process known as onco-exaptation, is frequent across multiple cancer types¹⁸. TEs were also co-opted as cis-regulatory elements, for example, by lineage-specific transcription factors (TFs), to drive oncogenesis¹⁹. Among TEs, long terminal repeats (LTRs) that contain intact RNA polymerase II binding sites, TF binding sites and conserved splice donor sites have a higher propensity to be co-opted as altPs than other TE subclasses, for example, long interspersed nuclear element (LINE) and short interspersed nuclear element (SINE)^13,20.

Chromatin is organized into 3D structures, consisting of self-interacting topologically associating domains (TADs). TADs are established and maintained through CTCF/cohesin-mediated looping and regulate long-range interactions between enhancers and promoters and consequent transcriptional activation^21,22. TADs can be further organized into a hierarchical structure of nested sub-TADs, that is, hierarchical TADs²³. Compared to nonhierarchical TADs, hierarchical TADs are associated with functionally more active epigenetic states and a higher level of gene transcription²³. Unlike CTCF, the cohesin complex does not recognize specific DNA sequences, and its residence on chromatin relies on the continuous loading action by the cohesin complex loading factors, Nipped-B-like protein (NIPBL) and MAU2 sister chromatid cohesion factor (MAU2)²¹. NIPBL is essential in the establishment and maintenance of TADs²⁴. Complete loss of NIPBL is lethal in dividing cells²⁴. Pathologically, heterozygous loss-of-function (LoF) NIPBL mutations underlie the majority of Cornelia de Lange syndrome (CdLS) cases. CdLS is a multisystem developmental disorder caused by genetic aberrations of the core structural or regulatory components of the cohesin complex (for example, NIPBL, SMC1A and SMC3)²⁵. Furthermore, monoallelic LoF mutations of NIPBL are frequently observed in multiple cancers, and NIPBL is recognized as a bona fide tumor suppressor in pan-cancer analyses²⁶. Although TEs have recently been described to modulate CTCF binding and influence TAD boundaries^27,28, whether and how TADs regulate TE expression remains largely unknown.

Here we used a doxycycline-inducible system to mimic the monoallelic loss of NIPBL in disease and evaluated its impact on chromatin topology and transcriptional regulation. Haploinsufficient loss of NIPBL led to hierarchical TAD reduction and global transcriptional activation of altPs that arose primarily from LTRs that already possessed open chromatin features. Hierarchical TAD reorganization and subsequent recruitment and retargeting of proximal enhancers contributed to the activation of altPs from LTRs. These data indicate that chromatin topological hierarchy maintained by NIPBL restricts transcriptional activation from LTRs, and perturbation of the hierarchical chromatin topology in cancer can lead to the co-option of LTRs and aberrant transcriptional activation of new oncogenes and divergent transcripts.

Results

Haploinsufficient loss of NIPBL induces global altP activation

To mimic the monoallelic loss of NIPBL in cancer and CdLS, we used a doxycycline-inducible RNA interference (RNAi) system. After doxycycline treatment, shNIPBL_1 and shNIPBL_2 downregulated the NIPBL mRNA by ~34% and 56%, respectively, in 501mel melanoma cells (Extended Data Fig. 1a). NIPBL downregulation did not significantly alter the total SMC1A protein level, a core component of the cohesin complex²⁴ (Extended Data Fig. 1b), whereas chromatin immunoprecipitation coupled with next-generation sequencing (ChIP–seq) showed a modest reduction of SMC1A protein on chromatin, indicating a partial loss of cohesin loading (Extended Data Fig. 1c). To investigate the effects of haploinsufficient NIPBL loss on transcriptome and transcription start site (TSS) usage, we performed poly-A RNA sequencing (RNA-seq) to capture most mature RNA transcripts and cap analysis of gene expression sequencing (CAGE-seq) to accurately assess TSS, including altPs²⁹, in 501mel cells. Compared to shLuc, shNIPBL_1 and shNIPBL_2 resulted in 305/169 and 1,261/1,308 significantly upregulated/downregulated genes, respectively (Fig. 1a and Supplementary Table 1), which was consistent with short-term degradation of cohesin, as well as CdLS mouse models and patient studies^24,30,31 that caused minor perturbations in gene expression. The changes in CAGE-seq-TSSs (CTSSs) and RNA-seq-detected TSSs for annotated transcripts showed a highly significant correlation (Fig. 1b,c, Extended Data Fig. 1d–f and Supplementary Table 2). At baseline, ~56% of all CTSSs were located at annotated promoters, and ~16%, 10%, 9% and 6% were localized to annotated exons, introns, distal intergenic regions and 3′ UTRs, respectively (Fig. 1d and Supplementary Table 2). Notably, shNIPBL-upregulated CTSSs were enriched in introns and distal intergenic regions (Fig. 1d,e), indicating altP usage, whereas shNIPBL-downregulated CTSSs were preferentially located at annotated promoters (Fig. 1d,e). The majority of the top 30% upregulated CTSSs transcribed from altPs produced mature transcripts (Supplementary Note 1). For example, ALK^ATI arose from an altP in intron 19 and was subsequently spliced into exons 20–29 of ALK, expressing the entire kinase domain; this variant has been previously characterized as an oncogenic variant of ALK prevalent in melanoma and sporadically in other cancer types³² (Fig. 1a,b,f). Similarly, the ULK4 variant arose from intron 31 and LINC01387 from intron 2, both of which spliced into existing exons to produce mature transcripts (Fig. 1a,b,f). Moreover, we observed that some RNA transcripts originated from distal intergenic regions, such as between SYN3 and LINC01640 on chromosome 22 (chr22: 33,468,501–33,471,517) and between TAS2R39 and TAS2R40 on chromosome 7 (chr7: 142,895,998–142,896,596; Fig. 1f). The distal intergenic altP at chr7: 142,895,998–142,896,596 revealed a distinctive feature where transcription was initiated, and the transcript was spliced across a genomic distance of ~240 kb into the coding gene KEL (Fig. 1f). These observations are reminiscent of a previous study where Nipbl knockout in nondividing mouse hepatocytes was associated with increased intergenic and antisense (exogenic) transcription²⁴ and demonstrated that the new CTSSs detected by CAGE-seq were associated with production of mature transcripts.

**Fig. 1: Haploinsufficient loss of *NIPBL* leads to altP activation in intron and distal intergenic regions.**

Additional doxycycline-inducible hairpins, shNIPBL_4 and shNIPBL_5, were used to validate transcriptional perturbation by poly-A RNA-seq and CAGE-seq (Extended Data Fig. 2, Supplementary Tables 3 and 4 and Supplementary Note 2). Doxycycline-inducible RNAi of NIPBL was performed in two additional melanoma cell lines, COLO800 and A375, to confirm altP usage (Supplementary Fig. 1 and Supplementary Note 3). In addition, CRISPR interference (CRISPRi)³³-mediated downregulation of NIPBL resulted in a ~50% decrease in NIPBL and a significant increase in ALK^ATI mRNA originating from the altP in ALK intron 19 (Supplementary Fig. 2).

We performed CAGE-seq after NIPBL loss in nine cell lines from additional cancer types and observed similar global altP usage from introns and distal intergenic regions (Extended Data Fig. 3, Supplementary Table 5 and Supplementary Note 4).

NIPBL loss activates altP, arising from LTR with open chromatin features

To investigate the mechanisms of altP activation by NIPBL perturbation, we focused on the melanoma context. We observed bimodal distribution of sharp and broad promoters of all CTSSs at baseline with a 16 bp cutoff (Fig. 2a), consistent with previous studies^34,35. NIPBL downregulation preferentially affected CTSSs with broad promoters (Fig. 2a and Extended Data Fig. 4a). Furthermore, we noted the presence of repetitive elements, particularly LTRs, at upregulated altP by NIPBL downregulation (Fig. 1f). Genome-wide, fewer than 3% of all CTSSs localized to annotated promoters contained repetitive elements, whereas 45–48% of all CTSSs localized in the intronic or distal intergenic regions contained repetitive elements, particularly 10–17% of each of the LTR, LINE and SINE subclasses of retrotransposons (Fig. 2b). With NIPBL downregulation, upregulated CTSSs in intron, distal intergenic and promoter regions were markedly enriched for LTR and depleted in SINE regions (Fig. 2b). Specific endogenous retrovirus subclass L (ERVL)/ERVL–mammalian-apparent LTR retrotransposons (MaLR) families were enriched in upregulated LTRs (Extended Data Fig. 4b–f and Supplementary Note 5).

**Fig. 2: Loss of *NIPBL* induced altP usage arising from LTR repetitive elements characterized by open chromatin characteristics.**

Because LTR retrotransposons are generally silenced epigenetically by DNA methylation and repressive chromatin modifications in the genome^1,4,7,8, we examined the local chromatin characteristics of the shNIPBL-upregulated CTSSs. We observed substantially lower CpG and C/G representation in the shNIPBL_1-upregulated than in the shNIPBL_1-downregulated and baseline CTSSs (Fig. 2c and Extended Data Fig. 4g), and modestly lower CpG representation in the shNIPBL_2-upregulated CTSSs than in the shNIPBL_2-downregulated and baseline CTSSs (Extended Data Fig. 4h). To experimentally delineate the DNA methylation status at CTSSs, we performed bisulfite-free enzymatic methyl sequencing (EM-seq). We observed uniform sequencing coverage among replicates of shLuc, shNIPBL_1 and shNIPBL_2 (Extended Data Fig. 5a). The DNA methylation ratio at CTSSs exhibited a bimodal distribution, consistent with prior studies^36,37, without any obvious differences in shLuc, shNIPBL_1 and shNIPBL_2, indicating the lack of global DNA methylation changes after NIPBL downregulation (Extended Data Fig. 5b). We observed that the shNIPBL_1-downregulated CTSSs and the shNIPBL_2-upregulated and shNIPBL_2-downregulated CTSSs exhibited enrichment of low DNA methylation ratios, while the shNIPBL_1-upregulated CTSSs exhibited relatively uniform distribution across various DNA methylation levels (Extended Data Fig. 5c,d). These genomic regions have relatively few CpG sites (Extended Data Fig. 5e,f), consistent with the low CpG content (Fig. 2c and Extended Data Fig. 4g,h). The majority (>95%) of the differentially upregulated and downregulated CTSSs were not associated with differential methylation changes (Extended Data Fig. 5g,h). We then specifically evaluated the representative CTSSs upregulated by shNIPBL_1 and shNIPBL_2 (Fig. 1f). These regions generally contained few CpG sites and had no or low levels of CpG methylation; the CpG methylation levels were not altered by NIPBL perturbations (Extended Data Fig. 5i–m). Collectively, these results supported the notion that differentially activated CTSSs by NIPBL knockdown were not silenced by DNA methylation.
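The CpG and C/G representation discussed above can be illustrated with a minimal sketch. The exact metric used in the study is not stated in this section, so the two functions below are assumptions: a plain C/G fraction and the commonly used observed/expected CpG ratio. The function names and the toy sequence are hypothetical.

```python
def gc_fraction(seq: str) -> float:
    """Fraction of bases that are C or G (assumed proxy for C/G representation)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("C") + seq.count("G")) / len(seq)


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: (#CpG * length) / (#C * #G).

    This is an assumed, commonly used definition; the study may have
    quantified CpG representation differently.
    """
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0  # no CpG dinucleotide is possible without both bases
    return seq.count("CG") * len(seq) / (c * g)


# Toy sequence, not taken from the study.
toy = "ACGTCGAT"
print(gc_fraction(toy), cpg_obs_exp(toy))
```

Under these definitions, a region with few CpG sites yields a low observed/expected ratio regardless of its methylation state, which is why CpG content and the methylation ratio are examined separately in the text.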

We next examined the baseline chromatin characteristics of altered CTSSs by NIPBL loss, using cleavage under targets and release using nuclease (CUT&RUN)³⁸ for histone modifications preferentially enriched at active enhancers (H3K4me1 and H3K27ac), promoters (H3K27ac and H3K4me3) and silenced chromatin regions (H3K9me3 and H3K27me3; Fig. 2d,e). Additional chromatin marks, for example, H3K36me3, CTCF and SMC1A, were used to generate 15 comprehensive chromatin states by ChromHMM analysis³⁹ at baseline (Fig. 2d). As expected, the annotated RefSeq TSS sites were enriched for ChromHMM states 6 (H3K4me3/CTCF high)⁴⁰, 7 (H3K4me3 high) and 8 (H3K27ac/H3K4me3 high); all are characterized by the enrichment of H3K4me3 (Fig. 2d). While the significantly downregulated CTSSs by shNIPBL_1 were enriched for the same ChromHMM states as active TSS (Fig. 2d,e), the significantly upregulated CTSSs by shNIPBL_1 were most enriched for ChromHMM state 9 characterized by enrichment for active and poised chromatin marks (for example, H3K27ac and H3K4me1) and devoid of repressive chromatin marks (for example, H3K9me3 and H3K27me3), reminiscent of active enhancers. Notably, a subset of these upregulated CTSSs were also enriched with H3K4me3 chromatin mark, reminiscent of active or poised promoters (Fig. 2e). These active chromatin marks were further enriched at the significantly upregulated CTSS heatmaps with NIPBL downregulation and are concordant with CTSS transcriptional changes (Fig. 2e and Supplementary Fig. 3a–c). Moreover, we observed similar patterns of chromatin modifications (for example, H3K4me3 and H3K27ac enrichment) at LTR-derived upregulated CTSSs with NIPBL downregulation (Supplementary Fig. 3e–g).

De novo and known motif analyses by hypergeometric optimization of motif enrichment (HOMER) revealed that shNIPBL-induced changes at nonpromoter CTSS regions were most enriched for microphthalmia-associated transcription factor (MITF), a lineage-specific master regulator in melanocyte and melanoma⁴¹ (Fig. 2f, Extended Data Fig. 6a,b and Supplementary Fig. 4). ChIP–seq confirmed that MITF bound the majority of differentially regulated CTSSs in shLuc cells (Fig. 2g,h and Supplementary Fig. 3d,h). With shNIPBL perturbation, differentially upregulated CTSSs enriched at intron and distal intergenic regions showed significantly increased binding of MITF, whereas downregulated CTSSs enriched at promoter regions showed no appreciable changes of MITF binding (Fig. 2g,h and Supplementary Fig. 3d,h). This observation was validated by quantitative ChIP–qPCR of MITF at shNIPBL-upregulated CTSSs (Fig. 2i and Extended Data Fig. 6c). We further observed enrichment of H3K4me3 at these selectively upregulated CTSSs by CUT&RUN-seq and ChIP–qPCR, corroborating the corresponding transcriptional activation (Extended Data Fig. 6c,d). Moreover, the increase in MITF binding to differentially upregulated CTSSs by NIPBL loss at TEs was preferentially at LTRs (Extended Data Fig. 6b). Collectively, these data demonstrated that loss of NIPBL led to altP usage arising from LTRs with open chromatin. They also indicate that the transcriptional activation from these LTRs is restricted despite open chromatin characteristics and may require additional chromatin structure changes that allow further recruitment of context-dependent master regulator TFs for activation, for example, MITF, in melanoma cells.
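As a rough illustration of the hypergeometric enrichment idea behind motif analysis, the sketch below computes an upper-tail hypergeometric probability for observing at least a given number of motif-containing target regions. It is not HOMER's implementation (background selection, sequence normalization and motif optimization are omitted), and the function name and all counts are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(pop_size: int, pop_hits: int,
                           sample_size: int, sample_hits: int) -> float:
    """P(X >= sample_hits) for X ~ Hypergeometric(pop_size, pop_hits, sample_size).

    pop_size    : all regions considered (targets plus background)
    pop_hits    : regions in the population that contain the motif
    sample_size : number of target regions
    sample_hits : target regions that contain the motif
    """
    total = comb(pop_size, sample_size)
    upper = min(pop_hits, sample_size)
    tail = sum(
        comb(pop_hits, k) * comb(pop_size - pop_hits, sample_size - k)
        for k in range(sample_hits, upper + 1)
    )
    return tail / total


# Hypothetical counts for illustration only (not from the study).
p = hypergeom_enrichment_p(pop_size=10_000, pop_hits=500,
                           sample_size=200, sample_hits=40)
print(f"enrichment P = {p:.3e}")
```

A small tail probability indicates that the motif occurs in the target set more often than expected by chance given its frequency in the combined population.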

NIPBL partial loss leads to decrease in hierarchical TAD structures

Complete loss of the core cohesin family members on chromatin, either by biallelic genetic deletion of Nipbl in mouse hepatocytes²⁴ or through auxin-inducible degron-mediated protein degradation of RAD21 (ref. ³¹), has been shown to cause the global disappearance of all TADs. To evaluate the effects of haploinsufficient loss of NIPBL on 3D chromatin organization, we performed Hi-C in biological duplicates to generate contact maps at 10 kb resolution and analyzed the genome-wide chromatin interaction frequencies in melanoma cells with doxycycline-inducible knockdown of NIPBL (shNIPBL_2) and control (shLuc). The 3D chromatin organization is heterogeneous and can be organized into hierarchies of TAD structures that correlate with more transcriptionally active chromatin regions when compared to single nonhierarchical structures^21,23,42,43. We analyzed the Hi-C dataset with OnTAD, a TAD caller optimized for calling hierarchies of TADs with nested sub-TAD structures²³. Hi-C analysis of the shLuc control cells revealed that most TADs contained a hierarchy of nested sub-TADs (Fig. 3a, green arrows) within a larger outer TAD structure (Fig. 3a, light blue arrows), and a minority of TADs existed as single structures (singleton TAD; Fig. 3a, navy blue arrows). Haploinsufficient loss of NIPBL resulted in significant weakening of TADs and preferential loss of the larger outer TAD structure, manifested by the loss/weakening of corner dots, representing the disappearance of cohesin-mediated interaction between the most distal boundaries of the hierarchical TADs⁴⁴, with relative preservation of the inner nested sub-TADs (Fig. 3a). We defined the hierarchy of TADs based on the number of nested sub-TADs, where higher levels of hierarchical TADs correlated with more successive layers of nested sub-TADs. We then quantified the number of singleton and hierarchical TADs and identified a total of 3,791 TADs, including 967 (25.5%) singleton and 2,824 (74.5%) hierarchical TADs in control cells (Fig. 3b). With NIPBL loss (shNIPBL_2), there was a substantial increase in the total number of TADs to 4,541, with an increase in the proportion of singleton TADs (1,723/4,541, 37.9%) and a reciprocal decrease in hierarchical TADs (2,818/4,541, 62.1%; Fig. 3b). The decrease in hierarchical TADs was more pronounced in ≥ level 3 hierarchical TADs, 38.2% (1,448/3,791, shLuc) versus 27.3% (1,239/4,541, shNIPBL_2; Fig. 3b), likely due to the dissolution of higher-level hierarchical TADs ≥ level 3 into lower-level hierarchical and singleton TADs (Fig. 3a,b). Consistently, we observed an overall shift to substantially smaller TAD sizes, with a median TAD size of 530 kb in shLuc and 340 kb in shNIPBL_2 cells (Fig. 3c). Similarly, Hi-C read-pair analysis revealed that shNIPBL_2 cells contained more short-range interactions of ~10 to 100 kb (log₁₀ values between 4 and 5), while shLuc control cells contained more mid-to-long-range interactions of ~100 kb to 3 Mbp (log₁₀ values between 5 and 6.5; Extended Data Fig. 7a). Interestingly, we observed more mega-loop long-range interactions (>3 Mbp, log₁₀ > 6.5) with shNIPBL_2 perturbation than with shLuc control (Extended Data Fig. 7a). Furthermore, consistent with a previous study²⁴, partial loss of NIPBL resulted in finer segregation of compartments (Extended Data Fig. 7b). Consistent with prior studies of cohesin perturbations that led to TAD boundary defects^24,31,45, we observed a substantial reduction in TAD boundary strength by insulation score⁴⁶ in shNIPBL_2 compared to control cells (Fig. 3d), indicating that partial NIPBL loss can also lead to weakened TAD boundaries.
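The TAD proportions reported above follow directly from the stated counts. The short sketch below recomputes them as a consistency check; the counts are taken from the text, whereas the dictionary layout and the `percentages` helper are illustrative.

```python
# TAD counts as reported in the text (Fig. 3b).
tad_counts = {
    "shLuc": {"total": 3791, "singleton": 967,
              "hierarchical": 2824, "level3_plus": 1448},
    "shNIPBL_2": {"total": 4541, "singleton": 1723,
                  "hierarchical": 2818, "level3_plus": 1239},
}


def percentages(counts: dict) -> dict:
    """Express each TAD class as a percentage of all TADs in that condition."""
    total = counts["total"]
    return {name: round(100 * n / total, 1)
            for name, n in counts.items() if name != "total"}


for condition, counts in tad_counts.items():
    # Singleton and hierarchical TADs should partition the total.
    assert counts["singleton"] + counts["hierarchical"] == counts["total"]
    print(condition, percentages(counts))
# shLuc     -> singleton 25.5, hierarchical 74.5, level3_plus 38.2
# shNIPBL_2 -> singleton 37.9, hierarchical 62.1, level3_plus 27.3
```

Note that the absolute number of hierarchical TADs is nearly unchanged (2,824 versus 2,818); the drop in their proportion reflects the gain of singleton TADs, and the loss is concentrated in the ≥ level 3 class.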

**Fig. 3: *NIPBL* partial loss leads to a decrease in hierarchical TAD structures and preferentially affects CTSSs residing in high-level hierarchical TADs.**

We next evaluated the effect of hierarchical TAD structure changes following NIPBL knockdown on differential CTSS expression. There was a significant enrichment of differential CTSSs mediated by shNIPBL_1 and shNIPBL_2 that localized to ≥ level 3 hierarchical TADs compared to lower levels of hierarchical and singleton TADs defined in the control condition (Fig. 3e and Extended Data Fig. 7c). Similarly, significant enrichment of differential CTSSs by shNIPBL_4 and shNIPBL_5 localized to ≥ level 3 hierarchical TADs compared to lower levels of hierarchical and singleton TADs was observed (Extended Data Fig. 7d,e).
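To make the notion of hierarchy level concrete, the following sketch assigns each outermost TAD a depth equal to the number of successive nested layers it contains. It assumes TADs are given as properly nested (start, end) intervals and that a singleton TAD counts as depth 1; the numbering convention used by OnTAD or in the figures may differ, and the function name and example coordinates are hypothetical.

```python
def hierarchy_levels(tads):
    """Map each outermost TAD to its nesting depth.

    tads: iterable of (start, end) tuples, assumed properly nested
    (intervals either contain one another or do not overlap).
    Depth counts the outer TAD itself, so a singleton has depth 1
    (an assumed convention).
    """
    # Sort so that a containing TAD always precedes the TADs nested in it.
    ordered = sorted(set(tads), key=lambda t: (t[0], -t[1]))
    levels = {}
    stack = []  # current chain of nested TADs, outermost first
    for start, end in ordered:
        # Drop TADs from the chain that do not contain the current one.
        while stack and not (stack[-1][0] <= start and end <= stack[-1][1]):
            stack.pop()
        stack.append((start, end))
        outer = stack[0]
        levels[outer] = max(levels.get(outer, 0), len(stack))
    return levels


# Hypothetical coordinates in kb, for illustration only.
example = [(0, 100), (0, 40), (10, 30), (50, 100), (120, 150)]
levels = hierarchy_levels(example)
print(levels)  # {(0, 100): 3, (120, 150): 1}
high_level = [tad for tad, depth in levels.items() if depth >= 3]
```

Under this convention, losing the outer (0, 100) domain would leave (0, 40) at depth 2 and (50, 100) as a singleton, mirroring the described dissolution of higher-level hierarchical TADs into lower-level and singleton TADs.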

Differentially regulated CTSSs were localized at the boundary of redefined TADs in the shNIPBL_2 condition (Fig. 3f–h and Supplementary Note 6). These data indicate that the TAD hierarchical structure changes can impact transcriptional activation from altP.

Promiscuous gene activation through retargeting of enhancers to altP

We investigated the 3D chromatin topology around activated altPs, for example, ALK intron 19 (Fig. 4a,b) and ULK4 intron 31 (Fig. 4c,d), to further probe the molecular mechanisms. Both altPs (highlighted in blue) were situated inside complex hierarchical TADs (≥ level 5) that contained multiple layers of nested sub-TADs in control cells (Fig. 4a–d). With partial reduction of NIPBL, the Hi-C contact matrix revealed loss of the outer TAD structures (blue circles) with relative preservation of the inner nested sub-TADs (green circles) and splitting of hierarchical TADs into multiple smaller TADs (gray horizontal bars; Fig. 4a,c). Consistently, OnTAD analysis of the ALK intron 19 (Fig. 4b) and ULK4 intron 31 (Fig. 4d) altP regions revealed decreased hierarchical TAD structures, from ≥ level 5 to ≤ level 4. Additionally, both altPs that were initially embedded in the center of a complex network of hierarchical TAD structures in the control condition are now repositioned to the boundary of split and ‘simplified’ TAD structures (Fig. 4b,d).

**Fig. 4: Examples of promiscuous gene activation (*ALK*^ATI, *ULK4* from intron 31) through reorganization of hierarchical TADs and retargeting of enhancers to altP in proximity.**

To investigate if changes in local chromatin architecture directly affect transcription from ALK intron 19 and ULK4 intron 31, we selectively disrupted CTCF insulators located at the TAD/sub-TAD boundaries by either dCas9–KRAB-mediated blocking or Cas9-mediated deletion of the CTCF binding sites⁴⁷ (Extended Data Fig. 8a,j). dCas9–KRAB that targeted cognate CTCF binding motifs resulted in substantial reduction of CTCF binding by ChIP–qPCR (Extended Data Fig. 8b–f,k–m). Specifically, we observed increased expression of alternative transcripts from ALK intron 19 and ULK4 intron 31 when CTCF insulators at TAD/sub-TAD boundaries were disrupted by dCas9–KRAB (Extended Data Fig. 8g,n). Consistently, Cas9-mediated deletion of CTCF binding sites at TAD/sub-TAD boundaries enhanced alternative transcript expression at both loci (Extended Data Fig. 8h,i,o,p), indicating that changes in hierarchical TAD structure directly impact altP usage.

TADs are thought to facilitate and restrict long-range enhancer (E)–promoter interactions within and outside of TADs, respectively²¹. To assess the genome-wide E-altP interactions that may contribute to the transcriptional activation of de novo CTSSs, we performed circular chromosome conformation capture sequencing (4C-seq)⁴⁸, using ALK and ULK4 altPs in intron 19 and intron 31 as viewpoints, respectively (Fig. 4e,f, red shades). In the shLuc control, we observed a high frequency of interactions from genomic regions close to the viewpoints and a rapid decay of interaction frequencies with increased genomic distance. With partial NIPBL loss (shNIPBL_2), we observed a substantial decrease in interaction frequencies with the altPs, preferentially from distal genomic regions (Fig. 4e,f, arrowheads) and a marked increase in interactions from the proximal genomic regions (Fig. 4e,f, arrows), suggesting a shift of genomic interactions and possibly retargeting of enhancers from distal to proximal genomic regions to the altPs.

Consistent with the genome-wide chromatin analysis (Fig. 2d), altPs of ALK and ULK4 were devoid of heterochromatin marks, for example, H3K27me3, and were enriched for open chromatin features, for example, H3K4me1, in shLuc control cells (Fig. 4e,f). NIPBL depletion in shNIPBL_2 did not affect H3K4me1 but increased H3K4me3 and H3K27ac at the altPs and H3K36me3 at the gene bodies of the alternative transcripts, consistent with transcriptional activation (Fig. 4e,f). Multiple H3K27ac-enriched peaks flanking ALK intron 19 and ULK4 intron 31 within the hierarchical TADs suggested potential enhancers to the altPs. We performed CRISPRi to target the various potential enhancers, using guide RNAs specific for the ALK (intron 19) or ULK4 (intron 31) altP as positive and nontargeting sgGFP as negative controls (Fig. 4g–j). We also included two sets of sgRNAs (sgALK Ctrl1&2 and sgULK4 Ctrl1&2) targeting neighboring nonenhancer regions (without H3K27ac enrichment) as additional controls (Fig. 4g–j). Cas9 ChIP–seq, H3K27ac and H3K9me3 CUT&RUN were performed to validate the specificity of CRISPRi-sgRNAs (Extended Data Fig. 9 and Supplementary Note 7). CRISPRi-sgRNA targeting confirmed the recruitment of proximal enhancers to activate transcription from altPs of ALK and ULK4 loci with NIPBL downregulation (Fig. 4g–j and Supplementary Note 8).

These observations corroborated the 4C-seq data. They indicated that the distal enhancers in preserved hierarchical TADs provided basal interactions with altPs but were not sufficient to activate transcription. The collapse of hierarchical TADs upon NIPBL loss shifts E-altP interactions, recruiting proximal enhancers to activate transcription from altPs arising from LTR repetitive elements.

NIPBL loss-mediated altP usage from LTR can give rise to oncogene expression

To further evaluate the functional consequences of NIPBL loss-mediated altP usage, we noted the robust induction of the ALK^ATI variant originating from the LTR16B2 region in intron 19 in melanoma cells (Fig. 1a,b,f). This ALK^ATI variant was previously characterized as a new oncogenic ALK isoform that is biallelically expressed and independent of genetic alterations at the ALK locus³². To evaluate if NIPBL perturbations contribute to the expression of ALK^ATI in melanoma, we selected the top 50 cases of TCGA–SKCM samples of high ALK^ATI expression by RNA-seq, compared to the bottom 50 cases that had no or very low ALK^ATI expression, and analyzed the NIPBL somatic mutations by FunSeq2 (ref. ⁴⁹), a validated computational algorithm that prioritizes mutational variants with functional significance using a weighted score system (Fig. 5a and Extended Data Fig. 10a). Combining the number of NIPBL somatic mutations in each tumor sample and the FunSeq2 scores, we generated a composite value that estimated the functional inactivation of NIPBL mutations for each case (Fig. 5a and Extended Data Fig. 10a). The composite FunSeq2 score was significantly higher (P < 0.05) in high versus no-low ALK^ATI expression cases, indicating a higher level of NIPBL functional inactivation (Fig. 5a). To further establish the link between NIPBL perturbation and ALK^ATI expression, we screened available patient-derived melanoma cell lines for ALK^ATI expression and identified SKMEL-23 and SKMEL-1128 with ALK^ATI expression by RNA-seq (Fig. 5b). Compared to two well-characterized melanoma cell lines, COLO800 and A375, which did not express ALK^ATI, SKMEL-23 and SKMEL-1128 expressed lower levels of NIPBL mRNA transcripts (Fig. 5c). Consistently, H3K27ac ChIP–seq revealed H3K27ac enrichment at the ALK^ATI altP in all the melanoma cell lines examined³² (Fig. 5d), suggesting that melanoma might be primed to express ALK^ATI.

**Fig. 5: *NIPBL* perturbation contributes to transcriptional activation of *ALK*^ATI from the altP through engagement of proximal H3K27ac-enriched enhancers in melanoma.**

We performed H3K27ac Hi-C followed by chromatin immunoprecipitation (HiChIP) to investigate the enhancer connectome in SKMEL-23 and SKMEL-1128 and observed significant interactions between the proximal H3K27ac-enriched enhancers and ALK intron 19. Genomic regions, for example, the 5′ upstream region, introns 1 and 4 of the ALK locus and the region surrounding the PCARE locus, were the main E-altP interaction sites with the altP of ALK by virtual 4C and H3K27ac HiChIP hotspot analysis using ALK intron 19 as viewpoint (Fig. 5d, blue highlights). CRISPRi-sgRNA targeting was performed to confirm the proximal enhancer retargeting (Fig. 5e and Supplementary Note 9).

Finally, we investigated whether MITF has a role in transcriptional activation of ALK^ATI in melanoma cell lines, analogous to its recruitment to activate altP after NIPBL loss (Fig. 2f–i, Extended Data Fig. 6 and Supplementary Fig. 3d,h). Compared to known MITF-binding sites at TYR and DCT promoters, we observed comparable levels of MITF binding to ALK^ATI proximal H3K27ac-enriched enhancers by MITF ChIP–qPCR in both SKMEL-23 and SKMEL-1128 (Extended Data Fig. 10b). RNAi-mediated MITF downregulation (siMITF_1 and siMITF_2) led to a significant decrease in ALK^ATI expression (Extended Data Fig. 10c,d), indicating that MITF binding is critical in the active transcription of ALK^ATI from the ALK intron 19 altP. Together, these observations suggest that NIPBL somatic mutations and reduced expression can induce context-dependent oncogene activation through proximal enhancer retargeting and altP usage.

Discussion

Promoter usage is highly dynamic in cancer, and the resulting isoform diversity could potentially increase the fitness of cancer cells⁵⁰ by virtue of differential stability, translational efficiency, protein structure and binding partners⁵¹. Differential isoform expression can be used to stratify cancer molecular subtypes and predict patient survival, highlighting the distinctive functional roles of certain transcript isoforms^50,52. Cancer-specific altPs are prevalent in various cancer types, such as ovarian and stomach cancers, which also express more TE-encoded oncogenes^18,50, suggesting that TE co-option might be one of the underlying mechanisms that drive altP usage in cancer. Multiple layers of interconnected epigenetic mechanisms have been described to restrict TE expression, including DNA methylation and repressive chromatin modifications^4,7,8,10,11. Here we report a new mechanism of TE regulation in which the 3D chromatin hierarchy restricts the activation, as active promoters, of TEs that already possess active chromatin features, for example, H3K4me1, H3K27ac and the lack of CpG methylation.

We observed that in melanoma cells, the melanocyte/melanoma-specific master regulator MITF⁴¹ was responsible for activating altPs from LTRs, following the collapse of hierarchical TAD structures. NIPBL downregulation led to altP activation in multiple cancer types in addition to melanoma (Extended Data Fig. 3 and Supplementary Table 5), suggesting that other lineage-specific or cell context-specific TFs may regulate altP usage from TEs under the control of higher-order chromatin structures, a possibility that warrants future investigation. TEs activated by 3D chromatin hierarchical changes in melanoma are marked by active chromatin features, which resemble the open chromatin accessibility and key TF binding (for example, OCT4 and NANOG) observed at TEs in pluripotent stem cells^53,54. It will be of interest to explore whether specific classes of TEs that were once activated in the stem cell state are more prone to be exploited by cancer cells. A previous study hinted at the transactivation potential of TF-bound TEs, which are decorated with active histone modification marks (for example, H3K27ac, H3K4me1 and H3K4me3) and have lower DNA methylation than their counterparts that are not associated with TFs⁵⁵. In our study, most TE-derived altPs have low levels of MITF binding at baseline, which is further enriched by the loss of NIPBL. Consistently, these regions are also decorated with a high H3K27ac level and a high H3K4me1/H3K4me3 ratio, closely resembling conventional enhancer elements. Recent studies indicate that the transcriptional machinery and architecture are strikingly similar between promoter and enhancer cis-regulatory elements^56,57. Whether a cis-regulatory element results in productive mRNA depends on its transcriptional strength⁵⁷, which in turn is controlled by enhancer activity and contact frequency^58,59. Consistently, our findings support that this class of enhancer-like TEs is typically restricted from transcriptional activation by hierarchical 3D chromatin organization. Upon the collapse of hierarchical TAD structures, long-range interactions between distal enhancers and promoters enabled by cohesin are lost; instead, H3K27ac-enriched regions close to TEs are recruited to enhance the transcriptional strength, resulting in productive mRNA transcription. Notably, the upregulated CTSSs have lower cytosine-guanine (CG) content or CpG methylation levels than CTSSs that are either downregulated (mostly canonical promoters) or unchanged. These observations suggest that this type of CTSS is unlikely to be repressed by DNA methylation; they also highlight one of the differentiating features between enhancers and promoters, where enhancers are usually devoid of DNA methylation and have low CpG content⁵⁷.

Methods

The research conducted in this study complies with all relevant ethical and regulatory guidance, including the Memorial Sloan Kettering Cancer Center’s Institutional Review Board.

CRISPRi

Cells were transduced with pHR–SFFV–KRAB–dCas9–P2A–mCherry lentivirus (Addgene, 60954) and sorted for mCherry-positive cells. sgRNAs were designed using CRISPOR⁶⁰ (Supplementary Table 6). To perturb CTCF binding, sgRNAs were designed to target the cognate CTCF binding motif (Supplementary Table 6). sgRNAs were cloned into a modified U6-based expression vector pSLQ1651-sgTelomere (F + E; Addgene, 51024) that enhanced sgRNA stability and assembly with dCas9 protein⁶¹. To remove the mCherry reporter gene from the pSLQ1651-sgTelomere (F + E) vector, PCR was done using exCherry_NheI and exCherry_puro_EcoRI primers (Supplementary Table 6), with pSLQ1651-sgTelomere (F + E) as template. The PCR fragment and pSLQ1651-sgTelomere (F + E) were digested with NheI and EcoRI (New England Biolabs (NEB)), ligated and transformed into Stbl3 competent bacteria (Thermo Fisher Scientific). sgRNA sequence was amplified using the pSLQ1651-sgTelomere (F + E) vector with FE_AS and region-specific sgRNA primer (Supplementary Table 6). A pool of two to six sgRNAs was used to target each region. PCR product was gel purified, digested with BstXI and XhoI (NEB) and inserted into pSLQ1651-sgTelomere (F + E) without mCherry. Identity of all plasmids was confirmed by Sanger sequencing. Lentivirus carrying sgRNA was then transduced into KRAB–dCas9–P2A–mCherry-expressing cells, selected by 2 µg ml⁻¹ puromycin and collected after 6 days to test for mRNA level or CTCF binding by ChIP–qPCR.

RNA-seq

Cells were lysed with TRIzol RNA isolation reagent (Thermo Fisher Scientific). Total RNA was extracted with chloroform and isopropanol precipitation of the upper aqueous phase. Total RNA precipitate was washed once with 75% ethanol, air-dried and resuspended in RNase-free water. Absence of RNA degradation was confirmed by Agilent BioAnalyzer. In total, 500 ng of total RNA was used for poly-A selection of RNA and library construction using the TruSeq Stranded mRNA LT Kit (Illumina, RS-122-2102). Samples were barcoded and run on a HiSeq 4000 in PE50 to generate 40–50 million reads. Ribosomal reads represented 0.75–1.87% of the total reads generated and the percent of mRNA bases averaged 80%.

CAGE-seq

CAGE-seq was performed according to the protocol in ref. ⁶², with modifications to detect 5′ ends of capped RNAs. Five micrograms of total RNA were reverse transcribed using a primer that carried a 15-nt random region and EcoP15I recognition sequence and SuperScriptIII RNase H-reverse transcriptase (Thermo Fisher Scientific) in the presence of trehalose/sorbitol to increase the efficiency of reverse transcription. Diol groups on RNA/cDNA duplex were oxidized and labeled with biotin hydrazide (Vector Laboratories). RNase I (Promega) treatment was performed to cleave single-stranded RNA regions that were not hybridized with first-strand cDNA. tRNA-coated streptavidin beads (Thermo Fisher Scientific) were used to capture biotinylated RNA/cDNA duplex. Sample was washed three times with wash buffer A (4.5 M NaCl, 50 mM EDTA (pH 8.0) and 0.1% Tween-20), once with wash buffer B (10 mM Tris–HCl (pH 8.5), 1 mM EDTA (pH 8.0), 0.5 M NaOAc (pH 6.1) and 0.1% Tween-20) and once with wash buffer C (0.3 M NaCl, 1 mM EDTA (pH 8.0) and 0.1% Tween-20). Single-stranded cDNAs were released from streptavidin beads by heating at 95 °C for 5 min. Released cDNA was treated with RNase H (Thermo Fisher Scientific) and RNase I (Promega). Library was constructed with modified primers for compatible sequencing on the Illumina NextSeq 500 sequencer. Ligation of the barcoded 5′ linker was performed. Second-strand cDNA was synthesized using Deep Vent (exo-) DNA polymerase (NEB) and biotinylated second-strand primer. cDNA was treated with shrimp alkaline phosphatase (Thermo Fisher Scientific) to remove the phosphate group from the 5′ linker. EcoP15I (NEB) was used to cleave 27 nt downstream of the recognition site at the 5′ linker. The 3′ linker was ligated to the 27-nt-long tags, purified with tRNA-coated streptavidin beads and amplified with 2× Phusion high-fidelity PCR mix (NEB). CAGE libraries were treated with exonuclease I (NEB) to degrade single-stranded DNA before sequencing on NextSeq 500 using SR50 to obtain around 40 million reads for each library. All primers used for CAGE-seq are listed in Supplementary Table 6.

EM-seq

High-molecular-weight genomic DNA was extracted using the Puregene cell core kit (Qiagen) and sent to the Memorial Sloan Kettering Cancer Center, Epigenetics Research Innovation Lab for processing. EM-seq was performed with 100 ng of purified genomic DNA using the NEBNext Enzymatic Methyl-seq Kit (NEB, E7120) following the manufacturer’s instructions. After quantification and size distribution assessment using a Qubit Flex fluorometer (Thermo Fisher Scientific) and a TapeStation (Agilent), respectively, the purified libraries were sequenced on an Illumina NovaSeq 6000 to generate around 200 million PE100 reads per library.

CUT&RUN

CUT&RUN was performed using the CUT&RUN assay kit (Cell Signaling Technology) based on the published protocol in ref. ³⁸. Briefly, 0.25 × 10⁶ live cells were immobilized on activated concanavalin A beads and permeabilized with digitonin. Cell-bead suspension was washed with 1× wash buffer supplemented with spermidine and protease inhibitor cocktail and incubated with 1 µg of antibody at 4 °C overnight. The following antibodies were used: anti-H3K4me1 (Thermo Fisher Scientific/Invitrogen, 710795), anti-H3K4me3 (Thermo Fisher Scientific/Invitrogen, PA5-27029), anti-H3K27ac (Abcam, ab4729), anti-H3K27me3 (Cell Signaling Technology, 9733BF), anti-H3K9me3 (Abcam, ab176916), anti-H3K36me3 (Active Motif, 61021) and anti-CTCF (Cell Signaling Technology, 3418F) antibodies. Antibody-targeted digestion of chromatin was performed by incubation of pAG-MNase at 4 °C for 1 h and activation by calcium chloride at 4 °C for 30 min. MNase cleavage was stopped by the addition of 1× stop buffer. Digested chromatin was allowed to diffuse out of cells at 37 °C for 20 min, and DNA was purified by Nucleospin gel and PCR clean-up spin column (Takara Bio). Sequencing libraries were prepared using the KAPA HTP Library Preparation Kit (KAPA Biosystems). Barcoded libraries were run on NovaSeq 6000 in a PE100 run. An average of 25 million reads were generated per sample.

Hi-C

Arima Hi-C kit (Arima Genomics) was used for all Hi-C experiments. Standard input protocol was followed with 3 × 10⁶ cells for each condition. Proximally ligated DNA was fragmented to around 400 bp by the Covaris E220 focused ultrasonicator (peak power = 140 W, duty factor = 10% and cycle per burst = 200) for 70 s. Fragmented DNA was size-selected to have a distribution between ~200 and 600 bp using AMPure XP beads (Beckman Coulter). Approximately 2 µg of size-selected DNA was used for library preparation using the KAPA Hyper Prep Kit (KAPA Biosystems) and TruSeq DNA Unique Dual Indexes (Illumina). The number of PCR cycles used for library amplification was determined by the KAPA library quantification kit for the Illumina platform (KAPA Biosystems). Each Hi-C library was sequenced to obtain 500 million reads PE100 with biological duplicates to achieve a total of 1 billion reads for each condition.

4C-seq

4C-seq was performed as per ref. ⁴⁸ with modifications. In brief, 10 × 10⁶ cells were fixed with 1% formaldehyde for 10 min. Cross-linking was quenched with 200 mM glycine, and cells were lysed with NP-40 lysis buffer (10 mM Tris–HCl (pH 8.0), 10 mM NaCl, 0.2% NP-40 and 1× complete EDTA-free protease inhibitors (Roche)) for 20 min on ice. Nuclei were incubated with 0.5% SDS at 65 °C for 10 min followed by 1.25% Triton X-100 at 37 °C for 15 min. Nuclei were digested with 500 U EcoRI (ALK^ATI viewpoint) or NlaIII (ULK4 intron 31 viewpoint) overnight. qPCR was performed using primers that span the restriction enzyme cut site at the bait versus uncut control region to confirm the efficiency of restriction enzyme digestion to over 90%. Digested chromatin was ligated at 16 °C overnight, reverse cross-linked and purified. For efficient PCR amplification and next-generation sequencing, a second round of restriction enzyme digestion was performed using 10 µg of 3C product and 100 U of DpnII (for both ALK^ATI and ULK4 intron 31 viewpoints) at 37 °C for 5 h. Sample was ligated overnight at 16 °C, ethanol precipitated at −80 °C and purified with a QIAquick PCR purification column (Qiagen). Inverse PCR was performed using 4C_ALKATI_EcoRI or 4C_ULK4int31_NlaIII (reading) and 4C_ALKATI_DpnII or 4C_ULK4int31_DpnII (nonreading) primers with 2× Phusion high-fidelity PCR mix (NEB). Each 4C sample was amplified in four PCR reactions, each containing 200 ng of 4C DNA template for 18 cycles. A second round of PCR was performed to add P5 and P7 adapters as well as index sequences for multiplex sequencing. Double-sided size selection was performed using AMPure XP beads (Beckman Coulter) with 0.5× and 1× of the PCR volume. Sequencing was performed on an Illumina NovaSeq 6000 sequencer PE75. Each sample was sequenced to generate 10 million reads. All primers used for 4C-seq are listed in Supplementary Table 6.

RNA-seq analysis

RNA-seq reads were trimmed and filtered for quality using Trimmomatic (v0.38). Processed reads were then aligned to hg19 using STAR (v2.7.1a). For each GENCODE (v19) gene, reads were counted using featureCounts (v1.6.4). Gene counts were used to perform differential analysis using DESeq2 (v1.30.1). Transcript per million (TPM) normalized bigWig tracks were generated using deepTools (v3.1.10; bamCoverage with ‘--normalizeUsing BPM’) for each replicate. BedGraph files from each replicate, converted from bigWig using bigWigToBedGraph, were combined using bedtools (v2.27.1) unionbedg, and scores were averaged for each condition. The averaged bedGraph files were converted to TPM-averaged bigWig tracks using bedGraphToBigWig (v4). TPM counts were generated with StringTie (v2.1.1) using BAM files and GENCODE (v19) gtf.
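The replicate-averaging step can be pictured with a minimal sketch. It assumes rows shaped like bedtools unionbedg output (chromosome, start, end, then one score column per replicate); the function name and the toy values are hypothetical, and this is an illustration rather than the script used in the study.

```python
def average_unionbedg(lines):
    """Average the per-replicate scores on each interval of unionbedg-style rows."""
    averaged = []
    for line in lines:
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed rows
        chrom, start, end = fields[0], int(fields[1]), int(fields[2])
        scores = [float(s) for s in fields[3:]]  # one score per replicate
        averaged.append((chrom, start, end, sum(scores) / len(scores)))
    return averaged


# Toy input: two intervals, two replicates each (hypothetical values).
rows = ["chr2\t100\t150\t2.0\t4.0", "chr2\t150\t200\t0.0\t1.0"]
averaged_rows = average_unionbedg(rows)
```

Each output tuple corresponds to one line of an averaged bedGraph file that could then be passed to bedGraphToBigWig.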

CAGE-seq analysis

CAGE-seq reads were analyzed based on the Nextflow CAGE-seq analysis pipeline⁶³. Index sequences, EcoP15I recognition site and the first G following the EcoP15I site at the 5′-end from reads were removed using Cutadapt (v2.6). rRNA sequences were filtered out using SortMeRNA (v4.2.0), and filtered reads were aligned to the hg19 genome using STAR (v2.7.1a). At this step, two sets of BAM files were created for each sample: one with uniquely mapped reads (adding --outFilterMultimapNmax 1 to the STAR command) and one with multimapped reads (adding --outFilterMultimapNmax 10 to the STAR command). Unmapped reads from BAM files were removed. BAM files were converted to BED files with the summed up 1 bp unclustered CAGE tags using SAMtools (v1.14) and bamtobed (v2.26.0). We pooled CAGE tags across samples, filtered them by TPM threshold of 0.5 and then clustered them with paraclu⁶⁴ (minValue of 30) to generate a BED file with the clustered CTSSs. Then, paraclu-cut with default settings (remove single-position clusters, clusters longer than 200 bp, clusters with maximum density/baseline density less than 2, and any cluster contained in a larger cluster) was used to filter out clusters. Promoter width was calculated by the distance between CAGE tags for each clustered CTSS with a 200 bp cutoff using CAGEr (v1.33.1). CAGEr analyzed promoter width across all samples by considering both the position and the CAGE tag signal at TSSs along the tag cluster. The width of every tag cluster is calculated based on the cumulative distribution of the CAGE signal along the cluster. CAGE tags were intersected with the clusters identified by paraclu, and a raw count table with columns for each sample and rows for each tag cluster was generated. DESeq2 (v1.30.1) was used to perform differential CAGE expression using the raw count table. CTSS clusters were annotated using R/Bioconductor packages ChIPseeker (v1.26.2) and AnnotationDbi (v1.52) using GENCODE (v19) gene annotations. The annotatePeak function from the ChIPseeker package was used to assign peaks to genomic annotation, including promoter (±1 kb around TSS), exon, 5′ UTR, 3′ UTR, intronic, intergenic and downstream (downstream of gene end), and calculate the distance of a peak to the nearest gene.
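The pooling and TPM filtering of CAGE tags described above can be sketched as follows. This is an illustrative sketch only: it assumes that TPM is computed on the pooled tag counts and that the 0.5 threshold is inclusive, neither of which is stated in the text, and the function names and toy counts are hypothetical.

```python
from collections import Counter


def pooled_tpm(samples):
    """Pool 1-bp CAGE tag counts across samples and convert to tags per million."""
    pooled = Counter()
    for counts in samples:
        pooled.update(counts)  # adds counts position by position
    total = sum(pooled.values())
    if total == 0:
        raise ValueError("no CAGE tags to normalize")
    return {pos: n * 1e6 / total for pos, n in pooled.items()}


def filter_by_tpm(tpm, threshold=0.5):
    """Keep positions whose pooled TPM reaches the threshold (assumed inclusive)."""
    return {pos: value for pos, value in tpm.items() if value >= threshold}


# Toy data: positions are (chromosome, coordinate, strand); counts are hypothetical.
strong = ("chr2", 29446000, "-")
weak = ("chr2", 29500000, "-")
sample_a = {strong: 1_999_999, weak: 1}
sample_b = {strong: 2_000_000}

tpm = pooled_tpm([sample_a, sample_b])
kept = filter_by_tpm(tpm)
```

In this toy case the weak position has a pooled TPM of 0.25 and is dropped before clustering, whereas the strong position is retained.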

Correlation between RNA-seq and CAGE-seq

Spearman correlation was calculated between either log₂(TPM) or log₂(fold change) in RNA-seq for each gene and CAGE-seq for the corresponding promoter CTSS.
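As an illustration of this comparison, the sketch below computes a Spearman correlation as the Pearson correlation of average ranks, using only the Python standard library. The pseudocount in the log₂ transform and the toy TPM values are assumptions made for the example; the text does not state how zero values were handled.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def log2_tpm(tpm, pseudocount=1.0):
    """log2 transform; the pseudocount is an assumption to avoid log2(0)."""
    return [math.log2(value + pseudocount) for value in tpm]


# Hypothetical TPM values for four genes and their promoter CTSSs.
rna_seq = log2_tpm([0.0, 5.2, 40.1, 310.0])
cage_seq = log2_tpm([0.3, 2.9, 55.0, 120.4])
rho = spearman(rna_seq, cage_seq)
```

Because Spearman correlation depends only on ranks, the monotonic log₂ transform does not change the coefficient for TPM values; it matters only for how the data are displayed.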

Repetitive region analysis

Multimapped CTSSs were compared to all the repeatmasker-annotated repetitive elements by the genomic coordinates downloaded from the UCSC browser (hg19). An overlap of at least 1 bp was used to transfer the names of repeat class, family and subfamily to each CTSS.
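The 1-bp overlap rule can be sketched as below, assuming half-open, 0-based BED-style coordinates. The coordinates are made up, the repeat record is a toy example using RepeatMasker-style names, and the function names are hypothetical.

```python
def overlap_bp(a_start, a_end, b_start, b_end):
    """Number of overlapping bases between two half-open intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def annotate_ctss(ctss_list, repeats, min_overlap=1):
    """Attach (class, family, subfamily) of every repeat overlapping each CTSS."""
    annotated = []
    for chrom, start, end in ctss_list:
        hits = [
            (rep["class"], rep["family"], rep["subfamily"])
            for rep in repeats
            if rep["chrom"] == chrom
            and overlap_bp(start, end, rep["start"], rep["end"]) >= min_overlap
        ]
        annotated.append(((chrom, start, end), hits))
    return annotated


# Toy inputs with hypothetical coordinates.
repeats = [
    {"chrom": "chr2", "start": 1000, "end": 1400,
     "class": "LTR", "family": "ERVL-MaLR", "subfamily": "THE1B"},
]
ctss_list = [("chr2", 1399, 1450), ("chr2", 1400, 1450)]
annotations = annotate_ctss(ctss_list, repeats)
```

The first CTSS shares exactly 1 bp with the repeat and inherits its names; the second starts where the repeat ends and is left unannotated.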

CUT&RUN analysis

All CUT&RUN data were processed as described for ChIP–seq to generate BAM files. Replicates were merged using the pysam Python module. The merged BAM files were then converted to bigWig files with deepTools (v3.1.1) bamCoverage (binSize as 10, reads per genomic content normalization and other parameters as default). Density plot was generated using deepTools as in ChIP–seq.

ChromHMM

ChromHMM (v1.24) was used to partition the human genome into 15 states based on the CUT&RUN-seq data from H3K4me1, H3K4me3, H3K27ac, H3K27me3, H3K9me3 and H3K36me3, and ChIP–seq from CTCF and SMC1A. The two biological replicates were analyzed separately. The OverlapEnrichment function was used to determine enrichment of the upregulated and downregulated CTSSs in the 15 states.

CG content analysis

All CTSSs were expanded to 300 bp from the center and then used to count the percentages of C, G and CG in the sequences with both strands considered. Due to the short length of CTSSs, an expansion of 300 bp was made to get an accurate determination of the local CG composition. The analysis was also tested on 100-bp and 500-bp expansions and produced similar results.
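A minimal sketch of this calculation follows. It assumes that the 300-bp expansion means a 300-bp window centered on the CTSS midpoint and that CG is counted as the CpG dinucleotide; both readings are assumptions, as are the function names. Strand handling is shown only through a reverse-complement helper: reverse complementing swaps the C and G percentages and leaves the CpG count unchanged.

```python
def expand_from_center(start, end, width=300):
    """Return a window of `width` bp centered on the midpoint of a CTSS."""
    center = (start + end) // 2
    half = width // 2
    return max(0, center - half), center + half


def reverse_complement(seq):
    """Reverse complement of a DNA sequence (A/C/G/T only)."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def composition(seq):
    """Percentages of C, G and CpG dinucleotides in a sequence."""
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        raise ValueError("sequence must contain at least two bases")
    cpg = sum(1 for i in range(n - 1) if seq[i:i + 2] == "CG")
    return {
        "C": 100.0 * seq.count("C") / n,
        "G": 100.0 * seq.count("G") / n,
        "CG": 100.0 * cpg / (n - 1),  # fraction of dinucleotide positions
    }


window = expand_from_center(1000, 1010)  # hypothetical CTSS coordinates
toy = composition("ACGCGT")              # toy sequence, not genomic data
```

Changing `width` to 100 or 500 reproduces the alternative expansions mentioned above.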

EM-seq and DNA methylation analysis

EM-sequencing reads were processed using the Bismark pipeline⁶⁵. Raw reads with low quality (<20) and adapter sequences were removed by Trim Galore (v0.6.4). C(G) in the trimmed sequences was converted to T(A) and mapped to similarly converted reference human genome (hg19), lambda and pUC19 separately using default Bowtie2 settings implemented in Bismark. Duplicated reads were discarded. The remaining alignments were then used for cytosine methylation calling by the Bismark methylation extractor. Batch effects were adjusted using limma (v3.46.0). Any CpGs with coverage less than ten were removed before downstream analyses. After filtering, 969,316 CpG loci were recovered with an average coverage of 19× across all samples.
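The coverage filter can be illustrated with a short sketch. It assumes per-CpG counts of methylated and unmethylated reads (as a methylation caller would report) and treats a coverage of exactly ten as passing; the data structure and function name are hypothetical.

```python
def filter_cpgs(calls, min_coverage=10):
    """Keep CpGs with enough coverage and return their methylation fraction.

    `calls` maps a locus to (methylated_reads, unmethylated_reads).
    """
    kept = {}
    for locus, (methylated, unmethylated) in calls.items():
        coverage = methylated + unmethylated
        if coverage >= min_coverage:  # CpGs with coverage below ten are dropped
            kept[locus] = methylated / coverage
    return kept


# Toy calls with hypothetical loci and read counts.
calls = {
    ("chr2", 1000): (8, 4),   # coverage 12, retained
    ("chr2", 2000): (3, 2),   # coverage 5, removed
    ("chr2", 3000): (0, 10),  # coverage 10, retained and unmethylated
}
methylation = filter_cpgs(calls)
```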

HOMER motif analysis

HOMER motif analysis was performed to discover both de novo and known motifs using the ‘findMotifsGenome.pl’ script (v4.11) from the HOMER software suite. The analysis was configured with a size parameter of 200, using hg19 genome sequences. CTSS was used as the background.

4C-seq analysis

R1 was used for analysis. From the single-end reads, nonreading primers at the 3′ end were trimmed using Cutadapt (v3.7). A maximum error rate of 0.15 was allowed, and trimmed reads shorter than 40 bp were discarded. The 5′ end of the trimmed reads was further trimmed until the primary restriction enzyme site (RE1; EcoRI or NlaIII) using R. Only reads containing RE1 next to the viewpoint were considered for downstream analysis. Bowtie2 (v2.4.5) was used for aligning the trimmed reads to the human genome (hg19), while SAMtools (v1.14; parameter, -q 1 -bSu) was used to filter for high-quality uniquely mapped reads. Fragment maps of the hg19 genome were generated for each restriction enzyme pair. Blind fragments (which have RE1 on both ends) were excluded from the analysis. Aligned reads were assigned to overlapping fragment ends. Fragment read counts were depth normalized to per million reads on the cis chromosome where the viewpoint is located. The top three fragments with the highest read counts corresponding to the undigested and self-ligated fragments (usually the ones closest to the viewpoint) were excluded when calculating depth normalization. For smoothing, a running mean across the fragment read counts was calculated using a sliding window of 21 fragments (averaging ten fragments left and right of the fragment of interest). For statistical analysis, mean normalized read counts were calculated for 25 kb (ALK intron 19) or 10 kb (ULK4 intron 31) bins with a 1 kb shift. With an assumption that the measured read counts follow a normal distribution for each bin, a two-tailed paired t test was done for each replicate to call significantly differential bins between conditions using a P-value cutoff of 0.05. All 4C-seq data were visualized using the Gviz (v1.41.1)⁶⁶ and ggplot2 (v3.4.4) packages in R.
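The depth normalization and smoothing steps can be sketched as follows. This is an illustration under stated assumptions: the top three fragments are excluded only from the scaling total (they are still reported), and the running mean truncates its window at the ends of the fragment list; the text specifies neither detail, and the function names are hypothetical.

```python
def depth_normalize(counts, n_exclude=3):
    """Scale cis fragment counts to reads per million.

    The `n_exclude` highest counts (undigested and self-ligated fragments)
    are left out of the total used for scaling.
    """
    top = sorted(counts, reverse=True)[:n_exclude]
    total = sum(counts) - sum(top)
    if total <= 0:
        raise ValueError("no reads left after excluding top fragments")
    return [c * 1e6 / total for c in counts]


def running_mean(values, window=21):
    """Centered running mean; the window shrinks at the edges (an assumption)."""
    half = window // 2  # ten fragments on each side for a window of 21
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


# Toy fragment counts; the first three mimic viewpoint-proximal fragments.
counts = [1000, 900, 800, 10, 20, 30, 40]
normalized = depth_normalize(counts)
smoothed = running_mean(normalized, window=3)
```

With these toy counts, the scaling total is 100 reads, so a fragment with 10 reads is reported as 100,000 reads per million.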

FunSeq2 analysis

Normalized RNA-seq for TCGA–SKCM samples was downloaded from the National Cancer Institute Genomic Data Commons (NCI GDC) Data Portal. ALK expression was ranked, and the top ALK-expressing tumors were further examined (RNA-seq by expectation-maximization (RSEM) > 210). Tumors with ALK amplification were excluded (n = 3). Exon quantification data for ALK were analyzed, and high ALK^ATI-expressing tumors were defined as those with more than tenfold higher expression in ALK exons 20–29 than in exons 1–19 (ref. ³²). The top 50 cases of high ALK-expressing tumors that met these criteria were assigned as high ALK^ATI-expressing tumors and compared to the bottom 50 cases with low/no ALK^ATI expression. FunSeq2 (ref. ⁴⁹) analysis for NIPBL was run on default settings, and each mutation was assigned a FunSeq2 score of 1–5, with higher scores prioritizing somatic alterations with functional significance. A composite FunSeq2 score for NIPBL was calculated by combining the FunSeq2 scores for all NIPBL mutations in the same tumor sample. Tumor samples with no NIPBL mutation were assigned a FunSeq2 score of 0.
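The two sample-level definitions above can be sketched as follows. Two choices here are assumptions rather than details given in the text: exon expression is summarized as the mean over each exon group, and the composite FunSeq2 score is taken as the sum of per-mutation scores. Function names and toy values are hypothetical.

```python
def is_high_alk_ati(exon_expr, fold=10.0):
    """Flag a tumor when exons 20-29 exceed exons 1-19 by more than `fold`.

    `exon_expr` maps ALK exon number (1-29) to a normalized expression value.
    Group means are used here as an assumption.
    """
    early = sum(exon_expr[e] for e in range(1, 20)) / 19
    late = sum(exon_expr[e] for e in range(20, 30)) / 10
    if early == 0:
        return late > 0
    return late / early > fold


def composite_funseq2(mutation_scores):
    """Combine per-mutation scores for one tumor; an empty list gives 0."""
    return sum(mutation_scores)


# Toy tumors with hypothetical exon-level expression.
ati_like = {e: (1.0 if e < 20 else 20.0) for e in range(1, 30)}
full_length = {e: (1.0 if e < 20 else 5.0) for e in range(1, 30)}

flags = (is_high_alk_ati(ati_like), is_high_alk_ati(full_length))
scores = (composite_funseq2([2, 3]), composite_funseq2([]))
```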

Hi-C and HiChIP analysis

Generation of valid paired-end reads

Hi-C sequencing files were processed with the HiC-Pro pipeline (v3.0.0). In silico digestion of the hg19 genome by ‘Arima’ restriction enzymes was performed with the ‘digest_genome.py’ tool from HiC-Pro, and bed files generated were used for assigning mapped reads to DNA fragments. All aligned, filtered and deduplicated read pairs from each replicate were used to calculate cis valid paired-end read distance after normalizing for sequencing depth in R.

Preprocessing of Hi-C and HiChIP data

Hi-C or HiChIP reads were aligned to the hg19 genome using the Juicer pipeline (v1.7.6) with default settings. MAPQ30 threshold .hic files were used for the subsequent analyses.

Calling significant and differential loops

HiC-DC+⁶⁷ was run on both pooled and individual replicates of Hi-C and HiChIP data to find significant interactions at 5, 10 or 25 kb resolutions. TSS interactions were defined as the ones with one anchor overlapping a promoter. HiC-DC+ differential interaction calling (function hicdcdiff) was also performed at different resolutions for pairwise comparison. HiC-DC+ normalized scores (observed over expected counts) were used to show Hi-C interactions in plots. To generate H3K27ac HiChIP hotspots, virtual 4C plots were calculated as the maximum −log₁₀(P) of the interactions of regions with the ALK^ATI altP in intron 19 as anchor.
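The virtual 4C profile described above can be sketched as below. The sketch assumes a list of (bin, bin, P value) interactions with P values greater than zero, and it keeps, for every bin contacting the anchor, the maximum −log₁₀(P). The input layout and function name are hypothetical and do not reproduce the HiC-DC+ output format.

```python
import math


def virtual_4c(interactions, anchor_bin):
    """Maximum -log10(P) per partner bin among interactions involving the anchor."""
    profile = {}
    for bin_a, bin_b, p_value in interactions:
        if anchor_bin not in (bin_a, bin_b):
            continue  # interaction does not involve the viewpoint
        partner = bin_b if bin_a == anchor_bin else bin_a
        score = -math.log10(p_value)
        if partner not in profile or score > profile[partner]:
            profile[partner] = score
    return profile


# Toy interactions between bin indices (hypothetical P values).
interactions = [
    (10, 12, 1e-5),
    (12, 10, 1e-3),   # same pair at a weaker P value
    (10, 15, 1e-2),
    (20, 21, 1e-8),   # unrelated to the anchor
]
profile = virtual_4c(interactions, anchor_bin=10)
```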

TAD annotations

TADs were annotated using OnTAD (v1.3)²³ on ICE-normalized and MAPQ30 threshold Hi-C data at 10 kb resolution. A penalty score of 0.1 and a maximum TAD size of 2 Mbp were set while running OnTAD. Singleton TADs have no inner sub-TADs. TADs with nested sub-TADs were assigned a hierarchical TAD level according to the highest level of sub-TADs contained within a nested TAD.
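One way to picture the hierarchy assignment is the sketch below, which computes a nesting level for each TAD from interval containment alone. The numbering is an assumption (singleton TADs are given level 0, and each additional layer of nested sub-TADs adds 1), and the code works on plain (start, end) tuples rather than OnTAD output.

```python
def tad_levels(tads):
    """Assign each TAD the depth of the deepest chain of sub-TADs nested in it."""

    def contains(outer, inner):
        return outer != inner and outer[0] <= inner[0] and inner[1] <= outer[1]

    levels = {}
    # Shorter TADs first, so every sub-TAD is scored before its parents.
    for tad in sorted(set(tads), key=lambda t: t[1] - t[0]):
        nested = [levels[t] for t in levels if contains(tad, t)]
        levels[tad] = 1 + max(nested) if nested else 0
    return levels


# Toy TADs in arbitrary coordinates.
tads = [(0, 100), (0, 40), (50, 100), (10, 30), (200, 300)]
levels = tad_levels(tads)
```

Here the TAD spanning 0 to 100 contains a sub-TAD that itself contains another, so it receives level 2, whereas the isolated TAD spanning 200 to 300 is a singleton.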

Each TAD was annotated with the overlapping CAGE-seq CTSSs. The relative CTSS distance to TAD boundaries was calculated by dividing the distance of the CTSS to the nearest TAD boundary by the length of the respective TAD.
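The relative distance defined above can be written out as a short sketch. The function name and coordinates are hypothetical; the calculation follows the stated definition, so values range from 0 (at a boundary) to 0.5 (at the TAD center).

```python
def relative_boundary_distance(ctss_pos, tad_start, tad_end):
    """Distance from a CTSS to its nearest TAD boundary, divided by TAD length."""
    if not tad_start <= ctss_pos <= tad_end:
        raise ValueError("CTSS lies outside the TAD")
    tad_length = tad_end - tad_start
    nearest = min(ctss_pos - tad_start, tad_end - ctss_pos)
    return nearest / tad_length


# Toy example: a 100-kb TAD with CTSSs near the boundary and at the center.
near_boundary = relative_boundary_distance(110_000, 100_000, 200_000)
at_center = relative_boundary_distance(150_000, 100_000, 200_000)
```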

The insulation score for each TAD was obtained from OnTAD outputs.

Statistics and reproducibility

Experiments were reproduced two to four times as indicated in the figure legends. No blinding, randomization or predetermined sample size calculations were used. Three tumors with ALK amplification in the TCGA–SKCM dataset were excluded from FunSeq2 analysis. Data distributions were assumed to be normal, but this was not formally tested. All statistical tests were performed using GraphPad Prism (9.2.0) and are detailed in the figure legends. Significance levels are defined as follows: not significant, P ≥ 0.05; *P < 0.05; **P < 0.01; ***P < 0.001; and ****P < 0.0001.
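For reference, the significance thresholds can be expressed as a small lookup. This is only a restatement of the cutoffs listed above; the function name and the "ns" label for nonsignificant results are hypothetical.

```python
def significance_label(p_value):
    """Map a P value to the asterisk notation defined in this section."""
    if p_value < 0.0001:
        return "****"
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return "ns"  # not significant, P >= 0.05


labels = [significance_label(p) for p in (0.2, 0.03, 0.004, 0.0005, 0.00002)]
```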

Additional experimental details are available in Supplementary Note 10.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

All next-generation sequencing data were deposited at the Gene Expression Omnibus under accession code GSE232303. Source data are provided with this paper.

Code availability

No custom code package or newly developed algorithm was generated in this study. All analysis was performed using standard settings unless otherwise described in Methods and Supplementary Note 10.

References

Fueyo, R., Judd, J., Feschotte, C. & Wysocka, J. Roles of transposable elements in the regulation of mammalian transcription. Nat. Rev. Mol. Cell Biol. 23, 481–497 (2022).
Hoyt, S. J. et al. From telomere to telomere: the transcriptional and epigenetic state of human repeat elements. Science 376, eabk3112 (2022).
Bourque, G. et al. Ten things you should know about transposable elements. Genome Biol. 19, 199 (2018).
Senft, A. D. & Macfarlan, T. S. Transposable elements shape the evolution of mammalian development. Nat. Rev. Genet. 22, 691–711 (2021).
Erwin, J. A., Marchetto, M. C. & Gage, F. H. Mobile DNA elements in the generation of diversity and complexity in the brain. Nat. Rev. Neurosci. 15, 497–506 (2014).
Faulkner, G. J. & Garcia-Perez, J. L. L1 mosaicism in mammals: extent, effects, and evolution. Trends Genet. 33, 802–816 (2017).
Deniz, O., Frost, J. M. & Branco, M. R. Regulation of transposable elements by DNA modifications. Nat. Rev. Genet. 20, 417–431 (2019).
Deleris, A., Berger, F. & Duharcourt, S. Role of Polycomb in the control of transposable elements. Trends Genet. 37, 882–889 (2021).
Shalgi, R., Pilpel, Y. & Oren, M. Repression of transposable-elements—a microRNA anti-cancer defense mechanism? Trends Genet. 26, 253–259 (2010).
Haws, S. A., Simandi, Z., Barnett, R. J. & Phillips-Cremins, J. E. 3D genome, on repeat: higher-order folding principles of the heterochromatinized repetitive genome. Cell 185, 2690–2707 (2022).
Walter, M., Teissandier, A., Perez-Palacios, R. & Bourc’his, D. An epigenetic switch ensures transposon repression upon dynamic loss of DNA methylation in embryonic stem cells. eLife 5, e11418 (2016).
Burns, K. H. Transposable elements in cancer. Nat. Rev. Cancer 17, 415–424 (2017).
Chuong, E. B., Elde, N. C. & Feschotte, C. Regulatory activities of transposable elements: from conflicts to benefits. Nat. Rev. Genet. 18, 71–86 (2017).
Chiappinelli, K. B. et al. Inhibiting DNA methylation causes an interferon response in cancer via dsRNA including endogenous retroviruses. Cell 162, 974–986 (2015).
Roulois, D. et al. DNA-demethylating agents target colorectal cancer cells by inducing viral mimicry by endogenous transcripts. Cell 162, 961–973 (2015).
Deblois, G. et al. Epigenetic switch-induced viral mimicry evasion in chemotherapy-resistant breast cancer. Cancer Discov. 10, 1312–1329 (2020).
Chen, R., Ishak, C. A. & De Carvalho, D. D. Endogenous retroelements and the viral mimicry response in cancer therapy and cellular homeostasis. Cancer Discov. 11, 2707–2725 (2021).
Jang, H. S. et al. Transposable elements drive widespread expression of oncogenes in human cancers. Nat. Genet. 51, 611–617 (2019).
Grillo, G. et al. Transposable elements are co-opted as oncogenic regulatory elements by lineage-specific transcription factors in prostate cancer. Cancer Discov. 13, 2470–2487 (2023).
Thompson, P. J., Macfarlan, T. S. & Lorincz, M. C. Long terminal repeats: from parasitic elements to building blocks of the transcriptional regulatory repertoire. Mol. Cell 62, 766–776 (2016).
Rowley, M. J. & Corces, V. G. Organizational principles of 3D genome architecture. Nat. Rev. Genet. 19, 789–800 (2018).
Kempfer, R. & Pombo, A. Methods for mapping 3D chromosome architecture. Nat. Rev. Genet. 21, 207–226 (2020).
An, L. et al. OnTAD: hierarchical domain structure reveals the divergence of activity among TADs and boundaries. Genome Biol. 20, 282 (2019).
Schwarzer, W. et al. Two independent modes of chromatin organization revealed by cohesin removal. Nature 551, 51–56 (2017).
Kline, A. D. et al. Diagnosis and management of Cornelia de Lange syndrome: first international consensus statement. Nat. Rev. Genet. 19, 649–666 (2018).
Bailey, M. H. et al. Comprehensive characterization of cancer driver genes and mutations. Cell 173, 371–385 e318 (2018).
Choudhary, M. N. K., Quaid, K., Xing, X., Schmidt, H. & Wang, T. Widespread contribution of transposable elements to the rewiring of mammalian 3D genomes. Nat. Commun. 14, 634 (2023).
Diehl, A. G., Ouyang, N. & Boyle, A. P. Transposable elements contribute to cell and species-specific chromatin looping and gene regulation in mammalian genomes. Nat. Commun. 11, 1796 (2020).
Adiconis, X. et al. Comprehensive comparative analysis of 5′-end RNA-sequencing methods. Nat. Methods 15, 505–511 (2018).
Newkirk, D. A. et al. The effect of Nipped-B-like (Nipbl) haploinsufficiency on genome-wide cohesin binding and target gene expression: modeling Cornelia de Lange syndrome. Clin. Epigenetics 9, 89 (2017).
Rao, S. S. P. et al. Cohesin loss eliminates all loop domains. Cell 171, 305–320 (2017).
Wiesner, T. et al. Alternative transcription initiation leads to expression of a novel ALK isoform in cancer. Nature 526, 453–457 (2015).
Gilbert, L. A. et al. CRISPR-mediated modular RNA-guided regulation of transcription in eukaryotes. Cell 154, 442–451 (2013).
The FANTOM Consortium and the RIKEN PMI and CLST (DGT). A promoter-level mammalian expression atlas. Nature 507, 462–470 (2014).
Carninci, P. et al. Genome-wide analysis of mammalian promoter architecture and evolution. Nat. Genet. 38, 626–635 (2006).
Bergman, Y. & Cedar, H. DNA methylation dynamics in health and disease. Nat. Struct. Mol. Biol. 20, 274–281 (2013).
Anastasiadi, D., Esteve-Codina, A. & Piferrer, F. Consistent inverse correlation between DNA methylation of the first intron and gene expression across tissues and species. Epigenetics Chromatin 11, 37 (2018).
Skene, P. J. & Henikoff, S. An efficient targeted nuclease strategy for high-resolution mapping of DNA binding sites. eLife 6, e21856 (2017).
Ernst, J. & Kellis, M. Chromatin-state discovery and genome annotation with ChromHMM. Nat. Protoc. 12, 2478–2492 (2017).
Oh, S. et al. Enhancer release and retargeting activates disease-susceptibility genes. Nature 595, 735–740 (2021).
Goding, C. R. & Arnheiter, H. MITF-the first 25 years. Genes Dev. 33, 983–1007 (2019).
Dixon, J. R., Gorkin, D. U. & Ren, B. Chromatin domains: the unit of chromosome organization. Mol. Cell 62, 668–680 (2016).
Article CAS PubMed PubMed Central Google Scholar
Hsieh, T. S. et al. Resolving the 3D landscape of transcription-linked mammalian chromatin folding. Mol. Cell 78, 539–553 (2020).
Article CAS PubMed PubMed Central Google Scholar
Beagan, J. A. & Phillips-Cremins, J. E. On the existence and functionality of topologically associating domains. Nat. Genet. 52, 8–16 (2020).
Article CAS PubMed PubMed Central Google Scholar
de Wit, E. & Nora, E. P. New insights into genome folding by loop extrusion from inducible degron technologies. Nat. Rev. Genet. 24, 73–85 (2022).
Article PubMed Google Scholar
Crane, E. et al. Condensin-driven remodelling of X chromosome topology during dosage compensation. Nature 523, 240–244 (2015).
Article CAS PubMed PubMed Central Google Scholar
Tarjan, D. R., Flavahan, W. A. & Bernstein, B. E. Epigenome editing strategies for the functional annotation of CTCF insulators. Nat. Commun. 10, 4258 (2019).
Article PubMed PubMed Central Google Scholar
Krijger, P. H. L., Geeven, G., Bianchi, V., Hilvering, C. R. E. & de Laat, W. 4C-seq from beginning to end: a detailed protocol for sample preparation and data analysis. Methods 170, 17–32 (2020).
Article CAS PubMed Google Scholar
Fu, Y. et al. FunSeq2: a framework for prioritizing noncoding regulatory variants in cancer. Genome Biol. 15, 480 (2014).
Article PubMed PubMed Central Google Scholar
Demircioglu, D. et al. A Pan-cancer transcriptome analysis reveals pervasive regulation through alternative promoters. Cell 178, 1465–1477 (2019).
Article CAS PubMed Google Scholar
Davuluri, R. V., Suzuki, Y., Sugano, S., Plass, C. & Huang, T. H. The functional consequences of alternative promoter use in mammalian genomes. Trends Genet. 24, 167–177 (2008).
Article CAS PubMed Google Scholar
Huang, K. K. et al. Long-read transcriptome sequencing reveals abundant promoter diversity in distinct molecular subtypes of gastric cancer. Genome Biol. 22, 44 (2021).
Article CAS PubMed PubMed Central Google Scholar
Kunarso, G. et al. Transposable elements have rewired the core regulatory network of human embryonic stem cells. Nat. Genet. 42, 631–634 (2010).
Article CAS PubMed Google Scholar
Gomez, N. C. et al. Widespread chromatin accessibility at repetitive elements links stem cells with human cancer. Cell Rep. 17, 1607–1620 (2016).
Article CAS PubMed PubMed Central Google Scholar
Sundaram, V. et al. Widespread contribution of transposable elements to the innovation of gene regulatory networks. Genome Res. 24, 1963–1976 (2014).
Article CAS PubMed PubMed Central Google Scholar
Tome, J. M., Tippens, N. D. & Lis, J. T. Single-molecule nascent RNA sequencing identifies regulatory domain architecture at promoters and enhancers. Nat. Genet. 50, 1533–1541 (2018).
Article CAS PubMed PubMed Central Google Scholar
Core, L. J. et al. Analysis of nascent RNA identifies a unified architecture of initiation regions at mammalian promoters and enhancers. Nat. Genet. 46, 1311–1320 (2014).
Article CAS PubMed PubMed Central Google Scholar
Zuin, J. et al. Nonlinear control of transcription through enhancer–promoter interactions. Nature 604, 571–577 (2022).
Article CAS PubMed PubMed Central Google Scholar
Fulco, C. P. et al. Activity-by-contact model of enhancer–promoter regulation from thousands of CRISPR perturbations. Nat. Genet. 51, 1664–1669 (2019).
Article CAS PubMed PubMed Central Google Scholar
Concordet, J. P. & Haeussler, M. CRISPOR: intuitive guide selection for CRISPR/Cas9 genome editing experiments and screens. Nucleic Acids Res. 46, W242–W245 (2018).
Article CAS PubMed PubMed Central Google Scholar
Chen, B. et al. Dynamic imaging of genomic loci in living human cells by an optimized CRISPR/Cas system. Cell 155, 1479–1491 (2013).
Article CAS PubMed PubMed Central Google Scholar
Takahashi, H., Lassmann, T., Murata, M. & Carninci, P. 5′ end-centered expression profiling using cap-analysis gene expression and next-generation sequencing. Nat. Protoc. 7, 542–561 (2012).
Article CAS PubMed PubMed Central Google Scholar
Ewels, P. A. et al. The nf-core framework for community-curated bioinformatics pipelines. Nat. Biotechnol. 38, 276–278 (2020).
Article CAS PubMed Google Scholar
Frith, M. C. et al. A code for transcription initiation in mammalian genomes. Genome Res. 18, 1–12 (2008).
Article CAS PubMed PubMed Central Google Scholar
Krueger, F. & Andrews, S. R. Bismark: a flexible aligner and methylation caller for Bisulfite-Seq applications. Bioinformatics 27, 1571–1572 (2011).
Article CAS PubMed PubMed Central Google Scholar
Hahne, F. & Ivanek, R. Visualizing genomic data using Gviz and Bioconductor. Methods Mol. Biol. 1418, 335–351 (2016).
Article PubMed Google Scholar
Sahin, M. et al. HiC-DC+ enables systematic 3D interaction calls and differential analysis for Hi-C and HiChIP. Nat. Commun. 12, 3366 (2021).
Article CAS PubMed PubMed Central Google Scholar

Download references

Acknowledgements

We thank J. Li, N. Mohibullah, C. Cobbs and A. Viale from the Memorial Sloan Kettering Cancer Center Integrated Genomics Operation (IGO) for their excellent technical support, and C. Liu from Memorial Sloan Kettering Cancer Center for collecting and extracting RNA from patient-derived melanoma cell lines to screen for ALK^ATI-expressing cell lines. This work was supported by grants from the National Institutes of Health (NIH)/NCI (R01 CA228216, DP2 CA174499, R01 CA280657, U01 CA252048 and P50 CA217694), the Department of Defense (W81XWH-15-1-0124 and W81XWH-22-1-0326), Francis Collins Scholar Neurofibromatosis Therapeutic Acceleration Program, the Geoffrey Beene Cancer Research Fund and Cycle for Survival Linn Family Discovery Fund (to P.C.); and by NIH/NCI grants (R01 CA208100-04, U54 CA224079-03, P50 CA092629-20 and R01 CA265026 to Y.C.). The IGO core was funded by the NCI Cancer Center Support Grant (P30 CA08748), the Cycle for Survival and the Marie-Josée and Henry R. Kravis Center for Molecular Oncology.

Author information

Thomas Wiesner
Present address: Department of Dermatology, Medical University of Vienna, Vienna, Austria
Thomas Wiesner
Present address: Department of Pathology, Medical University of Vienna, Vienna, Austria
Andrew Rosowicz
Present address: Icahn School of Medicine at Mount Sinai, New York City, NY, USA
Christina Liu
Present address: Yale University School of Medicine, New Haven, CT, USA
Taha Merghoub
Present address: Department of Pharmacology and Edward Meyer Cancer Center and Ludwig Collaborative and Swim Across America Laboratory, Weill Cornell Medicine, New York City, NY, USA
These authors contributed equally: Elissa W. P. Wong, Merve Sahin.

Authors and Affiliations

Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York City, NY, USA
Elissa W. P. Wong, Dan Li, Fanny Tomas, Cindy J. Lee, Thomas Wiesner, Andrew Rosowicz, Christina Liu, Taha Merghoub, Yu Chen & Ping Chi
Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York City, NY, USA
Merve Sahin, Rui Yang & Christina S. Leslie
Tri-Institutional Training Program in Computational Biology and Medicine, New York City, NY, USA
Merve Sahin & Rui Yang
Weill Cornell Graduate School of Medical Science, Weill Cornell Medicine, New York City, NY, USA
UkJin Lee, Yu Chen & Ping Chi
Center for Epigenetics Research, Memorial Sloan Kettering Cancer Center, New York City, NY, USA
Yingqian A. Zhan, Morgan Lallo, Pierre-Jacques Hamard & Richard Koche
Department of Genetics, Albert Einstein College of Medicine, Bronx, NY, USA
Rohan Misra & Deyou Zheng
Sandra and Edward Meyer Cancer Center, Weill Cornell Medicine, New York City, NY, USA
Nawaf Alomran, Alexander Polyzos, Tuan Trieu, Alexander Martinez-Fundichely, Ekta Khurana & Effie Apostolou
Institute for Computational Biomedicine, Weill Cornell Medicine, New York City, NY, USA
Nawaf Alomran, Tuan Trieu, Alexander Martinez-Fundichely & Ekta Khurana
Department of Physiology and Biophysics, Weill Cornell Medicine, New York City, NY, USA
Nawaf Alomran, Tuan Trieu, Alexander Martinez-Fundichely & Ekta Khurana
Englander Institute for Precision Medicine, Weill Cornell Medicine, New York City, NY, USA
Nawaf Alomran, Tuan Trieu, Alexander Martinez-Fundichely & Ekta Khurana
Sanford I. Weill Department of Medicine, Weill Cornell Medical College, New York City, NY, USA
Alexander Polyzos, Alexander N. Shoushtari, Taha Merghoub, Effie Apostolou, Yu Chen & Ping Chi
Molecular Pharmacology Program, Sloan Kettering Institute, Memorial Sloan Kettering Cancer Center, New York City, NY, USA
Shuyuan Cheng
Department of Medicine, Memorial Sloan Kettering Cancer Center, New York City, NY, USA
Alexander N. Shoushtari, Yu Chen & Ping Chi
Parker Institute for Cancer Immunotherapy, Memorial Sloan Kettering Cancer Center, New York City, NY, USA
Taha Merghoub
The Saul R. Korey Department of Neurology, Albert Einstein College of Medicine, Bronx, NY, USA
Deyou Zheng
Dominick P. Purpura Department of Neuroscience, Albert Einstein College of Medicine, Bronx, NY, USA
Deyou Zheng
Louis V. Gerstner, Jr. Graduate School of Biomedical Sciences, Memorial Sloan Kettering Cancer Center, New York City, NY, USA
Yu Chen & Ping Chi

Contributions

E.W.P.W., M.S., Y.C., C.S.L. and P.C. conceptualized the project. E.W.P.W., C.J.L., T.W., A.N.S. and T.M. arranged the resources. E.W.P.W., M.S., R.Y., U.L., D.L., F.T., N.A., T.T., A.M.-F., A.R., S.C., C.L., M.L. and D.Z. curated the data. E.W.P.W., M.S., R.Y., U.L., D.L., Y.A.Z., N.A., A.P. and D.Z. carried out formal analysis. E.W.P.W. and F.T. performed validation. E.W.P.W., M.S., R.Y., U.L., D.L., Y.A.Z., R.M., N.A., A.P. and D.Z. handled visualization. E.W.P.W. and M.S. conducted the investigation. E.W.P.W. and M.S. managed project administration. M.S. and N.A. conducted software analysis. E.W.P.W., M.S. and P.C. wrote the original draft of the paper. E.W.P.W., M.S., P.C., R.Y., U.L., D.L., Y.A.Z., R.M., F.T., N.A., A.P., C.J.L., T.T., A.M.-F., T.W., A.R., S.C., C.L., M.L., A.N.S., T.M., P.-J.H., R.K., E.K., E.A., D.Z., Y.C. and C.S.L. did the writing, reviewing and editing of the paper. P.-J.H., R.K., E.K., E.A., D.Z., Y.C., C.S.L. and P.C. provided supervision. Y.C. and P.C. secured funding.

Corresponding authors

Correspondence to Yu Chen, Christina S. Leslie or Ping Chi.

Ethics declarations

Competing interests

Y.C. reports other support from Oric Pharmaceuticals and grants from Foghorn outside the submitted work. P.C. reports personal fees from Deciphera and Ningbo NewBay, grants and institutional support from Deciphera, Pfizer/Array and Ningbo NewBay, and personal fees from Zai Lab and Novartis outside the submitted work. The other authors declare no competing interests.

Peer review

Peer review information

Nature Genetics thanks the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Efficiency of NIPBL downregulation by dox-inducible shRNAs and their effect on cohesin complex chromatin distribution.

a, Relative NIPBL mRNA levels by dox-inducible shRNA-mediated downregulation (shNIPBL_1 and shNIPBL_2) in various melanoma cell lines, 501mel, COLO800 and A375. Error bars are mean ± s.d., n = 4 biological replicates. ***P < 0.001, ****P < 0.0001, ordinary one-way ANOVA, Dunnett’s multiple comparisons test. b, Immunoblots of SMC1A and loading control actin (ACTB) in whole cell lysate with shNIPBL knockdown. c, Representative density plots of normalized genome-wide ChIP–seq profile of SMC1A, demonstrating that partial loss of NIPBL resulted in the reduction of the cohesin complex core subunit, SMC1A, on chromatin in 501mel cells. Data from one representative biological replicate (n = 3) are shown. d–f, Two-sided Spearman correlation of annotated transcripts’ expression between whole transcriptome analysis by poly-A RNA-seq and CTSS analysis by CAGE-seq under shLuc (control; d), shNIPBL_1 (e) and shNIPBL_2 (f) NIPBL perturbation conditions.

Extended Data Fig. 2 Loss of NIPBL leads to altP activation in intron and distal intergenic regions.

a, Relative NIPBL mRNA levels by dox-inducible shRNA-mediated downregulation (shNIPBL_4 and shNIPBL_5) in 501mel melanoma cells. Error bars are mean ± s.d., n = 4 biological replicates. ****P < 0.0001, ordinary one-way ANOVA, Dunnett’s multiple comparisons test. b, Volcano plots of differentially regulated genes (FDR < 0.05) by whole transcriptome analysis of shRNA-mediated NIPBL perturbation. Upregulated genes (red, log₂(fold change) > 1), downregulated genes (blue, log₂(fold change) < −1) and genes of interest (green). n = 2 biological replicates. c, Volcano plots of significantly altered TSSs (CTSSs, FDR < 0.1, log₂(fold change) > 1) by CAGE-seq under NIPBL perturbation conditions as in a. n = 2 biological replicates. d, Two-sided Spearman correlation of transcriptome changes (log₂(fold change)) of annotated transcripts between whole transcriptome analysis by poly-A RNA-seq and CTSS analysis by CAGE-seq under NIPBL perturbation conditions (P < 1 × 10⁻¹⁶ for both shNIPBL conditions). e,f, Venn diagrams showing overlapping upregulated/downregulated genes (e) and CTSSs (f) among the four hairpins, with a highly significant overlap of shNIPBL_1, shNIPBL_4 or shNIPBL_5 with shNIPBL_2. Hypergeometric tests were performed to examine the significance of overlapping differentially expressed genes (RNA-seq) or CTSSs (CAGE-seq) in comparison to genomic background. g,h, log₂(fold change) of upregulated/downregulated genes (g) and CTSSs (h) by shNIPBL_1, shNIPBL_4 or shNIPBL_5 was highly correlated with that by shNIPBL_2. i, Genomic annotation of all CTSSs by CAGE-seq under control condition. Percentage of each genomic feature is indicated. j, Distribution of genomic features of significantly (FDR < 0.1) upregulated (log₂(fold change) > 1) and downregulated (log₂(fold change) < −1) CTSSs with shRNA-mediated NIPBL downregulation. k, Representative examples of CAGE-seq and RNA-seq profiles of significantly upregulated CTSSs in intron and distal intergenic regions with NIPBL downregulation. Normalized CAGE-seq (blue, plus strand; red, minus strand) and RNA-seq profiles from two independent experiments are shown. New spliced transcripts initiated from respective CTSSs are shaded in pink. The presence of repetitive elements is indicated by the ‘repeatmasker’ track from the UCSC genome browser (hg19 (GRCh37) genomic version). Enlarged genomic regions are shown in dotted inset.
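
The hypergeometric overlap test named in panels e,f can be illustrated with a short sketch. This is not the authors' code: the function name and the counts used in the usage note are hypothetical, and the background is assumed to be the total number of genes (or CTSSs) tested.

```python
from math import comb


def hypergeom_overlap_pvalue(background, set_a, set_b, overlap):
    """Upper-tail hypergeometric P value, P(X >= overlap).

    background: total number of genes/CTSSs considered (assumed universe)
    set_a:      size of the first differential set
    set_b:      size of the second differential set
    overlap:    number of items shared by the two sets
    """
    if not (0 <= overlap <= min(set_a, set_b) <= max(set_a, set_b) <= background):
        raise ValueError("inconsistent counts")
    total = comb(background, set_b)
    # Sum the probabilities of seeing `overlap` or more shared items by chance.
    tail = sum(
        comb(set_a, k) * comb(background - set_a, set_b - k)
        for k in range(overlap, min(set_a, set_b) + 1)
    )
    return tail / total
```

For example, hypergeom_overlap_pvalue(20000, 300, 250, 120) would give the chance probability of at least 120 shared genes between sets of 300 and 250 drawn from 20,000 tested genes (all four numbers are made up for illustration).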

Extended Data Fig. 3 The impact of NIPBL downregulation by dox-inducible shRNA on CTSS by CAGE-seq in various cancer cell lines.

a–d, Efficiency of NIPBL downregulation by doxycycline-inducible shRNA in non-small cell lung cancer (KRAS-mutant: A549 and H358; ALK-fusion-positive: H2228; a), prostate cancer (AR-positive: VCAP and 22RV1; b), colorectal cancer (C106, HCT116, and SW620; c) and breast cancer (ER-positive, HER2-negative: CAMA-1; d) cell lines. Error bars are mean ± s.d., n = 3 biological replicates, **P < 0.01, ***P < 0.001, ****P < 0.0001, two-tailed unpaired t test. e, Genomic annotation of all, significantly (FDR < 0.1) upregulated (log₂(fold change) > 1) and downregulated (log₂(fold change) < −1) CTSSs by CAGE-seq with shNIPBL_2-mediated NIPBL downregulation in indicated cell lines. Percentage of each genomic feature is indicated.

Extended Data Fig. 4 Characteristics and distribution of TEs and subclasses in differentially regulated CTSSs by NIPBL downregulation.

a, The number and corresponding percentage of CTSSs with a promoter width of 16 bp or less (sharp promoter) or greater than 16 bp (broad promoter) at baseline and under shNIPBL perturbation. P value by two-tailed chi-square test. b,c, Distribution of LTR families (b) and subtypes (c) for baseline and upregulated CTSSs by shNIPBL_1 and shNIPBL_2. d, Distribution of significantly upregulated CTSSs by shNIPBL_4 and shNIPBL_5 at repetitive vs. non-repetitive elements in intron, distal intergenic and promoter regions. e,f, Distribution of LTR families (e) and subtypes (f) for baseline and upregulated CTSSs by shNIPBL_4 and shNIPBL_5. g, CpG and C/G composition in baseline CTSSs and differentially regulated CTSSs by shNIPBL_1. h, CpG compositions of significantly upregulated, downregulated and all CTSSs by shNIPBL_2.
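
The sharp versus broad classification and the chi-square comparison in panel a can be sketched as below. This is an illustrative sketch only: the function names are hypothetical, and a Pearson chi-square without continuity correction on a 2 × 2 table is an assumption, as the caption does not specify the variant used.

```python
from math import erfc, sqrt

SHARP_MAX_WIDTH = 16  # bp; threshold stated in the caption


def classify_promoter(width_bp):
    """Label a CTSS cluster as 'sharp' (<= 16 bp) or 'broad' (> 16 bp)."""
    return "sharp" if width_bp <= SHARP_MAX_WIDTH else "broad"


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square for the table [[a, b], [c, d]].

    Rows could be baseline vs. perturbed CTSSs, columns sharp vs. broad.
    Returns (statistic, P value) with 1 degree of freedom and
    no continuity correction (an assumption of this sketch).
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    if min(row1, row2, col1, col2) == 0:
        raise ValueError("every row and column needs at least one count")
    stat = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # Survival function of the chi-square distribution with 1 d.f.
    p_value = erfc(sqrt(stat / 2))
    return stat, p_value
```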

Extended Data Fig. 5 Global DNA methylation analysis by enzymatic methyl sequencing (EM-seq).

a,b, Representative distribution plots of DNA methylation sequencing coverage (a) and global DNA methylation levels (b) with NIPBL perturbation (shNIPBL_1 and shNIPBL_2) and control (shLuc). n = 2 biological replicates. c,d, Representative DNA methylation density plots under shLuc condition for baseline CTSSs and differentially regulated CTSSs by shNIPBL_1 (c) and shNIPBL_2 (d) perturbations. e,f, Histogram of CpG methylated regions under shLuc condition for baseline CTSSs and differentially regulated CTSSs by shNIPBL_1 (e) and shNIPBL_2 (f). g,h, Scatter plots showing differential methylation β vs. log₂(fold change) of CTSS in shNIPBL_1 (g) and shNIPBL_2 (h) conditions. i–m, Representative DNA methylation profiles by EM-seq of significantly upregulated CTSSs in ALK intron 19 altP region (i), ULK4 (intron 31; j), LINC01387 (intron 2; k), between SYN3 and LINC01640 (intergenic regions; l) and between TAS2R39 and TAS2R40 (intergenic regions; m). Coverage (Cov) and methylation conversion (methyl) tracks are shown.

Extended Data Fig. 6 MITF binding at activated altPs in intron and distal intergenic regions by NIPBL perturbation in melanoma cells.

a, MITF-binding motif (highlighted in orange) at representative activated altPs, often embedded within LTR sequence (highlighted in green). b, Fraction of baseline or shNIPBL-differentially increased/decreased MITF ChIP–seq peaks that were localized within LTR, LINE and SINE repetitive elements in the genome. E, exponents of 10. P value by two-sided proportion test. c, Normalized CAGE-seq, MITF ChIP–seq and H3K4me3 CUT&RUN profiles with NIPBL perturbation at representative activated altPs (shaded in pink). d, H3K4me3 enrichment by ChIP–qPCR with NIPBL perturbations at selective activated altPs. Data are mean ± s.d. (n = 3–4 biological replicates). *P < 0.05, **P < 0.01, ordinary one-way ANOVA, Dunnett’s multiple comparisons test.

Extended Data Fig. 7 NIPBL downregulation leads to alterations in long-range interaction frequency and chromatin architecture.

a, Hi-C contact probability plot showing the average contact frequency of valid paired-end reads as a function of genomic distance with NIPBL perturbation. Genomic distance between valid cis read pairs was calculated and the distribution of log₁₀ read pair genomic distance was plotted. b, Hi-C contact map (at 100 kb resolution) of a representative region (chr8) with NIPBL downregulation (shNIPBL_2, right) vs. control (shLuc, left), demonstrating finer segregation in compartments after NIPBL downregulation. c, Number of baseline CTSSs versus differentially regulated CTSSs by shNIPBL_1 and shNIPBL_2 in hierarchical and non-hierarchical TADs defined under control condition (≥ level 3 vs. ≤ level 2 and singleton). d,e, Fraction (d) and number (e) of baseline CTSSs versus differentially regulated CTSSs by shNIPBL_4 and shNIPBL_5 in hierarchical and non-hierarchical TADs defined under control condition (≥ level 3 vs. ≤ level 2 and singleton). *P < 0.05, **P < 0.01, two-tailed chi-square test.
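
The distance distribution described in panel a (log₁₀ genomic distance between valid cis read pairs) can be sketched as follows. This is a minimal illustration rather than the pipeline used in the paper: the function name, the tuple layout of a read pair and the bin width are all assumptions.

```python
from collections import Counter
from math import floor, log10


def log10_distance_histogram(pairs, bin_width=0.5):
    """Fraction of cis read pairs per log10(distance) bin.

    pairs: iterable of (chrom1, pos1, chrom2, pos2) tuples (assumed layout).
    Returns a dict mapping the lower edge of each bin to the fraction
    of cis pairs whose log10 genomic distance falls in that bin.
    """
    counts = Counter()
    for chrom1, pos1, chrom2, pos2 in pairs:
        if chrom1 != chrom2:
            continue  # trans pairs have no genomic distance
        distance = abs(pos2 - pos1)
        if distance == 0:
            continue  # log10 is undefined at zero
        counts[floor(log10(distance) / bin_width) * bin_width] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {edge: n / total for edge, n in sorted(counts.items())}
```

Comparing the resulting histograms between a control and a knockdown sample would show whether contacts shift toward shorter or longer distances.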

Extended Data Fig. 8 Perturbation of CTCF insulators in proximity to ALK^ATI and ULK4 altPs.

a, Hi-C contact map (10 kb resolution) of TADs surrounding ALK^ATI in control (top, shLuc) and NIPBL knockdown (bottom, shNIPBL_2). Genomic region of interest around the ALK^ATI was shaded in blue. Selected CTCF insulator sites (shaded in yellow) that overlapped with SMC1A binding around the ALK^ATI altP (shaded in pink) were targeted. b–f, CTCF ChIP–qPCR with specific sgRNA targeting selective CTCF binding by dCas9–KRAB compared to sgGFP1/sgGFP2 controls. g, Relative ALK^ATI mRNA levels with dCas9–KRAB and respective sgRNAs. h,i, Quantification of CTCF binding by ChIP–qPCR at WDR43_3dn (h) by Cas9 deletion of CTCF and relative ALK^ATI mRNA levels (i). j, Hi-C contact map (10 kb resolution) of TADs surrounding ULK4 intron 31 in control (top, shLuc) and NIPBL knockdown (bottom, shNIPBL_2). Genomic region of interest around the ULK4 altP was shaded in blue. Selected CTCF insulator sites (shaded in yellow) that overlapped with SMC1A binding around the ULK4 intron 31 altP (shaded in pink) were targeted. k–m, CTCF ChIP–qPCR with specific sgRNA targeting selective CTCF binding by dCas9–KRAB compared to sgGFP1/sgGFP2 controls. n, Relative ULK4 intron 31 TSS mRNA levels with dCas9–KRAB and respective sgRNAs. o,p, Quantification of CTCF binding by ChIP–qPCR at CTNNB1_5up (o) by Cas9 deletion of CTCF and relative ULK4 intron 31 TSS mRNA levels (p). Error bars are mean ± s.d. (n = 3 biological replicates). *P < 0.05, **P < 0.01, ***P < 0.001, ****P < 0.0001, ordinary one-way ANOVA, Dunnett’s multiple comparisons test (b–g, k–n). *P < 0.05, **P < 0.01, two-tailed unpaired t test (h,i,o,p).

Extended Data Fig. 9 Specificity of CRISPRi-sgRNA at the ALK^ATI and ULK4 intron 31 altPs.

a, Single dCas9 peak was observed in ChIP–seq for each CRISPRi-sgRNA targeted region around the ALK^ATI altP within a 5.6 Mbp window, while no dCas9 peak was observed in no sgRNA and non-targeting sgGFP controls. b, Normalized Cas9 ChIP–seq and H3K27ac and H3K9me3 CUT&RUN signals around on-target sites. Pink highlighted areas indicated the locations of designed sgRNA. Specific increase in H3K9me3 decoration was observed at corresponding dCas9 peak. Specific repression of H3K27ac was observed at intron 19 and intron 4 by sgALK^ATI TSS and sgALK int4. c, Single dCas9 peak was observed in ChIP–seq for each CRISPRi-sgRNA targeted region around the ULK4 intron 31 altP within a 6.2 Mbp window, while no dCas9 peak was observed in no sgRNA and non-targeting sgGFP controls. d, Normalized Cas9 ChIP–seq and H3K27ac and H3K9me3 CUT&RUN signals around on-target sites. Pink highlighted areas indicated the locations of designed sgRNA. Specific increase in H3K9me3 decoration was observed at corresponding dCas9 peak. Specific repression of H3K27ac was observed at intron 31 and intron 30 by sgULK4int31 TSS and sgULK4 int30. e, log₂(fold change) of H3K9me3 vs. H3K27ac CUT&RUN signals at dCas9 peaks called in sgALK^ATI TSS (left) or sgALK int4 (right) compared to no sgRNA control conditions. Orange point highlighted dCas9 peak specific to sgALK^ATI TSS or sgALK int4. f, log₂(fold change) of H3K9me3 vs. H3K27ac CUT&RUN signals at dCas9 peaks called in sgULK4int31 TSS (left) or sgULK4 int30 (right) compared to no sgRNA control conditions. Orange point highlighted dCas9 peak specific to sgULK4int31 TSS or sgULK4 int30.

Extended Data Fig. 10 NIPBL functional perturbation by FunSeq2 of TCGA melanoma cases and MITF binding for ALK^ATI transcriptional activation in patient-derived melanoma cell lines.

a, Each cohort of high and low ALK^ATI expression (RSEM) has 50 cases. NIPBL mutations are present in 12 of 50 cases in high ALK^ATI expression cohort and in 4 of 50 cases of low ALK^ATI expression cohort. FunSeq2 score is considered 0 for cases without any NIPBL mutations. b, Quantification of MITF binding by ChIP–qPCR at selective H3K27ac-enriched regions that interacted with ALK^ATI altP identified by H3K27ac HiChIP in ALK^ATI-expressing melanoma cell lines. Negative control: PSA promoter; positive controls: TYR and DCT promoters. c, Immunoblots of MITF and loading control actin (ACTB) with siRNA-mediated knockdown of MITF in melanoma cells. d, Relative ALK^ATI mRNA levels with siRNA-mediated MITF knockdown in melanoma cells as in b. Error bars are mean ± s.d. (n = 3–4 biological replicates). **P < 0.01, ***P < 0.001, ****P < 0.0001, ordinary one-way ANOVA, Dunnett’s multiple comparisons test.

Supplementary information

Supplementary Information

Supplementary Figs. 1–4 and Notes 1–10.

Reporting Summary

Peer Review File

Supplementary Tables

Supplementary Tables 1–6.

Supplementary Data 1

Supporting data for Supplementary Figs. 2 and 3.

Source data

Source Data Fig. 1

Statistical source data.

Source Data Fig. 2

Statistical source data.

Source Data Fig. 3

Statistical source data.

Source Data Fig. 4

Statistical source data.

Source Data Fig. 5

Statistical source data.

Source Data Extended Data Figs. 1–8 and 10

Statistical source data.

Source Data Extended Data Fig. 1

Unprocessed western blots.

Source Data Extended Data Fig. 10

Unprocessed western blots.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Wong, E.W.P., Sahin, M., Yang, R. et al. Disruption of TAD hierarchy promotes LTR co-option in cancer. Nat Genet 57, 1754–1765 (2025). https://doi.org/10.1038/s41588-025-02239-6

Received: 02 May 2023
Accepted: 20 May 2025
Published: 30 June 2025
Issue date: July 2025
DOI: https://doi.org/10.1038/s41588-025-02239-6
