Introduction

Renal cell carcinoma (RCC) is one of the most common urological malignancies globally, with approximately 430,000 new cases estimated worldwide in 2020 (ref. 1). Clear cell renal cell carcinoma (ccRCC), the predominant histological subtype, accounts for 70–80% of all cases and is characterized by high heterogeneity and aggressiveness2. Radical surgery (nephrectomy or partial nephrectomy) is the standard treatment; however, 20–40% of patients experience recurrence and disease progression, and 25–30% present with metastasis at diagnosis3. Surgical outcomes are poor for these patients, and ccRCC is inherently resistant to radiotherapy and chemotherapy, which leads to a significantly reduced five-year survival rate. Reducing RCC mortality therefore depends on enhancing early detection sensitivity and developing new therapeutic targets, both of which require a deeper understanding of the molecular biology and pathogenesis of this cancer.

With the rapid development of next-generation sequencing (NGS; short-read sequencing) technologies, our understanding of ccRCC has deepened considerably. Studies of single nucleotide variants (SNVs) and small insertions/deletions (InDels) have identified mutations in key renal cancer genes such as VHL, PBRM1, SETD2, and BAP1, which have been shown to be critical initiating and driving factors in ccRCC3. However, the landscape and pathogenesis of larger-scale genomic alterations, namely structural variants (SVs), remain underexplored in ccRCC. SVs, defined as insertions, deletions, duplications, inversions, and translocations of at least 50 base pairs, account for a substantial proportion of genetic variation in the human genome4. SVs are ubiquitous in cancer5 and emerge during tumor development, progression, and therapeutic resistance6. They can dysregulate gene expression by amplifying, disrupting, or fusing cancer-related genes or by repurposing non-coding DNA regulatory elements7. While SVs were traditionally detected using NGS, recent advances in long-read sequencing (LRS) technologies have made SV detection more precise and convenient8. Long reads can directly span SVs to identify breakpoint junctions, resolve highly homologous repetitive regions, and sequence complex genomic regions with high GC content9. Additionally, LRS employs single-molecule sequencing technology, which minimizes base mismatches10 and sequencing biases11. Given these advances, there is a compelling need to perform long-read sequencing on ccRCC genomes to precisely characterize SV features and uncover novel mechanisms and driver events.

Although technological advances have made SV detection more accessible, interpreting the pathogenicity and phenotypic consequences of SVs remains a formidable challenge. Notably, SVs not only exhibit dosage effects by altering gene coding sequences but also exert position effects by influencing the location and/or function of cis-regulatory elements such as promoters and enhancers12. Three-dimensional (3D) genomic interaction technologies, such as high-throughput chromosome conformation capture (Hi-C) sequencing, have enabled the detection and characterization of 3D genome structure in tumors, thereby facilitating the evaluation of SV position effects13. Recent studies have revealed that dynamic changes in 3D chromatin organization are associated with the development of various solid tumors14,15,16. Importantly, SV formation is influenced by 3D genome architecture17,18,19, while SVs can in turn disrupt 3D genomic structures and thereby interfere with the precise regulatory networks of gene expression15,16. The relationship between SVs and 3D genome organization is therefore bidirectional. Moreover, enhancer hijacking, an important manifestation of the position effects of SVs, has been shown to play a significant role in cancers such as leukemia20, high-grade glioma21, and glioblastoma22.

However, the interplay between SVs, 3D epigenomic organization, and dysregulated transcriptional programs in ccRCC remains underexplored, leaving critical gaps in our understanding of its molecular pathogenesis. In this study, we obtained nanopore third-generation (long-read) whole-genome sequencing (WGS), next-generation (short-read) WGS, Hi-C, H3K27ac ChIP-seq, and RNA-seq data from two renal cancer cell lines (786-O and OS-RC-2) and an immortalized normal renal epithelial cell line (HEK293T) to comprehensively assess the structural variant landscape and 3D chromatin organization in renal cancer. We also performed RNA-seq on 13 clinical ccRCC samples to validate our findings. Overall, our study identifies novel variant targets in renal cancer from the perspective of structural variation and reveals multidimensional 3D genome reorganization during ccRCC oncogenesis, marked by compartment shifts, topologically associating domain (TAD) boundary alterations, and chromatin loop reconfiguration. Our study further elucidates the dynamic interplay between SVs and the 3D genome, revealing how SVs drive aberrant oncogenic regulation by altering 3D chromatin organization.

Notably, we identify and validate a previously unrecognized pathological enhancer hijacking event that drives the constitutive activation of the proto-oncogene SEMA5B. Leveraging these mechanistic insights, we also developed a machine learning-based prognostic prediction model for ccRCC, founded on cancer-specific enhancer hijacking events. This model demonstrates robust predictive performance and stability. These findings not only expand our understanding of the genetic and molecular basis of ccRCC but also offer novel potential therapeutic targets and biomarkers for precision oncology, while providing a translational pathway for clinical application of the uncovered mechanisms.

Results

The landscape of structural variation signatures in ccRCC

In this study, we systematically characterized the spectrum of SVs associated with the tumorigenesis of normal renal epithelial cells using nanopore long-read whole-genome sequencing. Our nanopore sequencing data were of high quality, with a read N50 of approximately 50 kb and a mean sequencing quality score of around 20 (Supplementary Fig. 1a–c, Supplementary Fig. 1g,h, Supplementary Tables 1 and 2). The data also achieved a mean mapping identity exceeding 95% and a mapping ratio above 99% (Supplementary Fig. 1k, Supplementary Table 2), alongside sufficient sequencing depth (≥25×) (Supplementary Fig. 1a–f, 1i, Supplementary Table 2), and a throughput surpassing 43 Gb for reads longer than 50 kb (Supplementary Fig. 1j).
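As an illustration of the N50 metric quoted above, the following minimal sketch computes a read-length N50 from a list of read lengths. The function name and the toy read lengths are hypothetical and are not taken from the study's pipeline or data.

```python
def n50(read_lengths):
    """Return the read-length N50: the length L such that reads of
    length >= L together contain at least half of all sequenced bases."""
    if not read_lengths:
        raise ValueError("no reads supplied")
    half_total = sum(read_lengths) / 2
    running = 0
    # Walk from the longest read downwards until half the bases are covered.
    for length in sorted(read_lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


# Toy read lengths in bp (illustrative only, not data from this study).
toy_reads = [80_000, 60_000, 50_000, 30_000, 20_000, 10_000]
print(n50(toy_reads))  # prints 60000
```

Because N50 is weighted by bases rather than by read count, a small number of very long reads can raise it substantially, which is why it is usually reported alongside depth and mapping statistics.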

Through alignment of the sequencing data to the GRCh38 (hg38) reference genome, we identified a substantial number of high-confidence SV events and constructed a detailed SV map (Fig. 1a, Supplementary Fig. 2d,e). The total SV counts in 786-O, OS-RC-2, and HEK293T cells were 18,912, 18,792, and 21,571, respectively (Fig. 1b, Supplementary Tables 3 and 4). Consistent with patterns observed in other solid tumors23,24, deletions and insertions were the predominant types, comprising approximately 45% and 51% of SVs, respectively, in each tumor cell line. However, unlike other tumors, which have a duplication rate of around 6%, ccRCC exhibited a markedly lower duplication rate, indicating that duplications may not be a defining feature of renal cancer24. Our analysis of the length distribution of different SV types revealed that insertions and deletions shared similar patterns, with a prominent peak near 300 bp and the majority of events occurring within 1 kb (Fig. 1c, Supplementary Fig. 2h). Furthermore, examination of the normalized counts of different SV types across each chromosome indicated that the distribution patterns of each specific SV type were largely consistent (Figs. 1e,f, Supplementary Fig. 2i). In line with prior research25,26, most SVs were located in intergenic and intronic regions (Fig. 1d, Supplementary Fig. 2f,g).

Fig. 1: SV landscapes in 786-O, OS-RC-2, and HEK293T cell lines.

a Circos plot illustrating the genome-wide distribution of 786-O high-confidence SVs identified by Sniffles, cuteSV, and NanoSV, and integrated by SURVIVOR. The tracks from the outermost to innermost circles represent deletions, insertions, and duplications, with the innermost green curve indicating inversions and the purple curve translocations. b Venn diagram depicting the intersection of structural variants in two ccRCC cell lines (786-O and OS-RC-2) and one normal renal cell line (HEK293T), with counts shown. c Histograms displaying the length distribution of different types of ccRCC SVs. d Distribution of SVs in various genomic regions in 786-O. e, f Distribution of standardized total SV burden (gray: complex SV, light blue: deletion, green: duplication, orange: insertion, yellow: inversion, purple: translocation) across chromosomes in 786-O (e) and OS-RC-2 (f). g Pipeline for identifying cancer-related genes with exons directly impacted by SVs in 786-O and OS-RC-2. ccRCC cancer genes were defined as those annotated by COSMIC or ranked among the top 30 ccRCC single nucleotide variation genes in TCGA. h IGV image showing a heterozygous deletion on chromosome 3 covering HIF1A in 786-O and a heterozygous duplication on chromosome 3 covering HIF1A in OS-RC-2. i Ribbon image showing a heterozygous deletion on chromosome 3 covering HIF1A in 786-O. j Ribbon image showing a heterozygous duplication on chromosome 3 covering HIF1A in OS-RC-2.

We next investigated the prevalence of complex SVs, defined by overlapping breakpoint junctions. Comparative analysis revealed significantly higher frequencies of complex SVs in the renal carcinoma cell lines (786-O and OS-RC-2) than in normal renal epithelial cells (HEK293T) (Supplementary Table 5), suggesting enhanced genomic instability during carcinogenesis. To identify tumor-specific SVs potentially driving oncogenesis, we systematically compared malignant and normal cell lines. Notably, approximately one-third of SVs were cell line specific (Fig. 1b, Supplementary Table 6), implying functional relevance of lineage-restricted SVs in ccRCC pathogenesis. Subsequent analysis of SV-affected exonic regions identified 232, 234, and 255 disrupted genes in 786-O, OS-RC-2, and HEK293T, respectively (Fig. 1g, Supplementary Table 7). Of these, 121 and 124 tumor-specific SV-associated genes in 786-O and OS-RC-2, respectively, were prioritized for functional characterization (Supplementary Table 8). Cross-referencing with the COSMIC database and the TCGA mutation ranking revealed 10 and 27 tumor-specific SV-associated genes corresponding to either canonical oncogenes or renal carcinoma-associated mutated genes (SNV-based top 30 candidates) (Supplementary Table 9). Of particular clinical relevance, both ccRCC models exhibited alterations in HIF1A, a hypoxia-inducible oncogene regulated by VHL (a renal cancer gatekeeper gene) whose dysregulation correlates with poor prognosis27,28,29. In 786-O, we identified a 38-kb intragenic deletion, corroborating previous cytogenetic and NGS reports of HIF1A loss (Fig. 1g, h). Conversely, OS-RC-2 exhibited a novel duplication event, expanding the spectrum of HIF1A alterations in ccRCC. Furthermore, long-read sequencing uncovered a 2.6-kb insertion in SETD2 (Supplementary Fig. 3f–h), a chromatin modifier gene previously associated with renal oncogenesis through SNV/indel mutations3,30. Collectively, these findings, together with the canonical ccRCC genomic hallmarks of 3p loss and 5q gain30 (Supplementary Fig. 2a, b), validated the molecular identity of our cell models. Both tumor lines shared mutations in FLG and CAMKK2, two genes ranked among the top 30 most frequently SNV-mutated loci in ccRCC (Supplementary Fig. 2k). Additionally, we detected a large ~258-kb deletion affecting the renal cancer-associated oncogene NFIB (Supplementary Fig. 2i). To independently verify these findings, we visualized the relevant genomic regions across all three cell lines using both Integrative Genomics Viewer (IGV) and Ribbon (Fig. 1h–j, Supplementary Fig. 2j, k).
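The comparison between malignant and normal cell lines can be sketched as a simple filter that keeps tumor SVs with no matching event in the normal line. This is a minimal illustration only: the matching rule (same chromosome and SV type, both breakpoints within a tolerance) and the 500-bp default are assumptions made for the example, not the criteria used in this study.

```python
from typing import NamedTuple


class SV(NamedTuple):
    chrom: str
    start: int
    end: int
    svtype: str  # e.g. "DEL", "INS", "DUP", "INV"


def same_event(a, b, max_dist=500):
    """Treat two calls as the same event if chromosome and type agree and
    both breakpoints lie within max_dist bp (hypothetical tolerance)."""
    return (
        a.chrom == b.chrom
        and a.svtype == b.svtype
        and abs(a.start - b.start) <= max_dist
        and abs(a.end - b.end) <= max_dist
    )


def tumor_specific(tumor_svs, normal_svs, max_dist=500):
    """Return tumor SVs that have no matching call in the normal line."""
    return [
        t for t in tumor_svs
        if not any(same_event(t, n, max_dist) for n in normal_svs)
    ]


# Toy calls (illustrative coordinates, not data from this study).
tumor = [SV("chr3", 1_000, 5_000, "DEL"), SV("chr14", 200, 900, "DUP")]
normal = [SV("chr3", 1_100, 5_200, "DEL")]
print(tumor_specific(tumor, normal))
```

The pairwise scan is quadratic in the number of calls; for call sets of roughly 20,000 SVs per line an interval index or a dedicated merging tool would be the more practical choice.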

We systematically compared the performance of third-generation sequencing (TGS; long-read sequencing) and next-generation sequencing in detecting SVs in ccRCC. Long-read sequencing demonstrated superior sensitivity, identifying significantly more high-confidence SVs than next-generation sequencing (Supplementary Fig. 3a–c). A genome-wide ideogram revealed a marked increase in breakpoint density detected by long-read sequencing, particularly in complex genomic regions such as centromeres and telomeres (Supplementary Fig. 3d, e). Moreover, long-read sequencing resolved a large-scale insertion in the key renal cancer driver gene SETD2 that was undetectable by next-generation sequencing (Supplementary Fig. 3f, h). Consistent with prior studies24, deletions and insertions dominated TGS-specific SVs, likely mediated by repetitive element activity31. Annotation of repeat-associated deletions revealed that TGS detected a higher burden of repetitive element-driven SVs than next-generation sequencing, with distinct compositional profiles: SINEs represented 58.8% of NGS-associated repeats but showed reduced proportions in long-read data, while other repeat classes (e.g., LINEs, LTRs) exhibited increased representation (Supplementary Fig. 3g). TGS-identified repeats displayed broader chromosomal distribution without positional bias (Supplementary Fig. 3j) and larger average sizes (Supplementary Fig. 3i). Subfamily-level classification and multi-repeat-overlapping SV analyses further refined these observations (Supplementary Fig. 4).

In summary, the SVs identified here directly impact oncogenes and tumor suppressor genes, potentially exerting significant influence on tumor malignancy and the maintenance of neoplastic phenotypes. Our long-read sequencing analysis delineates the landscape of SVs in human renal cancer, a previously underexplored area, and provides a valuable resource for future studies on the pathogenesis of ccRCC.

Extensive remodeling of 3D genome organization associates with gene expression alterations in ccRCC

To investigate the global impact of SVs on higher-order chromatin structure, we analyzed the 3D genome of ccRCC cells and normal cells using Hi-C technology (Supplementary Fig. 5a–c). Multi-scale comparative analysis revealed extensive reorganization of the 3D genome architecture in malignant cells compared to normal cells.

We first characterized A/B compartment distributions, observing compartment switching rates of 27.1% (14.0% A-to-B, 13.1% B-to-A) in 786-O and 27.1% (14.4% A-to-B, 12.8% B-to-A) in OS-RC-2 (Supplementary Table 11). Chromosome-level profiling revealed a tendency toward conservation of compartment identity, with stable A compartments representing the largest fraction in ccRCC cells (786-O: 39.5%; OS-RC-2: 38.8%; Fig. 2a). Consistent with the established role of compartments in gene regulation, compartment switches correlated significantly with gene expression changes: genes in stable A and B-to-A compartments showed higher expression, whereas genes in A-to-B compartments exhibited low transcription, suggesting silencing (Fig. 2b, Supplementary Fig. 5d, Supplementary Table 12). Further analysis indicated that B-to-A switched regions, despite containing fewer differentially expressed genes overall, were markedly enriched for transcriptionally activated oncogenes. This enrichment was statistically significant in both 786-O (30/322 vs. 6/388; p = 2.745e-06; Fig. 2c) and OS-RC-2 (22/299 vs. 4/342; p = 6.99e-05; Supplementary Fig. 5g), with the COSMIC-annotated oncogene BTG1 serving as a representative example (Fig. 2c, e, Supplementary Fig. 5g). Moreover, Pearson correlation heatmaps showed that the cancer cell lines exhibited not only extensive compartment remodeling but also significant enhancement of A/B compartment patterns (Fig. 2d).
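The compartment switching categories used above can be illustrated with a minimal sketch that compares the sign of the Hi-C eigenvector in matched genomic bins, following the convention in the Fig. 2 legend (values > 0 are A, values < 0 are B). The function names and toy eigenvector values are hypothetical; in practice the eigenvector sign must first be oriented consistently, for example by gene density.

```python
from collections import Counter


def classify_bins(normal_pc1, tumor_pc1):
    """Label each bin by comparing eigenvector signs in normal vs tumor.
    Positive values are treated as compartment A, negative as B."""
    labels = []
    for n, t in zip(normal_pc1, tumor_pc1):
        if n == 0 or t == 0:
            labels.append("undetermined")  # no compartment call for this bin
        elif n > 0 and t > 0:
            labels.append("stable A")
        elif n < 0 and t < 0:
            labels.append("stable B")
        elif n > 0:
            labels.append("A-to-B")
        else:
            labels.append("B-to-A")
    return labels


def switch_fractions(labels):
    """Fraction of determined bins falling in each category."""
    counts = Counter(l for l in labels if l != "undetermined")
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}


# Toy eigenvector values for five bins (illustrative only).
normal = [0.5, 0.2, -0.3, -0.1, 0.4]
tumor = [0.6, -0.1, 0.2, -0.4, 0.3]
labels = classify_bins(normal, tumor)
print(labels)
print(switch_fractions(labels))
```

Summing the A-to-B and B-to-A fractions gives an overall switching rate of the kind reported for each cell line.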

Fig. 2: 3D genome architectural reorganization correlates with gene expression alterations in ccRCC.

a Stacked bar chart showing frequencies of whole-chromosome A/B compartment shifts in 786-O and OS-RC-2 compared to HEK293T. b Box plot showing gene expression distribution: the box spans the interquartile range (IQR), the center line marks the median, and the whiskers extend to 1.5×IQR (or to the maximum/minimum values if within range). Gene expression comparisons are presented as Log2FoldChange (786-O vs HEK293T), with P values calculated using the Wilcoxon rank-sum test. c Volcano plot highlighting differentially expressed genes (orange: significantly upregulated; green: significantly downregulated) and cancer-related genes in regions with B-to-A compartment shifts. Differentially expressed genes are defined by |Log2FoldChange| > 1 and adjusted P value < 0.05, with examples of cancer-related genes circled in black. d Pearson correlation heatmap of chr15 in OS-RC-2 and HEK293T. e IGV images displaying an example of A/B compartment shifts on chromosome 12 in 786-O and OS-RC-2 versus HEK293T. Assignment of A (red) and B (blue) compartments is based on eigenvector values > 0 and < 0, respectively. Gene density in the genome is shown as histograms, and enhancer activity is marked by H3K27ac ChIP-seq peaks, presented as histograms in 786-O (yellow track) and OS-RC-2 (blue track). The red box indicates a common B-to-A event covering the BTG1 gene in both cancer cell lines. f Venn diagram depicting the overlap of TADs derived from 10-kb resolution interaction matrices across the three cell types, with the number of TADs in each category shown. g, h Examples of TAD alterations in regions of interest (g: chr7:31,740,000-33,130,000; h: chr5:156,410,000-157,740,000) in OS-RC-2 compared with HEK293T. Boxes in the interaction heatmaps delineate TADs, with involved genes displayed (RefSeqGene). i Box plot showing the distribution of TAD length: the box spans the interquartile range (IQR), the center line marks the median, and the whiskers extend to 1.5×IQR (or to the maximum/minimum values if within range). P values were calculated using the Wilcoxon rank-sum test. j Box plot showing gene expression distribution: the box spans the interquartile range (IQR), the center line marks the median, and the whiskers extend to 1.5×IQR (or to the maximum/minimum values if within range). Gene expression comparisons are presented as Log2FoldChange (ccRCC-common TAD region vs conserved TAD region), with P values calculated using the Wilcoxon rank-sum test. k Jensen DISEASES pathway enrichment of differentially expressed genes located in ccRCC-common TADs. P values were obtained by Fisher’s exact test using EnrichR.

At the finer scale of topologically associating domains (TADs), we identified 7,014, 7,823, and 7,580 TADs at 10 kb resolution in the 786-O, OS-RC-2, and HEK293T cell lines, respectively (Fig. 2f, Supplementary Table 13). Notably, 3,907 TADs were conserved across all three cell lines, representing 55.7%, 49.9%, and 51.5% of the total TADs in 786-O, OS-RC-2, and HEK293T, respectively. This high degree of conservation highlights the robustness of TADs in the human genome, which is consistent with previous studies15,32. To characterize TAD alterations, we defined ccRCC-specific TADs (gained in tumor cells) and ccRCC-common TADs (shared between tumor cells but absent in normal cells). Our results showed that, compared to conserved TADs, ccRCC-specific and ccRCC-common TADs were significantly smaller in size (Fig. 2i), a finding similar to previous studies on cancers such as multiple myeloma, liver cancer, and prostate cancer14,15,16. We visualized interaction matrices using hicPlotTADs to illustrate these characteristics, which clearly depicted the smaller TAD sizes on chromosomes 7 and 5 in the ccRCC cell lines (Fig. 2g, h, Supplementary Fig. 5h,i). Functionally, genes within ccRCC-specific TADs exhibited significantly higher expression levels than those in conserved TADs (Fig. 2j, Supplementary Tables 15–16). Moreover, these genes were significantly enriched in pathways related to kidney cancer and other associated diseases (Fig. 2k, Supplementary Table 17), suggesting that these regions are hotspots for transcriptional dysregulation and oncogene activation.
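The conserved-TAD percentages quoted above follow directly from the reported counts, as this short arithmetic check shows. Only the numbers stated in the text are used; the variable names are illustrative.

```python
# TAD counts per cell line and the number conserved across all three,
# as reported in the text.
tad_counts = {"786-O": 7014, "OS-RC-2": 7823, "HEK293T": 7580}
conserved = 3907

# Share of each cell line's TADs that are conserved, in percent.
shares = {cell: round(100 * conserved / n, 1) for cell, n in tad_counts.items()}
print(shares)  # {'786-O': 55.7, 'OS-RC-2': 49.9, 'HEK293T': 51.5}
```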

In summary, our results demonstrate that ccRCC undergoes extensive cancer-specific remodeling of the 3D genome at both the compartment and TAD levels. These structural changes are closely associated with transcriptional dysregulation and the aberrant activation of oncogenes, implicating them as potential drivers of ccRCC development and progression.

Distribution of structural variants in 3D chromatin organization

Previous studies have established that somatic SV formation is influenced by, and reciprocally modulates, three-dimensional genome architecture17,18,19. To elucidate this relationship in ccRCC, we systematically investigated SV distribution patterns across hierarchical chromatin structures. Initial compartment-level analysis through permutation testing demonstrated significant enrichment of both ccRCC-specific SVs and all SVs of a given cell line in transcriptionally active A compartments (ccRCC-specific SVs: Z-score = 11.959, P = 0.001; 786-O SVs: Z-score = 19.221, P = 0.001; Fig. 3a). At local genomic scales, the cancer cell lines (786-O and OS-RC-2) exhibited increased deletion and insertion densities in A compartments alongside decreased densities in B compartments, a pattern absent in normal cells (Fig. 3b, Supplementary Table 18). Further examination of compartment transition regions revealed elevated deletion/insertion densities in stable A compartments and B-to-A transition regions (except for OS-RC-2 insertions), while no significant changes were observed in A-to-B regions compared to genome-wide background levels (Supplementary Fig. 6a, Supplementary Table 20). These findings collectively suggest preferential SV accumulation in A compartments, potentially linked to their high transcriptional activity and consequently high rate of DNA double-strand breaks (DSBs), as well as to B-to-A compartment dynamics.
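The logic of such a permutation test can be sketched as follows: count SV breakpoints falling in the regions of interest, then compare that count with counts obtained after repeatedly relocating the regions at random. This is a simplified illustration on a single linear coordinate system; the uniform placement of shuffled regions (overlaps allowed), the function names, and the toy data are assumptions, and the study's actual randomization scheme may differ.

```python
import random
import statistics


def count_hits(points, regions):
    """Number of points falling inside any half-open region [start, end)."""
    return sum(any(s <= p < e for s, e in regions) for p in points)


def permutation_test(points, regions, genome_len, n_perm=1000, seed=0):
    """Z-score and one-sided empirical P value for enrichment of points
    in regions, using random relocation of the regions as the null."""
    rng = random.Random(seed)
    observed = count_hits(points, regions)
    null = []
    for _ in range(n_perm):
        shuffled = []
        for s, e in regions:
            width = e - s
            # Place each region uniformly at random, preserving its width.
            new_start = rng.randrange(0, genome_len - width + 1)
            shuffled.append((new_start, new_start + width))
        null.append(count_hits(points, shuffled))
    mean = statistics.mean(null)
    sd = statistics.pstdev(null)
    z = (observed - mean) / sd if sd > 0 else float("inf")
    # Add-one correction so the smallest attainable P is 1 / (n_perm + 1).
    p = (sum(x >= observed for x in null) + 1) / (n_perm + 1)
    return observed, z, p


# Toy example: breakpoints clustered in two regions of a 1,000-bp "genome".
regions = [(100, 200), (600, 700)]
points = list(range(105, 200, 10)) + list(range(605, 700, 10)) + [50, 350, 850]
print(permutation_test(points, regions, genome_len=1000))
```

With 1,000 permutations and the add-one correction, the smallest attainable P value is 1/1001, which is consistent with the P = 0.001 values reported for strongly enriched categories.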

Fig. 3: Distributions of structural variants across 3D genome architectures.

a Permutation testing was employed to assess the spatial distribution patterns of both pan-cellular and ccRCC-specific structural variants across 3D chromatin organizations. The expected distribution was derived from 1000 iterations of random shuffling of chromosomal regions. b Density of SVs (insertions, deletions, and duplications) across A/B compartments and TADs. SV density, calculated as the number of SVs normalized by the length of the corresponding chromosomal regions, is represented as follows: gray bars indicate the background SV density across entire chromosomes; yellow bars show SV density in A compartments; light blue bars represent SV density in B compartments; orange bars depict SV density at TAD boundaries; and dark blue bars illustrate SV density within TAD domains. Enrichment analyses were conducted using a proportion test in R, comparing the proportion of SVs in each region of interest to the proportion of that region’s length in the whole genome. Significance levels are denoted as follows: ****p ≤ 0.0001, ***p ≤ 0.001, **p ≤ 0.01, *p ≤ 0.05. c Density of ccRCC-specific SVs (insertions, deletions, and duplications) across A/B compartments and TADs. ccRCC-specific SVs are defined as those occurring in 786-O or OS-RC-2 but absent in HEK293T. SV density was calculated using the same method as in (b). Enrichment analyses for these ccRCC-specific SVs were also performed using a proportion test in R, following the same approach as described in (b). d Proportions of ccRCC-specific SV subtypes across distinct A/B compartment switching regions in 786-O cells. e Schematic of SV classification based on the positional relationship between ccRCC-specific SV breakpoints and TAD boundaries. f Number and proportion of ccRCC-specific SV types in 786-O according to (e). g Number and proportion of ccRCC-specific SV types in 786-O according to (e).

Analysis of tumor-specific SVs revealed distribution patterns mirroring those of the full SV sets of the parental cell lines, with deletions and insertions enriched in A compartments. Notably, 786-O-specific deletions displayed a significant reduction within B compartments (Fig. 3c, Supplementary Table 19). Compartment transition analysis showed that ccRCC-specific deletions were consistently enriched in B-to-A regions, while insertion and duplication distributions exhibited cell line-specific variations (Supplementary Fig. 6b, Supplementary Table 22). Despite density differences across compartment types, SV type composition remained comparable (Fig. 3d, Supplementary Fig. 6i).
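The density and enrichment calculations described in the Fig. 3 legend can be sketched as below: SV density is the SV count divided by region length, and enrichment compares the fraction of SVs in a region with the fraction of the genome that the region occupies. The one-sample normal-approximation test shown here is a stand-in chosen for simplicity and is an assumption; R's proportion test uses a chi-squared statistic with continuity correction by default, so exact P values would differ slightly. All counts are toy values.

```python
import math


def sv_density(n_svs, region_bp, per_bp=1_000_000):
    """SVs per megabase (by default) of the region."""
    return n_svs / region_bp * per_bp


def proportion_z_test(k, n, p0):
    """One-sample test that k of n SVs in a region differs from the
    expected fraction p0 (region length / genome length).
    Returns the z statistic and a two-sided normal-approximation P value."""
    se = math.sqrt(p0 * (1 - p0) / n)
    z = (k / n - p0) / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return z, p_two_sided


# Toy numbers (illustrative only): a 3,000-Mb genome in which A compartments
# cover 1,200 Mb and contain 4,600 of 10,000 SVs.
genome_bp, a_bp = 3_000_000_000, 1_200_000_000
total_svs, a_svs = 10_000, 4_600

print(sv_density(total_svs, genome_bp))  # background density per Mb
print(sv_density(a_svs, a_bp))           # A-compartment density per Mb
print(proportion_z_test(a_svs, total_svs, a_bp / genome_bp))
```

A positive z indicates more SVs in the region than its length alone would predict, and a negative z indicates depletion, matching the enriched and reduced densities described for A and B compartments.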

TAD-level investigations demonstrated significant boundary enrichment for both ccRCC-specific SVs and all SVs of a given cell line (ccRCC-specific: Z = 1.684, P = 0.043; 786-O: Z = 3.355, P = 0.001; Fig. 3a). Moreover, boundary regions exhibited elevated densities for most SV types, except for 786-O duplications and OS-RC-2 insertions (Fig. 3b, Supplementary Table 18), consistent with prior research15,19,33. Comparative analysis revealed ccRCC-associated TAD alterations, with 786-O duplications and OS-RC-2 deletions/insertions preferentially localizing at lost TAD boundaries (Supplementary Fig. 6c). Tumor-specific SVs showed cell type-dependent associations with TAD remodeling, particularly enrichment of OS-RC-2 insertions at disrupted boundaries (Supplementary Fig. 6d, Supplementary Table 23).

Classification of ccRCC-specific SVs by TAD positioning identified four distinct categories (Fig. 3e). The majority localized within TAD interiors (Inner TAD), suggesting minimal chromatin folding impact (Fig. 3f, Supplementary Fig. 6j). SVs with the potential to disrupt chromatin folding (i.e., those crossing TAD boundaries or partially overlapping with them) exhibited distinct SV type compositions, with deletions predominating in boundary-spanning events (Fig. 3g, Supplementary Fig. 6k).
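A positional classification of this kind can be sketched as an interval comparison between each SV and the TAD boundary regions. The text names SVs inside TADs, SVs crossing boundaries, and SVs partially overlapping boundaries; the fourth label used below (`within_boundary`) and all coordinates are hypothetical placeholders, since the exact category definitions are given only in the Fig. 3e schematic.

```python
def classify_sv(sv_start, sv_end, boundaries):
    """Classify an SV [sv_start, sv_end) relative to TAD boundary
    intervals given as (start, end) pairs on the same chromosome."""
    overlapping = [
        (b_start, b_end)
        for b_start, b_end in boundaries
        if sv_start < b_end and b_start < sv_end
    ]
    if not overlapping:
        return "inner_tad"          # no boundary touched
    if any(sv_start <= b_start and b_end <= sv_end for b_start, b_end in overlapping):
        return "cross_boundary"     # SV spans an entire boundary
    if any(b_start <= sv_start and sv_end <= b_end for b_start, b_end in overlapping):
        return "within_boundary"    # hypothetical label: SV lies inside a boundary
    return "partial_overlap"        # one breakpoint falls inside a boundary


# Toy boundary regions and SVs (illustrative coordinates only).
boundaries = [(100_000, 110_000), (500_000, 510_000)]
for sv in [(200_000, 250_000), (90_000, 120_000), (102_000, 105_000), (105_000, 130_000)]:
    print(sv, classify_sv(*sv, boundaries))
```

Only the boundary-spanning class removes a boundary outright, which is why it is the natural candidate for the TAD fusion events examined in the next section.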

These results collectively demonstrate that structural variant distributions in renal cell carcinoma exhibit non-random associations with three-dimensional chromatin architecture. Our analyses revealed preferential SV accumulation in transcriptionally active A compartments and their transition zones, accompanied by cell type-specific preferences in TAD boundary associations. Furthermore, the disruptive potential of SVs on topological organization appears contingent on variant type, with deletions predominantly affecting boundary-spanning regions. The observed spatial coordination between somatic structural variation and chromatin architecture suggests a sophisticated interplay between genomic instability and 3D genome organization during renal carcinogenesis.

Interplay of ccRCC-specific SVs and TADs

Spatial genome organization is a critical determinant of chromosomal rearrangements and oncogenesis34. Previous studies have demonstrated that SVs can induce alterations in chromatin architecture, particularly through disruption of TAD boundaries. Such perturbations may lead to fusion of adjacent TADs or aberrant spreading of active chromatin, thereby establishing oncogenic enhancer-promoter interactions12,18,25,35. To investigate the impact of SVs on TAD disruption in renal cell carcinoma, we analyzed the correlation between distinct classes of TAD deletions and TAD fusion events. Our results revealed that deletions spanning TAD boundary regions exhibited significantly higher TAD fusion scores compared to other deletion types (Fig. 4a, Supplementary Table 24). These findings align with prior studies indicating that boundary-spanning deletions are strongly associated with TAD fusion36,37,38. Notably, SVs localized within inner TAD regions—even those of substantial size—showed minimal disruption to adjacent chromatin interactions, with neighboring TADs remaining largely unaffected (Fig. 4c, Supplementary Fig. 7d). Conversely, SVs traversing TAD boundaries induced a pronounced increase in chromatin interactions between adjacent TADs proximal to the deletion sites (Fig. 4d).

Fig. 4: Oncogenic structural variations drive transcriptional reprogramming through TAD remodeling in ccRCC.

a Comparative analysis of TAD fusion scores across distinct deletion classes. Boxplots depict distributions stratified by TAD deletion classes (boundary-spanning vs. internal deletions), with box ranges representing interquartile distances (IQR), central lines indicating medians, and whiskers extending to 1.5×IQR or extreme values within range. Statistical significance was determined by Wilcoxon rank-sum test. b Differential gene expression burden in SV hotspots. Stacked bars quantify proportions of differentially expressed genes (DEGs) in genomic regions stratified by TAD fusion score percentiles (top/bottom 50th percentile) versus genome-wide baselines across 786-O and OS-RC-2 cell lines. c Examples of the impact of inner TAD deletion frequency on the chromatin folding domain in OS-RC-2. Triangle heatmaps represent chromatin contact frequency, with the top showing OS-RC-2, middle showing HEK293T, and bottom showing the subtractive results. Histogram representing roadmap epigenome enhancer activity, marked by H3K27ac, in OS-RC-2 (red). d Examples of the impact of Cross TAD boundary deletion frequency on the chromatin folding domain in ccRCC. Triangle heatmaps represent chromatin contact frequency for 786-O (top), OS-RC-2 (second), HEK293T controls (third), with differential contact maps (fourth: 786-O vs control; fifth: OS-RC-2 vs control). Histograms show roadmap epigenome enhancer activity marked by H3K27ac in 786-O and OS-RC-2 (red). e Box plot of gene expression levels in regions with TAD fusion scores in the top 50%, bottom 50%, and genome-wide levels. The box spans the interquartile range (IQR), with the center line indicating the median and whiskers extending to 1.5×IQR (or the maximum/minimum values if within range). P values were calculated using the Wilcoxon rank-sum test.

We further explored the relationship between TAD fusion and transcriptional dysregulation in two cancer cell lines. Differential gene expression analysis demonstrated that genomic regions harboring SVs with top 50% TAD fusion scores contained a significantly higher proportion of dysregulated genes compared to regions with lower fusion scores or genome-wide baselines (Fig. 4b, Supplementary Table 25). Strikingly, genes within high TAD fusion score regions exhibited elevated expression levels relative to the global genomic average (Fig. 4e).
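Group comparisons of this kind are evaluated throughout the figures with the Wilcoxon rank-sum test. A minimal standard-library sketch of that test is shown below, using midranks for ties and a normal approximation without continuity correction; these implementation choices and the toy expression values are assumptions, and a statistics package would normally be used in practice.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test.
    Returns U for sample x, the z statistic, and the approximate P value."""
    pooled = sorted((v, g) for g, values in enumerate((x, y)) for v in values)
    n = len(pooled)
    rank_sum_x = 0.0
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of ranks i+1 .. j
        t = j - i
        tie_term += t**3 - t
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    n1, n2 = len(x), len(y)
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    # Variance of U with the standard correction for tied values.
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))
    return u, z, p


# Toy log-expression values (illustrative only): genes in high-fusion-score
# regions versus a background set.
high_fusion = [5.1, 6.3, 7.2, 8.0, 6.8]
background = [3.2, 4.1, 4.8, 5.5, 3.9]
print(rank_sum_test(high_fusion, background))
```

Because the test is rank based, it makes no assumption about the shape of the expression distributions, which suits the skewed values typical of RNA-seq data.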

These observations suggest that TADs maintain genome structural integrity, while their boundaries serve as vulnerable hubs susceptible to structural variation-driven remodeling of oncogenic circuits.

Remodeling of enhancer and focal chromatin interaction in ccRCC oncogenesis

Having examined large-scale 3D genome architecture, we next focused on the enhancer landscape and focal chromatin interactions to elucidate the fine-scale regulatory logic underlying ccRCC pathogenesis. Using H3K27ac ChIP-seq data, we identified 27,340 ccRCC-specific and 9,752 normal-specific enhancer peaks across the 786-O, OS-RC-2, and HEK293T cell lines (Supplementary Table 26). Comparative chromatin state analysis confirmed distinct enhancer activities: ccRCC-specific enhancers were active in the tumor cell lines (786-O, OS-RC-2) but not in HEK293T cells, while normal-specific enhancers showed the opposite pattern (Fig. 5a). IGV plots of representative genomic loci further confirmed the robustness of our differential enhancer identification (Fig. 5b, Supplementary Fig. 9a). Functional annotation of tumor-specific enhancers through GREAT analysis based on the GO dataset revealed significant enrichment for pathways central to ccRCC biology, including angiogenesis39 and hypoxia response40 (Fig. 5d, Supplementary Table 28). In contrast, normal-specific enhancers were associated with urinary system developmental processes such as mesonephric tubule formation and ureteric bud morphogenesis (Fig. 5e). Mouse phenotype enrichment analysis demonstrated tumor-specific enhancer associations with renal morphological abnormalities (Supplementary Fig. 9c), reinforcing the biological relevance of our findings. Motif analysis identified Jun-AP1, ZNF416, NEUROD1, ZNF669, ZEB1, and MGA as key transcription factors potentially orchestrating enhancer reprogramming in ccRCC (Supplementary Fig. 9b).

Fig. 5: Enhancer and chromatin interaction remodeling drives transcriptomic alterations during oncogenesis.

a Heatmap of H3K27ac ChIP-seq signals showing the intensity of ccRCC-specific and normal-specific enhancers across the genome. b IGV screenshot showing an example region around chr1:16,950,000-17,330,000 corresponding to the PADI1 gene, highlighting ccRCC-specific enhancers. c Aggregated peak analysis for ccRCC-common chromatin interactions in ccRCC (786-O and OS-RC-2) and normal HEK293T cell lines (n = 233). d GREAT pathway enrichment analysis based on the GO dataset for ccRCC-specific enhancers. e GREAT pathway enrichment analysis based on the GO dataset for normal-specific enhancers. f ccRCC-common chromatin interactions in the MAP4K4 genomic region. Red circles on the chromatin interaction heatmap and black arcs below highlight the positions of ccRCC-specific interactions. The top two panels of the chromatin interaction heatmap are from 786-O and OS-RC-2 tumor cells, and the bottom panel shows HEK293T normal cells. The tracks below show H3K27Ac ChIP-seq profiles from renal cancer cell lines (red) and normal cell lines (blue). g Histograms showing the proportion of H3K27ac peaks in different types of loops in the genomes of 786-O and OS-RC-2. P values were calculated using R’s proportionality test. Box plots illustrating the expression levels of genes overlapping with double anchors of ccRCC-common loops, conserved loops, and all identified loops in 786-O (h) and OS-RC-2 (i). The box denotes the interquartile range (IQR), with the center line representing the median and whiskers extending to 1.5×IQR (or the maximum/minimum values if within range). P values were determined using the Wilcoxon rank-sum test.

We used Mustache at 5 kb resolution to identify chromatin interactions genome-wide, detecting 3,272 loops in 786-O, 3,396 in OS-RC-2, and 2,225 in HEK293T cells (Supplementary Table 29). Differential analysis revealed 223 ccRCC-common, 55 conserved, and 377 normal-specific loops (Supplementary Table 30). Aggregate peak analysis demonstrated pronounced clustering of ccRCC-common interaction signals in tumor samples compared to normal counterparts (Fig. 5c). Notably, ccRCC-common loops exhibited significantly higher active enhancer signals compared to conserved loops and random control regions (Fig. 5g, Supplementary Table 32). Chromatin interaction heatmaps displayed long-range ccRCC-common interactions, such as those involving MAP4K4, RFX8, and CREG2 (Fig. 5f), with tumor-common enhancer signals observed in these regions. To gain deeper insights into the impact of these remodeled interactions, we integrated chromatin interaction data with transcriptomic profiles. Our analysis revealed that the anchors of ccRCC-common loops were linked to markedly higher gene expression levels than those of all identified loops or conserved loops (Fig. 5h, i, Supplementary Table 31).
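The three loop categories can be illustrated with simple set operations on binned loop anchors. The following is only a minimal sketch: the category definitions (ccRCC-common = present in both tumor lines and absent in HEK293T; conserved = present in all three; normal-specific = present only in HEK293T), the exact-bin matching rule, and the function names `bin_loop` and `classify_loops` are assumptions made for illustration and are not taken from the differential analysis used in this study.

```python
def bin_loop(loop, resolution=5000):
    """Map a loop (chrom, anchor1_start, anchor2_start) to 5 kb bin indices."""
    chrom, anchor1, anchor2 = loop
    return (chrom, anchor1 // resolution, anchor2 // resolution)


def classify_loops(tumor_a, tumor_b, normal, resolution=5000):
    """Split loops into three categories (assumed definitions, see lead-in)."""
    ta = {bin_loop(l, resolution) for l in tumor_a}
    tb = {bin_loop(l, resolution) for l in tumor_b}
    n = {bin_loop(l, resolution) for l in normal}
    return {
        "ccRCC_common": (ta & tb) - n,      # both tumor lines, not normal
        "conserved": ta & tb & n,           # all three cell lines
        "normal_specific": n - (ta | tb),   # normal line only
    }


# Toy loop calls (hypothetical coordinates, not data from this study)
loops_786o = [("chr2", 10_000, 250_000), ("chr3", 5_000, 90_000)]
loops_osrc2 = [("chr2", 12_000, 251_000), ("chr3", 5_000, 90_000)]
loops_hek = [("chr3", 6_000, 92_000), ("chr1", 0, 50_000)]

categories = classify_loops(loops_786o, loops_osrc2, loops_hek)
```

Exact bin matching is the strictest possible rule; allowing a tolerance of one or two bins per anchor would be a natural relaxation when loop callers place the same interaction in neighboring bins.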

In summary, our results demonstrate that ccRCC pathogenesis involves extensive enhancer reprogramming and a remodeling of focal chromatin interactions. The enrichment of active enhancers within ccRCC-specific loops and their strong association with elevated gene expression underscore the role of fine-scale 3D genome restructuring in driving transcriptional dysregulation during renal carcinogenesis.

Identification of enhancer hijacking events driving oncogenic dysregulation in ccRCC

Recent studies have revealed that SVs can aberrantly activate oncogenes through enhancer hijacking, a mechanism whereby SVs reposition enhancer elements into proximity with oncogenic promoters41,42. To systematically identify such events in ccRCC, we applied NeoLoopFinder. Based on comprehensive consensus SV data from TGS-WGS and Hi-C-corrected genomic CNVs, we identified ectopic chromatin interactions near SV breakpoints that activate oncogenes, termed neoloops (Fig. 6a, b, Supplementary Table 33). Aggregate peak analysis revealed significant ccRCC-specific enrichment of neoloop signals compared to normal controls (Fig. 6c). Transcriptomic profiling demonstrated elevated expression of genes localized to neoloop anchor regions in tumor cell lines (Fig. 6d, Supplementary Table 34), supporting the functional relevance of these aberrant chromatin interactions in oncogene activation. Notably, systematic annotation identified multiple cancer drivers within these neoloop domains, including established oncogenes THRAP3 and DIRC1 (Supplementary Fig. 10a, b). Among the identified neoloops, we characterized a complex SV involving a coordinated chromosomal inversion and translocation that repositioned a distal enhancer cluster (chr3:128,620,000-128,930,000) to a proximal position relative to the SEMA5B promoter (chr3:122,450,000-123,000,000) (Fig. 6e, g). Hi-C contact maps confirmed complete absence of promoter-enhancer interactions at this locus in normal cells, while tumor samples exhibited robust chromatin connectivity coinciding with SEMA5B transcriptional upregulation (Fig. 6e–g).

Fig. 6: Identification and validation of enhancer hijacking events driving oncogenic dysregulation in ccRCC.

Circos plots illustrating enhancer hijacking and ectopic chromatin interactions mediated by SVs detected by TGS-WGS in OS-RC-2 (a) and 786-O (b) cancer cells. The purple curve represents neoloop spanning different chromosomes, while the orange curve represents neoloop within the same chromosome. c Aggregated peak analysis of ectopic chromatin interactions (neoloops) in ccRCC (786-O and OS-RC-2) and normal HEK293T cells. d Box plot of gene expression levels for genes at neoloop anchors in tumor and normal cells. The box spans the interquartile range (IQR), with the center line indicating the median and whiskers extending to 1.5×IQR. P values were calculated using the Wilcoxon rank-sum test. e Top panel: The reconstructed Hi-C map and genomic ChIP-seq, RNA-seq tracks for the enhancer hijacking event involving SEMA5B dysregulation in OS-RC-2 cells, showing intrachromosomal inversion on chr3 and interchromosomal translocation between chr2 and chr3. Blue circles indicate neoloops’ genomic position and interaction intensity. Bottom panel: The same regions in normal kidney cells. f Gene ranking dotplot showing DEGs between OS-RC-2 and HEK293T cells. Orange circles highlight SEMA5B dysregulation from enhancer hijacking. g KR balanced Hi-C map showing translocation-related chromatin interactions in renal cancer at chr2: 0-188,120,000 and chr3: 120,000,000–198,000,000. h DNA-FISH confirms co-localization of chr3:188,120,000 and chr2:122,450,000 DNA fragments in OS-RC-2 cells, absent in HEK293T cells. i Validation of breakpoint junction 1 of inversion event involving enhancer hijacking. Top panel: Ribbon plot of nanopore long-read WGS reads capturing breakpoint junction 1 on chr3. Middle panel: PCR validation of breakpoint junction 1, amplifying a 761 bp inversion sequence only in renal cancer cells. Bottom panel: Sanger sequencing validation of the 761 bp PCR product. j Schematic of PCR validation strategy for complex SV breakpoint junctions. 
k Validation of breakpoint junction 2 of translocation event involving enhancer hijacking. Top panel: Ribbon plot of nanopore long-read WGS reads capturing breakpoint junction 2 between chr2 and chr3. Middle panel: PCR validation of breakpoint junction 2, amplifying a 964 bp translocation sequence only in renal cancer cells. Bottom panel: Sanger sequencing validation of the 964 bp PCR product.

To validate the physical existence of these SVs, we performed experimental verification. Dual-color DNA-FISH using breakpoint-flanking probes demonstrated tumor-specific co-localization of chr2 and chr3 fragments (Fig. 6h, Supplementary Fig. 10c, d), confirming the somatic origin of the translocation. Breakpoint junction PCR followed by Sanger sequencing (primer locations in Fig. 6j) successfully amplified and validated heterozygous inversion/translocation events in tumor cell lines (Fig. 6i, k). These findings were further corroborated by single-molecule long-read sequencing data showing continuous DNA reads spanning both rearrangement junctions exclusively in tumor samples (Fig. 6i, k). No evidence of these structural rearrangements was detected in matched normal cell lines across all experimental modalities.

In summary, our results demonstrate that SVs in ccRCC can disrupt local chromatin architecture to mediate long-range enhancer hijacking, thereby creating oncogenic enhancer-promoter loops. This 3D genome reorganization facilitates tumor-specific transcriptional activation by aberrantly engaging enhancers with new target genes. Furthermore, our multi-platform validation framework provides conclusive evidence for the existence of the complex SVs underlying these pathogenic rewiring events.

Enhancer hijacking-mediated SEMA5B upregulation promotes tumorigenesis in ccRCC

Although semaphorins are emerging as clinical biomarkers and therapeutic targets in cancer43, and SEMA5B has been implicated in various malignancies (47, 48) including as a potential target in renal cancer44, its specific oncogenic role in ccRCC remains poorly characterized. To address this gap, we first evaluated the expression of SEMA5B across pan-cancer datasets using The Cancer Genome Atlas (TCGA). Results revealed that SEMA5B exhibits high expression predominantly in brain and renal tissues, with significantly elevated levels in renal cancer compared to normal renal tissue (Fig. 7a, b), suggesting ccRCC-specific overexpression of SEMA5B. Further analysis demonstrated low transcriptomic and protein expression of SEMA5B in normal cell lines such as HK-2 and HEK293T, whereas OS-RC-2 renal cancer cells exhibited markedly higher expression (Fig. 7e, f). These findings were validated in clinical samples using RNA-seq data and RT-qPCR, which confirmed tumor-specific overexpression of SEMA5B at the transcriptomic level (Fig. 7c, d). Immunohistochemistry (IHC) further corroborated tumor-specific upregulation of SEMA5B at the protein level (Fig. 7g, i).

Fig. 7: Enhancer hijacking mediates SEMA5B overexpression in ccRCC and promotes renal carcinogenesis.

a Bodymap analysis from GEPIA revealed significantly elevated SEMA5B expression in ccRCC compared to normal tissues. b Pan-cancer analysis of SEMA5B expression across 33 tumor types in the GEPIA database demonstrated markedly higher expression in ccRCC. c Box plot and scatter plot of RNA-seq data showing SEMA5B expression levels in paired tumor and normal tissues (N = 13). The box represents the interquartile range (IQR), with the median indicated by the center line and whiskers extending to 1.5×IQR. P values were calculated using the Wilcoxon rank-sum test. Each scatter point represents an individual sample, and lines connecting tumor and normal samples indicate paired renal tissues from the same patient. d RT-qPCR quantification of SEMA5B expression in paired tumor and normal tissues (N = 13). P values were derived from unpaired two-sided Student’s t test. ****p ≤ 0.0001. e RT-qPCR analysis of SEMA5B expression in HK2, HEK293T, 786-O, and OS-RC-2 cell lines. N = 3 biological replicates. P values were derived from unpaired two-sided Student’s t test. ****p ≤ 0.0001. f Western blot validation of SEMA5B protein levels in HK2, HEK293T, 786-O, and OS-RC-2 cell lines. g Representative immunohistochemical (IHC) staining images of SEMA5B in paired tumor and normal tissues. Scale bars: 50 μm and 20 μm. h Schematic illustration of SEMA5B neoloops and CRISPRi design targeting enhancers e4–e6 or the promoter region in OS-RC-2 cells. i Quantitative IHC score of SEMA5B expression in paired tumor and normal tissues (N = 13). P values were derived from unpaired two-sided Student’s t-test. ****p ≤ 0.0001. j, m Short-term cell proliferation assays demonstrating that SEMA5B promotes ccRCC cell proliferation. ****p ≤ 0.0001. siNC indicates a negative control transfected with a non-specific sequence, and sgEV represents non-targeting sgRNAs with no genome recognition sites. 
k Western blot analysis of SEMA5B expression following CRISPRi targeting of enhancers e4 and e6 or the promoter region of SEMA5B, or siRNA-mediated SEMA5B knockdown in ccRCC cells. l RT-qPCR quantification of SEMA5B expression after CRISPRi targeting of enhancers e4 and e6 or the promoter region in OS-RC-2 cells. N = 3 biological replicates. P values were derived from one-way ANOVA. ****p ≤ 0.0001, ***p ≤ 0.001, **p ≤ 0.01. n, r Bioluminescence measurement on the indicated days of mice with different treatment OS-RC-2 cell engraftment. N = 5 biological replicates. P values were derived from one-way ANOVA. o, s Transwell migration assays demonstrated that knockdown of SEMA5B significantly reduced cell invasion compared to normal control cells. P-values were derived from one-way ANOVA. ****p ≤ 0.0001, **p ≤ 0.01. q, t Transwell migration assays demonstrated that CRISPRi of SEMA5B significantly reduced cell invasion compared to empty vector control. N = 3 biological replicates. P values were derived from one-way ANOVA. ****p ≤ 0.0001, **p ≤ 0.01. p, u Long-term colony formation assays showing that SEMA5B enhances ccRCC cell proliferation. N = 3 biological replicates. P values were derived from one-way ANOVA. ****p ≤ 0.0001, **p ≤ 0.01.

To investigate the regulatory mechanisms underlying SEMA5B overexpression, we identified enhancers (e4–e6) involved in ectopic interactions based on the coordinates of the neoloop structures. We designed sgRNAs targeting these enhancers and the promoter region of SEMA5B for CRISPR interference (CRISPRi) experiments (Fig. 7h, Supplementary Table 36) and synthesized siRNAs targeting SEMA5B exons. The efficiency of knockdown (KD) and CRISPRi was validated via western blotting and RT-qPCR (Fig. 7k, l, Supplementary Fig. 10e). CRISPRi targeting the ectopic enhancers significantly reduced SEMA5B expression (Fig. 7k, l), supporting the reliability of neoloop identification and its transcriptional regulatory effects.

We further assessed the phenotypic consequences of repressing the ectopic enhancers and promoter of SEMA5B. In vitro experiments revealed that silencing these regulatory elements markedly attenuated cellular proliferation (Fig. 7j, p, u) and invasive capacity (Fig. 7q, t), findings that were consistent with the KD results (Fig. 7m, o, s). In vivo experiments further confirmed that repression of the ectopic enhancers and promoter of SEMA5B significantly slowed tumor growth (Fig. 7n, r, Supplementary Fig. 10g, h).

In summary, our integrated analysis, spanning clinical cohorts, cellular models, and functional genomics, establishes SEMA5B as a ccRCC-specific oncogene activated by enhancer hijacking. The consistent phenotypic evidence from in vitro and in vivo models underscores the critical role of this neoloop-mediated transcriptional rewiring in ccRCC pathogenesis, nominating SEMA5B as a promising therapeutic target for renal malignancies.

A neoloop-based machine learning model for prognostic prediction in ccRCC

To evaluate the clinical potential of our findings, we developed a machine learning model based on ccRCC-specific neoloop-associated genes using the TCGA-KIRC cohort (Supplementary Fig. 11a, b). This model demonstrated robust predictive performance, achieving area under the curve (AUC) values of 0.747 in the training cohort, 0.740 in the test cohort, and 0.743 in the entire patient cohort. These values were at or above those of the traditional WHO grading system (AUCs: 0.745, 0.703, and 0.720, respectively; Fig. 8a–c), with the clearest gain in the test cohort. Furthermore, the model exhibited stable and sustained predictive accuracy over time, maintaining consistent performance across 1- to 5-year follow-up periods (Fig. 8d–f). We also constructed a nomogram integrating multiple clinical parameters to predict overall survival (OS) in ccRCC patients (Fig. 8h). Calibration curve analysis confirmed the model’s robust predictive performance (C-index = 0.767; Fig. 8g).
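For readers less familiar with the AUC metric reported above, the following minimal sketch computes it as the probability that a randomly chosen patient with the event receives a higher risk score than a randomly chosen patient without it (the Mann-Whitney formulation). It assumes binary outcome labels and ignores censoring, so it does not reproduce the time-dependent ROC analysis of this study; the function name `roc_auc` and the toy data are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (event, non-event) pairs ranked correctly.

    labels: 1 for patients with the event, 0 otherwise (assumed binary).
    scores: model-derived risk scores; ties count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Toy example: three of the four event/non-event pairs are ordered correctly
toy_auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which is the scale on which the model and WHO grade values above can be compared.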

Fig. 8: Construction of a ccRCC prognostic risk model based on ccRCC-specific enhancer-hijacking events using machine learning.

Receiver Operating Characteristic (ROC) curves demonstrating the robust prognostic predictive performance of the risk model in the TCGA-KIRC cohort (a), training cohort (b), and testing cohort (c). Time-dependent Receiver Operating Characteristic (ROC) curves evaluating the risk model’s performance at 1-, 3-, and 5-year intervals in the TCGA-KIRC cohort (d), training cohort (e), and testing cohort (f). g Calibration curve analysis validating the stability and reliability of model predictions. h Nomogram for predicting 1-, 3-, and 5-year overall survival (OS) in ccRCC patients within the TCGA-KIRC cohort. Kaplan-Meier survival curves depicting significant divergence in overall survival (OS) (i) and progression-free survival (PFS) (j) between high- and low-risk groups in the TCGA-KIRC cohort. Kaplan-Meier survival analysis showing distinct overall survival (OS) (k) and progression-free survival (PFS) (l) outcomes for high- versus low-risk groups in the training cohort. Kaplan-Meier survival curves confirming differential overall survival (OS) (m) and progression-free survival (PFS) (n) between risk strata in the testing cohort.

Based on the model-derived risk scores, patients were stratified into distinct high-risk and low-risk groups, which exhibited significantly different survival outcomes (Supplementary Fig. 11c–e). Kaplan-Meier analysis of overall survival (OS) and progression-free survival (PFS) revealed significant divergence between the survival curves of the high-risk and low-risk groups in both the training cohort and the full TCGA-KIRC cohort (p < 0.001; Fig. 8i–l), indicating the model’s efficacy in discriminating patients with differential risk levels. Importantly, this significant separation of survival curves was consistently replicated in the testing cohort (p < 0.001; Fig. 8m, n), confirming the model’s robustness and predictive accuracy across independent patient subsets.
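The stratification and survival-curve steps can be sketched as follows. This is an illustrative outline only: the median cutoff is an assumption (the cutoff rule used for the risk groups is not stated here), and `stratify_by_median` and `kaplan_meier` are hypothetical helper names, not functions from the software used in this study.

```python
from statistics import median


def stratify_by_median(risk_scores):
    """Label each patient 'high' or 'low' risk (assumed median cutoff)."""
    cutoff = median(risk_scores)
    return ["high" if s > cutoff else "low" for s in risk_scores]


def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (time, survival) steps.

    times: follow-up time per patient; events: 1 = event, 0 = censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_leaving = 0
        # Collect every patient whose follow-up ends at time t
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve


groups = stratify_by_median([0.1, 0.9, 0.5, 0.7])
curve = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
```

Computing one curve per risk group and comparing them with a log-rank test would give the kind of high-risk versus low-risk comparison described above.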

Notably, the model-derived risk scores showed significant positive associations with established indicators of disease severity. Patients who were older or who had a higher pathological TNM stage, elevated clinical stage, or advanced WHO grade consistently exhibited increased risk scores (all p < 0.05; Supplementary Fig. 12a–g). These associations further support the clinical relevance and biological coherence of our neoloop-based prognostic model.

Discussion

RCC, particularly ccRCC, poses a significant clinical challenge due to its high metastatic potential and resistance to conventional therapies. Over the past decade, genomic studies have revealed recurrent somatic mutations in genes such as VHL, PBRM1, and SETD2 (ref. 45), as well as in other genes involved in epigenetic regulation, underscoring the importance of genetic and epigenetic dysregulation in ccRCC pathogenesis. However, these efforts have primarily relied on next-generation sequencing, which focuses on SNVs and InDels, leaving the contribution of SVs (large-scale genomic alterations including deletions, duplications, inversions, and translocations) poorly understood.

Recent advances in long-read sequencing and Hi-C have significantly deepened our understanding of the critical roles of SVs and 3D genome architecture in tumorigenesis and cancer progression. However, in the context of ccRCC, the dynamic interplay between SVs, 3D chromatin organization, and their mechanistic contributions to malignant transformation remains poorly understood. In this study, we systematically integrated SV landscapes, 3D genome organization, epigenomic profiles, and transcriptomic data in ccRCC, to provide a comprehensive view of multidimensional alterations in chromosomal architecture and to elucidate how SVs interact with higher-order chromatin structure to drive oncogenic dysregulation. These findings expand our understanding of the molecular networks underlying ccRCC pathogenesis and highlight novel regulatory mechanisms that may inform therapeutic strategies.

To establish a detailed SV profile in ccRCC, we leveraged Nanopore long-read sequencing to generate a tumor-specific SV atlas, identifying over 18,000 high-confidence SVs. Our analysis offered substantially improved resolution over traditional next-generation sequencing, particularly in complex genomic regions such as repetitive sequences and structural rearrangements. Moreover, long-read sequencing identified a large-scale insertion in SETD2 and duplication and deletion events affecting HIF1A. These variants in key renal cancer driver genes had not been detected in previous studies using next-generation sequencing. Previous studies have established that the histone methyltransferase SETD2 maintains H3K36me3 to suppress chromatin accessibility, and that loss of SETD2 function leads to a more open chromatin state and subsequent enhancer activation46. A pertinent question for future research is whether this mutational background increases the proportion of active enhancers, thereby providing more opportunities for enhancer hijacking events. Furthermore, while the conventional view holds that VHL mutation upregulates HIF1A expression, thereby driving tumor-associated transcriptional changes27, our study reveals that HIF1A itself can be subject to duplication variants. It remains to be explored whether such structural variations alone can lead to HIF1A transcriptional accumulation, and whether, in the context of concurrent VHL mutation (which impairs HIF degradation), this would result in a more pronounced oncogenic function of HIF-1α. The frequency and impact of these previously overlooked structural variants warrant further validation in large-scale cohorts utilizing long-read sequencing data.

Notably, immortalized HEK293T cells exhibited an even higher SV burden (21,571 SVs), a phenomenon potentially linked to genomic instability induced by SV40 T antigen during the immortalization process. Furthermore, SVs showed a non-random genomic distribution, with the majority localized to intronic and intergenic regions rather than exonic sequences. This distribution pattern suggests that SVs may primarily contribute to tumor evolution by altering regulatory landscapes while maintaining coding sequence integrity, a balance that preserves genomic stability while enabling phenotypic diversity. At the 3D genome level, SVs were significantly enriched in transcriptionally active A compartments and at TAD boundaries, implying that chromatin architecture directly influences SV formation. Mechanistically, SV generation in these regions may arise from two major pathways: replication-based mechanisms, such as microhomology-mediated break-induced replication (MMBIR), and fusion-based mechanisms involving erroneous repair of double-strand breaks (DSBs)35. The increased vulnerability of A compartments to DSBs is primarily attributed to their open chromatin configuration, active transcriptional processes, and the abundance of regulatory elements47,48. These features collectively create a permissive environment for SV formation. For instance, the TOP2B enzyme, which relaxes DNA supercoils, can introduce DSBs in these active regions, thereby increasing SV risk35.

The pathogenic mechanisms of SVs can be categorized into position effects and dosage effects. SVs that disrupt or encompass coding sequences are typically interpreted based on their impact on gene dosage49. However, it is noteworthy that 98% of the human genome is non-coding50, and a significant portion of this non-coding genome is involved in gene regulation51. Indeed, SVs can cause disease without altering any coding sequence52. They achieve this by affecting the position and/or function of cis-regulatory elements such as promoters and enhancers12, thereby influencing the expression of genes located far from the SV breakpoints and contributing to carcinogenesis and disease progression.

In this study, we first investigated SVs that directly impact the exons of oncogenes to explore the oncogenic dosage effects of SVs. Our findings revealed deletions and duplications in crucial oncogenes such as HIF1A, NFIB, and CNTN3 that had previously gone unnoticed. Subsequently, we shifted our focus to the impact of local SVs on chromatin folding. We discovered that local SVs can induce the fusion of adjacent chromatin domains, leading to alterations in gene expression within neighboring TADs. Mechanistically, SVs spanning TAD boundaries remove the insulation effect of the original CCCTC-binding factor (CTCF)-associated boundary elements, triggering enhancer translocation and potentially disrupting enhancer-promoter communication, resulting in abnormal gene expression. Moreover, the TAD structural perturbations caused by TAD boundary-spanning deletions appear to be confined within adjacent TADs. This suggests that TADs tend to limit the effects of SVs to the maximum extent, thereby preserving the overall stability of the three-dimensional chromatin architecture. This observation is consistent with the evolutionary conservation of TADs, which may be a result of adaptive selection under natural evolutionary pressures.

To further investigate the oncogenic mechanisms of SV position effects in ccRCC, we focused on copy-number-balanced SVs (inversions and translocations) that join distant genomic regions and disrupt chromatin loops. We first identified tumor-specific enhancer libraries closely associated with the biological functions and phenotypes of ccRCC. Then, by integrating genomic alterations, 3D genomic interactions, tumor-specific enhancers, and transcriptomic data, we identified enhancer hijacking as a novel oncogenic mechanism in ccRCC, exemplified by the activation of oncogenes such as SEMA5B, DIRC1, and THRAP3. We used long-read sequencing, DNA-FISH, PCR, and Sanger sequencing to validate the complex genomic breakpoints involved in enhancer hijacking. Hi-C analysis revealed ccRCC-specific translocation interaction signals, confirming the authenticity of the SEMA5B enhancer hijacking event. Further analysis of clinical samples and the TCGA database showed that SEMA5B is significantly overexpressed in ccRCC. We used CRISPRi to repress the translocated enhancer clusters and the promoter, and validated the oncogenic effects of SEMA5B both in vitro and in vivo. SEMA5B influences tumor progression through various mechanisms, including regulating tumor angiogenesis53 and inducing tyrosine kinase phosphorylation to promote tumor metastasis54. Recent studies indicate that the interaction between SEMA and its plexin receptor is targetable55. Given the overexpression of SEMA5B in ccRCC and our phenotypic experimental results, there is an opportunity for intervention. Notably, SEMA5B has also been identified as an optimal therapeutic target for antibody drug conjugates (ADCs) or small molecule drug conjugates (SMDCs) in renal cancer56. Additionally, small-molecule inhibitors targeting tyrosine kinases have been approved for clinical treatment of ccRCC. Therefore, the downstream events of SEMA5B oncogene activation via the enhancer hijacking mechanism are worthy of further study and may provide new targets for clinical treatment.

Our study provides novel insights into the pathophysiological mechanisms of renal cell carcinoma, offering a foundation for advancing precision risk prevention strategies. Leveraging the comprehensive landscape of structural variation (SV)-mediated enhancer hijacking events, we constructed a risk assessment model using machine learning. This model demonstrated sensitive prognostic risk prediction and robust performance, effectively stratifying patients into distinct prognostic groups and establishing a new pathway for translating novel mechanistic discoveries into clinical applications.

In conclusion, our multi-omics approach reveals how SVs reshape 3D chromatin organization to activate oncogenic pathways in ccRCC, bridging the gap between structural genomic alterations and transcriptional dysregulation. While this study provides critical insights, limitations such as sample size and the absence of longitudinal data warrant further investigation. Future work should explore the temporal dynamics of SV accumulation during tumor evolution and assess the therapeutic efficacy of targeting enhancer hijacking events. By elucidating the interplay between SVs and 3D genome organization, this research not only advances our understanding of ccRCC biology but also opens new avenues for precision oncology. The resulting risk prediction model further translates these mechanistic insights into a clinically applicable tool for improved patient stratification.

Materials and methods

Cell culture

Human renal cell carcinoma cell lines 786-O and OS-RC-2 and human normal kidney cell lines HK-2 and HEK293T were obtained from the American Type Culture Collection (ATCC) (https://www.atcc.org/). All cell lines were cultured under recommended conditions, and 786-O, OS-RC-2, HK-2, and HEK293T were authenticated by high-resolution short tandem repeat profiling and next-generation whole genome sequencing. The cell lines used in this study were identified by the following Research Resource Identifiers (RRIDs): 786-O (CVCL_1051), OS-RC-2 (CVCL_1626), and HEK293T (CVCL_0063). WGS data analysis revealed that the average alignment rate of the cell line reads to the human reference genome (hg38) exceeded 90% (Supplementary Table 2), consistent with the absence of contamination in all cell lines.

Ultra-long genomic DNA extraction

Ultra-long genomic DNA was extracted from 5 million fresh ccRCC and normal kidney cells. The cell pellet was resuspended in 500 μL of pre-cooled NSB and incubated on ice for 3 min. After centrifugation at 500 × g for 5 min at 4 °C, the pellet was washed with 1000 μL of wash buffer and centrifuged again under the same conditions. The pellet was then lysed in 5 mL of SDS extraction buffer (#3250GR500, BioFroxx) with 20 μL of RNase A (100 mg/mL) (#9001-99-4, Solarbio) at 50 °C for 1 h, with gentle inversion every 20 min. The lysate was cooled, centrifuged at 5000 × g for 10 min at 4 °C, and the supernatant was extracted with phenol/chloroform/isoamyl alcohol (25:24:1), followed by chloroform/isoamyl alcohol (24:1). DNA was precipitated with 0.8 volumes of chilled isopropanol, pelleted by centrifugation, washed twice with 75% ethanol, and air-dried. The DNA was resuspended in 200 μL of Elution Buffer (#19086, Qiagen) and incubated overnight at 4 °C. DNA quality and concentration were measured using a NanoDrop spectrophotometer (NanoDrop Technologies, Wilmington, DE) and Qubit 3.0 Fluorometer (Life Technologies, Carlsbad, CA, USA). Fragment integrity and size distribution were preliminarily assessed by agarose gel electrophoresis prior to library construction.

Nanopore long-read whole genome sequencing

High-molecular-weight (HMW) genomic DNA (gDNA) was isolated from renal cancer cell lines 786-O and OS-RC-2, with detailed extraction procedures provided in the Supplementary information. The purified HMW gDNA was subsequently prepared for nanopore sequencing by end-repair, adapter ligation with the SQK-LSK114 ligation sequencing kit (Oxford Nanopore Technologies, Oxford, UK), and purification with AMPure XP beads (#A63881, Beckman Coulter, Brea, USA). Sequencing was conducted on the PromethION platform (Oxford Nanopore Technologies, Oxford, UK) using R10.4.1 flow cells in accordance with the manufacturer’s standardized protocols. Base calling and FASTQ conversion were performed with MinKNOW (v23.07.12) (Oxford Nanopore Technologies, Oxford, UK) and Dorado (v0.9.0) (Oxford Nanopore Technologies, Oxford, UK). NanoPlot (v1.40.0) was employed for the quality control analysis of long-read sequencing data57.

Next-generation whole genome sequencing

Total genomic DNA from 786-O and OS-RC-2 was extracted using the TIANamp Genomic DNA kit (#DP304, TIANGEN, Beijing, China). RNase A (#19101, Qiagen, Hilden, Germany) was used to obtain RNA-free genomic DNA. The gDNA was then sheared using Covaris instruments to obtain DNA fragments of 400 to 500 bp. The DNA fragment sizes were determined using a Qsep100™ fully automated nucleic acid and protein analysis system (BiOptic Inc., New Taipei City, China). The DNA fragments were then used for library construction for the NovaSeq X Plus, NovaSeq 6000, and BGI DNBSEQ-T7 platforms. Library preparation was conducted according to the manufacturer’s instructions (#ND607-01 and #NDM627-01, Vazyme, Nanjing, China).

Identification of structural variants

For long-read whole genome sequencing (WGS) data, Minimap2 (v2.24)58 was applied to perform genomic mapping with default parameters. Sniffles (v2.0.7)26, NanoSV (v1.2.4)59, and cuteSV (v2.0.0)60 were used for structural variant (SV) identification from long reads with default parameters. For next-generation WGS data, BWA (v0.7.17)61 was applied to perform genomic mapping with default parameters. Delly (v1.1.8)62, Lumpy (v0.3.1)63, and Manta (v1.6.0)64 were used for short-read SV identification with default parameters. Homo sapiens genome assembly GRCh38 (hg38) was used as the reference genome. Consensus SVs were identified using SURVIVOR (v1.0.7)65 when at least two SV callers detected them as the same type, in the same genomic orientation, with breakpoints within 1000 bp of each other for variants larger than 50 bp. ccRCC-specific SVs were also identified by SURVIVOR with similar parameters. SV types were assigned according to the classification reported by each SV caller. The SVs retrieved were annotated with AnnotSV (v3.4.4)66. Complex SVs were characterized as local assemblies composed of multiple SV junctions originating from distinct genomic locations. In contrast, simple SVs were defined as those containing only a single junction event. Cancer gene intersects were identified using BEDTools (v2.30.0)67 to analyze the intersection between consensus variants and the GENCODE hg38 annotation, followed by filtering against the COSMIC Cancer Gene Census (v98)68 and the Bushman Lab’s cancer-related gene list (http://www.bushmanlab.org/assets/doc/allOnco_May2018.tsv). The Circos plots depicted in Fig. 1 and Fig. 6 were generated using Circa (http://omgenomics.com/circa) with BED files encompassing diverse SVs. SVs were visualized using IGV (v2.14.1)69 and Ribbon (v1)70.
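The consensus rule described above (same type, same orientation, breakpoints within 1000 bp, at least two callers) can be illustrated with a minimal Python sketch. This is a simplified greedy grouping written for illustration only and is not SURVIVOR's algorithm; the `SVCall` record, its field names, the inclusive 50-bp length bound, and the demo coordinates are all assumptions, and events spanning two chromosomes are not modeled.

```python
from dataclasses import dataclass

MAX_BREAKPOINT_DIST = 1000  # bp; breakpoint tolerance stated in the text
MIN_SV_LENGTH = 50          # bp; assumed to be an inclusive lower bound
MIN_CALLERS = 2             # minimum number of distinct supporting callers


@dataclass(frozen=True)
class SVCall:
    """Hypothetical minimal record for one SV call (not a real VCF parser)."""
    caller: str   # tool that reported the call
    chrom: str
    start: int
    end: int
    svtype: str   # e.g. "DEL", "INS", "DUP", "INV"
    strands: str  # junction orientation, e.g. "+-"
    length: int   # SV length in bp


def same_event(a, b):
    """True if two calls agree on chromosome, type, orientation and breakpoints."""
    return (
        a.chrom == b.chrom
        and a.svtype == b.svtype
        and a.strands == b.strands
        and abs(a.start - b.start) <= MAX_BREAKPOINT_DIST
        and abs(a.end - b.end) <= MAX_BREAKPOINT_DIST
    )


def consensus_svs(calls):
    """Greedily group matching calls; keep one representative per group
    that is supported by at least MIN_CALLERS distinct callers."""
    candidates = sorted(
        (c for c in calls if c.length >= MIN_SV_LENGTH),
        key=lambda c: (c.chrom, c.start, c.end),
    )
    used = set()
    consensus = []
    for i, seed in enumerate(candidates):
        if i in used:
            continue
        cluster = [
            j for j in range(i, len(candidates))
            if j not in used and same_event(seed, candidates[j])
        ]
        callers = {candidates[j].caller for j in cluster}
        if len(callers) >= MIN_CALLERS:
            used.update(cluster)
            consensus.append(seed)
    return consensus


# Demo with made-up coordinates: only the first deletion has two callers.
demo_calls = [
    SVCall("sniffles", "chr3", 10_000, 10_500, "DEL", "+-", 500),
    SVCall("cutesv", "chr3", 10_120, 10_480, "DEL", "+-", 360),
    SVCall("nanosv", "chr3", 50_000, 50_300, "DEL", "+-", 300),
    SVCall("sniffles", "chr5", 1_000, 1_030, "DEL", "+-", 30),
    SVCall("cutesv", "chr5", 1_005, 1_032, "DEL", "+-", 27),
]
demo_consensus = consensus_svs(demo_calls)
print([(c.chrom, c.start, c.svtype) for c in demo_consensus])
```

The representative kept for each group is simply the left-most call; a production merge would typically report merged coordinates and the list of supporting callers.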

Hi-C library construction and sequencing

Ten million crosslinked cells were lysed in 100 μL of 0.55% SDS at room temperature, 62 °C and 37 °C sequentially for 10 min each, followed by the addition of 50 μL 10% Triton X-100 to quench the SDS and incubation at 37 °C for 30 min. To fragment the chromatin, nuclei were digested with DpnII (#R0543, NEB) at 37 °C overnight. The digested chromatin was filled in with dNTPs (#N0447, NEB) and DNA Polymerase I, Large (Klenow) Fragment (#M0209, NEB) at 37 °C for 1.5 h, and then subjected to dA-tailing in a solution containing 1× CutSmart buffer, BSA (#B9000S, NEB), dATP (#N0440, NEB), and Klenow Fragment (3´ → 5´ exo-) (#M0212, NEB) at room temperature for 1 h. Proximity ligation was performed by adding biotinylated bridge linker and T4 DNA ligase (#M0202, NEB) and incubating at 16 °C overnight.

The ligated DNA was then decrosslinked with Proteinase K and purified using a ChIP DNA Clean & Concentrator kit (#D5205, Zymo Research). The purified DNA was sheared by sonication to 400 bp using a Covaris instrument. Biotinylated fragments were captured with M-280 streptavidin Dynabeads (#11206D, Thermo Fisher Scientific). End repair, A-tailing, and adaptor ligation were performed on-beads using a VAHTS Universal DNA Library Prep Kit for Illumina V3 (#ND607, Vazyme). The final library was amplified by PCR and purified with AMPure beads (#A63881, Beckman Coulter), and its quality was assessed using the Qubit high-sensitivity DNA assay (Thermo Fisher Scientific) and Qsep100™ (BiOptic Inc.) analysis. Hi-C library sequencing was performed using the Illumina HiSeq X Ten platform with 150 bp paired-end reads.

Hi-C contact matrix building and data normalization

For Hi-C data, read quality was evaluated using FastQC (v0.12.1) (https://github.com/s-andrews/FastQC), and reads were processed with Trim Galore (v0.6.7) (https://github.com/FelixKrueger/TrimGalore) for quality control. Trimmed reads were aligned to the human reference genome (GRCh38/hg38) using Bowtie2 (v2.5.1)71 with default mapping parameters. HiC-Pro (v2.10.0)72 was then used to remove non-informative read pairs through multistage filtering. This step excluded unmapped reads, multimapped reads, and invalid paired-end reads including dangling-end pairs, re-ligation pairs, self-circle pairs, single-end pairs, and dumped pairs, retaining only uniquely mapped valid interactions for downstream analysis. The interaction matrices at various resolutions (bin sizes of 1 Mb, 500 kb, 100 kb, 40 kb, 20 kb, and 5 kb at the genome-wide level) were constructed using HiC-Pro (v3.1.0) with default parameters72. To remove potential Hi-C interaction bias, a computationally efficient implementation of the iterative correction and eigenvector decomposition (ICE) method was used72.

For bridge linker Hi-C data, raw sequencing reads underwent quality assessment using FastQC (v0.12.1) (https://github.com/s-andrews/FastQC) and were processed with Trim Galore (v0.6.7) (https://github.com/FelixKrueger/TrimGalore) for quality control. Trimmed reads were aligned to the human reference genome (GRCh38/hg38) using the Burrows-Wheeler Aligner (BWA) (v0.7.17)73, employing a hybrid of the BWA-ALN and BWA-MEM algorithms. Only uniquely mapped reads with a mapping quality score (MAPQ) ≥ 30 were retained. The pipeline categorized read pairs based on bridge-linker sequences: (i) no linker (none-tags), (ii) a linker and one genomic tag (mono-tags), and (iii) a linker and paired-end tags (di-tags). For di-tags, each tag was extended by 500 bp in the 5′ direction to identify chromatin interaction anchors. Interaction matrices at multiple resolutions (1 Mb, 500 kb, 100 kb, 40 kb, 20 kb, and 5 kb) were constructed using the processed paired-end tags (PETs). Matrices were normalized using the ICE method to correct for biases in Hi-C data72.
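The idea behind iterative correction can be sketched in a few lines of Python: each bin's coverage (row sum) is repeatedly divided out until all bins have equal coverage. This is a toy, pure-Python illustration on a dense symmetric matrix, not the implementation shipped with HiC-Pro; the function name `ice_normalize`, the fixed iteration count, and the 3 × 3 example matrix are assumptions.

```python
def ice_normalize(matrix, n_iter=100):
    """Balance a symmetric contact matrix so that all non-empty rows
    have (approximately) the same sum. Returns a new list of lists."""
    n = len(matrix)
    balanced = [list(map(float, row)) for row in matrix]
    for _ in range(n_iter):
        row_sums = [sum(row) for row in balanced]
        nonzero = [s for s in row_sums if s > 0]
        if not nonzero:
            return balanced  # empty matrix: nothing to balance
        mean_sum = sum(nonzero) / len(nonzero)
        # Bias of each bin relative to the mean coverage; empty bins are left alone.
        bias = [s / mean_sum if s > 0 else 1.0 for s in row_sums]
        for i in range(n):
            for j in range(n):
                balanced[i][j] /= bias[i] * bias[j]
    return balanced


# Toy symmetric contact matrix (made-up counts).
demo_raw = [
    [4.0, 2.0, 1.0],
    [2.0, 4.0, 2.0],
    [1.0, 2.0, 4.0],
]
demo_balanced = ice_normalize(demo_raw)
demo_row_sums = [sum(row) for row in demo_balanced]
print([round(s, 6) for s in demo_row_sums])
```

Real Hi-C matrices are sparse and very large, so production tools work on sparse representations, filter low-coverage bins before balancing, and stop on a convergence criterion rather than a fixed number of iterations.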

Identification of A/B compartments

A/B compartments were determined using Juicer tools74 with Hi-C interaction matrices normalized by the Knight-Ruiz (KR) algorithm. Eigenvectors were computed for each chromosome at 100-kb resolution through principal component analysis of Pearson’s correlation matrices derived from intrachromosomal interaction frequencies. Compartment assignments were established by correlating eigenvectors with genome-wide gene density profiles (wigCorrelate). Eigenvector signs were inverted for chromosomes exhibiting negative correlations to align transcriptional activity (A compartments) with gene-rich regions. Final compartment annotations were merged across all chromosomes and formatted into a genome-wide bedGraph file. A/B compartment switches were called when genomic coordinates showed discordant eigenvector signs (positive/negative) between tumor and normal samples after eigenvector sign normalization.
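The sign-orientation and switch-calling logic can be sketched as follows. This is a minimal illustration assuming per-bin eigenvector values and gene densities are already available as plain lists; the function names, the handling of zero-valued bins, and the "normal-tumor" labeling of switches (for example "A-B" for A in normal and B in tumor) are assumptions rather than details taken from the pipeline.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def orient_eigenvector(eigenvector, gene_density):
    """Flip the eigenvector sign if it correlates negatively with gene density,
    so that positive values mark gene-rich (A) bins."""
    if pearson(eigenvector, gene_density) < 0:
        return [-v for v in eigenvector]
    return list(eigenvector)


def compartment(value):
    """Map an oriented eigenvector value to 'A', 'B' or None (unassigned)."""
    if value > 0:
        return "A"
    if value < 0:
        return "B"
    return None


def compartment_switches(normal_ev, tumor_ev):
    """Return (bin_index, 'A-B' or 'B-A') for bins whose compartment differs
    between the oriented normal and tumor eigenvectors."""
    switches = []
    for i, (n_val, t_val) in enumerate(zip(normal_ev, tumor_ev)):
        c_normal, c_tumor = compartment(n_val), compartment(t_val)
        if c_normal and c_tumor and c_normal != c_tumor:
            switches.append((i, f"{c_normal}-{c_tumor}"))
    return switches


# Made-up values for four 100-kb bins on one chromosome.
demo_density = [30, 22, 4, 2]
demo_normal = orient_eigenvector([-0.5, -0.2, 0.3, 0.6], demo_density)
demo_tumor = orient_eigenvector([0.4, -0.1, -0.2, 0.3], demo_density)
demo_switches = compartment_switches(demo_normal, demo_tumor)
print(demo_switches)
```

Orienting each sample independently before comparison matters because the sign of a principal component is arbitrary; without it, an apparent switch could simply reflect a flipped eigenvector.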

Identification of TAD boundaries and domains

TADs were identified using the hicFindTADs algorithm in HiCExplorer (v2.2.3)75 with normalized Hi-C interaction matrices at 10-kb resolution. Key parameters included a threshold comparison value (--thresholdComparisons) of 0.05 and a boundary detection sensitivity (--delta) of 0.01. TAD domains were defined as genomic regions flanked by two consecutive statistically validated boundaries. Differential TAD analysis was performed using HiCExplorer’s hicDifferentialTAD tool (v2.2.3)75 with normalized 10-kb-resolution matrices. Control sample TAD architectures served as reference frameworks to quantify structural divergence in experimental samples (e.g., ccRCC vs. normal). Interaction frequency differences within corresponding TAD regions were statistically evaluated using Wilcoxon rank-sum tests, followed by false discovery rate (FDR) correction. Genomic regions exhibiting significant divergence (FDR-adjusted P < 0.01) were classified as differential TADs, while regions that did not meet this significance threshold (FDR-adjusted P ≥ 0.01) were categorized as conserved TADs.
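The final classification step (FDR correction followed by thresholding at 0.01) can be illustrated with a short Python sketch. It assumes that per-TAD P values from the rank-sum tests are already available and that the Benjamini-Hochberg procedure is the FDR method; the text does not name the procedure, and this sketch does not reproduce the internals of hicDifferentialTAD. The function names and demo P values are hypothetical.

```python
FDR_CUTOFF = 0.01  # threshold stated in the text


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def classify_tads(pvalues, cutoff=FDR_CUTOFF):
    """Label each TAD 'differential' (adjusted P < cutoff) or 'conserved'."""
    return [
        "differential" if q < cutoff else "conserved"
        for q in benjamini_hochberg(pvalues)
    ]


# Made-up raw P values for four TADs.
demo_tad_pvalues = [0.001, 0.004, 0.03, 0.5]
demo_tad_labels = classify_tads(demo_tad_pvalues)
print(demo_tad_labels)
```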

Identification of chromatin loops

Chromatin loops were detected using Mustache (v1.2.0)76 with its default parameters, analyzing 5-kb-resolution Hi-C interaction matrices. Statistically significant loops were retained following filtration based on interaction frequency thresholds that surpassed genome-wide background levels (P < 0.05). Differential chromatin loops were further analyzed using the differential analysis module (diff_mustache.py) implemented in Mustache (v1.2.0)76, with statistical significance thresholds applied (P < 0.05).

Quantification of TAD structural disruption

To assess potential structural damage caused by deletion-induced TAD disruption, we employed the TAD fusion score77 with default parameters to quantitatively evaluate the degree of TAD boundary disintegration.

Bulk RNA-seq and data analysis

Total RNA was extracted using the RNeasy Mini Kit (#74104, Qiagen, Hilden, Germany) following the manufacturer’s protocol. Poly(A)-enriched strand-specific libraries were constructed according to the manufacturer’s instructions (#NRM605 and #NR60, Vazyme, Nanjing, China) and sequenced on the Illumina NovaSeq 6000 or DNBSEQ-T7 platform. Three biological replicates per sample were generated, yielding 20 million reads per replicate. Read quality was evaluated using FastQC (v0.12.1) (https://github.com/s-andrews/FastQC), and reads were processed with Trim Galore (v0.6.7) (https://github.com/FelixKrueger/TrimGalore) for quality control. Reads were aligned to the human reference genome (hg38) using STAR (v2.7.11)78, with gene expression quantification performed using HTSeq-count (v2.0.4)79. Downstream statistical analyses were conducted in RStudio, with differential gene expression analysis performed using DESeq2 (v1.38.3)80. Differentially expressed genes were identified based on a Benjamini-Hochberg adjusted q-value < 0.05 and an absolute log2 fold change ≥ 1.
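The two-part threshold for differentially expressed genes can be expressed as a small filter. The sketch below assumes a results table has already been exported (for example from DESeq2) into a list of dictionaries; the keys `gene`, `padj`, and `log2fc` and the gene names are hypothetical.

```python
Q_CUTOFF = 0.05     # adjusted q-value threshold from the text
LOG2FC_CUTOFF = 1.0  # minimum absolute log2 fold change from the text


def is_differentially_expressed(padj, log2fc):
    """Apply both thresholds; genes without an adjusted q-value are excluded."""
    if padj is None or log2fc is None:
        return False
    return padj < Q_CUTOFF and abs(log2fc) >= LOG2FC_CUTOFF


def select_degs(results):
    """Return the names of genes passing both thresholds."""
    return [
        row["gene"]
        for row in results
        if is_differentially_expressed(row["padj"], row["log2fc"])
    ]


# Made-up results rows with placeholder gene names.
demo_results = [
    {"gene": "GENE_A", "padj": 0.001, "log2fc": 2.3},   # up-regulated
    {"gene": "GENE_B", "padj": 0.001, "log2fc": -1.0},  # down-regulated, at the boundary
    {"gene": "GENE_C", "padj": 0.20, "log2fc": 3.0},    # not significant
    {"gene": "GENE_D", "padj": 0.01, "log2fc": 0.4},    # effect too small
    {"gene": "GENE_E", "padj": None, "log2fc": 1.5},    # no adjusted q-value
]
demo_degs = select_degs(demo_results)
print(demo_degs)
```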

ChIP-seq data analysis

Similar quality control procedures were implemented using FastQC (v0.12.1) and Trim Galore (v0.6.7). Reads were aligned to the hg38 genome using Bowtie 2 (v2.5.1)71, with duplicate removal performed using SAMtools (v1.18)81. For ChIP-seq data, peak calling was conducted using MACS2 (v2.2.9.1)82 with the following parameters: -f BAMPE, -g hs, --nomodel, -B, and -q 1e-2. BigWig tracks were generated using deepTools bamCoverage (v3.5.1)83 with RPGC normalization. Differential analysis between cancer and normal samples was performed using BEDTools (v2.30.0)67. Data quality was assessed according to ENCODE standards for histone ChIP-seq. Track visualization of H3K27ac ChIP-seq profiles at specific regions was performed using IGV (v2.14.1)69, and heatmaps of H3K27ac ChIP-seq signals across ccRCC-specific and normal-specific regions were generated with the computeMatrix and plotHeatmap tools from deepTools (v3.5.1)83.

Analysis of SV distribution across the 3D genome architecture

Dynamic chromosomal regions were defined by contrasting cancer cell lines with normal kidney cell lines. Specifically, comparing 786-O (or OS-RC-2) with HEK293T cell lines enables classification of compartment regions into stable A, stable B, A-B, and B-A compartments; and chromosomal regions into ccRCC-gained TAD boundaries, ccRCC-lost TAD boundaries, and stable TAD boundaries. ccRCC-specific SVs, determined by SURVIVOR, were those present in 786-O (or OS-RC-2) but absent in HEK293T. Dynamic chromosomal regions were identified using BEDTools. To investigate SV distribution, we compared SV densities in different chromosomal regions (A/B compartments, TAD domains, and TAD boundaries). SV density was calculated as the number of SVs divided by the length of each chromosomal region, while background density was the number of SVs divided by the whole-genome length. SV enrichment was assessed by comparing the regional SV proportion to the background using R’s prop.test function. Genomic region associations were assessed through permutation testing via regioneR (v1.38.0)84 with default parameters.
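The density definitions above reduce to simple arithmetic, shown here as a Python sketch. It covers only the regional density, the genome-wide background density, and their ratio; the significance testing performed with prop.test and regioneR is not reproduced. The function names and the example counts and lengths are made up.

```python
def sv_density(n_svs, region_length_bp):
    """Number of SVs per base pair in a region."""
    if region_length_bp <= 0:
        raise ValueError("region length must be positive")
    return n_svs / region_length_bp


def fold_enrichment(n_region, region_length_bp, n_genome, genome_length_bp):
    """Regional SV density relative to the genome-wide background density."""
    background = sv_density(n_genome, genome_length_bp)
    if background == 0:
        raise ValueError("background density is zero")
    return sv_density(n_region, region_length_bp) / background


# Made-up example: 100 SVs in 50 Mb of one region class,
# against 3000 SVs across a 3-Gb genome.
demo_region_density = sv_density(100, 50_000_000)
demo_background_density = sv_density(3000, 3_000_000_000)
demo_fold = fold_enrichment(100, 50_000_000, 3000, 3_000_000_000)
print(f"{demo_region_density:.2e} {demo_background_density:.2e} {demo_fold:.2f}")
```

A fold value above 1 indicates that SVs are denser in the region class than expected from the genome-wide average; whether the difference is significant still requires the proportion and permutation tests described in the text.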

Identification of repetitive elements

RepeatMasker (version 3.7) was employed for the annotation of repetitive elements. Specifically, the sequences of deletions (DELs) and insertions (INSs) identified by third-generation sequencing (TGS) and NGS were obtained from the VCF files and used as input for RepeatMasker (https://github.com/Dfam-consortium/RepeatMasker/tree/master).

Enrichment analysis

The list of genes located within ccRCC-common TADs was analyzed in Enrichr (v23.6.8)85 using the Jensen_DISEASES term (https://diseases.jensenlab.org/Search). The BED files for tumor-specific and normal-specific enhancers were uploaded to the GREAT website (v4.0.4) (http://great.stanford.edu/public/html/). GREAT analysis was performed with hg38 as the reference genome and the whole genome as the background region, and the results were visualized using the R package ggplot2 (v3.4.2) (https://github.com/tidyverse/ggplot2).

Neoloop analysis

In each sample, CNV profiles were estimated from Hi-C matrices using a generalized additive model. Large SV breakpoints were identified through nanopore long-read WGS analysis and Hi-C breakfinder25. These breakpoints were then used as inputs for NeoLoopFinder86, a computational framework employed to construct and identify local genome structures and neo-loops.

Cell transfection

The small interfering RNAs (siRNAs) targeting SEMA5B (siSEMA5B) and the corresponding negative control siRNAs (siNC) were purchased from Gene Pharma (Shanghai, China). Transfections were performed using the Lipofectamine 3000 kit (Invitrogen; Thermo Fisher Scientific, Inc., Waltham, MA, USA) according to the manufacturer’s protocol. The knockdown efficiency was assessed by reverse transcription-quantitative polymerase chain reaction (RT-qPCR) 48 h after transfection.

CRISPR-based epigenetic perturbation

The promoter- or enhancer-targeting sgRNAs and non-targeting sgRNAs with no genome recognition sites were cloned into pLV-hU6-sgRNA-hUbC-dCas9-KRAB-T2a-Puro (Addgene: 71236). The cells were infected with these vectors and then selected with puromycin (5 µg/ml) for at least 3 days before extracting RNA. All sgRNA sequences used are listed in Supplementary Table 36.

DNA FISH combined with immunofluorescence

OS-RC-2 and HEK293T cells were cultivated in a chamber slide (ibidi) and subjected to doxycycline (25 ng/ml) for 24 h. After fixation with 4% paraformaldehyde for 10 min, the cells were rinsed three times with ice-cold PBS. For DNA FISH, cells were permeabilized by sequential incubation in 70%, 80%, 90%, and 100% ethanol (5 min each) at room temperature. The probe hybridization mixture was formulated by combining FISH Hybridization Buffer (39 μl) (#Bes1011, BersinBio) with FISH probes (1 μl). Ten microliters of this mixture were deposited on each slide, and a coverslip was affixed on top, sealed with rubber cement. Upon solidification of the rubber cement, genomic DNA and probes were denatured at 75 °C for 8 min. The slides were then incubated at 37 °C overnight in darkness. Subsequently, the coverslip was removed, and the slides were sequentially washed with prewarmed 2×SSC at 53 °C for 5 min, prewarmed 0.1% NP-40/2×SSC at 42 °C for 5 min, and 2×SSC at room temperature for 5 min. Post-washing, the slides were stained with 4’,6-diamidino-2-phenylindole and examined on a confocal laser scanning microscope (Zeiss). The DNA FISH probes used in this study were custom-synthesized by BersinBio, with their sequences listed in Supplementary Table 35.

PCR across the breakpoint junctions

We designed specific primers on each side of the breakpoint and verified the breakpoint junction by PCR followed by sequencing to obtain nucleotide-resolution sequence information. All primers used are listed in Supplementary Table 37.

RNA isolation and quantitative real-time PCR (qPCR)

Total RNA was extracted from the indicated cells and frozen tissue specimens using TRIzol reagent (Invitrogen, USA) and reverse transcribed into cDNA using a Prime Script RT reagent kit (Takara, Japan) according to the manufacturers’ instructions. Real-time PCR was performed using SYBR reagent (Applied Biosystems). The sequences of the primers used for quantitative real-time PCR are provided in Supplementary Table 38. For quantitative data analysis, gene expression was calculated using the 2^(−ΔΔCt) method.
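For reference, the relative quantification calculation can be written out as a short Python sketch. It assumes one target gene, one reference (housekeeping) gene, and one control condition, with Ct values supplied directly; the function name and the example Ct values are hypothetical, and amplification efficiency is taken to be 100%, as the 2^(−ΔΔCt) method itself assumes.

```python
def relative_expression(ct_target_sample, ct_ref_sample,
                        ct_target_control, ct_ref_control):
    """Fold change of the target gene in a sample relative to the control,
    normalized to a reference gene."""
    delta_ct_sample = ct_target_sample - ct_ref_sample      # ΔCt in the sample
    delta_ct_control = ct_target_control - ct_ref_control   # ΔCt in the control
    delta_delta_ct = delta_ct_sample - delta_ct_control      # ΔΔCt
    return 2 ** (-delta_delta_ct)


# Made-up Ct values: the target crosses threshold two cycles later
# (relative to the reference gene) in the knockdown than in the control.
demo_fold_change = relative_expression(
    ct_target_sample=24.0, ct_ref_sample=18.0,
    ct_target_control=22.0, ct_ref_control=18.0,
)
print(demo_fold_change)
```

In this example ΔΔCt is 2, so the target is expressed at one quarter of the control level.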

Western blot assay

Proteins for western blotting analysis were extracted using RIPA buffer supplemented with protease inhibitors, and their concentrations were quantified with a BCA assay kit. Equal amounts of proteins were separated on 4-20% SDS-PAGE gels and subsequently transferred onto nitrocellulose membranes. The membranes were then blocked with 5% skimmed milk for 1 h at room temperature. Following blocking, the membranes were incubated overnight at 4°C with the following primary antibodies: rabbit anti-human SEMA5B (#HPA066548, Sigma-Aldrich; Merck KGaA), anti-β-tubulin (#abs131994, Absin, Shanghai, China), and the following antibodies from Cell Signaling Technology (CST, Danvers, USA): anti-Jak1 (6G4) (#3344S), anti-phospho-Jak1 (Tyr1034/1035) (#3331S), anti-Stat3 (D3Z2G) (#12640S), anti-phospho-Stat3 (Tyr705) (D3A7) XP® (#9145S), anti-mTOR (#2972S), anti-phospho-mTOR (Ser2481) (#2974S), anti-Akt (pan) (C67E7) (#4691S), and anti-phospho-Akt (Thr308) (#9275S). β-Tubulin was used as an internal loading control.

After extensive washing, the membranes were incubated for 1 h at room temperature with horseradish peroxidase (HRP)-conjugated goat anti-rabbit IgG (Beijing Zhongshan Golden Bridge Biotechnology, Co. Ltd., Beijing, China) or HRP-conjugated goat anti-mouse IgG (Beijing Zhongshan Golden Bridge Biotechnology, Co. Ltd., Beijing, China). Protein bands were visualized using the SuperSignal™ West Pico Plus Chemiluminescent Substrate kit (Thermo Scientific Inc., IL, USA) and detected with a Molecular Imager ChemiDoc XRS system (Bio-Rad Laboratories, Inc., Hercules, CA, USA). Quantitative analysis of the band intensities for SEMA5B and signaling-related proteins was performed using ImageJ software, with the values normalized to β-tubulin.

HE and immunohistochemical staining

All tissues were fixed in 10% neutral-buffered formalin. Lungs were inflation-fixed and all other tissues were drop-fixed for 24 h before paraffin embedding. Histological assessment was performed on hematoxylin and eosin (H&E)-stained sections. For immunohistochemical analysis, sections were dewaxed, rehydrated and subjected to microwave-based antigen retrieval with 20 min of boiling in citrate buffer (10 mM citrate and 0.05% Tween-20, pH 6.0) under pressure. Sections were probed with primary antibodies to SEMA5B (Sigma-Aldrich; Merck KGaA). Sections were then incubated with secondary antibodies (ImmPRESS HRP reagent kit, anti-rabbit, Beijing Zhongshan Golden Bridge Biotechnology, Co. Ltd., Beijing, China) at room temperature for 45 min, and signal was visualized with the peroxidase substrate DAB (Sigma-Aldrich; Merck KGaA), applied for the same amount of time across sections (maximum 5 min) at room temperature. The stained tissue sections were counterstained with Mayer’s hematoxylin.

Cell proliferation and viability assays

Cell proliferation was quantified using the Cell Counting Kit-8 (CCK-8; Beyotime Institute of Biotechnology, Shanghai, China) following the manufacturer’s manual. Briefly, 3 × 10³ cells per well were seeded and transfected in a 96-well plate (Corning Incorporated). After the indicated time, 10 μl of CCK-8 solution was added to each well, and the cells were incubated for 4 h at 37 °C. The absorbance was measured at 450 nm using a microplate reader (Bio-Tek Instruments, VT, USA).

Colony formation assay

After transfection, cells were seeded in 6-well plates (1 × 10³ cells/well) and cultured for 2 weeks. Colonies were fixed with 100% methanol at room temperature for 20 min and stained with 0.1% crystal violet (Sigma-Aldrich; Merck KGaA) at 25 °C for 30 min. Visible colonies were imaged and counted using a light microscope. All experiments were repeated three times.

Trans-well invasion assay

Trans-well membranes with 8-μm pore size (Corning Costar, MA, USA) coated with Matrigel (BD Biosciences, NJ, USA) were used for the cancer cell invasion assay as previously described. After the indicated transfection, cells were resuspended in fresh serum-free medium and re-plated into the upper chamber. Fresh medium was added directly to the lower chamber. After an additional 24 h of incubation, the invasive cells that had penetrated to the lower surface were fixed with 4% paraformaldehyde (PFA) in phosphate-buffered saline (PBS) and stained with crystal violet (Sigma-Aldrich, MO, USA). The number of invasive cells was counted under a microscope (BX51, Olympus).

Human samples

Tissue samples of ccRCC patients were from the Fourth Affiliated Hospital of School of Medicine, Zhejiang University. Informed consent was obtained from all study subjects, and sample collection was approved by the Ethics Committee of the Fourth Affiliated Hospital of School of Medicine, Zhejiang University (K2024224).

Animal studies

All experiments using animals were performed in accordance with protocols approved by the local animal experimentation committee of Zhejiang University (license no. ZJU20241020). BALB/c nude mice were used at 8 weeks of age. Luciferase-expressing OS-RC-2 cells were injected subcutaneously on both flanks and monitored twice per week for 4 weeks by electronic caliper or bioluminescence imaging using Xenogen IVIS Imaging System 100 (STTARR facility, UHN). Investigators who measured and analyzed the samples were blinded to the group information.

Risk prediction model construction

ccRCC patients were randomly partitioned 1:1 into training and testing cohorts using the createDataPartition R function. Potential prognostic ccRCC-specific enhancer hijacking genes were initially identified via univariate Cox regression analysis. Subsequently, LASSO Cox regression was performed with the R package glmnet (v4.1.8) (https://cran.r-project.org/web/packages/glmnet/index.html) to select the most significant genes. A multivariate stepwise Cox regression model was then constructed to further refine gene selection and optimize the prognostic model. Finally, a risk score equation was derived by combining the multivariate Cox regression coefficients with the expression levels of the selected enhancer hijacking genes, as follows:

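Risk score = Σᵢ (expressionᵢ × coefᵢ), where expressionᵢ is the expression level of selected gene i and coefᵢ is its multivariate Cox regression coefficient.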
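As a worked illustration, the risk score is a weighted sum of expression values and can be computed as below. The gene names, coefficients, and expression values are placeholders and not the fitted model; the function name is hypothetical.

```python
def risk_score(expression, coefficients):
    """Sum of expression x coefficient over the genes in the model.
    Raises KeyError if a model gene is missing from the expression profile."""
    return sum(coefficients[gene] * expression[gene] for gene in coefficients)


# Placeholder model with two genes (not the coefficients from this study).
demo_coefficients = {"GENE_A": 0.5, "GENE_B": -0.2}
demo_expression = {"GENE_A": 4.0, "GENE_B": 5.0, "GENE_C": 7.0}  # GENE_C is ignored

demo_risk = risk_score(demo_expression, demo_coefficients)
print(round(demo_risk, 6))
```

Genes with positive coefficients raise the score as their expression increases, whereas genes with negative coefficients lower it.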

Survival analysis and patient stratification

Cohorts (TCGA, internal training, and testing datasets) were stratified into high- and low-risk groups based on the median risk score. Kaplan-Meier survival curves were generated using the R package survminer (v0.4.9) (https://github.com/kassambara/survminer), and differences in overall survival (OS) and progression-free survival (PFS) between risk groups were assessed with log-rank tests (p < 0.05 considered significant). ROC analysis was performed with the R package timeROC (v0.4) (https://cran.r-project.org/web/packages/timeROC/index.html) to evaluate the model’s sensitivity and specificity for predicting OS, and its AUC was compared against those of other clinical features. Associations between the risk score and clinical factors (age, gender, stage, T, M, N, grade) were then examined. To enhance prognostic precision and predictive capacity, a nomogram integrating the risk score with clinical parameters was developed, providing quantitative survival probability estimates.
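The median split used for stratification can be sketched as follows. The patient identifiers and scores are made up, and assigning scores equal to the median to the low-risk group is an assumption, since the text does not state how ties are handled.

```python
from statistics import median


def stratify_by_median(scores):
    """Split patients into 'high' and 'low' risk groups at the cohort median.
    Scores equal to the median are placed in the low-risk group (assumption)."""
    cutoff = median(scores.values())
    return {
        patient: "high" if score > cutoff else "low"
        for patient, score in scores.items()
    }


# Made-up risk scores for four patients.
demo_scores = {"P1": 0.2, "P2": 1.5, "P3": 0.9, "P4": 2.1}
demo_groups = stratify_by_median(demo_scores)
print(demo_groups)
```

With an even number of patients, `statistics.median` returns the mean of the two middle scores (1.2 here), so the cohort is split into two equally sized groups.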

Quantification and statistical analysis

The associations between Hi-C interactions, transcriptome dynamics, and SVs were examined by Fisher’s exact test and the rank-sum test. Gene expression data were presented as median with interquartile range (IQR), with statistical significance evaluated by the Wilcoxon rank-sum test. Constituent ratios were analyzed using the χ² test or Fisher’s exact test. Survival curves and pan-cancer expression patterns were acquired from GEPIA2 using standard analytical parameters87. All statistical analyses of experiments were performed using Prism 9 (GraphPad Software), with appropriate statistical tests applied based on data distribution and experimental design.