Complex de novo structural variants are an underestimated cause of rare disorders

Jung, Hyunchul; Yang, Tsun-Po; Walker, Susan; Danecek, Petr; Garcia-Salinas, O. Isaac; Neville, Matthew D. C.; Christopher, Joseph; Cortés-Ciriano, Isidro; Firth, Helen; Scally, Aylwyn; Hurles, Matthew; Campbell, Peter; Rahbari, Raheleh

doi:10.1038/s41467-025-64722-2

Article
Open access
Published: 03 November 2025

Nature Communications volume 16, Article number: 9528 (2025)

An Author Correction to this article was published on 17 February 2026

This article has been updated

Abstract

Complex de novo structural variants (dnSVs) are crucial genetic factors in rare disorders, yet their prevalence and characteristics in rare disorders remain poorly understood. Here, we conduct a comprehensive analysis of whole-genome sequencing data of 12,568 families, including 13,698 offspring with rare diseases, obtained as part of the UK 100,000 Genomes Project. We identify 1,870 dnSVs, constituting the largest dnSV dataset reported to date. Complex dnSVs (n = 158; 8.4%) emerge as the third most common type of SV, following simple deletions and duplications. We classify 65% of these complex dnSVs into 11 subtypes. Among probands with dnSVs (n = 1,696), 9% exhibit exon-disrupting pathogenic dnSVs associated with the probands’ phenotype. Notably, 12% of exon-disrupting pathogenic dnSVs and 22% of de novo deletions or duplications previously identified by array-based or whole-exome sequencing methods are found to be complex dnSVs. We also find distinct genomic properties of de novo deletions depending on the parent of origin. This study highlights the importance of complex dnSVs in the cause of rare disorders and demonstrates the necessity of specific genomic analysis to avoid overlooking these variants.

Introduction

Structural variants (SVs), defined as genetic changes \(\ge\)50 bp that encompass copy number variants (CNVs)¹, rearrangements, and mobile element insertions, play an important role in cancer when occurring in somatic cells². They also arise in the germline, with de novo structural variants (dnSVs) contributing to rare disorders^{3,4,5,6,7,8,9,10}. For instance, chromosomal microarray (CMA), which is capable of detecting submicroscopic CNVs, demonstrated an average diagnostic yield of 12.2% in patients with developmental and intellectual disorders¹¹. Beyond CNVs, other types of SVs, such as complex SVs involving clustered breakpoints originating from a single event^3,12, have provided insights into the genetic aetiology of rare disorders^13,14,15, surpassing the explanatory power of CNVs alone. Nevertheless, in contrast to de novo single-nucleotide variants (SNVs), there is limited information on the prevalence and characteristics of dnSVs, particularly complex rearrangements, in rare disorders, primarily due to the substantial technical challenges associated with their detection¹⁶.

One prominent difficulty arises from the inherent limitations of short-read technologies in accurately capturing and characterising large-scale genomic rearrangements. The restricted read lengths often result in fragmented or incomplete representations of complex structural variations, leading to difficulties in assembling the complete picture of genomic architecture. This issue is particularly pronounced in regions with high sequence similarity, where distinguishing between homologous sequences presents significant computational and analytical challenges.

Long-read sequencing mitigates the challenges associated with short-read platforms by offering a more direct span across SVs, thereby enabling better resolution and a more complete representation of complex genomic variations. While long-read sequencing offers unique advantages in studying SVs, the lack of substantial long-read sequence datasets from rare disorder cohorts highlights the ongoing importance of precise short-read based SV analytical pipelines^17,18,19. These pipelines are essential for detecting a broad spectrum of SVs and reducing false positives. This is particularly pertinent in the absence of large cohort population datasets, which hampers accurate filtering and necessitates robust short-read analytical approaches²⁰. Consequently, leveraging large-scale short-read sequence datasets with rigorous analytical approaches remains key for a nuanced understanding of SVs in diseases, particularly rare disorders.

To shed light on their significance in rare diseases, we analysed dnSVs identified in 13,702 whole-genome-sequenced parent–child trios from 12,568 families from the rare disease programme of the 100,000 Genomes Project (Supplementary Data 1)²¹ using rigorous analytical approaches. The rare disease cohort encompasses individuals with a broad spectrum of conditions, with neurology and neurodevelopmental (NN) disorders making up half of the cohort. Other represented disorders include ultra-rare conditions, ophthalmological, renal and urinary tract, cardiovascular, endocrine, and additional disease groups (Supplementary Fig. 1a).

Results

Rate of de novo SVs and parental age and sex bias

We developed a rigorous pipeline to analyse an average of 13,980 candidate variants (standard deviation = 2550) per proband, already called by Genomics England using the Manta caller²². We identified a total of 1870 high-confidence dnSVs (Fig. 1 and Supplementary Data 2), all of which were visually inspected (“Methods”). Some of these dnSVs were validated using previously identified dnSVs detected in independent sequencing data from overlapping family cohorts. The validation rate was 100% (n = 44): 37 candidate dnSVs were confirmed by array/whole-exome sequencing from the Deciphering Developmental Disorders (DDD)²³ study and 7 candidate dnSVs were confirmed using long-read sequencing data from Genomics England (GEL)²¹ (Supplementary Fig. 2). In addition, we validated 11 pathogenic dnSVs (Fig. 2 and Supplementary Fig. 3) using RNA-seq data by confirming abnormal RNA reads supporting dnSVs (Supplementary Fig. 3a), underexpression (Fig. 2g and Supplementary Fig. 3b), or aberrant splicing patterns (Supplementary Fig. 3c). Furthermore, 19 of the pathogenic dnSVs involving inversions were validated by an independent group²⁴.

**Fig. 1: Summary of the identified dnSVs in the rare disease programme of the 100,000 Genomes Project.**

**Fig. 2: Representative complex SVs disrupting potential causal genes.**

Using 1870 high-confidence dnSVs from 1696 probands (91% of probands had a single SV; Supplementary Fig. 1b), we estimated an overall mutation rate of 0.13 events per genome per generation, in line with previous reports^25,26,27 (Supplementary Fig. 4a). The rate of dnSVs varies across the rare disorder categories: probands with NN disorders and those with cardiovascular disorders exhibit the highest dnSV rate (0.15 events per genome), whereas probands with ophthalmological and hearing and ear disorders show the lowest (0.1 events per genome; Supplementary Fig. 4b). It is worth noting that the rate of dnSVs is marginally higher in the probands (0.13 events per genome) than in unaffected siblings (n = 207; 0.09 events per genome; P = 0.05). Approximately 12% (n = 1696) of the probands harboured at least one dnSV. We identified 4 individuals with a considerably higher number of dnSVs (n \(\ge\) 4). These individuals, recruited under different rare disease categories, are not among the previously reported germline SNV hypermutators²⁸ and have no known history of parental exposure to chemotherapy. Unlike the known multiple dnCNVs phenomenon that shows a predominance of copy number gain^29,30, 88% of the identified dnSVs in these individuals were deletions (median = 1.5 kb), suggesting that further investigation is needed to characterise the multiple dnSVs in these individuals. We found a statistically significant positive correlation between the number of dnSVs and de novo SNVs/indels (P = 3.87E-07; Supplementary Fig. 5a), which is partly explained by the parental age effect^27,31. However, the mechanistic basis of this correlation remains unclear.

Interestingly, we observed a greater enrichment of dnSVs in probands without diagnostic SNVs/indels compared to those with diagnostic SNVs/indels (P < 5.00E-02; Fig. 1a), suggesting that a significant proportion of unsolved cases is likely to be explained by dnSVs. We also found a parental-age effect on the occurrence of dnSVs (Fig. 1b; P_paternal < 5.00E-02 and P_maternal < 1.00E-02). Overall, we identified a significant increase in parental age at birth in probands with dnSVs compared to those without (P < 5.00E-02). Among the rare disorder classes, a significant difference in parental age distribution is observed in dysmorphic and congenital abnormality syndromes and skeletal disorders (P < 5.00E-02), while only the association with skeletal disorders remained significant after multiple testing correction (Benjamini-Hochberg corrected P < 0.05; Supplementary Fig. 5b). Additionally, we observed that 67.8% of the phased dnSVs originated from paternal germ cells (Supplementary Fig. 5c), a proportion consistent with previous studies on structural variation (66–74.4%)^26,27. This finding aligns with the well-documented paternal bias in de novo SNVs and indels, reinforcing the broader trend of increased germline mutagenesis in the male lineage³².

Distribution of different classes of dnSVs

Among the different classes of dnSVs (Fig. 1c), simple deletion (n = 1377; 73.6%) was the most common, followed by tandem duplication (n = 245; 13.1%). The median detected deletion and tandem duplication sizes are 3.7 kb (range 52 bp − 61 Mb) and 49 kb (range 135 bp − 154 Mb), respectively (Supplementary Fig. 5d). Furthermore, we identified other classes such as complex SVs (n = 158; 8.4%), reciprocal inversion (n = 49; 2.6%; Fig. 2a), reciprocal translocation (n = 30; 1.6%; Fig. 2b), and templated insertion (n = 6; 0.7%; Fig. 2c). The representative probands with simple SVs disrupting phenotype-relevant genes are shown in Fig. 2a–c. For example, we identified a templated insertion that disrupted MECP2, which has a well-established function in neurodevelopment³³ in probands with NN disorders. This gene is known to be recurrently affected by dnSVs¹⁰ (Fig. 2c). This was independently validated by long-read sequencing (Supplementary Fig. 2b).

We inferred the timing of maternally derived duplication formation into meiosis I and II based on the fact that heterologous allele duplications are known to occur only before the separation of homologous chromosomes during meiosis I. In contrast, homologous allele duplications are known to occur before the separation of sister chromatids during meiosis II³⁴. We identified 41 cases of dnSVs with duplications, which comprise 30 tandem duplications and 11 complex SVs (median = 380 kb; range 38 kb – 40 Mb), for timing analysis. We classified the timing of duplication of maternal origin into meiosis I and II (Methods and Supplementary Fig. 6). This classification revealed that 85% of duplications in this cohort originated from maternal meiosis II (P = 4.87E-06; Fig. 1d). Further investigations using larger cohorts are required to confirm which step of meiosis contributes more significantly to dnSVs^34,35.
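As a rough check on the reported P value, the sketch below assumes a two-sided exact binomial test against an equal split between the two meiotic stages, with 35 of the 41 duplications (about 85%) assigned to one stage. Neither the test nor the count of 35 is stated in this section, so both are assumptions, and the function name is illustrative.

```python
from math import comb


def two_sided_binomial_p_half(successes: int, trials: int) -> float:
    """Exact two-sided binomial P value under a null probability of 0.5.

    Sums the probabilities of every outcome that is no more likely than
    the observed one. Outcome weights are compared as integers (binomial
    coefficients) so no floating-point tolerance is needed.
    """
    observed_weight = comb(trials, successes)
    tail_weight = sum(
        comb(trials, k)
        for k in range(trials + 1)
        if comb(trials, k) <= observed_weight
    )
    return tail_weight / 2 ** trials


# Assumed counts: 35 of 41 duplications (about 85%) in one class.
p_value = two_sided_binomial_p_half(35, 41)
print(f"{p_value:.2E}")
```

Under these assumptions the result rounds to 4.87E-06, which agrees with the P value quoted above.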

The role of complex SVs in rare disorders

Notably, the third most common type of dnSVs is complex SVs. We further classified complex SVs into nine major classes (Fig. 1e). The most common class, termed ‘Loss-Loss’ (n = 18; 11.4%), comprised two adjacent deletions (Fig. 2d). For instance, the two adjacent deletions (2 kb and 3 kb in length) in a proband with an NN disorder affected two exons in CNOT2 (Fig. 2d) for which haploinsufficiency is known to cause a neurodevelopmental disorder with characteristic facial features⁹. In addition, adjacent deletions (5 kb and 1.7 kb in length) in a proband disrupted exon 2 of AMER1 (Supplementary Fig. 2c), for which deficiency is associated with osteopathia striata with cranial sclerosis³⁶. Other classes comprising inversion and deletion are ‘Inv-Loss’ (i.e., inversion with flanking deletion; n = 14; 8.9%; Fig. 2e) and ‘Loss-Inv-Loss’ (i.e., paired deletion inversion; n = 12; 7.6%; Fig. 2f). The representative cases with these types of complex SVs disrupting renal and urinary tract disorder-³⁷ and NN disorder-related genes³⁸, such as KMT2A, AFF2, FMR1, and SRRM2, are shown in Fig. 2e–g. We observed significantly reduced mRNA expression of SRRM2 (P = 4.00E-04), disrupted by ‘Loss-Inv-Loss’ in a proband with an NN disorder (Fig. 2g).

Another commonly observed class, termed ‘Loss-invDup’ (n = 14; 8.9%), is characterised by copy-number loss plus a nearby duplication linked by inverted rearrangements. For instance, a ‘Loss-invDup’ in a proband with an NN disorder affected an exon in AUTS2 (Fig. 3a), which has been implicated in neurodevelopment and as a candidate pathogenic gene for numerous neurological disorders³⁹. Another class, ‘Deletion bridge’ (i.e., bridge of templated insertion; n = 7; Fig. 3b and Supplementary Fig. 7a, b), led to large deletions (15 Mb in chromosome X in Fig. 3b and 720 kb in chromosome 1 in Supplementary Fig. 7b) containing genes involved in neurodevelopment^40,41, such as GALNT2, MECP2, and CTNNB1, in probands with NN disorders (Supplementary Fig. 7a, b). ‘Translocation-Loss’ (n = 3) led to deletions on either one chromosome (Fig. 3c) or both chromosomes (Supplementary Fig. 7c), resulting in the disruption of phenotype-relevant genes such as ARID1B⁴² in a proband with an NN disorder.

**Fig. 3: Representative complex SVs disrupting potential causal genes.**

The remaining classes, which comprise duplication and inversion, are ‘DUP-NML-DUP’^3,43 (i.e., paired duplication inversion or Dup-invDup; n = 7; 4.4%; Supplementary Fig. 8) and ‘DUP-TRP/INV-DUP’¹⁵ (i.e., Dup-Trp-Dup; n = 14; 8.9%; Supplementary Figs. 7d and 8). They exhibit structures involving two duplications linked by inverted rearrangements and duplication–inverted-triplication–duplication, respectively. Interestingly, beyond local-2-jumps (i.e., clusters of two rearrangements) as described above, we also found three instances of ‘DUP-NML-DUP-NML-DUP’^44,45,46 (i.e., known as local-3-jumps in the cancer field²) involving three local rearrangements (Supplementary Fig. 9a). Furthermore, other complex duplications, such as dispersed (n = 8; 5.1%) and inverted duplications (n = 2; 1.3%), were also observed. Although most pathogenic effects of these complex SV types involving duplication arise from overexpression of triplosensitive genes^15,45 (i.e., gain-of-function), these variant types have also been reported to cause disease by loss-of-function mechanisms such as gene disruption⁴⁷, gene fusion at breakpoints⁴⁸, and segmental uniparental disomy⁴⁹.

The complex SVs that did not fit into the described classes were categorised as ‘Unclassified’ (n = 56; 35.4%). Two of the complex SVs under the ‘Unclassified’ category had long-read data that enabled us to resolve their genomic configuration (Fig. 3d, e). A proband with a skeletal disorder (Fig. 3d) had a deletion-inversion-deletion-inversion-deletion structure, which affected several exons in EFTUD2, for which deficiency is likely to lead to craniofacial anomalies⁵⁰. Another case has a structure of duplication followed by ‘Loss-Inv-Loss’, which disrupted the phenotype-relevant gene EDA⁵¹ (Fig. 3e). In addition, in a proband with an NN disorder, a small deletion (3 kb in chr22q13.33) nested within a large deletion (20 kb in chr22q13.33) and sharing one of its breakpoints disrupted SHANK3, which is associated with a broad spectrum of neurodevelopmental disorders⁵² (Supplementary Fig. 9b). Collectively, these results highlight the complex nature of dnSVs in rare disorders.

Clinical impact of dnSVs

Overall, our analysis reveals that among probands with dnSVs, 9% (145/1696) exhibit exon-disrupting pathogenic dnSVs associated with the probands’ phenotype. Notably, 66 of these 145 (46%) pathogenic SVs were balanced rearrangements (e.g., reciprocal inversion) or CNVs affecting <3 exons that cannot be reliably detected by array-based or whole-exome sequencing methods (Fig. 4a and Supplementary Data 2), highlighting the importance of WGS-based genetic testing in routine clinical care.

**Fig. 4: Clinical relevance of dnSVs.**

In our study, we observed that 1.4% of probands with NN disorders harboured complex dnSVs, a prevalence about two times higher than in a previous autism spectrum disorder study²⁷ (0.76%; P = 2.00E-02; two-sided Fisher exact test). Notably, approximately 12% of pathogenic dnSVs in our dataset were identified as complex events (Fig. 4b), highlighting their significant role in rare disorders despite their lower frequency. Moreover, among probands with array-based or whole-exome sequencing data available, 22% of de novo CNVs identified by these data (8/37) were complex SVs, which were previously misclassified as simple dnSVs (Fig. 4c).

Furthermore, our investigation reveals distinctive patterns within intronic and intergenic dnSVs among probands with NN disorders. Intronic dnSVs showed a significant enrichment in known pathogenic genes associated with NN disorders in the G2P database⁵³ (P = 1.00E-03). In contrast, intergenic dnSVs, when assessed for genes within a 50 kb range up- and downstream of the dnSVs (“Methods”), did not show such an association, suggesting the potential pathogenic role of intronic dnSVs in rare disorders (Fig. 4d). Additional studies using RNA-seq and/or CRISPR/Cas9 genome editing are needed to elucidate the functional impact of these intronic dnSVs on mRNA splicing and expression.

Genomic properties of dnSVs

In exploring genomic properties of dnSVs, we observed a prevalent distribution of de novo deletions (dnDELs) and tandem duplications (dnTDs) in gene-dense areas (Supplementary Fig. 10), in line with previous findings in somatic cells². However, smaller dnDELs (< 10 kb) are enriched in early-replicating regions (P = 1.00E-03; Supplementary Fig. 10a), which is inconsistent with previous reports².

We observed that the majority of dnSVs, primarily simple dnDELs, exhibit enrichment at the subtelomeric regions across autosomes (Supplementary Fig. 11). We also identified a positive association between the number of subtelomeric dnDELs and early replication regions, especially when they were within 15 Mb of telomere ends (Spearman’s rho = 0.56, P = 6.46E-03; Supplementary Fig. 12). In total, the density of dnDELs within 15 Mb of telomere ends (i.e., 1.3/Mb) is 2.8 times greater than the autosome-wide average (i.e., 0.457/Mb).

We observed a distinctive sex difference in patterns of dnSVs. Specifically, maternal dnSVs were enriched for larger deletions, while paternal dnSVs were enriched for smaller deletions (P = 4.99E-05; Fig. 5a). We further confirmed a similar enrichment pattern using an independent dataset²⁷ (P = 1.63E-03; Fig. 5a). This sex-specific difference is potentially in line with a higher incidence of aneuploidy in oocytes than in sperm⁵⁴. The higher rate of aneuploidy is known to be associated with the distinct features of oocytes⁵⁵, such as the architecture of the meiotic spindle, the level of cortical tension at the oocyte surface, weaknesses in surveillance mechanisms that monitor chromosome segregation, and environmental factors. Additionally, we found that maternal dnDELs are enriched at the subtelomeric regions (< 15 Mb) of chromosome 16 (Fig. 5b). We observed similar maternal enrichment of dnDELs in subtelomeric regions of chromosome 16 in an independent cohort²⁷ (Fig. 5b), suggesting possible sex-specific mechanisms in the generation of dnSVs.

**Fig. 5: Size and genomic distribution of dnDELs according to parent of origin.**

Discussion

Our investigation provides a substantial advancement in the understanding of de novo structural variants in rare disorders, encompassing an extensive cohort of 13,702 parent–child trios. In particular, we provide insights into the role of complex SVs in the aetiology of rare disorders.

The prevalence of dnSVs, affecting 12% of probands, highlights the importance of integrating these variants into the broader spectrum of genetic factors contributing to rare disorders. Unlike conventional cytogenetic methods, such as array Comparative Genomic Hybridisation (CGH)-based technologies, WGS offers unparalleled precision in characterising the genomic configuration of complex dnSVs. This is particularly crucial as some simple deletions and insertions may be integral components of complex SVs that are often overlooked by array-based or whole-exome sequencing methods. For instance, in 37 cases where array and/or whole-exome sequencing data were available, we found that 8 complex dnSVs (22%) were misclassified as simple dnSVs. In addition, 66 of 145 (46%) pathogenic SVs identified in our study were balanced rearrangements (e.g., balanced inversion) or CNVs affecting \(\le 2\) exons that cannot be reliably detected by array-based or whole-exome sequencing methods, highlighting the importance of WGS-based genetic testing in routine clinical care.

Notably, dnSVs exhibited non-random distribution patterns, showing enrichment in specific genomic locations associated with distinct features depending on the parent of origin. Strikingly, we observed an enrichment of maternal dnDELs within 15 Mb of the telomeres of chromosome 16. This enrichment positively correlates with skewed early replication regions across chromosomes. While the genomic basis for this maternal bias in subtelomeric regions of chromosome 16 is unknown, previous reports have suggested potential explanations⁵⁶. These include early subtelomeric replication in meiosis⁵⁷, increased rates of meiotic double-strand breaks in the distal parts of chromosomes⁵⁸, or biased maternal non-crossover gene conversion⁵⁹. Overall, these findings indicate the need for further investigation into parental influence and region-specific impacts on disease manifestation.

We note several limitations in our approach, which open potential avenues for future investigations. Complex structural variants are still underestimated in our cohort because of the inherent limitations of short-read-based SV discovery. For example, SVs in repeat-rich regions (e.g., segmental duplications or retrotransposons) remain challenging to identify. Although we identified some of these SVs using read-depth-based algorithms (e.g., CANVAS), resolving the genomic configuration of complex SVs using read-depth-specific calls that do not provide read orientation, along with read-pattern-based calls, is challenging. Furthermore, short-read sequencing is known to fail to capture large SVs, especially large insertions. Because the current pipeline has low sensitivity for de novo retrotransposition, these variants were excluded from our analysis. Specifically, we observed a rate of 0.01 events per genome, which is far lower than expected (0.03–0.038 per genome^27,60). Future research could explore techniques such as long-read sequencing⁶¹ to enhance our ability to detect dnSVs in repetitive regions⁶². Finally, the systematic identification of gene duplication leading to triplosensitivity⁶³ and inherited pathogenic SVs will be needed to facilitate more comprehensive diagnostics.

Overall, our findings expand the understanding of dnSVs in rare disorders and highlight the need for ongoing research to unravel the complexities of their contribution to the aetiology of rare disorders and the potential for clinical application.

Methods

SV calling

We used available data from the rare disease cohort of the 100,000 Genomes Project generated by Genomics England²¹. The 100,000 Genomes Project was approved by the East of England - Cambridge Central Research Ethics Committee (REF 20/EE/0035). Genomic blood DNA libraries were prepared using the Illumina TruSeq PCR-free protocol, and whole-genome sequenced on the Illumina HiSeq X platform (2 × 150-bp paired-end reads). Read alignment and SV calling using Isaac⁶⁴ and Manta²², respectively, were performed by the Genomics England Bioinformatics team²¹. The details of sequencing and variant calling have been previously described²¹. Manta VCF files were converted to BEDPE format using SVtools (v0.5.1)⁶⁵ and then BEDTools (v2.31.0)⁶⁶ was utilised to extract proband-specific SVs ≥ 50 bp in length for each family (Supplementary Fig. 13). We first removed SVs on the Y chromosome and further removed SVs with evidence of clipped reads (i.e., split reads) at breakpoints in either parent. Specifically, SVs supported by ≥ 4 clipped reads at either breakpoint or ≥ 2 clipped reads at both breakpoints in either parent were excluded. SVs found in 3 or more samples were removed because such SVs were likely alignment artefacts. We selected SVs flagged as “PASS” or “MGE10kb” (i.e., Manta calls with length < 10 kb) and further narrowed these down to SVs with a Manta score > 30 and > 10 supporting discordant reads. We rescued SVs with imprecise breakpoints if they were supported by CANVAS⁶⁷. In detail, SVs tagged as “IMPRECISE” (i.e., imprecise breakpoints) were rescued if they were also flagged as “ColocalizedCanvas”. We excluded SVs with VAF < 0.1 (n = 10) to remove mosaic SVs. Translocation through retrotransposon-mediated 3′ transduction was excluded to focus on dnSVs. All SVs were manually validated to identify high-confidence de novo SVs using the IGV browser⁶⁸. Specifically, we visually validated the presence of abnormal reads (i.e., discordant/split reads) in probands and the absence of such reads in their parents. We additionally called CNVs (i.e., deletions and duplications) > 10 kb using CANVAS to identify more diagnostic CNVs. Read-depth-specific calls were only used to assess their pathogenicity because clustering and classification (i.e., resolving genomic configuration of complex SVs) of read-depth calls that did not have read orientation were challenging. Long-read sequencing data (23 resequenced samples with PacBio technology), RNA-seq (whole blood RNA sequencing from 5,546 samples) and diagnostic SNVs/indels were obtained from Genomics England²¹. Previous dnCNV calls from array-based or whole-exome sequencing were obtained from the DDD study cohort²³. Insertion events called by Manta²² were further classified into retrotranspositions using RepeatMasker. This process resulted in a lower sensitivity because Manta with Isaac-based alignment is not optimised to call retrotranspositions. Retrotransposition-specific identification tools, such as MELT⁶⁹ or xTea⁷⁰, are needed to increase sensitivity for retrotransposition detection. To estimate the proportion of probands with complex SVs in the previous study²⁷, we obtained 19 probands with complex SVs where “sv_type” was “CPX” and “role” was “proband”.
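The automated filters described above can be summarised in a short sketch. This is not the pipeline's code: the `CandidateSV` record and its field names are hypothetical, the thresholds are taken from the text, and treating "IMPRECISE" calls as dropped unless they carry the "ColocalizedCanvas" flag is one reading of the rescue rule. Manual IGV review and the exclusion of 3′ transduction events are not modelled.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class CandidateSV:
    """Hypothetical record for one proband-specific Manta call."""
    chrom: str
    length_bp: int
    manta_filter: str                 # e.g. "PASS" or "MGE10kb"
    manta_score: float
    discordant_reads: int
    vaf: float
    cohort_sample_count: int          # number of samples carrying the call
    # One (breakpoint 1, breakpoint 2) clipped-read count pair per parent.
    parent_clipped_reads: List[Tuple[int, int]]
    imprecise: bool = False
    colocalized_canvas: bool = False


def parent_supports_sv(bp1_reads: int, bp2_reads: int) -> bool:
    """>= 4 clipped reads at either breakpoint, or >= 2 at both."""
    return bp1_reads >= 4 or bp2_reads >= 4 or (bp1_reads >= 2 and bp2_reads >= 2)


def passes_filters(sv: CandidateSV) -> bool:
    """Apply the stated thresholds; True means the call goes to manual review."""
    if sv.length_bp < 50:
        return False
    if sv.chrom in ("chrY", "Y"):
        return False
    if any(parent_supports_sv(a, b) for a, b in sv.parent_clipped_reads):
        return False
    if sv.cohort_sample_count >= 3:   # likely alignment artefact
        return False
    if sv.manta_filter not in ("PASS", "MGE10kb"):
        return False
    if sv.manta_score <= 30 or sv.discordant_reads <= 10:
        return False
    if sv.imprecise and not sv.colocalized_canvas:  # assumed rescue rule
        return False
    if sv.vaf < 0.1:                  # likely mosaic
        return False
    return True
```

Ordering the checks from cheapest to most specific keeps the function easy to audit against the prose; each early return corresponds to one sentence of the filtering description.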

SV classification

We used ClusterSV² (https://github.com/cancerit/ClusterSV) to group rearrangements (i.e., breakpoints) into rearrangement clusters. The key advantage of ClusterSV is to identify clusters of dispersed breakpoints⁴⁶ without requiring a predefined distance threshold, allowing for a data-driven detection of complex genomic rearrangements. We defined complex SVs as those with ≥ 2 clustered breakpoints except for simple SVs involving reciprocal inversion, balanced translocation, templated insertion, and dispersed duplication. In general, we classified the types of complex SVs according to the previous study that comprehensively characterised somatic complex SVs using thousands of cancer genomes². In short, complex SVs involving two inversions were categorised into Loss-invDup, DUP-TRP/INV-DUP, Inv-Loss (i.e., inversion with flanking deletion), Loss-Inv-Loss (i.e., paired deletion inversion), and DUP-NML-DUP (i.e., paired duplication inversion) according to read patterns and copy numbers (Supplementary Fig. 8). Complex SVs involving two deletions were classified as Loss-Loss. Deletion bridge (i.e., bridge of templated insertion) and Translocation-Loss (i.e., translocation with deletion) were classified using the previously described criteria^2,71. DUP-NML-DUP-NML-DUP (i.e., local-3-jumps) involving three local rearrangements were discovered according to the read patterns described in the previous cancer study². Breakpoints filtered out near unresolved SV classes were rescued if they could resolve the configuration of unresolved SV classes according to the types of SV defined. For the remaining unresolved SVs, CANVAS calls were used to resolve their genomic configuration manually. Complex SVs that did not fit into the described classes were categorised as ‘Unclassified’. All complex SVs were manually validated using the IGV browser (v2.18.2)⁶⁸, Samplot (v1.3.0)⁷², or BamSnap (v0.2.19)⁷³.

SV phasing to identify parent of origin and estimation of the timing of duplication from maternal origin

We used unfazed (v1.0.2)⁷⁴, which employs both extended read-backed and SNV allele-balance phasing, to identify the parent of origin for dnSVs. HaplotypeCaller (v3.3)⁷⁵ was utilised to make an input for unfazed. Phasable dnSVs (51%; 962/1870) were used for downstream analysis concerning the parent of origin. To classify the timing of maternally derived duplication into meiosis I and II, we first identified duplications (including those in complex SVs) of maternal origin (step 1) and further classified them into meiosis I and II (step 2) using a set of informative genotypes (Supplementary Fig. 6). For binary classification at each step, the ratio of the number of SNPs supporting one class to another class was calculated, and a class for which the ratio was greater than 0.9 was chosen. At least three SNPs were required for either class at each step. These filtering criteria could time large duplications with a handful of erroneously called SNPs and remove ambiguous duplications such as those originating from both parents during early development (e.g., potentially due to mitotic crossing-over). The timing of paternally derived duplications was not inferred because duplications can also occur in a premeiotic state during male gametogenesis throughout life.
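The two-step binary classification can be illustrated with a minimal sketch. The text does not define the ratio precisely, so the code assumes it is the fraction of informative SNPs supporting a class, and it assumes the three-SNP minimum applies to the class that is chosen. The function names and the SNP counts passed in are hypothetical; deriving those counts from the informative genotypes in Supplementary Fig. 6 is not shown.

```python
from typing import Optional


def choose_class(support_a: int, support_b: int, label_a: str, label_b: str,
                 min_fraction: float = 0.9, min_snps: int = 3) -> Optional[str]:
    """Return the label backed by > min_fraction of SNPs, or None if ambiguous."""
    total = support_a + support_b
    if total == 0:
        return None
    for label, support in ((label_a, support_a), (label_b, support_b)):
        # Assumed reading: the winning class needs enough SNPs and a clear majority.
        if support >= min_snps and support / total > min_fraction:
            return label
    return None


def time_maternal_duplication(maternal_snps: int, paternal_snps: int,
                              meiosis1_snps: int, meiosis2_snps: int) -> Optional[str]:
    """Step 1: confirm maternal origin. Step 2: assign meiosis I or II."""
    origin = choose_class(maternal_snps, paternal_snps, "maternal", "paternal")
    if origin != "maternal":
        return None  # paternal or ambiguous duplications are not timed
    return choose_class(meiosis1_snps, meiosis2_snps, "meiosis I", "meiosis II")
```

Returning `None` for ambiguous cases mirrors the stated aim of removing duplications with mixed support, while a few discordant SNPs in a large duplication still leave the majority class above the threshold.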

Evaluation of clinical relevance of dnSVs

The identified SVs disrupting exons were reviewed for potential clinical relevance by NHS clinical scientists and/or Genomics England. We considered SVs as being potential (likely) pathogenic SVs if at least one of the following criteria were fulfilled: (i) the variant had been clinically assessed as likely pathogenic or pathogenic by an NHS genomic laboratory hub. In detail, the variant had been assessed by an NHS clinical laboratory according to the best practice guidelines recommended by the Association for Clinical Genomic Science (ACGS) as being likely pathogenic or pathogenic and related to the primary phenotype for which the participant was recruited to the 100,000 Genomes Project. (ii) the variant had been reviewed on a research basis and considered to be a strong candidate diagnostic variant. In the research review, variants resulting in loss of function for genes in which haploinsufficiency is a known mechanism of disease or variants resulting in loss of a critical domain impacting genes associated with a phenotype relevant to the primary clinical indication were considered as candidate diagnostic variants.

We classified pathogenic SVs annotated with “reciprocal inversion” or “reciprocal translocation” as not likely to be detected by array-based or whole-exome sequencing methods. In addition, SVs involving deletions spanning < 3 exons of the GENCODE canonical transcript⁷⁶ were classified into this category.
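The detectability rule above amounts to a simple predicate, sketched here in Python for illustration. The `PathogenicSV` record and its field names are hypothetical and do not reflect the authors' data model; only the two annotation strings and the three-exon cut-off come from the text.

```python
from dataclasses import dataclass
from typing import Optional

# Annotations named in the text as unlikely to be seen by array or WES.
BALANCED_ANNOTATIONS = {"reciprocal inversion", "reciprocal translocation"}
MIN_EXONS_FOR_DETECTION = 3  # deletions spanning fewer exons are flagged


@dataclass
class PathogenicSV:
    annotation: str
    # Hypothetical field: exons of the canonical transcript spanned by a
    # deletion component; None when the SV contains no deletion.
    deleted_exons: Optional[int] = None


def likely_missed_by_array_or_wes(sv: PathogenicSV) -> bool:
    """Apply the two criteria described in the text."""
    if sv.annotation in BALANCED_ANNOTATIONS:
        return True
    if sv.deleted_exons is not None:
        return sv.deleted_exons < MIN_EXONS_FOR_DETECTION
    return False


print(likely_missed_by_array_or_wes(PathogenicSV("reciprocal inversion")))
print(likely_missed_by_array_or_wes(PathogenicSV("deletion", deleted_exons=2)))
print(likely_missed_by_array_or_wes(PathogenicSV("deletion", deleted_exons=12)))
```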

Enrichment testing of non-coding dnSVs in known pathogenic genes in NN disorders

We first extracted the non-coding dnSVs in NN disorders (i.e., intronic and intergenic dnSVs whose genomic coordinates did not include any exons) based on the GENCODE basic V45 GTF file, and then obtained the known pathogenic genes associated with NN disorders from the Gene2Phenotype developmental disorders panel⁵³. Specifically, we kept all genes with organ specificity equal to “Brain/Cognition”, allelic requirement equal to “monoallelic_autosomal”, and a confidence category equal to “strong” or “definitive” (n = 190 genes). We then computed the observed-over-expected ratio for the overlap between the non-coding dnSVs and known pathogenic genes in NN disorders using the Genome Association Tester (GAT) software⁷⁷. Intronic and intergenic regions were obtained based on the canonical transcript of protein-coding genes in the GENCODE basic V45 GTF file using the BioMart and GencoDymo R packages and bedtools⁶⁶. These two regions were used as the workspace in GAT to test the over-representation of dnSVs in intronic and/or intergenic regions of known pathogenic genes. To perform the enrichment test, we added a window of 5–500 kb (5, 10, 25, 50, and 500 kb) up- and downstream of each intergenic dnSV. The number of random samples (“--num-samples”) for each GAT run was set to 1000.
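To make the window padding and the observed-over-expected ratio concrete, the following Python sketch uses toy intervals on a single linear workspace. It does not call GAT and does not reproduce its sampling algorithm; the helper names, the uniform random placement, and the seed are assumptions made for illustration only.

```python
import random

WINDOWS_KB = (5, 10, 25, 50, 500)  # flanking windows listed in the text


def pad(interval, window_kb):
    """Extend a (start, end) interval by window_kb on both sides (clamped at 0)."""
    start, end = interval
    w = window_kb * 1000
    return (max(0, start - w), end + w)


def overlaps(a, b):
    """True if half-open intervals a and b share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def count_hits(svs, genes):
    """Number of SV intervals overlapping at least one gene interval."""
    return sum(any(overlaps(sv, g) for g in genes) for sv in svs)


def observed_over_expected(svs, genes, workspace_len, num_samples=1000, seed=0):
    """Compare observed overlaps with the mean over random placements."""
    rng = random.Random(seed)
    observed = count_hits(svs, genes)
    total = 0
    for _ in range(num_samples):
        shuffled = []
        for start, end in svs:
            length = end - start
            # Assumption: each SV is placed uniformly within the workspace.
            new_start = rng.randint(0, workspace_len - length)
            shuffled.append((new_start, new_start + length))
        total += count_hits(shuffled, genes)
    expected = total / num_samples
    ratio = observed / expected if expected > 0 else float("inf")
    return observed, expected, ratio


# Toy data: one gene region and two intergenic dnSVs on a 10 Mb workspace.
genes = [(2_000_000, 2_100_000)]
svs = [(1_980_000, 1_990_000), (7_000_000, 7_020_000)]
for kb in WINDOWS_KB:
    padded = [pad(sv, kb) for sv in svs]
    print(kb, observed_over_expected(padded, genes, 10_000_000))
```

In this toy example the first dnSV lies 10 kb upstream of the gene, so it only counts as a hit once the flanking window exceeds that distance; this is the reason for testing a range of window sizes.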

Enrichments near telomeres and centromeres

We partitioned the genome into equal 5 Mb bins based on distance to the telomere ends. For comparison, we also partitioned the genome based on distance to the centromeres. For the validation cohort, we downloaded the all_dnsv.csv file from Belyeu et al.²⁷. In total, there were n = 309 CEPH and SFARI dnSVs across autosomes after removing chrX and chrY. Finally, only n = 192 dnDELs were used in the validation analysis.
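The distance-based binning can be sketched as follows. This is an illustrative reading rather than the authors' implementation: it assumes that distance is measured to the nearest telomere end, that the centromere is represented by a single coordinate, and it uses a toy chromosome length instead of real assembly coordinates.

```python
from collections import Counter

BIN_SIZE = 5_000_000  # 5 Mb bins, as stated in the text


def telomere_bin(pos: int, chrom_len: int) -> int:
    """Bin index by distance to the nearest chromosome end (assumption)."""
    return min(pos, chrom_len - pos) // BIN_SIZE


def centromere_bin(pos: int, centromere_pos: int) -> int:
    """Bin index by distance to a single centromere coordinate (assumption)."""
    return abs(pos - centromere_pos) // BIN_SIZE


def count_per_bin(positions, chrom_len):
    """Count events (e.g., dnDEL breakpoints) per telomere-distance bin."""
    return Counter(telomere_bin(p, chrom_len) for p in positions)


# Toy 100 Mb chromosome with hypothetical breakpoint positions.
chrom_len = 100_000_000
positions = [1_000_000, 3_500_000, 97_000_000, 12_000_000, 50_000_000]
print(sorted(count_per_bin(positions, chrom_len).items()))
```

Bin 0 therefore collects events within 5 Mb of either chromosome end, so enrichment near telomeres appears as an excess of counts in the lowest-index bins.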

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The data used in this study can be accessed via the Genomics England Research Environment, a secure cloud workspace. The raw data, including patient profiles and corresponding genomic sequencing data, are only available under restricted access for patient privacy reasons. Access can be obtained by first applying to become a member of either the Genomics England Research Network (https://www.genomicsengland.co.uk/research/academic) or the Discovery Forum (for industry partners; https://www.genomicsengland.co.uk/research/research-environment). The process for joining the network is described at https://www.genomicsengland.co.uk/research/academic/join-gecip.

Code availability

Custom Python and R scripts for data analysis can be found at https://github.com/hj6-sanger/GEL_SV (version 1.0.0, https://doi.org/10.5281/zenodo.17093113).

Change history

17 February 2026
A Correction to this paper has been published: https://doi.org/10.1038/s41467-026-69646-z

References

1. Redon, R. et al. Global variation in copy number in the human genome. Nature 444, 444–454 (2006).
2. Li, Y. et al. Patterns of somatic structural variation in human cancer genomes. Nature 578, 112–121 (2020).
3. Schuy, J., Grochowski, C. M., Carvalho, C. M. B. & Lindstrand, A. Complex genomic rearrangements: an underestimated cause of rare diseases. Trends Genet. 38, 1134–1146 (2022).
4. Lupski, J. R. Structural variation mutagenesis of the human genome: Impact on disease and evolution. Environ. Mol. Mutagen. 56, 419–436 (2015).
5. Lupski, J. R. et al. Gene dosage is a mechanism for Charcot-Marie-Tooth disease type 1A. Nat. Genet. 1, 29–33 (1992).
6. Potocki, L. et al. Molecular mechanism for duplication 17p11.2- the homologous recombination reciprocal of the Smith-Magenis microdeletion. Nat. Genet. 24, 84–87 (2000).
7. Potocki, L. et al. Characterization of Potocki-Lupski syndrome (dup(17)(p11.2p11.2)) and delineation of a dosage-sensitive critical interval that can convey an autism phenotype. Am. J. Hum. Genet. 80, 633–649 (2007).
8. Stefansson, H. et al. Large recurrent microdeletions associated with schizophrenia. Nature 455, 232–236 (2008).
9. Lupski, J. R. Schizophrenia: Incriminating genomic evidence. Nature 455, 178–179 (2008).
10. Gardner, E. J. et al. Detecting cryptic clinically relevant structural variation in exome-sequencing data increases diagnostic yield for developmental disorders. Am. J. Hum. Genet. 108, 2186–2194 (2021).
11. Miller, D. T. et al. Consensus statement: chromosomal microarray is a first-tier clinical diagnostic test for individuals with developmental disabilities or congenital anomalies. Am. J. Hum. Genet. 86, 749–764 (2010).
12. Liu, P., Carvalho, C. M. B., Hastings, P. J. & Lupski, J. R. Mechanisms for recurrent and complex human genomic rearrangements. Curr. Opin. Genet. Dev. 22, 211–220 (2012).
13. Sanchis-Juan, A. et al. Complex structural variants in Mendelian disorders: identification and breakpoint resolution using short- and long-read genome sequencing. Genome Med. 10, 95 (2018).
14. Choo, Z.-N. et al. Most large structural variants in cancer genomes can be detected without long reads. Nat. Genet. 55, 2139–2148 (2023).
15. Grochowski, C. M. et al. Inverted triplications formed by iterative template switches generate structural variant diversity at genomic disorder loci. Cell Genom. 4, 100590 (2024).
16. Scherer, S. W. et al. Challenges and standards in integrating surveys of structural variation. Nat. Genet. 39, S7–S15 (2007).
17. Conrad, D. F. et al. Mutation spectrum revealed by breakpoint sequencing of human germline CNVs. Nat. Genet. 42, 385–391 (2010).
18. Mills, R. E. et al. Mapping copy number variation by population-scale genome sequencing. Nature 470, 59–65 (2011).
19. Collins, R. L. et al. A structural variation reference for medical and population genetics. Nature 581, 444–451 (2020).
20. Kernohan, K. D. & Boycott, K. M. The expanding diagnostic toolbox for rare genetic diseases. Nat. Rev. Genet. 25, 401–415 (2024).
21. Turro, E. et al. Whole-genome sequencing of patients with rare diseases in a national health system. Nature 583, 96–102 (2020).
22. Chen, X. et al. Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications. Bioinformatics 32, 1220–1222 (2016).
23. Deciphering Developmental Disorders Study. Large-scale discovery of novel genetic causes of developmental disorders. Nature 519, 223–228 (2015).
24. Pagnamenta, A. T. et al. The impact of inversions across 33,924 families with rare disease from a national genome sequencing project. Am. J. Hum. Genet. 111, 1140–1164 (2024).
25. Werling, D. M. et al. An analytical framework for whole-genome sequence association studies and its implications for autism spectrum disorder. Nat. Genet. 50, 727–736 (2018).
26. Brandler, W. M. et al. Paternally inherited cis-regulatory structural variants are associated with autism. Science 360, 327–331 (2018).
27. Belyeu, J. R. et al. De novo structural mutation rates and gamete-of-origin biases revealed through genome sequencing of 2396 families. Am. J. Hum. Genet. 108, 597–607 (2021).
28. Kaplanis, J. et al. Genetic and chemotherapeutic influences on germline hypermutation. Nature 605, 503–508 (2022).
29. Liu, P. et al. An organismal CNV mutator phenotype restricted to early human development. Cell 168, e7 (2017).
30. Du, H. et al. The multiple de novo copy number variant (MdnCNV) phenomenon presents with peri-zygotic DNA mutational signatures and multilocus pathogenic variation. Genome Med. 14, 122 (2022).
31. Girard, S. L. et al. Paternal age explains a major portion of de novo germline mutation rate variability in healthy individuals. PLoS ONE 11, e0164212 (2016).
32. Kong, A. et al. Rate of de novo mutations and the importance of father’s age to disease risk. Nature 488, 471–475 (2012).
33. Gonzales, M. L. & LaSalle, J. M. The role of MeCP2 in brain development and neurodevelopmental disorders. Curr. Psychiatry Rep. 12, 127–134 (2010).
34. Ma, R. et al. A clear bias in parental origin of de novo pathogenic CNVs related to intellectual disability, developmental delay and multiple congenital anomalies. Sci. Rep. 7, 44446 (2017).
35. Ariad, D. et al. Aberrant landscapes of maternal meiotic crossovers contribute to aneuploidies in human embryos. Genome Res. 34, 70–84 (2024).
36. Gear, R. & Savarirayan, R. Osteopathia Striata with Cranial Sclerosis. in GeneReviews® (eds. Adam, M. P. et al.) (University of Washington, Seattle, 1993).
37. Koenigbauer, J. T. et al. Spectrum of congenital anomalies of the kidney and urinary tract (CAKUT) including renal parenchymal malformations during fetal life and the implementation of prenatal exome sequencing (WES). Arch. Gynecol. Obstet. 309, 2613–2622 (2023).
38. Eigenhuis, K. N., Somsen, H. B. & van den Berg, D. L. C. Transcription pause and escape in neurodevelopmental disorders. Front. Neurosci. 16, 846272 (2022).
39. Hori, K., Shimaoka, K. & Hoshino, M. AUTS2 gene: keys to understanding the pathogenesis of neurodevelopmental disorders. Cells 11, https://doi.org/10.3390/cells11010011 (2021).
40. Zilmer, M. et al. Novel congenital disorder of O-linked glycosylation caused by GALNT2 loss of function. Brain 143, 1114–1126 (2020).
41. Zhuang, W., Ye, T., Wang, W., Song, W. & Tan, T. CTNNB1 in neurodevelopmental disorders. Front. Psychiatry 14, 1143328 (2023).
42. Moffat, J. J., Smith, A. L., Jung, E.-M., Ka, M. & Kim, W.-Y. Neurobiology of ARID1B haploinsufficiency related to neurodevelopmental and psychiatric disorders. Mol. Psychiatry 27, 476–489 (2022).
43. Gu, S. et al. Alu-mediated diverse and complex pathogenic copy-number variants within human chromosome 17 at p13.3. Hum. Mol. Genet. 24, 4061–4077 (2015).
44. Beck, C. R. et al. Megabase length hypermutation accompanies human structural variation at 17p11.2. Cell 176, e10 (2019).
45. Bahrambeigi, V. et al. Distinct patterns of complex rearrangements and a mutational signature of microhomeology are frequently observed in PLP1 copy number gain structural variants. Genome Med. 11, 80 (2019).
46. Carvalho, C. M. B. et al. Replicative mechanisms for CNV formation are error prone. Nat. Genet. 45, 1319–1326 (2013).
47. Ishmukhametova, A. et al. Dissecting the structure and mechanism of a complex duplication-triplication rearrangement in the DMD gene. Hum. Mutat. 34, 1080–1084 (2013).
48. Zuccherato, L. W., Alleva, B., Whiters, M. A., Carvalho, C. M. B. & Lupski, J. R. Chimeric transcripts resulting from complex duplications in chromosome Xq28. Hum. Genet. 135, 253–256 (2016).
49. Carvalho, C. M. B. et al. Interchromosomal template-switching as a novel molecular mechanism for imprinting perturbations associated with Temple syndrome. Genome Med. 11, 25 (2019).
50. Wu, J. et al. EFTUD2 gene deficiency disrupts osteoblast maturation and inhibits chondrocyte differentiation via activation of the p53 signaling pathway. Hum. Genom. 13, 63 (2019).
51. Deshmukh, S. & Prashanth, S. Ectodermal dysplasia: a genetic review. Int. J. Clin. Pediatr. Dent. 5, 197–202 (2012).
52. Accogli, A. et al. SHANK3 mutation and mosaic turner syndrome in a female patient with intellectual disability and psychiatric features. J. Neuropsychiatry Clin. Neurosci. 31, 272–275 (2019).
53. Thormann, A. et al. Flexible and scalable diagnostic filtering of genomic variants using G2P with Ensembl VEP. Nat. Commun. 10, 2373 (2019).
54. Bell, A. D. et al. Insights into variation in meiosis from 31,228 human sperm genomes. Nature 583, 259–264 (2020).
55. Charalambous, C., Webster, A. & Schuh, M. Aneuploidy in mammalian oocytes and the impact of maternal ageing. Nat. Rev. Mol. Cell Biol. 24, 27–44 (2023).
56. Audano, P. A. et al. Characterizing the major structural variant alleles of the human genome. Cell 176, e19 (2019).
57. Pratto, F. et al. Meiotic recombination mirrors patterns of germline replication in mice and humans. Cell 184, e20 (2021).
58. Pratto, F. et al. DNA recombination. Recombination initiation maps of individual human genomes. Science 346, 1256442 (2014).
59. Jónsson, H. et al. Parental influence on human germline de novo mutations in 1,548 trios from Iceland. Nature 549, 519–522 (2017).
60. Borges-Monroy, R. et al. Whole-genome analysis reveals the contribution of non-coding de novo transposon insertions to autism spectrum disorder. Mob. DNA 12, 28 (2021).
61. Mitsuhashi, S. & Matsumoto, N. Long-read sequencing for rare human genetic diseases. J. Hum. Genet. 65, 11–19 (2020).
62. Kim, J. et al. Patient-Customized Oligonucleotide Therapy for a Rare Genetic Disease. N. Engl. J. Med. 381, 1644–1652 (2019).
63. Collins, R. L. et al. A cross-disorder dosage sensitivity map of the human genome. Cell 185, e25 (2022).
64. Raczy, C. et al. Isaac: ultra-fast whole-genome secondary analysis on Illumina sequencing platforms. Bioinformatics 29, 2041–2043 (2013).
65. Larson, D. E. et al. svtools: population-scale analysis of structural variation. Bioinformatics 35, 4782–4787 (2019).
66. Quinlan, A. R. & Hall, I. M. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics 26, 841–842 (2010).
67. Roller, E., Ivakhno, S., Lee, S., Royce, T. & Tanner, S. Canvas: versatile and scalable detection of copy number variants. Bioinformatics 32, 2375–2377 (2016).
68. Robinson, J. T. et al. Integrative genomics viewer. Nat. Biotechnol. 29, 24–26 (2011).
69. Gardner, E. J. et al. The Mobile Element Locator Tool (MELT): population-scale mobile element discovery and biology. Genome Res. 27, 1916–1929 (2017).
70. Chu, C. et al. Comprehensive identification of transposable element insertions using multiple sequencing technologies. Nat. Commun. 12, 3836 (2021).
71. Baca, S. C. et al. Punctuated evolution of prostate cancer genomes. Cell 153, 666–677 (2013).
72. Belyeu, J. R. et al. Samplot: a platform for structural variant visual validation and automated filtering. Genome Biol. 22, 161 (2021).
73. Kwon, M., Lee, S., Berselli, M., Chu, C. & Park, P. J. BamSnap: a lightweight viewer for sequencing reads in BAM files. Bioinformatics 37, 263–264 (2021).
74. Belyeu, J. R., Sasani, T. A., Pedersen, B. S. & Quinlan, A. R. Unfazed: parent-of-origin detection for large and small de novo variants. Bioinformatics 37, 4860–4861 (2021).
75. Van der Auwera, G. A. et al. From FastQ data to high confidence variant calls: the Genome Analysis Toolkit best practices pipeline. Curr. Protoc. Bioinform. 11, 10.1–11.10.33 (2013).
76. Mudge, J. M. et al. GENCODE 2025: reference gene annotation for human and mouse. Nucleic Acids Res. 53, D966–D975 (2025).
77. Heger, A., Webber, C., Goodson, M., Ponting, C. P. & Lunter, G. GAT: a simulation framework for testing the association of genomic intervals. Bioinformatics 29, 2046–2048 (2013).

Acknowledgements

We thank the families and their clinicians for their participation and engagement, and our colleagues who assisted in the generation and processing of data. We would like to thank Ana Lisa Taylor Tavares for helpful discussions and advice. This research was made possible through access to the data and findings generated by the 100,000 Genomes Project. The 100,000 Genomes Project is managed by Genomics England Limited (a wholly owned company of the Department of Health and Social Care). The 100,000 Genomes Project is funded by the National Institute for Health Research and NHS England. The Wellcome Trust, Cancer Research UK, and the Medical Research Council have also funded research infrastructure. The 100,000 Genomes Project uses data provided by patients and collected by the National Health Service as part of their care and support. This research was supported in part by a Wellcome Trust grant. R.R. was supported by RCUK | Medical Research Council (MRC) (MR/W025353/1) and Cancer Research UK (CRUK).

Author information

Authors and Affiliations

Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
Hyunchul Jung, Tsun-Po Yang, Petr Danecek, O. Isaac Garcia-Salinas, Matthew D. C. Neville, Joseph Christopher, Isidro Cortés-Ciriano, Helen Firth, Matthew Hurles, Peter Campbell & Raheleh Rahbari
Genomics England, London, UK
Susan Walker
Department of Clinical Genetics, Cambridge University Hospitals, Cambridge, UK
Joseph Christopher & Helen Firth
Department of Genomic Medicine, University of Cambridge, Cambridge, UK
Joseph Christopher
European Molecular Biology Laboratory, EBI, Hinxton, Cambridge, UK
Isidro Cortés-Ciriano
Department of Genetics, University of Cambridge, Downing Street, Cambridge, UK
Aylwyn Scally

Contributions

R.R. conceived the project. R.R., P.C., and M.H. supervised the project. H.J., T.Y., and R.R. wrote the manuscript; all authors reviewed and edited the manuscript. H.J. and T.Y. led the analysis of the data with help from J.C., P.D., I.G.S., M.D.C.N., and S.W. reviewed the pathogenic complex dnSV candidates. H.F. helped with clinical interpretation of the dnSVs. A.S., M.H., P.C., H.J., and R.R. helped with data interpretation and statistical analysis.

Corresponding authors

Correspondence to Hyunchul Jung or Raheleh Rahbari.

Ethics declarations

Competing interests

P.J.C. is a co-founder, shareholder, and consultant for Quotient Therapeutics. M.E.H. is a co-founder of, consultant to, and holds shares in Congenica, a genetics diagnostic company. All remaining authors declare no competing interests.

Peer review

Peer review information

Nature Communications thanks the anonymous reviewers for their contribution to the peer review of this work. A peer review file is available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information (PDF)

Description of Additional Supplementary Files (PDF)

Supplementary Data 1 (XLSX)

Supplementary Data 2 (XLSX)

Supplementary Data 3 (XLSX)

Supplementary Data 4 (XLSX)

Reporting Summary (PDF)

Transparent Peer Review file (PDF)

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Jung, H., Yang, TP., Walker, S. et al. Complex de novo structural variants are an underestimated cause of rare disorders. Nat Commun 16, 9528 (2025). https://doi.org/10.1038/s41467-025-64722-2

Received: 05 August 2024
Accepted: 25 September 2025
Published: 03 November 2025
Version of record: 03 November 2025
DOI: https://doi.org/10.1038/s41467-025-64722-2