Introduction

Structural variations (SVs)—defined as genomic alterations larger than 50 base pairs (bp) and including duplications, deletions, inversions, translocations, and insertions—constitute the majority of variable nucleotides in the human genome and contribute significantly to phenotypic diversity1. Although exome sequencing (ES) enables the detection of copy number (CN) variants (CNVs)2, it frequently fails to capture SV breakpoints, primarily because of the limited coverage of capture regions and technical limitations such as polymerase chain reaction (PCR) and hybridization biases in GC-rich regions, as well as poor alignment in low-complexity regions.

Even with genome sequencing, SV detection remains challenging because of the wide range of SV types and sizes and their frequent localization within repetitive genomic regions. These complexities hinder both sensitivity and specificity in SV calling, especially for structurally complex and repetitive regions. As a result, many pathogenic SVs likely remain undetected, and the reliable identification of such variants from short-read sequencing data continues to be a major technical and computational challenge3,4.

Several methodologies have been used for SV detection, including conventional karyotyping, fluorescence in situ hybridization, chromosomal microarray analysis, and long-read sequencing. More recently, optical genome mapping (OGM) has emerged as a promising technique for the comprehensive analysis of SVs. Unlike sequencing-based methods, OGM uses high-resolution fluorescence imaging of ultra-high molecular weight DNA to detect structural changes without nucleotide-level sequencing. Long-read sequencing produces reads typically ranging from 10 to 100 kilobases (kb), whereas OGM analyzes DNA molecules exceeding 150 kb, providing broader genome coverage (>80×), albeit at lower breakpoint resolution3,5,6.

OGM identifies SVs through the de novo assembly of fluorescently labeled DNA molecules using a specific six-base sequence motif (CTTAAG), which occurs on average every 5 kb across the genome7,8,9. This method enables the detection of a wide range of structural alterations, including insertions and deletions (>500 bp), duplications (>50 kb), inversions and translocations (>50–70 kb), chromosomal aneuploidy, absence of heterozygosity, triploidy, and copy-neutral events10,11. In this regard, OGM may outperform traditional sequencing-based platforms for identifying large and complex SVs.

Recent genomic studies have demonstrated that ~0.5–4% of individuals with rare diseases harbor dual or triple molecular diagnoses, in which a single diagnosis explains only part of the observed phenotype12. These findings emphasize the importance of comprehensive SV discovery beyond CNVs, because additional SVs may be responsible for the unexplained portion of a phenotype. It is therefore critical to identify full SV configurations, rather than mere CN changes, for accurate genotype–phenotype correlations and clinical diagnoses. In the present study, we aimed to use OGM to comprehensively characterize constitutional SVs that were initially identified as CNVs using ES. Furthermore, we used targeted Oxford Nanopore long-read sequencing with adaptive sampling (AS) to resolve the breakpoint structures of these newly discovered SVs at nucleotide resolution.

Results

We applied OGM to 30 cases (Table 1 and Fig. S1; case numbers are listed together with the Saphyr serial numbers assigned by our laboratory) and identified only the expected ES-identified CNVs in 16 cases (Cases 1, 2, 3, 4, 5, 8, 9, 10, 11, 12, 13, 14, 16, 19, 25, and 28). By contrast, previously undetected SVs were revealed in the remaining 14 cases (Fig. S1 and Tables 2 and 3). The newly discovered SVs in the latter 14 cases included: three translocations (Cases 15, 17, and 26), three direct duplications (Cases 7, 18, and 24), one duplication with inversion (Case 6), two inverted duplications (Cases 23 and 27), three intrachromosomal insertion events with inverted duplications (Cases 20, 21, and 22), one chromoplexy-like event (Case 29), and one chromothripsis (Case 30). The variant allele frequency (VAF), confidence, and supporting molecules of all SV calls identified by OGM are shown in Table S2. Confidence in OGM was defined as a normalized confidence score between 0 and 1; higher values indicate higher confidence. It has been reported that intrachromosomal fusions often exhibit lower confidence scores because of a higher probability of similar labeling patterns occurring by chance within the same chromosome13. Although intrachromosomal fusions also had low OGM confidence scores (of ~0.1–0.3) in the present study, SV calls were judged to be highly reliable from supporting molecules.

Table 1 Summary of the phenotypes and CNVs of patients in this study
Table 2 Overview of structural variants identified by OGM in this study
Table 3 Overview of chromothripsis in Case 30

To further characterize the SVs in all 30 cases, we used Oxford Nanopore targeted long-read sequencing with AS (Table S3). This approach allowed us to precisely determine the SV breakpoints at the nucleotide level and generate dot plots to visualize the identified SVs.

OGM and long-read sequencing provide complementary and comprehensive detection of diverse SVs, including those invisible to ES. The representative cases demonstrated that integrating optical and sequence-level data enhances both the diagnostic yields and the mechanistic understanding of SV formation. Here, we present findings obtained through OGM analysis and long-read sequencing with AS. A detailed analysis of each case is provided in the Supplementary Information.

Deletions (Cases 1, 2, 3, 4, 5, 8, 9, 10, 11, 12, 13, 14, 16, 19, and 25)

We performed OGM in 16 cases with ES-based CN losses, and detected only simple (expected) deletions in 15 cases; however, a chromoplexy-like event was identified in one case (Case 29) (Table 2). Breakpoints at nucleotide resolution were successfully determined in 11 of the 15 cases with simple deletions: interstitial deletions in Cases 3, 8, 9, 10, 11, 12, 14, 16, 19, and 25, and a deletion involving 4q telomeric sequences (TTAGGG) in Case 2 (Figs. S2 and S3).

Among these 11 cases, although ES-based analysis did not identify haploinsufficient genes, we identified SUMO1 (pHaplo score = 0.90014, ClinGen haploinsufficiency [HI] score = not recorded15) in Case 12 and COL11A1 (pHaplo score = 0.89414, HI score = 115) in Case 25 (Supplementary Data 115,16). SUMO1 deletion causes orofacial cleft (OMIM# 613705) and SUMO1 haploinsufficiency has been associated with cleft lip and palate, consistent with the phenotype observed in Case 1217,18. In Case 25, ES had already identified KCNA2 (OMIM# 616366) as a cause of seizures and global developmental delay19,20. Our analysis additionally revealed COL11A1, whose haploinsufficiency causes Marshall (OMIM# 154780) and Stickler (OMIM# 604841) syndromes, both of which are associated with abnormal facial features, failure to thrive, and short stature, as observed in Case 2521,22. Upon reviewing the ES data, we identified the same regions as simple CN losses in ES data; however, the corresponding genes had not been called by the eXome Hidden Markov Model (XHMM). These cases underscore the importance of OGM and long-read sequencing and of reviewing the ES data, thus demonstrating its limitations.

CTBP1 was disrupted by the deletion breakpoint in Case 2 (Table 2). CTBP1 aberration is associated with hypotonia, ataxia, developmental delay, and tooth enamel defect syndrome (OMIM# 617915) in an autosomal dominant fashion, and may influence the neurological features observed in this case23. Long-read analysis enabled us to determine the precise location at which the gene was disrupted.

In Cases 1, 4, 5, and 13, although deletions were identified using OGM (Table 2 and Fig. S4), the breakpoint determination of deletions using long-read sequencing was unsuccessful. Segmental duplications were exactly mapped to deletion breakpoints, and long-read sequencing cannot cover these segmental duplications (Fig. S4). Cases 1, 4, 5 and 13 are described in the Supplementary information.

Duplication (Case 28)

We identified an ES-based CN gain in Case 28. OGM detected the expected duplication at the terminal region of chromosome 2q (Fig. S5). However, we were unable to fully characterize the duplication using both OGM and long-read sequencing because of the presence of many repetitive sequences at the terminal region of the chromosome. Case 28 is described in detail in the Supplementary information.

Unbalanced translocations (Cases 15, 17, and 26)

In Cases 15, 17, and 26, unbalanced translocations—characterized by combined terminal CN gains and CN losses—were detected using ES. OGM also identified unbalanced translocations in these three cases.

Case 15 presented clinically with West syndrome, hypoplasia of the corpus callosum, and aortic regurgitation. In this case, an unbalanced translocation was detected using OGM and AS, with a 3.5-Mb deletion at 1p36.33–p36.32 and a 3.8-Mb duplication at 17p13.3–p13.2 (Fig. 1a, b, Table 2, and Table S4). The deletion encompassed GABRD (pHaplo score = 0.80714, HI score = 315), which is associated with susceptibility to generalized epilepsy with febrile seizures plus, type 5 (GEFSP5) (OMIM# 613060) and has been reported to cause epileptic symptoms in 1p36 deletion syndrome24 (Table 1 and Fig. 1b).

Fig. 1: Results of OGM and long-read sequencing in Cases 15 and 18.
Fig. 1: Results of OGM and long-read sequencing in Cases 15 and 18.The alternative text for this image may have been generated using AI.
Full size image

af and gi are generated from Cases 15 and 18, respectively. a Circos plot of OGM shows unbalanced translocation involving Chr 17 and 1. b OGM results in the genome browser. In Chr 17, a large duplication is visualized as a blue bar in the CNV call track; PAFAH1B1 was located in the duplication. Deletion of Chr 1 should have been shown as red bars in the CNV call track, but is not shown in this case despite CN losses of 0.9–1.1; GABRD was located within the deletion. Translocation breakpoints are shown in purple on the sample map between two matching chromosomes. The numbers are the breakpoint positions. Arrows A and B indicate that these maps are the forward orientations to the reference genome. c Dot-plot of the consensus sequence from long-read sequencing data. Red lines show the forward orientation to the reference sequences. Vertical pale stripes show annotations in the reference genome with different colors: transposable elements (pale blue, reverse-strand; pale pink, forward-strand), tandem repeats (purple), protein-coding sequences (dark green), and exons (green). The numbers below the dot-plot are breakpoint positions. d Electrophoresis image of polymerase chain reaction products. e Scheme of translocation and primers. f Upper and lower diagrams show Sanger sequencing electropherograms of der(1) and der(17), respectively, at breakpoints. Rearrangement junction sequence (middle line) and matching reference sequence (top and bottom lines) are shown in pink and sky blue. der(1) includes a 3-bp insertion and der(17) includes a 2-bp insertion, as shown in black. g Large duplication is visualized as blue bars in the CNV call track and a blue line in the SV call track in the genome browser. h Dot-plot of the consensus sequence containing the direct duplication region created from long-read sequencing data using dnarrange. i Scheme of direct duplication. ISCN nomenclature in the two cases is described in Table S4. Circos plots and the genome browsers were visualized by Bionano Solve 3.8.1 and Bionano Access 1.8.1 for OGM. Dot-plots were created by lamassemble 1.4.2 and dnarrange for targeted long-read sequencing. Chr, chromosome, CN copy number, CNV CN variant, ISCN International System for Human Cytogenomic Nomenclature, SV structural variant.

Although no known triplosensitive genes were initially identified within the 17p13.3–p13.2 duplication, precise mapping using OGM identified that this duplication also encompassed PAFAH1B1, with pTriplo score = 0.99814 and ClinGen triplosensitivity [TS] score = 015 (Supplementary Data 1). Duplications of PAFAH1B1 have been linked to West syndrome25, seizures, and a range of structural brain anomalies, including abnormalities of the corpus callosum as a part of 17p13.3 duplication syndrome26. Thus, the duplicated PAFAH1B1 may account for the hypoplasia of the corpus callosum that was observed in this patient. To confirm the origin of this unbalanced translocation, breakpoint-spanning PCR was performed using long-read sequencing data (primers listed in Table S5); this confirmed its paternal origin (Fig. 1d, e). Sanger sequencing of the translocation breakpoints further identified a 3-bp insertion on der(1) and a 2-bp insertion on der(17) (Fig. 1f). Although TP73 was disrupted by a translocation (Table 2), none of the disrupted genes were associated with the clinical features observed in Case 15.

The same analyses using OGM were also performed for Cases 17 (Fig. S6) and 26 (Fig. S7); however, no additional gene aberrations relevant to the phenotype were detected using OGM. Cases 17 and 26 are described in the Supplementary information.

Direct duplications (Cases 7, 18, and 24)

OGM identified direct duplications in three cases (Fig. 1g and Fig. S8a, d), which had previously been detected by ES. Long-read sequencing data confirmed the breakpoints in these cases and enabled the reconstruction of direct duplications through dot plots (Fig. 1h and Fig. S8b, e). This approach revealed no additional aberrant genes or gene disruption relevant to the phenotype. Cases 7, 18, and 24 are described in detail in the Supplementary information.

Duplication with inversion (Case 6)

In Case 6, we newly identified a 6.2-Mb inversion at 21q22.2–q22.3 and a 218-kb duplication at 21q22.2 using OGM with AS. The CN gain had previously been detected using ES (Tables 2 and S4). OGM analysis revealed that the inversion was located immediately downstream of the duplication (Fig. 2a), thus highlighting a limitation of ES, which typically fails to detect breakpoints outside coding regions. Furthermore, OGM analysis revealed that a CN loss previously identified by ES was the deletion from 21q22.3 to 21qter, located downstream of this inversion, with the 3′ end of the inversion corresponding to the end of the long arm of chromosome 21 (Fig. 2a).

Fig. 2: Results of OGM and long-read sequencing in Cases 6 and 23.
Fig. 2: Results of OGM and long-read sequencing in Cases 6 and 23.The alternative text for this image may have been generated using AI.
Full size image

ac and df are generated from Cases 6 and 23, respectively. a OGM results in the genome browser. Large duplication is visualized as a blue bar in the CNV call track and inversion is visualized at the end of arrow B. Downstream of the inversion (21q22.3–qter) shows CN losses of 0.9–1.74 in the genome browser, and the 3′ end of the inversion is the end of the long arm of Chr 21. b Dot-plot of duplication and inversion regions created from long-read sequencing data using dnarrange. c Scheme of SVs. d OGM results in the genome browser. Large deletion and duplication are visualized as red and blue bars, respectively, in the CNV call track. Inverted duplication is indicated by a blue line in the SV call track and is shown in the sample map below. NFIB is located at the boundary between deletion and duplication. e Dot-plot of the consensus sequence from long-read sequencing data. f Scheme of the inverted duplication. ISCN nomenclature of the two cases is described in Table S4. The genome browsers were visualized by Bionano Solve 3.8.1 and Bionano Access 1.8.1 for OGM. Dot-plots were created by lamassemble 1.4.2 and dnarrange for targeted long-read sequencing. Chr chromosome, CN copy number, CNV CN variant, ISCN International System for Human Cytogenomic Nomenclature, SV structural variant.

Long-read sequencing data further illustrated these SVs through dot-plot reconstruction (Fig. 2b). The CN loss, previously identified by ES, encompassed COL6A1 (pHaplo score = 0.94214, HI score = not registered15), whose haploinsufficiency causes Bethlem myopathy 1A (OMIM# 158810) or Ullrich congenital muscular dystrophy 1A (OMIM# 254090), and COL6A2 (pHaplo score = 0.59314, HI score = not registered15), whose deletion results in Bethlem myopathy 1B (OMIM# 620725) or Ullrich congenital muscular dystrophy 1B (OMIM# 620727). In addition, the deletion of DIP2A (pHaplo score = 0.91614, HI score = 115) is associated with neurodevelopmental disorder27 and autism28 (Table 1 and S3). However, no additional aberrant haploinsufficiency or triplosensitive genes associated with hereditary spastic paraplegia or autism were identified within these regions using OGM (Supplementary Data 1). Although TMPRSS2 and COL18A1 were disrupted by the SVs (Table 2), neither has been associated with hereditary spastic paraplegia or autism.

Inverted duplications (Cases 23 and 27)

OGM identified inverted duplications in Cases 23 and 27, involving complex rearrangements that combined deletions and duplications (Fig. 2d and Fig. S9a). Long-read sequencing successfully confirmed the breakpoints in both cases (Fig. 2e and Fig. S9b).

In Case 23, who was clinically suspected to have Cornelia de Lange syndrome, OGM revealed a 14.3-Mb deletion and an 851-kb duplication at 9p22.3, resulting in inverted duplication (Table 2 and Table S4). Notably, NFIB (pHaplo score = 0.99914, HI score = 315) was disrupted at intron 1 by the SV (Fig. 2d, Table 2, and Supplementary Data 1). Haploinsufficiency of NFIB causes macrocephaly with impaired intellectual development (OMIM# 618286) as well as intellectual disability and dysmorphic features29,30. These clinical features are consistent with those observed in Case 23, including intellectual disability and dysmorphic features.

Case 27 involved an inverted duplication rearrangement detected by OGM with AS (Table 2 and Supplementary Data 1), similar to a previously reported case31. Within the 18.5-Mb deletion at 9p22.1–24.3, both NFIB and SMARCA2 were identified (Fig. S9b and Supplementary Data 1). As previously mentioned, NFIB haploinsufficiency can contribute to intellectual impairment, dysmorphic facial features, and a high palate (Table 1). SMARCA2 (pHaplo score = 0.99914, HI score = 015) is associated with blepharophimosis-impaired intellectual development syndrome (OMIM# 619293) and Nicolaides–Baraitser syndrome (OMIM# 601358) in an autosomal dominant manner, and its intragenic deletions reportedly cause intellectual disability, seizures, and dysmorphic features32,33. Although SMARCA2 deletion has not been clearly established as a distinct deletion syndrome, it has been previously reported with limited clinical information34. By contrast, the adjacent 19.7-Mb duplication did not contain any known genes associated with human diseases (Supplementary Data 1). Notably, although NFIB-AS1 and ADAMTSL1 were disrupted by inverted duplication in each of Cases 23 and 27 (Table 2), neither gene was associated with the clinical features observed in these cases.

Except in rare cases where one breakpoint coincides with an inverted duplication and sequencing reads spanning two different exons are present, ES is generally incapable of identifying inverted duplications and their breakpoints, because orientation information is not available. By contrast, the present approach yielded meaningful findings that were linked to clinical symptoms in these two cases.

Intrachromosomal insertion events with inverted duplications (Cases 20–22)

Using OGM with AS, we identified novel intrachromosomal insertion events and inverted duplications in three cases (Cases 20–22) in whom CN gains or combined CN losses and CN gains had initially been detected by ES. OGM revealed a range of complex structural rearrangements in these cases.

Case 20 involved an inverted duplication, a deletion, and an intrachromosomal fusion (Fig. S10, Table 2, and Table S4). In Case 21, an inverted duplication and two intrachromosomal fusions were identified (Fig. S11, Table 2, and Table S4). Case 22 exhibited an inverted duplication and an intrachromosomal fusion (Fig. 3, Table 2, and Table S4).

Fig. 3: Results of OGM and long-read sequencing in Case 22.
Fig. 3: Results of OGM and long-read sequencing in Case 22.The alternative text for this image may have been generated using AI.
Full size image

a The entire SV of Chr 18 (upper diagram) and its enlargement within the dotted square (lower diagram) in the genome browser. Large duplications and deletions are visualized as blue and red squares, respectively, in the CNV call track, and an inverted duplication is visualized as a blue line in the SV call track of the genome browser. Arrows A and B indicate intrachromosomal fusion, and arrows B and B’ indicate inverted duplication. b Dot-plot of the consensus sequence containing the inverted duplication and intrachromosomal fusion regions created from long-read sequencing data using dnarrange. c Scheme of SVs. ISCN nomenclature is described in Table S4. The genome browser was visualized by Bionano Solve 3.8.1 and Bionano Access 1.8.1 for OGM. Dot-plot was created by lamassemble 1.4.2 and dnarrange for targeted long-read sequencing. Chr chromosome, CNV CN variant, ISCN international system for human cytogenomic nomenclature, OGM optical genome mapping, SV structural variant.

Regarding these chromosomal rearrangements, Case 20 included an 8.4-Mb duplication at Xq21.33–q22.3, whereas Case 21 harbored a 439-kb duplication at Xq22.2, as detected by OGM. In both cases, the newly resolved SVs identified by OGM and long-read sequencing did not reveal any additional pathogenic variants that might explain the patients’ clinical features (Supplementary Data 1). Cases 20 and 21 are described in detail in the Supplementary information.

By contrast, Case 22 exhibited a more complex pattern involving a 10.8-Mb duplication at 18p11.32–p11.22, a 2.8-Mb duplication at 18q21.31–21.33, and an 18.9-Mb deletion at 18q21.33–q23 (Fig. 3a). Within the deleted 18q21.33–q23 region, ES identified TSHZ1 and NFATC1, which have been associated with auricular deformities, hearing impairment35,36, and congenital heart disease37,38 (Table 1 and Supplementary Data 1). Nevertheless, in the newly characterized SVs using OGM and long-read sequencing, no additional candidate pathogenic genes were identified that might account for the patient’s clinical presentation, which included features potentially resembling Cornelia de Lange syndrome (Supplementary Data 1). Additionally, PIEZO2, which is associated with Marden–Walker syndrome (OMIM# 248700) through an autosomal dominant mechanism, was disrupted by SVs in Case 22 (Table 2). PIEZO2 has also been reported as associated with Cornelia de Lange syndrome39, and may be involved in the phenotype observed in this case.

Notably, the intrachromosomal fusions in these cases were positioned adjacent to their respective CN gains—a feature that was missed by ES because of the copy-neutral nature of these SVs. Furthermore, three cases exhibited complex multi-breakpoint events, meaning that it was difficult to understand the structural reconstruction using long-read sequencing alone; OGM, which enables the visualization and overview of complex SVs, was therefore extremely helpful.

Chromoplexy-like event in Case 29

In Case 29, who presented with Ehlers–Danlos syndrome, joint laxity, arthritis, and mild intellectual disability, ES previously identified a 5.2-Mb CN loss at 12q21.1–q21.2. OGM further revealed an intrachromosomal insertion event and inversion within chromosome 12, an inversion within chromosome 14, and a translocation between chromosomes 12 and 14 (Fig. 4a, b). Long-read sequencing precisely determined all of the breakpoints (Fig. 4c, d), confirming a chromoplexy-like event between chromosomes 12 and 14 (Fig. 4l and Table S4). Chromoplexy refers to a large-scale and complex formation of chain rearrangements that affects multiple chromosomes. This phenomenon is characterized by the interdependent occurrence of multiple inter- and intrachromosomal translocations and deletions40. Chromoplexy leads to the generation of derivative chromosomes that present little or no CN alterations, and Case 29, with CN loss, presented a chromoplexy-like event. Conversely, chromothripsis is defined as a relatively clustered chromosomal shattering followed by a random restitching of chromosomal fragments, which results in the formation of complex genomic rearrangements40. Chromothripsis was observed in Case 30.

Fig. 4: Results of OGM and long-read sequencing in Case 29.
Fig. 4: Results of OGM and long-read sequencing in Case 29.The alternative text for this image may have been generated using AI.
Full size image

a The entire SV of Chr 12 (upper diagram) and its enlargement within the dotted square (lower diagram) in the genome browser. Large deletion is visualized as a red bar in the CNV call track, and inversion is visualized as blue lines in the SV call track at both ends of the deletion. Arrow C’ in the sample map is a fragment of arrow C in the reference. Arrow B indicates inversion and arrows B and C indicate an intrachromosomal insertion event. Yellow sample maps lead to maps of Chr14 (shown in Fig. 4b). b The entire SV of Chr14 (upper diagram) and its enlargement within the dotted square (lower diagram) in the genome browser. Arrow D indicates inversion, shown as blue lines in the SV call track. Yellow sample maps lead to maps of Chr 12 (shown in Fig. 4a). c, d Dot plots of the consensus sequence in derivative Chr 12 (c) or 14 (d) created from long-read sequencing data using dnarrange. Arrows C–D and F–G indicate reciprocal translocation between Chr 12 and 14. e Scheme of reciprocal translocation. ISCN nomenclature is described in Table S4. The genome browsers were visualized by Bionano Solve 3.8.1 and Bionano Access 1.8.1 for OGM. Dot plots were created by lamassemble 1.4.2 and dnarrange for targeted long-read sequencing. Chr chromosome, CNV CN variant, ISCN international system for human cytogenomic nomenclature, OGM optical genome mapping, SV structural variant.

The chromoplexy-like event involved chromosomes 12 and 14, with reciprocal inverted insertions, partial deletions, and inversion of multiple segments. Despite the complex rearrangement, no clearly haploinsufficient gene associated with the patient’s phenotype was identified within the deleted region (Supplementary Data 1). PTPRR and TRHDE were disrupted by a reciprocal translocation (Table 2), but neither was associated with the clinical features observed in Case 29. The derivative chromosome 12 harbored the 5.2-Mb deletion at 12q21.1–q21.2, which was joined to chromosome 14 without a CN change. Therefore, although the CN loss was detectable by ES-based CNV analysis, the reciprocal translocation itself was not identified. Because the deleted region did not encompass any previously known disease-associated genes that would explain the clinical features, this case clearly demonstrates the limitations of ES for detecting complex SVs and highlights the added value of OGM and long-read sequencing in cases with no apparent clinical features.

Chromothripsis (Case 30)

In Case 30, who presented with vitreoretinal degeneration, short stature, growth delay, abnormal facial shape, calcified skin lesions, and ventricular septal defect, ES detected eight CN gains in chromosome 22, implying chromothripsis (Fig. 5a). The chromothripsis is thought to arise when one homolog of chromosome 22 undergoes aberrant segregation due to anaphase-lagging and becomes encapsulated within a micronucleus. Extensive DNA damage incurred within the micronucleus is followed by erroneous reassembly upon reincorporation into the primary nucleus, resulting in abnormal chromosomal rearrangements41. Based on ES data, the CNV sizes of the eight CN gains at 22q12.1–q13.33 ranged from 107–437 kb. OGM analysis revealed six intrachromosomal fusions, one deletion, one inversion, and one inverted duplication with ogm (22q)cth at 22q11.21–q13.33 (Fig. 5b, Fig. S11, and Table 3). Although we attempted to reconstruct the full spectrum of SVs based on the OGM results, the SVs identified by OGM did not completely match those detected by dnarrange using long-read sequencing data. We therefore manually evaluated the breakpoint sequences using long-read sequencing data, considering the orientations of the SVs and CNVs called using OGM (Fig. 5c). The CNV sizes of the eight duplications in a dot-plot of long-read sequencing data were 390 kb at 22q11.21, 419 kb at 22q12.1–q12.2, 460 kb at 22q12.2, 238 kb at 22q12.3, 884 kb at 22q13.1, 335 kb at 22q13.1, 466 kb at 22q13.1, and 625 kb at 22q13.31; all eight of these were revealed to be three CN gains. These regions included 6, 9, 7, 4, 31, 15, 14, and 7 genes, respectively (Supplementary Data 1); however, none of the genes were associated with known phenotypes. In addition, six genes were disrupted by the chromothripsis (Table 3), but none of these were associated with the clinical features observed in Case 30. We also investigated the pathogenicity of triplosensitive regions within all eight duplications in case 30 by using ClinGen. Duplications of the 22q11.2 region (TS score = 115) have been reported as causing developmental delay, dysmorphisms, and multiple congenital abnormalities42, suggesting they may cause the growth delay and abnormal facial shape observed in case 30. However, no cause for other phenotypes in this case could be identified. Ultimately, the combined analysis of OGM and targeted long-read sequencing provided a comprehensive and accurate identification of the chromothripsis (Table S4).

Fig. 5: Results of the CNV analysis using ES data, OGM, and long-read sequencing in Case 30.
Fig. 5: Results of the CNV analysis using ES data, OGM, and long-read sequencing in Case 30.The alternative text for this image may have been generated using AI.
Full size image

a CNV analysis of XHMM using ES data detected eight CN gains in Chr 22, indicated by blue bars (I–VIII). The x-axis shows physical positions, and the y-axis shows z-scores. b Circos plot of Chr 22 in OGM. Intrachromosomal fusions are indicated by magenta lines with numbers 1, 2, 4, and 6–8, corresponding to Maps 1, 2, 4, and 6–8, respectively, in Fig. S12. An inversion is indicated as a light blue dot with the number 3 (corresponding to Map 3 in Fig. S12), and an inverted duplication is indicated as a purple dot with the number 5 (corresponding to Map 5 in Fig. S12) in the SV track. Blue lines in the CNV track indicate five duplications (II, III, V, VII, and VIII), one of which contains a different duplication (VI), making a total of six duplications that correspond to those in (a). Two duplications (I and IV) should have been originally shown as blue bars in the CNV call track, but these two duplications are not shown in this case despite CN gains of 2.58–2.94 and 2.36–2.62, respectively. c Dot-plot of chromothripsis manually connected from long-read sequencing data using dnarrange. This dot-plot was created by linking each dot-plot from Maps 1–8 of Fig. S12; the numbers (1–8) and letters (A–J) in the dot-plot correspond to the map numbers (Maps 1–8) and letters (A–J) in Fig. S12. The blue areas (I–VIII) show eight CN gains that correspond to those in (a). The sizes of the eight CN gains from the long-read sequencing data are shown below the dot-plot. The top of the dot-plot shows genes located in this region, and the six genes in red squares were disrupted by the chromothripsis. ISCN nomenclature is described in Table S4. Circos plot was visualized by Bionano Solve 3.8.1 and Bionano Access 1.8.1 for OGM. Dot-plot was created by lamassemble 1.4.2 and dnarrange for targeted long-read sequencing. Chr chromosome, CNV CN variant, ES exome sequencing, ISCN international system for human cytogenomic nomenclature, OGM optical genome mapping, SV structural variant.

Discussion

We identified novel SVs using OGM with AS in 14 of 30 cases (Cases 6, 7, 15, 17, 18, 20–24, 26, 27, 29, and 30) in whom CNVs had previously been detected using ES. These SVs were undetectable by ES alone because of unstable genomic coverage, limited fragment size, and analytical limitations. ES typically misses SVs with breakpoints outside coding regions and cannot resolve the configuration of CN gains or unbalanced translocations inferred from telomeric CNVs. By contrast, OGM uses ultra-high molecular weight DNA and genome-wide mapping to detect large and complex SVs—including inversions, inverted duplications, and intrachromosomal fusions—outside ES target regions (Cases 6 and 20–22) and within repetitive regions (Case 26). Importantly, OGM with AS revealed disruptions (Cases 2 and 22) and CN changes (Cases 12, 15, 23, 25, and 27) in genes that are likely associated with clinical features (Table 2 and Supplementary Data 1) (7/30 cases, 23.3%), clearly demonstrating the utility of OGM with AS following CNV detection.

Most novel SVs were observed in cases involving CN gains (Cases 6, 7, 15, 17, 18, 20–24, 26, 27, and 30), with only one additional SV found in a case of CN loss (Case 29). Notably, in cases in which ES detected both CN gain and CN loss, OGM clarified the underlying events as unbalanced translocations (Cases 15, 17, and 26) or inverted duplications (Cases 23 and 27), which were undetected by ES. These findings suggest that such CNV patterns should raise a suspicion of hidden SVs, warranting further investigation with OGM.

Compared with ES, OGM achieved a superior detection of balanced (copy neutral) and complex (inversions, intrachromosomal fusions, chromoplexy, or chromothripsis) SVs that lacked CN changes, whereas ES remained sensitive to CNVs involving coding regions. Long-read sequencing complemented OGM by defining nucleotide-level breakpoints, although its limited read length sometimes prevented full structural reconstruction. Thus, OGM provides a genome-wide structural overview, whereas long-read sequencing ensures breakpoint precision; together, they offer comprehensive SV characterization.

OGM also enables CNV detection through molecule coverage-based analysis, to identify CNVs larger than 500 bp43. In the present study, OGM detected CNVs ranging from 238 kb to 18.9 Mb. Although pathogenic genes were identified within these CNVs in six cases (Cases 12, 15, 23, 25, and 27), they had already been included within regions detected by ES, suggesting that ES can capture certain pathogenic CNVs if they are covered by baits.

Several factors may explain why some haploinsufficient genes were not captured by ES. Large CNVs spanning multiple genes can dilute read-depth changes, and capture efficiency varies across exons because of GC-rich or probe hybridization biases. Although the affected genes are generally covered by baits, exonic read-depth variation or pipeline filtering thresholds can lead to missed calls. In our laboratory, CNV detection was performed using an XHMM-based pipeline that was optimized for batch- and sex-specific analysis to minimize sequencing bias; however, this process was not perfect2,44.

ES was performed between 2015 and 2022 using standardized pipelines from our laboratory. Although the retrospective visual inspection of BAM alignments was conducted for selected loci, no clear breakpoint-supporting reads were observed. The undetected SVs are therefore likely attributable to the limitations of read depth–based algorithms rather than sequencing artifacts. Previous studies have revealed that incorporating one-end–anchored split-read mappings into ES analysis can reveal SVs that are otherwise missed by standard read-depth approaches45. Furthermore, CNV and SV calling software increasingly combines multiple data signatures—including read-pair, split-read, read-depth, and assembly-based methods—for improved detection accuracy and breakpoint precision46. Together, these findings suggest that further improvements in ES pipelines, such as the integration of read-pair or split-read analysis, may enhance SV detection sensitivity.

OGM and long-read sequencing detect SVs through fundamentally different mechanisms. Although long-read sequencing provides single-nucleotide resolution and accurate breakpoint determination, it is constrained by physical read lengths (10–100 kb), often requiring multiple reads to reconstruct complex SVs6. By contrast, OGM analyzes molecules spanning hundreds of kilobases, enabling the better characterization of biallelic CNVs and phasing of CN changes47.

For complex rearrangements with multiple breakpoints—such as intrachromosomal insertion events and inverted duplications (Cases 20–22), chromoplexy-like events (Case 29), or chromothripsis (Case 30)—OGM was valuable for structural delineation, with long-read sequencing used for precise breakpoint resolution. In Case 30, discrepancies in breakpoint orientation and CN between OGM and long-read sequencing illustrated the difficulty of resolving such events with sequencing alone. The genome-wide view of OGM and its ability to detect both CNVs and SVs are particularly advantageous for analyzing chromoanagenesis, including chromothripsis48.

In the present study, OGM was particularly advantageous for delineating complex multi-breakpoint events (Cases 20–22, 29, and 30) in which complete reconstruction was difficult to perform using long-read sequencing alone. However, for translocation, direct duplications, or inverted duplications, long-read sequencing alone may suffice. OGM and long-read sequencing can also be complementary: the former provides superior structural mapping, whereas the latter is better at refining breakpoints.

Compared with ES, which primarily targets coding regions and relies on read depth–based CNV detection, combined OGM and long-read sequencing offers genome-wide coverage and direct breakpoint visualization, thereby enabling the accurate identification of balanced or complex SVs that are invisible to ES. In contrast to short-read whole genome sequencing (srWGS), which can capture small CNVs and SNVs genome-wide but often struggles to resolve long-distance rearranged regions, the integration of OGM and long-read sequencing overcomes these limitations by providing both optical- and nucleotide-level mapping continuity.

All 14 SVs detected using OGM were validated with high accuracy. However, the resolution limit (~5 kb) of OGM means that smaller variants may be missed6,49. Breakpoint positions between OGM and long-read sequencing differed by 0.5–385.5 kb (Tables 2 and 3), often because of unmapped regions in OGM data. Larger unmapped regions correlated with greater discrepancies. Conversely, smaller gaps allowed near-precise breakpoint localization.

In Case 26, in which all reads included repetitive sequences, OGM accurately detected SVs, whereas long-read sequencing yielded mismapped reads, failing to detect the true breakpoints. This highlights the strength of OGM in repetitive and complex regions, leveraging long-range DNA molecules without sequence-based alignment11,50.

Despite its advantages, OGM cannot fully resolve SVs in centromeres and telomeres11. Whole-arm rearrangements such as Robertsonian translocations or isodicentric chromosomes remain challenging because of repetitive elements and gaps in the reference genome3,43,48. Additionally, telomere fusions may obscure insertion sites43. However, complete chromosome assemblies (e.g., of the X chromosome) are becoming feasible via the integration of T2T reference genomes and OGM51,52.

Recent advances in genomic technologies have exhibited a clear trajectory, from ES to srWGS and onward to long-read sequencing and OGM. For example, in a large cohort of 722 undiagnosed patients with inherited retinal disease, a comparison of ES and srWGS demonstrated that srWGS using SV scanning identified molecular diagnoses in 4.29% of cases53, demonstrating an added—but modest—gain over ES. More strikingly, a recent long-read whole genome sequencing (lrWGS) study of 34 exome-negative autosomal recessive families achieved a 38% diagnostic yield54, thus underscoring the major improvement in SV detection afforded by long reads. Our study built on this idea by integrating OGM and long-read sequencing, and demonstrated that this combined approach was able to resolve SVs—including balanced and complex rearrangements—that both ES and srWGS often miss. Specifically, in our cohort, the combined method delineated multi-breakpoint events that long-read sequencing alone may not fully reconstruct. Our findings indicate that integrating OGM and AS after ES-based CNV discovery increases diagnostic resolution and clarifies complex rearrangements. This ES–OGM–long-read sequential approach refines SV interpretation and establishes a framework for integrating genomic and sequencing data in rare disease diagnostics.

The present study has some limitations. First, because all patients were referred from tertiary or specialized clinical centers, a potential bias toward severe or atypical cases cannot be excluded. Moreover, the small sample size (30 cases were selected according to the selection criteria from 281 cases with CNVs identified by XHMM analysis in ES, out of a total of 8622 cases with rare genetic disorders who underwent ES analysis in our laboratory between 2015 and 2022; please refer to the Methods for the selection criteria) and technical differences/improvements in ES collected over multiple years may affect the generalizability of our results. Further large-scale prospective studies are needed to evaluate the true diagnostic yield and reproducibility of the OGM and long-read sequencing pipeline. Second, although the combined OGM and long-read sequencing approach provides maximal structural resolution, its high cost currently limits its routine clinical application. Although 13 of 14 cases (excluding Case 30) may have been resolved by long-read sequencing alone, complete reconstruction using only long-read sequencing is likely to be considerably difficult, especially considering complex multi-breakpoint events such as those of Cases 20–22 and 29. A tiered strategy—for example, starting with ES, followed by OGM or long-read sequencing for unresolved or structurally complex cases—may allow rapid solutions to be achieved using a relatively cost-effective and diagnostically efficient workflow. Such a layered approach balances cost, resolution, and interpretability, thereby offering a practical framework for clinical implementation.

Collectively, our results highlight the potential of OGM and long-read sequencing for bridging the gap between CNV detection and full SV interpretation. In clinical practice, this approach may enhance diagnostic accuracy, inform personalized management, and guide genetic counseling for patients with unexplained rare genetic/genomic disorders by sequencing alone, and for those with dual or triple molecular diagnoses. Future research should systematically assess the pathogenic importance of newly identified SVs and explore the integration of OGM and long-read sequencing data into clinical pipelines.

In conclusion, CN gains or CN gain–CN loss patterns detected using ES should prompt the suspicion of hidden SVs, particularly when CNVs do not fully explain the phenotype. Highly complex SVs, which are often unresolvable by sequencing alone, can be effectively characterized by combining OGM and long-read sequencing. OGM is thus a powerful tool for identifying complex chromosomal rearrangements within its resolution range. Combining OGM with targeted long-read sequencing provides a robust and scalable strategy for comprehensive SV characterization, with direct implications for improving genetic diagnosis and personalized medicine in rare disorders.

Methods

Patients

The study was performed on 30 patients with different disease profiles (Table 1). All patients were clinically evaluated and referred to our laboratory from July 2015 to August 2022. ES had been performed on the patients using the method described in the following section, and all 30 patients had CNVs identified using ES. The ethnicities of patients were Japanese (25 patients) and Brazilian (five patients). The study protocol was approved by the institutional review boards, and written informed consent was obtained from patients or their guardians (IRB of Yokohama City University Faculty of Medicine, Approval number: A170525011/RevB2602000031). The cohort consisted of patients with congenital multiple anomalies who were referred to our laboratory from the pediatrics, neurology, clinical genetics, and obstetrics departments of collaborating institutions/hospitals. To elucidate the causes of rare genetic disorders, we selected 30 cases from 281 cases with CNVs identified by XHMM analysis in ES, out of a total of 8622 cases with rare genetic disorders that underwent ES analysis in our laboratory between 2015 and 2022. Inclusion criteria were as follows: (1) cases with non-recurrent CNVs identified by ES, in which the causative genes within CNVs did not fully explain the phenotypes, and (2) cases with lymphoblastoid cell lines available in our laboratory. Within this cohort, cases were selected equally from three groups: CN losses, CN gains, and CN gain–CN loss patterns. Exclusion criteria included: (1) cases with CNVs that were already reported to cause recurrent deletion or duplication syndromes, and (2) cases with unavailable lymphoblastoid cell lines. An overview of the study workflow, from ES-based CNV detection through OGM analysis to targeted long-read sequencing using AS, is shown in Fig. S13.

CNV analysis using whole ES

Genomic DNA was extracted from peripheral blood leukocytes using a QuickGene-610L kit (Fujifilm, Tokyo, Japan), according to the manufacturer’s instructions. Genomic DNA samples were sequenced on either the NovaSeq 6000 (Illumina, San Diego, CA, USA) using 150-bp paired-end reads after the enrichment of exonic regions using the Twist Human Comprehensive Exome (Twist BioScience, San Francisco, CA, USA) or the HiSeq2000 or HiSeq2500 (Illumina) with 101-bp paired-end reads after capture using the SureSelect Human All Exon kit (Agilent Technologies, Santa Clara, CA, USA). Exome data processing was performed as previously described55. Reads were aligned to GRCh38 using Novoalign (http://www.novocraft.com/), and PCR duplicates were eliminated using Picard (https://broadinstitute.github.io/picard/). The Genome Analysis Toolkit (GATK) (https://gatk.broadinstitute.org/hc/en-us) was used to realign indels and recalibrate base quality scores. With 20 reads covering >90% of the complete coding sequence, the mean depth of coverage of ES against the RefSeq coding sequence was 70–80×.

CNVs were detected from the ES data using XHMM, a statistical tool for CN analysis2,44,56. In brief, XHMM detects CNVs by analyzing normalized raw exome read depth data with principal component analysis of the complete coding regions. In Case 30, principal component analysis-normalized and filtered z-scores for the whole read depth were obtained, and SignalMap Version 1.9.0.05 (Roche Nimblegen, Madison, WI, USA) was used for visualization purposes. All CNVs were confirmed using quantitative PCR analysis of patients and their unaffected parents, and inheritance was assumed to be de novo or uniparental (Table 1).

OGM

Ultra-high molecular weight DNA was extracted from 1.5 million lymphoblastoid cell lines using the SP-G2 Blood and Cell Culture DNA Isolation Kit (Bionano Genomics, San Diego, CA, USA) following the manufacturer’s instructions57. DNA was labeled using the DLS-G2 Labeling Kit (Bionano Genomics) for the attachment of fluorophores to a specific six-nucleotide sequence (CTTAAG). The labeled DNA was loaded on a Saphyr chip G3.2 and run on a Saphyr instrument (Bionano Genomics) for an output of 400 Gb, targeting 80× effective coverage, with >70% of molecules >150 kb aligning (map rate) at N50 (N50 is defined as the length of the shortest molecule for which equal and longer molecules make up 50% of the total data) of >230 kb8 (Table S1). De novo assembly, alignment to the GRCh38 reference, and variant annotation of OGM data were performed using Bionano Solve 3.8.1. Visualization by Circos plot and genome browser, filtering, and interpretation of the identified SVs were conducted using Bionano Access 1.8.1. Standard filter settings were adjusted to select only the variants present in the regions of interest identified by previous standard diagnostic tests and in <1% of control samples. In Case 28, OGM data were analyzed using the T2T reference genome in addition to GRCh38 to observe any differences.

Targeted long-read sequencing using AS with GridION

Here, 3 μg of the ultra-high molecular weight DNA used in OGM was fragmented to a target size of 50 kb using a Megaruptor 3 DNAFluid Kit (Diagenode, Seraing, Belgium). In cases in which the remaining amount of ultra-high molecular weight DNA was low, 5 μg of genomic DNA extracted from peripheral blood leukocytes was fragmented to a target size of 40 kb using a Megaruptor 3 Shearing Kit (Diagenode). The sheared DNA was used to construct sequencing libraries using an Oxford Nanopore Ligation Sequencing Kit (SQK-LSK114) (Oxford Nanopore Technologies, Oxford, UK), largely following the manufacturer’s instructions55,58 except that enzyme incubation times were doubled, with the final AMPure purification incubation for 10 min at 37 °C. Approximately 25 fmol of the library was loaded onto a flow cell (FLO-MIN114, R10.4.1) on a GridION (Oxford Nanopore Technologies). Target regions comprising 0.24%–0.98% of the whole genome were enriched using the AS option59 of GridION Mk1 with a BED file that assigned regions of CNVs detected by ES and each of the surrounding total 7.3–29.5 Mb regions (Table S3). Sequencing was performed for ~3 days, with one additional library loading after nuclease flushing of a flow cell. In cases in which the amount of data was insufficient, new flow cells were used for additional analysis as appropriate.

The data were base-called with super accuracy mode and processed into a BAM file using Dorado v.0.7.2 (Oxford Nanopore Technologies) with GRCh38 as a reference. The depth of coverage was calculated using mosdepth v0.3.1 (https://github.com/brentp/mosdepth). The mean depth of the target region was 11.72×–47.38× (Table S3). SVs and CNVs were called using pbsv v.2.9.0 (PacBio SV calling and analysis tools) (https://github.com/PacificBiosciences/pbsv) together with breakpoint identification. dnarrange was used to detect SVs and extract patient-specific SVs by referencing 29 control datasets57,60. Control datasets were not used in the analysis if dnarrange excluded breakpoints of interest. Subsequently, using lamassemble (https://gitlab.com/mcfrith/lamassemble), each group of overlapping SV reads was merged into a consensus sequence and realigned to the reference genome. Next, dnarrange created a dot-plot representation of SV breakpoints. In dot plots, lines show the alignment to the reference sequence and are drawn in red and blue to denote forward and reverse orientations to the reference genome, respectively. To understand entire SVs, we used the algorithm dnarrange-link to infer the order and orientation of multiple rearrangements61.