Inferring chromosome segregation error stage and crossover in trisomic disorders with application to Down syndrome

Li, Zhenhua; Yang, Wenjian; Wu, Gang; Chang, Ti-Cheng; Cheng, Zhongshan; Devidas, Meenakshi; Shago, Mary; Carroll, Andrew J.; Heerema, Nyla A.; Gastier-Foster, Julie M.; Wood, Brent L.; Sanclemente, Lauren; Raetz, Elizabeth A.; Hunger, Stephen P.; Loh, Mignon L.; Feingold, Eleanor; Rosser, Tracie C.; Allen, Emily G.; Sherman, Stephanie L.; Rabin, Karen R.; Lupo, Philip J.; Yang, Jun J.

doi:10.1038/s41467-025-61413-w

Article
Open access
Published: 09 July 2025

Nature Communications volume 16, Article number: 6316 (2025)

Abstract

Errors in chromosome segregation during gametogenesis, such as nondisjunction (NDJ) errors, have severe consequences in human reproduction, and a better understanding of their etiology is of fundamental interest in genetics. Mapping NDJ errors to meiotic/mitotic stages typically requires proband-parent comparison, limiting its applicability. Herein, we develop Mis-segregation Error Identification through Hidden Markov Models (MeiHMM), a method for inferring NDJ error stage and crossover events based on only genomic data of trisomic probands. Guided by triallelic genotype/haplotype configurations, MeiHMM discerns the allelic origin at each locus, which informs NDJ error during gamete formation, without identifying the parental origin of the trisomy. In 152 Down syndrome (DS) cases, MeiHMM achieved an accuracy of 96.1% in classifying NDJ errors, with a sensitivity of 91.6% in crossover identification, compared to proband-parents trio analysis. 17% of Meiosis II errors were misclassified as Meiosis I, mainly due to small proximal crossover events. Applying MeiHMM to 509 children with DS-associated childhood leukemia, we demonstrate that NDJ error is associated with the age of disease onset, somatic genomic abnormalities, and prognosis. Thus, MeiHMM is an effective method for trisomic NDJ error classification and crossover identification that can be applied broadly to study the etiology of congenital aneuploidy conditions.

Introduction

Aneuploidy can occur during human meiosis due to nondisjunction (NDJ) errors, but it is challenging to study because most aneuploidies cannot give rise to viable embryos, with the exception of trisomies 13, 18 and especially 21, the most common aneuploid condition at birth. Therefore, Down syndrome (DS) offers a unique opportunity and is an ideal model for understanding the origin of these events. Individuals with DS are at increased risk of intellectual disability and congenital heart disease, and are predisposed to hematologic malignancies¹. Trisomy 21 arises mostly from NDJ or mis-segregation of chromosome 21 (chr 21) during meiosis^2,3, or relatively uncommonly, from postzygotic mitosis⁴. In more than 90% of cases, the extra copy of chr 21 originates from the maternal gamete⁵. This is because female gametogenesis is more prone than male gametogenesis to erroneous chromosomal segregation due to decades of arrest at prophase I^6,7. Inaccurate chromosomal segregation during gametogenesis can be grossly classified into meiosis I (MI) or meiosis II (MII) errors using genotyping data^8,9. MI errors are inferred when two homologous parental chromosomes are identified in the gamete and MII errors are inferred when two sister chromatids are identified. These observations can be the result of several mechanisms of mis-segregation, such as NDJ of the homologous chromosomes, precocious separation of sister chromatids, or reverse segregation. The stage at which the NDJ takes place is associated with the number and the position of the chromosomal crossovers. In maternal NDJ of chromosome 21, absence of crossover or crossover distally located from the centromere is associated with MI errors. In contrast, MII errors are associated with pericentromeric crossovers¹⁰.

The type/stage of NDJ error of trisomy 21 is usually inferred from proband-parent trios by comparing the heterozygosity patterns of genotypes at the pericentromeric region between the child with DS and the parents^10,11,12,13. However, when the parents’ genetic data are unavailable, there is currently no generalizable method to infer the type of error, aside from recent work^14,15 analyzing NDJs based on low-coverage whole genome sequencing (WGS) data in preimplantation genetic testing.

In this work, we develop a method, Mis-segregation Error Identification through Hidden Markov Models (MeiHMM), to infer the type of NDJ and locate the chromosomal crossovers using proband genotype alone, based on the frequencies of single nucleotide polymorphisms (SNPs) or haplotypes in the general population. Applying this to WGS data of 152 DS cases demonstrates high accuracy compared to trio analysis. We further investigate the association between the type of NDJ error and disease characteristics in a cohort of DS-associated acute lymphoblastic leukemia (ALL). MeiHMM does not infer the parental origin of trisomy 21, but it is applicable to NDJ errors of both maternal and paternal origins. For simplicity in the description, the following sections assume a maternal origin for trisomy 21, which is true in ~90% of DS cases⁵, though the framework is the same for NDJ of paternal origin as well.

Results

MeiHMM implements variant- and haplotype-based analyses to identify NDJ timing and crossover locations

The types of chr 21 NDJ errors can be distinguished by the number of unique haplotypes across the q arm of this chromosome as follows (Fig. 1A): (1) Errors of MI origin are characterized by having three different haplotypes in the centromeric region, i.e., one allele with the paternal haplotype, two remaining alleles with two different maternal haplotypes due to the unsuccessful separation of the homologous chromosomes in the oocyte; (2) By contrast, errors of MII origin result when the duplicated sister chromatids fail to separate; such failure is inferred when only two haplotypes are represented in the centromeric region, i.e., one allele from the paternal haplotype and two identical alleles from the maternal haplotype; (3) Mitotic errors are inferred when there is complete duplication of maternal chr 21 (alternatively, this could be inferred as an MII error with no recombination, a noted limitation of this gross classification¹⁰). Applying a hidden Markov model, MeiHMM (1) segments chr 21 into blocks of two or three haplotypes, which inform the stage of NDJ, and (2) identifies the position of chromosome crossover (recombination during meiosis) on the basis of the boundaries of these haplotype blocks.

To implement MeiHMM, we categorize heterozygous SNPs on chr 21 into three groups (Fig. 1B). (1) Type 1 informative SNPs: for a given case, we first identify variants that are rare in the general population (alternative allele frequency <0.003 in gnomAD and ALlele Frequency Aggregator [ALFA]); if the genotype configuration at such a SNP is alternative: alternative: reference (Alt: Alt: Ref), the probability of the rare allele occurring twice independently is exceedingly low, so the configuration most likely arises from duplication of the rare allele, i.e., a two-haplotype state (Fig. 1C; see “Methods” for details). (2) Type 2 informative SNPs: for variants that do not meet the criteria in #1 (common SNPs or rare variants without the defined genotype configuration), we utilize genotypes at adjacent variants to discern two- vs three-haplotype state: for a given variant and its two flanking SNPs, we generate hypothetical haplotypes for this three-variant block, assuming a two-haplotype state (Fig. 1C), i.e., one haplotype consists of the combination of the one-copy alleles across the three variants, whereas the other is defined by the combination of alleles with two copies. The likelihood of two- vs three-haplotype state for this region is then scored on the basis of the observed vs expected frequency (O/E) of the hypothetical haplotypes in the 1000 Genomes Project populations, with the expected frequency estimated as the product of the allele frequencies of incorporated alleles (see “Methods”). A lower O/E ratio suggests that the two-haplotype state is less likely, which points to a three-haplotype state. (3) The remaining chr 21 SNPs are considered non-informative. Variants on chr 21 were classified into one of these three groups and these assignments were fed into a hidden Markov model to segment chr 21 into smoothed two-haplotype and three-haplotype blocks (Fig. 1D). The stage of NDJ error is then inferred from the segmentation results (Supplementary Fig. 1). Detailed definitions of the two types of informative SNPs and the hidden Markov model are available in “Methods”.
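As a rough illustration of how a segmentation could be turned into an error stage and a crossover count, the sketch below applies the rules stated in the text: a three-haplotype pericentromeric block suggests MI, a two-haplotype pericentromeric block suggests MII, and a chromosome that is two-haplotype throughout suggests a mitotic error. The function names, the per-SNP state list, and the assumption that the list is ordered from centromere to telomere are all hypothetical; the actual decision rules are those in Supplementary Fig. 1 and may differ.

```python
from typing import List


def count_crossovers(states: List[str]) -> int:
    """Count boundaries between two-haplotype (2H) and three-haplotype (3H) blocks."""
    return sum(1 for a, b in zip(states, states[1:]) if a != b)


def classify_ndj(states: List[str]) -> str:
    """Classify the NDJ stage from per-SNP states ordered centromere -> telomere.

    Assumption (not from the paper): the first state represents the
    pericentromeric region of the q arm.
    """
    if not states:
        raise ValueError("no segmented SNPs")
    if all(s == "2H" for s in states):
        # Could equally be an MII error without recombination (see text).
        return "mitotic"
    return "MI" if states[0] == "3H" else "MII"


example = ["3H", "3H", "3H", "2H", "2H"]
print(classify_ndj(example), count_crossovers(example))
```

The example mirrors the case described for Fig. 1G: a three-haplotype pericentromeric block with a single block boundary along the q arm.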

We performed simulations to evaluate the degree to which the aforementioned type 1 (rare variant-based) and type 2 (rare haplotype-based) informative SNPs inform the two- vs three-haplotype state on chr 21 (see “Methods” for details). First, for the rare variant-based inference, we generated triallelic genotypes for chr 21 SNPs according to the allele frequencies reported in gnomAD and ALFA, under the assumption of either a two- or three-haplotype state. Among 44,681 variants in gnomAD and ALFA for which (1) the alternative allele population frequency <0.003 and (2) the simulated genotype configuration is Alt: Alt: Ref, 99.7% were only seen under the two-haplotype hypothesis (Fig. 1E). For rare haplotype-based inference, we performed similar simulations but now by randomly sampling haplotypes from the 1000 Genomes project. In 1,653,980 hypothetical haplotypes with O/E ratio <0.1, 98.2% were only seen under the three-haplotype hypothesis (Fig. 1F).

Figure 1G shows an example of the segmentation results of MeiHMM, based on WGS data from the Gabriella Miller Kids First Pediatric Research Program (see “Methods”). For this particular case, rare haplotype analysis of the pericentromeric region indicates a three-haplotype state, and therefore it is classified as an MI error, with one crossover event identified in the q arm (Fig. 1G). Additional examples are included in Supplementary Fig. 2.

Performance metrics of MeiHMM

To test the accuracy of MeiHMM, we compared the NDJ error stage and crossover event calling results with those from 152 DS proband-parent trios with WGS data (see “Methods”). Overall, MeiHMM classified 76%, 19%, and 5% of the cases as MI, MII, and mitotic errors (Fig. 2A), respectively, of which 146 were consistent with trio analysis (accuracy = 96.1%; Fig. 2A). Misclassifications of the six cases were due to crossovers (Fig. 2B) near the centromere, resulting in exceedingly narrow two-haplotype blocks (Fig. 2C) in the pericentromeric region with a limited number of informative SNPs; consequently, MII errors were misclassified as MI errors (Fig. 2A).

**Fig. 2: Comparison of MeiHMM with trio analysis.**

MeiHMM also accurately identified crossover events (Fig. 2B). Of the 95 crossover events identified by trio analysis, 87 were also detected by MeiHMM (sensitivity = 91.6%) with no false positive crossover events (precision = 100%). Again, the discordance is largely due to the pericentromeric crossovers described above, except for one case with two crossovers too close to each other (Fig. 2B). Two cases analyzed by both MeiHMM and trio analysis are illustrated in Supplementary Fig. 3.

Next, we applied MeiHMM to a large dataset of 764 DS cases with proband-only data genotyped by WGS. MeiHMM identified 75% (n = 571) of the errors as MI origin (no crossover, 47%; with crossover, 28%), 21% of the errors as MII, and 4% as mitotic errors (Fig. 2D). The frequencies of the error types were consistent with those in the cases by trio analysis (Fig. 2A), as well as those reported in a previous study using microsatellite markers¹⁰.

We identified an average of 0.47 crossover events per case in MI errors (range 0–3 per case, total 266 events; Fig. 2E), though this is likely an underestimate of the actual frequency of crossover events during gamete formation, owing to the random selection of sister chromatids at MII after an MI error. In cases with MII errors (n = 165), an average of 1.19 crossover events were identified (range 1–3 per case, total 197 events; Fig. 2F). MI and MII errors were characterized by distinct patterns of crossover position along chr 21. Crossovers in MI errors were more likely to be telomeric, especially in cases with only one crossover (Fig. 2E). In contrast, crossovers in MII errors were preferentially pericentromeric, regardless of the number of crossovers, with a relatively heavier tail towards the telomeric region (Fig. 2F). These results were consistent with previous observations that the positioning of crossovers is associated with the type of error¹⁰.

Applying MeiHMM to SNP arrays

Because MeiHMM is based on genotype data, it can potentially be used for datasets generated by other genotyping platforms. To demonstrate this, we applied MeiHMM to a previously published SNP array dataset of 429 DS cases from a case-control series for congenital atrioventricular septal defects (AVSD)¹⁶. We elected to use only common variants and perform the haplotype-based analysis (Type 2 in Fig. 1B) because rare SNPs are poorly represented on SNP arrays. We observed MI, MII and mitotic errors in this DS-AVSD cohort with frequencies comparable to our DS-ALL dataset (Fig. 2D and Supplementary Fig. 4).

NDJ error is associated with disease phenotypes of DS-related ALL

The ability to infer chr 21 NDJ error using proband-only data enables the investigation of DS-associated health conditions when genotype data of the parents are not available. To demonstrate this, we applied MeiHMM to a cohort of children with DS-associated acute lymphoblastic leukemia (DS-ALL) (n = 509) genotyped using WGS¹⁷. Overall, MI, MII, and mitotic errors accounted for 73% (with crossover, 31%; no crossover, 43%), 22% and 5% of the dataset, respectively, similar to those observed in the general DS population without ALL (n = 764; P = 0.63).

However, there was significant heterogeneity in NDJ error within DS-ALL cases, especially when stratified by the age at ALL onset (P = 0.0026) and by ALL molecular subtype defined by leukemia genomic abnormalities (P = 0.017; Fig. 3A). MII or mitotic errors were over-represented in patients with ALL diagnosed at an age above 10 years (38%) compared to those diagnosed below 10 years (24%; P = 0.0024; Fig. 3A). Across different ALL subtypes, there seemed to be an over-representation of errors of MII origin in patients with ETV6::RUNX1-like, CRLF2 BCR::ABL1-like and C/EBPalt DS-ALL (Fig. 3A). Because MII errors are characterized by more crossover events, we further hypothesized that the specific location of chr 21 recombination (and consequently haplotype diversity of specific loci) may influence DS disease phenotypes. To test this, we compared the frequencies of two- and three-haplotype states in windows of 1 million bp along chr 21 between DS-ALL cases and non-ALL DS cases. Compared to non-ALL DS cases, DS-ALL patients older than 10 years were more likely to have two haplotypes (rather than three haplotypes) in the proximal q arm, especially at q21.1 (Fig. 3B). In contrast, DS-ALL patients younger than 10 years exhibited similar patterns of haplotype diversity relative to that of the general DS population without leukemia (Fig. 3B). In terms of molecular subtype, CRLF2 BCR::ABL1-like DS-ALL cases exhibited significantly more frequent two-haplotype blocks in the region spanning from q21.3 to q22.2, while C/EBPalt DS-ALL cases were more likely to have two-haplotype blocks at q22.12 when compared to non-ALL DS cases (Fig. 3C).

We also observed that the type of trisomy 21 error was associated with treatment outcomes of DS-ALL in terms of both event-free survival (EFS; P = 0.016; Fig. 3D) and overall survival (OS; P = 0.0024; Fig. 3E). After adjusting for known prognostic factors such as age, initial white blood cell count, minimal residual disease at end-of-induction, and ALL molecular subtype, MII error was independently associated with poorer event-free survival (P = 0.035; Supplementary Table 1). Because the ratio of two- vs three-haplotype blocks differs by NDJ error type, we examined the impact of the size of two-haplotype blocks on treatment outcomes. As shown in Supplementary Table 2, the proportion of two-haplotype blocks was linked to decreasing EFS (P = 0.012) and OS (P = 2.6 × 10⁻⁴), and it remained marginally significant after adjusting for NDJ error type (P = 0.038 for OS). These results imply that the proportion of chr 21 with two-haplotype state (thus the degree of homozygosity) may mediate the impact of NDJ error on leukemia treatment outcomes. Taken together, these results suggest that the NDJ error of trisomy 21 could modify specific phenotypes of DS-associated health conditions.

Discussion

The type of NDJ error and crossover events are central to understanding the etiology of DS (and trisomic chromosomal disorders in general). It has been found that the type of NDJ (MI, MII, or mitotic) is closely related to maternal age^12,18, genetic polymorphisms¹¹, lifestyle^19,20, socioeconomic status^21,22, and crossover events¹⁰. An altered crossover pattern is a strong risk factor for chromosomal NDJ^2,8,23; and when NDJ happens, the crossover is associated with the type of error¹⁰. Because MeiHMM only requires data from the DS child, it can greatly expand the number of cases for which NDJ can be inferred and thus enable large association studies to explore the impact of NDJ on DS disease phenotypes. However, the exact biology underlying the association of NDJ error stage with DS disease phenotypes is unclear and warrants further investigation. We posit that duplication of the deleterious allele (in the two-haplotype state, MII) can result in increased severity of disease phenotypes, compared to cases with only one-copy of the risk allele (in the three-haplotype state, MI). In DS-ALL, we observe association of two-haplotype state with higher risk of unfavorable disease subtype CRLF2 BCR::ABL1-like, which is in line with this hypothesis. That said, this hypothesis should be experimentally tested in future studies.

We have demonstrated that MeiHMM can be applied to genotype data generated by WGS or SNP arrays. It should also be straightforward for MeiHMM to accommodate data generated from other sequencing or genotyping platforms, such as RNA sequencing, whole exome sequencing, or even a small, targeted assay designed specifically for NDJ error identification. In addition to its versatility in data types, MeiHMM can also be applied to study the NDJ of other trisomic chromosomal conditions (e.g., trisomy 13 or trisomy 18)²⁴, either prenatally or postnatally. One limitation of MeiHMM is that it cannot detect crossover events involving small genomic regions with few informative SNPs. This is especially problematic for calling short two-haplotype blocks where rare variants are sparse. For example, MII errors are associated with proximal crossovers, which result in short two-haplotype blocks at the pericentromeric region; this led to the misclassification of 17% (n = 6 of 35) of MII errors as MI errors (Fig. 2A). It should be noted that these misclassifications may introduce uncertainty in assessing the association of NDJ error type with DS phenotypes, such as treatment outcomes of ALL. Furthermore, because MeiHMM uses data from the proband only, it does not infer the parental origin of trisomy 21.

The traditional method for NDJ staging relies on dyads or trios, which restricts its utility because current DS genomic datasets are not trio-based. Prior efforts such as LD-PGTA¹⁵ and LD-CHASE¹⁴ are capable of inferring NDJ errors and crossover events in embryos generated through in vitro fertilization, using data from preimplantation genetic testing for aneuploidy. These elegant approaches employ a haplotype-based framework, conceptually related to MeiHMM, even though they rely on sampling read pairs from low-coverage WGS data. In contrast, MeiHMM harnesses genotype information, making it more broadly generalizable across platforms; MeiHMM also makes use of rare variants (Type 1 informative SNPs), which may be beneficial for regions with sparse linkage disequilibrium. MeiHMM could greatly expand the ability to determine the origin of NDJ in any DS case genotyped for chr 21 SNPs at a reasonable density, with potential value for studying the etiology of other congenital aneuploidies.

Methods

Data and genotyping

This study complies with all relevant ethical regulations and was approved by the St. Jude Children’s Research Hospital Institutional Review Board and Baylor College of Medicine Institutional Review Board. All the WGS data used in this study were generated by the Gabriella Miller Kids First Pediatric Research Program. WGS and SNP array data were obtained for a total of 764 and 429 children with non-mosaic DS, respectively. WGS data of both parents were also obtained for a subset of DS children with WGS data, forming 152 trios. In addition, WGS data of DS-associated acute lymphoblastic leukemia cases (DS-ALL, n = 509) were obtained, and subtype classification was performed in conjunction with RNA-Seq data as described in our previous publication¹⁷. Briefly, subtypes including CRLF2, ETV6::RUNX1, IGH::IGF2BP1, TCF3::PBX1, BCR::ABL1, KMT2A, and DUX4 were classified by rearrangement identification (using fusionCatcher version 0.99.7d)²⁵; high hyperdiploid subtype was identified by digital karyotyping (using RNASeqCNV version 1.2.2)²⁶; C/EBPalt subtype was defined by overexpression and/or alterations of CEBPD, CEBPA or CEBPE genes¹⁷; BCR::ABL1-like and ETV6::RUNX1-like subtypes were called by gene expression analysis²⁷; PAX5alt subtype was defined by unique gene expression pattern and presence of PAX5 alterations²⁷; and PAX5 P80R and IKZF1 N159Y were called using variant identification²⁷. DS-ALL patients were enrolled in the following Children’s Oncology Group trials: 9904 (ClinicalTrials.gov identifier NCT00005585), 9905 (NCT00005596), 9906 (NCT00005603), AALL0232 (NCT00075725), AALL0331 (NCT00103285), AALL08B1 (NCT01142427), AALL0932 (NCT01190930), AALL1131 (NCT02883049), AALL1731 (NCT03914625), and APEC14B1 (NCT02402244). All samples were collected with informed consent obtained from patients, parents, and/or guardians, in written and signed form.
WGS was performed using the Kapa Hyperprep library preparation kit (Roche) and sequenced on the NovaSeq platform (Illumina). Coverage of reference and alternative alleles was extracted using the GATK (version 3.5-0) germline short variant discovery pipeline²⁸, based on which triallelic genotype calls were made. SNPs with variant allele frequencies between 0.13 and 0.40 or between 0.60 and 0.87 were deemed to have a configuration of Alt: Ref: Ref or Alt: Alt: Ref, respectively. Only SNPs with coverage of at least 30 reads, and for which the frequency of alternative allele carriers among the cases was within 10% of the expected proportion (estimated using overall population allele frequency from gnomAD²⁹ and ALFA³⁰), were kept for analysis. SNP array data were retrieved from a previously published dataset¹⁶ of SNP 6.0 arrays (Affymetrix). For each SNP on chromosome 21, genotyping was performed by unsupervised hierarchical clustering using average linkage and Euclidean distance based on three features: intensity A, intensity B and the ratio between B and A. Only SNPs with genotype counts broadly following Hardy-Weinberg equilibrium (chi-squared statistic <20) were kept for analysis.
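The thresholding step for triallelic genotype calls can be sketched as follows. This is a minimal illustration, not the study's pipeline: the function name and the inclusive treatment of the interval boundaries are assumptions, and the cohort-level carrier-frequency filter described above is omitted.

```python
from typing import Optional

MIN_COVERAGE = 30  # minimum read depth stated in the text


def call_triallelic_genotype(ref_reads: int, alt_reads: int) -> Optional[str]:
    """Return the heterozygous triallelic configuration, or None if not callable.

    Assumption: interval boundaries (0.13, 0.40, 0.60, 0.87) are inclusive.
    """
    depth = ref_reads + alt_reads
    if depth < MIN_COVERAGE:
        return None  # insufficient coverage
    vaf = alt_reads / depth  # variant allele frequency
    if 0.13 <= vaf <= 0.40:
        return "Alt:Ref:Ref"  # expected VAF near 1/3
    if 0.60 <= vaf <= 0.87:
        return "Alt:Alt:Ref"  # expected VAF near 2/3
    return None  # homozygous-like or ambiguous


print(call_triallelic_genotype(ref_reads=20, alt_reads=10))
print(call_triallelic_genotype(ref_reads=10, alt_reads=20))
```

The two intervals bracket the allele fractions of 1/3 and 2/3 expected for one or two copies of the alternative allele on three chromosomes.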

Type 1 informative SNPs: variants with rare two-copy alleles

On chromosome 21, let $M$ be the number of heterozygous SNPs in a DS proband that are properly genotyped, and let ${{\mathbf{\Omega }}}={({{{\bf{G}}}}_{i})}_{i=1}^{M}$ denote the genotypes of these SNPs, where ${{{\bf{G}}}}_{i}$ is the genotype of the ${i}^{{th}}$ SNP. As there are three copies of chromosome 21, the genotype of a heterozygous SNP always consists of two copies of one allele and a single copy of another allele (hereafter referred to as two-copy allele and one-copy allele, respectively). ${{{\bf{G}}}}_{i}$ can be further denoted as

$${{{\bf{G}}}}_{i}=\left({a}_{i},{b}_{i}\right),\quad {a}_{i},{b}_{i}\in \left\{A,T,C,G\right\}\ {{\rm{and}}}\ {a}_{i}\ne {b}_{i}$$

(1)

where ${a}_{i}$ is the two-copy allele and ${b}_{i}$ is the one-copy allele. As an example, for a heterozygous SNP with genotype ${TTG}$, we have ${a}_{i}=T$ and ${b}_{i}=G$. The position of the ${i}^{{th}}$ heterozygous SNP on chromosome 21 is denoted as ${l}_{i}.$ The genotypes ${({{{\bf{G}}}}_{i})}_{i=1}^{M}$ are ordered according to their positions, i.e., ${l}_{i} < {l}_{j},\forall {i} < j$.

Let ${{\rm{AF}}}(l,m,g)$ denote the allele frequency of a SNP on chromosome 21 at position $l$, with $m\in \{{{\rm{eur}}},{{\rm{afr}}},{{\rm{amr}}},{{\rm{eas}}},{{\rm{sas}}}\}$ representing European, African, American, East Asian, and South Asian genetic ancestries, and $g\in \left\{A,T,C,G\right\}$ representing the allele. We define a variant as having rare two-copy allele status if

$${\max }_{k}{{\rm{AF}}}\left({l}_{i},{m}_{k},{a}_{i}\right) < 0.003,$$

(2)

where ${\{{m}_{k}\}}_{k}$ are the ancestries that account for more than 25% in the genetic ancestry admixture of the subject.

In inequality (2), the allele frequencies were obtained from gnomAD²⁹, or from ALFA³⁰ if the SNP is not reported in gnomAD. The genetic ancestry admixture fractions were estimated by using iAdmix³¹.
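Inequality (2) can be illustrated with a short sketch. The function name and the dictionary-based inputs are hypothetical; in the study the allele frequencies come from gnomAD or ALFA and the admixture fractions from iAdmix.

```python
from typing import Dict

RARE_AF_THRESHOLD = 0.003      # threshold in inequality (2)
MIN_ANCESTRY_FRACTION = 0.25   # ancestries above this admixture fraction are considered


def is_rare_two_copy_allele(af_by_ancestry: Dict[str, float],
                            admixture: Dict[str, float]) -> bool:
    """Check inequality (2) for the two-copy allele of one heterozygous SNP.

    af_by_ancestry: ancestry label -> frequency of the two-copy allele.
    admixture: ancestry label -> estimated admixture fraction of the subject.
    """
    relevant = [m for m, frac in admixture.items() if frac > MIN_ANCESTRY_FRACTION]
    if not relevant:
        return False  # assumption: no qualifying ancestry means no call
    return max(af_by_ancestry[m] for m in relevant) < RARE_AF_THRESHOLD


admixture = {"eur": 0.7, "afr": 0.3, "eas": 0.0}
print(is_rare_two_copy_allele({"eur": 0.001, "afr": 0.002, "eas": 0.2}, admixture))
```

In this example the allele is common in the East Asian reference but is still treated as rare, because that ancestry contributes less than 25% of the subject's admixture.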

In genomic ranges with two haplotypes, as the third allele is a duplicate, rare alleles are as likely to be duplicated as the common alleles. In contrast, when there are three independent haplotypes, a rare allele is unlikely to appear twice, making rare two-copy alleles more likely to appear in two-haplotype blocks (Fig. 1B).
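This argument can be made concrete with a short calculation, under the simplifying assumptions (ours, for illustration) that haplotypes are sampled independently and that, in the two-haplotype state, either haplotype is equally likely to be the duplicated one. For an alternative allele with population frequency $p$, the probability of observing the Alt: Alt: Ref configuration is

$$P\left({\rm{Alt{:}Alt{:}Ref}}\mid 3{\rm{H}}\right)=3{p}^{2}\left(1-p\right),\qquad P\left({\rm{Alt{:}Alt{:}Ref}}\mid 2{\rm{H}}\right)=2p\left(1-p\right)\times \frac{1}{2}=p\left(1-p\right),$$

so that

$$\frac{P\left({\rm{Alt{:}Alt{:}Ref}}\mid 3{\rm{H}}\right)}{P\left({\rm{Alt{:}Alt{:}Ref}}\mid 2{\rm{H}}\right)}=3p < 0.009\quad {\rm{for}}\ p < 0.003.$$

Under these assumptions, a rare two-copy allele is more than 100 times as likely to arise in a two-haplotype block as in a three-haplotype block.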

Type 2 informative SNPs: variants in rare hypothetical haplotypes

When the three copies of chromosome 21 originate from only two haplotypes (i.e., two of the three chromosomes are identical), then the two distinct haplotypes on heterozygous SNPs can be represented as ${({a}_{i})}_{i=1}^{M}$ and ${({b}_{i})}_{i=1}^{M}$, where the former is formed by all the two-copy alleles, and the latter is formed by all the one-copy alleles. For the locus of the ${i}^{{th}}$ SNP, we can generate the hypothetical haplotypes formed by itself and its two immediately adjacent SNPs (a total of 3 loci), which can be represented as $({a}_{i-1},{a}_{i},{a}_{i+1})$ and $({b}_{i-1},{b}_{i},{b}_{i+1})$. For genomic ranges with three haplotypes, this procedure shuffles and recombines the alleles, creating arbitrarily rare haplotypes that do not appear in the general population (Fig. 1B). The frequency of the hypothetical haplotype $({a}_{i-1},{a}_{i},{a}_{i+1})$ in the population can be calculated as $F({a}_{i-1},{a}_{i},{a}_{i+1})/N$, where $F({a}_{i-1},{a}_{i},{a}_{i+1})$ and $N$ represent the number of occurrences of $({a}_{i-1},{a}_{i},{a}_{i+1})$ and the total number of haplotypes reported in the 1000 Genomes Project³², respectively. The expected frequency of the hypothetical haplotype $({a}_{i-1},{a}_{i},{a}_{i+1})$, under the assumption of no linkage disequilibrium, can be calculated as ${{\rm{AF}}}({l}_{i-1},{a}_{i-1})\times {{\rm{AF}}}({l}_{i},{a}_{i})\times {{\rm{AF}}}({l}_{i+1},{a}_{i+1})$. We score $({a}_{i-1},{a}_{i},{a}_{i+1})$ as the ratio between observed and expected frequencies (O/E ratio):

$${S}_{{ai}}=\frac{\frac{F\left({a}_{i-1},{a}_{i},{a}_{i+1}\right)}{N}}{{{\rm{AF}}}\left({l}_{i-1},{a}_{i-1}\right)\times {{\rm{AF}}}\left({l}_{i},{a}_{i}\right)\times {{\rm{AF}}}\left({l}_{i+1},{a}_{i+1}\right)}$$

(3)

Note that, unlike in inequality (2), the allele frequency ${{\rm{AF}}}({l}_{i},{a}_{i})$ here was obtained from the 1000 Genomes Project (n = 3202 individuals, retrieved on April 4th, 2022). Because of the small number of available haplotypes in this database, the genetic ancestries were ignored when calculating the allele frequency for haplotype analysis. The hypothetical haplotype formed by one-copy alleles $({b}_{i-1},{b}_{i},{b}_{i+1})$ can be scored similarly, and a SNP is defined to be in a rare hypothetical haplotype when

$$\min \left({S}_{{ai}},{S}_{{bi}}\right) < 0.1$$

(4)

The steps involved in identifying rare two-copy alleles and rare hypothetical haplotypes are summarized in Fig. 1C.
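A minimal sketch of the O/E score in Eq. (3) and the rare-haplotype test in inequality (4) is given below. The reference panel is represented as a plain list of three-allele tuples at the three loci of interest, which is a simplification; the function names are hypothetical, and treating an undefined ratio (an allele absent from the panel) as 0 is an assumption made for illustration.

```python
from typing import Sequence, Tuple

Haplotype = Tuple[str, str, str]


def oe_ratio(hap: Haplotype, panel: Sequence[Haplotype]) -> float:
    """Observed/expected frequency of a 3-SNP haplotype in a reference panel (Eq. 3)."""
    n = len(panel)
    observed = sum(1 for h in panel if h == hap) / n
    expected = 1.0
    for pos, allele in enumerate(hap):
        # Per-locus allele frequency estimated from the same panel.
        expected *= sum(1 for h in panel if h[pos] == allele) / n
    if expected == 0.0:
        return 0.0  # assumption: allele absent from the panel counts as rare
    return observed / expected


def in_rare_hypothetical_haplotype(two_copy: Haplotype, one_copy: Haplotype,
                                   panel: Sequence[Haplotype],
                                   threshold: float = 0.1) -> bool:
    """Inequality (4): min(S_a, S_b) < threshold."""
    return min(oe_ratio(two_copy, panel), oe_ratio(one_copy, panel)) < threshold


# Toy panel with two haplotypes in perfect linkage disequilibrium.
panel = [("A", "C", "G")] * 5 + [("T", "G", "A")] * 5
print(oe_ratio(("A", "C", "G"), panel))  # haplotype present in the panel
print(oe_ratio(("A", "G", "G"), panel))  # recombined haplotype, never observed
```

In the toy panel, a hypothetical haplotype that mixes alleles from the two real haplotypes is never observed, so its O/E ratio falls below the 0.1 threshold, which is the pattern expected in a three-haplotype block.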

Hidden Markov models

For each DS case, we constructed a hidden Markov model (Fig. 1F) on the set of heterozygous SNPs $\Omega$, with the components defined as follows:

1. Vector of hidden states,

$${{\bf{Q}}}={\left({q}_{i}\right)}_{i=2}^{M-1},\,{q}_{i}\in \left\{2{{\rm{H}}},\, 3{{\rm{H}}}\right\},$$

(5)

where $2{{\rm{H}}}$ and $3{{\rm{H}}}$ indicate two-haplotype and three-haplotype states, respectively.

2. Vector of observations,

$${{\bf{O}}}={\left({o}_{i}\right)}_{i=2}^{M-1},{o}_{i}=\left\{\begin{array}{cc}{{\rm{T}}}1,\hfill & {{\rm{if}}}\left(2\right)\,{{\rm{is}}}\; {{\rm{true}}},\hfill \\ {{\rm{T}}}2,\hfill & {{\rm{if}}}\left(4\right)\,{{\rm{is}}}\; {{\rm{true}}},\hfill \\ {{\rm{Others}}},& {{\rm{otherwise}}},\hfill \end{array}\right.$$

(6)

where ${{\rm{T}}}1$ and ${{\rm{T}}}2$ stand for type 1 and type 2 informative SNPs, respectively. For cases genotyped by SNP arrays, only rare hypothetical haplotypes were used as only common SNPs are covered by the platform.

3. Initial, state transition, and emission probabilities. To define these, we first randomly selected 40 cases for manual inspection of two- vs three-haplotype state, most of which were visually apparent (e.g., based on AF of two-copy alleles and O/E ratio of hypothetical haplotype, Fig. 1G). For the two-haplotype state, the initial probability was set as the proportion of samples with the first variant along the chromosome being assigned to the two-haplotype state; the transition probabilities were set as the proportion of variants followed by a state transition (a crossover event); for the emission probabilities, we used the proportion of each type of variant being assigned to the two-haplotype state. This was then repeated to set the parameters for the three-haplotype state.

The most likely vector of hidden states was estimated by using the Viterbi algorithm³³.
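The decoding step can be sketched with a small log-space Viterbi implementation over the two hidden states and three observation symbols defined above. The probabilities below are placeholders chosen only for illustration; they are not the parameters estimated from the 40 manually inspected cases, which are not reported here.

```python
import math
from typing import Dict, List

STATES = ("2H", "3H")

# Placeholder parameters (hypothetical, NOT the values used by MeiHMM).
INIT = {"2H": 0.3, "3H": 0.7}
TRANS = {"2H": {"2H": 0.999, "3H": 0.001},
         "3H": {"2H": 0.001, "3H": 0.999}}
EMIT = {"2H": {"T1": 0.05, "T2": 0.01, "Others": 0.94},
        "3H": {"T1": 0.001, "T2": 0.10, "Others": 0.899}}


def viterbi(observations: List[str]) -> List[str]:
    """Return the most likely hidden-state path for a sequence of T1/T2/Others."""
    if not observations:
        return []
    score = {s: math.log(INIT[s]) + math.log(EMIT[s][observations[0]]) for s in STATES}
    backpointers: List[Dict[str, str]] = []
    for obs in observations[1:]:
        new_score, pointer = {}, {}
        for s in STATES:
            # Best predecessor state for landing in state s at this SNP.
            best_prev = max(STATES, key=lambda p: score[p] + math.log(TRANS[p][s]))
            new_score[s] = (score[best_prev] + math.log(TRANS[best_prev][s])
                            + math.log(EMIT[s][obs]))
            pointer[s] = best_prev
        score = new_score
        backpointers.append(pointer)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointer in reversed(backpointers):
        state = pointer[state]
        path.append(state)
    path.reverse()
    return path


obs = ["T1"] * 5 + ["Others"] * 3 + ["T2"] * 5
print(viterbi(obs))
```

Working in log space avoids numerical underflow on chromosome-length sequences, and the low transition probability is what smooths isolated informative SNPs into contiguous blocks.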

Simulation

We estimated the distributions of (i) the allele frequency of the two-copy allele and (ii) the score of the 3-SNP hypothetical haplotypes, under the assumption that the triallelic chromosome 21 has either two or three haplotypes, by sampling random genotypes/haplotypes from reference databases. The simulation was done by repeating the following procedure 100 times:

i. Allele frequencies of two-copy alleles
  a. Three haplotypes: for each SNP reported in the reference database (gnomAD or ALFA), three independent alleles were sampled based on allele frequencies in the reference, and the allele frequency of the two-copy allele was recorded if the genotype was heterozygous.
  b. Two haplotypes: for each SNP, two independent alleles were sampled based on allele frequencies in the reference, and one was doubled at equal probability to make three alleles. If the final genotype was heterozygous, the allele frequency of the two-copy allele was recorded.
ii. O/E ratio of hypothetical haplotypes
  a. Three haplotypes: three complete haplotypes of chromosome 21 were sampled from the 1000 Genomes dataset, and the O/E ratios for all hypothetical haplotypes formed by three adjacent SNPs were recorded.
  b. Two haplotypes: similarly, two complete haplotypes of chromosome 21 were sampled. One of the haplotypes was doubled to make a total of three haplotypes, and the O/E ratios for all the hypothetical haplotypes formed by three adjacent SNPs were recorded.
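Step (i) of the simulation can be sketched in Python as follows. This is an illustrative reading of the procedure, not the authors' code: the function names are hypothetical, each SNP is represented only by its alternate-allele frequency `p` from the reference, and alleles are drawn as independent Bernoulli samples.

```python
import random


def two_copy_af_three_haplotypes(alt_freqs, rng):
    """Sample three independent alleles per SNP; for heterozygous genotypes,
    record the population frequency of the allele present in two copies."""
    recorded = []
    for p in alt_freqs:
        alt_count = sum(rng.random() < p for _ in range(3))
        if alt_count in (1, 2):  # heterozygous: the alleles are not all identical
            recorded.append(p if alt_count == 2 else 1.0 - p)
    return recorded


def two_copy_af_two_haplotypes(alt_freqs, rng):
    """Sample two independent alleles per SNP and double one of them at equal
    probability; for heterozygous genotypes, record the frequency of the
    doubled allele."""
    recorded = []
    for p in alt_freqs:
        alleles = [rng.random() < p for _ in range(2)]  # True = alternate allele
        if alleles[0] != alleles[1]:
            doubled_is_alt = rng.choice(alleles)
            recorded.append(p if doubled_is_alt else 1.0 - p)
    return recorded
```

Under these sampling assumptions, the two-copy allele in the three-haplotype case tends to be the more common allele, whereas in the two-haplotype case either allele is doubled with equal probability, so the two recorded distributions differ.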

Trio analysis and performance evaluation metrics

When genotyping data of the DS child and both parents were available, the stage of NDJ error and the crossover events were determined as described previously¹¹. Briefly, SNPs that were homozygous non-reference in one parent but homozygous reference in the other were used as markers to infer the parent of origin of trisomy 21. In the DS child, these SNPs were heterozygous, and the two-copy alleles were from the parent of origin of the extra copy of chromosome 21. After the parent of origin was determined, SNPs that were heterozygous in the parent of origin but homozygous reference in the other parent were used to segment chromosome 21 into two-haplotype or three-haplotype blocks. For such SNPs, the non-reference allele could be doubled or absent when there were two haplotypes in the DS child. When there were three haplotypes, it was always the reference allele that was doubled. At these SNPs, whether the non-reference allele was doubled was used as the observation in HMMs to segment chromosome 21 into blocks of two or three haplotypes.
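The parent-of-origin step can be sketched as below. This is a simplified illustration rather than the published procedure: genotypes are encoded as counts of the non-reference allele (0 to 2 in parents, 0 to 3 in the trisomic child), the function name is hypothetical, and a simple majority vote across informative markers is assumed to absorb occasional genotyping errors.

```python
from collections import Counter


def infer_parent_of_origin(markers):
    """markers: iterable of (mother_alt, father_alt, child_alt) allele counts."""
    votes = Counter()
    for mother, father, child in markers:
        # Informative markers: one parent homozygous non-reference (2), the
        # other homozygous reference (0), and a heterozygous trisomic child.
        if {mother, father} != {0, 2} or child not in (1, 2):
            continue
        hom_alt_parent = "mother" if mother == 2 else "father"
        hom_ref_parent = "father" if mother == 2 else "mother"
        # The allele carried in two copies comes from the parent of origin.
        votes[hom_alt_parent if child == 2 else hom_ref_parent] += 1
    return votes.most_common(1)[0][0] if votes else None
```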

The performance of MeiHMM in NDJ error classification and crossover event identification was evaluated by comparison with the results of trio analysis on the same set of DS cases (n = 152). For NDJ error classification, accuracy was used to measure the overall performance. It is defined as

$$\mathrm{Accuracy}=\frac{\text{number of cases with results consistent with trio analysis}}{\text{total number of cases}}$$

(7)

For crossover event identification, let ${{\rm{TP}}}$, ${{\rm{FP}}}$ and ${{\rm{FN}}}$ denote the numbers of true positives, false positives, and false negatives, respectively. Precision, defined as ${{\rm{precision}}}={{\rm{TP}}}/({{\rm{TP}}}+{{\rm{FP}}})$, and sensitivity, defined as ${{\rm{sensitivity}}}={{\rm{TP}}}/({{\rm{TP}}}+{{\rm{FN}}})$, were used as the performance evaluation metrics.
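The metrics defined above translate directly into code. The sketch below is illustrative only; the function names are hypothetical, and the rule for matching a MeiHMM crossover to a trio-derived crossover (which determines TP, FP, and FN) is not specified here and is left to the caller.

```python
def accuracy(predicted, reference):
    """Fraction of cases whose NDJ classification agrees with trio analysis."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    consistent = sum(p == r for p, r in zip(predicted, reference))
    return consistent / len(reference)


def crossover_metrics(tp, fp, fn):
    """Return (precision, sensitivity) from crossover-level counts."""
    precision = tp / (tp + fp) if (tp + fp) else float("nan")
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    return precision, sensitivity
```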

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The WGS data of the DS cases (both ALL and non-ALL) and the parents were obtained from the Gabriella Miller Kids First Pediatric Research Program and the NIH INvestigation of Co-occurring conditions across the Lifespan to Understand Down syndromE (INCLUDE) Project, and are available through dbGaP with accession number phs002330.v2.p1 [https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs002330.v2.p1]. Because the raw sequencing data contain personal genomic and genetic information, access to these data is restricted for the advancement of research only. Access is requested through the dbGaP website by applying to the data access committee, and the expected timeframe of response depends on the complexity of each application. Access is granted annually with the option of renewal. SNP array data were obtained from a previously published study and are publicly available through Gene Expression Omnibus with accession number GSE60607. Source data are provided with this paper.

Code availability

The code for MeiHMM is available at https://github.com/jjyanglab/MeiHMM³⁴.

References

1. Bull, M. J. Down syndrome. N. Engl. J. Med. 382, 2344–2352 (2020).
2. Sherman, S. L. et al. Trisomy-21 - association between reduced recombination and nondisjunction. Am. J. Hum. Genet. 49, 608–620 (1991).
3. Antonarakis, S. E. et al. The meiotic stage of nondisjunction in trisomy-21 - determination by using DNA polymorphisms. Am. J. Hum. Genet. 50, 544–550 (1992).
4. Antonarakis, S. E., Avramopoulos, D., Blouin, J. L., Talbot, C. C. & Schinzel, A. A. Mitotic errors in somatic-cells cause trisomy-21 in about 4.5-percent of cases and are not associated with advanced maternal age. Nat. Genet. 3, 146–150 (1993).
5. Antonarakis, S. E. Parental origin of the extra chromosome in trisomy 21 as indicated by analysis of DNA polymorphisms. Down Syndrome Collaborative Group. N. Engl. J. Med. 324, 872–876 (1991).
6. Garcia-Cruz, R. et al. Dynamics of cohesin proteins REC8, STAG3, SMC1 beta and SMC3 are consistent with a role in sister chromatid cohesion during meiosis in human oocytes. Hum. Reprod. 25, 2316–2327 (2010).
7. Wolstenholme, J. & Angell, R. R. Maternal age and trisomy-a unifying mechanism of formation. Chromosoma 109, 435–438 (2000).
8. Ottolini, C. S. et al. Genome-wide maps of recombination and chromosome segregation in human oocytes and embryos show selection for maternal recombination rates. Nat. Genet. 47, 727–735 (2015).
9. Bell, A. D. et al. Insights into variation in meiosis from 31,228 human sperm genomes. Nature 583, 259 (2020).
10. Lamb, N. E. et al. Susceptible chiasmate configurations of chromosome 21 predispose to non-disjunction in both maternal meiosis I and meiosis II. Nat. Genet. 14, 400–405 (1996).
11. Chernus, J. M. et al. A candidate gene analysis and GWAS for genes associated with maternal nondisjunction of chromosome 21. PLoS Genet. 15, https://doi.org/10.1371/journal.pgen.1008414 (2019).
12. Lamb, N. E., Yu, K., Shaffer, J., Feingold, E. & Sherman, S. L. Association between maternal age and meiotic recombination for trisomy 21. Am. J. Hum. Genet. 76, 91–99 (2005).
13. Oliver, T. R. et al. New insights into human nondisjunction of chromosome 21 in oocytes. PLoS Genet. 4, https://doi.org/10.1371/journal.pgen.1000033 (2008).
14. Ariad, D. et al. Aberrant landscapes of maternal meiotic crossovers contribute to aneuploidies in human embryos. Genome Res. 34, 70–84 (2024).
15. Ariad, D. et al. Haplotype-aware inference of human chromosome abnormalities. Proc. Natl. Acad. Sci. USA 118, https://doi.org/10.1073/pnas.2109307118 (2021).
16. Ramachandran, D. et al. Contribution of copy-number variation to Down syndrome-associated atrioventricular septal defects. Genet. Med. 17, 554–560 (2015).
17. Li, Z. et al. Genomic landscape of Down syndrome-associated acute lymphoblastic leukemia. Blood https://doi.org/10.1182/blood.2023019765 (2023).
18. Allen, E. G. et al. Maternal age and risk for trisomy 21 assessed by the origin of chromosome nondisjunction: a report from the Atlanta and National Down Syndrome Projects. Hum. Genet. 125, 41–52 (2009).
19. Yang, Q. H. et al. Risk factors for trisomy 21: maternal cigarette smoking and oral contraceptive use in a population based case control study. Genet. Med. 1, 80–88 (1999).
20. Keen, C. et al. The association between maternal occupation and Down syndrome: a report from the National Down Syndrome Project. Int. J. Hyg. Envir. Heal 223, 207–213 (2020).
21. Christianson, R. E., Sherman, S. L. & Torfs, C. P. Maternal meiosis II nondisjunction in trisomy 21 is associated with maternal low socioeconomic status. Genet. Med. 6, 487–494 (2004).
22. Hunter, J. E. et al. The association of low socioeconomic status and the risk of having a child with Down syndrome: a report from the National Down Syndrome Project. Genet. Med. 15, 698–705 (2013).
23. Warren, A. C. et al. Evidence for reduced recombination on the nondisjoined chromosomes-21 in Down syndrome. Science 237, 652–654 (1987).
24. Carey, J. C. Trisomy 18 and trisomy 13 syndromes. In Cassidy and Allanson's Management of Genetic Syndromes (eds Carey, J. C. et al.) 737–956 (Wiley-Blackwell, 2020).
25. Nicorici, D. et al. FusionCatcher – a tool for finding somatic fusion genes in paired-end RNA-sequencing data. Preprint at bioRxiv https://doi.org/10.1101/011650 (2014).
26. Barinka, J. et al. RNAseqCNV: analysis of large-scale copy number variations from RNA-seq data. Leukemia 36, 1492–1498 (2022).
27. Gu, Z. et al. PAX5-driven subtypes of B-progenitor acute lymphoblastic leukemia. Nat. Genet. 51, 296–307 (2019).
28. Poplin, R. et al. Scaling accurate genetic variant discovery to tens of thousands of samples. Preprint at bioRxiv https://doi.org/10.1101/201178 (2018).
29. Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).
30. Phan, L. et al. ALFA: allele frequency aggregator. www.ncbi.nlm.nih.gov/snp/docs/gsr/alfa/ (2020).
31. Bansal, V. & Libiger, O. Fast individual ancestry inference from DNA sequence data leveraging allele frequencies for multiple populations. BMC Bioinforma. 16, 4 (2015).
32. The 1000 Genomes Project Consortium. A global reference for human genetic variation. Nature 526, 68–74 (2015).
33. Rabiner, L. R. A tutorial on hidden Markov-models and selected applications in speech recognition. Proc. IEEE 77, 257–286 (1989).
34. Li, Z. et al. Inferring chromosome segregation error stage and crossover in trisomic disorders with application to Down syndrome. Github https://doi.org/10.5281/zenodo.15632851 (2025).

Acknowledgements

We would like to thank the patients and families who participated in this study for donating specimens for research, and the clinicians and research staff for assistance in sample collection, processing, and curation. We also thank Jeremy Hunt, Ting-Nien Lin, and Pam McGill from the Department of Pharmacy and Pharmaceutical Sciences at St. Jude for technical support and Dr. Cherise Guess from the Department of Scientific Editing at St. Jude for proofreading. This work was in part supported by the National Institutes of Health (X01HL145686-01 [P.J.L., K.R.R.], R01CA249867 [K.R.R., P.J.L.], P30CA125123-14S4 [K.R.R.], 1R03HD103908-01 [J.J.Y.], U10CA98543 [M.L.L.], U10CA180886 [M.L.L.], U10CA98413 [M.L.L.], U10CA180899 [M.L.L.], U24CA114766 [M.L.L.] and U24CA196173 [M.L.L.]), Department of Defense (W81XWH-20-1-0567 [P.J.L.]), the Lynch family (K.R.R.), and the American Lebanese Syrian Associated Charities (J.J.Y.).

Author information

These authors contributed equally: Zhenhua Li, Wenjian Yang, Gang Wu.

Authors and Affiliations

Department of Pharmacy and Pharmaceutical Sciences, St. Jude Children’s Research Hospital, Memphis, TN, USA
Zhenhua Li, Wenjian Yang & Jun J. Yang
Center for Applied Bioinformatics, St. Jude Children’s Research Hospital, Memphis, TN, USA
Gang Wu, Ti-Cheng Chang & Zhongshan Cheng
Global Pediatric Medicine, St. Jude Children’s Research Hospital, Memphis, TN, USA
Meenakshi Devidas
Department of Pathobiology and Laboratory Medicine, University of Toronto, Toronto, ON, Canada
Mary Shago
Department of Genetics, University of Alabama at Birmingham, Birmingham, AL, USA
Andrew J. Carroll
The Ohio State University, Columbus, OH, USA
Nyla A. Heerema & Julie M. Gastier-Foster
Department of Pediatrics, Baylor College of Medicine, Houston, TX, USA
Julie M. Gastier-Foster, Lauren Sanclemente, Karen R. Rabin & Philip J. Lupo
Department of Pathology & Immunology, Baylor College of Medicine, Houston, TX, USA
Julie M. Gastier-Foster
Department of Pathology and Laboratory Medicine, Children’s Hospital Los Angeles, University of Southern California, Los Angeles, CA, USA
Brent L. Wood
Department of Pediatrics and Perlmutter Cancer Center, New York University Langone Medical Center, New York, NY, USA
Elizabeth A. Raetz
Department of Pediatrics and the Center for Childhood Cancer Research, Children’s Hospital of Philadelphia, Philadelphia, PA, USA
Stephen P. Hunger
The Perelman School of Medicine at the University of Pennsylvania, Philadelphia, PA, USA
Stephen P. Hunger
Ben Towne Center for Childhood Cancer and Blood Disorders Research, Seattle Children’s Research Institute and the Department of Pediatrics, Seattle Children’s Hospital, Fred Hutch Cancer Center, University of Washington, Seattle, WA, USA
Mignon L. Loh
College of Science, Oregon State University, Corvallis, OR, USA
Eleanor Feingold
Department of Human Genetics, Emory University, Atlanta, GA, USA
Tracie C. Rosser, Emily G. Allen & Stephanie L. Sherman

Contributions

G.W., S.L.S., K.R.R., P.J.L. and J.J.Y. conceived the study. Z.L., W.Y., G.W., T.-C.C., Z.C., M.D., L.S., E.F., T.C.R., E.G.A., S.L.S. and J.J.Y. performed data analysis. M.S., A.J.C., N.A.H., J.M.G.-F., B.L.W., E.A.R., S.P.H., M.L.L., E.F., T.C.R., E.G.A., S.L.S., K.R.R. and P.J.L. provisioned study materials. All authors wrote, reviewed, and approved the manuscript.

Corresponding authors

Correspondence to Zhenhua Li or Jun J. Yang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Reporting Summary

Transparent Peer Review file

Source data

Source Data

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.

About this article

Cite this article

Li, Z., Yang, W., Wu, G. et al. Inferring chromosome segregation error stage and crossover in trisomic disorders with application to Down syndrome. Nat Commun 16, 6316 (2025). https://doi.org/10.1038/s41467-025-61413-w

Received: 11 December 2024
Accepted: 22 June 2025
Published: 09 July 2025
Version of record: 09 July 2025
DOI: https://doi.org/10.1038/s41467-025-61413-w