Sequence diversity lost in early pregnancy

Arnadottir, Gudny A.; Jonsson, Hakon; Hartwig, Tanja Schlaikjær; Gruhn, Jennifer R.; Møller, Peter Loof; Gylfason, Arnaldur; Westergaard, David; Chan, Andrew Chi-Ho; Oddsson, Asmundur; Stefansdottir, Lilja; Roux, Louise le; Steinthorsdottir, Valgerdur; Swerford Moore, Kristjan H.; Olafsson, Sigurgeir; Olason, Pall I.; Eggertsson, Hannes P.; Halldórsson, Gísli H.; Walters, G. Bragi; Stefansson, Hreinn; Gudjonsson, Sigurjon A.; Palsson, Gunnar; Jensson, Brynjar O.; Fridriksdottir, Run; Petersen, Jesper Friis; Helgason, Agnar; Norddahl, Gudmundur L.; Rohde, Palle Duun; Saemundsdottir, Jona; Magnusson, Olafur Th.; Halldorsson, Bjarni V.; Bliddal, Sofie; Banasik, Karina; Gudbjartsson, Daniel F.; Nyegaard, Mette; Sulem, Patrick; Thorsteinsdottir, Unnur; Hoffmann, Eva R.; Nielsen, Henriette Svarre; Stefansson, Kari

doi:10.1038/s41586-025-09031-w

Article
Open access
Published: 21 May 2025

Nature volume 642, pages 672–681 (2025)

Abstract

Every generation, the human genome is shuffled during meiosis and a single fertilized egg gives rise to all of the cells of the body¹. Meiotic errors leading to chromosomal abnormalities are known causes of pregnancy loss^2,3, but genetic aetiologies of euploid pregnancy loss remain largely unexplained⁴. Here we characterize sequence diversity in early pregnancy loss through whole-genome sequencing of 1,007 fetal samples and 934 parental samples from 467 trios affected by pregnancy loss (fetus, mother and father). Sequenced parental genomes enabled us to determine both the parental and meiotic origins of chromosomal abnormalities, detected in half of our set. It further enabled us to assess de novo mutations on both homologous chromosomes from parents transmitting extra chromosomes, and date them, revealing that 6.6% of maternal mutations occurred before sister chromatid formation in fetal oocytes. We find a similar number of de novo mutations in the trios affected by pregnancy loss as in 9,651 adult trios, but three times the number of pathogenic small (<50 bp) sequence variant genotypes in the loss cases compared with adults. Overall, our findings indicate that around 1 in 136 pregnancies is lost due to a pathogenic small sequence variant genotype in the fetus. Our results highlight the vast sequence diversity that is lost in early pregnancy.

Main

Germline mutations are transmitted from parents to their offspring⁵. They occur before the formation of a fertilized egg, are found in all cells of the offspring and can be assessed by sampling somatic tissues. Considerable knowledge exists on germline mutations that appear de novo in offspring (DNMs). The majority of DNMs are of paternal origin, although there are genomic regions within which parental contributions are almost equal⁶. DNMs contribute to disease risk⁷, their frequency correlates with parental age at conception^6,7 and they cluster at areas near recombination events^8,9. However, little is known about DNMs that prevent a fertilized egg from developing to term and it is unclear why some sites, or genes, lack sequence variants^10,11 and why some variants are never observed in homozygous state¹².

Recombination shuffles the genome between generations, enabling selection of certain haplotypes over others¹³. Although this provides an evolutionary advantage, the recombination process comes at a cost, as it is mutagenic^8,9. Recombination failure during meiosis can also result in aneuploidies, that is, extra (trisomies) or missing (monosomies) chromosomes^1,14. A quarter of all pregnancies end in pregnancy loss¹⁵ and approximately half of all losses are explained by single-chromosome aneuploidies or an extra set of chromosomes (triploidies)¹⁴. We previously showed that a sequence variant in SYCE2 associates with recombination patterns and pregnancy loss¹⁶, with a more pronounced effect on longer chromosomes. Aneuploidies of larger chromosomes are rarer in pregnancy loss than in blastocysts from in vitro fertilization³. Overall, this indicates that there is considerable structural variation present in early pregnancy loss that is only accessible by directly sampling products of conception that are not carried to term.

Finding the genetic causes of chromosomally normal (euploid) pregnancy loss has also proved challenging. Previous genetic studies on euploid pregnancy loss have mainly focused on late losses or on pregnancies complicated by structural anomalies^17,18, while studies involving earlier losses have been limited by inaccessibility of fetal material and maternal cell contamination¹⁹. The role of small sequence variants (SSVs; small indels and single-nucleotide variants (SNVs)) and copy-number variants (CNVs) has therefore not been directly assessed at scale in early pregnancy loss.

Despite the hefty societal cost, early pregnancy loss is under-researched and there is a paucity of therapeutic interventions¹⁵. The Copenhagen Pregnancy Loss (COPL) study was initiated to improve the understanding of pregnancy loss by applying large-scale recruitment of consecutive patients with clinically diagnosed pregnancy loss, detailed phenotyping and comprehensive sampling of fetal and parental material. Here we present the results from our genomic study of 467 trios affected by pregnancy loss (fetus, mother and father), including 1,007 fetal samples and 934 parental samples (Fig. 1a). This enabled us to document the sequence diversity that is present only in early pregnancy loss and describe the complex interplay between meiotic recombination and point mutations.

**Fig. 1: Sequencing overview and fetal presence in pregnancy loss samples.**

The COPL cohort

Most of the pregnancy loss cases occurred in the first 12 weeks of gestation, as measured by time since last menstruation (Fig. 1b). If possible, multiple fetal samples were collected from each pregnancy loss. We sequenced DNA from 1,439 fetal samples from 664 loss cases, along with 1,111 parental samples of predominantly European background (83%), to an average depth of 36.5× (Fig. 1a and Supplementary Figs. 1 and 2). For 467 pregnancy loss cases, we sequenced at least one fetal sample and samples from both parents, with 2.2 samples sequenced on average per fetus (1,007 samples). Collected tissue was dissected and categorized as chorionic villi, fetal and unknown. Chorionic villus samples were the most common (52.4%; Fig. 1b) followed by tissues of unknown (28.1%) and fetal (19.5%) origin.

Tissue samples from pregnancy loss are often maternally contaminated, especially in samples from early loss and in those of unknown origin^19,20,21. We found that the paternal fraction (Supplementary Note 1) was less than 25% for 275 out of 1,007 fetal samples, indicating substantial maternal contamination in the cohort (Fig. 1c), comparable to previous studies¹⁹.

Triploidies and aneuploidies

Oversharing of alleles between the fetus and mother can indicate either maternal cell contamination or a maternally transmitted triploidy. Triploidies are life-limiting, and known causes of pregnancy loss, particularly in the first trimester^2,19,21,22. We found 59 pregnancy loss cases in which at least one fetal sample had higher than expected kinship with the mother, and 11 with the father (Fig. 1d). Maternally transmitted triploidy would present as alternating regions of both homologous chromosomes and of sister chromatids from the mother in the fetal genome (Fig. 2). To discern between maternal oversharing due to contamination and oversharing due to triploidy, we searched these losses for chromosomal crossovers using sequence variants specific to one of the parents (homologous-sister informative alleles; HS alleles; Fig. 2 and Methods). In 19 out of 59 pregnancy loss cases with maternal oversharing, we found numerous crossovers (>20) between sister chromatid state and homologous chromosome state due to meiotic recombination (Figs. 1d and 2). We also found numerous crossovers for all 11 pregnancy loss cases with paternal oversharing. The combined incidence (30 triploidies; 6.4% of the set) is consistent with previous reports^2,19,23.
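The distinction between contamination and triploidy described above can be illustrated with a minimal sketch. It assumes, hypothetically, that each genomic window has already been labelled `"S"` (sister chromatid state) or `"H"` (homologous chromosome state) from HS alleles, with `"?"` for uninformative windows. The function names and the window labelling are illustrative assumptions and are not the procedure described in Methods; only the >20 crossover threshold and the case counts come from the text.

```python
from typing import Dict, List

# From the text: triploidies showed "numerous crossovers (>20)".
CROSSOVER_THRESHOLD = 20


def count_switches(states: List[str]) -> int:
    """Count S<->H transitions along one chromosome, skipping uninformative windows."""
    informative = [s for s in states if s in ("S", "H")]
    return sum(1 for a, b in zip(informative, informative[1:]) if a != b)


def looks_triploid(genome: Dict[str, List[str]]) -> bool:
    """Flag a sample as triploid-like if genome-wide switches exceed the threshold."""
    total = sum(count_switches(states) for states in genome.values())
    return total > CROSSOVER_THRESHOLD


# Incidence reported in the text: 19 maternal + 11 paternal triploidies in 467 trios.
triploidies = 19 + 11
incidence_percent = round(100 * triploidies / 467, 1)
```

A contaminated sample would be expected to show few or no switches under this labelling, whereas a true triploidy alternates between the two states wherever meiotic recombination occurred.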

We used whole-genome sequencing (WGS) data to search for aneuploidies among the 467 trios affected by pregnancy loss (Methods and Supplementary Note 2), and found them in 206 loss cases (44.1%, 380 fetal samples). Monosomy X (9.9%) and trisomy 16 (9.2%) were the most frequent aneuploidies (Fig. 3a and Supplementary Table 1), comparable with studies using karyotyping or SNP arrays^2,19,21,22. WGS also enables the detection of chromosomal structural variants that would not be captured with SNP arrays. We detected 19 large de novo CNVs (>200 kb) in 14 distinct pregnancy loss cases (23 samples) (Methods, Supplementary Fig. 3 and Supplementary Table 2). In five of the loss cases (11 samples), we detected deletion–duplication pairs that were probably part of the same event, three of which represent translocations between two different chromosomes (Fig. 3b and Supplementary Table 2). In two loss cases, we detected a deletion–duplication pair on chromosome 8 (Fig. 3c) that probably represents unbalanced translocations facilitated by non-allelic homologous recombination between the inverted and non-inverted sequence in heterozygous carriers on chromosome 8 (ref. ²⁴).

**Fig. 3: Characterization of triploidies, aneuploidies and CNVs.**

We ran structural variant detection^25,26 on the flanking regions (±5 kb) of the CNVs to annotate them (Methods and Supplementary Table 2). We found the exact breakpoints for 12 CNVs and read pairs supporting the classification of eight CNVs as part of four translocations (Supplementary Figs. 4–7). Moreover, we found read-alignment patterns of balanced translocations in fathers from two of these losses (Supplementary Figs. 6 and 7), which substantially increases the risk of loss in future pregnancies²⁷. None of the CNVs were near (<100 kb) a common fragile site (Supplementary Table 2). Out of the 14 distinct CNV events, 11 were in euploid losses and 6 of those fulfilled pathogenicity criteria (Methods).

Meiotic origin of the extra chromosomes

At meiosis I, homologous chromosomes segregate to opposite spindle poles, while sister chromatids remain connected at the centromeres²⁸ (Fig. 2a). As crossovers are suppressed at centromeres, the presence of both homologous chromosomes from the same parent at all loci near the centromere (pericentromeric) is indicative of a meiosis I error¹ (Extended Data Fig. 1). We found that HS alleles from both homologous chromosomes were generally present at trisomies of chromosomes 15, 16, 21 and 22 (Fig. 3d), pointing to meiosis I errors. Of the 19 maternally transmitted triploidies, 5 and 12 were consistent with meiosis I and II failure, respectively. By contrast, among the paternally transmitted triploidies, we detected homologous chromosomes at the pericentromeres for around half of the chromosomes per fetal sample, pointing to dispermy fertilization²⁹ (Fig. 3e). The two remaining maternally transmitted triploidies showed a similar pattern that might be the product of two independent ova and one spermatid²⁹.
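The pericentromeric logic in this paragraph can be sketched as follows. The sketch assumes that each chromosome of a triploid sample has a pericentromeric call of `"H"` (both homologues from the transmitting parent) or `"S"` (sister chromatids). The cut-offs `low=0.2` and `high=0.8` are arbitrary illustrative values, not thresholds from the study, and the function names are hypothetical.

```python
from typing import List


def homologous_fraction(pericentromeric_states: List[str]) -> float:
    """Fraction of informative chromosomes with both homologues at the pericentromere."""
    informative = [s for s in pericentromeric_states if s in ("H", "S")]
    if not informative:
        raise ValueError("no informative pericentromeric calls")
    return informative.count("H") / len(informative)


def classify_origin(states: List[str], low: float = 0.2, high: float = 0.8) -> str:
    """Assign a putative origin from the pericentromeric pattern (illustrative cut-offs)."""
    frac = homologous_fraction(states)
    if frac >= high:
        return "meiosis I"   # homologues failed to segregate
    if frac <= low:
        return "meiosis II"  # sister chromatids failed to segregate
    return "mixed"           # around half: e.g. dispermy for paternal triploidies


# Counts from the text for the 19 maternally transmitted triploidies.
maternal_triploidies = {"meiosis I": 5, "meiosis II": 12, "mixed": 2}
```

The "mixed" class corresponds to the pattern reported for paternally transmitted triploidies, in which around half of the chromosomes per sample carry homologous chromosomes at the pericentromere.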

We compared the crossovers identified in the extra chromosomes from triploidies with the sequence-level recombination map from Icelanders³⁰. We found that 27.2% (95% confidence interval (CI) = 23.7–30.6%) and 32.3% (95% CI = 26.6–37.9%; Fig. 4a) of crossovers identified in paternal and maternal triploidies, respectively, occur within recombination hotspots (1.6% and 1.8% of the genome, respectively). This supports our interpretation that the alternating patterns of homologous-sister segments along the genome represent true crossovers. Although this is substantially lower than the fraction of crossovers in recombination hotspots in the data used to define the recombination map (74.9% and 71.1% for paternal and maternal crossovers, respectively), the difference could be partly attributed to the window approach used in this study, which does not identify crossovers at the resolution of heterozygous markers.
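To make the hotspot comparison concrete, the reported percentages can be expressed as a fold enrichment over the fraction of the genome covered by hotspots. This is a simple illustration rather than an analysis from the study, and it assumes that the 1.6% and 1.8% genome fractions refer to the paternal and maternal maps, respectively.

```python
def fold_enrichment(crossover_percent: float, genome_percent: float) -> float:
    """Ratio of the crossover share in hotspots to the genome share covered by hotspots."""
    return crossover_percent / genome_percent


# Triploidy crossovers reported in the text.
paternal_triploidy = fold_enrichment(27.2, 1.6)
maternal_triploidy = fold_enrichment(32.3, 1.8)

# Crossovers used to define the recombination map, as reported in the text.
paternal_map = fold_enrichment(74.9, 1.6)
maternal_map = fold_enrichment(71.1, 1.8)

print(f"triploidies: {paternal_triploidy:.1f}x paternal, {maternal_triploidy:.1f}x maternal")
print(f"map:         {paternal_map:.1f}x paternal, {maternal_map:.1f}x maternal")
```

Under these assumptions, crossovers in the triploidies fall in hotspots far more often than expected by chance, although less often than the crossovers used to build the map.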

**Fig. 4: Crossovers detected in aneuploidies and triploidies.**

To assess whether lost fetuses had a different recombination pattern compared with individuals who made it to term, we tallied the crossovers across the genome for triploidies and trisomies (Fig. 4b,c and Extended Data Fig. 2). We found that only 113 of the 161 maternal trisomies and 3 of the 5 paternal trisomies had a crossover (Fig. 4c). The rate of crossover absence was higher for trisomies of chromosomes 21 and 22 compared with the rest (Fig. 4d; odds ratio (OR) = 4.5, P = 4.71 × 10⁻⁵, Fisher’s exact test). This has also been observed for oocytes derived from fetuses³¹, supporting the view that meiotic disjunction of chromosomes 21 and 22 suffers from fewer crossovers established during oocyte arrest. Among autosomes, these chromosomes contribute the fewest crossovers to genetic maps⁸.

Mutation detection

The WGS analysis enabled us to investigate the rate and mutational spectrum of DNMs (SNVs and indels) in the pregnancy loss cases by comparing fetal genomes with parental genomes. In 724 fetal samples with low maternal contamination (paternal fraction > 25%; paternal kinship coefficient > 0.2; 366 distinct pregnancy loss cases; Fig. 1a), we identified 75,345 DNMs, with an average of 103.3 DNMs per loss (95% CI = 100.7–105.9). We estimate a discrepancy rate of 8.0% (95% CI = 7.7–8.2%, binomial test) by comparing multiple fetal samples from the same pregnancy loss (Supplementary Note 3).

Discrepancies between samples can result from artifacts or post-zygotic mutations (Supplementary Note 3 and Extended Data Fig. 3). Using 42 pregnancy loss cases with proper fetal and two villus samples, we classified DNMs as post-zygotic mutations if they were non-constitutional (allelic balance (AB) < 50%) in at least one villus sample, and found 9.7 (P = 1.92 × 10⁻⁷, t-test) post-zygotic mutations per villus that were absent from the proper fetus. By contrast, we identified 1.6 (P = 1.07 × 10⁻⁶, t-test) post-zygotic mutations in villus samples that were also in the proper fetus, occurring before trophoblast branching in early embryogenesis. This is supported by their high frequency in the proper fetus (Supplementary Fig. 8). Villus samples in general had a lower allelic balance (−2.9%; P = 2.07 × 10⁻⁴⁰, t-test) and a higher number of DNMs (11.1 DNMs; P = 9.22 × 10⁻⁹, t-test; Fig. 5a and Extended Data Fig. 4) compared with proper fetal samples. We compared the mutational spectrum of the DNMs to mutational signatures identified in cancer samples³² and found that all samples with a detectable mutational signature 18 from COSMIC³³ (characterized by C>A mutations) were villus samples (P = 0.0005, Fisher’s exact test), consistent with previous studies of placentas³⁴. Overall, this indicates a considerable accumulation of somatic mutations in early development, confined to the trophoblast lineage.

**Fig. 5: DNMs detected in pregnancy loss.**

The AB of the DNM allele is a key metric in the quality control of DNMs⁶. The presence of extra chromosomes, as in the trisomies and triploidies, distorts the expected AB from 1/2 to 1/3 or 2/3, which impacts DNM detection. Correcting for this difference in sensitivity (Supplementary Note 3 and Supplementary Fig. 9), we found that the paternal and maternal triploid fetuses carried 40 (95% CI = 23.0–57.1) and 25.9 (95% CI = 15.8–35.9) more DNMs on average than the euploid fetuses, respectively. This is expected as the triploid fetuses have an extra set of chromosomes.

We phased 15,086 DNMs (71.7%) to the paternal and 5,967 (28.3%) to the maternal chromosomes of the fetus (Supplementary Note 3). The high paternal contribution of DNMs to offspring is well known^6,35 and, for a fetus receiving two sets of paternal chromosomes (paternal triploidy), we would expect a corresponding excess of paternal DNMs. Consistent with this expectation, we found that the paternal triploidies showed a higher fraction of paternally phased mutations (85.9%; 95% CI = 82.9–88.9%, block jackknife) compared with the euploid fetuses (71.4%; 95% CI = 69.8–73.0%, block jackknife; Fig. 5b). Furthermore, DNMs identified in maternal triploidies (59.5%; 95% CI = 55.3–63.7%, block jackknife) and trisomic chromosomes (54.5%; 95% CI = 46.4–62.5%, block jackknife) showed a lower paternal fraction than those in euploid fetuses.
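The phased fractions above follow directly from the counts, and the block jackknife used for the confidence intervals can be sketched in a few lines. This is a generic delete-one-block jackknife with equally weighted blocks; how the study defined its blocks is not stated here, so the per-block counts in the usage example are hypothetical.

```python
import math
from typing import List, Tuple


def paternal_fraction(paternal: int, maternal: int) -> float:
    """Share of phased DNMs assigned to the paternal chromosome."""
    return paternal / (paternal + maternal)


def block_jackknife(blocks: List[Tuple[int, int]]) -> Tuple[float, float]:
    """Delete-one-block jackknife; blocks are (paternal, maternal) counts.

    Returns the point estimate and its jackknife standard error.
    """
    n = len(blocks)
    total_pat = sum(p for p, _ in blocks)
    total_mat = sum(m for _, m in blocks)
    estimate = paternal_fraction(total_pat, total_mat)
    # Re-estimate the fraction with each block left out in turn.
    leave_one_out = [
        paternal_fraction(total_pat - p, total_mat - m) for p, m in blocks
    ]
    mean_loo = sum(leave_one_out) / n
    variance = (n - 1) / n * sum((x - mean_loo) ** 2 for x in leave_one_out)
    return estimate, math.sqrt(variance)


# Counts reported in the text.
overall = paternal_fraction(15_086, 5_967)

# Hypothetical per-block counts, for illustration only.
estimate, se = block_jackknife([(8, 2), (6, 4)])
```

An approximate 95% CI would then be the estimate plus or minus 1.96 standard errors, under the usual normal approximation.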

The number of DNMs per proband is highly dependent on the parental ages at conception^6,8,36,37. We observed a similar age effect in the pregnancy loss cases (Fig. 5c and Extended Data Figs. 5 and 6) as in the Icelandic set^6,30. There is a higher number of DNMs (16.3; 95% CI = 14.8–17.8) in the pregnancy loss cases than in the Icelandic set after correcting for parental ages at conception, restricting to non-triploid samples and correcting for villus contribution (Extended Data Fig. 7). The villus contribution is consistent with independence from parental ages at conception, as expected for post-zygotic mutations (Extended Data Figs. 5 and 7). To examine this further, we estimated the mutational spectrum of villus contributions. We found that the highest villus contributions were for C>A (3.4 DNMs; P = 2.11 × 10⁻¹³, block jackknife) and indel (4.9 DNMs; P = 1.84 × 10⁻¹³, block jackknife) mutations (Extended Data Fig. 7 and Supplementary Table 3). To summarize, there is a greater contribution of post-zygotic mutations and potential false positives compared with the Icelandic set, but the strong parental age effects and paternal fraction indicate that most of the mutations identified in the pregnancy loss cases accumulated before completion of meiosis.

In the female germline, sister chromatids are formed in early development, whereas in the male germline they are formed after puberty. If a mutation occurred before sister formation (pre-sister), it would be expected to be faithfully replicated on the sister chromatid (Fig. 5d). We would expect the AB of pre-sister DNMs to be 2/3 in the triploidies/trisomies (Fig. 5d), whereas DNMs not replicated on the sister chromatid (post-sister) would have an AB of 1/3. A cut-off of 50% is symmetric between the pre-sister and post-sister ABs and can be used to differentiate them. High-AB (>50%) DNMs within sister segments are a combination of pre-sister mutations and misclassified ones (false-positive DNMs or AB measurement errors), whereas high-AB mutations within homologous segments probably consist of only misclassified mutations as the expected AB is always 1/3. Thus, the difference in the prevalence of high-AB mutations between homologous and sister segments serves as an estimate of the fraction of pre-sister mutations (probably an underestimate; Supplementary Note 4). For maternally transmitted triploidies and trisomies, we found that the prevalence of high-AB DNMs was significantly higher within sister segments (9.3%; 95% CI = 8.0–10.7%, block jackknife) than homologous segments (2.7%; 95% CI = 2.2–3.3%, block jackknife; Fig. 5e and Extended Data Fig. 8). This indicates that 6.6% (95% CI = 5.1–8.0%, block jackknife) of the maternal mutations are pre-sister mutations.
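The estimator described in this paragraph amounts to a difference of two proportions, which the following minimal sketch makes explicit. The AB values in the usage example are hypothetical; only the 50% cut-off and the reported prevalences (9.3% and 2.7%) come from the text.

```python
from typing import List

AB_CUTOFF = 0.5  # midpoint between the expected ABs of 1/3 and 2/3


def high_ab_prevalence(allelic_balances: List[float]) -> float:
    """Share of DNMs whose allelic balance exceeds the 50% cut-off."""
    return sum(ab > AB_CUTOFF for ab in allelic_balances) / len(allelic_balances)


def pre_sister_fraction(sister_abs: List[float], homologous_abs: List[float]) -> float:
    """High-AB prevalence in sister segments minus that in homologous segments.

    The homologous-segment prevalence serves as the misclassification baseline.
    """
    return high_ab_prevalence(sister_abs) - high_ab_prevalence(homologous_abs)


# Reported prevalences (percent) reproduce the 6.6% estimate.
reported_estimate = round(9.3 - 2.7, 1)

# Hypothetical AB values, for illustration only.
toy_estimate = pre_sister_fraction(
    sister_abs=[0.66, 0.30, 0.35, 0.31],
    homologous_abs=[0.30, 0.33, 0.29, 0.36],
)
```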

Notably, we did not observe the sister/homologous state difference for high-AB DNMs in the paternally transmitted triploidies (0.1%; P = 0.77). On the paternal side, most DNMs are expected to accumulate in mitosis of spermatogonia before the formation of sister chromatids. We would therefore expect to see a high AB for half of the DNMs in the paternally transmitted triploidies if the extra set of chromosomes was derived from the same spermatogonium (Extended Data Fig. 9). However, we saw little or no high-AB DNMs within sister segments on the father’s side, indicating that the spermatozoa are derived from distinct spermatogonia (Fig. 5d), consistent with the equal fraction of homologous/sister states observed at centromeres of paternally transmitted triploidies (Fig. 3e).

We contrasted the mutational spectrum of DNMs from chromosomes with extra copies (trisomy–mother, triploid–father and triploid–mother) to DNMs from euploid fetuses and pairs of chromosomes not affected by trisomies/monosomies (euploid–chromosome pair; Fig. 5f; 3 × 3 × 8 = 72 tests). C>G mutations were found at a greater frequency among maternal trisomies than in the euploid–chromosome pair set (P = 0.002). No other comparisons passed multiple-testing correction (Holm). Duplication of chromosome 16 is frequent among the aneuploidies (Fig. 3a), and we have shown previously⁶ that C>G mutations in the maternal germline accumulate at high rates on chromosomes 8 and 16. Furthermore, C>G mutations in these C>G-enriched regions are more likely to occur in strand-coordinated mutational clusters near non-crossover gene conversions^6,8,30, which are the products of double-stranded breaks (DSBs). As DSBs can cause translocations, we searched the translocation cases and found strand-coordinated clusters at chromosomes 15 and 16 for the chromosome 15–chromosome 16 case (Fig. 3b), mostly consisting of C>A, C>G and C>T mutations (87%) of maternal origin (22 out of 22 phased DNMs). This indicates that chromosome 16 is susceptible to both point mutations and aneuploidies in the ageing oocyte due to DSBs.

Pathogenic genotypes

We searched for pathogenic SSVs in the DNM set (366 pregnancy loss cases) and found 26 SSV genotypes in as many loss cases (45 fetal samples) that were classified as pathogenic or likely pathogenic based on criteria from the American College of Medical Genetics and Genomics³⁸. Of these, 23 were DNMs and 3 were biallelic predicted loss of function (pLoF) variants (Methods and Supplementary Table 4). For 11 of the lost fetuses carrying pathogenic DNMs, at least 1 proper fetal sample had been sequenced; for the remaining 12, only villus samples were available. One of the DNMs, a splice donor indel in FBLN1, was a post-zygotic mutation confined to a villus sample. Nonetheless, the mean AB of all pathogenic DNMs (0.495; non-triploid loss cases) was comparable to that of DNMs in proper fetal samples (0.489; non-triploid loss cases; P = 0.95), and the majority of phased variants were of paternal origin (6 out of 8, 75%), consistent with attributes of germline mutations⁵.

We compared the frequency of pathogenic SSVs in the cases of pregnancy loss with that in adult controls (7,760 trios; Methods) and found an enrichment of pathogenic SSVs in pregnancy loss relative to the controls (OR = 2.98, P = 5.7 × 10⁻⁶, Fisher’s exact test). This enrichment was also significant after correcting for parental age (P = 4 × 10⁻³, χ² test, logistic regression) and was not observed for synonymous DNMs (P = 0.36, χ² test, logistic regression). The enrichments were 3.63 (P = 3.0 × 10⁻⁴) and 2.59 (P = 2.2 × 10⁻³) when restricting to euploid and aneuploid/triploid pregnancy loss cases, respectively. Of the 26 pathogenic SSVs identified, 12 were in euploid fetuses (12 out of 141; Supplementary Table 5 and Supplementary Note 5).

Considering the possibility that not all genes that are essential to early human development have been linked to postnatal phenotypes, we also searched the cases of euploid pregnancy loss for pLoF DNMs in genes highly constrained for LoF variants (Methods) and found an increase over the adult controls (OR = 4.0, P = 0.006, Fisher’s exact test). Three of the pLoF variants were in genes without an established link to disease: DDX5, ZMYM4 and HECTD1 (Supplementary Table 6). To better understand the role of these variants, as well as that of the pathogenic SSVs, in pregnancy loss, we annotated the genes with regard to previous evidence of lethality, relevance of the associated phenotypes to developmental defects, fetal expression of the genes³⁹ and evidence of lethality from mouse knockouts (Supplementary Tables 7–9). We found that, on average, the genes mutated in euploid fetuses had higher expression in fetal tissue than the ones in aneuploid/triploid fetuses (P = 6.9 × 10⁻¹⁹⁶, t-test; Extended Data Fig. 10).

One of the euploid fetuses was homozygous for a pLoF variant in DHCR7, a gene linked to autosomal recessive Smith–Lemli–Opitz syndrome⁴⁰. The variant is a splice acceptor variant (NM_001360.2, c.964-1G>C) that is described to result in miscarriage or neonatal death in homozygous state⁴⁰. DHCR7 c.964-1G>C exhibits a deficit of homozygosity among adults, that is, a lower number of observed homozygous individuals than expected based on frequency¹². In our study of 1.52 million genotyped European adults¹², DHCR7 c.964-1G>C had the most prominent homozygous deficit, with 92 homozygotes expected but none observed. Here we observed the c.964-1G>C variant in a homozygous state in 1 out of 366 trios, the majority of whom have European ancestry, with an expectation of 0.022 homozygotes (95% CI = 0–0.053; supplementary data 4 from ref. ¹²). This finding is consistent with our previously estimated fivefold increased risk of miscarriage among couples who are both heterozygous carriers of DHCR7 c.964-1G>C¹².
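The expectation of 0.022 homozygotes can be reproduced, to the reported precision, by scaling the adult-cohort expectation to the size of the present set. This is a back-of-the-envelope check under the assumption that the per-individual homozygote probability is the same in both cohorts; the published figure was derived from supplementary data in ref. ¹² and may have been computed differently.

```python
# Figures reported in the text.
ADULTS_GENOTYPED = 1_520_000      # 1.52 million European adults
EXPECTED_HOMOZYGOTES_ADULTS = 92  # expected among adults, none observed
TRIOS_IN_STUDY = 366
OBSERVED_HOMOZYGOTES = 1

# Assumption: the per-individual probability of homozygosity carries over.
p_homozygous = EXPECTED_HOMOZYGOTES_ADULTS / ADULTS_GENOTYPED
expected_in_study = TRIOS_IN_STUDY * p_homozygous

# Ratio of observed to expected homozygotes among the lost fetuses.
observed_over_expected = OBSERVED_HOMOZYGOTES / expected_in_study

print(f"expected homozygotes among {TRIOS_IN_STUDY} fetuses: {expected_in_study:.3f}")
print(f"observed / expected: {observed_over_expected:.0f}")
```

With a single observation, the observed-to-expected ratio is highly uncertain and is shown only to illustrate the direction of the effect.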

We detected one other biallelic pLoF genotype among the euploid lost fetuses, a compound heterozygous genotype in CPLANE1 (NM_023073.3, c.6354dup (p.Ile2119TyrfsTer2) and c.493del (p.Ile165TyrfsTer17)). CPLANE1 pathogenic genotypes have been detected in fetuses from early pregnancy loss⁴¹, and another pLoF variant in CPLANE1 had a suggestive deficit of homozygosity in our previous study (supplementary table 18 from ref. ¹²). We did not find any other homozygous deficit variants (supplementary table 21 from ref. ¹²) presenting as biallelic genotypes in our cohort. The biallelic genotypes in DHCR7 and CPLANE1 confer a high recurrence risk (at least 25%) and, in both cases, the couple had a history of recurrent pregnancy loss (three consecutive losses; Supplementary Note 6). DNMs are generally expected to have a lower risk of recurrence, unless there is evidence of parental mosaicism^42,43, which we did not observe in this set.

Discussion

Here we performed WGS analysis of 1,439 samples from 664 cases of early pregnancy loss and characterized the sequence diversity in 467 of the loss cases for which samples were available for the full trio. We found probable genetic causes in 254 out of 467 loss cases (55%), the majority of which were aneuploidies (44.1%), followed by triploidies (6.4%), pathogenic SSVs (3.3%) and de novo CNVs (1.3%) (Supplementary Table 5).

Most of the genetic causes originated on the maternal chromosome, mainly in the form of triploidies and trisomies, consistent with the well-described meiotic errors accompanying oogenesis. We observed an indication of interplay between point mutations and large structural variation. The oocytes are in dictyate arrest for decades¹ and accumulate mutation clusters in specific regions of the genome^6,44, for example, the p-arm of chromosome 8 and termini of chromosome 16. We found a translocation between chromosomes 15 and 16 (Fig. 3b) containing clusters of strand-coordinated DNMs, a likely byproduct of faulty DSB repair.

The extra chromosomes in the trisomy and triploidy cases gave us a unique view into recombination and mutational processes in the germline. We found that triploid lost fetuses carried 31.8 more DNMs on average than euploid lost fetuses, as expected given the extra set of chromosomes. However, these mutations stem from both homologous chromosomes of the parents in contrast to the DNMs observed in adult offspring, providing a more comprehensive assessment of the diploid germ cells. The presence of sister chromatids enabled us to estimate that 6.6% of maternal DNMs occurred before the formation of sister chromatids. Furthermore, we mapped crossover locations for the trisomies and triploidies, providing a complementary view of the recombination map of the human genome.

We found a threefold enrichment of pathogenic SSVs in the trios affected by pregnancy loss in comparison to adult trios, indicating that two-thirds of the pathogenic SSVs identified in the euploid lost fetuses are truly causative, explaining around 6% of those losses (8 out of 141). Assuming that approximately 1 in 17 (1/0.06) euploid loss cases is explained by SSVs and 1 in 8 pregnancies is lost with a euploid fetus, our findings indicate that at least 1 in every 136 (17 × 8) pregnancies will be lost due to a pathogenic SSV. There is an excess of DNMs in the trios affected by pregnancy loss compared with adult trios, partly due to false-positive calls and partly attributed to post-zygotic mutations specific to villi. Nevertheless, most of our pathogenic SSVs were consistent with germline origin and were found in greater frequency in trios affected by pregnancy loss than in adult trios. This shows that the contribution of DNMs to pregnancy loss is determined by whether they occur in essential genomic sequences rather than their overall count. Larger cohorts of sequenced euploid fetuses are needed to identify variants with incomplete penetrance and/or polygenic effects. We realize that high-depth sequencing is cost prohibitive at scale, but low-pass sequencing (around 1×) coupled with imputation is comparable to chip genotyping in terms of cost and quality⁴⁵. Furthermore, long-read sequencing could aid detection of structural variants in euploid lost fetuses without a pathogenic SSV.
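The arithmetic behind the 1-in-136 estimate can be laid out step by step. The attributable-fraction step, (enrichment − 1)/enrichment, is our reading of how a threefold enrichment implies that two-thirds of the identified variants are causative; the remaining figures are taken from the text, and the variable names are illustrative.

```python
# Figures from the text.
ENRICHMENT = 3.0                 # approximately threefold enrichment over adult trios
PATHOGENIC_IN_EUPLOID = 12       # pathogenic SSVs found in euploid losses
EUPLOID_LOSSES = 141
PREGNANCIES_PER_EUPLOID_LOSS = 8 # 1 in 8 pregnancies is lost with a euploid fetus

# Share of identified variants in excess of the adult background rate (2/3).
causal_fraction = (ENRICHMENT - 1) / ENRICHMENT

# Expected number of truly causative variants among the 12 identified.
causal_cases = round(PATHOGENIC_IN_EUPLOID * causal_fraction)

# Share of euploid losses explained, rounded to 6% as in the text.
share_explained = round(causal_cases / EUPLOID_LOSSES, 2)

# 1 in 17 euploid losses, then 1 in 17 x 8 pregnancies.
one_in_euploid_losses = round(1 / share_explained)
one_in_pregnancies = one_in_euploid_losses * PREGNANCIES_PER_EUPLOID_LOSS

print(f"1 in {one_in_pregnancies} pregnancies")
```

The result is sensitive to rounding at each step: using the unrounded share (8/141) instead of 6% would give a slightly different denominator.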

Studies of the contribution of pathogenic SSVs to pregnancy outcomes^17,18 have mainly focused on pregnancies with structural abnormalities of the fetus, typically discovered after the first trimester. It is hard to compare such phenotypes, including gestational age, fetal anatomical structure and placental morphology, with the ones found in cases of early pregnancy loss. Notably, we found that one of the pathogenic DNMs was a post-zygotic mutation confined to villi, which are a precursor of the fetal part of the placenta. The mutation occurred in FBLN1 (also known as fibulin-1), which encodes a structural component of blood vessel walls⁴⁶ and has particularly high expression in placental trophoblast giant cells (Supplementary Table 7), suggesting that this pregnancy loss may have been triggered by abnormal placental function.

We detected homozygous and biallelic genotypes in genes with a strong deficit of biallelic loss of function in adults, and a high risk of recurrence. We also considered the possibility that many genes that are important for early development might not yet be identified as such. We found pLoF DNMs in three euploid lost fetuses in constrained genes that are highly expressed during the fetal stage³⁹. Two of these genes, DDX5 and HECTD1, encode proteins that are involved in cellular processes that are essential for early development, such as DNA replication⁴⁷ and ubiquitylation⁴⁸, but they were not associated with postnatal phenotypes at the time of submission. During revision of this Article, variants in HECTD1 were described to cause neurodevelopmental defects and even infant death⁴⁹, further illustrating the potential of sequencing fetal material from pregnancy loss cases to identify gene–disease connections.

There are considerable differences between aneuploidies detected in blastocysts and in fetuses from pregnancy loss cases³. We are therefore probably still missing substantial sequence diversity that is lost in the period from implantation to a clinically recognized pregnancy. Here we have nonetheless shown that sequencing early products of conception that succumb to negative selection can identify essential genes and characterize their function in early development.

Pregnancy loss has been considered to be the natural fate of fetuses with severe genetic defects, making therapeutic interventions unwarranted. However, we are unable to explain 45% of sporadic unselected clinical pregnancy loss cases using WGS, pointing to other factors in the aetiology of pregnancy loss. Given the high incidence of euploid pregnancy loss cases without a clear monogenic cause in the fetus, there is an urgent need to understand their aetiologies and develop therapeutic interventions that can ensure healthy live births. This provides value not only for the couples wanting a child, but also for the many societies that are challenged by decreasing fertility rates and rapidly ageing populations⁵⁰. Continued sampling of trios and biomarkers throughout pregnancy is needed to disentangle the complex interplay between the fetal and maternal genomes and the environment, and the cascade of events sparked by a single fertilized egg.

Methods

Cohort

The COPL study is a prospective cohort study, with the first patients recruited in November 2020 (described in detail previously²⁰). Women with a confirmed or suspected pregnancy loss were referred by a physician to a gynaecological department at one of three hospitals in Denmark: Copenhagen University Hospital Hvidovre, Herlev or North Zealand.

Women were included in the study if older than 18 years, if the pregnancy loss was before gestational age 22 weeks + 0 days (154 days) and if an intrauterine pregnancy had been confirmed by ultrasound, including anembryonic pregnancy as per definition. The participants provided both oral and written informed consent. Excluded from the study were pregnancies of unknown location, extrauterine pregnancies and molar pregnancies. The study was approved by the Health Research Ethics Committees for the Capital Region of Denmark (H-18024745) and followed the General Data Protection Regulation governance.

Sampling

The woman and her male partner (if possible) had venous blood drawn in 4 ml EDTA tubes that were kept at 5 °C until transport in tubes with cushioning material. All of the blood samples from the woman were drawn before initiating treatment for the pregnancy loss while fetal material was in utero, or within 24 h of passage of the fetal material. In cases of spontaneous or medically induced (mifepristone plus misoprostol) emptying of the uterus (53%), the woman was instructed (both orally and in writing) to collect all pregnancy loss tissue at home and place it in the fridge at 5 °C before bringing it to the hospital within 1 week. If she had a surgical removal (47%), COPL staff collected the material at the hospital and placed it at 5 °C. In cases of second-trimester loss, induction and delivery took place at the hospital, and clinical staff collected biopsies from the fetal foot. COPL staff packed and shipped the tissue in padded envelopes to the University of Copenhagen (maximum transportation time 2 h), where it was stored at 4 °C.

We used a previously described method to extract DNA from fetal material²⁰. In brief, fetal tissue samples were washed several times in sterile PBS to remove maternal blood before being examined under a stereomicroscope to identify fetal tissue and/or chorionic villi. In some cases, the tissue could not be identified as either and was designated unknown. Some of these unknown samples were probably blood clots and maternal decidua. The fetal tissue or villi were dissected into pieces of 0.5 cm² and then subjected to three consecutive PBS washes before being placed into a 2 ml cryotube (Nunc, Thermo Fisher Scientific, 10577391), submerged in liquid nitrogen for snap-freezing and then stored at −80 °C. For DNA extraction, we used the DNeasy Blood and Tissue 96 with TissueLyser LT (Qiagen, 69504) according to the instructions provided by the supplier. In brief, the samples were put on dry ice and the frozen tissue was moved to a 2 ml Safe-Lock Eppendorf tube (Eppendorf, 0030 123.344), as recommended by the supplier. A single 5 mm steel bead (Stainless Steel Beads, Qiagen, 69989) and appropriate buffers from the DNeasy kit were added to the tube containing the frozen tissue, which was then bead milled for 1 min at 30 Hz on a TissueLyser LT (Qiagen, 85600). The remaining steps were performed according to the recommendations of the supplier. Parental blood samples and extracted fetal DNA were shipped to deCODE, Iceland, on dry ice.

All information on the participants and samples was stored in a secure cloud (REDCap) in the Capital Region of Denmark. Blood tubes were labelled with individual bar codes, a pseudonymized code, and whether it was a maternal or paternal sample. Information about the last menstruation (gestational age–last menstruation; GA–LM), pregnancy loss type (missed miscarriage or spontaneous ongoing miscarriage) and gestational age at the time of loss was gathered by interview and patient records on inclusion.

WGS analysis

Sequencing libraries were prepared using the NEBNext Ultra II PCR-free kit (New England Biolabs). In brief, 0.5–1 µg of genomic DNA was fragmented in 96-well TPX-AFA plates (Covaris) to a mean target size of 450–550 bp using the Covaris LE220plus instrument. End repair and A-tailing were performed in a single step followed by ligation of unique dual-indexed sequencing adaptors (IDT for Illumina) and two rounds of SPRI-bead purification (0.6×) using the Hamilton STAR NGS liquid handler. The quality (concentration and insert size) of the sequencing libraries was determined using the LabChip GX (96-samples) instrument (Perkin Elmer). Sequencing libraries were pooled appropriately using Hamilton STARlet liquid handlers and sequenced using the Illumina NovaSeq 6000 instrument. We performed paired-end sequencing on the S4 flowcell (v1.0 chemistry), resulting in read lengths of 2 × 150 cycles of incorporation and imaging, in addition to 2 × 8 index cycles. Real-time analysis converted image data to base calls. All of the steps in the workflow were monitored using an in-house laboratory information management system with barcode tracking of all of the samples/plates and reagents.

Alignment and sequence variant calling

The WGS data were aligned against the human reference genome (hg38) as previously described⁵¹ and managed with an in-house laboratory information management system. In brief, reads were aligned with BWA mem⁵² (v.0.7.10) and marked for duplicates with Picard tools (v.1.117). The aligned reads in BAM format for fetal and parental samples were used as input to GraphTyper⁵³ (v.2.7.5). The average coverage of the BAM files is shown in Supplementary Fig. 2.

Paternal fraction estimation

VCF files from GraphTyper were parsed to identify genotypes where the mother was homozygous, and where the father carried an allele absent from the mother. Sites where the genotypes of the parents passed the following quality thresholds were considered for downstream analysis: a genotype quality greater than 20 and a depth greater than 20. Furthermore, we required the allelic balance (AB) to be within [30%, 70%] for heterozygous calls, and within [0, 5%] or [95%, 100%] for homozygous reference or alternative calls, respectively.

For sites that passed these quality criteria, we counted the number of reads in the fetus supporting the paternal allele (Y), conditioned on the genotype of the father and the DP genotype field from the fetus (D). We then regressed the number of reads supporting the paternal allele in the fetus on allele counts of the paternal allele in the father (A). For an uncontaminated euploid fetus, the slope and intercept of this regression (Y = D × A) would be expected to be 1 and 0, respectively. The slope is expected to decrease with lower paternal contribution, as is the case for maternal contamination or maternal triploidies.
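As a rough illustration of this regression, the following Python sketch fits an ordinary least-squares line to per-site read counts. It assumes that the predictor is scaled as D × A/4, so that an uncontaminated euploid fetus gives a slope of 1 (half of the fetal reads derive from the paternal chromosome, and a paternal allele present in A of the father's two copies is transmitted with probability A/2). This scaling and the function names are illustrative assumptions, not the pipeline's implementation.

```python
def ols_fit(x, y):
    """Ordinary least-squares slope and intercept for paired lists x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def paternal_fraction(sites):
    """sites: iterable of (Y, D, A) tuples.

    Y = fetal reads supporting the paternal allele, D = fetal depth,
    A = count of the paternal allele in the father (1 or 2).
    The D * A / 4 scaling is an assumption chosen so that the expected
    slope is 1 for an uncontaminated euploid fetus.
    """
    sites = list(sites)
    x = [d * a / 4 for _, d, a in sites]
    y = [reads for reads, _, _ in sites]
    return ols_fit(x, y)
```

Under this scaling, a slope of about 0.8 with an intercept near 0 would be read as a paternal fraction of roughly 80%.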

Coverage of the sex chromosomes (allosomes) can be informative about the maternal contamination in cases of pregnancy loss with male fetuses (Supplementary Fig. 10). Consistent with expectation, we found that fetal samples with 20% of the allosome coverage on the Y chromosome were predominantly of low contamination (90.0% paternal fraction) compared with those without Y chromosome coverage (55.9% paternal fraction; P = 2.7 × 10⁻⁶¹).

Estimation of genetic ancestry

Genetic ancestry of parental samples was estimated by running ADMIXTURE⁵⁴ v.1.3.0 in supervised mode using 1000 Genomes⁵⁵ CEU (Utah Caucasian, European), CHB (Beijing Han, East Asian), ITU (Indian Telugu, South Asian), PEL (Peruvian, Native American) and YRI (Nigerian Yoruba, African) as training populations. Markers were pruned for pairs in strong linkage disequilibrium by excluding long-range high-linkage-disequilibrium regions⁵⁶ and running PLINK⁵⁷ v.1.90b6.15 --indep-pairwise 200 25 0.4. ADMIXTURE was run on batches of ten test individuals at a time.

Coverage-based CNV calling

The CNV caller used for the pregnancy loss set is based on sequence coverage. Its aim is to find CNVs that are long enough for coverage to give enough evidence for calling them. To guard against too high variance in determining the copy-number state at a locus, the sequencing read coverage is binned into 1 kb regions, which is further normalized to overall genomic coverage (using the autosomes).

Sequence coverage shows dependency on G/C content and AT repeat content per region and depends on the fragment length distribution in sequence libraries. Most of the dependency can be captured by the number of G/C base pairs and the number of repeated AT motifs in each read pair fragment⁵⁸. To correct for G/C content, we considered 400 bp non-overlapping windows across the genome, resulting in a table of the number of reads starting at each 400 bp window, classified according to the number of G/C base pairs and the number of AT repeats. After coverage normalization, this gives a correction factor for each 400 bp window. Each 1 kb bin consists of a sequence of overlapping 400 bp fragments, each with a corresponding correction factor from the aforementioned table. The average of those correction factors is used as the correction factor for the 1 kb bin. This correction factor is applied to correct each bin, giving a G/C–AT-corrected normalized binned coverage in 1 kb bins, which is then used in the downstream CNV analysis.

The input data to the CNV caller consist of a series of G/C–AT-corrected normalized 1 kb coverage values for each sample on each chromosome. Supplementary Fig. 11 shows four such series at a locus. Each datapoint has an underlying count of sequencing reads that are assumed to be approximately Poisson distributed. Normalization and G/C–AT correction apply a scaling factor resulting in a scaled Poisson variable. The mean of the Poisson variable depends on the copy-number state of the sample in the bin, giving a mixture distribution. A few parameters specify the mean of each component in the mixture:

$$\alpha \times y \sim \mathop{\sum }\limits_{k=0}^{n}{\pi }_{k}\times {\rm{Poisson}}(\alpha \times \beta +\alpha \times \lambda \times {\mu }_{k})$$

where y is the normalized coverage per 1 kb bin, μ_k is a multiplier per copy-number state, α is the inverse scaling factor, β is a bias factor (usually giving the mean of the copy-number 0 state since μ₀ is usually 0), λ controls the distance between copy number states and π_k is the prior probability of copy-number state k. For Illumina sequencing data, we use 7 states, with μ_k = 0.5 × k.
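To make the mixture concrete, the following Python sketch evaluates the posterior probability of each copy-number state for a single bin, given values of α, β, λ and π_k. It is a minimal sketch under stated assumptions: the Poisson density is extended to non-integer α × y through the gamma function, β is taken to be strictly positive, and the function names and parameter values in the usage note are hypothetical.

```python
import math

N_STATES = 7  # copy-number states 0..6, with mu_k = 0.5 * k as in the text


def log_poisson(x, mean):
    # Poisson log-density, extended to non-integer x via lgamma (an assumption).
    return x * math.log(mean) - mean - math.lgamma(x + 1.0)


def state_posteriors(y, alpha, beta, lam, priors):
    """Posterior probability of each copy-number state for one 1 kb bin.

    y is the normalized coverage; beta must be > 0 so that the mean of
    the copy-number 0 component is positive.
    """
    x = alpha * y
    log_terms = []
    for k in range(N_STATES):
        mean = alpha * beta + alpha * lam * 0.5 * k
        log_terms.append(math.log(priors[k]) + log_poisson(x, mean))
    top = max(log_terms)  # subtract the maximum for numerical stability
    weights = [math.exp(t - top) for t in log_terms]
    total = sum(weights)
    return [w / total for w in weights]
```

For example, with the hypothetical values alpha = 30, beta = 0.01, lam = 1 and uniform priors, a bin with y = 1.0 is assigned most of its posterior mass at copy number 2, and a bin with y = 0.5 at copy number 1.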

Each bin gets its own α, β, λ and π_k values, which are set to maximum-likelihood estimates for the data for all samples within the bin. A further shift parameter is sometimes used that specifies a shifting of the data on the Poisson scale (when going to the Poisson scale from the y scale, the shift is applied). The main purpose of that parameter is to fit a higher variance in the copy-number 0 cluster than a Poisson variable with mean close to 0 can explain. Supplementary Fig. 12 shows histograms for the CNV in Supplementary Fig. 11. The clustering for copy-number states 0 and 1 (as well as 2) is very clear.

The actual CNV calling is then run when the model above has been fitted. Each sample is run with a hidden Markov model (HMM) with states corresponding to the copy-number states⁵⁹. Within each state, the model above is used for the emission probability given the copy-number state. The transition probabilities have a parameter specifying the probability of transitioning to a different state. The Viterbi HMM algorithm is used to find the most probable path through the HMM for each sample and, where the Viterbi path diverges from the normal copy number (2 for autosomal chromosomes), a CNV is called. The algorithm is run twice, first with a stringent transition probability (1 × 10⁻⁶) to find longer variants without splitting them into a series of shorter ones, and then with a more relaxed transition probability (0.001) to allow finding shorter variants. CNVs with similar boundaries are grouped together and the HMM is used to estimate which CNVs correspond to the same event and what is the most likely boundary for each such group. The probability model can be used to calculate the posterior likelihood for each copy-number state for each CNV, which can be used in downstream phasing and imputation.
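The decoding step can be sketched in Python as follows. This is a generic Viterbi implementation, not the caller's code: it assumes a uniform initial distribution and that the probability of leaving a state is split evenly among the other states, and it takes per-bin log emission probabilities as input.

```python
import math


def viterbi(log_emis, p_switch):
    """Most probable state path.

    log_emis[t][k] is the log emission probability of bin t in state k.
    p_switch is the total probability of leaving the current state,
    split evenly among the other states (an assumption).
    """
    n_states = len(log_emis[0])
    log_stay = math.log(1.0 - p_switch)
    log_move = math.log(p_switch / (n_states - 1))

    def trans(i, j):
        return log_stay if i == j else log_move

    score = list(log_emis[0])  # uniform initial distribution, constant dropped
    back = []
    for row in log_emis[1:]:
        new_score, pointers = [], []
        for j in range(n_states):
            best_i = max(range(n_states), key=lambda i: score[i] + trans(i, j))
            new_score.append(score[best_i] + trans(best_i, j) + row[j])
            pointers.append(best_i)
        score = new_score
        back.append(pointers)

    state = max(range(n_states), key=lambda j: score[j])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path
```

In a toy three-state example where a single bin favours another state by 20 log units, the path with p_switch = 1e-6 ignores the bin, whereas the path with p_switch = 0.001 calls it. This mirrors the rationale for running a stringent pass for long variants and a relaxed pass for short ones.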

Note that informative structural variants not captured by our copy-number analysis may have been missed because additional rearrangement analysis was not performed.

Sensitivity of the coverage-based CNV detection

To assess our sensitivity for detecting CNVs in the presence of contamination, we simulated duplications by switching between an uncontaminated fetal sample with trisomy 16 and the paired maternal sample along the chromosome (Supplementary Fig. 13). To simulate deletions, we switched between a male fetal sample and the maternal sample along chromosome X. Contamination was simulated by taking the weighted average of the maternal and the fetal corrected coverage values for simulated CNV segments. We considered the following combinations of CNV sizes (5 kb, 10 kb, 20 kb, 50 kb, 100 kb, 200 kb and 500 kb) and contamination values (0, 10%, 20%, 30%, 40%, 50%, 60% and 70%). We considered a simulated CNV to be found if 50% or more of the windows within the CNV were called in the deletion state for deletions and the duplication state for duplications in the output from the HMM program. We found 89.4% (95% CI = 79.4–95.6%) of the simulated duplications and 96.3% (95% CI = 94.5–97.6%) of the simulated deletions of size 10 kb at 30% contamination; other combinations are shown in Supplementary Fig. 13.
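The two ingredients of this simulation, the weighted average used to mimic contamination and the 50% detection criterion, can be sketched in Python as follows. The function names and the state labels ("del", "dup", "normal") are hypothetical.

```python
def contaminate(fetal_cov, maternal_cov, contamination):
    """Weighted average of fetal and maternal corrected coverage per bin.

    contamination is the maternal fraction, between 0 and 1.
    """
    return [
        (1.0 - contamination) * f + contamination * m
        for f, m in zip(fetal_cov, maternal_cov)
    ]


def cnv_detected(called_states, expected_state, min_fraction=0.5):
    """True if at least min_fraction of the windows carry the expected call."""
    hits = sum(1 for state in called_states if state == expected_state)
    return hits / len(called_states) >= min_fraction
```

For instance, a duplicated segment with normalized fetal coverage 1.5 and maternal coverage 1.0 is pulled down to 1.35 at 30% contamination, which illustrates why sensitivity decreases as contamination increases.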

Combining segments and phasing

The calls from the HMM algorithm in the fetal samples were in the form of a series of small segments. Similar to the Battenberg algorithm⁶⁰, we joined these disjoint segments through phase-informative sequence variants. For the joining, we assessed the allele-specific depth in the fetal sample limited to variants where parents were homozygous for different alleles.

Letting C_MM be the measured maternal coverage, C_M the actual maternal coverage and C_CM the coverage that is due to maternal contamination, then

$${C}_{{\rm{MM}}}={C}_{{\rm{M}}}+{C}_{{\rm{CM}}}$$

Letting C_P be the paternal coverage, then:

$${C}_{{\rm{T}}}={C}_{{\rm{MM}}}+{C}_{{\rm{P}}}$$

$${C}_{{\rm{T}}}={C}_{{\rm{M}}}+{C}_{{\rm{CM}}}+{C}_{{\rm{P}}}$$

C_T represents the total coverage of the phase-informative variants (limiting to homozygous variants in either parent) in the fetal sample.

Assuming a diploid fetal sample, that is,

$${C}_{{\rm{M}}}={C}_{{\rm{P}}}$$

Then, we can get an estimate of the global maternal contamination coefficient C_GLOB

$${C}_{{\rm{MM}}}={C}_{{\rm{P}}}+{C}_{{\rm{CM}}}$$

$${C}_{{\rm{GLOB}}}={\widehat{C}}_{{\rm{CM}}}={C}_{{\rm{MM}}}-{C}_{{\rm{P}}}$$

For each segment, the maternal contribution was then corrected for C_GLOB. If the maternal and paternal contributions deviated from the expected 50:50 ratio within the predefined segments, those segments were assigned either a duplication or deletion status and assigned to the corresponding parent (phased). All calculations for the refinement of aneuploidies and CNVs were performed in R v.4.4.3, and refined segments were combined into CNVs using Python v.3.9.21.
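The contamination correction amounts to simple arithmetic, sketched here in Python. The sketch assumes that segment-level and genome-wide coverages are expressed on the same scale (for example, mean depth per phase-informative variant); the function names are hypothetical and no decision threshold is implied.

```python
def global_contamination(c_mm, c_p):
    """C_GLOB = C_MM - C_P, assuming a diploid fetus (C_M = C_P)."""
    return c_mm - c_p


def maternal_share(seg_c_mm, seg_c_p, c_glob):
    """Maternal share of fetal coverage in a segment after correction."""
    c_m = seg_c_mm - c_glob  # corrected maternal contribution
    return c_m / (c_m + seg_c_p)
```

With a genome-wide measured maternal coverage of 20 and a paternal coverage of 12, C_GLOB is 8. A segment with the same values then has a corrected maternal share of 0.5, whereas a segment with a measured maternal coverage of 32 has a share of 2/3, the value expected when an extra maternal copy is present.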

The pathogenicity of identified CNVs was estimated based on criteria from the ACMG⁶¹.

CNV breakend annotation with GraphTyper2

We discovered structural variants with Manta (v.1.6.0), merged the variant sites over all individuals with svimmer (git hash 2af9ccf1ac056b9516c25ec4a0121657a9f9ab8e), and finally genotyped them jointly across the entire set with GraphTyper2 (v.2.7.7). We considered all types of structural variants within 5 kb of CNV breakends. We searched for structural variant DNM candidates using the same criteria as described in the ‘DNM detection’ section with two exceptions: a more permissive allelic balance cut-off (10%) and a lower minimum number of reads supporting the DNM allele (6). If we found a DNM candidate among the structural variants of the CNV carrier within this region, then we set that as the position of the CNV breakpoint (Supplementary Table 2). To search for balanced translocations present in the parents, we manually curated the translocation calls by GraphTyper with an allele count less than 10. We found exact breakpoints of the two balanced translocations that are present in parents (Supplementary Figs. 6 and 7).

Kinship and oversharing

We estimated kinship between fetal samples and parental samples using KING⁶² (v.2.1.5) on SNVs in subsampled 50 kb regions of the genome (1%). We considered only biallelic SNVs, with allele counts greater than four and an AAScore from GraphTyper of greater than 0.5.

We searched for indication of triploidy based on oversharing between the fetal sample and either of the parental samples, that is, a higher kinship coefficient than expected for parent–offspring relationship (k = 0.25). If k_mat represents the kinship coefficient between the fetal sample and the maternal sample and k_pat represents the kinship coefficient between the fetal sample and the paternal sample, we considered the following scenarios:

$${k}_{{\rm{pat}}} > 0.3\,{\rm{and}}\,{k}_{{\rm{mat}}} > \,0.2$$

$${k}_{{\rm{pat}}} > 0.2\,{\rm{and}}\,{k}_{{\rm{mat}}} > \,0.3$$

as potential triploidies. We then used the change in variance of allelic balance, as described in the next section, across the genome to further refine these potential triploidies.
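The kinship screen above can be written as a small Python function. The thresholds are those given in the text; the function name and the returned labels are hypothetical, and a flagged sample is only a candidate pending the allelic-balance refinement.

```python
def triploidy_candidate(k_pat, k_mat):
    """Flag potential triploidy from oversharing with one parent.

    Returns which parent shows oversharing, or None if neither
    scenario from the text applies.
    """
    if k_pat > 0.3 and k_mat > 0.2:
        return "oversharing_with_father"
    if k_pat > 0.2 and k_mat > 0.3:
        return "oversharing_with_mother"
    return None
```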

Detecting crossovers between chromosomal states

We defined phase-informative variants as variants present in heterozygous state in one parent and homozygous in the other. In the following method, we restricted to phase-informative heterozygous variants in the parent transmitting the extra chromosome, where the other parent had a homozygous genotype.

If two sister chromatids from the transmitting parent were present at a given locus, we expected the AB of HS alleles at that locus to be either 0 or 2/3, depending on whether or not the sisters carried the HS allele (Fig. 2b). If both homologous chromosomes were present at a locus, we expected the AB to always be 1/3, considering the presence of an HS-allele in the transmitting parent. The state of homologous chromosomes has much lower variance than the state of sister chromatids, as there is no variability in sampling the homologous chromosomes. The sampling variance of the homologous chromosome state should be due to only sequencing variability, that is, reads supporting the alternative allele in the fetal sample.

We let Y be the observed coverage of the phase-informative variant in the fetal sample. T is a Bernoulli variable denoting the transmission of the phase-informative variant from the parent to the fetus. Then, by the law of total variance, we can partition the variance based on transmission T:

$$V[Y]=E[V[Y| T]]+{V}[E[Y| T]]$$

If we assume the read coverage in the fetus given transmission is determined with a Poisson variable:

$$Y| T=1 \sim {\rm{Poisson}}(\mu )$$

Then:

$$\begin{array}{c}E[V[Y| T]]={P}_{T}\times V[Y| T=1]+(1-{P}_{T})\times V[Y| T=0]\\ \,\,\,\,\,\,=\,{P}_{T}\times V[Y| T=1]+(1-{P}_{T})\times 0\\ \,\,\,\,\,\,=\,{P}_{T}\times \mu \end{array}$$

The other part of the sum will be:

$$\begin{array}{c}V[E[Y| T]]={P}_{T}\times {(\mu -{P}_{T}\cdot \mu )}^{2}+(1-{P}_{T})\times {(0-{P}_{T}\cdot \mu )}^{2}\\ \,\,\,\,\,=\,{P}_{T}{\mu }^{2}{(1-{P}_{T})}^{2}+{P}_{T}^{2}{\mu }^{2}(1-{P}_{T})\\ \,\,\,\,\,=\,{P}_{T}{\mu }^{2}-2\times {P}_{T}^{2}{\mu }^{2}+{P}_{T}^{3}{\mu }^{2}+{P}_{T}^{2}{\mu }^{2}(1-{P}_{T})\\ \,\,\,\,\,=\,{P}_{T}{\mu }^{2}-{P}_{T}^{2}{\mu }^{2}\\ \,\,\,\,\,=\,{\mu }^{2}{P}_{T}(1-{P}_{T})\end{array}$$

Taken together:

$$V[Y]={P}_{T}\cdot \mu +{\mu }^{2}{P}_{T}(1-{P}_{T})$$

If both homologous chromosomes are present (P_T = 1) at 30× coverage in the fetus (μ = 10) then we expect:

$$V[Y]=1\times 10+{10}^{2}\times 0=10$$

By contrast, in the sister state (P_T = 0.5, μ = 20) the variance is much higher:

$$V[Y]=0.5\times 20+{20}^{2}\times 0.25=110$$
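The two worked values can be reproduced with a one-line Python function that implements the variance expression derived above; the function name is hypothetical.

```python
def coverage_variance(p_t, mu):
    """V[Y] = P_T * mu + mu^2 * P_T * (1 - P_T)."""
    return p_t * mu + mu ** 2 * p_t * (1.0 - p_t)


homologous_state = coverage_variance(1.0, 10.0)  # both homologues present
sister_state = coverage_variance(0.5, 20.0)      # two sister chromatids
```

Here homologous_state evaluates to 10 and sister_state to 110, an 11-fold difference in variance.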

We used this change in variance of the allelic balance along the genome to find crossovers between the homologous chromosome states and sister chromatid states. We binned the number of phase-informative variants in 1 kb windows, and calculated the average allelic balance. We excluded 1 kb windows that overlapped larger 100 kb windows with sequence variants with low average quality metrics (AAScore < 0.8 or Carrier_regression_beta < 0.7; see the ‘DNM detection’ section for details of AAScore and Carrier_regression_beta). We then used a changepoint technique on the 1 kb windows that passed the exclusion criteria to detect changes in variance in the 1 kb windows (cpt.var from the changepoint package in R). We used the following criteria:

A minimum segment length of 1000 1 kb windows (1 Mb)
A maximum of eight changepoints per chromosome
The PELT method for detecting multiple changepoints
Our own penalty function (penalty = “manual”) with 10 × log(n) (pen.value = “10⋅log(n)”) where n is the number of windows per chromosome.

DNM detection

We searched sequence variants and extracted DNM candidates in a similar manner as before^6,8 for 9,651 Icelandic trios by comparing the genotypes of the parents and offspring. The DNM candidates included both SNVs and indels (Supplementary Fig. 14). In brief, we defined a DNM candidate with permissive cut-offs for the genotype of the proband, requiring an allelic balance greater than 0.25 and a depth of at least 12 reads at the position (supporting either the reference or alternative allele). For the genotypes of the parents, we required at least 12 reads, a maximum of one read supporting the alternative allele and the allelic balance to be less than 5%. Likely (N_LIK) and possible (N_POSS) carriers of the DNM allele outside the descendants of the parent pair were defined as previously⁶. We restricted to DNM candidates with fewer than 50 likely carriers and either fewer than 10 possible carriers or an N_LIK/N_POSS ratio greater than 80%. We tuned the DNM candidate filtering by using segregation of DNM candidates in three-generation families (2,042 probands) and the following quality covariates in a generalized additive model with a logistic link:

AAScore: prediction probability from GraphTyper that the variant is a true positive.
Carrier_regression_beta: slope from the alternative allele depth regression for the sequence variant. To estimate the quality of the sequence variants, we regressed the alternative allele counts (AD) on the depth (DP) conditioned on the genotypes (GT). For a well-behaving sequence variant, the mean alternative allele count for a homozygous reference genotype should be 0; for a heterozygous genotype, it should be DP/2; and, for a homozygous alternative genotype, it should be DP. Under the assumption of no sequencing or genotyping error, the expected value of AD should be DP conditioned on the genotype; in other words, an identity line (slope 1 and intercept 0). Deviations from the identity line indicate that the sequence variant is spurious or somatic.
Carrier_regression_alpha: intercept from the alternative allele depth regression for the sequence variant.
Proband_het_AB: the allelic balance of the proband.
MaxAAS: the maximum read support for the sequence variant across all individuals.
Alignment_Alt_Reads: the number of reads supporting the alternative allele. This covariate and the following covariates were derived by identifying the reads in the BAM files supporting DNM allele.
Alignment_Alt_Unique_Positions: the unique number of starting positions for the reads supporting the alternative allele.
Alignment_Alt_Soft_clipped: the number of soft clipped bases (S in CIGAR string).
Alignment_Alt_Matched_bases: the number of matched bases (M in CIGAR string).
Alignment_Alt_Score_diff: the difference of the alignment score of the best and the second best hit as reported by BWA mem.
Alignment_Alt_Pair_sw_nm: the pairwise mismatches between reads supporting the alternative allele using Smith Waterman implementation in SeqAn⁶³.
Alignment_Alt_Pair_align: the number of bases in the pairwise alignments.

With the following formula for the gam function from the mgcv R package⁶⁴:

threegen_Consistent_hs~ I(cut(alignment_Alt_Unique_Positions,c(-1,2,4,8,10,Inf)))+ s(I(AAScore))+ s(Carrier_regression_beta)+ s(Carrier_regression_alpha)+ I(ifelse(alignment_Alt_Reads>0, (alignment_Alt_Score_diff/alignment_Alt_Reads)>10, FALSE))+ I(ifelse(alignment_Alt_Pair_align>0, (alignment_Alt_Pair_sw_nm/alignment_Alt_Pair_align)>0.05, TRUE))+ I(ifelse(alignment_Alt_Matched_bases>0, alignment_Alt_Soft_clipped/alignment_Alt_Matched_bases>0.5, TRUE))+ s(Proband_het_AB)+ s(ifelse(MaxAAS>15, 16, MaxAAS))+ I(NPOSS == 0)

We restricted to instances of the DNM candidates in cases in which we saw both of the proband’s haplotypes at a locus transmitted to the offspring of the proband (see figure 1c in ref. ⁶). In brief, if the DNM is a true germline variant, then the allele of the DNM candidate should be present in one of the proband’s offspring. On the other hand, if it is absent from the children, then this assay suggests that the DNM candidate is a false-positive DNM call (a more detailed description was reported previously⁶). Like before, we fitted the generalized additive model using the mgcv R package⁶⁴.

We used the cut-off of 0.5 to call high-quality DNM candidates for the predicted probability of correct segregation in three generation families. To validate the false-positive detection rate of the DNMs, we also used the genotype consistency between pairs of monozygotic twins. We found that 3.8% of DNMs were unobserved in the monozygotic twin of the proband. Note that this approach overestimates the false-positive rate as there are instances of high frequency post-zygotic mutations that differ between pairs of monozygotic twins⁶⁵.
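For illustration, the permissive candidate cut-offs given at the start of this section (before the generalized additive model is applied) can be sketched in Python as follows. The sketch assumes that the depth requirements are inclusive (at least 12 reads); the function and argument names are hypothetical.

```python
def is_dnm_candidate(proband_ab, proband_depth, parent_depths,
                     parent_alt_reads, parent_abs, n_lik, n_poss):
    """Permissive DNM candidate definition (pre-model filtering).

    parent_depths, parent_alt_reads and parent_abs each hold one value
    per parent (mother, father).
    """
    proband_ok = proband_ab > 0.25 and proband_depth >= 12
    parents_ok = (
        all(depth >= 12 for depth in parent_depths)
        and all(alt <= 1 for alt in parent_alt_reads)
        and all(ab < 0.05 for ab in parent_abs)
    )
    # The ratio is only evaluated when n_poss >= 10, so it cannot divide by zero.
    carriers_ok = n_lik < 50 and (n_poss < 10 or n_lik / n_poss > 0.8)
    return proband_ok and parents_ok and carriers_ok
```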

Identifying DNMs in COPL trios

The AB of DNM candidates is a crucial quality-control metric for discerning between false positive DNMs (that is, somatic and genotyping artifacts) and DNMs that transmit to the next generation⁶. However, in the case of the triploidies, trisomies and the maternally contaminated samples, the AB is expected to be lower than 50% for genuine DNMs. This is not compatible with the generalized additive model applied on the deCODE trios that uses the AB; we therefore decided on a simpler threshold strategy to define high-quality DNMs in the COPL data. We defined a high-quality DNM in the COPL set as one that satisfies AB of proband > 0.25; NPOSS < 10 and AAScore > 0.5. To compare the subsets in COPL, we calculated the detection power to detect DNMs per fetal sample using the ratio of the phase-informative markers and a binomial distribution (Supplementary Fig. 9). We considered only samples with paternal fraction > 25% and excluded a single sample with more than 800 DNMs after quality filtering.
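The threshold strategy for the COPL set is simple enough to state as code. The following Python sketch applies the three cut-offs from the text; the function and argument names are hypothetical.

```python
def is_high_quality_copl_dnm(proband_ab, n_poss, aa_score):
    """High-quality DNM in the COPL set: AB > 0.25, NPOSS < 10, AAScore > 0.5."""
    return proband_ab > 0.25 and n_poss < 10 and aa_score > 0.5
```

A genuine DNM in a trisomic region with an AB near 1/3 passes this filter.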

Phasing of mutations in Icelandic trios

Like before, we used two complementary approaches to phase DNMs^6,8: one by tracking of the transmission of the DNMs to the offspring of the proband (142,069 DNMs; three-generation phasing) and another one using read tracing of the DNM allele to phase-informative alleles (246,877 DNMs; read phasing). We combined the phase of the DNMs from the different approaches, resulting in 333,365 phased DNMs; if the phase of DNMs differed between the approaches, they were set to unphased DNMs in the downstream analysis.
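The rule for combining the two phasing approaches can be sketched in Python as follows. Each input is assumed to be "paternal", "maternal" or None (unphased); this encoding and the function name are hypothetical.

```python
def combine_phase(three_generation, read_based):
    """Combine phase calls; discordant calls are treated as unphased."""
    if three_generation is None:
        return read_based
    if read_based is None:
        return three_generation
    return three_generation if three_generation == read_based else None
```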

Phasing of mutations in COPL trios

We read-traced the DNM allele as with deCODE data, except for one crucial difference: we considered only reads linking the DNM allele to nearby phase-informative sequence variants if the reads were linked to an allele private to one of the parents. This was done to avoid phase ambiguity due to maternal contamination or extra chromosomes.

Annotation of pathogenic variants

Sequence variants were annotated on the basis of release 100 of the Variant Effect Predictor (VEP)⁶⁶ using RefSeq gene annotations⁶⁷. We included SNVs and small indels (<50 bp) at coding and splicing regions in the autosome. We required that the variants were supported by at least six sequence reads, with at least three reads supporting the alternative allele. For indels, we required an AB of at least 0.4. For heterozygous variants, we considered pLoF (stop-gained, frameshift indels, splice essential variants) and missense DNMs in known autosomal-dominant disease genes (human phenotype ontology code HP:0000006 for autosomal-dominant disease genes). For missense DNMs, we also required either that the gene was highly constrained for missense variants (95th percentile over all genes, corresponding to a gnomAD⁶⁸ v.4.1.0 missense z score of over 3) or that the variant was located <10 bp from a previously reported pathogenic/likely pathogenic missense variant in ClinVar (with at least one ClinVar submission supporting the variant as pathogenic/likely pathogenic and no benign/likely benign submissions).

We also included homozygous and compound heterozygous pLoF variants in known autosomal recessive disease genes (based on the code HP:0000007). For homozygous variants, we required the position to be covered by at least eight sequence reads, and for the alternative allele to be supported by at least eight reads. For compound heterozygous variants, we required the positions to be covered by at least four reads, and each variant to be supported by at least two sequence reads. To detect compound heterozygous genotypes, we performed phasing of the variants to paternal and maternal alleles. We also included genotypes where we could not phase both variants, allowing for the possibility of a DNM on one allele. We used minor allele frequencies from 56,969 Icelandic genomes⁶⁹ and 730,947 exomes and 76,215 genomes from gnomAD¹¹ to restrict to variants with frequencies <2%, and considered only genotypes present in fewer than three genomes. All pathogenic genotypes detected in the trios affected by pregnancy loss were confirmed by manual inspection of BAM files.

The same approach was used for defining pathogenic genotypes in a set of adult control trios. For the purpose of pathogenic SSV analysis, individuals with neurodevelopmental disorders, including epilepsy, autism and intellectual disability, were removed from the set of 9,651 Icelandic trios. Those patients, who were whole-genome sequenced as part of ongoing projects at deCODE genetics and have been described previously⁷⁰, were ascertained through the State Diagnostic Counselling Center and the Department of Child and Adolescent Psychiatry and diagnosed based on ICD-10 criteria. Individuals with a clinical diagnosis of a rare genetic disease, as described previously⁶⁹, were also removed from downstream analyses. Removal of these patients from the set of Icelandic trios resulted in a set of 7,760 adult control trios.

Constrained genes

To investigate DNMs in genes that have not been linked to any postnatal phenotypes but that might be essential to early developmental processes, we looked at genes that are highly constrained for pLoF variants. We used constraint data from gnomAD⁶⁸ v.4.1.0, ranking genes (n = 17,481 protein-coding genes) based on their pLI scores. We defined highly constrained genes as the genes in the 90th percentile of all ranked genes.
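The ranking step can be sketched in Python as follows. This is only an illustration: it assumes that "the 90th percentile" means the top 10% of genes when ranked by descending pLI, and the function name and input format (a gene-to-pLI mapping) are hypothetical. Ties at the cut-off are broken by input order here, which may differ from the authors' handling.

```python
def highly_constrained(pli_by_gene, top_percent=10):
    """Return the set of genes in the top `top_percent` percent by pLI score."""
    # Rank genes from most to least constrained.
    ranked = sorted(pli_by_gene, key=pli_by_gene.get, reverse=True)
    # Integer arithmetic avoids floating-point surprises at the cut-off.
    n_top = len(ranked) * top_percent // 100
    return set(ranked[:n_top])
```

With the 17,481 protein-coding genes mentioned above, this cut-off would select 1,748 genes.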

Age regression

To compare the accumulation of mutations with advancing parental age at conception between pregnancy loss and the Icelandic trios, we regressed the number of DNMs against the parental ages at conception with the following formula:

Nr_of_DNMs ~ fathers_age + mothers_age + tissue_villi

Note that we use a shared-intercept model. This is because DNMs detected in the fetal samples are only partially phased through read tracing, which makes sex-specific intercepts unidentifiable. We restricted the analysis to the subset of the pregnancy loss cohort that is more comparable to the Icelandic set in terms of quality, that is, to non-triploid samples with 95% power to detect DNMs.

We ran separate regressions for the different mutational classes (C>A, C>G, C>T, CpG>TpG, Indel, T>A, T>C and T>G) along with the total number of DNMs per fetal sample/proband. The estimates of the age effects, villus contribution and intercepts are provided in Extended Data Fig. 7 and Supplementary Table 3. The CIs and P values are derived from a block jackknife procedure.
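For readers unfamiliar with the block jackknife, a minimal delete-one-block version is sketched below in Python. It is a generic illustration rather than the authors' implementation: how blocks were defined in the study is not stated here, the estimator is passed in as an arbitrary function, and the variance formula assumes blocks of roughly equal size.

```python
import math


def block_jackknife(blocks, estimator):
    """Delete-one-block jackknife.

    blocks: list of lists of observations (block definition is up to the caller).
    estimator: function mapping a flat list of observations to a float.
    Returns (full-sample estimate, jackknife standard error).
    """
    g = len(blocks)
    full = estimator([x for block in blocks for x in block])
    # Re-estimate with each block left out in turn.
    leave_one_out = [
        estimator([x for j, block in enumerate(blocks) if j != i for x in block])
        for i in range(g)
    ]
    mean_loo = sum(leave_one_out) / g
    variance = (g - 1) / g * sum((t - mean_loo) ** 2 for t in leave_one_out)
    return full, math.sqrt(variance)
```

Under a normal approximation (an assumption of this sketch), a 95% CI would be the estimate plus or minus 1.96 times the returned standard error. In the regression setting, `estimator` would refit the model and return a single coefficient.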

Mutational signatures

To estimate the contribution of COSMIC mutational signatures to the 96-class mutation spectra of the pregnancy loss samples, we used the Fit() function provided by the signature.tools.lib (v.2.4.4) package⁷¹ in R (https://github.com/Nik-Zainal-Group/signature.tools.lib). Note that the mutation burden of each pregnancy loss sample is modest, which limits power for signature detection. The purpose of this exercise was not to discover mutational signatures de novo but, rather, to specifically test the hypothesis that SBS18 was elevated in some villus samples as reported previously³⁴. The mutation spectrum of spermatogonia is dominated by SBS1 and SBS5/40 (ref. ⁷²) and this is mirrored in the mutational spectrum of germline variants (see figure 5 of ref. ⁷³). With this in mind, we fitted only SBS1, SBS5 and SBS18 onto the mutational catalogues. Signatures were fitted using 100 bootstraps.
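The bootstrap step can be illustrated with a short Python sketch that resamples a mutational catalogue multinomially. This is a common way to bootstrap count data and is offered only as an assumption about the general idea; it does not reproduce the internals of the Fit() function, and the function name and channel labels below are hypothetical.

```python
import random
from collections import Counter


def bootstrap_catalogue(counts, n_bootstraps=100, seed=0):
    """Resample a mutation catalogue with replacement.

    counts: dict mapping mutation channel -> observed count (non-empty catalogue).
    Returns a list of resampled catalogues, each with the same total burden.
    """
    rng = random.Random(seed)
    channels = list(counts)
    weights = [counts[c] for c in channels]
    total = sum(weights)
    replicates = []
    for _ in range(n_bootstraps):
        # Draw `total` mutations, each channel chosen in proportion to its count.
        tally = Counter(rng.choices(channels, weights=weights, k=total))
        replicates.append({c: tally.get(c, 0) for c in channels})
    return replicates
```

Each replicate would then be fitted against the chosen signatures (here SBS1, SBS5 and SBS18), and the spread of the fitted exposures across replicates indicates how stable the assignment is given the modest mutation burden per sample.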

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

The DNMs identified in the Icelandic cohort are available at Zenodo⁷⁴ (https://doi.org/10.5281/zenodo.14025565) along with recombination maps, proband information, gene conversions and observed non-crossovers. Crossover recombination data are available online (https://www.science.org/doi/suppl/10.1126/science.adh2531/suppl_file/science.adh2531_data_s1.zip and https://www.science.org/doi/suppl/10.1126/science.adh2531/suppl_file/science.adh2531_data_s2.zip). The raw sequencing data from the Icelandic cohort cannot be shared publicly due to Icelandic legislation and the regulations set by the Icelandic Data Protection Authority, which restrict the dissemination of individual-level or personally identifiable information. Access to raw data can be granted at the facilities of deCODE genetics in Iceland, for scientific purposes only, and subject to Icelandic legislation regarding use of the data. To gain access to the Icelandic data contact H.J. or K.S. Requests for access are generally considered monthly. The DNMs identified in the COPL cohort are available at Zenodo⁷⁵ (https://doi.org/10.5281/zenodo.15183295). All clinical summary data for the COPL cohort can be found in a previous publication²⁰. Raw data are available under Danish legal provisions and appropriate material transfer agreements pending The Health Research Ethics Committees for the Capital Region of Denmark (H-18024745) and General Data Protection Regulation governance. To gain access to the Danish data contact E.H. or H.S.N. Requests for access are generally considered monthly. Data from the following publicly available datasets were also used in the study: VEP (https://www.ensembl.org/info/docs/tools/vep/index.html), 1000 Genomes reference genome assembly (https://ftp.1000genomes.ebi.ac.uk/vol1/ftp/technical/reference/GRCh38_reference_genome/); gnomAD (https://gnomad.broadinstitute.org/), OMIM (https://omim.org/) and HPO (https://hpo.jax.org/). 
All other data presented in the study are provided in the Article and its Supplementary Information. Source data are provided with this paper.

Code availability

The processing of the sequencing data is based on the methods described above, as well as the following open-source software: BWA (v.0.7.10; https://github.com/lh3/bwa), Picard tools (v.1.117; https://github.com/broadinstitute/picard), GraphTyper (v.2.7.5; https://github.com/DecodeGenetics/graphtyper), Manta (v.1.6.0; https://github.com/Illumina/manta), svimmer (https://github.com/DecodeGenetics/svimmer), ADMIXTURE (v.1.3.0; https://dalexander.github.io/admixture/), PLINK (v.1.90b6.15; https://www.cog-genomics.org/plink/1.9/), KING (v.2.1.5; www.kingrelatedness.com), R (v.4.4.3; https://www.r-project.org/) and Python (v.3.9.21; https://www.python.org/).

References

Gruhn, J. R. & Hoffmann, E. R. Errors of the egg: the establishment and progression of human aneuploidy research in the maternal germline. Annu. Rev. Genet. 56, 369–390 (2022).
Sahoo, T. et al. Comprehensive genetic analysis of pregnancy loss by chromosomal microarrays: outcomes, benefits, and challenges. Genet. Med. 19, 83–89 (2017).
McCoy, R. C. et al. Meiotic and mitotic aneuploidies drive arrest of in vitro fertilized human preimplantation embryos. Genome Med. 15, 77 (2023).
Finley, J. et al. The genomic basis of sporadic and recurrent pregnancy loss: a comprehensive in-depth analysis of 24,900 miscarriages. Reprod. BioMed. Online 45, 125–134 (2022).
Waldvogel, S. M., Posey, J. E. & Goodell, M. A. Human embryonic genetic mosaicism and its effects on development and disease. Nat. Rev. Genet. https://doi.org/10.1038/s41576-024-00715-z (2024).
Jónsson, H. et al. Parental influence on human germline de novo mutations in 1,548 trios from Iceland. Nature 549, 519–522 (2017).
Kong, A. et al. Rate of de novo mutations and the importance of father’s age to disease risk. Nature 488, 471–475 (2012).
Halldorsson, B. V. et al. Characterizing mutagenic effects of recombination through a sequence-level genetic map. Science 363, eaau1043 (2019).
Hinch, R., Donnelly, P. & Hinch, A. G. Meiotic DNA breaks drive multifaceted mutagenesis in the human germ line. Science 382, eadh2531 (2023).
Halldorsson, B. V. et al. The sequences of 150,119 genomes in the UK Biobank. Nature 607, 732–740 (2022).
Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).
Oddsson, A. et al. Deficit of homozygosity among 1.52 million individuals and genetic causes of recessive lethality. Nat. Commun. 14, 3453 (2023).
Nielsen, R., Hellmann, I., Hubisz, M., Bustamante, C. & Clark, A. G. Recent and ongoing selection in the human genome. Nat. Rev. Genet. 8, 857–868 (2007).
Nagaoka, S. I., Hassold, T. J. & Hunt, P. A. Human aneuploidy: mechanisms and new insights into an age-old problem. Nat. Rev. Genet. 13, 493–504 (2012).
Quenby, S. et al. Miscarriage matters: the epidemiological, physical, psychological, and economic costs of early pregnancy loss. Lancet 397, 1658–1667 (2021).
Steinthorsdottir, V. et al. Variant in the synaptonemal complex protein SYCE2 associates with pregnancy loss through effect on recombination. Nat. Struct. Mol. Biol. https://doi.org/10.1038/s41594-023-01209-y (2024).
Byrne, A. B. et al. Genomic autopsy to identify underlying causes of pregnancy loss and perinatal death. Nat. Med. 29, 180–189 (2023).
Lord, J. et al. Prenatal exome sequencing analysis in fetal structural anomalies detected by ultrasonography (PAGE): a cohort study. Lancet 393, 747–757 (2019).
Levy, B. et al. Genomic imbalance in products of conception: single-nucleotide polymorphism chromosomal microarray analysis. Obstetr. Gynecol. 124, 202–209 (2014).
Schlaikjær Hartwig, T. et al. Cell-free fetal DNA for genetic evaluation in Copenhagen Pregnancy Loss Study (COPL): a prospective cohort study. Lancet 401, 762–771 (2023).
Wang, Y. et al. Clinical application of SNP array analysis in first-trimester pregnancy loss: a prospective study. Clin. Genet. 91, 849–858 (2017).
Demko, Z. P., Simon, A. L., McCoy, R. C., Petrov, D. A. & Rabinowitz, M. Effects of maternal age on euploidy rates in a large cohort of embryos analyzed with 24-chromosome single-nucleotide polymorphism-based preimplantation genetic screening. Fertil. Steril. 105, 1307–1313 (2016).
Wang, B. T. et al. Abnormalities in spontaneous abortions detected by G-banding and chromosomal microarray analysis (CMA) at a national reference laboratory. Mol. Cytogenet. 7, 33 (2014).
Graw, S. L., Sample, T., Bleskan, J., Sujansky, E. & Patterson, D. Cloning, sequencing, and analysis of inv8 chromosome breakpoints associated with recombinant 8 syndrome. Am. J. Hum. Genet. 66, 1138–1144 (2000).
Eggertsson, H. P. et al. GraphTyper2 enables population-scale genotyping of structural variation using pangenome graphs. Nat. Commun. 10, 5402 (2019).
Chen, X. et al. Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications. Bioinformatics 32, 1220–1222 (2016).
Verdoni, A. et al. Reproductive outcomes in individuals with chromosomal reciprocal translocations. Genet. Med. 23, 1753–1760 (2021).
Watanabe, Y. Geometry and force behind kinetochore orientation: lessons from meiosis. Nat. Rev. Mol. Cell Biol. 13, 370–382 (2012).
Zaragoza, M. V. et al. Parental origin and phenotype of triploidy in spontaneous abortions: predominance of diandry and association with the partial hydatidiform mole. Am. J. Hum. Genet. 66, 1807–1820 (2000).
Palsson, G. et al. Complete human recombination maps. Nature https://doi.org/10.1038/s41586-024-08450-5 (2025).
Hassold, T. et al. Failure to recombine is a common feature of human oogenesis. Am. J. Hum. Genet. 108, 16–24 (2021).
Alexandrov, L. B. et al. Signatures of mutational processes in human cancer. Nature 500, 415–421 (2013).
Sondka, Z. et al. COSMIC: a curated database of somatic variants and clinical data for cancer. Nucleic Acids Res. 52, D1210–D1217 (2024).
Coorens, T. H. H. et al. Inherent mosaicism and extensive mutation of human placentas. Nature 592, 80–85 (2021).
Gao, Z. et al. Overlooked roles of DNA damage and maternal age in generating human germline mutations. Proc. Natl Acad. Sci. USA 116, 9491–9500 (2019).
Kaplanis, J. et al. Genetic and chemotherapeutic influences on germline hypermutation. Nature 605, 503–508 (2022).
Goldmann, J. M. et al. Germline de novo mutation clusters arise during oocyte aging in genomic regions with high double-strand-break incidence. Nat. Genet. 50, 487–492 (2018).
Richards, S. et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet. Med. 17, 405–424 (2015).
Cao, J. et al. A human cell atlas of fetal gene expression. Science 370, eaba7721 (2020).
Lazarin, G. A., Haque, I. S., Evans, E. A. & Goldberg, J. D. Smith–Lemli–Opitz syndrome carrier frequency and estimates of in utero mortality rates. Prenat. Diagn. 37, 350–355 (2017).
Bozhinovski, G., Terzikj, M., Kubelka-Sabit, K. & Plaseska-Karanfilska, D. High incidence of CPLANE1-related Joubert syndrome in the products of conceptions from early pregnancy losses. Balkan Med. J. 41, 97–104 (2024).
Jónsson, H. et al. Multiple transmissions of de novo mutations in families. Nat. Genet. 50, 1674–1680 (2018).
Campbell, I. M. et al. Parental somatic mosaicism is underrecognized and influences recurrence risk of genomic disorders. Am. J. Hum. Genet. 95, 173–182 (2014).
Seplyarskiy, V. B. & Sunyaev, S. The origin of human mutation in light of genomic data. Nat. Rev. Genet. 22, 672–686 (2021).
Rubinacci, S., Hofmeister, R. J., Sousa da Mota, B. & Delaneau, O. Imputation of low-coverage sequencing data from 150,119 UK Biobank genomes. Nat. Genet. 55, 1088–1090 (2023).
Singh, U. et al. Expression and functional analysis of fibulin-1 (Fbln1) during normal and abnormal placental development of the mouse. Placenta 27, 1014–1021 (2006).
Mazurek, A. et al. DDX5 regulates DNA replication and is required for cell proliferation in a subset of breast cancer cells. Cancer Discov. 2, 812–825 (2012).
Vaughan, N., Scholz, N., Lindon, C. & Licchesi, J. D. F. The E3 ubiquitin ligase HECTD1 contributes to cell proliferation through an effect on mitosis. Sci. Rep. 12, 13160 (2022).
Zerafati-Jahromi, G. et al. Sequence variants in HECTD1 result in a variable neurodevelopmental disorder. Am. J. Hum. Genet. https://doi.org/10.1016/j.ajhg.2025.01.001 (2025).
Organisation for Economic Co-operation and Development. Declining fertility rates put prosperity of future generations at risk. OECD www.oecd.org/en/about/news/press-releases/2024/06/declining-fertility-rates-put-prosperity-of-future-generations-at-risk.html (2024).
Jónsson, H. et al. Whole genome characterization of sequence diversity of 15,220 Icelanders. Sci. Data 4, 170115 (2017).
Li, H. & Durbin, R. Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics 25, 1754–1760 (2009).
Eggertsson, H. P. et al. Graphtyper enables population-scale genotyping using pangenome graphs. Nat. Genet. 49, 1654–1660 (2017).
Alexander, D. H., Novembre, J. & Lange, K. Fast model-based estimation of ancestry in unrelated individuals. Genome Res. 19, 1655–1664 (2009).
Consortium, R. E. et al. Integrative analysis of 111 reference human epigenomes. Nature 518, 317–330 (2015).
Price, A. L. et al. Long-range LD can confound genome scans in admixed populations. Am. J. Hum. Genet. 83, 132–135 (2008).
Chang, C. C. et al. Second-generation PLINK: rising to the challenge of larger and richer datasets. GigaScience 4, 7 (2015).
Benjamini, Y. & Speed, T. P. Summarizing and correcting the GC content bias in high-throughput sequencing. Nucleic Acids Res. 40, e72 (2012).
Rabiner, L. R. A tutorial on hidden Markov models and selected applications in speech recognition. Proc. IEEE 77, 257–286 (1989).
Nik-Zainal, S. et al. The life history of 21 breast cancers. Cell 149, 994–1007 (2012).
Riggs, E. R. et al. Technical standards for the interpretation and reporting of constitutional copy-number variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics (ACMG) and the Clinical Genome Resource (ClinGen). Genet. Med. 22, 245–257 (2020).
Manichaikul, A. et al. Robust relationship inference in genome-wide association studies. Bioinformatics 26, 2867–2873 (2010).
Döring, A., Weese, D., Rausch, T. & Reinert, K. SeqAn an efficient, generic C++ library for sequence analysis. BMC Bioinform. 9, 11 (2008).
Wood, S. N. Fast stable restricted maximum likelihood and marginal likelihood estimation of semiparametric generalized linear models. J. R. Stat. Soc. B 73, 3–36 (2011).
Jónsson, H. et al. Differences between germline genomes of monozygotic twins. Nat. Genet. 53, 27–34 (2021).
McLaren, W. et al. The Ensembl variant effect predictor. Genome Biol. 17, 122 (2016).
Pruitt, K. D., Tatusova, T. & Maglott, D. R. NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 33, D501–D504 (2005).
Chen, S. et al. A genomic mutational constraint map using variation in 76,156 human genomes. Nature 625, 92–100 (2024).
Arnadottir, G. A. et al. Population-level deficit of homozygosity unveils CPSF3 as an intellectual disability syndrome gene. Nat. Commun. 13, 705 (2022).
Weiss, L. A. et al. Association between microdeletion and microduplication at 16p11.2 and autism. N. Engl. J. Med. 358, 667–675 (2008).
Degasperi, A. et al. A practical framework and online tool for mutational signature analyses show inter-tissue variation and driver dependencies. Nat. Cancer 1, 249–263 (2020).
Moore, L. et al. The mutational landscape of human somatic and germline cells. Nature 597, 381–386 (2021).
Rahbari, R. et al. Timing, rates and spectra of human germline mutation. Nat. Genet. 48, 126–133 (2015).
Pálsson, G. DecodeGenetics/PalssonEtAl_Nature_2024: Initial release of data. Zenodo https://doi.org/10.5281/zenodo.14025565 (2024).
hakon-jon-deCODE. hakon-jon-deCODE/Arnadottir_et_al_2025_Nature: Initial release of data. Zenodo https://doi.org/10.5281/zenodo.15183295 (2025).


Acknowledgements

We thank everyone who agreed to participate in this study; all colleagues at the Gynaecological Acute Clinics who helped us inform patients about the project and arranged inclusions; S. B. Blom, I. M. Søndergaard, S. S. Dandanell, C. Brinkmann, M. M. B. S. Petersen, Y. I. E. Sammaa-Aru, N. Vinterberg and E. S. Juel for their effort in including participants and collecting and packing the whole-blood samples and pregnancy tissue; and M. Visser for preparing the samples in the laboratory. We acknowledge deCODE genetics/Amgen, Department of Cellular and Molecular Medicine at University of Copenhagen, and Department of Obstetrics and Gynaecology at Copenhagen University Hospital, Amager Hvidovre for financial support for this study; and Ole Kirks Foundation, Danish National Research Foundation (DNRF115), AP Moller Foundation, Independent Research Fund Denmark (0134-00299B and 2034-00263B), BioInnovation Institute Foundation (grant number: BII22SG1021020), and the Novo Nordisk Foundation (grant numbers: NNF15OC0016662, NNF22OC0077221, NNF23OC0087269, NNF22OC0074308 and NNF20SA0064340).

Author information

These authors contributed equally: Gudny A. Arnadottir, Hakon Jonsson, Tanja Schlaikjær Hartwig, Jennifer R. Gruhn
These authors jointly supervised this work: Eva R. Hoffmann, Henriette Svarre Nielsen, Kari Stefansson

Authors and Affiliations

deCODE genetics/Amgen, Reykjavik, Iceland
Gudny A. Arnadottir, Hakon Jonsson, Arnaldur Gylfason, Asmundur Oddsson, Lilja Stefansdottir, Louise le Roux, Valgerdur Steinthorsdottir, Kristjan H. Swerford Moore, Sigurgeir Olafsson, Pall I. Olason, Hannes P. Eggertsson, Gísli H. Halldórsson, G. Bragi Walters, Hreinn Stefansson, Sigurjon A. Gudjonsson, Gunnar Palsson, Brynjar O. Jensson, Run Fridriksdottir, Agnar Helgason, Gudmundur L. Norddahl, Jona Saemundsdottir, Olafur Th. Magnusson, Bjarni V. Halldorsson, Daniel F. Gudbjartsson, Patrick Sulem, Unnur Thorsteinsdottir & Kari Stefansson
Department of Obstetrics and Gynaecology, Copenhagen University Hospital Hvidovre, Hvidovre, Denmark
Tanja Schlaikjær Hartwig, David Westergaard, Henriette Svarre Nielsen, Sofie Bliddal, Karina Banasik, Astrid Marie Kolte, Nina la Cour Freiesleben & Helene Westring Hvidman
DNRF Center for Chromosome Stability, Department of Cellular and Molecular Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
Jennifer R. Gruhn, Andrew Chi-Ho Chan, Jesper Friis Petersen & Eva R. Hoffmann
Department of Health Science and Technology, Aalborg University, Aalborg, Denmark
Peter Loof Møller, Mette Nyegaard & Palle Duun Rohde
Department of Obstetrics and Gynaecology, Copenhagen University Hospital-North Zealand, Hillerød, Denmark
Jesper Friis Petersen
Department of Obstetrics and Gynaecology, Copenhagen University Hospital Herlev, Herlev, Denmark
Jesper Friis Petersen
School of Technology, Reykjavik University, Reykjavik, Iceland
Bjarni V. Halldorsson
School of Engineering and Natural Sciences, University of Iceland, Reykjavik, Iceland
Daniel F. Gudbjartsson
Department of Congenital Disorders, Statens Serum Institut, Copenhagen, Denmark
Mette Nyegaard
Faculty of Medicine, University of Iceland, Reykjavik, Iceland
Unnur Thorsteinsdottir & Kari Stefansson
Department of Clinical Medicine, Faculty of Health and Medical Sciences, University of Copenhagen, Copenhagen, Denmark
Henriette Svarre Nielsen
Department of Clinical Immunology, Copenhagen University Hospital Rigshospitalet, Copenhagen, Denmark
Maria Christine Krog, Sisse Rye Ostrowski, Erik Sørensen & Mille Løhr
BioInnovation Institute, Copenhagen, Denmark
Markus J. Herregård
Hvidovre Hospitals NIPT Center, Department of Clinical Biochemistry, Copenhagen University Hospital Hvidovre, Hvidovre, Denmark
Louise Ambye
Section of Social Medicine, University of Copenhagen, Copenhagen, Denmark
Lone Schmidt
Department of Clinical Biochemistry, Copenhagen University Hospital Herlev, Herlev, Denmark
Pia Rørbæk Kamstrup
Department of Gynecology and Obstetrics, Copenhagen University Hospital Nordsjælland, Hillerød, Denmark
Ellen Christine Leth Løkkegaard
Department of Gynecology and Obstetrics, Copenhagen University Hospital Herlev, Herlev, Denmark
Helle Ejdrup Bredkjær

Authors

Gudny A. Arnadottir
Hakon Jonsson
Tanja Schlaikjær Hartwig
Jennifer R. Gruhn
Peter Loof Møller
Arnaldur Gylfason
David Westergaard
Andrew Chi-Ho Chan
Asmundur Oddsson
Lilja Stefansdottir
Louise le Roux
Valgerdur Steinthorsdottir
Kristjan H. Swerford Moore
Sigurgeir Olafsson
Pall I. Olason
Hannes P. Eggertsson
Gísli H. Halldórsson
G. Bragi Walters
Hreinn Stefansson
Sigurjon A. Gudjonsson
Gunnar Palsson
Brynjar O. Jensson
Run Fridriksdottir
Jesper Friis Petersen
Agnar Helgason
Gudmundur L. Norddahl
Palle Duun Rohde
Jona Saemundsdottir
Olafur Th. Magnusson
Bjarni V. Halldorsson
Sofie Bliddal
Karina Banasik
Daniel F. Gudbjartsson
Mette Nyegaard
Patrick Sulem
Unnur Thorsteinsdottir
Eva R. Hoffmann
Henriette Svarre Nielsen
Kari Stefansson

Consortia

The COPL Consortium

Henriette Svarre Nielsen, Tanja Schlaikjær Hartwig, Sofie Bliddal, David Westergaard, Karina Banasik, Maria Christine Krog, Astrid Marie Kolte, Nina la Cour Freiesleben, Sisse Rye Ostrowski, Erik Sørensen, Mille Løhr, Markus J. Herregård, Eva Hoffmann, Jenny Gruhn, Andy Chi Ho Chan, Unnur Þorsteinsdóttir, Kári Stefánsson, Hákon Jónsson, Ólafur Þ. Magnússon, Valgerdur Steinthorsdottir, Louise Ambye, Lone Schmidt, Pia Rørbæk Kamstrup, Mette Nyegaard, Ellen Christine Leth Løkkegaard, Helle Ejdrup Bredkjær & Helene Westring Hvidman

Contributions

G.A.A., H.J., T.S.H., J.R.G., E.R.H., H.S.N. and K.S. designed the study. G.A.A., H.J., T.S.H., J.R.G., D.W., A.C.-H.C., L.l.R., G.B.W., H.S., J.F.P., G.L.N., J.S., O.T.M., S.B., K.B., E.R.H. and H.S.N. collected samples and data. G.A.A., H.J., T.S.H., J.R.G., A.C.-H.C., L.l.R., J.S., O.T.M., S.B. and H.S.N. performed experiments. G.A.A., H.J., T.S.H., J.R.G., A.G., V.S., K.H.S.M., S.O., A.H., O.T.M., B.V.H., S.B., D.F.G., M.N., P.S., U.T. and E.R.H. designed the methods. G.A.A., H.J., T.S.H., J.R.G., P.L.M., A.G., D.W., A.C.-H.C., A.O., L.S., K.H.S.M., S.O., P.I.O., G.H.H., S.A.G., G.P., B.O.J., R.F., P.D.R. and K.B. analysed the data. G.A.A., H.J., T.S.H., J.R.G., E.R.H., H.S.N. and K.S. wrote the manuscript. All of the authors contributed to the final version of the manuscript.

Corresponding authors

Correspondence to Hakon Jonsson, Eva R. Hoffmann, Henriette Svarre Nielsen or Kari Stefansson.

Ethics declarations

Competing interests

The authors affiliated with deCODE genetics/Amgen declare competing financial interests as employees.

Peer review

Peer review information

Nature thanks Christopher Barnett, Ziyue Gao and the other, anonymous reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Schematic view of meiosis I error.

Missegregation of homologous chromosomes during meiosis I. See Fig. 2 for meiosis II error.

Extended Data Fig. 2 Crossover locations in paternal triploidies.

See Fig. 4b for the crossover locations in maternal triploidies.

Extended Data Fig. 3 Spectra of DNMs present in single/multiple fetal samples.

Here, we restricted to pregnancy losses with multiple fetal samples and aggregated the DNMs based on whether they were present in a single sample or in multiple samples per mutation class. For comparison, the bottom panel is the mutational spectrum for DNMs identified in Icelanders.

Extended Data Fig. 4 Number of DNMs as a function of allelic balance.

Aggregation was done on the fetal sample level and the entire DNM set was used.

Extended Data Fig. 5 Number of DNMs per fetal sample, tissue type and pathogenic carrier status against paternal age at conception.

The black line is the regression line from the Icelandic set, the coloured lines are regression lines for subsets of the COPL cohort and the shaded areas represent 95% CIs. a, all fetal samples regardless of karyotype. b, excluding triploidies. c, only triploidies. d, same as b but coloured according to whether the fetus is a carrier of a pathogenic DNM.

Extended Data Fig. 6 Parental ages at conception.

a, the parental ages at conception for the pregnancy losses. b, the parental ages at conception for the Icelandic set. c-d, histogram of the parental ages at conception for the pregnancy losses. e-f, histogram of the parental ages at conception for the Icelandic set.

Extended Data Fig. 7 Age regression - Poisson shared intercept/slopes models.

The analysis was restricted to non-triploid samples with 95% power of detecting DNMs. a, paternal age effect. b, maternal age effect. c, shared intercept at conception. d, shared intercept at onset of puberty (13 years). e, villi coefficient. f, deCODE vs COPL-villi difference against deCODE vs COPL-non-villi difference. In f, the age effects are shared across the COPL and the deCODE cohorts, the intercept is shared between the sexes and the regression is done separately per mutational class. In a-d and f the solid lines are identity lines. The centres represent means and error bars are 95% CIs based on a block jackknife procedure. The estimates are based on 293 pregnancy loss trios and 9,430 Icelandic trios.

Extended Data Fig. 8 Allele balance of DNMs conditioned on sister chromatid or homologous chromosome state.

Extended version of Fig. 5e with both sister chromatid and homologous chromosome state. The vertical line is an allelic balance of 50%.

Extended Data Fig. 9 Schematic overview of dispermy derived from a single spermatogonium or distinct spermatogonia.

Paternal triploidies caused by spermatozoa from the same spermatogonium fertilizing the oocyte are expected to have pre-sister mutations in 2/3 allelic balance. For distinct spermatogonia the allelic balance of pre-sister and post-sister mutations will be the same, i.e. 1/3.
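The expected values follow from counting how many of the three chromosome copies in a triploid fetus carry the mutation. A minimal sketch of that arithmetic is given below, assuming that sequencing reads sample the three copies equally; the function name `expected_allelic_balance` is hypothetical and chosen only for this example.

```python
from fractions import Fraction


def expected_allelic_balance(mutated_copies: int, ploidy: int = 3) -> Fraction:
    """Expected fraction of reads carrying a mutation that is present on
    `mutated_copies` of the `ploidy` chromosome copies (equal sampling assumed)."""
    if not 0 <= mutated_copies <= ploidy:
        raise ValueError("mutated_copies must be between 0 and ploidy")
    return Fraction(mutated_copies, ploidy)


# Same spermatogonium: a pre-sister mutation is carried by both paternal
# copies, that is, 2 of the 3 copies in the fetus.
same_spermatogonium_pre_sister = expected_allelic_balance(2)

# Distinct spermatogonia (pre-sister or post-sister mutation): only one of
# the 3 copies carries the mutation.
distinct_spermatogonia = expected_allelic_balance(1)
```

Under these assumptions the two scenarios give 2/3 and 1/3, matching the values stated in the legend.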

Extended Data Fig. 10 Expression of genes mutated in euploid versus aneuploid/triploid fetuses.

Based on fetal tissue expression data from Cao et al., showing expression for 15 fetal tissues for n = 16 genes mutated in euploid fetuses and n = 16 genes mutated in aneuploid/triploid fetuses. Genes mutated in euploid fetuses are on average more highly expressed in fetal tissues than genes mutated in aneuploid/triploid fetuses. The box plots show the median expression value (transcripts per million), box limits correspond to the first (25%) and third (75%) quartiles, and whiskers extend to the smallest and largest values observed within 1.5 times the interquartile range.

Supplementary information

Supplementary Information

Supplementary Notes, Supplementary Figures, Supplementary Tables 1 and 5, legends for Supplementary Tables 1–9 and Supplementary References.

Reporting Summary

Peer Review File

Supplementary Tables

Supplementary Tables 2–4 and 7–9.

Source data

Source Data Fig. 1 and 3–5

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.


About this article

Cite this article

Arnadottir, G.A., Jonsson, H., Hartwig, T.S. et al. Sequence diversity lost in early pregnancy. Nature 642, 672–681 (2025). https://doi.org/10.1038/s41586-025-09031-w


Received: 16 July 2024
Accepted: 16 April 2025
Published: 21 May 2025
Version of record: 21 May 2025
Issue date: 19 June 2025
DOI: https://doi.org/10.1038/s41586-025-09031-w
