Abstract
From fertilization onwards, the cells of the human body acquire variations in their DNA sequence, known as somatic mutations. These postzygotic mutations arise from intrinsic errors in DNA replication and repair, as well as from exposure to mutagens. Somatic mutations have been implicated in some diseases, but a fundamental understanding of the frequency, type and patterns of mutations across healthy human tissues has been limited. This is primarily due to the small proportion of cells harbouring specific somatic variants within an individual, making them more challenging to detect than inherited variants. Here we describe the Somatic Mosaicism across Human Tissues Network, which aims to create a reference catalogue of somatic mutations and their clonal patterns across 19 different tissue sites from 150 non-diseased donors and develop new technologies and computational tools to detect somatic mutations and assess their phenotypic consequences, including clonal expansions. This strategy enables a comprehensive examination of the mutational landscape across the human body, and provides a comparison baseline for somatic mutation in diseases. This will lead to a deep understanding of somatic mutations and clonal expansions across the lifespan, as well as their roles in health, in ageing and, by comparison, in diseases.
Main
Genetic diversity within the human population has been well described. The Human Genome Project resulted in the first near-complete mapping of the human DNA sequence1, and was followed by large-scale projects, such as the 1,000 Genomes Project2 and the Pangenome project3, that mapped the genetic diversity between individuals and populations. Now, there is growing recognition that extensive genetic variation exists within individuals among different tissues and cells. Two decades after completion of the first draft human genome, the Somatic Mosaicism across Human Tissues (SMaHT) Network plans to map the genetic diversity across different tissues and cells within individuals.
From fertilization onwards, the cells of the human body continuously experience damage to their genome, either from intrinsic causes or from exposure to mutagens4,5,6,7,8,9. Although the vast majority of DNA damage is repaired, and the genome is replicated with extremely high fidelity, cells steadily acquire somatic mutations throughout life. All cells within an individual harbour somatic mutations, but any given mutation is present in only a subset of the cells, or even in a single cell. Hence, somatic mutations are often described as mosaic10,11.
The detection of somatic mutations is challenging. In contrast to inherited variants, somatic mutations only exist in small and variable proportions of cells, ranging from embryonic mutations present in most cells to mutations present in just a single cell (Fig. 1a). This challenge is exacerbated by the introduction of artefacts and errors resembling low-frequency mutations during DNA library preparation and sequencing12. Current short-read sequencing technologies limit detection of mutations in repetitive regions of the genome and are likely to be less suitable for detection of somatic structural variations.
Fig. 1. a, Schematic comparison between inherited variants, an early somatic mutation and a late somatic mutation. b, Overview of causes and types of somatic mutations. EN, endonuclease; ME, mobile element; ORF, open reading frame; RT, reverse transcriptase; ssDNA, single-stranded DNA. c, Overview of the reported mutation rates of somatic SNV across developmental stages and tissues. Data of first cell divisions6,7,59 and later cell divisions6,7,59 are SNVs per cell per division. Data from fetal development of the early central nervous system (CNS)9 and placenta62 are SNVs per cell per day. Adult data are SNVs per year and estimated for seminiferous tubules48, haematopoietic stem cells26,52,144, B lymphocytes52, neurons63,145, T lymphocytes52, bronchial epithelium53, gastric epithelium146, endometrial epithelium79, hepatocytes19, small bowel epithelium19,115, colorectal epithelium19,24,29 and cardiomyocytes49. ZGA, zygotic genome activation.
Although most somatic mutations are probably functionally neutral13, some can profoundly alter the phenotype of a cell and are implicated in a wide variety of diseases. Many insights have come from sequencing the genomes of cancers14, the best-known example of disease arising from somatic mutation, but mutagenesis in tumours is often accelerated, and normal mutational patterns are distorted by genome instability. More recently, mapping the patterns of somatic mutations in normal tissues, exemplified by efforts of the Brain Somatic Mosaicism Network and other studies3,4,5,6,15,16,17,18,19,20,21,22,23,24,25,26, has identified a role for somatic mutations in developmental syndromes, neurological diseases and inflammatory disorders27,28,29,30,31,32,33,34,35,36,37,38,39. Despite these efforts, there is currently no comprehensive reference dataset of somatic mosaicism across many tissues of a large pool of donors.
In this Perspective article, we describe the SMaHT Network, initiated by the NIH Common Fund, which aims to generate a reference catalogue of somatic variation from 150 donors in 19 non-diseased tissue sites. To advance the field, the SMaHT Network will perform a comprehensive discovery and analysis of all types of somatic mutations at an unprecedented scale: the joint analysis of mosaicism across many tissues per donor; the robust discovery of structural variants (SVs) through long-read sequencing and donor-specific assemblies; and the widespread and robust application of ultrasensitive sequencing technologies, such as duplex sequencing, across sequencing centres. Furthermore, beyond applying established sequencing assays at scale, the SMaHT Network has a strong emphasis on tool and technological development to enable the next generation of somatic mutation studies. Before describing the network goals in detail, we briefly review the current knowledge about somatic mutations in health and disease, as well as the technical challenges in detection of mutations. A large part of the SMaHT Network will focus on the development of technologies and computational tools to improve the detection of all types of somatic variation.
Somatic mutations in healthy tissues
Throughout the human lifespan, from conception to death, cells acquire mutations in their DNA6,7,8,9,40 (Fig. 1b). These somatic mutations can be the consequence of erroneous repair of damaged DNA bases or DNA strand breaks, errors during replication, chromosome missegregation or the integration of mobile elements. Somatic mutations can be divided into different types41: substitutions, the vast majority of which are single-nucleotide variants (SNVs); small (less than 50 bp) insertions and deletions (indels); SVs, including segmental duplications, large deletions, translocations, inversions, mobile element insertions (MEIs) and complex SVs, including chromothripsis and chromoplexy; and other large chromosomal aberrations, such as whole-chromosome gains and losses. Duplications, deletions and whole-chromosome gains and losses are also referred to as copy number variants (CNVs) or mosaic chromosomal alterations. These classes differ profoundly in their underlying causes and patterns across tissues and their phenotypic effects on cells. In normal tissues, SNVs are by far the most common type of somatic variation, followed by indels. SVs and large chromosomal aberrations are observed less frequently27, but typically affect more base pairs and thus may have larger functional effects. However, most previous studies on somatic mutations have relied on short-read DNA sequencing, which may fail to detect various types of SVs. Studies of germline differences have shown that SVs are far more abundant, but the majority are missed by short-read approaches42.
Different mutagenic processes cause distinct patterns of somatic mutation, depending on the types of DNA damage incurred and the pathways responsible for DNA lesion repair. Research over the past decade has deconvolved these patterns into mutational signatures and linked certain signatures to specific mutagens, such as ultraviolet light, tobacco smoke, chemotherapy or natural age-related accumulation of endogenous mutations43. Mutational signatures are most commonly applied to SNVs44, but they have been defined for other classes of somatic mutations, including indels44, chromosomal alterations45,46 and SVs44,47. In the context of SNVs, mutational signatures reflect the distribution of specific base changes within their trinucleotide contexts.
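To make the notion of a trinucleotide context concrete, the following is a minimal sketch of how a single SNV could be assigned to one of the 96 channels commonly used for single base substitution signatures (6 pyrimidine-centred substitution types × 4 possible 5′ bases × 4 possible 3′ bases). The function names and the 'A[C>T]G' label format are illustrative assumptions and are not taken from any SMaHT pipeline.

```python
from collections import Counter
from itertools import product

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def sbs_channel(context: str, alt: str) -> str:
    """Label one SNV by its trinucleotide context, e.g. 'A[C>T]G'.

    context: three reference bases with the mutated base in the middle.
    alt: the mutant base. Both are given on the reference strand.
    """
    context, alt = context.upper(), alt.upper()
    if len(context) != 3 or len(alt) != 1 or set(context + alt) - set("ACGT"):
        raise ValueError("expected a 3-base context and a 1-base alt of A/C/G/T")
    if alt == context[1]:
        raise ValueError("alt base equals the reference base")
    if context[1] in "AG":
        # Purine reference: report the change on the opposite strand so that
        # the mutated base is always a pyrimidine (C or T).
        context = "".join(COMPLEMENT[b] for b in reversed(context))
        alt = COMPLEMENT[alt]
    return f"{context[0]}[{context[1]}>{alt}]{context[2]}"


def all_channels() -> list[str]:
    """Enumerate 6 substitution types x 16 flanking base pairs = 96 channels."""
    subs = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]
    return [
        f"{five}[{sub}]{three}"
        for sub in subs
        for five, three in product("ACGT", repeat=2)
    ]


def spectrum(snvs) -> Counter:
    """Count SNVs per channel; snvs is an iterable of (context, alt) pairs."""
    return Counter(sbs_channel(context, alt) for context, alt in snvs)
```

In this sketch, a G>A change in the context CGT and a C>T change in the context ACG fall into the same channel, because they describe the same event read from opposite strands. Signature extraction then operates on the resulting 96-element count vector per sample.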
All normal tissues, including post-mitotic cells, exhibit SNV mutational signatures linked to clock-like endogenous processes (single base substitution signature 1 (SBS1) or SBS5) and, to a lesser extent, oxidative damage (SBS18)48,49,50. Mutational signatures linked to mutagenic exposure can be confined to specific organs, such as UV damage (SBS7) in the skin51 or skin-resident T lymphocytes52, damage from tobacco smoke (SBS4) in the bronchial epithelium of the lung53 and exposure to a genotoxic strain of Escherichia coli (SBS88)24 in the large intestine. These exposure differences drive some of the variation in the types of somatic mutations observed across different tissues of the human body7,48,54. Furthermore, different mutational processes show different correlations with genomic features, such as replication timing, replication strand and transcription strand55,56,57,58, reflecting genomic biases of DNA damage and repair.
The somatic mutation rate varies across human tissues and life stages (Fig. 1c). During the initial embryonic cell divisions, somatic SNVs accumulate at a high rate of approximately three per division, probably due to the high division rate and delayed activation of the zygotic genome6,7,59. Afterwards, the mutation rate decreases (approximately one SNV per division) during development in utero, in both embryonic tissues, such as the fetal brain9,60,61, and extraembryonic tissues, such as the placenta62. After birth, mutation rates further decline 5–10-fold and vary substantially across tissues, from 16–20 SNVs per year in post-mitotic cells such as neurons21,50,61,63 to 44 SNVs per year in colonic stem cells24 (Fig. 1c). Germ cells have the lowest somatic mutation rate reported23, in line with the parental age effect on de novo germline mutations48. Although division rate may influence the endogenous somatic mutation rate, there are probably other factors that modulate both mutagenesis and repair of DNA damage64,65,66.
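As a rough illustration of what these yearly rates imply, the sketch below projects the number of SNVs accumulated per cell after birth. It assumes a constant rate per year and, by default, ignores mutations acquired before birth; both are simplifying assumptions made here for illustration, and the function name is hypothetical.

```python
def expected_snvs(age_years: float, snvs_per_year: float, at_birth: float = 0.0) -> float:
    """Linear projection of the per-cell SNV burden at a given age.

    Assumes a constant postnatal rate; at_birth is an optional offset for
    mutations acquired during development (unknown here, so 0 by default).
    """
    if age_years < 0 or snvs_per_year < 0 or at_birth < 0:
        raise ValueError("age, rate and offset must be non-negative")
    return at_birth + snvs_per_year * age_years


# Yearly rates quoted in the text above.
RATES_PER_YEAR = {
    "neuron (lower bound)": 16,
    "neuron (upper bound)": 20,
    "colonic stem cell": 44,
}

if __name__ == "__main__":
    for cell_type, rate in RATES_PER_YEAR.items():
        print(cell_type, expected_snvs(60, rate))
```

Under these assumptions, a 60-year-old donor would carry on the order of 960 to 1,200 postnatal SNVs per neuron and about 2,640 per colonic stem cell.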
Large somatic mutations such as SVs, chromosomal alterations and MEIs are detected much less frequently than SNVs and indels. Although somatic aneuploidy appears to be rare, sub-chromosomal structural variations affect 13–41% of neurons18,34,67,68. Frequent CNVs, mostly duplications of likely developmental origin, have been detected in approximately 7% of brains from the Brain Somatic Mosaicism Network consortium34, and mosaic chromosomal alterations have been observed in approximately 5% of blood samples in the UK Biobank69. Single-neuron DNA sequencing of mobile element-enriched libraries or whole genomes has revealed MEI events that appear to occur during development and create mosaicism in the human brain5,70,71. Bulk sequencing approaches have also detected a few examples of somatic MEIs in the brain72 and non-brain tissues including the heart73, fibroblasts73 and liver74. Recent somatic MEI profiling in colorectal epithelial single-cell clones has indicated peak insertion rates during early embryogenesis75. Considering the potential effect of these large mutations on the sequence, splicing or expression of genes76,77, it is valuable to understand their prevalence across human tissues during development and ageing.
Although most somatic mutations do not discernibly affect the phenotype of a cell, some somatic mutations are under selection in different tissues. Such driver mutations may lead to a proliferative advantage or increased survival of the cell and its progeny, resulting in clonal expansions in tissues. Cancer is the canonical example of somatic evolution and often involves the stepwise accumulation of key somatic mutations and genomic instability1,78. Mutations typically associated with cancer can be abundant across normal tissues with age. For comparison, in a typical individual of 60 years of age, approximately 90% of the endometrial epithelium harbours a driver mutation79, whereas this is true of only about 1% of the colonic epithelium24, despite the latter having a much higher somatic mutation rate24,79. This difference is probably caused by the menstrual cycles of shedding and regrowth in the endometrium. Probably due to similar clonal expansion in development or ageing, about 6% of individuals harbour a 3–20-fold higher than average number of detectable SNVs in their brain34. These varying proportions of clonally expanded cell populations probably reflect differences in tissue architecture, cell turnover, regeneration and selection pressures, but much is still unknown.
Although many driver mutations in normal tissues can be identical to those found in corresponding cancer types, their abundance and phenotypic consequence may differ profoundly, as normal tissues may experience different selection pressures than cancers. For example, clones with NOTCH1 mutations are exceedingly abundant in normal oesophageal epithelium, at even higher rates than in oesophageal cancers80. NOTCH1-mutant clones have a lower propensity for malignant transformation and even outcompete precancerous clones in the oesophagus81,82. These observations suggest that characterizing the somatic mutation landscape in normal individuals will be important to understand the role of these mutations in pathological phenomena such as cancer.
Finally, somatic mutations can be used as intrinsic barcodes to create phylogenies and trace the ancestries of cells, such that it becomes possible to quantitatively study human development from somatic mutations ascertained in adult donors6,7,20,25,40,58,59,60,70,83. This approach has been applied to studying embryogenesis, clonal expansions across the lifespan and the origins of childhood cancers84. As the allele frequency of a mutation reflects the fraction of cells within a population that harbours it, this method can be used to quantitatively assess the contribution of embryonic progenitors to the adult body. Such studies have found that one of the two daughter cells of the zygote often has at least twice as many descendant cells as the other6,7,20,25,40,58,59,60,70,83,85,86, probably due to cellular bottlenecks in embryogenesis, developmental cell death or migratory patterns, and confirming earlier observations in mice40,87.
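The relationship between allele frequency and cell fraction can be sketched as follows. The example assumes a mutation on one copy of an autosomal locus in diploid cells, so that the fraction of cells carrying it is twice the variant allele frequency (VAF); the function names and the ploidy parameters are illustrative assumptions.

```python
def cell_fraction(vaf: float, copies_per_cell: int = 2, mutated_copies: int = 1) -> float:
    """Fraction of cells carrying a mutation, inferred from its VAF.

    Assumes every mutant cell carries `mutated_copies` mutant alleles out of
    `copies_per_cell` copies of the locus (2 and 1 for a heterozygous
    mutation on a diploid autosome).
    """
    if not 0.0 <= vaf <= 1.0:
        raise ValueError("vaf must lie between 0 and 1")
    if copies_per_cell < 1 or not 1 <= mutated_copies <= copies_per_cell:
        raise ValueError("need 1 <= mutated_copies <= copies_per_cell")
    # Sampling noise can push the estimate slightly above 1, so clamp it.
    return min(1.0, vaf * copies_per_cell / mutated_copies)


def contribution_ratio(vaf_lineage_a: float, vaf_lineage_b: float) -> float:
    """Ratio of descendant cells between two lineages marked by private mutations."""
    return cell_fraction(vaf_lineage_a) / cell_fraction(vaf_lineage_b)
```

Under these assumptions, a mutation marking one daughter cell of the zygote would sit at a VAF of about 0.25 if both daughters contributed equally to the adult tissue. VAFs of about 0.33 and 0.17 for mutations private to each daughter would instead correspond to a 2:1 contribution.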
Together, these initial studies on somatic mutations in normal tissues have shown the variability of rates, patterns and selection of mutations across tissues. It is unknown, however, how variable these patterns are between individuals and how different types of somatic mutations are correlated with inherited genetic background, environmental exposures or other behavioural characteristics. In addition, mutation discovery is severely hampered in poorly mapped regions of the genome, including acrocentric chromosomes, centromeric and repetitive regions, and, hence, the mutational patterns in these regions are largely unknown. Thus, identification of the differences in mutational patterns between tissues and individuals, particularly in the context of specific organs21,26,29,30,34,88,89, may have profound clinical implications.
Somatic mutations and disease
Somatic mutations can profoundly alter the phenotype of a cell and have been implicated in human diseases. Besides cancer, various other diseases and conditions can be a result of somatic mutations, including cardiovascular anomalies, immunological and neurological disorders26,30,31,32,33,34,52,90,91. Of note, early somatic mutations can cause clonal expansions and alterations in the differentiation programs of precursor cells that subsequently can lead to paediatric cancers and organ overgrowth84,92,93. Among the first described instances of somatic mutagenesis, PI3K–AKT–mTOR pathway mutations involving the brain were associated with brain malformations leading to intractable epilepsy33,94,95. Other examples are NRAS mutations leading to congenital melanocytic nevi96 and UBA1 mutations in haematopoietic stem cells97 leading to VEXAS syndrome, a rare and severe inflammatory disorder. Somatic expansions of short tandem repeats in the brain can cause cell death and neurodegeneration98, and underpin Huntington disease99,100. Large SVs, including CNVs and MEIs, have also been implicated in neurodevelopmental and neurodegenerative disorders72,101,102.
The effects of somatic mutations can be highly specific to the timing and tissue of origin. For example, an activating PIK3CA mutation acquired during development can lead to widespread overgrowth across organs and vascular malformations91. However, PIK3CA mutations acquired after development can lead to cavernomas in the brain103 and are also a common driver mutation observed in normal colonic24 and endometrial epithelium79.
Clonal expansions can also indirectly lead to or influence other diseases26. An example is clonal haematopoiesis of indeterminate potential (CHIP), characterized by a clonal expansion within the haematopoietic stem cell compartment driven by somatic mutations. CHIP is highly prevalent in the context of normal ageing26. Besides acting as a potential cancer precursor clone, CHIP has been linked to various non-cancer diseases, such as an increased risk of cardiovascular disease104 and infections105.
Conversely, diseases can also select clones with certain adaptive somatic mutations. Recently, inflammatory bowel disease has been shown to lead to the preferential remodelling of the colonic epithelium with clones harbouring IL-17 and Toll-like receptor pathway mutations29,106. Likewise, chronic liver disease selects for clones of hepatocytes that escape the toxicity imposed by the disease, notably, by recurrent, independent mutations in FOXO1, CIDEB and GPAM, which are all involved in lipid metabolism89.
Together, research over the past years has shown that somatic evolution is ubiquitous in normal tissues and is fundamental to our understanding of the causes, mechanisms and consequences of disease, and the normal process of ageing.
The SMaHT Network
The SMaHT Network, funded by the NIH Common Fund, was established with the goal of transforming our understanding of how somatic variation in human cells influences biological processes. The SMaHT Network will accomplish this through the following aims: (1) generate a comprehensive dataset of somatic variants across human tissues (Fig. 2); (2) develop tools and technologies to optimize detection and characterization of various types of somatic variants; and (3) create a somatic mutation database that is widely used by researchers and the wider public, and interoperable with similar datasets.
Fig. 2. Overview of sampling from 19 primary tissue sites, spanning three developmental germ layers (endoderm, mesoderm and ectoderm) and germ cells. Although organs represent a mixture of cells derived from the germ layers (for example, skin epidermis (ectoderm) versus dermis (mesoderm), and adrenal gland medulla (ectoderm) versus cortex (mesoderm)), we have indicated the major germ layer represented by each organ. Gonads represent germ cells and their supportive structures (mesoderm), whereas buccal swabs are variable mixtures of germ layers (mesoderm and ectoderm).
The Network comprises five Genome Characterization Centres (GCCs), 14 Tool and Technology Development projects (TTDs), an Organizational Centre (OC), a Data Analysis Centre (DAC) and a Tissue Procurement Centre (TPC), and includes over 250 researchers from 52 institutions. The GCCs are tasked with producing a core dataset of somatic mutations for the SMaHT Network from multiple tissues collected by TPC, whereas TTDs are tasked with developing novel experimental assays and computational tools. The DAC will integrate the data generated by GCCs and TTDs to build the somatic mutation catalogue, data portal and the analysis work bench for the Network. The OC will coordinate the Network activities and focus on outreach efforts and building liaison with other genomics consortia. The SMaHT Network has implemented a set of policies (https://smaht.org/policies/), including a policy to allow external researchers to apply for associate membership of the Network.
The tissues to be profiled by the Network include those arising from the three germ layers and germlines within the human body, which will give the opportunity to delineate early somatic mutations that are common across all tissues, as well as later mutations that are unique to certain tissues (Fig. 2). The TPC is partnering with multiple organ procurement organizations (OPOs) in the USA for the screening, authorization and recovery of tissues from post-mortem organ and tissue donors. Tissues will be collected following transplant recovery and include the ascending and descending colon, oesophagus, lung and liver (predominantly endoderm); blood, heart, aorta and skeletal muscle (predominantly mesoderm); and the brain, adrenal gland, sun-exposed and non-sun-exposed skin (predominantly ectoderm). We also aim to collect buccal swabs to assess the extent of the somatic mutation landscape that can be gleaned from clinically accessible tissues in living donors. To study mutagenesis in germ cells, we also aim to collect ovaries and testes. Finally, to enable various experimental techniques requiring live cells, we will derive fibroblast cultures from the dermis (skin). All tissues are requested to be recovered from each donor approached for the SMaHT tissue collection. The number and type of samples collected from each donor will vary based on donor authorization and eligibility (Box 1), but the goal is to recover as many tissues from a single donor as possible. To study the mechanisms and consequences of somatic mosaicism across the lifespan, these post-mortem donors will span the human adult age ranges from 18 to over 85 years. The race and ethnicity of donors are assessed using a single-question framework.
To maximize the scientific and clinical impact of the dataset, the TPC will collect a large amount of donor metadata during donation and biospecimen collection, building on practices developed for the Genotype-Tissue Expression (GTEx)107 and developmental GTEx projects108. De-identified donor-level data will include demographic information, medical history, sample-based laboratory test results and death circumstances. Sample-level data will include tissue type and location, ischaemic time and tissue metrics from pathology review. Pathology images will be made publicly available. When possible, tissue sampling will align with the common coordinate framework structure of other large-scale projects. For all of these biospecimens, sufficient fresh-frozen material will be collected and banked to enable all core assays as well as implementation of novel emerging technologies. Fixed samples for pathology review will be collected from adjacent sites to the fresh-frozen specimens utilizing a standardized collection schema developed for each tissue type.
To pursue a demographically robust and evenly sex-distributed pool of donors, the SMaHT Network includes an ethical, legal and social implications project109 consistent with the recommendations of the American Society for Human Genetics to address under-representation in human genomics studies with meaningful engagement of under-represented communities109. This ethical, legal and social implications substudy engages geographically, racially, ethnically and socioculturally diverse stakeholders, which include family decision-makers, tissue requesters, community advisory board members and multi-disciplinary specialty committee members throughout the entire duration of the SMaHT Network. Feedback from community stakeholders will be leveraged to inform communication and enrolment efforts as well as dissemination of study findings.
The SMaHT Network is uniquely positioned to collaborate with many other large consortia and programmes. These include the Human Pangenome Reference Consortium9, to leverage methods for constructing haplotype-phased genome assemblies; the Impact of Genomic Variation on Function Consortium110, to understand the functional consequences of genetic variation; the developmental GTEx project108, to access datasets from tissues at early developmental stages; the Human Tumor Analysis Network and PreCancer Atlas, to further understand the progression from normal cells to tumour cells through somatic mutations; and PsychENCODE111, to inform on the phenotypic consequences of brain somatic mosaicism. These collaborations will enrich the individual studies and, ultimately, through data integration and cross-network analyses, further enhance our understanding of the context and consequences of somatic mutations.
Producing the somatic mutation catalogue
To produce the first phase of the somatic mutation catalogue, the SMaHT Network will strike a balance between standard genomic assays, productionized and applied uniformly by the GCCs to all tissues, and bespoke assays developed by the TTD projects, focusing on novel technological approaches. As part of the initial phase of the SMaHT project, benchmarking efforts are nearing completion, using both primary human tissues and cell lines. We have used this benchmarking to determine optimal sequencing coverage, compare the accuracy of variant calling algorithms, and evaluate the utility of long-read and short-read sequencing data generated on diverse sequencing platforms from multiple GCCs.
The GCCs will deploy three core assays across all tissue specimens that meet quality thresholds: deep short-read whole-genome sequencing (WGS; over 300× coverage), long-read WGS (over 30× coverage) and RNA sequencing (over 50 million reads). The deep short-read WGS will enable the discovery of high allele frequency somatic mutations across tissues acquired early in embryogenesis, as well as discovery of the large clonal expansions arising later in life. As these core assays will be performed on bulk tissues, composed of heterogeneous cell types, only mutations with a relatively high variant allele frequency (above 1–2%) will be accurately detectable at the proposed depth of sequencing. The long-read WGS will facilitate the detection of complex SVs, MEIs and variants in complex genetic loci that have been challenging to accurately study using short-read data, such as the MHC region, centromeres, telomeres, acrocentric DNA including ribosomal DNA and other tandem-repeat regions of the genome. Ultra-long-read sequencing will enable us to generate near telomere-to-telomere donor-specific reference genome assemblies for at least 50 donors and, by reducing misalignment, enhance the discovery of diverse types of variants within an individual50, including complex somatic SVs and other mutations in previously unmappable regions of the genome112. Finally, RNA sequencing may allow us to assess transcriptional consequences of early mutations and late clonal expansions, as well as, by comparison with single-cell RNA sequencing atlases113, cell-type composition of heterogeneous tissues.
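The link between sequencing depth and the lowest detectable variant allele frequency can be illustrated with a simple binomial model. The sketch below computes the probability of observing at least a minimum number of variant-supporting reads at a site. The threshold of three reads is an assumption chosen for illustration, and the model ignores sequencing errors, mapping biases and the behaviour of any particular variant caller.

```python
from math import comb


def detection_power(depth: int, vaf: float, min_alt_reads: int = 3) -> float:
    """Probability of seeing at least `min_alt_reads` variant reads at a site.

    Models the number of variant reads as Binomial(depth, vaf), i.e. each
    read independently samples the variant allele with probability `vaf`.
    """
    if depth < 0 or min_alt_reads < 0 or not 0.0 <= vaf <= 1.0:
        raise ValueError("depth and min_alt_reads must be >= 0, vaf in [0, 1]")
    # Probability of observing fewer than min_alt_reads variant reads.
    p_below = sum(
        comb(depth, k) * vaf**k * (1.0 - vaf) ** (depth - k)
        for k in range(min(min_alt_reads, depth + 1))
    )
    return max(0.0, 1.0 - p_below)


if __name__ == "__main__":
    for vaf in (0.001, 0.01, 0.02, 0.05):
        print(f"VAF {vaf:.3f} at 300x: power {detection_power(300, vaf):.2f}")
```

Under this idealized model, a variant at 1% VAF is expected to yield only about three supporting reads at 300× coverage, so it is missed in a substantial share of cases, whereas power rises steeply as the VAF approaches 2% and above.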
In addition to these core assays, GCCs will deploy three approaches specifically designed to profile low-frequency somatic mutation: duplex sequencing, single-cell WGS and transcript-based detection of mutations. These technologies, although published and well-tested, represent recent innovations and have not yet been systematically deployed across sequencing centres or applied at large scale.
As conventional DNA sequencing platforms have a non-trivial sequencing error rate (in the order of 1 in 1,000–10,000), a putative mutation needs to be detected in multiple independent reads to ensure that it is not artefactual. However, by sequencing both the forward and the reverse strands of each individual DNA duplex molecule, this error rate is drastically reduced. As the reduced error rate is much lower than the expected frequency of somatic mutations in most tissues, an average mutation burden and mutational profile can be obtained by shallow genome-wide duplex coverage (0.5–2×)63,114. Duplex sequencing of bulk tissue samples is well suited to finding average mutation burdens and spectra of SNVs and indels within cell populations, but the low depth generally precludes discovery of somatic CNVs and SVs, or the precise inference of variant allele frequency of specific mutations.
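The benefit of requiring agreement between both strands can be sketched with a back-of-the-envelope calculation. The example assumes that errors on the two strands are independent and that all three wrong bases are equally likely, so a false call requires two errors at the same position that happen to be complementary. These are idealizing assumptions: real duplex protocols may be limited by other artefacts, such as DNA damage copied to both strands during library preparation. The function names are hypothetical.

```python
def idealized_duplex_error(per_strand_error: float) -> float:
    """Per-base error rate when both strands must show the same change.

    Strand 1 errs with probability e; strand 2 must err at the same position
    (probability e) and produce the one complementary base out of three
    possible wrong bases (factor 1/3), giving e * e / 3.
    """
    if not 0.0 <= per_strand_error <= 1.0:
        raise ValueError("per_strand_error must lie between 0 and 1")
    return per_strand_error**2 / 3.0


def burden_per_bp(mutations_called: int, duplex_bases: int) -> float:
    """Average mutation burden: mutations called per duplex base interrogated."""
    if duplex_bases <= 0 or mutations_called < 0:
        raise ValueError("need duplex_bases > 0 and mutations_called >= 0")
    return mutations_called / duplex_bases


if __name__ == "__main__":
    for e in (1e-3, 1e-4):
        print(f"per-strand error {e:.0e}: idealized duplex error {idealized_duplex_error(e):.1e}")
```

With a per-strand error of 1 in 1,000, the idealized duplex error is roughly 3 in 10 million bases. Because each mutation is typically seen in only one duplex molecule at shallow coverage, the burden estimate describes the population average rather than the allele frequency of any individual variant.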
Even with a reduced sequencing error rate, bulk DNA sequencing will average out the mutational patterns of all cells and does not allow assessment of the variability of mutational patterns between cells or the reconstruction of cell lineages. Instead, sequencing the DNA of single cells or single-cell-derived clones will enable the most detailed discovery of somatic mutations. This can be achieved either by expanding single cells in vitro6,25,26,52,59 or by using laser capture microdissection to isolate naturally occurring clonal populations of cells7,24,79,88,115.
Alternatively, direct single-cell DNA sequencing is applicable to all cell types, including non-dividing cells. However, whole-genome amplification can cause allelic or locus dropout, uneven coverage across the genome and artefactual variants introduced during biochemical amplification. The direct library preparation (DLP+)116,117 method avoids whole-genome amplification and allows for the accurate detection of CNVs at the single-cell level and other mutations at the population level. The primary template-directed amplification (PTA)30,118 method offers a substantial improvement in data quality over previous single-cell amplification methods, resulting in more uniform genome coverage and fewer artefactual variants. A more recent version of PTA, the ResolveOme approach, profiles both the transcriptome and the genome from the same single cell. If validated, this approach will represent a major advance in allowing new mutation detection and cellular phenotyping at the same time. Profiling somatic mutations in single cells will enable us to characterize mutational patterns and associations between mutation types and to reconstruct phylogenetic trees of normal cells across tissues. In cases of polyploid cells, the variant allele frequencies of somatic mutations may deviate from the expected 0.5 and ploidy will need to be taken into consideration in downstream analyses.
Finally, at least some somatic mutations can be inferred from RNA119,120,121. Methods that allow for the interrogation of the full-length transcriptome in single cells, such as Smart-seq3 (ref. 122) or STORM-seq123, can facilitate the detection of somatic mutations, such as SNVs, indels and fusion genes within transcribed regions of the genome. This allows assessing cell-type specificity for clonal expansion of certain genetic variants. Furthermore, STORM-seq enables quantification of transposable element expression at single-cell resolution, which has been shown to be challenging with other single-cell RNA sequencing methods124. The single-cell data also provide references for a more precise deconvolution of cell types in bulk tissues.
Each of these methods for the detection of somatic mosaic variants presents its own advantages and disadvantages and thus they are complementary (Table 1). For example, although genome-wide duplex sequencing has a lower sequencing error rate and excels at population-level inferences of patterns of short mutations acquired during the entire lifespan, the low depth precludes detection of the precise allele frequency of a specific variant. Bulk sequencing at medium–high coverage (300×) will only detect variants at a sufficiently high frequency (that is, 1–2%) in tissues, which are mostly acquired in early embryogenesis. Single-cell sequencing can in principle detect all variants present in a single cell and allow reconstruction of cell phylogenies, but it requires significant costs and efforts to address genome amplification artefacts. RNA-based mutation discovery allows for direct integration of mutations with transcriptomic information but is naturally confined to expressed regions of the genome. Together, these genomic assays function as complementary techniques to detect somatic mutations and will enable the robust interrogation of mutational patterns across human tissues.
Areas of technological development
As new technologies to interrogate somatic mutations with high resolution or sensitivity are constantly emerging, a large part of the SMaHT Network is devoted to developing new tools and technologies (Box 2). The first area of innovation aims to increase the accuracy of mutation detection in single cells or molecules by further reducing background noise. For single-cell WGS, a limited cloning step to create small pools of cells can reduce allelic dropout and amplification artefacts. In parallel, the SMaHT Network aims to reduce the error rate of amplification and sequencing for single cells and molecules through various adaptations of duplex sequencing technologies63,125,126,127. These approaches will allow for the interrogation of the landscape of somatic variation in single cells and complex multicellular tissues with high precision, which is crucial to study tissues without large-scale expansions.
Second, the SMaHT Network aims to increase the sensitivity of SV detection to single molecules or cells. As SVs extend beyond the length of a typical short read, long-read sequencing unlocks SV detection across the genome, especially for MEIs and other rearrangements in repetitive regions128,129. However, many single-cell DNA amplification approaches result in short fragments. Therefore, we are applying long-read sequencing to clonal populations such as induced pluripotent stem cell lines, which have been used25 in lieu of single cells for lineage reconstructions as they can be expanded and analysed by bulk sequencing, avoiding in vitro DNA amplification. In addition, MEIs can be cost-effectively assessed by target enrichment assays as new insertions share conserved sequences in each transposon subfamily. We are developing targeted detection of MEIs by utilizing Cas9-targeted long-read sequencing130 and PTA-amplified micro-bulk or single cells73. These efforts will unlock the study of SVs and MEIs in all tissues and across the lifespan, even in the absence of clonal expansions.
Third, the SMaHT Network will develop scalable platforms that can perform variant detection spatially in human tissues, through single-cell DNA and RNA sequencing with resolved spatial barcodes131,132. This will allow us to study the prevalence and extent of clonal expansions across ages and tissues, especially in organs without a clearly organized tissue architecture.
An outstanding question is the effect of specific somatic mutations on the phenotype of the cells that harbour them. Although certain mutations are under positive selection and lead to clonal expansions, how these mutations alter cellular phenotypes is mostly unknown. The consequence of a mutation can be assessed by combining mutational readouts, obtained either through genotyping of specific mutations133,134 or through genome sequencing, with functional readouts of cells, such as the transcriptome, proteome, epigenome, methylome and the chromatin accessibility landscape135,136,137,138,139,140. Interpreting the phenotypic effects of somatic mutations will greatly benefit our understanding of their clinical consequences.
The efforts in tool and technological development within the SMaHT Network are focused on improving precision in somatic mutation detection and interpretation at scale, each addressing vital shortcomings of current assays, with a goal to productionize and deploy many of these within the Network at large. After the development phase, the precise extent and scope of the deployment of these assays across the SMaHT tissues and donors will depend on the cost, scalability and priorities of the Network.
Integration and analysis of data
The low variant allele frequency of mosaic variants brings unique challenges in bioinformatic analysis141, and we expect that novel computational methods and tools will be needed to fully analyse the data and to increase the sensitivity and specificity of variant detection. Somatic mutation detection algorithms developed in cancer genomics are often inadequate for detecting variants with allele fractions less than 2–5%, and simply increasing the depth of sequencing is not cost-effective. Thus, more sophisticated machine learning algorithms that efficiently incorporate various local features near candidate variants may prove useful136,137,138,142.
Other challenges include optimal integration of long-read and short-read data, inference of lineage relationships based on bulk and single-cell data, and effective strategies for integrative and comparative analysis of samples across the tissues and across individuals. An important aspect of our analysis will be the use of donor-specific diploid genomes assembled using short Illumina, long PacBio and ultra-long Nanopore and Hi-C reads. Alignment to the donor-specific reference genome135 will allow for more accurate variant identification, especially in repetitive regions, as well as for examination of allele-specific transcriptional and epigenetic modulations associated with genetic variants.
The SMaHT DAC will lead an effort to collect, curate and analyse the vast amount of multi-modal data generated on multiple platforms and to create a data resource for the scientific community. The DAC will ensure high data standards with various quality control steps and compile extensive metadata describing experimental and data processing protocols, following the FAIR (Findable, Accessible, Interoperable and Reusable) guidelines143. Scalable and cost-effective analytical workflows will be implemented on a cloud platform with full provenance and docker images to enable reproducibility of the analysis output.
The data generated by the consortium will be made available to the wider scientific community via a user-friendly and secure web portal (https://data.smaht.org). This portal will feature: (1) a reference catalogue of somatic variants that can be searched (for example, by locus, tissue or phenotypic features such as age) and annotated with information from other genomics databases; (2) a workbench that enables users to apply the computational pipelines developed by the SMaHT Network to their own data; and (3) data visualization tools including a multi-scale browser that allows users to navigate the data from a genome-level view to the sequencing read-level view. Visual inspection of variants using such a browser will be particularly helpful in assessing their quality, and the annotations will enable rapid identification of variants that may be functionally relevant.
Conclusion
The SMaHT Network aims to produce a comprehensive reference catalogue of somatic mutations, across tissues and individuals, by harnessing the full potential of many different genomic assays, including short-read and long-read bulk WGS, duplex sequencing, ultra-long-read sequencing, single-cell DNA sequencing and RNA sequencing (Fig. 3). The Network will develop new tools and technologies to increase our ability to detect somatic mutations as well as infer their phenotypic consequences at greater resolution. All of these various data modalities will be integrated, analysed and released to the research community and wider public.
Overview of sampling methods and sequencing assays deployed in the SMaHT Network, as well as the biological questions, outcomes and inferred mutational patterns from downstream analyses of the catalogue of somatic mutations across normal tissues, including mutation rates or burdens, selection, lineage tracing and mutational signatures (reference signatures were obtained from https://cancer.sanger.ac.uk/signatures)44. ZMW, zero-mode waveguide.
An extensive catalogue of somatic mutations will reveal mutational patterns, rates and signatures across tissues, allowing us to infer the biological and molecular processes that govern somatic mutagenesis and their adaptive and maladaptive consequences for development and disease (Fig. 3). Our assays can inform on mutations under selection in tissues, which result in clonal expansions and potentially tissue dysfunction. Single-cell analyses added to the bulk readouts will further allow us to generate cellular phylogenies of human development, infer embryonic differentiation dynamics and improve our future assessment of de novo germline mutations.
Delineating the full extent of somatic mosaicism greatly exceeds the scope of the Human Genome Project. A typical cell may acquire hundreds to thousands of somatic mutations in a lifetime. There are trillions of cells in a human body and so the total number of somatic mutations acquired in a single individual may well exceed quadrillions (10¹⁵), millions of times the size of the human genome. Beyond cataloguing somatic variation across tissues, the SMaHT Network provides the opportunity to understand the causes, patterns and consequences of somatic mutations in normal cells, and provide a crucial comparison baseline for disease research. The efforts of the SMaHT Network will substantially contribute to our insights into the role of somatic variation in health, ageing and disease.
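The order-of-magnitude estimate above can be reproduced with a short calculation. The specific inputs are assumptions chosen for illustration rather than values from this article: 1,000 mutations per cell (the upper end of the range given above), roughly 30 trillion cells per body and a haploid genome of about 3.1 billion base pairs.

```python
# Illustrative inputs; all three values are assumptions, not measurements.
MUTATIONS_PER_CELL = 1_000          # upper end of "hundreds to thousands"
CELLS_PER_BODY = 30 * 10**12        # assumed: roughly 30 trillion cells
HAPLOID_GENOME_BP = 3_100_000_000   # assumed: about 3.1 billion base pairs


def total_somatic_mutations(mutations_per_cell: int, cells: int) -> int:
    """Simple product: mutations carried per cell times number of cells."""
    return mutations_per_cell * cells


total = total_somatic_mutations(MUTATIONS_PER_CELL, CELLS_PER_BODY)
fold_over_genome = total / HAPLOID_GENOME_BP

print(f"Total mutations carried: {total:.1e}")
print(f"Relative to haploid genome length: {fold_over_genome:.1e}")
```

This product counts a mutation inherited by many daughter cells once per cell that carries it, so it describes the total mutational load rather than the number of independent mutational events.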
References
International Human Genome Sequencing Consortium. Initial sequencing and analysis of the human genome. Nature 409, 860–921 (2001).
Durbin, R. M. et al. A map of human genome variation from population-scale sequencing. Nature 467, 1061–1073 (2010).
Wang, T. et al. The Human Pangenome Project: a global resource to map genomic diversity. Nature 604, 437–446 (2022).
Stratton, M. R., Campbell, P. J. & Futreal, P. A. The cancer genome. Nature 458, 719–724 (2009).
Saini, N. & Gordenin, D. A. Somatic mutation load and spectra: a record of DNA damage and repair in healthy human cells. Environ. Mol. Mutagen. 59, 672–686 (2018).
Park, S. et al. Clonal dynamics in early human embryogenesis inferred from somatic mutation. Nature 597, 393–397 (2021). By amplifying cells into clones and subsequent sequencing, this study reconstructs embryonic dynamics through somatic mutation patterns.
Coorens, T. H. H. et al. Extensive phylogenies of human development inferred from somatic mutations. Nature 597, 387–392 (2021). This study reconstructs phylogenetic trees of human development through the detection of somatic mutations in many different tissues of the same donors.
Evrony, G. D. et al. Single-neuron sequencing analysis of L1 retrotransposition and somatic mutation in the human brain. Cell 151, 483–496 (2012). This study represents a first foray into single-cell DNA sequencing to uncover somatic mutations in single neurons.
Bae, T. et al. Different mutational rates and mechanisms in human cells at pregastrulation and neurogenesis. Science 359, 550–555 (2018). Using somatic SNVs discovered in human brain progenitor cell clones, this study obtains a human embryonic lineage tree and estimates mutation frequencies at pregastrulation and neurogenesis.
Biesecker, L. G. & Spinner, N. B. A genomic view of mosaicism and human disease. Nat. Rev. Genet. 14, 307–320 (2013).
Martínez-Glez, V. et al. A six-attribute classification of genetic mosaicism. Genet. Med. 22, 1743–1757 (2020).
Salk, J. J., Schmitt, M. W. & Loeb, L. A. Enhancing the accuracy of next-generation sequencing for detecting rare and subclonal mutations. Nat. Rev. Genet. 19, 269–285 (2018).
Martincorena, I. et al. Universal patterns of selection in cancer and somatic tissues. Cell 171, 1029–1041.e21 (2017).
The ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium. Pan-cancer analysis of whole genomes. Nature 578, 82–93 (2020).
Laurie, C. C. et al. Detectable clonal mosaicism from birth to old age and its relationship to cancer. Nat. Genet. 44, 642–650 (2012).
Jacobs, K. B. et al. Detectable clonal mosaicism and its relationship to aging and cancer. Nat. Genet. 44, 651–658 (2012).
Abyzov, A. et al. Somatic copy number mosaicism in human skin revealed by induced pluripotent stem cells. Nature 492, 438–442 (2012). Using induced pluripotent stem cell lines to perform clonal analysis, this study outlines somatic CNVs in skin fibroblasts from multiple families.
McConnell, M. J. et al. Mosaic copy number variation in human neurons. Science 342, 632–637 (2013).
Blokzijl, F. et al. Tissue-specific mutation accumulation in human adult stem cells during life. Nature 538, 260–264 (2016). This study is one of the first to use single-cell-derived organoids across tissues in humans to demonstrate a variability in somatic mutation rate.
Lee-Six, H. et al. Population dynamics of normal human blood inferred from somatic mutations. Nature 561, 473–478 (2018).
Lodato, M. A. et al. Aging and neurodegeneration are associated with increased mutations in single human neurons. Science 359, 555–559 (2018).
Milholland, B. et al. Differences between germline and somatic mutation rates in humans and mice. Nat. Commun. 8, 15183 (2017).
Zhang, L. et al. Single-cell whole-genome sequencing reveals the functional landscape of somatic mutations in B lymphocytes across the human lifespan. Proc. Natl Acad. Sci. USA 116, 9014–9019 (2019).
Lee-Six, H. et al. The landscape of somatic mutation in normal colorectal epithelial cells. Nature 574, 532–537 (2019).
Fasching, L. et al. Early developmental asymmetries in cell lineage trees in living individuals. Science 371, 1245–1248 (2021). Reconstructing lineage trees via somatic mutations in living humans, this study demonstrates that embryonic lineages are often asymmetric.
Mitchell, E. et al. Clonal dynamics of haematopoiesis across the human lifespan. Nature 606, 343–350 (2022).
Jourdon, A., Fasching, L., Scuderi, S., Abyzov, A. & Vaccarino, F. M. The role of somatic mosaicism in brain disease. Curr. Opin. Genet. Dev. 65, 84–90 (2020).
Breuss, M. W. et al. Somatic mosaicism reveals clonal distributions of neocortical development. Nature 604, 689–696 (2022).
Olafsson, S. et al. Somatic evolution in non-neoplastic IBD-affected colon. Cell 182, 672–684.e11 (2020).
Miller, M. B. et al. Somatic genomic changes in single Alzheimer’s disease neurons. Nature 604, 714–722 (2022).
Heimlich, J. B. & Bick, A. G. Somatic mutations in cardiovascular disease. Circ. Res. 130, 149–161 (2022).
Poduri, A. et al. Somatic activation of AKT3 causes hemispheric developmental brain malformations. Neuron 74, 41–48 (2012).
Chung, C. et al. Comprehensive multi-omic profiling of somatic mutations in malformations of cortical development. Nat. Genet. 55, 209–220 (2023).
Bae, T. et al. Analysis of somatic mutations in 131 human brains reveals aging-associated hypermutability. Science 377, 511–517 (2022). Using data collected by the Brain Somatic Mosaicism Network, this paper describes types and frequencies of early somatic mutations in the human brain.
Lim, E. T. et al. Rates, distribution and implications of postzygotic mosaic mutations in autism spectrum disorder. Nat. Neurosci. 20, 1217–1224 (2017).
Baldassari, S. et al. Dissecting the genetic basis of focal cortical dysplasia: a large cohort study. Acta Neuropathol. 138, 885–900 (2019).
D’Gama, A. M. & Walsh, C. A. Somatic mosaicism and neurodevelopmental disease. Nat. Neurosci. 21, 1504–1514 (2018).
Evans, M. A. & Walsh, K. Clonal hematopoiesis, somatic mosaicism, and age-associated disease. Physiol. Rev. 103, 649–716 (2023).
Park, J. S. et al. Brain somatic mutations observed in Alzheimer’s disease associated with aging and dysregulation of tau phosphorylation. Nat. Commun. 10, 3090 (2019).
Behjati, S. et al. Genome sequencing of normal cells reveals developmental lineages and mutational processes. Nature 513, 422–425 (2014). Using single-cell-derived organoids of mouse tissues, this paper demonstrates the use of SNVs for in vivo lineage tracing and is an early report of developmental asymmetry.
Pleasance, E. D. et al. A comprehensive catalogue of somatic mutations from a human cancer genome. Nature 463, 191–196 (2010).
Chaisson, M. J. P. et al. Multi-platform discovery of haplotype-resolved structural variation in human genomes. Nat. Commun. 10, 1784 (2019).
Tate, J. G. et al. COSMIC: the catalogue of somatic mutations in cancer. Nucleic Acids Res. 47, D941–D947 (2019).
Alexandrov, L. B. et al. The repertoire of mutational signatures in human cancer. Nature 578, 94–101 (2020).
Drews, R. M. et al. A pan-cancer compendium of chromosomal instability. Nature 606, 976–983 (2022).
Macintyre, G. et al. Copy number signatures and mutational processes in ovarian carcinoma. Nat. Genet. 50, 1262–1270 (2018).
Li, Y. et al. Patterns of somatic structural variation in human cancer genomes. Nature 578, 112–121 (2020).
Moore, L. et al. The mutational landscape of human somatic and germline cells. Nature 597, 381–386 (2021). By studying many tissues from the same donor, this study demonstrates the variability in mutation burden and signatures across human tissues.
Choudhury, S. et al. Somatic mutations in single human cardiomyocytes reveal age-associated DNA damage and widespread oxidative genotoxicity. Nat. Aging 2, 714–725 (2022).
Nurk, S. et al. The complete sequence of a human genome. Science 376, 44–53 (2022).
Martincorena, I. et al. Tumor evolution. High burden and pervasive positive selection of somatic mutations in normal human skin. Science 348, 880–886 (2015). This study demonstrates the abundance of clones harbouring cancer-driver mutations in a normal tissue using ultra-deep sequencing.
Machado, H. E. et al. Diverse mutational landscapes in human lymphocytes. Nature 608, 724–732 (2022).
Yoshida, K. et al. Tobacco smoking and somatic mutations in human bronchial epithelium. Nature 578, 266–272 (2020).
Li, R. et al. A body map of somatic mutagenesis in morphologically normal human tissues. Nature 597, 398–403 (2021).
Supek, F. & Lehner, B. Differential DNA mismatch repair underlies mutation rate variation across the human genome. Nature 521, 81–84 (2015).
Haradhvala, N. J. et al. Mutational strand asymmetries in cancer genomes reveal mechanisms of DNA damage and repair. Cell 164, 538–549 (2016).
Vöhringer, H., Hoeck, A. V., Cuppen, E. & Gerstung, M. Learning mutational signatures and their multidimensional genomic properties with TensorSignatures. Nat. Commun. 12, 3628 (2021).
Lodato, M. A. et al. Somatic mutation in single human neurons tracks developmental and transcriptional history. Science 350, 94–98 (2015).
Spencer Chapman, M. et al. Lineage tracing of human development through somatic mutations. Nature 595, 85–90 (2021).
Bizzotto, S. et al. Landmarks of human embryonic development inscribed in somatic mutations. Science 371, 1249–1253 (2021).
Bizzotto, S. & Walsh, C. A. Genetic mosaicism in the human brain: from lineage tracing to neuropsychiatric disorders. Nat. Rev. Neurosci. 23, 275–286 (2022).
Coorens, T. H. H. et al. Inherent mosaicism and extensive mutation of human placentas. Nature 592, 80–85 (2021).
Abascal, F. et al. Somatic mutation landscapes at single-molecule resolution. Nature 593, 405–410 (2021). This study develops ultra-low error rates for duplex sequencing that allows the interrogation of somatic mutations at the single-read level.
Rahbari, R. et al. Timing, rates and spectra of human germline mutation. Nat. Genet. 48, 126–133 (2016).
Bloom, J. C., Loehr, A. R., Schimenti, J. C. & Weiss, R. S. Germline genome protection: implications for gamete quality and germ cell tumorigenesis. Andrology 7, 516–526 (2019).
Maklakov, A. A. & Immler, S. The expensive germline and the evolution of ageing. Curr. Biol. 26, R577–R586 (2016).
Cai, X. et al. Single-cell, genome-wide sequencing identifies clonal somatic copy-number variation in the human brain. Cell Rep. 8, 1280–1289 (2014).
Sun, C. et al. Mapping the complex genetic landscape of human neurons. Preprint at bioRxiv https://doi.org/10.1101/2023.03.07.531594 (2023).
Loh, P. R. et al. Insights into clonal haematopoiesis from 8,342 mosaic chromosomal alterations. Nature 559, 350–355 (2018).
Evrony, G. D. et al. Cell lineage analysis in human brain using endogenous retroelements. Neuron 85, 49–59 (2015). This paper represents early work using single-cell somatic mutation signals to reconstruct cell lineages.
Erwin, J. A. et al. L1-associated genomic regions are deleted in somatic cells of the healthy human brain. Nat. Neurosci. 19, 1583–1591 (2016).
Zhu, X. et al. Machine learning reveals bilateral distribution of somatic L1 insertions in human neurons and glia. Nat. Neurosci. 24, 186–196 (2021).
Zhao, B. et al. Somatic LINE-1 retrotransposition in cortical neurons and non-brain tissues of Rett patients and healthy individuals. PLoS Genet. 15, e1008043 (2019).
Shukla, R. et al. Endogenous retrotransposition activates oncogenic pathways in hepatocellular carcinoma. Cell 153, 101–111 (2013).
Nam, C. H. et al. Widespread somatic L1 retrotransposition in normal colorectal epithelium. Nature 617, 540–547 (2023).
Shah, N. M. et al. Pan-cancer analysis identifies tumor-specific antigens derived from transposable elements. Nat. Genet. 55, 631–639 (2023).
Gebrie, A. Transposable elements as essential elements in the control of gene expression. Mob. DNA 14, 9 (2023).
Martincorena, I. & Campbell, P. J. Somatic mutation in cancer and normal cells. Science 349, 1483–1489 (2015).
Moore, L. et al. The mutational landscape of normal human endometrial epithelium. Nature 580, 640–646 (2020).
Martincorena, I. et al. Somatic mutant clones colonize the human esophagus with age. Science 362, 911–917 (2018).
Colom, B. et al. Mutant clones in normal epithelium outcompete and eliminate emerging tumours. Nature 598, 510–514 (2021).
Abby, E. et al. Notch1 mutations drive clonal expansion in normal esophageal epithelium but impair tumor growth. Nat. Genet. 55, 232–245 (2023).
Choi, S. H., Ku, E. J., Choi, Y. A. & Oh, J. W. Grave-to-cradle: human embryonic lineage tracing from the postmortem body. Exp. Mol. Med. 55, 13–21 (2023).
Coorens, T. H. H. et al. Embryonal precursors of Wilms tumor. Science 366, 1247–1251 (2019).
Kwon, S. G. et al. Asymmetric contribution of blastomere lineages of first division of the zygote to entire human body using post-zygotic variants. Tissue Eng. Regen. Med. 19, 809–821 (2022).
Ju, Y. S. et al. Somatic mutations reveal asymmetric cellular dynamics in the early human embryo. Nature 543, 714–718 (2017).
Zernicka-Goetz, M. First cell fate decisions and spatial patterning in the early mouse embryo. Semin. Cell Dev. Biol. 15, 563–572 (2004).
Brunner, S. F. et al. Somatic mutations and clonal dynamics in healthy and cirrhotic human liver. Nature 574, 538–542 (2019).
Ng, S. W. K. et al. Convergent somatic mutations in metabolism genes in chronic liver disease. Nature 598, 473–478 (2021).
Sherman, M. A. et al. Large mosaic copy number variations confer autism risk. Nat. Neurosci. 24, 197–203 (2021).
Keppler-Noreuil, K. M. et al. PIK3CA-related overgrowth spectrum (PROS): diagnostic and testing eligibility criteria, differential diagnosis, and evaluation. Am. J. Med. Genet. A 167A, 287–295 (2015).
Custers, L. et al. Somatic mutations and single-cell transcriptomes reveal the root of malignant rhabdoid tumours. Nat. Commun. 12, 1407 (2021).
Pilet, J. et al. Preneoplastic liver colonization by 11p15.5 altered mosaic cells in young children with hepatoblastoma. Nat. Commun. 14, 7122 (2023).
Checri, R. et al. Detection of brain somatic mutations in focal cortical dysplasia during epilepsy presurgical workup. Brain Commun. 5, fcad174 (2023).
Lee, J. H. et al. De novo somatic mutations in components of the PI3K-AKT3-mTOR pathway cause hemimegalencephaly. Nat. Genet. 44, 941–945 (2012).
Kinsler, V. A. et al. Multiple congenital melanocytic nevi and neurocutaneous melanosis are caused by postzygotic mutations in codon 61 of NRAS. J. Invest. Dermatol. 133, 2229–2236 (2013).
Beck, D. B. et al. Somatic mutations in UBA1 and severe adult-onset autoinflammatory disease. N. Engl. J. Med. 383, 2628–2638 (2020).
Higham, C. F., Morales, F., Cobbold, C. A., Haydon, D. T. & Monckton, D. G. High levels of somatic DNA diversity at the myotonic dystrophy type 1 locus are driven by ultra-frequent expansion and contraction mutations. Hum. Mol. Genet. 21, 2450–2463 (2012).
De Rooij, K. E., De Koning Gans, P. A., Roos, R. A., Van Ommen, G. J. & Den Dunnen, J. T. Somatic expansion of the (CAG)n repeat in Huntington disease brains. Hum. Genet. 95, 270–274 (1995).
Swami, M. et al. Somatic expansion of the Huntington’s disease CAG repeat in the brain is associated with an earlier age of disease onset. Hum. Mol. Genet. 18, 3039–3047 (2009).
Maury, E. A. et al. Schizophrenia-associated somatic copy-number variants from 12,834 cases reveal recurrent NRXN1 and ABCB11 disruptions. Cell Genom. 3, 100356 (2023).
Kim, J. et al. Prevalence and mechanisms of somatic deletions in single human neurons during normal aging and in DNA repair disorders. Nat. Commun. 13, 5918 (2022).
Ren, A. A. et al. PIK3CA and CCM mutations fuel cavernomas through a cancer-like mechanism. Nature 594, 271–276 (2021).
Jaiswal, S. et al. Age-related clonal hematopoiesis associated with adverse outcomes. N. Engl. J. Med. 371, 2488–2498 (2014).
Zekavat, S. M. et al. Hematopoietic mosaic chromosomal alterations increase the risk for diverse types of infection. Nat. Med. 27, 1012–1024 (2021).
Nanki, K. et al. Somatic inflammatory gene mutations in human ulcerative colitis epithelium. Nature 577, 254–259 (2020).
Carithers, L. J. et al. A novel approach to high-quality postmortem tissue procurement: the GTEx project. Biopreserv. Biobank 13, 311–319 (2015).
Coorens, T. H. H. et al. The human and non-human primate developmental GTEx projects. Nature 637, 557–564 (2025).
Lemke, A. A. et al. Addressing underrepresentation in genomics research through community engagement. Am. J. Hum. Genet. 109, 1563–1571 (2022).
IGVF Consortium. Deciphering the impact of genomic variation on function. Nature 633, 47–57 (2024).
Akbarian, S. et al. The PsychENCODE project. Nat. Neurosci. 18, 1707–1712 (2015).
Aganezov, S. et al. A complete reference genome improves analysis of human genetic variation. Science 376, eabl3533 (2022).
Regev, A. et al. The Human Cell Atlas. eLife 6, e27041 (2017).
Cheng, A. P. et al. Error-corrected flow-based sequencing at whole-genome scale and its application to circulating cell-free DNA profiling. Nat. Methods 22, 973–981 (2025).
Wang, Y. et al. APOBEC mutagenesis is a common process in normal human small intestine. Nat. Genet. 55, 246–254 (2023).
Zahn, H. et al. Scalable whole-genome single-cell library preparation without preamplification. Nat. Methods 14, 167–173 (2017).
Laks, E. et al. Clonal decomposition and DNA replication states defined by scaled single-cell genome sequencing. Cell 179, 1207–1221.e22 (2019).
Gonzalez-Pena, V. et al. Accurate genomic variant detection in single cells with primary template-directed amplification. Proc. Natl Acad. Sci. USA 118, e2024176118 (2021).
Yizhak, K. et al. RNA sequence analysis reveals macroscopic somatic clonal expansion across normal tissues. Science 364, eaaw0726 (2019).
Muyas, F. et al. De novo detection of somatic mutations in high-throughput single-cell profiling data sets. Nat. Biotechnol. https://doi.org/10.1038/s41587-023-01863-z (2023).
Gao, T. et al. A pan-tissue survey of mosaic chromosomal alterations in 948 individuals. Nat. Genet. https://doi.org/10.1038/s41588-023-01537-1 (2023).
Hagemann-Jensen, M. et al. Single-cell RNA counting at allele and isoform resolution using Smart-seq3. Nat. Biotechnol. 38, 708–714 (2020).
Johnson, B. K. et al. Efficient profiling of total RNA in single cells with STORM-seq. Preprint at bioRxiv https://doi.org/10.1101/2022.03.14.484332 (2025).
Shao, W. & Wang, T. Transcript assembly improves expression quantification of transposable elements in single-cell RNA-seq data. Genome Res. 31, 88–100 (2021).
Xing, D., Tan, L., Chang, C. H., Li, H. & Xie, X. S. Accurate SNV detection in single cells by transposon-based whole-genome amplification of complementary strands. Proc. Natl Acad. Sci. USA 118, e2013106118 (2021).
Bae, J. H. et al. Single duplex DNA sequencing with CODEC detects mutations with high sensitivity. Nat. Genet. 55, 871–879 (2023).
Liu, M. H. et al. DNA mismatch and damage patterns revealed by single-molecule sequencing. Nature 630, 752–761 (2024).
Ebert, P. et al. Haplotype-resolved diverse human genomes and integrated analysis of structural variation. Science 372, eabf7117 (2021).
Zhou, W. et al. Identification and characterization of occult human-specific LINE-1 insertions using long-read sequencing technology. Nucleic Acids Res. 48, 1146–1163 (2020).
McDonald, T. L. et al. Cas9 targeted enrichment of mobile elements using nanopore sequencing. Nat. Commun. 12, 3586 (2021).
Zhao, T. et al. Spatial genomics enables multi-modal study of clonal heterogeneity in tissues. Nature 601, 85–91 (2022).
Russell, A. J. C. et al. Slide-tags enables single-nucleus barcoding for multimodal spatial genomics. Nature https://doi.org/10.1038/s41586-023-06837-4 (2023).
Nam, A. S. et al. Somatic mutations and cell identity linked by Genotyping of Transcriptomes. Nature 571, 355–360 (2019).
Izzo, F. et al. Mapping genotypes to chromatin accessibility profiles in single cells. Nature 629, 1149–1157 (2024).
Feusier, J. et al. Pedigree-based estimation of human mobile element retrotransposition rates. Genome Res. 29, 1567–1577 (2019).
Dou, Y. et al. Accurate detection of mosaic variants in sequencing data without matched controls. Nat. Biotechnol. 38, 314–319 (2020).
Poplin, R. et al. A universal SNP and small-indel variant caller using deep neural networks. Nat. Biotechnol. 36, 983–987 (2018).
Ding, J. et al. Feature-based classifiers for somatic mutation detection in tumour–normal paired sequencing data. Bioinformatics 28, 167–175 (2011).
Gaiti, F. et al. Epigenetic evolution and lineage histories of chronic lymphocytic leukaemia. Nature 569, 576–580 (2019).
Grimes, K. et al. Cell-type-specific consequences of mosaic structural variants in hematopoietic stem and progenitor cells. Nat. Genet. 56, 1134–1146 (2024).
Dou, Y., Gold, H. D., Luquette, L. J. & Park, P. J. Detecting somatic mutations in normal cells. Trends Genet. 34, 545–557 (2018).
Yang, X. et al. Control-independent mosaic single nucleotide variant detection with DeepMosaic. Nat. Biotechnol. 41, 870–877 (2023).
Wilkinson, M. D. et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci. Data 3, 160018 (2016).
Osorio, F. G. et al. Somatic mutations reveal lineage relationships and age-related mutagenesis in human hematopoiesis. Cell Rep. 25, 2308–2316.e4 (2018).
Luquette, L. J. et al. Single-cell genome sequencing of human neurons identifies somatic point mutation and indel enrichment in regulatory elements. Nat. Genet. 54, 1564–1571 (2022).
Coorens, T. H. H. et al. The somatic mutation landscape of normal gastric epithelium. Nature 640, 418–426 (2025).
Acknowledgements
This research is supported by the NIH Common Fund, through the Office of Strategic Coordination/Office of the NIH Director under awards U24 MH133204, U24 NS132103, UG3 NS132024, UG3 NS132061, UG3 NS132084, UG3 NS132105, UG3 NS132127, UG3 NS132128, UG3 NS132132, UG3 NS132134, UG3 NS132135, UG3 NS132136, UG3 NS132138, UG3 NS132139, UG3 NS132144, UG3 NS132146, UM1 DA058219, UM1 DA058220, UM1 DA058229, UM1 DA058230, UM1 DA058235 and UM1 DA058236. E.E.E. and C.A.W. are investigators of the Howard Hughes Medical Institute.
Author information
Contributions
T.H.H.C., J.W.O., E.A.L. and F.M.V. wrote the manuscript, with input, review and feedback from all other authors. J.W.O., T.H.H.C., Y.A.C., N.S.L. and A.V. made the figures. A.C.L. was substantially involved in UM1 DA058219, UM1 DA058220, UM1 DA058229, UM1 DA058230, UM1 DA058235 and UM1 DA058236, consistent with the role as Program Officer; and is also the NIH Working Group Coordinator, and has involvement with the remaining awards, consistent with this role.
Ethics declarations
Competing interests
F.C. is an academic founder of Curio Biosciences and Doppler Biosciences, and scientific advisor for Amber Bio; F.C.’s interests were reviewed and managed by the Broad Institute in accordance with their conflict-of-interest policies. G.G. receives research funds from IBM, Pharmacyclics/Abbvie, Bayer, Genentech, Calico, Ultima Genomics, Inocras, Google, Kite and Novartis; is an inventor on patent applications filed by the Broad Institute related to MSMuTect, MSMutSig, POLYSOLVER, SignatureAnalyzer-GPU, MSEye and MinimuMM-seq; is a founder, consultant and holds privately held equity in Scorpion Therapeutics and PreDICTA Biosciences; and was a consultant to Merck, all unrelated to the present work. E.E.E. is a scientific advisory board member of Variant Bio. C.Z. is a co-founder and equity holder of Pioneer Genomics and reports that Baylor College of Medicine filed a patent application related to the CompDuplex-seq or CompDup method. P.N. reports research grants from Allelica, Amgen, Apple, Boston Scientific, Cleerly, Genentech/Roche, Ionis, Novartis and Silence Therapeutics; personal fees from Allelica, Apple, AstraZeneca, Bain Capital, Blackstone Life Sciences, Bristol Myers Squibb, Creative Education Concepts, CRISPR Therapeutics, Eli Lilly & Co, Esperion Therapeutics, Foresite Capital, Foresite Labs, Genentech/Roche, GV, HeartFlow, Magnet Biomedicine, Merck, Novartis, Novo Nordisk, TenSixteen Bio and Tourmaline Bio; equity in Bolt, Candela, Mercury, MyOme, Parameter Health, Preciseli and TenSixteen Bio; and spousal employment at Vertex Pharmaceuticals, all unrelated to the present work. C.T. is the founder of C2T; a consultant for Bayer; a member of the scientific advisory board of PrognomiQ; and receives royalties from Exact Sciences. J.W.O. is the founder and CEO of Absolute DNA, with no direct relation to this study, and the interests are managed by University-Industry Foundation in Yonsei University Health System in accordance with their conflict-of-interest policies. E.A.L. is a member of the scientific advisory board for Inocras. All other authors declare no competing interests.
Peer review
Peer review information
Nature thanks Sarah Aitken, Young Seok Ju and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Disclaimer The views and opinions expressed in this article are those of the authors only and do not necessarily represent the views, official policy or position of the US Department of Health and Human Services or any of its affiliated institutions or agencies.
About this article
Cite this article
Coorens, T.H.H., Oh, J.W., Choi, Y.A. et al. The Somatic Mosaicism across Human Tissues Network. Nature 643, 47–59 (2025). https://doi.org/10.1038/s41586-025-09096-7
DOI: https://doi.org/10.1038/s41586-025-09096-7