Abstract
Mitochondrial DNA (mtDNA) has an important yet often overlooked role in health and disease. Constraint models quantify the removal of deleterious variation from the population by selection and represent powerful tools for identifying genetic variation that underlies human phenotypes (refs. 1–4). However, nuclear constraint models are not applicable to mtDNA, owing to its distinct features. Here we describe the development of a mitochondrial genome constraint model and its application to the Genome Aggregation Database (gnomAD), a large-scale population dataset that reports mtDNA variation across 56,434 human participants (ref. 5). Specifically, we analyse constraint by comparing the observed variation in gnomAD to that expected under neutrality, which was calculated using a mtDNA mutational model and observed maximum heteroplasmy-level data. Our results highlight strong depletion of expected variation, which suggests that many deleterious mtDNA variants remain undetected. To aid their discovery, we compute constraint metrics for every mitochondrial protein, tRNA and rRNA gene, which revealed a range of intolerance to variation. We further characterize the most constrained regions within genes through regional constraint and identify the most constrained sites within the entire mitochondrial genome through local constraint, which showed enrichment of pathogenic variation. Constraint also clustered in three-dimensional structures, which provided insight into functionally important domains and their disease relevance. Notably, we identify constraint at often overlooked sites, including in rRNA and noncoding regions. Last, we demonstrate that these metrics can improve the discovery of deleterious variation that underlies rare and common phenotypes.
Data availability
Data analysed or generated during this study are included in this article and its supplementary files and available at GitHub (https://github.com/leklab/mitochondrial_constraint). Constraint metrics are provided in the Supplementary Datasets, and will also be available at gnomAD (http://gnomad.broadinstitute.org). Publicly available datasets used in this study are available from the following sources: ClinVar (https://www.ncbi.nlm.nih.gov/clinvar/); DECIPHER (https://www.deciphergenomics.org/ddd/ddgenes; developmental disorder genes); gnomAD (http://gnomad.broadinstitute.org); HelixMTdb (https://www.helix.com/mitochondrial-variant-database); HmtVar (https://www.hmtvar.uniba.it/); IMPC (https://www.ebi.ac.uk/mi/impc/essential-genes-search/; essential genes); MitImpact (https://mitimpact.css-mendel.it/; APOGEE predictions); MITOMAP (https://www.mitomap.org/MITOMAP); NCBI Genome (https://www.ncbi.nlm.nih.gov/datasets/genome/); PhyloTree (https://www.phylotree.org/; haplogroup variants); PDB (https://www.rcsb.org/); UCSC (https://genome.ucsc.edu/; phyloP scores); and UniProt (https://www.uniprot.org/). A detailed description of these datasets and their application is also provided at GitHub (https://github.com/leklab/mitochondrial_constraint/tree/main/required_files). Source data are provided with this paper.
Code availability
The code used for analyses and figure generation is available at GitHub (https://github.com/leklab/mitochondrial_constraint).
References
Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).
Havrilla, J. M., Pedersen, B. S., Layer, R. M. & Quinlan, A. R. A map of constrained coding regions in the human genome. Nat. Genet. 51, 88–95 (2019).
Samocha, K. E. et al. Regional missense constraint improves variant deleteriousness prediction. Preprint at BioRxiv https://doi.org/10.1101/148353 (2017).
Petrovski, S., Wang, Q., Heinzen, E. L., Allen, A. S. & Goldstein, D. B. Genic intolerance to functional variation and the interpretation of personal genomes. PLoS Genet. 9, e1003709 (2013).
Laricchia, K. M. et al. Mitochondrial DNA variation across 56,434 individuals in gnomAD. Genome Res. 32, 569–582 (2022).
McBride, H. M., Neuspiel, M. & Wasiak, S. Mitochondria: more than just a powerhouse. Curr. Biol. 16, R551–R560 (2006).
Stewart, J. B. & Chinnery, P. F. Extreme heterogeneity of human mitochondrial DNA from organelles to populations. Nat. Rev. Genet. 22, 106–118 (2021).
Chen, Y., Zhou, Z. & Min, W. Mitochondria, oxidative stress and innate immunity. Front. Physiol. 9, 1487 (2018).
Gray, M. W. Mitochondrial evolution. Cold Spring Harb. Perspect. Biol. 4, a011403 (2012).
Anderson, S. et al. Sequence and organization of the human mitochondrial genome. Nature 290, 457–465 (1981).
Gorman, G. S. et al. Mitochondrial diseases. Nat. Rev. Dis. Primers 2, 16080 (2016).
McCormick, E. M. et al. Specifications of the ACMG/AMP standards and guidelines for mitochondrial DNA variant interpretation. Hum. Mutat. 41, 2028–2057 (2020).
Wang, Y. et al. Association of mitochondrial DNA content, heteroplasmies and inter-generational transmission with autism. Nat. Commun. 13, 3790 (2022).
Gorelick, A. N. et al. Respiratory complex and tissue lineage drive recurrent mutations in tumour mtDNA. Nat. Metab. 3, 558–570 (2021).
Gopal, R. K. et al. Early loss of mitochondrial complex I and rewiring of glutathione metabolism in renal oncocytoma. Proc. Natl Acad. Sci. USA 115, E6283–E6290 (2018).
Kim, M., Mahmood, M., Reznik, E. & Gammage, P. A. Mitochondrial DNA is a major source of driver mutations in cancer. Trends Cancer 8, 1046–1059 (2022).
Keogh, M. J. & Chinnery, P. F. Mitochondrial DNA mutations in neurodegeneration. Biochim. Biophys. Acta 1847, 1401–1411 (2015).
Yonova-Doing, E. et al. An atlas of mitochondrial DNA genotype–phenotype associations in the UK Biobank. Nat. Genet. 53, 982–993 (2021).
Kraja, A. T. et al. Associations of mitochondrial and nuclear mitochondrial variants and genes with seven metabolic traits. Am. J. Hum. Genet. 104, 112–138 (2019).
Yamamoto, K. et al. Genetic and phenotypic landscape of the mitochondrial genome in the Japanese population. Commun. Biol. 3, 104 (2020).
Stewart, J. B. et al. Strong purifying selection in transmission of mammalian mitochondrial DNA. PLoS Biol. 6, e10 (2008).
Voets, A. M. et al. Large scale mtDNA sequencing reveals sequence and functional conservation as major determinants of homoplasmic mtDNA variant distribution. Mitochondrion 11, 964–972 (2011).
Elson, J. L., Turnbull, D. M. & Howell, N. Comparative genomics and the evolution of human mitochondrial DNA: assessing the effects of selection. Am. J. Hum. Genet. 74, 229–238 (2004).
Kivisild, T. et al. The role of selection in the evolution of human mitochondrial genomes. Genetics 172, 373–387 (2006).
Wei, W. et al. Germline selection shapes human mitochondrial DNA diversity. Science 364, eaau6520 (2019).
Ju, Y. S. et al. Origins and functional consequences of somatic mitochondrial DNA mutations in human cancer. eLife 3, e02935 (2014).
Dietlein, F. et al. Identification of cancer driver genes based on nucleotide context. Nat. Genet. 52, 208–218 (2020).
Bolze, A. et al. A catalog of homoplasmic and heteroplasmic mitochondrial DNA variants in humans. Preprint at BioRxiv https://doi.org/10.1101/798264 (2020).
Lott, M. T. et al. mtDNA variation and analysis using MITOMAP and MITOMASTER. Curr. Protoc. Bioinformatics 1, 1.23.21–21.23.26 (2013).
Lake, N. J., Compton, A. G., Rahman, S. & Thorburn, D. R. Leigh syndrome: one disorder, more than 75 monogenic causes. Ann. Neurol. 79, 190–203 (2016).
McFarland, R., Elson, J. L., Taylor, R. W., Howell, N. & Turnbull, D. M. Assigning pathogenicity to mitochondrial tRNA mutations: when “definitely maybe” is not good enough. Trends Genet. 20, 591–596 (2004).
Rebelo-Guiomar, P., Powell, C. A., Van Haute, L. & Minczuk, M. The mammalian mitochondrial epitranscriptome. Biochim. Biophys. Acta Gene Regul. Mech. 1862, 429–446 (2019).
Helm, M. et al. Search for characteristic structural features of mammalian mitochondrial tRNAs. RNA 6, 1356–1379 (2000).
Wong, L.-J. C. et al. Interpretation of mitochondrial tRNA variants. Genet. Med. 22, 917–926 (2020).
Amunts, A., Brown, A., Toots, J., Scheres, S. H. & Ramakrishnan, V. Ribosome. The structure of the human mitochondrial ribosome. Science 348, 95–98 (2015).
Zhao, H. et al. Maternally inherited aminoglycoside-induced and nonsyndromic deafness is associated with the novel C1494T mutation in the mitochondrial 12S rRNA gene in a large Chinese family. Am. J. Hum. Genet. 74, 139–152 (2004).
Nicholls, T. J. & Minczuk, M. In D-loop: 40 years of mitochondrial 7S DNA. Exp. Gerontol. 56, 175–181 (2014).
Horn, D. & Barrientos, A. Mitochondrial copper metabolism and delivery to cytochrome c oxidase. IUBMB Life 60, 421–429 (2008).
Kampjut, D. & Sazanov, L. A. The coupling mechanism of mammalian respiratory complex I. Science 370, abc4209 (2020).
Koripella, R. K., Sharma, M. R., Risteff, P., Keshavan, P. & Agrawal, R. K. Structural insights into unique features of the human mitochondrial ribosome recycling. Proc. Natl Acad. Sci. USA 116, 8283–8288 (2019).
Hong, Y. S. et al. Deleterious heteroplasmic mitochondrial mutations are associated with an increased risk of overall and cancer-specific mortality. Nat. Commun. 14, 6113 (2023).
Mok, B. Y. et al. CRISPR-free base editors with enhanced activity and expanded targeting scope in mitochondrial and nuclear DNA. Nat. Biotechnol. 40, 1378–1387 (2022).
Rajasimha, H. K., Chinnery, P. F. & Samuels, D. C. Selection against pathogenic mtDNA mutations in a stem cell population leads to the loss of the 3243A→G mutation in blood. Am. J. Hum. Genet. 82, 333–343 (2008).
Floros, V. I. et al. Segregation of mitochondrial DNA heteroplasmy through a developmental genetic bottleneck in human embryos. Nat. Cell Biol. 20, 144–151 (2018).
Zaidi, A. A. et al. Bottleneck and selection in the germline and maternal age influence transmission of mitochondrial DNA in human pedigrees. Proc. Natl Acad. Sci. USA 116, 25172–25178 (2019).
Schaefer, P. M. et al. Combination of common mtDNA variants results in mitochondrial dysfunction and a connective tissue dysregulation. Proc. Natl Acad. Sci. USA 119, e2212417119 (2022).
Kennedy, S. R., Salk, J. J., Schmitt, M. W. & Loeb, L. A. Ultra-sensitive sequencing reveals an age-related increase in somatic mitochondrial mutations that are inconsistent with oxidative damage. PLoS Genet. 9, e1003794 (2013).
Ludwig, L. S. et al. Lineage tracing in humans enabled by mitochondrial mutations and single-cell genomics. Cell 176, 1325–1339.e22 (2019).
Rebolledo-Jaramillo, B. et al. Maternal age effect and severe germ-line bottleneck in the inheritance of human mitochondrial DNA. Proc. Natl Acad. Sci. USA 111, 15474–15479 (2014).
Li, M. et al. Transmission of human mtDNA heteroplasmy in the Genome of the Netherlands families: support for a variable-size bottleneck. Genome Res. 26, 417–426 (2016).
Yuan, Y. et al. Comprehensive molecular characterization of mitochondrial genomes in human cancers. Nat. Genet. 52, 342–352 (2020).
SPARK Consortium. SPARK: A US cohort of 50,000 families to accelerate autism research. Neuron 97, 488–493 (2018).
Colnaghi, M., Pomiankowski, A. & Lane, N. The need for high-quality oocyte mitochondria at extreme ploidy dictates mammalian germline development. eLife 10, e69344 (2021).
Van Oven, M. PhyloTree Build 17: Growing the human mitochondrial DNA tree. Forensic Sci. Int. Genet. Suppl. Ser. 5, e392–e394 (2015).
Lake, N. J., Zhou, L., Xu, J. & Lek, M. MitoVisualize: a resource for analysis of variants in human mitochondrial RNAs and DNA. Bioinformatics 38, 2967–2969 (2022).
Bodenhofer, U., Bonatesta, E., Horejs-Kainrath, C. & Hochreiter, S. msa: an R package for multiple sequence alignment. Bioinformatics 31, 3997–3999 (2015).
UniProt Consortium. UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res. 47, D506–D515 (2019).
Sonney, S. et al. Predicting the pathogenicity of novel variants in mitochondrial tRNA with MitoTIP. PLoS Comput. Biol. 13, e1005867 (2017).
Landrum, M. J. et al. ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res. 46, D1062–D1067 (2018).
Akesson, L. S. et al. Early diagnosis of Pearson syndrome in neonatal intensive care following rapid mitochondrial genome sequencing in tandem with exome sequencing. Eur. J. Hum. Genet. 27, 1821–1826 (2019).
Davydov, E. V. et al. Identifying a high fraction of the human genome to be under selective constraint using GERP++. PLoS Comput. Biol. 6, e1001025 (2010).
Hamelryck, T. & Manderick, B. PDB file parser and structure class implemented in Python. Bioinformatics 19, 2308–2310 (2003).
Guo, R., Zong, S., Wu, M., Gu, J. & Yang, M. Architecture of human mitochondrial respiratory megacomplex I2III2IV2. Cell 170, 1247–1257.e12 (2017).
Zong, S. et al. Structure of the intact 14-subunit human cytochrome c oxidase. Cell Res. 28, 1026–1034 (2018).
Aibara, S., Singh, V., Modelska, A. & Amunts, A. Structural basis of mitochondrial translation. eLife 9, e58362 (2020).
Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).
Soltanikazemi, E., Quadir, F., Roy, R. S., Guo, Z. & Cheng, J. Distance-based reconstruction of protein quaternary structures from inter-chain contacts. Proteins 90, 720–731 (2022).
Pettersen, E. F. et al. UCSF ChimeraX: structure visualization for researchers, educators, and developers. Protein Sci. 30, 70–82 (2021).
Sudlow, C. et al. UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 12, e1001779 (2015).
Battle, S. L. et al. A bioinformatics pipeline for estimating mitochondrial DNA copy number and heteroplasmy levels from whole genome sequencing data. NAR Genom. Bioinform. 4, lqac034 (2022).
Cacheiro, P. et al. Human and mouse essentiality screens as a resource for disease gene discovery. Nat. Commun. 11, 655 (2020).
Firth, H. V. et al. DECIPHER: Database of Chromosomal Imbalance and Phenotype in Humans Using Ensembl Resources. Am. J. Hum. Genet. 84, 524–533 (2009).
Lake, N., Ma, K., Cohen, J. & Lek, M. Mitochondrial DNA base editing in HEK293T cells. protocols.io https://doi.org/10.17504/protocols.io.yxmvm3rnol3p/v1 (2024).
Kluesner, M. G. et al. EditR: a method to quantify base editing from Sanger sequencing. CRISPR J. 1, 239–250 (2018).
Mok, B. Y. et al. A bacterial cytidine deaminase toxin enables CRISPR-free mitochondrial base editing. Nature 583, 631–637 (2020).
Acknowledgements
N.J.L. received a National Health and Medical Research Council (NHMRC) Early Career Fellowship (APP1159456) and an Australian American Association Scholarship. This research was conducted using the UK Biobank Resource under application number 17731, and supported by National Heart, Lung and Blood Institute, US National Institutes of Health (NIH) grant R01HL144569. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. D.R.T. was supported by a NHMRC Principal Research Fellowship (GNT1155244). The Chair in Genomic Medicine awarded to J.C. is supported by The Royal Children’s Hospital Foundation. Research conducted at the Murdoch Children’s Research Institute was supported by the Victorian Government’s Operational Infrastructure Support Program. We are grateful to and thank all the families in SPARK and staff at the SPARK clinical sites; staff from SFARI Base for allowing us access to the data; S. Calvo for providing advice on the constraint model and its application to gnomAD; the broader team at the Victorian Clinical Genetics Service, including B. Chong and S. Lunke, for their contributions to the curation of disease-associated variation; staff at The Islet, Oxygen consumption, Mass Isotopomer flux Core (IOMIC) at Yale, who assisted with oxygen consumption measurement; and staff at the Yale Center for Genome Analysis for providing PacBio sequencing services (this centre is funded in part by the National Institutes of Health instrument grant 1S10OD028669-01).
Author information
Contributions
N.J.L. and M.L. contributed to the study conception and design. N.J.L., S.L.B., K.M.L., G.T., D.P., A.G.C., S.C., J. Christodoulou, D.R.T., D.E.A. and M.L. contributed to data acquisition. N.J.L., W.L., H.Z., S.R.S. and M.L. contributed to methods development of the model. N.J.L., W.L., S.L.B., D.E.A. and M.L. contributed to data analyses. N.J.L., W.L., S.L.B., K.M.L., G.T., A.G.C., S.C., J. Christodoulou, D.R.T., H.Z., D.E.A., S.R.S. and M.L. contributed to interpretation of the data. N.J.L., K.M., K.K.N., J. Cohen and M.L. contributed to the in vitro functional experiments and data analyses. M.L. supervised and managed the study. N.J.L. and M.L. drafted the manuscript. All authors contributed to manuscript review and editing.
Ethics declarations
Competing interests
The authors declare no competing interests.
Peer review
Peer review information
Nature thanks David Rand, Benjamin Voight and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data figures and tables
Extended Data Fig. 1 Schematic overview of mitochondrial genome constraint.
(a) We established a constraint model for the mtDNA to quantify the removal of deleterious variants from the population by negative selection. We assessed constraint by identifying genes and regions where the observed variation is less than expected under neutrality. Observed is calculated using maximum heteroplasmy in gnomAD, and specifically by summing the maximum heteroplasmy value of every variant in a gene or region. Expected is calculated using a mutational model, and specifically by summing the mutational likelihoods of every variant in a gene or region and applying linear models fit on neutral variation (ascertained using PhyloTree and PhyloP). The ratio of observed:expected variation and its 90% confidence interval is calculated, and the OEUF is used as a conservative measure of constraint. (b) A suite of constraint metrics is available via Supplementary Datasets, including constraint metrics for each gene and non-coding element, as well as regional missense constraint for each protein gene, regional constraint for each rRNA gene, position constraint for tRNA genes, and local constraint for every position in the mtDNA (MLC scores). (c) Constraint metrics can identify deleterious variants, and constrained sites are enriched in pathogenic variants from ClinVar and MITOMAP. Example applications include using regional constraint for variant classification and prioritization in individuals with rare disease and using the MLC score to assess associations between heteroplasmy burden and common phenotypes. (a) and (c) were created with BioRender.com.
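The observed and expected quantities described in panel (a) can be illustrated with a minimal Python sketch. This is not the authors' implementation (their code is in the GitHub repository listed under Code availability). The `Variant` class, the function names, and the `slope` and `intercept` values are hypothetical placeholders, and a single linear model stands in for the linear models fit on neutral variation. The confidence interval step is omitted because the caption does not state how it is derived.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    # Hypothetical record: both fields are illustrative names, not gnomAD fields.
    max_heteroplasmy: float     # maximum heteroplasmy seen in the population (0.0 to 1.0)
    mutation_likelihood: float  # likelihood assigned by a mutational model


def observed_sum(variants):
    """Observed: sum of the maximum heteroplasmy of every variant in the region."""
    return sum(v.max_heteroplasmy for v in variants)


def expected_sum(variants, slope, intercept):
    """Expected: summed mutational likelihoods passed through a linear model.

    slope and intercept are assumed to come from a fit on neutral variation;
    the values used below are made up for illustration.
    """
    total_likelihood = sum(v.mutation_likelihood for v in variants)
    return slope * total_likelihood + intercept


def obs_exp_ratio(variants, slope, intercept):
    """Observed:expected ratio; values well below 1 suggest constraint."""
    expected = expected_sum(variants, slope, intercept)
    if expected <= 0:
        raise ValueError("expected variation must be positive")
    return observed_sum(variants) / expected


# Toy region with three possible variants, one of them never observed.
region = [Variant(0.0, 1.5), Variant(1.0, 2.0), Variant(0.25, 0.5)]
ratio = obs_exp_ratio(region, slope=0.5, intercept=0.5)
print(ratio)
```

In this toy region the observed sum is 1.25 and the expected sum is 2.5, so half of the expected variation is seen.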
Extended Data Fig. 2 Mutability, disease-associated variation, and constraint across classes of human mtDNA variation.
(a) Trinucleotide mutational signature of mtDNA mutations within the OriB-OriH region (m.16197-191) predicted by the composite likelihood model. Mutation likelihoods for the six pyrimidine base substitution types across 96 trinucleotides are shown, colored by whether the reference nucleotide is in the reference ‘light’ or reverse complement ‘heavy’ strand. (b) Proportion of total disease-associated variants in ClinVar (n = 2607) and MITOMAP (n = 882) by consequence. (c) The observed:expected ratio of ClinVar Uncertain Significance and MITOMAP Reported mtDNA variants in gnomAD, and subset by whether they met pathogenic or benign criteria for computational algorithms and MITOMAP population frequency in ACMG/AMP guidelines for mtDNA variant interpretation. 1000 ClinVar variants (with pathogenic evidence, n = 147; note none satisfied both benign criteria) and 791 MITOMAP variants (with pathogenic evidence, n = 175; with benign evidence, n = 28) are included. (d) The observed:expected ratio of in silico predictions in gnomAD for missense variants by APOGEE (pathogenic, n = 7276 and neutral, n = 16,800), and for tRNA variants by MitoTIP (likely pathogenic, n = 981; possibly pathogenic, n = 1171, possibly benign, n = 1162 and likely benign, n = 1198) and HmtVar (pathogenic, n = 202; likely pathogenic, n = 6, likely polymorphic, n = 4139 and polymorphic, n = 24); all of which are recommended per ACMG/AMP mtDNA guidelines for variant interpretation. Note the outlier HmtVar ‘likely pathogenic’ group only includes six variants. (e) Assessment of functional classes of mtDNA variation in a replication dataset, HelixMTdb. The n per class is per Fig. 1d. The diamonds in (c-e) represent the observed:expected ratio, and the error bars in (c-e) represent 90% confidence intervals.
Extended Data Fig. 3 Assessment of conservation and codon usage across genes.
Median phyloP base conservation scores for each protein (a) or RNA (b) gene, derived from 100 vertebrate genomes. Higher phyloP values represent increased conservation. (c) The count of codons within the protein-coding sequence corresponding to each tRNA. Pearson correlation coefficient (R) and its p-value (p), computed using a two-sided test, are shown in (a-c). OEUF stands for observed:expected ratio confidence interval upper bound fraction, and OEUF values for (a-c) are provided in Supplementary Dataset 1.
Extended Data Fig. 4 Areas of regional constraint within each protein and rRNA gene.
(a) Intervals of regional missense constraint identified in each protein are colored in red. For display purposes, each protein is shown at the same length (that is, proteins are not scaled by their actual length), and amino acid residue numbering is shown. (b) Intervals of regional constraint identified in each rRNA are colored in red. The rRNA sequences are not scaled by their length, and mtDNA position coordinates are shown. Coordinates for (a-b) are provided in Supplementary Dataset 2.
Extended Data Fig. 5 Characteristics of regional constraint.
(a-b) The proportion of bases encoding proteins (n = 11,341) or residues of functional significance (n = 141) (a) or benign (n = 625) and pathogenic (n = 79) missense variants (most severe consequence) (b) that are within, proximal to (<6 Ångstrom distance from), or outside regional missense constraint. (c) The proportion of bases encoding rRNA (n = 2512), or modified bases and bases in rRNA:rRNA intersubunit bridges (n = 63) that are within, proximal to (<6 Ångstrom distance from), or outside regional constraint. (d) The four areas of regional missense constraint in MT-CYB are shown in red, visualized in its dimeric 3D structure. Heme molecules involved in electron transfer are colored green. (e) The two areas of regional missense constraint in MT-ND6 are shown in red in the 3D structure. Residues colored in yellow are involved in the transition from the open to closed complex state in the π-bulge (p.61-63 and p.67) per Kampjut and Sazanov (ref. 39). (f) An area of regional constraint within the MT-RNR1 tertiary structure, indicated in red. Modified bases are colored blue, and disease-associated bases (m.1494 and m.1555) purple. The mRNA molecule is colored green. (g) Areas of regional constraint within MT-RNR1 secondary structure, indicated by red font. The box highlights an area including regional constraint, modified bases (blue font) and disease-associated variants (at m.1494 and m.1555, bold purple font); also shown in tertiary structure in Extended Data Fig. 5f. (h) The areas of regional constraint within the MT-RNR2 secondary structure, indicated by red font; modified bases are in blue font.
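The within, proximal and outside categories used in panels (a-c) can be sketched as a simple distance test. The following Python sketch is illustrative only: it assumes one representative 3D coordinate per residue or base, which the caption does not specify, and `classify_position`, `coords` and `constrained` are hypothetical names. Only the 6 Ångstrom cutoff is taken from the caption.

```python
import math

# Cutoff from the caption: proximal means closer than 6 Angstrom to a constrained position.
PROXIMITY_CUTOFF_ANGSTROM = 6.0


def classify_position(index, coords, constrained):
    """Label a position as 'within', 'proximal' or 'outside' regional constraint.

    coords maps a position index to an (x, y, z) coordinate in Angstrom.
    Using one point per position is an assumption made for this sketch.
    constrained is the set of indices inside a regional constraint interval.
    """
    if index in constrained:
        return "within"
    here = coords[index]
    for other in constrained:
        if math.dist(here, coords[other]) < PROXIMITY_CUTOFF_ANGSTROM:
            return "proximal"
    return "outside"


# Toy structure: position 1 is constrained, position 2 sits 5 Angstrom away,
# and position 3 sits 20 Angstrom away.
coords = {1: (0.0, 0.0, 0.0), 2: (3.0, 4.0, 0.0), 3: (20.0, 0.0, 0.0)}
constrained = {1}
labels = {i: classify_position(i, coords, constrained) for i in coords}
print(labels)
```

Counting the labels over all positions of interest and dividing by the total gives proportions of the kind plotted in these panels.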
Extended Data Fig. 6 Characteristics of RNA variants and bases.
(a) The proportion of pathogenic (n = 121) and benign (n = 232) tRNA variants for each base type. (b) The observed:expected ratio for variants in modified and non-modified bases in tRNA (modified, n = 411 and non-modified, n = 4101) and rRNA (modified, n = 30 and non-modified, n = 7506). The diamonds represent the observed:expected ratio, and error bars represent the 90% confidence interval. (c) The generic tRNA secondary structure, with positions colored by domain. (d) The proportion of pathogenic (n = 121) and benign (n = 232) tRNA variants for each domain, following the color legend in (c).
Extended Data Fig. 7 Measuring constraint across non-coding elements.
Top schematic shows annotated elements within the non-coding control region, which spans the artificial chromosome break (m.16569-1). The top row includes the three hypervariable sequences (HV1, HV2, HV3), the second row includes termination-associated sequences (TAS, TAS2), conserved sequence blocks (CSB1, CSB2, CSB3) and L-strand and H-strand promoters (LSP, HSP1), and the third row includes a control element (MT-5) and transcription factor binding sites (TFX, TFY, TFL, TFH). The observed:expected ratio 90% confidence interval upper bound fraction (OEUF) within each element is shown per the color gradient legend; darker colors represent lower OEUF. Values are provided in Supplementary Dataset 5. The bottom schematic shows the position of the control region and origin for replication of the light strand (OriL) within the mtDNA, with encoded loci colored by their type (non-coding in yellow, protein blue, rRNA purple and tRNA orange).
Extended Data Fig. 8 mtDNA local constraint (MLC) scores and population allele frequencies across the non-coding control region.
(a) The MLC score of positions across the control region, calculated using gnomAD maximum heteroplasmy data, are shown; a schematic of annotated non-coding elements is displayed above. The five peaks from left to right overlap (1) a recently discovered second light strand promoter, (2-3) regions of unknown function within the D-loop, (4) conserved sequence block 3, or (5) the light strand promoter. Base scores are provided in Supplementary Dataset 6. (b-c) The homoplasmic allele frequency (AF) of variants across the control region in gnomAD (b) or HelixMTdb (c). (d) The population allele frequency of variants across the control region in the MITOMAP database (which does not include heteroplasmy data). (b-d) are displayed with a square root transformed y-axis; note only SNVs are included.
Extended Data Fig. 9 Relationship between the mtDNA local constraint (MLC) score and genomic annotations.
(a) The proportion of benign (n = 884) and pathogenic (n = 205) variants in each score quartile. (b) Density plot showing the score distribution of disease-associated variants; numbers per (a). (c) Density plot showing the score distribution of 184 pathogenic variants with disease plasmy status in MITOMAP, colored by association with disease at heteroplasmy only, or at homoplasmy. (d) Density plot showing the score distribution of 88 ‘confirmed’ pathogenic variants from MITOMAP, colored by whether reported in individuals at heteroplasmy only or at homoplasmy, per a manual literature review. Plots (a-d) include missense and RNA variants only, and for (c-d) ‘at homoplasmy’ includes observed at both homoplasmy and heteroplasmy. (e) Boxplot showing the score distribution for base positions where indels are observed in gnomAD (n = 416), HelixMTdb (n = 697), and MITOMAP (n = 667) databases. (f) The distribution of PhyloP base conservation scores for bases within each score quartile (0.0-0.25, n = 4142; 0.25-0.50, n = 4142; 0.50-0.75, n = 4141; 0.75-1.0, n = 4143); a dashed line is shown at score = 0. (g) The MLC score across every base position in the human mtDNA; bases that are conserved in chimpanzees are denoted by black pipe symbols, while those non-conserved and encoding base or amino acid substitutions are shown as white pipe symbols. (h-j) The MLC variant score distribution for SNVs across population frequency categories in gnomAD (homoplasmy AF ≥ 0.002%, n = 7363; homoplasmy AF < 0.002%, n = 1846 and heteroplasmy only, n = 1641) (h), HelixMTdb (homoplasmy AF ≥ 0.002%, n = 8049; homoplasmy AF < 0.002%, n = 3442 and heteroplasmy only, n = 2613) (i) and MITOMAP (AF ≥ 0.002%, n = 8617 and AF < 0.002%, n = 10,343) (j) databases. Note that allele frequency <0.002% is recommended as evidence of pathogenicity in ACMG/AMP mtDNA guidelines (ref. 12), and that heteroplasmy data are not available for MITOMAP.
For (e-f, h-j), boxplot elements include: center line, median; box limits, 25th and 75th percentiles; minima and maxima, 1.5x interquartile range; points, outliers.
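The boxplot elements listed above can be computed as in the following Python sketch. It reads "minima and maxima, 1.5x interquartile range" as conventional whiskers that reach the most extreme values within 1.5 times the interquartile range of the box, which is an interpretation rather than something the caption states. The quartile interpolation method is also an assumption, and `boxplot_summary` is a hypothetical helper.

```python
import statistics


def boxplot_summary(values):
    """Return the median, box limits, whisker ends and outliers for a sample."""
    data = sorted(values)
    # Quartiles via linear interpolation; other conventions give slightly different values.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    inside = [v for v in data if low_fence <= v <= high_fence]
    return {
        "median": median,
        "q1": q1,
        "q3": q3,
        "whisker_low": inside[0],    # smallest value within the lower fence
        "whisker_high": inside[-1],  # largest value within the upper fence
        "outliers": [v for v in data if v < low_fence or v > high_fence],
    }


summary = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])
print(summary)
```

For this toy sample the value 100 lies beyond the upper fence and is reported as an outlier, while the upper whisker stops at 9.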
Extended Data Fig. 10 MLC scores versus population frequency or phyloP for heteroplasmies in the UK Biobank.
The MLC score of single nucleotide variant heteroplasmies in the UK Biobank (UKB) is plotted against their population allele frequency (a) or PhyloP scores (b). Plot in (a) is displayed with a square root transformed y-axis. Note a phyloP score of >3 represents significantly conserved sites with p-value < 0.05. Variant classes in each plot are described in the titles in (a), and the histograms on the right margin show the distribution of variants across the y-axis. The coefficient of determination (R²) and a blue line of linear model fit are shown.
Supplementary information
Supplementary Information
This file contains Supplementary Methods, Supplementary Figs. 1–10 and Supplementary Tables 1–7, Supplementary Discussion, Supplementary References and detailed descriptions of Supplementary Datasets and Supplementary Video.
Supplementary Datasets
This zipped file contains Supplementary Datasets 1–9; see Supplementary Information document for Supplementary Dataset guide.
Supplementary Video 1
Local constraint across the 16S rRNA encoded by MT-RNR2. The video shows the mitoribosome, a complex of proteins (blue) and the small 12S (pink) and large 16S (purple) rRNAs, which serves as the site of mitochondrial translation. The mRNA (bright green) and tRNAs occupying the A/P site (yellow) and P/E site (orange) are also shown. The mtDNA local constraint scores are then displayed across the 16S rRNA encoded by MT-RNR2 using a red-white-blue gradient. Dark red indicates highly constrained sites with scores close to 1, white scores around 0.5, and dark blue scores approaching 0.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Lake, N.J., Ma, K., Liu, W. et al. Quantifying constraint in the human mitochondrial genome. Nature 635, 390–397 (2024). https://doi.org/10.1038/s41586-024-08048-x
DOI: https://doi.org/10.1038/s41586-024-08048-x