Quantifying constraint in the human mitochondrial genome

Abstract

Mitochondrial DNA (mtDNA) has an important yet often overlooked role in health and disease. Constraint models quantify the removal of deleterious variation from the population by selection and represent powerful tools for identifying genetic variation that underlies human phenotypes1,2,3,4. However, nuclear constraint models are not applicable to mtDNA, owing to its distinct features. Here we describe the development of a mitochondrial genome constraint model and its application to the Genome Aggregation Database (gnomAD), a large-scale population dataset that reports mtDNA variation across 56,434 human participants5. Specifically, we analyse constraint by comparing the observed variation in gnomAD to that expected under neutrality, which was calculated using a mtDNA mutational model and observed maximum heteroplasmy-level data. Our results highlight strong depletion of expected variation, which suggests that many deleterious mtDNA variants remain undetected. To aid their discovery, we compute constraint metrics for every mitochondrial protein, tRNA and rRNA gene, which revealed a range of intolerance to variation. We further characterize the most constrained regions within genes through regional constraint and identify the most constrained sites within the entire mitochondrial genome through local constraint, which showed enrichment of pathogenic variation. Constraint also clustered in three-dimensional structures, which provided insight into functionally important domains and their disease relevance. Notably, we identify constraint at often overlooked sites, including in rRNA and noncoding regions. Last, we demonstrate that these metrics can improve the discovery of deleterious variation that underlies rare and common phenotypes.

Fig. 1: Mutability and constraint in human mtDNA.
Fig. 2: Assessment of missense constraint identifies gene and regional constraint.
Fig. 3: Constraint across and within RNA genes.
Fig. 4: Assessment of MLC scores.
Fig. 5: Functional assessment of mutations in highly constrained sites using base editing.

Data availability

Data analysed or generated during this study are included in this article and its supplementary files and available at GitHub (https://github.com/leklab/mitochondrial_constraint). Constraint metrics are provided in the Supplementary Datasets, and will also be available at gnomAD (http://gnomad.broadinstitute.org). Publicly available datasets used in this study are available from the following sources: ClinVar (https://www.ncbi.nlm.nih.gov/clinvar/); DECIPHER (https://www.deciphergenomics.org/ddd/ddgenes; developmental disorder genes); gnomAD (http://gnomad.broadinstitute.org); HelixMTdb (https://www.helix.com/mitochondrial-variant-database); HmtVar (https://www.hmtvar.uniba.it/); IMPC (https://www.ebi.ac.uk/mi/impc/essential-genes-search/; essential genes); MitImpact (https://mitimpact.css-mendel.it/; APOGEE predictions); MITOMAP (https://www.mitomap.org/MITOMAP); NCBI Genome (https://www.ncbi.nlm.nih.gov/datasets/genome/); PhyloTree (https://www.phylotree.org/; haplogroup variants); PDB (https://www.rcsb.org/); UCSC (https://genome.ucsc.edu/; phyloP scores); and UniProt (https://www.uniprot.org/). A detailed description of these datasets and their application is also provided at GitHub (https://github.com/leklab/mitochondrial_constraint/tree/main/required_files). Source data are provided with this paper.

Code availability

The code used for analyses and figure generation is available at GitHub (https://github.com/leklab/mitochondrial_constraint).

References

  1. Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).

  2. Havrilla, J. M., Pedersen, B. S., Layer, R. M. & Quinlan, A. R. A map of constrained coding regions in the human genome. Nat. Genet. 51, 88–95 (2019).

  3. Samocha, K. E. et al. Regional missense constraint improves variant deleteriousness prediction. Preprint at BioRxiv https://doi.org/10.1101/148353 (2017).

  4. Petrovski, S., Wang, Q., Heinzen, E. L., Allen, A. S. & Goldstein, D. B. Genic intolerance to functional variation and the interpretation of personal genomes. PLoS Genet. 9, e1003709 (2013).

  5. Laricchia, K. M. et al. Mitochondrial DNA variation across 56,434 individuals in gnomAD. Genome Res. 32, 569–582 (2022).

  6. McBride, H. M., Neuspiel, M. & Wasiak, S. Mitochondria: more than just a powerhouse. Curr. Biol. 16, R551–R560 (2006).

  7. Stewart, J. B. & Chinnery, P. F. Extreme heterogeneity of human mitochondrial DNA from organelles to populations. Nat. Rev. Genet. 22, 106–118 (2021).

  8. Chen, Y., Zhou, Z. & Min, W. Mitochondria, oxidative stress and innate immunity. Front. Physiol. 9, 1487 (2018).

  9. Gray, M. W. Mitochondrial evolution. Cold Spring Harb. Perspect. Biol. 4, a011403 (2012).

  10. Anderson, S. et al. Sequence and organization of the human mitochondrial genome. Nature 290, 457–465 (1981).

  11. Gorman, G. S. et al. Mitochondrial diseases. Nat. Rev. Dis. Primers 2, 16080 (2016).

  12. McCormick, E. M. et al. Specifications of the ACMG/AMP standards and guidelines for mitochondrial DNA variant interpretation. Hum. Mutat. 41, 2028–2057 (2020).

  13. Wang, Y. et al. Association of mitochondrial DNA content, heteroplasmies and inter-generational transmission with autism. Nat. Commun. 13, 3790 (2022).

  14. Gorelick, A. N. et al. Respiratory complex and tissue lineage drive recurrent mutations in tumour mtDNA. Nat. Metab. 3, 558–570 (2021).

  15. Gopal, R. K. et al. Early loss of mitochondrial complex I and rewiring of glutathione metabolism in renal oncocytoma. Proc. Natl Acad. Sci. USA 115, E6283–E6290 (2018).

  16. Kim, M., Mahmood, M., Reznik, E. & Gammage, P. A. Mitochondrial DNA is a major source of driver mutations in cancer. Trends Cancer 8, 1046–1059 (2022).

  17. Keogh, M. J. & Chinnery, P. F. Mitochondrial DNA mutations in neurodegeneration. Biochim. Biophys. Acta 1847, 1401–1411 (2015).

  18. Yonova-Doing, E. et al. An atlas of mitochondrial DNA genotype–phenotype associations in the UK Biobank. Nat. Genet. 53, 982–993 (2021).

  19. Kraja, A. T. et al. Associations of mitochondrial and nuclear mitochondrial variants and genes with seven metabolic traits. Am. J. Hum. Genet. 104, 112–138 (2019).

  20. Yamamoto, K. et al. Genetic and phenotypic landscape of the mitochondrial genome in the Japanese population. Commun. Biol. 3, 104 (2020).

  21. Stewart, J. B. et al. Strong purifying selection in transmission of mammalian mitochondrial DNA. PLoS Biol. 6, e10 (2008).

  22. Voets, A. M. et al. Large scale mtDNA sequencing reveals sequence and functional conservation as major determinants of homoplasmic mtDNA variant distribution. Mitochondrion 11, 964–972 (2011).

  23. Elson, J. L., Turnbull, D. M. & Howell, N. Comparative genomics and the evolution of human mitochondrial DNA: assessing the effects of selection. Am. J. Hum. Genet. 74, 229–238 (2004).

  24. Kivisild, T. et al. The role of selection in the evolution of human mitochondrial genomes. Genetics 172, 373–387 (2006).

  25. Wei, W. et al. Germline selection shapes human mitochondrial DNA diversity. Science 364, eaau6520 (2019).

  26. Ju, Y. S. et al. Origins and functional consequences of somatic mitochondrial DNA mutations in human cancer. eLife 3, e02935 (2014).

  27. Dietlein, F. et al. Identification of cancer driver genes based on nucleotide context. Nat. Genet. 52, 208–218 (2020).

  28. Bolze, A. et al. A catalog of homoplasmic and heteroplasmic mitochondrial DNA variants in humans. Preprint at BioRxiv https://doi.org/10.1101/798264 (2020).

  29. Lott, M. T. et al. mtDNA variation and analysis using MITOMAP and MITOMASTER. Curr. Protoc. Bioinformatics 1, 1.23.21–21.23.26 (2013).

  30. Lake, N. J., Compton, A. G., Rahman, S. & Thorburn, D. R. Leigh syndrome: one disorder, more than 75 monogenic causes. Ann. Neurol. 79, 190–203 (2016).

  31. McFarland, R., Elson, J. L., Taylor, R. W., Howell, N. & Turnbull, D. M. Assigning pathogenicity to mitochondrial tRNA mutations: when “definitely maybe” is not good enough. Trends Genet. 20, 591–596 (2004).

  32. Rebelo-Guiomar, P., Powell, C. A., Van Haute, L. & Minczuk, M. The mammalian mitochondrial epitranscriptome. Biochim. Biophys. Acta Gene Regul. Mech. 1862, 429–446 (2019).

  33. Helm, M. et al. Search for characteristic structural features of mammalian mitochondrial tRNAs. RNA 6, 1356–1379 (2000).

  34. Wong, L.-J. C. et al. Interpretation of mitochondrial tRNA variants. Genet. Med. 22, 917–926 (2020).

  35. Amunts, A., Brown, A., Toots, J., Scheres, S. H. & Ramakrishnan, V. Ribosome. The structure of the human mitochondrial ribosome. Science 348, 95–98 (2015).

  36. Zhao, H. et al. Maternally inherited aminoglycoside-induced and nonsyndromic deafness is associated with the novel C1494T mutation in the mitochondrial 12S rRNA gene in a large Chinese family. Am. J. Hum. Genet. 74, 139–152 (2004).

  37. Nicholls, T. J. & Minczuk, M. In D-loop: 40 years of mitochondrial 7S DNA. Exp. Gerontol. 56, 175–181 (2014).

  38. Horn, D. & Barrientos, A. Mitochondrial copper metabolism and delivery to cytochrome c oxidase. IUBMB Life 60, 421–429 (2008).

  39. Kampjut, D. & Sazanov, L. A. The coupling mechanism of mammalian respiratory complex I. Science 370, abc4209 (2020).

  40. Koripella, R. K., Sharma, M. R., Risteff, P., Keshavan, P. & Agrawal, R. K. Structural insights into unique features of the human mitochondrial ribosome recycling. Proc. Natl Acad. Sci. USA 116, 8283–8288 (2019).

  41. Hong, Y. S. et al. Deleterious heteroplasmic mitochondrial mutations are associated with an increased risk of overall and cancer-specific mortality. Nat. Commun. 14, 6113 (2023).

  42. Mok, B. Y. et al. CRISPR-free base editors with enhanced activity and expanded targeting scope in mitochondrial and nuclear DNA. Nat. Biotechnol. 40, 1378–1387 (2022).

  43. Rajasimha, H. K., Chinnery, P. F. & Samuels, D. C. Selection against pathogenic mtDNA mutations in a stem cell population leads to the loss of the 3243A→G mutation in blood. Am. J. Hum. Genet. 82, 333–343 (2008).

  44. Floros, V. I. et al. Segregation of mitochondrial DNA heteroplasmy through a developmental genetic bottleneck in human embryos. Nat. Cell Biol. 20, 144–151 (2018).

  45. Zaidi, A. A. et al. Bottleneck and selection in the germline and maternal age influence transmission of mitochondrial DNA in human pedigrees. Proc. Natl Acad. Sci. USA 116, 25172–25178 (2019).

  46. Schaefer, P. M. et al. Combination of common mtDNA variants results in mitochondrial dysfunction and a connective tissue dysregulation. Proc. Natl Acad. Sci. USA 119, e2212417119 (2022).

  47. Kennedy, S. R., Salk, J. J., Schmitt, M. W. & Loeb, L. A. Ultra-sensitive sequencing reveals an age-related increase in somatic mitochondrial mutations that are inconsistent with oxidative damage. PLoS Genet. 9, e1003794 (2013).

  48. Ludwig, L. S. et al. Lineage tracing in humans enabled by mitochondrial mutations and single-cell genomics. Cell 176, 1325–1339.e22 (2019).

  49. Rebolledo-Jaramillo, B. et al. Maternal age effect and severe germ-line bottleneck in the inheritance of human mitochondrial DNA. Proc. Natl Acad. Sci. USA 111, 15474–15479 (2014).

  50. Li, M. et al. Transmission of human mtDNA heteroplasmy in the Genome of the Netherlands families: support for a variable-size bottleneck. Genome Res. 26, 417–426 (2016).

  51. Yuan, Y. et al. Comprehensive molecular characterization of mitochondrial genomes in human cancers. Nat. Genet. 52, 342–352 (2020).

  52. SPARK Consortium. SPARK: A US cohort of 50,000 families to accelerate autism research. Neuron 97, 488–493 (2018).

  53. Colnaghi, M., Pomiankowski, A. & Lane, N. The need for high-quality oocyte mitochondria at extreme ploidy dictates mammalian germline development. eLife 10, e69344 (2021).

  54. Van Oven, M. PhyloTree Build 17: Growing the human mitochondrial DNA tree. Forensic Sci. Int. Genet. Suppl. Ser. 5, e392–e394 (2015).

  55. Lake, N. J., Zhou, L., Xu, J. & Lek, M. MitoVisualize: a resource for analysis of variants in human mitochondrial RNAs and DNA. Bioinformatics 38, 2967–2969 (2022).

  56. Bodenhofer, U., Bonatesta, E., Horejs-Kainrath, C. & Hochreiter, S. msa: an R package for multiple sequence alignment. Bioinformatics 31, 3997–3999 (2015).

  57. The UniProt Consortium. UniProt: a worldwide hub of protein knowledge. Nucleic Acids Res. 47, D506–D515 (2019).

  58. Sonney, S. et al. Predicting the pathogenicity of novel variants in mitochondrial tRNA with MitoTIP. PLoS Comput. Biol. 13, e1005867 (2017).

  59. Landrum, M. J. et al. ClinVar: improving access to variant interpretations and supporting evidence. Nucleic Acids Res. 46, D1062–D1067 (2018).

  60. Akesson, L. S. et al. Early diagnosis of Pearson syndrome in neonatal intensive care following rapid mitochondrial genome sequencing in tandem with exome sequencing. Eur. J. Hum. Genet. 27, 1821–1826 (2019).

  61. Davydov, E. V. et al. Identifying a high fraction of the human genome to be under selective constraint using GERP++. PLoS Comput. Biol. 6, e1001025 (2010).

  62. Hamelryck, T. & Manderick, B. PDB file parser and structure class implemented in Python. Bioinformatics 19, 2308–2310 (2003).

  63. Guo, R., Zong, S., Wu, M., Gu, J. & Yang, M. Architecture of human mitochondrial respiratory megacomplex I2III2IV2. Cell 170, 1247–1257.e12 (2017).

  64. Zong, S. et al. Structure of the intact 14-subunit human cytochrome c oxidase. Cell Res. 28, 1026–1034 (2018).

  65. Aibara, S., Singh, V., Modelska, A. & Amunts, A. Structural basis of mitochondrial translation. eLife 9, e58362 (2020).

  66. Jumper, J. et al. Highly accurate protein structure prediction with AlphaFold. Nature 596, 583–589 (2021).

  67. Soltanikazemi, E., Quadir, F., Roy, R. S., Guo, Z. & Cheng, J. Distance-based reconstruction of protein quaternary structures from inter-chain contacts. Proteins 90, 720–731 (2022).

  68. Pettersen, E. F. et al. UCSF ChimeraX: structure visualization for researchers, educators, and developers. Protein Sci. 30, 70–82 (2021).

  69. Sudlow, C. et al. UK Biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med. 12, e1001779 (2015).

  70. Battle, S. L. et al. A bioinformatics pipeline for estimating mitochondrial DNA copy number and heteroplasmy levels from whole genome sequencing data. NAR Genom. Bioinform. 4, lqac034 (2022).

  71. Cacheiro, P. et al. Human and mouse essentiality screens as a resource for disease gene discovery. Nat. Commun. 11, 655 (2020).

  72. Firth, H. V. et al. DECIPHER: Database of Chromosomal Imbalance and Phenotype in Humans Using Ensembl Resources. Am. J. Hum. Genet. 84, 524–533 (2009).

  73. Lake, N., Ma, K., Cohen, J. & Lek, M. Mitochondrial DNA base editing in HEK293T cells. protocols.io https://doi.org/10.17504/protocols.io.yxmvm3rnol3p/v1 (2024).

  74. Kluesner, M. G. et al. EditR: a method to quantify base editing from Sanger sequencing. CRISPR J. 1, 239–250 (2018).

  75. Mok, B. Y. et al. A bacterial cytidine deaminase toxin enables CRISPR-free mitochondrial base editing. Nature 583, 631–637 (2020).

Acknowledgements

N.J.L. received a National Health and Medical Research Council (NHMRC) Early Career Fellowship (APP1159456) and an Australian American Association Scholarship. This research was conducted using the UK Biobank Resource under application number 17731, and supported by National Heart, Lung and Blood Institute, US National Institutes of Health (NIH) grant R01HL144569. The content is solely the responsibility of the authors and does not necessarily represent the official views of the NIH. D.R.T. was supported by a NHMRC Principal Research Fellowship (GNT1155244). The Chair in Genomic Medicine awarded to J.C. is supported by The Royal Children’s Hospital Foundation. Research conducted at the Murdoch Children’s Research Institute was supported by the Victorian Government’s Operational Infrastructure Support Program. We are grateful to and thank all the families in SPARK and staff at the SPARK clinical sites; staff from SFARI Base for allowing us access to the data; S. Calvo for providing advice on the constraint model and its application to gnomAD; the broader team at the Victorian Clinical Genetics Service, including B. Chong and S. Lunke, for their contributions to the curation of disease-associated variation; staff at The Islet, Oxygen consumption, Mass Isotopomer flux Core (IOMIC) at Yale, who assisted with oxygen consumption measurement; and staff at the Yale Center for Genome Analysis for providing PacBio sequencing services (this centre is funded in part by the National Institutes of Health instrument grant 1S10OD028669-01).

Author information

Contributions

N.J.L. and M.L. contributed to the study conception and design. N.J.L., S.L.B., K.M.L., G.T., D.P., A.G.C., S.C., J. Christodoulou, D.R.T., D.E.A. and M.L. contributed to data acquisition. N.J.L., W.L., H.Z., S.R.S. and M.L. contributed to methods development of the model. N.J.L., W.L., S.L.B., D.E.A. and M.L. contributed to data analyses. N.J.L., W.L., S.L.B., K.M.L., G.T., A.G.C., S.C., J. Christodoulou, D.R.T., H.Z., D.E.A., S.R.S. and M.L. contributed to interpretation of the data. N.J.L., K.M., K.K.N., J. Cohen and M.L. contributed to the in vitro functional experiments and data analyses. M.L. supervised and managed the study. N.J.L. and M.L. drafted the manuscript. All authors contributed to manuscript review and editing.

Corresponding authors

Correspondence to Nicole J. Lake or Monkol Lek.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature thanks David Rand, Benjamin Voight and the other, anonymous, reviewer(s) for their contribution to the peer review of this work.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Schematic overview of mitochondrial genome constraint.

(a) We established a constraint model for the mtDNA to quantify the removal of deleterious variants from the population by negative selection. We assessed constraint by identifying genes and regions where the observed variation is less than expected under neutrality. Observed variation is calculated by summing the maximum heteroplasmy value in gnomAD of every variant in a gene or region. Expected variation is calculated by summing the mutational likelihoods of every variant in a gene or region, taken from a mutational model, and applying linear models fit on neutral variation (ascertained using PhyloTree and phyloP). The observed:expected ratio and its 90% confidence interval are calculated, and the upper bound of this interval (the OEUF) is used as a conservative measure of constraint. (b) A suite of constraint metrics is available via the Supplementary Datasets, including constraint metrics for each gene and non-coding element, as well as regional missense constraint for each protein gene, regional constraint for each rRNA gene, position constraint for tRNA genes, and local constraint for every position in the mtDNA (MLC scores). (c) Constraint metrics can identify deleterious variants, and constrained sites are enriched in pathogenic variants from ClinVar and MITOMAP. Example applications include using regional constraint for variant classification and prioritization in individuals with rare disease and using the MLC score to assess associations between heteroplasmy burden and common phenotypes. (a) and (c) were created with BioRender.com.
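
As a rough illustration of the calculation described in panel (a), the sketch below sums maximum heteroplasmy to obtain the observed value, applies a linear fit to the summed mutational likelihoods to obtain the expected value, and reports their ratio with an approximate 90% interval. The function names, the slope and intercept, the toy inputs and the normal-approximation interval are all assumptions made for illustration; the caption does not state how the confidence interval is computed, so this should not be read as the authors' implementation.

```python
import math

Z_90 = 1.645  # two-sided 90% quantile of the standard normal distribution


def observed_sum(max_heteroplasmies):
    """Observed: sum of the maximum heteroplasmy (0 to 1) of every possible variant."""
    return sum(max_heteroplasmies)


def expected_sum(likelihoods, slope, intercept):
    """Expected: a linear model applied to the summed mutational likelihoods.

    slope and intercept are hypothetical stand-ins for a fit on neutral variation.
    """
    return slope * sum(likelihoods) + intercept


def oe_with_interval(observed, expected):
    """Return (ratio, lower, upper) for the observed:expected ratio.

    ASSUMPTION: a crude normal approximation that treats `observed` like a
    Poisson count. It is a placeholder, not the interval used in the paper.
    """
    if expected <= 0:
        raise ValueError("expected must be positive")
    half_width = Z_90 * math.sqrt(max(observed, 0.0))
    lower = max(observed - half_width, 0.0) / expected
    upper = (observed + half_width) / expected
    return observed / expected, lower, upper


# Toy region with six possible variants (values are invented).
max_het = [0.0, 0.0, 1.0, 0.25, 0.0, 0.75]
likelihoods = [0.5, 1.5, 2.0, 1.0, 0.5, 2.5]

obs = observed_sum(max_het)
exp = expected_sum(likelihoods, slope=0.5, intercept=0.0)
ratio, lower, oeuf = oe_with_interval(obs, exp)
print(f"observed={obs}, expected={exp}, o:e={ratio:.2f}, upper bound={oeuf:.2f}")
```

Ranking genes or regions by the upper bound rather than by the ratio itself penalizes small regions whose ratio is low only because few variants are possible, which is why the upper bound is described as the conservative choice.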

Extended Data Fig. 2 Mutability, disease-associated variation, and constraint across classes of human mtDNA variation.

(a) Trinucleotide mutational signature of mtDNA mutations within the OriB-OriH region (m.16197-191) predicted by the composite likelihood model. Mutation likelihoods for the six pyrimidine base substitution types across 96 trinucleotides are shown, colored by whether the reference nucleotide is in the reference ‘light’ or reverse complement ‘heavy’ strand. (b) Proportion of total disease-associated variants in ClinVar (n = 2607) and MITOMAP (n = 882) by consequence. (c) The observed:expected ratio of ClinVar Uncertain Significance and MITOMAP Reported mtDNA variants in gnomAD, and subset by whether they met pathogenic or benign criteria for computational algorithms and MITOMAP population frequency in ACMG/AMP guidelines for mtDNA variant interpretation. 1000 ClinVar variants (with pathogenic evidence, n = 147; note none satisfied both benign criteria) and 791 MITOMAP variants (with pathogenic evidence, n = 175; with benign evidence, n = 28) are included. (d) The observed:expected ratio of in silico predictions in gnomAD for missense variants by APOGEE (pathogenic, n = 7276 and neutral, n = 16,800), and for tRNA variants by MitoTIP (likely pathogenic, n = 981; possibly pathogenic, n = 1171, possibly benign, n = 1162 and likely benign, n = 1198) and HmtVar (pathogenic, n = 202; likely pathogenic, n = 6, likely polymorphic, n = 4139 and polymorphic, n = 24); all of which are recommended per ACMG/AMP mtDNA guidelines for variant interpretation. Note the outlier HmtVar ‘likely pathogenic’ group only includes six variants. (e) Assessment of functional classes of mtDNA variation in a replication dataset, HelixMTdb. The n per class is per Fig. 1d. The diamonds in (c-e) represent the observed:expected ratio, and the error bars in (c-e) represent 90% confidence intervals.
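
The 96 trinucleotide categories in panel (a) follow from six pyrimidine-referenced substitution types, each flanked by one of four 5′ bases and one of four 3′ bases (6 × 4 × 4 = 96). A minimal sketch of that bookkeeping is shown below; the function names and label format are illustrative choices, not taken from the authors' code.

```python
from itertools import product

BASES = "ACGT"
# Six substitution types, written relative to a pyrimidine (C or T) reference base.
PYRIMIDINE_SUBSTITUTIONS = ["C>A", "C>G", "C>T", "T>A", "T>C", "T>G"]
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def trinucleotide_contexts():
    """Return the 96 labels such as 'A[C>T]G' (5' base, substitution, 3' base)."""
    return [
        f"{five}[{sub}]{three}"
        for sub in PYRIMIDINE_SUBSTITUTIONS
        for five, three in product(BASES, repeat=2)
    ]


def to_pyrimidine_context(trinucleotide, alt):
    """Map any single-base change onto one of the 96 labels.

    A purine-centred change is reverse complemented so that the reference
    base becomes a pyrimidine. The returned flag records whether that
    strand flip was needed.
    """
    flipped = trinucleotide[1] in "AG"
    if flipped:
        trinucleotide = trinucleotide.translate(COMPLEMENT)[::-1]
        alt = alt.translate(COMPLEMENT)
    label = f"{trinucleotide[0]}[{trinucleotide[1]}>{alt}]{trinucleotide[2]}"
    return label, flipped


contexts = trinucleotide_contexts()
print(len(contexts))                      # number of categories
print(to_pyrimidine_context("AGT", "A"))  # G>A folded onto the opposite strand
```

Keeping the strand-flip flag alongside the label is what allows the same 96 categories to be plotted separately by strand, as the panel does with its coloring.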

Extended Data Fig. 3 Assessment of conservation and codon usage across genes.

Median phyloP base conservation scores for each protein (a) or RNA (b) gene, derived from 100 vertebrate genomes. Higher phyloP values represent increased conservation. (c) The count of codons within the protein-coding sequence corresponding to each tRNA. The Pearson correlation coefficient (R) and its p-value (p), computed using a two-sided test, are shown in (a-c). OEUF stands for observed:expected ratio confidence interval upper bound fraction, and OEUF values for (a-c) are provided in Supplementary Dataset 1.

Extended Data Fig. 4 Areas of regional constraint within each protein and rRNA gene.

(a) Intervals of regional missense constraint identified in each protein are colored in red. For display purposes, each protein is shown at the same length (that is, proteins are not scaled by their actual length), and amino acid residue numbering is shown. (b) Intervals of regional constraint identified in each rRNA are colored in red. The rRNA sequences are not scaled by their length, and mtDNA position coordinates are shown. Coordinates for (a-b) are provided in Supplementary Dataset 2.

Extended Data Fig. 5 Characteristics of regional constraint.

(a-b) The proportion of bases encoding proteins (n = 11,341) or residues of functional significance (n = 141) (a) or benign (n = 625) and pathogenic (n = 79) missense variants (most severe consequence) (b) that are within, proximal to (<6 Ångstrom distance from), or outside regional missense constraint. (c) The proportion of bases encoding rRNA (n = 2512), or modified bases and bases in rRNA:rRNA intersubunit bridges (n = 63) that are within, proximal to (<6 Ångstrom distance from), or outside regional constraint. (d) The four areas of regional missense constraint in MT-CYB are shown in red, visualized in its dimeric 3D structure. Heme molecules involved in electron transfer are colored green. (e) The two areas of regional missense constraint in MT-ND6 are shown in red in the 3D structure. Residues colored in yellow are involved in the transition from the open to closed complex state in the π-bulge (p.61-63 and p.67) per Kampjut and Sazanov39. (f) An area of regional constraint within the MT-RNR1 tertiary structure, indicated in red. Modified bases are colored blue, and disease-associated bases (m.1494 and m.1555) purple. The mRNA molecule is colored green. (g) Areas of regional constraint within MT-RNR1 secondary structure, indicated by red font. The box highlights an area including regional constraint, modified bases (blue font) and disease-associated variants (at m.1494 and m.1555, bold purple font); also shown in tertiary structure in Extended Data Fig. 5f. (h) The areas of regional constraint within the MT-RNR2 secondary structure, indicated by red font; modified bases are in blue font.
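
The within, proximal and outside categories used in panels (a-c) can be sketched as a simple distance test against the 6 Ångstrom cutoff given in the caption. The sketch below assumes one representative 3D coordinate per residue or base and a hypothetical `coords` dictionary; the caption does not say which atoms are used to measure distance, so that choice is an assumption.

```python
import math

PROXIMAL_CUTOFF = 6.0  # Ångstrom, as stated in the caption


def classify_position(position, coords, constrained_positions):
    """Label a position as 'within', 'proximal' or 'outside' regional constraint.

    coords maps position -> (x, y, z) for one representative atom per position
    (a hypothetical input; the atom selection is an assumption).
    """
    if position in constrained_positions:
        return "within"
    here = coords[position]
    for other in constrained_positions:
        if other in coords and math.dist(here, coords[other]) < PROXIMAL_CUTOFF:
            return "proximal"
    return "outside"


# Toy structure: position 1 is constrained, position 2 sits 5 Å away, position 3 is far.
toy_coords = {1: (0.0, 0.0, 0.0), 2: (3.0, 4.0, 0.0), 3: (20.0, 0.0, 0.0)}
toy_constrained = {1}

labels = {p: classify_position(p, toy_coords, toy_constrained) for p in toy_coords}
print(labels)
```

Measuring proximity in 3D rather than along the sequence is what lets positions that are distant in the linear gene but adjacent in the folded structure count as close to a constrained interval.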

Extended Data Fig. 6 Characteristics of RNA variants and bases.

(a) The proportion of pathogenic (n = 121) and benign (n = 232) tRNA variants for each base type. (b) The observed:expected ratio for variants in modified and non-modified bases in tRNA (modified, n = 411 and non-modified, n = 4101) and rRNA (modified, n = 30 and non-modified, n = 7506). The diamonds represent the observed:expected ratio, and error bars represent the 90% confidence interval. (c) The generic tRNA secondary structure, with positions colored by domain. (d) The proportion of pathogenic (n = 121) and benign (n = 232) tRNA variants for each domain, following the color legend in (c).

Extended Data Fig. 7 Measuring constraint across non-coding elements.

The top schematic shows annotated elements within the non-coding control region, which spans the artificial chromosome break (m.16569-1). The top row includes the three hypervariable sequences (HV1, HV2, HV3), the second row includes termination-associated sequences (TAS, TAS2), conserved sequence blocks (CSB1, CSB2, CSB3) and L-strand and H-strand promoters (LSP, HSP1), and the third row includes a control element (MT-5) and transcription factor binding sites (TFX, TFY, TFL, TFH). The observed:expected ratio 90% confidence interval upper bound fraction (OEUF) within each element is shown per the color gradient legend; darker colors represent lower OEUF. Values are provided in Supplementary Dataset 5. The bottom schematic shows the position of the control region and origin for replication of the light strand (OriL) within the mtDNA, with encoded loci colored by their type (non-coding in yellow, protein blue, rRNA purple and tRNA orange).

Extended Data Fig. 8 mtDNA local constraint (MLC) scores and population allele frequencies across the non-coding control region.

(a) The MLC scores of positions across the control region, calculated using gnomAD maximum heteroplasmy data, are shown; a schematic of annotated non-coding elements is displayed above. The five peaks from left to right overlap (1) a recently discovered second light strand promoter, (2-3) regions of unknown function within the D-loop, (4) conserved sequence block 3, or (5) the light strand promoter. Base scores are provided in Supplementary Dataset 6. (b-c) The homoplasmic allele frequency (AF) of variants across the control region in gnomAD (b) or HelixMTdb (c). (d) The population allele frequency of variants across the control region in the MITOMAP database (which does not include heteroplasmy data). (b-d) are displayed with a square root transformed y-axis; note only SNVs are included.

Extended Data Fig. 9 Relationship between the mtDNA local constraint (MLC) score and genomic annotations.

(a) The proportion of benign (n = 884) and pathogenic (n = 205) variants in each score quartile. (b) Density plot showing the score distribution of disease-associated variants; numbers per (a). (c) Density plot showing the score distribution of 184 pathogenic variants with disease plasmy status in MITOMAP, colored by association with disease at heteroplasmy only, or at homoplasmy. (d) Density plot showing the score distribution of 88 ‘confirmed’ pathogenic variants from MITOMAP, colored by whether reported in individuals at heteroplasmy only or at homoplasmy, per a manual literature review. Plots (a-d) include missense and RNA variants only, and for (c-d) ‘at homplasmy’ includes observed at both homoplasmy and heteroplasmy. (e) Boxplot showing the score distribution for base positions where indels are observed in gnomAD (n = 416), HelixMTdb (n = 697), and MITOMAP (n = 667) databases. (f) The distribution of PhyloP base conservation scores for bases within each score quartile (0.0-0.25, n = 4142; 0.25-0.50, n = 4142; 0.50-0.75, n = 4141; 0.75-1.0, n = 4143); a dashed line is shown at score = 0. (g) The MLC score across every base position in the human mtDNA; bases that are conserved in chimpanzees are denoted by black pipe symbols, while those non-conserved and encoding base or amino acid substitutions are shown as white pipe symbols. (h-j) The MLC variant score distribution for SNVs across population frequency categories in gnomAD (homoplasmy AF ≥ 0.002%, n = 7363; homoplasmy AF < 0.002%, n = 1846 and heteroplasmy only, n = 1641) (h), HelixMTdb (homoplasmy AF ≥ 0.002%, n = 8049; homoplasmy AF < 0.002%, n = 3442 and heteroplasmy only, n = 2613) (i) and MITOMAP (AF ≥ 0.002%, n = 8617 and AF < 0.002%, n = 10,343) (j) databases. Note that allele frequency <0.002% is recommended as evidence of pathogenicity in ACMG/AMP mtDNA guidelines12, and that heteroplasmy data is not available for MITOMAP. 
For (e-f, h-j), boxplot elements include: center line, median; box limits, 25th and 75th percentiles; minima and maxima, 1.5x interquartile range; points, outliers.
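
As an illustration of the boxplot elements listed above, the following is a minimal Python sketch, not the authors' plotting code. It assumes the common Tukey convention in which whiskers extend to the most extreme data points lying within 1.5x the interquartile range of the box, and it assumes linearly interpolated quartiles; the function name `boxplot_elements` is hypothetical.

```python
from statistics import quantiles


def boxplot_elements(values):
    """Summarise a sample as boxplot elements (hypothetical helper).

    Assumption: whiskers end at the most extreme observations that lie
    within 1.5x the interquartile range of the box limits.
    """
    data = sorted(values)
    # 25th, 50th and 75th percentiles with linear interpolation.
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in data if lower_fence <= v <= upper_fence]
    return {
        "median": med,            # center line
        "q1": q1,                 # lower box limit
        "q3": q3,                 # upper box limit
        "whisker_low": min(inside),
        "whisker_high": max(inside),
        "outliers": [v for v in data if v < lower_fence or v > upper_fence],
    }


summary = boxplot_elements([1, 2, 3, 4, 5, 6, 7, 8, 50])
```

In this toy sample the value 50 falls beyond the upper fence and would be drawn as an outlier point, while the upper whisker stops at 8.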

Source Data

Extended Data Fig. 10 MLC scores versus population frequency or phyloP for heteroplasmies in the UK Biobank.

The MLC score of single nucleotide variant heteroplasmies in the UK Biobank (UKB) is plotted against their population allele frequency (a) or PhyloP scores (b). The plot in (a) is displayed with a square root transformed y-axis. Note that a PhyloP score of >3 represents significantly conserved sites with p-value < 0.05. Variant classes in each plot are described in the titles in (a), and the histograms on the right margin show the distribution of variants across the y-axis. The R2 coefficient of determination and a blue line showing the linear model fit are displayed in each plot.
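
For readers unfamiliar with the quantities reported in these panels, the sketch below shows how an ordinary least-squares line and its R2 coefficient of determination can be computed. It is a generic illustration under the assumption of a simple one-predictor linear model; the function name `linear_fit_r2` and the toy values are hypothetical and are not taken from the study.

```python
def linear_fit_r2(xs, ys):
    """Fit y = intercept + slope * x by least squares and return
    (slope, intercept, r_squared). Hypothetical helper for illustration."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R2 = 1 - (residual sum of squares / total sum of squares).
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


# Toy data lying exactly on a line, so the fit explains all variance.
slope, intercept, r2 = linear_fit_r2([0, 1, 2, 3], [1, 3, 5, 7])
```

An R2 near 1 means the fitted line accounts for most of the variance in the plotted variable, whereas a value near 0 means it accounts for very little.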

Source Data

Extended Data Table 1 Association between MLC score sum (MSS) and blood cell counts in the UK Biobank

Supplementary information

Supplementary Information

This file contains Supplementary Methods, Supplementary Figs. 1–10, Supplementary Tables 1–7, Supplementary Discussion, Supplementary References and detailed descriptions of the Supplementary Datasets and Supplementary Video.

Reporting Summary

Supplementary Datasets

This zipped file contains Supplementary Datasets 1–9; see Supplementary Information document for Supplementary Dataset guide.

Supplementary Video 1

Local constraint across the 16S rRNA encoded by MT-RNR2. The video shows the mitoribosome, a complex of proteins (blue) and the small 12S (pink) and large 16S (purple) rRNAs, which serves as the site of mitochondrial translation. The mRNA (bright green) and tRNAs occupying the A/P site (yellow) and P/E site (orange) are also shown. The mtDNA local constraint scores are then displayed across the 16S rRNA encoded by MT-RNR2 using a red-white-blue gradient. Dark red indicates highly constrained sites with scores close to 1, white indicates scores around 0.5, and dark blue indicates scores approaching 0.
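
The gradient described above can be sketched as a piecewise linear mapping from a score in the range 0 to 1 onto an RGB colour. This is only an illustration: the exact colours used in the video are not stated, so the RGB endpoints below are assumptions, as are the names `score_to_rgb` and `_lerp`.

```python
# Assumed endpoint colours; the video's actual palette is not specified.
DARK_BLUE = (0, 0, 139)
WHITE = (255, 255, 255)
DARK_RED = (139, 0, 0)


def _lerp(c0, c1, t):
    """Linearly interpolate between two RGB tuples for t in [0, 1]."""
    return tuple(round(a + (b - a) * t) for a, b in zip(c0, c1))


def score_to_rgb(score):
    """Map a score in [0, 1] to a blue-white-red colour (hypothetical helper).

    Scores near 0 map to dark blue, 0.5 to white and scores near 1 to dark red.
    """
    if not 0.0 <= score <= 1.0:
        raise ValueError("score must lie between 0 and 1")
    if score <= 0.5:
        return _lerp(DARK_BLUE, WHITE, score / 0.5)
    return _lerp(WHITE, DARK_RED, (score - 0.5) / 0.5)


colours = [score_to_rgb(s) for s in (0.0, 0.5, 1.0)]
```

A diverging scale of this kind makes the midpoint visually neutral, so strongly constrained and weakly constrained sites stand out in opposite hues.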

Source data

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Lake, N.J., Ma, K., Liu, W. et al. Quantifying constraint in the human mitochondrial genome. Nature 635, 390–397 (2024). https://doi.org/10.1038/s41586-024-08048-x

  • Received:

  • Accepted:

  • Published:

  • Issue date:

  • DOI: https://doi.org/10.1038/s41586-024-08048-x
