Abstract
Understanding the disease risk of genetic variants is fundamental to precision medicine. Estimates of penetrance—the probability of disease for individuals with a variant allele—rely on disease-specific cohorts, clinical testing and emerging electronic health record (EHR)-linked biobanks. These data sources, while valuable, each have limitations in quality, representativeness and analyzability. Here, we provide a historical account of the currently accepted pathogenicity classification system and data available in ClinVar, a public archive that aggregates variant interpretations but lacks detailed data for accurate penetrance assessment, highlighting its oversimplification of disease risk. We propose an integrative Bayesian framework that unifies pathogenicity and penetrance, leveraging both functional and real-world evidence to refine risk predictions. In addition, we advocate for enhancing ClinVar with the inclusion of high-priority phenotypes, age-stratified data and population-based cohorts linked to EHRs. We suggest developing a community repository of population-based penetrance estimates to support the clinical application of genetic data.
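As a rough illustration of how a Bayesian update could combine prior evidence (for example, from functional assays) with carrier counts from a population cohort, the sketch below uses a conjugate Beta-Binomial model. It is not the framework proposed in this article: the `BetaPrior` class, the `posterior_penetrance` function and all numbers are hypothetical and chosen only to show the arithmetic.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class BetaPrior:
    """Beta(alpha, beta) prior on penetrance (hypothetical container).

    The prior mean alpha / (alpha + beta) stands in for whatever
    pre-existing evidence is available, e.g. functional data.
    """
    alpha: float
    beta: float


def posterior_penetrance(
    prior: BetaPrior, affected: int, unaffected: int
) -> Tuple[float, float, float]:
    """Update the prior with observed carriers.

    Returns the posterior (alpha, beta, mean). With a Beta prior and a
    binomial likelihood, the posterior is Beta(alpha + affected,
    beta + unaffected).
    """
    if affected < 0 or unaffected < 0:
        raise ValueError("counts must be non-negative")
    alpha = prior.alpha + affected
    beta = prior.beta + unaffected
    return alpha, beta, alpha / (alpha + beta)


if __name__ == "__main__":
    # Assumed example: prior mean 0.20, then 6 affected among 40 carriers.
    prior = BetaPrior(alpha=2.0, beta=8.0)
    a, b, mean = posterior_penetrance(prior, affected=6, unaffected=34)
    print(f"posterior Beta({a:.0f}, {b:.0f}), mean penetrance {mean:.2f}")
    # prints: posterior Beta(8, 42), mean penetrance 0.16
```

In this toy setting the posterior mean moves from the prior value of 0.20 towards the observed carrier proportion of 0.15 as cohort data accumulate; a realistic analysis would also need to handle ascertainment, age and ancestry.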
Acknowledgements
We thank S. Plon for helpful comments on the manuscript. R.D. is supported by the National Institute of General Medical Sciences of the National Institutes of Health (NIH) (R35-GM124836). W.K.C. is supported by the National Human Genome Research Institute of the NIH (R01-HG010365). K.-L.H. is supported by the National Institute of General Medical Sciences of the NIH (R35-GM138113).
Author information
Contributions
I.S.F., D.M.J. and R.D. wrote the first draft of the manuscript with substantial comments and revisions from K.-L.H., J.M.E. and W.K.C. All authors contributed to drafting the manuscript.
Ethics declarations
Competing interests
R.D. reported being a scientific cofounder, consultant and equity holder for Pensieve Health (pending) and a consultant for Variant Bio and Character Bio. J.M.E. reported being a cofounder, board member and executive of the nonprofit Center for Genomic Interpretation, with part of its mission overlapping with the interests of this work, specifically the mission to encourage careful stewardship of clinical genetics. J.M.E. is also the founder of and a consultant for Grandview Consulting LLC, not related to this work. K.-L.H. is a founder of Open Box Science, not related to this work. W.K.C. is on the Board of Directors of Prime Medicine and Rallybio, not related to this work. The other authors declare no competing interests.
Peer review
Peer review information
Nature Genetics thanks Brett Kroncke and Johannes Zschocke for their contribution to the peer review of this work.
Supplementary information
Supplementary Table 1.
About this article
Cite this article
Forrest, I. S., Huang, K.-L., Eggington, J. M. et al. Using large-scale population-based data to improve disease risk assessment of clinical variants. Nat. Genet. 57, 1588–1597 (2025). https://doi.org/10.1038/s41588-025-02212-3
DOI: https://doi.org/10.1038/s41588-025-02212-3


