Abstract
Recent advances in genome sequencing have improved variant calling in complex regions of the human genome. However, it is difficult to quantify variant calling performance because existing standards often focus on specificity, neglecting completeness in difficult-to-analyze regions. To create a more comprehensive truth set, we used Mendelian inheritance in a large pedigree (CEPH-1463) to filter variants across PacBio high-fidelity (HiFi), Illumina and Oxford Nanopore Technologies platforms. This generated a variant map with over 4.7 million single-nucleotide variants, 767,795 insertions and deletions (indels), 537,486 tandem repeats and 24,315 structural variants, covering 2.77 Gb of the GRCh38 genome. This work adds ~200 Mb of high-confidence regions, including 8% more small variants, and introduces the first tandem repeat and structural variant truth sets for NA12878 and her family. As an example of the value of this improved benchmark, we retrained DeepVariant using these data to reduce genotyping errors by ~34%.
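The inheritance-based filtering described above can be illustrated with a minimal sketch. This is not the authors' pipeline, which works across a multi-generation pedigree. It is a simplified per-site check assuming autosomal, unphased diploid genotypes for two parents and their children, and it ignores de novo mutations. The function names and the `sites` record layout are hypothetical and chosen only for illustration.

```python
from itertools import product


def is_mendelian_consistent(child, father, mother):
    """Return True if the child's unphased diploid genotype can be formed
    by taking one allele from the father and one from the mother.

    Genotypes are 2-tuples of allele indices, e.g. (0, 1) for a heterozygote.
    Assumption: autosomal site, no de novo mutation, no missing calls.
    """
    child_sorted = sorted(child)
    for paternal, maternal in product(father, mother):
        if sorted((paternal, maternal)) == child_sorted:
            return True
    return False


def filter_sites(sites):
    """Keep only sites where every child is consistent with both parents.

    Each site is a dict with hypothetical keys 'pos', 'father', 'mother'
    and 'children' (a list of child genotypes).
    """
    kept = []
    for site in sites:
        father, mother = site["father"], site["mother"]
        if all(is_mendelian_consistent(c, father, mother)
               for c in site["children"]):
            kept.append(site)
    return kept


# Toy example: the second site has a child genotype (1, 1) that cannot be
# inherited from a (0, 0) father, so it is filtered out.
example_sites = [
    {"pos": 100, "father": (0, 1), "mother": (0, 1), "children": [(0, 0), (1, 1)]},
    {"pos": 200, "father": (0, 0), "mother": (0, 1), "children": [(0, 1), (1, 1)]},
]
consistent_sites = filter_sites(example_sites)
```

A larger sibship makes this kind of check more powerful, because a genotyping error in either parent or any child is more likely to produce at least one inconsistent transmission.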
Data availability
For the open consent samples, the data are released on Amazon Open Data. The Amazon S3 bucket contains the sequencing data, assemblies, variant calls and additional files documented here: https://github.com/Platinum-Pedigree-Consortium/Platinum-Pedigree-Datasets. The full dataset, including controlled samples, was deposited at https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs003793.v1.p1.
Code availability
The code associated with this work can be found at https://github.com/Platinum-Pedigree-Consortium/Platinum-Pedigree-Inheritance. The code is open source and MIT licensed.
References
Gonzaga-Jauregui, C., Lupski, J. R. & Gibbs, R. A. Human genome sequencing in health and disease. Annu. Rev. Med. 63, 35–61 (2012).
Talkowski, M. E. et al. Clinical diagnosis by whole-genome sequencing of a prenatal sample. N. Engl. J. Med. 367, 2226–2232 (2012).
Roach, J. C. et al. Analysis of genetic inheritance in a family quartet by whole-genome sequencing. Science 328, 636–639 (2010).
Marshall, C. R. et al. Best practices for the analytical validation of clinical whole-genome sequencing intended for the diagnosis of germline disease. NPJ Genom. Med. 5, 47 (2020).
Zook, J. M. et al. Extensive sequencing of seven human genomes to characterize benchmark reference materials. Sci. Data 3, 160025 (2016).
Wagner, J. et al. Benchmarking challenging small variants with linked and long reads. Cell Genom. 2, 100128 (2022).
Eberle, M. A. et al. A reference data set of 5.4 million phased human variants validated by genetic inheritance from sequencing a three-generation 17-member pedigree. Genome Res. 27, 157–164 (2017).
English, A. C. et al. Analysis and benchmarking of small and large genomic variants across tandem repeats. Nat. Biotechnol. 43, 431–442 (2025).
Majidian, S., Agustinho, D. P., Chin, C.-S., Sedlazeck, F. J. & Mahmoud, M. Genomic variant benchmark: if you cannot measure it, you cannot improve it. Genome Biol. 24, 221 (2023).
Poplin, R. et al. A universal SNP and small-indel variant caller using deep neural networks. Nat. Biotechnol. 36, 983–987 (2018).
Porubsky, D. et al. Human de novo mutation rates from a four-generation pedigree reference. Nature 643, 427–436 (2025).
Kong, A. et al. A high-resolution recombination map of the human genome. Nat. Genet. 31, 241–247 (2002).
Fang, H. et al. Reducing INDEL calling errors in whole genome and exome sequencing data. Genome Med. 6, 89 (2014).
Olson, N. D. et al. Variant calling and benchmarking in an era of complete human genome sequences. Nat. Rev. Genet. 24, 464–483 (2023).
Audano, P. A. et al. Characterizing the major structural variant alleles of the human genome. Cell 176, 663–675 (2019).
Sudmant, P. H. et al. An integrated map of structural variation in 2,504 human genomes. Nature 526, 75–81 (2015).
Schloissnig, S. et al. Long-read sequencing and structural variant characterization in 1,019 samples from the 1000 Genomes Project. Preprint at https://doi.org/10.1101/2024.04.18.590093 (2024).
Ebert, P. et al. Haplotype-resolved diverse human genomes and integrated analysis of structural variation. Science 372, eabf7117 (2021).
Gustafson, J. A. et al. High-coverage nanopore sequencing of samples from the 1000 Genomes Project to build a comprehensive catalog of human genetic variation. Genome Res. 34, 2061–2073 (2024).
Depienne, C. & Mandel, J.-L. 30 years of repeat expansion disorders: what have we learned and what are the remaining challenges? Am. J. Hum. Genet. 108, 764–785 (2021).
Dolzhenko, E. et al. Characterization and visualization of tandem repeats at genome scale. Nat. Biotechnol. 42, 1606–1614 (2024).
Chen, X. et al. Comprehensive SMN1 and SMN2 profiling for spinal muscular atrophy analysis using long-read PacBio HiFi sequencing. Am. J. Hum. Genet. 110, 240–250 (2023).
Chen, X. et al. Genome-wide profiling of highly similar paralogous genes using HiFi sequencing. Nat. Commun. 16, 2340 (2025).
Holt, J. M. et al. StarPhase: comprehensive phase-aware pharmacogenomic diplotyper for long-read sequencing data. Preprint at https://doi.org/10.1101/2024.12.10.627527 (2024).
Weisburd, B. et al. Defining a tandem repeat catalog and variation clusters for genome-wide analyses and population databases. Preprint at https://doi.org/10.1101/2024.10.04.615514 (2024).
Dwarshuis, N. et al. The GIAB genomic stratifications resource for human reference genomes. Nat. Commun. 15, 9029 (2024).
Smolka, M. et al. Detection of mosaic and population-level structural variants with Sniffles2. Nat. Biotechnol. 42, 1571–1580 (2024).
Saunders, C. T. et al. Sawfish: improving long-read structural variant discovery and genotyping with local haplotype modeling. Bioinformatics 41, btaf136 (2025).
Garrison, E. et al. Building pangenome graphs. Nat. Methods 21, 2008–2012 (2024).
Rautiainen, M. et al. Telomere-to-telomere assembly of diploid chromosomes with Verkko. Nat. Biotechnol. 41, 1474–1482 (2023).
Acknowledgements
The authors thank D. Baker (PacBio) for exploration of the SV merging methods and T. Brown (University of Washington) for assistance with the manuscript. This work was supported, in part, by US National Institutes of Health (NIH) grants R01HG002385 and R01HG010169 to E.E.E. (an investigator of the Howard Hughes Medical Institute). Certain commercial equipment, instruments or materials are identified to specify experimental conditions or reported results. Such identification does not imply recommendation or endorsement by the National Institute of Standards and Technology, nor does it imply that the equipment, instruments or materials identified are necessarily the best available for the purpose. The authors also thank the Amazon Open Data program for hosting the data. The following cell lines were obtained from the NIGMS Human Genetic Cell Repository at the Coriell Institute for Medical Research: GM12877, GM12878, GM12879, GM12881, GM12882, GM12883, GM12884, GM12885, GM12886, GM12887, GM12889, GM12890, GM12891, GM12892 (ref. 11).
Author information
Contributions
Z.K., C.N., D.P., T.M., W.J.R., S.L., E.D., A.C., J.M.H., C.T.S., E.E.E. and M.A.E. wrote the manuscript. Z.K., C.N., D.P., T.M., W.J.R., S.L., E.D., N.K., W.T.H., A.C., J.M.H., C.J.S., P.-C.C., S.M., A.G., J.M.Z., X.C., N.D.O., C.T.S. and K.P.C. processed the data and carried out analyses. K.M.M., K.H., W.S.W., C.F. and C.L. generated sequencing data. Z.K., D.P., A.C., H.D., J.M.Z., P.M.L., E.G., J.D.S., D.W.N., L.B.J., A.R.Q., E.E.E. and M.A.E. provided oversight and designed the experiments.
Ethics declarations
Competing interests
Z.K., C.N., T.M., W.J.R., S.L., E.D., J.M.H., C.T.S., K.P.C., C.F., C.L., X.C. and M.A.E. are employees and shareholders of PacBio. Z.K. holds private equity in Phase Genomics. P.-C.C. and A.C. are employees and shareholders of Google LLC. E.E.E. is a scientific advisory board (SAB) member of Variant Bio, Inc. All other authors have no competing interests.
Peer review
Peer review information
Nature Methods thanks the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available. Primary Handling Editor: Lei Tang, in collaboration with the Nature Methods team.
Additional information
Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary information
Supplementary Information
Supplementary Figs. 1–11 and legends for Supplementary Tables 1–15.
Supplementary Table 1
Supplementary Tables 1–15. Table legends are in the Supplementary Information PDF.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Kronenberg, Z., Nolan, C., Porubsky, D. et al. The Platinum Pedigree: a long-read benchmark for genetic variants. Nat Methods 22, 1669–1676 (2025). https://doi.org/10.1038/s41592-025-02750-y
DOI: https://doi.org/10.1038/s41592-025-02750-y