The Platinum Pedigree: a long-read benchmark for genetic variants

Abstract

Recent advances in genome sequencing have improved variant calling in complex regions of the human genome. However, it is difficult to quantify variant calling performance because existing standards often focus on specificity, neglecting completeness in difficult-to-analyze regions. To create a more comprehensive truth set, we used Mendelian inheritance in a large pedigree (CEPH-1463) to filter variants across PacBio high-fidelity (HiFi), Illumina and Oxford Nanopore Technologies platforms. This generated a variant map with over 4.7 million single-nucleotide variants, 767,795 insertions and deletions (indels), 537,486 tandem repeats and 24,315 structural variants, covering 2.77 Gb of the GRCh38 genome. This work adds ~200 Mb of high-confidence regions, including 8% more small variants, and introduces the first tandem repeat and structural variant truth sets for NA12878 and her family. As an example of the value of this improved benchmark, we retrained DeepVariant using these data to reduce genotyping errors by ~34%.
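The inheritance-based filtering described in the abstract can be illustrated with a much simpler case: checking whether a child's genotype at one site is explainable by one allele from each parent. The sketch below is only a minimal single-trio example, assuming unphased or phased VCF-style genotype strings such as `0/1`. The function names and the toy site records are hypothetical, and this is not the consortium's pipeline, which tracks haplotype transmission across the full three-generation pedigree.

```python
from itertools import product


def parse_gt(gt: str) -> tuple[int, ...]:
    """Split a VCF-style genotype such as '0/1' or '1|0' into allele indices.

    Assumption: genotypes are fully called; missing calls ('./.') are not handled.
    """
    return tuple(int(allele) for allele in gt.replace("|", "/").split("/"))


def is_mendelian_consistent(father: str, mother: str, child: str) -> bool:
    """Return True if the child's diploid genotype can be formed by taking
    one allele from the father and one allele from the mother."""
    child_alleles = sorted(parse_gt(child))
    return any(
        sorted((paternal, maternal)) == child_alleles
        for paternal, maternal in product(parse_gt(father), parse_gt(mother))
    )


def filter_sites(sites: list[dict]) -> list[dict]:
    """Keep only sites whose trio genotypes obey Mendelian inheritance."""
    return [
        site
        for site in sites
        if is_mendelian_consistent(site["father"], site["mother"], site["child"])
    ]


# Toy records with made-up positions, for illustration only.
toy_sites = [
    {"pos": 100, "father": "0/1", "mother": "0/0", "child": "0/1"},
    {"pos": 200, "father": "0/0", "mother": "0/0", "child": "0/1"},
    {"pos": 300, "father": "1/1", "mother": "0/1", "child": "1/1"},
]
kept_positions = [site["pos"] for site in filter_sites(toy_sites)]
```

In this toy input, the site at position 200 is dropped because neither parent carries the alternate allele seen in the child. A single-trio check like this cannot distinguish a genotyping error from a true de novo mutation, which is one reason a larger pedigree gives more filtering power.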

Fig. 1: The three-generation CEPH-1463 pedigree and haplotype transmission patterns.
Fig. 2: Pedigree-concordant stratification and overlap between technologies for GRCh38.
Fig. 3: Structural variant density across the pedigree relative to GRCh38.
Fig. 4: Repeat content characteristics.

Data availability

For the open-consent samples, the data are released on Amazon Open Data. The Amazon S3 bucket contains the sequencing data, assemblies, variant calls and additional files documented here: https://github.com/Platinum-Pedigree-Consortium/Platinum-Pedigree-Datasets. The full dataset, including controlled samples, was deposited at https://www.ncbi.nlm.nih.gov/projects/gap/cgi-bin/study.cgi?study_id=phs003793.v1.p1.

Code availability

The code associated with this work can be found at https://github.com/Platinum-Pedigree-Consortium/Platinum-Pedigree-Inheritance. The code is open source and MIT licensed.

References

  1. Gonzaga-Jauregui, C., Lupski, J. R. & Gibbs, R. A. Human genome sequencing in health and disease. Annu. Rev. Med. 63, 35–61 (2012).

  2. Talkowski, M. E. et al. Clinical diagnosis by whole-genome sequencing of a prenatal sample. N. Engl. J. Med. 367, 2226–2232 (2012).

  3. Roach, J. C. et al. Analysis of genetic inheritance in a family quartet by whole-genome sequencing. Science 328, 636–639 (2010).

  4. Marshall, C. R. et al. Best practices for the analytical validation of clinical whole-genome sequencing intended for the diagnosis of germline disease. NPJ Genom. Med. 5, 47 (2020).

  5. Zook, J. M. et al. Extensive sequencing of seven human genomes to characterize benchmark reference materials. Sci. Data 3, 160025 (2016).

  6. Wagner, J. et al. Benchmarking challenging small variants with linked and long reads. Cell Genom. 2, 100128 (2022).

  7. Eberle, M. A. et al. A reference data set of 5.4 million phased human variants validated by genetic inheritance from sequencing a three-generation 17-member pedigree. Genome Res. 27, 157–164 (2017).

  8. English, A. C. et al. Analysis and benchmarking of small and large genomic variants across tandem repeats. Nat. Biotechnol. 43, 431–442 (2025).

  9. Majidian, S., Agustinho, D. P., Chin, C.-S., Sedlazeck, F. J. & Mahmoud, M. Genomic variant benchmark: if you cannot measure it, you cannot improve it. Genome Biol. 24, 221 (2023).

  10. Poplin, R. et al. A universal SNP and small-indel variant caller using deep neural networks. Nat. Biotechnol. 36, 983–987 (2018).

  11. Porubsky, D. et al. Human de novo mutation rates from a four-generation pedigree reference. Nature 643, 427–436 (2025).

  12. Kong, A. et al. A high-resolution recombination map of the human genome. Nat. Genet. 31, 241–247 (2002).

  13. Fang, H. et al. Reducing INDEL calling errors in whole genome and exome sequencing data. Genome Med. 6, 89 (2014).

  14. Olson, N. D. et al. Variant calling and benchmarking in an era of complete human genome sequences. Nat. Rev. Genet. 24, 464–483 (2023).

  15. Audano, P. A. et al. Characterizing the major structural variant alleles of the human genome. Cell 176, 663–675 (2019).

  16. Sudmant, P. H. et al. An integrated map of structural variation in 2,504 human genomes. Nature 526, 75–81 (2015).

  17. Schloissnig, S. et al. Long-read sequencing and structural variant characterization in 1,019 samples from the 1000 Genomes Project. Preprint at https://doi.org/10.1101/2024.04.18.590093 (2024).

  18. Ebert, P. et al. Haplotype-resolved diverse human genomes and integrated analysis of structural variation. Science 372, eabf7117 (2021).

  19. Gustafson, J. A. et al. High-coverage nanopore sequencing of samples from the 1000 Genomes Project to build a comprehensive catalog of human genetic variation. Genome Res. 34, 2061–2073 (2024).

  20. Depienne, C. & Mandel, J.-L. 30 years of repeat expansion disorders: what have we learned and what are the remaining challenges? Am. J. Hum. Genet. 108, 764–785 (2021).

  21. Dolzhenko, E. et al. Characterization and visualization of tandem repeats at genome scale. Nat. Biotechnol. 42, 1606–1614 (2024).

  22. Chen, X. et al. Comprehensive SMN1 and SMN2 profiling for spinal muscular atrophy analysis using long-read PacBio HiFi sequencing. Am. J. Hum. Genet. 110, 240–250 (2023).

  23. Chen, X. et al. Genome-wide profiling of highly similar paralogous genes using HiFi sequencing. Nat. Commun. 16, 2340 (2025).

  24. Holt, J. M. et al. StarPhase: comprehensive phase-aware pharmacogenomic diplotyper for long-read sequencing data. Preprint at https://doi.org/10.1101/2024.12.10.627527 (2024).

  25. Weisburd, B. et al. Defining a tandem repeat catalog and variation clusters for genome-wide analyses and population databases. Preprint at https://doi.org/10.1101/2024.10.04.615514 (2024).

  26. Dwarshuis, N. et al. The GIAB genomic stratifications resource for human reference genomes. Nat. Commun. 15, 9029 (2024).

  27. Smolka, M. et al. Detection of mosaic and population-level structural variants with Sniffles2. Nat. Biotechnol. 42, 1571–1580 (2024).

  28. Saunders, C. T. et al. Sawfish: improving long-read structural variant discovery and genotyping with local haplotype modeling. Bioinformatics 41, btaf136 (2025).

  29. Garrison, E. et al. Building pangenome graphs. Nat. Methods 21, 2008–2012 (2024).

  30. Rautiainen, M. et al. Telomere-to-telomere assembly of diploid chromosomes with Verkko. Nat. Biotechnol. 41, 1474–1482 (2023).

Acknowledgements

The authors thank D. Baker (PacBio) for exploration of the SV merging methods and T. Brown (University of Washington) for assistance with the manuscript. This work was supported, in part, by US National Institutes of Health (NIH) grants R01HG002385 and R01HG010169 to E.E.E. (an investigator of the Howard Hughes Medical Institute). Certain commercial equipment, instruments or materials are identified to specify experimental conditions or reported results. Such identification does not imply recommendation or endorsement by the National Institute of Standards and Technology, nor does it imply that the equipment, instruments or materials identified are necessarily the best available for the purpose. The authors also thank the Amazon Open Data program for hosting the data. The following cell lines were obtained from the NIGMS Human Genetic Cell Repository at the Coriell Institute for Medical Research: GM12877, GM12878, GM12879, GM12881, GM12882, GM12883, GM12884, GM12885, GM12886, GM12887, GM12889, GM12890, GM12891, GM12892 (ref. 11).

Author information

Contributions

Z.K., C.N., D.P., T.M., W.J.R., S.L., E.D., A.C., J.M.H., C.T.S., E.E.E. and M.A.E. wrote the manuscript. Z.K., C.N., D.P., T.M., W.J.R., S.L., E.D., N.K., W.T.H., A.C., J.M.H., C.J.S., P.-C.C., S.M., A.G., J.M.Z., X.C., N.D.O., C.T.S. and K.P.C. processed the data and carried out analyses. K.M.M., K.H., W.S.W., C.F. and C.L. generated sequencing data. Z.K., D.P., A.C., H.D., J.M.Z., P.M.L., E.G., J.D.S., P.M.L., D.W.N., L.B.J., A.R.Q., E.E.E. and M.A.E. provided oversight and designed the experiments.

Corresponding authors

Correspondence to Zev Kronenberg or Michael A. Eberle.

Ethics declarations

Competing interests

Z.K., C.N., T.M., W.J.R., S.L., E.D., J.M.H., C.T.S., K.P.C., C.F., C.L., X.C. and M.A.E. are employees and shareholders of PacBio. Z.K. holds private equity in Phase Genomics. P.-C.C. and A.C. are employees and shareholders of Google LLC. E.E.E. is a scientific advisory board (SAB) member of Variant Bio, Inc. All other authors have no competing interests.

Peer review

Peer review information

Nature Methods thanks the anonymous reviewers for their contribution to the peer review of this work. Peer reviewer reports are available. Primary Handling Editor: Lei Tang, in collaboration with the Nature Methods team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Supplementary Information

Supplementary Figs. 1–11 and legends for Supplementary Tables 1–15.

Reporting Summary

Peer Review File

Supplementary Table 1

Supplementary Tables 1–15. Table legends are in the Supplementary Information PDF.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Kronenberg, Z., Nolan, C., Porubsky, D. et al. The Platinum Pedigree: a long-read benchmark for genetic variants. Nat Methods 22, 1669–1676 (2025). https://doi.org/10.1038/s41592-025-02750-y

  • DOI: https://doi.org/10.1038/s41592-025-02750-y
