ProHap enables human proteomic database generation accounting for population diversity

Vašíček, Jakub; Kuznetsova, Ksenia G.; Skiadopoulou, Dafni; Unger, Lucas; Chera, Simona; Ghila, Luiza M.; Bandeira, Nuno; Njølstad, Pål R.; Johansson, Stefan; Bruckner, Stefan; Käll, Lukas; Vaudel, Marc

doi:10.1038/s41592-024-02506-0

Brief Communication
Published: 09 December 2024

ProHap enables human proteomic database generation accounting for population diversity

Nature Methods volume 22, pages 273–277 (2025)Cite this article

2977 Accesses
6 Citations
77 Altmetric
Metrics details

Subjects

Abstract

Amid the advances in genomics, the availability of large reference panels of human haplotypes is key to account for human diversity within and across populations. However, mass spectrometry-based proteomics does not benefit from this information. To address this gap, we introduce ProHap, a Python-based tool that constructs protein sequence databases from phased genotypes of reference panels. ProHap enables researchers to account for haplotype diversity in proteomic searches.

Access through your institution

Buy or subscribe

This is a preview of subscription content, access via your institution

Access options

Access through your institution

Buy this article

Purchase on SpringerLink
Instant access to the full article PDF.

USD 39.95

Prices may be subject to local taxes which are calculated during checkout

**Fig. 1: Overview of the ProHap workflow.**

**Fig. 2: Impact of genetic variation in the populations of the 1000 Genomes Project on the proteomic search space.**

An chromosome-level haplotype-resolved genome assembly and annotation of pitaya (Selenicereus polyrhizus)

Article Open access 01 April 2025

Semi-automated assembly of high-quality diploid human reference genomes

Article Open access 19 October 2022

Large-scale plasma proteomics comparisons through genetics and disease associations

Article Open access 04 October 2023

Data availability

The six databases of protein sequences derived from the phased genotypes of the 1000 Genomes Project, along with all necessary metadata, are available at https://doi.org/10.5281/zenodo.10149277. The database of protein sequences derived from Release 1.1 of the Haplotype Reference Consortium, along with metadata, is available at https://doi.org/10.5281/zenodo.12671301. The databases of protein sequences derived from the first release of the 44 samples of the Human Pangenome Reference Consortium, along with metadata, are available at https://doi.org/10.5281/zenodo.12686818. The dataset of the 1000 Genomes Project phase 3 aligned with the GRCh38 genome reference is available from https://www.internationalgenome.org/data-portal/data-collection/grch38. The first release of the HPRC is available in various formats from https://github.com/human-pangenomics/hpp_pangenome_resources. The decomposed VCF file generated using the Minigraph-Cactus method aligned with the GRCh38 genome reference was used in this work. The Haplotype Reference Consortium Release 1.1 is accessible by request from the European Genome-phenome Archive at the European Bioinformatics Institute (https://www.ebi.ac.uk/ega/studies/EGAS00001001710). Data access requests are reviewed by the Data Access Committee at the Wellcome Trust Sanger Institute. The UniProt and SwissProt databases are available from https://www.uniprot.org/uniprotkb?facets=model_organism%3A9606&query=*. The proteomic dataset of blood plasma samples is available through ProteomeXchange with the dataset identifier PXD004242. The complete list of peptide–spectrum matches obtained by the reprocessing is available at https://doi.org/10.5281/zenodo.12725745. The dataset of mass spectra from healthy donor hiPSC cultures is available through ProteomeXchange with the dataset identifier PXD053601. The experimental metadata have been generated using lesSDRF⁴² and are available through ProteomeXchange with the dataset identifier PXD053601.

Code availability

Source code, along with all documentation, is available at https://github.com/ProGenNo/ProHap (https://doi.org/10.5281/zenodo.12749956). Source code of PeptideAnnotator, along with documentation, is available at https://github.com/ProGenNo/ProHap_PeptideAnnotator (https://doi.org/10.5281/zenodo.12759867). The pipeline used downstream of ProHap to obtain the present results is available at https://github.com/ProGenNo/ProHapDatabaseAnalysis (https://doi.org/10.5281/zenodo.13910718).

References

Nurk, S. et al. The complete sequence of a human genome. Science 376, 44–53 (2022).
Article CAS PubMed PubMed Central Google Scholar
McCarthy, S. et al. A reference panel of 64,976 haplotypes for genotype imputation. Nat. Genet. 48, 1279–1283 (2016).
Article CAS PubMed PubMed Central Google Scholar
Menschaert, G. & Fenyö, D. Proteogenomics from a bioinformatics angle: a growing field. Mass Spectrom. Rev. 36, 584–599 (2017).
Article CAS PubMed Google Scholar
Nesvizhskii, A. I. Proteogenomics: concepts, applications and computational strategies. Nat. Methods 11, 1114–1125 (2014).
Article CAS PubMed PubMed Central Google Scholar
Wang, X. & Zhang, B. customProDB: an R package to generate customized protein databases from RNA-Seq data for proteomics search. Bioinformatics 29, 3235–3237 (2013).
Article CAS PubMed PubMed Central Google Scholar
Umer, H. M. et al. Generation of ENSEMBL-based proteogenomics databases boosts the identification of non-canonical peptides. Bioinformatics 38, 1470–1472 (2022).
Article CAS PubMed Google Scholar
Spooner, W. et al. Haplosaurus computes protein haplotypes for use in precision drug design. Nat. Commun. 9, 4128 (2018).
Article PubMed PubMed Central Google Scholar
Cao, X. & Xing, J. PrecisionProDB: improving the proteomics performance for precision medicine. Bioinformatics 37, 3361–3363 (2021).
Article CAS PubMed Google Scholar
MIT License (2024) https://choosealicense.com/licenses/mit/ (accessed 24 January 2024).
Auton, A. et al. A global reference for human genetic variation. Nature 526, 68–74 (2015).
Article PubMed Google Scholar
Lowy-Gallego, E. et al. Variant calling on the GRCh38 assembly with the data from phase three of the 1000 Genomes Project. Wellcome Open Res 4, 50 (2019).
Article PubMed PubMed Central Google Scholar
Vašíček, J. Protein haplotype sequences obtained by ProHap from the 1000 Genomes Project data set. Zenodo https://zenodo.org/records/12671237 (2024).
Vašíček, J. Protein haplotype sequences obtained by ProHap from the Haplotype Reference Consortium Release 1.1 dataset. Zenodo https://doi.org/10.5281/zenodo.12671302 (2024).
Liao, W.-W. et al. A draft human pangenome reference. Nature 617, 312–324 (2023).
Article CAS PubMed PubMed Central Google Scholar
Vašíček, J. Protein haplotype sequences obtained by ProHap from the Human Pangenome Reference Consortium dataset. Zenodo https://doi.org/10.5281/zenodo.12686819 (2024).
Vašíček, J. et al. Finding haplotypic signatures in proteins. GigaScience 12, giad093 (2023).
Article PubMed Central Google Scholar
Geyer, P. E. et al. Proteomics reveals the effects of sustained weight loss on the human plasma proteome. Mol. Syst. Biol. 12, 901 (2016).
Article PubMed PubMed Central Google Scholar
Vašíček, J. & Skiadopoulou, D. Reprocessing of the dataset ‘Plasma Proteome Profiling Reveals the Effects of Weight Loss on the Apolipoprotein Family and Systemic Inflammation Status.’ Zenodo https://doi.org/10.5281/zenodo.12725746 (2024).
Bader, J. M., Albrecht, V. & Mann, M. MS-based proteomics of body fluids: the end of the beginning. Mol. Cell. Proteomics 22, 100577 (2023).
Article CAS PubMed PubMed Central Google Scholar
Skiadopoulou, D. et al. Retention time and fragmentation predictors increase confidence in identification of common variant peptides. J. Proteome Res. 22, 3190–3199 (2023).
Article CAS PubMed PubMed Central Google Scholar
Moreno-Estrada, A. et al. The genetics of Mexico recapitulates Native American substructure and affects biomedical traits. Science 344, 1280–1285 (2014).
Article CAS PubMed PubMed Central Google Scholar
Fox, K. The illusion of inclusion: the ‘All of Us’ research program and Indigenous peoples’ DNA. N. Engl. J. Med. 383, 411–413 (2020).
Article PubMed Google Scholar
Hudson, M. et al. Rights, interests and expectations: Indigenous perspectives on unrestricted access to genomic data. Nat. Rev. Genet. 21, 377–384 (2020).
Article CAS PubMed Google Scholar
Birney, E., Inouye, M., Raff, J., Rutherford, A. & Scally, A. The language of race, ethnicity, and ancestry in human genetic research. Preprint at https://doi.org/10.48550/arXiv.2106.10041 (2021).
Cunningham, F. et al. Ensembl 2022. Nucleic Acids Res. 50, D988–D995 (2022).
Article CAS PubMed Google Scholar
Kowalski, M. H. et al. Use of >100,000 NHLBI Trans-Omics for Precision Medicine (TOPMed) Consortium whole genome sequences improves imputation quality and detection of rare variant associations in admixed African and Hispanic/Latino populations. PLoS Genet. 15, e1008500 (2019).
Article PubMed PubMed Central Google Scholar
Morales, J. et al. A joint NCBI and EMBL-EBI transcript set for clinical genomics and research. Nature 604, 310–315 (2022).
Article CAS PubMed PubMed Central Google Scholar
Declercq, A. et al. MS²Rescore: data-driven rescoring dramatically boosts immunopeptide identification rates. Mol. Cell. Proteomics 21(8), 100266 (2022).
Article CAS PubMed PubMed Central Google Scholar
Hickey, G. et al. Pangenome graph construction from genome alignments with Minigraph-Cactus. Nat. Biotechnol. 42, 663–673 (2024).
Article CAS PubMed Google Scholar
Stawiński, P. & Płoski, R. Genebe.net: implementation and validation of an automatic ACMG variant pathogenicity criteria assignment. Clin. Genet. 106, 119–126 (2024).
Article PubMed Google Scholar
Vaudel, M., Barsnes, H., Berven, F. S., Sickmann, A. & Martens, L. SearchGUI: an open-source graphical user interface for simultaneous OMSSA and X!Tandem searches. Proteomics 11, 996–999 (2011).
Article CAS PubMed Google Scholar
Vaudel, M. et al. PeptideShaker enables reanalysis of MS-derived proteomics data sets. Nat. Biotechnol. 33, 22–24 (2015).
Article CAS PubMed Google Scholar
Fenyö, D. & Beavis, R. C. A method for assessing the statistical significance of mass spectrometry-based protein identifications using general scoring schemes. Anal. Chem. 75, 768–774 (2003).
Article PubMed Google Scholar
Park, C. Y., Klammer, A. A., Käll, L., MacCoss, M. J. & Noble, W. S. Rapid and accurate peptide identification from tandem mass spectra. J. Proteome Res. 7, 3022–3027 (2008).
Article CAS PubMed PubMed Central Google Scholar
Käll, L., Canterbury, J. D., Weston, J., Noble, W. S. & MacCoss, M. J. Semi-supervised learning for peptide identification from shotgun proteomics datasets. Nat. Methods 4, 923–925 (2007).
Article PubMed Google Scholar
Bouwmeester, R., Gabriels, R., Hulstaert, N., Martens, L. & Degroeve, S. DeepLC can predict retention times for peptides that carry as-yet unseen modifications. Nat. Methods 18, 1363–1369 (2021).
Article PubMed Google Scholar
Declercq, A. et al. Updated MS2PIP web server supports cutting-edge proteomics applications. Nucleic Acids Res. 51, W338–W342 (2023).
Article CAS PubMed PubMed Central Google Scholar
Unger, L., Mathisen, A. F., Chera, S., Legøy, T. A. & Ghila, L. The GLI code controls HNF1A levels during foregut differentiation. Int. J. Dev. Biol. https://doi.org/10.1387/ijdb.230220lg (2024).
Kong, A. T., Leprevost, F. V., Avtonomov, D. M., Mellacheruvu, D. & Nesvizhskii, A. I. MSFragger: ultrafast and comprehensive peptide identification in mass spectrometry-based proteomics. Nat. Methods 14, 513–520 (2017).
Article CAS PubMed PubMed Central Google Scholar
Teo, G. C., Polasky, D. A., Yu, F. & Nesvizhskii, A. I. Fast deisotoping algorithm and its implementation in the MSFragger search engine. J. Proteome Res. 20, 498–505 (2021).
Article CAS PubMed Google Scholar
Yang, K. L. et al. MSBooster: improving peptide identification rates using deep learning-based features. Nat. Commun. 14, 4539 (2023).
Article CAS PubMed PubMed Central Google Scholar
Claeys, T. et al. lesSDRF is more: maximizing the value of proteomics data through streamlined metadata annotation. Nat. Commun. 14, 6743 (2023).
Article CAS PubMed PubMed Central Google Scholar
Käll, L., Storey, J. D. & Noble, W. S. Non-parametric estimation of posterior error probabilities associated with peptides identified by tandem mass spectrometry. Bioinformatics 24, i42–i48 (2008).
Article PubMed PubMed Central Google Scholar

Download references

Acknowledgements

This work was supported by the Research Council of Norway (project 301178 to M.V.), Stiftelsen Trond Mohn Foundation (Mohn Center of Diabetes Precision Medicine), and the University of Bergen. P.R.N. was supported by grants from the European Research Council (AdG 293574), Stiftelsen Trond Mohn Foundation (Mohn Center of Diabetes Precision Medicine), the University of Bergen, Haukeland University Hospital, the Research Council of Norway (FRIPRO grant 240413) and the Novo Nordisk Foundation (grant 54741). L.K. was supported by the Wallenberg AI, Autonomous Systems and Software Program (WASP) funded by the Knut and Alice Wallenberg Foundation. S.C. was supported by grants from Novo Nordisk Foundation (grant NNF21OC0067325) and Research Council of Norway (NFR 314397). N.B. was partially supported by the National Institutes of Health (award 5R24GM148372). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript. The computations on publicly available data were performed on the Norwegian Research and Education Cloud (NREC), using resources provided by the University of Bergen and the University of Oslo (https://www.nrec.no). All analyses on HRC data were performed using digital laboratories in HUNT Cloud at the Norwegian University of Science and Technology, Trondheim, Norway. We are grateful for outstanding support from the HUNT Cloud community. The HRC data were used in a form agreed by the University of Bergen and the Wellcome Sanger Institute. This research was funded, in whole or in part, by the Research Council of Norway 301178. Mass spectrometry analyses were performed by the Proteomics Research Infrastructure (PRI) at the University of Copenhagen (UCPH), supported by the Novo Nordisk Foundation (NNF) (grant agreement number NNF19SA0059305).

Author information

These authors jointly supervised this work: S. Bruckner, L. Käll, M. Vaudel.

Authors and Affiliations

Mohn Center for Diabetes Precision Medicine, Department of Clinical Science, University of Bergen, Bergen, Norway
Jakub Vašíček, Ksenia G. Kuznetsova, Dafni Skiadopoulou, Lucas Unger, Simona Chera, Luiza M. Ghila, Pål R. Njølstad, Stefan Johansson & Marc Vaudel
Computational Biology Unit, Department of Informatics, University of Bergen, Bergen, Norway
Jakub Vašíček, Ksenia G. Kuznetsova, Dafni Skiadopoulou & Marc Vaudel
University of California, San Diego, La Jolla, CA, USA
Nuno Bandeira
Children and Youth Clinic, Haukeland University Hospital, Bergen, Norway
Pål R. Njølstad
Department of Medical Genetics, Haukeland University Hospital, Bergen, Norway
Stefan Johansson
Institute for Visual and Analytic Computing, University of Rostock, Rostock, Germany
Stefan Bruckner
Science for Life Laboratory, School of Engineering Sciences in Chemistry, Biotechnology and Health, KTH - Royal Institute of Technology, Stockholm, Sweden
Lukas Käll
Department of Genetics and Bioinformatics, Health Data and Digitalization, Norwegian Institute of Public Health, Oslo, Norway
Marc Vaudel

Authors

Jakub Vašíček
View author publications
Search author on:PubMed Google Scholar
Ksenia G. Kuznetsova
View author publications
Search author on:PubMed Google Scholar
Dafni Skiadopoulou
View author publications
Search author on:PubMed Google Scholar
Lucas Unger
View author publications
Search author on:PubMed Google Scholar
Simona Chera
View author publications
Search author on:PubMed Google Scholar
Luiza M. Ghila
View author publications
Search author on:PubMed Google Scholar
Nuno Bandeira
View author publications
Search author on:PubMed Google Scholar
Pål R. Njølstad
View author publications
Search author on:PubMed Google Scholar
Stefan Johansson
View author publications
Search author on:PubMed Google Scholar
Stefan Bruckner
View author publications
Search author on:PubMed Google Scholar
Lukas Käll
View author publications
Search author on:PubMed Google Scholar
Marc Vaudel
View author publications
Search author on:PubMed Google Scholar

Contributions

J.V. and M.V. designed the study, J.V. developed the software, D.S. and K.G.K. contributed to the software development and testing. L.U., S.C. and L.M.G. provided material for the stem cell experiment, K.G.K. designed the mass spectrometry experiment on stem cells, J.V., D.S., K.G.K. and M.V. analyzed the data. S.C., N.B., P.R.N., S.J., S.B., L.K. and M.V. supervised the work and contributed to the manuscript. J.V. and M.V. wrote the manuscript with input from all authors. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Marc Vaudel.

Ethics declarations

Competing interests

The authors declare no competing interests.

Peer review

Peer review information

Nature Methods thanks Juan Antonio Vizcaíno and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available. Primary Handling Editor: Arunima Singh, in collaboration with the Nature Methods team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Examples of variant peptide–spectrum matches from the reanalysis of the public dataset of blood plasma samples.

Amino acid substitutions in sequences are marked in bold, posterior error probabilities are estimated by Percolator⁴³. The top part of each plot shows the spectrum measured from the sample, with the blue peaks highlighting ions that match the theoretical spectrum. The bottom part of each plot shows the fragmentation and intensities predicted by MS2PIP³⁷, with the green peak highlighting matched predictions, and the red peaks showing predictions that do not have a corresponding peak in the observed spectrum. Please follow these links to view the annotated spectra using the Universal Spectrum Identifier in MassIVE: Spectrum 1 (https://massive.ucsd.edu/ProteoSAFe/usi.jsp#%7B%22usi%22%3A%22mzspec%3AMSV000080596%3A20150907_QEp1_LC7_PhGe_SA_Plate2D_38_7%3Ascan%3A16943%3AWLNVQLSPR%2F2%22%7D), Spectrum 2 (https://massive.ucsd.edu/ProteoSAFe/usi.jsp#%7B%22usi%22%3A%22mzspec%3AMSV000080596%3A20150827_QEp1_LC7_PhGe_SA_Plate2C_06_4%3Ascan%3A22960%3AHPTIQGFASVAQEETEILTADMDAER%2F3%22%7D), Spectrum 3 (https://massive.ucsd.edu/ProteoSAFe/usi.jsp#%7B%22usi%22%3A%22mzspec%3AMSV000080596%3A20150618_QEp1_LC7_PhGe_SA_Plate1B_08_3%3Ascan%3A21099%3ASSMSPTTNVLLSPLSVATALSALSLGAEQR%2F3%22%7D), Spectrum 4 (https://massive.ucsd.edu/ProteoSAFe/usi.jsp#%7B%22usi%22%3A%22mzspec%3AMSV000080596%3A20150624_QEp1_LC7_PhGe_SA_Plate2A_06_2%3Ascan%3A18682%3AMSLSSFSVNRPFLFFIFEDTTGLPLFVGSVK%2F3%22%7D).

Extended Data Fig. 2 Examples of variant peptide–spectrum matches from the analysis of healthy donor stem cells.

Amino acid substitutions in sequences are marked in bold, posterior error probabilities are estimated by Percolator⁴³. The top part of each plot shows the spectrum measured from the sample, with the blue peaks highlighting ions that match the theoretical spectrum. The bottom part of each plot shows the fragmentation and intensities predicted by MS2PIP³⁷, with the green peak highlighting matched predictions, and the red peaks showing predictions that do not have a corresponding peak in the observed spectrum. A: Two peptide–spectrum matches covering the reference and alternative allele for a variant (rs1132979). Please follow these links to view the annotated spectra using the Universal Spectrum Identifier: Spectrum 1 (https://massive.ucsd.edu/ProteoSAFe/usi.jsp#%7B%22usi%22%3A%22mzspec%3APXD053601%3A20230601_ASC2_NEO3_X866_P244_R771_180min_uPAC110cm_27ms_DDA_S3.mzML%3Ascan%3A96472%3ANTVLATWQPYTTSK%2F2%22%7D), Spectrum 2 (https://massive.ucsd.edu/ProteoSAFe/usi.jsp#%7B%22usi%22%3A%22mzspec%3APXD053601%3A20230601_ASC2_NEO3_X866_P244_R771_180min_uPAC110cm_27ms_DDA_S3.mzML%3Ascan%3A92494%3ANTVLATWQPYSTSK%2F2%22%7D) B: Two multi-variant peptide–spectrum matches, covering the alternative allele for multiple variants co-occurring in the same haplotype of the donor. Please follow these links to view the annotated spectra using the Universal Spectrum Identifier: Spectrum 3 (https://massive.ucsd.edu/ProteoSAFe/usi.jsp#%7B%22usi%22%3A%22mzspec%3APXD053601%3A20230601_ASC2_NEO3_X866_P244_R771_180min_uPAC110cm_27ms_DDA_S2.mzML%3Ascan%3A107817%3AELSGLPSGPSAGSGPPPPPPGPPPPPVSTSSGSDESASR%2F3%22%7D), Spectrum 4 (https://massive.ucsd.edu/ProteoSAFe/usi.jsp#%7B%22usi%22%3A%22mzspec%3APXD053601%3A20230601_ASC2_NEO3_X866_P244_R771_180min_uPAC110cm_27ms_DDA_S2.mzML%3Ascan%3A147106%3ALLGLELSEAEALGADSAR%2F2%22%7D).

Extended Data Fig. 3 Overlap in peptides contained in ProHap databases and UniProt.

Bar chart showing overlap between databases generated using ProHap on diverse population panels and the canonical sequences available through UniProt. The percentages denote the proportion to the total number of tryptic peptides contained in each database generated by ProHap.

Extended Data Fig. 4 Example of a protein haplotype including four peptide categories that are expected.

Peptides are marked by the color representing the category, amino acid substitutions resulting from the alternative allele of three co-occurring genetic variants are marked by red squares. The asterisk sign (*) represents a stop codon.

Extended Data Fig. 5 Types of variants included in the ProHap database of the 1000 Genomes.

A: Histogram showing the number of included variants per type. B: Two examples of a context-dependent variant consequence. The variant synonymous in the translation of transcript 1 results in the amino acid substitution in the translation of transcript 2, or when a frameshift is introduced upstream.

Extended Data Table 1 Four examples of peptides carrying commonly detected variants

Full size table

Extended Data Table 2 Number of samples per superpopulation in the 1000 Genomes Project dataset

Full size table

Extended Data Table 3 Comparison between the features supported by selected proteogenomic database-generation tools

Full size table

Supplementary information

Reporting Summary (download PDF )

Peer Review File (download PDF )

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Vašíček, J., Kuznetsova, K.G., Skiadopoulou, D. et al. ProHap enables human proteomic database generation accounting for population diversity. Nat Methods 22, 273–277 (2025). https://doi.org/10.1038/s41592-024-02506-0

Download citation

Received: 25 January 2024
Accepted: 10 October 2024
Published: 09 December 2024
Version of record: 09 December 2024
Issue date: February 2025
DOI: https://doi.org/10.1038/s41592-024-02506-0

ProHap enables human proteomic database generation accounting for population diversity

Subjects

Abstract

Access options

Similar content being viewed by others

An chromosome-level haplotype-resolved genome assembly and annotation of pitaya (Selenicereus polyrhizus)

Semi-automated assembly of high-quality diploid human reference genomes

Large-scale plasma proteomics comparisons through genetics and disease associations

Data availability

Code availability

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Competing interests

Peer review

Peer review information

Additional information

Extended data

Extended Data Fig. 1 Examples of variant peptide–spectrum matches from the reanalysis of the public dataset of blood plasma samples.

Extended Data Fig. 2 Examples of variant peptide–spectrum matches from the analysis of healthy donor stem cells.

Extended Data Fig. 3 Overlap in peptides contained in ProHap databases and UniProt.

Extended Data Fig. 4 Example of a protein haplotype including four peptide categories that are expected.

Extended Data Fig. 5 Types of variants included in the ProHap database of the 1000 Genomes.

Supplementary information

Reporting Summary (download PDF )

Peer Review File (download PDF )

Rights and permissions

About this article

Cite this article

Search

Quick links

Subjects

Abstract

Access options

Similar content being viewed by others

Data availability

Code availability

References

Acknowledgements

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Competing interests

Peer review

Peer review information

Additional information

Extended data

Supplementary information

Rights and permissions

About this article

Cite this article

Share this article

Search

Quick links