  • Review Article

Tutorial: best practices and considerations for mass-spectrometry-based protein biomarker discovery and validation

Abstract

Mass-spectrometry-based proteomic analysis is a powerful approach for discovering new disease biomarkers. However, certain critical steps of study design such as cohort selection, evaluation of statistical power, sample blinding and randomization, and sample/data quality control are often neglected or underappreciated during experimental design and execution. This tutorial discusses important steps for designing and implementing a liquid-chromatography–mass-spectrometry-based biomarker discovery study. We describe the rationale, considerations and possible failures in each step of such studies, including experimental design, sample collection and processing, and data collection. We also provide guidance for major steps of data processing and final statistical analysis for meaningful biological interpretations along with highlights of several successful biomarker studies. The provided guidelines from study design to implementation to data interpretation serve as a reference for improving rigor and reproducibility of biomarker development studies.
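Sample blinding and randomization, two of the steps named above, can be illustrated with a minimal sketch. It is not the authors' procedure: the function names `block_randomize` and `blind`, the `S001`-style codes, and the round-robin balancing scheme are assumptions chosen for illustration. The sketch spreads each clinical group evenly across processing blocks, randomizes run order within each block, and replaces sample identifiers with uninformative codes.

```python
import random
from collections import defaultdict


def block_randomize(samples, n_blocks, seed=0):
    """Distribute samples across processing blocks with balanced groups.

    samples: list of (sample_id, group) pairs, e.g. ("P01", "case").
    Returns a list of n_blocks lists of sample_ids in randomized run order.
    (Hypothetical helper for illustration only.)
    """
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for sample_id, group in samples:
        by_group[group].append(sample_id)

    blocks = [[] for _ in range(n_blocks)]
    position = 0
    for group in sorted(by_group):
        ids = by_group[group]
        rng.shuffle(ids)  # random choice of which samples land in which block
        for sample_id in ids:
            # Round-robin dealing keeps per-block group counts within one of each other.
            blocks[position % n_blocks].append(sample_id)
            position += 1

    for block in blocks:
        rng.shuffle(block)  # randomize run order inside each block
    return blocks


def blind(samples, seed=0):
    """Map each sample_id to an uninformative code such as 'S001'.

    The returned key should be held by someone not involved in sample
    processing or data acquisition. (Hypothetical helper.)
    """
    rng = random.Random(seed)
    ids = [sample_id for sample_id, _ in samples]
    rng.shuffle(ids)
    return {sample_id: f"S{index:03d}" for index, sample_id in enumerate(ids, start=1)}


if __name__ == "__main__":
    cohort = [(f"P{i:02d}", "case" if i <= 6 else "control") for i in range(1, 13)]
    key = blind(cohort, seed=1)
    for number, block in enumerate(block_randomize(cohort, n_blocks=3, seed=1), start=1):
        print(f"block {number}:", [key[sample_id] for sample_id in block])
```

Fixing the seed makes the assignment reproducible for the study record. A real design would usually also balance known covariates such as age, sex and collection site across blocks, which this sketch does not attempt.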

Fig. 1: Phases of biomarker development studies.
Fig. 2: Considerations for each step of the discovery-phase workflow.
Fig. 3: Monitoring instrument performance with standard samples.
Fig. 4: Identification of unexpected peptide modifications with data QC analysis.
Fig. 5: Considerations for each step of the validation-phase workflow.

Data availability

All the data discussed in this review are associated with the supporting primary research papers.

References

  1. Rappaport, N. et al. MalaCards: an amalgamated human disease compendium with diverse clinical and genetic annotation and structured search. Nucleic Acids Res. 45, D877–D887 (2017).

    Article  CAS  PubMed  Google Scholar 

  2. Yi, L., Swensen, A. C. & Qian, W. J. Serum biomarkers for diagnosis and prediction of type 1 diabetes. Transl. Res. 201, 13–25 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  3. Sims, E. K. et al. Teplizumab improves and stabilizes beta cell function in antibody-positive high-risk individuals. Sci. Transl. Med. https://doi.org/10.1126/scitranslmed.abc8980 (2021).

  4. Sands, B. E. Biomarkers of inflammation in inflammatory bowel disease. Gastroenterology 149, 1275–1285 e1272 (2015).

    Article  CAS  PubMed  Google Scholar 

  5. Lindhardt, M. et al. Proteomic prediction and Renin angiotensin aldosterone system Inhibition prevention Of early diabetic nephRopathy in TYpe 2 diabetic patients with normoalbuminuria (PRIORITY): essential study design and rationale of a randomised clinical multicentre trial. BMJ Open 6, e010310 (2016).

    Article  PubMed  PubMed Central  Google Scholar 

  6. McShane, L. M. In pursuit of greater reproducibility and credibility of early clinical biomarker research. Clin. Transl. Sci. 10, 58–60 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  7. Scherer, A. Reproducibility in biomarker research and clinical development: a global challenge. Biomark. Med. 11, 309–312 (2017).

    Article  CAS  PubMed  Google Scholar 

  8. Maes, E., Cho, W. C. & Baggerman, G. Translating clinical proteomics: the importance of study design. Expert Rev. Proteom. 12, 217–219 (2015).

    Article  CAS  Google Scholar 

  9. Mischak, H. et al. Implementation of proteomic biomarkers: making it work. Eur. J. Clin. Invest. 42, 1027–1036 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  10. Frantzi, M., Bhat, A. & Latosinska, A. Clinical proteomic biomarkers: relevant issues on study design & technical considerations in biomarker development. Clin. Transl. Med. 3, 7 (2014).

    Article  PubMed  PubMed Central  Google Scholar 

  11. He, T. Implementation of proteomics in clinical trials. Proteom. Clin. Appl. 13, e1800198 (2019).

    Article  CAS  Google Scholar 

  12. Mischak, H. et al. Recommendations for biomarker identification and qualification in clinical proteomics. Sci. Transl. Med. 2, 46ps42 (2010).

    Article  PubMed  Google Scholar 

  13. Li, D. & Chan, D. W. Proteomic cancer biomarkers from discovery to approval: it’s worth the effort. Expert Rev. Proteom. 11, 135–136 (2014).

    Article  CAS  Google Scholar 

  14. Wang, L., McShane, A. J., Castillo, M. J. & Yao, X. in Proteomic and Metabolomic Approaches to Biomarker Discovery 2nd edn (eds Issaq, H. J. & Veenstra, T. D.) 261–288 (Academic Press, 2020).

  15. McNutt, M. Journals unite for reproducibility. Science 346, 679 (2014).

    Article  CAS  PubMed  Google Scholar 

  16. Checklists work to improve science. Nature 556, 273–274 (2018).

  17. Baker, M. 1,500 scientists lift the lid on reproducibility. Nature 533, 452–454 (2016).

    Article  CAS  PubMed  Google Scholar 

  18. European Medicines Agency. Overview of comments received on draft guidance document on qualification of biomarkers. https://www.ema.europa.eu/en/documents/regulatory-procedural-guideline/overview-comments-received-draft-guidance-document-qualification-biomarkers_en.pdf (2009).

  19. US Food and Drug Administration. Biomarker qualification: evidentiary framework guidance for industry and FDA staff. https://www.fda.gov/media/119271/download (2018).

  20. MacLean, E. et al. A systematic review of biomarkers to detect active tuberculosis. Nat. Microbiol. 4, 748–758 (2019).

    Article  CAS  PubMed  Google Scholar 

  21. Parker, C. E. & Borchers, C. H. Mass spectrometry based biomarker discovery, verification, and validation-quality assurance and control of protein biomarker assays. Mol. Oncol. 8, 840–858 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  22. Pavlou, M. P. & Diamandis, E. P. in Genomic and Personalized Medicine 2nd edn (eds Ginsburg, G. S. & Huntington, F. W.) 263–271 (Academic Press, 2013).

  23. Kraus, V. B. Biomarkers as drug development tools: discovery, validation, qualification and use. Nat. Rev. Rheumatol. 14, 354–362 (2018).

    Article  CAS  PubMed  Google Scholar 

  24. Masucci, G. V. et al. Validation of biomarkers to predict response to immunotherapy in cancer: volume I—pre-analytical and analytical validation. J. Immunother. Cancer 4, 76 (2016).

    Article  PubMed  PubMed Central  Google Scholar 

  25. Keshishian, H. et al. Quantitative, multiplexed workflow for deep analysis of human blood plasma and biomarker discovery by mass spectrometry. Nat. Protoc. 12, 1683–1701 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  26. Rifai, N., Gillette, M. A. & Carr, S. A. Protein biomarker discovery and validation: the long and uncertain path to clinical utility. Nat. Biotechnol. 24, 971–983 (2006).

    Article  CAS  PubMed  Google Scholar 

  27. Shi, T. et al. Antibody-free, targeted mass-spectrometric approach for quantification of proteins at low picogram per milliliter levels in human plasma/serum. Proc. Natl Acad. Sci. USA 109, 15395–15400 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  28. Ma, M. H. Y. et al. A multi-biomarker disease activity score can predict sustained remission in rheumatoid arthritis. Arthritis Res. Ther. 22, 158 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  29. Good, D. M. et al. Naturally occurring human urinary peptides for use in diagnosis of chronic kidney disease. Mol. Cell Proteom. 9, 2424–2437 (2010).

    Article  CAS  Google Scholar 

  30. Banerjee, A. & Chaudhury, S. Statistics without tears: populations and samples. Ind. Psychiatry J. 19, 60–65 (2010).

    Article  PubMed  PubMed Central  Google Scholar 

  31. Selvin, S. in Statistical Analysis of Epidemiologic Data. (ed. Selvin, S.) Ch. 4 (Oxford University Press., 2004).

  32. Pearce, N. Analysis of matched case-control studies. BMJ 352, i969 (2016).

    Article  PubMed  PubMed Central  Google Scholar 

  33. Rubin, D. B. Matching to remove bias in observational studies. Biometrics 29, 159–183 (1973).

    Article  Google Scholar 

  34. Mahajan, A. Selection bias: selection of controls as a critical issue in the interpretation of results in a case control study. Indian J. Med. Res. 142, 768 (2015).

    Article  PubMed  PubMed Central  Google Scholar 

  35. Morabia, A. Case-control studies in clinical research: mechanism and prevention of selection bias. Prev. Med. 26, 674–677 (1997).

    Article  CAS  PubMed  Google Scholar 

  36. Sutton-Tyrrell, K. Assessing bias in case-control studies. Proper selection of cases and controls. Stroke 22, 938–942 (1991).

    Article  CAS  PubMed  Google Scholar 

  37. Sheikh, K. Investigation of selection bias using inverse probability weighting. Eur. J. Epidemiol. 22, 349–350 (2007).

    Article  PubMed  Google Scholar 

  38. Alonso, A. et al. Predictors of follow-up and assessment of selection bias from dropouts using inverse probability weighting in a cohort of university graduates. Eur. J. Epidemiol. 21, 351–358 (2006).

    Article  PubMed  Google Scholar 

  39. Geneletti, S., Best, N., Toledano, M. B., Elliott, P. & Richardson, S. Uncovering selection bias in case-control studies using Bayesian post-stratification. Stat. Med. 32, 2555–2570 (2013).

    Article  CAS  PubMed  Google Scholar 

  40. VanderWeele, T. J. & Shpitser, I. On the definition of a confounder. Ann. Stat. 41, 196–220 (2013).

    Article  PubMed  PubMed Central  Google Scholar 

  41. Fewell, Z., Davey Smith, G. & Sterne, J. A. The impact of residual and unmeasured confounding in epidemiologic studies: a simulation study. Am. J. Epidemiol. 166, 646–655 (2007).

    Article  PubMed  Google Scholar 

  42. Polley, M. C. Power estimation in biomarker studies where events are already observed. Clin. Trials 14, 621–628 (2017).

    Article  PubMed  Google Scholar 

  43. Lalouel, J. M. & Rohrwasser, A. Power and replication in case-control studies. Am. J. Hypertens. 15, 201–205 (2002).

    Article  PubMed  Google Scholar 

  44. Cai, J. & Zeng, D. Sample size/power calculation for case-cohort studies. Biometrics 60, 1015–1024 (2004).

    Article  PubMed  Google Scholar 

  45. Jones, S. R., Carley, S. & Harrison, M. An introduction to power and sample size estimation. Emerg. Med. J. 20, 453–458 (2003).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  46. Furberg, C. D. & Friedman, L. M. Approaches to data analyses of clinical trials. Prog. Cardiovasc. Dis. 54, 330–334 (2012).

    Article  PubMed  Google Scholar 

  47. Levin, Y. The role of statistical power analysis in quantitative proteomics. Proteomics 11, 2565–2567 (2011).

    Article  CAS  PubMed  Google Scholar 

  48. Dicker, L., Lin, X. & Ivanov, A. R. Increased power for the analysis of label-free LC-MS/MS proteomics data by combining spectral counts and peptide peak attributes. Mol. Cell Proteom. 9, 2704–2718 (2010).

    Article  CAS  Google Scholar 

  49. Skates, S. J. et al. Statistical design for biospecimen cohort size in proteomics-based biomarker discovery and verification studies. J. Proteome Res. 12, 5383–5394 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  50. Webb-Robertson, B. M. et al. Statistically driven metabolite and lipid profiling of patients from the undiagnosed diseases network. Anal. Chem. 92, 1796–1803 (2020).

    Article  CAS  PubMed  Google Scholar 

  51. Nakayasu, E. S. et al. Comprehensive proteomics analysis of stressed human islets identifies GDF15 as a target for type 1 diabetes intervention. Cell Metab. 31, 363–374 e366 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  52. Ocaña, G. J. et al. Analysis of serum Hsp90 as a potential biomarker of β cell autoimmunity in type 1 diabetes. PLoS ONE 14, e0208456 (2019).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  53. Sims, E. K. et al. Elevations in the fasting serum proinsulin-to-C-peptide ratio precede the onset of type 1 diabetes. Diabetes Care 39, 1519–1526 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  54. Townsend, M. K. et al. Impact of pre-analytic blood sample collection factors on metabolomics. Cancer Epidemiol. Biomark. Prev. 25, 823–829 (2016).

    Article  CAS  Google Scholar 

  55. Cemin, R. & Daves, M. Pre-analytic variability in cardiovascular biomarker testing. J. Thorac. Dis. 7, E395–E401 (2015).

    PubMed  PubMed Central  Google Scholar 

  56. Pasic, M. D. et al. Influence of fasting and sample collection time on 38 biochemical markers in healthy children: a CALIPER substudy. Clin. Biochem. 45, 1125–1130 (2012).

    Article  CAS  PubMed  Google Scholar 

  57. Narayanan, S. The preanalytic phase. An important component of laboratory medicine. Am. J. Clin. Pathol. 113, 429–452 (2000).

    Article  CAS  PubMed  Google Scholar 

  58. Stewart, T. et al. Impact of pre-analytical differences on biomarkers in the ADNI and PPMI studies: implications in the era of classifying disease based on biomarkers. J. Alzheimers Dis. 69, 263–276 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  59. Speake, C. et al. Circulating unmethylated insulin DNA as a biomarker of human beta cell death: a multi-laboratory assay comparison. J. Clin. Endocrinol. Metab. https://doi.org/10.1210/clinem/dgaa008 (2020).

  60. Holst, J. J. & Wewer Albrechtsen, N. J. Methods and guidelines for measurement of glucagon in plasma. Int. J. Mol. Sci. https://doi.org/10.3390/ijms20215416 (2019).

  61. Steiner, C. et al. Applications of mass spectrometry for quantitative protein analysis in formalin-fixed paraffin-embedded tissues. Proteomics 14, 441–451 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  62. Giusti, L., Angeloni, C. & Lucacchini, A. Update on proteomic studies of formalin-fixed paraffin-embedded tissues. Expert Rev. Proteom. 16, 513–520 (2019).

    Article  CAS  Google Scholar 

  63. Piehowski, P. D. et al. Residual tissue repositories as a resource for population-based cancer proteomic studies. Clin. Proteom. 15, 26 (2018).

    Article  CAS  Google Scholar 

  64. Thompson, S. M. et al. Impact of pre-analytical factors on the proteomic analysis of formalin-fixed paraffin-embedded tissue. Proteom. Clin. Appl. 7, 241–251 (2013).

    Article  CAS  Google Scholar 

  65. Pellis, L. et al. Plasma metabolomics and proteomics profiling after a postprandial challenge reveal subtle diet effects on human metabolic status. Metabolomics 8, 347–359 (2012).

    Article  CAS  PubMed  Google Scholar 

  66. Johansen, P., Andersen, J. D., Børsting, C. & Morling, N. Evaluation of the iPLEX® Sample ID Plus Panel designed for the Sequenom MassARRAY® system. A SNP typing assay developed for human identification and sample tracking based on the SNPforID panel. Forensic Sci. Int. Genet. 7, 482–487 (2013).

    Article  CAS  PubMed  Google Scholar 

  67. Hoofnagle, A. N. et al. Recommendations for the generation, quantification, storage, and handling of peptides used for mass spectrometry-based assays. Clin. Chem. 62, 48–69 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  68. Sims, E. K. et al. Proinsulin secretion is a persistent feature of type 1 diabetes. Diabetes Care 42, 258–264 (2019).

    Article  CAS  PubMed  Google Scholar 

  69. Schulz, K. F. & Grimes, D. A. Blinding in randomised trials: hiding who got what. Lancet 359, 696–700 (2002).

    Article  PubMed  Google Scholar 

  70. Karanicolas, P. J., Farrokhyar, F. & Bhandari, M. Practical tips for surgical research: blinding: who, what, when, why, how? Can. J. Surg. 53, 345–348 (2010).

    PubMed  PubMed Central  Google Scholar 

  71. Zhang, Z. et al. Three biomarkers identified from serum proteomic analysis for the detection of early stage ovarian cancer. Cancer Res. 64, 5882–5890 (2004).

    Article  CAS  PubMed  Google Scholar 

  72. Zhang, Z. & Chan, D. W. The road from discovery to clinical diagnostics: lessons learned from the first FDA-cleared in vitro diagnostic multivariate index assay of proteomic biomarkers. Cancer Epidemiol. Biomark. Prev. 19, 2995–2999 (2010).

    Article  CAS  Google Scholar 

  73. Anderson, N. L. & Anderson, N. G. The human plasma proteome: history, character, and diagnostic prospects. Mol. Cell Proteom. 1, 845–867 (2002).

    Article  CAS  Google Scholar 

  74. Liu, H., Sadygov, R. G. & Yates, J. R. 3rd A model for random sampling and estimation of relative protein abundance in shotgun proteomics. Anal. Chem. 76, 4193–4201 (2004).

    Article  CAS  PubMed  Google Scholar 

  75. Qian, W. J. et al. Enhanced detection of low abundance human plasma proteins using a tandem IgY12-SuperMix immunoaffinity separation strategy. Mol. Cell Proteom. 7, 1963–1973 (2008).

    Article  CAS  Google Scholar 

  76. Liu, T. et al. Evaluation of multiprotein immunoaffinity subtraction for plasma proteomics and candidate biomarker discovery using mass spectrometry. Mol. Cell Proteom. 5, 2167–2174 (2006).

    Article  CAS  Google Scholar 

  77. Yadav, A. K. et al. A systematic analysis of eluted fraction of plasma post immunoaffinity depletion: implications in biomarker discovery. PLoS ONE 6, e24442 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  78. Garay-Baquero, D. J. et al. Comprehensive plasma proteomic profiling reveals biomarkers for active tuberculosis. JCI Insight https://doi.org/10.1172/jci.insight.137427 (2020).

  79. Piehowski, P. D. et al. Sources of technical variability in quantitative LC-MS proteomics: human brain tissue sample analysis. J. Proteome Res. 12, 2128–2137 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  80. Wisniewski, J. R., Ostasiewicz, P. & Mann, M. High recovery FASP applied to the proteomic analysis of microdissected formalin fixed paraffin embedded cancer tissues retrieves known colon cancer markers. J. Proteome Res. 10, 3040–3049 (2011).

    Article  CAS  PubMed  Google Scholar 

  81. Quesada-Calvo, F. et al. Comparison of two FFPE preparation methods using label-free shotgun proteomics: application to tissues of diverticulitis patients. J. Proteom. 112, 250–261 (2015).

    Article  CAS  Google Scholar 

  82. Kawashima, Y., Kodera, Y., Singh, A., Matsumoto, M. & Matsumoto, H. Efficient extraction of proteins from formalin-fixed paraffin-embedded tissues requires higher concentration of tris(hydroxymethyl)aminomethane. Clin. Proteom. 11, 4 (2014).

    Article  CAS  Google Scholar 

  83. Kulevich, S. E., Frey, B. L., Kreitinger, G. & Smith, L. M. Alkylating tryptic peptides to enhance electrospray ionization mass spectrometry analysis. Anal. Chem. 82, 10135–10142 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  84. Walmsley, S. J. et al. Comprehensive analysis of protein digestion using six trypsins reveals the origin of trypsin as a significant source of variability in proteomics. J. Proteome Res. 12, 5666–5680 (2013).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  85. Herraiz, T. & Casal, V. Evaluation of solid-phase extraction procedures in peptide analysis. J. Chromatogr. A 708, 209–221 (1995).

    Article  CAS  PubMed  Google Scholar 

  86. Muntel, J. et al. Comparison of protein quantification in a complex background by DIA and TMT workflows with fixed instrument time. J. Proteome Res. 18, 1340–1351 (2019).

    Article  CAS  PubMed  Google Scholar 

  87. Ow, S. Y. et al. iTRAQ underestimation in simple and complex mixtures: “the good, the bad and the ugly”. J. Proteome Res. 8, 5347–5355 (2009).

    Article  CAS  PubMed  Google Scholar 

  88. Liu, Y. et al. Quantitative variability of 342 plasma proteins in a human twin population. Mol. Syst. Biol. 11, 786 (2015).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  89. Geyer, P. E. et al. Proteomics reveals the effects of sustained weight loss on the human plasma proteome. Mol. Syst. Biol. 12, 901 (2016).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  90. Bekker-Jensen, D. B. et al. A compact quadrupole-orbitrap mass spectrometer with FAIMS interface improves proteome coverage in short LC gradients. Mol. Cell Proteom. 19, 716–729 (2020).

    Article  CAS  Google Scholar 

  91. Xuan, Y. et al. Standardization and harmonization of distributed multi-center proteotype analysis supporting precision medicine studies. Nat. Commun. 11, 5248 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  92. Shen, Y. et al. Discovery of potential plasma biomarkers for tuberculosis in HIV-infected patients by data-independent acquisition-based quantitative proteomics. Infect. Drug Resist. 13, 1185–1196 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  93. Fang, X. et al. Urinary proteomics of Henoch-Schonlein purpura nephritis in children using liquid chromatography-tandem mass spectrometry. Clin. Proteom. 17, 10 (2020).

    Article  CAS  Google Scholar 

  94. Carnielli, C. M. et al. Combining discovery and targeted proteomics reveals a prognostic signature in oral cancer. Nat. Commun. 9, 3598 (2018).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  95. Bekker-Jensen, D. B. et al. An optimized shotgun strategy for the rapid generation of comprehensive human proteomes. Cell Syst. 4, 587–599 e584 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  96. Ow, S. Y., Salim, M., Noirel, J., Evans, C. & Wright, P. C. Minimising iTRAQ ratio compression through understanding LC-MS elution dependence and high-resolution HILIC fractionation. Proteomics 11, 2341–2346 (2011).

    Article  CAS  PubMed  Google Scholar 

  97. Manadas, B., Mendes, V. M., English, J. & Dunn, M. J. Peptide fractionation in proteomics approaches. Expert Rev. Proteom. 7, 655–663 (2010).

    Article  CAS  Google Scholar 

  98. Schoenmakers, P. J., van Molle, S., Hayes, C. M. G. & Uunk, L. G. M. Effects of pH in reversed-phase liquid chromatography. Anal. Chim. Acta 250, 1–19 (1991).

    Article  CAS  Google Scholar 

  99. Amidan, B. G. et al. Signatures for mass spectrometry data quality. J. Proteome Res. 13, 2215–2222 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  100. Zhang, T. et al. Block design with common reference samples enables robust large-scale label-free quantitative proteome profiling. J. Proteome Res. https://doi.org/10.1021/acs.jproteome.0c00310 (2020).

  101. Burger, B., Vaudel, M. & Barsnes, H. Importance of block randomization when designing proteomics experiments. J. Proteome Res. https://doi.org/10.1021/acs.jproteome.0c00536 (2020).

  102. Stanfill, B. A. et al. Quality control analysis in real-time (QC-ART): a tool for real-time quality control assessment of mass spectrometry-based proteomics data. Mol. Cell Proteom. 17, 1824–1836 (2018).

    Article  CAS  Google Scholar 

  103. Matzke, M. M. et al. Improved quality control processing of peptide-centric LC-MS proteomics data. Bioinformatics 27, 2866–2872 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  104. Bittremieux, W., Valkenborg, D., Martens, L. & Laukens, K. Computational quality control tools for mass spectrometry proteomics. Proteomics https://doi.org/10.1002/pmic.201600159 (2017).

  105. Devabhaktuni, A. et al. TagGraph reveals vast protein modification landscapes from large tandem mass spectrometry datasets. Nat. Biotechnol. 37, 469–479 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  106. Cox, J. et al. Andromeda: a peptide search engine integrated into the MaxQuant environment. J. Proteome Res. 10, 1794–1805 (2011).

    Article  CAS  PubMed  Google Scholar 

  107. Kim, S. & Pevzner, P. A. MS-GF+ makes progress towards a universal database search tool for proteomics. Nat. Commun. 5, 5277 (2014).

    Article  CAS  PubMed  Google Scholar 

  108. Kong, A. T., Leprevost, F. V., Avtonomov, D. M., Mellacheruvu, D. & Nesvizhskii, A. I. MSFragger: ultrafast and comprehensive peptide identification in mass spectrometry-based proteomics. Nat. Methods 14, 513–520 (2017).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  109. Elias, J. E. & Gygi, S. P. Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nat. Methods 4, 207–214 (2007).

    Article  CAS  PubMed  Google Scholar 

  110. Gan, N. et al. Regulation of phosphoribosyl ubiquitination by a calmodulin-dependent glutamylase. Nature 572, 387–391 (2019).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  111. Callister, S. J. et al. Normalization approaches for removing systematic biases associated with mass spectrometry and label-free proteomics. J. Proteome Res. 5, 277–286 (2006).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  112. Kultima, K. et al. Development and evaluation of normalization methods for label-free relative quantification of endogenous peptides. Mol. Cell Proteom. 8, 2285–2295 (2009).

    Article  CAS  Google Scholar 

  113. Webb-Robertson, B. J., Matzke, M. M., Jacobs, J. M., Pounds, J. G. & Waters, K. M. A statistical selection strategy for normalization procedures in LC-MS proteomics experiments through dataset-dependent ranking of normalization scaling factors. Proteomics 11, 4736–4741 (2011).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  114. Valikangas, T., Suomi, T. & Elo, L. L. A systematic evaluation of normalization methods in quantitative label-free proteomics. Brief. Bioinform. 19, 1–11 (2018).

    CAS  PubMed  Google Scholar 

  115. Karpievitch, Y. V., Dabney, A. R. & Smith, R. D. Normalization and missing value imputation for label-free LC-MS analysis. BMC Bioinformatics 13, S5 (2012).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  116. Liebal, U. W., Phan, A. N. T., Sudhakar, M., Raman, K. & Blank, L. M. Machine learning applications for mass spectrometry-based metabolomics. Metabolites https://doi.org/10.3390/metabo10060243 (2020).

  117. Kim, M., Rai, N., Zorraquino, V. & Tagkopoulos, I. Multi-omics integration accurately predicts cellular state in unexplored conditions for Escherichia coli. Nat. Commun. 7, 13090 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  118. Sedgwick, P. Multiple hypothesis testing and Bonferroni’s correction. BMJ 349, g6284 (2014).

    Article  PubMed  Google Scholar 

  119. Artigaud, S., Gauthier, O. & Pichereau, V. Identifying differentially expressed proteins in two-dimensional electrophoresis experiments: inputs from transcriptomics statistical tools. Bioinformatics 29, 2729–2734 (2013).

    Article  CAS  PubMed  Google Scholar 

  120. Strimmer, K. A unified approach to false discovery rate estimation. BMC Bioinformatics 9, 303 (2008).

    Article  PubMed  PubMed Central  Google Scholar 

  121. Frohnert, B. I. et al. Predictive modeling of type 1 diabetes stages using disparate data sources. Diabetes 69, 238–248 (2020).

    Article  CAS  PubMed  Google Scholar 

  122. Sonsare, P. M. & Gunavathi, C. Investigation of machine learning techniques on proteomics: a comprehensive survey. Prog. Biophys. Mol. Biol. 149, 54–69 (2019).

    Article  CAS  PubMed  Google Scholar 

  123. Palivec, V. [Minutiae, the first Czech medical prints]. Cas. Lek. Cesk 128, 1530 (1989).

    CAS  PubMed  Google Scholar 

  124. Colby, S. M., McClure, R. S., Overall, C. C., Renslow, R. S. & McDermott, J. E. Improving network inference algorithms using resampling methods. BMC Bioinformatics 19, 376 (2018).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  125. Schiess, R., Wollscheid, B. & Aebersold, R. Targeted proteomic strategy for clinical biomarker discovery. Mol. Oncol. 3, 33–44 (2009).

    Article  CAS  PubMed  Google Scholar 

  126. Surinova, S. et al. On the development of plasma protein biomarkers. J. Proteome Res. 10, 5–16 (2011).

    Article  CAS  PubMed  Google Scholar 

  127. Burgess, M. W., Keshishian, H., Mani, D. R., Gillette, M. A. & Carr, S. A. Simplified and efficient quantification of low-abundance proteins at very high multiplex via targeted mass spectrometry. Mol. Cell Proteom. 13, 1137–1149 (2014).

    Article  CAS  Google Scholar 

  128. Kennedy, J. J. et al. Demonstrating the feasibility of large-scale development of standardized assays to quantify human proteins. Nat. Methods 11, 149–155 (2014).

    Article  CAS  PubMed  Google Scholar 

  129. Kim, Y. et al. Targeted proteomics identifies liquid-biopsy signatures for extracapsular prostate cancer. Nat. Commun. 7, 11906 (2016).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  130. Paulovich, A. G., Whiteaker, J. R., Hoofnagle, A. N. & Wang, P. The interface between biomarker discovery and clinical validation: the tar pit of the protein biomarker pipeline. Proteom. Clin. Appl. 2, 1386–1402 (2008).

    Article  CAS  Google Scholar 

  131. Kawahara, R. et al. Integrative analysis to select cancer candidate biomarkers to targeted validation. Oncotarget 6, 43635–43652 (2015).

    Article  PubMed  PubMed Central  Google Scholar 

  132. Toth, R. et al. Random forest-based modelling to detect biomarkers for prostate cancer progression. Clin. Epigenetics 11, 148 (2019).

    Article  PubMed  PubMed Central  CAS  Google Scholar 

  133. Olivier, M., Asmis, R., Hawkins, G. A., Howard, T. D. & Cox, L. A. the need for multi-omics biomarker signatures in precision medicine. Int. J. Mol. Sci. https://doi.org/10.3390/ijms20194781 (2019).

  134. Lange, V., Picotti, P., Domon, B. & Aebersold, R. Selected reaction monitoring for quantitative proteomics: a tutorial. Mol. Syst. Biol. 4, 222 (2008).

    Article  PubMed  PubMed Central  Google Scholar 

  135. Tarasova, I. A., Masselon, C. D., Gorshkov, A. V. & Gorshkov, M. V. Predictive chromatography of peptides and proteins as a complementary tool for proteomics. Analyst 141, 4816–4832 (2016).

    Article  CAS  PubMed  Google Scholar 

  136. Rost, H., Malmstrom, L. & Aebersold, R. A computational tool to detect and avoid redundancy in selected reaction monitoring. Mol. Cell Proteom. 11, 540–549 (2012).

    Article  CAS  Google Scholar 

  137. Mueller, L. K., Baumruck, A. C., Zhdanova, H. & Tietze, A. A. Challenges and perspectives in chemical synthesis of highly hydrophobic peptides. Front. Bioeng. Biotechnol. 8, 162 (2020).

    Article  PubMed  PubMed Central  Google Scholar 

  138. Wu, C. et al. Expediting SRM assay development for large-scale targeted proteomics experiments. J. Proteome Res. 13, 4479–4487 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  139. MacLean, B. et al. Skyline: an open source document editor for creating and analyzing targeted proteomics experiments. Bioinformatics 26, 966–968 (2010).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  140. Pino, L. K. et al. Matrix-matched calibration curves for assessing analytical figures of merit in quantitative proteomics. J. Proteome Res. 19, 1147–1153 (2020).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  141. Whiteaker, J. R. et al. CPTAC Assay Portal: a repository of targeted proteomic assays. Nat. Methods 11, 703–704 (2014).

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  142. Yu, L. et al. Targeted brain proteomics uncover multiple pathways to Alzheimer’s dementia. Ann. Neurol. 84, 78–88 (2018).

  143. Whiteaker, J. R. et al. Peptide immunoaffinity enrichment with targeted mass spectrometry: application to quantification of ATM kinase phospho-signaling. Methods Mol. Biol. 1599, 197–213 (2017).

  144. Zhu, Y. et al. Immunoaffinity microflow liquid chromatography/tandem mass spectrometry for the quantitation of PD1 and PD-L1 in human tumor tissues. Rapid Commun. Mass Spectrom. 34, e8896 (2020).

  145. Schneck, N. A., Phinney, K. W., Lee, S. B. & Lowenthal, M. S. Quantification of cardiac troponin I in human plasma by immunoaffinity enrichment and targeted mass spectrometry. Anal. Bioanal. Chem. 410, 2805–2813 (2018).

  146. Sall, A. et al. Advancing the immunoaffinity platform AFFIRM to targeted measurements of proteins in serum in the pg/ml range. PLoS ONE 13, e0189116 (2018).

  147. Jung, S. et al. Quantification of ATP7B protein in dried blood spots by peptide immuno-SRM as a potential screen for Wilson’s disease. J. Proteome Res. 16, 862–871 (2017).

  148. Schoenherr, R. M. et al. Multiplexed quantification of estrogen receptor and HER2/Neu in tissue and cell lysates by peptide immunoaffinity enrichment mass spectrometry. Proteomics 12, 1253–1260 (2012).

  149. Gibbons, B. C. et al. Rapidly assessing the quality of targeted proteomics experiments through monitoring stable-isotope labeled standards. J. Proteome Res. 18, 694–699 (2019).

  150. Carr, S. A. et al. Targeted peptide measurements in biology and medicine: best practices for mass spectrometry-based assay development using a fit-for-purpose approach. Mol. Cell Proteom. 13, 907–917 (2014).

  151. Grant, R. P. & Hoofnagle, A. N. From lost in translation to paradise found: enabling protein biomarker method transfer by mass spectrometry. Clin. Chem. 60, 941–944 (2014).

  152. Chen, Z. et al. Quantitative insulin analysis using liquid chromatography-tandem mass spectrometry in a high-throughput clinical laboratory. Clin. Chem. 59, 1349–1356 (2013).

  153. Zhang, Q. et al. Serum proteomics reveals systemic dysregulation of innate immunity in type 1 diabetes. J. Exp. Med. 210, 191–203 (2013).

  154. Almangush, A. et al. A simple novel prognostic model for early stage oral tongue cancer. Int. J. Oral. Maxillofac. Surg. 44, 143–150 (2015).

  155. Tofte, N. et al. Early detection of diabetic kidney disease by urinary proteomics and subsequent intervention with spironolactone to delay progression (PRIORITY): a prospective observational study and embedded randomised placebo-controlled trial. Lancet Diabetes Endocrinol. 8, 301–312 (2020).

  156. Issaq, H. J., Veenstra, T. D., Conrads, T. P. & Felschow, D. The SELDI-TOF MS approach to proteomics: protein profiling and biomarker identification. Biochem. Biophys. Res. Commun. 292, 587–592 (2002).

  157. Fung, E. T. A recipe for proteomics diagnostic test development: the OVA1 test, from biomarker discovery to FDA clearance. Clin. Chem. 56, 327–329 (2010).

  158. Carvalho, V. P. et al. The contribution and perspectives of proteomics to uncover ovarian cancer tumor markers. Transl. Res. 206, 71–90 (2019).

  159. Belczacka, I. et al. Proteomics biomarkers for solid tumors: current status and future prospects. Mass Spectrom. Rev. 38, 49–78 (2019).

  160. Ma, J. & Kilby, G. W. Sensitive, rapid, robust, and reproducible workflow for host cell protein profiling in biopharmaceutical process development. J. Proteome Res. https://doi.org/10.1021/acs.jproteome.0c00252 (2020).

  161. Couvillion, S. P. et al. New mass spectrometry technologies contributing towards comprehensive and high throughput omics analyses of single cells. Analyst 144, 794–807 (2019).

  162. Li, J. et al. TMTpro reagents: a set of isobaric labeling mass tags enables simultaneous proteome-wide measurements across 16 samples. Nat. Methods 17, 399–404 (2020).

  163. Beausoleil, S. A., Villen, J., Gerber, S. A., Rush, J. & Gygi, S. P. A probability-based approach for high-throughput protein phosphorylation analysis and site localization. Nat. Biotechnol. 24, 1285–1292 (2006).

Acknowledgements

The authors thank N. Johnson for his help in designing figures used in this publication. This work was supported by National Institutes of Health, National Institute of Diabetes and Digestive and Kidney Diseases grants UC4 DK104166 (to C.E.M. and T.O.M.), U01 DK127786 (to C.E.M., B.J.M.W.R. and T.O.M.), U01 DK124020 (to W.J.Q.), R01 DK032493 (to M.R.), P30 DK097512 (to C.E.M.), R01 DK093954 (to C.E.M.) and R21 DK119800-01A1 (to C.E.M.). M.R., E.S.N., B.J.M.W.R. and T.O.M. were also supported by Helmsley Trust grant G-1901-03687. C.E.M. was also supported by VA Merit Award I01BX001733, JDRF 2-SRA-2018-493-A-B and gifts from the Sigma Beta Sorority, the Ball Brothers Foundation, and the George and Frances Ball Foundation. The TEDDY Study is funded by U01 DK63829, U01 DK63861, U01 DK63821, U01 DK63865, U01 DK63863, U01 DK63836, U01 DK63790, UC4 DK63829, UC4 DK63861, UC4 DK63821, UC4 DK63865, UC4 DK63863, UC4 DK63836, UC4 DK95300, UC4 DK100238, UC4 DK106955, UC4 DK112243, UC4 DK117483, and Contract No. HHSN267200700014C from the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK), National Institute of Allergy and Infectious Diseases (NIAID), Eunice Kennedy Shriver National Institute of Child Health and Human Development (NICHD), National Institute of Environmental Health Sciences (NIEHS), Centers for Disease Control and Prevention (CDC), and JDRF. TEDDY is supported in part by the NIH/NCATS Clinical and Translational Science Awards to the University of Florida (UL1 TR000064) and the University of Colorado (UL1 TR002535). Work was performed in the Environmental Molecular Sciences Laboratory, a US Department of Energy (DOE) national scientific user facility at Pacific Northwest National Laboratory (PNNL) in Richland, WA. Battelle operates PNNL for the DOE under contract DE-AC05-76RLO01830.

Author information

Contributions

E.S.N. wrote the abstract, introduction and concluding remarks, contributed to the data analysis section and edited the manuscript; A.M.D.S., J.P.K., M.R. and B.I.F. wrote the sections on subject selection, power calculation and considerations for sample handling; C.E.M. wrote the section on specimen collection, storage and tracking; M.G., P.D.P. and A.S. wrote the sample preparation sections of both the discovery and validation phases; D.O. wrote the section on data collection for the discovery phase; C.A. wrote the section on data quality control; B.J.M.W.R. contributed to the power analysis section and wrote the data analysis section; Y.G., P.D.P., T.F. and W.J.Q. wrote the sections on the validation phase; T.O.M. wrote the section on the phases of biomarker development. All authors read, provided input on and approved the final version of the manuscript.

Corresponding authors

Correspondence to Ernesto S. Nakayasu or Thomas O. Metz.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Peer review information: Nature Protocols thanks Bing Zhang and the other, anonymous reviewer(s) for their contribution to the peer review of this work.

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Related links

Key references using this review

Zhang, Q. et al. J. Exp. Med. 210, 191–203 (2013): https://doi.org/10.1084/jem.20111843

Carnielli, C. M. et al. Nat. Commun. 9, 3598 (2018): https://doi.org/10.1038/s41467-018-05696-2

Tofte, N. et al. Lancet Diabetes Endocrinol. 8, 301–312 (2020): https://doi.org/10.1016/S2213-8587(20)30026-7

Zhang, Z. et al. Cancer Res. 64, 5882–5890 (2004): https://doi.org/10.1158/0008-5472.CAN-04-0746

About this article

Cite this article

Nakayasu, E.S., Gritsenko, M., Piehowski, P.D. et al. Tutorial: best practices and considerations for mass-spectrometry-based protein biomarker discovery and validation. Nat Protoc 16, 3737–3760 (2021). https://doi.org/10.1038/s41596-021-00566-6

  • DOI: https://doi.org/10.1038/s41596-021-00566-6
