Multiplex and multimodal mapping of variant effects in secreted proteins via MultiSTEP

Abstract

Despite widespread advances in DNA sequencing, the functional consequences of most genetic variants remain poorly understood. Multiplexed assays of variant effect can measure the function of variants at scale but cannot readily be applied to the ~10% of human genes encoding secreted proteins. Here we develop a flexible, scalable human cell surface display method, multiplexed surface tethering of extracellular proteins (MultiSTEP), to study the consequences of missense variation in coagulation factor IX (FIX), a serine protease in which genetic variation can cause hemophilia B. We combine MultiSTEP with a panel of antibodies to detect FIX secretion and post-translational modification (PTM), measuring 44,816 variant effects for 436 synonymous variants and 8,528 of the 8,759 possible F9 missense variants. Almost half of missense variants impact secretion, PTM or both. We also identify functional constraints on secretion within the signal peptide and for nearly all gain or loss of cysteine variants. Secretion scores correlate strongly with FIX levels in hemophilia B and reveal that loss-of-secretion variants are more often associated with severe disease. Integration of the secretion and PTM scores enables reclassification of 63.1% of F9 variants of uncertain significance in the My Life, Our Future hemophilia genotyping project. Lastly, we show that MultiSTEP can be applied to other secreted proteins, thus demonstrating that MultiSTEP is a multiplexed, multimodal and generalizable method for systematically assessing variant effects in secreted proteins at scale.
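The variant counts quoted in the abstract can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the three counts come from the abstract, while the precursor length of 461 residues and the 19 alternative amino acids per position are assumptions introduced here to show where a figure like 8,759 could come from.

```python
# Counts quoted in the abstract.
SYNONYMOUS_SCORED = 436
MISSENSE_SCORED = 8_528
MISSENSE_POSSIBLE = 8_759

# Assumptions (not stated in the abstract): a 461-residue FIX precursor,
# with 19 alternative amino acids available at each residue.
PRECURSOR_LENGTH = 461
ALTERNATIVES_PER_RESIDUE = 19


def missense_space(length: int, alternatives: int = ALTERNATIVES_PER_RESIDUE) -> int:
    """Number of single amino acid substitutions for a protein of this length."""
    return length * alternatives


def coverage(scored: int, possible: int) -> float:
    """Fraction of the possible variants that received a score."""
    return scored / possible


total_variants = SYNONYMOUS_SCORED + MISSENSE_SCORED
unscored_missense = MISSENSE_POSSIBLE - MISSENSE_SCORED

print(total_variants)                                         # 8964
print(missense_space(PRECURSOR_LENGTH) == MISSENSE_POSSIBLE)  # True
print(unscored_missense)                                      # 231
print(f"{coverage(MISSENSE_SCORED, MISSENSE_POSSIBLE):.1%}")  # 97.4%
```

The total of 8,964 matches the number of FIX variants named in the Fig. 2 title.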

Fig. 1: MultiSTEP enables at-scale measurement of variant effects in secreted proteins.
Fig. 2: The 17,927 MultiSTEP-derived secretion scores for 8,964 FIX variants.
Fig. 3: MultiSTEP reveals biochemical constraints on secretion.
Fig. 4: MultiSTEP enables measurement of variant effects on FIX PTMs.
Fig. 5: Secretion and γ-carboxylation scores reveal clinical features of hemophilia B and enable variant reinterpretation.
Fig. 6: MultiSTEP can be applied to diverse secreted proteins.

Data availability

VAMP-seq abundance scores for PTEN (urn:mavedb:00000013-a-1), TPMT (urn:mavedb:00000013-b-1), VKOR (urn:mavedb:00000078-b-1), CYP2C9 (urn:mavedb:00000095-b-1) and NUDT15 (urn:mavedb:00000055-a-1) were downloaded from MaveDB (ref. 109). Gla domain protein sequences for human FIX (P00740), human prothrombin (coagulation factor II; P00734), human coagulation factor VII (P08709), human coagulation factor X (P00742), human protein C (P04070), human protein S (P07225), human osteocalcin (bone Gla-protein; P02818), bovine osteocalcin (bone Gla-protein; P02820), human growth arrest-specific protein 6 (P14393), Pseudechis porphyriacus venom prothrombin activator porpharin-D (P58L93), Notechis scutatus venom prothrombin activator notecarin-D1 (P82807) and Oxyuranus scutellatus venom prothrombin activator oscutarin-C (P58L96) were downloaded from UniProt.org. ClinVar variants are publicly available at https://www.ncbi.nlm.nih.gov/clinvar. gnomAD (v.4.1) variants are available at https://gnomad.broadinstitute.org. MLOF variants have been previously deposited into the EAHAD FIX clinical database (https://dbs.eahad.org/FIX) and the CDC CHBMP database (https://www.cdc.gov/hemophilia/mutation-project/index.html) and published (refs. 11,12). A complete set of MLOF variants used in this study, along with relevant information about these variants, is provided in Supplementary Table 4. F9 variant scores are available in Supplementary Table 12 and at MaveDB (https://www.mavedb.org; urn:mavedb:00001200). Raw sequencing, barcode–variant maps and scores are available in the NCBI Gene Expression Omnibus repository (GSE242805). All other data files are provided at https://github.com/FowlerLab/2024_multistep. All other data supporting the findings of this study are available from the corresponding author on reasonable request. Source data are provided with this paper.
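The MaveDB identifiers listed above share a common shape. As a minimal sketch, assuming the usual MaveDB convention in which a URN names an experiment set (eight digits), optionally followed by an experiment letter and a score set number, the accessions can be split into their parts as follows. The `MaveUrn` type and `parse_urn` helper are hypothetical names introduced for illustration and are not part of any MaveDB client; the pattern covers only the URN shapes that appear in this statement.

```python
import re
from typing import NamedTuple, Optional


class MaveUrn(NamedTuple):
    experiment_set: str          # eight-digit identifier
    experiment: Optional[str]    # lowercase letter(s), if present
    score_set: Optional[int]     # trailing number, if present


# Assumed URN layout: urn:mavedb:<8 digits>[-<letters>[-<number>]]
_URN_PATTERN = re.compile(r"^urn:mavedb:(\d{8})(?:-([a-z]+)(?:-(\d+))?)?$")


def parse_urn(urn: str) -> MaveUrn:
    """Split a MaveDB URN into experiment set, experiment and score set."""
    match = _URN_PATTERN.match(urn)
    if match is None:
        raise ValueError(f"not a recognised MaveDB URN: {urn!r}")
    experiment_set, experiment, score_set = match.groups()
    return MaveUrn(
        experiment_set,
        experiment,
        int(score_set) if score_set is not None else None,
    )


# Accessions copied from the data availability statement.
VAMP_SEQ_URNS = {
    "PTEN": "urn:mavedb:00000013-a-1",
    "TPMT": "urn:mavedb:00000013-b-1",
    "VKOR": "urn:mavedb:00000078-b-1",
    "CYP2C9": "urn:mavedb:00000095-b-1",
    "NUDT15": "urn:mavedb:00000055-a-1",
}
F9_URN = "urn:mavedb:00001200"

parsed = {gene: parse_urn(urn) for gene, urn in VAMP_SEQ_URNS.items()}
for gene, urn in parsed.items():
    print(gene, urn.experiment_set, urn.experiment, urn.score_set)
print("F9", parse_urn(F9_URN))
```

Parsed this way, the PTEN and TPMT score sets fall under the same experiment set, while the F9 accession names an experiment set without an experiment or score set suffix.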

Code availability

All code to reproduce the analyses and figures is available on GitHub at https://github.com/FowlerLab/2024_multistep. Versions of R packages used for analyses are described in the code file on GitHub.

References

  1. Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).

  2. Richards, S. et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet. Med. 17, 405–423 (2015).

  3. Fayer, S. et al. Closing the gap: systematic integration of multiplexed functional data resolves variants of uncertain significance in BRCA1, TP53, and PTEN. Am. J. Hum. Genet. 108, 2248–2258 (2021).

  4. Tabet, D., Parikh, V., Mali, P., Roth, F. P. & Claussnitzer, M. Scalable functional assays for the interpretation of human genetic variation. Annu. Rev. Genet. 56, 441–465 (2022).

  5. Uhlén, M. et al. The human secretome. Sci. Signal. 12, eaaz0274 (2019).

  6. Freedman, S. J., Furie, B. C., Furie, B. & Baleja, J. D. Structure of the metal-free γ-carboxyglutamic acid-rich membrane binding region of factor IX by two-dimensional NMR spectroscopy. J. Biol. Chem. 270, 7980–7987 (1995).

  7. Freedman, S. J. et al. Identification of the phospholipid binding site in the vitamin K-dependent blood coagulation protein factor IX. J. Biol. Chem. 271, 16227–16236 (1996).

  8. Shikamoto, Y., Morita, T., Fujimoto, Z. & Mizuno, H. Crystal structure of Mg2+- and Ca2+-bound Gla domain of factor IX complexed with binding protein. J. Biol. Chem. 278, 24090–24094 (2003).

  9. Huang, M., Furie, B. C. & Furie, B. Crystal structure of the calcium-stabilized human factor IX Gla domain bound to a conformation-specific anti-factor IX antibody. J. Biol. Chem. 279, 14338–14346 (2004).

  10. Zacchi, L. F. et al. Coagulation factor IX analysis in bioreactor cell culture supernatant predicts quality of the purified product. Commun. Biol. 4, 390 (2021).

  11. Rallapalli, P. M., Kemball-Cook, G., Tuddenham, E. G., Gomez, K. & Perkins, S. J. An interactive mutation database for human coagulation factor IX provides novel insights into the phenotypes and genetics of hemophilia B. J. Thromb. Haemost. 11, 1329–1340 (2013).

  12. Johnsen, J. M. et al. Results of genetic analysis of 11,341 participants enrolled in the My Life, Our Future hemophilia genotyping initiative in the United States. J. Thromb. Haemost. 20, 2022–2034 (2022).

  13. Konkle, B. A., Josephson, N. C. & Nakaya Fletcher, S. in GeneReviews (eds Pagon, R. A. et al.) (University of Washington, 2023).

  14. MASAC Document 273—Recommendations on Genotyping for Persons with Hemophilia (National Hemophilia Foundation, 2022); https://www.hemophilia.org/healthcare-professionals/guidelines-on-care/masac-documents/masac-document-273-recommendations-on-genotyping-for-persons-with-hemophilia

  15. Gao, W. et al. Characterization of missense mutations in the signal peptide and propeptide of FIX in hemophilia B by a cell-based assay. Blood Adv. 4, 3659–3667 (2020).

  16. Matreyek, K. A., Stephany, J. J., Chiasson, M. A., Hasle, N. & Fowler, D. M. An improved platform for functional assessment of large protein libraries in mammalian cells. Nucleic Acids Res. 48, e1 (2020).

  17. Savoldo, B. et al. CD28 costimulation improves expansion and persistence of chimeric antigen receptor-modified T cells in lymphoma patients. J. Clin. Invest. 121, 1822–1826 (2011).

  18. Esensten, J. H., Helou, Y. A., Chopra, G., Weiss, A. & Bluestone, J. A. CD28 costimulation: from mechanism to therapy. Immunity 44, 973–988 (2016).

  19. Liu, L. et al. Inclusion of Strep-tag II in design of antigen receptors for T-cell immunotherapy. Nat. Biotechnol. 34, 430–434 (2016).

  20. Bicocchi, M. P. et al. Insight into molecular changes of the FIX protein in a series of Italian patients with haemophilia B. Haemophilia 12, 263–270 (2006).

  21. Matreyek, K. A. et al. Multiplex assessment of protein variant abundance by massively parallel sequencing. Nat. Genet. 50, 874–882 (2018).

  22. Suiter, C. C. et al. Massively parallel variant characterization identifies NUDT15 alleles associated with thiopurine toxicity. Proc. Natl Acad. Sci. USA 117, 5394–5401 (2020).

  23. Amorosi, C. J. et al. Massively parallel characterization of CYP2C9 variant enzyme activity and abundance. Am. J. Hum. Genet. 108, 1735–1751 (2021).

  24. Kurys, G., Tagaya, Y., Bamford, R., Hanover, J. A. & Waldmann, T. A. The long signal peptide isoform and its alternative processing direct the intracellular trafficking of interleukin-15. J. Biol. Chem. 275, 30653–30659 (2000).

  25. Owji, H., Nezafat, N., Negahdaripour, M., Hajiebrahimi, A. & Ghasemi, Y. A comprehensive review of signal peptides: structure, roles, and applications. Eur. J. Cell Biol. 97, 422–441 (2018).

  26. Tikhonova, E. B., Karamysheva, Z. N., von Heijne, G. & Karamyshev, A. L. Silencing of aberrant secretory protein expression by disease-associated mutations. J. Mol. Biol. 431, 2567–2580 (2019).

  27. Liaci, A. M. et al. Structure of the human signal peptidase complex reveals the determinants for signal peptide cleavage. Mol. Cell 81, 3934–3948.e11 (2021).

  28. Teufel, F. et al. SignalP 6.0 predicts all five types of signal peptides using protein language models. Nat. Biotechnol. 40, 1023–1025 (2022).

  29. Gutierrez Guarnizo, S. A. et al. Pathogenic signal peptide variants in the human genome. NAR Genom. Bioinform. 5, lqad093 (2023).

  30. Braakman, I. & Hebert, D. N. Protein folding in the endoplasmic reticulum. Cold Spring Harb. Perspect. Biol. 5, a013201 (2013).

  31. Zhang, H. et al. Unpaired extracellular cysteine mutations of CSF3R mediate gain or loss of function. Cancer Res. 77, 4258–4267 (2017).

  32. Woodard, D. R. et al. A loss-of-function cysteine mutant in fibulin-3 (EFEMP1) forms aberrant extracellular disulfide-linked homodimers and alters extracellular matrix composition. Hum. Mutat. 43, 1945–1955 (2022).

  33. Chiasson, M. A. et al. Multiplexed measurement of variant abundance and activity reveals VKOR topology, active site and human variant impact. eLife 9, e58026 (2020).

  34. Yariv, B. et al. Using evolutionary data to make sense of macromolecules with a ‘face-lifted’ ConSurf. Protein Sci. 32, e4582 (2023).

  35. Huang, M. et al. Structural basis of membrane binding by Gla domains of vitamin K-dependent proteins. Nat. Struct. Biol. 10, 751–756 (2003).

  36. Grant, M. A., Baikeev, R. F., Gilbert, G. E. & Rigby, A. C. Lysine 5 and phenylalanine 9 of the factor IX omega-loop interact with phosphatidylserine in a membrane-mimetic environment. Biochemistry 43, 15367–15378 (2004).

  37. Feuerstein, G. Z. et al. Antithrombotic efficacy of a novel murine antihuman factor IX antibody in rats. Arterioscler. Thromb. Vasc. Biol. 19, 2554–2562 (1999).

  38. Aktimur, A., Gabriel, M. A., Gailani, D. & Toomey, J. R. The factor IX γ-carboxyglutamic acid (Gla) domain is involved in interactions between factor IX and factor XIa. J. Biol. Chem. 278, 7981–7987 (2003).

  39. Brown, M. A., Stenberg, L. M., Persson, U. & Stenflo, J. Identification and purification of vitamin K-dependent proteins and peptides with monoclonal antibodies specific for γ-carboxyglutamyl (Gla) residues. J. Biol. Chem. 275, 19795–19802 (2000).

  40. Whitlon, D. S., Sadowski, J. A. & Suttie, J. W. Mechanism of coumarin action: significance of vitamin K epoxide reductase inhibition. Biochemistry 17, 1371–1377 (1978).

  41. Rabiet, M. J., Jorgensen, M. J., Furie, B. & Furie, B. C. Effect of propeptide mutations on post-translational processing of factor IX. Evidence that β-hydroxylation and γ-carboxylation are independent events. J. Biol. Chem. 262, 14895–14898 (1987).

  42. Furie, B. & Furie, B. C. Molecular basis of vitamin K-dependent γ-carboxylation. Blood 75, 1753–1762 (1990).

  43. Gillis, S. et al. γ-Carboxyglutamic acids 36 and 40 do not contribute to human factor IX function. Protein Sci. 6, 185–196 (1997).

  44. Stenina, O., Pudota, B. N., McNally, B. A., Hommema, E. L. & Berkner, K. L. Tethered processivity of the vitamin K-dependent carboxylase: factor IX is efficiently modified in a mechanism which distinguishes Gla’s from Glu’s and which accounts for comprehensive carboxylation in vivo. Biochemistry 40, 10301–10309 (2001).

  45. Bristol, J. A., Freedman, S. J., Furie, B. C. & Furie, B. Profactor IX: the propeptide inhibits binding to membrane surfaces and activation by factor XIA. Biochemistry 33, 14136–14143 (1994).

  46. Wolberg, A. S. et al. Characterization of γ-carboxyglutamic acid residue 21 of human factor IX. Biochemistry 35, 10321–10327 (1996).

  47. Wojcik, E. G., Van Den Berg, M., Poort, S. R. & Bertina, R. M. Modification of the N-terminus of human factor IX by defective propeptide cleavage or acetylation results in a destabilized calcium-induced conformation: effects on phospholipid binding and activation by factor XIa. Biochem. J. 323, 629–636 (1997).

  48. Ware, J. et al. Factor IX San Dimas. Substitution of glutamine for Arg-4 in the propeptide leads to incomplete γ-carboxylation and altered phospholipid binding properties. J. Biol. Chem. 264, 11401–11406 (1989).

  49. Liebman, H. A. The metal-dependent conformational changes in factor IX associated with phospholipid binding. Studies using antibodies against a synthetic peptide and chemical modification of factor IX. Eur. J. Biochem. 212, 339–345 (1993).

  50. Jacobs, M., Freedman, S. J., Furie, B. C. & Furie, B. Membrane binding properties of the factor IX gamma-carboxyglutamic acid-rich domain prepared by chemical synthesis. J. Biol. Chem. 269, 25494–25501 (1994).

  51. Agah, S. & Bajaj, S. P. Role of magnesium in factor XIa catalyzed activation of factor IX: calcium binding to factor IX under physiologic magnesium. J. Thromb. Haemost. 7, 1426–1428 (2009).

  52. Westmark, P. R., Tanratana, P. & Sheehan, J. P. Selective disruption of heparin and antithrombin-mediated regulation of human factor IX. J. Thromb. Haemost. 13, 1053–1063 (2015).

  53. Plautz, W. E. et al. Anticoagulant protein S targets the factor IXa heparin-binding exosite to prevent thrombosis. Arterioscler. Thromb. Vasc. Biol. 38, 816–828 (2018).

  54. Cooley, B. et al. Dysfunctional endogenous FIX impairs prophylaxis in a mouse hemophilia B model. Blood 133, 2445–2451 (2019).

  55. Landrum, M. J. et al. ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res. 42, D980–D985 (2014).

  56. Iorio, A. et al. Establishing the prevalence and prevalence at birth of hemophilia in males: a meta-analytic approach using national registries. Ann. Intern. Med. 171, 540–546 (2019).

  57. Ioannidis, N. M. et al. REVEL: an ensemble method for predicting the pathogenicity of rare missense variants. Am. J. Hum. Genet. 99, 877–885 (2016).

  58. Frazer, J. et al. Disease variant prediction with deep generative models of evolutionary data. Nature 599, 91–95 (2021).

  59. Rentzsch, P., Schubach, M., Shendure, J. & Kircher, M. CADD-splice—improving genome-wide variant effect prediction using deep learning-derived splice scores. Genome Med. 13, 31 (2021).

  60. Cheng, J. et al. Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science 381, eadg7492 (2023).

  61. Livesey, B. J. & Marsh, J. A. Updated benchmarking of variant effect predictors using deep mutational scanning. Mol. Syst. Biol. 19, e11474 (2023).

  62. Tavtigian, S. V. et al. Modeling the ACMG/AMP variant classification guidelines as a Bayesian classification framework. Genet. Med. 20, 1054–1060 (2018).

  63. Brnich, S. E. et al. Recommendations for application of the functional evidence PS3/BS3 criterion using the ACMG/AMP sequence variant interpretation framework. Genome Med. 12, 3 (2020).

  64. Fokkema, I. F. A. C. et al. LOVD v.2.0: the next generation in gene variant databases. Hum. Mutat. 32, 557–563 (2011).

  65. Matamala, N. et al. Characterization of novel missense variants of SERPINA1 gene causing alpha-1 antitrypsin deficiency. Am. J. Respir. Cell Mol. Biol. 58, 706–716 (2018).

  66. McVey, J. H. et al. The European Association for Haemophilia and Allied Disorders (EAHAD) Coagulation Factor Variant Databases: important resources for haemostasis clinicians and researchers. Haemophilia 26, 306–313 (2020).

  67. Seixas, S. & Marques, P. I. Known mutations as the cause of alpha-1 antitrypsin deficiency: an updated overview of SERPINA1 variation spectrum. Appl. Clin. Genet. 14, 173–194 (2021).

  68. Gai, S. A. & Wittrup, K. D. Yeast surface display for protein engineering and characterization. Curr. Opin. Struct. Biol. 17, 467–473 (2007).

  69. Salema, V. & Fernández, L. Á. Escherichia coli surface display for the selection of nanobodies. Microb. Biotechnol. 10, 1468–1484 (2017).

  70. Ho, M. & Pastan, I. in Therapeutic Antibodies: Methods and Protocols (ed. Dimitrov, A. S.) 337–352 (Humana Press, 2009).

  71. Frank, F. et al. Deep mutational scanning identifies SARS-CoV-2 nucleocapsid escape mutations of currently available rapid antigen tests. Cell 185, 3603–3616.e13 (2022).

  72. Parthiban, K. et al. A comprehensive search of functional sequence space using large mammalian display libraries created by gene editing. mAbs 11, 884–898 (2019).

  73. Vink, T., Oudshoorn-Dickmann, M., Roza, M., Reitsma, J.-J. & de Jong, R. N. A simple, robust and highly efficient transient expression system for producing antibodies. Methods 65, 5–10 (2014).

  74. do Amaral, R. L. F. et al. Approaches for recombinant human factor IX production in serum-free suspension cultures. Biotechnol. Lett. 38, 385–394 (2016).

  75. Duportet, X. et al. A platform for rapid prototyping of synthetic gene networks in mammalian cells. Nucleic Acids Res. 42, 13440–13451 (2014).

  76. Zhu, F. et al. DICE, an efficient system for iterative genomic editing in human pluripotent stem cells. Nucleic Acids Res. 42, e34 (2014).

  77. Matreyek, K. A., Stephany, J. J. & Fowler, D. M. A platform for functional assessment of large variant libraries in mammalian cells. Nucleic Acids Res. 45, e102 (2017).

  78. Starita, L. M. et al. A multiplex homology-directed DNA repair assay reveals the impact of more than 1,000 BRCA1 missense substitution variants on protein function. Am. J. Hum. Genet. 103, 498–508 (2018).

  79. Hasle, N. et al. High-throughput, microscope-based sorting to dissect cellular heterogeneity. Mol. Syst. Biol. 16, e9442 (2020).

  80. Low, B. E., Hosur, V., Lesbirel, S. & Wiles, M. V. Efficient targeted transgenesis of large donor DNA into multiple mouse genetic backgrounds using bacteriophage Bxb1 integrase. Sci. Rep. 12, 5424 (2022).

  81. Durrant, M. G. et al. Systematic discovery of recombinases for efficient integration of large DNA sequences into the human genome. Nat. Biotechnol. 41, 488–499 (2023).

  82. Zhang, M. et al. SHIELD: a platform for high-throughput screening of barrier-type DNA elements in human cells. Nat. Commun. 14, 5616 (2023).

  83. Aslanzadeh, V. et al. Deep mutational scanning of the human insulin receptor ectodomain to inform precision therapy for insulin resistance. Preprint at bioRxiv https://doi.org/10.1101/2024.09.07.611782 (2024).

  84. Blanch-Asensio, A. et al. STRAIGHT-IN Dual: a platform for dual, single-copy integrations of DNA payloads and gene circuits into human induced pluripotent stem cells. Preprint at bioRxiv https://doi.org/10.1101/2024.10.17.616637 (2024).

  85. Boyle, G. E. et al. Deep mutational scanning of CYP2C19 in human cells reveals a substrate specificity-abundance tradeoff. Genetics 228, iyae156 (2024).

  86. Hew, B. E. et al. Directed evolution of hyperactive integrases for site specific insertion of transgenes. Nucleic Acids Res. 52, e64 (2024).

  87. Hong, C. K. Y. et al. Massively parallel characterization of insulator activity across the genome. Nat. Commun. 15, 8350 (2024).

  88. Huhtinen, O., Prince, S., Lamminmäki, U., Salbo, R. & Kulmala, A. Increased stable integration efficiency in CHO cells through enhanced nuclear localization of Bxb1 serine integrase. BMC Biotechnol. 24, 44 (2024).

  89. Kent, J. D., Klug, L. R. & Heinrich, M. C. A novel human SDHA-knockout cell line model for the functional analysis of clinically relevant SDHA variants. Clin. Cancer Res. 30, 5399–5412 (2024).

  90. Kim, J., Muller, R. Y., Bondra, E. R. & Ingolia, N. T. CRISPRi with barcoded expression reporters dissects regulatory networks in human cells. Preprint at bioRxiv https://doi.org/10.1101/2024.09.06.611573 (2024).

  91. Pandey, S. et al. Efficient site-specific integration of large genes in mammalian cells via continuously evolved recombinases and prime editing. Nat. Biomed. Eng. 9, 22–39 (2025).

  92. Wang, Z., Sarkar, A. & Ge, X. De novo functional discovery of peptide-MHC restricted CARs from recombinase-constructed large-diversity monoclonal T cell libraries. Preprint at bioRxiv https://doi.org/10.1101/2024.11.27.625413 (2024).

  93. Acharya, P., Quinlan, A. & Neumeister, V. The ABCs of finding a good antibody: how to find a good antibody, validate it, and publish meaningful data. F1000Res. 6, 851 (2017).

  94. Baron, M. et al. The three-dimensional structure of the first EGF-like module of human factor IX: comparison with EGF and TGF-α. Protein Sci. 1, 81–90 (1992).

  95. Johnson, D. J. D., Langdown, J. & Huntington, J. A. Molecular basis of factor IXa recognition by heparin-activated antithrombin revealed by a 1.7-Å structure of the ternary complex. Proc. Natl Acad. Sci. USA 107, 645–650 (2010).

  96. UniProt Consortium. UniProt: the universal protein knowledgebase in 2025. Nucleic Acids Res. 53, D609–D617 (2025).

  97. Gibson, D. G. et al. Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat. Methods 6, 343–345 (2009).

  98. García-Nafría, J., Watson, J. F. & Greger, I. H. IVA cloning: a single-tube universal cloning system exploiting bacterial in vivo assembly. Sci. Rep. 6, 27459 (2016).

  99. den Dunnen, J. T. et al. HGVS recommendations for the description of sequence variants: 2016 update. Hum. Mutat. 37, 564–569 (2016).

  100. Miao, H. Z. et al. Bioengineering of coagulation factor VIII for improved secretion. Blood 103, 3412–3419 (2004).

  101. Kessler, C. M. et al. B-domain deleted recombinant factor VIII preparations are bioequivalent to a monoclonal antibody purified plasma-derived factor VIII concentrate: a randomized, three-way crossover study. Haemophilia 11, 84–91 (2005).

  102. Ward, N. J. et al. Codon optimization of human factor VIII cDNAs leads to high-level expression. Blood 117, 798–807 (2011).

  103. Zhang, J., Kobert, K., Flouri, T. & Stamatakis, A. PEAR: a fast and accurate Illumina Paired-End reAd mergeR. Bioinformatics 30, 614–620 (2014).

  104. Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at https://arxiv.org/abs/1303.3997 (2013).

  105. Yeh, C.-L. C., Amorosi, C. J., Showman, S. & Dunham, M. J. PacRAT: a program to improve barcode–variant mapping from PacBio long reads using multiple sequence alignment. Bioinformatics 38, 2927–2929 (2022).

  106. Kulkarni, R. et al. Sites of initial bleeding episodes, mode of delivery and age of diagnosis in babies with haemophilia diagnosed before the age of 2 years: a report from the Centers for Disease Control and Prevention’s (CDC) Universal Data Collection (UDC) project. Haemophilia 15, 1281–1290 (2009).

  107. Majithia, A. R. et al. Prospective functional classification of all possible missense variants in PPARG. Nat. Genet. 48, 1570–1575 (2016).

  108. Menardi, G. & Torelli, N. Training and assessing classification rules with imbalanced data. Data Min. Knowl. Discov. 28, 92–122 (2014).

  109. Esposito, D. et al. MaveDB: an open-source platform to distribute and interpret data from multiplexed assays of variant effect. Genome Biol. 20, 223 (2019).

  110. Edgar, R. C. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5, 113 (2004).

Acknowledgements

We thank A. P. Leith, C. Lee, D. E. Prunkard and A. Silvestroni of the University of Washington (UW) Foege Flow Lab and the UW Pathology Flow Cytometry Core for their assistance with cell analysis, staining and sorting; K. M. Munson of the UW PacBio Sequencing Service for assistance with long-read PacBio sequencing; D. A. Nickerson, L. M. Starita, D. J. Maly, S. Nariya, J. J. Stephany and A. E. McEwen for advice on analyzing data and feedback on the paper. We thank S. W. Pipe and A. Scheller of the University of Michigan Department of Pediatrics and Department of Hematology for providing FVIII constructs and advice on FVIII expression. We thank J. Kulman for discussions about FIX carboxylation. We thank R. Kruse-Jarres for her commitment to supporting research that improves the lives of people living with bleeding disorders. We thank and acknowledge B. A. Konkle, the Principal Investigator of MLOF, the MLOF partners at Bloodworks, the American Thrombosis and Hemostasis Network, the National Hemophilia Foundation (now the National Bleeding Disorders Foundation), funding from Biogen/Bioverativ, providers and staff at HTC sites, and the 11,341 participants who made MLOF a success. This work was supported by the National Heart, Lung and Blood Institute (R01HL152066 to J.M.J. and D.M.F.; F30HL151075 to N.A.P.; R01HL149855 to J.P.S.), the National Human Genome Research Institute (RM1HG010461 and UM1HG011969 to D.M.F.), the National Institute of General Medical Sciences (R01GM109110 to D.M.F.), and the Washington Center for Bleeding Disorders (to J.M.J.). The funders of this work had no role in the study design, data collection, analysis, decision to publish or preparation of this manuscript.

Author information

Contributions

N.A.P., K.W.L., J.P.S., J.M.J. and D.M.F. conceived of the work. N.A.P., J.P.S., J.M.J. and D.M.F. wrote the paper. N.A.P., R.L.P., M.K.W., B.D.Z., K.J.H., K.M.S., X.W., K.W.L. and A.T.C. performed experiments. N.A.P. performed the statistical and computational analysis. N.A.P., S.N.F. and J.M.J. manually curated ClinVar, gnomAD and MLOF for variants. N.A.P. and A.F.R. prepared results for distribution. N.A.P., S.N.F., S.F. and J.M.J. performed variant reinterpretation. All authors approved the final paper.

Corresponding authors

Correspondence to Jill M. Johnsen or Douglas M. Fowler.

Ethics declarations

Competing interests

J.P.S. was an expert witness for Genentech and Paul, Weiss, Rifkind, Wharton and Garrison. The other authors declare no competing interests.

Peer review

Peer review information

Nature Structural & Molecular Biology thanks Giorgio Galli and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available. Primary Handling Editor: Dimitris Typas, in collaboration with the Nature Structural & Molecular Biology team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 MultiSTEP is based on a flexible genomically integrated approach for expressing secreted protein variants.

a, Cartoon depicting integration of a MultiSTEP plasmid construct into a genomically integrated landing pad cassette16. (Top): Lentivirally integrated landing pad cassette expressing mTagBFP2+ (royal blue) from a tetON inducible promoter. mTagBFP2 is fused to an inducible caspase-9 (iCasp9, orange) and a blasticidin resistance gene (dark yellow) with 2A sequences (dark pink). mTagBFP2-2A-iCasp9-2A-BlastR is expressed from a tetON inducible promoter with an attP serine recombinase recognition site (black) in between. Downstream is a terminator sequence (Term, brown) and tet repressor (tetR, salmon). Bxb1 serine recombinase, expressed from another plasmid, is shown in grey. (Middle): MultiSTEP plasmid construct. Secreted protein coding sequence (Sec Pro, pink) is C-terminally fused to flexible linkers (L1 and L2, teal), Strep II tag (St, green), and CD28 transmembrane domain (TM, medium blue). IRES (purple) drives co-transcription of mCherry (red). Upstream is an attB serine recombinase recognition sequence (goldenrod) and a unique 18 nucleotide degenerate barcode (BC, light yellow). (Bottom): Landing pad following plasmid integration. attP and attB sequences have been recombined, forming attL and attR sequences. b, Sequential flow cytometry gating scheme for detecting and isolating landing pad cells with an integrated MultiSTEP construct. Dot pseudocolor indicates density of cells. FSC: Forward scatter; SSC: side scatter. c, Comparison of negative control 293-F cells (top) with 293-F cells incubated with lentivirus encoding the landing pad cassette (bottom, n > 10,000 cells). d, Comparison of unrecombined landing pad cells (top) with cells transfected with a MultiSTEP plasmid encoding WT FIX (bottom, n > 10,000 cells). e, Comparison of cells transfected with a MultiSTEP construct encoding WT FIX treated with doxycycline (top) or doxycycline and 10 nM AP1903 (bottom, n > 10,000 cells). f, Design iterations of MultiSTEP construct plasmid in (a). L1-Strep MultiSTEP construct does not contain an L2 linker. Flow cytometry of MultiSTEP constructs using an anti-Strep II tag antibody (n ~ 30,000 cells).

Extended Data Fig. 2 A flexible tag-based approach to assessing variant effects on secretion.

a, Heatmap showing strep tag secretion scores for missense FIX variants. Color indicates MultiSTEP score from 0 (blue, lowest 5% of scores) to white (1, WT) to red (increased). Black dots indicate the WT amino acid. Missing data are gray. b, Density distributions of strep tag secretion scores for FIX missense variants (orange) and synonymous variants (blue). Dashed line denotes the 5th percentile of the synonymous variant distribution. c, Scatter plot comparing MultiSTEP-derived strep tag secretion scores for seven different FIX variants (p.C28Y, p.A37T, p.G58E, p.E67K, p.C134R, p.S220T, and p.H267L), WT, and an unrecombined negative control to the geometric mean of Alexa Fluor-647 fluorescence measured using flow cytometry individually (n = 3 replicates). Error bars show standard error of the mean. d-e, Scatter plots of median MultiSTEP-derived strep tag secretion scores and heavy chain (d) or light chain (e) secretion scores at each position in FIX (n = 3 replicates). Points are colored by chain architecture, using the same color scheme as in Fig. 2a. Black dashed line indicates the line of perfect correlation between secretion scores. Gray background indicates <0.3 point deviation from perfect correlation. f, Density plots of MultiSTEP-derived synonymous variant scores generated with the indicated antibody. The dashed vertical line shows WT score.

Extended Data Fig. 3 MultiSTEP-derived FIX secretion scores correlate with orthologous measures of FIX secretion.

a, Flow cytometry of p.C28Y and WT controls (n = 10,000 cells) with the FIX library (n = 100,000 cells). b, Comparison of ELISA measurements of eight untethered FIX missense variants (p.C28Y, p.A37T, p.G58E, p.E67K, p.G125V, p.C134R, p.S220T, and p.H267L) expressed from 293-F cells and heavy chain secretion scores (n = 3 replicates). Error bars show the standard error of the mean. Pearson’s correlation coefficient is shown. c, Scatter plot comparing MultiSTEP-derived heavy chain secretion scores for 20 different FIX missense variants, WT, and unrecombined negative control (n = 3 replicates) to the geometric mean of Alexa Fluor-647 fluorescence measured using flow cytometry on cells expressing each variant individually. Error bars show standard error of the mean (n = 10,000 cells). Line of best fit (dashed) and Pearson’s correlation coefficient are shown.

Extended Data Fig. 4 Variants near antibody epitopes demonstrate minor effects on secretion scores.

a, Scatter plot of the difference in heavy chain and light chain secretion scores and the distance in angstroms between all α-carbons in the light chain and the nearest light chain epitope α-carbon using the AlphaFold2 model of mature, two-chain FIX. Low-confidence positions with predicted local distance difference test score (pLDDT) of <70 were removed from analysis. Color indicates whether a position was identified in the light chain epitope in Fig. 2h. Horizontal dashed line indicates no difference in secretion scores. Vertical dashed line indicates boundary of likely epitope-adjacent effects on secretion scores by changepoint analysis (9.15 angstroms). b, Scatter plot of the difference in heavy chain and light chain secretion scores and the distance in angstroms between all α-carbons in the heavy chain and the nearest heavy chain epitope α-carbon using the AlphaFold2 model of mature, two-chain FIX. Low-confidence positions with pLDDT of <70 were removed from analysis. Color indicates whether a position was identified in the heavy chain epitope in Fig. 2h. Horizontal dashed line indicates no difference in secretion scores. Vertical dashed line indicates boundary of likely epitope-adjacent effects on secretion scores by changepoint analysis (5.71 angstroms). c, Scatter plot of median MultiSTEP-derived heavy chain and light chain secretion scores at each position in FIX. Points are colored by epitope (Fig. 2h) or epitope-adjacent position as in (a) and (b). Black dashed line indicates the line of perfect correlation between secretion scores. Gray background indicates <0.3 point deviation from perfect correlation.

Extended Data Fig. 5 Effect of missense FIX variation on secretion compared to missense variant effects on abundance in cytosolic or transmembrane proteins.

a, Box plots of the 25th, 50th, and 75th percentiles of secretion (FIX, MultiSTEP) or abundance (all others, VAMP-seq) scores for all nonsynonymous variants across all positions with the indicated WT amino acid for six different proteins21,22,23,33 (n = 29,287 variants). Whiskers span the range of data. b, Box plots of the 25th, 50th, and 75th percentiles of secretion (FIX, MultiSTEP) or abundance (all others, VAMP-seq) for all nonsynonymous variant amino acid substitutions across all positions for six different proteins (n = 29,287 variants).

Extended Data Fig. 6 Sequence conservation strongly influences the effect of variation on FIX secretion.

a, Comparison of light chain secretion scores with Consurf conservation grades (1: least conserved, 9: most conserved)34. Violin plot shows distribution of points (n = 8,528 variants) with an inset box plot representing the 25th, 50th, and 75th percentiles. Whiskers span the range of data. Dashed horizontal line is the 5th percentile of the synonymous secretion score distribution. b, Comparison of median light chain secretion scores (n = 8,528 variants) with Consurf conservation grades. Violin plot shows distribution of points with an inset box plot representing the 25th, 50th, and 75th percentiles. Whiskers span the range of data. Dashed horizontal line is the 5th percentile of the synonymous secretion score distribution.

Extended Data Fig. 7 Carboxylation-sensitive antibodies identify functional motifs.

a, Multiple sequence alignment of Gla domain-containing proteins (UniProt) that bind the carboxylation-sensitive Gla-motif (ExxxExC) antibody using MUSCLE96,110. Antibody epitopes for both the carboxylation-sensitive FIX-specific antibody (ω-loop) and the carboxylation-sensitive Gla-motif antibody are shown. hF9, human coagulation factor IX (P00740); hF2, human prothrombin (coagulation factor II, P00734); hF7, human coagulation factor VII (P08709); hF10, human coagulation factor X (P00742); hPC, human protein C (P04070); hPS, human protein S (P07225); hBGP, human osteocalcin (P02818); bBGP, bovine osteocalcin (P02820); hGAS6, human growth arrest-specific protein 6 (P14393); ppVPA, Pseudechis porphyriacus venom prothrombin activator porpharin-D (P58L93); nsVPA, Notechis scutatus venom prothrombin activator notecarin-D1 (P82807); osVPA, Oxyuranus scutellatus venom prothrombin activator oscutarin-C (P58L96). b-c, Fluorescence of unrecombined negative control and WT FIX-expressing cells with and without warfarin pretreatment generated by staining cells with a carboxylation-sensitive FIX-specific (b) or carboxylation-sensitive Gla-motif antibody (c). d-f, Heatmaps showing carboxylation-sensitive FIX-specific carboxylation scores (d), carboxylation-sensitive Gla-motif carboxylation scores (e), or light chain secretion scores (f) for FIX propeptide variants. Furin cleavage site (Furin CS), ω-loop, ExxxExC motif, and aromatic stack (AS) are annotated above (d). Heatmap color indicates antibody score from 0 (blue, lowest 5% of scores) to white (1, WT) to red (increased). Black dots indicate the WT amino acid. Missing data are gray. g-i, Heatmaps showing carboxylation-sensitive FIX-specific carboxylation scores (g), carboxylation-sensitive Gla-motif carboxylation scores (h), or light chain secretion scores (i) for FIX Gla domain variants. Furin cleavage site (Furin CS) is annotated above (d). ω-loop, ExxxExC motif, and aromatic stack (AS) are annotated above (g). Heatmap color indicates antibody score from 0 (blue, lowest 5% of scores) to white (1, WT) to red (increased). Black dots indicate the WT amino acid. Missing data are gray.

Extended Data Fig. 8 Clinical correlates of secretion and gamma-carboxylation scores map to FIX biochemical features.

a, Scatter plot of the mean and standard error of light chain secretion scores (n = 2 replicates) and FIX plasma antigen from individuals with hemophilia B in the EAHAD database (n = 416 variants). Light chain epitope-adjacent positions identified in Extended Data Fig. 4a are removed (n = 19 variants across 38 individuals)11. Dashed horizontal line is 40% FIX plasma antigen. Dashed vertical line is the 5th percentile of the synonymous secretion score distribution. b, Comparison of hemophilia B severity from individuals with hemophilia B in the EAHAD database (n = 1,781 variants) with light chain secretion scores. Light chain epitope-adjacent positions identified in Extended Data Fig. 4a are removed (n = 40 variants). Violin plot shows distribution of points with an inset box plot representing the 25th, 50th, and 75th percentiles. Whiskers span the range of data. Dashed horizontal line is the 5th percentile of the synonymous secretion score distribution. p values from a Kruskal–Wallis test adjusted for multiple comparisons by post-hoc Dunn’s test are shown. c, Scatter plot of the mean and standard error of light chain secretion scores (n = 2 replicates) and FIX plasma antigen from individuals harboring gain-of-cysteine variants in the EAHAD database (n = 9 variants across 27 individuals)11. Dashed horizontal line is 40% FIX plasma antigen. Dashed vertical line is the 5th percentile of the synonymous secretion score distribution. d, Bar plot of hemophilia B disease severity in the EAHAD database for individuals harboring gain-of-cysteine variants. e, Bar plot of the number of FIX variants in the EAHAD database and their classification using the random forest model trained on MultiSTEP functional data, by disease severity. Color indicates model prediction. f, Bar plot of the number of FIX propeptide and Gla domain variants in the EAHAD database and their classification using the random forest model trained on MultiSTEP functional data, by disease severity. 
Color indicates model prediction.

Extended Data Fig. 9 Random forest model predictions for FIX variants in the EAHAD FIX Variant Database associated with hemophilia B.

a, Spearman correlation of MultiSTEP functional scores with EVE, AlphaMissense, REVEL, and CADD variant effect predictors. b, Histograms of four variant effect predictor scores for F9 missense variants of known effect curated from ClinVar, gnomAD, and MLOF. Color indicates clinical variant interpretation. Data from four variant effect predictors are shown. Black dashed vertical lines indicate the thresholds for each predictor. For AlphaMissense, we used the thresholds recommended in the original publication for 90% precision on existing ClinVar annotated variants (≤0.34: benign, 0.34–0.564: uncertain, ≥0.564: pathogenic). For REVEL, we used the thresholds used in the initial publication to assess REVEL’s precision in ClinVar (<0.5: benign, 0.5: uncertain, >0.5: pathogenic). For EVE, we used the thresholds recommended in the original publication for the 75% most confident classifications (≤0.359: benign, 0.359–0.641: uncertain, ≥0.641: pathogenic). For CADD, we used the same thresholds used in the MLOF clinical laboratory (<10: benign, 10–20: uncertain, >20: pathogenic). Number of variants scored by each predictor is annotated. c, Classification accuracy for F9 missense variants of known effect curated from ClinVar, gnomAD, and MLOF in our test set (benign/likely benign, n = 4 variants; pathogenic/likely pathogenic, n = 34 variants) by MultiSTEP variant function classifier and the four variant effect predictors using thresholds defined in (b). True benign/likely benign and pathogenic/likely pathogenic labels are denoted on the x-axis, and columns are colored relative to the classification for each method. Solid colors indicate correct classification, whereas striped colors indicate incorrect classification. For variant effect predictors, missing variants are colored gray with stripes and uncertain predictions are colored yellow with stripes. PPV, positive predictive value; NPV, negative predictive value; Spec, specificity; Sens, sensitivity.
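
The predictor thresholds listed in (b) can be read as a simple three-way binning rule. The following is a minimal Python sketch of that rule, not the authors' analysis code; the names `THRESHOLDS` and `classify` are hypothetical, and the handling of scores that fall exactly on a boundary follows the inequality signs given in the caption.

```python
from typing import Optional

# Cutoffs copied from the caption: (benign cutoff, pathogenic cutoff, inclusive).
# inclusive=True  -> benign if score <= low, pathogenic if score >= high
# inclusive=False -> benign if score <  low, pathogenic if score >  high
# The dictionary layout itself is an assumption made for this illustration.
THRESHOLDS = {
    "AlphaMissense": (0.34, 0.564, True),
    "EVE": (0.359, 0.641, True),
    "REVEL": (0.5, 0.5, False),
    "CADD": (10.0, 20.0, False),
}


def classify(predictor: str, score: Optional[float]) -> str:
    """Bin a predictor score as benign, uncertain, pathogenic or missing."""
    if score is None:
        return "missing"  # variant not scored by this predictor
    low, high, inclusive = THRESHOLDS[predictor]
    if inclusive:
        if score <= low:
            return "benign"
        if score >= high:
            return "pathogenic"
    else:
        if score < low:
            return "benign"
        if score > high:
            return "pathogenic"
    return "uncertain"


if __name__ == "__main__":
    # Made-up scores, for illustration only.
    for name, value in [("AlphaMissense", 0.9), ("REVEL", 0.5), ("CADD", 8.0)]:
        print(name, value, classify(name, value))
```

Under this reading, REVEL has an uncertain bin only at exactly 0.5, whereas CADD treats the whole closed interval from 10 to 20 as uncertain.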

Extended Data Fig. 10 Detection of cell-surface displayed FVIII.

Experimental flow cytometry of B-domain deleted coagulation factor VIII (FVIII) in the MultiSTEP backbone (n = ~30,000 cells per variant). Unrecombined cells (NC) do not display FVIII and serve as a negative control. Fluorescent signal was generated by staining cells with anti-FVIII antibodies specific to each of the five FVIII domains in the heavy chain [A1 (a) and A2 (b)] and light chain [A3 (c), C1 (d), and C2 (e)].

Supplementary information

Supplementary Information

Supplementary Figs. 1 and 2.

Reporting Summary

Peer Review File

Supplementary Table 1

Library and assay statistics. Number of FIX variants assessed at various stages of library preparation and in each MultiSTEP assay.

Supplementary Table 2

Variants with discordant circulating antigen and secretion scores. Variants with discordant secretion scores and FIX plasma antigen from individuals with hemophilia B in the EAHAD database11. Variants with undetectable FIX antigen are labeled as <1%, as reported by the clinical laboratory in EAHAD. Variants are classified as low antigen if they have a mean circulating FIX antigen of <40%. Variants are classified as low secretion if they have a mean secretion score that is less than 0.795, which is the 5th percentile of the synonymous secretion score distribution. SE: standard error of the mean.
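
The two cutoffs in this table description (mean FIX antigen below 40% and mean secretion score below 0.795) define when a variant counts as discordant. A minimal Python sketch of that logic follows; the function names are hypothetical, and treating an antigen label such as "<1%" as its upper bound is an assumption made here for illustration rather than a procedure described in the source.

```python
LOW_ANTIGEN_PCT = 40.0       # mean circulating FIX antigen cutoff (%), from the text
LOW_SECRETION_SCORE = 0.795  # 5th percentile of synonymous secretion scores, from the text


def parse_antigen(label: str) -> float:
    """Convert an antigen label such as '35%' or '<1%' to a number.

    Assumption: a '<1%' label is treated as its upper bound (1.0),
    which is still far below the 40% cutoff.
    """
    return float(label.strip().lstrip("<").rstrip("%"))


def is_discordant(mean_antigen_pct: float, mean_secretion_score: float) -> bool:
    """True when exactly one of the two measures is classified as low."""
    low_antigen = mean_antigen_pct < LOW_ANTIGEN_PCT
    low_secretion = mean_secretion_score < LOW_SECRETION_SCORE
    return low_antigen != low_secretion


if __name__ == "__main__":
    # Made-up values, for illustration only.
    print(is_discordant(parse_antigen("<1%"), 1.02))  # low antigen, normal secretion
    print(is_discordant(parse_antigen("85%"), 0.95))  # both normal
```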

Supplementary Table 3

Random forest model classifications for 8,964 F9 variants. Mean secretion scores, γ-carboxylation scores and functional variant classifications made using the random forest classifier we trained on known pathogenic or likely pathogenic and benign or likely benign F9 missense variants (see Methods). Variants without functional scores for all antibodies were removed before classifier implementation and do not have associated functional predictions.

Supplementary Table 4

Classification criteria and reclassification results for My Life, Our Future F9 variants. Classification criteria, random forest classifier predictions and resultant pathogenicity classifications for 214 F9 variants from My Life, Our Future12. Classification criteria (columns BP1 through PVS1 as defined by the American College of Medical Genetics and Genomics2) were used in variant curation by clinical experts in hemophilia genetics based on available clinical data, databases and literature review. PVS1, null variant (nonsense, frameshift, canonical ±1 or 2 splice sites, initiation codon, single or multi-exon deletion) in a gene where loss of function is a known mechanism of disease. PS1, same amino acid change as a previously established pathogenic variant regardless of nucleotide change. PS2, de novo (both maternity and paternity confirmed) in a patient with the disease and no family history. PS3, well-established in vitro or in vivo functional studies supportive of a damaging effect on the gene or gene product. PS4, the prevalence of the variant in affected individuals is significantly increased compared with the prevalence in controls. PM1, located in a mutational hot spot and/or critical and well-established functional domain (for example, active site of an enzyme) without benign variation. PM2, absent from controls (or at extremely low frequency if recessive) in population databases. PM3, for recessive disorders, detected in trans with a pathogenic variant. PM4, protein length changes as a result of in-frame deletions and insertions in a non-repeat region or stop-loss variants. PM5, novel missense change at an amino acid residue where a different missense change determined to be pathogenic has been seen before. PM6, assumed de novo, but without confirmation of paternity and maternity. PP1, cosegregation with disease in multiple affected family members in a gene definitively known to cause the disease. 
PP2, missense variant in a gene that has a low rate of benign missense variation and in which missense variants are a common mechanism of disease. PP3, multiple lines of computational evidence support a deleterious effect on the gene or gene product (conservation, evolutionary, splicing impact and so on). PP4, patient’s phenotype or family history is highly specific for a disease with a single genetic etiology. PP5, reputable source recently reports variant as pathogenic, but the evidence is not available to the laboratory to perform an independent evaluation. BA1, allele frequency is >5% in population databases. BS1, allele frequency is greater than expected for disorder. BS2, observed in a healthy adult individual for a recessive (homozygous), dominant (heterozygous) or X-linked (hemizygous) disorder, with full penetrance expected at an early age. BS3, well-established in vitro or in vivo functional studies show no damaging effect on protein function or splicing. BS4, lack of segregation in affected members of a family. BP1, missense variant in a gene for which primarily truncating variants are known to cause disease. BP2, observed in trans with a pathogenic variant for a fully penetrant dominant gene/disorder or observed in cis with a pathogenic variant in any inheritance pattern. BP3, in-frame deletions and insertions in a repetitive region without a known function. BP4, multiple lines of computational evidence suggest no impact on gene or gene product (conservation, evolutionary, splicing impact and so on). BP5, variant found in a case with an alternate molecular basis for disease. BP6, reputable source recently reports variant as benign, but the evidence is not available to the laboratory to perform an independent evaluation. BP7, a synonymous (silent) variant for which splicing prediction algorithms predict no impact to the splice consensus sequence or the creation of a new splice site and the nucleotide is not highly conserved. 
Original interpretations as well as reinterpretations using random forest classifier predictions as either moderate or strong evidence are included.

Supplementary Table 5

Oligonucleotides. Oligonucleotides used in this study.

Supplementary Table 6

FIX variant nomenclature. HGVS, legacy and chymotrypsin numbering systems for each position in WT FIX. In chymotrypsin numbering, some position values in flexible loops that are not conserved repeat.

Supplementary Table 7

Detailed cloning information. Description of plasmids used in this study and the primers, oligonucleotides, cDNAs, and gene fragments used to clone them. Method used for cloning is labeled.

Supplementary Table 8

Detailed antibody information. Properties, stock concentrations, and experimental conditions for each assayed antibody. Epitopes, when known, are provided.

Supplementary Table 9

Replicate correlations. PCR replicate correlations for each genomic DNA-derived barcode library amplification.

Supplementary Table 10

Curated F9 variants from MLOF, gnomAD and ClinVar used for training variant function classifier. F9 variants of known effect were collected from MLOF, gnomAD, and ClinVar and independently reassessed. MLOF variants with normal FIX activity (FIX:C) are denoted as benign. Variants found in gnomAD 4.0 with a minor allele frequency (MAF) ≥ 0.001 were classified as benign. Only ClinVar variants with assertion criteria (stars > 0) were included.

Supplementary Table 11

Case data for FIX variants in EAHAD. Variant-level data for FIX variants curated from the EAHAD FIX variant database (accessed 10/9/2023).

Supplementary Table 12

MultiSTEP variant scores. Table of FIX secretion and γ-carboxylation scores for nearly all possible missense and synonymous FIX variants. Scores are derived from each of the five antibodies presented in this work. The number of unique variants scored for each antibody ranges between 8,961 and 8,964 (Strep II tag antibody, 8,961; heavy chain antibody, 8,963; light chain antibody, 8,964; carboxylation-sensitive FIX Gla antibody, 8,964; carboxylation-sensitive Gla-motif antibody, 8,964). There are a total of 44,816 measured secretion and γ-carboxylation variant effects.
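
As a quick arithmetic check, the per-antibody counts quoted above sum to the stated total of 44,816 variant effects. The short Python snippet below only reproduces that sum; the dictionary keys are shorthand labels chosen for this illustration.

```python
# Number of unique variants scored per antibody, as quoted in the table description.
variants_scored = {
    "Strep II tag": 8961,
    "heavy chain": 8963,
    "light chain": 8964,
    "carboxylation-sensitive FIX Gla": 8964,
    "carboxylation-sensitive Gla-motif": 8964,
}

total_effects = sum(variants_scored.values())
print(total_effects)  # 44816
```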

Supplementary Data 1

Statistical source data for Supplementary Figs. 1 and 2.

Source data

Source Data

Statistical source data.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Popp, N.A., Powell, R.L., Wheelock, M.K. et al. Multiplex and multimodal mapping of variant effects in secreted proteins via MultiSTEP. Nat Struct Mol Biol (2025). https://doi.org/10.1038/s41594-025-01582-w

  • DOI: https://doi.org/10.1038/s41594-025-01582-w
