Abstract
Despite widespread advances in DNA sequencing, the functional consequences of most genetic variants remain poorly understood. Multiplexed assays of variant effect can measure the function of variants at scale but cannot readily be applied to the ~10% of human genes encoding secreted proteins. Here we develop a flexible, scalable human cell surface display method, multiplexed surface tethering of extracellular proteins (MultiSTEP), to study the consequences of missense variation in coagulation factor IX (FIX), a serine protease in which genetic variation can cause hemophilia B. We combine MultiSTEP with a panel of antibodies to detect FIX secretion and post-translational modification (PTM), measuring 44,816 variant effects for 436 synonymous variants and 8,528 of the 8,759 possible F9 missense variants. Almost half of missense variants impact secretion, PTM or both. We also identify functional constraints on secretion within the signal peptide and for nearly all gain or loss of cysteine variants. Secretion scores correlate strongly with FIX levels in hemophilia B and reveal that loss-of-secretion variants are more often associated with severe disease. Integration of the secretion and PTM scores enables reclassification of 63.1% of F9 variants of uncertain significance in the My Life, Our Future hemophilia genotyping project. Lastly, we show that MultiSTEP can be applied to other secreted proteins, thus demonstrating that MultiSTEP is a multiplexed, multimodal and generalizable method for systematically assessing variant effects in secreted proteins at scale.
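The total of 8,759 possible missense variants can be checked by simple counting: each position can be changed to any of the 19 other standard amino acids. The sketch below is a minimal check, assuming a 461-residue FIX precursor. That length is not stated in the abstract, but it reproduces the reported total.

```python
# Back-of-the-envelope check of the library coverage quoted in the abstract.
# ASSUMPTION: the FIX precursor is 461 residues long; this length is not
# stated in the abstract but is consistent with the reported 8,759 total.
PRECURSOR_LENGTH = 461
ALTERNATIVE_AMINO_ACIDS = 19  # 20 standard amino acids minus the wild type

possible_missense = PRECURSOR_LENGTH * ALTERNATIVE_AMINO_ACIDS
measured_missense = 8_528  # missense variants scored, from the abstract

coverage = measured_missense / possible_missense
print(f"possible missense variants: {possible_missense}")
print(f"fraction of possible missense variants scored: {coverage:.1%}")
```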
Data availability
VAMP-seq abundance scores for PTEN (urn:mavedb:00000013-a-1), TPMT (urn:mavedb:00000013-b-1), VKOR (urn:mavedb:00000078-b-1), CYP2C9 (urn:mavedb:00000095-b-1) and NUDT15 (urn:mavedb:00000055-a-1) were downloaded from MaveDB109. Gla domain protein sequences for human FIX (P00740), human prothrombin (coagulation factor II; P00734), human coagulation factor VII (P08709), human coagulation factor X (P00742), human protein C (P04070), human protein S (P07225), human osteocalcin (bone Gla-protein; P02818), bovine osteocalcin (bone Gla-protein; P02820), human growth arrest-specific protein 6 (P14393), Pseudechis porphyriacus venom prothrombin activator porpharin-D (P58L93), Notechis scutatus venom prothrombin activator notecarin-D1 (P82807) and Oxyuranus scutellatus venom prothrombin activator oscutarin-C (P58L96) were downloaded from UniProt.org. ClinVar variants are publicly available at https://www.ncbi.nlm.nih.gov/clinvar. gnomAD (v.4.1) variants are available at https://gnomad.broadinstitute.org. MLOF variants have been previously deposited into the EAHAD FIX clinical database (https://dbs.eahad.org/FIX) and the CDC CHBMP database (https://www.cdc.gov/hemophilia/mutation-project/index.html) and published11,12. A complete set of MLOF variants used in this study, along with relevant information about these variants, is provided in Supplementary Table 4. F9 variant scores are available in Supplementary Table 12 and at MaveDB (https://www.mavedb.org; urn:mavedb:00001200). Raw sequencing, barcode–variant maps and scores are available in the NCBI Gene Expression Omnibus repository (GSE242805). All other data files are provided at https://github.com/FowlerLab/2024_multistep. All other data supporting the findings of this study are available from the corresponding author on reasonable request. Source data are provided with this paper.
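For readers who wish to retrieve the same Gla domain sequences, the following is a minimal sketch of how the listed accessions could be validated and turned into download links. It assumes the public UniProt REST endpoint for FASTA retrieval; the statement above does not say how the sequences were downloaded, and the function name and short labels used here are illustrative only.

```python
import re

# Accessions copied from the Data availability statement. The short labels
# are illustrative shorthand, not part of the statement itself.
GLA_DOMAIN_ACCESSIONS = {
    "hF9": "P00740", "hF2": "P00734", "hF7": "P08709", "hF10": "P00742",
    "hPC": "P04070", "hPS": "P07225", "hBGP": "P02818", "bBGP": "P02820",
    "hGAS6": "P14393", "ppVPA": "P58L93", "nsVPA": "P82807", "osVPA": "P58L96",
}

# Accession pattern as documented in the UniProt help pages.
UNIPROT_ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9]([A-Z][A-Z0-9]{2}[0-9]){1,2}"
)


def fasta_url(accession: str) -> str:
    """Return a UniProt REST URL for one entry in FASTA format.

    ASSUMPTION: the rest.uniprot.org endpoint is used; any other download
    route from UniProt would serve equally well.
    """
    if not UNIPROT_ACCESSION.fullmatch(accession):
        raise ValueError(f"not a valid UniProt accession: {accession!r}")
    return f"https://rest.uniprot.org/uniprotkb/{accession}.fasta"


for label, accession in GLA_DOMAIN_ACCESSIONS.items():
    print(label, fasta_url(accession))
```

The sketch only builds and prints URLs. Fetching them, for example with `urllib.request.urlopen`, is left to the reader so that the example runs without network access.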
Code availability
All code to reproduce the analyses and figures is available on GitHub at https://github.com/FowlerLab/2024_multistep. Versions of R packages used for analyses are described in the code file on GitHub.
References
Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).
Richards, S. et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet. Med. 17, 405–423 (2015).
Fayer, S. et al. Closing the gap: systematic integration of multiplexed functional data resolves variants of uncertain significance in BRCA1, TP53, and PTEN. Am. J. Hum. Genet. 108, 2248–2258 (2021).
Tabet, D., Parikh, V., Mali, P., Roth, F. P. & Claussnitzer, M. Scalable functional assays for the interpretation of human genetic variation. Annu. Rev. Genet. 56, 441–465 (2022).
Uhlén, M. et al. The human secretome. Sci. Signal. 12, eaaz0274 (2019).
Freedman, S. J., Furie, B. C., Furie, B. & Baleja, J. D. Structure of the metal-free γ-carboxyglutamic acid-rich membrane binding region of factor IX by two-dimensional NMR spectroscopy. J. Biol. Chem. 270, 7980–7987 (1995).
Freedman, S. J. et al. Identification of the phospholipid binding site in the vitamin K-dependent blood coagulation protein factor IX. J. Biol. Chem. 271, 16227–16236 (1996).
Shikamoto, Y., Morita, T., Fujimoto, Z. & Mizuno, H. Crystal structure of Mg2+- and Ca2+-bound Gla domain of factor IX complexed with binding protein. J. Biol. Chem. 278, 24090–24094 (2003).
Huang, M., Furie, B. C. & Furie, B. Crystal structure of the calcium-stabilized human factor IX Gla domain bound to a conformation-specific anti-factor IX antibody. J. Biol. Chem. 279, 14338–14346 (2004).
Zacchi, L. F. et al. Coagulation factor IX analysis in bioreactor cell culture supernatant predicts quality of the purified product. Commun. Biol. 4, 390 (2021).
Rallapalli, P. M., Kemball-Cook, G., Tuddenham, E. G., Gomez, K. & Perkins, S. J. An interactive mutation database for human coagulation factor IX provides novel insights into the phenotypes and genetics of hemophilia B. J. Thromb. Haemost. 11, 1329–1340 (2013).
Johnsen, J. M. et al. Results of genetic analysis of 11,341 participants enrolled in the My Life, Our Future hemophilia genotyping initiative in the United States. J. Thromb. Haemost. 20, 2022–2034 (2022).
Konkle, B. A., Josephson, N. C. & Nakaya Fletcher, S. in GeneReviews (eds Pagon, R. A. et al.) (University of Washington, 2023).
MASAC Document 273—Recommendations on Genotyping for Persons with Hemophilia (National Hemophilia Foundation, 2022); https://www.hemophilia.org/healthcare-professionals/guidelines-on-care/masac-documents/masac-document-273-recommendations-on-genotyping-for-persons-with-hemophilia
Gao, W. et al. Characterization of missense mutations in the signal peptide and propeptide of FIX in hemophilia B by a cell-based assay. Blood Adv. 4, 3659–3667 (2020).
Matreyek, K. A., Stephany, J. J., Chiasson, M. A., Hasle, N. & Fowler, D. M. An improved platform for functional assessment of large protein libraries in mammalian cells. Nucleic Acids Res. 48, e1 (2020).
Savoldo, B. et al. CD28 costimulation improves expansion and persistence of chimeric antigen receptor-modified T cells in lymphoma patients. J. Clin. Invest. 121, 1822–1826 (2011).
Esensten, J. H., Helou, Y. A., Chopra, G., Weiss, A. & Bluestone, J. A. CD28 costimulation: from mechanism to therapy. Immunity 44, 973–988 (2016).
Liu, L. et al. Inclusion of Strep-tag II in design of antigen receptors for T-cell immunotherapy. Nat. Biotechnol. 34, 430–434 (2016).
Bicocchi, M. P. et al. Insight into molecular changes of the FIX protein in a series of Italian patients with haemophilia B. Haemophilia 12, 263–270 (2006).
Matreyek, K. A. et al. Multiplex assessment of protein variant abundance by massively parallel sequencing. Nat. Genet. 50, 874–882 (2018).
Suiter, C. C. et al. Massively parallel variant characterization identifies NUDT15 alleles associated with thiopurine toxicity. Proc. Natl Acad. Sci. USA 117, 5394–5401 (2020).
Amorosi, C. J. et al. Massively parallel characterization of CYP2C9 variant enzyme activity and abundance. Am. J. Hum. Genet. 108, 1735–1751 (2021).
Kurys, G., Tagaya, Y., Bamford, R., Hanover, J. A. & Waldmann, T. A. The long signal peptide isoform and its alternative processing direct the intracellular trafficking of interleukin-15. J. Biol. Chem. 275, 30653–30659 (2000).
Owji, H., Nezafat, N., Negahdaripour, M., Hajiebrahimi, A. & Ghasemi, Y. A comprehensive review of signal peptides: structure, roles, and applications. Eur. J. Cell Biol. 97, 422–441 (2018).
Tikhonova, E. B., Karamysheva, Z. N., von Heijne, G. & Karamyshev, A. L. Silencing of aberrant secretory protein expression by disease-associated mutations. J. Mol. Biol. 431, 2567–2580 (2019).
Liaci, A. M. et al. Structure of the human signal peptidase complex reveals the determinants for signal peptide cleavage. Mol. Cell 81, 3934–3948.e11 (2021).
Teufel, F. et al. SignalP 6.0 predicts all five types of signal peptides using protein language models. Nat. Biotechnol. 40, 1023–1025 (2022).
Gutierrez Guarnizo, S. A. et al. Pathogenic signal peptide variants in the human genome. NAR Genom. Bioinform. 5, lqad093 (2023).
Braakman, I. & Hebert, D. N. Protein folding in the endoplasmic reticulum. Cold Spring Harb. Perspect. Biol. 5, a013201 (2013).
Zhang, H. et al. Unpaired extracellular cysteine mutations of CSF3R mediate gain or loss of function. Cancer Res. 77, 4258–4267 (2017).
Woodard, D. R. et al. A loss-of-function cysteine mutant in fibulin-3 (EFEMP1) forms aberrant extracellular disulfide-linked homodimers and alters extracellular matrix composition. Hum. Mutat. 43, 1945–1955 (2022).
Chiasson, M. A. et al. Multiplexed measurement of variant abundance and activity reveals VKOR topology, active site and human variant impact. eLife 9, e58026 (2020).
Yariv, B. et al. Using evolutionary data to make sense of macromolecules with a ‘face-lifted’ ConSurf. Protein Sci. 32, e4582 (2023).
Huang, M. et al. Structural basis of membrane binding by Gla domains of vitamin K-dependent proteins. Nat. Struct. Biol. 10, 751–756 (2003).
Grant, M. A., Baikeev, R. F., Gilbert, G. E. & Rigby, A. C. Lysine 5 and phenylalanine 9 of the factor IX omega-loop interact with phosphatidylserine in a membrane-mimetic environment. Biochemistry 43, 15367–15378 (2004).
Feuerstein, G. Z. et al. Antithrombotic efficacy of a novel murine antihuman factor IX antibody in rats. Arterioscler. Thromb. Vasc. Biol. 19, 2554–2562 (1999).
Aktimur, A., Gabriel, M. A., Gailani, D. & Toomey, J. R. The factor IX γ-carboxyglutamic acid (Gla) domain is involved in interactions between factor IX and factor XIa. J. Biol. Chem. 278, 7981–7987 (2003).
Brown, M. A., Stenberg, L. M., Persson, U. & Stenflo, J. Identification and purification of vitamin K-dependent proteins and peptides with monoclonal antibodies specific for γ-carboxyglutamyl (Gla) residues. J. Biol. Chem. 275, 19795–19802 (2000).
Whitlon, D. S., Sadowski, J. A. & Suttie, J. W. Mechanism of coumarin action: significance of vitamin K epoxide reductase inhibition. Biochemistry 17, 1371–1377 (1978).
Rabiet, M. J., Jorgensen, M. J., Furie, B. & Furie, B. C. Effect of propeptide mutations on post-translational processing of factor IX. Evidence that β-hydroxylation and γ-carboxylation are independent events. J. Biol. Chem. 262, 14895–14898 (1987).
Furie, B. & Furie, B. C. Molecular basis of vitamin K-dependent γ-carboxylation. Blood 75, 1753–1762 (1990).
Gillis, S. et al. γ-Carboxyglutamic acids 36 and 40 do not contribute to human factor IX function. Protein Sci. 6, 185–196 (1997).
Stenina, O., Pudota, B. N., McNally, B. A., Hommema, E. L. & Berkner, K. L. Tethered processivity of the vitamin K-dependent carboxylase: factor IX is efficiently modified in a mechanism which distinguishes Gla’s from Glu’s and which accounts for comprehensive carboxylation in vivo. Biochemistry 40, 10301–10309 (2001).
Bristol, J. A., Freedman, S. J., Furie, B. C. & Furie, B. Profactor IX: the propeptide inhibits binding to membrane surfaces and activation by factor XIa. Biochemistry 33, 14136–14143 (1994).
Wolberg, A. S. et al. Characterization of γ-carboxyglutamic acid residue 21 of human factor IX. Biochemistry 35, 10321–10327 (1996).
Wojcik, E. G., Van Den Berg, M., Poort, S. R. & Bertina, R. M. Modification of the N-terminus of human factor IX by defective propeptide cleavage or acetylation results in a destabilized calcium-induced conformation: effects on phospholipid binding and activation by factor XIa. Biochem. J. 323, 629–636 (1997).
Ware, J. et al. Factor IX San Dimas. Substitution of glutamine for Arg-4 in the propeptide leads to incomplete γ-carboxylation and altered phospholipid binding properties. J. Biol. Chem. 264, 11401–11406 (1989).
Liebman, H. A. The metal-dependent conformational changes in factor IX associated with phospholipid binding. Studies using antibodies against a synthetic peptide and chemical modification of factor IX. Eur. J. Biochem. 212, 339–345 (1993).
Jacobs, M., Freedman, S. J., Furie, B. C. & Furie, B. Membrane binding properties of the factor IX gamma-carboxyglutamic acid-rich domain prepared by chemical synthesis. J. Biol. Chem. 269, 25494–25501 (1994).
Agah, S. & Bajaj, S. P. Role of magnesium in factor XIa catalyzed activation of factor IX: calcium binding to factor IX under physiologic magnesium. J. Thromb. Haemost. 7, 1426–1428 (2009).
Westmark, P. R., Tanratana, P. & Sheehan, J. P. Selective disruption of heparin and antithrombin-mediated regulation of human factor IX. J. Thromb. Haemost. 13, 1053–1063 (2015).
Plautz, W. E. et al. Anticoagulant protein S targets the factor IXa heparin-binding exosite to prevent thrombosis. Arterioscler. Thromb. Vasc. Biol. 38, 816–828 (2018).
Cooley, B. et al. Dysfunctional endogenous FIX impairs prophylaxis in a mouse hemophilia B model. Blood 133, 2445–2451 (2019).
Landrum, M. J. et al. ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res. 42, D980–D985 (2014).
Iorio, A. et al. Establishing the prevalence and prevalence at birth of hemophilia in males: a meta-analytic approach using national registries. Ann. Intern. Med. 171, 540–546 (2019).
Ioannidis, N. M. et al. REVEL: an ensemble method for predicting the pathogenicity of rare missense variants. Am. J. Hum. Genet. 99, 877–885 (2016).
Frazer, J. et al. Disease variant prediction with deep generative models of evolutionary data. Nature 599, 91–95 (2021).
Rentzsch, P., Schubach, M., Shendure, J. & Kircher, M. CADD-splice—improving genome-wide variant effect prediction using deep learning-derived splice scores. Genome Med. 13, 31 (2021).
Cheng, J. et al. Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science 381, eadg7492 (2023).
Livesey, B. J. & Marsh, J. A. Updated benchmarking of variant effect predictors using deep mutational scanning. Mol. Syst. Biol. 19, e11474 (2023).
Tavtigian, S. V. et al. Modeling the ACMG/AMP variant classification guidelines as a Bayesian classification framework. Genet. Med. 20, 1054–1060 (2018).
Brnich, S. E. et al. Recommendations for application of the functional evidence PS3/BS3 criterion using the ACMG/AMP sequence variant interpretation framework. Genome Med. 12, 3 (2020).
Fokkema, I. F. A. C. et al. LOVD v.2.0: the next generation in gene variant databases. Hum. Mutat. 32, 557–563 (2011).
Matamala, N. et al. Characterization of novel missense variants of SERPINA1 gene causing alpha-1 antitrypsin deficiency. Am. J. Respir. Cell Mol. Biol. 58, 706–716 (2018).
McVey, J. H. et al. The European Association for Haemophilia and Allied Disorders (EAHAD) Coagulation Factor Variant Databases: important resources for haemostasis clinicians and researchers. Haemophilia 26, 306–313 (2020).
Seixas, S. & Marques, P. I. Known mutations as the cause of alpha-1 antitrypsin deficiency: an updated overview of SERPINA1 variation spectrum. Appl. Clin. Genet. 14, 173–194 (2021).
Gai, S. A. & Wittrup, K. D. Yeast surface display for protein engineering and characterization. Curr. Opin. Struct. Biol. 17, 467–473 (2007).
Salema, V. & Fernández, L. Á. Escherichia coli surface display for the selection of nanobodies. Microb. Biotechnol. 10, 1468–1484 (2017).
Ho, M. & Pastan, I. in Therapeutic Antibodies: Methods and Protocols (ed. Dimitrov, A. S.) 337–352 (Humana Press, 2009).
Frank, F. et al. Deep mutational scanning identifies SARS-CoV-2 nucleocapsid escape mutations of currently available rapid antigen tests. Cell 185, 3603–3616.e13 (2022).
Parthiban, K. et al. A comprehensive search of functional sequence space using large mammalian display libraries created by gene editing. mAbs 11, 884–898 (2019).
Vink, T., Oudshoorn-Dickmann, M., Roza, M., Reitsma, J.-J. & de Jong, R. N. A simple, robust and highly efficient transient expression system for producing antibodies. Methods 65, 5–10 (2014).
do Amaral, R. L. F. et al. Approaches for recombinant human factor IX production in serum-free suspension cultures. Biotechnol. Lett. 38, 385–394 (2016).
Duportet, X. et al. A platform for rapid prototyping of synthetic gene networks in mammalian cells. Nucleic Acids Res. 42, 13440–13451 (2014).
Zhu, F. et al. DICE, an efficient system for iterative genomic editing in human pluripotent stem cells. Nucleic Acids Res. 42, e34 (2014).
Matreyek, K. A., Stephany, J. J. & Fowler, D. M. A platform for functional assessment of large variant libraries in mammalian cells. Nucleic Acids Res. 45, e102 (2017).
Starita, L. M. et al. A multiplex homology-directed DNA repair assay reveals the impact of more than 1,000 BRCA1 missense substitution variants on protein function. Am. J. Hum. Genet. 103, 498–508 (2018).
Hasle, N. et al. High-throughput, microscope-based sorting to dissect cellular heterogeneity. Mol. Syst. Biol. 16, e9442 (2020).
Low, B. E., Hosur, V., Lesbirel, S. & Wiles, M. V. Efficient targeted transgenesis of large donor DNA into multiple mouse genetic backgrounds using bacteriophage Bxb1 integrase. Sci. Rep. 12, 5424 (2022).
Durrant, M. G. et al. Systematic discovery of recombinases for efficient integration of large DNA sequences into the human genome. Nat. Biotechnol. 41, 488–499 (2023).
Zhang, M. et al. SHIELD: a platform for high-throughput screening of barrier-type DNA elements in human cells. Nat. Commun. 14, 5616 (2023).
Aslanzadeh, V. et al. Deep mutational scanning of the human insulin receptor ectodomain to inform precision therapy for insulin resistance. Preprint at bioRxiv https://doi.org/10.1101/2024.09.07.611782 (2024).
Blanch-Asensio, A. et al. STRAIGHT-IN Dual: a platform for dual, single-copy integrations of DNA payloads and gene circuits into human induced pluripotent stem cells. Preprint at bioRxiv https://doi.org/10.1101/2024.10.17.616637 (2024).
Boyle, G. E. et al. Deep mutational scanning of CYP2C19 in human cells reveals a substrate specificity-abundance tradeoff. Genetics 228, iyae156 (2024).
Hew, B. E. et al. Directed evolution of hyperactive integrases for site specific insertion of transgenes. Nucleic Acids Res. 52, e64 (2024).
Hong, C. K. Y. et al. Massively parallel characterization of insulator activity across the genome. Nat. Commun. 15, 8350 (2024).
Huhtinen, O., Prince, S., Lamminmäki, U., Salbo, R. & Kulmala, A. Increased stable integration efficiency in CHO cells through enhanced nuclear localization of Bxb1 serine integrase. BMC Biotechnol. 24, 44 (2024).
Kent, J. D., Klug, L. R. & Heinrich, M. C. A novel human SDHA-knockout cell line model for the functional analysis of clinically relevant SDHA variants. Clin. Cancer Res. 30, 5399–5412 (2024).
Kim, J., Muller, R. Y., Bondra, E. R. & Ingolia, N. T. CRISPRi with barcoded expression reporters dissects regulatory networks in human cells. Preprint at bioRxiv https://doi.org/10.1101/2024.09.06.611573 (2024).
Pandey, S. et al. Efficient site-specific integration of large genes in mammalian cells via continuously evolved recombinases and prime editing. Nat. Biomed. Eng. 9, 22–39 (2025).
Wang, Z., Sarkar, A. & Ge, X. De novo functional discovery of peptide-MHC restricted CARs from recombinase-constructed large-diversity monoclonal T cell libraries. Preprint at bioRxiv https://doi.org/10.1101/2024.11.27.625413 (2024).
Acharya, P., Quinlan, A. & Neumeister, V. The ABCs of finding a good antibody: how to find a good antibody, validate it, and publish meaningful data. F1000Res. 6, 851 (2017).
Baron, M. et al. The three-dimensional structure of the first EGF-like module of human factor IX: comparison with EGF and TGF-α. Protein Sci. 1, 81–90 (1992).
Johnson, D. J. D., Langdown, J. & Huntington, J. A. Molecular basis of factor IXa recognition by heparin-activated antithrombin revealed by a 1.7-Å structure of the ternary complex. Proc. Natl Acad. Sci. USA 107, 645–650 (2010).
UniProt Consortium. UniProt: the universal protein knowledgebase in 2025. Nucleic Acids Res. 53, D609–D617 (2025).
Gibson, D. G. et al. Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat. Methods 6, 343–345 (2009).
García-Nafría, J., Watson, J. F. & Greger, I. H. IVA cloning: a single-tube universal cloning system exploiting bacterial in vivo assembly. Sci. Rep. 6, 27459 (2016).
den Dunnen, J. T. et al. HGVS recommendations for the description of sequence variants: 2016 update. Hum. Mutat. 37, 564–569 (2016).
Miao, H. Z. et al. Bioengineering of coagulation factor VIII for improved secretion. Blood 103, 3412–3419 (2004).
Kessler, C. M. et al. B-domain deleted recombinant factor VIII preparations are bioequivalent to a monoclonal antibody purified plasma-derived factor VIII concentrate: a randomized, three-way crossover study. Haemophilia 11, 84–91 (2005).
Ward, N. J. et al. Codon optimization of human factor VIII cDNAs leads to high-level expression. Blood 117, 798–807 (2011).
Zhang, J., Kobert, K., Flouri, T. & Stamatakis, A. PEAR: a fast and accurate Illumina Paired-End reAd mergeR. Bioinformatics 30, 614–620 (2014).
Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at https://arxiv.org/abs/1303.3997 (2013).
Yeh, C.-L. C., Amorosi, C. J., Showman, S. & Dunham, M. J. PacRAT: a program to improve barcode–variant mapping from PacBio long reads using multiple sequence alignment. Bioinformatics 38, 2927–2929 (2022).
Kulkarni, R. et al. Sites of initial bleeding episodes, mode of delivery and age of diagnosis in babies with haemophilia diagnosed before the age of 2 years: a report from the Centers for Disease Control and Prevention’s (CDC) Universal Data Collection (UDC) project. Haemophilia 15, 1281–1290 (2009).
Majithia, A. R. et al. Prospective functional classification of all possible missense variants in PPARG. Nat. Genet. 48, 1570–1575 (2016).
Menardi, G. & Torelli, N. Training and assessing classification rules with imbalanced data. Data Min. Knowl. Discov. 28, 92–122 (2014).
Esposito, D. et al. MaveDB: an open-source platform to distribute and interpret data from multiplexed assays of variant effect. Genome Biol. 20, 223 (2019).
Edgar, R. C. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5, 113 (2004).
Acknowledgements
We thank A. P. Leith, C. Lee, D. E. Prunkard and A. Silvestroni of the University of Washington (UW) Foege Flow Lab and the UW Pathology Flow Cytometry Core for their assistance with cell analysis, staining and sorting; K. M. Munson of the UW PacBio Sequencing Service for assistance with long-read PacBio sequencing; D. A. Nickerson, L. M. Starita, D. J. Maly, S. Nariya, J. J. Stephany and A. E. McEwen for advice on analyzing data and feedback on the paper. We thank S. W. Pipe and A. Scheller of the University of Michigan Department of Pediatrics and Department of Hematology for providing FVIII constructs and advice on FVIII expression. We thank J. Kulman for discussions about FIX carboxylation. We thank R. Kruse-Jarres for her commitment to supporting research that improves the lives of people living with bleeding disorders. We thank and acknowledge B. A. Konkle, the Principal Investigator of MLOF, the MLOF partners at Bloodworks, the American Thrombosis and Hemostasis Network, the National Hemophilia Foundation (now the National Bleeding Disorders Foundation), funding from Biogen/Bioverativ, providers and staff at HTC sites, and the 11,341 participants who made MLOF a success. This work was supported by the National Heart, Lung and Blood Institute (R01HL152066 to J.M.J. and D.M.F.; F30HL151075 to N.A.P.; R01HL149855 to J.P.S.), the National Human Genome Research Institute (RM1HG010461 and UM1HG011969 to D.M.F.), the National Institute of General Medical Sciences (R01GM109110 to D.M.F.), and the Washington Center for Bleeding Disorders (to J.M.J.). The funders of this work had no role in the study design, data collection, analysis, decision to publish or preparation of this manuscript.
Author information
Authors and Affiliations
Contributions
N.A.P., K.W.L., J.P.S., J.M.J. and D.M.F. conceived of the work. N.A.P., J.P.S., J.M.J. and D.M.F. wrote the paper. N.A.P., R.L.P., M.K.W., B.D.Z., K.J.H., K.M.S., X.W., K.W.L. and A.T.C. performed experiments. N.A.P. performed the statistical and computational analysis. N.A.P., S.N.F. and J.M.J. manually curated ClinVar, gnomAD and MLOF for variants. N.A.P. and A.F.R. prepared results for distribution. N.A.P., S.N.F., S.F. and J.M.J. performed variant reinterpretation. All authors approved the final paper.
Corresponding authors
Ethics declarations
Competing interests
J.P.S. was an expert witness for Genentech and Paul, Weiss, Rifkind, Wharton and Garrison. The other authors declare no competing interests.
Peer review
Peer review information
Nature Structural & Molecular Biology thanks Giorgio Galli and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available. Primary Handling Editor: Dimitris Typas, in collaboration with the Nature Structural & Molecular Biology team.
Additional information
Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Extended data
Extended Data Fig. 1 MultiSTEP is based on a flexible genomically integrated approach for expressing secreted protein variants.
a, Cartoon depicting integration of a MultiSTEP plasmid construct into a genomically integrated landing pad cassette16. (Top): Lentivirally integrated landing pad cassette expressing mTagBFP2+ (royal blue) from a tetON inducible promoter. mTagBFP2 is fused to an inducible caspase-9 (iCasp9, orange) and a blasticidin resistance gene (dark yellow) with 2A sequences (dark pink). mTagBFP2-2A-iCasp9-2A-BlastR is expressed from a tetON inducible promoter with an attP serine recombinase recognition site (black) in between. Downstream is a terminator sequence (Term, brown) and tet repressor (tetR, salmon). Bxb1 serine recombinase, expressed from another plasmid, is shown in grey. (Middle): MultiSTEP plasmid construct. Secreted protein coding sequence (Sec Pro, pink) is C-terminally fused to flexible linkers (L1 and L2, teal), Strep II tag (St, green), and CD28 transmembrane domain (TM, medium blue). IRES (purple) drives co-transcription of mCherry (red). Upstream is an attB serine recombinase recognition sequence (goldenrod) and a unique 18-nucleotide degenerate barcode (BC, light yellow). (Bottom): Landing pad following plasmid integration. attP and attB sequences have been recombined, forming attL and attR sequences. b, Sequential flow cytometry gating scheme for detecting and isolating landing pad cells with an integrated MultiSTEP construct. Dot pseudocolor indicates density of cells. FSC: Forward scatter; SSC: side scatter. c, Comparison of negative control 293-F cells (top) with 293-F cells incubated with lentivirus encoding the landing pad cassette (bottom, n > 10,000 cells). d, Comparison of unrecombined landing pad cells (top) with cells transfected with a MultiSTEP plasmid encoding WT FIX (bottom, n > 10,000 cells). e, Comparison of cells transfected with a MultiSTEP construct encoding WT FIX treated with doxycycline (top) or doxycycline and 10 nM AP1903 (bottom, n > 10,000 cells). f, Design iterations of MultiSTEP construct plasmid in (a). L1-Strep MultiSTEP construct does not contain an L2 linker. Flow cytometry of MultiSTEP constructs using an anti-Strep II tag antibody (n ~ 30,000 cells).
Extended Data Fig. 2 A flexible tag-based approach to assessing variant effects on secretion.
a, Heatmap showing strep tag secretion scores for missense FIX variants. Color indicates MultiSTEP score from 0 (blue, lowest 5% of scores) to white (1, WT) to red (increased). Black dots indicate the WT amino acid. Missing data are gray. b, Density distributions of strep tag secretion scores for FIX missense variants (orange) and synonymous variants (blue). Dashed line denotes the 5th percentile of the synonymous variant distribution. c, Scatter plot comparing MultiSTEP-derived strep tag secretion scores for seven different FIX variants (p.C28Y, p.A37T, p.G58E, p.E67K, p.C134R, p.S220T, and p.H267L), WT, and an unrecombined negative control to the geometric mean of Alexa Fluor-647 fluorescence measured using flow cytometry individually (n = 3 replicates). Error bars show standard error of the mean. d-e, Scatter plots of median MultiSTEP-derived strep tag secretion scores and heavy chain (d) or light chain (e) at each position in FIX (n = 3 replicates). Points are colored by chain architecture, using the same color scheme as Fig. 2a. Black dashed line indicates the line of perfect correlation between secretion scores. Gray background indicates <0.3 point deviation from perfect correlation. f, Density plots of MultiSTEP-derived synonymous variant scores generated with the indicated antibody. The dashed vertical line shows WT score.
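The dashed line in panel b, the 5th percentile of the synonymous score distribution, can be computed as in the minimal sketch below. Using that value as a cutoff to flag low-secretion missense variants is an assumption made here for illustration; the caption only states where the line is drawn. All scores and variant labels in the example are invented toy values, not data from the study.

```python
def percentile(values, q):
    """Return the linearly interpolated q-th percentile (0 <= q <= 100)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("percentile of an empty sequence is undefined")
    rank = (len(ordered) - 1) * q / 100
    lower = int(rank)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = rank - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * fraction


def flag_low_secretion(missense_scores, synonymous_scores, q=5):
    """Flag missense variants scoring below the q-th synonymous percentile.

    ASSUMPTION: treating the percentile as a classification cutoff is an
    illustrative choice, not a description of the authors' pipeline.
    """
    threshold = percentile(synonymous_scores, q)
    flags = {variant: score < threshold
             for variant, score in missense_scores.items()}
    return flags, threshold


# Toy values for illustration only.
toy_synonymous = [0.90, 0.95, 1.00, 1.05, 1.10]
toy_missense = {"variant_a": 0.20, "variant_b": 1.02}

flags, threshold = flag_low_secretion(toy_missense, toy_synonymous)
print(f"threshold = {threshold:.3f}", flags)
```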
Extended Data Fig. 3 MultiSTEP-derived FIX secretion scores correlate with orthogonal measures of FIX secretion.
a, Flow cytometry of p.C28Y and WT controls (n = 10,000 cells) with the FIX library (n = 100,000 cells). b, Comparison of ELISA measurements of eight untethered FIX missense variants (p.C28Y, p.A37T, p.G58E, p.E67K, p.G125V, p.C134R, p.S220T, and p.H267L) expressed from 293-F cells and heavy chain secretion scores (n = 3 replicates). Error bars show the standard error of the mean. Pearson’s correlation coefficient is shown. c, Scatter plot comparing MultiSTEP-derived heavy chain secretion scores for 20 different FIX missense variants, WT, and unrecombined negative control (n = 3 replicates) to the geometric mean of Alexa Fluor-647 fluorescence measured using flow cytometry on cells expressing each variant individually. Error bars show standard error of the mean (n = 10,000 cells). Line of best fit (dashed) and Pearson’s correlation coefficient are shown.
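The two summary statistics named in this caption, the geometric mean of per-cell fluorescence and Pearson's correlation coefficient, can be sketched as follows. The function and variable names are illustrative, and every number is an invented toy value rather than a measurement from the study.

```python
import math
from statistics import geometric_mean, mean


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x, mean_y = mean(x), mean(y)
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return covariance / (spread_x * spread_y)


# Toy per-cell fluorescence intensities for three hypothetical variants.
toy_fluorescence = {
    "variant_a": [10.0, 1000.0],
    "variant_b": [200.0, 800.0],
    "variant_c": [900.0, 900.0],
}
# Toy secretion scores for the same hypothetical variants.
toy_scores = {"variant_a": 0.1, "variant_b": 0.4, "variant_c": 0.9}

variants = sorted(toy_fluorescence)
geo_means = [geometric_mean(toy_fluorescence[v]) for v in variants]
scores = [toy_scores[v] for v in variants]

print(dict(zip(variants, geo_means)))
print(f"Pearson r = {pearson_r(scores, geo_means):.3f}")
```

The geometric mean is a common summary for flow cytometry intensities because they tend to be log-distributed; it down-weights a small number of very bright cells relative to the arithmetic mean.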
Extended Data Fig. 4 Variants near antibody epitopes demonstrate minor effects on secretion scores.
a, Scatter plot of the difference in heavy chain and light chain secretion scores and the distance in angstroms between all α-carbons in the light chain and the nearest light chain epitope α-carbon using the AlphaFold2 model of mature, two-chain FIX. Low-confidence positions with predicted local distance difference test score (pLDDT) of <70 were removed from analysis. Color indicates whether a position was identified in the light chain epitope in Fig. 2h. Horizontal dashed line indicates no difference in secretion scores. Vertical dashed line indicates boundary of likely epitope-adjacent effects on secretion scores by changepoint analysis (9.15 angstroms). b, Scatter plot of the difference in heavy chain and light chain secretion scores and the distance in angstroms between all α-carbons in the heavy chain and the nearest heavy chain epitope α-carbon using the AlphaFold2 model of mature, two-chain FIX. Low-confidence positions with pLDDT of <70 were removed from analysis. Color indicates whether a position was identified in the heavy chain epitope in Fig. 2h. Horizontal dashed line indicates no difference in secretion scores. Vertical dashed line indicates boundary of likely epitope-adjacent effects on secretion scores by changepoint analysis (5.71 angstroms). c, Scatter plot of median MultiSTEP-derived heavy chain and light chain secretion scores at each position in FIX. Points are colored by epitope (Fig. 2h) or epitope-adjacent position as in (a) and (b). Black dashed line indicates the line of perfect correlation between secretion scores. Gray background indicates <0.3 point deviation from perfect correlation.
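The distance measure used on the x axes of panels a and b can be sketched as below: for each confidently modelled position, take the smallest Euclidean distance from its α-carbon to any epitope α-carbon. Coordinates, pLDDT values and position numbers are invented toy values. Whether low-confidence epitope residues were also excluded as reference points is not stated in the caption, so this sketch keeps them, which is an assumption.

```python
import math


def nearest_epitope_distance(ca_coords, epitope_positions, plddt,
                             min_plddt=70.0):
    """Distance (in angstroms) from each position to the nearest epitope Cα.

    ca_coords: position -> (x, y, z) coordinates of the alpha carbon
    epitope_positions: positions assigned to the antibody epitope
    plddt: position -> predicted local distance difference test score
    Positions with pLDDT below min_plddt are dropped, as in the caption.
    ASSUMPTION: epitope residues serve as reference points regardless of
    their own pLDDT.
    """
    epitope_coords = [ca_coords[p] for p in epitope_positions]
    if not epitope_coords:
        raise ValueError("at least one epitope position is required")
    distances = {}
    for position, coord in ca_coords.items():
        if plddt[position] < min_plddt:
            continue  # low-confidence position, removed from analysis
        distances[position] = min(math.dist(coord, e) for e in epitope_coords)
    return distances


# Toy structure: three residues, one of them in the epitope.
toy_coords = {1: (0.0, 0.0, 0.0), 2: (3.0, 4.0, 0.0), 3: (10.0, 0.0, 0.0)}
toy_plddt = {1: 90.0, 2: 85.0, 3: 50.0}

toy_distances = nearest_epitope_distance(toy_coords, {1}, toy_plddt)
print(toy_distances)
```

In a real analysis the coordinates would be read from the AlphaFold2 model file; parsing is omitted here to keep the example self-contained.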
Extended Data Fig. 5 Effect of missense FIX variation on secretion compared to missense variant effects on abundance in cytosolic or transmembrane proteins.
a, Box plots of the 25th, 50th, and 75th percentiles of secretion (FIX, MultiSTEP) or abundance (all others, VAMP-seq) scores for all nonsynonymous variants across all positions with the indicated WT amino acid for six different proteins (refs. 21–23,33; n = 29,287 variants). Whiskers span the range of data. b, Box plots of the 25th, 50th, and 75th percentiles of secretion (FIX, MultiSTEP) or abundance (all others, VAMP-seq) scores for all nonsynonymous variant amino acid substitutions across all positions for six different proteins (n = 29,287 variants).
Extended Data Fig. 6 Sequence conservation strongly influences the effect of variation on FIX secretion.
a, Comparison of light chain secretion scores with ConSurf conservation grades (1: least conserved, 9: most conserved; ref. 34). Violin plot shows distribution of points (n = 8,528 variants) with an inset box plot representing the 25th, 50th, and 75th percentiles. Whiskers span the range of data. Dashed horizontal line is the 5th percentile of the synonymous secretion score distribution. b, Comparison of median light chain secretion scores (n = 8,528 variants) with ConSurf conservation grades. Violin plot shows distribution of points with an inset box plot representing the 25th, 50th, and 75th percentiles. Whiskers span the range of data. Dashed horizontal line is the 5th percentile of the synonymous secretion score distribution.
Extended Data Fig. 7 Carboxylation-sensitive antibodies identify functional motifs.
a, Multiple sequence alignment of Gla-domain containing proteins (UniProt) that bind the carboxylation-sensitive Gla-motif (ExxxExC) antibody using MUSCLE (refs. 96,110). Antibody epitopes for both the carboxylation-sensitive FIX-specific antibody (ω-loop) and the carboxylation-sensitive Gla-motif antibody are shown. hF9, human coagulation factor IX (P00740); hF2, human prothrombin (coagulation factor II, P00734); hF7, human coagulation factor VII (P08709); hF10, human coagulation factor X (P00742); hPC, human protein C (P04070); hPS, human protein S (P07225); hBGP, human osteocalcin (P02818); bBGP, bovine osteocalcin (P02820); hGAS6, human growth arrest-specific protein 6 (P14393); ppVPA, Pseudechis porphyriacus venom prothrombin activator porpharin-D (P58L93); nsVPA, Notechis scutatus venom prothrombin activator notecarin-D1 (P82807); osVPA, Oxyuranus scutellatus venom prothrombin activator oscutarin-C (P58L96). b-c, Fluorescence of unrecombined negative control and WT FIX-expressing cells with and without warfarin pretreatment generated by staining cells with a carboxylation-sensitive FIX-specific (b) or carboxylation-sensitive Gla-motif antibody (c). d-f, Heatmaps showing carboxylation-sensitive FIX-specific carboxylation scores (d), carboxylation-sensitive Gla-motif carboxylation scores (e), or light chain secretion scores (f) for FIX propeptide variants. Furin cleavage site (Furin CS), ω-loop, ExxxExC motif, and aromatic stack (AS) are annotated above (d). Heatmap color indicates antibody score from 0 (blue, lowest 5% of scores) to white (1, WT) to red (increased). Black dots indicate the WT amino acid. Missing data are gray. g-i, Heatmaps showing carboxylation-sensitive FIX-specific carboxylation scores (g), carboxylation-sensitive Gla-motif carboxylation scores (h), or light chain secretion scores (i) for FIX Gla domain variants. Furin cleavage site (Furin CS) is annotated above (d). ω-loop, ExxxExC motif, and aromatic stack (AS) are annotated above (g). Heatmap color indicates antibody score from 0 (blue, lowest 5% of scores) to white (1, WT) to red (increased). Black dots indicate the WT amino acid. Missing data are gray.
Extended Data Fig. 8 Clinical correlates of secretion and gamma-carboxylation scores map to FIX biochemical features.
a, Scatter plot of the mean and standard error of light chain secretion scores (n = 2 replicates) and FIX plasma antigen from individuals with hemophilia B in the EAHAD database (n = 416 variants). Light chain epitope-adjacent positions identified in Extended Data Fig. 4a are removed (n = 19 variants across 38 individuals; ref. 11). Dashed horizontal line is 40% FIX plasma antigen. Dashed vertical line is the 5th percentile of the synonymous secretion score distribution. b, Comparison of hemophilia B severity from individuals with hemophilia B in the EAHAD database (n = 1,781 variants) with light chain secretion scores. Light chain epitope-adjacent positions identified in Extended Data Fig. 4a are removed (n = 40 variants). Violin plot shows distribution of points with an inset box plot representing the 25th, 50th, and 75th percentiles. Whiskers span the range of data. Dashed horizontal line is the 5th percentile of the synonymous secretion score distribution. p values from a Kruskal–Wallis test adjusted for multiple comparisons by post-hoc Dunn’s test are shown. c, Scatter plot of the mean and standard error of light chain secretion scores (n = 2 replicates) and FIX plasma antigen from individuals harboring gain-of-cysteine variants in the EAHAD database (n = 9 variants across 27 individuals; ref. 11). Dashed horizontal line is 40% FIX plasma antigen. Dashed vertical line is the 5th percentile of the synonymous secretion score distribution. d, Bar plot of hemophilia B disease severity in the EAHAD database for individuals harboring gain-of-cysteine variants. e, Bar plot of the number of FIX variants in the EAHAD database and their classification using the random forest model trained on MultiSTEP functional data, by disease severity. Color indicates model prediction. f, Bar plot of the number of FIX propeptide and Gla domain variants in the EAHAD database and their classification using the random forest model trained on MultiSTEP functional data, by disease severity. Color indicates model prediction.
Extended Data Fig. 9 Random forest model predictions for FIX variants in the EAHAD FIX Variant Database associated with hemophilia B.
a, Spearman correlation of MultiSTEP functional scores with EVE, AlphaMissense, REVEL, and CADD variant effect predictors. b, Histograms of four variant effect predictor scores for F9 missense variants of known effect curated from ClinVar, gnomAD, and MLOF. Color indicates clinical variant interpretation. Black dashed vertical lines indicate the thresholds for each predictor. For AlphaMissense, we used the thresholds recommended in the original publication for 90% precision on existing ClinVar annotated variants (≤0.34: benign, 0.34–0.564: uncertain, ≥0.564: pathogenic). For REVEL, we used the thresholds used in the initial publication to assess REVEL’s precision in ClinVar (<0.5: benign, 0.5: uncertain, >0.5: pathogenic). For EVE, we used the thresholds recommended in the original publication for the 75% most confident classifications (≤0.359: benign, 0.359–0.641: uncertain, ≥0.641: pathogenic). For CADD, we used the same thresholds used in the MLOF clinical laboratory (<10: benign, 10–20: uncertain, >20: pathogenic). Number of variants scored by each predictor is annotated. c, Classification accuracy for F9 missense variants of known effect curated from ClinVar, gnomAD, and MLOF in our test set (benign/likely benign, n = 4 variants; pathogenic/likely pathogenic, n = 34 variants) by MultiSTEP variant function classifier and the four variant effect predictors using thresholds defined in (b). True benign/likely benign and pathogenic/likely pathogenic labels are denoted on the x-axis, and columns are colored relative to the classification for each method. Solid colors indicate correct classification, whereas striped colors indicate incorrect classification. For variant effect predictors, missing variants are colored gray with stripes and uncertain predictions are colored yellow with stripes. PPV, positive predictive value; NPV, negative predictive value; Spec, specificity; Sens, sensitivity.
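The predictor thresholds listed in panel b can be written as a small lookup, shown here as a sketch rather than the authors' code. The cutoff values come from the legend; the `THRESHOLDS` table, the `classify` function and the example scores are hypothetical names and inputs introduced for illustration.

```python
# Cutoffs transcribed from the legend. The third field records whether the
# printed inequality includes the cutoff itself (<= / >=) or excludes it (< / >).
THRESHOLDS = {
    # predictor: (benign_cutoff, pathogenic_cutoff, cutoffs_inclusive)
    "AlphaMissense": (0.34, 0.564, True),   # <=0.34 benign, >=0.564 pathogenic
    "EVE": (0.359, 0.641, True),            # <=0.359 benign, >=0.641 pathogenic
    "REVEL": (0.5, 0.5, False),             # <0.5 benign, >0.5 pathogenic
    "CADD": (10.0, 20.0, False),            # <10 benign, >20 pathogenic
}

def classify(predictor, score):
    """Return 'benign', 'uncertain' or 'pathogenic' for one predictor score."""
    benign_cut, pathogenic_cut, inclusive = THRESHOLDS[predictor]
    if inclusive:
        if score <= benign_cut:
            return "benign"
        if score >= pathogenic_cut:
            return "pathogenic"
    else:
        if score < benign_cut:
            return "benign"
        if score > pathogenic_cut:
            return "pathogenic"
    return "uncertain"

# Hypothetical scores for a single variant, for illustration only.
example = {"AlphaMissense": 0.91, "EVE": 0.50, "REVEL": 0.5, "CADD": 9.9}
calls = {name: classify(name, value) for name, value in example.items()}
```

Note that the REVEL rule leaves only the exact value 0.5 as uncertain, as stated in the legend.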
Extended Data Fig. 10 Detection of cell-surface displayed FVIII.
Experimental flow cytometry of B-domain deleted coagulation factor VIII (FVIII) in the MultiSTEP backbone (n = ~30,000 cells per variant). Unrecombined cells (NC) do not display FVIII and serve as a negative control. Fluorescent signal was generated by staining cells with anti-FVIII antibodies specific to each of the five FVIII domains in the heavy chain [A1 (a) and A2 (b)] and light chain [A3 (c), C1 (d), and C2 (e)].
Supplementary information
Supplementary Information
Supplementary Figs. 1 and 2.
Supplementary Table 1
Library and assay statistics. Number of FIX variants assessed at various stages of library preparation and in each MultiSTEP assay.
Supplementary Table 2
Variants with discordant circulating antigen and secretion scores. Variants with discordant secretion scores and FIX plasma antigen from individuals with hemophilia B in the EAHAD database (ref. 11). Variants with undetectable FIX antigen are labeled as <1%, as reported by the clinical laboratory in EAHAD. Variants are classified as low antigen if they have a mean circulating FIX antigen of <40%. Variants are classified as low secretion if they have a mean secretion score that is less than 0.795, which is the 5th percentile of the synonymous secretion score distribution. SE: standard error of the mean.
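The two cutoffs above imply a simple discordance rule, sketched below. The 40% and 0.795 values are from the table description; the function names are hypothetical, and reading a "<1%" label as its upper bound of 1% is an assumption made only so that such entries can be compared numerically.

```python
LOW_ANTIGEN_PCT = 40.0       # mean circulating FIX antigen below this is "low antigen"
LOW_SECRETION_SCORE = 0.795  # 5th percentile of the synonymous score distribution

def parse_antigen(value):
    """Convert a reported antigen value (number or a label such as '<1%') to percent.

    Assumption: '<1%' is read as its upper bound, 1.0.
    """
    return float(str(value).strip().lstrip("<").rstrip("%"))

def is_discordant(mean_antigen, mean_secretion_score):
    """True when exactly one of the two measurements is classified as low."""
    low_antigen = parse_antigen(mean_antigen) < LOW_ANTIGEN_PCT
    low_secretion = mean_secretion_score < LOW_SECRETION_SCORE
    return low_antigen != low_secretion

# Hypothetical examples (not values from Supplementary Table 2).
undetectable_but_secreted = is_discordant("<1%", 1.02)
both_low = is_discordant(12.0, 0.40)
```

A variant is flagged only when the antigen and secretion calls disagree; variants that are low on both or normal on both are concordant.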
Supplementary Table 3
Random forest model classifications for 8,964 F9 variants. Mean secretion scores, γ-carboxylation scores and functional variant classifications made using the random forest classifier we trained on known pathogenic or likely pathogenic and benign or likely benign F9 missense variants (see Methods). Variants without functional scores for all antibodies were removed before classifier implementation and do not have associated functional predictions.
Supplementary Table 4
Classification criteria and reclassification results for My Life, Our Future F9 variants. Classification criteria, random forest classifier predictions and resultant pathogenicity classifications for 214 F9 variants from My Life, Our Future (ref. 12). Classification criteria (columns BP1 through PVS1 as defined by the American College of Medical Genetics and Genomics; ref. 2) were used in variant curation by clinical experts in hemophilia genetics based on available clinical data, databases and literature review. PVS1, null variant (nonsense, frameshift, canonical ±1 or 2 splice sites, initiation codon, single or multi-exon deletion) in a gene where loss of function is a known mechanism of disease. PS1, same amino acid change as a previously established pathogenic variant regardless of nucleotide change. PS2, de novo (both maternity and paternity confirmed) in a patient with the disease and no family history. PS3, well-established in vitro or in vivo functional studies supportive of a damaging effect on the gene or gene product. PS4, the prevalence of the variant in affected individuals is significantly increased compared with the prevalence in controls. PM1, located in a mutational hot spot and/or critical and well-established functional domain (for example, active site of an enzyme) without benign variation. PM2, absent from controls (or at extremely low frequency if recessive) in population databases. PM3, for recessive disorders, detected in trans with a pathogenic variant. PM4, protein length changes as a result of in-frame deletions and insertions in a non-repeat region or stop-loss variants. PM5, novel missense change at an amino acid residue where a different missense change determined to be pathogenic has been seen before. PM6, assumed de novo, but without confirmation of paternity and maternity. PP1, cosegregation with disease in multiple affected family members in a gene definitively known to cause the disease. PP2, missense variant in a gene that has a low rate of benign missense variation and in which missense variants are a common mechanism of disease. PP3, multiple lines of computational evidence support a deleterious effect on the gene or gene product (conservation, evolutionary, splicing impact and so on). PP4, patient’s phenotype or family history is highly specific for a disease with a single genetic etiology. PP5, reputable source recently reports variant as pathogenic, but the evidence is not available to the laboratory to perform an independent evaluation. BA1, allele frequency is >5% in population databases. BS1, allele frequency is greater than expected for disorder. BS2, observed in a healthy adult individual for a recessive (homozygous), dominant (heterozygous) or X-linked (hemizygous) disorder, with full penetrance expected at an early age. BS3, well-established in vitro or in vivo functional studies show no damaging effect on protein function or splicing. BS4, lack of segregation in affected members of a family. BP1, missense variant in a gene for which primarily truncating variants are known to cause disease. BP2, observed in trans with a pathogenic variant for a fully penetrant dominant gene/disorder or observed in cis with a pathogenic variant in any inheritance pattern. BP3, in-frame deletions and insertions in a repetitive region without a known function. BP4, multiple lines of computational evidence suggest no impact on gene or gene product (conservation, evolutionary, splicing impact and so on). BP5, variant found in a case with an alternate molecular basis for disease. BP6, reputable source recently reports variant as benign, but the evidence is not available to the laboratory to perform an independent evaluation. BP7, a synonymous (silent) variant for which splicing prediction algorithms predict no impact to the splice consensus sequence or the creation of a new splice site and the nucleotide is not highly conserved. Original interpretations as well as reinterpretations using random forest classifier predictions as either moderate or strong evidence are included.
Supplementary Table 5
Oligonucleotides. Oligonucleotides used in this study.
Supplementary Table 6
FIX variant nomenclature. HGVS, legacy and chymotrypsin numbering systems for each position in WT FIX. In chymotrypsin numbering, some position values in flexible loops that are not conserved repeat.
Supplementary Table 7
Detailed cloning information. Description of plasmids used in this study and the primers, oligonucleotides, cDNAs, and gene fragments used to clone them. Method used for cloning is labeled.
Supplementary Table 8
Detailed antibody information. Properties, stock concentrations, and experimental conditions for each assayed antibody. Epitopes, when known, are provided.
Supplementary Table 9
Replicate correlations. PCR replicate correlations for each genomic DNA-derived barcode library amplification.
Supplementary Table 10
Curated F9 variants from MLOF, gnomAD and ClinVar used for training variant function classifier. F9 variants of known effect were collected from MLOF, gnomAD, and ClinVar and independently reassessed. MLOF variants with normal FIX activity (FIX:C) are denoted as benign. Variants found in gnomAD 4.0 with a minor allele frequency (MAF) ≥ 0.001 were classified as benign. Only ClinVar variants with assertion criteria (stars > 0) were included.
Supplementary Table 11
Case data for FIX variants in EAHAD. Variant-level data for FIX variants curated from the EAHAD FIX variant database (accessed 10/9/2023).
Supplementary Table 12
MultiSTEP variant scores. Table of FIX secretion and γ-carboxylation scores for nearly all possible missense and synonymous FIX variants. Scores are derived from each of the five antibodies presented in this work. The number of unique variants scored for each antibody ranges between 8,961 and 8,964 (Strep II tag antibody, 8,961; heavy chain antibody, 8,963; light chain antibody, 8,964; carboxylation-sensitive FIX Gla antibody, 8,964; carboxylation-sensitive Gla-motif antibody, 8,964). There are a total of 44,816 measured secretion and γ-carboxylation variant effects.
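The per-antibody counts above can be checked against the stated total with a few lines of arithmetic. The numbers are taken from the table description and the abstract; the variable names are illustrative only.

```python
# Unique variants scored per antibody, as listed in the table description.
variants_scored = {
    "Strep II tag": 8961,
    "heavy chain": 8963,
    "light chain": 8964,
    "carboxylation-sensitive FIX Gla": 8964,
    "carboxylation-sensitive Gla-motif": 8964,
}

# Total measured variant effects across all five antibodies.
total_effects = sum(variants_scored.values())

# Library size from the abstract: synonymous plus missense variants.
library_size = 436 + 8528

print(total_effects)  # 44816
print(library_size)   # 8964
```

The sum reproduces the 44,816 measured variant effects, and the largest per-antibody count equals the 8,964 variants in the library (436 synonymous plus 8,528 missense).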
Supplementary Data 1
Statistical source data for Supplementary Figs. 1 and 2.
Source data
Source Data
Statistical source data.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Popp, N.A., Powell, R.L., Wheelock, M.K. et al. Multiplex and multimodal mapping of variant effects in secreted proteins via MultiSTEP. Nat Struct Mol Biol (2025). https://doi.org/10.1038/s41594-025-01582-w
DOI: https://doi.org/10.1038/s41594-025-01582-w