Multiplex and multimodal mapping of variant effects in secreted proteins via MultiSTEP

Popp, Nicholas A.; Powell, Rachel L.; Wheelock, Melinda K.; Holmes, Kristen J.; Zapp, Brendan D.; Sheldon, Kathryn M.; Fletcher, Shelley N.; Wu, Xiaoping; Fayer, Shawn; Rubin, Alan F.; Lannert, Kerry W.; Chang, Alexis T.; Sheehan, John P.; Johnsen, Jill M.; Fowler, Douglas M.

doi:10.1038/s41594-025-01582-w

Article
Published: 13 June 2025

Nature Structural & Molecular Biology volume 32, pages 2099–2111 (2025)

Abstract

Despite widespread advances in DNA sequencing, the functional consequences of most genetic variants remain poorly understood. Multiplexed assays of variant effect can measure the function of variants at scale but cannot readily be applied to the ~10% of human genes encoding secreted proteins. Here we develop a flexible, scalable human cell surface display method, multiplexed surface tethering of extracellular proteins (MultiSTEP), to study the consequences of missense variation in coagulation factor IX (FIX), a serine protease in which genetic variation can cause hemophilia B. We combine MultiSTEP with a panel of antibodies to detect FIX secretion and post-translational modification (PTM), measuring 44,816 variant effects for 436 synonymous variants and 8,528 of the 8,759 possible F9 missense variants. Almost half of missense variants impact secretion, PTM or both. We also identify functional constraints on secretion within the signal peptide and for nearly all gain or loss of cysteine variants. Secretion scores correlate strongly with FIX levels in hemophilia B and reveal that loss-of-secretion variants are more often associated with severe disease. Integration of the secretion and PTM scores enables reclassification of 63.1% of F9 variants of uncertain significance in the My Life, Our Future hemophilia genotyping project. Lastly, we show that MultiSTEP can be applied to other secreted proteins, thus demonstrating that MultiSTEP is a multiplexed, multimodal and generalizable method for systematically assessing variant effects in secreted proteins at scale.

**Fig. 1: MultiSTEP enables at-scale measurement of variant effects in secreted proteins.**

**Fig. 2: The 17,927 MultiSTEP-derived secretion scores for 8,964 FIX variants.**

**Fig. 3: MultiSTEP reveals biochemical constraints on secretion.**

**Fig. 4: MultiSTEP enables measurement of variant effects on FIX PTMs.**

**Fig. 5: Secretion and γ-carboxylation scores reveal clinical features of hemophilia B and enable variant reinterpretation.**

**Fig. 6: MultiSTEP can be applied to diverse secreted proteins.**

Data availability

VAMP-seq abundance scores for PTEN (urn:mavedb:00000013-a-1), TPMT (urn:mavedb:00000013-b-1), VKOR (urn:mavedb:00000078-b-1), CYP2C9 (urn:mavedb:00000095-b-1) and NUDT15 (urn:mavedb:00000055-a-1) were downloaded from MaveDB¹⁰⁹. Gla domain protein sequences for human FIX (P00740), human prothrombin (coagulation factor II; P00734), human coagulation factor VII (P08709), human coagulation factor X (P00742), human protein C (P04070), human protein S (P07225), human osteocalcin (bone Gla-protein; P02818), bovine osteocalcin (bone Gla-protein; P02820), human growth arrest-specific protein 6 (Q14393), Pseudechis porphyriacus venom prothrombin activator porpharin-D (P58L93), Notechis scutatus venom prothrombin activator notecarin-D1 (P82807) and Oxyuranus scutellatus venom prothrombin activator oscutarin-C (P58L96) were downloaded from UniProt.org. ClinVar variants are publicly available at https://www.ncbi.nlm.nih.gov/clinvar. gnomAD (v.4.1) variants are available at https://gnomad.broadinstitute.org. MLOF variants have been previously deposited into the EAHAD FIX clinical database (https://dbs.eahad.org/FIX) and the CDC CHBMP database (https://www.cdc.gov/hemophilia/mutation-project/index.html) and published¹¹,¹². A complete set of MLOF variants used in this study, along with relevant information about these variants, is provided in Supplementary Table 4. F9 variant scores are available in Supplementary Table 12 and at MaveDB (https://www.mavedb.org; urn:mavedb:00001200). Raw sequencing, barcode–variant maps and scores are available in the NCBI Gene Expression Omnibus repository (GSE242805). All other data files are provided at https://github.com/FowlerLab/2024_multistep. All other data supporting the findings of this study are available from the corresponding author on reasonable request. Source data are provided with this paper.

Code availability

All code to reproduce the analyses and figures is available on GitHub at https://github.com/FowlerLab/2024_multistep. Versions of R packages used for analyses are described in the code file on GitHub.

References

Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).
Richards, S. et al. Standards and guidelines for the interpretation of sequence variants: a joint consensus recommendation of the American College of Medical Genetics and Genomics and the Association for Molecular Pathology. Genet. Med. 17, 405–423 (2015).
Fayer, S. et al. Closing the gap: systematic integration of multiplexed functional data resolves variants of uncertain significance in BRCA1, TP53, and PTEN. Am. J. Hum. Genet. 108, 2248–2258 (2021).
Tabet, D., Parikh, V., Mali, P., Roth, F. P. & Claussnitzer, M. Scalable functional assays for the interpretation of human genetic variation. Annu. Rev. Genet. 56, 441–465 (2022).
Uhlén, M. et al. The human secretome. Sci. Signal. 12, eaaz0274 (2019).
Freedman, S. J., Furie, B. C., Furie, B. & Baleja, J. D. Structure of the metal-free γ-carboxyglutamic acid-rich membrane binding region of factor IX by two-dimensional NMR spectroscopy. J. Biol. Chem. 270, 7980–7987 (1995).
Freedman, S. J. et al. Identification of the phospholipid binding site in the vitamin K-dependent blood coagulation protein factor IX. J. Biol. Chem. 271, 16227–16236 (1996).
Shikamoto, Y., Morita, T., Fujimoto, Z. & Mizuno, H. Crystal structure of Mg²⁺- and Ca²⁺-bound Gla domain of factor IX complexed with binding protein. J. Biol. Chem. 278, 24090–24094 (2003).
Huang, M., Furie, B. C. & Furie, B. Crystal structure of the calcium-stabilized human factor IX Gla domain bound to a conformation-specific anti-factor IX antibody. J. Biol. Chem. 279, 14338–14346 (2004).
Zacchi, L. F. et al. Coagulation factor IX analysis in bioreactor cell culture supernatant predicts quality of the purified product. Commun. Biol. 4, 390 (2021).
Rallapalli, P. M., Kemball-Cook, G., Tuddenham, E. G., Gomez, K. & Perkins, S. J. An interactive mutation database for human coagulation factor IX provides novel insights into the phenotypes and genetics of hemophilia B. J. Thromb. Haemost. 11, 1329–1340 (2013).
Johnsen, J. M. et al. Results of genetic analysis of 11,341 participants enrolled in the My Life, Our Future hemophilia genotyping initiative in the United States. J. Thromb. Haemost. 20, 2022–2034 (2022).
Konkle, B. A., Josephson, N. C. & Nakaya Fletcher, S. in GeneReviews (eds Pagon, R. A. et al.) (University of Washington, 2023).
MASAC Document 273—Recommendations on Genotyping for Persons with Hemophilia (National Hemophilia Foundation, 2022); https://www.hemophilia.org/healthcare-professionals/guidelines-on-care/masac-documents/masac-document-273-recommendations-on-genotyping-for-persons-with-hemophilia
Gao, W. et al. Characterization of missense mutations in the signal peptide and propeptide of FIX in hemophilia B by a cell-based assay. Blood Adv. 4, 3659–3667 (2020).
Matreyek, K. A., Stephany, J. J., Chiasson, M. A., Hasle, N. & Fowler, D. M. An improved platform for functional assessment of large protein libraries in mammalian cells. Nucleic Acids Res. 48, e1 (2020).
Savoldo, B. et al. CD28 costimulation improves expansion and persistence of chimeric antigen receptor-modified T cells in lymphoma patients. J. Clin. Invest. 121, 1822–1826 (2011).
Esensten, J. H., Helou, Y. A., Chopra, G., Weiss, A. & Bluestone, J. A. CD28 costimulation: from mechanism to therapy. Immunity 44, 973–988 (2016).
Liu, L. et al. Inclusion of Strep-tag II in design of antigen receptors for T-cell immunotherapy. Nat. Biotechnol. 34, 430–434 (2016).
Bicocchi, M. P. et al. Insight into molecular changes of the FIX protein in a series of Italian patients with haemophilia B. Haemophilia 12, 263–270 (2006).
Matreyek, K. A. et al. Multiplex assessment of protein variant abundance by massively parallel sequencing. Nat. Genet. 50, 874–882 (2018).
Suiter, C. C. et al. Massively parallel variant characterization identifies NUDT15 alleles associated with thiopurine toxicity. Proc. Natl Acad. Sci. USA 117, 5394–5401 (2020).
Amorosi, C. J. et al. Massively parallel characterization of CYP2C9 variant enzyme activity and abundance. Am. J. Hum. Genet. 108, 1735–1751 (2021).
Kurys, G., Tagaya, Y., Bamford, R., Hanover, J. A. & Waldmann, T. A. The long signal peptide isoform and its alternative processing direct the intracellular trafficking of interleukin-15. J. Biol. Chem. 275, 30653–30659 (2000).
Owji, H., Nezafat, N., Negahdaripour, M., Hajiebrahimi, A. & Ghasemi, Y. A comprehensive review of signal peptides: structure, roles, and applications. Eur. J. Cell Biol. 97, 422–441 (2018).
Tikhonova, E. B., Karamysheva, Z. N., von Heijne, G. & Karamyshev, A. L. Silencing of aberrant secretory protein expression by disease-associated mutations. J. Mol. Biol. 431, 2567–2580 (2019).
Liaci, A. M. et al. Structure of the human signal peptidase complex reveals the determinants for signal peptide cleavage. Mol. Cell 81, 3934–3948.e11 (2021).
Teufel, F. et al. SignalP 6.0 predicts all five types of signal peptides using protein language models. Nat. Biotechnol. 40, 1023–1025 (2022).
Gutierrez Guarnizo, S. A. et al. Pathogenic signal peptide variants in the human genome. NAR Genom. Bioinform. 5, lqad093 (2023).
Braakman, I. & Hebert, D. N. Protein folding in the endoplasmic reticulum. Cold Spring Harb. Perspect. Biol. 5, a013201 (2013).
Zhang, H. et al. Unpaired extracellular cysteine mutations of CSF3R mediate gain or loss of function. Cancer Res. 77, 4258–4267 (2017).
Woodard, D. R. et al. A loss-of-function cysteine mutant in fibulin-3 (EFEMP1) forms aberrant extracellular disulfide-linked homodimers and alters extracellular matrix composition. Hum. Mutat. 43, 1945–1955 (2022).
Chiasson, M. A. et al. Multiplexed measurement of variant abundance and activity reveals VKOR topology, active site and human variant impact. eLife 9, e58026 (2020).
Yariv, B. et al. Using evolutionary data to make sense of macromolecules with a ‘face-lifted’ ConSurf. Protein Sci. 32, e4582 (2023).
Huang, M. et al. Structural basis of membrane binding by Gla domains of vitamin K-dependent proteins. Nat. Struct. Biol. 10, 751–756 (2003).
Grant, M. A., Baikeev, R. F., Gilbert, G. E. & Rigby, A. C. Lysine 5 and phenylalanine 9 of the factor IX omega-loop interact with phosphatidylserine in a membrane-mimetic environment. Biochemistry 43, 15367–15378 (2004).
Feuerstein, G. Z. et al. Antithrombotic efficacy of a novel murine antihuman factor IX antibody in rats. Arterioscler. Thromb. Vasc. Biol. 19, 2554–2562 (1999).
Aktimur, A., Gabriel, M. A., Gailani, D. & Toomey, J. R. The factor IX γ-carboxyglutamic acid (Gla) domain is involved in interactions between factor IX and factor XIa. J. Biol. Chem. 278, 7981–7987 (2003).
Brown, M. A., Stenberg, L. M., Persson, U. & Stenflo, J. Identification and purification of vitamin K-dependent proteins and peptides with monoclonal antibodies specific for γ-carboxyglutamyl (Gla) residues. J. Biol. Chem. 275, 19795–19802 (2000).
Whitlon, D. S., Sadowski, J. A. & Suttie, J. W. Mechanism of coumarin action: significance of vitamin K epoxide reductase inhibition. Biochemistry 17, 1371–1377 (1978).
Rabiet, M. J., Jorgensen, M. J., Furie, B. & Furie, B. C. Effect of propeptide mutations on post-translational processing of factor IX. Evidence that β-hydroxylation and γ-carboxylation are independent events. J. Biol. Chem. 262, 14895–14898 (1987).
Furie, B. & Furie, B. C. Molecular basis of vitamin K-dependent γ-carboxylation. Blood 75, 1753–1762 (1990).
Gillis, S. et al. γ-Carboxyglutamic acids 36 and 40 do not contribute to human factor IX function. Protein Sci. 6, 185–196 (1997).
Stenina, O., Pudota, B. N., McNally, B. A., Hommema, E. L. & Berkner, K. L. Tethered processivity of the vitamin K-dependent carboxylase: factor IX is efficiently modified in a mechanism which distinguishes Gla’s from Glu’s and which accounts for comprehensive carboxylation in vivo. Biochemistry 40, 10301–10309 (2001).
Bristol, J. A., Freedman, S. J., Furie, B. C. & Furie, B. Profactor IX: the propeptide inhibits binding to membrane surfaces and activation by factor XIa. Biochemistry 33, 14136–14143 (1994).
Wolberg, A. S. et al. Characterization of γ-carboxyglutamic acid residue 21 of human factor IX. Biochemistry 35, 10321–10327 (1996).
Wojcik, E. G., Van Den Berg, M., Poort, S. R. & Bertina, R. M. Modification of the N-terminus of human factor IX by defective propeptide cleavage or acetylation results in a destabilized calcium-induced conformation: effects on phospholipid binding and activation by factor XIa. Biochem. J. 323, 629–636 (1997).
Ware, J. et al. Factor IX San Dimas. Substitution of glutamine for Arg-4 in the propeptide leads to incomplete γ-carboxylation and altered phospholipid binding properties. J. Biol. Chem. 264, 11401–11406 (1989).
Liebman, H. A. The metal-dependent conformational changes in factor IX associated with phospholipid binding. Studies using antibodies against a synthetic peptide and chemical modification of factor IX. Eur. J. Biochem. 212, 339–345 (1993).
Jacobs, M., Freedman, S. J., Furie, B. C. & Furie, B. Membrane binding properties of the factor IX gamma-carboxyglutamic acid-rich domain prepared by chemical synthesis. J. Biol. Chem. 269, 25494–25501 (1994).
Agah, S. & Bajaj, S. P. Role of magnesium in factor XIa catalyzed activation of factor IX: calcium binding to factor IX under physiologic magnesium. J. Thromb. Haemost. 7, 1426–1428 (2009).
Westmark, P. R., Tanratana, P. & Sheehan, J. P. Selective disruption of heparin and antithrombin-mediated regulation of human factor IX. J. Thromb. Haemost. 13, 1053–1063 (2015).
Plautz, W. E. et al. Anticoagulant protein S targets the factor IXa heparin-binding exosite to prevent thrombosis. Arterioscler. Thromb. Vasc. Biol. 38, 816–828 (2018).
Cooley, B. et al. Dysfunctional endogenous FIX impairs prophylaxis in a mouse hemophilia B model. Blood 133, 2445–2451 (2019).
Landrum, M. J. et al. ClinVar: public archive of relationships among sequence variation and human phenotype. Nucleic Acids Res. 42, D980–D985 (2014).
Iorio, A. et al. Establishing the prevalence and prevalence at birth of hemophilia in males: a meta-analytic approach using national registries. Ann. Intern. Med. 171, 540–546 (2019).
Ioannidis, N. M. et al. REVEL: an ensemble method for predicting the pathogenicity of rare missense variants. Am. J. Hum. Genet. 99, 877–885 (2016).
Frazer, J. et al. Disease variant prediction with deep generative models of evolutionary data. Nature 599, 91–95 (2021).
Rentzsch, P., Schubach, M., Shendure, J. & Kircher, M. CADD-splice—improving genome-wide variant effect prediction using deep learning-derived splice scores. Genome Med. 13, 31 (2021).
Cheng, J. et al. Accurate proteome-wide missense variant effect prediction with AlphaMissense. Science 381, eadg7492 (2023).
Livesey, B. J. & Marsh, J. A. Updated benchmarking of variant effect predictors using deep mutational scanning. Mol. Syst. Biol. 19, e11474 (2023).
Tavtigian, S. V. et al. Modeling the ACMG/AMP variant classification guidelines as a Bayesian classification framework. Genet. Med. 20, 1054–1060 (2018).
Brnich, S. E. et al. Recommendations for application of the functional evidence PS3/BS3 criterion using the ACMG/AMP sequence variant interpretation framework. Genome Med. 12, 3 (2020).
Fokkema, I. F. A. C. et al. LOVD v.2.0: the next generation in gene variant databases. Hum. Mutat. 32, 557–563 (2011).
Matamala, N. et al. Characterization of novel missense variants of SERPINA1 gene causing alpha-1 antitrypsin deficiency. Am. J. Respir. Cell Mol. Biol. 58, 706–716 (2018).
McVey, J. H. et al. The European Association for Haemophilia and Allied Disorders (EAHAD) Coagulation Factor Variant Databases: important resources for haemostasis clinicians and researchers. Haemophilia 26, 306–313 (2020).
Seixas, S. & Marques, P. I. Known mutations as the cause of alpha-1 antitrypsin deficiency: an updated overview of SERPINA1 variation spectrum. Appl. Clin. Genet. 14, 173–194 (2021).
Gai, S. A. & Wittrup, K. D. Yeast surface display for protein engineering and characterization. Curr. Opin. Struct. Biol. 17, 467–473 (2007).
Salema, V. & Fernández, L. Á. Escherichia coli surface display for the selection of nanobodies. Microb. Biotechnol. 10, 1468–1484 (2017).
Ho, M. & Pastan, I. in Therapeutic Antibodies: Methods and Protocols (ed. Dimitrov, A. S.) 337–352 (Humana Press, 2009).
Frank, F. et al. Deep mutational scanning identifies SARS-CoV-2 nucleocapsid escape mutations of currently available rapid antigen tests. Cell 185, 3603–3616.e13 (2022).
Parthiban, K. et al. A comprehensive search of functional sequence space using large mammalian display libraries created by gene editing. mAbs 11, 884–898 (2019).
Vink, T., Oudshoorn-Dickmann, M., Roza, M., Reitsma, J.-J. & de Jong, R. N. A simple, robust and highly efficient transient expression system for producing antibodies. Methods 65, 5–10 (2014).
do Amaral, R. L. F. et al. Approaches for recombinant human factor IX production in serum-free suspension cultures. Biotechnol. Lett. 38, 385–394 (2016).
Duportet, X. et al. A platform for rapid prototyping of synthetic gene networks in mammalian cells. Nucleic Acids Res. 42, 13440–13451 (2014).
Zhu, F. et al. DICE, an efficient system for iterative genomic editing in human pluripotent stem cells. Nucleic Acids Res. 42, e34 (2014).
Matreyek, K. A., Stephany, J. J. & Fowler, D. M. A platform for functional assessment of large variant libraries in mammalian cells. Nucleic Acids Res. 45, e102 (2017).
Starita, L. M. et al. A multiplex homology-directed DNA repair assay reveals the impact of more than 1,000 BRCA1 missense substitution variants on protein function. Am. J. Hum. Genet. 103, 498–508 (2018).
Hasle, N. et al. High-throughput, microscope-based sorting to dissect cellular heterogeneity. Mol. Syst. Biol. 16, e9442 (2020).
Low, B. E., Hosur, V., Lesbirel, S. & Wiles, M. V. Efficient targeted transgenesis of large donor DNA into multiple mouse genetic backgrounds using bacteriophage Bxb1 integrase. Sci. Rep. 12, 5424 (2022).
Durrant, M. G. et al. Systematic discovery of recombinases for efficient integration of large DNA sequences into the human genome. Nat. Biotechnol. 41, 488–499 (2023).
Zhang, M. et al. SHIELD: a platform for high-throughput screening of barrier-type DNA elements in human cells. Nat. Commun. 14, 5616 (2023).
Aslanzadeh, V. et al. Deep mutational scanning of the human insulin receptor ectodomain to inform precision therapy for insulin resistance. Preprint at bioRxiv https://doi.org/10.1101/2024.09.07.611782 (2024).
Blanch-Asensio, A. et al. STRAIGHT-IN Dual: a platform for dual, single-copy integrations of DNA payloads and gene circuits into human induced pluripotent stem cells. Preprint at bioRxiv https://doi.org/10.1101/2024.10.17.616637 (2024).
Boyle, G. E. et al. Deep mutational scanning of CYP2C19 in human cells reveals a substrate specificity-abundance tradeoff. Genetics 228, iyae156 (2024).
Hew, B. E. et al. Directed evolution of hyperactive integrases for site specific insertion of transgenes. Nucleic Acids Res. 52, e64 (2024).
Hong, C. K. Y. et al. Massively parallel characterization of insulator activity across the genome. Nat. Commun. 15, 8350 (2024).
Huhtinen, O., Prince, S., Lamminmäki, U., Salbo, R. & Kulmala, A. Increased stable integration efficiency in CHO cells through enhanced nuclear localization of Bxb1 serine integrase. BMC Biotechnol. 24, 44 (2024).
Kent, J. D., Klug, L. R. & Heinrich, M. C. A novel human SDHA-knockout cell line model for the functional analysis of clinically relevant SDHA variants. Clin. Cancer Res. 30, 5399–5412 (2024).
Kim, J., Muller, R. Y., Bondra, E. R. & Ingolia, N. T. CRISPRi with barcoded expression reporters dissects regulatory networks in human cells. Preprint at bioRxiv https://doi.org/10.1101/2024.09.06.611573 (2024).
Pandey, S. et al. Efficient site-specific integration of large genes in mammalian cells via continuously evolved recombinases and prime editing. Nat. Biomed. Eng. 9, 22–39 (2025).
Wang, Z., Sarkar, A. & Ge, X. De novo functional discovery of peptide-MHC restricted CARs from recombinase-constructed large-diversity monoclonal T cell libraries. Preprint at bioRxiv https://doi.org/10.1101/2024.11.27.625413 (2024).
Acharya, P., Quinlan, A. & Neumeister, V. The ABCs of finding a good antibody: how to find a good antibody, validate it, and publish meaningful data. F1000Res. 6, 851 (2017).
Baron, M. et al. The three-dimensional structure of the first EGF-like module of human factor IX: comparison with EGF and TGF-α. Protein Sci. 1, 81–90 (1992).
Johnson, D. J. D., Langdown, J. & Huntington, J. A. Molecular basis of factor IXa recognition by heparin-activated antithrombin revealed by a 1.7-Å structure of the ternary complex. Proc. Natl Acad. Sci. USA 107, 645–650 (2010).
UniProt Consortium. UniProt: the universal protein knowledgebase in 2025. Nucleic Acids Res. 53, D609–D617 (2025).
Gibson, D. G. et al. Enzymatic assembly of DNA molecules up to several hundred kilobases. Nat. Methods 6, 343–345 (2009).
García-Nafría, J., Watson, J. F. & Greger, I. H. IVA cloning: a single-tube universal cloning system exploiting bacterial in vivo assembly. Sci. Rep. 6, 27459 (2016).
den Dunnen, J. T. et al. HGVS recommendations for the description of sequence variants: 2016 update. Hum. Mutat. 37, 564–569 (2016).
Miao, H. Z. et al. Bioengineering of coagulation factor VIII for improved secretion. Blood 103, 3412–3419 (2004).
Kessler, C. M. et al. B-domain deleted recombinant factor VIII preparations are bioequivalent to a monoclonal antibody purified plasma-derived factor VIII concentrate: a randomized, three-way crossover study. Haemophilia 11, 84–91 (2005).
Ward, N. J. et al. Codon optimization of human factor VIII cDNAs leads to high-level expression. Blood 117, 798–807 (2011).
Zhang, J., Kobert, K., Flouri, T. & Stamatakis, A. PEAR: a fast and accurate Illumina Paired-End reAd mergeR. Bioinformatics 30, 614–620 (2014).
Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at https://arxiv.org/abs/1303.3997 (2013).
Yeh, C.-L. C., Amorosi, C. J., Showman, S. & Dunham, M. J. PacRAT: a program to improve barcode–variant mapping from PacBio long reads using multiple sequence alignment. Bioinformatics 38, 2927–2929 (2022).
Kulkarni, R. et al. Sites of initial bleeding episodes, mode of delivery and age of diagnosis in babies with haemophilia diagnosed before the age of 2 years: a report from the Centers for Disease Control and Prevention’s (CDC) Universal Data Collection (UDC) project. Haemophilia 15, 1281–1290 (2009).
Majithia, A. R. et al. Prospective functional classification of all possible missense variants in PPARG. Nat. Genet. 48, 1570–1575 (2016).
Menardi, G. & Torelli, N. Training and assessing classification rules with imbalanced data. Data Min. Knowl. Discov. 28, 92–122 (2014).
Esposito, D. et al. MaveDB: an open-source platform to distribute and interpret data from multiplexed assays of variant effect. Genome Biol. 20, 223 (2019).
Edgar, R. C. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5, 113 (2004).

Acknowledgements

We thank A. P. Leith, C. Lee, D. E. Prunkard and A. Silvestroni of the University of Washington (UW) Foege Flow Lab and the UW Pathology Flow Cytometry Core for their assistance with cell analysis, staining and sorting; K. M. Munson of the UW PacBio Sequencing Service for assistance with long-read PacBio sequencing; D. A. Nickerson, L. M. Starita, D. J. Maly, S. Nariya, J. J. Stephany and A. E. McEwen for advice on analyzing data and feedback on the paper. We thank S. W. Pipe and A. Scheller of the University of Michigan Department of Pediatrics and Department of Hematology for providing FVIII constructs and advice on FVIII expression. We thank J. Kulman for discussions about FIX carboxylation. We thank R. Kruse-Jarres for her commitment to supporting research that improves the lives of people living with bleeding disorders. We thank and acknowledge B. A. Konkle, the Principal Investigator of MLOF, the MLOF partners at Bloodworks, the American Thrombosis and Hemostasis Network, the National Hemophilia Foundation (now the National Bleeding Disorders Foundation), funding from Biogen/Bioverativ, providers and staff at HTC sites, and the 11,341 participants who made MLOF a success. This work was supported by the National Heart, Lung and Blood Institute (R01HL152066 to J.M.J. and D.M.F.; F30HL151075 to N.A.P.; R01HL149855 to J.P.S.), the National Human Genome Research Institute (RM1HG010461 and UM1HG011969 to D.M.F.), the National Institute of General Medical Sciences (R01GM109110 to D.M.F.), and the Washington Center for Bleeding Disorders (to J.M.J.). The funders of this work had no role in the study design, data collection, analysis, decision to publish or preparation of this manuscript.

Author information

Authors and Affiliations

Department of Genome Sciences, University of Washington School of Medicine, Seattle, WA, USA
Nicholas A. Popp, Rachel L. Powell, Melinda K. Wheelock, Brendan D. Zapp, Shawn Fayer, Alexis T. Chang & Douglas M. Fowler
Medical Scientist Training Program, University of Washington School of Medicine, Seattle, WA, USA
Nicholas A. Popp
Brotman Baty Institute for Precision Medicine, Seattle, WA, USA
Nicholas A. Popp, Melinda K. Wheelock, Shawn Fayer & Douglas M. Fowler
Division of Hematology and Oncology, Department of Medicine, University of Washington School of Medicine, Seattle, WA, USA
Kristen J. Holmes, Kathryn M. Sheldon, Kerry W. Lannert & Jill M. Johnsen
Center for Cardiovascular Biology, University of Washington School of Medicine, Seattle, WA, USA
Kristen J. Holmes, Kathryn M. Sheldon, Kerry W. Lannert & Jill M. Johnsen
Institute for Stem Cell and Regenerative Medicine, University of Washington, Seattle, WA, USA
Kristen J. Holmes, Kathryn M. Sheldon, Kerry W. Lannert & Jill M. Johnsen
Bloodworks Northwest, Seattle, WA, USA
Shelley N. Fletcher, Xiaoping Wu & Jill M. Johnsen
Cell Marker Laboratory, Seattle Children’s Hospital, Seattle, WA, USA
Xiaoping Wu
Bioinformatics Division, WEHI, Parkville, Victoria, Australia
Alan F. Rubin
Department of Medical Biology, University of Melbourne, Melbourne, Victoria, Australia
Alan F. Rubin
Division of Hematology, Medical Oncology, and Palliative Care, Department of Medicine, University of Wisconsin School of Medicine and Public Health, Madison, WI, USA
John P. Sheehan
University of Wisconsin Comprehensive Bleeding Disorders Program, Madison, WI, USA
John P. Sheehan
Washington Center for Bleeding Disorders, Seattle, WA, USA
Jill M. Johnsen
Department of Bioengineering, University of Washington School of Medicine, Seattle, WA, USA
Douglas M. Fowler

Contributions

N.A.P., K.W.L., J.P.S., J.M.J. and D.M.F. conceived of the work. N.A.P., J.P.S., J.M.J. and D.M.F. wrote the paper. N.A.P., R.L.P., M.K.W., B.D.Z., K.J.H., K.M.S., X.W., K.W.L. and A.T.C. performed experiments. N.A.P. performed the statistical and computational analysis. N.A.P., S.N.F. and J.M.J. manually curated ClinVar, gnomAD and MLOF for variants. N.A.P. and A.F.R. prepared results for distribution. N.A.P., S.N.F., S.F. and J.M.J. performed variant reinterpretation. All authors approved the final paper.

Corresponding authors

Correspondence to Jill M. Johnsen or Douglas M. Fowler.

Ethics declarations

Competing interests

J.P.S. was an expert witness for Genentech and Paul, Weiss, Rifkind, Wharton and Garrison. The other authors declare no competing interests.

Peer review

Peer review information

Nature Structural & Molecular Biology thanks Giorgio Galli and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available. Primary Handling Editor: Dimitris Typas, in collaboration with the Nature Structural & Molecular Biology team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 MultiSTEP is based on a flexible genomically integrated approach for expressing secreted protein variants.

a, Cartoon depicting integration of a MultiSTEP plasmid construct into a genomically integrated landing pad cassette¹⁶. (Top): Lentivirally integrated landing pad cassette expressing mTagBFP2+ (royal blue) from a tetON inducible promoter. mTagBFP2 is fused to an inducible caspase-9 (iCasp9, orange) and a blasticidin resistance gene (dark yellow) with 2A sequences (dark pink). mTagBFP2-2A-iCasp9-2A-BlastR is expressed from a tetON inducible promoter with an attP serine recombinase recognition site (black) in between. Downstream are a terminator sequence (Term, brown) and tet repressor (tetR, salmon). Bxb1 serine recombinase, expressed from another plasmid, is shown in gray. (Middle): MultiSTEP plasmid construct. Secreted protein coding sequence (Sec Pro, pink) is C-terminally fused to flexible linkers (L1 and L2, teal), Strep II tag (St, green), and CD28 transmembrane domain (TM, medium blue). IRES (purple) drives co-transcription of mCherry (red). Upstream are an attB serine recombinase recognition sequence (goldenrod) and a unique 18-nucleotide degenerate barcode (BC, light yellow). (Bottom): Landing pad following plasmid integration. attP and attB sequences have been recombined, forming attL and attR sequences. b, Sequential flow cytometry gating scheme for detecting and isolating landing pad cells with an integrated MultiSTEP construct. Dot pseudocolor indicates density of cells. FSC: Forward scatter; SSC: side scatter. c, Comparison of negative control 293-F cells (top) with 293-F cells incubated with lentivirus encoding the landing pad cassette (bottom, n > 10,000 cells). d, Comparison of unrecombined landing pad cells (top) with cells transfected with a MultiSTEP plasmid encoding WT FIX (bottom, n > 10,000 cells). e, Comparison of cells transfected with a MultiSTEP construct encoding WT FIX treated with doxycycline (top) or doxycycline and 10 nM AP1903 (bottom, n > 10,000 cells). f, Design iterations of MultiSTEP construct plasmid in (a).
L1-Strep MultiSTEP construct does not contain an L2 linker. Flow cytometry of MultiSTEP constructs using an anti-Strep II tag antibody (n ~ 30,000 cells).

Extended Data Fig. 2 A flexible tag-based approach to assessing variant effects on secretion.

a, Heatmap showing strep tag secretion scores for missense FIX variants. Color indicates MultiSTEP score from 0 (blue, lowest 5% of scores) to white (1, WT) to red (increased). Black dots indicate the WT amino acid. Missing data are gray. b, Density distributions of strep tag secretion scores for FIX missense variants (orange) and synonymous variants (blue). Dashed line denotes the 5th percentile of the synonymous variant distribution. c, Scatter plot comparing MultiSTEP-derived strep tag secretion scores for seven different FIX variants (p.C28Y, p.A37T, p.G58E, p.E67K, p.C134R, p.S220T, and p.H267L), WT, and an unrecombined negative control to the geometric mean of Alexa Fluor-647 fluorescence measured using flow cytometry individually (n = 3 replicates). Error bars show standard error of the mean. d-e, Scatter plots of median MultiSTEP-derived strep tag secretion scores and heavy chain (d) or light chain (e) secretion scores at each position in FIX (n = 3 replicates). Points are colored by chain architecture, using the same color scheme as Fig. 2a. Black dashed line indicates the line of perfect correlation between secretion scores. Gray background indicates <0.3 point deviation from perfect correlation. f, Density plots of MultiSTEP-derived synonymous variant scores generated with the indicated antibody. The dashed vertical line shows WT score.
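The 5th-percentile cutoff on the synonymous score distribution described in panel b can be illustrated with a minimal Python sketch. The function names, the use of `statistics.quantiles` with the inclusive method, and the example scores are all assumptions made for illustration; they are not the authors' analysis code or data.

```python
from statistics import quantiles


def synonymous_threshold(syn_scores, percentile=5):
    """Return the requested percentile of a synonymous score distribution."""
    # quantiles(..., n=100) returns the 1st through 99th percentile cut points.
    cuts = quantiles(syn_scores, n=100, method="inclusive")
    return cuts[percentile - 1]


def is_low_scoring(score, threshold):
    """A variant is called low if its score falls below the threshold."""
    return score < threshold


# Hypothetical synonymous scores spread around the WT score of 1.
syn = [0.80 + 0.01 * i for i in range(41)]
thr = synonymous_threshold(syn)

print(f"threshold = {thr:.3f}")
print(is_low_scoring(0.50, thr), is_low_scoring(1.00, thr))
```

Different percentile interpolation conventions give slightly different cutoffs on small samples, so the method used here is only one reasonable choice.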

Extended Data Fig. 3 MultiSTEP-derived FIX secretion scores correlate with orthologous measures of FIX secretion.

a, Flow cytometry of p.C28Y and WT controls (n = 10,000 cells) with the FIX library (n = 100,000 cells). b, Comparison of ELISA measurements of eight untethered FIX missense variants (p.C28Y, p.A37T, p.G58E, p.E67K, p.G125V, p.C134R, p.S220T, and p.H267L) expressed from 293-F cells and heavy chain secretion scores (n = 3 replicates). Error bars show the standard error of the mean. Pearson’s correlation coefficient is shown. c, Scatter plot comparing MultiSTEP-derived heavy chain secretion scores for 20 different FIX missense variants, WT, and unrecombined negative control (n = 3 replicates) to the geometric mean of Alexa Fluor-647 fluorescence measured using flow cytometry on cells expressing each variant individually. Error bars show standard error of the mean (n = 10,000 cells). Line of best fit (dashed) and Pearson’s correlation coefficient are shown.

Extended Data Fig. 4 Variants near antibody epitopes demonstrate minor effects on secretion scores.

a, Scatter plot of the difference in heavy chain and light chain secretion scores and the distance in angstroms between all α-carbons in the light chain and the nearest light chain epitope α-carbon using the AlphaFold2 model of mature, two-chain FIX. Low-confidence positions with predicted local distance difference test score (pLDDT) of <70 were removed from analysis. Color indicates whether a position was identified in the light chain epitope in Fig. 2h. Horizontal dashed line indicates no difference in secretion scores. Vertical dashed line indicates boundary of likely epitope-adjacent effects on secretion scores by changepoint analysis (9.15 angstroms). b, Scatter plot of the difference in heavy chain and light chain secretion scores and the distance in angstroms between all α-carbons in the heavy chain and the nearest heavy chain epitope α-carbon using the AlphaFold2 model of mature, two-chain FIX. Low-confidence positions with pLDDT of <70 were removed from analysis. Color indicates whether a position was identified in the heavy chain epitope in Fig. 2h. Horizontal dashed line indicates no difference in secretion scores. Vertical dashed line indicates boundary of likely epitope-adjacent effects on secretion scores by changepoint analysis (5.71 angstroms). c, Scatter plot of median MultiSTEP-derived heavy chain and light chain secretion scores at each position in FIX. Points are colored by epitope (Fig. 2h) or epitope-adjacent position as in (a) and (b). Black dashed line indicates the line of perfect correlation between secretion scores. Gray background indicates <0.3 point deviation from perfect correlation.

Extended Data Fig. 5 Effect of missense FIX variation on secretion compared to missense variant effects on abundance in cytosolic or transmembrane proteins.

a, Box plots of the 25th, 50th, and 75th percentiles of secretion (FIX, MultiSTEP) or abundance (all others, VAMP-seq) scores for all nonsynonymous variants across all positions with the indicated WT amino acid for six different proteins^21,22,23,33 (n = 29,287 variants). Whiskers span the range of data. b, Box plots of the 25th, 50th, and 75th percentiles of secretion (FIX, MultiSTEP) or abundance (all others, VAMP-seq) for all nonsynonymous variant amino acid substitutions across all positions for six different proteins (n = 29,287 variants).

Extended Data Fig. 6 Sequence conservation strongly influences the effect of variation on FIX secretion.

a, Comparison of light chain secretion scores with Consurf conservation grades (1: least conserved, 9: most conserved)³⁴. Violin plot shows distribution of points (n = 8,528 variants) with an inset box plot representing the 25th, 50th, and 75th percentiles. Whiskers span the range of data. Dashed horizontal line is the 5th percentile of the synonymous secretion score distribution. b, Comparison of median light chain secretion scores (n = 8,528 variants) with Consurf conservation grades. Violin plot shows distribution of points with an inset box plot representing the 25th, 50th, and 75th percentiles. Whiskers span the range of data. Dashed horizontal line is the 5th percentile of the synonymous secretion score distribution.

Extended Data Fig. 7 Carboxylation-sensitive antibodies identify functional motifs.

a, Multiple sequence alignment of Gla domain-containing proteins (UniProt) that bind the carboxylation-sensitive Gla-motif (ExxxExC) antibody using MUSCLE^96,110. Antibody epitopes for both the carboxylation-sensitive FIX-specific antibody (ω-loop) and the carboxylation-sensitive Gla-motif antibody are shown. hF9, human coagulation factor IX (P00740); hF2, human prothrombin (coagulation factor II, P00734); hF7, human coagulation factor VII (P08709); hF10, human coagulation factor X (P00742); hPC, human protein C (P04070); hPS, human protein S (P07225); hBGP, human osteocalcin (P02818); bBGP, bovine osteocalcin (P02820); hGAS6, human growth arrest-specific protein 6 (P14393); ppVPA, Pseudechis porphyriacus venom prothrombin activator porpharin-D (P58L93); nsVPA, Notechis scutatus venom prothrombin activator notecarin-D1 (P82807); osVPA, Oxyuranus scutellatus venom prothrombin activator oscutarin-C (P58L96). b-c, Fluorescence of unrecombined negative control and WT FIX-expressing cells with and without warfarin pretreatment generated by staining cells with a carboxylation-sensitive FIX-specific (b) or carboxylation-sensitive Gla-motif antibody (c). d-f, Heatmaps showing carboxylation-sensitive FIX-specific carboxylation scores (d), carboxylation-sensitive Gla-motif carboxylation scores (e), or light chain secretion scores (f) for FIX propeptide variants. Furin cleavage site (Furin CS), ω-loop, ExxxExC motif, and aromatic stack (AS) are annotated above (d). Heatmap color indicates antibody score from 0 (blue, lowest 5% of scores) to white (1, WT) to red (increased). Black dots indicate the WT amino acid. Missing data are gray. g-i, Heatmaps showing carboxylation-sensitive FIX-specific carboxylation scores (g), carboxylation-sensitive Gla-motif carboxylation scores (h), or light chain secretion scores (i) for FIX Gla domain variants. Furin cleavage site (Furin CS) is annotated above (d). ω-loop, ExxxExC motif, and aromatic stack (AS) are annotated above (g).
Heatmap color indicates antibody score from 0 (blue, lowest 5% of scores) to white (1, WT) to red (increased). Black dots indicate the WT amino acid. Missing data are gray.

Extended Data Fig. 8 Clinical correlates of secretion and gamma-carboxylation scores map to FIX biochemical features.

a, Scatter plot of the mean and standard error of light chain secretion scores (n = 2 replicates) and FIX plasma antigen from individuals with hemophilia B in the EAHAD database (n = 416 variants). Light chain epitope-adjacent positions identified in Extended Data Fig. 4a are removed (n = 19 variants across 38 individuals)¹¹. Dashed horizontal line is 40% FIX plasma antigen. Dashed vertical line is the 5th percentile of the synonymous secretion score distribution. b, Comparison of hemophilia B severity from individuals with hemophilia B in the EAHAD database (n = 1,781 variants) with light chain secretion scores. Light chain epitope-adjacent positions identified in Extended Data Fig. 4a are removed (n = 40 variants). Violin plot shows distribution of points with an inset box plot representing the 25th, 50th, and 75th percentiles. Whiskers span the range of data. Dashed horizontal line is the 5th percentile of the synonymous secretion score distribution. p values from a Kruskal–Wallis test adjusted for multiple comparisons by post-hoc Dunn’s test are shown. c, Scatter plot of the mean and standard error of light chain secretion scores (n = 2 replicates) and FIX plasma antigen from individuals harboring gain-of-cysteine variants in the EAHAD database (n = 9 variants across 27 individuals)¹¹. Dashed horizontal line is 40% FIX plasma antigen. Dashed vertical line is the 5th percentile of the synonymous secretion score distribution. d, Bar plot of hemophilia B disease severity in the EAHAD database for individuals harboring gain-of-cysteine variants. e, Bar plot of the number of FIX variants in the EAHAD database and their classification using the random forest model trained on MultiSTEP functional data, by disease severity. Color indicates model prediction. f, Bar plot of the number of FIX propeptide and Gla domain variants in the EAHAD database and their classification using the random forest model trained on MultiSTEP functional data, by disease severity. 
Color indicates model prediction.

Extended Data Fig. 9 Random forest model predictions for FIX variants in the EAHAD FIX Variant Database associated with hemophilia B.

a, Spearman correlation of MultiSTEP functional scores with EVE, AlphaMissense, REVEL, and CADD variant effect predictors. b, Histograms of four variant effect predictor scores for F9 missense variants of known effect curated from ClinVar, gnomAD, and MLOF. Color indicates clinical variant interpretation. Data from four variant effect predictors are shown. Black dashed vertical lines indicate the thresholds for each predictor. For AlphaMissense, we used the thresholds recommended in the original publication for 90% precision on existing ClinVar annotated variants (≤0.34: benign, 0.34–0.564: uncertain, ≥0.564: pathogenic). For REVEL, we used the thresholds used in the initial publication to assess REVEL’s precision in ClinVar (<0.5: benign, 0.5: uncertain, >0.5: pathogenic). For EVE, we used the thresholds recommended in the original publication for the 75% most confident classifications (≤0.359: benign, 0.359–0.641: uncertain, ≥0.641: pathogenic). For CADD, we used the same thresholds used in the MLOF clinical laboratory (<10: benign, 10–20: uncertain, >20: pathogenic). Number of variants scored by each predictor is annotated. c, Classification accuracy for F9 missense variants of known effect curated from ClinVar, gnomAD, and MLOF in our test set (benign/likely benign, n = 4 variants; pathogenic/likely pathogenic, n = 34 variants) by MultiSTEP variant function classifier and the four variant effect predictors using thresholds defined in (b). True benign/likely benign and pathogenic/likely pathogenic labels are denoted on the x-axis, and columns are colored relative to the classification for each method. Solid colors indicate correct classification, whereas striped colors indicate incorrect classification. For variant effect predictors, missing variants are colored gray with stripes and uncertain predictions are colored yellow with stripes. PPV, positive predictive value; NPV, negative predictive value; Spec, specificity; Sens, sensitivity.
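The predictor thresholds listed in panel b and the summary statistics named in panel c can be written out as a short Python sketch. The thresholds are taken from the caption; the function names, the handling of missing scores, and the confusion-matrix counts in the example are hypothetical and do not reproduce the study's results.

```python
from typing import Optional

# (benign cutoff, pathogenic cutoff) for each predictor, as stated in the caption.
THRESHOLDS = {
    "AlphaMissense": (0.34, 0.564),
    "EVE": (0.359, 0.641),
    "REVEL": (0.5, 0.5),
    "CADD": (10.0, 20.0),
}
# Predictors whose caption thresholds include the boundary value (<= and >=).
INCLUSIVE = {"AlphaMissense", "EVE"}


def classify(predictor: str, score: Optional[float]) -> str:
    """Map a predictor score to benign, uncertain, pathogenic, or missing."""
    if score is None:
        return "missing"
    low, high = THRESHOLDS[predictor]
    if predictor in INCLUSIVE:
        if score <= low:
            return "benign"
        if score >= high:
            return "pathogenic"
    else:
        if score < low:
            return "benign"
        if score > high:
            return "pathogenic"
    return "uncertain"


def summary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """PPV, NPV, sensitivity, and specificity, treating pathogenic as positive."""
    def ratio(num, den):
        return num / den if den else None  # undefined when the denominator is 0

    return {
        "PPV": ratio(tp, tp + fp),
        "NPV": ratio(tn, tn + fn),
        "Sens": ratio(tp, tp + fn),
        "Spec": ratio(tn, tn + fp),
    }


print(classify("REVEL", 0.5), classify("CADD", 25.0))
# Hypothetical counts, chosen only to exercise the function.
metrics = summary_metrics(tp=8, fp=2, tn=6, fn=4)
print(metrics)
```

How uncertain and missing predictions enter the confusion matrix is a separate design decision that this sketch leaves to the caller.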

Extended Data Fig. 10 Detection of cell-surface displayed FVIII.

Experimental flow cytometry of B-domain deleted coagulation factor VIII (FVIII) in the MultiSTEP backbone (n = ~30,000 cells per variant). Unrecombined cells (NC) do not display FVIII and serve as a negative control. Fluorescent signal was generated by staining cells with anti-FVIII antibodies specific to each of the five FVIII domains in the heavy chain [A1 (a) and A2 (b)] and light chain [A3 (c), C1 (d), and C2 (e)].

Supplementary information

Supplementary Information

Supplementary Figs. 1 and 2.

Reporting Summary

Peer Review File

Supplementary Table 1

Library and assay statistics. Number of FIX variants assessed at various stages of library preparation and in each MultiSTEP assay.

Supplementary Table 2

Variants with discordant circulating antigen and secretion scores. Variants with discordant secretion scores and FIX plasma antigen from individuals with hemophilia B in the EAHAD database¹¹. Variants with undetectable FIX antigen are labeled as <1%, as reported by the clinical laboratory in EAHAD. Variants are classified as low antigen if they have a mean circulating FIX antigen of <40%. Variants are classified as low secretion if they have a mean secretion score that is less than 0.795, which is the 5th percentile of the synonymous secretion score distribution. SE: standard error of the mean.
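The two cutoffs in this legend (mean antigen <40% and mean secretion score <0.795) can be applied as in the following Python sketch. Two points are assumptions rather than statements from the source: a variant is treated as discordant when exactly one of the two calls is low, and an entry of "<1%" is read as an upper bound of 1%. All names and example values are hypothetical.

```python
LOW_ANTIGEN_PCT = 40.0       # mean circulating FIX antigen cutoff (%)
LOW_SECRETION_SCORE = 0.795  # 5th percentile of synonymous secretion scores


def parse_antigen(value):
    """Convert an antigen entry to a percentage.

    Entries such as '<1%' are read as their upper bound (assumption).
    """
    if isinstance(value, str):
        text = value.strip().rstrip("%")
        if text.startswith("<"):
            return float(text[1:])
        return float(text)
    return float(value)


def flag_variant(mean_antigen, mean_secretion_score):
    """Return the low-antigen and low-secretion calls and whether they disagree."""
    low_antigen = parse_antigen(mean_antigen) < LOW_ANTIGEN_PCT
    low_secretion = mean_secretion_score < LOW_SECRETION_SCORE
    return {
        "low_antigen": low_antigen,
        "low_secretion": low_secretion,
        "discordant": low_antigen != low_secretion,
    }


# Hypothetical entries: (antigen, secretion score).
for antigen, score in [("<1%", 1.02), (85, 0.95), (12, 0.30)]:
    print(antigen, score, flag_variant(antigen, score))
```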

Supplementary Table 3

Random forest model classifications for 8,964 F9 variants. Mean secretion scores, γ-carboxylation scores and functional variant classifications made using the random forest classifier we trained on known pathogenic or likely pathogenic and benign or likely benign F9 missense variants (see Methods). Variants without functional scores for all antibodies were removed before classifier implementation and do not have associated functional predictions.

Supplementary Table 4

Classification criteria and reclassification results for My Life, Our Future F9 variants. Classification criteria, random forest classifier predictions and resultant pathogenicity classifications for 214 F9 variants from My Life, Our Future¹². Classification criteria (columns BP1 through PVS1 as defined by the American College of Medical Genetics and Genomics²) were used in variant curation by clinical experts in hemophilia genetics based on available clinical data, databases and literature review. PVS1, null variant (nonsense, frameshift, canonical ±1 or 2 splice sites, initiation codon, single or multi-exon deletion) in a gene where loss of function is a known mechanism of disease. PS1, same amino acid change as a previously established pathogenic variant regardless of nucleotide change. PS2, de novo (both maternity and paternity confirmed) in a patient with the disease and no family history. PS3, well-established in vitro or in vivo functional studies supportive of a damaging effect on the gene or gene product. PS4, the prevalence of the variant in affected individuals is significantly increased compared with the prevalence in controls. PM1, located in a mutational hot spot and/or critical and well-established functional domain (for example, active site of an enzyme) without benign variation. PM2, absent from controls (or at extremely low frequency if recessive) in population databases. PM3, for recessive disorders, detected in trans with a pathogenic variant. PM4, protein length changes as a result of in-frame deletions and insertions in a non-repeat region or stop-loss variants. PM5, novel missense change at an amino acid residue where a different missense change determined to be pathogenic has been seen before. PM6, assumed de novo, but without confirmation of paternity and maternity. PP1, cosegregation with disease in multiple affected family members in a gene definitively known to cause the disease. 
PP2, missense variant in a gene that has a low rate of benign missense variation and in which missense variants are a common mechanism of disease. PP3, multiple lines of computational evidence support a deleterious effect on the gene or gene product (conservation, evolutionary, splicing impact and so on). PP4, patient’s phenotype or family history is highly specific for a disease with a single genetic etiology. PP5, reputable source recently reports variant as pathogenic, but the evidence is not available to the laboratory to perform an independent evaluation. BA1, allele frequency is >5% in population databases. BS1, allele frequency is greater than expected for disorder. BS2, observed in a healthy adult individual for a recessive (homozygous), dominant (heterozygous) or X-linked (hemizygous) disorder, with full penetrance expected at an early age. BS3, well-established in vitro or in vivo functional studies show no damaging effect on protein function or splicing. BS4, lack of segregation in affected members of a family. BP1, missense variant in a gene for which primarily truncating variants are known to cause disease. BP2, observed in trans with a pathogenic variant for a fully penetrant dominant gene/disorder or observed in cis with a pathogenic variant in any inheritance pattern. BP3, in-frame deletions and insertions in a repetitive region without a known function. BP4, multiple lines of computational evidence suggest no impact on gene or gene product (conservation, evolutionary, splicing impact and so on). BP5, variant found in a case with an alternate molecular basis for disease. BP6, reputable source recently reports variant as benign, but the evidence is not available to the laboratory to perform an independent evaluation. BP7, a synonymous (silent) variant for which splicing prediction algorithms predict no impact to the splice consensus sequence or the creation of a new splice site and the nucleotide is not highly conserved. 
Original interpretations as well as reinterpretations using random forest classifier predictions as either moderate or strong evidence are included.

Supplementary Table 5

Oligonucleotides. Oligonucleotides used in this study.

Supplementary Table 6

FIX variant nomenclature. HGVS, legacy and chymotrypsin numbering systems for each position in WT FIX. In chymotrypsin numbering, some position values in flexible loops that are not conserved repeat.

Supplementary Table 7

Detailed cloning information. Description of plasmids used in this study and the primers, oligonucleotides, cDNAs, and gene fragments used to clone them. Method used for cloning is labeled.

Supplementary Table 8

Detailed antibody information. Properties, stock concentrations, and experimental conditions for each assayed antibody. Epitopes, when known, are provided.

Supplementary Table 9

Replicate correlations. PCR replicate correlations for each genomic DNA-derived barcode library amplification.

Supplementary Table 10

Curated F9 variants from MLOF, gnomAD and ClinVar used for training variant function classifier. F9 variants of known effect were collected from MLOF, gnomAD, and ClinVar and independently reassessed. MLOF variants with normal FIX activity (FIX:C) are denoted as benign. Variants found in gnomAD 4.0 with a minor allele frequency (MAF) ≥ 0.001 were classified as benign. Only ClinVar variants with assertion criteria (stars > 0) were included.
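The three inclusion rules in this legend can be summarized in a brief Python sketch. It covers only the stated filters (normal FIX:C in MLOF, gnomAD minor allele frequency ≥ 0.001, ClinVar stars > 0) and not the independent reassessment the authors describe. The function names and return conventions are hypothetical.

```python
GNOMAD_BENIGN_MAF = 0.001  # minor allele frequency cutoff stated in the legend


def label_mlof(fix_activity_normal: bool):
    """MLOF variants with normal FIX activity (FIX:C) are denoted benign."""
    # None means this rule assigns no label (assumption for other cases).
    return "benign" if fix_activity_normal else None


def label_gnomad(maf: float):
    """gnomAD variants at or above the MAF cutoff are classified as benign."""
    return "benign" if maf >= GNOMAD_BENIGN_MAF else None


def include_clinvar(review_stars: int) -> bool:
    """Only ClinVar variants with assertion criteria (stars > 0) are included."""
    return review_stars > 0


print(label_mlof(True), label_gnomad(0.002), include_clinvar(0))
```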

Supplementary Table 11

Case data for FIX variants in EAHAD. Variant-level data for FIX variants curated from the EAHAD FIX variant database (accessed 10/9/2023).

Supplementary Table 12

MultiSTEP variant scores. Table of FIX secretion and γ-carboxylation scores for nearly all possible missense and synonymous FIX variants. Scores are derived from each of the five antibodies presented in this work. The number of unique variants scored for each antibody ranges between 8,961 and 8,964 (Strep II tag antibody, 8,961; heavy chain antibody, 8,963; light chain antibody, 8,964; carboxylation-sensitive FIX Gla antibody, 8,964; carboxylation-sensitive Gla-motif antibody, 8,964). There are a total of 44,816 measured secretion and γ-carboxylation variant effects.
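As a quick arithmetic check, the per-antibody counts given in this legend sum to the stated total, and the abstract's synonymous and missense counts sum to the largest per-antibody count. The Python below only restates those numbers; the variable names are illustrative.

```python
# Unique variants scored per antibody, as listed in the legend.
variants_per_antibody = {
    "Strep II tag": 8961,
    "heavy chain": 8963,
    "light chain": 8964,
    "carboxylation-sensitive FIX Gla": 8964,
    "carboxylation-sensitive Gla-motif": 8964,
}

total_effects = sum(variants_per_antibody.values())
print(total_effects)  # 44816

# Counts from the abstract: 436 synonymous and 8,528 missense variants.
synonymous, missense = 436, 8528
print(synonymous + missense)  # 8964
```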

Supplementary Data 1

Statistical source data for Supplementary Figs. 1 and 2.

Source data

Source Data

Statistical source data.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Popp, N.A., Powell, R.L., Wheelock, M.K. et al. Multiplex and multimodal mapping of variant effects in secreted proteins via MultiSTEP. Nat Struct Mol Biol 32, 2099–2111 (2025). https://doi.org/10.1038/s41594-025-01582-w

Received: 10 April 2024
Accepted: 02 May 2025
Published: 13 June 2025
Version of record: 13 June 2025
Issue date: October 2025
DOI: https://doi.org/10.1038/s41594-025-01582-w
