Fig. 3: Models trained using supervised dimensionality reduction not only accurately classify antibody mutants with high and low levels of affinity and specificity but also accurately predict intraclass variability.

A, B Linear discriminant analysis (LDA) models trained on sequence-based features (one-hot encoded sequences represented as binary vectors) classified the affinity and specificity of 4000 antibodies, identified by deep sequencing of the enriched libraries, with high accuracy. C, D The continuous predictions of the LDA models, referred to as projections, are strongly correlated with experimental measurements of (C) relative affinity and (D) non-specific binding for 125 antibodies (Fabs) selected randomly from the sorted libraries. In (C), the antigen (HGFR) concentration was 1 nM, and values are normalized between elotuzumab (value of zero) and wild-type emibetuzumab (value of one). In (D), the non-specific binding reagent was ovalbumin (0.1 mg/mL). In (A–D), the projection (x-axis) values that separated the high and low classes (e.g., high and low affinity) were close to, but not exactly, zero. In (C) and (D), experimental measurements are averages of two or three independent repeats, and independent two-sided t-tests were performed to determine significance.
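To make the term "projection" concrete, the sketch below fits a two-class Fisher LDA to one-hot encoded sequences and returns each sequence's continuous projection. It is a minimal illustration under stated assumptions, not the authors' implementation. The helper names (`one_hot`, `fit_lda`, `normalize`), the ridge term (added because one-hot features make the within-class scatter matrix singular), and the choice to shift the boundary to zero at the midpoint between class means are all hypothetical. The `normalize` helper assumes that the scaling in (C) is a simple linear map between the two reference antibodies.

```python
# Hypothetical sketch: regularized two-class Fisher LDA on one-hot sequences,
# in pure Python. Not the published pipeline; names and ridge are assumptions.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def one_hot(seq):
    """Encode a sequence as a flat binary vector of length 20 * len(seq)."""
    vec = []
    for residue in seq:
        vec.extend(1.0 if residue == aa else 0.0 for aa in AMINO_ACIDS)
    return vec


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def column_mean(rows):
    return [sum(col) / len(rows) for col in zip(*rows)]


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_lda(high_seqs, low_seqs, ridge=1e-3):
    """Return a function mapping an equal-length sequence to its LDA projection."""
    x_high = [one_hot(s) for s in high_seqs]
    x_low = [one_hot(s) for s in low_seqs]
    mu_high, mu_low = column_mean(x_high), column_mean(x_low)
    d = len(mu_high)
    # Within-class scatter S_w plus ridge * I (assumed regularization).
    s_w = [[ridge if i == j else 0.0 for j in range(d)] for i in range(d)]
    for rows, mu in ((x_high, mu_high), (x_low, mu_low)):
        for x in rows:
            diff = [xi - mi for xi, mi in zip(x, mu)]
            for i in range(d):
                if diff[i] != 0.0:
                    for j in range(d):
                        s_w[i][j] += diff[i] * diff[j]
    # Fisher direction: w proportional to S_w^-1 (mu_high - mu_low).
    w = solve(s_w, [h - l for h, l in zip(mu_high, mu_low)])
    # Shift so the midpoint between the class means projects to zero.
    offset = dot(w, [(h + l) / 2 for h, l in zip(mu_high, mu_low)])
    return lambda seq: dot(w, one_hot(seq)) - offset


def normalize(value, value_zero, value_one):
    """Linearly rescale so value_zero maps to 0 and value_one maps to 1."""
    return (value - value_zero) / (value_one - value_zero)


project = fit_lda(["ACD", "ACE", "ACF"], ["WCD", "WCE", "WCF"])
print(project("ACG") > 0, project("WCG") < 0)  # True True
```

In this toy example only the first position separates the classes, so the fitted weights concentrate there and a new sequence's sign indicates its predicted class. Because the offset here is fixed at the class-mean midpoint, the boundary sits at exactly zero by construction; the caption notes that in the actual models the separating values were close to, but not exactly, zero.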