Extended Data Fig. 1: Performance on Vκ and Vλ sequence classification.
From: Assessing antibody and nanobody nativeness for hit selection and humanization with AbNatiV

(a, d) Distributions of the AbNatiV humanness score for the Human Test (purple), Human Diverse >2.5% (red), Rhesus (green), PSSM-generated (blue), and Mouse (orange) antibody datasets, for Vκ (a) and Vλ (d). The PSSM-generated dataset consists of artificial sequences randomly generated from the positional residue frequencies of the PSSM of the Human Test dataset. The Human Diverse >2.5% dataset consists of sequences from the Test and BioPhi datasets that differ by more than 2.5% in sequence identity from their closest sequence in the corresponding Training set (see Methods). Each dataset contains 10,000 sequences, except Human Diverse >2.5%, which contains 10,490 sequences for Vκ and 10,459 for Vλ. (b, c) Precision-recall (PR) curves showing the ability of AbNatiV to distinguish the Vκ Human Test set (b) or the Vκ Human Diverse >2.5% set (c) from the other datasets (the legend also reports the area under each curve). (e, f) The same PR plots for the Vλ model. The corresponding ROC curves are given in Supplementary Fig. 6c–f. The baseline (dashed line) corresponds to the performance that a random classifier would achieve against the Mouse dataset.
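
As an illustration of two ideas in this caption, the following is a minimal sketch, not the authors' implementation. It assumes that PSSM-based generation amounts to drawing each aligned position independently from its observed residue frequencies, and that the dashed PR baseline is the fraction of positive examples, which is the expected precision of a random classifier. The function names and the toy alignment are hypothetical, and alignment gaps are treated like any other symbol.

```python
import random
from collections import Counter


def positional_frequencies(aligned_seqs):
    """Return one {residue: frequency} dict per aligned position."""
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("sequences must be aligned to the same length")
    freqs = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in aligned_seqs)
        total = sum(counts.values())
        freqs.append({res: n / total for res, n in counts.items()})
    return freqs


def sample_sequences(freqs, n, seed=0):
    """Draw n artificial sequences, sampling each position independently."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    sequences = []
    for _ in range(n):
        residues = [
            rng.choices(list(col), weights=list(col.values()))[0]
            for col in freqs
        ]
        sequences.append("".join(residues))
    return sequences


def pr_baseline(n_positive, n_negative):
    """Expected precision of a random classifier: the positive fraction."""
    return n_positive / (n_positive + n_negative)


# Toy alignment (hypothetical, for illustration only).
toy_alignment = ["DIQM", "DIVM", "EIVL"]
toy_freqs = positional_frequencies(toy_alignment)
toy_samples = sample_sequences(toy_freqs, n=5)

# Dataset sizes taken from the caption: 10,000 Human Test vs 10,000 Mouse.
baseline_test_vs_mouse = pr_baseline(10_000, 10_000)
# Vκ Human Diverse >2.5% (10,490 sequences) vs 10,000 Mouse.
baseline_diverse_vs_mouse = pr_baseline(10_490, 10_000)
```

Under these assumptions, the baseline is 0.5 when both classes contain 10,000 sequences and slightly higher (about 0.51) for the Vκ Human Diverse >2.5% set, because it has marginally more positives than the Mouse set has negatives.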