Fig. 2: Screening processes of efficient, precise and compact ABE variants.
From: Protein-nucleic acid language model-assisted design of precise and compact adenine base editor

a Schematic of applying the PNLM to engineer ABE variants with precise adenine editing. Following sequence generation with the PNLM, truncated variants were selected based on protein sequence alignment. The candidate proteins were then screened computationally, and the top 20 ranked sequences were chosen for experimental validation. b Validation loss during fine-tuning. Incorporating nucleic acid embeddings during fine-tuning lowered the validation loss. c Distribution of the 1 + log-likelihood estimates from the sequence-based evaluation method ESM-1v for PNLM-generated sequences, compared with those for ProGen2-generated and ProtGPT2-generated sequences. Each method generated 50 sequences. The violin plots on the right represent probability density, with internal boxplots showing the median and interquartile range. The scatter plots on the left display the raw data points (n = 50 independent experiments). d A-to-G and C-to-A/T/G editing efficiencies of the top 20 truncated ABE8e variants at an endogenous genomic site (ABE site27) containing multiple adenosines and cytidines within the editing window in HEK293T cells, with ABE8e and ABE9 serving as controls. Data are mean ± s.d. (n = 3 independent experiments). e Editing efficiency of combinations of truncated variants without the XTEN linker at ABE site27 in HEK293T cells, with ABE8e and ABE9 serving as controls. Data are mean ± s.d. (n = 3 independent experiments). Source data are provided as a Source data file.
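For readers working from the Source data file, the summary statistics named in this caption (mean ± s.d. for panels d and e, median and interquartile range for the boxplots in panel c) can be reproduced with a minimal sketch like the one below. The function names are hypothetical, the numbers are placeholders rather than values from the figure, and the choice of the sample standard deviation (n − 1 denominator) and of the inclusive quartile method are assumptions; the caption does not state which conventions the authors used.

```python
import statistics


def mean_sd(values):
    """Return (mean, sample standard deviation) of replicate measurements.

    Assumption: s.d. is the sample standard deviation (n - 1 denominator).
    """
    return statistics.mean(values), statistics.stdev(values)


def median_iqr(values):
    """Return (median, (Q1, Q3)) as a boxplot would summarize the data.

    Assumption: quartiles use the 'inclusive' method; plotting libraries
    may use a slightly different interpolation rule.
    """
    q1, med, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return med, (q1, q3)


if __name__ == "__main__":
    # Placeholder editing efficiencies (%) for three replicates; not real data.
    replicates = [40.0, 50.0, 60.0]
    m, sd = mean_sd(replicates)
    print(f"{m:.1f} ± {sd:.1f} (n = {len(replicates)})")

    # Placeholder scores standing in for per-sequence likelihood estimates.
    scores = [1.0, 2.0, 3.0, 4.0, 5.0]
    med, (q1, q3) = median_iqr(scores)
    print(f"median = {med}, IQR = [{q1}, {q3}]")
```

With only three replicates per condition, the s.d. is sensitive to a single outlying value, so the individual data points in the Source data file remain the more informative view.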