Fig. 1: The construction of the protein-nucleic acid constrained language model. | Nature Communications

From: Protein-nucleic acid language model-assisted design of precise and compact adenine base editor

a Transfer learning. Pre-trained protein language models leverage large-scale protein sequence datasets to learn the relationships and patterns within amino acid sequences, capturing the underlying grammar and structure of proteins. ESM-2 generates embeddings for the collected tRNA-specific adenosine deaminase protein sequences, and the PNLM embeddings are aligned with them. b The pre-trained language models with expertise were fine-tuned on the collected TadA-8e-like protein sequences and their target ssDNA sequences. c During the autoregressive process, masks can be retained in the output to generate sequences containing masks, enabling the creation of truncated, mutated, and inserted sequences.
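The embedding alignment described in panel a can be sketched as minimizing a distance between the PNLM embeddings and frozen ESM-2 target embeddings. The function name and the mean-squared loss below are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def alignment_loss(pnlm_emb, esm2_emb):
    """Mean-squared distance between PNLM and frozen ESM-2 embeddings.

    Both inputs are (n_sequences, dim) arrays; the ESM-2 embeddings act
    as fixed targets that the PNLM representations are pulled towards
    during training. (Hypothetical sketch, not the published code.)
    """
    pnlm_emb = np.asarray(pnlm_emb, dtype=float)
    esm2_emb = np.asarray(esm2_emb, dtype=float)
    return float(np.mean((pnlm_emb - esm2_emb) ** 2))

rng = np.random.default_rng(0)
esm2 = rng.normal(size=(4, 8))               # stand-in for ESM-2 embeddings
pnlm = esm2 + 0.1 * rng.normal(size=(4, 8))  # nearly aligned PNLM embeddings
loss = alignment_loss(pnlm, esm2)
```

In practice this loss term would be added to the language-modeling objective so that the PNLM's representation space stays close to the knowledge captured by ESM-2.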
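The mask-retaining generation of panel c can be illustrated with a toy decoding step in which selected positions are emitted as mask tokens rather than residues, so the output sequence itself carries masks marking truncation, mutation, or insertion sites. The `MASK` token and `decode_with_masks` helper are hypothetical names for illustration only:

```python
MASK = "<mask>"

def decode_with_masks(template, mask_positions):
    """Toy decoding pass that keeps masks in the generated output.

    `template` is the list of residues produced so far; positions listed
    in `mask_positions` are emitted as MASK tokens instead of residues,
    so the generated sequence still contains masks (e.g. marking a
    truncation, mutation, or insertion site for downstream editing).
    """
    out = []
    for i, residue in enumerate(template):
        out.append(MASK if i in mask_positions else residue)
    return out

seq = list("MSEVEF")            # toy residue sequence
masked = decode_with_masks(seq, {2, 3})
```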