Fig. 1: ChatNT, a conversational agent that can be prompted to solve a variety of biological tasks.
From: A multimodal conversational agent for DNA, RNA and protein tasks

a, An illustration of the different categories of downstream tasks included during training. UTR, untranslated region. b, Statistics on the number of English and DNA tokens available for each task in our genomics instructions dataset. English question–answer instructions are tokenized with the LLaMA tokenizer (ref. 30), while DNA sequences are tokenized with the Nucleotide Transformer tokenizer (ref. 15). c, The ChatNT approach to building a multimodal and multitask genomics AI system. The ChatNT conversational agent can be prompted in English to solve various tasks given an input question and a nucleotide sequence. In this example, the user inputs a DNA sequence (FASTA file) and asks the agent to evaluate the degradation rate of the corresponding RNA sequence. The question tokens are combined with the projected DNA representations before being passed through the English language model decoder. The pretrained decoder writes the answer through next-token prediction, in this case predicting the degradation rate of the input sequence.
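As a rough illustration of how question tokens and projected DNA representations could be combined before the decoder (panel c), the sketch below makes two assumptions that the caption does not specify: the projection is a single linear layer, and the question contains one hypothetical placeholder token (`PLACEHOLDER_ID`) marking where the sequence embeddings are spliced in. ChatNT's actual projection module and token layout may differ; all names and dimensions here are illustrative only.

```python
import numpy as np

PLACEHOLDER_ID = 99  # hypothetical token id marking where the DNA goes


def project_dna(dna_emb: np.ndarray, weight: np.ndarray, bias: np.ndarray) -> np.ndarray:
    """Map DNA-encoder embeddings (n_dna, d_dna) into the decoder's
    embedding space (n_dna, d_llm) with an assumed linear projection."""
    return dna_emb @ weight + bias


def splice_sequence(question_emb: np.ndarray, question_ids: list,
                    dna_proj: np.ndarray, placeholder_id: int) -> np.ndarray:
    """Replace the single placeholder token in the question embeddings
    with the projected DNA vectors, giving the decoder's input sequence."""
    positions = [i for i, t in enumerate(question_ids) if t == placeholder_id]
    if len(positions) != 1:
        raise ValueError("expected exactly one sequence placeholder")
    p = positions[0]
    return np.concatenate([question_emb[:p], dna_proj, question_emb[p + 1:]], axis=0)


# Toy usage: 3 DNA token embeddings of size 8, decoder embedding size 4.
rng = np.random.default_rng(0)
dna_emb = rng.normal(size=(3, 8))
weight, bias = rng.normal(size=(8, 4)), np.zeros(4)
dna_proj = project_dna(dna_emb, weight, bias)

question_ids = [5, 17, PLACEHOLDER_ID, 42, 7]   # hypothetical question token ids
question_emb = rng.normal(size=(len(question_ids), 4))
fused = splice_sequence(question_emb, question_ids, dna_proj, PLACEHOLDER_ID)
print(fused.shape)  # (7, 4): 4 question tokens + 3 DNA vectors
```

In this toy setup the decoder would then consume `fused` and generate the answer token by token; splicing at a placeholder (rather than always prepending the sequence) lets the question text refer to the sequence at any position, but this is a design choice assumed for the sketch.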