Fig. 1: CodonTransformer multispecies model with combined organism-amino acid-codon embedding. | Nature Communications

From: CodonTransformer: a multispecies codon optimizer using context-aware neural networks

a An encoder-only BigBird Transformer model trained on combined amino acid-codon tokens together with an organism encoding for host-specific codon usage representation. b Schematic of the organism encoding strategy used in CodonTransformer via token_type_id, analogous to contextualized vectors in natural language processing (NLP). c CodonTransformer was trained on ~1 million genes from 164 organisms spanning all domains of life and fine-tuned on highly expressed genes (top 10% codon usage index, CSI) from 13 organisms and two chloroplast genomes. CLS, start-of-sequence token; UNK, general unknown token; SEP, end-of-sequence token; PAD, padding token.
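The combined token scheme from panels a and b can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the token names (e.g. "M_ATG"), the toy vocabulary, and the `encode` helper are all hypothetical; the real model uses a BigBird encoder with a much larger multispecies vocabulary. The key ideas shown are (1) each token fuses an amino acid with the codon that encodes it, and (2) the host organism is injected per position through token_type_ids, analogous to segment embeddings in BERT-style models.

```python
# Hypothetical sketch of combined amino acid-codon tokenization with an
# organism encoding; names and vocabulary are illustrative, not the paper's API.

SPECIAL_TOKENS = ["[CLS]", "[SEP]", "[UNK]", "[PAD]"]

def combined_tokens(protein, dna):
    """Pair each amino acid with the codon that encodes it, e.g. 'M' + 'ATG' -> 'M_ATG'."""
    assert len(dna) == 3 * len(protein), "each residue needs exactly one codon"
    return [f"{aa}_{dna[3*i:3*i + 3]}" for i, aa in enumerate(protein)]

def encode(protein, dna, organism_id, vocab):
    """Map a sequence to input_ids plus per-position organism token_type_ids."""
    tokens = ["[CLS]"] + combined_tokens(protein, dna) + ["[SEP]"]
    input_ids = [vocab.get(t, vocab["[UNK]"]) for t in tokens]
    # One organism id per position: the organism encoding from panel b.
    token_type_ids = [organism_id] * len(input_ids)
    return input_ids, token_type_ids

# Toy example: peptide "ME" encoded by ATG GAA, with a hypothetical organism id 2.
vocab = {t: i for i, t in enumerate(SPECIAL_TOKENS + ["M_ATG", "E_GAA"])}
ids, type_ids = encode("ME", "ATGGAA", organism_id=2, vocab=vocab)
print(ids, type_ids)  # -> [0, 4, 5, 1] [2, 2, 2, 2]
```

Because the organism id is carried by every position rather than by a single tag token, the model can condition each codon prediction on the host, which is what enables host-specific codon usage from one shared multispecies model.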