Fig. 1: Variational Autoencoder for generation of mitochondrial targeting sequences (MTSs). | Nature Communications


From: Design of diverse, functional mitochondrial targeting sequences across eukaryotic organisms using variational autoencoder


a Most mitochondrial proteins (99%) are nuclear-encoded and carry an N-terminal sequence that is recognized and translocated by the TOM/TIM complex. Upon import into the mitochondrial matrix, the N-terminal sequence is cleaved by MPP and the protein folds. Such targeting sequences can therefore be used to deliver enzymes and drugs to mitochondria for diverse applications, including biochemical production, mtDNA editing, and treating mitochondrial disorders.

b Scheme for generating artificial MTSs using a variational autoencoder (VAE). The model receives a one-hot encoded representation of an MTS as input. The encoder compresses the input into a latent vector, and the decoder reconstructs the original input from it. Once the model is trained, a vector sampled from the latent space can be fed to the decoder to generate novel MTSs.

c Curated dataset of MTSs for training the VAE. The dataset includes MTSs reported in Swiss-Prot and TargetP 2.0-predicted MTSs for proteins whose subcellular location is the mitochondrial matrix or inner membrane.

d Predicting the functionality of generated MTSs using DeepLoc 2.0. For each of three methods (the VAE, modlAMP’s Helices package, and a pHMM), a set of 730 generated MTSs fused to GFP was analyzed for predicted mitochondrial targeting. Of the VAE-generated sequences, 658 (90.14%) were predicted to be functional, compared with 10 sequences (1.37%) from modlAMP’s Helices package and 89 sequences (12.2%) from the pHMM.

e VAE-generated MTSs are diverse in sequence. Many of the generated sequences are 10 to 15 mutations away from the MTSs in the training data and in UniProt.

Note that the percentage of artificial MTS (AMTS)-tagged GFP predicted to localize to mitochondria was calculated using the probability threshold of 0.6373 specified on the DeepLoc 2.0 server. NAD+/NADH: nicotinamide adenine dinucleotide; mtDNA: mitochondrial DNA; TOM/TIM: translocase of the outer/inner membrane; MPP: mitochondrial processing peptidase; pHMM: profile hidden Markov model; AA: amino acids.
a, b Created in BioRender. Zhao, H. (2025) https://BioRender.com/gm4kee6.
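
The data flow in panel b and the percentage in panel d can be illustrated with a minimal Python sketch. It is not the authors' model: the alphabet with a "-" padding symbol, the fixed length `MAX_LEN`, the latent size `LATENT_DIM`, the standard normal prior, the example sequence, and the helper names are all assumptions made for illustration, and `toy_decoder` is an untrained linear map with random weights standing in for a trained decoder. Treating the 0.6373 threshold as inclusive is likewise an assumption.

```python
import random

# Assumption: 20 standard amino acids plus "-" as a padding symbol.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"
MAX_LEN = 12     # hypothetical fixed input length, kept small for readability
LATENT_DIM = 4   # hypothetical latent dimensionality


def one_hot_encode(seq, max_len=MAX_LEN):
    """Pad or truncate seq to max_len and return one 0/1 row per position."""
    padded = seq[:max_len].ljust(max_len, "-")
    return [[1 if aa == symbol else 0 for symbol in ALPHABET] for aa in padded]


def decode_argmax(scores):
    """Pick the highest-scoring symbol at each position; strip trailing padding."""
    picks = [ALPHABET[max(range(len(row)), key=row.__getitem__)] for row in scores]
    return "".join(picks).rstrip("-")


def toy_decoder(z, weights):
    """Untrained stand-in for a decoder: a linear map from z to per-position scores."""
    return [
        [sum(w * zi for w, zi in zip(symbol_weights, z)) for symbol_weights in position]
        for position in weights
    ]


def fraction_above(probabilities, threshold=0.6373):
    """Fraction of predicted probabilities at or above the threshold (inclusive by assumption)."""
    return sum(p >= threshold for p in probabilities) / len(probabilities)


rng = random.Random(0)
# Random weights with shape MAX_LEN x len(ALPHABET) x LATENT_DIM.
weights = [
    [[rng.gauss(0, 1) for _ in range(LATENT_DIM)] for _ in ALPHABET]
    for _ in range(MAX_LEN)
]
z = [rng.gauss(0, 1) for _ in range(LATENT_DIM)]    # sample from an assumed standard normal prior
generated = decode_argmax(toy_decoder(z, weights))  # arbitrary letters, since the weights are random

round_trip = decode_argmax(one_hot_encode("MLSLRQSIRF"))  # hypothetical example sequence
vae_percent = round(658 / 730 * 100, 2)                   # counts taken from panel d
```

Here `round_trip` recovers the input sequence, which shows that the one-hot representation is lossless for sequences no longer than `MAX_LEN`, and `vae_percent` reproduces the 90.14% reported for the VAE. In a real pipeline, the random weights would be replaced by a trained decoder and the list passed to `fraction_above` would hold per-sequence mitochondrial probabilities from the localization predictor.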
