Fig. 2: Characteristics of artificial mitochondrial targeting sequences (AMTSs).

a Uniform Manifold Approximation and Projection (UMAP) visualization of UniRep embeddings showing the distribution of AMTSs (n = 705,081) and MTSs (n = 50,980) within the training dataset. The VAE generates AMTSs across both high- and low-density regions of sequence space. b–e Comparison of physicochemical and structural attributes between generated AMTSs and MTSs in the training data: b amino acid composition, c hydrophobic moment, d net charge, and e secondary structure. Overall, AMTSs carry a net positive charge and form amphiphilic α-helical structures, features crucial for directing proteins to mitochondria. All boxplots follow the standard definition: the center line represents the median, the box limits correspond to the upper and lower quartiles, the whiskers extend to 1.5 times the interquartile range, and outliers are shown as points.
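
The two per-sequence quantities in panels c and d can be illustrated with a minimal sketch. It is not the authors' pipeline and rests on stated assumptions: the hydrophobic moment follows Eisenberg's definition with an ideal α-helical turn of 100° per residue and the Eisenberg consensus hydrophobicity scale, normalized by sequence length; the net charge is a simple count at roughly neutral pH (Lys and Arg as +1, Asp and Glu as -1, His ignored). The function names and the example peptide are hypothetical.

```python
import math

# Eisenberg consensus hydrophobicity scale (assumed here; the figure may
# have been produced with a different scale).
EISENBERG = {
    "A": 0.62, "R": -2.53, "N": -0.78, "D": -0.90, "C": 0.29,
    "Q": -0.85, "E": -0.74, "G": 0.48, "H": -0.40, "I": 1.38,
    "L": 1.06, "K": -1.50, "M": 0.64, "F": 1.19, "P": 0.12,
    "S": -0.18, "T": -0.05, "W": 0.81, "Y": 0.26, "V": 1.08,
}


def net_charge(seq: str) -> int:
    """Approximate net charge at neutral pH: (K + R) - (D + E)."""
    positive = sum(seq.count(aa) for aa in "KR")
    negative = sum(seq.count(aa) for aa in "DE")
    return positive - negative


def hydrophobic_moment(seq: str, angle_deg: float = 100.0) -> float:
    """Length-normalized hydrophobic moment for a helix with the given
    rotation per residue (100 degrees approximates an alpha-helix)."""
    delta = math.radians(angle_deg)
    sin_sum = sum(EISENBERG[aa] * math.sin(i * delta) for i, aa in enumerate(seq))
    cos_sum = sum(EISENBERG[aa] * math.cos(i * delta) for i, aa in enumerate(seq))
    return math.hypot(sin_sum, cos_sum) / len(seq)


if __name__ == "__main__":
    peptide = "MLSLRQSIRFFKPATRTL"  # hypothetical toy sequence, not from the dataset
    print("net charge:", net_charge(peptide))
    print("hydrophobic moment:", round(hydrophobic_moment(peptide), 3))
```

A large hydrophobic moment means that hydrophobic residues cluster on one face of the helix, which is what "amphiphilic" refers to in the caption. A uniformly hydrophobic sequence spanning whole helical turns gives a moment near zero despite its high mean hydrophobicity.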