Fig. 1: An overview of the DeepCROSS approach.

a In the meta-representation stage, tens of thousands of natural 5′ RSs were collected from the NCBI genome database. DeepCROSS, which combines a semi-supervised adversarial auto-encoder (AAE) with a Dense-LSTM predictor, was applied to generate synthetic cross-species and species-preferred RSs. Then, for AI-guided experimental quantification, MPRA experiments in E. coli and P. aeruginosa were conducted to measure the transcription activity of the synthetic RSs. In the multi-task optimization stage, DeepCROSS-designed RSs were optimized, validated, and characterized. b The AAE model maps the one-hot encoded RSs to a 64-dimensional continuous vector. The encoder network (E-net) and the two decoder networks (D-net-Gaussian and D-net-exp) are all built from groups of residual blocks (Supplementary Fig. 1). The three input datasets were ‘E. coli & P. aeruginosa’ (the RSs in the Johns-Dataset), ‘Enterobacterales and Pseudomonadales’ (RSs from Enterobacterales and Pseudomonadales bacteria), and RSs from a broad range of more than 2000 bacterial species.
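
The one-hot encoding mentioned in panel b can be illustrated with a minimal sketch. The helper name `one_hot_encode`, the A/C/G/T channel order, and the all-zero treatment of ambiguous bases are assumptions made for illustration, not details taken from the DeepCROSS implementation.

```python
# Illustrative only: channel order and handling of ambiguous bases are assumptions.
ALPHABET = "ACGT"


def one_hot_encode(seq):
    """Return a len(seq) x 4 matrix with one row per nucleotide.

    Bases outside A/C/G/T (e.g. N) are encoded as an all-zero row.
    """
    index = {base: i for i, base in enumerate(ALPHABET)}
    matrix = []
    for base in seq.upper():
        row = [0] * len(ALPHABET)
        if base in index:
            row[index[base]] = 1
        matrix.append(row)
    return matrix


# A short hypothetical sequence; real RSs would be much longer.
encoded = one_hot_encode("ACGTN")
```

A matrix of this kind is the sort of input an encoder such as E-net would compress into the 64-dimensional latent vector.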