Fig. 2: Meta-representation captures essential constraints of RSs.

a PCA of the statistical representation of natural cross-species RSs, with host-preferred genera color-coded (e.g., RSs from Actinobacteria are shown in red). The enlarged region shows the local territory occupied by RSs from the target species E. coli and P. aeruginosa in the test set of the Johns-Dataset. b Left: Pearson correlation coefficient (PCC) of 6-mer frequencies between DeepCROSS-generated RSs and reported functional RSs under different representation approaches (Supplementary Methods). Right: Jensen-Shannon (JS) divergence of GC content between DeepCROSS-generated RSs and reported functional RSs under different representation approaches. Bar height represents the mean of n = 3 independent experiments per group; black dots indicate individual experiments. P values were determined by two-tailed t-tests; ns, not significant. c Measured MPRA activity of cross-species RSs generated by the full DeepCROSS framework (AAE & Dense-LSTM model) under different representation approaches. Relative MPRA activity was calculated using BBa_J23119 as the control sequence. Johns-DeepCROSS (n = 491), genus-DeepCROSS (n = 492), Motif (n = 395), Random RSs (n = 391), Johns-Dataset (n = 1470), Meta-DeepCROSS (n = 485); n is the number of RSs. Box plots depict the median (center line), interquartile range (box limits), whiskers (1.5×IQR), and outliers (points beyond whiskers). Source data are provided as a Source Data file.
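The exact metric definitions for panel b are given in the Supplementary Methods. As a rough illustration only, the Python sketch below computes the two kinds of quantity compared there: the PCC between the 6-mer frequency vectors of two sequence sets, and the JS divergence between their GC-content distributions. Pooling overlapping 6-mers across each set, binning GC content into 20 bins, and using a base-2 logarithm are assumptions of this sketch, and all function names (`kmer_frequencies`, `gc_histogram`, etc.) are hypothetical.

```python
import itertools
import math
from collections import Counter

ALPHABET = "ACGT"


def kmer_frequencies(seqs, k=6):
    """Pooled frequency of every overlapping k-mer across a set of sequences.

    Returns a list of length 4**k in a fixed lexicographic k-mer order, so
    vectors from different sequence sets can be compared position by position.
    """
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            kmer = s[i:i + k]
            if set(kmer) <= set(ALPHABET):  # skip k-mers containing N or other symbols
                counts[kmer] += 1
    total = sum(counts.values())
    all_kmers = ("".join(p) for p in itertools.product(ALPHABET, repeat=k))
    return [counts[km] / total if total else 0.0 for km in all_kmers]


def pearson(x, y):
    """Pearson correlation coefficient (PCC) of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined for a constant vector
    return cov / (sx * sy)


def gc_content(seq):
    """Fraction of G/C bases in a non-empty sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_histogram(seqs, n_bins=20):
    """Normalized histogram of per-sequence GC content over [0, 1]."""
    counts = [0] * n_bins
    for s in seqs:
        b = min(int(gc_content(s) * n_bins), n_bins - 1)  # GC = 1.0 falls in the last bin
        counts[b] += 1
    return [c / len(seqs) for c in counts]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits (log base 2), bounded in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(a, b):
        # Terms with a_i = 0 contribute 0; b_i > 0 wherever a_i > 0 because b = m.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


if __name__ == "__main__":
    # Arbitrary toy sequences for illustration only; not data from the study.
    generated = ["TTGACAATTAATCATCGGCTCGTATAATGTGTGG", "TTTACGGCTAGCTCAGTCCTAGGTACAATGCTAGC"]
    reported = ["TTGACGGCTAGCTCAGTCCTAGGTATAGTGCTAGC", "CTGACAGCTAGCTCAGTCCTAGGTATAATGCTAGC"]
    pcc = pearson(kmer_frequencies(generated), kmer_frequencies(reported))
    jsd = js_divergence(gc_histogram(generated), gc_histogram(reported))
    print(f"6-mer PCC = {pcc:.3f}, GC-content JS divergence = {jsd:.3f}")
```

With the base-2 logarithm the JS divergence is bounded in [0, 1], so a higher PCC and a lower JS divergence both indicate generated RSs that are statistically closer to the reported functional RSs; the absolute JS values will shift with the chosen GC binning.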