Fig. 5: Exploratory analysis shows that the Codon Adaptation Index (CAI), which is independent of gene length, correlates significantly and negatively with the first t-SNE dimension of the gene embeddings, suggesting a potential relationship between gene spatial organization and expression levels.
From: Enhancing nucleotide sequence representations in genomic analysis with contrastive optimization

a The violin plot shows the distribution of CAI values across genes, reflecting variation in codon usage bias. The shaded bars indicate that CAI does not depend on gene length. b The t-SNE visualization projects the gene embeddings into a lower-dimensional space, revealing patterns of similarity and clustering. A high perplexity value was used to capture the global structure of the data, showing how genes relate to each other in the embedding space. c Correlation analysis between the first dimension of the t-SNE embeddings and CAI values. The first t-SNE dimension is significantly negatively correlated with CAI, a commonly used proxy for gene expression level, with Pearson and Spearman's rank correlation coefficients of −0.60 (p = 5.11 × 10⁻³) and −0.67 (p = 1.25 × 10⁻³), respectively.
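
The correlation analysis in panel c can be illustrated with a minimal sketch, assuming one first-dimension t-SNE coordinate and one CAI value per gene. The function names and the toy values below are hypothetical, are not data from the paper, and do not reproduce the authors' pipeline. The sketch only shows how a Pearson coefficient and a Spearman rank coefficient (Pearson applied to tie-averaged ranks) are obtained from paired values.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy values, one pair per gene (NOT taken from the paper).
tsne_dim1 = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
cai = [0.80, 0.75, 0.78, 0.60, 0.65, 0.50, 0.55]

r = pearson(tsne_dim1, cai)
rho = spearman(tsne_dim1, cai)
print(f"Pearson r = {r:.2f}, Spearman rho = {rho:.2f}")
```

The sketch omits p-values. In practice a statistics library such as SciPy (`scipy.stats.pearsonr`, `scipy.stats.spearmanr`) would typically be used to obtain both the coefficients and their significance.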