Extended Data Fig. 5: Pairwise analysis reveals OSN features that correlate with transcriptomic similarity.

a, Analysis of the ability to predict OR identity using different GO biological pathway term gene sets. The boxplots depict the bootstrapped data from 100 iterations of the prediction for each GO term. The top 30 terms are ordered from the highest mean accuracy to the lowest. In addition to the GO terms found in the axon guidance biological category, we hand-curated a list of axon guidance genes and adhesion molecules (Supplementary Table 2), removed transcription factors and odorant receptor genes, and included this refined set of axon guidance/adhesion molecule genes in the analysis. b, Analysis of the ability to predict OR identity using different GO molecular pathway term gene sets. The boxplots depict the bootstrapped data from 100 iterations of the prediction for each GO term. The top 30 terms are ordered from the highest mean accuracy to the lowest. c, Incorrectly predicted ORs are likely to have a protein sequence similar to that of the correct OR. The OR protein similarity scores of incorrectly predicted OR pairs from Fig. 2d were calculated and binned (N = 100 independent tests). The boxplot depicts the likelihood of a given OR protein similarity score distribution as the fold change relative to random OR pairs. d, The transcriptomic correlation between a pair of OSNs correlates positively with the protein sequence similarity of the ORs they express. Transcriptomic correlation and OR protein sequence similarity were calculated for all pairwise combinations of 654 OSN types and binned. The plot depicts the mean ± SEM for each protein similarity score bin. e, Heat map of each set of OSNs expressing a specific OR reveals high transcriptional correlation (upper triangle) and OR protein sequence similarity (lower triangle) between OSNs expressing ORs located within the same genomic cluster. OSNs are ordered by genomic location and the color bar indicates the genomic cluster.
Boxplots in this figure represent Q1 - 1.5*IQR, Q1, median, Q3 and Q3 + 1.5*IQR; data beyond the whiskers are plotted as individual dots.
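As an illustration of the boxplot convention stated above, the following minimal Python sketch computes the five summary values and the points that fall beyond the whiskers. It is not the authors' plotting code: the function name `boxplot_summary`, the use of NumPy's default linear-interpolation quantiles, and the example data are all assumptions made for illustration.

```python
import numpy as np


def boxplot_summary(values):
    """Return boxplot statistics under the Q1 - 1.5*IQR / Q3 + 1.5*IQR convention.

    Hypothetical helper for illustration only; quartiles use NumPy's
    default (linear interpolation) method, which is an assumption.
    """
    x = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(x, [25, 50, 75])
    iqr = q3 - q1
    lower = q1 - 1.5 * iqr  # lower whisker limit
    upper = q3 + 1.5 * iqr  # upper whisker limit
    # Points outside the whisker limits are the ones drawn as individual dots.
    outliers = x[(x < lower) | (x > upper)]
    return {
        "lower": lower,
        "q1": q1,
        "median": median,
        "q3": q3,
        "upper": upper,
        "outliers": outliers,
    }


# Made-up example data with one extreme value.
summary = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(summary)
```

Note that many plotting libraries draw each whisker at the most extreme data point lying within these limits rather than at the limit itself; the sketch follows the legend's wording and reports the limits directly.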