Extended Data Fig. 3: Identification of syntenic lncRNAs across species and curation of RNA motifs from CLIP-seq datasets and public motif databases.

a, Pipeline for syntenic lncRNA identification. A random forest model was trained to predict syntenic lncRNAs between each pair of species from two defined sets of ‘synteny indicators’, using one-to-one homologs of protein-coding genes as positive samples and randomly selected gene pairs as negative samples. b, Calculation of the six features (the numbers and proportion scores of corresponding genomic anchors) in the upstream, downstream or flanking region of a pair of human and mouse lncRNAs. c, Heatmap showing the numbers and Jaccard index values of predicted syntenic lncRNAs between human and each of the seven indicated species. d, Contour plots of syntenic lncRNAs identified by lncHOME between human and five other species, as a function of the proportion of common protein-coding genes and the proportion of corresponding genomic anchors. The background density plot shows the same proportion scores for one-to-one homologous protein-coding genes. e, Pie charts showing the proportions of human lncRNAs with one-to-one (red) or one-to-multiple (green) syntenic lncRNAs in another species. f, Sequence motifs of ELAVL1 and HNRNPA1 in human and zebrafish, called from binding sites in the CLIP data. P values were calculated by one-sided binomial test. g, Pipeline for RNA motif curation for human and mouse. RNA motifs were identified from CLIP-seq datasets using the MEME suite and collected from public databases (RNACOMPETE, CIS-RBP, RBPDB and ATtRACT), and similar motifs were merged by motif clustering. h, Pipeline for RNA motif extrapolation across species. The RNA motifs curated for human and mouse were used to scan the transcriptome of another species for motif matches, and these matches were then used to refine the original motifs, generating new motifs for that species. i, Number of curated human RNA motifs. j, Distribution of curated RNA motifs for representative RBPs; representative motifs for two example RBPs (NONO and Ezh2) are shown.
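The six anchor features described for panel b can be illustrated with a minimal Python sketch. It makes several hypothetical assumptions. Each lncRNA's upstream and downstream windows are given as lists of anchor IDs, and a precomputed dictionary maps human anchors to their mouse counterparts. The flanking region is taken to be the union of the two windows. Each proportion score is the number of matched anchors divided by the number of human anchors in that region. The function names `anchor_features` and `_region_features` are illustrative and are not taken from lncHOME.

```python
from typing import Dict, List, Set, Tuple


def _region_features(query: List[str], target: Set[str],
                     mapping: Dict[str, str]) -> Tuple[int, float]:
    """Count query anchors whose mapped counterpart lies in the target region."""
    # anchors without a counterpart map to None, which is never in target
    matched = sum(1 for anchor in query if mapping.get(anchor) in target)
    # proportion score (assumption): matched anchors / human anchors in region
    proportion = matched / len(query) if query else 0.0
    return matched, proportion


def anchor_features(human_up: List[str], human_down: List[str],
                    mouse_up: List[str], mouse_down: List[str],
                    mapping: Dict[str, str]) -> Dict[str, float]:
    """Six hypothetical synteny features: anchor number and proportion per region."""
    regions = {
        "upstream": (human_up, set(mouse_up)),
        "downstream": (human_down, set(mouse_down)),
        # flanking = union of both windows (assumption)
        "flanking": (human_up + human_down, set(mouse_up) | set(mouse_down)),
    }
    features: Dict[str, float] = {}
    for name, (query, target) in regions.items():
        number, proportion = _region_features(query, target, mapping)
        features[f"{name}_number"] = number
        features[f"{name}_proportion"] = proportion
    return features
```

In this sketch, treating the flanking region as the union of both windows lets an anchor count as corresponding even if it lies on the other side of the lncRNA in mouse; the definitions used in the actual pipeline may differ.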
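The one-sided binomial test mentioned for panel f can be sketched as an upper-tail probability, P(X ≥ k) for X ~ Binomial(n, p). The legend does not state how k, n and p were defined. For illustration, the sketch assumes that k is the number of CLIP binding sites containing a motif match, n is the total number of binding sites, and p is the background match rate. It sums the probability mass in log space using `math.lgamma`, so large n does not overflow. The name `binomial_sf` is hypothetical.

```python
import math


def _log_binom_pmf(i: int, n: int, p: float) -> float:
    """log P(X = i) for X ~ Binomial(n, p), via lgamma to avoid huge integers."""
    return (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p))


def binomial_sf(k: int, n: int, p: float) -> float:
    """One-sided (upper-tail) binomial P value: P(X >= k)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if k <= 0:
        return 1.0
    if k > n:
        return 0.0
    log_terms = [_log_binom_pmf(i, n, p) for i in range(k, n + 1)]
    peak = max(log_terms)
    # log-sum-exp: factor out the largest term before exponentiating
    total = math.exp(peak) * sum(math.exp(t - peak) for t in log_terms)
    return min(1.0, total)  # guard against rounding slightly above 1
```

For example, `binomial_sf(180, 1000, 0.12)` returns the P value for observing 180 or more motif-containing sites out of 1,000 when the background rate is 12% (hypothetical numbers).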
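Panel h describes scanning another species' transcriptome with curated motifs and then refining those motifs from the matches. The following minimal Python sketch shows one such round under several assumptions. Motifs are represented as position weight matrices (PWMs) over A, C, G and U. A match is any window whose log-odds score against a uniform background reaches a user-chosen threshold. Refinement re-estimates the PWM from the matched windows with a pseudocount. The threshold, background model and function names (`scan`, `refine_motif`) are hypothetical, and the study's scanning and update procedure may differ.

```python
import math
from typing import Dict, List

ALPHABET = "ACGU"
Pwm = List[Dict[str, float]]  # one probability distribution over ALPHABET per position


def log_odds(window: str, pwm: Pwm, background: float = 0.25) -> float:
    """Log-odds score of a window against a uniform background (assumption)."""
    # max(..., 1e-9) avoids log(0) for bases with zero probability
    return sum(math.log(max(pos.get(base, 0.0), 1e-9) / background)
               for pos, base in zip(pwm, window))


def scan(transcripts: List[str], pwm: Pwm, threshold: float) -> List[str]:
    """Return every window whose log-odds score reaches the threshold."""
    width = len(pwm)
    hits = []
    for seq in transcripts:
        seq = seq.upper().replace("T", "U")  # treat DNA-encoded transcripts as RNA
        for start in range(len(seq) - width + 1):
            window = seq[start:start + width]
            # skip windows with ambiguous bases such as N
            if all(b in ALPHABET for b in window) and log_odds(window, pwm) >= threshold:
                hits.append(window)
    return hits


def refine_motif(pwm: Pwm, hits: List[str], pseudocount: float = 1.0) -> Pwm:
    """Re-estimate the PWM from matched windows; keep the original if nothing matched."""
    if not hits:
        return pwm
    refined = []
    for i in range(len(pwm)):
        counts = {base: pseudocount for base in ALPHABET}
        for hit in hits:
            counts[hit[i]] += 1
        total = sum(counts.values())
        refined.append({base: c / total for base, c in counts.items()})
    return refined
```

Repeating the scan and refinement steps until the PWM stops changing would turn this into a simple iterative procedure; only a single round is shown here to keep the example short.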