Fig. 1: Identification of coPARSE-lncRNAs and their homologs across vertebrates.

a, A simplified workflow for lncHOME analysis of vertebrate lncRNAs. The phylogenetic tree shows the evolutionary relationships among eight vertebrates, with the number of annotated lncRNAs in each species. The heatmap shows the Jaccard index of lncRNAs and of protein-coding genes identified by sequence similarity across the eight vertebrates (top). lncHOME defines coPARSE-lncRNAs by combining the alignment of homologous protein-coding genes and corresponding genomic anchors (bottom left) with the analysis of similar motif distribution patterns (bottom right). b, Contour plots of syntenic lncRNAs identified by lncHOME in human versus mouse and in human versus zebrafish, plotted by the proportion of common protein-coding genes and the proportion of corresponding genomic anchors. The background density plot shows the same proportion scores for protein-coding genes with one-to-one homology. c, The distribution of curated RNA motifs for representative RBPs. Representative motifs for two example RBPs (FUS and TARDBP) are shown. d, coPARSE-lncRNA homolog pairs with similar motif distribution patterns between human and mouse. A coPARSE-lncRNA with annotation in the lncRNAdb database is highlighted in red. The lncRNA THORLNC is highlighted in blue. Red dashed lines represent the median values of the MPSSs and the GPSs.
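
For readers unfamiliar with the Jaccard index used in the heatmap of panel a, the following is a minimal sketch of the standard definition, |A ∩ B| / |A ∪ B|, applied to pairs of species. The function name `jaccard_index`, the gene-family identifiers, and the way each species' set is built are all illustrative assumptions; the caption does not specify how the compared sets were constructed.

```python
from itertools import combinations


def jaccard_index(a, b):
    """Standard Jaccard index: size of the intersection over size of the union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Convention chosen here: two empty sets have index 0.0.
        return 0.0
    return len(a & b) / len(union)


# Hypothetical data: for each species, the set of gene families in which
# a sequence-similar gene was found. These identifiers are made up.
families = {
    "human": {"fam1", "fam2", "fam3", "fam4"},
    "mouse": {"fam2", "fam3", "fam4", "fam5"},
    "zebrafish": {"fam4", "fam6"},
}

# Pairwise indices, i.e. the values one cell of such a heatmap would hold.
pairwise = {
    (s1, s2): jaccard_index(families[s1], families[s2])
    for s1, s2 in combinations(families, 2)
}

for (s1, s2), value in pairwise.items():
    print(f"{s1} vs {s2}: {value:.2f}")
```

With these toy sets, the human and mouse sets share three of five families in total, so their index is higher than that of human and zebrafish, which share one of five.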