Fig. 2: Reads in uTARs can separate cell types in different organisms.

a UMAP dimensional reduction on annotated gene expression features (top row) and uTARs (second row) for mouse spleen, mouse kidney, different time points in chicken embryonic heart development, gray mouse lemur lung tissue, naked mole rat spleen, and sea urchin embryonic tissue. Cells in each column are colored based on gene expression clustering. The relative number of uTAR reads for each cell in every cluster is also shown as violin plots (third row, colors correspond to UMAPs); 6113 cells in mouse spleen, 610 cells in mouse kidney, 4365 in chicken day 4, 2198 in chicken day 14, 6321 in gray mouse lemur lungs, 2657 in naked mole rat spleen, and 2658 in sea urchin embryo. b Silhouette coefficient values based on 2D UMAP coordinates of gene expression (blue), aTARs (red), and uTARs (maroon) for 11 samples. UMAPs for samples labeled with (*) are shown in Supplementary Fig. 1b. Cell labels are defined by gene annotation clustering. c Correlation between top 5 PC loadings and pseudo-bulk read coverage of uTARs across 11 samples. Horizontal line at uTAR PC loading = 0.5, vertical line at uTAR pseudo-bulk read coverage = 1e+4, r² = 4.0e-3. The number in each quadrant is the number of uTARs in that quadrant. d Relative percentage of uTARs containing homology to any sequence (blue) and mRNA sequences (light blue) as a function of natural-log fold change in expression for each cell type in naked mole rat spleen data. BLAST sequence homology results against the nucleotide collection database; mean uTAR peak query length = 686 ± 731 bp, with thresholds of uTAR peak percent identity > 71%, e-value < 0.053, and bit score > 52.8.
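
As an illustration of the metric in panel b, the following is a minimal sketch of a mean silhouette coefficient computed from 2D embedding coordinates and cluster labels. It is not the authors' pipeline: the function name `mean_silhouette`, the Euclidean distance, and the toy coordinates are assumptions made for this example only.

```python
import math
from collections import defaultdict


def mean_silhouette(coords, labels):
    """Mean silhouette coefficient for 2D points (hypothetical helper).

    coords: list of (x, y) tuples, e.g. 2D UMAP coordinates.
    labels: one cluster label per point, e.g. from gene expression clustering.
    Assumes Euclidean distance in the 2D embedding.
    """
    clusters = defaultdict(list)
    for idx, lab in enumerate(labels):
        clusters[lab].append(idx)
    if len(clusters) < 2:
        raise ValueError("silhouette needs at least two clusters")

    scores = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the point's own cluster
        a = sum(math.dist(coords[i], coords[j]) for j in own if j != i) / (len(own) - 1)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(math.dist(coords[i], coords[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy data (made up): two well-separated groups of two points each.
toy_coords = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = mean_silhouette(toy_coords, ["A", "A", "B", "B"])
bad = mean_silhouette(toy_coords, ["A", "B", "A", "B"])
```

Values near 1 indicate that the labels separate cleanly in the embedding, while values near or below 0 indicate overlapping clusters. In the toy data, `good` is close to 0.9 and `bad` is negative.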