Fig. 2

A compendium of gene expression profiles that can be used for gene function prediction. We downloaded 31,499 RNA-seq samples from ENA, originating from many different studies. After correction for technical biases, the samples show coherent clustering: samples from the same tissue, cell type or cell line generally cluster together. The two axes denote the two t-SNE components. For each tissue or cell type, the number of samples is given, followed after the colon by the number of unique studies. These study counts indicate that samples cluster by tissue or cell type, and that this clustering is not an artifact of systematic technical confounding, as could occur if all samples for a given tissue came from a single laboratory.