Fig. 1

Overview of BPA and performance on single-cell data sets. a Overview of biological process activity inference. Single-cell gene expression profiles for human (outer left column) can be compared with a mouse gene expression profile (outer right column) using transformed biological process activity profiles for human (inner left) and mouse (inner right), even though the gene members of each Gene Ontology Biological Process (GO-BP) are distinct in each species (outer links). b Single peripheral blood mononuclear cells (PBMCs) profiled using 10x Genomics V1 and V2 chemistry were visualized using transcript expression features. Cells were color-coded according to expression of the T-cell, monocyte and B-cell-specific markers CD3E, CD14 and CD20, respectively. Two clusters for each cell type, corresponding to the two Chromium chemistries, are visible before the BPA transform. c Same as b, but with PBMC data plotted after applying the BPA transform, resulting in one cluster per cell type and no visible chemistry batch effect. d Drop-out events (Dro) were simulated in GTEx lung (L) and esophagus (E) bulk RNA-sequencing (Ori) data (Supplementary Fig. 2, see Methods); pairwise correlations between samples were computed and plotted using the original gene expression features (upper matrix triangle) and using BPA features (lower matrix triangle); high correlations, red; low correlations, yellow. BPA preserves same-tissue comparisons even with drop-outs and has, on average, lower cross-tissue correlation than gene expression features. e The BPA transform preserves the developmental order of embryo development reported in the original study13, based on the pseudotime inferred from principal curve14 construction in t-SNE space15 (Supplementary Fig. 4).
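The split-triangle layout of panel d can be illustrated with a minimal sketch. It is not the authors' code: it assumes Pearson correlation (the caption does not name the measure), samples stored as rows, and random placeholder matrices in place of the GTEx data. The function name `composite_correlation` and the variables `genes` and `bpa` are hypothetical.

```python
import numpy as np


def composite_correlation(gene_features, bpa_features):
    """Combine two sample-by-sample correlation matrices into one.

    Rows of each input are samples (same order in both inputs).
    Upper triangle: correlations from gene expression features.
    Lower triangle: correlations from BPA features.
    Pearson correlation is an assumption made for this sketch.
    """
    gene_corr = np.corrcoef(gene_features)  # samples x samples
    bpa_corr = np.corrcoef(bpa_features)    # samples x samples
    # Keep strictly-upper entries from one matrix, strictly-lower from the other.
    combined = np.triu(gene_corr, k=1) + np.tril(bpa_corr, k=-1)
    np.fill_diagonal(combined, 1.0)         # self-correlation on the diagonal
    return combined


# Placeholder data: 6 samples, 200 genes, 30 biological processes.
rng = np.random.default_rng(0)
genes = rng.normal(size=(6, 200))
bpa = rng.normal(size=(6, 30))
composite = composite_correlation(genes, bpa)
```

Passing `composite` to a heatmap routine with a yellow-to-red color map would give a plot of the same layout as panel d; the sample ordering (original versus drop-out, lung versus esophagus) would need to match the figure's axis labels.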