Fig. 1: Scarf performs memory- and time-efficient computation to produce consistent embedding and clustering.
From: Scarf enables a highly memory-efficient analysis of large-scale single-cell genomics data

a Schematic of the Scarf workflow, in which the input data, illustrated as a matrix, is used to generate a cell-cell neighbourhood graph. Arrows pointing outward from the neighbourhood graph indicate operations that can be performed on the graph, in no particular order. b Plot showing the memory consumed by Scarf, Seurat and Scanpy on datasets containing up to approximately 4 million cells. The inset shows the same data with the y-axis on a log2 scale. Dots on the lines mark the number of cells (x-axis) and the corresponding memory consumed (y-axis); the lines indicate the general trend. c Plot showing the time (in seconds) consumed by Scarf, Seurat and Scanpy on the six datasets used for benchmarking. The x-axis shows the number of cells in each dataset as categorical labels. Horizontal dotted lines indicate the time consumed in hours. d Plots showing UMAP embeddings of cells calculated using Scarf and Scanpy. For both Scarf and Scanpy, cells are coloured by the cluster identities obtained with Scarf’s Leiden clustering. Only the four of the six datasets that Scanpy processed successfully are shown. e Bar plots showing the average distance (in UMAP space) of cells from their corresponding cluster centroids. ‘U’ = UMAP and ‘C’ = clustering. f Percentage of time consumed by six broad steps in the Scarf and Scanpy processing pipelines. D.N.C = ‘did not complete’ due to an out-of-memory error. Note that the Leiden clustering step may not be visible for Scarf when zoomed out because of its short runtime.
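The panel e metric can be sketched as follows. This is a minimal illustration under stated assumptions, not necessarily how the authors computed the values: it assumes Euclidean distance in a 2D UMAP space, takes each cluster centroid as the mean coordinate of its cells, and averages the per-cell distances over all cells (rather than per cluster first). The function name `mean_centroid_distance` is hypothetical.

```python
import math
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple


def mean_centroid_distance(
    coords: Sequence[Tuple[float, float]], labels: Sequence[Hashable]
) -> float:
    """Average Euclidean distance of each cell from its cluster centroid.

    Hypothetical helper (assumption): coords are 2D UMAP coordinates,
    labels are cluster identities (e.g. from Leiden clustering).
    """
    if len(coords) != len(labels):
        raise ValueError("coords and labels must have the same length")
    if not coords:
        raise ValueError("at least one cell is required")

    # Group cell coordinates by cluster label.
    groups: defaultdict = defaultdict(list)
    for point, label in zip(coords, labels):
        groups[label].append(point)

    # Assumed centroid definition: arithmetic mean of the cluster's coordinates.
    centroids = {}
    for label, points in groups.items():
        n = len(points)
        centroids[label] = (
            sum(x for x, _ in points) / n,
            sum(y for _, y in points) / n,
        )

    # Distance of every cell to the centroid of its own cluster, averaged over all cells.
    distances: List[float] = []
    for (x, y), label in zip(coords, labels):
        cx, cy = centroids[label]
        distances.append(math.hypot(x - cx, y - cy))
    return sum(distances) / len(distances)


# Usage: two compact clusters, each cell 1 unit from its centroid.
coords = [(0.0, 0.0), (2.0, 0.0), (10.0, 10.0), (10.0, 12.0)]
labels = ["a", "a", "b", "b"]
print(mean_centroid_distance(coords, labels))
```

Under these assumptions, a lower value means the cells of each cluster sit closer together in the embedding, so the same function could be applied to each combination of embedding (‘U’) and cluster labels (‘C’) to compare how well they agree.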