Extended Data Fig. 6: Computational algorithms for UMI collapsing.
From: Molecular spikes: a gold standard for single-cell RNA counting

(a) Scenario of a network of UMI sequences in which each UMI sequence is shown together with the number of reads supporting it. Mismatches to the center UMI sequence are shown in red and the edit distance (Hamming distance, HD) is indicated in blue. (b) Unique: Every unique sequence is counted as a molecule (naive counting, e.g. Kallisto). UMI count in the network = 6. (c) Cluster: The network is resolved by collapsing all sequences within a Hamming distance of 1 (HD1) into the UMI with the highest number of read counts. UMIs that are related at HD1 to one of the collapsed sequences and are equally or less abundant are then also collapsed into the main UMI sequence, even if their edit distance to it is higher than 1. UMI count in the network = 1. (d) Adjacency: The network is resolved by collapsing all sequences within HD1 into the UMI with the highest number of read counts. UMI count in the network = 2. (e) Directional Adjacency: The network is resolved by collapsing all sequences within HD1 into the UMI with the highest number of read counts, unless their read support exceeds 50% of that of the main UMI. UMI count in the network = 3. (f) Singleton Adjacency: The network is resolved by collapsing all sequences that are within HD1 and supported by only 1 read into the UMI with the highest number of read counts. UMI count in the network = 5.
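
The five counting rules in panels (b) to (f) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the paper or in any published tool. The function names (`hamming`, `greedy_collapse`, `count_umis`), the greedy most-abundant-first ordering, and the toy network `toy` are all hypothetical. The toy sequences and read counts are not those drawn in the figure; they were chosen only so that the five rules give the same counts as the caption (6, 1, 2, 3, 5).

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatching positions between two equal-length UMIs."""
    return sum(x != y for x, y in zip(a, b))


def greedy_collapse(counts, absorb, transitive=False):
    """Count molecules after greedy collapsing (assumed procedure).

    UMIs are visited from most to least abundant. Each UMI that has not
    been collapsed yet becomes a new molecule and absorbs unassigned
    HD1 neighbours for which absorb(parent_reads, child_reads) is true.
    With transitive=True, absorbed UMIs go on to absorb their own
    HD1 neighbours under the same rule.
    """
    order = sorted(counts, key=lambda u: (-counts[u], u))
    assigned = set()
    molecules = 0
    for umi in order:
        if umi in assigned:
            continue
        molecules += 1
        assigned.add(umi)
        frontier = [umi]
        while frontier:
            parent = frontier.pop()
            for other in order:
                if (other not in assigned
                        and hamming(parent, other) == 1
                        and absorb(counts[parent], counts[other])):
                    assigned.add(other)
                    if transitive:
                        frontier.append(other)
    return molecules


def count_umis(counts, method):
    """Apply one of the caption's five rules to a {UMI: reads} dict."""
    if method == "unique":
        return len(counts)  # (b) every distinct sequence is a molecule
    if method == "cluster":
        # (c) equally or less abundant HD1 neighbours, followed transitively
        return greedy_collapse(counts, lambda p, c: c <= p, transitive=True)
    if method == "adjacency":
        # (d) every HD1 neighbour of the most abundant UMI
        return greedy_collapse(counts, lambda p, c: True)
    if method == "directional":
        # (e) HD1 neighbours unless they exceed 50% of the main UMI's reads
        return greedy_collapse(counts, lambda p, c: c <= 0.5 * p)
    if method == "singleton":
        # (f) only HD1 neighbours supported by a single read
        return greedy_collapse(counts, lambda p, c: c == 1)
    raise ValueError(f"unknown method: {method}")


# Hypothetical toy network (NOT the sequences shown in the figure).
toy = {
    "AAAAAA": 100,  # center UMI
    "AAAAAT": 60,   # HD1 to center, more than 50% of its reads
    "AAAATA": 10,   # HD1 to center
    "AAAACA": 1,    # HD1 to center, single read
    "AAATTA": 4,    # HD2 to center, HD1 to AAAATA
    "AACTTA": 2,    # HD3 to center, HD1 to AAATTA
}

methods = ["unique", "cluster", "adjacency", "directional", "singleton"]
results = {m: count_umis(toy, m) for m in methods}
```

Expressing every rule as a predicate passed to one greedy routine makes the differences between the panels explicit: the rules differ only in which HD1 neighbours may be absorbed and in whether absorption is followed transitively.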