Fig. 1: CellBarcode package to extract and identify lineage barcodes.
From: Extracting, filtering and simulating cellular barcodes using CellBarcode tools

a, Barcode experiment scheme. Cells are labeled with genetic barcodes, then divide and differentiate, with progeny inheriting the barcode. Barcodes are read out by NGS in descendant cells. CellBarcode allows extraction, filtering and identification of barcodes from NGS data and returns a barcode count matrix for further analysis. sc-seq, single-cell sequencing. b, Diagram of barcode sequencing data processing with CellBarcode. CellBarcode reads the raw sequencing data (FASTQ, FASTA or BAM/SAM files, or R objects) and performs quality control (QC and filtering functions) before extracting the barcode sequences (barcode extraction functions). Barcodes are then filtered to remove PCR and sequencing errors using different filtering strategies (barcode cleaning functions). After filtering, barcode data can be plotted with the visual check functions and exported as a barcode frequency matrix (export functions). c, Example of a barcode processing workflow using CellBarcode. Barcodes (underlined) are extracted from raw sequences using a regular expression (sequence in bold) that depends on the barcode type. Barcodes are then filtered, as detailed in d, to eliminate spurious barcodes, and exported. d, The four most commonly used barcode filtering strategies. Gray indicates true barcodes and red indicates spurious barcodes. (1) Reference library filtering: barcodes B1, B2 and B3, which match the reference list, are considered true barcodes, whereas M3 and M5 are removed. (2) Threshold filtering: barcodes with a read count greater than or equal to the threshold of 20 are kept (B1 and B2), and barcodes below the threshold are removed (M3, M5 and B3). (3) Cluster filtering: barcodes whose edit distance to a more abundant barcode is smaller than a threshold are eliminated. Here, two barcodes differ by one substitution (mutant loci in white) from an abundant barcode and are removed. (4) UMI filtering: this usually involves retaining the most abundant sequence per UMI, followed by applying a UMI count threshold per barcode.
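The extraction step in c and the four filtering strategies in d can be illustrated with a minimal Python sketch. This is not the CellBarcode API (CellBarcode is an R package). All function names, the flanking sequences in the regular expression and the toy barcode sequences are hypothetical and chosen only for illustration. Cluster filtering here uses the Hamming distance (substitutions only, equal-length barcodes) as a simplified stand-in for a general edit distance.

```python
import re
from collections import Counter, defaultdict


def extract_barcodes(reads, pattern):
    """Count the barcodes captured by group 1 of `pattern` across reads."""
    regex = re.compile(pattern)
    counts = Counter()
    for read in reads:
        match = regex.search(read)
        if match:  # reads without the expected flanks are skipped
            counts[match.group(1)] += 1
    return counts


def filter_by_reference(counts, reference):
    """(1) Keep only barcodes present in a known reference list."""
    return {bc: n for bc, n in counts.items() if bc in reference}


def filter_by_threshold(counts, min_reads):
    """(2) Keep barcodes with a read count >= min_reads."""
    return {bc: n for bc, n in counts.items() if n >= min_reads}


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def filter_by_cluster(counts, dist_threshold=2):
    """(3) Drop barcodes closer than dist_threshold to a more abundant one.

    Assumption: Hamming distance replaces a full edit distance, so only
    equal-length barcodes are compared.
    """
    kept = {}
    for bc, n in counts.items():
        near_more_abundant = any(
            other_n > n
            and len(other) == len(bc)
            and hamming(bc, other) < dist_threshold
            for other, other_n in counts.items()
        )
        if not near_more_abundant:
            kept[bc] = n
    return kept


def filter_by_umi(umi_barcode_pairs, min_umis):
    """(4) Keep the top sequence per UMI, then threshold on UMIs per barcode."""
    per_umi = defaultdict(Counter)
    for umi, bc in umi_barcode_pairs:
        per_umi[umi][bc] += 1
    umi_counts = Counter()
    for bc_counts in per_umi.values():
        top_bc, _ = bc_counts.most_common(1)[0]
        umi_counts[top_bc] += 1
    return {bc: n for bc, n in umi_counts.items() if n >= min_umis}


# Toy data mirroring panel d: three true barcodes and two one-substitution
# mutants. The sequences and counts are made up for this sketch.
toy_counts = {
    "ACGTACGT": 50,  # B1
    "TTGGCCAA": 30,  # B2
    "GGATCCTA": 5,   # B3
    "ACGTACGA": 3,   # one substitution away from B1
    "TTGGCCAT": 2,   # one substitution away from B2
}
toy_reference = {"ACGTACGT", "TTGGCCAA", "GGATCCTA"}

print(filter_by_reference(toy_counts, toy_reference))
print(filter_by_threshold(toy_counts, min_reads=20))
print(filter_by_cluster(toy_counts, dist_threshold=2))
```

On this toy data, the threshold of 20 removes the low-count true barcode (B3 in panel d) together with the mutants, whereas reference and cluster filtering retain it. That trade-off is one reason to choose a strategy according to the experiment.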