Figure 3

(a) Probability that no two molecules receive the same barcode (that is, that no “barcode clash” occurs) when labelled with random nucleotide barcodes of the indicated length. Dotted lines: number of molecules that can be labelled with a 50% chance of no barcode clash. (b) Proportion of molecules that receive a unique barcode when labelling is simulated with the indicated number of available barcodes, uniformly distributed. The number of molecules to be barcoded is expressed as a proportion of the number of available barcodes. Data shown are the mean and standard deviation of 50 repeated simulations. (c) Maximum number of initial molecules that receive the same barcode when barcoding is simulated with the indicated number of available barcodes, uniformly distributed. The number of molecules being barcoded is indicated by colour, expressed as a proportion of the number of available barcodes. Data shown are the mean and standard deviation of 50 repeated simulations. (d) Observed barcode size distribution after simulating the labelling of 250,000 molecules from uniformly or non-uniformly distributed pools of 500,000 available barcodes, followed by 10 cycles of PCR (efficiency 0.5) and sampling of 300,000 molecules from the amplified pool. Inset: distribution of available barcodes for non-uniform simulations (green: normal, restricted to values >0; orange: lognormal). Data shown are the mean and standard deviation of 10 repeated simulations. Grey dotted line: expected distribution if the sampled molecules were drawn from a uniformly distributed amplified pool, in which all molecules had been uniquely barcoded and amplified equally. (e) Observed barcode size distribution when the indicated numbers of initial molecules are barcoded from a pool of 4^12 potential barcodes, with barcode availability distributed as predicted from empirically observed labelling events (Supplementary Information and Supplementary Fig. 3). 25 PCR cycles (efficiency 0.75) are simulated on the labelled molecules, and samples of 100,000 molecules are drawn from the amplified pool. Solid line: mean of 10 repeated simulations. Dashed line: expected distribution if the sampled molecules were drawn from a uniformly distributed amplified pool, in which all molecules had been uniquely barcoded and amplified equally.
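
The quantities in panels (a) to (c) can be illustrated with a short calculation. The following is a minimal sketch, not the simulation code used to produce the figure. It assumes barcodes are drawn independently and uniformly, so that the no-clash probability is the birthday-problem product over i = 0, ..., n-1 of (1 - i/N) for n molecules and N available barcodes. The function names (prob_no_clash, molecules_for_half_chance, simulate_labelling) are hypothetical and chosen for illustration only.

```python
import math
import random
from collections import Counter


def prob_no_clash(n_molecules: int, n_barcodes: int) -> float:
    """Probability that all molecules receive distinct barcodes (panel a).

    Assumes each molecule draws one of n_barcodes barcodes uniformly and
    independently. Logs are summed to keep the product numerically stable.
    """
    if n_molecules > n_barcodes:
        return 0.0  # more molecules than barcodes: a clash is certain
    log_p = sum(math.log1p(-i / n_barcodes) for i in range(n_molecules))
    return math.exp(log_p)


def molecules_for_half_chance(n_barcodes: int) -> int:
    """Largest number of molecules with at least a 50% chance of no clash."""
    log_half = math.log(0.5)
    log_p = 0.0
    n = 0  # molecules labelled so far
    while n < n_barcodes:
        # Adding one more molecule multiplies the probability by (1 - n/N).
        next_log_p = log_p + math.log1p(-n / n_barcodes)
        if next_log_p < log_half:
            break
        log_p = next_log_p
        n += 1
    return n


def simulate_labelling(n_molecules: int, n_barcodes: int, rng: random.Random):
    """One simulated labelling with uniformly distributed barcodes.

    Returns (proportion of molecules with a unique barcode, as in panel b;
    maximum number of molecules sharing one barcode, as in panel c).
    """
    if n_molecules < 1:
        raise ValueError("n_molecules must be at least 1")
    counts = Counter(rng.randrange(n_barcodes) for _ in range(n_molecules))
    n_unique = sum(1 for c in counts.values() if c == 1)
    return n_unique / n_molecules, max(counts.values())


if __name__ == "__main__":
    for length in (8, 10, 12):  # illustrative barcode lengths
        n_barcodes = 4 ** length
        print(length, molecules_for_half_chance(n_barcodes))

    rng = random.Random(0)  # fixed seed so the example is repeatable
    print(simulate_labelling(5_000, 10_000, rng))
```

Under the uniform assumption, the proportion of uniquely barcoded molecules in a single run should lie close to exp(-n/N) when N is large; repeating simulate_labelling and taking the mean and standard deviation gives summaries of the kind shown in panels (b) and (c). Non-uniform barcode availability, as in panels (d) and (e), is not covered by this sketch.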