Extended Data Fig. 2: Measurement and correction of chimeric reads.
From: Species- and site-specific genome editing in complex bacterial communities

a, Response of chimeric reads to increasing quantities of donor vector. Chimeric reads were measured as total normalized counts of reads mapping to insertions in wild-type S. meliloti DNA that was spiked in before library preparation. The plot is log10-scaled on both the x and y axes for readability. Dashed lines indicate log-log linear fits to the data (no correction: R² = 0.86, n = 7 biological replicates; correction: R² = 0.92, n = 7 biological replicates). b, Frequency of read properties identified as strongly associated with S. meliloti insertions, in which all reads are expected to be chimeric; these properties were used as markers for filtering chimeric reads. Imperfect insert sequence = single difference in the last 5 bp of the transposon right end from the expected sequence; imperfect host sequence = mismatch in the first 3 bp of genomic sequence at the transposon-genome junction when aligned to the host genome. Box plots indicate the median, with boxes bounded by the first and third quartiles; whiskers indicate max/min values (n = 7 biological replicates). The plot is log10-scaled on the y axis for readability. c, Fraction of insertion-mapping reads filtered out of each dataset following chimera filtering, for each organism/vector (n = 7 biological replicates). Box plots indicate the median, with boxes bounded by the first and third quartiles; whiskers indicate max/min values. The plot is log10-scaled on the y axis for readability.
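
The two read properties defined in panel b can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the JunctionRead record, its field names, and the function names are hypothetical, and two choices are assumptions that the legend does not settle, namely that any difference within the 5 bp window counts as an imperfect insert sequence, and that a read carrying either property is treated as a putative chimera.

```python
from dataclasses import dataclass
from typing import Iterable

INSERT_WINDOW = 5  # last 5 bp of the transposon right end (from the legend)
HOST_WINDOW = 3    # first 3 bp of genomic sequence at the junction (from the legend)


@dataclass
class JunctionRead:
    """Hypothetical record for one read spanning a transposon-genome junction."""
    insert_end: str            # read bases covering the transposon right end
    host_start: str            # read bases immediately after the junction
    expected_insert_end: str   # expected transposon right-end sequence
    reference_host_start: str  # host genome bases at the aligned position


def count_mismatches(a: str, b: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a.upper(), b.upper()))


def imperfect_insert(read: JunctionRead) -> bool:
    """Assumption: any difference in the last 5 bp of the right end is flagged."""
    return count_mismatches(read.insert_end[-INSERT_WINDOW:],
                            read.expected_insert_end[-INSERT_WINDOW:]) > 0


def imperfect_host(read: JunctionRead) -> bool:
    """Flag a mismatch in the first 3 bp of genomic sequence at the junction."""
    return count_mismatches(read.host_start[:HOST_WINDOW],
                            read.reference_host_start[:HOST_WINDOW]) > 0


def is_putative_chimera(read: JunctionRead) -> bool:
    """Assumption: either property alone is enough to filter the read."""
    return imperfect_insert(read) or imperfect_host(read)


def filtered_fraction(reads: Iterable[JunctionRead]) -> float:
    """Fraction of reads removed by the filter, the quantity plotted in panel c."""
    reads = list(reads)
    if not reads:
        return 0.0
    return sum(is_putative_chimera(r) for r in reads) / len(reads)
```

Applied to the insertion-mapping reads of one organism/vector dataset, filtered_fraction would give a value comparable in kind to a single point in panel c, under the assumptions stated above.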