Extended Data Fig. 5: Quality control and details of PacBio sequencing of BC1-BC2 constructs.
From: Pool-packaged AAV libraries exhibit extensive length-dependent and homology-dependent chimerism

(a) Schematic of the molecular and processing steps for the two main classes of molecular species observed in the data. Bst2 extension leads to scAAV-like species, with one strand replicated by priming from the free 3’ end of the ITR. A single SMRT-bell is ligated at the free double-stranded DNA end. The second class results from annealing of the two strands of ssAAV species, and SMRT-bells are ligated at both ends. (b) Cumulative distribution of read lengths, showing discrete populations at the expected sizes for ssAAV (anneal class) and scAAV-like (Bst2 extension class) species. Snapback molecules putatively due to the Nextera Tn5 mosaic end hairpin are also indicated. A population of more heterogeneous snapback events in p152:mid-non-hom, which was not fully characterized, is indicated by an asterisk (*). (c) Table quantifying the reads in the different size categories shown in the cumulative distribution of panel b. (d) Pile-up IGV visualization of the alignment (minimap2) of the p151:mid-hom reads from the ssAAV (left) and scAAV-like (right) size ranges, supporting our interpretation. The excess signal at the central ITR in the scAAV-like pile-up arises because minimap2 has difficulty with long reverse-complementary sequences (reads are randomly assigned to one of the two ITR-to-ITR intervals). (e) Same as panel d, but for the putative snapback fragments. The snapback redirection event corresponds to the position of the Tn5 mosaic end hairpin (see panel m). (f-g) Read attrition across the filtering steps applied to the ssAAV (f) and scAAV-like (g) reads. Note that only reads within the size ranges indicated in panel b are used as the starting point for this analysis (p151:mid-hom: ssAAV 2350-2650 bp, scAAV-like 4750-5000 bp; p152:mid-non-hom: ssAAV 2300-2700 bp, scAAV-like 4750-5200 bp).
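As an illustration of the size gating described for panels f-g, the following is a minimal sketch that assigns a read to a species class from its length. Only the bp ranges come from the legend; the names `SIZE_WINDOWS` and `classify_read` are hypothetical, and treating the bounds as inclusive is an assumption, not a statement of how the original analysis was implemented.

```python
# Size windows in bp, copied from the legend. Inclusive bounds are an assumption.
SIZE_WINDOWS = {
    "p151:mid-hom": {"ssAAV": (2350, 2650), "scAAV-like": (4750, 5000)},
    "p152:mid-non-hom": {"ssAAV": (2300, 2700), "scAAV-like": (4750, 5200)},
}


def classify_read(library, read_length):
    """Return the species class whose size window contains read_length, else None."""
    for species, (low, high) in SIZE_WINDOWS[library].items():
        if low <= read_length <= high:
            return species
    return None  # read falls outside every window and is not carried forward
```

Reads returning `None` would simply be excluded before the filtering steps are counted.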
(h-i) Concordance of BC1-BC1 and BC2-BC2 (not BC1-BC2 swaps) between different parts of the reads or different orientations, supporting the interpretation of the molecular species for ssAAV reads (h) and scAAV-like reads (i). ssAAV reads are expected to have predominantly different BC1s and BC2s on forward and reverse reads if they originate from annealing of distinct molecules. scAAV-like particles should have identifiable BC1s and BC2s on both the left and right halves of their reads, with the two BC1s matching and the two BC2s matching if they originate from a single Bst2 extension event. Quantifications align with these expectations. Read counts are stratified by CCS orientation. (k-l) Quantification of the fraction of discordant BC1-BC2 pairs indicative of chimerism for ssAAV (k) and scAAV-like (l) reads. These quantifications are reproduced in Fig. 2b. (m) Schematic of the snapback molecule with a zoomed-in view of the Nextera R1 and R2 handles flanking the insert. The Tn5 mosaic end putatively leading to the snapback-causing hairpin is marked in red. The two instances of BC2 in these snapback reads were compared (after placing them on the same strand). The table on the right shows the attrition QC for these species (starting point: reads within 3800-3900 bp for p151:mid-hom and 3800-4000 bp for p152:mid-non-hom).
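The discordant-pair fraction quantified in panels k-l can be sketched as follows. This is a hedged illustration only: it assumes each BC1 has a single known partner BC2 supplied in a lookup table, and the names `discordant_fraction` and `expected_bc2` are hypothetical rather than taken from the original analysis.

```python
def discordant_fraction(observed_pairs, expected_bc2):
    """Fraction of observed (BC1, BC2) pairs whose BC2 differs from the expected partner.

    observed_pairs: iterable of (bc1, bc2) tuples, one per read.
    expected_bc2: dict mapping each BC1 to its designed BC2 (assumed known).
    Pairs whose BC1 is absent from expected_bc2 are skipped.
    """
    considered = 0
    discordant = 0
    for bc1, bc2 in observed_pairs:
        if bc1 not in expected_bc2:
            continue  # unrecognized BC1: cannot be scored
        considered += 1
        if expected_bc2[bc1] != bc2:
            discordant += 1
    return discordant / considered if considered else float("nan")
```

For example, with designed pairs `{"A": "x", "B": "y"}` and observed reads `[("A", "x"), ("A", "y"), ("B", "y")]`, one of the three scorable pairs is discordant.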