Supplementary Figure 1: Barcodes, barcoded adaptors and resulting expected nucleotide distributions and epiGBS clusters. | Nature Methods

From: epiGBS: reference-free reduced representation bisulfite sequencing

(a) The following forward and reverse inline barcodes were used to generate barcoded forward and reverse adapters (Supplementary Table 1b).
(b) The barcoded adapter design is identical to that of the adapters used in Genotyping by Sequencing (GBS)¹, except for the barcode in the B adapter, which is not present in GBS. Barcoded sequences for both A and B adapters were generated using http://www.deenabio.com/services/gbs-adapters: 12 barcodes of 4 to 6 nucleotides for the A adapter and a subset of 8 barcodes of 4 nucleotides for the B adapter. The sequences of Illumina PE-PCR primers 1 and 2 used to amplify the libraries are also listed.
(c) Complementary oligonucleotides containing 5-methylcytosine in place of cytosine are annealed to form the adapters. The four-base 5’-3’ overhang complements the overhang generated by PstI digestion but can be modified depending on the restriction enzyme used. Adapter A is identical to the barcoded adapter used in GBS; its 4 to 6 nucleotide barcodes are designed to maximize nucleotide diversity over the first sequencing cycles. Adapter B is identical to the common adapter used in GBS, except that a barcode sequence is placed before the enzyme overhang. For Csp6I we used 5’-CCTA-3’ as the forward and 5’-CTGG-3’ as the reverse barcode. The overhang of both Csp6I adapters was changed relative to the PstI design: instead of 5’-TGCA-3’, the adapters carry 5’-AT-3’, complementary to the overhang generated by Csp6I digestion.
(d) Expected per-cycle nucleotide composition of the forward read, assuming equal representation of all 12 forward barcodes. Up to position 5 the composition is largely balanced, which helps the Illumina instrument estimate run-specific parameters correctly and thereby avoids phasing and pre-phasing errors that can result in poor-quality sequencing data.
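The expected composition in panel (d) can be approximated with a short calculation. The Python sketch below assumes that each forward read begins with the inline barcode followed by the PstI site remnant TGCAG (the usual GBS read structure), that all barcodes are equally represented, and it ignores bisulfite conversion of genomic bases. The barcode list in the demo is hypothetical and is not the set shown in panel (a); the function names are illustrative only.

```python
from collections import Counter

BASES = "ACGT"

# Assumption: read 1 starts with the inline barcode followed by the remnant
# of the PstI site ("TGCAG"), as in GBS. Bisulfite conversion is ignored.
PSTI_REMNANT = "TGCAG"


def per_cycle_composition(barcodes, remnant=PSTI_REMNANT):
    """Return, for each sequencing cycle, the expected fraction of A/C/G/T.

    Every barcode is assumed to be equally represented in the pool. Only
    cycles covered by *all* barcode+remnant prefixes are reported, because
    beyond that point the read continues into unknown genomic sequence.
    """
    if not barcodes:
        raise ValueError("need at least one barcode")
    prefixes = [bc.upper() + remnant for bc in barcodes]
    n_cycles = min(len(p) for p in prefixes)
    profile = []
    for cycle in range(n_cycles):
        counts = Counter(p[cycle] for p in prefixes)
        profile.append({b: counts[b] / len(prefixes) for b in BASES})
    return profile


def max_base_fraction(profile):
    """Largest single-base fraction per cycle (0.25 = balanced, 1.0 = all reads share one base)."""
    return [max(fractions.values()) for fractions in profile]


if __name__ == "__main__":
    # Hypothetical 4-6 nt barcodes for illustration only; the barcodes
    # actually used are listed in panel (a).
    example_barcodes = [
        "CTCC", "TGCA", "ACTA", "CAGA", "AACT", "GCGT",
        "CGAT", "GTAA", "AGGCA", "GATCT", "TCAGCA", "TATCAC",
    ]
    profile = per_cycle_composition(example_barcodes)
    for cycle, (fractions, top) in enumerate(
        zip(profile, max_base_fraction(profile)), start=1
    ):
        row = "  ".join(f"{b}={fractions[b]:.2f}" for b in BASES)
        print(f"cycle {cycle:2d}: {row}  max={top:.2f}")
```

Reporting only the cycles covered by every barcode-plus-remnant prefix keeps the estimate limited to sequence the adapters actually control. In this model, mixing barcode lengths staggers the fixed remnant across cycles, which is what keeps the early cycles diverse.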
(e) For each species, the table shows the total number of paired-end sequencing reads, the number of individuals, the percentage of merged reads, the number of de novo discovered clusters, CG content, average cluster size and the number of clusters with gene hits (see Online Methods).

1. Robert J. Elshire et al., "A Robust, Simple Genotyping-by-Sequencing (GBS) Approach for High Diversity Species," PLoS One 6, no. 5 (2011): e19379, doi:10.1371/journal.pone.0019379.

2. Martin Kircher, Patricia Heyn, and Janet Kelso, "Addressing Challenges in the Production and Analysis of Illumina Sequencing Data," BMC Genomics 12 (2011): 382, doi:10.1186/1471-2164-12-382.